How can we distill the capabilities of large language models into a compact, efficient, frugal, and accessible small language model (SLM) without sacrificing performance?
Problem
Large language models (LLMs) have achieved remarkable success in natural language processing, demonstrating impressive capabilities in complex reasoning, knowledge retention, and nuanced text generation. However, these models are often prohibitively large, demanding significant computational resources and energy. This makes it difficult to deploy LLMs in resource-constrained environments such as edge devices, mobile phones, and low-power embedded systems.
While LLMs have shown incredible abilities, their smaller counterparts, small language models (SLMs), often struggle to replicate these feats. The primary hurdle in developing capable SLMs is the inherent trade-off between model size and performance. There is therefore a growing need either to distill the vast knowledge and reasoning capabilities of LLMs into a more compact architecture without significant performance degradation, or to build frugal, optimised multi-LLM systems that route each request to the most efficient chain of models and prompts.
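To make the second option concrete, here is a toy sketch of a frugal two-model cascade, assuming Hugging Face-style tokenizer and `generate` interfaces; the model names, the threshold, and the next-token confidence heuristic are hypothetical and deliberately simplistic:

```python
import torch

@torch.no_grad()
def cascade_generate(small_model, large_model, tokenizer, prompt, threshold=0.8):
    """Answer with the SLM when it looks confident; escalate to the LLM otherwise.
    `small_model`, `large_model`, and `threshold` are illustrative placeholders."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # Crude confidence proxy: the SLM's maximum next-token probability.
    logits = small_model(**inputs).logits[:, -1, :]
    confidence = torch.softmax(logits, dim=-1).max().item()
    model = small_model if confidence >= threshold else large_model
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```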
Objective
Explore different methods to distill the capabilities of the Swiss large language model (LLM) into a small language model (SLM) with minimal performance loss. The goal is to create an SLM that can efficiently perform a range of natural language processing tasks. Proposed approaches may include, but are not limited to:
- Knowledge distillation (see the sketch after this list)
- Model pruning
- Quantization
- Frugal multi-model systems
- Efficient neural network architectures
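For orientation, here is a minimal sketch of a classic soft-target distillation loss in PyTorch. It is illustrative only: `teacher`, `student`, and the batch layout are hypothetical stand-ins for the provided Swiss LLM, your SLM, and your task data, and it assumes Hugging Face-style model outputs with a `.logits` field.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style blend of a soft teacher-matching term and a hard-label term.
    Expects logits flattened to (N, vocab) and labels of shape (N,)."""
    # Soft targets: KL(teacher || student) on temperature-scaled distributions;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_step(student, teacher, batch, optimizer):
    """One distillation step; assumes causal-LM logits and labels are already
    shifted into alignment before the loss is computed."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"]).logits
    student_logits = student(batch["input_ids"]).logits
    vocab = student_logits.size(-1)
    loss = distillation_loss(
        student_logits.view(-1, vocab),
        teacher_logits.view(-1, vocab),
        batch["labels"].view(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```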
You will work in a setting where the large language model (LLM) is provided, and your task is to distill its capabilities into a smaller model (SLM). The evaluation metrics will include:
- Performance on a range of natural language processing tasks
- Model size and computational efficiency (see the measurement sketch below)
- Energy consumption and carbon footprint
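As a rough idea of how the size and efficiency metrics might be instrumented (the official evaluation harness will be provided, so treat this as a sketch; energy and carbon measurement typically requires external tooling such as the codecarbon package):

```python
import time
import torch

def model_footprint(model: torch.nn.Module):
    """Parameter count and approximate in-memory size of the weights."""
    n_params = sum(p.numel() for p in model.parameters())
    size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
    return n_params, size_mb

@torch.no_grad()
def mean_latency_ms(model: torch.nn.Module, example_input, n_runs: int = 20):
    """Rough wall-clock latency per forward pass on CPU; for GPU timing,
    wrap the timed region with torch.cuda.synchronize()."""
    model.eval()
    model(example_input)  # warm-up pass
    start = time.perf_counter()
    for _ in range(n_runs):
        model(example_input)
    return (time.perf_counter() - start) / n_runs * 1e3
```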
Support for Hackers
- A pre-trained Swiss large language model (LLM) will be provided, along with a range of natural language processing tasks and evaluation metrics.
- A tutorial notebook will introduce participants to the concepts of knowledge distillation, model pruning, and efficient neural network architectures.
- Help from our team of experts will be available to guide participants and answer questions.
Technical Preferences
- PyTorch or TensorFlow for model development.
- Creativity is welcome: combine knowledge distillation, model pruning, quantization, or other innovative techniques to achieve the goal (a combined pruning-and-quantization sketch follows).
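As one concrete way to combine two of the listed techniques, the sketch below applies magnitude pruning followed by post-training dynamic quantization using PyTorch's built-in utilities. It is a CPU-side illustration under default settings, not a prescribed recipe; the pruning ratio `amount` is an arbitrary example value.

```python
import torch
import torch.nn.utils.prune as prune

def prune_and_quantize(model: torch.nn.Module, amount: float = 0.3):
    """L1 unstructured pruning of Linear weights, then dynamic int8 quantization."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            # Zero out the `amount` fraction of weights with smallest magnitude.
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the pruning mask into the weights
    # Dynamic quantization stores Linear weights as int8 and dequantizes
    # on the fly at inference time (CPU backends only).
    return torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

# Hypothetical usage, where `student` is your distilled SLM:
# compact = prune_and_quantize(student, amount=0.3)
```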
Why hack?
Because developing capable small language models (SLMs) can enable a wide range of applications, from edge devices and mobile phones to low-power embedded systems. By participating in this challenge, you will be working on a real-world problem with significant impact, including:
- Enabling efficient natural language processing on resource-constrained devices
- Reducing energy consumption and carbon footprint
- Democratizing access to AI capabilities
About the Challenge Partner
Swisscom is Switzerland’s leading telecom provider and one of its most innovative IT companies. We operate one of the largest and most complex networks in the country. By joining this challenge, you’ll be working on problems at the intersection of AI, efficiency, and real-world deployment at scale.