SLM
SMALL LANGUAGE MODEL
As the name implies, small language models (SLMs) are smaller in scale and scope than large language models (LLMs).

SLM parameters range from a few million to a few billion, so they require less memory and computational power, making them well suited to resource-constrained environments. On certain domain-specific tasks, SLMs can even deliver superior performance.
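For a rough sense of scale, here is a sketch of the weight-memory arithmetic, assuming 16-bit (2-byte) weights; the parameter counts are illustrative and not from the original text:

```python
# Rough, illustrative estimate of the memory needed just to hold model
# weights at 16-bit precision; real usage also includes activations,
# KV cache, and runtime overhead.
def weight_memory_gb(num_params, bytes_per_param=2):
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(1e9))    # ~2 GB for a 1B-parameter SLM
print(weight_memory_gb(70e9))   # ~140 GB for a 70B-parameter LLM
```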
How does an SLM work?
Pruning
- Like pruning a tree, pruning removes unnecessary parameters from the model.
- Fine-tuning is required after pruning so that over-pruning does not degrade the model's performance.
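As a concrete illustration, here is a minimal sketch of magnitude-based pruning using PyTorch's built-in pruning utilities; the layer sizes and the 30% sparsity level are assumptions for the example:

```python
# Minimal sketch: remove the smallest-magnitude weights from each linear layer.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Zero out the 30% of weights with the smallest L1 magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# After pruning, the model would normally be fine-tuned on task data
# to recover any accuracy lost from removing parameters.
```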
Knowledge distillation
- Knowledge distillation transfers knowledge from a large teacher LLM into a smaller student model.
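A minimal sketch of a distillation training loss, where the student is trained against both the teacher's softened outputs and the true labels; the temperature and weighting values are illustrative assumptions:

```python
# Minimal sketch of a knowledge-distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student outputs.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy with the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```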
Low-rank factorization
- Low-rank factorization decomposes a large weight matrix into smaller, lower-rank matrices. This yields fewer parameters, reduces the number of computations, and simplifies complex matrix operations.
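A minimal sketch of low-rank factorization via truncated SVD, replacing one large linear layer with two smaller ones; the dimensions and rank are illustrative assumptions:

```python
# Minimal sketch: approximate W (d x d) by two rank-r factors via truncated SVD.
import torch
import torch.nn as nn

d, r = 768, 64
layer = nn.Linear(d, d, bias=False)

# W ≈ (U_r * S_r) @ V_r^T, giving 2*d*r parameters instead of d*d.
U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
A = U[:, :r] * S[:r]   # d x r
B = Vh[:r, :]          # r x d

factored = nn.Sequential(nn.Linear(d, r, bias=False), nn.Linear(r, d, bias=False))
factored[0].weight.data = B   # project the input down to rank r
factored[1].weight.data = A   # project back up to d
```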
Quantization
- Quantization converts high-precision data to lower-precision data, which lightens the computational load and speeds up inference.
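A minimal sketch of symmetric int8 quantization of a weight tensor; the tensor shape and the single per-tensor scale are simplifying assumptions:

```python
# Minimal sketch: map float32 weights onto 127 integer levels with one scale.
import torch

weights = torch.randn(768, 768)            # original float32 weights

scale = weights.abs().max() / 127          # one scale for the whole tensor
q_weights = torch.clamp((weights / scale).round(), -127, 127).to(torch.int8)

# Dequantize for use in computation; the small difference is the quantization
# error traded for a ~4x smaller memory footprint than float32.
deq_weights = q_weights.float() * scale
print((weights - deq_weights).abs().max())
```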