SLM
SMALL LANGUAGE MODEL
As the name implies, small language models (SLMs) are smaller in scale and scope than large language models (LLMs).

SLM parameters range from a few million to a few billion, so they require less memory and computational power, making them well suited to resource-constrained environments. On certain domain-specific tasks, SLMs can even deliver superior performance.
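For a rough sense of scale, here is a sketch of the weight-memory arithmetic, assuming 16-bit (2-byte) weights; the parameter counts are illustrative and not from the original text:

```python
# Rough, illustrative estimate of the memory needed just to hold model
# weights at 16-bit precision; real usage also includes activations,
# KV cache, and runtime overhead.
def weight_memory_gb(num_params, bytes_per_param=2):
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(1e9))    # ~2 GB for a 1B-parameter SLM
print(weight_memory_gb(70e9))   # ~140 GB for a 70B-parameter LLM
```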
How does an SLM work?
Pruning
- Like pruning a tree, pruning removes unnecessary parameters from the model.
- Fine-tuning is required after pruning so that over-pruning does not degrade the model's performance.
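As a concrete illustration, here is a minimal sketch of magnitude-based pruning using PyTorch's built-in pruning utilities; the layer sizes and the 30% sparsity level are assumptions for the example:

```python
# Minimal sketch: remove the smallest-magnitude weights from each linear layer.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Zero out the 30% of weights with the smallest L1 magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# After pruning, the model would normally be fine-tuned on task data
# to recover any accuracy lost from removing parameters.
```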
Knowledge distillation
- Knowledge distillation transfers knowledge from a large teacher LLM into a smaller student model.
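A minimal sketch of a distillation training loss, where the student is trained against both the teacher's softened outputs and the true labels; the temperature and weighting values are illustrative assumptions:

```python
# Minimal sketch of a knowledge-distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student outputs.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy with the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```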
Low-rank factorization
- Low-rank factorization decomposes a large weight matrix into smaller, lower-rank matrices. This yields fewer parameters, reduces the number of computations, and simplifies complex matrix operations.
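A minimal sketch of low-rank factorization via truncated SVD, replacing one large linear layer with two smaller ones; the dimensions and rank are illustrative assumptions:

```python
# Minimal sketch: approximate W (d x d) by two rank-r factors via truncated SVD.
import torch
import torch.nn as nn

d, r = 768, 64
layer = nn.Linear(d, d, bias=False)

# W ≈ (U_r * S_r) @ V_r^T, giving 2*d*r parameters instead of d*d.
U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
A = U[:, :r] * S[:r]   # d x r
B = Vh[:r, :]          # r x d

factored = nn.Sequential(nn.Linear(d, r, bias=False), nn.Linear(r, d, bias=False))
factored[0].weight.data = B   # project the input down to rank r
factored[1].weight.data = A   # project back up to d
```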
Quantization
- Quantization converts high-precision data to lower-precision data, which lightens the computational load and speeds up inference.
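A minimal sketch of symmetric int8 quantization of a weight tensor; the tensor shape and the single per-tensor scale are simplifying assumptions:

```python
# Minimal sketch: map float32 weights onto 127 integer levels with one scale.
import torch

weights = torch.randn(768, 768)            # original float32 weights

scale = weights.abs().max() / 127          # one scale for the whole tensor
q_weights = torch.clamp((weights / scale).round(), -127, 127).to(torch.int8)

# Dequantize for use in computation; the small difference is the quantization
# error traded for a ~4x smaller memory footprint than float32.
deq_weights = q_weights.float() * scale
print((weights - deq_weights).abs().max())
```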