Optimization¶
Quantization¶
Quantization reduces model size by lowering the numerical precision of weights (e.g., from 32-bit floating point to 8-bit integers), which significantly cuts memory requirements without drastically impacting accuracy.
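As a minimal sketch of the idea, the snippet below performs symmetric per-tensor int8 quantization with NumPy (an assumed dependency; real toolkits such as PyTorch or TensorFlow Lite provide this behavior built in). Each float32 weight is mapped to an integer in [-127, 127] via a single scale factor, and can be approximately recovered by multiplying back:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 plus a scale."""
    # Scale so the largest-magnitude weight maps to 127; guard against all-zero tensors.
    scale = max(np.max(np.abs(weights)) / 127.0, 1e-12)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 weights."""
    return q.astype(np.float32) * scale

weights = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# int8 storage is 4x smaller than float32; rounding error is bounded by ~scale/2 per weight.
```

The 4x size reduction comes purely from the dtype change (1 byte vs. 4 bytes per weight); the scale is a single extra float stored alongside the tensor.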
Pruning¶
Model pruning removes weights that contribute little to overall performance, such as unnecessary neurons or layers, reducing the model's size and complexity.
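A common variant is unstructured magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached. The NumPy sketch below illustrates the idea (an assumed, simplified version; frameworks such as `torch.nn.utils.prune` implement this with masks on live modules):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weights.size * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Threshold = magnitude of the k-th smallest absolute weight.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0  # ties at the threshold are also pruned
    return pruned

weights = np.random.randn(128, 128).astype(np.float32)
pruned = magnitude_prune(weights, sparsity=0.5)
```

Zeroed weights can then be stored in sparse formats or skipped at inference time; in practice, pruning is usually followed by fine-tuning to recover any lost accuracy.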