9. Model Deployment
In earlier chapters, we discussed the basic components of a machine learning model training system. In this chapter, we turn to the basics of model deployment, the process of putting a trained model into a runtime environment to serve inference requests. We explore the conversion of a training model into an inference model, model compression methods that adapt models to hardware constraints, the model inference process and its performance optimization, and model security protection.
The key aspects this chapter explores are as follows:
Conversion and optimization of a training model into an inference model.
Common methods for model compression: quantization, sparsification, and knowledge distillation.
The model inference process and common methods for performance optimization.
Common methods for model security protection.