.. _ch:deploy:

Model Deployment
================

In earlier chapters, we discussed the basic components of the machine learning model training system. In this chapter, we look at the basics of model deployment, the process by which a trained model is placed in a runtime environment for inference. We explore the conversion of a training model into an inference model, model compression methods that adapt models to hardware constraints, model inference and performance optimization, and model security protection.

The key aspects this chapter explores are as follows:

1. Conversion and optimization from a training model to an inference model.
2. Common methods for model compression: quantization, sparsification, and knowledge distillation.
3. The model inference process and common methods for performance optimization.
4. Common methods for model security protection.

.. toctree::
   :maxdepth: 2

   Overview
   Conversion_to_Inference_Model_and_Model_Optimization
   Model_Compression
   Advanced_Efficient_Techniques
   Model_Inference
   Security_Protection_of_Models
   Chapter_Summary
   Further_Reading