8.5. Summary
To serve deep learning workloads, accelerators typically combine multiple kinds of on-chip buffers with multiple kinds of compute units to improve performance. Future performance growth will have to come from architectural change; that is, programmable hardware accelerators will be needed to deliver performance breakthroughs. For reasons of both computational efficiency and ease of use, accelerators generally expose several levels of programming interfaces: the operator-library level, the programming-primitive level, and the instruction level. The lower the level, the more flexibly a programmer can control the accelerator, but the more expertise it demands.
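To make the three levels concrete, below is a minimal sketch using NVIDIA's CUDA stack as one possible example: a cuBLAS call stands in for the operator-library level, a hand-written kernel for the programming-primitive level, and inline PTX for the instruction level. The choice of SAXPY, the problem size, and the kernel names are illustrative assumptions, not taken from this chapter.

```cuda
// Illustrative sketch of the three programming levels (compile: nvcc saxpy.cu -lcublas).
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Programming-primitive level: a hand-written CUDA kernel.
__global__ void saxpy_kernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Instruction level: the same multiply-add issued as a single PTX instruction.
__global__ void saxpy_ptx(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r;
        asm("fma.rn.f32 %0, %1, %2, %3;" : "=f"(r) : "f"(a), "f"(x[i]), "f"(y[i]));
        y[i] = r;
    }
}

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory, visible to host and device
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Operator-library level: one call into cuBLAS does all of the work.
    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &a, x, 1, y, 1);
    cudaDeviceSynchronize();

    // The two lower-level variants compute the same update by hand.
    saxpy_kernel<<<(n + 255) / 256, 256>>>(n, a, x, y);
    saxpy_ptx<<<(n + 255) / 256, 256>>>(n, a, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 2 + 2*1 (cuBLAS) + 2*1 (kernel) + 2*1 (PTX) = 8
    cublasDestroy(handle);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The contrast mirrors the trade-off stated above: the cuBLAS call is a single line and already tuned, the plain kernel exposes thread mapping and memory access to the programmer, and the inline PTX pins down the exact instruction executed at the cost of portability and effort.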