8.5. Summary

  • For deep learning workloads, accelerators typically combine multiple kinds of on-chip buffers with multiple kinds of compute units to improve performance.

  • Future performance gains will have to come from architectural change; that is, programmable hardware accelerators are needed to deliver the next performance breakthrough.

  • For reasons of both compute efficiency and ease of use, accelerators generally expose several levels of programming interfaces, including the operator-library level, the programming-primitive level, and the instruction level.

  • The lower the programming level, the more flexibly it lets the programmer control the accelerator, but the higher the demands it places on the programmer's skill.
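The trade-off in the last two points can be sketched with a matrix multiplication written at two of these levels. This is an illustrative analogy only: NumPy's `@` stands in for a vendor operator library (such as cuBLAS or an NPU's operator library), and the hand-written tiled loop stands in for primitive-level programming, where the programmer explicitly decides how tiles map onto on-chip buffers. Neither is a real accelerator API.

```python
import numpy as np

# Operator-library level: one call to a pre-tuned kernel; the programmer
# controls nothing about tiling or buffering. (NumPy is a stand-in here.)
def matmul_library(a, b):
    return a @ b

# Primitive level (stand-in): the programmer chooses the tile size and the
# loop order, analogous to scheduling primitives that stage tiles of the
# operands through on-chip scratchpad buffers before computing on them.
def matmul_tiled(a, b, tile=2):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # On a real accelerator, each of these slices would be a
                # tile resident in a software-managed buffer.
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile]
                    @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c

a = np.arange(16, dtype=np.float64).reshape(4, 4)
b = np.ones((4, 4))
# Both levels compute the same result; the tiled version simply exposes
# more knobs (tile size, loop order) at the cost of more programmer effort.
assert np.allclose(matmul_library(a, b), matmul_tiled(a, b))
```

The library call is one line and already fast, while the tiled version gives the programmer control over data movement but makes correctness and tuning the programmer's problem, which is exactly the flexibility-versus-effort trade-off described above.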

8.6. Further Reading
