15.6. Summary

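In this chapter, we briefly introduced the basic concepts of reinforcement learning. We covered single-agent and multi-agent reinforcement learning algorithms, as well as single-node and distributed reinforcement learning systems, to give readers a basic understanding of reinforcement learning problems. Reinforcement learning, and deep reinforcement learning in particular, is a rapidly developing field, and further advances in its algorithms may make many practical problems tractable. At the same time, the particular setting of reinforcement learning problems (for example, the need to collect samples by interacting with an environment) places higher demands on the underlying computing system. How should sample collection be balanced against policy training? How can heterogeneous hardware such as CPUs and GPUs be used in a balanced way? How can reinforcement learning agents be deployed effectively on large-scale distributed systems? Answering these questions requires a deeper understanding of how computer systems are designed and used.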