15.6. Summary

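This chapter gave a brief introduction to the basic concepts of reinforcement learning, covering single-agent and multi-agent reinforcement learning algorithms as well as single-node and distributed reinforcement learning systems, to give readers a basic understanding of the reinforcement learning problem. Reinforcement learning is currently a fast-moving branch of machine learning, and further progress in its algorithms may make many practical problems solvable. At the same time, the particular setup of reinforcement learning problems (for example, the need to interact with an environment to collect samples) places higher demands on the computing system. How should sample collection and policy training be balanced? How should work be divided across different computing hardware such as CPUs and GPUs? How can reinforcement learning agents be deployed effectively on large-scale distributed systems? Answering these questions requires a better understanding of how computer systems are designed and used.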
