12.7. Chapter Summary¶
RL systems usually consist of an agent, an environment, and a policy. A reward function is also essential, since it defines the objective that the policy is optimized toward.
A single-agent, single-node RL system is the simplest form of RL system. It, too, is composed of several key components.
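The interaction among these components can be sketched as a simple loop: the agent's policy picks an action, the environment returns the next state and a reward, and the loop repeats until the episode ends. The toy corridor environment and random policy below are hypothetical, chosen only to keep the sketch self-contained.

```python
import random

class ToyCorridorEnv:
    """Hypothetical environment: a 1-D corridor; reaching position 3 ends
    the episode with reward 1. Not part of any real RL library."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (move left) or +1 (move right), clipped to [0, 3]
        self.pos = max(0, min(3, self.pos + action))
        done = (self.pos == 3)
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def random_policy(state):
    # A placeholder policy; a trained policy would map state -> action.
    return random.choice([-1, 1])

# The agent-environment interaction loop.
env = ToyCorridorEnv()
state = env.reset()
total_reward = 0.0
for _ in range(20):
    action = random_policy(state)
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

Training a policy amounts to using the rewards collected in this loop to improve the action choices, which is where the optimization process discussed in this chapter comes in.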
Distributed RL systems are more complex than single-node systems, but they can substantially speed up the policy optimization process.
Multi-agent RL involves multiple agents that interact with one another and with the environment, and the agents may even pursue different objectives. Multi-agent RL systems can therefore be considerably more complicated than single-agent ones.
Due to the particularity of reinforcement learning problem settings (e.g., sampling through interaction with the environment), RL algorithms place stricter requirements on the computing system. This raises a couple of questions: How can we better balance sample collection and policy training while also making even use of different compute hardware such as CPUs and GPUs? And how can reinforcement learning agents be deployed in a large-scale distributed system? To find the answers to these questions, we must deeply understand the design and use of computer systems.
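One common way to balance the two workloads is an actor-learner split: actor processes (often CPU-bound) interact with environment copies and push transitions into a shared buffer, while a learner (often GPU-bound) consumes batches to update the policy. The sketch below illustrates this pattern with threads and a queue; the sample contents and batch sizes are placeholders, not a definitive implementation of any particular framework.

```python
import queue
import random
import threading

# Shared buffer between sample collection and training.
sample_queue = queue.Queue(maxsize=100)

def actor(actor_id, num_steps):
    """Hypothetical actor: collects transitions by interacting with an
    environment (faked here with random numbers) and enqueues them."""
    for step in range(num_steps):
        transition = (actor_id, step, random.random())  # stand-in for (s, a, r, s')
        sample_queue.put(transition)

def learner(num_batches, batch_size):
    """Hypothetical learner: dequeues batches and would run one gradient
    step per batch on a GPU in a real system."""
    updates = 0
    for _ in range(num_batches):
        batch = [sample_queue.get() for _ in range(batch_size)]
        updates += 1  # a real learner would update policy weights here
    return updates

# Four actors each produce 50 samples; the learner consumes all 200
# of them as 10 batches of 20.
actors = [threading.Thread(target=actor, args=(i, 50)) for i in range(4)]
for t in actors:
    t.start()
n_updates = learner(num_batches=10, batch_size=20)
for t in actors:
    t.join()
```

The queue's bounded size provides back-pressure: when the learner falls behind, actors block instead of filling memory, which is one simple answer to the balancing question raised above. Large-scale systems replace the threads with processes or remote workers, but the producer-consumer structure is the same.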