TensorFlow: Large-scale machine learning on heterogeneous distributed systems, 2016.
The arcade learning environment: An evaluation platform for general agents, Journal of Artificial Intelligence Research, vol.47, pp.253-279, 2013.
A distributional perspective on reinforcement learning, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.449-458, 2017.
Dopamine: A research framework for deep reinforcement learning, 2018.
Go-Explore: A new approach for hard-exploration problems, 2019.
Montezuma's Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems, 2018.
Remi Munos, Demis Hassabis, Olivier Pietquin, et al. Noisy networks for exploration, 2017.
How to match DeepMind's Deep Q-Learning score in Breakout, 2018.
A neuroevolution approach to general Atari game playing, IEEE Transactions on Computational Intelligence and AI in Games, vol.6, issue.4, pp.355-366, 2014.
Deep reinforcement learning that matters, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
Rainbow: Combining improvements in deep reinforcement learning, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
Open source implementation of Rainbow by Kaixhin, 2018.
Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents, Journal of Artificial Intelligence Research, vol.61, pp.523-562, 2018.
Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, p.529, 2015.
Asynchronous methods for deep reinforcement learning, International Conference on Machine Learning, pp.1928-1937, 2016.
Count-based exploration with neural density models, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2721-2730, 2017.
Automatic differentiation in PyTorch, 2017.
Curiosity-driven exploration by self-supervised prediction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.16-17, 2017.
Reproducibility checklist, 2019.
Observe and look further: Achieving consistent performance on Atari, 2018.
Redis database, 2019.
Proximal policy optimization algorithms, 2017.
Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, p.484, 2016.
MuJoCo: A physics engine for model-based control, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.5026-5033, 2012.
Deep reinforcement learning with double Q-learning, Thirtieth AAAI Conference on Artificial Intelligence, 2016.
Learning values across many orders of magnitude, Advances in Neural Information Processing Systems, vol.29, pp.4287-4295, 2016.
Chris Apps, Koray Kavukcuoglu, Demis Hassabis, and David Silver. AlphaStar: Mastering the Real-Time Strategy Game StarCraft II, 2019.
Dueling network architectures for deep reinforcement learning, 2015.
Machine Learning, vol.8, issue.3-4, pp.279-292, 1992.

A Supplementary materials: Implementation details
We tested this initial implementation on several games under exactly the same training conditions as the original Rainbow to ensure that our results were consistent.