M. Abadi, A. Agarwal, P. Barham, E. Brevdo, M. Wicke et al., TensorFlow: Large-scale machine learning on heterogeneous distributed systems, 2016.

M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, The arcade learning environment: An evaluation platform for general agents, Journal of Artificial Intelligence Research, vol.47, pp.253-279, 2013.

M. G. Bellemare, W. Dabney, and R. Munos, A distributional perspective on reinforcement learning, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.449-458, 2017.

P. S. Castro, S. Moitra, C. Gelada, S. Kumar, and M. G. Bellemare, Dopamine: A research framework for deep reinforcement learning, 2018.

W. Dabney, G. Ostrovski, D. Silver, and R. Munos, Implicit quantile networks for distributional reinforcement learning, 2018.

A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley, and J. Clune, Go-explore: a new approach for hard-exploration problems, 2019.

A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley, and J. Clune, Montezuma's Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems, 2018.

M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband et al., Noisy networks for exploration, 2017.

F. M. Graetz, How to match DeepMind's Deep Q-Learning score in Breakout, 2018.

M. Hausknecht, J. Lehman, R. Miikkulainen, and P. Stone, A neuroevolution approach to general atari game playing, IEEE Transactions on Computational Intelligence and AI in Games, vol.6, issue.4, pp.355-366, 2014.

P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup et al., Deep reinforcement learning that matters, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski et al., Rainbow: Combining improvements in deep reinforcement learning, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel et al., Distributed prioritized experience replay, 2018.

Kaixhin, Open-source implementation of Rainbow, 2018.

M. C. Machado, M. G. Bellemare, E. Talvitie, J. Veness et al., Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents, Journal of Artificial Intelligence Research, vol.61, pp.523-562, 2018.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, p.529, 2015.

V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap et al., Asynchronous methods for deep reinforcement learning, International Conference on Machine Learning, pp.1928-1937, 2016.

OpenAI, OpenAI Five, 2018.

G. Ostrovski, M. G. Bellemare, A. van den Oord, and R. Munos, Count-based exploration with neural density models, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2721-2730, 2017.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, 2017.

D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, Curiosity-driven exploration by self-supervised prediction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.16-17, 2017.

J. Pineau, Reproducibility checklist, 2019.

T. Pohlen, B. Piot, T. Hester, M. G. Azar, D. Horgan et al., Observe and look further: Achieving consistent performance on Atari, 2018.

Redis, Redis database, 2019.

T. Schaul, J. Quan, I. Antonoglou, and D. Silver, Prioritized experience replay, 2015.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, 2017.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, p.484, 2016.

E. Todorov, T. Erez, and Y. Tassa, MuJoCo: A physics engine for model-based control, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.5026-5033, 2012.

H. van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, Thirtieth AAAI Conference on Artificial Intelligence, 2016.

H. P. van Hasselt, A. Guez, M. Hessel, V. Mnih, and D. Silver, Learning values across many orders of magnitude, Advances in Neural Information Processing Systems, vol.29, pp.4287-4295, 2016.

O. Vinyals, I. Babuschkin, J. Chung, M. Mathieu, M. Jaderberg et al., AlphaStar: Mastering the Real-Time Strategy Game StarCraft II, 2019.

Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot et al., Dueling network architectures for deep reinforcement learning, 2015.

C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, issue.3-4, pp.279-292, 1992.

A Supplementary materials: Implementation details

Practically, we tested this initial implementation on some games with the exact same training conditions as in the original Rainbow to ensure that our results were consistent.
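To make the phrase "the exact same training conditions" concrete, the sketch below lists the main Rainbow hyperparameters as reported by Hessel et al. (2018), together with a small helper that flags games whose evaluation scores drift too far from reference values. This is a minimal illustration, not the code used for the experiments: the variable and function names, the 15% tolerance, and the example scores are hypothetical.

# A minimal sketch (not the authors' code): the main Rainbow hyperparameters
# as reported in Hessel et al. (2018), which our runs were expected to match.
RAINBOW_HYPERPARAMS = {
    "adam_learning_rate": 0.0000625,
    "adam_epsilon": 1.5e-4,
    "batch_size": 32,
    "discount": 0.99,
    "multi_step_n": 3,
    "distributional_atoms": 51,
    "v_min": -10.0,
    "v_max": 10.0,
    "noisy_nets_sigma0": 0.5,
    "priority_exponent": 0.5,
    "priority_is_beta_start": 0.4,   # annealed to 1.0 over training
    "replay_capacity": 1_000_000,    # transitions
    "min_history_frames": 80_000,    # frames observed before learning starts
    "target_update_frames": 32_000,  # frames between target-network updates
    "replay_period": 4,              # one gradient step every 4 agent steps
}

def scores_consistent(measured, reference, rel_tol=0.15):
    """Hypothetical sanity check: return the games whose mean evaluation
    score deviates from the reference score by more than rel_tol."""
    mismatches = {}
    for game, ref in reference.items():
        got = measured.get(game)
        if got is None or abs(got - ref) > rel_tol * abs(ref):
            mismatches[game] = (got, ref)
    return mismatches

# Example usage with made-up numbers (for illustration only):
measured = {"breakout": 405.0, "space_invaders": 11800.0}
reference = {"breakout": 400.0, "space_invaders": 12000.0}
print(scores_consistent(measured, reference))  # empty dict if every game is within tolerance

In this sketch, a non-empty return value would indicate a game whose scores diverge from the reference run and therefore warrant a closer look at the training configuration.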