Uncovering Instabilities in Variational-Quantum Deep Q-Networks


Quantum computing holds the promise of achieving computational speed-ups by exploiting the principles of quantum mechanics. While certain quantum algorithms have proven theoretical speed-ups, their practical utility is limited on current noisy intermediate-scale quantum (NISQ) devices due to imperfections, noise and limited qubit counts. Despite these limitations, variational hybrid quantum-classical algorithms are often proposed as a means to reap quantum advantage from NISQ devices.

This thesis focuses on the empirical analysis of a recent class of variational hybrid approaches to quantum reinforcement learning, specifically variational quantum deep Q-learning (VQ-DQL). We show that VQ-DQL approaches are subject to instabilities that cause the learned policy to diverge, study the extent to which this affects the reproducibility of established results based on classical simulation, and perform systematic experiments to identify potential explanations for the observed instabilities. Additionally, we validate the VQ-DQL algorithm on an actual quantum processing unit and investigate differences in behaviour between simulated quantum systems and physical ones, which suffer from imperfections. The experiments leave it inconclusive whether known quantum approaches, especially on current NISQ devices, provide an advantage over classical approaches. Nevertheless, these findings highlight promising areas for future research, particularly in exploring the potential of a hardware-software co-design approach.