Robust Adversarial Reinforcement Learning for Antineutrino-based Nuclear Reactor Safeguards
Antineutrino-based nuclear safeguards have been proposed to address many nuclear reactor verification challenges. In theory, these systems can detect reactor on-off status, monitor thermal power levels, and verify the special nuclear material (SNM) within a core. The situational details of each proposed capability, however, determine how plausible it is to apply antineutrino detectors to nuclear safeguards. For the most complex proposed capability, verifying SNM inventory, system performance depends strongly on both general reactor-detector parameters, such as the reactor design of interest and detector efficiency, and scenario unknowns, such as diverted assembly targets and replacement fuels. An object-oriented modeling and simulation tool was developed for researchers and decision makers to explore various system-scenario parameters for antineutrino-based safeguards development and assessment. This tool comprises five modules: adversarial agent, diversion simulation, spectra simulation, system sensitivity, and protagonist agent. By iterating over these modules, the adversarial agent learns to select the most threatening diversion scenario while the protagonist agent trains the best-prepared diversion classifier. This iterative process, referred to as robust adversarial reinforcement learning, could result in a fully robust nuclear safeguard, one that is equally prepared for any diversion scenario of interest. A case study showed that the models became only semi-robust to the simulated diversion scenarios. Although the semi-robust machine learning models did not perform as well as statistical classifiers, diversion-targeted machine learning models indicate that there is still room for system improvement.
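The iteration between the adversarial and protagonist agents can be illustrated with a toy max-min loop. This is only a minimal sketch under stated assumptions, not the tool described above: the scenario names, the two-dimensional "spectral shift" vectors, the linear sensitivity score, and the 1/t step size are all hypothetical, and a best-response adversary with a subgradient update stands in for the reinforcement learning agents.

```python
import math

# Hypothetical diversion scenarios. Each maps to a made-up 2-D shift in the
# measured antineutrino spectrum features (a stand-in for the diversion and
# spectra simulation modules; the numbers are illustrative only).
SCENARIOS = {
    "divert_A": (1.0, 0.2),
    "divert_B": (0.2, 1.0),
    "divert_C": (0.8, 0.8),
}


def sensitivity(w, shift):
    """System-sensitivity stand-in: size of the shift along direction w."""
    return w[0] * shift[0] + w[1] * shift[1]


def normalize(w):
    """Rescale w to unit length so scores stay comparable between rounds."""
    norm = math.hypot(w[0], w[1])
    return (w[0] / norm, w[1] / norm)


def adversary_pick(w):
    """Adversarial agent: choose the scenario the classifier detects worst."""
    return min(SCENARIOS, key=lambda name: sensitivity(w, SCENARIOS[name]))


def protagonist_update(w, shift, step):
    """Protagonist agent: turn the classifier direction toward that scenario."""
    return normalize((w[0] + step * shift[0], w[1] + step * shift[1]))


def train(rounds=2000):
    w = normalize((1.0, 0.0))  # start tuned to divert_A only
    for t in range(1, rounds + 1):
        target = adversary_pick(w)
        # A decaying step size damps the back-and-forth between scenarios.
        w = protagonist_update(w, SCENARIOS[target], step=1.0 / t)
    return w


w = train()
scores = {name: sensitivity(w, shift) for name, shift in SCENARIOS.items()}
worst = min(scores.values())
print(scores, worst)
```

In this toy setting the classifier starts with a worst-case score of 0.2 (against `divert_B`) and ends with nearly equal scores for `divert_A` and `divert_B`, which is the "equally prepared" behavior a fully robust safeguard would show. A real study would replace the fixed shift vectors with simulated spectra and the linear score with a trained classifier's performance.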