Michael Harradon, Kevin Golan, Oliver Daniels-Koch, Avi Pfeffer, and Robert Hyland
Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), Orlando, Florida (4 December 2024)
Artificial intelligence (AI)-based systems show great promise for supporting complex decision-making and planning. AI systems can consider a massive option space that far exceeds what current human planning processes can explore. Notably, AI systems, particularly those based on deep reinforcement learning (DRL), have achieved expert-level play in strategy games, generating innovative strategies by exploring numerous courses of action (COAs) to make successful strategic choices in complex scenarios. The opportunity exists for AI systems to help human planning staffs construct more high-quality plans, analyze plan strengths and weaknesses more deeply, and explore a larger number of plan alternatives in a fixed planning time. AI-based modeling and simulation could accelerate COA planning activities that require reasoning across multiple, interconnected domains. Such planning support must account for interacting effects across physical domains (e.g., air, land, sea, undersea) and interacting support functions (e.g., logistics, communications), yielding an action space that exceeds the one processed by state-of-the-art game-playing systems. This paper reports on a new DRL approach, called neural program policies (NPPs), which incorporates structure and domain-specific information into policies described by deep neural networks, reducing the vast action space to a learned, compact, and meaningful set of actions. We first describe the policy domain-specific language (DSL) that abstracts the actions and observations employed by a deep reinforcement learner. Then, we apply NPPs to two problem categories: control problem benchmarks provided by OpenAI Gym and multi-domain reasoning using a StarCraft II simulation extended with sea and undersea domains and novel support functions. We conclude with results for (1) generating multi-domain COA traces and (2) NPP-based agent performance (e.g., running a surrogate model ~10,000x faster than real time, vastly reducing the action space, and enabling interpretation of AI-generated COA traces). The resulting system supports a human-AI team paradigm that increases the number and quality of multi-domain plans considered.
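The paper itself describes the policy DSL and NPP training in detail; as a rough illustration of the action-space-reduction idea only, the sketch below wraps an OpenAI Gym environment so a learner sees a small set of named macro-actions instead of raw primitives. Everything here (the MacroActionWrapper class, the macro names, the classic 4-tuple Gym step API) is a hypothetical construction for illustration, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's NPP implementation): a Gym action
# wrapper that exposes a compact, named action set, each entry expanding
# into a short sequence of primitive environment actions.
# Assumes the classic OpenAI Gym API, where step() returns a 4-tuple.
import gym
from gym import spaces


class MacroActionWrapper(gym.Wrapper):
    """Replace the environment's raw action space with a small set of
    named macro-actions (a toy stand-in for a learned, compact action set)."""

    def __init__(self, env, macros):
        super().__init__(env)
        self.macros = macros                       # name -> list of primitive actions
        self.names = list(macros.keys())
        self.action_space = spaces.Discrete(len(self.names))

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        # Execute the primitive actions the chosen macro expands into,
        # accumulating reward and stopping early if the episode ends.
        for primitive in self.macros[self.names[action]]:
            obs, reward, done, info = self.env.step(primitive)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


# Usage: give CartPole two illustrative "programs" (repeated pushes).
# Each agent decision now covers several primitive steps, shrinking the
# per-decision horizon the learner must search.
env = MacroActionWrapper(
    gym.make("CartPole-v1"),
    macros={"nudge_left": [0, 0, 0], "nudge_right": [1, 1, 1]},
)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```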
For More Information
To learn more or request a copy of the paper (if available), contact Michael Harradon or Kevin Golan.
(Please include your name, address, organization, and the paper reference. Requests without this information will not be honored.)