A probabilistic programming language to simplify machine learning
Automated Probabilistic Programming Representation and Inference Languages (APPRIL)
Machine learning is a branch of artificial intelligence that focuses on programming computer systems to automatically learn, act, and improve with experience. It has led to developments such as more effective web searches, an improved understanding of the human genome, and more capable robots. Although machine learning has been applied successfully in many domains, the time, effort, and expertise required to implement and deploy machine learning applications remain a significant barrier to its widespread adoption. The Defense Advanced Research Projects Agency (DARPA) is seeking ways to democratize and extend the capabilities of machine learning so that non-experts can easily create complex machine learning applications. To that end, DARPA initiated the Probabilistic Programming for Advancing Machine Learning (PPAML) program.
“We want to do for machine learning what the advent of high-level program languages 50 years ago did for the software development community as a whole,” explained Dr. Kathleen Fisher, DARPA program manager. “Our goal is that future machine learning projects won’t require people to know everything about both the domain of interest and machine learning to build useful machine learning applications. Through new probabilistic programming languages specifically tailored to probabilistic inference, we hope to decisively reduce the current barriers to machine learning and foster a boom in innovation, productivity and effectiveness.”
The Charles River Analytics Solution
As part of the PPAML program, Charles River created a powerful new machine learning system using probabilistic programming in the Automated Probabilistic Programming Representation and Inference Languages (APPRIL) program. Charles River partnered with Profs. Stuart Russell and Ras Bodik of the University of California, Berkeley, and Profs. Rina Dechter, Alex Ihler, and Eric Mjolsness of the University of California, Irvine, in the APPRIL effort.
“In probabilistic programming, an individual describes a probabilistic model of the domain, and the system automatically creates algorithms to reason with the model,” explained Dr. Avi Pfeffer, principal scientist at Charles River. “This model is expressed using programming language concepts, which can include complex data structures and control flow constructs.”
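The idea in the quote above can be illustrated with a minimal sketch. It is written in plain Python rather than Figaro, and it does not use Figaro's API; the spam example, its probabilities, and the function names `joint_probability` and `posterior_spam` are all hypothetical and chosen only for illustration. The model is described as ordinary code with control flow, and a generic enumeration routine answers a query without any model-specific algorithm being written by hand.

```python
# Hypothetical toy model (not from the source): is an email spam,
# and does it contain the word "offer"?
def joint_probability(spam: bool, has_offer: bool) -> float:
    """Probability of one complete assignment under the model."""
    p_spam = 0.3 if spam else 0.7
    # Control flow: the likelihood of the word depends on which branch is taken.
    if spam:
        p_offer = 0.8 if has_offer else 0.2
    else:
        p_offer = 0.1 if has_offer else 0.9
    return p_spam * p_offer


def posterior_spam(has_offer: bool) -> float:
    """Exact inference by enumeration: P(spam | has_offer)."""
    weights = {spam: joint_probability(spam, has_offer) for spam in (True, False)}
    return weights[True] / sum(weights.values())


if __name__ == "__main__":
    print(round(posterior_spam(True), 4))  # 0.24 / 0.31, prints 0.7742
```

Enumeration only scales to very small models. A real probabilistic programming system would choose among far more capable inference algorithms, but the division of labor is the same: the user writes the model, and the system does the reasoning.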
The APPRIL system provides a common framework for a variety of probabilistic programming languages and learning and inference algorithms. Anchoring this framework is Figaro™, Charles River’s open-source probabilistic programming language. Figaro is a flexible and powerful language that can represent a variety of models, processes, and systems. While Figaro is a complete probabilistic programming language with its own suite of inference algorithms, users can also access the power of the APPRIL system through BLOG, another probabilistic programming language developed by Prof. Stuart Russell. Finally, the APPRIL system develops and publishes an intermediate model representation so a variety of non-Figaro inference algorithms can be applied to any model created in Figaro. As a result, the APPRIL system adds to Figaro a variety of award-winning inference algorithms from the University of California, Irvine.
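To see why a shared intermediate representation lets independently developed inference algorithms run on the same model, consider the following sketch. It is an assumption-laden illustration, not the representation APPRIL actually publishes: the factor-table format, the rain/wet example, and the names `FACTORS`, `weight`, `query_by_enumeration`, and `query_by_sampling` are all hypothetical. The model is stored as a list of factors, and two unrelated algorithms consume that same data structure.

```python
import random
from itertools import product

# Hypothetical intermediate representation: each factor is a pair of
# (variable names, table mapping value tuples to non-negative weights).
FACTORS = [
    (("rain",), {(True,): 0.2, (False,): 0.8}),
    (("rain", "wet"), {(True, True): 0.9, (True, False): 0.1,
                       (False, True): 0.1, (False, False): 0.9}),
]
VARIABLES = ["rain", "wet"]


def weight(assignment, factors):
    """Unnormalized probability of a full assignment: product of factor entries."""
    result = 1.0
    for names, table in factors:
        result *= table[tuple(assignment[n] for n in names)]
    return result


def query_by_enumeration(factors, variables, target, evidence):
    """Exact inference: sum weights over all assignments consistent with evidence."""
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[assignment[target]] += weight(assignment, factors)
    return totals[True] / (totals[True] + totals[False])


def query_by_sampling(factors, variables, target, evidence, n=20000, seed=0):
    """Approximate inference: importance sampling with a uniform proposal."""
    rng = random.Random(seed)
    free = [v for v in variables if v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        assignment = dict(evidence)
        for v in free:
            assignment[v] = rng.random() < 0.5  # uniform proposal over True/False
        totals[assignment[target]] += weight(assignment, factors)
    return totals[True] / (totals[True] + totals[False])


if __name__ == "__main__":
    exact = query_by_enumeration(FACTORS, VARIABLES, "rain", {"wet": True})
    approx = query_by_sampling(FACTORS, VARIABLES, "rain", {"wet": True})
    print(exact, approx)
```

Neither algorithm knows how the model was authored. Any front-end language that can emit the factor list could use both, which is the design benefit the paragraph above attributes to a published intermediate representation.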
Under the APPRIL program, Charles River expanded the Figaro probabilistic programming language into a robust system with advanced algorithms and automatic problem-solving capabilities. The APPRIL system allows non-expert machine learning practitioners to build and represent rich and complex probabilistic models that suit their needs using a variety of probabilistic programming interfaces, including Figaro. Because the APPRIL system also includes advanced inference algorithms from a variety of sources, efficient and effective machine learning can be performed on large, rich, and complex user-created probabilistic models.
The benefits of the APPRIL system have already been demonstrated, both inside and outside the PPAML program. For instance, Figaro was used to solve a variety of challenge problems posed by the PPAML program. One of these was a system for detecting user workflows from activity data that significantly outperformed existing methods. Figaro has also been used outside the PPAML program to solve problems such as multi-sensor fusion, cyber situational awareness, and learning the lineages of malware families, the last of which is shown in the visualization captioned below.
A visualization that displays the temporal ordering and lineage generated by the probabilistic model in real time and lets a user apply evidence interactively
This material is based upon work sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-14-C-0011. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force.