Advanced Semantic Extractor

An NLP tool that extracts key report information while adhering to Army semantics

Charles River Analytics is developing technology that uses an innovative hybrid approach to natural language processing to extract relevant information from Army reports while adhering to the semantics that Army personnel use.

Army intelligence analysts work with large volumes of information in the form of reports. Because these documents were written for human readers, they contain ambiguous and implicit information that is difficult for computers to extract, leaving analysts to comb through the data manually for key pieces of information.

Illustration depicting large amounts of data from reports being analyzed and disseminated to users

A novel approach

Fine-tuning a large language model (LLM) typically requires thousands of input-output training examples, which are difficult to obtain in practice. This project involves developing a novel approach that enables LLMs to be extensively fine-tuned given only a few representative examples.

Initially, Charles River developed and demonstrated an early prototype. The team is continuing to mature the technology, perform evaluations on Army data, and prepare the technology for integration with the Army Intelligence Data Platform (AIDP).

The benefits of maturing this technology are far-reaching. Dr. Terry Patten, Principal Investigator, explains, “Every large organization faces challenges around extracting information from unstructured documents written by people—from technical product reviews to medical or legal documents. This technology shows how generic LLMs can be adapted to applications that involve highly specialized language.”

“Off-the-shelf large language models have impressive language processing capabilities. But they are not geared toward military language, so they struggle with the jargon and phraseology. We’re training these models to understand the linguistic idiosyncrasies and nuances that appear in military reports.”

Dr. Terry Patten
Principal Scientist and Principal Investigator on the Advanced Semantic Extractor project

“Through our work, Charles River has pioneered techniques to train an LLM for particular types of language, and we are already using these techniques on other programs. It’s exciting to show how LLMs can be adapted efficiently to applications that involve idiosyncratic language and in different domains.”

Dr. Michael Giancola
AI Scientist and Co-Principal Investigator on the Advanced Semantic Extractor project

This material is based upon work supported by the ASA(ALT) SBIR CCoE under Contract No. W51701-24-C-0126. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of ASA(ALT) SBIR CCoE.