Avi Pfeffer, Chief Scientist, Charles River Analytics
I’m delighted to announce the release of Scruff™, the new AI framework from Charles River Analytics. Scruff, which is implemented in Julia, is a framework for building integrated AI systems that combine different modeling paradigms coherently.
The name “Scruff” derives from old debates in AI: the “neats” held that scalable AI systems could only be developed in a consistent framework, while the “scruffies” held that many methods needed to be integrated to build real-world systems. Scruff attempts to be scruffy in a neat way.
Although its roots are in probabilistic programming, Scruff goes beyond the simple ability to express generative probabilistic models. It enables models of different kinds, including non-probabilistic models, to be composed, integrated, and reasoned with. Scruff also provides a flexible temporal reasoning framework that supports asynchronous, continuous-time modeling and the ability to reason hierarchically. Finally, Scruff includes an extensible, structured library that enables composition, reuse, and experimentation with different representations and algorithms.
Scruff’s power lies in making three key distinctions:
- Scruff distinguishes between the representation of a relationship in general and its realization in each situation. In mathematics, a function is something we can define and reason about independently of the variables to which it is applied. Operations like composition apply to functions themselves, without considering any variables. Similarly, in Scruff, a stochastic function (known as an sfunc, pronounced “essfunc”) is a representation of a relationship that is possibly probabilistic. We can combine and compose sfuncs in a variety of ways.
- Scruff distinguishes between the mathematical definitions of computations that can be performed on sfuncs and the implementation of those computations on specific sfuncs. Scruff provides a small number of standard operators, which represent specific computations, such as computing the conditional probability density of a value, computing the support, taking a sample, or propagating a likelihood function backwards through the sfunc. Algorithms are written using these operators and, as a result, are independent of the specific representation of sfuncs and implementations of these operators. This enables algorithms to work for any sfuncs for which the required operators are implemented. Different algorithms use different operators, so they can work with different sfuncs; a small sketch after this list shows what applying operators to an sfunc looks like.
- Scruff distinguishes between variables that can vary over time and specific instances of those variables at a particular time. Variables have models that specify which sfuncs to use for specific instances. These sfuncs can be specified no matter what time interval holds between different instances, which enables Scruff’s flexible, asynchronous treatment of time.
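As a tiny illustration of the second distinction, here is a sketch of creating an sfunc and applying operators to it. The module and operator names (Scruff.SFuncs, Scruff.Operators, sample, cpdf) reflect my reading of the Scruff documentation and should be treated as assumptions rather than a definitive API.
using Scruff
using Scruff.SFuncs     # sfunc definitions such as Normal
using Scruff.Operators  # operators such as sample and cpdf (names assumed)
# An sfunc stands on its own, independent of any variable it is later attached to.
sf = Normal(0.0, 1.0)   # a Dist{Float64}: no inputs, Float64 output
# Operators are computations defined on sfuncs; algorithms are written against
# operators, so they work for any sfunc that implements the ones they need.
x = sample(sf, ())      # draw a sample; the input is the empty tuple
d = cpdf(sf, (), x)     # conditional density of that sample given no inputs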
So far, so abstract! Let’s look at what you can do with Scruff.
Using higher order sfuncs to reason about novel behavior
I’m going to build an example that uses higher order sfuncs to reason about novel behavior. I will show how to reason about this example statically, and then show a variation of this example to reason dynamically with asynchronous filtering. In these examples, I’ll highlight some of the key features; you can find a more complete explanation in the Scruff tutorial, available as part of the Scruff distribution at https://github.com/charles-river-analytics/Scruff.jl/tree/develop/docs/src/tutorial.
Defining variables
First, I’ll define some variables.
known = Cat(setup.known_sfs, setup.known_probs)()(:known)
Each definition in this example specifies an sfunc, creates a model, and associates it with a variable. Because this is a static example, the models are simple, and each definition just associates the variable with the sfunc used to represent it. In this example, known represents a categorical distribution over known sfuncs to generate the behavior, as represented in the setup data structure (not shown). In the setup, these might be normal distributions, e.g., Normal(0,1) and Normal(1,1). Making these part of the setup lets us experiment with different known sfuncs.
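Although the setup structure itself isn’t shown here, the fields the code reads from it appear throughout this example. Purely as an illustration, a minimal stand-in might look like the following; the struct name, field types, and concrete values are my own placeholders, not the tutorial’s actual definition.
# Hypothetical stand-in for the setup structure used in this example;
# the tutorial's actual definition may differ in name, fields, and types.
struct ExampleSetup
    known_sfs::Vector{Dist{Float64}}  # known behavior sfuncs, e.g., Normal(0,1) and Normal(1,1)
    known_probs::Vector{Float64}      # prior probabilities of the known behaviors
    novelty_prob::Float64             # prior probability that the behavior is novel
    novelty_prior_mean::Float64       # prior mean of the novel behavior's mean
    novelty_prior_sd::Float64         # prior standard deviation of that mean (typically broad)
    novelty_sd::Float64               # standard deviation of the novel behavior itself
end
setup = ExampleSetup([Normal(0.0, 1.0), Normal(1.0, 1.0)], [0.5, 0.5], 0.1, 0.0, 10.0, 1.0)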
Now let’s define some sfuncs that generate novel behavior.
novelty_mean = Normal(setup.novelty_prior_mean, setup.novelty_prior_sd)()(:novelty_mean)
novelty = Det(Tuple{Float64}, Dist{Float64}, m -> Normal(m[1], setup.novelty_sd))()(:novelty)
This code first defines the mean of the novelty distribution to be a Normal with parameters specified in the setup, which will typically include a broad standard deviation. Then, novelty will deterministically create a Normal whose mean is its only argument. At the moment, novelty just represents this relationship; we’ll connect it to novelty_mean later. Note that both known and novelty are second order sfuncs: they are sfuncs whose output is itself an sfunc.
Next, we’re going to introduce some logic to choose between known and novelty.
is_novel = Flip(setup.novelty_prob)()(:is_novel)
behavior = If{Dist{Float64}}()()(:behavior)
The probability that is_novel will be true is specified by the setup. We define behavior to use the If sfunc. Unlike just about every programming language you’ve seen (probabilistic or not), Scruff doesn’t use If to define a specific control flow. Rather, If defines a very general type of relationship. Later in the program, we’ll specify that it selects novelty or known depending on whether is_novel is true. The If sfunc is parameterized by the type of output it produces, which could be any type in Julia. In this example, the output is itself an sfunc of type Dist{Float64}, which means an unconditional distribution that outputs a Float64.
Creating a network
Now let’s create some observations and build a network.
variables = [known, is_novel, novelty_mean, novelty, behavior]
graph = VariableGraph(novelty => [novelty_mean], behavior => [is_novel, novelty, known])
obs = [5.0, 6.0, 7.0, 8.0, 9.0]
for i in 1:length(obs)
    o = Generate{Float64}()()(obsname(i))
    push!(variables, o)
    graph[o] = [behavior]
end
net = InstantNetwork(variables, graph)
There are a few things I’d like to point out:
- Each observation is created using Generate, which is a higher order sfunc that takes an sfunc as argument and generates a value from it. We’ll make it take the value of behavior, which is a Dist{Float64}, and generate a Float64. In this way, we can create many observations from the same uncertain sfunc.
- Note the use of a VariableGraph to specify the relationships. For example, we know that behavior is represented by an If, and graph specifies that the arguments to the If are the test is_novel and the two possible behavior sfuncs. Also note how this graph is built incrementally as we add observations.
- We create an InstantNetwork that has all the information. Instant indicates that there are no dynamics in this example.
Running inference to detect and reason about novelty
Now we can have some fun with the example. The following code snippet does the necessary steps: it creates an evidence dictionary that maps the appropriate observation variables to their values; it instantiates a likelihood weighting algorithm that uses 1,000 samples; it creates a runtime, which is a data structure that holds inference information; and finally it calls infer to perform inference.
evidence = Dict{Symbol, Score}()
for (i,x) in enumerate(obs)
    evidence[obsname(i)] = HardScore(x)
end
alg = LW(1000)
runtime = Runtime(net)
infer(alg, runtime, evidence)
Once inference has completed, we can ask some queries.
is_novel = get_node(net, :is_novel)
novelty_mean = get_node(net, :novelty_mean)
println("Probability of novel = ", probability(alg, runtime, is_novel, true))
println("Posterior mean of novel behavior = ", mean(alg, runtime, novelty_mean))
The answers depend on the setup. For example, if the known behavior means are 0 and 1 and the standard deviation of known behaviors is 1, you might see output like this:
Probability of novel = 1.0
Posterior mean of novel behavior = 7.334211013744095
Whereas if the standard deviation of known behaviors is 4, the output is more likely to be the following:
Probability of novel = 0.1988404327033635
Posterior mean of novel behavior = 0.631562661691411
This indicates that the surprising observations are very hard to explain by the known behaviors when their standard deviation is low, leading inference to be fully confident that the behavior is novel, with a posterior novelty mean that matches the observations. But if the standard deviation is higher, the observations are well explained by the known behaviors, so the probability of novelty is much lower. Also, in our model, novelty_mean is generated every time, even when the behavior is not novel, and its prior mean is 0, so its posterior is only slightly larger than 0.
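To reproduce the contrast above, only the known behavior sfuncs in the setup need to change. Using the hypothetical stand-in structure sketched earlier (all values remain placeholders), the two configurations might look like this:
# Known behaviors with standard deviation 1: surprising observations are hard to explain
setup_sd1 = ExampleSetup([Normal(0.0, 1.0), Normal(1.0, 1.0)], [0.5, 0.5], 0.1, 0.0, 10.0, 1.0)
# Known behaviors with standard deviation 4: the same observations are well explained
setup_sd4 = ExampleSetup([Normal(0.0, 4.0), Normal(1.0, 4.0)], [0.5, 0.5], 0.1, 0.0, 10.0, 1.0)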
Asynchronous filtering to reason about novel dynamic behavior
So far, the example has been static. Now let’s look at how you can do asynchronous filtering in Scruff.
In this extended example, the known or novel behavior represents the velocity of an object moving in one dimension. We’re going to imagine getting the following sequence of observations. Each observation consists of a time and a position. If the known behaviors have velocity 0 and 1, the first sequence of observations, with velocity 2, might be explained by sensor noise, but the second sequence cannot.
obs1 = [(1.0, 2.1), (3.0, 5.8), (3.5, 7.5)] # consistent with v=2
obs2 = [(1.0, 4.9), (3.0, 17.8), (3.5, 20.5)] # consistent with v=6
Creating a continuous time model
Recall that a model specifies what sfunc to use at a given time point. If we want to reason in continuous time, we need to create a VariableTimeModel. We define two functions: make_initial, which creates the sfunc for the initial time point, and make_transition, which creates the transition model. make_transition takes the times of the previous instantiations of the parents as well as the current time as arguments. Here’s what it looks like for the position variable:
struct PositionModel <: VariableTimeModel{Tuple{}, Tuple{Float64, Float64}, Float64}
    setup::NoveltySetup
end
function make_initial(::PositionModel, ::Float64)::Dist{Float64}
    return Constant(0.0)
end
function make_transition(posmod::PositionModel, parenttimes::Tuple{Float64, Float64}, time::Float64)::SFunc{Tuple{Float64, Float64}, Float64}
    t = time - parenttimes[1] # time elapsed since the previous instantiation of position
    function f(pair)
        (prevval, velocity) = pair
        Normal(prevval + t * velocity, t * posmod.setup.transition_sd)
    end
    return Chain(Tuple{Float64, Float64}, Float64, f)
end
make_initial returns a Constant sfunc because the object always starts at position 0. The code for make_transition is a little technical, but the key idea is that it returns a Normal sfunc whose mean advances from the previous position by the velocity times the time delta since the last instantiation of position, and whose standard deviation scales with that delta. Now we can define our variables:
known_velocity = StaticModel(Cat(setup.known_velocities, setup.known_probs))(:known_velocity)
is_novel = StaticModel(Flip(setup.novelty_prob))(:is_novel)
novel_velocity = StaticModel(Normal(setup.novelty_prior_mean, setup.novelty_prior_sd))(:novel_velocity)
velocity = StaticModel(If{Float64}())(:velocity)
position = PositionModel(setup)(:position)
observation = SimpleModel(LinearGaussian((1.0,), 0.0, setup.observation_sd))(:observation)
The code is similar to the previous code, but now we have different kinds of models:
- A StaticModel is one in which the value of the variable is determined in the initial step and stays the same thereafter. In this example, the choice of known or novel velocity is made in the initial step and the velocity stays the same after that.
- For position we use the PositionModel we just defined above.
- Finally, since an observation is created afresh at each instantiation, we use a SimpleModel, which says that the same sfunc is always used.
The process of creating a DynamicNetwork is similar to the process for an InstantNetwork, except there are now two graphs, one for the initial state and the other for the transition.
variables = [known_velocity, is_novel, novel_velocity, velocity, position, observation]
initial_graph = VariableGraph(velocity => [is_novel, novel_velocity, known_velocity], observation => [position])
transition_graph = VariableGraph(known_velocity => [known_velocity], is_novel => [is_novel], novel_velocity =>[novel_velocity], velocity => [velocity], position => [position, velocity], observation => [position])
net = DynamicNetwork(variables, initial_graph, transition_graph)
Running experiments with an asynchronous particle filter
Now we can run some experiments. I’ll define a do_experiment function that takes a sequence of timed observations and performs asynchronous filtering with a coherent particle filter. Each time an observation is received, the algorithm updates its beliefs about the presence of novelty and the velocity.
function do_experiment(obs::Vector{Tuple{Float64, Float64}})
    runtime = Runtime(net, 0.0) # Set the time type to Float64 and initial time to 0
    alg = CoherentPF(1000)
    init_filter(alg, runtime)
    is_novel = get_node(net, :is_novel)
    velocity = get_node(net, :velocity)
    observation = get_node(net, :observation)
    for (time, x) in obs
        evidence = Dict{Symbol, Score}(:observation => HardScore(x))
        println("Observing ", x, " at time ", time)
        # At a minimum, we need to include the query and evidence variables in the filter step
        filter_step(alg, runtime, Variable[is_novel, velocity, observation], time, evidence)
        println("Probability of novel = ", probability(alg, runtime, is_novel, true))
        println("Posterior mean of velocity = ", mean(alg, runtime, velocity))
    end
end
In this code, the key line is where filter_step is performed. Note that only the query and evidence variables (is_novel, velocity, and observation) need to be instantiated. The algorithm we are using is a coherent particle filter with 1,000 particles. “Coherent” means that it maintains coherence of parent-child relationships in the instantiated variables. In this case, the algorithm is able to determine that position also needs to be instantiated. However, the static variables known_velocity and novel_velocity are never instantiated after the initial step.
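With the network and do_experiment in place, the two experiments below are simply calls on the observation sequences defined earlier:
do_experiment(obs1) # the sequence consistent with velocity 2
do_experiment(obs2) # the sequence consistent with velocity 6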
When we run this experiment with the first observation sequence from above (the one consistent with velocity 2), we get output like the following:
Observing 2.1 at time 1.0
Probability of novel = 0.0351642575352557
Posterior mean of velocity = 0.5411884423148781
Observing 5.8 at time 3.0
Probability of novel = 0.057222570825582145
Posterior mean of velocity = 0.8705507592898075
Observing 7.5 at time 3.5
Probability of novel = 0.08166159149240186
Posterior mean of velocity = 1.007810909419299
We see that the probability of novel stays quite small, but does increase with each observation, and that the posterior mean of the velocity also increases.
Now let’s use the second observation sequence, which is consistent with velocity 6. We get output like this:
Observing 4.9 at time 1.0
Probability of novel = 0.6741688102988623
Posterior mean of velocity = 3.6150131656907174
Observing 17.8 at time 3.0
Probability of novel = 1.0
Posterior mean of velocity = 5.898986723263269
Observing 20.5 at time 3.5
Probability of novel = 1.0
Posterior mean of velocity = 5.86994402484129
Using this observation sequence, the algorithm has already concluded by the second observation that the behavior is novel. The posterior mean of velocity also converges quickly.
Explore Scruff further
I hope this gives you a taste of what Scruff can do. If you are interested, please check out the GitHub repository at https://github.com/charles-river-analytics/Scruff.jl. You can download the source code or install Scruff as a Julia package. We welcome contributions from the community to develop Scruff further.
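As a quick sketch (check the repository README for the current installation instructions), installing Scruff from the Julia REPL might look like this:
using Pkg
Pkg.add("Scruff")  # if Scruff is registered in the General registry
# Or install directly from the GitHub repository:
Pkg.add(url = "https://github.com/charles-river-analytics/Scruff.jl")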
If you like this work and are interested in working on Scruff and applications, as well as many other interesting projects, we are looking to grow our probabilistic modeling team at Charles River Analytics. Reach out to Bobby Mayfield (bmayfield@cra.com) to learn more.