CrowdSim: A Generative Model of Crowdsourced Survey Responses

Michael Lepori, Derek Thayer, Sean Guarino, Leonard Eusebi

Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), Orlando, Florida (1 December 2021) 

Career development relies on an understanding of possible future roles, available training experiences, and current skills. To target training where it will be most effective, trainees and instructors must understand how these elements interact, which requires analysis of historical data. Such data can be difficult to collect from human subjects and are often sensitive, requiring longitudinal studies with privacy safeguards. Synthetic data generation that simulates the knowledge and biases of individuals offers an opportunity to develop the algorithms most likely to succeed before human-use testing.

This paper presents a generative model of crowd knowledge and responses to bootstrap the evaluation of algorithms whose recommendations or measurements will produce downstream effects. The simulation models the effects of well-known cognitive biases that influence knowledge recall or learning outcomes. For example, the availability heuristic (Tversky and Kahneman, 1973) may cause the importance of an infrequent or highly variable task to be recalled less accurately. Simulations must model the strength and prevalence of each cognitive bias; these parameters may vary by domain and are therefore difficult to set a priori. By adjusting the relative influences of declarative knowledge, anecdotal experience, cognitive biases, and other factors, the generative model creates many simulations with different underlying assumptions. This allows it to probe how well algorithms trained or selected on a subset of the simulations generalize to the others. We present an application of this generative model to crowdsourced survey responses. Our approach enables researchers to select the best-performing algorithms under a range of psychologically plausible assumptions. Furthermore, the generative model provides an open and scalable source of data that is not constrained by data privacy, security, or collection cost. This model selection approach extends to related domains where relevant psychological effects can be approximated, such as tailoring training recommendations and intelligent tutoring to individual backgrounds.
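The abstract does not give the model's functional form, so the following is only an illustrative sketch of the general idea under loudly stated assumptions. Each simulated respondent's rating of a task's importance is assumed to be a weighted mix of declarative knowledge (true importance plus small noise) and anecdotal experience (true importance plus larger noise). A biased subset of respondents receives extra recall noise that grows as task frequency falls, as a stand-in for the availability heuristic. Simulation parameters (cue weights, bias strength, bias prevalence) are drawn at random because they are hard to set a priori. Two candidate aggregation algorithms (mean and median) are then selected on a training subset of simulations and checked on held-out simulations. All names (`SimParams`, `sample_params`, `simulate_responses`, `select_and_probe`), parameter ranges, noise forms, and the candidate algorithms are hypothetical and are not taken from the paper.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical parameters for one simulated crowd (not from the paper)."""
    w_knowledge: float   # weight on declarative knowledge
    w_anecdote: float    # weight on anecdotal experience
    availability: float  # assumed strength of the availability bias
    prevalence: float    # fraction of respondents subject to the bias


def sample_params(rng):
    # Draw weights and bias settings from broad, assumed ranges, since
    # these are difficult to set a priori.
    w_k = rng.uniform(0.2, 0.8)
    return SimParams(w_knowledge=w_k, w_anecdote=1.0 - w_k,
                     availability=rng.uniform(0.0, 0.6),
                     prevalence=rng.uniform(0.0, 1.0))


def simulate_responses(tasks, params, n_respondents, rng):
    """Return {task: [ratings in 0..1]} for one simulated crowd.

    tasks maps a task name to (true_importance, frequency), both in 0..1.
    """
    responses = {name: [] for name in tasks}
    for _ in range(n_respondents):
        biased = rng.random() < params.prevalence
        for name, (importance, frequency) in tasks.items():
            knowledge = importance + rng.gauss(0.0, 0.1)
            anecdote = importance + rng.gauss(0.0, 0.3)  # noisier personal experience
            rating = params.w_knowledge * knowledge + params.w_anecdote * anecdote
            if biased:
                # Assumed availability effect: infrequent tasks are recalled
                # less accurately, so recall noise grows as frequency falls.
                rating += rng.gauss(0.0, params.availability * (1.0 - frequency))
            responses[name].append(min(1.0, max(0.0, rating)))
    return responses


# Hypothetical candidate algorithms for aggregating crowd ratings.
CANDIDATES = {"mean": statistics.fmean, "median": statistics.median}


def error(agg, tasks, responses):
    # Mean absolute error between aggregated ratings and true importance.
    return statistics.fmean(abs(agg(responses[t]) - tasks[t][0]) for t in tasks)


def select_and_probe(tasks, n_sims=40, n_train=20, n_respondents=30, seed=0):
    """Pick the best candidate on training simulations, then score it on held-out ones."""
    rng = random.Random(seed)
    sims = [simulate_responses(tasks, sample_params(rng), n_respondents, rng)
            for _ in range(n_sims)]
    train, held_out = sims[:n_train], sims[n_train:]
    train_err = {name: statistics.fmean(error(f, tasks, s) for s in train)
                 for name, f in CANDIDATES.items()}
    best = min(train_err, key=train_err.get)
    held_err = statistics.fmean(error(CANDIDATES[best], tasks, s) for s in held_out)
    return best, train_err, held_err


if __name__ == "__main__":
    # Hypothetical tasks: (true importance, how often the task is performed).
    tasks = {"task_a": (0.9, 0.9), "task_b": (0.95, 0.1), "task_c": (0.3, 0.6)}
    best, train_err, held_err = select_and_probe(tasks)
    print(best, train_err, round(held_err, 3))
```

Comparing training error with held-out error is one simple way to check whether an algorithm chosen under one set of simulated assumptions still performs well under others; a fuller treatment would vary many more factors than the two cues and one bias assumed here.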

For More Information

To learn more or request a copy of a paper (if available), contact Leonard Eusebi.
