Datasim
xAPI Data Simulation Tools
Data and Training Analytics Simulated Input Modeler (DATASIM)
What is DATASIM?
DATASIM is an open source R&D project that provides specifications and a reference model application for generating simulated xAPI data at scale.
DATASIM provides DoD distributed learning stakeholders and the broader xAPI community with the ability to simulate learning activities and generate the resulting xAPI statements at scale. This serves two purposes: benchmarking and stress-testing the design of applications with the Total Learning Architecture, and giving stakeholders a way to evaluate the implementation of xAPI data design using the xAPI Profile specification. Ultimately, DATASIM can be used to support conformance testing of applications across the future learning ecosystem.
Early work on DATASIM was funded by the Advanced Distributed Learning Initiative.
Installation
To use the core DATASIM library in your project, use the following dependency in your deps.edn file:
com.yetanalytics/datasim {:mvn/version "0.4.4"}
If you wish to install DATASIM as an application with features such as the CLI or the webserver, perform the following steps:
- Clone the DATASIM GitHub repo
- Execute the make bundle command
See Deployment Models for more information about the differences between using DATASIM as a library and as an app.
Usage
Simulation Inputs
The inputs to DATASIM consist of four parts, each represented by JSON. They are as follows:
xAPI Profile(s)
One or more valid xAPI Profiles are required for DATASIM to generate xAPI Statements. See the xAPI Profile Specification for details. This input can be either a single Profile JSON-LD document or an array of Profiles in JSON-LD format. At this time, all concepts referenced in a Profile must be included in the input. For instance, if a Pattern in "Profile A" references a Statement Template found in "Profile B", both Profiles must be included in an array as the Profile input.
Note that by default, any patterns with a primary property set to true in the provided profiles will be used for generation. You can control which profiles these primary patterns are sourced from with the genProfiles option by supplying one or more profile IDs. You can further control which specific primary patterns are used with the genPatterns option by supplying one or more pattern IDs.
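The filtering described above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not DATASIM's implementation: the function name select_primary_patterns and the example.org IDs are hypothetical, and the profiles are reduced to the id, patterns, and primary fields.

```python
from __future__ import annotations


def select_primary_patterns(profiles, gen_profiles=None, gen_patterns=None):
    """Return IDs of primary Patterns, optionally narrowed by profile and pattern ID."""
    profile_filter = set(gen_profiles or [])
    pattern_filter = set(gen_patterns or [])
    selected = []
    for profile in profiles:
        # An empty profile filter means every supplied profile is eligible.
        if profile_filter and profile["id"] not in profile_filter:
            continue
        for pattern in profile.get("patterns", []):
            if not pattern.get("primary", False):
                continue  # only primary patterns are used for generation
            # An empty pattern filter means every primary pattern is eligible.
            if pattern_filter and pattern["id"] not in pattern_filter:
                continue
            selected.append(pattern["id"])
    return selected


# Hypothetical, heavily trimmed profiles used only for this illustration.
profiles = [
    {
        "id": "https://example.org/profile-a",
        "patterns": [
            {"id": "https://example.org/profile-a#p1", "primary": True},
            {"id": "https://example.org/profile-a#p2", "primary": False},
        ],
    },
    {
        "id": "https://example.org/profile-b",
        "patterns": [{"id": "https://example.org/profile-b#p3", "primary": True}],
    },
]

print(select_primary_patterns(profiles))
print(select_primary_patterns(profiles, gen_profiles=["https://example.org/profile-b"]))
```

With no filters, both primary patterns are returned; passing a profile ID keeps only the primary patterns from that profile.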
Personae
Predefined xAPI Actors (upon whom the simulation will be based) are required to run a DATASIM simulation. This input takes the form of a JSON array of xAPI Groups, each containing an array of conformant Actor members. An example is below:
[
  {
    "name": "trainees1",
    "objectType": "Group",
    "member": [
      {
        "name": "Bob Fakename",
        "mbox": "mailto:bob@example.org"
      },
      {
        "name": "Alice Faux",
        "mbox": "mailto:alice@example.org"
      }
    ]
  },
  {
    "name": "trainees2",
    "objectType": "Group",
    "member": [
      {
        "name": "Fred Ersatz",
        "mbox": "mailto:fred@example.org"
      }
    ]
  }
]
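Before running a simulation, it can help to sanity-check a personae file. The following Python sketch is one possible pre-flight check, not part of DATASIM: check_personae is a hypothetical helper, and the rule it applies (each member carries exactly one of the identifiers mbox, mbox_sha1sum, openid, or account) comes from the xAPI specification's definition of an Agent rather than from this README.

```python
import json

# Inverse Functional Identifiers defined by the xAPI specification.
IFI_KEYS = {"mbox", "mbox_sha1sum", "openid", "account"}


def check_personae(groups):
    """Return a list of problems found in a personae array (empty if none)."""
    problems = []
    for i, group in enumerate(groups):
        if group.get("objectType") != "Group":
            problems.append(f"group {i}: objectType must be 'Group'")
        members = group.get("member", [])
        if not members:
            problems.append(f"group {i}: no members")
        for j, actor in enumerate(members):
            # An identified Agent carries exactly one identifier.
            if len(IFI_KEYS.intersection(actor)) != 1:
                problems.append(f"group {i}, member {j}: needs exactly one identifier")
    return problems


personae = json.loads("""
[
  {"name": "trainees1", "objectType": "Group",
   "member": [{"name": "Bob Fakename", "mbox": "mailto:bob@example.org"},
              {"name": "Alice Faux", "mbox": "mailto:alice@example.org"}]},
  {"name": "trainees2", "objectType": "Group",
   "member": [{"name": "Fred Ersatz", "mbox": "mailto:fred@example.org"}]}
]
""")

print(check_personae(personae))
```

An empty list means no problems were found by this check; it does not replace the validation DATASIM itself performs.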
Models
Models represent user-provided influences on the xAPI simulation. Each model is a JSON object that consists of the following properties:
- personae: An array of Actors, Groups, or Role objects that define who the model applies to. If this is missing, then the model serves as the default model for the simulation. Each personae array must be unique, though Actors, Groups, or Roles may repeat across different models.
- verbs: An array of objects with Verb id and weight values. Valid weight values range from 0 to 1, where 0 denotes that that component will not be chosen (unless all other weights are also 0). If not present, a default weight of 0.5 will be used.
- activities: An array of objects with Activity id and weight values (as described under verbs).
- activityTypes: An array of objects with Activity Type id and weight values (as described under verbs).
- patterns: An array of objects with Pattern id and the following additional optional values:
  - weights: An array of child Pattern/Template id and weight values. Each weight affects how likely each of the Pattern's child patterns are chosen (for alternates) or how likely the child Pattern will be selected at all (for optional, for these null is also a valid option). This has no effect on sequence, zeroOrMore, or oneOrMore Patterns.
  - repeat-max: A positive integer representing the maximum number of times (exclusive) the child pattern can be generated. Only affects zeroOrMore and oneOrMore patterns.
  - bounds: An array of objects containing key-value pairs where each value is an array of singular values (e.g. "January") or arrays of start, end, and optional step values (e.g. ["January", "October"]). For example {"years": [2023], "months": [[1, 5]]} describes an inclusive bound from January to May 2023; making the months bound [[1, 5, 2]] would have restricted it to only January, March, and May 2023. If not present, bounds indicates an infinite bound, such that any timestamp is valid. The following are valid bound values:
    - years: Any positive integer
    - months: 1 to 12, or their name equivalents, i.e. "January" to "December"
    - daysOfMonth: 1 to 31 (though 29 or 30 are skipped at runtime for months that do not include these days)
    - daysOfWeek: 0 to 6, or their name equivalents, i.e. "Sunday" to "Saturday"
    - hours: 0 to 23
    - minutes: 0 to 59
    - seconds: 0 to 59
  - boundRestarts: An array of Pattern IDs to retry if the timestamp violates bounds. The top-most Pattern in boundRestarts will be tried, e.g. if Pattern A is a parent of Pattern B and both are listed in boundRestarts, it will be Pattern A that is retried. If boundRestarts is empty or not present, or if none of the ancestor Patterns are included, then Statement generation will continue at its current point.
  - periods: An array of objects that specify the amount of time between generated Statements. Only the first valid period in the array will be applied to generate the next Statement (see bounds property). Each period object has the following optional properties:
    - min: a minimum amount of time between Statements; default is 0
    - mean: the average amount of time between Statements (added on top of min); default is 1
    - fixed: a fixed amount of time between Statements; overrides min and mean
    - unit: the time unit for all temporal values. Valid values are millis, seconds, minutes, hours, days, and weeks; the default is minutes
    - bounds: an array of the temporal bounds the period can apply in. During generation, the current Statement timestamp is checked against each period's bounds, and the first period whose bound satisfies the timestamp will be used to generate the next Statement timestamp. A nonexisting bounds value indicates an infinite bound, i.e. any timestamp is always valid. The syntax is the same as the top-level bounds array. At least one period must not have a bounds value, so it can act as the default period.
- templates: An array of objects with Statement Template id and optional bounds, boundRestarts, and period properties, as explained above in patterns. Note that weights and repeat-max do not apply here.
- objectOverrides: An array of objects containing (xAPI) object and weight. If present, these objects will overwrite any that would have been set by the Profile.
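To make the weight semantics concrete, here is a minimal Python sketch of weighted selection. It assumes simple proportional sampling, which may differ from the algorithm DATASIM actually uses; the helper name choose_weighted is hypothetical. What it takes from the text is the 0.5 default weight, the rule that a 0 weight excludes a component, and the exception when every weight is 0.

```python
import random

DEFAULT_WEIGHT = 0.5  # default weight stated in the text


def choose_weighted(candidates, weights, rng):
    """Pick one candidate ID; `weights` maps an ID to a weight in [0, 1]."""
    resolved = [weights.get(c, DEFAULT_WEIGHT) for c in candidates]
    if all(w == 0 for w in resolved):
        # Every weight is 0, so a 0 weight no longer excludes anything:
        # fall back to a uniform choice (an assumption of this sketch).
        return rng.choice(candidates)
    # Assumption: selection probability is proportional to weight.
    return rng.choices(candidates, weights=resolved, k=1)[0]


rng = random.Random(7)
verbs = ["https://example.org/verb/did", "https://example.org/verb/saw"]
weights = {"https://example.org/verb/did": 0.8}  # the other verb gets 0.5

picks = [choose_weighted(verbs, weights, rng) for _ in range(1000)]
print(picks.count("https://example.org/verb/did") / len(picks))
```

Under the proportional assumption, the verb weighted 0.8 should be picked in roughly 0.8 / (0.8 + 0.5) of draws.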
An example of a model array with valid personae, verbs, and templates is shown below:
[
  {
    "personae": [
      {
        "id": "mbox::mailto:bob@example.org",
        "type": "Agent"
      }
    ],
    "verbs": [
      {
        "component": "https://example.org/verb/did",
        "weight": 0.8
      }
    ],
    "templates": [
      {
        "component": "https://w3id.org/xapi/cmi5#satisfied",
        "bounds": [
          {
            "years": [2023],
            "months": [["January", "May"]]
          }
        ],
        "boundRestarts": [
          "https://w3id.org/xapi/cmi5#toplevel"
        ],
        "period": {
          "min": 1,
          "mean": 2.0,
          "unit": "second"
        }
      }
    ]
  }
]
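The bounds and period rules described in the property list can be sketched as follows. This Python sketch rests on stated assumptions rather than DATASIM's code: it treats a timestamp as valid if it satisfies at least one object in the bounds array, it requires every field within that object to match, and all function names are hypothetical.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday"]


def to_number(field, value):
    """Convert month or day names to the numeric values listed in the text."""
    if isinstance(value, str):
        if field == "months":
            return MONTHS.index(value) + 1   # January -> 1
        if field == "daysOfWeek":
            return DAYS.index(value)         # Sunday -> 0
    return value


def timestamp_fields(ts):
    """Break a datetime into the fields that a bound object can constrain."""
    return {
        "years": ts.year,
        "months": ts.month,
        "daysOfMonth": ts.day,
        "daysOfWeek": (ts.weekday() + 1) % 7,  # Python counts Monday as 0
        "hours": ts.hour,
        "minutes": ts.minute,
        "seconds": ts.second,
    }


def entry_matches(field, entry, actual):
    """Match a single value, or a [start, end] / [start, end, step] range."""
    if isinstance(entry, list):
        start, end = to_number(field, entry[0]), to_number(field, entry[1])
        step = entry[2] if len(entry) > 2 else 1
        return actual in range(start, end + 1, step)  # end is inclusive
    return actual == to_number(field, entry)


def satisfies(bounds, ts):
    """True if `ts` falls inside at least one bound object, or bounds is absent."""
    if not bounds:
        return True  # a missing bounds value is treated as infinite
    actual = timestamp_fields(ts)
    return any(
        all(any(entry_matches(field, e, actual[field]) for e in entries)
            for field, entries in bound.items())
        for bound in bounds
    )


def pick_period(periods, ts):
    """Return the first period whose bounds accept `ts`."""
    for period in periods:
        if satisfies(period.get("bounds"), ts):
            return period
    return None


bounds = [{"years": [2023], "months": [["January", "May"]]}]
print(satisfies(bounds, datetime(2023, 3, 15)))  # inside January to May 2023
print(satisfies(bounds, datetime(2023, 6, 1)))   # June is outside the range
```

Because pick_period returns the first match, a period without bounds placed last in the array acts as the default, which is why the text requires at least one such period.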
Simulation Parameters
The simulation parameters input covers the details of the simulation not covered by the other inputs. These include Start Time, End Time, Timezone, Max (number of statements), and seed. When run, the simulation will create a time sequence from the Start Time to the End Time, and generated xAPI statements will have corresponding dates and times. The seed is important because it controls the inputs to all random value generation and therefore determines repeatability. A simulation run with the same inputs and the same seed will deterministically create the same xAPI Statements, but changing the seed value will create an entirely different simulation. An example of simulation parameters is below:
{
"start": "2019-11-18T11:38:39.219768Z",
"end": "2019-11-19T11:38:39.219768Z",
"max": 200,
"
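The role of the seed described in this section can be illustrated with a toy generator in Python. This is a sketch of the general principle only: simulate_timestamps is a hypothetical function, the exponentially distributed gaps are an assumption (the text names no distribution), and the start time is a rounded version of the example value.

```python
import random
from datetime import datetime, timedelta, timezone


def simulate_timestamps(start, end, max_statements, seed, mean_minutes=1.0):
    """Generate up to `max_statements` timestamps before `end` from a seed."""
    rng = random.Random(seed)  # all randomness flows from this one seeded RNG
    timestamps = []
    current = start
    while len(timestamps) < max_statements:
        # Assumption: exponentially distributed gaps between statements.
        current += timedelta(minutes=rng.expovariate(1.0 / mean_minutes))
        if current >= end:
            break
        timestamps.append(current)
    return timestamps


start = datetime(2019, 11, 18, 11, 38, 39, tzinfo=timezone.utc)
end = start + timedelta(days=1)

run_a = simulate_timestamps(start, end, max_statements=200, seed=42)
run_b = simulate_timestamps(start, end, max_statements=200, seed=42)
run_c = simulate_timestamps(start, end, max_statements=200, seed=43)

print(run_a == run_b)  # same inputs and seed give the same sequence
print(run_a == run_c)  # a different seed gives a different sequence
```

Routing every random draw through a single seeded generator is what makes a run repeatable; any draw taken from an unseeded source would break that guarantee.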
