Data and Training Analytics Simulated Input Modeler (DATASIM)

What is DATASIM?

DATASIM is an open source R&D project designed to provide specifications and a reference model application for the purpose of generating simulated xAPI data at scale.

DATASIM gives DoD distributed learning stakeholders and the broader xAPI community the ability to simulate learning activities and generate the resulting xAPI statements at scale. This serves two purposes: benchmarking and stress-testing the design of applications with the Total Learning Architecture, and giving stakeholders a way to evaluate the implementation of xAPI data design using the xAPI Profile specification. Ultimately, DATASIM can be used to support conformance testing of applications across the future learning ecosystem.

Early work on DATASIM was funded by the Advanced Distributed Learning Initiative.

Installation

To use the core DATASIM library in your project, use the following dependency in your deps.edn file:

com.yetanalytics/datasim {:mvn/version "0.4.4"}

If you wish to install DATASIM as an application with features such as CLI or the webserver, perform the following steps:

  1. Clone the DATASIM GitHub repo
  2. Execute the make bundle command

See Deployment Models for more information about the differences between using DATASIM as a library and as an app.

Usage

Simulation Inputs

The inputs to DATASIM consist of four parts, each represented by JSON. They are as follows:

xAPI Profile(s)

One or more valid xAPI Profiles are required for DATASIM to generate xAPI Statements. You can learn more about the xAPI Profile Specification here. This input can be either a single Profile JSON-LD document or an array of JSON-LD format profiles. At this time, all referenced concepts in a Profile must be included in the input. For instance, if a Pattern in "Profile A" references a Statement Template found in "Profile B", both Profiles must be included in an array as the Profile input.

Note that by default, any patterns with a primary property set to true in the provided profiles will be used for generation. You can control which profiles these primary patterns are sourced from with the genProfiles option by supplying one or more profile IDs. You can further control which specific primary patterns are used with the genPatterns option by supplying one or more pattern IDs.
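
The requirement that every referenced Pattern or Statement Template be present in the input can be checked before running a simulation. The following is a minimal Python sketch, not part of DATASIM. The function names (`missing_references`, `primary_pattern_ids`) and the example.org IRIs are hypothetical, and the sketch assumes that the `optional`, `oneOrMore`, and `zeroOrMore` Pattern properties hold plain IRI strings.

```python
# Pattern properties that point at child Patterns or Statement Templates.
LIST_REFS = ("sequence", "alternates")
SINGLE_REFS = ("optional", "oneOrMore", "zeroOrMore")


def as_profile_list(profile_input):
    """Accept either a single Profile object or an array of Profiles."""
    return profile_input if isinstance(profile_input, list) else [profile_input]


def missing_references(profile_input):
    """Return child IDs that Patterns reference but no supplied Profile defines."""
    profiles = as_profile_list(profile_input)
    defined = set()
    for profile in profiles:
        for key in ("patterns", "templates"):
            defined.update(item["id"] for item in profile.get(key, []))
    missing = set()
    for profile in profiles:
        for pattern in profile.get("patterns", []):
            refs = []
            for key in LIST_REFS:
                refs.extend(pattern.get(key, []))
            for key in SINGLE_REFS:
                if key in pattern:
                    refs.append(pattern[key])
            missing.update(ref for ref in refs if ref not in defined)
    return sorted(missing)


def primary_pattern_ids(profile_input):
    """List the IDs of Patterns whose primary property is true."""
    return [
        pattern["id"]
        for profile in as_profile_list(profile_input)
        for pattern in profile.get("patterns", [])
        if pattern.get("primary") is True
    ]


# Hypothetical, heavily abbreviated Profiles for illustration only.
profile_a = {
    "id": "https://example.org/profile-a",
    "patterns": [
        {
            "id": "https://example.org/profile-a/pattern-1",
            "primary": True,
            "sequence": ["https://example.org/profile-b/template-1"],
        }
    ],
}
profile_b = {
    "id": "https://example.org/profile-b",
    "templates": [{"id": "https://example.org/profile-b/template-1"}],
}
```

Here `missing_references(profile_a)` reports the Template that lives in Profile B, while `missing_references([profile_a, profile_b])` returns an empty list because both Profiles are supplied together.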

Personae

Predefined xAPI Actors (upon whom the simulation will be based) are required to run a DATASIM simulation. This takes the form of a JSON array of xAPI Groups, each object containing an array of conformant Actor members, an example of which is below:

    [
        {
            "name": "trainees1",
            "objectType": "Group",
            "member": [
                {
                    "name": "Bob Fakename",
                    "mbox": "mailto:bob@example.org"
                },
                {
                    "name": "Alice Faux",
                    "mbox": "mailto:alice@example.org"
                }
            ]
        },
        {
            "name": "trainees2",
            "objectType": "Group",
            "member": [
                {
                    "name": "Fred Ersatz",
                    "mbox": "mailto:fred@example.org"
                }
            ]
        }
    ]
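
A personae file like the one above can also be built and sanity-checked programmatically. The following Python sketch is illustrative only and is not DATASIM's own validation. The helper names `make_group` and `personae_problems` are hypothetical, and the check relies on the xAPI rule that an Agent carries exactly one Inverse Functional Identifier (`mbox`, `mbox_sha1sum`, `openid`, or `account`).

```python
import json

# Inverse Functional Identifier properties defined by the xAPI specification.
IFI_KEYS = {"mbox", "mbox_sha1sum", "openid", "account"}


def make_group(name, members):
    """Build one xAPI Group object in the shape shown above."""
    return {"name": name, "objectType": "Group", "member": members}


def personae_problems(personae):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for g_idx, group in enumerate(personae):
        if group.get("objectType") != "Group":
            problems.append(f"entry {g_idx}: objectType is not 'Group'")
        for m_idx, member in enumerate(group.get("member", [])):
            ifi_count = len(IFI_KEYS & member.keys())
            if ifi_count != 1:
                problems.append(
                    f"entry {g_idx}, member {m_idx}: "
                    f"expected exactly one IFI, found {ifi_count}"
                )
    return problems


personae = [
    make_group(
        "trainees1",
        [
            {"name": "Bob Fakename", "mbox": "mailto:bob@example.org"},
            {"name": "Alice Faux", "mbox": "mailto:alice@example.org"},
        ],
    ),
    make_group(
        "trainees2",
        [{"name": "Fred Ersatz", "mbox": "mailto:fred@example.org"}],
    ),
]

# Serialize to the JSON array form that the personae input uses.
personae_json = json.dumps(personae, indent=4)
```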

Models

Models represent user-provided influences on xAPI simulation. Each model is a JSON object that consists of the following properties:

  • personae: An array of Actors, Groups, or Role objects that define who the model applies to. If this is missing, then the model serves as the default model for the simulation. Each personae array must be unique, though Actors, Groups, or Roles may repeat across different models.
  • verbs: An array of objects with Verb id and weight values. Valid weight values range from 0 to 1, where 0 means that the component will not be chosen (unless all other weights are also 0). If a weight is not present, a default weight of 0.5 will be used.
  • activities: An array of objects with Activity id and weight values (as described under verbs).
  • activityTypes: An array of objects with Activity Type id and weight values (as described under verbs).
  • patterns: An array of objects with Pattern id and the following additional optional values:
    • weights: An array of child Pattern/Template id and weight values. Each weight affects how likely the corresponding child is to be chosen (for alternates Patterns) or how likely the child is to be selected at all (for optional Patterns, where null is also a valid option). This has no effect on sequence, zeroOrMore, or oneOrMore Patterns.
    • repeat-max: A positive integer representing the maximum number of times (exclusive) the child pattern can be generated. Only affects zeroOrMore and oneOrMore patterns.
    • bounds: An array of objects containing key-value pairs where each value is an array of singular values (e.g. "January") or arrays of start, end, and optional step values (e.g. ["January", "October"]). For example, {"years": [2023], "months": [[1, 5]]} describes an inclusive bound from January to May 2023; changing the months bound to [[1, 5, 2]] would restrict it to only January, March, and May 2023. If bounds is not present, the bound is infinite, so any timestamp is valid. The following are valid bound values:
      • years: Any positive integer
      • months: 1 to 12, or their name equivalents, i.e. "January" to "December"
      • daysOfMonth: 1 to 31 (though 29 or 30 are skipped at runtime for months that do not include these days)
      • daysOfWeek: 0 to 6, or their name equivalents, i.e. "Sunday" to "Saturday"
      • hours: 0 to 23
      • minutes: 0 to 59
      • seconds: 0 to 59
    • boundRestarts: An array of Pattern IDs to retry if the timestamp violates bounds. The top-most Pattern in boundRestarts will be tried, e.g. if Pattern A is a parent of Pattern B and both are listed in boundRestarts, it will be Pattern A that is retried. If boundRestarts is empty or not present, or if none of the ancestor Patterns are included, then Statement generation will continue at its current point.
    • periods: An array of objects that specify the amount of time between generated Statements. Only the first valid period in the array will be applied to generate the next Statement (see bounds property). Each period object has the following optional properties:
      • min: a minimum amount of time between Statements; default is 0
      • mean: the average amount of time between Statements (added on top of min); default is 1
      • fixed: a fixed amount of time between Statements; overrides min and mean
      • unit: the time unit for all temporal values. Valid values are millis, seconds, minutes, hours, days, and weeks; the default is minutes
      • bounds: an array of the temporal bounds the period can apply in. During generation, the current Statement timestamp is checked against each period's bounds, and the first period whose bounds are satisfied by the timestamp will be used to generate the next Statement timestamp. A missing bounds value indicates an infinite bound, i.e. any timestamp is always valid. The syntax is the same as the top-level bounds array. At least one period must not have a bounds value, so it can act as the default period.
  • templates: An array of objects with Statement Template id and optional bounds, boundRestarts, and periods properties, as explained above in patterns. Note that weights and repeat-max do not apply here.
  • objectOverrides: An array of objects containing (xAPI) object and weight. If present, these objects will overwrite any that would have been set by the Profile.
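
The bounds syntax described above (singular values, inclusive [start, end] ranges, and an optional step) can be made concrete with a short Python sketch. This is not DATASIM's implementation. The names `expand` and `in_bounds` are hypothetical, and the sketch assumes that a timestamp is valid when it satisfies at least one object in the bounds array. It does not model the runtime skipping of missing days of the month.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday"]
# Keys that also accept names, with the number assigned to the first name.
NAMES = {"months": (MONTHS, 1), "daysOfWeek": (DAYS, 0)}


def to_int(value, names, offset):
    """Translate a name such as "January" to its number; pass integers through."""
    return names.index(value) + offset if isinstance(value, str) else value


def expand(entries, names=(), offset=0):
    """Expand singular values and [start, end, step?] ranges into a set of ints."""
    allowed = set()
    for entry in entries:
        if isinstance(entry, list):
            start, end, *rest = (to_int(v, names, offset) for v in entry)
            step = rest[0] if rest else 1
            allowed.update(range(start, end + 1, step))  # end is inclusive
        else:
            allowed.add(to_int(entry, names, offset))
    return allowed


def timestamp_fields(ts):
    """Map a datetime onto the bound keys; daysOfWeek uses Sunday = 0."""
    return {
        "years": ts.year, "months": ts.month, "daysOfMonth": ts.day,
        "daysOfWeek": (ts.weekday() + 1) % 7,  # Python uses Monday = 0
        "hours": ts.hour, "minutes": ts.minute, "seconds": ts.second,
    }


def in_bounds(bounds, ts):
    """Missing or empty bounds are treated as infinite: any timestamp is valid."""
    if not bounds:
        return True
    actual = timestamp_fields(ts)
    for bound in bounds:
        if all(
            actual[key] in expand(entries, *NAMES.get(key, ((), 0)))
            for key, entries in bound.items()
        ):
            return True
    return False


january_to_may = [{"years": [2023], "months": [[1, 5]]}]
every_other_month = [{"years": [2023], "months": [[1, 5, 2]]}]
```

With these definitions, `in_bounds(january_to_may, datetime(2023, 2, 10))` is true, while the same timestamp fails `every_other_month` because the step of 2 allows only January, March, and May.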

An example of a model array with valid personae, verbs, and templates is shown below:

    [
        {
            "personae": [
                {
                    "id": "mbox::mailto:bob@example.org",
                    "type": "Agent"
                }
            ],
            "verbs": [
                {
                    "id": "https://example.org/verb/did",
                    "weight": 0.8
                }
            ],
            "templates": [
                {
                    "id": "https://w3id.org/xapi/cmi5#satisfied",
                    "bounds": [
                        {
                            "years": [2023],
                            "months": [["January", "May"]]
                        }
                    ],
                    "boundRestarts": [
                        "https://w3id.org/xapi/cmi5#toplevel"
                    ],
                    "periods": [
                        {
                            "min": 1,
                            "mean": 2.0,
                            "unit": "seconds"
                        }
                    ]
                }
            ]
        }
    ]
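
To illustrate how weight values such as the 0.8 above might influence selection, here is a minimal Python sketch. It is an assumption-laden illustration, not DATASIM's algorithm: it assumes selection is proportional to weight, and the function name `choose_weighted` and the extra verb IRIs are hypothetical. It follows the rules stated for weights: a missing weight defaults to 0.5, and a weight of 0 excludes a component unless every weight is 0.

```python
import random

DEFAULT_WEIGHT = 0.5  # default described for components without a weight


def choose_weighted(candidates, weights, rng):
    """Pick one candidate ID; weights maps ID -> weight in the range 0 to 1."""
    effective = [weights.get(candidate, DEFAULT_WEIGHT) for candidate in candidates]
    if not any(effective):
        # Every weight is 0, so fall back to a uniform choice (assumption).
        return rng.choice(candidates)
    # Proportional selection; zero-weight candidates are never returned.
    return rng.choices(candidates, weights=effective, k=1)[0]


verbs = [
    "https://example.org/verb/did",
    "https://example.org/verb/saw",      # no weight given, so 0.5 applies
    "https://example.org/verb/skipped",
]
weights = {
    "https://example.org/verb/did": 0.8,
    "https://example.org/verb/skipped": 0.0,
}

# A seeded generator makes the sequence of picks repeatable.
rng = random.Random(42)
picks = [choose_weighted(verbs, weights, rng) for _ in range(1000)]

rng_again = random.Random(42)
picks_again = [choose_weighted(verbs, weights, rng_again) for _ in range(1000)]
```

Because both generators start from the same seed, `picks` and `picks_again` are identical, and the zero-weight verb never appears in either list.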

Simulation Parameters

The simulation parameters input covers the details of the simulation not covered by the other inputs. This includes Start Time, End Time, Timezone, Max (number of statements), and seed. When run, the simulation will create a time sequence from the Start Time to the End Time, and generated xAPI statements will have corresponding dates and times. The seed is important because it controls all random value generation and therefore makes a simulation repeatable. A simulation run with the same inputs and the same seed will deterministically create the same xAPI Statements, but changing the seed value will create an entirely different simulation. An example of simulation parameters is below:

    {
        "start": "2019-11-18T11:38:39.219768Z",
        "end": "2019-11-19T11:38:39.219768Z",
        "max": 200,
        "
