⚠️ REPO MOVED TO https://github.com/repetere/jsonstack-data ⚠️

ModelScript

Description

ModelScript is a JavaScript module with simple and efficient tools for data mining and data analysis. ModelScript can be used with ML.js, pandas-js, and numjs to approximate the equivalent R/Python tool chain in JavaScript.

In Python, data preparation is typically done in a DataFrame. ModelScript encourages a more R-like workflow, where data preparation is done on the data in its native structure.
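
To illustrate what preparing data in its native structure can look like, here is a minimal sketch in plain JavaScript that pulls one column out of an array of row objects. It does not call ModelScript; the `rows` data and the `pluckColumn` helper are hypothetical names used only for illustration.

```javascript
// Hypothetical rows, shaped like parsed CSV records (all values are strings).
const rows = [
  { Country: 'Brazil', Age: '44', Salary: '72000' },
  { Country: 'Mexico', Age: '27', Salary: '48000' },
];

// pluckColumn is an illustrative helper, not part of ModelScript.
// It returns one column of the row objects as a plain array.
function pluckColumn(data, columnName) {
  return data.map(row => row[columnName]);
}

// Convert the string values to numbers before doing any arithmetic.
const ages = pluckColumn(rows, 'Age').map(Number);
console.log(ages);
```

The data stays an ordinary array of objects throughout, so standard array methods such as `map`, `filter`, and `reduce` remain available at every step.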

Installation

$ npm i modelscript

Full Documentation

Usage (basic)

ModelScript is an ECMAScript module designed to be imported in an ES2015+ environment. For older environments, use const modelscript = require('modelscript/build/modelscript.cjs.js') in older versions of Node, or <script type="text/javascript" src=".../path/to/.../modelscript/build/modelscript.umd.js"></script> in the browser.

"modelscript" : {
  ml:{ //see https://github.com/mljs/ml
    UpperConfidenceBound [Class: UpperConfidenceBound]{ // Implementation of the Upper Confidence Bound algorithm
      predict(), //returns next action based off of the upper confidence bound
      learn(), //single step training method
      train(), //training method for upper confidence bound calculations
    },
    ThompsonSampling [Class: ThompsonSampling]{ //Implementation of the Thompson Sampling algorithm
      predict(), //returns next action based off of the thompson sampling
      learn(), //single step training method
      train(), //training method for thompson sampling calculations
    },
  },
  nlp:{ //see https://github.com/NaturalNode/natural
    ColumnVectorizer [Class: ColumnVectorizer]{ //class creating sparse matrices from a corpus
      get_tokens(), // Returns a distinct array of all tokens after fit_transform
      get_vector_array(), //Returns array of arrays of strings for dependent features from sparse matrix word map
      fit_transform(options), //Fits and transforms data by creating column vectors (a sparse matrix where each row has every word in the corpus as a column and the count of appearances in the corpus)
      get_limited_features(options), //Returns limited sets of dependent features or all dependent features sorted by word count
      evaluateString(testString), //returns word map with counts
      evaluate(testString), //returns new matrix of words with counts in columns
    }
  },
  csv:{
    loadCSV: [Function: loadCSV], //asynchronously loads CSVs, either a filepath or a remote URI
    loadTSV: [Function: loadTSV], //asynchronously loads TSVs, either a filepath or a remote URI
  },
  model_selection: {
    train_test_split: [Function: train_test_split], // splits data into training and testing sets
    cross_validation_split: [Function: kfolds], //splits data into k-folds
    cross_validate_score: [Function: cross_validate_score],//test model variance and bias
    grid_search: [Function: grid_search], // tune models with grid search for optimal performance
  },
  DataSet [Class: DataSet]: { //class for manipulating an array of objects (typically from CSV data)
    columnMatrix(vectors), //returns a matrix of values by combining column arrays into a matrix
    columnArray(columnName, options), // - returns a new array of a selected column from an array of objects, can filter, scale and replace values
    columnReplace(columnName, options), // - returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values
    columnScale(columnName, options), // - returns a new array of scaled values which can be reverse (descaled). The scaling transformations are stored on the DataSet
    columnDescale(columnName, options), // - Returns a new array of descaled values
    selectColumns(columns, options), //returns a list of objects with only selected columns as properties
    labelEncoder(columnName, options), // - returns a new array and label encodes a selected column
    labelDecode(columnName, options), // - returns a new array and decodes an encoded column back to the original array values
    oneHotEncoder(columnName, options), // - returns a new object of one hot encoded values
    columnMatrix(columnName, options), // - returns a matrix of values from multiple columns
    columnReducer(newColumnName, options), // - returns a new array of a selected column that is passed a reducer function, this is used to create new columns for aggregate statistics
    columnMerge(name, data), // - returns a new column that is merged onto the data set
    filterColumn(options), // - filtered rows of data,
    fitColumns(options), // - mutates data property of DataSet by replacing multiple columns in a single command
    static reverseColumnMatrix(options), // returns an array of objects by applying labels to matrix of columns
    static reverseColumnVector(options), // returns an array of objects by applying labels to column vector
  },
  calc:{
    getTransactions: [Function getTransactions], // Formats an array of transactions into a sparse matrix like format for Apriori/Eclat
    assocationRuleLearning: [async Function assocationRuleLearning], // returns association rule learning results using apriori
  },
  util: {
    range: [Function], // range helper function
    rangeRight: [Function], //range right helper function
    scale: [Function: scale], //scale / normalize data
    avg: [Function: arithmeticMean], // arithmetic mean
    mean: [Function: arithmeticMean], // arithmetic mean
    sum: [Function: sum],
    max: [Function: max],
    min: [Function: min],
    sd: [Function: standardDeviation], // standard deviation
    StandardScalerTransforms: [Function: StandardScalerTransforms], // returns two functions that can standard scale new inputs and reverse scale new outputs
    MinMaxScalerTransforms: [Function: MinMaxScalerTransforms], // returns two functions that can min-max scale new inputs and reverse scale new outputs
    StandardScaler: [Function: StandardScaler], // standardization (z-scores)
    MinMaxScaler: [Function: MinMaxScaler], // min-max scaling
    ExpScaler: [Function: ExpScaler], // exponent scaling
    LogScaler: [Function: LogScaler], // natural log scaling
    squaredDifference: [Function: squaredDifference], // Returns an array of the squared differences of two arrays
    standardError: [Function: standardError], // The standard error of the estimate is a measure of the accuracy of predictions made with a regression line
    coefficientOfDetermination: [Function: coefficientOfDetermination],
    adjustedCoefficentOfDetermination: [Function: adjustedCoefficentOfDetermination],
    adjustedRSquared: [Function: adjustedCoefficentOfDetermination],
    rBarSquared: [Function: adjustedCoefficentOfDetermination],
    r: [Function: coefficientOfCorrelation],
    coefficientOfCorrelation: [Function: coefficientOfCorrelation],
    rSquared: [Function: rSquared], //r^2
    pivotVector: [Function: pivotVector], // returns an array of vectors as an array of arrays
    pivotArrays: [Function: pivotArrays], // returns a matrix of values by combining arrays into a matrix
    standardScore: [Function: standardScore], // Calculates the z score of each value in the sample, relative to the sample mean and standard deviation.
    zScore: [Function: standardScore], // alias for standardScore.
    approximateZPercentile: [Function: approximateZPercentile], // approximate the p value from a z score
  },
  preprocessing: {
    DataSet: [Class DataSet],
  },
}
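
As a rough illustration of the idea behind a scaler that also offers a reverse transform (the outline above describes `StandardScalerTransforms` as returning two such functions), the following plain-JavaScript sketch computes z-scores and undoes them. It is not the library's implementation: `makeStandardScaler` and its return shape are assumptions, and it uses the sample standard deviation (n - 1), which may differ from what ModelScript does.

```javascript
// makeStandardScaler is a hypothetical helper, not a ModelScript export.
// It assumes at least two values that are not all identical.
function makeStandardScaler(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Sample variance (divides by n - 1); an assumption for this sketch.
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  return {
    mean,
    sd,
    scale: x => (x - mean) / sd,     // value -> z-score
    descale: z => z * sd + mean,     // z-score -> original units
  };
}

const salaries = [72000, 48000, 54000, 61000];
const scaler = makeStandardScaler(salaries);
const scaled = salaries.map(scaler.scale);
const restored = scaled.map(scaler.descale);
console.log({ scaled, restored });
```

Keeping the mean and standard deviation alongside the two functions is what makes the transform reversible: predictions made on scaled data can be mapped back to the original units.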

Examples (JavaScript / Python / R)

Loading CSV Data

JavaScript

import { default as ms } from 'modelscript';
let dataset;

//In JavaScript, by default most I/O Operations are asynchronous, see the notes section for more
ms.loadCSV('/some/file/path.csv')
  .then(csvData=>{
    dataset = new ms.DataSet(csvData);
    console.log({csvData});
    /* csvData [{
      'Country': 'Brazil',
      'Age': '44',
      'Salary': '72000',
      'Purchased': 'N',
    },
    ...
    {
      'Country': 'Mexico',
      'Age': '27',
      'Salary': '48000',
      'Purchased': 'Yes',
    }] */
  })
  .catch(console.error);

// or from URL
ms.loadCSV('https://example.com/some/file/path.csv')
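
The sample output above shows every field arriving as a string ('44', '72000'). A minimal sketch of why that happens, assuming a simple CSV with a header row and no quoted fields; `parseSimpleCSV` is a hypothetical helper and is not how `loadCSV` is implemented:

```javascript
// parseSimpleCSV is an illustrative helper, not part of ModelScript.
// It assumes comma-separated values with no quoting or escaping.
function parseSimpleCSV(text) {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(',');
  return lines.map(line => {
    const cells = line.split(',');
    // Missing trailing cells become empty strings.
    return Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? '']));
  });
}

const sample = 'Country,Age,Salary,Purchased\nBrazil,44,72000,N\nMexico,27,48000,Yes';
const parsed = parseSimpleCSV(sample);
console.log(parsed[0]);

// Text parsing yields strings, so numeric columns need an explicit conversion.
const agesAsNumbers = parsed.map(row => Number(row.Age));
console.log(agesAsNumbers);
```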

Python

import pandas as pd

#Importing the dataset
dataset = pd.read_csv('/some/file/path.csv')

R

# Importing the dataset
dataset = read.csv('Data.csv')

Handling Missing Data

JavaScript

// columnArray returns a column of data by name
// [ '44','27','30','38','40','35','','48','50', '37' ]
const OriginalAgeColumn = dataset.columnArray('Age');

// columnReplace returns a new array with missing data replaced
// [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const ReplacedAgeMeanColumn = dataset.columnReplace('Age',{strategy:'mean'});

// fitColumns mutates the dataset
dataset.fitColumns({
  columns:[{name:'Age',strategy:'mean'}]
});
/*
dataset
class DataSet
  data:[
    {
      'Country': 'Brazil',
      'Age': '38.77777777777778',
      'Salary': '72000',
      'Purchased': 'N',
    }
    ...
  ]
*/
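
The replacement value 38.77777777777778 shown above is the mean of the nine non-empty ages (349 / 9). A minimal plain-JavaScript sketch of that mean-imputation step follows; `imputeMean` is a hypothetical helper for illustration, not the implementation behind `columnReplace`, and it assumes missing values are empty strings.

```javascript
const ageColumn = ['44', '27', '30', '38', '40', '35', '', '48', '50', '37'];

// imputeMean is an illustrative helper, not part of ModelScript.
// It replaces empty strings with the mean of the non-empty values
// and leaves every other entry untouched.
function imputeMean(column) {
  const present = column.filter(v => v !== '').map(Number);
  const mean = present.reduce((sum, v) => sum + v, 0) / present.length;
  return column.map(v => (v === '' ? mean : v));
}

const imputed = imputeMean(ageColumn);
console.log(imputed);
```

As in the output above, only the missing entry becomes a number; the remaining entries stay strings until they are converted or scaled in a later step.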

Python

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
# SimpleImputer replaces the Imputer class, which was removed in scikit-learn 0.22
import numpy as np
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

R

# Taking care of t
