BashML Library
BashML is a Machine Learning library developed entirely in bash.
The purpose of the library is to run on embedded or restricted Linux devices that lack support for higher-level programming languages and where disk space and memory may be limited or package installation not possible or difficult. The benefits include:
- zero install
- zero config
- single dependency (bash)
- easy to use
- composition friendly (pipes)
- broadly applicable to a multitude of ML problems
- multi-platform
- lightweight (~800 lines of code, ~19 kB)
The library has been tested on macOS, Ubuntu Linux, Windows (Git Bash), Android, Raspberry Pi, and OpenWrt.
The following models are currently supported.
- Multiple Linear Regression (MLR), supervised
- Q-table Reinforcement Learning, a.k.a. Q-learning (QL), unsupervised
- Decision Trees (DT), supervised
- Random Forest (RF), supervised
- Logistic Regression (LR), supervised
General Design
BashML is designed to take advantage of pipes and basic input/output handling. Calculations are done with the bc command, as bash natively supports only integer arithmetic. Simple sed, awk, and tr commands, combined with cat, echo, and printf, comprise the bulk of the code. All of these commands come pre-installed alongside bash on most Unix/Linux systems, including macOS.
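The integer-only limitation is easy to demonstrate, and shows why bc (or an awk-based fallback) is needed:

```shell
# bash built-in arithmetic truncates to integers
echo $(( 7 / 2 ))                        # prints 3

# awk, the approach behind the bcg fallback, handles floating point
awk 'BEGIN { printf "%.2f\n", 7 / 2 }'   # prints 3.50
```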
Multiple Linear Regression
MLR models are simple yet powerful, in that most supervised feature and output vector ML formulations can be expressed as MLR models. BashML supports training and execution of MLR models using ordinary least squares (OLS) fitting.
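For the single-feature case, the OLS fit has a closed form, which can be sketched in plain awk. This is only an illustration of the fitting idea; bashml ols generalizes it to many features via matrix inversion.

```shell
# Fit y = b0 + b1*x by ordinary least squares on a tiny dataset:
#   slope     b1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   intercept b0 = (Sy - b1*Sx) / n
printf '1 3\n2 5\n3 7\n4 9\n' | awk '
  { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
  END {
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    printf "%g %g\n", b0, b1
  }'
# prints: 1 2   (the data lie exactly on y = 1 + 2x)
```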
Q-learning
Similarly, QL is the simplest form of reinforcement learning, yet it is generally applicable to many RL problems.
See: ql
Decision Trees and Random Forest
Decision Trees also lend themselves to simple training and execution for many classification problems. BashML implements the ID3 algorithm as well as a Random Forest feature-bagging algorithm.
Logistic Regression
LR models are popular when the outcomes are probabilities of events. Although similar to MLR, the fitting is quite different, as it uses gradient descent.
See: lr
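The gradient-descent idea behind LR fitting can be sketched in awk for a single feature. This illustrates the method only; the lr command's actual code and update schedule may differ.

```shell
# Batch gradient descent for logistic regression with one feature plus
# intercept. Labels are 0/1; the sigmoid maps scores to probabilities.
printf '1 0\n2 0\n4 1\n5 1\n' | awk -v lr=0.1 '
  { x[NR] = $1; y[NR] = $2 }
  END {
    t0 = 0; t1 = 0                               # theta: intercept, slope
    for (it = 1; it <= 500; it++) {
      g0 = 0; g1 = 0
      for (i = 1; i <= NR; i++) {
        p = 1 / (1 + exp(-(t0 + t1 * x[i])))     # sigmoid(theta . x)
        g0 += p - y[i]
        g1 += (p - y[i]) * x[i]
      }
      t0 -= lr * g0; t1 -= lr * g1
    }
    # predicted probability for a new point x = 4.5;
    # well above 0.5 for this separable data
    printf "%.2f\n", 1 / (1 + exp(-(t0 + t1 * 4.5)))
  }'
```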
Generic Math Functions
To avoid external dependencies some generic math functions have been implemented, including:
- entropy, gain, and maxgain, used to build decision trees.
- dot, mult, transpose, and invert, used to build MLR and LR models.
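As an illustration of the quantity the entropy helper is built around, Shannon entropy of a label column takes only a few lines of awk. This is a sketch; bashml entropy itself computes entropy of one column conditioned on another.

```shell
# Shannon entropy (base 2): H = -sum_c p(c) * log2 p(c).
# Two classes with equal counts give exactly 1 bit.
printf 'yes\nyes\nno\nno\n' | awk '
  { count[$1]++; n++ }
  END {
    for (c in count) { p = count[c] / n; h -= p * log(p) / log(2) }
    printf "%g\n", h
  }'
# prints: 1
```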
Portability Functions
Although many of the basic built-in utilities on Linux are standardized by POSIX or are de facto standards from GNU Core Utilities, they may not all be present on some stripped-down flavors and forks of Linux. Hence, some simple portability functions were implemented to provide cross-platform support without having to install additional packages. The bc, shuf, and diff commands have been reimplemented as simpler awk-based commands called bcg, shufg, and diffg. The bc command is frequently used and performance-sensitive, so the tools detect whether it is present on the system and, if so, use it instead of the awk-based bcg, which tends to be a bit slower.
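A shuffle can be approximated with a decorate-sort-undecorate pipeline along these lines (an assumed approach; the actual shufg implementation may differ):

```shell
# Shuffle stdin: tag each line with a random key, sort on the key, drop it.
printf '1\n2\n3\n' \
  | awk 'BEGIN { srand() } { printf "%.6f\t%s\n", rand(), $0 }' \
  | sort -n \
  | cut -f2-
# prints 1, 2, 3 in some random order
```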
Usage Example
To use the library, first put the bin directory in your PATH, e.g.:
export PATH=$(pwd)/bin:$PATH
To train an MLR model, put the training feature data in a file called, e.g., X:
1 7 560
1 3 220
1 3 340
1 4 80
1 6 150
1 7 330
and the training output data in a file called, e.g., y:
16.68
11.50
12.03
14.88
13.75
18.11
Now train the model as follows:
cat X | bashml ols $(cat y | arg) > beta
The fitted coefficients are then written to a file called beta.
Now you can make predictions with the model as follows:
echoe "1 5 90" | bashml mlr beta
echoe is just shorthand for echo -e.
You may pass many feature vectors to mlr to predict in a batch.
Matrices are passed into commands as space-separated columns and newline-separated rows on stdin, or as an argument using space-separated columns and comma-separated rows. There is a convenience function called arg that converts between the stdin representation and the arg representation.
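The conversion between the two forms amounts to swapping newlines and commas, e.g. with tr (an approximation of what arg does; its exact output is not shown here):

```shell
# stdin form (newline-separated rows) -> arg form (comma-separated rows)
printf '1 2\n3 4' | tr '\n' ','    # prints: 1 2,3 4

# arg form -> stdin form
echo '1 2,3 4' | tr ',' '\n'
```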
Command Reference
ols
Fits a feature matrix and output vector using OLS and returns estimated coefficients for prediction. The feature matrix is passed on stdin and the output vector as an argument. Example:
echoe "3 3.5\n3.2 3.6" | bashml ols 1,2 > betatest
mlr
Predicts using an MLR model trained with the ols command. The feature matrix is passed on stdin, and the name of the file that holds the coefficients from ols is the only argument. Example:
echoe "1 5" | bashml mlr betatest
ql
This command performs Q-table based reinforcement learning. A Q-table can be initialized with:
bashml ql <ACTIONS> <STATES>
The output is a 0-filled matrix that should be passed to the command for training. To train, pass in the current Q-table on stdin as follows:
echoe "$qtable" | bashml ql <action> <state> <reward> <next_state> <alpha> <gamma>
To make a prediction, pass in the Q-table on stdin and the current state as the only argument.
echoe "$qtable" | bashml ql <current_state>
The following example initializes a Q-table with 3 actions and 3 states, trains it by setting reward 0.234 for action 2, state 3, and next_state 2, with alpha 0.1 and gamma 0.1. Finally, it predicts the best action to take in state 3.
bashml ql 3 3 | bashml ql 2 3 0.234 2 0.1 0.1 | bashml ql 3
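Each training call applies the standard Q-learning update, Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)). With the numbers from the example above and a zero-initialized table, the updated cell can be checked by hand:

```shell
# One Q-learning update for a single cell, starting from Q(s,a) = 0.
awk 'BEGIN {
  q = 0; reward = 0.234; alpha = 0.1; gamma = 0.1
  qmax_next = 0     # max over actions in next_state (table is all zeros)
  q = q + alpha * (reward + gamma * qmax_next - q)
  printf "%g\n", q
}'
# prints: 0.0234
```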
id3
This command creates a decision tree encoded as a bash script. To train a model, pass in a table with headers where the last column is the decision variable. E.g.:
cat data/decision.dat | MAXLEVEL=5 bashml id3 > decision_tree.sh && chmod u+x decision_tree.sh
MAXLEVEL denotes the maximum number of levels in the generated tree. The default is 10.
Predictions can then be made as follows:
Outlook=Rain Wind=Strong ./decision_tree.sh
All headers in the training table become environment variables during prediction.
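The generated script is essentially nested conditionals on those variables. A hypothetical sketch of its shape (the actual code id3 emits is not shown in this README; the split variables and classes below are made up for illustration):

```shell
# Write a toy tree script and query it: feature values come in as
# environment variables, and the predicted class is echoed.
cat > /tmp/tree_example.sh <<'EOF'
#!/bin/bash
if [ "$Outlook" = "Rain" ]; then
  if [ "$Wind" = "Strong" ]; then echo "No"; else echo "Yes"; fi
else
  echo "Yes"
fi
EOF
chmod u+x /tmp/tree_example.sh

Outlook=Rain Wind=Strong /tmp/tree_example.sh   # prints: No
Outlook=Sunny /tmp/tree_example.sh              # prints: Yes
```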
dt
This command makes a prediction for a decision tree model encoded as a bash script (see id3 command above). The command takes as input on stdin a table with a header where the columns correspond to the parameters in the model, and the path to the model file as the only command line argument. For example:
cat data/student_predict.dat | bashml dt data/student_scores_example_tree.sh
rf
This command trains a random forest of ID3 trees:
cat data/student_scores.dat | bashml rf TREE_ID
where TREE_ID is the tree ID to use for predictions.
TREE_COLUMNS and TREES can be passed as environment variables to specify how many feature columns should be sampled in each tree and how many trees should be generated, respectively.
To make predictions, pass the environment variable PREDICT=yes to pick the majority classification. This is equivalent to the dt command for individual trees.
cat data/student_predict.dat | PREDICT="yes" bashml rf TREE_ID
where TREE_ID is the tree ID that was used during training.
entropy
This command takes a two column table as input on stdin and computes the entropy of column 1 on column 2.
echoe "1 2\n3 4" | bashml entropy
gain
This command takes a two column table as input on stdin and computes the information gain of all attribute values of column 1 on column 2.
echoe "1 2\n3 4" | bashml gain
maxgain
This command takes a multi-column table as input on stdin and finds the column with the maximum information gain on the last column.
echoe "1 2 4\n3 4 5" | bashml maxgain
lr
This command trains and predicts with Logistic Regression models. The feature matrix is passed on stdin as a table with space-separated columns and newline-separated rows. The outcome vector, which must have as many elements as the feature matrix has rows, is passed on the command line with elements separated by spaces.
To train a model use the following:
cat data/logreg_example.dat | ITERATIONS=2 bashml lr 1 0 1 0
It will output the model parameters, often referred to as theta, which can then be passed in during prediction as follows:
cat data/logreg_test.dat | PREDICT="yes" bashml lr 0.0525194 -0.0092348 -0.01250607
Predictions are probabilities, where the returned vector elements correspond to the rows in the input.
dot
Computes the dot product of two vectors passed in as rows in stdin format, e.g.:
echoe "1 2 3\n4 5 6" | bashml dot
mult
Multiplies two matrices. First matrix is passed in stdin format and second in arg format, e.g.:
echoe "1 2 3\n4 5 6" | bashml mult 2,3,4
invert
Computes the inverse of a matrix, e.g.:
echoe "3 3.5\n3.2 3.6" | bashml invert
transpose
Transposes a matrix, e.g.:
echoe "3 3.5\n3.2 3.6" | bashml transpose
shufg
Portable equivalent to shuf. Randomizes order of lines passed on stdin.
echoe "1\n2\n3" | shufg
bcg
Portable equivalent to bc. Performs floating point arithmetic based on specified precision. Will use bc internally if available.
echo "scale=2;1.2/5.6" | bcg
diffg
Portable equivalent to diff. Computes the diff between content passed on stdin and the file passed as the only argument.
cat file1 | bashml diffg file2 && echo "files are the same"
General Notes
The eliminate and identity commands are used internally by the invert command and should not be used directly.
