BashML Library
BashML is a Machine Learning library developed entirely in bash.
The purpose of the library is to run on embedded or restricted Linux devices that lack support for higher-level programming languages and where disk space and memory may be limited or package installation not possible or difficult. The benefits include:
- zero install
- zero config
- single dependency (bash)
- easy to use
- composition friendly (pipes)
- broadly applicable to a multitude of ML problems
- multi-platform
- lightweight (~800 lines of code, ~19 kB)
The library has been tested on macOS, Ubuntu Linux, Windows (Git Bash), Android, Raspberry Pi, and OpenWrt.
The following models are currently supported.
- Multiple Linear Regression (MLR), supervised
- Q-table Reinforcement Learning, a.k.a. Q-learning (QL), unsupervised
- Decision Trees (DT), supervised
- Random Forest (RF), supervised
- Logistic Regression (LR), supervised
General Design
BashML is designed to take advantage of pipes and basic input/output handling. Calculations are done with the bc command, as bash natively supports only integer arithmetic. Simple sed, awk, and tr commands, combined with cat, echo, and printf, comprise the bulk of the code. All of these commands come pre-installed alongside bash on most Unix/Linux systems, including macOS.
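The integer-only limitation is easy to demonstrate, and shows why bc (or an awk-based fallback) is needed:

```shell
# bash built-in arithmetic truncates to integers
echo $(( 7 / 2 ))                        # prints 3

# awk, the approach behind the bcg fallback, handles floating point
awk 'BEGIN { printf "%.2f\n", 7 / 2 }'   # prints 3.50
```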
Multiple Linear Regression
MLR models are simple yet powerful, in that most supervised feature and output vector ML formulations can be expressed as MLR models. BashML supports training and execution of MLR models using ordinary least squares (OLS) fitting.
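For the single-feature case, the OLS fit has a closed form, which can be sketched in plain awk. This is only an illustration of the fitting idea; bashml ols generalizes it to many features via matrix inversion.

```shell
# Fit y = b0 + b1*x by ordinary least squares on a tiny dataset:
#   slope     b1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   intercept b0 = (Sy - b1*Sx) / n
printf '1 3\n2 5\n3 7\n4 9\n' | awk '
  { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
  END {
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    printf "%g %g\n", b0, b1
  }'
# prints: 1 2   (the data lie exactly on y = 1 + 2x)
```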
Q-learning
Similarly, QL is the simplest form of reinforcement learning, yet it is generally applicable to many RL problems.
See: ql
Decision Trees and Random Forest
Decision Trees also lend themselves to simple training and execution for many classification problems. BashML implements the ID3 algorithm as well as a Random Forest feature-bagging algorithm.
Logistic Regression
LR models are popular when the outcomes are probabilities of events. Although similar to MLR, the fitting is quite different, as it uses gradient descent.
See: lr
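The gradient-descent idea behind LR fitting can be sketched in awk for a single feature. This illustrates the method only; the lr command's actual code and update schedule may differ.

```shell
# Batch gradient descent for logistic regression with one feature plus
# intercept. Labels are 0/1; the sigmoid maps scores to probabilities.
printf '1 0\n2 0\n4 1\n5 1\n' | awk -v lr=0.1 '
  { x[NR] = $1; y[NR] = $2 }
  END {
    t0 = 0; t1 = 0                               # theta: intercept, slope
    for (it = 1; it <= 500; it++) {
      g0 = 0; g1 = 0
      for (i = 1; i <= NR; i++) {
        p = 1 / (1 + exp(-(t0 + t1 * x[i])))     # sigmoid(theta . x)
        g0 += p - y[i]
        g1 += (p - y[i]) * x[i]
      }
      t0 -= lr * g0; t1 -= lr * g1
    }
    # predicted probability for a new point x = 4.5;
    # well above 0.5 for this separable data
    printf "%.2f\n", 1 / (1 + exp(-(t0 + t1 * 4.5)))
  }'
```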
Generic Math Functions
To avoid external dependencies some generic math functions have been implemented, including:
- entropy, gain, and maxgain, used to build decision trees.
- dot, mult, transpose, and invert, used to build MLR and LR models.
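As an illustration of the quantity the entropy helper is built around, Shannon entropy of a label column takes only a few lines of awk. This is a sketch; bashml entropy itself computes entropy of one column conditioned on another.

```shell
# Shannon entropy (base 2): H = -sum_c p(c) * log2 p(c).
# Two classes with equal counts give exactly 1 bit.
printf 'yes\nyes\nno\nno\n' | awk '
  { count[$1]++; n++ }
  END {
    for (c in count) { p = count[c] / n; h -= p * log(p) / log(2) }
    printf "%g\n", h
  }'
# prints: 1
```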
Portability Functions
Although many of the basic built-in utilities on Linux are standardized by POSIX or are de facto standards from GNU Core Utilities, they may not all be present on some stripped-down flavors and forks of Linux. Hence, some simple portability functions were implemented to provide cross-platform support without having to install additional packages. The bc, shuf, and diff commands have been reimplemented as simpler awk-based commands called bcg, shufg, and diffg. The bc command is frequently used and performance-sensitive, so the tools detect whether it is present on the system and, if so, use it instead of the awk-based bcg, which tends to be a bit slower.
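A shuffle can be approximated with a decorate-sort-undecorate pipeline along these lines (an assumed approach; the actual shufg implementation may differ):

```shell
# Shuffle stdin: tag each line with a random key, sort on the key, drop it.
printf '1\n2\n3\n' \
  | awk 'BEGIN { srand() } { printf "%.6f\t%s\n", rand(), $0 }' \
  | sort -n \
  | cut -f2-
# prints 1, 2, 3 in some random order
```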
Usage Example
To use the library, first put the bin directory in your PATH, e.g.:
export PATH=$(pwd)/bin:$PATH
To train an MLR model, put the training feature data in a file called, e.g., X:
1 7 560
1 3 220
1 3 340
1 4 80
1 6 150
1 7 330
and the training output data in a file called, e.g., y:
16.68
11.50
12.03
14.88
13.75
18.11
Now train the model as follows:
cat X | bashml ols $(cat y | arg) > beta
The fitted coefficients are then written to a file called beta.
Now you can make predictions with the model as follows:
echoe "1 5 90" | bashml mlr beta
echoe is just shorthand for echo -e.
You may pass many feature vectors to mlr to predict in a batch.
Matrices are passed into commands as space-separated columns and newline-separated rows on stdin, or as an argument using space-separated columns and comma-separated rows. There is a convenience function called arg that converts between the stdin representation and the arg representation.
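The conversion between the two forms amounts to swapping newlines and commas, e.g. with tr (an approximation of what arg does; its exact output is not shown here):

```shell
# stdin form (newline-separated rows) -> arg form (comma-separated rows)
printf '1 2\n3 4' | tr '\n' ','    # prints: 1 2,3 4

# arg form -> stdin form
echo '1 2,3 4' | tr ',' '\n'
```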
Command Reference
ols
Fits a feature matrix and output vector using OLS and returns estimated coefficients for prediction. The feature matrix is passed on stdin and the output vector as an argument. Example:
echoe "3 3.5\n3.2 3.6" | bashml ols 1,2 > betatest
mlr
Predicts using an MLR model trained with the ols command. The feature matrix is passed on stdin, and the name of the file that holds the coefficients from ols is the only argument. Example:
echoe "1 5" | bashml mlr betatest
ql
This command performs Q-table based reinforcement learning. A Q-table can be initialized with:
bashml ql <ACTIONS> <STATES>
The output is a 0-filled matrix that should be passed to the command for training. To train, pass in the current Q-table on stdin as follows:
echoe "$qtable" | bashml ql <action> <state> <reward> <next_state> <alpha> <gamma>
To make a prediction, pass in the Q-table on stdin and the current state as the only argument.
echoe "$qtable" | bashml ql <current_state>
The following example initializes a Q-table with 3 actions and 3 states, trains it by setting reward 0.234 for action 2, state 3, and next_state 2, with alpha 0.1 and gamma 0.1. Finally, it predicts the best action to take in state 3.
bashml ql 3 3 | bashml ql 2 3 0.234 2 0.1 0.1 | bashml ql 3
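Each training call applies the standard Q-learning update, Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)). With the numbers from the example above and a zero-initialized table, the updated cell can be checked by hand:

```shell
# One Q-learning update for a single cell, starting from Q(s,a) = 0.
awk 'BEGIN {
  q = 0; reward = 0.234; alpha = 0.1; gamma = 0.1
  qmax_next = 0     # max over actions in next_state (table is all zeros)
  q = q + alpha * (reward + gamma * qmax_next - q)
  printf "%g\n", q
}'
# prints: 0.0234
```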
id3
This command creates a decision tree encoded as a bash script. To train a model, pass in a table with headers where the last column is the decision variable. E.g.:
cat data/decision.dat | MAXLEVEL=5 bashml id3 > decision_tree.sh && chmod u+x decision_tree.sh
MAXLEVEL denotes the maximum number of levels in the generated tree. The default is 10.
Predictions can then be made as follows:
Outlook=Rain Wind=Strong ./decision_tree.sh
All headers in the training table become environment variables during prediction.
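The generated script is essentially nested conditionals on those variables. A hypothetical sketch of its shape (the actual code id3 emits is not shown in this README; the split variables and classes below are made up for illustration):

```shell
# Write a toy tree script and query it: feature values come in as
# environment variables, and the predicted class is echoed.
cat > /tmp/tree_example.sh <<'EOF'
#!/bin/bash
if [ "$Outlook" = "Rain" ]; then
  if [ "$Wind" = "Strong" ]; then echo "No"; else echo "Yes"; fi
else
  echo "Yes"
fi
EOF
chmod u+x /tmp/tree_example.sh

Outlook=Rain Wind=Strong /tmp/tree_example.sh   # prints: No
Outlook=Sunny /tmp/tree_example.sh              # prints: Yes
```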
dt
This command makes a prediction for a decision tree model encoded as a bash script (see id3 command above). The command takes as input on stdin a table with a header where the columns correspond to the parameters in the model, and the path to the model file as the only command line argument. For example:
cat data/student_predict.dat | bashml dt data/student_scores_example_tree.sh
rf
This command trains a random forest of ID3 trees:
cat data/student_scores.dat | bashml rf TREE_ID
where TREE_ID is the tree ID to use for predictions.
TREE_COLUMNS and TREES can be passed as environment variables to specify how many feature columns should be sampled in each tree and how many trees should be generated, respectively.
To make predictions, pass the environment variable PREDICT=yes to pick the majority classification. This is equivalent to the dt command for individual trees.
cat data/student_predict.dat | PREDICT="yes" bashml rf TREE_ID
where TREE_ID is the tree ID that was used during training.
entropy
This command takes a two column table as input on stdin and computes the entropy of column 1 on column 2.
echoe "1 2\n3 4" | bashml entropy
gain
This command takes a two column table as input on stdin and computes the information gain of all attribute values of column 1 on column 2.
echoe "1 2\n3 4" | bashml gain
maxgain
This command takes a multi-column table as input on stdin and finds the column with the maximum information gain on the last column.
echoe "1 2 4\n3 4 5" | bashml maxgain
lr
This command trains and predicts with Logistic Regression models. The feature matrix is passed on stdin as a table with space-separated columns and newline-separated rows. The outcome vector, which must have as many elements as the feature matrix has rows, is passed on the command line with elements separated by spaces.
To train a model use the following:
cat data/logreg_example.dat | ITERATIONS=2 bashml lr 1 0 1 0
It will output the model parameters, often referred to as theta, which can then be passed in during prediction as follows:
cat data/logreg_test.dat | PREDICT="yes" bashml lr 0.0525194 -0.0092348 -0.01250607
Predictions are probabilities, where the returned vector elements correspond to the rows in the input.
dot
Computes the dot product of two vectors passed in as rows in stdin format, e.g.:
echoe "1 2 3\n4 5 6" | bashml dot
mult
Multiplies two matrices. First matrix is passed in stdin format and second in arg format, e.g.:
echoe "1 2 3\n4 5 6" | bashml mult 2,3,4
invert
Computes the inverse of a matrix, e.g.:
echoe "3 3.5\n3.2 3.6" | bashml invert
transpose
Transposes a matrix, e.g.:
echoe "3 3.5\n3.2 3.6" | bashml transpose
shufg
Portable equivalent to shuf. Randomizes order of lines passed on stdin.
echoe "1\n2\n3" | shufg
bcg
Portable equivalent to bc. Performs floating point arithmetic based on specified precision. Will use bc internally if available.
echo "scale=2;1.2/5.6" | bcg
diffg
Portable equivalent to diff. Computes the diff between content passed on stdin and the file passed as the only argument.
cat file1 | bashml diffg file2 && echo "files are the same"
General Notes
The eliminate and identity commands are used internally by the invert command and should not be used directly.
