
Predictit

Library/framework for making predictions. It automatically chooses the best models (ARIMA, regressions, MLP, LSTM...) from libraries like scikit-learn, statsmodels or TensorFlow, preprocesses the data and chooses optimal prediction parameters.

Install / Use

/learn @Malachov/Predictit
About this skill

- Quality Score: 0/100
- Supported Platforms: Universal

README

predictit


Library/framework for making time series predictions. Choose the data, choose the models (ARIMA, regressions, LSTM...) from libraries like statsmodels, scikit-learn or TensorFlow, do the setup (optional; you can use a preset) and predict.

The library contains model hyperparameter optimization as well as config variable optimization. That means it can find optimal preprocessing (smoothing, dropping uncorrelated columns, standardization) and, on top of that, optimal hyperparameters such as the number of neural network layers.

Output

The most common output is an interactive plotly graph and an object containing the results array, results with history, etc.

<p align="center"> <img src="docs/source/_static/img/output_example.png" width="620" alt="Plot of results"/> </p> <p align="center"> <img src="docs/source/_static/img/table_of_results.png" width="620" alt="Table of results"/> </p>

Links

Repo on github

Official readthedocs documentation

Installation

Python >=3.6 is required (Python 2 is not supported).

Install just with

pip install predictit

Sometimes installing some libraries from the requirements fails (e.g. numpy when BLAS/LAPACK is missing). Two other libraries, TensorFlow and pyodbc, are deliberately left out of the requirements because they are not necessary but can be troublesome. If the pip install fails, check which library does not work, install it manually (stackoverflow usually helps) and retry.

There are some libraries that not every user needs (e.g. TensorFlow or libraries for particular data inputs). If you want to be sure you have everything, download requirements_advanced.txt and install the advanced requirements with pip install -r requirements_advanced.txt.

The library was developed during 2020, and its structure and even its API (configuration) changed a lot. From version 2.0 it is considered stable and follows semantic versioning.

How to

The software can be used as a Python library, with command line arguments, or as a normal Python script. The main function is predict in the main.py script. There is also a predict_multiple_columns function if you want to predict more at once (columns or time frequencies), and a compare_models function that tells you which models perform best. compare_models evaluates the error criterion on out-of-sample test data, whereas predict uses as much data as possible for training. Some models, for example decision trees, simply memorize inputs from the learning set, so their error in predict is 0, while in compare_models it is accurate. It is therefore best to run compare_models first, find the best models, and then use them in predict.
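The gap between in-sample and out-of-sample error described above can be illustrated without predictit: a model that memorizes its training inputs scores a perfect in-sample error, which says nothing about real accuracy. A minimal, library-free sketch (these helper names are illustrative, not predictit API):

```python
def fit_memorizer(xs, ys):
    """A 'model' that memorizes training pairs, as a decision tree can on seen data."""
    table = dict(zip(xs, ys))
    mean_y = sum(ys) / len(ys)  # fall back to the mean for unseen inputs
    return lambda x: table.get(x, mean_y)

train_x, train_y = [1, 2, 3, 4], [10, 20, 30, 40]
test_x, test_y = [5, 6], [50, 60]

model = fit_memorizer(train_x, train_y)

def mae(xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

in_sample_error = mae(train_x, train_y)    # 0.0 - looks perfect, but is meaningless
out_of_sample_error = mae(test_x, test_y)  # 30.0 - the honest number
```

This is why compare_models, which holds out test data, is the right tool for picking models.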

Try live demo - playground on binder

Config

Import libraries

<!--phmdoctest-setup-->
import predictit
import numpy as np
import pandas as pd

from predictit import config

then type config. and, if it does not appear automatically, press ctrl + spacebar to see all subcategories; within each subcategory, the docstrings describe every configurable value.

<p align="center"> <img src="docs/source/_static/img/config_intellisense.png" width="620" alt="GUI"/> </p>

You can edit config in two ways

  1. As object attributes

You can use subcategories like general, data_input, output

config.data_input.data = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv'

You can also use config instance directly for all the attributes and omit the subcategories (though without intellisense help).

config.datetime_column = 'Date'  # Will be used for resampling and result plot description
config.freq = "D"  # One day - one value resampling
  2. Multiple parameters at once with a dictionary and the update function
config.update({
    'datalength': 300,  # Used datalength
    'predicts': 14,  # Number of predicted values
    'default_n_steps_in': 12  # Value of recursive inputs in model (do not use too high - slower and worse predictions)
})

# After setting up the prediction as needed, running it is simple

predictions = predictit.predict()
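The default_n_steps_in value above controls how many lagged values feed each training sample: the series is split into sliding windows. A library-free sketch of that windowing (make_windows is an illustrative helper, not part of predictit):

```python
def make_windows(series, n_steps_in, n_steps_out=1):
    """Split a series into (input window, target) pairs for recursive models."""
    X, y = [], []
    for i in range(len(series) - n_steps_in - n_steps_out + 1):
        X.append(series[i:i + n_steps_in])
        y.append(series[i + n_steps_in:i + n_steps_in + n_steps_out])
    return X, y

X, y = make_windows([1, 2, 3, 4, 5, 6], n_steps_in=3)
# X == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# y == [[4], [5], [6]]
```

A larger n_steps_in means wider windows but fewer training samples, which is one reason very high values can slow down and worsen predictions.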

If you need to create more configurations and don't want to override values in the shared config, you can create multiple instances; just pass the new config as a function parameter

other_config = config.copy()  # or predictit.configuration.Config()
other_config.predicts = 30  # This will not affect config for other examples
predictions_3 = predictit.predict(config=other_config)

Simple example of using predictit as a python library and function arguments

Although there are many config variables, defaults should be enough.

predictions_1 = predictit.predict(data=np.random.randn(100, 2), predicted_column=1, predicts=3)

There are only two positional arguments, data and predicted_column (because there are more than a hundred configurable values). So you can also use

my_data = pd.DataFrame(np.random.randn(100, 2), columns=['a', 'b'])
predictions_1_positional = predictit.predict(my_data, 'b')

Simple example of using main.py as a script

Open configuration.py (the only script you need to edit, and a very simple one) and do the setup, mainly used_function and data (or data_source and path). Then just run main.py.

Simple example of using command line arguments

Run the code below in a terminal in the predictit repository folder. Use python predictit/main.py --help for more information on the parameters.

python predictit/main.py --used_function predict --data 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv' --predicted_column 'Temp'

Example of compare_models function

You can compare models on the same data split into parts, or on different datasets (check the configuration for how to insert a dictionary with named datasets)

my_data_array = np.random.randn(200, 2)  # Define your data here

config.update({
    'data_all': {'First part': (my_data_array[:100], 0), 'Second part': (my_data_array[100:], 1)},
    'predicted_column': 0
})
compared_models = predictit.compare_models()

Example of predict_multiple_columns function

config.data = np.random.randn(120, 3)
config.predicted_columns = ['*']  # Define list of columns or '*' for predicting all of the numeric columns
config.used_models = ['Conjugate gradient', 'Decision tree regression']  # Use just a few models to be faster

multiple_columns_prediction = predictit.predict_multiple_columns()

Example of config variable optimization

config.update({
    'data': "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv",
    'predicted_column': 'Temp',
    'datalength': 120,
    'optimization': True,
    'optimization_variable': 'default_n_steps_in',
    'optimization_values': [4, 6, 8],
    'plot_all_optimized_models': False,
    'print_table': 'detailed',  # Print detailed table
    'print_result_details': True,
    'used_models': ['AR', 'Sklearn regression']
})

predictions_optimized_config = predictit.predict()

Hyperparameters tuning

To optimize hyperparameters, just set optimizeit: 1 and define limits for the model parameters. How to use it is described in the docstrings. It is not a grid brute force; a heuristic method based on interval halving is used, but it can still be time-consuming. It is recommended to tune only the parameters that are worth it, or to tune them in parts.
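For intuition only (this is a generic sketch of interval-halving search, not predictit's implementation, whose exact procedure is documented in its docstrings), here is how repeatedly shrinking an interval can home in on a good parameter value for a unimodal loss:

```python
def halving_search(loss, low, high, iterations=30):
    """Repeatedly shrink [low, high] around the minimum of a unimodal loss."""
    for _ in range(iterations):
        third = (high - low) / 3
        m1, m2 = low + third, high - third
        if loss(m1) < loss(m2):
            high = m2  # the minimum lies in the left part of the interval
        else:
            low = m1   # the minimum lies in the right part of the interval
    return (low + high) / 2

# Toy loss with its minimum at parameter value 7
best = halving_search(lambda p: (p - 7) ** 2, low=0, high=20)
```

Each probe of `loss` here stands for one full model evaluation, which is why even a non-brute-force search can be time-consuming.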

GUI

A basic GUI is available, but only with a CSV data source. Just run gui_start.py if you downloaded the software, or call predictit.gui_start.run_gui() if you installed via PyPI. A screenshot of the GUI:

<p align="center"> <img src="docs/source/_static/img/GUI.png" width="620" alt="GUI"/> </p>

Better GUI with fully customizable settings will be shipped next year, hopefully.

Categorical embeddings

It is also possible to use string values in predictions. Set the config value 'embedding': with 'label', every unique string is assigned a unique number; with 'one-hot', a new column is created for every unique string (which can be time-consuming).
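The two embedding modes can be pictured with plain Python (these helpers are illustrative, not the predictit functions):

```python
def label_encode(values):
    """Assign each unique string a number, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values]

def one_hot_encode(values):
    """Create one 0/1 column per unique string (wider, can be slow)."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]

colors = ["red", "blue", "red", "green"]
labels = label_encode(colors)     # [0, 1, 0, 2]
one_hot = one_hot_encode(colors)  # columns: blue, green, red
```

'label' keeps one column but imposes an artificial ordering; 'one-hot' avoids that at the cost of one column per category.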

Feature engineering

For feature derivation, you can use difference transformations (first and second order differences), multiplications of columns, rolling mean, rolling standard deviation and also rolling Fourier transform.
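Two of the derivations above, differences and rolling mean, can be sketched in a few lines of plain Python (illustrative helpers, not the predictit implementations):

```python
def first_difference(series):
    """Change between consecutive values."""
    return [b - a for a, b in zip(series, series[1:])]

def second_difference(series):
    """Change of the change: difference applied twice."""
    return first_difference(first_difference(series))

def rolling_mean(series, window):
    """Mean over each sliding window of the given size."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

s = [1, 2, 4, 7, 11]
diff1 = first_difference(s)   # [1, 2, 3, 4]
diff2 = second_difference(s)  # [1, 1, 1]
means = rolling_mean(s, 3)
```

Note that each transformation shortens the series, so derived feature columns are shorter than the original data.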

Feature selection is under development right now :[

Data preprocessing, plotting and other functions

You can of course use any of the library's functions separately for your own needs. mydatapreprocessing, mylogging and mypythontools are my other projects, which are used heavily here. An example:


import mydatapreprocessing as mdp
from mypythontools.plots import plot
from predictit.analyze import analyze_column

View on GitHub

- GitHub Stars: 9
- Forks: 0
- Category: Education
- Updated: 4mo ago
- Languages: Python
- Security Score: 87/100 (audited on Nov 15, 2025; no findings)