129 skills found · Page 4 of 5
helloooideeeeea / RealTimeCutVADLibraryForAndroid: Real-time Voice Activity Detection (VAD) library for Android using Silero models powered by ONNX Runtime. Includes advanced noise suppression and audio preprocessing with WebRTC APM, supporting seamless WAV data output with header metadata.
AlinaBaber / Arabic Speech Recognition By Machine Learning And Feature Extraction: This project implements an Arabic Speech Recognition system using an ensemble voting classifier. The model is built with Python and uses the Librosa library for preprocessing and feature extraction.
L-A-Sandhu / TimeMesh: A Python library for time-series data preprocessing featuring advanced windowing strategies, normalization, and dataset splitting. Well suited to preparing time-dependent data for LSTM, Transformer, and other sequence models.
SPINLab / Deep Geometry: A Python library for preprocessing geospatial vector geometries for use in deep learning.
MedMaxLab / BIDSAlign: A library for preprocessing and aligning multiple datasets to a common template.
MuhammadNoman76 / LughaatNLP: The first Urdu-language preprocessing library in Pakistan. Tokenization, lemmatization, stop-word removal, and normalization for Urdu text. Join us to advance Urdu NLP! #OpenSource #UrduLanguage
Malachov / Predictit: Library/framework for making predictions. Automatically chooses the best models (ARIMA, regressions, MLP, LSTM...) from libraries such as scikit-learn, statsmodels, or TensorFlow. Preprocesses data and chooses optimal prediction parameters.
tdemareuil / Totalsat: Python package to easily download and preprocess Sentinel-2 and Landsat 5-7-8 images directly in a Jupyter notebook. Also includes additional satellite-related tools such as image labelling, image splitting, map data plotting, reverse geocoding, etc. Built on top of GDAL, Google Earth Engine, and the coastsat library.
FanaticPythoner / AutoAi: AI automation library that enables automatic training of many different models and automatic data preprocessing.
cschell / Motion Learning Toolbox: Python library for preprocessing XR motion-tracking data for machine learning applications.
ankishb / Ml Toolbox: This repo contains various data science strategies and machine learning models for structured as well as unstructured data. It contains modules on feature preprocessing, feature engineering, machine learning models, Bayesian parameter tuning, etc., built using libraries such as scikit-learn, Keras, H2O, XGBoost, LightGBM, CatBoost, etc.
VikasSingh-DS / Movies Reviews Bert Sentiment Flask API: Fine-tunes BERT for sentiment analysis. I performed text preprocessing (special tokens, padding, and attention masks) and built a sentiment classifier using the Transformers library by Hugging Face. The model was trained on a Kaggle notebook GPU and reaches 95.14% accuracy on the validation dataset.
ashinde8 / Data Preprocessing And Machine Learning:
- The dataset consists of 1042 rows and 20 columns. This is a regression problem where the target variable is 'price', which I predicted using machine learning models.
- Dropped the columns 'id', 'time_created', 'time_updated', 'external_id', 'url', 'latitude', and 'longitude', as these variables do not provide information significant for modeling.
- The variable 'status' has only one value throughout the dataset, 'active', so I dropped it as well since it provides no significant information.
- The variables 'bedrooms', 'bathrooms', 'garages', 'parkings', 'offering', 'erf_size', and 'floor_size' have missing values, as does the target variable 'price'. I filled the missing values of both the independent features and the target.
- Filled the two rows with value '[None]' in the 'property_type' column with 'house': the 'agency' value for these rows is 'rawson', the mode of 'property_type' for the agency 'rawson' is 'house', and the mode of 'property_type' for the area 'Constantia' is also 'house'.
- Predicted the missing values using imputers from sklearn.
- Used KNNImputer to fill the missing values in the variables 'price', 'garages', 'parkings', 'erf_size', and 'floor_size'.
- Swept the 'n_neighbors' parameter of the KNNImputer over the range 1 to 20, looking for the value that maximizes the correlation between the target variable 'price' and the feature 'floor_size'.
- I selected 'floor_size' for this check because, before imputation, the target 'price' had its highest correlation with 'floor_size' (0.5319914806523912). For each value of 'n_neighbors', I computed the correlation after imputation and compared it with 0.5319914806523912, the correlation on the original dataset containing missing values.
- The maximum correlation after KNN imputation is 0.4233518730063556, at 'n_neighbors' = 6. This is lower than the correlation on the original dataset, meaning KNN imputation weakened the relationship, which is undesirable, so I moved on to another imputer.
- After imputing the missing values with IterativeImputer, the correlation between 'price' and 'floor_size' is 0.6703992976511615, higher than on the original dataset, so I kept the IterativeImputer results.
- For 'bathrooms' and 'bedrooms', which have 4 and 14 NaN values respectively, I filled the values case by case based on 'property_type': rows with 'property_type' 'house' were filled with the mode of 'bathrooms' and 'bedrooms' for houses, and likewise for 'apartment'.
- Performed data visualizations of the features to draw more insights. The figures show outliers in the target variable 'price'. Outliers in the target are not a concern, but outliers in predictors (there are none in this case) would affect model performance; detecting them and choosing an appropriate scaling method to minimize their effect would ultimately improve performance.
- The correlation matrix shows varying degrees of correlation between the independent variables and the target. Low correlation means a weak linear relationship, but a strong non-linear relationship may still exist, so no judgement is passed at this stage; the algorithms can work that out.
- Built the regression models Linear Regression, XGBoost, AdaBoost, Decision Tree, Random Forest, KNN, and SVM, and performed hyperparameter tuning for each.
- Predicted prices with the above models and evaluated them with RMSE, R², and adjusted R². As expected, the adjusted R² score is slightly lower than the R² score for each model; by this metric the best fit is XGBoost (highest adjusted R²) and the worst is the SVM regressor (lowest).
- However, R² is only a relative measure of fit, so the RMSE values must also be examined: XGBoost and SVM have the lowest and highest RMSE respectively, and the remaining models rank in exactly the same order as their adjusted R² scores.
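The imputer-selection step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's code: the data, the `corr_after` helper, and the hole-punching loop are assumptions; only the idea (sweep KNNImputer's `n_neighbors`, score each setting by the post-imputation 'price'/'floor_size' correlation, then compare with IterativeImputer) comes from the write-up.

```python
# Sketch: pick a KNNImputer setting by the correlation between the target
# and its most correlated feature after imputation, then compare against
# IterativeImputer. Synthetic data stands in for the real dataset.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
floor_size = rng.uniform(50, 300, 200)
price = 1000 * floor_size + rng.normal(0, 20000, 200)
df = pd.DataFrame({"floor_size": floor_size, "price": price})

# Punch random holes in each column to mimic missing values.
for col in df.columns:
    df.loc[rng.choice(200, 20, replace=False), col] = np.nan

# Baseline correlation on the data that still contains NaNs
# (Series.corr drops NaN pairs automatically).
baseline = df["price"].corr(df["floor_size"])

def corr_after(imputer):
    """Impute the whole frame, then return the price/floor_size correlation."""
    filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    return filled["price"].corr(filled["floor_size"])

# Sweep n_neighbors = 1..20 and keep the best-correlating setting.
knn_scores = {k: corr_after(KNNImputer(n_neighbors=k)) for k in range(1, 21)}
best_k = max(knn_scores, key=knn_scores.get)

iter_score = corr_after(IterativeImputer(random_state=0))

print(f"baseline={baseline:.4f}  best KNN k={best_k} "
      f"({knn_scores[best_k]:.4f})  iterative={iter_score:.4f}")
```

Whichever imputer yields a correlation at or above the baseline would be kept, mirroring the write-up's decision to prefer IterativeImputer when KNN imputation lowered the correlation.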
KatherLab / Llmaixlib: Python library for document preprocessing and information extraction.
drewmee / PyEEM: Python library for the preprocessing, correction, deconvolution, and analysis of Excitation Emission Matrices (EEMs).
AlinaBaber / Arabic Speech Recognition By Machine Learning And Feature Extraction API: This project implements an Arabic Speech Recognition system using an ensemble voting classifier. The model is built with Python and uses the Librosa library for preprocessing and feature extraction.
Saeed-Engr / Machine Learning: This repository includes data, code, projects, and files covering the main Python libraries (NumPy, pandas, Matplotlib, seaborn, TensorFlow, Keras, scikit-learn), plus data preprocessing, data cleaning, data visualization, computer vision, and natural language processing.
Broad-sky / Mearching Learning Project: ⭐ This repository mainly covers data understanding, data visualization, data preprocessing, feature selection, model selection, model evaluation, and the use of various machine learning algorithms based on the sklearn library (univariate feature selection, recursive feature elimination, principal component analysis, decision tree, random forest, and the GBDT family of boosting algorithms).
NayeemHossenJim / Machine Learning: 📚 A curated collection of machine learning projects and algorithm implementations in Python. Focused on foundational ML concepts, data preprocessing, and model building using libraries like scikit-learn and pandas. Developed as part of my continuous learning and portfolio-building in applied machine learning.
virajbhutada / Google Stock Price Forecasting Lstm: Analyzing and predicting Google's stock prices through detailed data exploration and advanced LSTM models. This project involves data preprocessing, creating time-series sequences, constructing and training LSTM networks, and evaluating their performance to forecast future stock prices using Python and machine learning libraries.