sergio11 / Online Payment Fraud: 🚨 Fraud Detection with Deep Neural Networks (PoC) 🤖 A hands-on personal project to predict fraudulent financial transactions using deep learning. Covers the full pipeline: from exploratory data analysis (EDA) and preprocessing to model training and evaluation. An experimental approach to tackling real-world financial fraud. 📊🔍
Dhanuraj-22 / Diabetes Prediction ML: This project implements an end-to-end machine learning pipeline to predict diabetes based on medical attributes. It includes data preprocessing, exploratory data analysis, Logistic Regression model training, and evaluation using accuracy, confusion matrix, and classification report.
rameshvs / Medical Imaging Pipelines: Tools for constructing data preprocessing & analysis pipelines for medical & neuroscientific imaging data
KalyanM45 / End To End Chest Disease Classification: This repository offers a comprehensive solution for chest disease detection, covering data ingestion, preprocessing, model training, and CI/CD deployment pipelines. From raw data to automated deployment, streamline your chest disease detection process with our end-to-end solution.
marcgarnica13 / Ml Interpretability European Football: Understanding gender differences in professional European football through Machine Learning interpretability and match actions data.

This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*.

We evaluated the main differential features of European male and female football players in match actions data, under the assumption of finding significant differences and established patterns between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance. We set up a supervised classification pipeline to predict the gender of each player from their actions in the game. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline had three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to draw the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Prediction accuracy was good and consistent across the different models deployed.

## Installation

Install the required Python packages:

```
pip install -r requirements.txt
```

To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides an end-user API for Spark jobs. You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two different data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship and a single file calculating the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Even though we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/).
- A separate folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment, as well as some persisted models for testing.
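The README above says the study applied explainability methods to trained classifiers but does not name them. As a rough illustration of the general idea only, the sketch below uses permutation importance (a swapped-in, model-agnostic technique, not necessarily the one used in the study) on a small logistic regression trained on synthetic data. All function names, the data generator, and the hyperparameters are hypothetical and chosen for illustration.

```python
import math
import random

def make_data(n, rng):
    """Synthetic two-class data (assumption): feature 0 is informative, feature 1 is noise."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = rng.gauss(2.0 if y else -2.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        rows.append([informative, noise])
        labels.append(y)
    return rows, labels

def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=200):
    """Logistic regression fitted with plain batch gradient descent on the log-loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class (score > 0) matches the label."""
    w, b = model
    hits = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == bool(y)
        for x, y in zip(rows, labels)
    )
    return hits / len(rows)

def permutation_importance(model, rows, labels, rng):
    """Accuracy drop observed when a single feature column is shuffled."""
    base = accuracy(model, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        column = [x[j] for x in rows]
        rng.shuffle(column)
        shuffled = [x[:j] + [v] + x[j + 1:] for x, v in zip(rows, column)]
        drops.append(base - accuracy(model, shuffled, labels))
    return drops

rng = random.Random(0)
train_x, train_y = make_data(400, rng)
test_x, test_y = make_data(200, rng)
model = fit_logistic(train_x, train_y)
importances = permutation_importance(model, test_x, test_y, rng)
print("test accuracy:", accuracy(model, test_x, test_y))
print("importance per feature:", importances)
```

Shuffling the informative column should cost far more accuracy than shuffling the noise column, which is the kind of signal an explainability step reads off a trained classifier. The same procedure applies unchanged to a decision tree or a neural network, since it only needs predictions.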
mckellardw / Slide Snake: Snakemake pipeline for the preprocessing, alignment, QC, and quantification of spatial transcriptomics data - both short-read and long-read
caleblareau / Proatac: Preprocessing pipeline for (sc)ATAC data
abcsFrederick / NGS Preprocessing Pipeline: NGS Pipelines for Preprocessing RNA-seq, Whole Genome Sequencing and Exome-seq, and miRNA-Seq Data
saitejabandaru-in / Excel Automation Toolkit: Excel automation framework integrating VBA macros with Python (Pandas) pipelines for data preprocessing, reporting, and interactive business intelligence dashboards.
FelixLin99 / Kaggle EDFMASD: An exploration of "Kaggle-EEG data for Mental Attention State Detection". Constructs a pipeline to preprocess the data, uses Wavelet Packet Decomposition to extract time-frequency features from the EEG data, and finally uses SVM, CNN and LSTM models for classification.
FGA-DIKU / BONSAI: A BERT-based framework for processing and analyzing Electronic Health Records (EHR) data. It provides an end-to-end pipeline for data preprocessing, model training, and clinical outcome prediction.
NsElgezawy / VizionaryML: VizionaryML is an end-to-end data analysis and machine learning project designed to turn data into insight. From cleaning and preprocessing to visualization, model training, and evaluation, this pipeline showcases the full data journey. Includes an optional interactive dashboard built with Streamlit.
10-OASIS-01 / Autoregressive Language Model: This project is a comprehensive implementation of a Transformer-based language model. It encompasses the full pipeline of natural language modeling, including data preprocessing, model training, evaluation, and inference.
Dip3102001 / Paderborn Diagnosis: A comprehensive repository for motor fault diagnosis experiments using the Paderborn Bearing Dataset. This project explores deep learning-based feature extraction, ensemble modeling (CNNs, Transformers), and data augmentation techniques to enhance fault classification. Includes automated pipelines for preprocessing, training, and evaluation.
dyneth02 / Breast Cancer Prediction Machine Learning App: A comprehensive machine learning application that predicts breast cancer malignancy using cytology measurements. Features an interactive Streamlit web interface with real-time visualizations including radar charts for cell nuclei analysis. Implements logistic regression with data preprocessing pipelines for accurate benign/malignant classification.
PeterAugustin243 / CNN Based Image Classifier: A deep learning-based image classifier built with MobileNetV2 to recognize shoes, clips, and toothbrushes. The project includes preprocessing, normalization, and advanced data augmentation for robust training. It features fine-tuned transfer learning and a prediction pipeline with confidence scoring.
renswickd / MLOps Hotel Revenue Management: An end-to-end MLOps pipeline for hotel reservation prediction leveraging GCP, LightGBM, MLflow, Jenkins and Docker. This system automates data ingestion, preprocessing, feature selection, model training, and tracking, with a Flask-based frontend for real-time inference.
Spidy104 / PPG DIABETES CLASSIFICATION: This repository provides a complete pipeline for non-invasive blood glucose estimation using Photoplethysmography (PPG) signals. It includes data preprocessing, feature extraction, machine learning model training, and result visualization to support research and development in biomedical signal analysis and diabetes screening.
Lahdhirim / NLP Sentiment Classification Aws: A complete pipeline for sentiment analysis using Hugging Face Transformers and AWS services. The model can be run on both Streamlit Share Server and AWS (using S3 for storage and EC2 for deployment). This repository covers data preprocessing, model training, evaluation, and accurate sentiment prediction on reviews.
gojiplus / Text As Data: Pipeline for Analyzing Text Data: Acquire, Preprocess, Analyze