Results for "pyspark-machine-learning"

47 skills found · Page 1 of 2

uber / Petastorm

1.9k

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

universal

deep-learning, machine-learning, parquet, +6

Updated 3d ago

jadianes / Spark Py Notebooks

1.7k

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

universal

big-data, bigdata, data-analysis, +9

Updated 4d ago

Apress / Machine Learning With Pyspark

118

Source Code for 'Machine Learning with PySpark' by Pramod Singh

universal

Updated 5d ago

hyunjoonbok / PySpark

104

PySpark functions and utilities with examples. Assists the ETL process of data modeling.

universal

hadoop, pyspark, pyspark-api, +5

Updated 8mo ago

XD-DENG / Spark ML Intro

PySpark Machine Learning Examples

universal

machine-learning, spark

Updated 8mo ago

asifahmed90 / Pyspark ML In Colab

Pyspark in Google Colab: A simple machine learning (Linear Regression) model

universal

colab-notebook, hadoop, machine-learning-algorithms, +5

Updated 1mo ago

nikhitmago / Lookalike Modelling

Finding customer lookalikes using Machine Learning in PySpark

universal

machine-learning, pyspark, spark

Updated 28d ago

alanchn31 / Loan Default Prediction

Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy

universal

airflow, airflow-dags, airflow-plugins, +14

Updated 26d ago

edyoda / Machine Learning Using Pyspark

Learn Machine Learning using PySpark from scratch

universal

Updated 2mo ago

eswarchandt / Machine Learning Algorithms With Pyspark

The complete machine learning process is discussed and carried out with PySpark.

universal

Updated 1mo ago

Upasna22 / Twitter Sentiment Analysis Using Apache Spark

Accessed the Twitter API to stream live tweets. Performed feature extraction and transformation on the JSON-formatted tweets using PySpark's machine learning package, pyspark.mllib. Experimented with three classifiers (Naïve Bayes, Logistic Regression, and Decision Tree Learning) and performed k-fold cross-validation to determine the best.

universal

Updated 9mo ago

RishiSankineni / Machine Learning Pipeline LR Pyspark

Power Plant ML Pipeline Application - Apache Spark

universal

apache-spark, edx-course, pyspark

Updated 1y ago

marcgarnica13 / Ml Interpretability European Football

Understanding gender differences in professional European football through Machine Learning interpretability and match actions data.

This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*.

We evaluated the main differential features of European male and female football players in match actions data, under the assumption of finding significant differences and established patterns between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance.

We set up a supervised classification pipeline to predict the gender of each player by looking at their actions in the game. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline had three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to draw the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Good predictive accuracy was consistent across the different models deployed.

## Installation

Install the required Python packages:

```
pip install -r requirements.txt
```

To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark enables an end-user API for Spark jobs. You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two different data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship and a single file calculating the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Even though we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/) folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment, as well as some persistent models for testing.

huseinzol05 / Pyspark ML

Gathers data science and machine learning problem solving using PySpark and Hadoop.

universal

Updated 1y ago

naenumtou / DataScienceLab

All statistical models / machine learning / computer vision / financial models / NLP / PySpark / python techniques / library tutorials can be found here.

universal

ai, computer-vision, financial-analysis, +7

Updated 8mo ago