uber / Petastorm: The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
jadianes / Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Apress / Machine Learning With Pyspark: Source Code for 'Machine Learning with PySpark' by Pramod Singh
hyunjoonbok / PySpark: PySpark functions and utilities with examples. Assists ETL process of data modeling
XD-DENG / Spark ML Intro: PySpark Machine Learning Examples
asifahmed90 / Pyspark ML In Colab: Pyspark in Google Colab: A simple machine learning (Linear Regression) model
nikhitmago / Lookalike Modelling: Finding customer lookalikes using Machine Learning in PySpark
alanchn31 / Loan Default Prediction: Loan Default Prediction using PySpark, with jobs scheduled by Apache Airflow and Integration with Spark using Apache Livy
edyoda / Machine Learning Using Pyspark: Learn Machine Learning using PySpark from scratch
eswarchandt / Machine Learning Algorithms With Pyspark: Walks through the complete machine learning process, implemented with PySpark.
Upasna22 / Twitter Sentiment Analysis Using Apache Spark: Accessed the Twitter API to stream live tweets. Performed feature extraction and transformation on the JSON-formatted tweets using pyspark.mllib, the machine learning package of PySpark. Experimented with three classifiers (Naïve Bayes, Logistic Regression, and Decision Tree learning) and performed k-fold cross-validation to determine the best.
RishiSankineni / Machine Learning Pipeline LR Pyspark: Power Plant ML Pipeline Application - Apache Spark
marcgarnica13 / Ml Interpretability European Football: Understanding gender differences in professional European football through Machine Learning interpretability and match actions data.

This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*.

We evaluated the main differential features of European male and female football players in match actions data, under the assumption of finding significant differences and established patterns between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance.

We set up a supervised classification pipeline to predict the gender of each player by looking at their actions in the game. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline had three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to draw out the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Good predictive accuracy was consistent across the different models deployed.

## Installation

Install the required python packages

```
pip install -r requirements.txt
```

To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides an end-user API for Spark jobs. You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two different data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship and a single file calculating the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Even though we cannot publish the original data source, the two python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/) folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment as well as some persistent models for testing.
huseinzol05 / Pyspark ML: Gathers data science and machine learning problem solving using PySpark and Hadoop.
naenumtou / DataScienceLab: All statistical models / machine learning / computer vision / financial models / NLP / PySpark / python techniques / library tutorials can be found here.
Anam-Mahmood / Introduction To Big Data Analysis Machine Learning In Python With PySpark: No description available
pinarersoy / PySpark SparkSQL MLib: Includes several examples of data manipulation techniques using PySpark and of machine learning algorithms using MLlib
Foroozani / BigData PySpark: Handle Big Data for Machine Learning using Python and PySpark, Building ETL Pipelines with PySpark, MongoDB, and Bokeh
wangruinju / PySpark Machine Learning: A collection of machine learning examples using PySpark
shashankmrao / Data Science Mini Projects Using Apache Spark: SparkSQL, ETL, Machine Learning, Deep Learning, Time Series Analysis, Computer Vision, and Natural Language Processing exercises with Apache PySpark
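
Several entries above (for example Upasna22 / Twitter Sentiment Analysis Using Apache Spark) describe comparing classifiers with k-fold cross-validation. The sketch below illustrates that selection step in plain Python. It is not taken from any listed repository and does not use Spark. The names `k_fold_indices`, `cross_validate`, and `pick_best` are hypothetical helpers, and each candidate is assumed to be a callable that trains on one set of row indices and returns a score on another. In PySpark itself, `pyspark.ml.tuning.CrossValidator` plays a similar role.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(train_and_score, n_samples, k=5, seed=0):
    """Return the mean score over k folds, holding each fold out once."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        # train_and_score is a hypothetical callable: (train_idx, test_idx) -> float
        scores.append(train_and_score(train_idx, test_idx))
    return mean(scores)


def pick_best(candidates, n_samples, k=5):
    """Return (name, mean_score) for the candidate with the highest mean score."""
    results = {name: cross_validate(fn, n_samples, k) for name, fn in candidates.items()}
    best = max(results, key=results.get)
    return best, results[best]
```

With three candidates (say, wrappers around Naïve Bayes, Logistic Regression, and a Decision Tree), `pick_best` evaluates all of them on the same folds, since the seed is fixed, which keeps the comparison fair.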