uber / Petastorm: The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
jadianes / Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython/Jupyter notebooks.
spark-examples / Pyspark Examples: PySpark RDD, DataFrame, and Dataset examples in Python.
krishnaik06 / Pyspark With Python: No description available.
cluster-apps-on-docker / Spark Standalone Cluster On Docker: Learn Apache Spark in Scala, Python (PySpark), and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
CamDavidsonPilon / Tdigest: t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark.
tirthajyoti / Spark With Python: Fundamentals of Spark with Python (using PySpark), with code examples.
ThreatHuntingProject / Hunter: A threat hunting / data analysis environment based on Python, Pandas, PySpark, and Jupyter Notebook.
vivek-bombatkar / Spark With Python My Learning Notes: ETL pipeline using PySpark (Spark with Python).
hyunjoonbok / PySpark: PySpark functions and utilities with examples. Assists the ETL process of data modeling.
SuperJohn / Spark And Python For Big Data With Pyspark: Udemy course by Jose Portilla.
anguenot / Pyspark Cassandra: pyspark-cassandra is a Python port of the DataStax Spark Cassandra connector. Compatible with Spark 2.0, 2.1, 2.2, 2.3, and 2.4.
asdspal / DimRed: Python, Scala, and PySpark code for a few dimensionality reduction algorithms.
andre-salvati / Databricks Template: A production-ready PySpark project template with medallion architecture, Python packaging, unit tests, integration tests, CI/CD automation, Databricks Asset Bundles, and DQX data quality framework.
datamole-ai / Pysparkdt: An open-source Python library for simplifying local testing of Databricks workflows that use PySpark and Delta tables.
jmcurbelo / Pyspark Ingenieria De Datos: This repository contains the material for the Udemy course "Big Data y Spark: ingeniería de datos con Python y pyspark" (Big Data and Spark: Data Engineering with Python and PySpark). In this course, you will learn to use the tools and techniques needed to work with large datasets using the pyspark library.
itversity / Spark Sql And Pyspark Using Python3: Repository related to Spark SQL and PySpark using Python 3.
jonesberg / DataAnalysisWithPythonAndPySpark Data: Data for the `Data Analysis with Python and PySpark` book.
indiacloudtv / Structuredstreamingkafkapyspark: Apache Spark Structured Streaming with Kafka using Python (PySpark).
CoorpAcademy / Docker Pyspark: Docker image of Apache Spark with its Python interface, PySpark.