DataScienceResources
Open Source Data Science Resources.
Data Science Resources
Hello and welcome to the Data Science Resources repo. I originally built this repo so that I could have a location to host resources that are helpful to me. Through building the repo I realized that other people might also be interested. I have tried to curate content on data science topics, high-quality resources to learn from, and relevant blog posts.
The intended goal was to cover more than just the technical component of data science. I have tried to find topics that cover building data science teams, business practices, use cases, product metrics and data science career paths. I hope this is helpful.
Table Of Contents
1. Data Science Getting Started
2. Data Pipeline & Tools
- Python
- Data Structures & CS topics
- Statistics
- Stats/Engineering Libraries
- Databases/Frameworks
- Data Acquisition
- Processing & EDA
- Machine Learning
- Additional Tools or Processes
- Data Visualization
- ipython Notebook Tutorials
- Data Sources
- New Data Tools
3. Product
4. Career Resources
- Data Science Career Path
- Types of Data Scientists
- Data Science Applications/Use Cases
- Data Science Websites/Books
- Data Science Meetups in the Bay Area
- Data Science Blogs
- Data Science Conferences
- Data Science Presentations
- Relevant Business Processes
5. Open Source Data Science Resources
Data Science Getting Started
Data science is a multidisciplinary field covering, at the very minimum, statistics, programming and machine learning (see Drew Conway's Venn diagram or the Cheat Sheet of a Modern Data Scientist). These topics are covered throughout this repo. I personally find the best way to learn a topic is to get my hands dirty quickly. With that in mind, I would get to work in Python and then add different tools or theory to my toolkit as they are understood. If you haven't used Python before, I would strongly urge you to use the Codecademy course to familiarize yourself with the language and with how to program. Good luck and have fun.
A note about order: I arranged the contents of the Pipeline & Tools section in the order of the data pipeline, starting with acquisition, then exploratory data analysis, cleaning data, model selection & evaluation, and finally visualization.
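To make that ordering concrete, here is a minimal, standard-library-only sketch of the stages run end to end. The data set, column names and function names are all hypothetical, and the "model" is a plain least-squares line, so treat it as an illustration of the flow and not as a recommended toolchain.

```python
import csv
import io
import statistics

# Hypothetical in-memory data standing in for a real source (file, API, database).
RAW_CSV = """hours_studied,exam_score
1,52
2,58
3,
4,71
5,75
6,83
"""

def acquire(text):
    """Acquisition: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def explore(rows):
    """EDA: count the rows and the missing values per column."""
    missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}
    return {"n_rows": len(rows), "missing": missing}

def clean(rows):
    """Cleaning: drop rows with missing values and convert strings to floats."""
    return [
        {col: float(val) for col, val in r.items()}
        for r in rows
        if all(val != "" for val in r.values())
    ]

def fit(rows):
    """Modeling: least-squares fit of exam_score on hours_studied."""
    xs = [r["hours_studied"] for r in rows]
    ys = [r["exam_score"] for r in rows]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean

def evaluate(rows, model):
    """Evaluation: mean absolute error of the fitted line on the same rows."""
    slope, intercept = model
    return statistics.fmean(
        abs(r["exam_score"] - (slope * r["hours_studied"] + intercept)) for r in rows
    )

def visualize(rows):
    """Visualization: a crude text bar chart, one line per row."""
    return [f"{r['hours_studied']:.0f}h | " + "#" * int(r["exam_score"] // 10) for r in rows]

# Run the stages in pipeline order.
rows = acquire(RAW_CSV)
summary = explore(rows)
cleaned = clean(rows)
model = fit(cleaned)
error = evaluate(cleaned, model)
chart = visualize(cleaned)
print(summary, model, error)
print("\n".join(chart))
```

In real work each stage would typically be replaced by one of the libraries listed in the sections below; the point here is only the order in which the steps feed into each other.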
Start
- Data Science Pipeline - Detailed overview of data pipeline from MachineLearningMastery.com
- Intro to ipython - A curated collection of IPython Notebooks, great as an introduction to Python, programming, computer science, data science and other topics.
- How do I Become a Data Scientist? - Some more great starting points from William Chen.
Data Science Courses:
- Coursera - Data Science Specialization at Coursera - many other courses available as well.
- Udacity - Online MOOCs, including data science related courses.
- Data Science Bootcamps - A collection of all bootcamps currently on the market as of April 5, 2014 by Ikechukwu Okonkwo.
- Coursera Machine Learning Course - Andrew Ng's pinnacle Machine Learning course.
- edX - edX courses related to data science.
Data Pipeline & Tools
Python
Python is my workhorse language, specifically because it has many data science and statistics libraries, works in production environments, and can be used on other problems outside of data science. There are many other languages that could be useful but are not covered here: Julia, R, Cython, Pig, Scala, Java, etc.
- Python @ Codecademy - If you have never used Python, right this way..
- The Python Wiki - Good resource with lots of info about Python.
- Python for Data Science Tutorial - Kaggle - Stepping into Data Science with Kaggle and installing some libraries.
- Introduction to Data Processing with Python - Just as the name says - some introductory level information and exercises.
- Git tutorial - Git for Version Control. Simple tutorial for Git from GitHub.
- Git Tips - 19 git tips for everyday use.
- Anyone Can Code - Languages, tutorials, cheat sheets, algorithms and data structures
Data Structures & CS Topics
- Algorithms & Data Structures - Binary trees, hash tables, linked lists, big(O) notation and more.
- Algorithm & Data Structures - Well-organized, detailed and digestible site full of content covering data structures, algorithms, recursion and assignments!
- Big O Notation - Great details and visuals of big-O notation.
- Visualizations of Data Structures - Collection of visualizations that walk through different algorithms (graph problems) and data structures (queues, heaps, hashes) to build a better intuitive understanding.
- Data Structures CheatSheet & Big Oh Notation
- Data Structures CheatSheet - smaller, more readable
- Coursera: Stanford Algorithms Design & Analysis - Course on algorithm design & analysis
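As a small taste of what the big-O resources above cover, the sketch below counts the work done by a linear scan (O(n)) and a binary search (O(log n)) on the same sorted list. The function names and the test data are made up for illustration.

```python
def linear_search(items, target):
    """Scan left to right; return (index, comparisons). O(n) in the worst case."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons

def binary_search(sorted_items, target):
    """Halve the search range each step; return (index, steps). O(log n)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

data = list(range(1_000_000))   # already sorted
target = 999_999                # worst case for the linear scan

linear_index, linear_comparisons = linear_search(data, target)
binary_index, binary_steps = binary_search(data, target)
print(f"linear: {linear_comparisons} comparisons, binary: {binary_steps} steps")
```

On a million items the linear scan needs a million comparisons for the last element, while the binary search needs at most 20 steps. A hash-based structure such as a Python `set` or `dict` goes further and offers average-case constant-time membership checks.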
Statistics
Some primers on understanding statistics and other resources to get a deeper understanding.
- Statistics Without the Agonizing Pain - John Rauser's really great video on statistics - funny and engaging with a good message.
- Probabilistic Programming and Bayesian Methods for Hackers - full book all online through IPython notebooks.
- Probabilistic Programming and Bayesian Methods for Hackers - Github Repo for the book above.
- Statistics Cheat Sheet in Ipython Notebook
- The only probability Cheatsheet you'll ever need - Self explanatory. Thanks to William Chen (http://datastories.quora.com/) for pointing this great cheat sheet out to me - wish I had had it back in college.
- Khan Academy: Statistics - Tons of videos to help learn statistics concepts.
- Statistical Distributions in iPython Notebook - Discrete, Bernoulli, Poisson, Binomial, Alpha, Beta etc. The descriptions are mathematical - will find another resource to explain.
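Since the distribution descriptions linked above are fairly mathematical, here is a hedged, standard-library-only sketch of three of the discrete ones (Bernoulli, Binomial, Poisson) written as plain probability mass functions. The function names are my own; libraries such as SciPy provide tested versions of all of these.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for a single trial that succeeds with probability p."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) events in an interval when events occur at average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Sanity checks: probabilities sum to 1 and the binomial mean is n * p.
n, p = 10, 0.3
binomial_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
binomial_mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
poisson_total = sum(poisson_pmf(k, 3.0) for k in range(60))
print(binomial_total, binomial_mean, poisson_total)
```

A Binomial with a single trial (`n = 1`) reduces to the Bernoulli case, which is a handy way to check that the formulas agree with each other.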
Stats/Engineering Libraries
A collection of workhorse libraries that are essential for any Python data scientist.
- Pandas - Wes McKinney's pandas library for EDA on small to medium-sized data sets when you don't want to set up the infrastructure for SQL or when it isn't necessary. It has many other great applications beyond just being better than SQL on small to medium data sets.
- Numpy/Pandas/Scipy Cheatsheet - self explanatory
- SciPy - Open-source software for mathematics, science and engineering.
- NumPy -
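To illustrate the point above about using pandas where you might otherwise reach for SQL, here is a minimal sketch of a group-by summary on a toy table. The `orders` data and its column names are hypothetical, and the SQL in the comment is only a rough equivalent.

```python
import pandas as pd

# Hypothetical toy data; in practice this might come from pd.read_csv(...).
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat", "bob", "ann"],
    "amount": [20.0, 35.5, 12.5, 50.0, 4.5, 10.0],
})

# Roughly: SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
#          FROM orders GROUP BY customer ORDER BY total DESC;
summary = (
    orders.groupby("customer")["amount"]
    .agg(n_orders="count", total="sum")
    .sort_values("total", ascending=False)
)
print(summary)
```

No database or schema is needed; the whole table lives in memory, which is why this approach suits small to medium data sets.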
