The Open Source Data Science Masters
Note from the Editor: Take Two
<sup>In the old days of 2013, the OSDSM was born. Then, there were "little to no Data Scientists with 5 years experience, because the job simply did not exist." (David Hardtke, Nov 2012) Since then, history has witnessed many things, including:</sup>
<sup>• Data Scientists working across industries and the world</sup><br /> <sup>• social media manipulation disrupting many elections</sup><br /> <sup>• BLM and #metoo and Extinction Rebellion and many other social movements</sup><br /> <sup>• machine learning beginning to fall under the engineering domain</sup><br /> <sup>• a pandemic</sup><br /> <sup>• climate change disasters becoming very frequent as the climate warms faster than predicted</sup><br /> <sup>• remote work becoming common</sup><br /> <sup>• multiple global recession shocks</sup>
<sup>In that decade, Data Science has seen job growth, shortfalls against its goals, success in many industries, abject failure in others, and nefarious use cases. In particular, the adverse consequences and complications of learning from data appear in far too many examples: elections undermined by psychographics, dismal gender diversity (74% men) and BIPOC diversity in the AI field, a revived eugenics, an explainability crisis, facial recognition used to identify people and systematically detain them, "aggression"-detection microphones in schools, and many others. It has never been clearer that we need to talk about the real-world impacts of our work and consider how our creations are used. As you consider this, read a prescient novel that grapples with the consequences of birthing, of creation, of technology.</sup>
<sup>Like any tool, data-driven technologies are indifferent to the morality of their ends. Perhaps the greatest risk of all is leaving this tool in the hands of the few expensively-educated people who cannot possibly represent all of us. To balance this, open source movements seek to lower the barriers to education for everyone. Data science and data literacy must be widespread, accessible, and leveraged for building our collective future. More than ever, we need that future to be built by members of society who are diverse and focused on generative, sustainable, resilient, emergent solutions. After all, the things we build are mirrors of ourselves (seriously, read Shelley's Frankenstein).</sup>
<sup>Computers reflect the biases and belief systems of the people programming them -- @alicegoldfuss</sup>
<sup>The OSDSM is built on the belief that open source education makes diverse, collective, generative future-building possible. I hope that you are one of the next people -- whether you call yourself a Data Scientist or not -- to help make better decisions with the scientific process, critical thinking, and everything else your unique perspective brings to the table. This rewritten curriculum focuses on what is needed to succeed in an entry-level role, but that is just a generic outline; truly, I hope you take it far beyond that.</sup>
Start here 👇
The Open Source Data Science Masters
The open-source curriculum for learning to be a Data Scientist. Curriculum resources from both universities and working Data Scientists focus on foundational theory and applied skills. The OSDSM is collectively maintained and open to PRs.
The goal of this curriculum is to prepare students for an entry-level Data Scientist role using open source materials, at no cost, but with the same caliber of materials found in the most reputable paid programs. Books not offered for free are often available through a public library; their current list prices are also indicated here. The Masters is self-guided and self-accredited. To better support credibility, the structure now includes a Capstone project intended to demonstrate the student's problem-solving approach, execution skills, and communication. Upon completion, students can award themselves a Credential on LinkedIn from the Open Source Data Science Masters. As with all things, the OSDSM is best played as a team sport (try finding people on r/learndatascience).
This is called a "Masters" because it is primarily concerned with "upper-level" college course material in mathematics, programming, economics, or related disciplines. Come as you are!
- 📖 The Core - This is a critical foundation for what is to come; don't skip the foundational lessons.
- ❄️ Specialty - Choose what is most interesting to you, or most relevant to the work you plan to do.
- 🤝 Doing Data Science - Learn how doing science with others, and for businesses, can work.
- 🧑‍💻 Capstone Project - Choose a meaningful project or dataset to demonstrate what you've learned.
📖 The Core
This is a critical foundation for what is to come; don't skip!
What is Data Science?
One could argue that "Data Science" is a recent term for an already existing information analysis discipline. Humans instinctively search for patterns, and the same drive underlies this more digitized discipline. Read different sources (and search beyond this list) about the uses of data science.
- The Signal and The Noise / Nate Silver Book $18 -- Narrated cases of Data Science at play in the real world.
- Dataclysm: Who We Are (When We Think No One's Looking) / Christian Rudder Book $17 -- From the inside of OKCupid, real examples of how data science can illustrate human behavior.
- Informatics of the Oppressed / Rodrigo Ochigame Logic Magazine -- Algorithms of oppression have been around for a long time. So have radical projects to dismantle them and build emancipatory alternatives.
- A showcase of Jupyter Python Data Analysis Notebooks across disciplines.
Foundations of Data Science
Problem Solving
When there are no answers in the back of the book, how do you proceed? Breaking down problems is a skill, one that can and should be learned. Follow Pólya's process, and for extra credit, seek out resources on computer science decomposition.
- Problem-Solving Heuristics "How To Solve It" George Pólya Berkeley / Summary Book $18
The Scientific Process & Experimentation
It is crucial as a Data Scientist to show integrity and transparency in your scientific process. Even if you've been here before, review the scientific method and draw out its process diagram.
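Part of that transparency is fixing the hypothesis, the metric, and the test before looking at the outcome. As a minimal sketch (the measurements below are invented for illustration, and a permutation test is just one reasonable choice of test), here is a two-sided permutation test for a difference in group means in Python:

```python
import random
from statistics import fmean


def permutation_test(control, treatment, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the analysis is reproducible
    observed = fmean(treatment) - fmean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = fmean(pooled[n_control:]) - fmean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical measurements from a two-group experiment.
control = [12.1, 11.8, 12.4, 12.0, 11.9]
treatment = [12.9, 13.1, 12.7, 13.0, 12.8]
observed, p_value = permutation_test(control, treatment)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

Fixing the random seed means anyone rerunning the analysis gets the same p-value, which supports the reproducibility this section asks for.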
Querying Data
Get comfortable manipulating data in a database with a common relational query language. There are many query languages, but SQL is a widely used foundation.
- SQL School Mode Analytics / Tutorials
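To give a flavor of SQL, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical; the query shows filtering, grouping, aggregation, and sorting, which most SQL tutorials cover early on.

```python
import sqlite3

# An in-memory SQLite database stands in for a real relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ada", 30.0), ("ada", 12.5), ("grace", 99.0), ("linus", 5.0)],
)

# Total spend per customer, keeping only customers above a threshold,
# largest totals first. The threshold is passed as a bound parameter.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > ?
    ORDER BY total DESC
"""
rows = conn.execute(query, (10.0,)).fetchall()
for customer, n_orders, total in rows:
    print(customer, n_orders, total)
conn.close()
```

Using a `?` placeholder instead of pasting values into the SQL string is a good habit: it avoids SQL injection and quoting mistakes.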
Math & Statistics
Calculus
- Single Variable Calculus MIT OpenCourseWare
- Multivariable Calculus MIT OpenCourseWare
Linear Algebra
The foundational mathematics for working with large samples of data. Spend time in exercises until you feel highly confident in the key topics of Linear Algebra. It will serve you well.
- An Intuitive Guide to Linear Algebra Better Explained / Article
- A Programmer's Intuition for Matrix Multiplication Better Explained / Article
- Vector Calculus: Understanding the Cross Product Better Explained / Article
- Vector Calculus: Understanding the Dot Product Better Explained / Article
- Linear Algebra Khan Academy / Videos
- Linear Algebra MIT
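To connect the articles above to code, here is a minimal pure-Python sketch of the dot product, the cross product, and matrix multiplication (matrices stored as lists of rows). In practice you would use a numerical library such as NumPy; these hand-written versions are only meant to make the definitions concrete.

```python
def dot(u, v):
    """Dot product: the sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3-D vectors: a vector perpendicular to both."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def matmul(A, B):
    """Multiply A (m x n) by B (n x p), both given as lists of rows.

    Entry (i, j) of the result is the dot product of row i of A
    with column j of B.
    """
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions must match")
    columns = list(zip(*B))  # transpose B so its columns are easy to iterate
    return [[dot(row, col) for col in columns] for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(dot([1, 2, 3], [4, 5, 6]))    # 32
print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
print(matmul(A, B))                 # [[19, 22], [43, 50]]
```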
Statistics
How can we answer questions with data? Everywhere you look, you'll see methods from statistics. Spend a lot of time here!
- Stats in a Nutshell [Book
$46](https://bookshop.org/a/2958/97814493
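As a small taste of answering a question with data, the sketch below estimates a population mean and an approximate confidence interval using only Python's standard library. It assumes the normal approximation (a z critical value instead of Student's t), which is only reasonable for fairly large samples. The function name and sample values are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation (z rather than Student's t), so it is
    only reasonable when the sample is fairly large.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * standard_error, m + z * standard_error


# Hypothetical measurements.
sample = [2.1, 2.5, 1.9, 2.3, 2.7, 2.0, 2.4, 2.2]
low, high = mean_confidence_interval(sample)
print(f"mean {mean(sample):.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

For small samples like this one, a t-based interval (for example via SciPy) would be wider and more appropriate.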