100DaysOfMLCode
The creation of this repository was inspired by Siraj Raval's challenge to code machine learning for at least an hour every day for 100 days.
I nervously accepted this challenge in addition to working full time and taking 6 hours of graduate coursework in the 2018 summer semester. I will use this repository to store code, Jupyter notebook examples, and thought processes.
Topics Explored:
Day 1 - July 7 | Principal Component Analysis (PCA) and explained variance ratio
Day 2 - July 8 | SparsePCA -> CODE
Day 3 - July 9 | Bag of Words
Day 4 - July 10 | Tokenization & Vectorization time trials -> CODE
Day 5 - July 11 | Stemming and Lemmatizing with CountVectorizer, TfidfVectorizer, and HashingVectorizer -> CODE
Day 6 - July 12 | Development of visualization pipeline for ML -> CODE
Day 7 - July 13 | Big Data Visualization with Datashader
Day 8 - July 14 | t-SNE and Datashader Failure -> CODE
Day 9 - July 15 | Gene Expression - Getting Started -> FOLDER
Day 10 - July 16 | Gene Expression - Reading in Data
Day 11 - July 17 | Gene Expression - Preprocessing & Boxplot
Day 12 - July 18 | Intro to Data Splitting -> CODE
Day 13 - July 19 | Text Relationships with spaCy -> CODE
Day 14 - July 20 | Gene Expression - Cytoscape and Orange3
Day 15 - July 21 | Trial-and-error Data Splitting Research
Day 16 - July 22 | Trial-and-error Data Splitting Implementation -> CODE
Day 17 - July 23 | NMF -> CODE
Day 18 - July 24 | RFE -> CODE
Day 19 - July 25 | Exploring Variable Replacement
Day 20 - July 26 | Pipelines - Introduction
Day 21 - July 27 | A list of 10,000 dictionaries -> CODE
Day 22 - July 28 | Linear Regression - Simple in R -> FOLDER
Day 23 - July 29 | Data Visualization, Dimensionality Reduction, Feature Selection, and a handful of models -> CODE
Day 24 - July 30 | Linear Regression - Continue to draft description -> FOLDER
Day 25 - July 31 | Linear Regression - Simple in Python -> CODE
Day 26 - Aug 1 | Pipeline - Start of Pipeline Example -> CODE
Day 27 - Aug 2 | Pipeline - Ridge Regression for Pipeline Example -> CODE
Day 28 - Aug 3 | Pipeline - Flexibility for selecting columns with missing values -> CODE
Day 29 - Aug 4 | Pipeline - Pipeline to compare methods of handling missing values -> CODE
Day 30 - Aug 5 | Pipeline - Identify categorical columns and convert to dummy -> CODE
Day 31 - Aug 6 | Pipeline - Custom Imputer using sklearn linear_model -> CODE
Day 32 - Aug 7 | kNN - add to Pipeline & normalizing -> CODE
Day 33 - Aug 8 | Pipeline - Researching topics to come
Day 34 - Aug 9 | What's great about bias?
Day 35 - Aug 10 | Bias-Variance decomposition - rounding error & elimination
Day 36 - Aug 11 | Bias-Variance decomposition from scratch in Python
Day 37 - Aug 12 | Continued work on Bias-Variance decomposition
Day 38 - Aug 13 | Bias-Variance decomposition working example
Day 39 - Aug 14 | Scatterplots for Collinearity
Day 40 - Aug 15 | ML Work for Client - not shared publicly
Day 41 - Aug 16 | Correlation Matrix for Collinearity
Day 42 - Aug 17 | Ontology from web scraping
Day 43 - Aug 18 | Eigenvalues for Multicollinearity
Day 44 - Aug 19 | Eigenvalues & Eigenvectors for Multicollinearity
Day 45 - Aug 20 | Word frequencies from PDFs
Day 46 - Aug 21 | NLP with Regression - Exploring the literature
Day 47 - Aug 22 | Text mining for Google Chips
Day 48 - Aug 23 | Methods of Web scraping
Day 49 - Aug 24 | Selenium for web scraping
Day 50 - Aug 25 | Reformatting results of web scraping
Day 51 - Aug 26 | NLP methods from web scraped results
Day 52 - Aug 27 | Applied Algorithms - different methods of sorting
Day 53 - Aug 28 | Methods of NLP for Social Media Data
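As a rough illustration of the bias-variance decomposition topic from Days 35-38, here is a minimal from-scratch sketch using only the Python standard library. It is not the code from this repository. The true function, the noise level, the two models, and every function name below are assumptions chosen for illustration. The error is measured against the noise-free true function, so the irreducible noise term is left out and the squared error splits into bias squared plus variance.

```python
import random
import statistics


def true_function(x):
    # Hypothetical ground-truth signal, chosen only for illustration.
    return 2.0 * x + 1.0


def fit_constant(xs, ys):
    # Deliberately simple model: always predict the training mean of y.
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    # Ordinary least squares for a single feature.
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def decompose(fit, x0, n_trials=500, n_points=20, noise_sd=1.0, seed=0):
    # Refit the model on many freshly simulated training sets and
    # collect its prediction at the single query point x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        xs = [rng.uniform(0.0, 1.0) for _ in range(n_points)]
        ys = [true_function(x) + rng.gauss(0.0, noise_sd) for x in xs]
        preds.append(fit(xs, ys)(x0))
    target = true_function(x0)
    bias_sq = (statistics.fmean(preds) - target) ** 2
    variance = statistics.pvariance(preds)  # population variance of predictions
    mse = statistics.fmean((p - target) ** 2 for p in preds)
    return bias_sq, variance, mse


if __name__ == "__main__":
    for name, fit in [("constant", fit_constant), ("line", fit_line)]:
        bias_sq, variance, mse = decompose(fit, x0=0.9)
        print(f"{name}: bias^2={bias_sq:.4f} variance={variance:.4f} mse={mse:.4f}")
```

Because the variance is computed around the empirical mean prediction, `mse` equals `bias_sq + variance` up to floating-point rounding. The constant model should show the larger squared bias at `x0=0.9`, since it cannot follow the slope of the assumed true function.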
List of Topics to Explore:
- PCA on Genetic Data - Gene Expression
- Create Jupyter Notebook foundation
- Find Good Data
- Explain how to differentiate good data from bad data
- GPU
- Efficient Use of Data Structures
- Write computationally expensive parts in C++
- Make good use of memory & caching
- Multithreading / multiprocessing in Python, Celery for parallel processing
- Kernel PCA
- Differences (pro/cons) betw
