CrackingMachineLearningInterview

A repository to help you prepare for machine learning interviews, covering most of the questions commonly asked by tech giants and local companies. Use it to ace your Machine Learning Engineer interviews.




A practical interview preparation repository for Machine Learning Engineer, AI Engineer, Data Scientist, Deep Learning Engineer, Data Engineer, and DevOps or platform-focused roles.

This README now serves three purposes:

It keeps the original core ML interview questions.
It adds a more organized 2026 interview-prep layer focused on modern ML engineering topics such as LLMs, RAG, evaluation, agents, safety, and production systems.
It acts as the main entry point for related tracks including AI/GenAI, data engineering, and DevOps.

Who this repository is for

Machine Learning Engineer
Data Scientist
Deep Learning Engineer
AI Engineer
Software Engineer working on AI/ML products
Data Engineer
MLOps Engineer
DevOps / Platform Engineer

How to use this repository

Start with the 2026 Interview Roadmap if you are preparing for current AI/ML interviews.
Use 2026 Additional Questions and Answers for modern interview rounds.
Use the AI / GenAI, Data Engineering, and DevOps sections for specialized interview tracks.
Use the Classic Question Bank for core ML, statistics, deep learning, and algorithms.
Use Preparation Resources and References to build a targeted study plan.

About

GitHub Profile: Shafaypro ©
Repository: CrackingMachineLearningInterview

Image References

Image references are included for educational purposes. Please see the repository references for attribution where applicable.

Sharing

Feel free to share the repository link in your blog, study notes, or interview preparation material.

Repository Structure

docs/2026-interview-roadmap.md: current interview focus areas for ML Engineer and AI Engineer roles.
docs/2026-additional-questions.md: modern 2026 question bank covering LLMs, RAG, evaluation, agents, and production AI.
docs/resources-and-references.md: books, references, and additional interview topics.
docs/study-pattern.md: recommended preparation topics and study structure.
ai_genai/: GenAI and LLM engineering topics.
data_engineering/: data engineering interview topics and platform concepts.
devops/: DevOps, infrastructure, and deployment topics.
README.md: repository landing page plus the original classic ML interview question bank.

AI / GenAI Track

Use this track for AI Engineer, GenAI Engineer, LLM Engineer, Applied AI, and agent-platform interviews.

Core topics:

Data Engineering Track

Use this track for pipeline, ETL, orchestration, warehouse, lakehouse, and streaming interviews.

Core topics:

DevOps Track

Use this track for infrastructure, CI/CD, containers, orchestration, and IaC interviews.

Core topics:

Classic Question Bank

Difference between Supervised and Unsupervised Learning?

    In supervised learning you know the outcome and are given fully labeled outcome data, while in unsupervised learning you are given no 
    labeled outcome data. Fully labeled means that each example in the training dataset is tagged with the answer the algorithm should 
    come up with on its own. So, a labeled dataset of flower images would tell the model which photos were of roses, daisies and daffodils. When shown 
    a new image, the model uses the patterns it learned from the training examples to predict the correct label. An unsupervised model instead has 
    to find structure in the data by itself, for example by grouping similar images into clusters.
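To make the contrast concrete, here is a minimal sketch in plain Python. The 1-D "petal length" values, the two flower labels, and the helper names `fit_centroids`, `predict`, and `two_means` are all assumptions chosen for illustration. The supervised part uses the labels to build one centroid per class; the unsupervised part sees the same numbers without labels and can only group them.

```python
# Toy 1-D "petal length" data; values and names are illustrative assumptions.
labeled = [(1.4, "rose"), (1.5, "rose"), (1.3, "rose"),
           (4.7, "daisy"), (4.5, "daisy"), (4.9, "daisy")]


def fit_centroids(examples):
    """Supervised: use the labels to compute one mean (centroid) per class."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled points into two clusters (1-D k-means, k=2)."""
    c0, c1 = min(points), max(points)  # simple deterministic initialisation
    for _ in range(iterations):
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted(g0), sorted(g1)


centroids = fit_centroids(labeled)
prediction = predict(centroids, 4.6)      # a new, unseen measurement

unlabeled = [x for x, _ in labeled]       # same measurements, labels discarded
clusters = two_means(unlabeled)
print(prediction, clusters)
```

Note that the clustering recovers two groups but cannot name them; attaching "rose" or "daisy" to a group still requires labels.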

What is Reinforcement Learning and how would you define it?

    Reinforcement learning is a paradigm in which an agent learns by interacting with an environment. It differs from supervised learning in not 
    needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead the focus is on 
    finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). At each learning step the agent receives 
    a reward or a penalty for the action it took, and it adjusts its behaviour to maximize the cumulative reward. Reinforcement learning is not the 
    same as semi-supervised learning, which trains on a small labeled dataset combined with a larger unlabeled one.
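The exploration/exploitation balance can be sketched with an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. This is a toy illustration, not a full RL algorithm: the function name `run_bandit`, the three reward means, and the value `epsilon=0.1` are assumptions.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit with Gaussian rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)   # current knowledge of each arm's value
    pulls = [0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm, even one that currently looks bad.
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pick the arm with the best estimated reward so far.
            arm = max(range(len(true_means)), key=estimates.__getitem__)
        reward = rng.gauss(true_means[arm], 1.0)   # feedback from the environment
        pulls[arm] += 1
        # Incremental running mean of the rewards observed for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls


# Hypothetical environment: arm 2 pays the most on average.
estimates, pulls = run_bandit([0.2, 0.5, 1.5])
best_arm = max(range(len(pulls)), key=pulls.__getitem__)
print(best_arm, pulls)
```

The agent is never told which arm is correct; it only sees rewards, and the occasional random pull is what lets it discover that another arm is better than its current favourite.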

What is Deep Learning?

    Deep learning is a family of algorithms built on artificial neural networks (ANNs), which are loosely inspired by the structure and function 
    of the brain. Deep networks stack several layers with nonlinear activation functions, so they can model nonlinear relationships and are 
    recommended for nonlinear problems in Artificial Intelligence.
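A small sketch of why the nonlinearity matters: XOR cannot be represented by any single linear function of its inputs, but a network with one hidden layer and a ReLU activation can compute it. The weights below are hand-picked for illustration rather than learned, and the function names are assumptions.

```python
def relu(z):
    """Rectified linear unit: the nonlinearity applied in the hidden layer."""
    return max(0.0, z)


def tiny_network(x1, x2):
    """Two-layer network with hand-picked (not learned) weights that computes XOR."""
    h1 = relu(x1 + x2)          # hidden unit 1
    h2 = relu(x1 + x2 - 1.0)    # hidden unit 2, active only when both inputs are 1
    return h1 - 2.0 * h2        # linear output layer


outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

If `relu` were removed, the whole network would collapse into a single linear function of `x1` and `x2`, which cannot produce the XOR pattern.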

Difference between Machine Learning and Deep Learning?

    Deep learning (DL) is a subset of machine learning (ML), and both are subsets of AI. Classical machine learning models do become progressively 
    better at their task as they see more data, but they usually still need some guidance: an engineer typically designs the input features and 
    steps in to make adjustments when the model returns inaccurate predictions. A deep learning model learns its own feature representations from 
    raw data through the layers of its neural network, which reduces manual feature engineering but generally requires more data and compute.

Difference between Semi-Supervised and Reinforcement Learning?

Difference between Bias and Variance?

    Bias is the error introduced by the oversimplifying assumptions a model makes; a high-bias model underfits.
    Variance is the model's sensitivity to the particular training set, including its noise; a high-variance model overfits.
    There is always a tradeoff between the two, so it is recommended to find a balance between them and to use cross-validation to 
    determine the best fit.
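The tradeoff and the role of cross-validation can be sketched with k-nearest-neighbour regression, where k acts as the complexity knob. Everything here is an assumption made for illustration: the synthetic data (a sine curve plus Gaussian noise), the helper names, and the three values of k.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_val_mse(data, k, folds=5):
    """Mean squared error of k-NN regression under k-fold cross-validation."""
    squared_errors = []
    for fold in range(folds):
        train = [p for i, p in enumerate(data) if i % folds != fold]
        test = [p for i, p in enumerate(data) if i % folds == fold]
        squared_errors += [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return sum(squared_errors) / len(squared_errors)


rng = random.Random(0)
# Synthetic data (an assumption for illustration): y = sin(x) + Gaussian noise.
data = [(x, math.sin(x) + rng.gauss(0.0, 0.3))
        for x in (2 * math.pi * i / 300 for i in range(300))]

# k=1 chases noise (high variance); k=240 averages every training point and
# predicts roughly the global mean (high bias); k=10 sits in between.
cv_error = {k: cross_val_mse(data, k) for k in (1, 10, 240)}
best_k = min(cv_error, key=cv_error.get)
print(cv_error, best_k)
```

Cross-validation matters here because the k=1 model has zero error on its own training points; only held-out folds reveal that it has memorized noise.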

What is Linear Regression? How does it work?

    Linear regression fits a line to the respective dataset, when drawn on a plane, in a way that captures the relationship between your dependent
    variable and your independent variable. It uses the simple slope-intercept formula, famously written as f(x) = m*x + b.
    Where m represents the slope (weight)
    b represents the bias (intercept)
    x represents the input variable (independent one)
    f(x) represents y, which is the dependent variable (outcome).

    Linear regression works as follows: given a data set of n statistical units, a linear regression model assumes that the relationship between the 
    dependent variable y and the p-vector of regressors x is linear. This relationship is modeled through a disturbance term or error variable ε, an 
    unobserved random variable that adds "noise" to the linear relationship between the dependent variable and regressors. Thus the model takes the 
    form Y = B0 + B1X1 + B2X2 + ..... + BpXp + ε
    This also implies : Y(i) = X(i)^T B + ε(i)
    Where T : denotes transpose
    X(i) : denotes the input of the i'th record in the form of a vector (with a leading 1 for the intercept B0)
    B : denotes the coefficient vector (B0 is the bias, or intercept, term)
    ε(i) : denotes the error term of the i'th record.
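For a single input variable, the least-squares slope and intercept have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal sketch follows; the function name `fit_line` and the toy data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return slope, intercept


# Noise-free toy data generated from y = 3x + 2 (values chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 2.0 for x in xs]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept      # predict y for a new input x = 5
print(slope, intercept, prediction)
```

With several input variables the same idea is usually solved with matrix methods or gradient descent, typically through a numerical library rather than by hand.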

Use cases of regression:

    Poisson regression for count data.
    Logistic regression and probit regression for binary data.
    Multinomial logistic regression and multinomial probit regression for categorical data.
    Ordered logit and ordered probit regression for ordinal data.

What is Logistic Regression? How does it work?

    Logistic regression is a statistical technique used to predict the probability of a binary response based on one or more independent variables. 
    It means that, given certain factors, logistic regression is used to predict an outcome which has two values such as 0 or 1, pass or fail,
    yes or no, etc.
    Logistic regression is used when the dependent variable (target) is categorical.
    For example,
        To predict whether an email is spam (1) or not (0)
        Whether the tumor is malignant (1) or not (0)
        Whether the transaction is fraud (1) or not (0)
    The prediction is based on the probabilities of the specified classes.
    It computes the same linear combination of inputs as linear regression, but passes it through the sigmoid (inverse logit) function to squash 
    the values between 0 and 1 and get the probabilities.
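The description above (a linear score squashed into a probability) can be sketched as a one-feature logistic regression trained with plain gradient descent on the log loss. The "hours studied" data, the learning rate, and the number of steps are assumptions chosen for illustration, not tuned values.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=10000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for each example: predicted probability minus label.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)     # estimated pass probability after 1 hour
p_high = sigmoid(w * 4.5 + b)    # estimated pass probability after 4.5 hours
print(round(p_low, 3), round(p_high, 3))
```

To turn a probability into a class, a threshold (commonly 0.5) is applied; the threshold is a design choice that can be moved when false positives and false negatives have different costs.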

What is the Logit Function or Sigmoid Function? Where in ML and DL can you use it?

    The sigmoid might be useful if you want to transform a real valued variable into something that represents a probability. While the Logit fun
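As a small sketch of the two functions named in the question: the sigmoid maps a real number to a value between 0 and 1, and the logit (log-odds) maps such a value back to a real number, so each undoes the other. The branch on the sign of `z` is an implementation choice made here to avoid floating-point overflow; it is not something the text above prescribes.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps a real number to a probability-like value."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)          # avoids overflow for large negative z
    return exp_z / (1.0 + exp_z)


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


probability = sigmoid(2.0)       # real-valued score -> probability
round_trip = logit(probability)  # probability -> back to the original score
print(probability, round_trip)
```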
