Chapters:
Synkoc AI/ML Internship · Week 3 · Lesson 7 of 13
Intro to
Machine Learning
What ML actually is. How models learn. The complete sklearn workflow. Overfitting vs underfitting, train-test split, and the bias-variance tradeoff — the most important concepts in all of ML.
🧠 What is ML
🎏 How Models Learn
⚖️ sklearn Workflow
📈 Bias-Variance
🧑‍💻
Synkoc Instructor
AI/ML Professional · Bangalore
⏳ ~55 minutes
What is Machine Learning?
Traditional programming: you write the rules. Machine learning: the algorithm finds the rules from data. Same goal, opposite approach. ML is not magic — it is optimisation.
💻
Traditional Programming
You write explicit rules. Data + Rules = Output. Works when you can specify every rule. Fails when rules are too complex to write.
if salary > 50000 and age < 30:
    approved = True
else:
    approved = False
Brittle: every new case needs a new rule
🧠
Machine Learning
You provide labelled examples. The algorithm finds the rules automatically. Data + Labels = Model that knows the rules.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
# Model found the rules itself
model.predict(X_new)
Adapts: learns from new data automatically
🎓
How it learns
The algorithm makes predictions, compares to true labels, measures the error, and adjusts parameters to reduce error. Repeat thousands of times.
prediction = model.predict(X)
error = y_true - prediction
# Adjust model parameters
# Repeat until error is minimal
This is gradient descent at a high level
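The predict-measure-adjust loop can be made concrete. Below is a minimal, hand-rolled sketch of gradient descent fitting y = w·x on made-up toy data (this is illustrative, not sklearn's internal implementation):

```python
# Toy data whose true rule is y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0          # start with a guess for the parameter
lr = 0.01        # learning rate: how big each adjustment is

for _ in range(1000):
    # 1. Predict with the current parameter
    preds = [w * x for x in xs]
    # 2. Measure the error: gradient of mean squared error w.r.t. w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # 3. Adjust the parameter to reduce the error
    w -= lr * grad

print(round(w, 3))  # converges close to the true value 2.0
```

Real models repeat exactly this loop, just with millions of parameters instead of one.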
📊
Why now?
Three things converged: massive datasets (Internet), powerful hardware (GPUs), and better algorithms. All three at once made modern ML possible.
2023: GPT-4 was reportedly trained on
~13 trillion tokens of text
~25,000 A100 GPUs
~$100 million compute budget
Scale makes the difference
Chapter 1 of 4
01
Types of ML
Three families of ML. Supervised, Unsupervised, and Reinforcement Learning. Each solves a different type of problem. You will implement all three this week.
Three Types of Machine Learning
Almost every ML algorithm belongs to one of three main families (plus hybrids like semi-supervised learning). The type determines what data you need and what kind of problem you can solve.
📋
Supervised Learning
Training data has labels. The model learns the mapping from features to labels. Most common type in industry.
X = [[hours, age], ...]
y = [1, 0, 1, 1, 0, ...] # labels
model.fit(X, y) # learns mapping
⚡ Week 3 Lesson 2: Regression + Classification
📊
Unsupervised Learning
Training data has NO labels. The model finds hidden structure — clusters, patterns, compressions — on its own.
X = [[hours, age], ...]
# No y! No labels provided
kmeans.fit(X) # finds groups
kmeans.labels_ # discovered clusters
⚡ Week 3 Lesson 3: KMeans Clustering
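A quick runnable sketch of the unsupervised case: KMeans receives points with no labels and discovers the groups itself (the data here is made up, two obvious blobs):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two clear blobs, but we never tell the algorithm which point belongs where
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)            # no y anywhere

print(kmeans.labels_)    # two groups discovered (the 0/1 ids may be swapped)
```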
🎮
Reinforcement Learning
Agent takes actions in an environment, receives rewards or penalties, and learns a policy to maximise long-term reward.
# No dataset needed
# Agent plays the game
# Reward: +1 win, -1 lose
# Learns optimal strategy
⚡ Used in: robotics, game AI, AlphaGo
🔍
Semi-supervised
Small labelled dataset + large unlabelled dataset. Learn from both. Very practical for real projects where labelling is expensive.
# Few expensive labels
# Many cheap unlabelled examples
# Use both for training
# Common in NLP and vision
⚡ Used when labelling is expensive
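sklearn ships a simple semi-supervised wrapper, `SelfTrainingClassifier`: unlabelled rows are marked with `-1` and the model learns from both kinds. A minimal sketch on synthetic data (the 50-label cutoff is an arbitrary illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=200, random_state=42)
y_partial = y.copy()
y_partial[50:] = -1              # pretend only 50 labels were affordable

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)          # trains on labelled + unlabelled rows

print(model.score(X, y))         # scored against the full ground truth
```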
Chapter 2 of 4
02
The sklearn Workflow
Five steps. Every ML model in scikit-learn follows the exact same pattern. Learn it once, apply it to any algorithm. This is the most valuable pattern in all of applied ML.
The 5-Step sklearn Workflow
Every sklearn model — LinearRegression, RandomForest, KMeans, SVM — follows the identical 5-step pattern. Master this pattern and you know how to use any sklearn algorithm.
📚
1. Import
from sklearn.X import Model
📈
2. Split
train_test_split(X, y, test_size=0.2)
🎓
3. Train
model.fit(X_train, y_train)
🔮
4. Predict
model.predict(X_test)
📉
5. Evaluate
accuracy_score(y_test, y_pred)
🤖
The Universal sklearn Pattern
This exact 5-step pattern works for LinearRegression, LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, KNeighborsClassifier, SVC, and every other sklearn estimator. Change the import on Step 1 and everything else stays identical. This is why sklearn is the most widely used ML library in the world.
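The five steps run end-to-end in a few lines. A minimal sketch on sklearn's built-in iris dataset (the choice of DecisionTreeClassifier is arbitrary; swap the import in Step 1 and nothing else changes):

```python
from sklearn.tree import DecisionTreeClassifier          # 1. Import
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(     # 2. Split
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)                              # 3. Train

y_pred = model.predict(X_test)                           # 4. Predict

print(accuracy_score(y_test, y_pred))                    # 5. Evaluate
```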
Chapter 3 of 4
03
Train-Test Split
Why we split data. The critical difference between training performance and real-world performance. Get this wrong and your ML model is worthless.
Train-Test Split
Never evaluate your model on the data it trained on. That is like letting students grade their own exam papers. The model must prove it generalises to data it has never seen.
⚖️
80% train, 20% test — always separate
The training set is what the model learns from. The test set is held back completely and only used for final evaluation. A model that scores 99% on training data and 55% on test data has memorised the training data but learned nothing general. This is called overfitting.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 20% for testing
    random_state=42   # reproducibility
)

print(X_train.shape) # 80% of data
print(X_test.shape)   # 20% of data
Rule: Train on X_train, y_train. Evaluate ONLY on X_test, y_test. Never look at test data before training is complete. Your test score is your real-world performance estimate.
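You can see why the split matters with a toy demo: an unconstrained decision tree scores perfectly on its own training data but noticeably lower on held-out data (synthetic noisy data; the `flip_y` noise level is an arbitrary illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly corrupts 20% of labels, so perfect rules don't exist
X, y = make_classification(n_samples=300, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print(model.score(X_train, y_train))  # 1.0: memorised the training set
print(model.score(X_test, y_test))    # noticeably lower on unseen data
```

Evaluating on the training set alone would have reported a perfect model.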
Chapter 4 of 4
04
Bias-Variance Tradeoff
The single most important concept in all of machine learning. Understanding this tells you why your model fails and exactly how to fix it.
Overfitting vs Underfitting
Every ML failure is one of two things: the model is too simple (underfitting) or too complex (overfitting: it memorised the training data). The goal is the middle ground — just right.
📉
Underfitting (High Bias)
Model is too simple. Cannot capture the true pattern in data. Train accuracy AND test accuracy are both low. The model has not learned enough.
Train accuracy: 62%
Test accuracy: 60%
# Both low = underfitting
# Fix: more complex model
# Fix: more features
⚠ Increase model complexity
😀
Just Right (Sweet Spot)
Model generalises well. High training accuracy AND high test accuracy. Close gap between train and test performance. This is the goal.
Train accuracy: 92%
Test accuracy: 89%
# Small gap = good generalisation
# Keep this model!
✓ Ship this model
📈
Overfitting (High Variance)
Model memorised training data. High train accuracy but low test accuracy. Large gap between the two. Model fails on new data.
Train accuracy: 99%
Test accuracy: 58%
# Huge gap = overfitting
# Fix: regularisation
# Fix: more training data
⚠ Reduce complexity or add data
🔧
How to fix each
Underfitting: use a more complex model, add polynomial features, train longer. Overfitting: regularisation (L1/L2), dropout, a simpler model, more data. Cross-validation does not fix either — it diagnoses them reliably.
from sklearn.model_selection import cross_val_score

# model = any sklearn estimator
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
⚡ cross_val_score gives reliable estimate
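Putting the diagnosis into practice: a hedged sketch comparing a too-simple, a moderate, and an unconstrained decision tree on noisy synthetic data (the depths and noise level are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 injects label noise that an unconstrained tree will memorise
X, y = make_classification(n_samples=500, flip_y=0.2, random_state=42)

means = []
for depth in (1, 4, None):        # too simple, moderate, unconstrained
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    scores = cross_val_score(model, X, y, cv=5)
    means.append(scores.mean())
    print(depth, round(scores.mean(), 3))
```

Typically the moderate depth scores best in cross-validation, even though the unconstrained tree fits the training data perfectly.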
Lesson Summary
You have completed the ML foundations. Here is what you now understand deeply:
🧠
What ML Is
Data + Labels = Model that finds rules automatically. Three types: supervised (labels), unsupervised (no labels), reinforcement (rewards).
⚖️
sklearn Workflow
Import → Split → fit() → predict() → score(). Same 5 steps for every algorithm in sklearn. Change the import, keep everything else.
⚖️
Train-Test Split
Always split. 80% train, 20% test. random_state=42 for reproducibility. Never evaluate on training data. Test score = real-world estimate.
📈
Bias-Variance Tradeoff
Underfitting: both scores low, model too simple. Overfitting: huge train-test gap, model memorised data. Goal: small gap, both scores high.
🧠
ML Foundations Complete!
You understand what ML is and the universal sklearn workflow. Complete the Intro Lab. Next: Supervised Learning — training real regression and classification models.
✓ Video — Done
✏ Practical Lab — Next
❓ Quiz — After Lab
Synkoc IT Services · Bangalore · support@synkoc.com · +91-9019532023