Synkoc AI/ML Internship · Week 3 · Lesson 8 of 13
Supervised Learning
Train your first real ML models. Linear Regression predicts numbers. Logistic Regression classifies categories. Both are the foundation of production ML systems worldwide.
📈 Linear Regression
✅ Logistic Regression
📋 Decision Trees
🌳 Random Forest
🧑‍💻
Synkoc Instructor
AI/ML Professional · Bangalore
⏳ ~60 minutes
🔴 Core ML Models
Four Supervised Learning Models
Four algorithms that power the majority of production ML systems. Each solves different problems with different strengths. You will train all four today.
📈
Linear Regression
Fit a straight line through data points. Predict any continuous number. The simplest and most interpretable regression model.
predict house price, score, revenue
Logistic Regression
Fits an S-curve (sigmoid). Outputs probability between 0 and 1. The go-to baseline for classification.
classify spam, pass/fail, disease
📋
Decision Tree
Learns if-then rules from data. Splits features at optimal thresholds. Highly interpretable — can be printed as a flowchart.
credit risk, medical diagnosis
🌳
Random Forest
100+ decision trees, each trained on random subsets. Averages their predictions. Robust, accurate, and highly resistant to overfitting.
Kaggle baseline, feature importance
Chapter 1 of 4
01
Linear Regression
Find the best-fit line through data points. Minimise Mean Squared Error. Interpret coefficients. The foundation of all regression analysis.
Linear Regression Explained
Linear regression fits the equation y = mx + b to your data. It finds the slope m and intercept b that minimise the average squared error between predictions and actual values.
📈
y = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ
Each feature gets a weight (coefficient). The model learns these weights during training to minimise MSE. After training, model.coef_ holds the learned weights and model.intercept_ holds the bias term w₀. A positive weight means the feature pushes the prediction up; a negative weight means increases in that feature predict a lower target.
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_)      # learned weights
print(model.intercept_) # bias term
y_pred = model.predict(X_test)
Interpret coefficients: If study_hours has weight 8.5, each extra study hour predicts 8.5 more points on the exam. This interpretability is why Linear Regression is used in economics, healthcare, and social science research.
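A minimal sketch of this interpretation, using synthetic data generated so the true slope is 8.5 (the study_hours numbers here are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data built so that exam_score ≈ 40 + 8.5 * study_hours
rng = np.random.default_rng(42)
study_hours = rng.uniform(0, 10, size=(200, 1))
exam_score = 40 + 8.5 * study_hours[:, 0] + rng.normal(0, 2, size=200)

model = LinearRegression()
model.fit(study_hours, exam_score)

print(model.coef_[0])    # recovered slope, close to the true 8.5 points per hour
print(model.intercept_)  # recovered baseline, close to 40

preds = model.predict(study_hours)
print(mean_squared_error(exam_score, preds))  # MSE: average squared error
print(r2_score(exam_score, preds))            # R²: fraction of variance explained
```

With 200 samples and modest noise the learned coefficient lands close to the true 8.5, which is exactly how you read a real coefficient: points gained per extra study hour.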
Chapter 2 of 4
02
Logistic Regression
The go-to classification baseline. Applies the sigmoid function to output a probability. Threshold it to get a binary prediction. Used in spam filters, credit scoring, and medical diagnosis worldwide.
Logistic Regression Explained
Logistic Regression extends linear regression with a sigmoid function that squashes any value into [0, 1]. The output is a probability. You threshold it to make a binary decision.
P(y=1) = sigmoid(w·x + b)
Sigmoid maps any real number to a probability between 0 and 1. Values above 0.5 are classified as class 1. Values below 0.5 are class 0. The threshold of 0.5 can be adjusted. Lowering the threshold increases recall (catches more positives). Raising it increases precision (fewer false positives). This trade-off is critical in medical and fraud detection applications.
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)  # class probabilities
labels = model.predict(X_test)       # 0 or 1
acc = model.score(X_test, y_test)    # accuracy
predict_proba() returns the raw probability for each class. If column 1 is 0.87, the model is 87% confident the sample belongs to class 1. Use this for ranking, risk scoring, and adjustable thresholds.
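A minimal sketch of adjusting that threshold yourself (synthetic data via make_classification; the 0.3 cutoff is just an illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]  # P(class 1) for each sample

# The default 0.5 cutoff reproduces model.predict()
default_labels = (probs >= 0.5).astype(int)

# A lower cutoff flags more samples as positive: recall up, precision down
low_threshold_labels = (probs >= 0.3).astype(int)
```

Because every sample above 0.5 is also above 0.3, the lower cutoff can only add positives, never remove them.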
Chapter 3 of 4
03
Decision Trees & Random Forest
Non-linear models that capture complex patterns. Decision Trees learn if-then rules. Random Forest combines 100 trees for robust, accurate predictions. The Kaggle competition workhorse.
Decision Tree & Random Forest
Decision Trees make predictions by learning a flowchart of if-then decisions. Random Forest builds hundreds of trees and votes, dramatically reducing overfitting.
📋
Decision Tree
Splits data at feature thresholds that maximally separate the classes. Highly interpretable. Prone to overfitting without a max_depth constraint.
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=4,       # control complexity
    random_state=42
)
model.fit(X_train, y_train)
⚡ max_depth prevents overfitting
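To actually see those if-then rules, the trained tree can be printed as text. A sketch using sklearn's bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)

# export_text prints the learned flowchart of threshold rules
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

plot_tree() from the same module draws a graphical version of the same flowchart.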
🌳
Random Forest
100+ decision trees, each trained on a random sample of data and features. Final prediction is the majority vote. Far more resistant to overfitting than a single tree.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100, # number of trees
    random_state=42
)
model.fit(X_train, y_train)
⚡ Standard Kaggle baseline model
📉
Feature Importance
Random Forest ranks features by their contribution to predictions. One of its most useful outputs for understanding which features matter.
importances = model.feature_importances_
# [0.42, 0.31, 0.18, 0.09]
# Feature 0 contributes 42%
# Feature 3 contributes only 9%
# Drop low-importance features
⚡ Use to remove irrelevant features
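In practice the importances read best when paired with feature names and sorted. A sketch, assuming sklearn's bundled breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# Pair each importance with its feature name, largest contribution first
ranked = sorted(zip(model.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```

The importances always sum to 1, so each value reads directly as a share of the model's total predictive signal.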
📈
Model Comparison
Always train multiple models and compare. The best model depends on your data, interpretability needs, and performance requirements.
models = {
  'LogReg': LogisticRegression(),
  'Tree': DecisionTreeClassifier(),
  'Forest': RandomForestClassifier()
}
for name, m in models.items():
  m.fit(X_tr, y_tr)
  print(name, m.score(X_te, y_te))
⚡ Always compare at least 3 models
Chapter 4 of 4
04
Training All Four Models
The complete supervised learning workflow. Train and compare four models on the same dataset. The sklearn API is identical for all four — only the class name changes.
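As a sketch of that workflow (assuming sklearn's bundled breast-cancer dataset; LinearRegression is included only to show the shared fit/score API, and note its score() reports R² rather than accuracy):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same API for every model: construct, fit, score
models = {
    "Linear Regression": LinearRegression(),  # score() is R² on the 0/1 target
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```

Swapping a model in or out is a one-line change to the dictionary, which is exactly why comparing several baselines is cheap in sklearn.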
Lesson Summary
You have trained real ML models. Here is what you can now do:
📈
Linear Regression
Predict continuous numbers. model.coef_ for feature weights. Interpret coefficients. Use MSE and R² to evaluate. Great for price, score, revenue prediction.
Logistic Regression
Classify binary outcomes. predict_proba() for probabilities. Adjust threshold for precision/recall tradeoff. Baseline for any classification problem.
📋
Decision Tree
Interpretable if-then rules. Control complexity with max_depth. Visualise with plot_tree(). Prone to overfitting without depth limit.
🌳
Random Forest
Ensemble of 100+ trees. Resistant to overfitting. feature_importances_ reveals which features matter. Standard Kaggle competition baseline.
🌳
Supervised Learning Complete!
Four real ML models trained. Complete the Lab with 5 hands-on tasks. Then move to Unsupervised Learning: KMeans Clustering — grouping data without labels.
✓ Video — Done
✏ Practical Lab — Next
▶ Unsupervised Learning — After Quiz
Synkoc IT Services · Bangalore · support@synkoc.com · +91-9019532023