Synkoc AI/ML Internship · Week 3 · Lesson 8 of 13
Supervised Learning
Train your first real ML models. Linear Regression predicts numbers. Logistic Regression classifies categories. Both are the foundation of production ML systems worldwide.
📈 Linear Regression
✅ Logistic Regression
📋 Decision Trees
🌳 Random Forest
🧑‍💻
Synkoc Instructor
AI/ML Professional · Bangalore
⏳ ~60 minutes
🔴 Core ML Models
Four Supervised Learning Models
Four algorithms that power the majority of production ML systems. Each solves different problems with different strengths. You will train all four today.
📈
Linear Regression
Fit a straight line through data points. Predict any continuous number. The simplest and most interpretable regression model.
predict house price, score, revenue
Logistic Regression
Fits an S-curve (sigmoid). Outputs probability between 0 and 1. The go-to baseline for classification.
classify spam, pass/fail, disease
📋
Decision Tree
Learns if-then rules from data. Splits features at optimal thresholds. Highly interpretable — can be printed as a flowchart.
credit risk, medical diagnosis
🌳
Random Forest
100+ decision trees, each trained on random subsets. Averages their predictions. Robust, accurate, and highly resistant to overfitting.
Kaggle baseline, feature importance
Chapter 1 of 4
01
Linear Regression
Find the best-fit line through data points. Minimise Mean Squared Error. Interpret coefficients. The foundation of all regression analysis.
Linear Regression Explained
Linear regression fits the equation y = mx + b to your data. It finds the slope m and intercept b that minimise the average squared error between predictions and actual values.
📈
y = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ
Each feature gets a weight (coefficient). The model learns these weights during training to minimise MSE. After training, model.coef_ holds the learned weights and model.intercept_ holds the bias term w₀. A positive weight means the feature pushes the prediction up; a negative weight means increases in that feature predict a lower target.
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_)      # learned weights
print(model.intercept_) # bias term
y_pred = model.predict(X_test)
Interpret coefficients: If study_hours has weight 8.5, each extra study hour predicts 8.5 more points on the exam. This interpretability is why Linear Regression is used in economics, healthcare, and social science research.
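A minimal sketch of this interpretation, using synthetic data generated so the true slope is 8.5 (the study_hours numbers here are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data built so that exam_score ≈ 40 + 8.5 * study_hours
rng = np.random.default_rng(42)
study_hours = rng.uniform(0, 10, size=(200, 1))
exam_score = 40 + 8.5 * study_hours[:, 0] + rng.normal(0, 2, size=200)

model = LinearRegression()
model.fit(study_hours, exam_score)

print(model.coef_[0])    # recovered slope, close to the true 8.5 points per hour
print(model.intercept_)  # recovered baseline, close to 40

preds = model.predict(study_hours)
print(mean_squared_error(exam_score, preds))  # MSE: average squared error
print(r2_score(exam_score, preds))            # R²: fraction of variance explained
```

With 200 samples and modest noise the learned coefficient lands close to the true 8.5, which is exactly how you read a real coefficient: points gained per extra study hour.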
Chapter 2 of 4
02
Logistic Regression
The go-to classification baseline. Applies the sigmoid function to output a probability. Threshold it to get a binary prediction. Used in spam filters, credit scoring, and medical diagnosis worldwide.
Logistic Regression Explained
Logistic Regression extends linear regression with a sigmoid function that squashes any value into [0, 1]. The output is a probability. You threshold it to make a binary decision.
P(y=1) = sigmoid(w·x + b)
Sigmoid maps any real number to a probability between 0 and 1. Values above 0.5 are classified as class 1. Values below 0.5 are class 0. The threshold of 0.5 can be adjusted. Lowering the threshold increases recall (catches more positives). Raising it increases precision (fewer false positives). This trade-off is critical in medical and fraud detection applications.
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)  # class probabilities
labels = model.predict(X_test)       # 0 or 1
acc = model.score(X_test, y_test)    # accuracy
predict_proba() returns the raw probability for each class. If column 1 is 0.87, the model is 87% confident the sample belongs to class 1. Use this for ranking, risk scoring, and adjustable thresholds.
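A minimal sketch of adjusting that threshold yourself (synthetic data via make_classification; the 0.3 cutoff is just an illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]  # P(class 1) for each sample

# The default 0.5 cutoff reproduces model.predict()
default_labels = (probs >= 0.5).astype(int)

# A lower cutoff flags more samples as positive: recall up, precision down
low_threshold_labels = (probs >= 0.3).astype(int)
```

Because every sample above 0.5 is also above 0.3, the lower cutoff can only add positives, never remove them.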
Chapter 3 of 4
03
Decision Trees & Random Forest
Non-linear models that capture complex patterns. Decision Trees learn if-then rules. Random Forest combines 100 trees for robust, accurate predictions. The Kaggle competition workhorse.
Decision Tree & Random Forest
Decision Trees make predictions by learning a flowchart of if-then decisions. Random Forest builds hundreds of trees and votes, dramatically reducing overfitting.
📋
Decision Tree
Splits data at feature thresholds that maximally separate the classes. Highly interpretable. Prone to overfitting without a max_depth constraint.
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=4,       # control complexity
    random_state=42
)
model.fit(X_train, y_train)
⚡ max_depth prevents overfitting
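To actually see those if-then rules, the trained tree can be printed as text. A sketch using sklearn's bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)

# export_text prints the learned flowchart of threshold rules
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

plot_tree() from the same module draws a graphical version of the same flowchart.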
🌳
Random Forest
100+ decision trees, each trained on a random sample of data and features. Final prediction is the majority vote. Far more resistant to overfitting than a single tree.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100, # number of trees
    random_state=42
)
model.fit(X_train, y_train)
⚡ Standard Kaggle baseline model
📉
Feature Importance
Random Forest ranks features by their contribution to predictions. One of its most useful outputs for understanding which features matter.
importances = model.feature_importances_
# [0.42, 0.31, 0.18, 0.09]
# Feature 0 contributes 42%
# Feature 3 contributes only 9%
# Drop low-importance features
⚡ Use to remove irrelevant features
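In practice the importances read best when paired with feature names and sorted. A sketch, assuming sklearn's bundled breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# Pair each importance with its feature name, largest contribution first
ranked = sorted(zip(model.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```

The importances always sum to 1, so each value reads directly as a share of the model's total predictive signal.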
📈
Model Comparison
Always train multiple models and compare. The best model depends on your data, interpretability needs, and performance requirements.
models = {
  'LogReg': LogisticRegression(),
  'Tree': DecisionTreeClassifier(),
  'Forest': RandomForestClassifier()
}
for name, m in models.items():
  m.fit(X_tr, y_tr)
  print(name, m.score(X_te, y_te))
⚡ Always compare at least 3 models
Chapter 4 of 4
04
Training All Four Models
The complete supervised learning workflow. Train and compare four models on the same dataset. The sklearn API is identical for all four — only the class name changes.
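As a sketch of that workflow (assuming sklearn's bundled breast-cancer dataset; LinearRegression is included only to show the shared fit/score API, and note its score() reports R² rather than accuracy):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same API for every model: construct, fit, score
models = {
    "Linear Regression": LinearRegression(),  # score() is R² on the 0/1 target
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```

Swapping a model in or out is a one-line change to the dictionary, which is exactly why comparing several baselines is cheap in sklearn.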
Lesson Summary
You have trained real ML models. Here is what you can now do:
📈
Linear Regression
Predict continuous numbers. model.coef_ for feature weights. Interpret coefficients. Use MSE and R² to evaluate. Great for price, score, revenue prediction.
Logistic Regression
Classify binary outcomes. predict_proba() for probabilities. Adjust threshold for precision/recall tradeoff. Baseline for any classification problem.
📋
Decision Tree
Interpretable if-then rules. Control complexity with max_depth. Visualise with plot_tree(). Prone to overfitting without depth limit.
🌳
Random Forest
Ensemble of 100+ trees. Resistant to overfitting. feature_importances_ reveals which features matter. Standard Kaggle competition baseline.
🌳
Supervised Learning Complete!
Four real ML models trained. Complete the Lab with 5 hands-on tasks. Then move to Unsupervised Learning: KMeans Clustering — grouping data without labels.
✓ Video — Done
✏ Practical Lab — Next
▶ Unsupervised Learning — After Quiz
Synkoc IT Services · Bangalore · support@synkoc.com · +91-9019532023