Synkoc AI/ML Internship · Week 2 · Lesson 3 of 13
NumPy
Arrays
The numerical computing foundation of all data science. Master arrays, vectorised operations, indexing, and multi-dimensional maths — the engine inside every ML model.
📊 Arrays
⚡ Vectorised Ops
🔍 Indexing
📐 ML Maths
🧑‍💻
Synkoc Instructor
AI/ML Professional · Bangalore
⏳ ~50 minutes
🟢 Beginner Friendly
What is NumPy?
NumPy is the numerical computing library that underpins most of Python data science. sklearn and Pandas are built on NumPy arrays, and TensorFlow and PyTorch tensors follow the same array model and convert to and from NumPy arrays.
🔆
N-dimensional Arrays
Store data in 1D, 2D, or higher. A dataset is 2D: rows = samples, cols = features.
shape (100, 5)
= 100 samples
= 5 features
Vectorised Speed
Operations run on entire arrays at once in optimised C. On large arrays this is often one to two orders of magnitude faster than Python loops.
arr * 2    # fast
arr + arr  # fast
no for loop
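A minimal sketch of the difference, assuming a small illustrative array (the values are not from the lesson). The loop and the vectorised expression give the same result, but the second runs in compiled code rather than in the Python interpreter:

```python
import numpy as np

arr = np.arange(5)  # illustrative data: [0 1 2 3 4]

# Python loop: one interpreted multiplication per element
doubled_loop = np.array([x * 2 for x in arr])

# Vectorised: a single expression over the whole array
doubled_vec = arr * 2

print(doubled_vec)                                # [0 2 4 6 8]
print(np.array_equal(doubled_loop, doubled_vec))  # True
```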
📋
Broadcasting
Automatically expands smaller arrays to match. Normalise 1000 rows with one line.
X - X.mean(axis=0)
# subtracts each column's mean
# from every row
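A small sketch of broadcasting, assuming an illustrative 3x2 matrix. Passing axis=0 gives one mean per column, with shape (2,), and NumPy stretches that vector across all three rows of the shape (3, 2) matrix. Without axis, mean() would return a single number for the whole array instead.

```python
import numpy as np

# Illustrative data: 3 samples, 2 features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

col_means = X.mean(axis=0)   # shape (2,): one mean per column
centred = X - col_means      # (3, 2) - (2,) broadcasts over every row

print(col_means)             # [ 2. 20.]
print(centred.shape)         # (3, 2)
print(centred.mean(axis=0))  # [0. 0.]
```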
📈
ML Foundation
sklearn, Pandas and Matplotlib all accept NumPy arrays, and sklearn returns them. The closest thing to a universal data format in Python ML.
X_train.shape
model.coef_
np.array(data)
Chapter 1 of 5
01
Creating Arrays
Create 1D and 2D arrays, understand shape and dtype, and use essential array-creation functions.
Creating NumPy Arrays
Four ways to create arrays. shape gives dimensions, dtype gives the number type.
🔆
shape & dtype
Every NumPy array has a shape tuple showing dimensions and a dtype showing the number type. A 150-sample, 4-feature dataset has shape (150, 4).
arr = np.array([70, 80, 90, 60, 85])
print(arr.shape)  # (5,)
print(arr.dtype)  # int64 on most platforms

mat = np.zeros((3, 4))        # shape (3,4)
rng = np.arange(1, 11)       # [1..10]
lin = np.linspace(0, 1, 5)   # [0, .25, .5, .75, 1]
💡ML use: X_train.shape tells you how many samples and features you have. Most sklearn algorithms work on floating-point data (float64 by default) and convert integer input for you.
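A short sketch of checking and converting dtype, reusing the scores array from above. The names arr_f and X are illustrative and not part of the lesson:

```python
import numpy as np

arr = np.array([70, 80, 90, 60, 85])
print(arr.shape)                  # (5,)

# Convert integers to float64 explicitly
arr_f = arr.astype(np.float64)
print(arr_f.dtype)                # float64

# np.zeros produces float64 unless told otherwise
print(np.zeros((3, 4)).dtype)     # float64

# Reshape a range into a 2D "samples x features" layout
X = np.arange(12).reshape(4, 3)
print(X.shape)                    # (4, 3)
```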
Chapter 2 of 5
02
Vectorised Operations
Apply maths to entire arrays at once with no loops. On large arrays this is often one to two orders of magnitude faster than Python for loops.
Instead of looping, NumPy applies operations to the whole array at once using optimised C code.
No loops needed
Any arithmetic operator runs element-wise automatically. Aggregate functions operate along specific axes.
scores = np.array([72, 85, 90, 60, 78])
print(scores + 5)          # add 5 to every element
print(scores > 75)         # boolean array
print(np.mean(scores))     # 77.0
print(np.std(scores))      # standard deviation
z = (scores - np.mean(scores)) / np.std(scores)
print(z.round(2))          # z-score
💡ML use: (X - mean) / std, with mean and std computed per column (axis=0), is the same calculation StandardScaler performs. One line normalises the entire feature matrix.
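The same z-score on a 2D feature matrix can be sketched as follows, assuming an illustrative matrix whose two columns sit on very different scales. After normalising, every column should have mean 0 and standard deviation 1:

```python
import numpy as np

# Illustrative data: 3 samples, 2 features on different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

mean = X.mean(axis=0)   # one mean per column
std = X.std(axis=0)     # population std (ddof=0), NumPy's default
X_norm = (X - mean) / std

print(np.allclose(X_norm.mean(axis=0), 0.0))  # True
print(np.allclose(X_norm.std(axis=0), 1.0))   # True
```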
Chapter 3 of 5
03
Indexing & Slicing
Select any element, row, column, or filtered subset using boolean masks.
2D arrays use [row, col] syntax. Colon means all. Boolean indexing filters rows by condition.
🎯
2D Indexing
arr[0] = row 0. arr[0,2] = row 0 col 2. arr[:,1] = all rows col 1. arr[:2] = first 2 rows.
X[0]     # first row
X[0, 2]  # row 0 col 2
X[:, 1]  # all of col 1
X[:2]    # first 2 rows
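To see what each expression returns, here is a small sketch on an illustrative 3x4 matrix built with np.arange:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(X[0])         # [0 1 2 3]   first row
print(X[0, 2])      # 2           row 0, col 2
print(X[:, 1])      # [1 5 9]     all rows, col 1
print(X[:2].shape)  # (2, 4)      first 2 rows
```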
🔎
Boolean Indexing
Create a boolean mask from a condition, then use it to filter rows. pandas .loc accepts boolean masks in the same way.
mask = scores > 80
scores[mask]
# only values > 80
X[y == 0] # class 0 rows
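A runnable sketch of both masks. The scores array is the one used earlier in the lesson, while X and y here are tiny illustrative arrays:

```python
import numpy as np

scores = np.array([72, 85, 90, 60, 78])
mask = scores > 80
print(mask)            # [False  True  True False False]
print(scores[mask])    # [85 90]

# Filter the rows of a feature matrix by class label
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])
y = np.array([0, 1, 0])
class0 = X[y == 0]     # keeps rows 0 and 2
print(class0.shape)    # (2, 2)
```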
📋
reshape & stack
reshape() changes dimensions without changing the data. np.vstack stacks arrays as extra rows. np.hstack stacks them as extra columns. flatten() returns a 1D copy.
arr.reshape(3, 4)
np.vstack([a, b])
np.hstack([a, b])
arr.flatten()
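A brief sketch of how these calls change shape, using illustrative arrays a and b of shape (2, 3). It also shows that flatten() returns a copy, so editing the result leaves the original untouched:

```python
import numpy as np

arr = np.arange(12)
m = arr.reshape(3, 4)
print(m.shape)                  # (3, 4)

a = np.ones((2, 3))
b = np.zeros((2, 3))
print(np.vstack([a, b]).shape)  # (4, 3)  rows added
print(np.hstack([a, b]).shape)  # (2, 6)  columns added

flat = m.flatten()
flat[0] = 99                    # modifies the copy only
print(flat.shape)               # (12,)
print(m[0, 0])                  # 0
```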
axis parameter
axis=0 operates down columns (per feature). axis=1 operates across rows (per sample).
X.mean(axis=0)
# one mean per col
X.sum(axis=1)
# one sum per row
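A quick sketch on an illustrative 2x3 matrix. The axis you name is the one that gets collapsed, so axis=0 leaves one value per column and axis=1 leaves one value per row:

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 samples, 3 features

print(X.mean(axis=0))  # [2.5 3.5 4.5]  one mean per column
print(X.sum(axis=1))   # [ 6 15]        one sum per row
print(X.mean())        # 3.5            no axis: whole array
```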
Chapter 4 of 5
04
NumPy for ML
Complete NumPy ML pipeline: normalise, split, and compute per-class statistics.
Complete NumPy ML Preprocessing Pipeline
numpy_ml_pipeline.py
 1  import numpy as np
 2  np.random.seed(42)
 3  X = np.random.randn(100, 4)      # 100 samples, 4 features
 4  y = (X[:, 0] > 0).astype(int)    # binary labels
 5
 6  # Normalise: z-score per feature column
 7  X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
 8
 9  # Manual 80/20 train-test split via slicing
10  X_train, X_test = X_norm[:80], X_norm[80:]
11  y_train, y_test = y[:80], y[80:]
12
13  # Per-class statistics via boolean indexing
14  print("Class 0 mean:", X_norm[y==0].mean(axis=0).round(3))
15  print("Class 1 mean:", X_norm[y==1].mean(axis=0).round(3))
16  print("Train shape:", X_train.shape)
Line 7: (X - mean) / std normalises all 100 rows and 4 columns at once using broadcasting. Lines 10-11: slicing splits the data into train and test sets in one step. Lines 14-15: X_norm[y==0] uses boolean indexing to select the class 0 samples, then computes the mean of each feature for that class. sklearn's StandardScaler and train_test_split wrap these same operations, although train_test_split also shuffles the rows by default before splitting.
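One possible variation on this pipeline, offered as a sketch and not as the lesson's method: shuffle the rows before splitting, and compute the mean and std from the training rows only so that no test information enters the normalisation. The names rng, idx, train_idx and test_idx are illustrative, and np.random.default_rng is used here in place of np.random.seed.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 4))   # 100 samples, 4 features
y = (X[:, 0] > 0).astype(int)       # binary labels

# Shuffle row indices, then take an 80/20 split
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:80], idx[80:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Statistics come from the training rows only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std   # reuse the training statistics

print(X_train_norm.shape, X_test_norm.shape)  # (80, 4) (20, 4)
```

With this ordering the test columns will not have a mean of exactly 0, which is expected: the test set is scaled with the training statistics.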
Lesson Summary
You now know the NumPy operations that underpin every ML project:
🔆
Array Creation
np.array(), np.zeros(), np.arange(), np.linspace(), np.random.randn(). Know shape and dtype for every array.
Vectorised Ops
Apply maths to entire arrays. Use np.mean(), np.std() with axis parameter. Write z-score in one line.
🔍
Indexing
Select with [row, col], slices [:, 1], and boolean masks. axis=0 for column-wise, axis=1 for row-wise.
📈
ML Pipeline
Normalise with (X-mean)/std, split with slicing, filter classes with boolean indexing. These are the raw operations sklearn wraps.
📈
NumPy Complete!
You now understand the numerical foundation of all data science. Open the NumPy Practical Lab to apply arrays, vectorised ops, and indexing across 5 tasks.
✓ Video — Done
✎ Practical Lab — Next
❓ Quiz — After Lab
Synkoc IT Services · Bangalore · support@synkoc.com · +91-9019532023