NumPy Arrays | Week 2 | Synkoc AI/ML Internship

Synkoc AI/ML Internship · Week 2 · Lesson 3 of 13

NumPy Arrays

The numerical computing foundation of data science in Python. Master arrays, vectorised operations, indexing, and multi-dimensional maths: the building blocks that ML libraries rely on.

📊 Arrays

⚡ Vectorised Ops

🔍 Indexing

📐 ML Maths

🧑‍💻

Synkoc Instructor

AI/ML Professional · Bangalore

⏳ ~50 minutes
🟢 Beginner Friendly

What is NumPy?

NumPy is the core numerical computing library for data science in Python. sklearn works directly on NumPy arrays, while frameworks such as TensorFlow and PyTorch use their own tensor types that closely resemble NumPy arrays and convert to and from them.

🔆

N-dimensional Arrays

Store data in 1D, 2D, or higher. A dataset is 2D: rows = samples, cols = features.

shape (100, 5)
= 100 samples
= 5 features

⚡

Vectorised Speed

Operations run on entire arrays at once in optimised C, often around 100x faster than Python loops on large arrays.

arr * 2    # fast
arr + arr  # fast
# no for loop needed
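
To check the speed claim on your own machine, a rough timing sketch like the one below compares a Python loop with the vectorised equivalent. The array size and the use of time.perf_counter are illustrative choices, not part of the lesson, and the measured speed-up will vary with hardware and array size.

```python
import time
import numpy as np

arr = np.arange(200_000, dtype=np.float64)  # illustrative size

# Pure-Python loop: one multiplication per iteration
start = time.perf_counter()
looped = [x * 2 for x in arr]
loop_seconds = time.perf_counter() - start

# Vectorised: a single expression that runs in compiled code
start = time.perf_counter()
vectorised = arr * 2
vec_seconds = time.perf_counter() - start

# Both approaches produce the same values
print(np.array_equal(np.array(looped), vectorised))  # True
print(f"loop: {loop_seconds:.4f}s  vectorised: {vec_seconds:.4f}s")
```

Timings from a single run are noisy, so treat the printed numbers as a rough indication only.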

📋

Broadcasting

Automatically stretches the smaller array to match the shape of the larger one. Centre 1000 rows with one line.

X - X.mean(axis=0)
# subtracts each column's mean
# from every row
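
Broadcasting is easier to follow when the shapes are printed. The sketch below assumes a hypothetical 1000-row, 3-column matrix of random values and subtracts one mean per column from every row.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator, illustrative
X = rng.normal(size=(1000, 3))          # 1000 rows, 3 feature columns

col_means = X.mean(axis=0)              # shape (3,): one mean per column
X_centred = X - col_means               # (1000, 3) - (3,) broadcasts over rows

print(col_means.shape)                  # (3,)
print(X_centred.shape)                  # (1000, 3)
print(np.allclose(X_centred.mean(axis=0), 0.0))  # True
```

NumPy compares shapes from the last dimension backwards, so the (3,) vector lines up with the 3 columns and is reused for each of the 1000 rows.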

📈

ML Foundation

sklearn, pandas, and Matplotlib all accept NumPy arrays as input, and sklearn and pandas can hand their results back as NumPy arrays. It is the common data format between them.

X_train.shape
model.coef_
np.array(data)

Chapter 1 of 5

01

Creating Arrays

Create 1D and 2D arrays, understand shape and dtype, and use essential array-creation functions.

Creating NumPy Arrays

Four ways to create arrays. shape gives dimensions, dtype gives the number type.

🔆

shape & dtype

Every NumPy array has a shape tuple showing dimensions and a dtype showing the number type. A 150-sample, 4-feature dataset has shape (150, 4).

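import numpy as np

arr = np.array([70, 80, 90, 60, 85])
print(arr.shape)  # (5,)
print(arr.dtype)  # int64 on most platforms

mat = np.zeros((3, 4))        # shape (3, 4), filled with 0.0
rng = np.arange(1, 11)        # [1..10], the stop value 11 is excluded
lin = np.linspace(0, 1, 5)    # [0, .25, .5, .75, 1], the stop value 1 is included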

💡ML use: X_train.shape tells you how many samples and features you have. Most sklearn estimators expect numeric input, and many convert it to a floating-point dtype such as float64 internally.
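
When an array has an integer dtype and you want floats, astype makes the conversion explicit. This is a minimal sketch: the variable names are illustrative, and the choice of float64 and float32 is an assumption about what a downstream model expects.

```python
import numpy as np

counts = np.array([70, 80, 90, 60, 85])   # integer dtype inferred from the values
as_float = counts.astype(np.float64)      # explicit cast, returns a new array

print(counts.dtype.kind)    # i (a signed integer type)
print(as_float.dtype)       # float64

# dtype can also be requested when the array is created
ones = np.ones((2, 3), dtype=np.float32)
print(ones.shape, ones.dtype)   # (2, 3) float32
```

The original counts array is left unchanged because astype returns a copy by default.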


Chapter 2 of 5

02

Vectorised Operations

Apply maths to entire arrays at once with no loops, often around 100x faster than Python for loops on large arrays.

Vectorised Operations

Instead of looping, NumPy applies operations to the whole array at once using optimised C code.

⚡

No loops needed

Any arithmetic operator runs element-wise automatically. Aggregate functions operate along specific axes.

scores = np.array([72, 85, 90, 60, 78])
print(scores + 5)          # add 5 to every element
print(scores > 75)         # boolean array
print(np.mean(scores))     # 77.0
print(np.std(scores))      # about 10.47 (population std, ddof=0)
z = (scores - np.mean(scores)) / np.std(scores)
print(z.round(2))          # z-scores: 72 maps to -0.48, 90 to 1.24

💡ML use: (X - mean) / std, computed per feature column, is the same transform StandardScaler applies. One line normalises the entire feature matrix.
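
The same z-score idea extends to a 2D feature matrix by passing axis=0. The sketch below uses a small made-up matrix (the second column is a hypothetical feature on a different scale) and checks that each column ends up with mean 0 and standard deviation 1.

```python
import numpy as np

# Hypothetical feature matrix: 5 samples, 2 features on different scales
X = np.array([[72.0, 1.60],
              [85.0, 1.75],
              [90.0, 1.82],
              [60.0, 1.55],
              [78.0, 1.70]])

mean = X.mean(axis=0)          # per-column mean, shape (2,)
std = X.std(axis=0)            # per-column population std (ddof=0)
X_scaled = (X - mean) / std

print(np.allclose(X_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_scaled.std(axis=0), 1.0))   # True
```

np.std defaults to the population standard deviation (ddof=0), which is also what StandardScaler uses.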


Chapter 3 of 5

03

Indexing & Slicing

Select any element, row, column, or filtered subset using boolean masks.

Indexing & Slicing

2D arrays use [row, col] syntax. A bare colon selects everything along that axis. Boolean indexing filters rows by condition.

🎯

2D Indexing

arr[0] = row 0. arr[0,2] = row 0 col 2. arr[:,1] = all rows col 1. arr[:2] = first 2 rows.

X[0]     # first row
X[0, 2]  # row 0 col 2
X[:, 1]  # all of col 1
X[:2]    # first 2 rows
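
To make the four patterns concrete, here is a small sketch on an illustrative 3x4 array whose values are simply 0 to 11, so each result is easy to check by eye.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)   # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]

print(X[0])       # [0 1 2 3]
print(X[0, 2])    # 2
print(X[:, 1])    # [1 5 9]
print(X[:2])      # [[0 1 2 3]
                  #  [4 5 6 7]]
```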

🔎

Boolean Indexing

Create a boolean mask from a condition. Use it to filter rows. pandas .loc accepts boolean masks in the same way.

mask = scores > 80
scores[mask]
# only values > 80
X[y == 0] # class 0 rows
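
A runnable version of the masking idea might look like the sketch below. The tiny X and y arrays are made up for illustration, and the last line shows one way to combine two conditions.

```python
import numpy as np

scores = np.array([72, 85, 90, 60, 78])
mask = scores > 80
print(mask)            # True only at the positions of 85 and 90
print(scores[mask])    # [85 90]

# Hypothetical 4-sample dataset with binary labels
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 1, 0, 1])
class0 = X[y == 0]     # keeps rows 0 and 2
print(class0)

# Combine conditions with & and |, wrapping each one in parentheses
mid_range = scores[(scores > 70) & (scores < 90)]
print(mid_range)       # [72 85 78]
```

The parentheses matter because & binds more tightly than the comparison operators in Python.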

📋

reshape & stack

reshape() changes dimensions. vstack adds rows. hstack adds columns. flatten() makes 1D.

arr.reshape(3, 4)
np.vstack([a, b])
np.hstack([a, b])
arr.flatten()
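
Printing shapes before and after each call is a quick way to confirm what these functions do. The arrays below are illustrative only.

```python
import numpy as np

arr = np.arange(12)              # 12 elements, shape (12,)
grid = arr.reshape(3, 4)         # 3 * 4 must equal arr.size
print(grid.shape)                # (3, 4)

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6], [7, 8]])   # shape (2, 2)
stacked_rows = np.vstack([a, b])
stacked_cols = np.hstack([a, b])
print(stacked_rows.shape)        # (4, 2): b's rows appended below a
print(stacked_cols.shape)        # (2, 4): b's columns appended to the right

flat = grid.flatten()            # returns a 1D copy
print(flat.shape)                # (12,)
```

reshape raises a ValueError if the requested shape does not hold exactly the same number of elements as the original array.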

⚡

axis parameter

axis=0 operates down columns (per feature). axis=1 operates across rows (per sample).

X.mean(axis=0)
# one mean per col
X.sum(axis=1)
# one sum per row
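
A small worked example can help the axis rule stick. The 2x3 matrix here is made up so the results can be verified by hand.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 samples, 3 features

col_means = X.mean(axis=0)
row_sums = X.sum(axis=1)
overall = X.mean()

print(col_means)   # [2.5 3.5 4.5] -> shape (3,), one value per column
print(row_sums)    # [ 6. 15.]     -> shape (2,), one value per row
print(overall)     # 3.5, no axis given: reduces over every element
```

One way to remember it: the axis you name is the one that disappears from the result's shape.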

Chapter 4 of 5

04

NumPy for ML

Complete NumPy ML pipeline: normalise, split, and compute per-class statistics.

Complete NumPy ML Preprocessing Pipeline

numpy_ml_pipeline.py

 1  import numpy as np
 2  np.random.seed(42)
 3  X = np.random.randn(100, 4)  # 100 samples, 4 features
 4  y = (X[:, 0] > 0).astype(int)  # binary labels
 5
 6  # Normalise: z-score per feature column
 7  X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
 8
 9  # Manual 80/20 train-test split via slicing
10  X_train, X_test = X_norm[:80], X_norm[80:]
11  y_train, y_test = y[:80], y[80:]
12
13  # Per-class statistics via boolean indexing
14  print("Class 0 mean:", X_norm[y == 0].mean(axis=0).round(3))
15  print("Class 1 mean:", X_norm[y == 1].mean(axis=0).round(3))
16  print("Train shape:", X_train.shape)

Line 7: (X - mean) / std normalises all 100 rows and 4 columns at once using broadcasting. Lines 10-11: slicing splits the data into train and test sets in one step. Lines 14-15: X_norm[y==0] uses boolean indexing to select the class 0 samples, then computes the mean of each feature for that class. sklearn utilities such as StandardScaler wrap these same array operations.
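
A common variation, sketched here as an assumption rather than as part of the lesson's pipeline, is to shuffle the rows before splitting and to compute the mean and std from the training rows only, so that no information from the test set enters the scaling step. The generator seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))            # 100 samples, 4 features
y = (X[:, 0] > 0).astype(int)            # binary labels

# Shuffle row indices so the split does not depend on the original ordering
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:80], idx[80:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Estimate mean and std on the training rows only, then reuse them for test
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std

print(X_train_norm.shape, X_test_norm.shape)   # (80, 4) (20, 4)
```

Indexing with an array of integers, as in X[train_idx], selects rows in the given order and returns a copy.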

Lesson Summary

You now know the NumPy operations that underpin every ML project:

🔆

Array Creation

np.array(), np.zeros(), np.arange(), np.linspace(), np.random.randn(). Know shape and dtype for every array.

⚡

Vectorised Ops

Apply maths to entire arrays. Use np.mean(), np.std() with axis parameter. Write z-score in one line.

🔍

Indexing

Select with [row, col], slices [:, 1], and boolean masks. axis=0 for column-wise, axis=1 for row-wise.

📈

ML Pipeline

Normalise with (X-mean)/std, split with slicing, filter classes with boolean indexing. These are the raw operations sklearn wraps.

📈

NumPy Complete!

You now understand the numerical foundation of all data science. Open the NumPy Practical Lab to apply arrays, vectorised ops, and indexing across 5 tasks.


Synkoc IT Services · Bangalore · support@synkoc.com · +91-9019532023