These six commands, run in sequence, give you a complete picture of any dataset before you make any decisions.
📊
df.shape + df.columns
How many rows and columns? What are the feature names? This is your first check to orient yourself.
print(df.shape) # (1000, 8)
print(df.columns) # Index(['age'...])
print(df.dtypes) # age int64...
⚡ dtypes: if a numeric column shows 'object' → convert with astype(float)
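A minimal sketch of this first check, using a small made-up DataFrame (the column names and values are illustrative, not from a real dataset):

```python
import pandas as pd

# Hypothetical data: 'score' holds numbers stored as strings -> dtype 'object'
df = pd.DataFrame({
    "age": [23, 31, 27],
    "score": ["88.5", "92.0", "75.5"],
})

print(df.shape)         # (3, 2) -> 3 rows, 2 columns
print(list(df.columns)) # ['age', 'score']
print(df.dtypes)        # age: int64, score: object <- needs fixing

# Apply the tip above: cast the object column to float
df["score"] = df["score"].astype(float)
print(df.dtypes["score"])  # float64
```

If a column mixes numbers with junk strings, `pd.to_numeric(df["score"], errors="coerce")` is a more forgiving alternative: it turns unparseable entries into NaN instead of raising.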
📝
df.head() + df.info()
head() shows the first 5 rows as a table; info() shows dtypes, non-null counts, and memory usage.
df.head() # first 5 rows
df.tail() # last 5 rows
df.info() # dtypes + nulls
df.sample(5) # random 5 rows
⚡ info(): Non-Null Count < total rows = missing values!
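A quick sketch of these inspection calls on a toy DataFrame (invented values), including how info() exposes a missing entry:

```python
import pandas as pd

# Hypothetical data with one missing 'city' value
df = pd.DataFrame({
    "age":  [23, 31, 27, 45],
    "city": ["Oslo", None, "Lima", "Pune"],
})

print(df.head(2))  # first 2 rows (default is 5)
print(df.tail(2))  # last 2 rows
df.info()          # 'city' shows 3 non-null out of 4 entries -> 1 missing

# random_state makes the random sample reproducible across runs
print(df.sample(2, random_state=0))
```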
📉
df.describe()
Returns count, mean, std, min, 25%, 50%, 75%, max for every numeric column. All stats from Week 1 at once.
df.describe()
# age score
# mean 25.4 77.2
# std 4.1 12.8
# max 45.0 98.0
⚡ max much higher than 75% = outlier!
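The outlier tip above can be turned into a tiny check. The data and the 1.2× threshold here are illustrative assumptions, not a standard rule:

```python
import pandas as pd

# Hypothetical scores: 98 sits far above the rest of the distribution
df = pd.DataFrame({"score": [70, 72, 75, 74, 73, 71, 98]})

stats = df["score"].describe()
print(stats)  # count, mean, std, min, 25%, 50%, 75%, max

# Heuristic from the tip: flag when max is well above the 75% quartile
q75, mx = stats["75%"], stats["max"]
if mx > q75 * 1.2:
    print("possible outlier: max is much higher than the 75% quartile")
```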
🔍
df.isnull().sum()
Counts missing values per column. Reveals which columns need imputation and how much data is missing.
df.isnull().sum()
# age 0
# score 23 <-- fix this!
# city 12 <-- fix this!
# passed 0
⚡ Above 30% missing → consider dropping the column
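The 30% rule of thumb above can be applied programmatically. A sketch on invented data, where 'notes' is 80% missing and gets dropped while 'score' (20% missing) survives:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'notes' is mostly empty, 'score' has one gap
df = pd.DataFrame({
    "age":   [23, 31, 27, 45, 30],
    "score": [88.0, np.nan, 75.0, 92.0, 90.0],
    "notes": [np.nan, np.nan, np.nan, np.nan, "ok"],
})

print(df.isnull().sum())  # missing count per column

# isnull().mean() gives the *fraction* missing per column
frac_missing = df.isnull().mean()
to_drop = frac_missing[frac_missing > 0.30].index
df = df.drop(columns=to_drop)
print(list(df.columns))   # 'notes' is gone; 'score' still needs imputation
```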