Synkoc AI/ML Internship · Week 4 · Lessons 11-13 of 13
Capstone
Project
Apply everything from the past 3 weeks in real-world ML projects. Choose 2 of 10 industry-grade projects. Build end-to-end. Present your results.
🌟 Choose 2 Projects
📈 End-to-End ML
📄 Present & Document
🏅 Earn Certificate
🧑‍💻
Synkoc Instructor
AI/ML Professional · Bangalore
⏳ ~60 minutes
🟢 Capstone Level
The Capstone Challenge
Week 4 is your proof of mastery. No more tutorials. You design, build, and present two complete ML solutions independently.
🎯
Choose any 2 from 10 projects.
Each project is a real industry problem. You must build a complete end-to-end ML pipeline: data preparation, model training, evaluation, and documented results. Both projects contribute equally to your final grade.
1
Pick 2 Projects
2
Load & Explore Data
3
Clean & Engineer
4
Train & Tune
5
Evaluate & Report
6
Present Results
Chapter 1 of 3
01
Projects 1 to 5
Five industry-grade ML projects. Classification, regression, NLP, and computer vision. Pick the ones that excite you most.
Projects 1–5
Project 01 · Classification
📩 Spam Email Classifier
Train a model to detect and filter spam emails based on message content using Natural Language Processing.
  • Extract features from raw email text using TF-IDF vectorisation
  • Train Naive Bayes and Logistic Regression classifiers
  • Evaluate with Precision, Recall and F1-Score (spam = positive class)
  • Discuss the Precision/Recall trade-off for spam: flagging legitimate mail as spam (a false positive) is usually the costlier error, so high Precision matters
NLP · TF-IDF · Naive Bayes · Classification
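As a sketch of what the core pipeline could look like (the tiny inline emails and labels below are illustrative, not a real dataset):

```python
# Minimal spam-classifier sketch: TF-IDF features feeding Naive Bayes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "claim your free cash reward",
    "meeting agenda for monday", "lunch with the project team",
    "free money click now", "quarterly report attached",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam (the positive class)

# One pipeline ensures new emails pass through the same TF-IDF
# transform that was fitted on the training text.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free prize cash now"])[0])     # → 1 (spam)
print(model.predict(["agenda for the meeting"])[0])  # → 0 (not spam)
```

In the real project, per-class Precision, Recall and F1 come from `sklearn.metrics.classification_report` on a held-out test split.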
Project 02 · Regression
🏠 House Price Predictor
Predict residential property prices using location, size, amenities, and neighbourhood features.
  • Perform full EDA: distributions, correlations, outlier detection
  • Engineer features: price per sqft, age of property, distance to centre
  • Compare Linear Regression vs Random Forest Regressor
  • Report MSE, RMSE and R², and interpret which features drive price
Regression · EDA · Feature Eng. · Random Forest
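The model-comparison step could be sketched like this, with synthetic data standing in for a real housing dataset (square footage and age are the only illustrative features here):

```python
# Compare Linear Regression vs Random Forest on synthetic price data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
sqft = rng.uniform(500, 3500, n)
age = rng.uniform(0, 50, n)
# Price = mostly linear in sqft and age, plus noise.
price = 150 * sqft - 1000 * age + rng.normal(0, 20000, n)

X = np.column_stack([sqft, age])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=42)

for name, model in [("Linear", LinearRegression()),
                    ("RandomForest", RandomForestRegressor(random_state=42))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5  # RMSE = sqrt(MSE)
    print(f"{name}: RMSE={rmse:,.0f}  R²={r2_score(y_test, pred):.3f}")
```

On a truly linear relationship like this one, Linear Regression holds its own; on real housing data with interactions and non-linearities, the Random Forest usually pulls ahead.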
Project 03 · Classification
💳 Credit Card Fraud Detector
Identify fraudulent transactions in a heavily imbalanced dataset where only 0.17% of transactions are fraud.
  • Handle extreme class imbalance using SMOTE oversampling
  • Train Random Forest and XGBoost on the resampled data
  • Demonstrate why accuracy is misleading: use ROC-AUC and F1
  • Set decision threshold to minimise false negatives (missed fraud)
Imbalanced Data · SMOTE · ROC-AUC · XGBoost
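SMOTE lives in the separate imbalanced-learn package (`imblearn.over_sampling.SMOTE`); the threshold-tuning idea in the last bullet can be sketched with scikit-learn alone, on synthetic imbalanced data:

```python
# Decision-threshold tuning on an imbalanced problem: lowering the
# threshold below 0.5 trades more false positives for fewer missed
# frauds (false negatives). Synthetic data stands in for real transactions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.98], flip_y=0.01,
                           random_state=42)  # ~2% positive (fraud) class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]  # estimated fraud probability

for threshold in (0.5, 0.2):
    preds = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: recall={recall_score(y_te, preds):.2f}")
```

Recall can only rise (or stay flat) as the threshold drops, which is exactly the lever for minimising missed fraud; the cost is more legitimate transactions flagged for review.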
Project 04 · Clustering
👥 Customer Segmentation
Group e-commerce customers by purchasing behaviour to enable targeted marketing campaigns.
  • Compute RFM features: Recency, Frequency, Monetary value per customer
  • Apply Elbow Method to choose optimal K, then run KMeans clustering
  • Use PCA to visualise clusters in 2D scatter plot
  • Name and describe each segment: Champions, At-Risk, New Customers
RFM Analysis · KMeans · PCA · Marketing
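A sketch of the Elbow Method plus KMeans, using synthetic blobs in place of real RFM features:

```python
# Elbow method + KMeans on synthetic customer data in
# (recency, frequency, monetary) space.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=42)
X = StandardScaler().fit_transform(X)  # scale before distance-based clustering

# Elbow method: inertia drops sharply up to the true K, then flattens.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, 7)}
for k, inertia in inertias.items():
    print(f"K={k}: inertia={inertia:.1f}")

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```

For the 2D visualisation, fit `sklearn.decomposition.PCA(n_components=2)` on the same scaled matrix and scatter-plot the two components coloured by cluster label.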
Project 05 · NLP
🥳 Sentiment Analyser
Classify product reviews as Positive, Neutral, or Negative to help businesses monitor customer satisfaction.
  • Clean and preprocess text: lowercase, remove stopwords, lemmatise
  • Vectorise with TF-IDF and train multi-class Logistic Regression
  • Build a prediction function that takes any review string as input
  • Report per-class F1 and identify which sentiment class is hardest to predict
NLP · Text Cleaning · TF-IDF · Multi-class
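The prediction-function bullet could look like this; the six inline reviews are purely illustrative:

```python
# Multi-class sentiment sketch: TF-IDF + Logistic Regression,
# wrapped in a reusable prediction function.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["great product love it", "terrible waste of money",
           "works fine nothing special", "absolutely fantastic quality",
           "broke after one day awful", "okay for the price average"]
sentiments = ["positive", "negative", "neutral",
              "positive", "negative", "neutral"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reviews, sentiments)

def predict_sentiment(review: str) -> str:
    """Classify a single raw review string."""
    return model.predict([review])[0]

print(predict_sentiment("fantastic product great quality"))
```

In the full project, stopword removal and lemmatisation happen before vectorisation, and `classification_report` gives the per-class F1 the last bullet asks for (Neutral is typically the hardest class).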
Chapter 2 of 3
02
Projects 6 to 10
Five more projects covering medical AI, time-series, recommendation systems, image classification, and energy prediction.
Projects 6–10
Project 06 · Medical AI
💊 Diabetes Risk Predictor
Predict whether a patient is at risk of diabetes using clinical measurements such as glucose, BMI, and age.
  • Handle missing values coded as zeros (physiologically impossible readings)
  • Perform feature importance analysis to identify top clinical predictors
  • Train and compare Logistic Regression, Decision Tree, and Random Forest
  • Justify why Recall matters more than Precision in medical diagnosis
Healthcare AI · Clinical Data · Feature Importance
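The zeros-as-missing fix could be sketched like this (column names mirror the common Pima diabetes dataset; the values are made up):

```python
# A glucose or BMI reading of 0 is physiologically impossible, so it is
# really a missing value in disguise: convert to NaN, then impute the median.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89, 137],
    "BMI": [33.6, 26.6, 23.3, 0.0, 43.1],
    "Age": [50, 31, 32, 21, 33],
})

for col in ["Glucose", "BMI"]:  # zero is invalid here (but fine for e.g. Pregnancies)
    df[col] = df[col].replace(0, np.nan)
    df[col] = df[col].fillna(df[col].median())

print(df)
```

The median is computed after the zeros become NaN, so the impossible readings don't drag the fill value down.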
Project 07 · Time Series
📈 Sales Forecasting
Predict future weekly sales for a retail chain using historical transaction data and seasonal patterns.
  • Engineer time features: day of week, month, is_holiday, rolling averages
  • Train Random Forest Regressor on the engineered time features
  • Evaluate with MAE and MAPE (Mean Absolute Percentage Error)
  • Visualise predicted vs actual sales on a time-series line chart
Time Series · Feature Eng. · Retail · Forecasting
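A sketch of the time-feature bullet on a synthetic daily series; the key detail is shifting before the rolling mean so the feature never sees the value it is predicting:

```python
# Time-feature engineering on a synthetic daily sales series.
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=60, freq="D")
sales = pd.DataFrame({"date": dates,
                      "sales": np.random.default_rng(42).uniform(100, 500, 60)})

sales["day_of_week"] = sales["date"].dt.dayofweek      # 0 = Monday
sales["month"] = sales["date"].dt.month
sales["is_weekend"] = (sales["day_of_week"] >= 5).astype(int)
# shift(1) so the rolling window uses only *past* values — no target leakage.
sales["rolling_7d"] = sales["sales"].shift(1).rolling(7).mean()

print(sales[["date", "sales", "day_of_week", "rolling_7d"]].tail(3))
```

Rows whose 7-day window isn't full yet come out as NaN and should be dropped before training the Random Forest Regressor.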
Project 08 · Recommender
🎬 Movie Recommendation Engine
Build a content-based recommendation system that suggests similar movies based on genre, cast, and plot keywords.
  • Combine genre, cast, director and keyword text into one feature string
  • Compute TF-IDF vectors and cosine similarity between all movie pairs
  • Build a function: input any movie title, return top 10 similar movies
  • Extend with collaborative filtering using a user-item rating matrix
Recommender · Cosine Similarity · TF-IDF · Content-Based
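The similarity lookup could be sketched like this (four made-up titles and keyword strings stand in for a real movie catalogue):

```python
# Content-based recommendation: TF-IDF over a combined metadata string,
# cosine similarity between all pairs, then a top-N lookup function.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies = {
    "Space Battle": "scifi action space alien war",
    "Galaxy Quest X": "scifi space comedy alien crew",
    "Love in Paris": "romance drama paris love",
    "Paris Nights": "romance paris drama night",
}
titles = list(movies)

tfidf = TfidfVectorizer().fit_transform(movies.values())
sim = cosine_similarity(tfidf)  # (n_movies, n_movies) similarity matrix

def recommend(title: str, n: int = 2) -> list:
    idx = titles.index(title)
    ranked = sim[idx].argsort()[::-1]       # most similar first
    return [titles[i] for i in ranked if i != idx][:n]  # skip the movie itself

print(recommend("Space Battle"))
```

The same `recommend` shape scales to thousands of movies; only the metadata strings and the similarity matrix grow.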
Project 09 · Computer Vision
📷 Handwritten Digit Classifier
Classify handwritten digits 0-9 from 28x28 pixel images using the MNIST dataset — the Hello World of deep learning.
  • Flatten images from (28,28) to (784,) feature vectors, normalise 0-1
  • Train Random Forest and compare to a simple Neural Network with sklearn
  • Display a confusion matrix heatmap to see which digits are confused
  • Visualise misclassified samples and explain why the model failed
Computer Vision · MNIST · Image Data · Neural Net
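A lightweight sketch of the same workflow using scikit-learn's built-in 8x8 digits dataset as a stand-in for the full 28x28 MNIST images:

```python
# Flatten images to feature vectors, normalise, train, inspect confusions.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()  # 8x8 images, pixel values 0-16
X = digits.images.reshape(len(digits.images), -1) / 16.0  # flatten + scale to 0-1
y = digits.target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy:", round(accuracy_score(y_te, pred), 3))
print(confusion_matrix(y_te, pred))  # rows = true digit, cols = predicted
```

For full MNIST, swap in the 28x28 data (so 784-length vectors divided by 255) and compare against `sklearn.neural_network.MLPClassifier` for the neural-network bullet.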
Project 10 · Regression
⚡ Energy Consumption Predictor
Predict hourly electricity demand for a city using weather, time, and economic activity data.
  • Engineer cyclical time features: encode hour and month as sin/cos pairs
  • Handle weather outliers and missing sensor readings
  • Compare Gradient Boosting vs Linear Regression for energy forecasting
  • Report feature importances and explain which factors drive consumption peaks
Energy AI · Cyclical Features · Gradient Boosting · Regression
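The cyclical-encoding bullet, sketched for the hour feature: hour 23 and hour 0 are adjacent in time, and a sin/cos pair preserves that adjacency where a raw 0-23 integer does not.

```python
# Encode hour-of-day as a point on the unit circle.
import numpy as np
import pandas as pd

df = pd.DataFrame({"hour": range(24)})
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)

# Distance between hour 23 and hour 0 is now small — the same as
# between any other pair of adjacent hours.
gap = np.hypot(df.loc[23, "hour_sin"] - df.loc[0, "hour_sin"],
               df.loc[23, "hour_cos"] - df.loc[0, "hour_cos"])
print(round(gap, 3))
```

The same `sin`/`cos` trick applies to month (divide by 12) and day-of-week (divide by 7).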
Chapter 3 of 3
03
How to Build Your Capstone
The 6-step professional workflow every data scientist follows. Your capstone must demonstrate all 6 steps for both projects.
The 6-Step Build Process
Every project follows this structure. Each step builds on the previous. Skip none.
Step 01
📊 Data Loading & EDA
Load dataset. Print shape, dtypes, head(). Plot distributions. Check missing values. Compute correlation matrix.
  • df.info(), df.describe(), df.isnull().sum()
  • At least 3 visualisation charts
  • State 3 key observations from EDA
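The Step 01 checks, sketched on a tiny illustrative DataFrame (yours will come from the project dataset):

```python
# Basic EDA inspection calls for any freshly loaded DataFrame.
import pandas as pd

df = pd.DataFrame({"price": [250000, 180000, None, 320000],
                   "rooms": [3, 2, 2, 4]})

print(df.shape)           # (rows, columns)
print(df.dtypes)
print(df.describe())      # count, mean, std, quartiles per numeric column
print(df.isnull().sum())  # missing values per column
print(df.corr())          # correlation matrix (numeric columns)
```

Pair these printouts with histograms and a correlation heatmap to hit the three-chart requirement.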
Step 02
🧹 Data Cleaning & Features
Handle missing values, encode categoricals, create new features, normalise numeric columns.
  • fillna with median/mean/mode
  • LabelEncoder or one-hot for categoricals
  • At least 1 engineered feature
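A sketch of Step 02 with pandas (illustrative columns):

```python
# Median imputation for numerics, one-hot encoding for categoricals.
import pandas as pd

df = pd.DataFrame({"size_sqft": [1200.0, None, 900.0, 1500.0],
                   "city": ["Bangalore", "Pune", "Bangalore", "Delhi"]})

# Numeric: the median resists outliers better than the mean.
df["size_sqft"] = df["size_sqft"].fillna(df["size_sqft"].median())
# Categorical: each city becomes its own 0/1 indicator column.
df = pd.get_dummies(df, columns=["city"])

print(df)
```

An engineered feature for this layout might be a ratio such as price per square foot, computed as one new column from two existing ones.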
Step 03
⚙ Model Training
Train at least 2 different algorithms. Use train_test_split with random_state=42. Apply cross-validation.
  • Minimum 2 algorithms compared
  • cross_val_score with cv=5
  • Print mean ± std for each model
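Step 03 could be sketched like this on scikit-learn's built-in breast-cancer dataset:

```python
# Two algorithms, 5-fold cross-validation, mean ± std per model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "RandomForest": RandomForestClassifier(random_state=42),
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV accuracy
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the std alongside the mean shows whether a model's advantage is stable across folds or just noise.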
Step 04
📈 Evaluation
Use the right metrics for your problem type. Classification: F1, ROC-AUC. Regression: MSE, R².
  • classification_report or MSE + R²
  • Confusion matrix for classification
  • ROC-AUC with predict_proba
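Step 04 for a classifier, sketched on scikit-learn's built-in breast-cancer dataset:

```python
# Classification report, confusion matrix, and ROC-AUC from probabilities.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(classification_report(y_te, pred))
print(confusion_matrix(y_te, pred))
# ROC-AUC needs predicted probabilities, not hard 0/1 labels.
print("ROC-AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```

For a regression project the equivalent trio is MSE, RMSE, and R² from `sklearn.metrics`.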
Step 05
🔎 Insights & Analysis
Print feature importances. Explain which variables matter most. Discuss model limitations.
  • feature_importances_ bar chart
  • At least 3 business insights
  • State what would improve the model
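The feature-importance extraction in Step 05, sketched with a Random Forest; in the notebook, the bar chart is then a one-liner with `importances.plot.bar()`:

```python
# Rank which input features the fitted forest relies on most.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
clf = RandomForestClassifier(random_state=42).fit(data.data, data.target)

# feature_importances_ sums to 1.0 across all features.
importances = pd.Series(clf.feature_importances_, index=data.feature_names)
print(importances.sort_values(ascending=False).head(5))
```

Each top feature should then be translated into a plain-language business insight, not just listed.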
Step 06
📄 Report & Present
Write a structured report. Present findings in clear language a non-technical stakeholder can understand.
  • Executive summary (3 sentences)
  • Results table: model vs metric
  • Recommendation for production deployment
Grading Rubric
Each project is graded out of 50 points. Two projects = 100 points total. Pass threshold: 70 points.
15 pts
📊 EDA & Cleaning
Thorough EDA with charts, correct handling of missing values, feature engineering, data is ready for training
20 pts
⚙ Model & Evaluation
Minimum 2 models compared, cross-validation used, correct metric for problem type, results clearly reported
15 pts
📄 Report & Insights
Executive summary, feature importance analysis, at least 3 business insights, deployment recommendation
+5 bonus
🌟 Innovation
Bonus points for creative feature engineering, hyperparameter tuning, or going beyond the minimum requirements
70/100
🏅 Pass Threshold
Score 70 or above across both projects to earn the Synkoc AI/ML Internship Certificate of Completion
Jupyter
📄 Submission Format
Submit as .ipynb Jupyter Notebook with all cells executed, or Python .py file with full comments and output
Tips for Success
From Synkoc instructors who have reviewed hundreds of capstone projects. Do these and you will stand out.
🎯
Choose What Interests You
Pick projects from domains you find genuinely interesting. Enthusiasm shows in the quality of analysis. Your best work comes from real curiosity.
📊
Spend 40% on EDA
Most students rush to modelling. The best projects spend the most time understanding the data. Every insight you find in EDA makes the model better.
Always Compare 2+ Models
Never submit a project with only one model. Always compare at least two using cross-validation. This shows you know how to make evidence-based decisions.
🔎
Explain Your Choices
Why did you choose this metric? Why this model? Explaining your reasoning clearly is what separates junior from senior data scientists.
👥
Write for Non-Technical Readers
Your executive summary should be understandable by a business manager who knows nothing about ML. This is the most valuable skill in industry.
🌟
Go Beyond the Minimum
The minimum gets you a pass. Going beyond, with hyperparameter tuning, creative features, or extra visualisations, gets you a distinction and impresses future employers.
Everything You Have Mastered
In 4 weeks you covered the complete ML engineering stack. Your capstone is proof of mastery.
Week 1
🔨 Foundation
Python variables, loops, functions, data structures. Statistics: mean, std, probability, correlation, EDA.
Week 2
📈 Data Tools
NumPy arrays and vectorised ops. Pandas DataFrames, cleaning, groupby. Matplotlib and Seaborn visualisation. EDA pipeline.
Week 3
🤖 Machine Learning
Supervised: Linear Reg, Logistic Reg, Decision Tree, Random Forest. Unsupervised: KMeans, PCA. Evaluation: F1, CV, ROC-AUC.
Week 4
🌟 Capstone
End-to-end ML on real data. Choose 2 from 10 real-world projects. Build, evaluate, document, and present complete solutions.
🏅
Ready to Build?
You have the knowledge. You have the tools. Now prove it. Choose your 2 projects from the Capstone Lab, build complete pipelines, and earn your Synkoc AI/ML Internship Certificate.
✓ 3 Weeks of Training Done
✎ Choose 2 Projects — Now
🏅 Certificate on Completion
Good luck from the entire Synkoc team in Bangalore. We believe in you.
Synkoc IT Services · support@synkoc.com · +91-9019532023