Introduction to Machine Learning

Course Overview

A comprehensive introduction to machine learning for undergraduate students covering supervised learning, unsupervised learning, and practical ML skills. The course emphasizes both theoretical understanding and hands-on implementation of core ML algorithms.

Course Description

This course introduces the fundamental concepts, algorithms, and applications of machine learning. Students will learn how to train models to make predictions from data, understand when different approaches are appropriate, and gain practical experience through programming assignments and a capstone project. No prior machine learning experience required.

Learning Objectives

By the end of this course, students will be able to:

Understand fundamental ML concepts and problem formulations
Implement common ML algorithms from scratch
Apply scikit-learn and other ML libraries to real datasets
Evaluate and compare different models appropriately
Identify and address common pitfalls (overfitting, data leakage, etc.)
Communicate ML results effectively to technical and non-technical audiences
Recognize ethical considerations in ML applications

Tentative Course Outline

Part I: Foundations (Weeks 1-3)

Week 1: Introduction to Machine Learning

What is machine learning?
Types of learning: supervised, unsupervised, reinforcement
ML workflow and problem formulation
Applications across domains
Course logistics and expectations

Week 2: Mathematical Foundations

Linear algebra review (vectors, matrices, norms)
Probability and statistics essentials
Optimization basics (gradients, convexity)
Introduction to Python scientific stack (NumPy, Pandas, Matplotlib)

Week 3: Data Preprocessing & Exploratory Analysis

Data collection and quality assessment
Missing data handling
Feature scaling and normalization
Visualization techniques
Train/validation/test splits

Part II: Supervised Learning - Regression (Weeks 4-6)

Week 4: Linear Regression

Simple and multiple linear regression
Least squares estimation
Gradient descent optimization
Analytical vs. iterative solutions
Feature engineering

Week 5: Regularization & Model Selection

Overfitting and underfitting
Ridge regression (L2 regularization)
Lasso regression (L1 regularization)
Cross-validation techniques
Bias-variance tradeoff

Week 6: Beyond Linear Models

Polynomial regression
Basis function expansion
Locally weighted regression
Introduction to non-parametric methods

Part III: Supervised Learning - Classification (Weeks 7-10)

Week 7: Logistic Regression

Binary classification problem
Sigmoid function and log-odds
Maximum likelihood estimation
Multi-class classification (one-vs-rest, softmax)
Evaluation metrics: accuracy, precision, recall, F1, ROC-AUC

Week 8: Support Vector Machines

Linear SVMs and maximum margin
Kernel trick and kernel functions
Soft margin and C parameter
Practical considerations and tuning

Week 9: Decision Trees & Ensemble Methods

Decision tree learning (ID3, CART)
Entropy, information gain, Gini impurity
Random forests
Gradient boosting (XGBoost, LightGBM)
Feature importance

Week 10: Neural Networks & Deep Learning Intro

Perceptron and multi-layer perceptrons
Backpropagation algorithm
Activation functions
Deep learning frameworks (PyTorch/TensorFlow intro)
When to use deep learning vs. classical ML

Part IV: Unsupervised Learning (Weeks 11-12)

Week 11: Clustering

K-means clustering
Hierarchical clustering
DBSCAN and density-based methods
Evaluation metrics (silhouette score, elbow method)
Clustering applications

Week 12: Dimensionality Reduction

Principal Component Analysis (PCA)
Singular Value Decomposition (SVD)
t-SNE for visualization
Autoencoders (brief introduction)
Feature selection vs. feature extraction

Part V: Advanced Topics & Best Practices (Weeks 13-14)

Week 13: ML in Practice

Hyperparameter tuning strategies
Model interpretation and explainability
Dealing with imbalanced datasets
Data augmentation techniques
Pipeline design and MLOps basics

Week 14: Ethics, Fairness, & Societal Impact

Bias in ML systems
Fairness metrics and definitions
Privacy considerations
Responsible AI practices
Case studies of ML failures and successes

Part VI: Final Projects (Weeks 15-16)

Week 15: Project Development

Work on final projects
In-class consultations
Peer review sessions

Week 16: Final Presentations & Course Wrap-up

Student project presentations
Future directions in ML
Resources for continued learning

Prerequisites

Required:

Programming: Proficiency in Python or willingness to learn quickly
Mathematics:
- Calculus: derivatives, partial derivatives, chain rule
- Linear algebra: vectors, matrices, matrix multiplication
- Probability: random variables, expectations, distributions
- Statistics: mean, variance, correlation

Recommended:

Previous programming coursework
Data structures and algorithms
Experience with Jupyter notebooks

No prior ML experience required!

Course Format

Lectures: 2x per week (90 minutes each)
- Theory with intuitive explanations
- Worked examples and demos
- Interactive discussions
Lab Sections: Weekly (90 minutes)
- Hands-on coding exercises
- Dataset exploration
- Q&A and assignment help
Programming Assignments: 6 assignments
- Implement algorithms from scratch
- Apply ML libraries to real datasets
- Analyze and report results
Final Project: Team-based (2-4 students)
- End-to-end ML pipeline on chosen dataset
- Written report and presentation
- Emphasis on practical application

Assessment

Programming Assignments (45%): 6 assignments
- Assignment 1-3: 6% each (basics)
- Assignment 4-6: 9% each (more complex)
Midterm Exam (20%)
- Covers weeks 1-8
- Mix of conceptual and problem-solving questions
Final Project (30%)
- Proposal (5%)
- Final report (18%)
- Presentation (7%)
Participation & Quizzes (5%)
- In-class participation
- Short weekly quizzes on readings

Textbooks & Resources

Online Resources

Coursera ML courses (Andrew Ng)
Fast.ai practical deep learning course
Scikit-learn documentation and tutorials
Course website with supplementary materials

Software & Tools

Python 3.8+
Jupyter Notebooks / Google Colab
scikit-learn, pandas, NumPy
Matplotlib, seaborn for visualization
PyTorch or TensorFlow (final weeks)

Datasets

Students will work with various real-world datasets:

UCI ML Repository datasets
Kaggle competition datasets
Healthcare, finance, NLP, computer vision domains
Custom datasets from research collaborations

Course Policies

Collaboration Policy

Assignments: Discuss concepts freely, but write and submit your own code
Projects: Full collaboration within teams
Exams: Individual work only, closed book

Late Policy

Assignments: 10% penalty per day (up to 3 days late)
After 3 days: 0 credit unless prior arrangement
One “late pass” per student (waives penalty once)

Regrade Requests

Submit within 1 week of assignment return
Must include detailed written justification

Academic Integrity

Cite all external resources
Do not share solutions or code
Violations will be reported per university policy

Expected Workload

Lectures: 3 hours/week
Lab sections: 1.5 hours/week
Assignments: 6-10 hours/week (varies by assignment)
Exam preparation: 10-15 hours total
Final project: 15-20 hours total (last 4 weeks)
Average: ~10-12 hours/week

Learning Outcomes

Upon successful completion, students will have:

Technical Skills
- Ability to implement core ML algorithms
- Proficiency with scikit-learn and Python ML ecosystem
- Understanding of when to use different algorithms
Practical Experience
- End-to-end ML project experience
- Data preprocessing and feature engineering skills
- Model evaluation and selection expertise
Theoretical Foundation
- Understanding of mathematical foundations
- Ability to reason about model behavior
- Intuition for algorithm strengths and weaknesses
Professional Skills
- Communication of technical results
- Ethical awareness in ML applications
- Foundation for advanced ML coursework or industry work

Career Relevance

This course prepares students for:

Data scientist and ML engineer roles
Graduate studies in ML, AI, or related fields
Research positions in academia or industry
Applied ML roles across various industries

Office Hours & Support

Instructor office hours: 2x per week
TA office hours: 5x per week (staggered schedule)
Online discussion forum (Ed Discussion or similar)
Email for administrative questions only

Coming Soon

Detailed syllabus, lecture slides, assignment specifications, and dataset links will be published once the course is officially scheduled.