Algorithms for Data Science, Jan-May 2026
- Lecturer: Ravishankar Krishnaswamy
- Location: TBD
Course Objectives
The students should be able to:
- Identify whether a given algorithmic problem has structure that makes it efficiently solvable.
- Systematically analyse the complexity of algorithms, and grasp concepts of computational hardness such as NP-completeness.
- Identify the conditions for applying various methods for solving the problem.
- Be aware of various algorithmic techniques for modeling and solving problems (randomized algorithms, sketching, streaming, greedy, divide and conquer, dynamic programming, etc.).
- Deconstruct data-science problems into their fundamental computational components and build algorithms for the same.
Topics Covered (Week-wise)
Complexity Theory (Week 1)
- P vs NP, NP-Hardness, NP-Completeness. Formal definitions of complexity classes and the relationships between them.
- Cook-Levin Theorem. The foundational result showing that SAT is NP-Complete.
- Reduction of MIS to Classroom Scheduling. Demonstrating hardness of scheduling via Maximum Independent Set.
- Reduction of 3SAT to k-Clique. A classic polynomial-time reduction between NP-Complete problems.
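To make the reduction idea concrete, here is a minimal Python sketch of the standard 3SAT-to-k-Clique construction (the literal encoding as signed integers, DIMACS-style, is an illustrative choice, not the course's notation); a brute-force clique check is included only to test tiny instances:

```python
from itertools import combinations

def threesat_to_kclique(clauses):
    """Standard 3SAT -> k-Clique reduction: one vertex per (clause, literal)
    occurrence; connect vertices in different clauses whose literals are not
    negations of each other. The formula is satisfiable iff the resulting
    graph has a clique of size k = number of clauses.

    Literals are nonzero ints (-x means "not x")."""
    vertices = [(ci, lit) for ci, clause in enumerate(clauses) for lit in clause]
    edges = {
        (u, v)
        for u, v in combinations(vertices, 2)
        if u[0] != v[0] and u[1] != -v[1]   # different clause, consistent literals
    }
    return vertices, edges, len(clauses)

def has_k_clique(vertices, edges, k):
    """Brute-force clique check (exponential time; only for tiny instances)."""
    adj = lambda u, v: (u, v) in edges or (v, u) in edges
    return any(
        all(adj(u, v) for u, v in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )
```

The reduction runs in polynomial time, while `has_k_clique` is exponential, which is exactly the asymmetry the reduction argument exploits.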
Divide and Conquer (Weeks 2–3)
- Merge Sort. The canonical divide-and-conquer sorting algorithm running in O(n log n) time.
- Linear Time Median Finding. Deterministic selection using the median-of-medians strategy.
- Closest Pair Problem in 2D. Finding the closest pair of points in the plane in O(n log n) time.
- Karatsuba Multiplication. Sub-quadratic integer multiplication via clever recursive decomposition.
- Strassen's Algorithm. Matrix multiplication in sub-cubic time using divide and conquer.
- Discrete FFT. The Fast Fourier Transform for efficient DFT computation, using the recursive even/odd decomposition and the structure of the complex roots of unity.
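As a small taste of the divide-and-conquer pattern these topics share, here is a sketch of merge sort in Python (not the course's reference implementation): split in half, recurse, and combine in linear time, giving the O(n log n) recurrence T(n) = 2T(n/2) + O(n).

```python
def merge_sort(a):
    """Sort a list in O(n log n) time by recursively splitting and merging."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Combine step: merge the two sorted halves in linear time.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```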
Dynamic Programming (Weeks 4–5)
- Fibonacci Numbers. Illustrating memoization and bottom-up computation on a simple recurrence.
- Longest Common Subsequence. Classic DP on two sequences with applications in diff and bioinformatics.
- Matrix Chain Multiplication. Optimal parenthesization to minimize the total number of scalar multiplications.
- Single Sink Shortest Path on DAG. Computing shortest paths to a target in a DAG using dynamic programming over a topological order.
- Bellman-Ford Algorithm. Shortest paths in graphs with negative edge weights via iterative relaxation.
- Markov Decision Process. Optimal policy computation using value iteration and the Bellman equation, following on from the previous lecture (RL is essentially DP++).
- Backpropagation. Efficient gradient computation in neural networks as a DP over the computation graph.
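The DP pattern behind several of these topics can be sketched with longest common subsequence in Python (a sketch, not the course's code): a table indexed by prefix lengths, filled bottom-up so each entry reuses previously computed subproblems.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence via bottom-up DP.

    dp[i][j] = LCS length of x[:i] and y[:j]; runs in O(len(x) * len(y)).
    """
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Last characters match: extend the best LCS of the prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop one character from either sequence.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```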
Greedy Algorithms (Weeks 6–7)
- Interval Scheduling. Maximizing the number of non-overlapping intervals by greedy earliest finish time.
- Minimum Spanning Tree (Kruskal's). Building an MST by greedily adding lightest edges using Union-Find.
- Max k-Coverage. A greedy approximation algorithm for selecting k sets to maximize coverage.
- Nearest Neighbour Approximation in n-Dimensions. Greedy heuristics for proximity problems in high-dimensional spaces.
- Huffman Coding. Constructing optimal prefix-free codes via a greedy bottom-up tree construction.
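The flavor of a greedy exchange argument is easiest to see in interval scheduling; a minimal Python sketch of the earliest-finish-time rule (intervals as `(start, end)` pairs is an assumption of this sketch):

```python
def max_nonoverlapping(intervals):
    """Greedy earliest-finish-time rule: repeatedly take the interval that
    ends first among those compatible with what has been chosen so far."""
    count, last_end = 0, float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:   # compatible with the current selection
            count += 1
            last_end = end
    return count
```

The exchange argument shows this is optimal: the earliest-finishing interval can always be swapped into any optimal solution without losing anything.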
Randomized Algorithms (Weeks 8–9)
- Basics of Probability. Introduction to Markov, Chebyshev, and Chernoff bounds.
- Randomized Mincut. Karger's algorithm for finding the minimum cut in a graph using random edge contractions.
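A sketch of Karger's contraction algorithm in Python (the edge-list input, Union-Find contraction, and trial count are illustrative choices for this sketch): contracting edges in a uniformly random order, skipping self-loops, is equivalent to repeated uniform random edge contraction, and each trial finds a fixed minimum cut with probability at least 2/(n(n-1)), so independent repetitions boost the success probability.

```python
import random

def karger_mincut(edges, n, trials=200, seed=0):
    """Karger's randomized mincut on vertices 0..n-1 given an edge list.

    Each trial contracts random edges until two super-vertices remain;
    the edges crossing between them form a cut."""
    rng = random.Random(seed)
    best = len(edges)
    for _ in range(trials):
        parent = list(range(n))        # Union-Find for this trial

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        pool = edges[:]
        rng.shuffle(pool)              # random contraction order
        remaining = n
        for u, v in pool:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:               # skip self-loops inside a super-vertex
                parent[ru] = rv        # contract the edge
                remaining -= 1
        # Count original edges crossing the two final super-vertices.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best
```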
Graph Algorithms (Week 10)
- Max-Flow Min-Cut. The Ford-Fulkerson algorithm for computing maximum flows and the duality with minimum cuts.
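A sketch of Ford-Fulkerson with BFS augmenting paths (i.e., the Edmonds-Karp variant; the dict-of-dicts capacity representation is an assumption of this sketch, not the course's): augment along shortest residual paths until none exists, at which point the flow value equals the capacity of a minimum cut.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with shortest (BFS) augmenting paths.

    `capacity` is a dict-of-dicts: capacity[u][v] = capacity of edge u -> v."""
    # Residual graph: forward capacities plus reverse edges of capacity 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow   # no augmenting path: flow value equals the min cut
        # Find the bottleneck capacity along the path, then push that much.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck   # allow flow to be undone later
            v = u
        flow += bottleneck
```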
Randomization, Hashing, and Sketching (Weeks 10–11)
- Randomized Matrix Multiplication Verification. Using randomization to verify the correctness of matrix multiplication efficiently.
- Universal Hashing. Using randomization to design hash functions with good average-case performance.
- Sketching Algorithms. Count-Min Sketch for frequency estimation, and dimensionality reduction via the Johnson-Lindenstrauss Lemma.
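A minimal Count-Min Sketch in Python to illustrate the idea (the width/depth defaults and the salted-`hash` row functions are illustrative stand-ins, not a universal hash family from the lectures): estimates never undercount, and taking the minimum over rows controls the overcount.

```python
import random

class CountMinSketch:
    """Count-Min Sketch: `depth` hash rows of `width` counters each.

    Estimates never undercount; with width ~ e/eps and depth ~ ln(1/delta),
    the overcount is at most eps * (total count) with probability 1 - delta."""

    def __init__(self, width=272, depth=5, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def _cell(self, row, item):
        # One (salted) hash function per row; a stand-in for universal hashing.
        return hash((self.salts[row], item)) % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.table[r][self._cell(r, item)] += count

    def estimate(self, item):
        # Every row only overestimates, so the minimum is the best guess.
        return min(self.table[r][self._cell(r, item)] for r in range(self.depth))
```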
Randomized Linear Algebra and Parallel Algorithms (Week 12)
- Randomized Linear Algebra. Randomized approximate matrix multiplication via row/column sampling.
- Parallel Algorithms. Prefix sums (scan) with small work and depth as a building block for parallel computation.
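A sequential sketch of the work-efficient (Blelloch) exclusive scan, written as the two tree sweeps a parallel implementation would run (the power-of-two-length restriction is a simplification of this sketch): within each level, the inner loop's iterations are independent and could run in parallel, giving O(n) work and O(log n) depth.

```python
def exclusive_scan(a):
    """Blelloch exclusive prefix sum: an up-sweep builds a reduction tree
    of partial sums in place, then a down-sweep pushes prefixes back down."""
    n = len(a)
    assert n and n & (n - 1) == 0, "this sketch assumes a power-of-two length"
    t = a[:]
    # Up-sweep: each level combines pairs d apart; levels are sequential,
    # but the iterations within a level are independent (parallelizable).
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            t[i] += t[i - d]
        d *= 2
    # Down-sweep: clear the root, then swap-and-add back down the tree.
    t[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t[i - d], t[i] = t[i], t[i] + t[i - d]
        d //= 2
    return t
```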
Announcements
- Course details coming soon.