Algorithms for Data Science, Jan-May 2026
- Lecturer: Ravishankar Krishnaswamy
- Location: TBD
Course Objectives
The students should be able to:
- Identify whether a given algorithmic problem has structure that makes it efficiently solvable.
- Systematically analyse the complexity of algorithms, and grasp concepts of computational hardness such as NP-completeness.
- Identify the conditions for applying various methods for solving the problem.
- Be aware of various algorithmic techniques for modeling and solving problems (randomized algorithms, sketching, streaming, greedy, divide and conquer, dynamic programming, etc.).
- Deconstruct data-science problems into their fundamental computational components and build algorithms for the same.
Topics Covered (Week-wise)
Complexity Theory (Week 1)
- P vs NP, NP-Hardness, NP-Completeness. Formal definitions of complexity classes and the relationships between them.
- Cook-Levin Theorem. The foundational result showing that SAT is NP-Complete.
- Reduction of MIS to Classroom Scheduling. Demonstrating hardness of scheduling via Maximum Independent Set.
- Reduction of 3SAT to k-Clique. A classic polynomial-time reduction between NP-Complete problems.
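To make the reduction idea concrete, here is a minimal Python sketch of the standard 3SAT-to-k-Clique construction (the literal encoding as signed integers, DIMACS-style, is an illustrative choice, not the course's notation); a brute-force clique check is included only to test tiny instances:

```python
from itertools import combinations

def threesat_to_kclique(clauses):
    """Standard 3SAT -> k-Clique reduction: one vertex per (clause, literal)
    occurrence; connect vertices in different clauses whose literals are not
    negations of each other. The formula is satisfiable iff the resulting
    graph has a clique of size k = number of clauses.

    Literals are nonzero ints (-x means "not x")."""
    vertices = [(ci, lit) for ci, clause in enumerate(clauses) for lit in clause]
    edges = {
        (u, v)
        for u, v in combinations(vertices, 2)
        if u[0] != v[0] and u[1] != -v[1]   # different clause, consistent literals
    }
    return vertices, edges, len(clauses)

def has_k_clique(vertices, edges, k):
    """Brute-force clique check (exponential time; only for tiny instances)."""
    adj = lambda u, v: (u, v) in edges or (v, u) in edges
    return any(
        all(adj(u, v) for u, v in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )
```

The reduction runs in polynomial time, while `has_k_clique` is exponential, which is exactly the asymmetry the reduction argument exploits.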
Divide and Conquer (Weeks 2–3)
- Merge Sort. The canonical divide-and-conquer sorting algorithm running in O(n log n) time.
- Linear Time Median Finding. Deterministic selection using the median-of-medians strategy.
- Closest Pair Problem in 2D. Finding the closest pair of points in the plane in O(n log n) time.
- Karatsuba Multiplication. Sub-quadratic integer multiplication via clever recursive decomposition.
- Strassen's Algorithm. Matrix multiplication in sub-cubic time using divide and conquer.
- Discrete FFT. The Fast Fourier Transform for efficient DFT computation, using the recursive even/odd decomposition and the structure of the complex roots of unity.
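As a small taste of the divide-and-conquer pattern these topics share, here is a sketch of merge sort in Python (not the course's reference implementation): split in half, recurse, and combine in linear time, giving the O(n log n) recurrence T(n) = 2T(n/2) + O(n).

```python
def merge_sort(a):
    """Sort a list in O(n log n) time by recursively splitting and merging."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Combine step: merge the two sorted halves in linear time.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```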
Dynamic Programming (Weeks 4–5)
- Fibonacci Numbers. Illustrating memoization and bottom-up computation on a simple recurrence.
- Longest Common Subsequence. Classic DP on two sequences with applications in diff and bioinformatics.
- Matrix Chain Multiplication. Optimal parenthesization to minimize the total number of scalar multiplications.
- Single Sink Shortest Path on DAG. Computing shortest paths to a target in a DAG using dynamic programming over a topological order.
- Bellman-Ford Algorithm. Shortest paths in graphs with negative edge weights via iterative relaxation.
- Markov Decision Process. Optimal policy computation using value iteration and the Bellman equation, following on from the previous lecture (RL is essentially DP++).
- Backpropagation. Efficient gradient computation in neural networks as a DP over the computation graph.
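The DP pattern behind several of these topics can be sketched with longest common subsequence in Python (a sketch, not the course's code): a table indexed by prefix lengths, filled bottom-up so each entry reuses previously computed subproblems.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence via bottom-up DP.

    dp[i][j] = LCS length of x[:i] and y[:j]; runs in O(len(x) * len(y)).
    """
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Last characters match: extend the best LCS of the prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop one character from either sequence.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```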
Greedy Algorithms (Weeks 6–7)
- Interval Scheduling. Maximizing the number of non-overlapping intervals by greedy earliest finish time.
- Minimum Spanning Tree (Kruskal's). Building an MST by greedily adding lightest edges using Union-Find.
- Max k-Coverage. A greedy approximation algorithm for selecting k sets to maximize coverage.
- Nearest Neighbour Approximation in n-Dimensions. Greedy heuristics for proximity problems in high-dimensional spaces.
- Huffman Coding. Constructing optimal prefix-free codes via a greedy bottom-up tree construction.
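The flavor of a greedy exchange argument is easiest to see in interval scheduling; a minimal Python sketch of the earliest-finish-time rule (intervals as `(start, end)` pairs is an assumption of this sketch):

```python
def max_nonoverlapping(intervals):
    """Greedy earliest-finish-time rule: repeatedly take the interval that
    ends first among those compatible with what has been chosen so far."""
    count, last_end = 0, float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:   # compatible with the current selection
            count += 1
            last_end = end
    return count
```

The exchange argument shows this is optimal: the earliest-finishing interval can always be swapped into any optimal solution without losing anything.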
Randomized Algorithms (Weeks 8–9)
- Basics of Probability. Introduction to Markov, Chebyshev, and Chernoff bounds.
- Randomized Mincut. Karger's algorithm for finding the minimum cut in a graph using random edge contractions.
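A sketch of Karger's contraction algorithm in Python (the edge-list input, Union-Find contraction, and trial count are illustrative choices for this sketch): contracting edges in a uniformly random order, skipping self-loops, is equivalent to repeated uniform random edge contraction, and each trial finds a fixed minimum cut with probability at least 2/(n(n-1)), so independent repetitions boost the success probability.

```python
import random

def karger_mincut(edges, n, trials=200, seed=0):
    """Karger's randomized mincut on vertices 0..n-1 given an edge list.

    Each trial contracts random edges until two super-vertices remain;
    the edges crossing between them form a cut."""
    rng = random.Random(seed)
    best = len(edges)
    for _ in range(trials):
        parent = list(range(n))        # Union-Find for this trial

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        pool = edges[:]
        rng.shuffle(pool)              # random contraction order
        remaining = n
        for u, v in pool:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:               # skip self-loops inside a super-vertex
                parent[ru] = rv        # contract the edge
                remaining -= 1
        # Count original edges crossing the two final super-vertices.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best
```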
Graph Algorithms (Week 10)
- Max-Flow Min-Cut. The Ford-Fulkerson algorithm for computing maximum flows and the duality with minimum cuts.
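A sketch of Ford-Fulkerson with BFS augmenting paths (i.e., the Edmonds-Karp variant; the dict-of-dicts capacity representation is an assumption of this sketch, not the course's): augment along shortest residual paths until none exists, at which point the flow value equals the capacity of a minimum cut.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with shortest (BFS) augmenting paths.

    `capacity` is a dict-of-dicts: capacity[u][v] = capacity of edge u -> v."""
    # Residual graph: forward capacities plus reverse edges of capacity 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow   # no augmenting path: flow value equals the min cut
        # Find the bottleneck capacity along the path, then push that much.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck   # allow flow to be undone later
            v = u
        flow += bottleneck
```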
Randomization, Hashing, and Sketching (Weeks 10–11)
- Randomized Matrix Multiplication Verification. Using randomization to verify the correctness of matrix multiplication efficiently.
- Universal Hashing. Using randomization to design hash functions with good average-case performance.
- Sketching Algorithms. Count-Min Sketch for frequency estimation, and dimensionality reduction via the Johnson-Lindenstrauss Lemma.
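A minimal Count-Min Sketch in Python to illustrate the idea (the width/depth defaults and the salted-`hash` row functions are illustrative stand-ins, not a universal hash family from the lectures): estimates never undercount, and taking the minimum over rows controls the overcount.

```python
import random

class CountMinSketch:
    """Count-Min Sketch: `depth` hash rows of `width` counters each.

    Estimates never undercount; with width ~ e/eps and depth ~ ln(1/delta),
    the overcount is at most eps * (total count) with probability 1 - delta."""

    def __init__(self, width=272, depth=5, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def _cell(self, row, item):
        # One (salted) hash function per row; a stand-in for universal hashing.
        return hash((self.salts[row], item)) % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.table[r][self._cell(r, item)] += count

    def estimate(self, item):
        # Every row only overestimates, so the minimum is the best guess.
        return min(self.table[r][self._cell(r, item)] for r in range(self.depth))
```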
Randomized Linear Algebra and Parallel Algorithms (Week 12)
- Randomized Linear Algebra. Randomized approximate matrix multiplication via row/column sampling.
- Parallel Algorithms. Prefix sums (scan) with small work and depth as a building block for parallel computation.
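A sequential sketch of the work-efficient (Blelloch) exclusive scan, written as the two tree sweeps a parallel implementation would run (the power-of-two-length restriction is a simplification of this sketch): within each level, the inner loop's iterations are independent and could run in parallel, giving O(n) work and O(log n) depth.

```python
def exclusive_scan(a):
    """Blelloch exclusive prefix sum: an up-sweep builds a reduction tree
    of partial sums in place, then a down-sweep pushes prefixes back down."""
    n = len(a)
    assert n and n & (n - 1) == 0, "this sketch assumes a power-of-two length"
    t = a[:]
    # Up-sweep: each level combines pairs d apart; levels are sequential,
    # but the iterations within a level are independent (parallelizable).
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            t[i] += t[i - d]
        d *= 2
    # Down-sweep: clear the root, then swap-and-add back down the tree.
    t[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t[i - d], t[i] = t[i], t[i] + t[i - d]
        d //= 2
    return t
```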
Announcements
- Course details coming soon.