CPSC/AMTH 445/545 - Introduction to Data Mining - Fall 2017 Yale

Instructor: Guy Wolf (guy.wolf@yale.edu)

TA: Jay Stanley (jay.stanley@yale.edu)
ULAs: Tyler Dohrn (tyler.dohrn@yale.edu) & Scott Stankey (scott.stankey@yale.edu)

The ability to process large amounts of data and extract insightful information from them has become a desirable, if not necessary, skill in almost every field of industry and science. Among other benefits, such information can provide useful knowledge, support decision-making, uncover hidden trends, and enable deeper understanding of observed phenomena. This course covered some of the main problems and challenges encountered in data analysis and its applications, and provided fundamental tools and techniques for solving them. We discussed popular algorithms for data organization and visualization, such as principal component analysis (PCA) and multidimensional scaling (MDS). Students became familiar with a variety of machine learning and data mining approaches. These included both supervised approaches, such as classification (e.g., with decision trees, Bayesian classifiers, and support vector machines (SVMs)), and unsupervised ones, such as clustering (e.g., with k-means, density estimators, and linkage-based agglomeration).

The lectures and class discussions were accompanied by homework exercises that combined theoretical questions, which emphasized understanding of the underlying data mining principles, with programming tasks (e.g., in MATLAB and/or Python) that demonstrated practical implementations of the techniques studied. Grades in this course were based on these exercises, a project, and an exam.

The course assumed basic prior knowledge of probability, linear algebra, data structures, algorithms, and programming.


There was no required textbook, but the following books were recommended for the course:


This is a list of topics covered by this course:


Extra topics (slides not prepared specifically for this course):