The ability to process and extract insightful information from large amounts of data has become a desirable, if not necessary, skill in almost every field of industry and science. Among other benefits, such information can provide useful knowledge, support decision-making, uncover hidden trends, and enable deeper understanding of observed phenomena. This course covered some of the main problems and challenges encountered in data analysis and its applications, and provided fundamental tools and techniques for solving them. We discussed popular algorithms for data organization and visualization, such as principal component analysis (PCA) and multidimensional scaling (MDS). Students became familiar with a variety of machine learning and data mining approaches. These included both supervised approaches, such as classification (e.g., with decision trees, Bayesian classifiers, and support vector machines (SVMs)), and unsupervised ones, such as clustering (e.g., with k-means, density estimators, and linkage-based agglomeration).
The lectures and class discussions were accompanied by homework exercises that combined theoretical questions, which emphasized understanding of the underlying data mining principles, with programming tasks (e.g., in MATLAB and/or Python) that demonstrated practical implementations of the techniques studied. Grades in this course were based on these exercises, a project, and an exam.
The course assumed basic prior knowledge of probability, linear algebra, data structures, algorithms, and programming.