Traditional data mining techniques focus mainly on classification, regression, and clustering problems. However, recent developments in information and communication technology (ICT) have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:
processing huge (terabyte- or petabyte-scale) data sets,
real-time analysis of data streams (internet traffic, sensor data, electronic transactions, etc.),
searching for similar pairs of objects such as texts, images, songs, etc., in huge collections of such objects,
finding anomalies in data,
clustering of massive sets of records,
recommendation systems,
reduction of data dimensionality,
applications of deep learning to data mining.
During the course you will learn several techniques, algorithms and tools for addressing these new and challenging data mining problems:
Recommender Systems: Collaborative Filtering, Matrix Factorization
Algorithms for dimensionality reduction: LLE, t-SNE, UMAP
Random Forest and XGBoost: two of the most popular tree-based algorithms for classification and regression
Algorithms for detecting anomalies in data
Locality Sensitive Hashing (LSH): a general technique for finding similar items in huge collections of items
Algorithms for mining data streams: sampling, filtering (Bloom filters), probabilistic counting
Applications of Deep Learning to data mining
Distributed Processing of Massive Data: Hadoop, MapReduce, Spark
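To give a flavour of the stream-mining techniques listed above, here is a minimal sketch of a Bloom filter, a compact structure for testing whether an item has been seen before. It is an illustration only, not course material: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 digests as hash functions are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # The item may be present only if every one of its positions is set.
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for url in ["a.example/1", "a.example/2"]:
    seen.add(url)

print("a.example/1" in seen)  # True: added items are always reported as present
```

An item that was never added is usually reported as absent, but it can occasionally be reported as present; how often that happens depends on `num_bits`, `num_hashes`, and the number of items stored.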
Outcome:
After completing the course, students should:
know the most successful algorithms and techniques used in data mining;
gain some hands-on experience with several algorithms for mining complex data sets;
be able to apply the acquired knowledge and skills to new problems.