Information retrieval and text mining  

Today’s search engines make it much easier than before to find the needle in the haystack. But how do you find out what the needle looks like and where the haystack is? That is exactly the problem we will discuss in this course. Standard information retrieval (search) techniques require users to know what they are looking for, whereas text mining attempts to discover information that is not known beforehand. This is very relevant, for example, in criminal investigations, legal discovery, (business) intelligence, sentiment and emotion mining, and clinical research.

Text mining refers generally to the process of extracting interesting and non-trivial information and knowledge from unstructured text. It draws on several computer science disciplines with a strong orientation towards artificial intelligence, including but not limited to information retrieval (building a search engine), statistical pattern recognition, natural language processing, information extraction, different methods of machine learning (including deep learning), and clustering. Ultimately, these are integrated using advanced data visualization and chatbots to make the search experience easier and better.

In this course we will also discuss ethical aspects of using artificial intelligence for the above tasks, including the need for eXplainable AI (XAI), training deep-learning large language models in a more energy-efficient way, and several ethical problems that may arise related to bias and to legal, regulatory and privacy challenges.

This course (IRTM) is closely related to the course Advanced Natural Language Processing (ANLP). In the ANLP course, the focus is more on advanced methods and architectures to deal with complex natural language tasks such as machine translation and Q&A systems. IRTM focusses more on building search engines and using text analytics to improve the search experience.
In the IRTM course, we will use a number of the architectures that are discussed in more detail in ANLP. The overlap between the two courses is kept to a minimum. There is no need to follow the courses in a specific order.

Prerequisites: None.

Recommended reading: Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008. Available in bookstores and online: http://informationretrieval.org.

More information at: https://curriculum.maastrichtuniversity.nl/meta/464235/information-retrieval-and-text-mining
In person
English

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or HaDEA. Neither the European Union nor the granting authority can be held responsible for them. The statements made herein do not necessarily have the consent or agreement of the ASTRAIOS Consortium. These represent the opinion and findings of the author(s).