Large amounts of data are produced continuously, and processing them is a complex task from both a theoretical and a technical perspective. The potential benefits of data increase when it is integrated from heterogeneous sources and processed in near real-time, minimizing the latency with which useful information and knowledge become available. Stream processing, context processing, and system adaptation technologies serve this purpose. This course covers data integration technologies, with an emphasis on data stream processing and integration technologies such as Apache Spark and Apache Kafka. These are viewed in the context of the data life-cycle, which includes data integration, processing and interpretation, and the use of the acquired information to adapt systems in near real-time. In data integration, the logical integration process and the infrastructure solutions play equally important roles, since data is integrated in a distributed, horizontally scalable environment. Near real-time stream integration and system adaptation use cases based on Apache Spark, Apache Kafka, Apache Cassandra, Docker and Apache CloudStack are covered as part of the course.
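As an illustrative sketch of the kind of stream integration exercise the course targets, the following Spark Structured Streaming job consumes messages from a Kafka topic and maintains a running count per key. The broker address, topic name, and local execution mode are assumptions made for the example, not part of the course material, and the job additionally requires the spark-sql-kafka connector on the classpath.

```scala
// Minimal sketch: read a Kafka topic with Spark Structured Streaming and
// print a running count per key. Broker address and topic name are assumed.
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .master("local[*]")               // local mode for demonstration only
      .getOrCreate()

    import spark.implicits._

    // Subscribe to a Kafka topic; Spark treats the stream as an unbounded table.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "sensor-events")                // assumed topic
      .load()

    // Kafka delivers keys and values as bytes; cast to strings and count per key.
    val counts = raw
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
      .groupBy($"key")
      .count()

    // Emit updated counts to the console in micro-batches (near real-time).
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```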
Outcomes:
Ability to choose the most suitable data stream integration technology - Exam
Ability to define a data integration solution at the logical level - Exam and practical assignment
Ability to integrate data streams - Practical assignment