For the data fusion, the input data is collected, processed and analysed in a Big Data pipeline (see figure). In this project, the term “Big Data” refers not to the volume of data generated, but to the complex analysis, processing and utilisation of data from different sources. In particular, the aim is to identify statistical correlations, patterns and relationships in users’ route choices within the public transport network.
The back-end system developed by INIT is based on an Apache Kafka data infrastructure. Apache Kafka is open source software for transferring and storing large data streams; here it acts as a data broker between the data producers (input data) and the data consumers (IT partner systems). Apache Flink is used to transform data streams and to link data sources. Apache Beam serves as a unified application programming interface (API) so that the same algorithms can be used in different processes. A major advantage of these technologies is their scalability and their capacity for real-time processing. This makes it possible to analyse a continuous stream of events in a traffic network, to take traffic control measures in real time and, in the future, to provide occupancy levels in passenger information, for example.
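The kind of stream transformation described above can be illustrated with a small sketch. It is plain Python and deliberately does not use the Kafka, Flink or Beam APIs; it only mimics what a windowed streaming job might compute, namely the passenger load of each vehicle at the end of each time window. The event schema (`CountEvent` and its fields), the function name `occupancy_per_window` and the 60-second window are assumptions made for illustration and are not taken from the project.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class CountEvent:
    """Hypothetical passenger-counting event (schema is an assumption)."""
    timestamp: int    # seconds since the start of the service day
    vehicle_id: str
    boardings: int
    alightings: int


def occupancy_per_window(
    events: Iterable[CountEvent], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Return the load of each vehicle at the end of each tumbling window.

    Keys are (window_start, vehicle_id). Only windows in which a vehicle
    reported at least one event appear in the result.
    """
    load: Dict[str, int] = defaultdict(int)   # running load per vehicle
    result: Dict[Tuple[int, str], int] = {}
    # A real stream processor handles out-of-order events with watermarks;
    # this sketch simply sorts the finite input by event time.
    for event in sorted(events, key=lambda e: e.timestamp):
        load[event.vehicle_id] += event.boardings - event.alightings
        window_start = event.timestamp - event.timestamp % window_seconds
        # Later events in the same window overwrite earlier values, so the
        # stored value is the load after the window's last event.
        result[(window_start, event.vehicle_id)] = load[event.vehicle_id]
    return result


if __name__ == "__main__":
    sample = [
        CountEvent(5, "bus-1", boardings=10, alightings=0),
        CountEvent(50, "bus-1", boardings=3, alightings=2),
        CountEvent(65, "bus-2", boardings=5, alightings=0),
        CountEvent(70, "bus-1", boardings=0, alightings=4),
    ]
    for key, value in sorted(occupancy_per_window(sample).items()):
        print(key, value)
```

In a deployment of the kind described here, the events would presumably arrive from a Kafka topic and the aggregation would run continuously inside a stream processor rather than over a finite list, but the per-window logic would be similar in spirit.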