Final Thesis: Giving Structure to Open Data in the JValue ODS

Abstract: The internet now provides a large amount of open data for public use. These data come in various formats and cover many subjects. The main problem is the absence of a standard: every provider can decide for themselves how their data is structured.

The JValue project is dedicated to this problem and aims to be the central point where open data are gathered and optimized. Currently, the JValue Open-Data-Service (ODS) provides extraction, transformation, and retrieval of open data and supports numerous protocols and data formats.

However, so far there is only a very generic interface for retrieving these open data, since the system currently ignores any data structure. In addition, any provider can alter their data structure and upload the adjusted data, since they are not bound to any restrictions. This can severely restrict the data gathering process or even break it entirely.

To counteract this, a process is to be introduced that allows the ODS to structure these open data. Furthermore, a schema recommendation is to be generated for the data, which then serves as the foundation for the rest of the data gathering process.

With the data schema in place, it also becomes possible to derive matching database tables from it. These tables should be created and filled dynamically and give the user a complete and easily accessible interface. Because the data is then persisted in structured form, the previously mentioned problem of frequently changing data structures can also be addressed: the schema can be used to validate the imported and transformed data. By additionally showing a corresponding visual state for each data configuration, the user is able to react to changed data structures.
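
The ideas in this abstract (recommending a schema from sample data, validating later imports against it, and deriving a database table from it) can be illustrated with a minimal sketch. It is not the ODS implementation: the function names, the flat field-to-type schema, and the SQL type mapping are all assumptions made for this example, and nested data is not handled.

```python
from typing import Any

# Hypothetical mapping from Python value types to schema type names.
TYPE_NAMES = {bool: "boolean", int: "integer", float: "number", str: "string"}

# Hypothetical mapping from schema type names to SQL column types.
SQL_TYPES = {
    "boolean": "BOOLEAN",
    "integer": "BIGINT",
    "number": "DOUBLE PRECISION",
    "string": "TEXT",
}


def recommend_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a flat field -> type schema from sample records."""
    schema: dict[str, str] = {}
    for record in records:
        for field, value in record.items():
            # Unknown value types (None, lists, ...) fall back to "string".
            type_name = TYPE_NAMES.get(type(value), "string")
            previous = schema.setdefault(field, type_name)
            if previous != type_name:
                # Widen integer/number mixes to number, anything else to string.
                mixed = {previous, type_name}
                schema[field] = "number" if mixed == {"integer", "number"} else "string"
    return schema


def validate(record: dict[str, Any], schema: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    errors: list[str] = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        actual = TYPE_NAMES.get(type(record[field]), "string")
        # An integer is accepted where a number is expected.
        if actual != expected and not (actual == "integer" and expected == "number"):
            errors.append(f"{field}: expected {expected}, got {actual}")
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


def create_table_sql(table: str, schema: dict[str, str]) -> str:
    """Derive a CREATE TABLE statement from the schema."""
    columns = ", ".join(f'"{name}" {SQL_TYPES[t]}' for name, t in schema.items())
    return f'CREATE TABLE "{table}" ({columns});'
```

In this sketch, a provider renaming a field shows up as one "missing field" and one "unexpected field" error, which is the kind of signal a visual state on the data configuration could surface to the user.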

Keywords: data engineering, schema recommendation, open data

PDF: Master Thesis

Reference: Alexander Mahler. Giving Structure to Open Data in the JValue ODS. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.

Final Thesis: Implementing an Open Data ETL Processing Engine with Kafka

Abstract: The JValue project group is developing a modeling ecosystem for Extract Transform Load (ETL) processes. Part of this ecosystem is a description model for these processes. This thesis suggests a conversion process from the description model into an Apache Kafka runtime, described in a cloud-native format such as Docker Compose. The conversion is implemented as a library and uses a multi-phase approach, as known from classical compilers. In the first step, the description language is converted into a runtime-independent intermediate description, and afterward into a description of a concrete runtime, in this case Kafka. The multi-phase approach minimizes the implementation work for additional runtimes and allows runtime-independent optimization and analysis. The goal for the generated runtime is to use existing Kafka components, which is only partially possible due to the complexity of the description model.
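
The two-phase idea can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's library: the pipeline description format, the `Step` type, the topic naming scheme, and the `example/...-worker` image names are all hypothetical, and the Kafka broker service itself is left out of the generated structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One runtime-independent node of the intermediate description."""
    name: str
    kind: str                # "extract", "transform", or "load"
    reads: Optional[str]     # upstream channel, None for the first step
    writes: Optional[str]    # downstream channel, None for the last step


def to_intermediate(pipeline: dict) -> list[Step]:
    """Phase 1: lower a (hypothetical) linear pipeline description into steps."""
    transforms = pipeline.get("transforms", [])
    names = [pipeline["extract"], *transforms, pipeline["load"]]
    kinds = ["extract", *(["transform"] * len(transforms)), "load"]
    steps = []
    for i, (name, kind) in enumerate(zip(names, kinds)):
        # Channel i connects step i to step i + 1.
        reads = f"{pipeline['name']}.{i - 1}" if i > 0 else None
        writes = f"{pipeline['name']}.{i}" if i < len(names) - 1 else None
        steps.append(Step(name, kind, reads, writes))
    return steps


def to_kafka_compose(steps: list[Step]) -> dict:
    """Phase 2: map steps to a Docker-Compose-style dict, one topic per channel."""
    services = {}
    for step in steps:
        environment = {}
        if step.reads is not None:
            environment["INPUT_TOPIC"] = step.reads
        if step.writes is not None:
            environment["OUTPUT_TOPIC"] = step.writes
        services[step.name] = {
            "image": f"example/{step.kind}-worker",  # hypothetical image name
            "environment": environment,
        }
    return {"services": services}
```

Because `to_intermediate` knows nothing about Kafka, supporting another runtime in this sketch would only require a second function alongside `to_kafka_compose`, which mirrors the reduced implementation effort the abstract attributes to the multi-phase approach.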

Keywords: open data, compiler, Apache Kafka

PDF: Master Thesis

Reference: Fabian Arnold. Implementing an Open Data ETL Processing Engine with Kafka. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.