Final Thesis: A Study and Analysis of the Performance of the JValue Open Data Service as Part of a Data Pipeline Supporting An Online Learning Model

Abstract: Open data is known for data quality issues that require complex data cleansing and transformation before it can be used for data analysis, data visualization, training machine learning algorithms, and other data science activities. Open Data Service (ODS) is a software project that aims to create an interface for reliable and safe consumption of open data. It does so by providing the tooling and infrastructure needed for collaboration on eliminating open data usability obstacles. ODS underwent several cycles of development to better serve its purposes, which include functioning as an extract, transform, load (ETL) tool that consumes open data from different sources and adapts it to different needs. In this work we evaluate and analyze ODS performance in that regard, specifically as part of a data pipeline supporting a real-world data science application.

PDF: Master Thesis

Reference: Shady Hegazy. Study and Analysis of the Performance of JValue Open Data Service as Part of a Data Pipeline Supporting An Online Learning Model. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.

Final Thesis: Giving Structure to Open Data in the JValue ODS

Abstract: Nowadays the internet provides a lot of open data for public use. This data comes in various formats and covers plenty of subjects. The main problem is the absence of a standard: every provider can decide for themselves how the data is structured.

The JValue project is dedicated to this problem and aims to be the central point where open data is gathered and optimized. Currently the JValue Open Data Service (ODS) provides extraction, transformation, and retrieval of open data, supporting numerous protocols and data formats.

However, until now there has been only a very generic interface for retrieving this open data, since the system currently ignores any data structure. In addition, any provider can alter their data structure and publish it after the adjustment, since they are not bound by any restrictions. This can severely restrict or even break the data gathering process.

To counteract this, a process shall be introduced that allows the ODS to structure open data. Furthermore, a schema recommendation for the data should be generated, which then forms the foundation of the remaining data gathering process.

As a consequence of the introduced data schema, it is now also possible to derive fitting database tables from the schema. These tables should be created and filled dynamically and provide the user with a complete and easily accessible interface. With persistent structured data, the earlier mentioned problem of frequently changing data structures can now be solved easily: the schema can be used to validate the imported and transformed data. By also adding a corresponding visual state to the data configurations, the user will be able to react to changed data structures.
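The idea of recommending a schema from sample data and then validating later imports against it can be illustrated with a minimal sketch. It is not the implementation from the thesis: the flat field-to-type schema, the type names, and the functions `recommend_schema` and `validate` are assumptions made purely for illustration.

```python
from typing import Any

# Hypothetical mapping from Python value types to schema type names.
TYPE_NAMES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def recommend_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a flat field -> type-name schema from sample records."""
    schema: dict[str, str] = {}
    for record in records:
        for field, value in record.items():
            name = TYPE_NAMES.get(type(value), "unknown")
            # The first observation sets the type; a conflicting later
            # observation downgrades the field to "unknown".
            if schema.setdefault(field, name) != name:
                schema[field] = "unknown"
    return schema


def validate(record: dict[str, Any], schema: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif expected != "unknown" and TYPE_NAMES.get(type(record[field])) != expected:
            problems.append(f"wrong type for {field}: expected {expected}")
    return problems


# Made-up sample data standing in for an imported open data set.
sample = [{"station": "A1", "level": 2.5}, {"station": "B2", "level": 3.1}]
schema = recommend_schema(sample)
changed = validate({"station": "C3", "level": "high"}, schema)
```

In this sketch a non-empty result from `validate` plays the role of the "changed data structure" signal that a user interface could surface as a visual state.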

Keywords: data engineering, schema recommendation, open data

PDF: Master Thesis

Reference: Alexander Mahler. Giving Structure to Open Data in the JValue ODS. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.

Final Thesis: Implementing an Open Data ETL Processing Engine with Kafka

Abstract: The JValue project group is developing a modeling ecosystem for Extract Transform Load (ETL) processes. Part of this ecosystem is a description model for these processes. This thesis suggests a conversion process from the description model into an Apache Kafka runtime, described in a cloud-native format such as Docker Compose. The conversion is implemented as a library and done in a multi-phase approach as known from classical compilers. In the first step, the description language is converted into a runtime-independent intermediate description and afterward into a description of a concrete runtime, in this case Kafka. The multi-phase approach minimizes the implementation work for additional runtimes and allows runtime-independent optimization and analysis. The goal for the generated runtime is to use existing Kafka components, which is only partially possible due to the complexity of the description model.
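The two-phase conversion can be pictured with a small sketch. Everything in it is hypothetical and chosen only for illustration: the shape of the pipeline description, the `Step` intermediate form, the topic naming scheme, and the `example/...-worker` image names. It does not reproduce the description model or the library from the thesis, and it emits a plain dictionary shaped like a Compose file rather than calling any Kafka API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """Runtime-independent intermediate node (hypothetical shape)."""
    name: str
    kind: str                  # "extract", "transform", or "load"
    reads_from: Optional[str]  # logical input channel, if any
    writes_to: Optional[str]   # logical output channel, if any


def to_intermediate(description: dict) -> list[Step]:
    """Phase 1: lower a linear pipeline description to intermediate steps."""
    steps = []
    previous = None
    for raw in description["steps"]:
        # Load steps are sinks and therefore get no output channel.
        out = None if raw["kind"] == "load" else f"{description['id']}.{raw['name']}"
        steps.append(Step(raw["name"], raw["kind"], previous, out))
        previous = out
    return steps


def to_kafka_compose(steps: list[Step]) -> dict:
    """Phase 2: map each step to a service wired together by topic names."""
    services = {}
    for step in steps:
        env = {}
        if step.reads_from:
            env["INPUT_TOPIC"] = step.reads_from
        if step.writes_to:
            env["OUTPUT_TOPIC"] = step.writes_to
        # Placeholder image name; no such image is implied to exist.
        services[step.name] = {"image": f"example/{step.kind}-worker", "environment": env}
    return {"services": services}


description = {
    "id": "p1",
    "steps": [
        {"name": "fetch", "kind": "extract"},
        {"name": "clean", "kind": "transform"},
        {"name": "store", "kind": "load"},
    ],
}
compose = to_kafka_compose(to_intermediate(description))
```

Because only `to_kafka_compose` knows about the target runtime, supporting another runtime in this sketch would mean writing a second phase-2 function while reusing `to_intermediate` unchanged.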

Keywords: open data, compiler, Apache Kafka

PDF: Master Thesis

Reference: Fabian Arnold. Implementing an Open Data ETL Processing Engine with Kafka. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.

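Final Thesis: Fehlertoleranzanalyse von Microservice basierten Softwarearchitekturen – Konzept und Anwendung am JValue ODS (Fault Tolerance Analysis of Microservice-Based Software Architectures: Concept and Application to the JValue ODS)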

Abstract: Microservice-based software architectures play an essential role in building sizeable, scalable cloud systems. The main advantage of microservices compared to traditional software monoliths is the independent development, deployment, and scaling of the individual microservices, which allows innovation at a higher speed. Because microservice-based architectures are distributed systems, complexity is shifted from code to the network and communication layer. Therefore, additional failures such as service outages or loss of network connectivity arise, which must be tolerated to keep the system healthy and running. Within this thesis, a reusable concept is developed to analyse the fault tolerance of microservice-based software architectures. This allows for revealing weaknesses in the architecture that negatively affect the system’s reliability and resilience. For frequent problems, solution proposals are provided. The concept’s applicability and effectiveness are evaluated by applying it to the JValue Open Data Service (ODS). The analysis revealed several issues regarding the ODS’s fault tolerance, which could be fixed with the provided solutions.

Keywords: Microservices, fault tolerance, dependency graph, transactional outbox pattern

PDF: Master Thesis

Reference: Jonas Schüll. Fehlertoleranzanalyse von Microservice basierten Softwarearchitekturen – Konzept und Anwendung am JValue ODS. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.

Final Thesis: Design and Implementation of Parameterizable Data Import for the JValue ODS

Abstract: Governments have recognized that the publication of open data is of great economic and social value. Collecting and using this data is challenging because it is not always available in an easy-to-process format. Minimizing these challenges is the task of the JValue Open Data Service (ODS), a system that makes data consumption easy. Yet the location of a resource and the time of a data import are statically defined.

This thesis presents a concept for how the ODS can be extended with parameterizable datasources and how the data import can be triggered manually. This addresses the challenge of rapidly changing data on the Internet and adapts the ODS to deal with the emerging problems. With parameterizable datasources it becomes possible to describe the location of resources dynamically. The possibility of manual data imports ensures that data is only retrieved when it is really needed. The design decisions and the implementation of these functionalities for the ODS are covered in this thesis.
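A parameterizable datasource can be thought of as a location template whose placeholders are filled in when an import is triggered. The sketch below assumes a `{name}` placeholder syntax, default parameters stored with the datasource, and overrides supplied with a manual trigger. The function `resolve_location` and the example URL are hypothetical and do not reflect the actual ODS API.

```python
from typing import Optional
from urllib.parse import quote


def resolve_location(
    template: str,
    defaults: dict[str, str],
    overrides: Optional[dict[str, str]] = None,
) -> str:
    """Fill a location template; trigger-time overrides win over defaults."""
    params = {**defaults, **(overrides or {})}
    # Percent-encode each value so it cannot break the URL structure.
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    # format_map raises KeyError if the template names an unknown parameter.
    return template.format_map(encoded)


template = "https://example.org/api/{station}/levels?days={days}"
defaults = {"station": "A1", "days": "7"}

scheduled = resolve_location(template, defaults)
manual = resolve_location(template, defaults, {"station": "B 2"})
```

Here `scheduled` stands for an import that uses only the stored defaults, while `manual` stands for a manually triggered import that overrides one parameter for a single run.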

Keywords: open data; etl; JValue ODS; RESTful APIs

PDF: Bachelor Thesis

Reference: Jens Wächtler. Design and Implementation of Parameterizable Data Import for the JValue ODS. Bachelor Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2020.

Final Thesis: GraphQL-based Generic and Domain Specific Query Interfaces for the JValue ODS

Abstract: The JValue Open Data Service (ODS) is an open source application, founded and developed by the Professorship of Open Source Software at the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). The application aims to become the go-to place for developers and data scientists around the world when building applications that use open data. The JValue ODS’ business model consists of collecting data from various open data APIs, processing and cleansing it, and offering the improved version of the data back to the user. At the time of writing the thesis, the ODS only offers a rudimentary auto-generated REST interface for accessing the resulting data, which does not fulfill the various requirements of its targeted user base. Therefore, this thesis aims to offer a solution for how a responsive and clean user API could be implemented using GraphQL as the query language. As the ODS itself is a collection of many different Docker-native microservices, the relevant functionality for providing the APIs is developed using the same core principles.

In order to grant the user the ability to fetch the raw data, each new pipeline, which configures the result of the attached data endpoint and a customizable transformation script, can be further configured to generate a read-only GraphQL endpoint based on the underlying PostgreSQL schema. This API is capable of filtering and combining all the fields of the collected data. To give users who are more knowledgeable in their domain the possibility to add business logic to this data, a Node.js template project is provided. After implementing the desired functionality and hosting it on their preferred hosting solution, users can register their custom endpoints through the JValue ODS web interface; these endpoints are then validated and merged with the existing schema.

Keywords: GraphQL, JValue ODS, open data

PDF: Bachelor Thesis

Reference: Kai Malte von Rönne. GraphQL-based Generic and Domain Specific Query Interfaces for the JValue ODS. Bachelor Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2020.

Final Thesis: Applying Event-Driven Architecture to the JValue ODS

Abstract: In a world of “Big Data” and data transparency, “open data” may no longer be an unknown term. As the name suggests, it describes data that is commonly available and free for everyone to use.

The vision of the software “JValue Open Data Service (ODS)” is to provide a solid solution for the community to achieve availability and homogeneous representation of open data. By providing a user interface to configure how open data is extracted and presented, it greatly facilitates ETL (Extract-Transform-Load) processes.

The challenges of scaling a platform based on a service-oriented architecture demand a solution that can handle a large set of interactions. The purpose of this thesis is to apply an “Event-Driven Architecture” to the existing model of the software to fulfill these requirements and, furthermore, to avoid the bottlenecks caused by the central orchestration of the former structure. This is done by reevaluating the current software design, which is composed of distributed microservices, and identifying the events of its current architecture with a technique called “Event-Storming”. The results lead to an architecture design that is implemented in the JValue ODS and finally evaluated.

PDF: Bachelor Thesis

Reference: Hannes Fleischer. Applying Event-Driven Architecture to the JValue ODS. Bachelor Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2020.

Final Thesis: Design und Implementierung eines Konzepts zur Live-Konfiguration (Design and Implementation of a Concept for Live Configuration)

Abstract: Nowadays a variety of systems generate and handle more and more data. This data is often presented in different formats and has to be retrieved from various locations. The Open Data Service (ODS) is a system that tackles those tasks. The ODS allows the definition of data manipulation pipelines to process and convert data. Those data transformations are defined by code snippets executed on the data.

This thesis presents a concept for the live configuration of those transformations. With live configuration the user receives instant feedback about their code, similar to using an integrated development environment (IDE). A live preview of the resulting data is also shown. The result is faster and more efficient development of the transformation snippets. The thesis describes the creation of the live configuration concept and documents an exemplary implementation of the concept as a web application.

Keywords: JValue Open Data Service, live configuration

PDF: Master Thesis

Reference: Karl Lugwig Werner. Design und Implementierung eines Konzepts zur Live-Konfiguration. Master Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg: 2019.

Final Thesis: Development of a Microservice for Open Weather Data

Abstract: Since mid-2017 the Deutscher Wetterdienst (DWD) has been obligated to provide its weather data to the public for free. However, anyone who wants to access it needs to click through a file system, hoping to still be on the right path to the desired weather data. Due to confusing folder names, this quickly turns into a frustrating experience. The data then comes in various formats, making it hard to process automatically.

In order to stop this hassle, we present a microservice that adapts weather data from the DWD server to a more user-friendly REST interface. This thesis describes the architecture and implementation of the microservice. As a result, users can fetch historical, current, and forecast weather data for twenty different weather parameters in an easy-to-process JSON format.

Keywords: Open source, open data, open data service, JValue ODS, DWD, weather

PDF: Master Thesis

Reference: Daniel Vahle. Development of a Microservice for Open Weather Data. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2019.

Final Thesis: Migrating the JValue ODS to Microservices

Abstract: “A world in which consuming open data is easy and safe.” This is the vision of the JValue Open Data Service (ODS), which aims to bundle finding the right open data, using it, and combining it into an easy and intuitive process. The daily life of developers should be free from hardships such as writing data crawlers for different data formats in combination with different protocols, so that they can focus on their real problems.

As in every crowd-sourced application, the more users contribute and add new data sources, the better. With these requirements, the need for scalable software emerges. The microservice-based approach promises to achieve this and to make the project easier for developers to understand and maintain.

This thesis presents a draft of a microservice-based architecture for the ODS and a migration strategy from the current monolithic architecture towards it. The exemplary implementation of a selected microservice proves the feasibility of such an architectural style. Furthermore, a discussion of the challenges and benefits of a distributed system and the experiences written down in this thesis shall reduce the risks and enable a complete migration to a microservice-based architecture in the future.

Keywords: Open source, open data, open data service, JValue ODS

PDF: Master Thesis

Reference: Georg Schwarz. Migrating the JValue ODS to Microservices. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2019.