Final Thesis: A Study and Analysis of the Performance of the JValue Open Data Service as Part of a Data Pipeline Supporting An Online Learning Model

Abstract: Open data has been known for having data quality issues that require complex data cleansing and data transformation in order to be usable for data analysis, data visualization, training machine learning algorithms, and other data science activities. Open Data Service (ODS) is a software project that aims at creating an interface for reliable and safe consumption of open data. It does so by providing the necessary tooling and infrastructure needed for collaboration on eliminating open data usability obstacles. ODS underwent several cycles of development to better serve its purposes, which include functioning as an extract, transform, load (ETL) tool to consume open data from different sources and adapt it to different needs. In this work we evaluate and analyze ODS performance in that regard. Specifically, as part of a data pipeline supporting a real-world data science application.

PDF: Master Thesis

Reference: Shady Hegazy. Study and Analysis of the Performance of JValue Open Data Service as Part of a Data Pipeline Supporting An Online Learning Model. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.

Final Thesis: Giving Structure to Open Data in the JValue ODS

Abstract: Nowadays the internet provides a lot of open data for public use. Those can be written in various data types and cover plenty of subjects. Because of that the absence of a standard results into the main problem. Every provider can decide for himself how the data is constructed.

The JValue project is dedicated to this problem and aims to be the central point where those open data are gathered and optimized. Currently the JValue Open- Data-Service (ODS) provides the extraction, transformation and retrieving of open data supporting numerous protocols and data formats.

However until now there is only a very generic interface for the retrieval of those open data since the system currently ignores any data structure. In addition to that any provider can alter their data structure and upload it after the adjustment process, since they are not bound to any restrictions. This can lead to major restrictions or even the loss of the data gathering process.

To counteract this behavior a process shall be introduced, which allows the ODS to structure those open data. Furthermore a schema recommendation for the data should be generated, which then will be the foundation of the remaining data gathering process.

As a consequence of the introduced data schema there is now a possibility to also derive fitting database tables from those schema. This tables should be created and filled dynamically and provide the user a fully and easy accessible interface. As an implication of the persistent structured data, the earlier mentioned problem of frequently changing data structures can now be easily solved. The schema can be used to validate those imported and transformed data. By also adding a corresponding visual state to those data configurations, the user will be able to react up on changed data structures.

Keywords: data engineering, schema recommendation, open data

PDF: Master Thesis

Reference: Alexander Mahler. Giving Structure to Open Data in the JValue ODS. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.

Final Thesis: Implementing an Open Data ETL Processing Engine with Kafka

Abstract: The JValue project group is developing a modeling ecosystem for Extract Transform Load (ETL) processes. Part of this ecosystem is a description model for those. This thesis suggests a conversion process from the description model into an Apache Kafka runtime, described in a cloud-native format, like Docker Compose. The conversion is implemented as a library and done in a multi-phase approach as known from classical compilers. In the first step, the description language is converted into a runtime independent intermediate description and afterward in a description of a concrete runtime, in this case, Kafka. The multi-phase approach minimizes the implementation work for additional runtimes and allows runtime independent optimization and analysis. The goal for the generated runtime is to use existing Kafka components, which is only partially possible due to the complexity of the description model.

Keywords: open data, compiler, Apache Kafka

PDF: Master Thesis

Reference: Fabian Arnold. Implementing an Open Data ETL Processing Engine with Kafka. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.

Final Thesis: Elasticity Concept for Microservice-based System

Abstract: Software Elasticity is the concept of adapting available resources to the current or expected workload. This concept fits modern and stateless microservice architectures, which are scalable by design. Their scalability is closely related to Software Resilience and places new demands on cloud architectures. The JValue Open Data Service (JValue ODS) is an open data platform with focus on Extract, Transform, Load (ETL) pipelines and aims to make the usage of open data easy, reliable and safe. For the success of the ODS, an Elastic and therefore Resilient hosting is mandatory. This thesis deploys the ODS to an on-premise Kubernetes cluster to improve the uptime guarantee, discusses different deployment strategies, elaborates horizontal microservice scaling techniques and operates the necessary infrastructure. This thesis presents Peffer’s Design Research Process to build a concept for Elasticity in microservice-based architectures. The concept is demonstrated and evaluated in the context of the JValue ODS.

Keywords: Microservices, elasticity, scalability, kubernetes, devops

PDF: Master Thesis

Reference: Aron Metzig. Elasticity Concept for a Microservice-based System. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.

Final Thesis: Testing Microservice Integration with Consumer-Driven Contract Tests

Abstract: Microservice-Systeme bestehen aus eigenständigen, verteilten Services, die über Netzwerkverbindungen miteinander kommunizieren. Das Testen von Service-Integrationen kann bei derartigen Systemen eine Herausforderung darstellen, da hierzu mehrere Services zur selben Zeit ausgeführt werden müssen und es viele potenzielle Quellen für falsch-negative Testergebnisse gibt.

Consumer-Driven Contract Testing (CDCT) ist ein Ansatz, der dazu verwendet werden kann, beide Seiten einer Service-Integration unabhängig voneinander zu testen. Dies wird erreicht, indem die beiden Seiten der Integration mithilfe eines Vertrags (engl. contract) voneinander entkoppelt werden, wobei der Contract als Vermittler fungiert. Dieser wird durch den Service vorgegeben, welcher die Schnittstelle eines anderen Services beansprucht, und drückt dessen Erwartungen an die verwendete Schnittstelle aus.

Diese Arbeit erforscht, inwieweit CDCT zur Testung von Microservice-Integratio- nen beitragen kann, indem Vorteile, Nachteile, Herausforderungen und Richtlinien erfasst werden, die im Zusammenhang zu CDCT für Microservice-Systeme stehen. Für die initiale Theoriebildung wurde zunächst eine strukturierte Literaturrecherche durchgeführt. Im Anschluss wurde Aktionsforschung betrieben, bei der Consumer-Driven Contract Tests für ein Open-Source Microservice-System entwickelt wurden. Zuletzt, nach der abschließenden Evaluation der Aktionsforschung, wurden die Inhalte, die im Rahmen der strukturierten Literaturrecherche erhoben wurden, mit den Erfahrungen aus der Aktionsforschung abgeglichen.

Keywords: Microservices, integration testing, consumer-driven contract tests, pact

PDF: Master Thesis

Reference: Felix Quast. Testing Microservice Integration with Consumer-Driven Contract Tests. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.

Final Thesis: Konzept und Implementierung zur Observability für microservicebasierte Anwendungen

Abstract: ‘Microservices’ sind in der heutigen Zeit ein bekanntes und beliebtes Architekturmuster. Viele weltbekannte Tech-Unternehmen haben sich für diese entschieden. Die Entkopplung und die Aufteilung der Aufgaben in kleinere Services bringen neben daraus resultierenden Vorteilen auch Herausforderungen mit sich. Einen zentralen Negativpunkt hinsichtlich der Entwicklung dieser Dienste stellen die erschwerte Fehlersuche sowie die Schwierigkeit dar, den Überblick über die Anwendung als Gesamtes zu behalten.

In dieser Arbeit werden Softwaretools zur Überwachung und zur Aggregation von Log-Informationen vorgestellt. Darüber hinaus wird eine Kombination von Programmen gewählt, um ein Konzept zu entwickeln und eine beispielhafte Implementierung dieser Werkzeuge in ein bereits laufendes Open-Source-Projekt zu präsentieren.

Keywords: Microservices, observability, monitoring

PDF: Bachelor Thesis

Reference: Daniel Fabrikantow. Konzept und Implementierung zur Observability für microservicebasierte Anwendungen. Bachelor Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.

Final Thesis: Value Types in TypeScript for JValue

Abstract: Over the past years, TypeScript has increasingly been gaining popularity due to its nature of providing functionalities to ease the development of scalable and robust applications whilst syntactically being a superset of JavaScript. With the growing complexity of data-driven environments, it is essential for programming languages to cope with value types beyond their primitive data types to capture the semantics of intangible data, such as systems of measurement, thus increasing readability and solidity across the codebase. By creating a test-driven framework in TypeScript, this thesis lays out different methods to efficiently implement value types, discusses their benefits as well as drawbacks, and ensures the reliability of the framework by integrating it into an existing data-driven service.

Keywords: Value types, JValue, TypeScript

PDF: Bachelor Thesis

Reference: Mert Baran. Value Types in Typescript for JValue. Bachelor Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021. 

Final Thesis: Hierarchical Open Data Source Import for the JValue ODS

Abstract: Open Data has become more popular in the last few years due to its value to society. Governments, institutions, companies or individuals can make use of Open Data and add to economic growth or extract new knowledge from publicly available data. The Open Data Service (ODS) is a software developed by the Professorship of Open Source that aims to simplify the consumption of Open Data and make it more reliable.

The goal of this thesis is to extend the functionality of the ODS by the support of hierarchically structured data sources, in particular, File Transfer Protocol (FTP) based data sources. Due to the simplicity and reliability of the FTP, it is an appropriate solution for providing Open Data. This thesis aims to enable the user to explore and configure FTP data sources by developing a new microservice with a proof-of-concept user interface. As a result, consuming Open Data from FTP data sources is simplified and becomes more flexible.

Keywords: Open data, FTP, JValue ODS, microservices

PDF: Master Thesis

Reference: Benjamin Fischer. Hierarchical Open Data Source Import for the JValue ODS. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.

Final Thesis: Fehlertoleranzanalyse von Microservice basierten Softwarearchitekturen – Konzept und Anwendung am JValue ODS

Abstract: Microservice-based software architectures play an essential role in building sizeable scalable cloud systems. The main advantage of microservices compared to the traditional software monoliths is the independent development, deployment, and scaling of the individual microservices, which allows innovations at a higher speed. Because microservice-based architectures are distributed systems, complexity is shifted from code to the network and communication layer. Therefore, additional failures like service outage or network connectivity loss arise, which must be tolerated to keep the system healthy and running. Within this thesis, a reusable concept is developed to analyse the fault tolerance of microservice-based software architectures. This allows for revealing weaknesses in the architecture that negatively affects the system’s reliability and resilience. For frequent problems, solution proposals are provided. The concept’s applicability and effectiveness are evaluated by applying it at the JValue Open Data Service (ODS). The analysis revealed several issues regarding the ODS’s fault tolerance, which could be fixed with the provided solutions.

Keywords: Microservices, fault tolerance, dependency graph, transactional outbox pattern

PDF: Master Thesis

Reference: Jonas Schüll. Fehlertoleranzanalyse von Microservice basierten Softwarearchitekturen – Konzept und Anwendung am JValue ODS. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.

Final Thesis: Design and Implementation of Parameterizable Data Import for the JValue ODS

Abstract: Governments have recognized that the publication of open data is of great economic and social value. Collecting and using this data is challenging because it is not always available in an easy to process format. Minimizing these challenges is the task of the JValue Open Data Service (ODS), a system that makes data consumption easy. Yet the location of a resource and the time of a data import is statically defined.

This thesis presents a concept how the ODS can be extended by parameterizable datasources and how the data import can be triggered manually. This addresses the challenge of rapidly changing data on the Internet and adapts the ODS in order to deal with the emerging problems. With parameterizable datasources it is viable to dynamically describe the location of resources. The possibility for manual data imports ensures that data is only retrieved when it is really needed. The design decisions and the implementation of these functionalities for the ODS are covered in this thesis.

Keywords: open data; etl; JValue ODS; RESTful APIs

PDF: Bachelor Thesis

Reference: Jens Wächtler. Design and Implementation of Parameterizable Data Import for the JValue ODS. Bachelor Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2020.