Publications
Here you find a list of my academic publications.
Insecure Ingredients? Exploring Dependency Update Patterns of Bundled JavaScript Packages on the Web
@ ICSE 2026
Reusable software components, typically distributed as packages, are a central paradigm of modern software development. The JavaScript ecosystem serves as a prime example, offering millions of packages with their use being promoted as idiomatic. However, download statistics on npm raise security concerns as they indicate a high popularity of vulnerable package versions while their real prevalence on production websites remains unknown. Package version detection mechanisms fill this gap by extracting utilized packages and versions from observed artifacts on the web. Prior research focuses on mechanisms for either hand-selected popular packages in bundles or for single-file resources utilizing the global namespace. This does not allow for a thorough analysis of modern web applications' dependency update behavior at scale. In this work, we improve upon this by presenting Aletheia, a package-agnostic method which dissects JavaScript bundles to identify package versions through algorithms originating from the field of plagiarism detection. We show that this method clearly outperforms the existing approaches in practical settings. Furthermore, we crawl the Tranco top 100,000 domains to reveal that 5% - 20% of domains update their dependencies within 16 weeks. Surprisingly, from a longitudinal perspective, bundled packages are updated significantly faster than their CDN-included counterparts, with consequently up to 10 times fewer known vulnerable package versions included. Still, we observe indicators that few widespread vendors seem to be a major driving force behind timely updates, implying that quantitative measures are not painting a complete picture.
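The abstract above does not name the specific plagiarism-detection algorithms Aletheia uses. Purely as an illustration of the general idea, the following sketch applies winnowing, one well-known document fingerprinting technique from that field, to match a code chunk against candidate package versions. All names (`fingerprints`, `similarity`, `best_match`), the whitespace tokenization, and the parameters `k` and `window` are assumptions for this example and are not taken from the paper.

```python
import hashlib


def fingerprints(text, k=5, window=4):
    """Winnowing sketch: hash every k-gram of tokens, keep each window's minimum."""
    tokens = text.split()  # naive whitespace tokenization (assumption)
    grams = [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    hashes = [
        int.from_bytes(hashlib.sha1(g.encode("utf-8")).digest()[:8], "big")
        for g in grams
    ]
    if len(hashes) <= window:
        return set(hashes)
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def similarity(a, b):
    """Jaccard similarity of the two fingerprint sets (0.0 if either is empty)."""
    fa, fb = fingerprints(a), fingerprints(b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


def best_match(chunk, candidates):
    """Return the key of the candidate text most similar to the chunk."""
    return max(candidates, key=lambda name: similarity(chunk, candidates[name]))
```

Here `candidates` would map a label such as a version string to the reference source of that version. A real detector would additionally have to cope with minification and bundler transformations, which this sketch ignores.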
From Constrictor to Serpent: Investigating the Threat of Cache Poisoning in the Python Ecosystem
@ GI Sicherheit 2026
Attacks on software supply chains are on the rise, and attackers are becoming increasingly creative in how they inject malicious code into software components. This paper is the first to investigate Python cache poisoning, which manipulates bytecode cache files to execute malicious code without altering the human-readable source code. We demonstrate a proof of concept, showing that an attacker can inject malicious bytecode into a cache file without failing the Python interpreter's integrity checks. In a large-scale analysis of the Python Package Index, we find that about 12,500 packages are distributed with cache files. Through manual investigation of cache files that cannot be reproduced automatically from the corresponding source files, we identify classes of reasons for irreproducibility to locate malicious cache files. While we did not identify any malware leveraging this attack vector, we demonstrate that several widespread package managers are vulnerable to such attacks.
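As a rough illustration of the reproducibility check mentioned above (a minimal sketch, not the paper's tooling): a `.pyc` file starts with a 16-byte header (magic number, flags, and either source mtime and size or a source hash, see PEP 552), followed by a marshalled code object. The helper name `cache_matches_source` is hypothetical. It recompiles the source with the running interpreter and compares the result with the cached code object, so the answer is only meaningful when the cache was produced by the same Python version and optimization level, and it assumes the path passed in is the one the cache was compiled with.

```python
import importlib.util
import marshal
from pathlib import Path

PYC_HEADER_SIZE = 16  # magic (4) + flags (4) + mtime/size or source hash (8)


def cache_matches_source(source_path, pyc_path):
    """Return True if the cached code object equals a fresh compile of the source.

    Note: marshal.loads only deserializes the code object; it does not run it.
    """
    data = Path(pyc_path).read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("cache file was written by a different Python version")

    # Everything after the header is the marshalled module code object.
    cached_code = marshal.loads(data[PYC_HEADER_SIZE:])

    # Recompile the human-readable source with the current interpreter.
    source = Path(source_path).read_bytes()
    fresh_code = compile(source, str(source_path), "exec", dont_inherit=True)

    # Code objects compare by bytecode, constants, names, and line info.
    return cached_code == fresh_code
```

A mismatch does not by itself indicate malice. Differing compiler versions or build-time source transformations can also make a cache file irreproducible, which is why the abstract describes manual classification of such cases.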
Exploring the Susceptibility to Fraud of Monetary Incentive Mechanisms for Strengthening FOSS Projects
@ ARES 2025
Free and open source software (FOSS) is ubiquitous on modern IT systems, accelerating the speed of software engineering over the past decades. With its increasing importance and historical reliance on uncompensated contributions, questions have been raised regarding the continuous maintenance of FOSS and its implications from a security perspective. In recent years, different funding programs have emerged to provide external incentives to reinforce the sustainability of community FOSS. Past research primarily focused on analyzing what types of projects have been funded and for what reasons. However, it has neither been considered whether there is a need for such external incentives, nor whether the incentive mechanisms, especially with the development of decentralized approaches, are susceptible to fraud. In this study, we explore the need for funding through a literature review and compare the susceptibility to fraud of centralized and decentralized incentive programs by performing case studies on the Sovereign Tech Fund (STF) and the tea project. We find non-commercial incentives to fill an important gap, ensuring longevity and sustainability of projects. Furthermore, we find the STF to be able to achieve a high resilience against fraud attempts, while tea is highly susceptible to fraud, as evidenced by the revelation of an associated sybil attack on npm. Our results imply that special considerations must be taken into account when utilizing quantitative repository metrics, regardless of whether spoofing is expected.
Analyzing the Potency of Pretrained Transformer Models for Automated Program Repair
@ SEAA 2024
Manually finding and fixing bugs is cumbersome work, which consumes valuable resources in the software development cycle. In this work, we examine the capability of pretrained transformer models to tackle the task of automated program repair. Previous research has focused on inherently different machine learning architectures for solving this use case. Our contributions include a novel dataset for fine-tuning the models, the introduction of a windowing technique augmenting the pretrained model, and the evaluation on the commonly used Defects4J benchmark along with an ablation study. The findings demonstrate that leveraging our dataset leads to enhanced model performance surpassing Bugs2Fix. Our model enhancements significantly boost overall performance, enabling resulting models to achieve parity with the current state of the art by fixing 30 bugs in 27 minutes on Defects4J. This shows that pretrained transformers are promising for the task of automated bug fixing and should be considered by future research. However, similar to the existing state-of-the-art solutions, the performance still needs to be improved to provide practical benefits to end users.
SoK: Automated Software Testing for TLS Libraries
@ ARES 2024
Reusable software components, typically integrated as libraries, are a central paradigm of modern software development. By incorporating a library into their software, developers trust in its quality and its correct and complete implementation. Since errors in a library affect all applications using it, there is a need for quality assurance tools such as automated testing that can be used by library and application developers to verify the functionality. In the past decade, many different systems have been published that focus on the automated analysis of TLS implementations for finding bugs and security vulnerabilities. However, all of these systems focus on only a few TLS components and lack a common analysis scenario and inter-approach comparisons. In particular, the amount of manual effort required across the whole analysis process to obtain the root cause of an error is often ignored. In this paper, we survey and categorize literature on automated testing approaches for TLS libraries. The results reveal a heterogeneous landscape of approaches with a trade-off between the manual effort required for setup and for result interpretation, along with major deficits in the considered performance metrics. These imply important future directions to advance the current state of protocol test automation.
TEEM: A CPU Emulator for Teaching Transient Execution Attacks
@ GI Sicherheit 2024
Side channel attacks have been an active field of attacker research for decades. The Spectre, Meltdown and Load Value Injection publications established a new type of attack, known as transient execution attacks, which exploit the fact that architectural rollbacks leave traces in microarchitectural caches and buffers. These can serve as covert channels, resulting in practically relevant but hard to prevent attack scenarios. The associated weaknesses are complex, which makes it hard for security researchers to detect them and even harder for developers to prevent them. To achieve advancements in this field, it is important to teach students about the underlying concepts. However, the documentation of modern CPUs is neither complete nor correct, which increases difficulties in obtaining practical experience. As a result, there is a need for a CPU emulator that facilitates practical learning with options for looking inside the box. We contribute TEEM, a Transient Execution EMulator of a RISC-V CPU supporting several microarchitectural features relevant for teaching transient execution attacks. Our empirical teaching experiences clearly indicate an improvement in the students’ understanding of Meltdown and Spectre.
On the Feasibility of Detecting Non-Cooperative Wi-Fi Devices via a Single Wi-Fi-Router
@ IPIN 2023
Detecting intruding devices using Wi-Fi based indoor positioning systems running on commodity Access Points (APs) requires both compatibility with available hardware and independence from any cooperation of the intruding device with the system. In this paper, we examine the feasibility of reliably detecting non-cooperative Wi-Fi devices with a single AP and whether available hardware in affected homes is sufficient for carrying out the task. Commonly, indoor positioning systems require non-trivial setups with specifically tailored hard- and software, as the aspiration is to maximize precision, for which the devices to be located will actively assist, which we cannot rely upon. First, criteria are derived that help identify indoor positioning systems suitable for our use case, specifically Channel State Information (CSI)-based approaches. These systems are then evaluated on both compatibility with commodity hardware and accuracy by conducting experiments on available devices. We show that, despite promising premises, the commodity hardware landscape insufficiently supports such a system for widespread use and that the one compatible router we found cannot detect an intruding Wi-Fi device accurately enough even in favourable conditions.
Analyzing the Feasibility of Privacy-Respecting Automated Tracking of Devices Fleeing a Burglary
@ WiMob 2023
Criminals bringing their mobile devices to a burglary and subsequently fleeing the crime scene may be tracked and re-identified by digital traces left behind by their devices, e.g., by utilizing WiFi routers listening to their communication. Currently, it is unclear how a sensor network for automated tracking of these fleeing devices should be designed, especially considering a privacy-respecting operation. While various WiFi device tracking techniques are well researched for indoor applications, tracking non-cooperative devices using a sensor network through a town or city is a unique use case that is yet to be examined. In this paper, a tracking simulator is implemented which allows multiple parameters of both the fleeing device and the sensors to be thoroughly tested and the tracking performance based on these parameters to be evaluated. For this, different metrics such as device activity, detection events, and path reconstruction are gathered by running the simulation using different sensor activation strategies with the goal of keeping as many sensors active as needed but as few as possible. Through this approach, the device tracking success is evaluated while minimally infringing on the privacy of devices not involved in the criminal activity.
Datasets
Replicability is extremely important for real scientific progress. I try to provide data and source code whenever licensing allows it. If you are missing any data or code for a publication or struggle to use them, please message me.
Replication Data for "Insecure Ingredients? Exploring Dependency Update Patterns of Bundled JavaScript Packages on the Web"
Replication Software for "Insecure Ingredients? Exploring Dependency Update Patterns of Bundled JavaScript Packages on the Web"
Replication Data for: "From Constrictor to Serpent: Investigating the Threat of Cache Poisoning in the Python Ecosystem"
Replication Code for: "From Constrictor to Serpent: Investigating the Threat of Cache Poisoning in the Python Ecosystem"