Analysis of Ignored Patches in the Linux Kernel Development

Type: Master's thesis
State: finished
Supervisor: Ralf Ramsauer
Student: Sebastian Duda
Submission date: 20. Dec 2019

In cooperation with Friedrich-Alexander-Universität Erlangen

Abstract

The importance of Linux in industry grows continuously with the increasing variety of products that run Linux-based software. Use cases are not limited to consumer applications but also include high-performance computing and business-critical control technology. Using software in critical applications requires certification (e.g., IEC 61508), which demands that the software development process be documented and complied with. As a result, analyzing the Linux kernel development process becomes increasingly valuable. The Linux kernel development process is well understood, documented, and researched. However, some actions of contributors or maintainers do not comply with the process. One of these actions is ignoring patches: a patch is ignored if it is neither answered nor accepted by the developers. This phenomenon has not been analyzed so far. In this thesis, we conducted statistical analyses of ignored patches to answer our research questions. In the analyzed time frame (releases v3.0 to v4.20), 18k of 792k patches (2.3%) were ignored. The ratio of ignored patches decreased over time. We detected two clusters of patches (minor contributions and automatically created patches) that make up most of the ignored patches. Apart from these clusters, the ignored patches showed no statistically significant abnormalities. Based on the analyses, we found indicators of discrimination against certain groups. We further observed a trend that larger subsystems and mailing lists ignore a relatively smaller share of patches. To conduct the analyses, we created a dataset of the patches sent to the Linux kernel and published it for further analysis. The dataset is extracted from the available mailing lists and Torvalds’ git repository. We conducted spot tests to validate the correctness and integrity of our dataset: a small-volume spot test for trivial measurements such as the size of a patch, and a high-volume spot test for the is-ignored metric we developed.
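
The definition of an ignored patch given in the abstract (neither answered nor accepted) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Patch` record, its field names, and the helper `ignored_ratio` are hypothetical, and how "answered" and "accepted" are actually derived from the mailing lists and the git repository is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Patch:
    # Hypothetical record; the field names are assumptions for illustration
    # and do not reflect the schema of the published dataset.
    message_id: str
    answered: bool  # assumed: the patch received at least one reply
    accepted: bool  # assumed: the patch was integrated into the repository


def is_ignored(patch: Patch) -> bool:
    """A patch counts as ignored if it is neither answered nor accepted."""
    return not patch.answered and not patch.accepted


def ignored_ratio(patches: Iterable[Patch]) -> float:
    """Fraction of ignored patches; 0.0 for an empty collection."""
    patches = list(patches)
    if not patches:
        return 0.0
    return sum(is_ignored(p) for p in patches) / len(patches)
```

With the rounded figures from the abstract, the same ratio is 18,000 / 792,000, which is about 0.0227 and rounds to the reported 2.3%.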