Mining Software Repositories to Analyze the Multivariate Collateral Evolution in Database Applications


Databases are essential components of a wide variety of software systems. Rapid technological progress and changing user demands require database schemas to evolve constantly, and schema evolution has been studied extensively in research. Since schemas do not evolve in isolation, the co-evolution of schema and code inevitably raises additional challenges. Some applications support multiple alternative database systems (DB variants). Vendor-specific SQL dialects add yet another dimension of complexity: multiple, conceptually equivalent DB-variant schemas have to be evolved together. The simultaneous evolution of database schemas and program code across all DB variants has only recently been addressed, as the phenomenon of multivariate collateral evolution (MCE). This work presents the first formalized and automated methodology for mining software repositories (MSR) to analyze MCE. It provides a function-based, project-specific approach that detects MCE-relevant co-changes at a fine-grained level. Additionally, it introduces lag coupling analysis to detect co-changes that occur after a certain delay (lag). To gain initial empirical insights into this phenomenon, a case study is conducted on the evolution histories of eight popular open-source projects. As part of this case study, a reproduction experiment evaluates the reproducibility of an empirical study on the co-evolution of schema and code. The results of the case study show that the examined projects are affected by MCE to some degree, and that each of them copes with the associated challenges in its own way. MCE is therefore a real-world problem of high relevance. The results of the reproduction experiment underline that the reproducibility of empirical studies in software engineering remains an open challenge that future work must address.
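The idea behind lag coupling analysis can be illustrated with a minimal sketch. The function below is hypothetical and is not the implementation used in this work. It assumes that a repository history is available as a chronologically ordered list of commits, each given as the set of changed entities (for example, the schema files of different DB variants), and that the lag is measured in number of commits rather than in time. It reports the fraction of changes to entity `a` that are followed by a change to entity `b` within `max_lag` commits, where a lag of 0 means a co-change in the same commit.

```python
from typing import Sequence, Set


def lag_coupling(history: Sequence[Set[str]], a: str, b: str, max_lag: int) -> float:
    """Hypothetical lag coupling measure (sketch, not the thesis's method).

    Returns the fraction of commits touching `a` that are followed by a
    commit touching `b` within `max_lag` subsequent commits. Lag 0 counts
    only co-changes in the same commit. The measure is directional:
    `a` is assumed to lead and `b` to follow.
    """
    # Indices of all commits in which the leading entity changed.
    a_changes = [i for i, changed in enumerate(history) if a in changed]
    if not a_changes:
        return 0.0
    coupled = 0
    for i in a_changes:
        # The window covers the commit itself plus up to max_lag later commits.
        window = history[i : i + max_lag + 1]
        if any(b in changed for changed in window):
            coupled += 1
    return coupled / len(a_changes)


# Illustrative, made-up history with two DB-variant schema files.
history = [
    {"mysql/schema.sql", "src/db_mysql.php"},
    {"pgsql/schema.sql"},
    {"mysql/schema.sql"},
    {"README.md"},
    {"README.md"},
    {"pgsql/schema.sql"},
]

print(lag_coupling(history, "mysql/schema.sql", "pgsql/schema.sql", 0))  # 0.0
print(lag_coupling(history, "mysql/schema.sql", "pgsql/schema.sql", 1))  # 0.5
print(lag_coupling(history, "mysql/schema.sql", "pgsql/schema.sql", 3))  # 1.0
```

In this toy history, a plain same-commit co-change analysis (lag 0) finds no coupling between the two schema variants, while allowing a lag of a few commits reveals that every MySQL schema change is eventually mirrored in the PostgreSQL schema. This is the kind of delayed co-change that lag coupling analysis is meant to surface.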