Abstract:
Software regression bugs are defined as defects that occur when a previously working software
feature or functionality stops behaving as intended. One common cause of regression bugs is
code changes or system patching that lead to unexpected side effects. Running a test suite after
every change to validate newly added features and to detect faults introduced in previously
working code is impractical. As a result, by the time an issue is identified and reported, many
changes have been made to the source code, which makes it very difficult for developers to find the
change that induced the regression bug. A bug fixer has to go through several suspected revisions
before being able to locate the actual regression-causing revision. Thus, finding the regression
bug inducing change is a non-trivial and challenging problem.
We first conduct an in-depth characterization study of regression bugs by mining the issue tracking
system dataset of a large and complex software system, the Google Chromium Project,
showing the priority, number of comments, closure-time distribution, opening and closing trend
analysis, and quality of the bug fixing process for regression bugs in comparison to crash, performance
and security bugs. We also define a metric that computes the quality of the bug fixing process for
one type of bug report in comparison to the quality of the bug fixing process for other types of bug
reports. We present the results thus obtained using several visualisations. We then describe
our character n-gram based solution approach for finding the regression bug inducing change.
We mine existing issue reports and log messages of regression bugs to establish a ground
truth dataset for our model using several heuristics. We extract several features of regression
causing revisions from the training dataset, which are then used to build the model for predicting
the regression-inducing revision. 78% of regression issues are reported within 20 days after the
revision causing them was committed, and almost 3073 revisions are made in the roughly 20 days
before the reporting timestamp of the issue, so a bug fixer would have to look back and analyze
several hundreds or thousands of revisions to find the culprit. Our approach provides a bug
fixer with the Top K revisions (in decreasing order of similarity score) that are suspected of having
regressed an issue. We implemented the proposed IR model and evaluated its performance on
our test dataset.
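As an illustration of the similarity-based ranking described above, the following is a minimal sketch (not the exact implementation used in this work) of how candidate revisions could be scored against an issue report using character n-gram TF-IDF vectors and cosine similarity; the function name, the choice of scikit-learn, and the n-gram range are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidate_revisions(issue_text, revision_texts, k=75):
    """Rank candidate revisions by character n-gram similarity to an issue report.

    issue_text: concatenated title and description of the regressed issue.
    revision_texts: one string per candidate revision, e.g. its log message
                    plus the paths of its modified files.
    Returns (index, score) pairs for the Top K most similar revisions.
    """
    # Character n-grams; the (3, 5) range is chosen here for illustration only.
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
    matrix = vectorizer.fit_transform(revision_texts + [issue_text])
    issue_vec = matrix[len(revision_texts)]            # last row is the issue report
    scores = cosine_similarity(issue_vec, matrix[:len(revision_texts)]).ravel()
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```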
We demonstrate the effectiveness of character n-gram based textual features for identifying the
regression-causing revision, as there is high textual similarity between the title and description of an issue
report and the log message of its regression-inducing revision, and between the title and component of the regressed issue
and the paths of the files modified in its corresponding regression-inducing revision. We conduct a series of
experiments to validate and demonstrate the effectiveness of our proposed approach. We find that
for 60% of issues the actual regression-causing revision was part of the Top K (K=75) recommended
revisions.