Static technique for bug localization using character N-Gram based information retrieval model

Sangeeta

dc.contributor.advisor	Surekha, Ashish
dc.contributor.author	Sangeeta
dc.date.accessioned	2012-03-14T10:36:50Z
dc.date.available	2012-03-14T10:36:50Z
dc.date.issued	2012-03-14T10:36:50Z
dc.identifier.uri	https://repository.iiitd.edu.in/jspui/handle/123456789/17
dc.description.abstract	Bug or fault localization is the process of identifying the specific location(s) or region(s) of source code (at various granularity levels such as the directory path, file, method or statement) that are faulty and need to be modified to repair the defect. Bug localization is a routine task in software maintenance (corrective maintenance). Due to the increasing size and complexity of current software applications, automated solutions for bug localization can significantly reduce human effort and software maintenance cost. We present a technique for bug localization, belonging to the class of static techniques, that uses a character N-gram based Information Retrieval (IR) model. We frame bug localization as a search for the document(s) relevant to a given query, and investigate the use of character-level N-gram textual features derived from bug reports and source-code file attributes. We implemented the proposed IR model and evaluated its performance on datasets downloaded from two popular open-source projects (JBOSS and Apache). We conducted a series of experiments to validate our hypothesis and present evidence that the proposed approach is effective. The accuracy of the proposed approach is measured using SCORE and MAP (Mean Average Precision), two standard and commonly used metrics for the task of bug localization. Experimental results show that the median value of the SCORE metric is 99.03% for the JBOSS dataset and 93.70% for the Apache dataset. For 16.16% of the bug reports in the JBOSS dataset and 10.67% of the bug reports in the Apache dataset, the average precision value (computed at all recall levels) is between 0.9 and 1.0.	en_US
dc.language.iso	en_US	en_US
dc.subject	Bug Localization	en_US
dc.subject	Mining Software Repositories	en_US
dc.subject	Information Retrieval	en_US
dc.subject	Automated Software Engineering	en_US
dc.title	Static technique for bug localization using character N-Gram based information retrieval model	en_US
dc.type	Thesis	en_US
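The abstract frames bug localization as ranking source files against a bug report (the query) using character N-gram features, and evaluates the ranking with average precision and MAP. The following is a minimal sketch of that idea, not the thesis implementation. It assumes character trigrams (n=3), TF-IDF weighting with a smoothed IDF, and cosine similarity for ranking; the thesis may use a different N, weighting scheme or similarity function. Average precision and MAP follow their usual IR definitions. The SCORE metric is not sketched because its exact definition is not given in this record. All function names, file names and sample text are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased text (assumed n=3)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def tfidf_vectors(docs, n=3):
    """Build TF-IDF weighted n-gram vectors for a {file_name: text} dict."""
    counts = {name: char_ngrams(text, n) for name, text in docs.items()}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency of each n-gram
    total = len(docs)
    # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1
    idf = {g: math.log((1 + total) / (1 + d)) + 1.0 for g, d in df.items()}
    vecs = {name: {g: tf * idf[g] for g, tf in c.items()} for name, c in counts.items()}
    return vecs, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(bug_report, files, n=3):
    """Rank files by similarity to the bug report, most similar first."""
    vecs, idf = tfidf_vectors(files, n)
    # Query n-grams that occur in no file get weight 0.
    query = {g: tf * idf.get(g, 0.0) for g, tf in char_ngrams(bug_report, n).items()}
    scores = {name: cosine(query, v) for name, v in vecs.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k at which a relevant file appears."""
    hits, total = 0, 0.0
    for k, name in enumerate(ranking, start=1):
        if name in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranking, relevant_set) pairs, one pair per bug report."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Hypothetical source files and bug report, for illustration only.
    files = {
        "ConnectionPool.java": "class ConnectionPool { void releaseConnection() }",
        "XmlParser.java": "class XmlParser { parse attribute namespace }",
    }
    bug = "NullPointerException in releaseConnection when connection pool is closed"
    ranking = rank_files(bug, files)
    print(ranking, average_precision(ranking, {"ConnectionPool.java"}))
```

Matching on character N-grams instead of whole words lets a query term such as "connection" still match a file containing the camel-case identifier "releaseConnection", because both share the same character substrings. Word-level tokenization would treat these as unrelated tokens.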