Abstract:
Bug or fault localization is the process of identifying the specific location(s) or region(s) of
source code (at various granularity levels such as the directory path, file, method, or statement)
that are faulty and need to be modified to repair the defect. Bug localization is a
routine task in software maintenance (corrective maintenance). Due to the increasing size
and complexity of current software applications, automated solutions for bug localization
can significantly reduce human effort and software maintenance cost.
We present a static technique for bug localization that uses a character N-gram based
Information Retrieval (IR) model.
We frame the problem of bug localization as a search for the document(s) relevant to a given
query and investigate the application of character-level N-gram based textual features
derived from bug reports and source-code file attributes. We implemented the proposed IR
model and evaluated its performance on datasets downloaded from two popular open-source
projects (JBOSS and Apache).
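
The retrieval framing above can be illustrated with a minimal sketch. It is not the model evaluated in this work: the N-gram lengths (3 to 5), the cosine similarity over raw N-gram counts, and the names char_ngrams, cosine, and rank_files are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n_min=3, n_max=5):
    """Count every character n-gram of length n_min..n_max in the lowercased text.

    The length range is an illustrative assumption, not a value from the paper.
    """
    s = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(s) - n + 1):
            grams[s[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (assumed scoring function)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_files(bug_report, files):
    """Rank file paths by similarity between the bug report and each file's text attributes.

    `files` maps a path to a string of attributes (for example, file name and path tokens).
    Ties are broken alphabetically by path so the ordering is deterministic.
    """
    query = char_ngrams(bug_report)
    scored = [(cosine(query, char_ngrams(text)), path) for path, text in files.items()]
    return [path for _, path in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

Because matching happens on character sequences rather than whole words, a report that mentions "session expires" can still match a file attribute such as "SessionManager" without any tokenization or stemming step.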
We conducted a series of experiments to validate our hypothesis and present evidence
that the proposed approach is effective. The accuracy of the proposed approach is
measured using SCORE and MAP (Mean Average Precision), two metrics commonly used
for the task of bug localization. Experimental results reveal that
the median value of the SCORE metric is 99.03% for the JBOSS dataset and 93.70% for the
Apache dataset. We observed that for 16.16% of the bug reports in the JBOSS dataset and for
10.67% of the bug reports in the Apache dataset, the average precision value (computed at
all recall levels) is between 0.9 and 1.0.
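
For reference, the average precision of a single bug report and the MAP over a set of reports can be computed as in the sketch below. It follows the standard IR definition (precision averaged over the rank of every relevant file); the function names are hypothetical, and each ranked list is assumed to contain no duplicate entries.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against the set of truly faulty files.

    Precision is sampled at the rank of each relevant file, and the sum is divided
    by the total number of relevant files, so files that are never retrieved
    contribute zero.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average_precision over (ranked_list, relevant_set) pairs, one per bug report."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

Under this definition, a bug report whose faulty files all appear at the top of the ranking has an average precision of 1.0, which is the upper end of the 0.9 to 1.0 band reported above.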