The impact of bias in bug-fix datasets for defect prediction
Israel Herraiz, 2012-04-15
<p>
Last week I gave <a href="http://seminars.cs.ucdavis.edu/?type=1&when=past&talkid=263">a talk at UC Davis</a> about the research work I will be
doing <a href="http://herraiz.org/blog/2012/04/02/visiting-uc-davis/">during these months</a>. It includes some preliminary results on
the impact of bias in bug-fix datasets.
</p>
<p>
In projects with bug tracking systems and version control
repositories, a commit that fixes a bug is usually
marked accordingly (for instance, with a message like "Fixes
bug #123"). This information can be used to recover the relation
between commits and bugs, which is useful for defect prediction. The
preliminary results I have obtained so far show that the impact of
bias is negligible for defect prediction if the model is based on a
binary classifier (that is, it only predicts whether or not an entity
will contain defects, not how many defects it will
contain). A non-biased dataset can provide
better accuracy, but only because, by definition, <i>non-biased datasets
contain more data</i>. If we reduce the size of a non-biased dataset by
extracting a random sub-sample, it performs as well as a biased dataset of
the same size. Well, at least for the two cases I have studied so far.
</p>
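<p>
As a minimal sketch of the two ideas above: recovering commit-to-bug links from commit messages that follow the "Fixes bug #123" convention, and drawing a random sub-sample of a non-biased dataset so it can be compared against a biased dataset of equal size. The regex, function names, and seed here are illustrative assumptions, not the actual tooling used in the study.
</p>

```python
import random
import re

# Pattern for the "Fixes bug #123" convention (illustrative; real projects
# vary in how they mark bug-fix commits).
BUG_FIX_RE = re.compile(r"fixes\s+bug\s+#(\d+)", re.IGNORECASE)

def linked_bug_ids(commit_message):
    """Recover the bug IDs that a commit message claims to fix."""
    return [int(m) for m in BUG_FIX_RE.findall(commit_message)]

def subsample(unbiased_dataset, biased_size, seed=42):
    """Draw a random sub-sample of the non-biased dataset with the same
    size as the biased one, for a like-for-like comparison."""
    rng = random.Random(seed)
    return rng.sample(unbiased_dataset, biased_size)

# Example usage
print(linked_bug_ids("Refactor parser. Fixes bug #123 and fixes bug #456."))
# [123, 456]
```

<p>
The fixed seed only serves to make the comparison reproducible; in an actual evaluation one would repeat the sub-sampling several times and average the classifier's accuracy.
</p>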
<p>
More details are in the slides below. You can also <a href="http://www.slideshare.net/herraiz/evaluating-the-presence-and-impact-of-bias-in-bugfix-datasets">view them on
SlideShare and download a PDF copy</a>.
</p>
<div style="width:425px" id="__ss_12494328">
<iframe src="http://www.slideshare.net/slideshow/embed_code/12494328" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>
</div>