Software Fault Prediction using Language Processing
Dave Binkley, Henry Field, Dawn Lawrie, Maurizio Pighin
Loyola College in Maryland; Università degli Studi di Udine

What is a Fault?

A problem identified in a bug report (in Bugzilla) that led to a code change.

And Fault Prediction?
Source code -> metrics -> fault predictor -> ignore / consider.

Ohh, look at [this]!

Old Metrics
Dozens of structure-based metrics, such as lines of code, the number of attributes in a class, and cyclomatic complexity.

Why YAM? (Yet Another Metric)

1. Many structural metrics carry the same information. A recent example: Gyimothy et al., "Empirical validation of OO metrics," TSE 2007.
2. Menzies et al., "Data mining static code attributes to learn defect predictors," TSE 2007.

Why YAM? Diversity
"[The] measures used [are] less important than having a sufficient pool to choose from. Diversity in this pool is important." (Menzies et al.)

New Diverse Metrics

[Diagram: SE + IR -> Nirvana]
Use natural language semantics (linguistic structure).

QALP: An IR Metric
[Diagram: SE + QALP -> Nirvana]

What is a QALP score?

Use IR to rate modules:
Separate code and comments.
Stop list: an, NULL.
Stemming: printable -> print.
Identifier splitting: go_spongebob -> go sponge bob.
tf-idf term weighting.
Cosine similarity.
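Below is a minimal Python sketch of these preprocessing steps. It rests on assumptions: the input is C/C++ source, `STOP_WORDS` and `DICTIONARY` are tiny hypothetical stand-ins, and `stem` is a crude suffix stripper used in place of a real stemmer such as Porter's. It is not the QALP tool's actual implementation. The dictionary split is what turns "spongebob" into "sponge bob". Splitting on underscores alone would not do that.

```python
import re

# Hypothetical stop list (English function words plus C keywords/constants).
STOP_WORDS = {"a", "an", "the", "of", "to", "null", "int", "return", "if", "else"}

# Hypothetical word list used to split run-together words such as "spongebob".
DICTIONARY = {"go", "sponge", "bob", "print", "count", "line", "file"}

# C/C++ line and block comments (string literals are not handled).
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def separate_code_and_comments(source: str) -> tuple[str, str]:
    """Return (code, comments) for a C/C++ source string."""
    comments = " ".join(COMMENT_RE.findall(source))
    code = COMMENT_RE.sub(" ", source)
    return code, comments


def dictionary_split(word: str) -> list[str]:
    """Split a lowercase word into two dictionary words when that is possible."""
    if word in DICTIONARY:
        return [word]
    for i in range(1, len(word)):
        if word[:i] in DICTIONARY and word[i:] in DICTIONARY:
            return [word[:i], word[i:]]
    return [word]


def split_identifier(identifier: str) -> list[str]:
    """Split on non-letters (e.g. '_'), then on camelCase, then run-together words."""
    words = []
    for chunk in re.split(r"[^A-Za-z]+", identifier):
        for piece in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk):
            words.extend(dictionary_split(piece.lower()))
    return words


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("able", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text: str) -> list[str]:
    """Tokenize, split identifiers, drop stop words, and stem what remains."""
    result = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        for word in split_identifier(token):
            if word not in STOP_WORDS:
                result.append(stem(word))
    return result


code, comments = separate_code_and_comments(
    "int go_spongebob(void) { return NULL; } // printable sponge"
)
print(terms(code))      # ['go', 'sponge', 'bob', 'void']
print(terms(comments))  # ['print', 'sponge']
```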

tf-idf Term Weighting
Accounts for term frequency (how important the term is within a document) and inverse document frequency (how common the term is across the entire collection; rarer terms count more).
High weight: frequent in the document but rare in the collection.
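The slides name tf-idf but do not say which variant QALP uses. This minimal Python sketch assumes the textbook form: raw counts for tf, and idf = log(N / df), where df is the number of documents that contain the term. Under this variant, a term that appears in every document gets a weight of exactly zero.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Weight = (count of term in document) * log(N / number of documents containing term)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        )
    return weights


docs = [["sponge", "bob", "sponge"], ["bob", "print"], ["bob", "count"]]
w = tf_idf(docs)
print(w[0]["bob"])     # 0.0: "bob" appears in every document
print(w[0]["sponge"])  # 2 * log(3): frequent here, rare elsewhere
```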

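Cosine Similarity
Similarity = cos(θ), where θ is the angle between the two documents' term vectors.
[Diagram: Document 1 (Football) and Document 2 (Cricket) drawn as term vectors]

Why the QALP Score in Fault Prediction?
High QALP score (done) -> high quality -> low faults.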
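Cosine similarity is the dot product of two term vectors divided by the product of their lengths. The minimal Python sketch below is built on an assumption drawn from the slide outline: that a module's QALP score is the cosine similarity between its code terms and its comment terms. `qalp_like_score` is a hypothetical name, and it uses raw counts in place of tf-idf weights so the block stays self-contained. The football/cricket vectors are made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|), treating missing terms as 0; 0.0 for an empty vector."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def qalp_like_score(code_terms: list[str], comment_terms: list[str]) -> float:
    """Hypothetical QALP-style score: cosine of the raw term-count vectors of code and comments."""
    return cosine_similarity(Counter(code_terms), Counter(comment_terms))


football = {"ball": 1, "goal": 1, "team": 1}
cricket = {"ball": 1, "wicket": 1, "team": 1}
print(cosine_similarity(football, cricket))  # about 0.667: two of three terms shared
```

Because term weights are never negative, the score falls between 0 (no shared terms) and 1 (proportional vectors).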

Fault Prediction Experiment
Source code -> QALP, LoC / SLoC -> fault predictor -> ignore / consider. Ohh, look at [this]!

Linear Mixed-Effects Regression Models
Response variable = f(explanatory variables)

In the experiment: Faults = f(QALP, LoC, SLoC)

Two Test Subjects
Mozilla: open source, 3M LoC, 2.4M SLoC.
MP: proprietary source, 454K LoC, 282K SLoC.
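The slides do not define how LoC and SLoC were counted. This minimal Python sketch assumes LoC counts every physical line and SLoC counts the lines that still contain something once C/C++ comments are removed. The comment regex does not handle string literals that happen to contain "//" or "/*".

```python
import re

# C/C++ line and block comments; string literals containing "//" or "/*" are not handled.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def count_loc(source: str) -> int:
    """LoC: every physical line, including blank and comment-only lines."""
    return len(source.splitlines())


def count_sloc(source: str) -> int:
    """SLoC (assumed definition): lines with something left after removing comments."""
    # Replace each comment by the newlines it spanned so line boundaries are preserved.
    stripped = COMMENT_RE.sub(lambda m: "\n" * m.group(0).count("\n"), source)
    return sum(1 for line in stripped.splitlines() if line.strip())


example = "/* header\n   comment */\nint main(void)\n{\n\n    return 0; // done\n}\n"
print(count_loc(example), count_sloc(example))  # 7 4
```

The two subjects above give LoC/SLoC ratios of 3M / 2.4M = 1.25 for Mozilla and 454K / 282K ≈ 1.61 for MP.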

Mozilla Final Model
defects = f(LoC, SLoC, LoC * SLoC), where LoC * SLoC is an interaction term
R² = 0.16; the model omits the QALP score.

MP Final Model
defects = -1.83 + QALP(-2.4 + 0.53 LoC - 0.92 SLoC) + 0.056 LoC - 0.058 SLoC

R² = 0.614 (p < 0.0001)

Substituting LoC = 1.67 SLoC (the paper includes quartile approximations) into the non-QALP terms gives 0.056(1.67 SLoC) - 0.058 SLoC ≈ 0.035 SLoC, so
defects = -1.83 + QALP(-2.4 + 0.53 LoC - 0.92 SLoC) + 0.035 SLoC

More (real) code -> more defects.

The QALP term helps (a higher QALP score predicts fewer defects) when the coefficient of QALP is < 0. Interactions exist: that coefficient depends on LoC and SLoC.
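The MP final model can be evaluated directly to see the interaction at work. This is a minimal Python sketch that uses the coefficients exactly as printed on the slides. The function names are hypothetical. The slides do not say whether LoC and SLoC are measured per function or per module, so here they are treated as plain numbers.

```python
def qalp_coefficient(loc: float, sloc: float) -> float:
    """Multiplier of the QALP score in the MP final model: -2.4 + 0.53 LoC - 0.92 SLoC."""
    return -2.4 + 0.53 * loc - 0.92 * sloc


def mp_defects(qalp: float, loc: float, sloc: float) -> float:
    """MP final model as printed on the slides (coefficients are rounded)."""
    return -1.83 + qalp * qalp_coefficient(loc, sloc) + 0.056 * loc - 0.058 * sloc


# A hypothetical module with SLoC = 100 and LoC = 1.67 * SLoC = 167.
print(qalp_coefficient(167, 100))                             # about -5.89
print(mp_defects(0.8, 167, 100) < mp_defects(0.2, 167, 100))  # True
# Interaction: with LoC = 2 * SLoC the coefficient flips sign.
print(qalp_coefficient(200, 100))                             # about 11.6
```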

Consider the QALP Score Coefficient
(-2.4 + 0.53 LoC - 0.92 SLoC)
Again using LoC = 1.67 SLoC, the QALP term becomes QALP(-2.4 - 0.035 SLoC), so the coefficient of QALP is < 0.
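The following short derivation does not appear on the slides. It sets the QALP-score coefficient, labeled c here, below zero and solves for LoC. The result is the boundary of the region where a higher QALP score predicts fewer defects.

```latex
\begin{aligned}
c(\mathrm{LoC}, \mathrm{SLoC}) &= -2.4 + 0.53\,\mathrm{LoC} - 0.92\,\mathrm{SLoC} \\
c < 0 &\iff \mathrm{LoC} < \frac{2.4 + 0.92\,\mathrm{SLoC}}{0.53} \approx 4.53 + 1.74\,\mathrm{SLoC} \\
\mathrm{LoC} = 1.67\,\mathrm{SLoC} &\implies c = -2.4 + (0.53 \cdot 1.67 - 0.92)\,\mathrm{SLoC} \approx -2.4 - 0.035\,\mathrm{SLoC}
\end{aligned}
```

Since 1.67 < 1.74, the line LoC = 1.67 SLoC lies below the boundary for every SLoC ≥ 0. This agrees with the slide's conclusion that the coefficient of QALP is negative.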

Graphically
[Figure: QALP-score coefficient plotted over LOC (0 to 2000) and SLOC (5 to 1505). Annotations: "Good news!", "Interesting range", "coefficient of QALP < 0".]

Ok, I Buy It. Now What Do I Do? (not a sales pitch)
High LoC -> more faults.

Refactor longer functions: obviously improves the metric value.
But: join all lines. That also obviously improves the metric value. But faults?

But: high QALP score -> fewer faults. Add all the code back in as comments: that improves the score.

High QALP score -> fewer faults: consider the variable names in low-scoring functions (informal examples seen).

Future
Refactoring advice.
Outward-looking comments: comparison with external documentation.
Incorporating concept capture: higher-quality identifiers are worth more.

Summary
Diversity.
An IR-based metric.
The initial study provided mixed results.

Questions?

The Neatness Metric
Pretty-print the code. The lower the edit distance between the original and the pretty-printed code, the higher the score.
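The neatness idea can be sketched in Python, under two stated assumptions. First, the pretty-printed text comes from some external pretty printer, which the slides do not name. Second, the score is normalized as 1 - distance / (length of the longer text). That normalization is hypothetical and does not come from the talk. `edit_distance` is a standard character-level Levenshtein distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute; each costs 1), row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete ca
                    current[j - 1] + 1,  # insert cb
                    previous[j - 1] + (ca != cb),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def neatness(original: str, pretty_printed: str) -> float:
    """Hypothetical normalization: 1.0 if pretty-printing changes nothing, lower as edits grow."""
    longest = max(len(original), len(pretty_printed))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(original, pretty_printed) / longest


print(edit_distance("kitten", "sitting"))  # 3
print(neatness("x = 1;", "x = 1;"))        # 1.0
```

The character-level distance costs O(n·m) time. For whole files, a line-level comparison (for instance with Python's difflib) would scale better, at the cost of measuring a different distance.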
