[ANU] [DCS] [COMP2100/2500] [Description] [Schedule] [Lectures] [Labs] [Homework] [Assignments] [COMP2500] [Assessment] [PSP] [Java] [Reading] [Help]
COMP2100/2500
Lecture 14: Static Code AnalysisSummary
Introduction to static analysis of code, concentrating on formal verification and code reviews.
Aims
Explain the important role of static analysis
Give a basic understanding of what formal verification are about
Describe the process of code reviews and the benefits to be gained from including them in the software development process.
Code Analysis
- Dynamic Code Analysis:
Software is run on test data and its behaviour analysed.
Specification Testing (black box)
Structural Testing (white box)
Integration Testing
Acceptance Testing
- Static Code Analysis:
Software is examined without being run.
Formal Verification
Static Analysis Tools
Code Inspection
None of these is the solution. We need a combination.
Why Do Static Code Analysis?
You don't need a finished program to do static analysis (so you can find and fix faults earlier when it is cheaper).
It is more effective at finding faults than dynamic analysis (testing).
Formal verification offers significantly higher product confidence than can be achieved with testing.
Why Do Dynamic Code Analysis?
Static analysis only checks what people think the code will do against what they think it should do.
Dynamic analysis checks what the code actually does.
Customers can judge what the code does against what they wanted.
Formal Verification
Formal verification is a proof that a program meets its specification. (But not necessarily that the specification describes what the user wanted. Remember the saying about building the software right vs. building the right software.)
Formal verification
is hard to do
requires specially trained staff
is expensive to apply
But...
real systems are proved
many new graduates have appropriate background (especially if they do COMP2600 in second semester)
failure is often even more expensive
Today, formal verification is largely reserved for
research projects
life critical systems
security critical systems
hardware
Practical Formal Verification Concepts
These include:
preconditions
postconditions
loop variants
loop invariants
class invariants
Eiffel allowed all of these to be made executable and verified at run-time. Java doesn't have these features built in to the language, but it is possible to add them. See the way pre- and post-conditions are implemented in the Oops system using the Assert class.
Example: Verifying Loops
from S -- Setup Code invariant I -- Invariant variant v -- Variant until C -- Stopping Condition loop B -- Loop Body endConsider the contract for the loop.
What does the loop require? In other words, what must be true at the start if this loop is to be able to do its job correctly? This is the precondition (P).
What must the loop ensure? In other words, what must be true at the end if the loop has done its job correctly? This is the postcondition (Q).
Choose S, I, v, C and B such that
The setup code establishes the invariant and ensures the variant is not negative.
{P} S {I and v >= 0}The loop body preserves the invariant and ensures the variant has decreased but remains non-negative.
({I and not C} B {I and v >= 0 and v < old v}The invariant and the stopping condition imply the postcondition.
(I and C implies Q).This will ensure that your loop code always establishes the postcondition. In other words, it proves the assertion
{P} loop {Q}
Example: Find the smallest element in an array
- Precondition:
a is nonempty
- Postcondition:
s is the smallest element in a
In Eiffel:
from i := a.lower s := a.item (i) invariant -- s is the smallest element in the set -- {a.item (a.lower), ..., a.item (i)} variant a.upper - i until i = a.upper loop i := i + 1 s := s.min (a.item (i)) endThe same thing in Java:
i = 0; s = a[0]; while (i < a.length - 1) { /* * Invariant: s is the smallest element in the set * {a[0], a[1], ..., a[i]} * Variant: a.length - i - 1 */ i++; s = Math.min(s, a[i]); }Notice that the syntax makes it less obvious. Even less so with a for loop, I think. The point is that the Eiffel loop syntax was designed with correctness proofs in mind.
Example: Find the first occurrence of t in a
- Precondition:
Nothing
- Postcondition:
If i is a valid index into a then it is the index of the first instance of t in a.
In Eiffel:
from i := a.lower invariant -- There does not exist j such that -- a.lower <= j < i and a.item (j) = t variant a.upper + 1 - i until i = a.upper + 1 or else a.item (i) = t loop i := i + 1 endThe same thing in Java:
i = 0; while (i < a.length && a[i] != t) { /* * Invariant: There is no j with 0 <= j < i and a[j] = t * Variant: a.length - i */ i++; }
Static Analysis Tools
The compiler will check your code for syntactic errors, but other voluntary style constraints can be checked statically:
use of header comments
naming conventions on identifiers
initialisation of variables
unused variables
size and complexity of code
consistency of units usage (speed := speed + acceleration * time)
The ‘lint’ program for C is the best known static analysis tool. I use ‘weblint’ to check the HTML code in these notes.
Code Reviews
(See Sections 13.4 - 13.9 and Chapter 14 of Introduction to the Personal Software Process.)
What is a code review?
When you sit down, either alone or in a small group, and read carefully through every line of a piece of code, looking for and recording defects.
This is a new phase to add to the PSP, between the Code and Compile phases.
Advantages of code reviews
They are up to five times more efficient than testing at finding defects. Typically code reviews find 6 - 10 defects per hour, while testing only finds 2 - 4 defects per hour.
You get to find defects early, as soon as possible after they were injected into the program.
Disadvantages of code reviews
They are time-consuming.
They are difficult to do well (and there's not much point in doing them badly).
Typically code reviews are skipped when projects come under pressure. This is a false economy.
How to do a code review
Print out the program code and review it on paper, not on the computer screen.
Look at every line of code.
(In team code reviews, have one member of the team read every line of code aloud. You can even do this when reviewing your own code - it helps.)
Use a standard checklist to make sure that you don't forget anything.
Follow the instructions in Chapter 14 of Introduction to the PSP to develop your own personal Code Review Checklist, based on analysis of your defect log. The idea is to know what kind of mistakes you usually make, and look for those.
You will be doing this in Lab 7.
Don't just check for mistakes in the code that's there; also check that the code does what it is supposed to do, and that it does all it is supposed to do. (This means reviewing it against the requirements and the design.)
Do the code review between the Code and Compile phases.
Why find defects early?
Because the cost of finding and correcting a defect goes up by a factor of about 10 for each additional phase that the defect remains in the system.
“Once a lemon, always a lemon.”
Defect removal methods work like a filter, usually removing a certain percentage of defects from a system. For example, unit tests generally find around half of all defects; later system testing generally finds only 30-40% of remaining defects. So the more defects there are in the system at the start of testing, the more there will be at the end.
Typical defect injection and removal rates
pre-PSP Post-PSP inject 100 / KLOC 50 / KLOC remove in code review 0 60% remove in compile 50% 50% remove in unit test 80% 80% defects remaining 10 / KLOC 2 / KLOC Typical times taken: For code review 10 defects per hour; for compile, 10-12 defects per hour; for unit test 2 defects per hour; for integration and system testing, at least 10 hours per defect.
Defect rates and cost examples
1. Student assignment: 500 LOC
Without PSP With PSP Defects Total injected 50 25 Found in code review 0 15 Remaining 50 10 Found in compile 25 5 Found in unit test 20 4 Defects remaining 5 1 Time Code review 0 2.5 Compile 2 0.5 Unit test 10 2 Total defect removal time 12 5 2. Individual project: 10,000 LOC
Without PSP With PSP Defects Total injected 1,000 500 Found in code review 0 300 Remaining 1,000 200 Found in compile 500 100 Found in unit test 400 80 Defects remaining 100 20 Time Code review 0 50 Compile 40 10 Unit test 200 40 Total defect-removal time 240 100 Those remaining defects passed to the user would typically take about 10 hours each to remove. That's 1,000 hours versus 200 hours: a saving of another 800 hours.
3. Team project: 50,000 LOC, five-member team.
Each member develops 10,000 LOC component as above, then they integrate them to form a complete system. The non-PSP team have approximately 500 defects remaining in their system, while the PSP team has 100.
Without PSP With PSP Time Integration and system test 5,000 1,000 How long does it take 5 people to do 5,000 hours work? With the whole team doing nothing else for 40 hours per week, that's 25 weeks, or just under six months. The PSP team have only 1,000 hours to do, or 5 weeks' full-time work. Taking into account the savings on the individual phase (140 hours, three and a half weeks) they will deliver their product at least five months earlier.
[ANU] [DCS] [COMP2100/2500] [Description] [Schedule] [Lectures] [Labs] [Homework] [Assignments] [COMP2500] [Assessment] [PSP] [Java] [Reading] [Help]
Copyright © 2005, Jim Grundy and Ian Barnes, The Australian National University
Version 2005.3, Tuesday, 26 April 2005, 00:00:11 +1000
Feedback & Queries to
comp2100@cs.anu.edu.au