[Fuzzing book] 3. Code Coverage

Serendipity · September 7, 2023


Code Coverage Summary:

Introduction
- Code coverage measures which parts of a program are executed during a test run.
- This measurement is crucial for test generators, which aim to cover as much code as possible.
CGI Decoder Explanation
- A brief overview of CGI encoding:
  - Used in URLs to encode characters that are invalid there, such as blanks and some punctuation.
  - Blanks are replaced by '+'.
  - Other invalid characters are replaced by '%xx', where 'xx' is the two-digit hexadecimal value of the character.
  - E.g., "Hello, world!" becomes "Hello%2c+world%21".
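The encoding rules above can be sketched as a small encoder. This cgi_encode() is hypothetical (it is not part of the post), and treating only ASCII letters and digits as valid URL characters is an assumption made to keep it short.

```python
def cgi_encode(s: str) -> str:
    """Hypothetical CGI encoder: ' ' -> '+', ASCII letters and digits are kept,
    every other character becomes '%xx' (two lowercase hex digits)."""
    t = ""
    for c in s:
        if c == ' ':
            t += '+'
        elif c.isascii() and c.isalnum():
            t += c  # assumed to be valid in URLs as-is
        elif ord(c) < 256:
            t += '%' + format(ord(c), '02x')  # e.g. ',' -> '%2c'
        else:
            # Two hex digits cannot represent code points above U+00FF
            raise ValueError(f"cannot encode {c!r}")
    return t


print(cgi_encode("Hello, world!"))  # Hello%2c+world%21
```

Encoding the example string reproduces "Hello%2c+world%21", with the same lowercase hex digits as the example above.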
- A cgi_decode() function is introduced:
  - Decodes CGI-encoded strings.
  - Based on code from [Pezzè et al, 2008], including its bugs.
  - Uses a dictionary hex_values to map hexadecimal digits to their integer values.
  - Processes the input string character by character: '+' becomes a blank, '%xx' becomes the corresponding character, and every other character is copied unchanged; an invalid '%xx' sequence raises a ValueError.

Function Implementation:

```python
def cgi_decode(s: str) -> str:
    # ... [code as provided] ...
    return t
```

Example of its usage:

```python
cgi_decode("Hello+world")  # Outputs: 'Hello world'
```
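The body of cgi_decode() is elided in the listing above. The following is a reconstruction based on the description in this post: a hex_values dictionary, a loop over each character, and a ValueError for an invalid "%xx". It deliberately keeps the missing bounds check, which is the bug this post discusses later. Details may differ from the book's listing.

```python
def cgi_decode(s: str) -> str:
    """Decode the CGI-encoded string s:
    '+' becomes ' ', and '%xx' becomes the character with hex value xx.
    Raises ValueError for an invalid '%xx' sequence."""
    # Map each hexadecimal digit (both cases) to its integer value
    hex_values = {
        '0': 0, '1': 1, '2': 2, '3': 3, '4': 4,
        '5': 5, '6': 6, '7': 7, '8': 8, '9': 9,
        'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15,
        'A': 10, 'B': 11, 'C': 12, 'D': 13, 'E': 14, 'F': 15,
    }

    t = ""
    i = 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            # No bounds check (the kept bug): a '%' among the last two
            # characters makes this line raise IndexError
            digit_high, digit_low = s[i + 1], s[i + 2]
            i += 2
            if digit_high in hex_values and digit_low in hex_values:
                v = hex_values[digit_high] * 16 + hex_values[digit_low]
                t += chr(v)
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t


print(cgi_decode("Hello+world"))        # Hello world
print(cgi_decode("Hello%2c+world%21"))  # Hello, world!
```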

Testing Methods:
- Two methods: Black-box testing and White-box testing.
a. Black-Box Testing:
- Derives tests based on the specification of the function or system.
- Tests for:
  1. Correct replacement of '+'.
  2. Correct replacement of "%xx".
  3. Non-replacement of other characters.
  4. Recognition of illegal inputs.
- Four assertions are given, one per feature, and all of them pass.
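The four black-box assertions might look like the block below. cgi_decode() is re-defined compactly so that the block runs on its own. Its hex table is built at module level, which differs from the dictionary inside the function described above. The concrete inputs '+', '%20', 'abc', and '%?a' are illustrative choices.

```python
# Hex digit -> value table, built once at module level in this sketch
HEX_VALUES = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def cgi_decode(s: str) -> str:
    """Compact re-definition: '+' -> ' ', '%xx' -> chr(0xxx); keeps the bug."""
    t = ""
    i = 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            digit_high, digit_low = s[i + 1], s[i + 2]  # IndexError near the end
            i += 2
            if digit_high in HEX_VALUES and digit_low in HEX_VALUES:
                t += chr(HEX_VALUES[digit_high] * 16 + HEX_VALUES[digit_low])
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t


# 1. '+' is replaced by a blank
assert cgi_decode('+') == ' '
# 2. "%xx" is replaced by the character with hex value xx
assert cgi_decode('%20') == ' '
# 3. Other characters are left unchanged
assert cgi_decode('abc') == 'abc'
# 4. Illegal input is recognized
try:
    cgi_decode('%?a')
    assert False, "expected ValueError"
except ValueError:
    pass
```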
b. White-Box Testing:
- Derives tests based on the implementation and internal structure.
- Coverage Criteria:
  1. Statement coverage: Every code statement must be executed at least once.
  2. Branch coverage: Every branch (the true and the false outcome of each if or while decision) must be taken at least once.
- More advanced criteria, discussed in [Pezzè et al, 2008], consider sequences of branches, loop iterations, data flows, and more.
- Using cgi_decode() as an example, branch coverage requires:
  1. A test in which the condition c == '+' holds.
  2. Two tests for c == '%': one with a valid and one with an invalid "%xx" sequence.
  3. A test for the final else case, i.e., any other character.
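Statement and branch coverage can differ even for a tiny function. The sketch below uses a hypothetical make_nonnegative() and a throwaway sys.settrace() tracer. The input -3 executes every statement, but only adding the input 3 also takes the False branch of `if x < 0`.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional, Set


def make_nonnegative(x: int) -> int:  # hypothetical example function
    if x < 0:          # offset 1 from the def line
        x = -x         # offset 2
    return x           # offset 3


def executed_lines(x: int) -> Set[int]:
    """Line offsets (relative to the def line) executed by make_nonnegative(x)."""
    code = make_nonnegative.__code__
    lines: Set[int] = set()

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if event == 'line' and frame.f_code is code:
            lines.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        make_nonnegative(x)
    finally:
        sys.settrace(None)
    return lines


negative = executed_lines(-3)  # condition True: every statement executes
positive = executed_lines(3)   # condition False: the other branch
print(positive <= negative)    # True: 3 runs no statement that -3 missed
print(negative - positive)     # {2}: `x = -x` is skipped when x >= 0
```

The single test -3 therefore already achieves full statement coverage, while branch coverage also requires a test such as 3.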

Advantages and Disadvantages of White-box Testing:

Advantages:
- Finds errors in implemented behavior.
- Can be conducted when the specification lacks detail.
- Helps in identifying and specifying corner cases.
Disadvantages:
- May miss non-implemented behavior. If a specified functionality is not implemented, white-box testing won't detect it.

Tracing Executions:

White-box testing needs to know whether a program feature was covered; this is determined by instrumenting the program execution.
The instrumentation tracks which code was executed.
After testing, this information points programmers to code that is not yet covered.
Python's sys.settrace(f) installs a tracing function f() that the interpreter calls on events such as function calls, executed lines, and returns, which makes it ideal for dynamic analysis.
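The following minimal sketch shows the events that sys.settrace() delivers. The traced function add() is hypothetical. Because the tracer returns itself, it also serves as the local tracer of the called frame.

```python
import sys
from types import FrameType
from typing import Any, Callable, List, Optional, Tuple

events: List[Tuple[str, str]] = []  # (event, function name) pairs


def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
    # Called as the global tracer on 'call'; because it returns itself,
    # it is also the local tracer and receives 'line' and 'return' events.
    events.append((event, frame.f_code.co_name))
    return tracer


def add(a: int, b: int) -> int:  # hypothetical function to trace
    total = a + b
    return total


sys.settrace(tracer)    # turn tracing on for newly called functions
try:
    add(1, 2)
finally:
    sys.settrace(None)  # turn it off again

print([event for event, name in events if name == 'add'])
```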

Example: Tracing `cgi_decode()`:

Function: cgi_decode("a+b") returns 'a b'.
Tracing Setup:
- Use sys.settrace() to trace the execution of cgi_decode().
- The tracing function is called for every line executed, capturing which line numbers were hit.
Implementation:
- Define a global variable coverage to store line numbers that were executed.
- The traceit function captures the line numbers when the event is "line".
- Switch tracing on/off using sys.settrace().
Result: Tracing cgi_decode("a+b") records the executed line numbers in the order in which they were executed.
Coverage Analysis:
- Convert the coverage list to a set to see which lines were covered.
- Print out the function code, annotating lines not covered with #.

A Coverage Class for Better Management:

Purpose:
- The global variable approach is cumbersome.
- Use the with statement in Python for more elegant coverage tracking.

Usage:

```python
with Coverage() as cov:
    function_to_be_traced()
c = cov.coverage()
```

Coverage Class Implementation:
- __init__: Constructor initializes a trace list.
- traceit: Tracing function that captures the function name and line number of every executed line.
- __enter__: Method called at the start of the with block; turns on tracing.
- __exit__: Method called at the end of the with block; turns off tracing.
- trace: Returns a list of executed lines as (function_name, line_number) pairs.
- coverage: Returns the set of executed (function_name, line_number) pairs.
- function_names: Returns the set of function names that were covered.
- __repr__: String representation of the object, showing covered and uncovered code.
Use: Measuring coverage then only requires wrapping the code under test in a with Coverage() block.
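A sketch of a Coverage class along the lines of the method list above. Several parts are assumptions of this sketch rather than the book's code: keeping code objects in _code so that __repr__ can find the source, skipping this class's own __enter__ and __exit__ lines, and the compact cgi_decode() used for the demo. In the listing, covered lines are marked with '#'.

```python
import inspect
import sys
from types import CodeType, FrameType, TracebackType
from typing import Any, Callable, Dict, List, Optional, Set, Tuple, Type

HEX_VALUES = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def cgi_decode(s: str) -> str:
    """Compact re-definition: '+' -> ' ', '%xx' -> chr(0xxx); keeps the bug."""
    t, i = "", 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            digit_high, digit_low = s[i + 1], s[i + 2]  # IndexError near the end
            i += 2
            if digit_high in HEX_VALUES and digit_low in HEX_VALUES:
                t += chr(HEX_VALUES[digit_high] * 16 + HEX_VALUES[digit_low])
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t


Location = Tuple[str, int]  # (function_name, line_number)


class Coverage:
    """Track executed lines of all functions called inside a with block."""

    def __init__(self) -> None:
        self._trace: List[Location] = []
        self._code: Dict[str, CodeType] = {}  # assumption: used to find sources
        self.original_trace_function: Optional[Callable] = None

    def traceit(self, frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if event == 'line':
            function_name = frame.f_code.co_name
            # Skip the lines of this class's own __enter__/__exit__
            if function_name not in ('__enter__', '__exit__'):
                self._trace.append((function_name, frame.f_lineno))
                self._code[function_name] = frame.f_code
        return self.traceit

    def __enter__(self) -> 'Coverage':
        self.original_trace_function = sys.gettrace()  # restored on exit
        sys.settrace(self.traceit)                     # tracing on
        return self

    def __exit__(self, exc_type: Optional[Type[BaseException]],
                 exc_value: Optional[BaseException],
                 tb: Optional[TracebackType]) -> None:
        sys.settrace(self.original_trace_function)     # tracing off

    def trace(self) -> List[Location]:
        return self._trace

    def coverage(self) -> Set[Location]:
        return set(self.trace())

    def function_names(self) -> Set[str]:
        return {function_name for (function_name, _) in self.coverage()}

    def __repr__(self) -> str:
        covered = self.coverage()
        t = ""
        for name in sorted(self.function_names()):
            try:
                lines, start = inspect.getsourcelines(self._code[name])
            except OSError:  # no source available, e.g. in a bare REPL
                t += f"{name}: source not available\n"
                continue
            for offset, line in enumerate(lines):
                lineno = start + offset
                mark = "# " if (name, lineno) in covered else "  "  # '#' = covered
                t += f"{mark}{lineno:2d}  {line}"
        return t


with Coverage() as cov:
    cgi_decode("a+b")
print(cov.function_names())  # {'cgi_decode'}
print(cov)
```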

Key Code Snippets:

Setting up trace:

```python
sys.settrace(traceit)  # Turn on
cgi_decode(s)
sys.settrace(None)    # Turn off
```

Tracing function:

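```python
from types import FrameType
from typing import Any, Callable, List, Optional

coverage: List[int] = []  # global list of executed line numbers

def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
    """Record the line number of every executed line in the global coverage list."""
    global coverage
    if event == 'line':
        lineno = frame.f_lineno
        coverage.append(lineno)
    return traceit  # keep receiving events for this frame
```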

Coverage class setup:
```
class Coverage:
    ...
```

1. Interactive Use of Code Coverage

When printed, the coverage object lists the source code of the traced functions and marks each covered line with a #.
For cgi_decode(), which replaces '+' with a space and '%xx' with the corresponding character, this listing shows which of these cases a given input exercised.

2. Comparing Coverage

Coverage is represented as a set of executed lines, which allows for set operations on them.
Individual test cases can be compared to find out which lines are covered by one but not the other.
Maximum coverage can be determined statically, or by executing known good test cases.
An example demonstrates the difference in coverage between two distinct test inputs.
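A minimal sketch of such set operations follows. It uses a hypothetical covered_lines() helper instead of the Coverage class so that the block runs on its own. The inputs "a+b" and "abc" and the list of "known good" tests are illustrative. The union over those tests serves as an approximation of maximum coverage.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional, Set, Tuple

HEX_VALUES = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def cgi_decode(s: str) -> str:
    """Compact re-definition: '+' -> ' ', '%xx' -> chr(0xxx); keeps the bug."""
    t, i = "", 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            digit_high, digit_low = s[i + 1], s[i + 2]  # IndexError near the end
            i += 2
            if digit_high in HEX_VALUES and digit_low in HEX_VALUES:
                t += chr(HEX_VALUES[digit_high] * 16 + HEX_VALUES[digit_low])
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t


def covered_lines(s: str) -> Set[Tuple[str, int]]:
    """Hypothetical helper: (function_name, line_number) pairs run by cgi_decode(s)."""
    lines: Set[Tuple[str, int]] = set()

    def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if event == 'line':
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return traceit

    sys.settrace(traceit)
    try:
        cgi_decode(s)
    except ValueError:
        pass  # rejected inputs still contribute coverage
    finally:
        sys.settrace(None)  # always switch tracing off again
    return lines


cov_plus = covered_lines("a+b")
cov_standard = covered_lines("abc")
print(cov_plus - cov_standard)  # only the line `t += ' '`
print(cov_standard - cov_plus)  # set(): "abc" covers nothing extra

# Approximate the maximum coverage by the union over known good tests
good_tests = ["+", "%20", "abc", "%?a"]
max_coverage: Set[Tuple[str, int]] = set()
for test in good_tests:
    max_coverage |= covered_lines(test)
print(sorted(max_coverage - cov_plus))  # lines that "a+b" leaves uncovered
```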

3. Coverage of Basic Fuzzing

Coverage tracing can assess the effectiveness of testing methods, especially test generation methods.
The goal is to achieve maximum coverage for cgi_decode() using random inputs.
By repeatedly fuzzing with random inputs, the coverage of the function can be assessed over time.
A plot is created to show how coverage increases with the number of inputs. Averaged over multiple runs, it is observed that maximum coverage is typically achieved after 40–60 fuzzing inputs.
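The measurement can be sketched as below. fuzzer() is a simplified stand-in for the random fuzzer from the book's earlier chapter; its defaults (up to 100 characters between ' ' and '?') are assumptions. covered_lines() and cumulative_coverage() are hypothetical helpers, and the seed is arbitrary. Plotting is left out: `average` holds the per-input mean that such a plot would show.

```python
import random
import sys
from types import FrameType
from typing import Any, Callable, List, Optional, Set, Tuple

HEX_VALUES = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def cgi_decode(s: str) -> str:
    """Compact re-definition: '+' -> ' ', '%xx' -> chr(0xxx); keeps the bug."""
    t, i = "", 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            digit_high, digit_low = s[i + 1], s[i + 2]  # IndexError near the end
            i += 2
            if digit_high in HEX_VALUES and digit_low in HEX_VALUES:
                t += chr(HEX_VALUES[digit_high] * 16 + HEX_VALUES[digit_low])
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t


def fuzzer(max_length: int = 100, char_start: int = 32, char_range: int = 32) -> str:
    """Random string of 0..max_length characters in [char_start, char_start + char_range)."""
    length = random.randrange(0, max_length + 1)
    return "".join(chr(random.randrange(char_start, char_start + char_range))
                   for _ in range(length))


def covered_lines(s: str) -> Set[Tuple[str, int]]:
    """(function_name, line_number) pairs executed by cgi_decode(s)."""
    lines: Set[Tuple[str, int]] = set()

    def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if event == 'line':
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return traceit

    sys.settrace(traceit)
    try:
        cgi_decode(s)
    except Exception:
        pass  # rejected or crashing inputs still count towards coverage
    finally:
        sys.settrace(None)
    return lines


def cumulative_coverage(trials: int) -> List[int]:
    """Number of distinct lines covered after each of `trials` fuzz inputs."""
    all_coverage: Set[Tuple[str, int]] = set()
    counts: List[int] = []
    for _ in range(trials):
        all_coverage |= covered_lines(fuzzer())
        counts.append(len(all_coverage))
    return counts


random.seed(0)  # arbitrary seed, for reproducibility
runs = 20
all_counts = [cumulative_coverage(100) for _ in range(runs)]
average = [sum(column) / runs for column in zip(*all_counts)]  # mean per input index
print(average[:10])
print(average[-1])
```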

4. Getting Coverage from External Programs

The challenge of obtaining coverage is not unique to Python; other languages also have tools for measuring coverage.
A demonstration is provided for obtaining coverage in a C program:
- The C program cgi_decode decodes CGI-encoded strings.
- The C code, which includes the initialization of hex_values and the cgi_decode() function implementation, is presented.
- A driver in the C program gets the first argument and passes it to the cgi_decode() function.

Codes of Interest:
1. The Python function cgi_decode() for decoding CGI-encoded strings.
2. Set operations to compare coverage of different test cases.
3. Fuzzing the cgi_decode() function to gauge its coverage.
4. The C program's cgi_decode function and its associated routines for decoding CGI-encoded strings.

Code Coverage in .gcov File

.gcov files have each line prefixed with the number of times it was executed.
- '-' stands for non-executable lines.
- '#####' represents lines that were never executed.
An example excerpt for the cgi_decode() function shows unexecuted code: the return -1 for illegal input is marked '#####'.

Code for Parsing .gcov Files

Python code is given to parse the .gcov file and retrieve the coverage information.
- The read_gcov_coverage() function reads a .gcov file and builds a set of (file name, line number) tuples, one for each executed line.
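A sketch of such a parser follows. It produces the set of (file name, line number) tuples described above. It treats a line as executed only if its count field starts with a digit, which skips '-' and '#####' entries alike. That rule is a choice of this sketch. The sample lines are made up to show the format and are not real gcov output.

```python
import os
import tempfile
from typing import Set, Tuple


def read_gcov_coverage(c_file: str) -> Set[Tuple[str, int]]:
    """Return (c_file, line_number) pairs for lines executed according to c_file + '.gcov'."""
    coverage: Set[Tuple[str, int]] = set()
    with open(c_file + ".gcov") as gcov:
        for line in gcov:
            # Layout: "<execution count>:<line number>:<source text>"
            elems = line.split(':', 2)
            if len(elems) < 3:
                continue  # not a source line (e.g. branch or function summaries)
            count = elems[0].strip()
            # '-' = non-executable, '#####' = never executed; a count that
            # starts with a digit means the line ran at least once
            if count[:1].isdigit():
                coverage.add((c_file, int(elems[1].strip())))
    return coverage


# Made-up lines in the .gcov layout, for illustration only
SAMPLE_GCOV = """\
        -:    0:Source:cgi_decode.c
        1:   10:int cgi_decode(char *s, char *t) {
        3:   11:    while (*s != 0) {
    #####:   24:                return -1;
        -:   30:
"""

with tempfile.TemporaryDirectory() as tmp:
    c_file = os.path.join(tmp, "cgi_decode.c")
    with open(c_file + ".gcov", "w") as f:
        f.write(SAMPLE_GCOV)
    executed = sorted(line_number for _, line_number in read_gcov_coverage(c_file))

print(executed)  # [10, 11]
```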

Errors Detection Using Fuzzing

Covering all lines of a function does not guarantee that the function is free of errors.
An "oracle", i.e., a results checker, is needed to verify the test results.
- For cgi_decode(), one could compare the results of the C and Python implementations.
Fuzzing can also discover internal errors, such as crashes, that are detectable without checking the result.
The fuzzer() function reveals an error in cgi_decode() on inputs that end with a '%' character (more precisely, a '%' followed by fewer than two characters).
- cgi_decode() then accesses characters beyond the end of the string, which raises an IndexError in Python.
- The same error exists in the C variant of the implementation, where reading past the end of the buffer is undefined behavior and therefore has more severe consequences.
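The following sketch shows how random inputs expose this bug. fuzzer() is a simplified stand-in for the book's fuzzer, and its defaults are assumptions. find_crash() is a hypothetical helper, and cgi_decode() is re-defined compactly with the bug kept. Invalid encodings raise ValueError as specified, so only IndexError counts as a crash.

```python
import random
from typing import Optional

HEX_VALUES = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def cgi_decode(s: str) -> str:
    """Compact re-definition: '+' -> ' ', '%xx' -> chr(0xxx); keeps the bug."""
    t, i = "", 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            digit_high, digit_low = s[i + 1], s[i + 2]  # no bounds check: the bug
            i += 2
            if digit_high in HEX_VALUES and digit_low in HEX_VALUES:
                t += chr(HEX_VALUES[digit_high] * 16 + HEX_VALUES[digit_low])
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t


def fuzzer(max_length: int = 100, char_start: int = 32, char_range: int = 32) -> str:
    """Random string of 0..max_length characters in [char_start, char_start + char_range)."""
    length = random.randrange(0, max_length + 1)
    return "".join(chr(random.randrange(char_start, char_start + char_range))
                   for _ in range(length))


def find_crash(trials: int = 10_000) -> Optional[str]:
    """Return the first fuzz input on which cgi_decode() raises IndexError."""
    for _ in range(trials):
        s = fuzzer()
        try:
            cgi_decode(s)
        except ValueError:
            pass      # invalid '%xx' sequences are rejected as specified
        except IndexError:
            return s  # read past the end of the string
    return None


random.seed(0)  # arbitrary seed, for reproducibility
crash = find_crash()
print(repr(crash))  # its last two characters contain a '%'
```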

Lessons Learned

Coverage Metrics: These are fully automated ways to approximate how much of a program's functionality is exercised during a test run.
Popular Metrics: Key metrics include statement coverage and branch coverage.
Python's Advantage: Python makes it easy to access program state during execution.
The Limitation of Coverage: While coverage metrics are useful, they do not catch all bugs. In the example, cgi_decode() can crash on unanticipated input, yet a test suite can achieve full statement and branch coverage without ever triggering this bug.
The Power of Fuzzing: Simple fuzzing can reveal hidden bugs when combined with runtime checks.

Cleanup

Instructions are provided to clean up after the tests by deleting all files matching the pattern cgi_decode.*.
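A sketch of the cleanup step that uses the standard glob and os modules. cleanup() and the demo file names are hypothetical.

```python
import glob
import os
import tempfile
from typing import List


def cleanup(directory: str = ".", pattern: str = "cgi_decode.*") -> List[str]:
    """Hypothetical helper: delete files in directory matching pattern; return their paths."""
    removed: List[str] = []
    # glob.escape() keeps special characters in the directory name literal
    for path in glob.glob(os.path.join(glob.escape(directory), pattern)):
        if os.path.isfile(path):  # leave any matching directories alone
            os.remove(path)
            removed.append(path)
    return removed


# Demonstration in a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["cgi_decode.c", "cgi_decode.c.gcov", "notes.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    removed = sorted(os.path.basename(p) for p in cleanup(demo_dir))
    remaining = os.listdir(demo_dir)

print(removed, remaining)  # ['cgi_decode.c', 'cgi_decode.c.gcov'] ['notes.txt']
```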

Serendipity

I'm a graduate student majoring in Computer Engineering at Inha University. I'm interested in developing machine learning frameworks, formal verification, and concurrency.