CIS 547 - Lectures

Module 1: Introduction to Software Analysis

This module will introduce you to the domain of Software Analysis – interchangeably called Program Analysis – and its role in checking correctness properties of programs. You will learn about a general class of properties, called program invariants, and how they can be checked using static and dynamic analyses. Then, you’ll learn a step-by-step process of designing a static analysis to check program invariants. Through different examples, you will witness how this process achieves its objective by abstractly evaluating programs.

Additionally, this module covers fundamental tradeoffs in program analysis. Based on these tradeoffs, program analyses are characterized along two dimensions: Soundness and Completeness. You’ll learn how to calculate precision, recall, and F-measure of program analysis that will enable you to measure its accuracy. You will also learn how the Undecidability of program correctness properties necessitates these tradeoffs. In the end, you’ll learn about the primary consumers of program analysis and their examples.

Module 2: The LLVM Framework

Module 3: Software Specifications

This module introduces you to Software Specifications and their role in software testing. You will learn about the landscape of testing methods and how program analysis can be used to automate testing. You’ll learn about different kinds of specifications, including pre- and post-conditions and loop- and class- invariants, and how they help improve program reliability. You’ll then learn how to automatically infer such specifications using the Houdini algorithm and how to measure the quality of a test suite to make it more robust.

Module 4: Random Testing

This module introduces the concept of Random Testing and describes its evolution over three generations from its inception to today’s sophisticated fuzzers. You will learn about general-purpose fuzzers, their strengths and limitations, and how to effectively apply them to uncover crashing bugs and security vulnerabilities. You’ll also learn how the random testing paradigm is adapted to test programs in two important domains, mobile apps and multi-threaded programs, and how it can provide a probabilistic worst-case guarantee on finding concurrency bugs.

Module 5: Delta Debugging

This module introduces Delta Debugging – a debugging technique that automates the task of minimizing a complex crashing test-case to help localize the cause of the program failure. Starting with an iterative binary search technique that cuts test-cases into half, you’ll learn a formal framework to shrink test-cases, culminating into the Delta Debugging Minimization algorithm. The running time and minimality guarantee provided by the algorithm are discussed. In the end, you’ll learn about applications of this technique to diverse debugging tasks in practice.

Module 6: Statistical Debugging

This module introduces Statistical Debugging – a debugging technique that uses data collected from program runs by the user base to identify and fix bugs. Throughout this module, you’ll learn what data is worth collecting from the user base, how to collect this data, and then how to find bugs by analyzing the data. You’ll learn statistical measures and algorithms for isolating bugs that affect the user base.

Module 7: Dataflow Analysis - Part I

This module delves into Dataflow Analysis – a popular type of static analysis widely used in compilers and software quality tools. It introduces the WHILE language and the control-flow graph representation of WHILE programs on which dataflow analyses operate. Two classic dataflow analyses are presented: Reaching Definitions Analysis and Very Busy Expressions Analysis. The similarities and differences of the two analyses are discussed.

Module 8: Dataflow Analysis - Part II

This module introduces the two of the remaining classic dataflow analyses – Available Expressions Analysis and Live Variables Analysis. Then you’ll learn about the overall pattern followed by the four classic dataflow analyses. The module also delves into Interval Analysis, a modern dataflow analysis with many applications to finding security bugs. In the end, you’ll learn about the concept of widening to ensure termination of interval analysis.

Module 9: Pointer Analysis

This module introduces the concept of Pointer Analysis – a form of dataflow analysis that reasons about the flow of pointers. A popular pointer analysis called Andersen’s algorithm and the points-to graph representation is described. Different dimensions for classifying pointer analyses based on accuracy and cost are presented. A more scalable but less precise pointer analysis called Steensgaard’s algorithm is introduced. In the end, an important security application of pointer analysis, called Control Flow Integrity, is discussed.

Module 10: Constraint-Based Analysis

In this module, you’ll learn about Constraint-Based Analysis, a popular approach to program analysis, and its benefits over the approaches studied earlier. You’ll learn a constraint language called Datalog and how it is used to specify intra-procedural and inter-procedural dataflow analyses. The module will also describe tradeoffs made in context-insensitive and context-sensitive inter-procedural analysis. In the end, you’ll learn about other constraint languages, SAT and SMT, and their applications to program analysis.

Module 11: Automated Test Generation

This module introduces the concept of automating test generation for units of code. The presented techniques guide test generation by leveraging different kinds of program specifications, such as types, invariants, and pre- and post-conditions. You’ll learn two orthogonal but complementary approaches: Randoop and Korat. Randoop is designed to test classes and libraries while Korat is used to test data structures.

Module 12: Type Systems - Part I

This module will introduce you to the most widely used form of static analysis known as Type Systems. You’ll learn a core language, the Lambda Calculus, and notation for specifying a type system for the language in the form of rules. You will then use the rules to determine whether a program is accepted or rejected by the type system. In the end, you’ll learn about soundness and completeness properties of type systems.

Module 13: Type Systems - Part II

This module elucidates connections between type systems and other forms of static analysis. It sets out by showing how to perform Type Inference using constraint solving and thereby alleviate annotation burden on programmers. It then demonstrates how other forms of static analyses can be described using the notation of type systems. You will learn how to describe three broad classes of static analyses – flow-insensitive, flow-sensitive, and path-sensitive – using type rules and understand their strengths and tradeoffs.

Module 14: Dynamic Symbolic Execution

This module introduces a technique for automated test generation called Dynamic Symbolic Execution (DSE). DSE is an example of a hybrid analysis: it collaboratively combines static analysis and dynamic analysis. You’ll learn how it outperforms both random testing, which is based on purely dynamic analysis, and symbolic execution, which is based on purely static analysis. You’ll also learn about the effectiveness of the DSE approach in testing various real-world applications.