Control Flow Graph (CFG) Wiki

Overview

A Control Flow Graph (CFG) models a program as a directed graph whose nodes are basic blocks (maximal straight-line instruction sequences with a single entry and a single exit) and whose edges represent possible transfers of control between those blocks. CFGs allow program structure to be analyzed without executing the program and form the substrate on which many static analyses, fuzzing heuristics, and reverse-engineering or obfuscation techniques are built.

Structure

In a CFG, each basic block corresponds to a contiguous sequence of instructions. Edges are derived from control-flow operations such as conditional branches, jumps, calls, and returns. In the processor-fuzzing literature, individual basic blocks of interest (e.g., b4) are identified inside the graph alongside their predecessors and successors when reasoning about unvisited code.

Construction and Tools

CFGs can be derived from source-level, intermediate, or binary representations. For hardware designs expressed at the RTL level, the cited processor-fuzzing work uses Pyverilog to parse the design and the Z3 SMT-solver as part of forming the CFG. For EVM bytecode, recovering an accurate CFG is itself non-trivial because compilers routinely reuse the same code in different execution contexts.

Refinement via Dependency Analysis

A purely control-flow-derived CFG can be conservative. The processor-fuzzing paper explicitly compares two CFG views of the same Rocket-processor exception-handling code:

A plain CFG in which each basic block is connected based on control flow alone (the left view).
A refined CFG produced after dependency analysis, which adds edges that reflect data and control dependencies between blocks (the right view).

The refined CFG preserves additional structural information that downstream consumers, such as coverage-driven heuristics, can exploit.

Reuse-Sensitive vs. Reuse-Insensitive CFGs

When the same bytecode appears in multiple execution contexts — a common artifact of compiler code reuse — a reuse-insensitive CFG merges those occurrences, introducing semantic ambiguities and redundant control-flow dependencies. A reuse-sensitive CFG, by contrast, identifies and replicates the reused code into each context, producing a graph that more faithfully represents the program semantics. The Esuer tool demonstrates that constructing reuse-sensitive CFGs for EVM bytecode yields 99.94% execution-trace coverage and an F1-score of 97.02% for accurate identification of reused code, and supports downstream vulnerability detection (e.g., tx.origin and reentrancy vulnerabilities) with F1-scores of 99.97% and 99.67% respectively.

Application in Hardware Fuzzing

In the cited processor-fuzzing framework, the CFG underpins the Dependency-Aware Heuristic that guides seed selection. For each branch p_i accessed by a seed, the heuristic searches the refined CFG for successors within a bounded distance (≤ 3); if any of those successors is an unvisited branch u_j, the reciprocal of the smallest such distance contributes to the seed's heuristic value:

H_dep(seed_i) = Σ_{p_i ∈ P_seed_i} min_{u_j ∈ unvisited} 1 / d(p_i, u_j)

This encourages the fuzzer to prefer seeds whose execution paths in the CFG pass near unvisited basic blocks, in contrast to the Frequency Heuristic, which only uses accumulated access counts of branches to prioritize rare-target reachability.

Application in Program Obfuscation

A program's CFG typically leaks considerable structural information: if only the straight-line code (SLC) is obfuscated, the CFG can still be extracted and analyzed. Consequently, methods have been proposed to rewrite a program P into a functionally equivalent program P' such that CFG{P} and CFG{P'} are radically different — even non-isomorphic — to harden the program against such extraction.