Skip to content
STIMSMITH

Control Flow Graph (CFG)

Concept WIKI v1 · 6/11/2026

A Control Flow Graph (CFG) is a directed graph representation of a program in which nodes are basic blocks and edges encode possible transfers of control. CFGs are a foundational abstraction used by static analysis, hardware-fuzzing coverage heuristics, and program-obfuscation research, and are commonly refined with auxiliary analyses (e.g., data/control dependencies, code-reuse context) to better approximate execution semantics.

Overview

A Control Flow Graph (CFG) models a program as a directed graph whose nodes are basic blocks (maximal straight-line instruction sequences with a single entry and a single exit) and whose edges represent possible transfers of control between those blocks. CFGs allow program structure to be analyzed without executing the program and form the substrate on which many static analyses, fuzzing heuristics, and reverse-engineering or obfuscation techniques are built.

Structure

In a CFG, each basic block corresponds to a contiguous sequence of instructions. Edges are derived from control-flow operations such as conditional branches, jumps, calls, and returns. In the processor-fuzzing literature, individual basic blocks of interest (e.g., b4) are identified inside the graph alongside their predecessors and successors when reasoning about unvisited code.

Construction and Tools

CFGs can be derived from source-level, intermediate, or binary representations. For hardware designs expressed at the RTL level, the cited processor-fuzzing work uses Pyverilog to parse the design and the Z3 SMT-solver as part of forming the CFG. For EVM bytecode, recovering an accurate CFG is itself non-trivial because compilers routinely reuse the same code in different execution contexts.

Refinement via Dependency Analysis

A purely control-flow-derived CFG can be conservative. The processor-fuzzing paper explicitly compares two CFG views of the same Rocket-processor exception-handling code:

  • A plain CFG in which each basic block is connected based on control flow alone (the left view).
  • A refined CFG produced after dependency analysis, which adds edges that reflect data and control dependencies between blocks (the right view).

The refined CFG preserves additional structural information that downstream consumers, such as coverage-driven heuristics, can exploit.

Reuse-Sensitive vs. Reuse-Insensitive CFGs

When the same bytecode appears in multiple execution contexts — a common artifact of compiler code reuse — a reuse-insensitive CFG merges those occurrences, introducing semantic ambiguities and redundant control-flow dependencies. A reuse-sensitive CFG, by contrast, identifies and replicates the reused code into each context, producing a graph that more faithfully represents the program semantics. The Esuer tool demonstrates that constructing reuse-sensitive CFGs for EVM bytecode yields 99.94% execution-trace coverage and an F1-score of 97.02% for accurate identification of reused code, and supports downstream vulnerability detection (e.g., tx.origin and reentrancy vulnerabilities) with F1-scores of 99.97% and 99.67% respectively.

Application in Hardware Fuzzing

In the cited processor-fuzzing framework, the CFG underpins the Dependency-Aware Heuristic that guides seed selection. For each branch p_i accessed by a seed, the heuristic searches the refined CFG for successors within a bounded distance (≤ 3); if any of those successors is an unvisited branch u_j, the reciprocal of the smallest such distance contributes to the seed's heuristic value:

H_dep(seed_i) = Σ_{p_i ∈ P_seed_i} min_{u_j ∈ unvisited} 1 / d(p_i, u_j)

This encourages the fuzzer to prefer seeds whose execution paths in the CFG pass near unvisited basic blocks, in contrast to the Frequency Heuristic, which only uses accumulated access counts of branches to prioritize rare-target reachability.

Application in Program Obfuscation

A program's CFG typically leaks considerable structural information: if only the straight-line code (SLC) is obfuscated, the CFG can still be extracted and analyzed. Consequently, methods have been proposed to rewrite a program P into a functionally equivalent program P' such that CFG{P} and CFG{P'} are radically different — even non-isomorphic — to harden the program against such extraction.

See Also

  • Static Analysis — uses CFGs as a primary program model.
  • Dependency-Aware Heuristic — a CFG-driven seed-prioritization technique.
  • Fine-Grained Code Analysis for Processor Fuzzing — constructs and refines CFGs from RTL for hardware-fuzzing coverage.

CITATIONS

10 sources
10 citations
[1] A CFG's nodes are basic blocks and its edges represent control flow between them; basic blocks of interest (e.g., b4) appear with their predecessors and successors. Fine-Grained Code Analysis for Processor Fuzzing
[2] Pyverilog and the Z3 SMT-solver are used in forming the CFG for hardware (RTL) designs. Fine-Grained Code Analysis for Processor Fuzzing
[3] Compared with a plain CFG built from control flow alone, a refined CFG produced after dependency analysis adds additional edges that capture data/control dependencies between basic blocks (illustrated on Rocket-processor exception handling). Fine-Grained Code Analysis for Processor Fuzzing
[4] The Dependency-Aware Heuristic is defined on the refined CFG: for each accessed branch p_i, it searches CFG successors within distance ≤ 3 and scores a seed by Σ min(1 / d(p_i, u_j)) over unvisited successors u_j. Fine-Grained Code Analysis for Processor Fuzzing
[5] The Frequency Heuristic prioritizes seeds based on accumulated access counts of branches, in contrast to the CFG-distance-based Dependency-Aware Heuristic. Fine-Grained Code Analysis for Processor Fuzzing
[6] Compiler-introduced code reuse causes reuse-insensitive CFGs of EVM bytecode to exhibit semantic ambiguities and redundant control-flow dependencies. Building Reuse-Sensitive Control Flow Graphs (CFGs) for EVM Bytecode
[7] Esuer dynamically identifies code reuse when constructing reuse-sensitive CFGs of EVM bytecode, achieving 99.94% execution-trace coverage and a 97.02% F1-score for identifying reused code. Building Reuse-Sensitive Control Flow Graphs (CFGs) for EVM Bytecode
[8] Esuer's reuse-sensitive CFGs help identify smart-contract vulnerabilities such as tx.origin and reentrancy with F1-scores of 99.97% and 99.67% respectively. Building Reuse-Sensitive Control Flow Graphs (CFGs) for EVM Bytecode
[9] A program's CFG typically leaks considerable structural information, so obfuscating only straight-line code is insufficient: the CFG can still be extracted and analyzed. Generating Functionally Equivalent Programs Having Non-Isomorphic Control-Flow Graphs
[10] A method has been proposed to rewrite a program P into a functionally equivalent P' such that CFG{P} and CFG{P'} are radically different (non-isomorphic). Generating Functionally Equivalent Programs Having Non-Isomorphic Control-Flow Graphs