STIMSMITH

Data Footprint

Concept WIKI v1 · 6/13/2026

Data footprint is a workload characterization metric that quantifies the range of data addresses an application accesses during its execution. It is a key determinant of cache and memory hierarchy performance, because the size of the footprint relative to each cache level governs hit and miss behavior. It is used as a controllable knob in synthetic workload generators such as Genesys and as an optimization target in compiler tiling and accelerator design.

Definition

Data footprint is defined as the range of data addresses accessed by an application during its execution time. As stated in the Genesys workload-generation framework, "Data footprint metric determines the range of data addresses accessed by the synthetic application during its execution time" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55]. In Genesys the metric is treated as a single scalar workload knob (metric 8 in the framework's memory-access characteristics) that "controls the size of the memory regions, which are accessed by the synthetic application" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55; 96e74d1d-5a60-48ef-8449-695d1fecd929].
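As a rough illustration of the definition, the sketch below measures a footprint from a recorded address trace. It is not taken from Genesys: the trace format (pairs of address and access size), the function names, and the 64-byte cache-line size are all assumptions made for this example. The first function follows the "range of data addresses" wording literally; the second counts distinct cache lines, a related measure that is often more useful when the accesses are sparse.

```python
def footprint_range_bytes(accesses):
    """Span from the lowest to the highest byte touched, in bytes.

    `accesses` is an iterable of (address, size_in_bytes) pairs
    (a hypothetical trace format chosen for this sketch).
    """
    accesses = list(accesses)
    if not accesses:
        return 0
    lowest = min(addr for addr, _ in accesses)
    highest = max(addr + size for addr, size in accesses)
    return highest - lowest


def footprint_unique_lines(accesses, line_bytes=64):
    """Number of distinct cache lines touched (64-byte lines assumed)."""
    lines = set()
    for addr, size in accesses:
        first = addr // line_bytes
        last = (addr + size - 1) // line_bytes
        lines.update(range(first, last + 1))
    return len(lines)


# Hypothetical trace: sixteen 8-byte loads at a 256-byte stride.
trace = [(0x1000 + i * 256, 8) for i in range(16)]
range_bytes = footprint_range_bytes(trace)    # address span of the trace
unique_lines = footprint_unique_lines(trace)  # cache lines actually touched
```

For a strided trace like this one, the two measures differ noticeably: the address span is close to 4 KiB, while only sixteen cache lines (1 KiB) are actually touched.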

Importance for the Memory Hierarchy

The data footprint matters because its size relative to the available cache capacity determines performance at every level of the cache/memory hierarchy. Per Genesys, the metric is important "because it can determine performance of different levels of caches and memory based on how large the footprint is with respect to the available cache size and memory structure" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55]. When the footprint exceeds a cache level's capacity, the miss rate at that level typically rises sharply, because lines tend to be evicted before they are reused. When the footprint fits within that level, most accesses after warm-up can hit there, and the remaining misses are mostly compulsory or conflict misses.
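The capacity effect can be seen in a toy model. The sketch below simulates a fully associative LRU cache under a repeated sequential sweep over the footprint; the cache organization, the sweep pattern, and the sizes are assumptions chosen to make the effect easy to see, not a model of any particular processor.

```python
from collections import OrderedDict


def lru_miss_rate(footprint_lines, capacity_lines, sweeps=10):
    """Miss rate of a fully associative LRU cache under a cyclic sweep."""
    cache = OrderedDict()
    misses = 0
    accesses = 0
    for _ in range(sweeps):
        for line in range(footprint_lines):
            accesses += 1
            if line in cache:
                cache.move_to_end(line)  # mark as most recently used
            else:
                misses += 1
                cache[line] = True
                if len(cache) > capacity_lines:
                    cache.popitem(last=False)  # evict least recently used
    return misses / accesses


capacity = 512  # hypothetical: 32 KiB of 64-byte lines
rates = {n: lru_miss_rate(n, capacity) for n in (256, 512, 513, 1024)}
```

In this model, footprints of up to 512 lines miss only on the first sweep, while a footprint just one line larger misses on every access, because LRU evicts each line shortly before the sweep returns to it. Real caches with limited associativity and other replacement policies usually show a less abrupt transition.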

Data Footprint as a Workload Characteristic

Role in Synthetic Workload Generation

In Genesys, data footprint is one of twelve core workload-specific metrics exposed to the user as programmable knobs. It sits in the Memory-access Characteristics group alongside regular/irregular behavior, spatial-locality stride bins, temporal-locality bins, and L1/L2 data-cache miss rates [Genesys, chunk 96e74d1d-5a60-48ef-8449-695d1fecd929]. The value of this metric feeds a code generator that produces a two-level nested loop: "the inner loop controls the application's data footprint and the outer loop controls the number of dynamic instructions (overall runtime). Every static load or store instruction resets to the first element" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55]. This structure makes the data footprint directly controllable through the inner-loop trip count.
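The loop structure described above can be sketched as follows. This is not Genesys's generated code: the function name, the fixed stride, the base address, and the choice to reset the address at the start of each outer iteration are assumptions made for this example, based on one reading of the quoted description.

```python
def synthetic_trace(inner_trips, outer_trips, stride_bytes=64, base=0x10000):
    """Yield data addresses from a two-level loop nest.

    inner_trips -> data footprint (number of distinct addresses touched)
    outer_trips -> dynamic access count (overall runtime)
    """
    for _ in range(outer_trips):
        addr = base  # assumed: each outer iteration restarts at the first element
        for _ in range(inner_trips):
            yield addr
            addr += stride_bytes


trace = list(synthetic_trace(inner_trips=128, outer_trips=50))
dynamic_accesses = len(trace)                   # inner_trips * outer_trips
footprint_bytes = max(trace) - min(trace) + 64  # inner_trips * stride_bytes
```

Doubling `outer_trips` doubles the number of accesses without changing the footprint, while doubling `inner_trips` doubles both, which is why the two knobs can be tuned largely independently.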

Role in Compiler Tiling

The Latency Based Tiling approach uses triangular loops to characterize miss-ratio scaling, "the relationship between data access latency and working set size with sharp increases in latency indicating the data footprint exceeds capacity from a cache level". From these inflection points, the approximate L1, L2, and L3 capacities are derived, and the tiling strategy is "applied to a subset of the polyhedral model, where loop nestings are tiled based on both the derived memory hierarchy and the observed data footprint per iteration" [Latency Based Tiling, arxiv:2510.15912v1]. The detected sizes are expected to under-approximate the true capacities because caches are shared across processes in multi-process systems.
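The idea of reading cache capacities off a latency curve can be sketched with a simple threshold test. This is not the algorithm from the Latency Based Tiling paper: the jump-ratio heuristic, the function name, and the latency samples below are all made up for illustration.

```python
def detect_capacities(samples, jump_ratio=1.5):
    """Return working-set sizes that immediately precede a sharp latency rise.

    `samples` is a list of (working_set_bytes, latency_ns) pairs sorted by size.
    A rise counts as sharp when latency grows by at least `jump_ratio`
    between two consecutive samples (an assumed heuristic).
    """
    capacities = []
    for (size, lat), (_, next_lat) in zip(samples, samples[1:]):
        if next_lat >= jump_ratio * lat:
            capacities.append(size)
    return capacities


KiB = 1024
# Made-up latency curve, not measured data.
samples = [
    (16 * KiB, 1.0), (32 * KiB, 1.1), (64 * KiB, 4.0),
    (128 * KiB, 4.1), (256 * KiB, 4.2), (512 * KiB, 12.0),
    (1024 * KiB, 12.5), (2048 * KiB, 13.0),
]
estimated = detect_capacities(samples)  # sizes just before each jump
```

Because each estimate is the last sampled size before a jump, it is a lower bound on the capacity at the sampling granularity, so finer-grained sampling around each inflection point would tighten it.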

Role in Accelerator Design (FHE)

In fully homomorphic encryption (FHE) accelerator work, the term data footprint is used to denote the on-chip working set that must be retained across cryptographic operations. WHET identifies "conventional FHE constructions as major sources of excessive working sets and heavy off-chip memory traffic" and introduces fine-grained coefficient-to-slot transformation, plaintext compression, and intermediate modulus raising "to reduce the on-chip data footprint by minimizing temporary ciphertexts and plaintext loads" [WHET, arxiv:2606.11541v1]. Here, reducing the data footprint is a prerequisite for additional on-chip memory-efficiency refinements such as a special-purpose buffer and functional-unit extensions.

Related Entities

  • Genesys — a synthetic workload-generation framework that treats data footprint as one of its core user-controllable memory-access metrics (USES, incoming).

CITATIONS

[1] Data footprint metric determines the range of data addresses accessed by the synthetic application during its execution time, and it controls the size of the memory regions accessed by the synthetic application. Genesys: Automatically Generating Representative Workloads
[2] Data footprint determines performance of different levels of caches and memory based on how large the footprint is with respect to the available cache size and memory structure. Genesys: Automatically Generating Representative Workloads
[3] Data footprint is listed as metric 8 in Genesys's memory-access characteristics group, alongside regular/irregular behavior, spatial-locality stride bins, temporal-locality bins, and L1/L2 data-cache miss rates. Genesys: Automatically Generating Representative Workloads
[4] In the generated code, the inner loop of a two-level nested loop controls the application's data footprint and the outer loop controls the number of dynamic instructions; every static load or store instruction resets to the first element. Genesys: Automatically Generating Representative Workloads
[5] Miss ratio scaling captures the relationship between data access latency and working set size, with sharp increases in latency indicating the data footprint exceeds a cache level's capacity; loop nestings are tiled based on the derived memory hierarchy and the observed data footprint per iteration. Latency Based Tiling
[6] Conventional FHE constructions are major sources of excessive working sets; fine-grained coefficient-to-slot transformation, plaintext compression, and intermediate modulus raising reduce the on-chip data footprint by minimizing temporary ciphertexts and plaintext loads. WHET: Welding Homomorphic Encryption to Accelerator Architectures