Data Footprint
Definition
Data footprint is the range of data addresses that an application accesses during its execution. The Genesys workload-generation framework defines it in these terms: "Data footprint metric determines the range of data addresses accessed by the synthetic application during its execution time" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55]. Genesys treats the metric as a single scalar workload knob (metric 8 in the framework's memory-access characteristics) that "controls the size of the memory regions, which are accessed by the synthetic application" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55; 96e74d1d-5a60-48ef-8449-695d1fecd929].
Importance for the Memory Hierarchy
The data footprint matters because its size relative to the available cache capacity shapes performance at every level of the cache/memory hierarchy. Per Genesys, the metric is important "because it can determine performance of different levels of caches and memory based on how large the footprint is with respect to the available cache size and memory structure" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55]. When the footprint exceeds a cache level's capacity, miss rates at that level rise sharply. When the footprint fits within a level, most accesses hit there once the data has been loaded, and the remaining misses depend on locality and conflicts rather than on capacity.
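The comparison between footprint and capacity can be illustrated with a minimal sketch. The function name `smallest_fitting_level` and the capacities in `LEVELS` are hypothetical values chosen for illustration, not figures from Genesys, and the model deliberately ignores associativity, line size, and sharing between cores or processes.

```python
# Hypothetical cache capacities in bytes (illustrative only, not from any cited source).
LEVELS = [("L1", 32 * 1024), ("L2", 1024 * 1024), ("L3", 32 * 1024 * 1024)]


def smallest_fitting_level(footprint_bytes, cache_levels=LEVELS):
    """Return the name of the first level whose capacity can hold the footprint.

    Levels are assumed to be ordered from smallest to largest. If no level is
    large enough, the footprint spills to main memory ("DRAM").
    """
    for name, capacity in cache_levels:
        if footprint_bytes <= capacity:
            return name
    return "DRAM"


for size in (16 * 1024, 64 * 1024, 64 * 1024 * 1024):
    print(size, "->", smallest_fitting_level(size))
```

Under these assumed capacities, a 16 KiB footprint fits in L1, a 64 KiB footprint first fits in L2, and a 64 MiB footprint exceeds every cache level.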
Data Footprint as a Workload Characteristic
Role in Synthetic Workload Generation
In Genesys, data footprint is one of twelve core workload-specific metrics exposed to the user as programmable knobs. It sits in the Memory-access Characteristics group alongside regular/irregular behavior, spatial-locality stride bins, temporal-locality bins, and L1/L2 data-cache miss rates [Genesys, chunk 96e74d1d-5a60-48ef-8449-695d1fecd929]. The value of this metric feeds a code generator that produces a two-level nested loop: "the inner loop controls the application's data footprint and the outer loop controls the number of dynamic instructions (overall runtime). Every static load or store instruction resets to the first element" [Genesys, chunk 7f53489b-7dd4-4b1a-a498-8e612b643e55]. Because the inner loop bounds the addresses touched and the outer loop only repeats them, the data footprint can be set through the inner-loop trip count independently of total runtime.
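The two-level loop structure described in the quotation can be sketched as follows. This is a simplified illustration, not the code Genesys emits: the function `run_synthetic_kernel` and its parameters are hypothetical, a single unit-stride access stream stands in for the framework's many static loads and stores, and the reset to the first element is assumed to happen at the start of each outer iteration.

```python
def run_synthetic_kernel(footprint_elems, outer_iters):
    """Walk a buffer with a two-level loop and report what was touched.

    Returns (distinct_elements_touched, total_dynamic_accesses).
    """
    data = [0] * footprint_elems
    touched = set()
    dynamic_accesses = 0

    for _ in range(outer_iters):          # outer loop: scales total dynamic work
        idx = 0                           # access stream resets to the first element
        for _ in range(footprint_elems):  # inner loop: bounds the range of addresses
            data[idx] += 1
            touched.add(idx)
            dynamic_accesses += 1
            idx += 1

    return len(touched), dynamic_accesses


print(run_synthetic_kernel(1024, 10))
print(run_synthetic_kernel(1024, 20))
```

Doubling `outer_iters` doubles the number of dynamic accesses while the number of distinct elements touched stays at `footprint_elems`, which is the separation of footprint from runtime that the nested loop is meant to provide.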
Role in Compiler Tiling
The Latency Based Tiling approach uses triangular loops to characterize miss-ratio scaling, which it describes as "the relationship between data access latency and working set size with sharp increases in latency indicating the data footprint exceeds capacity from a cache level". From these inflection points the approach derives approximate L1, L2, and L3 capacities, and the tiling strategy is "applied to a subset of the polyhedral model, where loop nestings are tiled based on both the derived memory hierarchy and the observed data footprint per iteration" [Latency Based Tiling, arxiv:2510.15912v1]. The detected sizes are expected to under-approximate the true sizes because caches are shared across processes in multi-process systems.
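The idea of reading capacities off sharp latency increases can be sketched with a simple threshold test. This is an assumed stand-in, not the detection procedure from the Latency Based Tiling paper: the function `find_capacity_boundaries`, the `jump_ratio` threshold, and the latency values in `SAMPLES` are all hypothetical and chosen only to make the example concrete.

```python
KIB = 1024

# Made-up (working_set_bytes, average_latency) pairs, sorted by working set size.
SAMPLES = [
    (16 * KIB, 1.0), (32 * KIB, 1.1), (64 * KIB, 4.0), (128 * KIB, 4.1),
    (256 * KIB, 4.2), (512 * KIB, 4.3), (1024 * KIB, 4.5),
    (2048 * KIB, 14.0), (4096 * KIB, 14.5),
]


def find_capacity_boundaries(samples, jump_ratio=1.5):
    """Return working-set sizes measured just before each sharp latency jump.

    A jump is flagged when latency grows by more than `jump_ratio` between
    two consecutive samples. Each returned size is the largest sampled
    working set that still showed the lower latency.
    """
    boundaries = []
    for (prev_size, prev_lat), (_, lat) in zip(samples, samples[1:]):
        if lat > prev_lat * jump_ratio:
            boundaries.append(prev_size)
    return boundaries


print([size // KIB for size in find_capacity_boundaries(SAMPLES)])
```

With this made-up data the sketch flags boundaries at 32 KiB and 1024 KiB. Because it reports the last size before each jump, the estimate can only be as fine as the sampling grid, so real measurements would need denser sampling near each inflection point.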
Role in Accelerator Design (FHE)
In fully homomorphic encryption (FHE) accelerator work, data footprint denotes the on-chip working set that must be retained across cryptographic operations. WHET identifies "conventional FHE constructions as major sources of excessive working sets and heavy off-chip memory traffic" and introduces fine-grained coefficient-to-slot transformation, plaintext compression, and intermediate modulus raising "to reduce the on-chip data footprint by minimizing temporary ciphertexts and plaintext loads" [WHET, arxiv:2606.11541v1]. Here, reducing the data footprint is a prerequisite for additional on-chip memory-efficiency refinements such as a special-purpose buffer and functional-unit extensions.
Related Entities
- Genesys — a synthetic workload-generation framework that treats data footprint as one of its core user-controllable memory-access metrics (USES, incoming).