Co-emulation

Co-emulation is a hardware-verification approach, enabled by hybrid CPU/FPGA architectures and SoC-FPGAs, in which the design under test (DuT) is mapped onto an FPGA while the testbench continues to execute on an HDL simulator.^[1] It is used for verification acceleration. Because only the DuT moves into hardware, testbench performance can become the limiting factor.

Overview

In a co-emulation platform, verification is split across two execution domains:

Component	Execution domain
Design under test (DuT)	FPGA fabric
Testbench	HDL simulator running on a CPU

This arrangement differs from purely software-based RTL simulation because the DuT is implemented in FPGA hardware rather than simulated entirely by the HDL simulator. However, the testbench remains simulator-based, so its performance can still constrain overall verification throughput.^[1]
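The throughput constraint can be made concrete with a deliberately simplified model. This is an assumption, not a measured result: the testbench, the DuT and the communication between them are taken to run in lockstep, so the wall-clock cost per simulated cycle is the sum of the three terms. Accelerating only the DuT term then behaves like Amdahl's law, because the testbench and communication terms are left unchanged. The function names and sample numbers below are hypothetical.

```python
def co_emulation_speedup(t_tb: float, t_dut: float, t_comm: float,
                         dut_speedup: float) -> float:
    """Speedup of lockstep co-emulation over pure RTL simulation.

    Simplified model (assumption): in pure simulation each cycle costs
    t_tb + t_dut. In co-emulation the DuT term is divided by dut_speedup,
    but a simulator-to-FPGA communication cost t_comm is added.
    """
    baseline = t_tb + t_dut
    accelerated = t_tb + t_dut / dut_speedup + t_comm
    return baseline / accelerated


def speedup_ceiling(t_tb: float, t_dut: float, t_comm: float) -> float:
    # Limit as dut_speedup grows without bound: only t_tb + t_comm remain.
    return (t_tb + t_dut) / (t_tb + t_comm)


# Hypothetical per-cycle costs in arbitrary units: the DuT dominates pure
# simulation, but the FPGA leaves the testbench term unchanged.
print(co_emulation_speedup(t_tb=1.0, t_dut=9.0, t_comm=0.0, dut_speedup=1000.0))
print(speedup_ceiling(t_tb=1.0, t_dut=9.0, t_comm=0.0))  # 10.0
print(speedup_ceiling(t_tb=1.0, t_dut=9.0, t_comm=1.0))  # 5.0
```

With these made-up numbers, a thousandfold faster DuT yields only about 9.9x overall. One unit of communication cost per cycle halves the ceiling. This is the sense in which the simulator-hosted testbench can constrain throughput.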

Motivation

The motivation for co-emulation is closely tied to the growing performance gap between RTL execution and testbench execution. Modern HDL simulators support multicore parallel simulation of RTL designs, but comparatively little has been done to parallelize testbenches across multiple cores.^[1] As a result, verification engineers have turned to pragmatic techniques. These include distributed stimulus generation using inter-process communication, and multicore predictor architectures built on C++ thread pools accessed through the SystemVerilog Direct Programming Interface (DPI).^[1]
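The structure of a multicore predictor can be sketched as follows. In the technique described above, the pool is written in C++ and reached from SystemVerilog through DPI. The Python below only mirrors the organization: a predictor fans out reference-model computations to a worker pool and returns expected results in transaction order. The transaction type `Txn`, the reference model `ref_model` (assumed here to be a 32-bit wrapping adder) and the `ParallelPredictor` class are all hypothetical. Because of CPython's global interpreter lock, this version illustrates the structure rather than delivering a real multicore speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    # Hypothetical transaction: two operands sent to the DuT.
    a: int
    b: int


def ref_model(txn: Txn) -> int:
    # Hypothetical reference model: the DuT is assumed to be a
    # 32-bit wrapping adder.
    return (txn.a + txn.b) & 0xFFFF_FFFF


class ParallelPredictor:
    """Fans out reference-model calls to a pool of worker threads.

    Executor.map yields results in submission order, so expected values
    line up with the order in which transactions were observed.
    """

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def predict(self, txns: list[Txn]) -> list[int]:
        return list(self._pool.map(ref_model, txns))

    def close(self) -> None:
        self._pool.shutdown(wait=True)


predictor = ParallelPredictor(workers=4)
expected = predictor.predict([Txn(1, 2), Txn(0xFFFF_FFFF, 1), Txn(10, 20)])
predictor.close()
print(expected)  # [3, 0, 30]
```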

Co-emulation addresses part of this problem by accelerating the DuT through FPGA mapping, but it does not automatically accelerate the testbench, which remains in the HDL simulation environment.^[1]

Testbench bottleneck

A major limitation of co-emulation is that the testbench may remain the dominant performance bottleneck. Testbench performance has been called "the proverbial elephant in the room," because RTL simulation has received multicore acceleration support while testbench parallelization remains comparatively limited.^[1]

Several issues specifically affect SystemVerilog/UVM testbenches:

SystemVerilog lacks fundamental support for transaction-level modeling concepts such as temporal decoupling of data from simulation time and events.^[1]
Integral SystemVerilog variables implicitly carry value-change event semantics, enabling constructs such as wait(a > b), but also contributing to slower algorithmic execution compared with native languages such as C or C++.^[1]
Interfaces from SystemVerilog to emulation platforms commonly require C/C++ interaction through the DPI layer, and data exchange through DPI can impose runtime overhead.^[1]
Tool-level multicore parallelism is primarily available for RTL and gate-level simulation, not arbitrary behavioral testbench code.^[1]
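The cost of implicit value-change semantics can be illustrated with a toy model. The `EventVar` class below is hypothetical and does not describe how any real simulator is implemented. It shows only the idea that every write which changes an event-carrying variable must re-check pending wait conditions, work that a plain C or C++ variable never does. For simplicity, the condition depends on one variable (`a > 5`). A condition such as `wait(a > b)` would be sensitive to both operands.

```python
from typing import Callable


class EventVar:
    """Toy integral variable with value-change event semantics (illustrative only).

    Every write that changes the value re-evaluates each pending wait
    condition, roughly what a simulator must do for wait(a > 5).
    """

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._waiters: list[tuple[Callable[[], bool], Callable[[], None]]] = []
        self.evaluations = 0  # condition checks triggered by value changes

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, new: int) -> None:
        if new == self._value:
            return  # same value: no value-change event
        self._value = new
        pending = []
        for cond, resume in self._waiters:
            self.evaluations += 1
            if cond():
                resume()  # condition met: wake the waiting process
            else:
                pending.append((cond, resume))
        self._waiters = pending

    def wait_until(self, cond: Callable[[], bool], resume: Callable[[], None]) -> None:
        # Analogue of wait(cond): resume at once if already true, else block.
        if cond():
            resume()
        else:
            self._waiters.append((cond, resume))


a = EventVar(0)
woken: list[int] = []
# Analogue of a process executing: wait (a > 5);
a.wait_until(lambda: a.value > 5, lambda: woken.append(a.value))
for i in range(10):
    a.value = i  # each value change re-checks the pending condition
print(woken, a.evaluations)  # [6] 6
```

The loop performs six extra condition evaluations that an ordinary integer assignment would not. In a real testbench, this bookkeeping is paid on every relevant write.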

Relationship to transaction-based acceleration

A related approach for improving co-emulation performance is transaction-based acceleration. This approach proposes an "untimed hardware verification language domain" to accelerate the testbench in co-emulation scenarios.^[1]

In this model, the goal is to reduce the cost of fine-grained signal-level or event-level interaction between the simulator-hosted testbench and the FPGA-hosted DuT by raising communication to the transaction level.
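The benefit of raising communication to the transaction level can be estimated by counting simulator-to-FPGA boundary crossings. The sketch assumes a hypothetical bus that moves one 32-bit beat per clock. It compares two schemes:

* A signal-level scheme, which needs one crossing per beat.
* A transaction-level scheme, in which a hypothetical FPGA-side transactor receives a whole burst in a single message and generates the per-clock pin activity itself.

The `Burst` type and the `per_crossing` and `per_word` costs are made-up parameters, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Burst:
    # Hypothetical write burst: start address plus 32-bit data words.
    addr: int
    data: tuple[int, ...]


def signal_level_crossings(bursts: list[Burst]) -> int:
    # Testbench drives the pins every clock: one crossing per beat.
    return sum(len(b.data) for b in bursts)


def transaction_level_crossings(bursts: list[Burst]) -> int:
    # Testbench sends each burst as one message; the FPGA-side
    # transactor expands it into per-clock pin activity.
    return len(bursts)


def comm_cost(crossings: int, words: int,
              per_crossing: float, per_word: float) -> float:
    # Simple cost model: fixed overhead per crossing plus payload cost.
    return crossings * per_crossing + words * per_word


# 100 hypothetical bursts of 16 words (64 bytes) each.
bursts = [Burst(addr=0x1000 + 64 * i, data=tuple(range(16))) for i in range(100)]
words = sum(len(b.data) for b in bursts)
sig = comm_cost(signal_level_crossings(bursts), words, per_crossing=1.0, per_word=0.01)
txn = comm_cost(transaction_level_crossings(bursts), words, per_crossing=1.0, per_word=0.01)
print(signal_level_crossings(bursts), transaction_level_crossings(bursts))  # 1600 100
print(sig, txn)
```

The payload cost is identical in both schemes. The saving comes entirely from paying the fixed per-crossing overhead once per burst instead of once per beat.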

Parallelism considerations

Multicore RTL simulators can partition statically scoped RTL designs and simulate independent regions concurrently. They synchronize data exchanged across partition boundaries through formally identifiable pins and ports.^[1] Testbenches are different: they are behavioral, can use automatic data scoping, and can share data by reference across components.^[1] Enabling safe parallelism in such testbenches would therefore require user-level programming constructs for synchronized access to shared data, which the current SystemVerilog standard lacks.^[1]
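The sketch below shows what a synchronized shared-data construct provides, written in a language that has one. A hypothetical scoreboard is shared by reference between a predictor thread and a monitor thread. A `threading.Lock` makes each append-and-compare step atomic, so the queues and counters stay consistent however the two threads interleave. The class, method names and data values are illustrative and do not correspond to any UVM API.

```python
import threading
from collections import deque


class SharedScoreboard:
    """Hypothetical scoreboard shared by reference between components.

    The lock serializes every access, so the predictor and the monitor
    can run concurrently without corrupting the queues or the counters.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._expected: deque[int] = deque()
        self._actual: deque[int] = deque()
        self.matches = 0
        self.mismatches = 0

    def _drain(self) -> None:
        # Caller must hold the lock: pair values in arrival order.
        while self._expected and self._actual:
            if self._expected.popleft() == self._actual.popleft():
                self.matches += 1
            else:
                self.mismatches += 1

    def write_expected(self, value: int) -> None:
        with self._lock:
            self._expected.append(value)
            self._drain()

    def write_actual(self, value: int) -> None:
        with self._lock:
            self._actual.append(value)
            self._drain()


def run_demo(n: int = 1000) -> SharedScoreboard:
    sb = SharedScoreboard()
    values = [(i * 7) % 256 for i in range(n)]  # hypothetical DuT outputs

    def predictor() -> None:
        for v in values:
            sb.write_expected(v)

    def monitor() -> None:
        for v in values:
            sb.write_actual(v)

    threads = [threading.Thread(target=predictor), threading.Thread(target=monitor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sb


sb = run_demo()
print(sb.matches, sb.mismatches)  # 1000 0
```

Each thread appends in its own order, and the lock guarantees that the k-th expected value is always compared with the k-th observed value. The result is therefore deterministic regardless of scheduling.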

This distinction is important for co-emulation because moving the DuT into FPGA hardware does not remove the need to execute, synchronize, and communicate with the simulator-based testbench.

Summary

Co-emulation accelerates hardware verification by mapping the DuT to FPGA hardware while retaining the testbench in an HDL simulator.^[1] Its effectiveness depends not only on FPGA execution speed but also on the performance of the simulator-hosted testbench, DPI communication, and the degree to which transaction-level or multicore techniques can reduce testbench overhead.^[1]

References

^[1]: Evidence item 333ba6f0-72c6-490a-935a-d1cfd222a7f6.