STIMSMITH

Amdahl's Law

Concept WIKI v1 · 5/24/2026

Amdahl's Law describes a fundamental limit on performance improvement in parallel or optimized systems: “the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used.”[1] In practice, accelerating only the parallelizable or optimized portion of a workload cannot eliminate the time spent in sequential code, synchronization, scheduling, or other non-optimized parts of the system.

Core idea

Amdahl's Law is commonly used to reason about expected speedup when a workload is divided into:

  • a parallel or optimized portion, which can benefit from additional processors or improvements; and
  • a sequential or non-optimized portion, which remains a bottleneck regardless of how much the other portion improves.[1]
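This split is usually written as a formula. The form below is the standard textbook statement rather than something quoted from the source; the symbols p (the fraction of execution time that benefits from parallelism) and N (the number of processors) are notation introduced here for illustration.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

The second expression is the ceiling set by the sequential portion: with p = 0.9, for example, no number of processors yields more than a 10x speedup.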

The source illustrates this with speedup curves for workloads whose parallel portions are 50%, 75%, 90%, and 95%, plotted against the number of processors.[1][2] The curves show the practical implication of the law: a higher parallel fraction allows a larger speedup, but each curve flattens as processors are added because the remaining non-parallel fraction caps the total improvement.
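The shape of such curves can be reproduced with a few lines of Python. This is a minimal sketch that assumes the usual closed form of the law, 1 / ((1 - p) + p / N); the function names and the processor counts sampled are illustrative choices, not values taken from the source's figure.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Same parallel fractions as the curves described in the text.
    for p in (0.50, 0.75, 0.90, 0.95):
        row = ", ".join(
            f"N={n}: {amdahl_speedup(p, n):.2f}x" for n in (2, 8, 64, 1024)
        )
        print(f"p={p:.2f} -> {row} (limit {speedup_limit(p):.0f}x)")
```

Under this model the four fractions are bounded at 2x, 4x, 10x, and 20x respectively, which is why adding processors beyond a certain point changes very little.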

Application to multicore simulation

Amdahl's Law is especially relevant in multicore testbench and simulation architectures. A parallel simulator may use multiple task executors, each assigned its own CPU or POSIX thread, while synchronization barriers keep those executors aligned with the simulator scheduler.[1] However, the scheduler itself can remain a sequential component, limiting the achievable performance improvement.[1]

The cited multicore testbench discussion notes that, at a given simulation time, a testbench simulator may only have a small number of active events and processes. As a result, “not much can be achieved by parallelizing the scheduler.”[1] This is a direct Amdahl-style limitation: even if some execution tasks are parallelized, the sequential scheduler and synchronization overhead constrain total speedup.

Parallelism and overheads

In cooperative threading, context switching occurs when a task yields, such as while waiting for an event. Before yielding, the task must save its state, including the call stack and CPU registers, so it can later resume execution.[1] The source identifies context switching as simulator runtime overhead that does not perform useful testbench functionality, and recommends reducing frequent context switches by reducing simulation events.[1]
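The yield-and-resume pattern can be imitated with Python generators. This is only an analogy, not how the simulator in the source is implemented: Python saves the generator's frame automatically, standing in for the call stack and registers a real task must save, and the names `task` and `run_round_robin` are hypothetical. The sketch counts how many times control passes back to the scheduler, which is the overhead the paragraph above describes.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one unit of work, then yields to the scheduler."""
    for i in range(steps):
        log.append((name, i))  # stand-in for useful testbench work
        yield  # cooperative yield, e.g. waiting for an event


def run_round_robin(tasks):
    """Run tasks until all finish; return the number of yields (context switches)."""
    ready = deque(tasks)
    switches = 0
    while ready:
        current = ready.popleft()
        try:
            next(current)  # resume the task from its saved state
        except StopIteration:
            continue  # task finished, do not reschedule it
        switches += 1  # task yielded: one context switch back to the scheduler
        ready.append(current)
    return switches


if __name__ == "__main__":
    log = []
    count = run_round_robin([task("A", 2, log), task("B", 3, log)])
    print(f"context switches: {count}")
    print(log)
```

In this toy model every unit of work costs one switch, so halving the number of yields (fewer simulation events per task) halves the switching overhead without changing the useful work done.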

In multicore testbenches, synchronization barriers add another overhead beyond the sequential scheduler.[1] These overheads reduce the portion of execution that benefits from parallelism, thereby reducing the speedup predicted by Amdahl's Law.
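A rough way to see the effect is to add the overhead terms to the runtime model. The sketch below is a simplified assumption, not a model given in the source: task work divides evenly across executors, the scheduler stays sequential, and barrier synchronization is treated as a fixed cost paid only when more than one executor is used. All parameter names and example numbers are hypothetical.

```python
def modeled_speedup(task_time, scheduler_time, sync_time, executors):
    """Speedup of a hypothetical multicore run over a single-executor run.

    task_time:      work that divides evenly across executors
    scheduler_time: sequential scheduler work, never parallelized
    sync_time:      assumed fixed barrier cost, paid only when executors > 1
    """
    if executors < 1:
        raise ValueError("executors must be >= 1")
    baseline = scheduler_time + task_time
    overhead = sync_time if executors > 1 else 0.0
    parallel_runtime = scheduler_time + task_time / executors + overhead
    return baseline / parallel_runtime


if __name__ == "__main__":
    # Illustrative numbers only: 90 units of task work, 10 units of scheduler work.
    for n in (1, 2, 4, 8, 16):
        ideal = modeled_speedup(90.0, 10.0, 0.0, n)
        with_sync = modeled_speedup(90.0, 10.0, 5.0, n)
        print(f"executors={n:2d}  no barriers: {ideal:.2f}x  with barriers: {with_sync:.2f}x")
```

With these made-up numbers the barrier cost behaves like extra sequential time, so the gap between the two columns widens as the parallel part shrinks.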

When multicore execution helps

Despite these limits, multicore testbenches can provide meaningful performance gains when the parallelized tasks are sufficiently compute-intensive.[1] The source identifies sequence randomization as often one of the most compute-intensive processes in a testbench, because it may involve solving complex constraints.[1]

A cited example is VIP-level parallelism in eUVM, where multiple UVM agents or Verification IPs are mapped to separate CPU threads so that sequence randomization and related work can be distributed across cores.[1] This kind of design increases the fraction of useful work that can run in parallel, improving the conditions under which Amdahl's Law permits significant speedup.

Limitations in testbench parallelization

The source also describes practical reasons why multicore testbench speedup may be less than ideal:

  1. When the scheduler becomes active, all task executors may deactivate.[2]
  2. High speedup requires balanced task loads across executors.[2]
  3. Load imbalance can occur when different types of VIPs are mapped to different executors.[2]
  4. Synchronization barriers and scheduler activity reduce the effective parallel portion of the workload.[1]

These constraints are consistent with Amdahl's Law: performance is not determined only by the number of processors, but by how much of the workload can actually execute in parallel and how much time is lost to sequential coordination.
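The load-balance points in the list above can be made concrete with a small model. It assumes, as a simplification not stated in the source, that the parallel phase ends only when the most heavily loaded executor finishes; the function name and the example loads are hypothetical.

```python
def imbalanced_speedup(executor_loads, serial_time=0.0):
    """Speedup over a single thread when each executor has its own load.

    The parallel phase is assumed to last as long as the slowest executor,
    so the largest load, not the average, sets the parallel runtime.
    """
    if not executor_loads:
        raise ValueError("at least one executor load is required")
    single_thread_runtime = serial_time + sum(executor_loads)
    parallel_runtime = serial_time + max(executor_loads)
    return single_thread_runtime / parallel_runtime


if __name__ == "__main__":
    balanced = [25.0, 25.0, 25.0, 25.0]
    skewed = [70.0, 10.0, 10.0, 10.0]  # e.g. one heavy VIP and three light ones
    print(f"balanced: {imbalanced_speedup(balanced):.2f}x")
    print(f"skewed:   {imbalanced_speedup(skewed):.2f}x")
```

Both cases use four executors and the same total work, yet in this model the skewed mapping achieves well under half the speedup of the balanced one.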

Related architectural technique: asynchronous worker threads

To increase usable concurrency, the cited eUVM architecture introduces worker threads: free-running asynchronous threads owned by the simulator but decoupled from the scheduler.[2] Because these worker threads continue running even when the scheduler activates, they can help avoid some limitations of executor-based parallelism.[2]

However, worker threads cannot wait for simulator events, so data exchange with synchronous simulator tasks is handled through asynchronous TLM FIFO mechanisms.[2] This illustrates an engineering response to Amdahl's Law: architectures can improve speedup not only by adding cores, but also by reducing scheduler dependence and increasing the fraction of work that can proceed concurrently.
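The general pattern of a free-running worker that exchanges data only through FIFOs can be sketched with Python's standard `threading` and `queue` modules. This is a structural analogy under stated assumptions, not the eUVM API: the two `queue.Queue` objects stand in for the asynchronous TLM FIFOs, squaring a number stands in for compute-heavy work, and the names `worker` and `run_scheduler_side` are hypothetical.

```python
import queue
import threading


def worker(requests: queue.Queue, results: queue.Queue) -> None:
    """Free-running worker: never waits on scheduler events, only on its FIFO."""
    while True:
        item = requests.get()
        if item is None:  # sentinel value: shut the worker down
            break
        results.put(item * item)  # stand-in for compute-heavy work


def run_scheduler_side(values):
    """Hand work to the worker and collect results without blocking on it."""
    requests = queue.Queue()
    results = queue.Queue()
    thread = threading.Thread(target=worker, args=(requests, results))
    thread.start()

    for value in values:
        requests.put(value)  # unbounded queue, so put() returns immediately
    requests.put(None)

    collected = []
    while len(collected) < len(values):
        try:
            collected.append(results.get_nowait())  # non-blocking poll
        except queue.Empty:
            pass  # a real scheduler would process simulation events here

    thread.join()
    return collected


if __name__ == "__main__":
    print(run_scheduler_side([1, 2, 3]))
```

The point of the design is that neither side ever waits on the other's control flow: the worker blocks only on its input FIFO, and the scheduler side only polls. In CPython the global interpreter lock prevents pure-Python threads from running compute-bound code in parallel, so this sketch shows the decoupling, not an actual speedup.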

Summary

Amdahl's Law states that total system speedup is limited by the fraction of execution time affected by an optimization.[1] In multicore simulation and verification environments, this means that parallel task execution can help, especially for compute-heavy workloads such as constrained sequence randomization, but sequential scheduling, synchronization barriers, context switching, and load imbalance limit the achievable improvement.[1][2]