Overview
Exception handling is a concept that spans multiple layers of computing, from hardware CPU behavior to high-level programming-language constructs. This overview covers exception handling in three distinct contexts: (i) CPU emulator testing, where exceptions are modeled as part of the abstract machine state and used to detect emulation defects; (ii) programming-language practices, where studies have examined exception flows and anti-patterns in Java and C# codebases; and (iii) RISC-V assembly program generation, where the riscv-dv random instruction generator emits a dedicated exception-handling section.
Exception Handling in CPU Emulators
In CPU emulation, exceptions are an explicit component of the formal abstract machine state. A CPU state is modeled as a tuple s = (pc, R, M, E), where E is the exception state taking values from {⊥, illegal instruction, division by zero, general protection fault, ...}. The special value ⊥ indicates that no exception occurred. The state-transition function δ maps s = (pc, R, M, E) into a new state s′ = (pc′, R′, M′, E′) by executing the current instruction at pc, and E′ reflects the exception status after the instruction.
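The state model above can be made concrete with a small sketch. The code below is illustrative only and assumes a toy two-instruction machine that is not taken from the paper: the names CPUState, Exc, and delta, and the "mov"/"div" instruction encoding, are hypothetical. None plays the role of ⊥.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Mapping, Optional, Tuple


class Exc(Enum):
    """Non-bottom values of the exception state E (a subset, for illustration)."""
    ILLEGAL_INSTRUCTION = "illegal instruction"
    DIVISION_BY_ZERO = "division by zero"
    GENERAL_PROTECTION_FAULT = "general protection fault"


@dataclass(frozen=True)
class CPUState:
    """Abstract state s = (pc, R, M, E); E is None when no exception occurred."""
    pc: int
    R: Mapping[str, int]              # register file
    M: Mapping[int, Tuple[str, ...]]  # toy memory: address -> (op, dst, src)
    E: Optional[Exc] = None


def delta(s: CPUState) -> CPUState:
    """Toy transition function: execute the instruction at s.pc and return s'."""
    instr = s.M.get(s.pc)
    if instr is None or instr[0] not in ("mov", "div"):
        # Unknown opcode: only E changes, the rest of the state is preserved.
        return replace(s, E=Exc.ILLEGAL_INSTRUCTION)
    op, dst, src = instr
    regs = dict(s.R)
    if op == "mov":
        regs[dst] = regs[src]
    else:  # "div"
        if regs[src] == 0:
            # Faulting instruction: pc, R and M stay as they were before it ran.
            return replace(s, E=Exc.DIVISION_BY_ZERO)
        regs[dst] = regs[dst] // regs[src]
    return CPUState(pc=s.pc + 1, R=regs, M=s.M, E=None)
```

In this sketch a faulting instruction changes only E, which matches the expectation that the rest of the state reflects the situation before the instruction executed.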
Defects Observed in Emulators
The Testing CPU Emulators study (ISSTA 2009) documents several exception-handling defects in widely used emulators:
- Pin: Not all exceptions are properly handled. Pin does not notify the emulated program about trap and illegal-instruction exceptions. Several legal instructions that raise a general-protection fault on the physical CPU are executed without generating any exception on Pin (e.g., add %ah, %fs:(%ebx)). When segment registers are pushed onto the stack, the stack pointer is not updated properly, reserving a single word where a double-word is required (e.g., push %fs).
- Valgrind / QEMU: Instructions are not executed atomically because they are translated into several intermediate instructions. Consequently, when an exception occurs mid-instruction, the state of memory and registers may differ from the state prior to instruction execution (e.g., idiv (%ecx) with a zero divisor). Some logical instructions do not faithfully update the status register.
- BOCHS: Certain floating-point instructions alter the state of the status register in ways that do not match the physical CPU.
- Atomicity (physical CPU): On the physical CPU, each instruction is executed atomically; when an exception occurs, the state of memory and registers corresponds to the state preceding the instruction's execution. Deviations from this invariant in emulators constitute defects under the faithful-emulation definition used in the paper.
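The atomicity invariant can be checked mechanically by comparing the state before a faulting instruction with the state observed when the exception is delivered. The following is a minimal sketch under stated assumptions: the dictionary-based state layout and both function names are hypothetical, and non_atomic_div is a deliberately broken toy that does not reproduce the actual behavior of any emulator named above.

```python
def atomicity_violations(before, after):
    """List (kind, key, old, new) for every register or memory cell that
    differs between the pre-instruction state and the post-exception state."""
    diffs = []
    for kind in ("regs", "mem"):
        b, a = before[kind], after[kind]
        for key in sorted(set(b) | set(a)):
            if b.get(key) != a.get(key):
                diffs.append((kind, key, b.get(key), a.get(key)))
    return diffs


def non_atomic_div(state, dst, src_addr):
    """Toy emulator step that splits one division into micro-operations and
    commits the first one before the divisor is checked."""
    after = {"regs": dict(state["regs"]), "mem": dict(state["mem"]), "exc": None}
    after["regs"]["edx"] = 0              # micro-op 1: register write is committed
    if after["mem"][src_addr] == 0:       # micro-op 2: fault is detected too late
        after["exc"] = "division by zero"
        return after
    after["regs"][dst] //= after["mem"][src_addr]
    return after


before = {"regs": {"eax": 10, "edx": 7}, "mem": {0x1000: 0}}
after = non_atomic_div(before, "eax", 0x1000)
violations = atomicity_violations(before, after)
```

Here violations contains one entry for edx, because the toy step overwrote that register before raising the exception. An atomic implementation would yield an empty list.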
Page-Fault Exceptions in the EmuFuzzer Methodology
The EmuFuzzer testing methodology explicitly intercepts and uses exceptions during test-case execution. The execution of test-case code on the physical CPU continues until one of: (i) the last instruction is reached; (ii) a page-fault exception caused by an access to a missing page occurs; (iii) a page-fault exception caused by a write access to a non-writable page occurs; or (iv) any other exception occurs. Page-fault exceptions are used for lazy memory synchronization: during the initialisation phase, all data pages of the physical environment are protected, and when an instruction tries to access memory, the page-fault exception is intercepted to retrieve the corresponding memory page from the emulated environment.
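The lazy-synchronization idea can be sketched as a fault-and-retry loop. This is a simplified model, not the EmuFuzzer implementation: PageFault, ProtectedMemory, and run_with_lazy_sync are hypothetical names, real page protection is replaced by a dictionary lookup, and only read accesses are modeled.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when the physical side touches a page that is still protected."""
    def __init__(self, page):
        super().__init__(f"page fault on page {page}")
        self.page = page


class ProtectedMemory:
    """Physical-side memory model: every page starts out protected (absent)."""
    def __init__(self):
        self.pages = {}

    def read(self, addr):
        page = addr // PAGE_SIZE
        if page not in self.pages:
            raise PageFault(page)
        return self.pages[page][addr % PAGE_SIZE]


def run_with_lazy_sync(accesses, physical, emulated_pages):
    """Perform each read; on a page fault, copy the page from the emulated
    environment and retry. Pages unknown to the emulator are genuine faults."""
    values, fetched = [], []
    for addr in accesses:
        while True:
            try:
                values.append(physical.read(addr))
                break
            except PageFault as fault:
                if fault.page not in emulated_pages:
                    raise  # missing page: a real exception, test case stops
                physical.pages[fault.page] = bytearray(emulated_pages[fault.page])
                fetched.append(fault.page)
    return values, fetched
```

Only pages that the test case actually touches are copied, which is the point of synchronizing lazily rather than copying the whole address space up front.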
On the physical CPU side, the test-case runs through a small user-space program that registers signal handlers for page faults and other runtime exceptions, and the test-case code is executed as a shellcode. The emulator is extended with embedded code that (a) intercepts the beginning and end of each basic block or instruction of the emulated program, (b) intercepts exceptions that may occur during execution, and (c) provides an interface to access the values of the CPU registers and the memory content of the emulator.
Exception Handling Practices in Programming Languages
Modern programming languages such as Java and C# provide exception-handling features that separate error-handling code from regular source code. These features are designed to enhance software reliability, comprehension, and maintenance, but their misuse can still cause reliability degradation or catastrophic software failures such as application crashes.
Exception Flow Analysis Findings
A 2017 study analyzed over 10,000 exception handling blocks and over 77,000 related exception flows from 16 open-source Java and C# (.NET) libraries and applications. Key findings:
- Each try block has up to 12 possible potentially recoverable yet propagated exceptions.
- 22% of distinct possible exceptions can be traced back to multiple methods (average of 1.39 and maximum of 34).
- There is a notable lack of documentation of the possible exceptions and their sources, but such critical information can be identified by exception-flow analysis on well-documented API calls (e.g., JRE and .NET documentation).
- Different exception-handling strategies are observed between Java and C#.
- The findings highlight the opportunity to leverage automated software analysis to assist exception-handling practices and signify the need for further in-depth studies.
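To illustrate what an exception-flow analysis computes, the sketch below propagates exceptions through a call graph and records which methods each escaping exception can be traced back to. It is a toy model and not the tool used in the study: the function name and the three input dictionaries are hypothetical, exception types are matched by name only (no subtype hierarchy), and a method's handlers are assumed to cover its whole body.

```python
def propagated_exceptions(method, raises, calls, catches, _seen=frozenset()):
    """Return {exception name: set of source methods} for exceptions that can
    escape `method`.

    raises:  method -> exceptions it throws directly
    calls:   method -> methods it invokes
    catches: method -> exceptions it handles
    """
    if method in _seen:  # break cycles in recursive call graphs
        return {}
    seen = _seen | {method}
    flows = {}
    for exc in raises.get(method, ()):
        flows.setdefault(exc, set()).add(method)
    for callee in calls.get(method, ()):
        inner = propagated_exceptions(callee, raises, calls, catches, seen)
        for exc, sources in inner.items():
            flows.setdefault(exc, set()).update(sources)
    caught = catches.get(method, set())
    return {exc: src for exc, src in flows.items() if exc not in caught}


raises = {"read": ["IOError"], "parse": ["ValueError", "IOError"]}
calls = {"load": ["read", "parse"], "main": ["load"]}
catches = {"main": {"ValueError"}}
escaping = propagated_exceptions("main", raises, calls, catches)
```

In this example IOError escapes main and is traced back to two methods, read and parse, which is the kind of multi-source flow the study reports.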
Common Anti-Patterns
A companion 2017 study collected a thorough list of exception anti-patterns from the same set of 16 open-source Java and C# libraries and applications, using an automated exception-flow analysis tool. Although exception-handling anti-patterns widely exist across all subjects, only a few anti-patterns are commonly identified:
- Unhandled Exceptions
- Catch Generic
- Unreachable Handler
- Over-catch
- Destructive Wrapping
The prevalence of these anti-patterns also illustrates differences between C# and Java, calling for further in-depth analyses of exception-handling practices across languages.
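Two of these anti-patterns can be shown in a few lines. The studies examined Java and C#; the sketch below is a Python analogue written for illustration, and ConfigError, parse_port, load_destructive, and load_preserving are hypothetical names. Catch Generic is taken here to mean handling a broad exception type, and Destructive Wrapping to mean rethrowing a new exception without keeping the original cause.

```python
class ConfigError(Exception):
    """Hypothetical application-level exception."""


def parse_port(text):
    return int(text)  # raises ValueError on non-numeric input


def load_destructive(text):
    try:
        return parse_port(text)
    except Exception:                               # Catch Generic
        # Destructive Wrapping: "from None" discards the original exception.
        raise ConfigError("bad config") from None


def load_preserving(text):
    try:
        return parse_port(text)
    except ValueError as err:                       # handle the specific type
        # The original exception is kept as __cause__ for later diagnosis.
        raise ConfigError(f"invalid port: {text!r}") from err
```

The second variant catches only the exception it can meaningfully translate and keeps the original available to callers and in tracebacks.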
Exception Handling in RISC-V Assembly Generation
In the CHIPS Alliance riscv-dv random instruction generator, the riscv_asm_program_gen class generates RISC-V assembly programs that include a dedicated exception-handling section. The generated program is described as containing multiple sections: initialization routine, instruction section, data section, stack section, page table, interrupt handling, and exception handling, each produced by different functions within riscv_asm_program_gen.
Generation Flow
The function gen_program() is the main routine: it generates the full RISC-V assembly program by calling other functions in the class. After the main program and any subprograms have been generated, host-interface instructions are added through gen_section() (with labels such as write_tohost and _exit). The flow then calls push_gpr_to_kernel_stack(), which pushes the general-purpose registers to the stack for trap handling. riscv_asm_program_gen then uses gen_section() to select the instruction string mtvec_handler, within which exception_handler and interrupt_handler are defined.
Relationship to gen_section
gen_section is directly involved in assembling named code sections used by the generated program. In the exception-handling path, it selects the mtvec_handler string that contains the exception_handler and interrupt_handler definitions.
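The role of a section-assembling helper can be sketched as follows. riscv-dv itself is written in SystemVerilog; this Python function is only a hypothetical analogue of the idea of emitting a label followed by a list of instruction strings, and it does not reproduce the generator's actual signature or output format. The instructions in the example are placeholders chosen for illustration, not the handler code that riscv-dv emits.

```python
def gen_section(label, instrs, indent="    "):
    """Return assembly text lines for one named section: the label line
    followed by the indented instruction strings and a blank separator."""
    lines = []
    if label:
        lines.append(f"{label}:")
    lines.extend(f"{indent}{instr}" for instr in instrs)
    lines.append("")
    return lines


# Placeholder trap-vector body that dispatches on the sign bit of mcause.
program = []
program += gen_section("mtvec_handler", [
    "csrr t0, mcause",
    "bltz t0, interrupt_handler",   # top bit set: interrupt
    "j exception_handler",          # otherwise: synchronous exception
])
program += gen_section("exception_handler", ["mret"])
program += gen_section("interrupt_handler", ["mret"])
assembly_text = "\n".join(program)
```

Building each section from a label plus a list of strings keeps the top-level flow a plain sequence of section calls, which mirrors how the text describes gen_program() composing the program.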