Source 2d0c3798... — STIMSMITH

SOURCE ARCHIVE

SHA256: 2d0c37989889c2c31fe19c421dae8d89c9016d1c584e70d3f4f2aff57529b0ba

URL: https://yuntaolu.github.io/files/Lu-2025-TODAES-DeepVerifier.pdf

TYPE: application/pdf

SIZE: 4364.3 KB

FETCHED: 6/12/2026, 10:09:59 PM

EXTRACTOR: liteparse

CHARS: 99,334

EXTRACTED CONTENT

99,334 chars

® DIGITAL mopeny PDF Download2026 Sore Total Downloads: 06 Latest updates: hitps://dl.acm.org/doi/10.1145/3721133 Accepted: 17 February 2025 RESEARCH-ARTICLE Revised: 21 January 2025 Received: 07 November 2024 DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Con fon Verification

YUNTAO LU, Chinese University of Hong Kong, Hong Kong, Hong Kong

CHEN BAL, Hong Kong University of Science and Technology, Hong Kong, Hong Kong

YUXUAN ZHAO, Chinese University of Hong Kong, Hong Kong, Hong Kong

ZIYUE ZHENG, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China

YANGDI LYU, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China

MINGYU LIU, Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China

View all

Open Access Support provided by: Chinese University of Hong Kong

The Hong Kong University of Science and Technology (Guangzhou)

Huawei Technologies Co., Ltd.

Hong Kong University of Science and Technology

                                                 ACM Transactions on Design Automation of Electronic Systems
                                                                             hitps://doi.org/10.1145/3721133
                                                                                 EISSN:            1557-7309

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification YUNTAO LU, The Chinese University of Hong Kong, Hong Kong, Hong Kong CHEN BAI, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong YUXUAN ZHAO, The Chinese University of Hong Kong, Hong Kong, Hong Kong ZIYUE ZHENG, The Hong Kong University of Science and Technology - Guangzhou Campus, Guangzhou, China YANGDI LYU, The Hong Kong University of Science and Technology - Guangzhou Campus, Guangzhou, China MINGYU LIU, Huawei Technologies Co Ltd, Wuhan, China BEI YU, The Chinese University of Hong Kong, Hong Kong, Hong Kong

Veriication is critical in ensuring the reliable operation of modern, complex computing systems. However, as processor designs become increasingly sophisticated, conventional static veriication techniques struggle to generate high-quality test sequences that achieve comprehensive coverage. Dynamic simulation-based approaches, which leverage coverage-driven objectives, can increase conidence in correct processor functionality but often sufer from low veriication eiciency due to the generation of redundant test sequences and signiicant computational overhead. To address these challenges, this paper presents DeepVeriier, a novel coverage-guided test generation framework that leverages data-driven learning of existing test sequences and their associated coverage feedback. DeepVeriier uses a language model to learn the semantic representations of test sequences, ensure adherence to syntax constraints, and estimate the relationship between test sequences and coverage scores. By updating test sequences with higher coverage, DeepVeriier can signiicantly improve the eiciency and efectiveness of the veriication process. Experimental results of verifying an out-of-order RISC-V microprocessor demonstrate that the framework accurately estimates the coverage scores of test sequences and updates high-quality sequences that contribute to higher coverage. This coverage-guided test generation technique holds promise for enhancing the reliability of modern processor designs.

1 Introduction RISC-V represents an open-source instruction set architecture (ISA) [1], [2] that has lourished across various sectors of academia and industry due to its modular and extensible characteristics. In the development of processors utilizing RISC-V ISAs, functional veriication plays a pivotal role in ensuring the correctness of functionality; this phase can account for as much as 70% of the design and development lifecycle [3].

Corresponding author: Mingyu Liu. Authors’ Contact Information: Yuntao Lu, The Chinese University of Hong Kong, Hong Kong, Hong Kong; e-mail: ytlu23@cse.cuhk.edu.hk; Chen Bai, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong; e-mail: cbai@cse.cuhk.edu.hk; Yuxuan Zhao, The Chinese University of Hong Kong, Hong Kong, Hong Kong; e-mail: yxzhao21@cse.cuhk.edu.hk; Ziyue Zheng, The Hong Kong University of Science and Technology - Guangzhou Campus, Guangzhou, Guangdong, China; e-mail: zzheng989@connect.hkust-gz.edu.cn; Yangdi Lyu, The Hong Kong University of Science and Technology - Guangzhou Campus, Guangzhou, Guangdong, China; e-mail: yangdilyu@hkust-gz.edu.cn; Mingyu Liu, Huawei Technologies Co Ltd, Wuhan, China; e-mail: liumingyu@huawei.com; Bei Yu, The Chinese University of Hong Kong, Hong Kong, Hong Kong; e-mail: byu@cse.cuhk.edu.hk.

This work is licensed under a Creative Commons Attribution-NoDerivatives International 4.0 License.
© 2025 Copyright held by the owner/author(s).
ACM 1557-7309/2025/3-ART
https://doi.org/10.1145/3721133

    ACM Trans. Des. Autom. Electron. Syst.

2 • Y. Lu et al.

Coverage metrics are quantitative indicators that show the extent to which a design has been veriied. The idea behind coverage-guided veriication is to implement more advanced methods for identifying untested behaviors in a design. This approach ultimately improves the efectiveness of the veriication process, as seen in various constrained functional veriication techniques [4], [5], [6]. As processor designs have become increasingly complex, the focus on functional veriication has intensiied. There is a growing interest in automated veriication techniques, aiming to improve coverage metrics and reduce the time required for veriication. To address these challenges, advanced approaches have been proposed, including static formal analysis, dynamic simulation, and semi-formal methods. Static formal techniques use mathematical formulations and equivalence checking [7] to theoretically prove the correctness or generate counterexamples. However, a state explosion issue arises when verifying complex designs with deep hard-to-reach states [8], [9], [10]. This challenge limits the practicality and scalability of formal methods for large-scale designs. On the other hand, dynamic simulation techniques function by modeling the system under examination within a hardware-level environment. This methodology facilitates the veriication of design functionality through the systematic observation of the system’s behavior across a range of conditions and scenarios. This approach is widely regarded as the leading solution for the evaluation of large-scale circuits. A simulation veriication work [11] achieves coverage objectives through two pioneering techniques: constrained random generation and coverage-guided test sequence generation. The application of constraints efectively narrows the search space of test sequences, while coverage scores strategically direct the exploration of these generated test sequences to maximize the coverage of hardware states. Instruction stream generators (ISGs), such as FORCE-RISCV [12], RISC-V design veriication framework [13], and RISC-V Non-ISA-speciic coverage tool [14], have been developed to systematically produce test sequences randomly and repetitively. This approach signiicantly enhances the coverage of various design frameworks. Semi-formal methods, i.e., a concolic testing approach [15], generate efective test sequences by analyzing the control low graph (CFG) of register-transfer level (RTL) designs. These sequences are simulated and executed to check coverage scores, which then guide the next round of sequence generation. Successfully applying this method requires a thorough understanding of both the design and the grammatical structure of RTL code to accurately analyze the control and data lows that inform the generation of test sequences. Additionally, inspired by successful software testing practices, fuzzing methods have been applied to hardware veriication tasks. Coverage-guided fuzzing techniques have been proposed in [16], [17], [18], [19], [20] to generate test sequences that achieve higher coverage scores, revealing bugs and vulnerabilities in hardware designs. Nevertheless, these methods generate test sequences within a few hours or days. The efectiveness of coverage-guided fuzzing is signiicantly inluenced by the combination of RTL structure analysis and coverage scores. This quality relies on heuristic mutation methods and search algorithms developed by experts, who utilize feedback from coverage scores of test sequences to enhance their strategies. Recently, the increasing trend of machine learning (ML) techniques has opened up new possibilities for addressing the challenges of generating high-quality test sequences with minimal professional knowledge. ML approaches represent a promising direction to reduce reliance on domain expertise while facilitating greater automation and scalability in test sequence generation. In previous work, a fully connected Bayesian network [21] described the relationships between input test stimuli and coverage metrics. To enhance the representational capabilities of ML models, researchers developed a three-layer artiicial neural network [22] that accelerated the test generation process and improved coverage. Additionally, a language model [23] was employed to understand the semantics of RTL (Register Transfer Level) code, allowing it to predict the hit rate of functional coverage, which in turn improved veriication eiciency. However, the adoption of ML methods also introduces new challenges. These include the need for substantial amounts of data to learn the relevant representations and implicit relationships, as well as the inherent complexity associated with interpretation.

ACM Trans. Des. Autom. Electron. Syst.

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 3

Random Instruction Stream Generator

Defined Architecture Instruction Instruction Set Template | | User Data Generati Sequences Simulator APIs on RISC-V RTL Simulator Architecture | Instruction Engine RISC-V ELFs Configuration | ISA j Compiler Coverage ] Analyzer Coverage Feedback (a)

Random Generation | Transformer Language Model Coverage Score Predictor Continues Predicted Coverage Instruction | Representation Transformer || | | Sequence 1 d<latexitsha1_base64="J7Z4l6l4xpH4EkC/9P/e4FTqF5g=">AAAB8HicbVC7SgNBFJ31GeMrKtjYLAbBKuxaRMsQG8sEzEOSJczO3k2GzGOZmRXCkq+wsVDEVvAv/AI7G7/FyaPQxAMXDufcy733hAmj2njel7Oyura+sZnbym/v7O7tFw4Om1qmikCDSCZVO8QaGBXQMNQwaCcKMA8ZtMLh9cRv3YPSVIpbM0og4LgvaEwJNla6i3oZlxGwca9Q9EreFO4y8eekWDmuf9P36ketV/jsRpKkHIQhDGvd8b3EBBlWhhIG43w31ZBgMsR96FgqMAcdZNODx+6ZVSI3lsqWMO5U/T2RYa71iIe2k2Mz0IveRPzP66QmvgoyKpLUgCCzRXHKXCPdyfduRBUQw0aWYKKovdUlA6wwMTajvA3BX3x5mTQvSn65VK7bNKpohhw6QafoHPnoElXQDaqhBiKIowf0hJ4d5Tw6L87rrHXFmc8coT9w3n4ACl6USA== Blocks Li Loss Update model il Linear i1 {| Projection |i |! Target Coverage Optimized Validation Optimization Sequence

Diversity and + Semantic + Higher Optimized Scalability Representations Coverage ~— Sequences

(b)

Fig. 1. (a) End-to-end verification workflow for a random test sequence generator used in RISC-V designs. (b) We propose a new strategy for updating test sequences using a transformer language model, which ofers feedback on the coverage of a coverage predictor during the early verification stages in the flow.

In this paper, we introduce a novel framework for generating test sequences called DeepVeriier. This framework is guided by coverage metrics and learns to leverage the relationship between generated test sequences and their associated coverage scores using a language model. Additionally, it employs an objective coverage model to optimize these test sequences for achieving higher coverage. To address the challenges posed by limited data and interpretation issues, we generate data from a large number of test sequences along with their corresponding coverage scores obtained from a simulator. We highlight the proposed DeepVeriier, focusing on three key aspects. First, we customized a language model to represent RISC-V ISA instruction sequences. This model tokenizes and transforms assembly RISC-V instructions into continuous, low-dimensional embedded vectors, which can then be used as input for a subsequent neural network that estimates coverage scores. Second, we designed a coverage score estimator capable of predicting diverse coverage values for these continuous instruction sequence embeddings. This lightweight predictor allows for fast and accurate predictions across various embeddings of RISC-V instruction snippets. Third, we propose an end-to-end diferentiable learning strategy to optimize the generated test sequences. Our sequence update strategy is informed by the predicted coverage values and the deviation between the estimated coverage and the

ACM Trans. Des. Autom. Electron. Syst.

(

…

4 • Y. Lu et al.

closure coverage scores. By computing the gradients of the test sequence embeddings for the predicted coverage values, we can update the embeddings using these gradients and adaptive learning rates. This process aims to produce test sequences with higher coverage. A comparison between conventional coverage-guided test generation methods and DeepVeriier is illustrated in Fig. 1. In traditional methods, test sequences are generated based on feedback scores from the coverage analyzer, following co-simulation with both the instruction set simulator and the RTL simulator, as depicted in Fig. 1(a). In contrast, DeepVeriier eiciently generates high-quality test sequences using a sequence representation language model, a score predictor, and a sequence optimizer, as shown in Fig. 1(b). The efectiveness of DeepVeriier lies in its ability to accurately represent RISC-V instruction sequences through the language model and utilize a fast and reliable coverage score predictor, which minimizes the waiting time during co-simulation. While we demonstrate our approach on 64-bit RISC-V processors, the methodology is inherently extensible to other ISA families. The transformer-based framework requires only adaptation of the tokenization layer and ISA architectural constraints from speciications for diferent targets, making it applicable to various processors. The current implementation with a limited set of RISC-V instruction tokens represents a starting point, and the transformer architecture can scale to much larger vocabularies with large-scale tokens to accommodate more complex ISA extensions, additional architectural states, or even mixed-ISA systems. Additionally, it employs a gradient-based sequence update strategy to enhance test sequences in the optimal direction for achieving higher coverage scores. Our contributions are summarized below: • A customized novel language model has been developed for learning the semantic and syntactic representations of RISC-V instruction sequences. • A fast and accurate coverage predictor calculates coverage scores for test instruction sequences. • A gradient-based sequence update strategy uses the target coverage scores to enhance test sequences, aiming for higher coverage. • Experimental results demonstrate that our framework efectively integrates representations of instruction sequences, estimates coverage eiciently, and optimizes test sequences, achieving a coverage improvement of approximately 3% to 6%, thereby facilitating coverage closure. The rest of this paper is organized as follows: Section 2 introduces the fundamentals of veriication tasks. Section 3 presents the problem formulation for the test sequence update issue. Section 4 ofers an overview of the DeepVeriier framework, including the algorithms employed. Section 5 outlines the experimental setup and evaluates the results obtained from DeepVeriier. Finally, Section 6 summarizes the indings and discusses future research directions.

2 Preliminaries 2.1 RISC-V ISA The RISC-V Instruction Set Architecture (ISA) [1] comprises a mandatory base integer instruction set, referred to as RV32I, RV64I, or RV128I, which is characterized by speciic register widths. Additionally, it includes various optional extensions identiied by single letters; for example, the letters M, A, C, F, and D correspond to integer multiplication and division, atomic instructions, compressed instructions, and single and double precision loating point operations, respectively. For example, RV32IMC speciies a 32-bit core equipped with both M and C extensions. The designation G signiies the IMAFD instruction set; therefore, RV32GC denotes RV32IMAFDC. Each core contains 32 general-purpose registers, labeled from x0 to x31, while the loating point extensions contribute an additional 32 loat-point registers. Moreover, there exist speciic instructions to access these registers by their designated names, where source registers are referred to as rs1 and rs2, and the destination register is designated as RD. The format and semantics of both the base ISA and its extensions are delineated in the unprivileged ISA speciication.

ACM Trans. Des. Autom. Electron. Syst.

    DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 5

Furthermore, the privileged speciication outlined in the RISC-V documentation [2] encompasses several critical functionalities essential for environmental interaction and operating system execution. This speciication delineates various execution modes, particularly the mandatory machine mode, along with the extensions for Supervisor and User modes, accompanied by detailed descriptions of the corresponding control and status registers (CSRs). CSRs are specialized registers that are fundamental to the privileged architecture framework. For instance, the machine status register monitors and regulates the current operating state of the hardware thread (hart), the machine ISA register identiies the ISA extensions supported by the processor, and the machine trap-vector base-address register designates the address of the trap/interrupt handler. Additionally, a read-only machine architecture ID register encodes the unique architecture identiication of the processor, among other functionalities. The complexity and modularity of the RISC-V ISA present challenges and opportunities for veriication. The diverse instruction categories activate diferent components of the design, which exercise speciic functional units and control paths, making comprehensive coverage crucial yet challenging to achieve through manual or random testing alone. The characteristics of RISC-V motivate our transformer-based approach in several ways. For the instruction representation, our model learns semantic relationships to capture both the syntax of individual instructions. For the coverage correlation, the self-attention mechanism enables modeling relationships between instruction patterns and coverage metrics, which helps identify which instruction combinations efectively exercise diferent parts of the design. For sequence updates, understanding the semantic structure of RISC-V instructions allows our method to generate valid and efective test sequences.

2.2 Simulation-based Verification In simulation-based veriication, the RTL model of the microprocessor is converted into a high-level object-oriented software class within both open-source and commercial environments [24], [25]. This translated high-level model is subsequently instantiated in the testbench alongside a memory model that is populated with the relevant veriication tests. Once the simulation infrastructure is established, it is essential to develop criteria to ascertain whether the simulated tests yield a pass or fail outcome. The most straightforward approach is to conduct correctness checking through directed tests. Upon completion of a test execution, the inal result is compared against a pre-calculated reference answer. The determination of success or failure is based on this comparison. While directed tests are extensively utilized within the industry, their primary aim is often to conirm the fundamental functionality of the design or to investigate speciic, deliberate corner cases. For over three decades, to rigorously assess the design, both industry professionals and academic researchers have relied on veriication binaries that are generated randomly. A Random Instruction Generator (RIG), also referred to as a test program generator or instruction stream generator, is a software utility designed to produce randomized assembly instruction streams based on predeined conigurations. The tests generated by the RIG encompass a wide range of implemented functionalities. This tool is capable of creating intricate test cases that may be challenging for engineers to devise independently [26], [27], [12]. By applying complete random instruction streams to stress the design under test, one can assess the system comprehensively. However, complete randomness lacks control over the generated tests, resulting in a low probability of uncovering deeply concealed bugs arising from complex interactions among various components. To address this limitation, some RIGs ofer the capability to exert control over the generated tests via test program templates. These templates provide an abstract representation of the test, delineating a set of constraints that the generator must adhere to. Consequently, this enables the management of both the direction and depth of the generated tests [5]. While simulation-based veriication relies heavily on random test generation and manual coverage analysis, our approach enhances this paradigm by introducing learning-based guidance. DeepVeriier leverages the coverage

    ACM Trans. Des. Autom. Electron. Syst.

6 • Y. Lu et al.

feedback from simulation runs to train our transformer model, enabling it to learn the relationship between instruction sequences and coverage outcomes. It creates a more intelligent test generation process that combines the thoroughness of simulation-based veriication with the predictive power of deep learning. Instead of purely random exploration or manual guidance, our method automatically learns instruction patterns and makes the veriication process more eicient and systematic.

2.3 Language Model in Natural Language Processing Natural language processing (NLP) constitutes a domain within linguistics and machine learning that concentrates on comprehending all aspects of human language. The primary objective of NLP tasks is not merely to interpret individual words in isolation, but rather to grasp the broader context in which these words are situated. Moreover, recent advancements in Transformers [28] have resulted in the availability of thousands of pre- trained models capable of performing various tasks across diferent modalities in text. These tasks include text classiication, information extraction, question answering, summarization, machine translation, and text generation, supporting over 100 languages. The automatic generation of vector representations for assembly instructions at the binary analysis level draws upon methodologies from the natural language processing (NLP) domain [29], treating binary assembly code as analogous to natural language documents. Prominent deep pre-trained language models in NLP include BERT [30], GPT [31], RoBERTa [32], and BART [33]. Reference [29] introduced a pre-trained assembly language model aimed at general-purpose instruction representation learning, which acknowledges the inherent naturalness of assembly instruction sequences. This work underscores the potential for acquiring semantic representations through pre-trained language models, facilitating their application across a variety of downstream tasks. Advances in NLP, particularly the transformer architecture’s success in capturing sequential dependencies and semantic relationships, inspire our approach to hardware veriication. We adapt these techniques to treat RISC-V assembly sequences as a domain-speciic language, where instructions are tokens and their relationships represent dependencies. Similar to the transformers learning semantic relationships between words in natural language, our model learns relationships between instructions and their impact on veriication coverage. Furthermore, we extend beyond direct NLP applications by incorporating instruction types and coverage objectives into the model. This allows DeepVeriier to not only understand instruction sequences but also predict their veriication efectiveness and guide test generation.

3 Problem Formulation The problem is articulated as follows: Problem 1. Given a RISC-V microprocessor, the task is to devise a method for generating high-quality test cases that efectively assess its functionality. The primary goal is to maximize the scores of established coverage metrics. In Problem 1, a test case is deined as a sequence of instructions. In this context, the terms "test sequences" and "test cases" are used interchangeably to convey the same concept.

4 DeepVerifier Framework 4.1 Overview Traditional veriication approaches typically rely on random testing to achieve coverage requirements, DeepVer- iier, illustrated in Fig. 1(b), serves as a complementary method, speciically targeting hard-to-reach coverage cases through intelligent test sequence generation. While achieving near-100% coverage is crucial in processor veriication, DeepVeriier is designed not as a replacement for existing methods, but as a complementary approach

ACM Trans. Des. Autom. Electron. Syst.

    DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 7

to enhance coverage eiciency. Traditional random testing remains essential for broad coverage but can be time-consuming when targeting speciic coverage cases. DeepVeriier addresses this limitation by learning from existing test sequences to intelligently guide the generation of new tests, potentially reducing the number of instructions and simulation time needed to achieve target coverage levels. Our framework particularly excels at identifying patterns that lead to improved coverage in challenging scenarios, working alongside conventional methods to accelerate the veriication process. Initially, we generate assembly instruction sequences through a random instruction sequence generator [12] in conjunction with a customized fuzzing tool. These sequences are compiled and simulated using RTL simulators like Synopsys VCS [25] or Verilator [24] to capture coverage scores. To optimize simulation time and instruction count eiciency, we preprocess these sequences and utilize them to train a transformer language model that learns the semantic patterns leading to efective coverage. Our approach comprises three key components designed to reduce the number of instructions needed for achieving target coverage: (1) an encoder that eiciently embeds test sequences into continuous vector rep- resentations; (2) a coverage predictor that rapidly forecasts coverage values from these representations; and (3) a decoder that translates optimized continuous representations back into valid instruction sequences. This framework enables quick identiication of promising test sequences through iterative parameter adjustment, signiicantly reducing the exploration time compared to random testing alone. The instruction stream generator creates test sequences based on predeined templates specifying instruction types, opcode weights, and sequence length. The transformer model’s encoder-decoder architecture eiciently converts between text sequences and token vectors, maintaining sequence validity while exploring the coverage space. A ine-tuned tokenizer works with the transformer model to convert instructions into token vectors matching the vocabulary size. The encoder creates compact word embeddings, while the decoder reconstructs valid sequences. The model training process minimizes cross-entropy loss to align input and output distributions, capturing crucial patterns that inluence coverage. The coverage predictor uses the transformer’s inal hidden state to build a neural network that eiciently maps sequence characteristics to coverage outcomes. To accelerate coverage improvement, we compute mean square loss between target and predicted coverage scores, using this deviation to update sequence representations. The gradient-based optimization process, guided by scheduled learning rates, eiciently modiies input sequences while maintaining their validity through a correctness veriication tool. This approach complements random testing by speciically targeting coverage gaps, potentially reducing the total number of instructions needed to achieve coverage targets.

4.2 Test Sequence Representation The test sequences comprise a diverse array of assembly programming language snippets of varying lengths, generated by a RISC-V instruction stream generator [12]. Each sequence encompasses a series of instructions, with lengths ranging from 128 to 256. Each instruction comprises opcodes and operands. In alignment with the speciications of the RISC-V ISA [1] and [2], we categorize opcodes into ive distinct types and operands into three categories. To efectively represent each instruction, we construct a vocabulary that includes both opcodes and operands, along with a tokenizer designed to convert each token into its corresponding embedded form. To maintain the representation strategy of the ~~and~~ tokens utilized in the trained tokenizer, which signify the start and end identiiers of each input sequence, we delineate each instruction sequence by inserting ~~and~~ at the beginning and concluding points of each sequence. To interpret the instructions within the RISC-V ISA speciications and convert instruction opcodes and operands into continuous vector representations suitable for subsequent utilization by a transformer language model, we reine the tokenizer. An instruction

    ACM Trans. Des. Autom. Electron. Syst.

8 • Y. Lu et al.

                ADDI x1, x1, 651              <s> ADDI <dsts> x1; <srcs> x1; <const>
                SLLI x1, x1, 0xd              <s> SLLI <dsts> x1; <srcs> x1; <const>
                …                             …
                LD x22, x29, 0                <s> LD <dsts> x22; <srcs> <mem> x29; <const> </mem>
                BEQ x22, x21, 4               <s> BEQ <srcs> x22; x21; <addr> <const> </addr>
                JALR x0, x1, 0                <s> JALR <dsts> x0; <srcs> x1; <addr> <const> </addr>
                …                             …
                                              </s>

    Tokenizer
                     Integer Register-Register:               Floating-Point:
                     ADD, SUB, …, AND, OR                     FADD.S, FSUB.S, …, FLT.S

                 Integer Immediate:                                   Compressed:
                 ADDI, SLLI, …, ANDI, ORI      Pre-trained            C.ADDI, C.BEQZ, …, C.J
                                                Tokenizer
                 Mutiplications/Divisions:                            Registers:
                 MUL, DIV, …, REM                                     x0, …, x31, f0, …, f31

                                               Special Tokens:
                                               <srcs>, <dsts>, <mem>,
                                               </mem>, <addr>, </addr>


    New Tokenizer

Fig. 2. Creating customized tokens to represent instruction sequences and integrating RISC-V ISA-specific tokens into the vocabulary, along with developing a new tokenizer.

comprises tokens that include an opcode (e.g., ADD, SUB), register operands, address operands, and immediate values. We incorporate opcodes and register symbols as new tokens within the vocabulary. To diferentiate between source and destination operands, we employ the tokens and to represent source registers and destination registers in an instruction, respectively. For immediate operands, we utilize the token to mitigate the presentation of diverse concrete values. Furthermore, for load and store instructions (e.g., LD, SD) that access memory addresses, we append and labels as tokens at the beginning and end of the instruction to represent the memory access pattern. For conditional and unconditional jump instructions (e.g., BEQ, JAL) that reference the next instruction address, we propose the use of and tokens to identify the range of address registers. Notably, to facilitate the processing of sequences, we replace commas within the instruction with semicolons and subsequently restore them following optimized modiications to maintain the validity of the instruction format. This tokenization approach enables our model to capture both instruction-level and sequence-level semantic relationships, which is crucial for generating efective test sequences. Unlike traditional approaches that treat instructions in isolation, our method preserves contextual dependencies that inluence coverage outcomes.

4.3 Model Architecture To learn semantic representations for hardware veriications, the LSTM network in [23] can sequentially learn the dependencies in an instruction sequence, while this approach faces several limitations. The sequential nature

ACM Trans. Des. Autom. Electron. Syst.

    DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 9

of LSTM processing makes it computationally expensive for long instruction sequences, and its ixed-size hidden state becomes a bottleneck when modeling complex instruction dependencies. The Transformer model [34] is a widely recognized technique for sequence modeling, which ofers three key advantages through its architecture. First, its parallel processing and self-attention mechanism excels at capturing long-distance dependencies within input sequences while maintaining computational eiciency. Second, its multi-head attention allows simultaneous modeling of diferent types of instruction relationships. Third, its explicit position encoding better preserves instruction order constraints critical for veriication tasks. To leverage these advantages of the model to understand both the semantics and syntax of RISC-V assembly sequences, we developed a customized version of BART [33], a ine-tuned transformer-based sequence-to- sequence language model. This customization involved speciic adaptations, including a tailored tokenizer and loss functions, aimed at enhancing RISC-V assembly sequence representation and improving coverage score prediction, drawing inspiration from PalmTree [29], a prior assembly language representation model. Unlike [29] that aims for general-purpose instruction embedding across architectures, our model is speciically optimized for veriication integration with custom tokenization. Furthermore, we bridge representation learning with coverage-guided veriication by introducing components for coverage prediction and sequence updates. This customization includes veriication-speciic adaptations of a tailored tokenizer and specialized loss functions, enabling both better coverage results and more interpretable veriication worklows. The customized tokenizer has been developed to efectively capture the distinct characteristics of RISC-V assembly sequences, including opcodes, register operands, address operands, and constant immediate values. To address the intricate internal structures of instructions, we represent the instruction sequence by appending special tokens as deined in Section 4.2. Additionally, we generate a speciic tokenizer tailored for this purpose, which builds upon the standard tokenizer of BART to incorporate instruction-related tokens into the vocabulary. The processes of preprocessing and tokenizing the instruction sequences, along with the composition of the RISC-V ISA-speciic tokenizer utilizing a customized vocabulary, are illustrated in Fig. 2. The fundamental components of a transformer model encompass an encoder, a decoder, and several linear projection layers designated for downstream tasks. The encoder and decoder are composed of a position-wise input embedding layer, followed by 𠐀 transformer blocks. Each transformer block includes a multi-head self- attention layer, a fully connected feed-forward network, and multiple additive and normalization layers. The output generated from the inal transformer block is subsequently processed through linear projection layers to derive the conclusive output representation of the input sequence. While our current implementation focuses on RISC-V instruction tokenization, the framework enables expan- sion to diverse veriication scenarios through adaptable token vocabularies and sequence lengths. The transformer model’s self-attention mechanism inherently supports sequences well beyond our current implementation, fa- cilitating the veriication of sophisticated designs. The extensible nature of our architecture accommodates the integration of advanced instruction sets, architectural states, and control mechanisms. This inherent scalability positions our framework to adapt to emerging processor architectures and their increasing complexity. As modern processor designs introduce more sophisticated features, the transformer’s capacity to capture intricate semantic relationships between instructions becomes increasingly valuable for comprehensive veriication coverage. The input embedding and positional encoding layer transforms input tokens x into a continuous embedded representation denoted as TE, while simultaneously incorporating positional information into these represen- tations, referred to as PE. The input tokens x = (𩐀1, 𩐀2, ..., 𩐀𦰀) encompass a sequence of length 𦰀, where each 𩐀𥠀, 𥠀 ∈ {1, 2, ..., 𦰀} signiies a token identity index within the vocabulary. An embedding matrix TE ∈ R𤐀𨰀𧀀𤀀𣠀𣰀 ×𤐀𦠀𧀀𤐀𤠀𦐀 is employed to map each input xi into the embedded space with 𤐀𨰀𧀀𤀀𣠀𣰀 representing the vocabulary size and 𤐀𦠀𧀀𤐀𤠀𦐀 indicating the dimensionality of the embedding space. The embedding te𥠀 corresponding to each token 𩐀𥠀

ACM Trans. Des. Autom. Electron. Syst.

10 • Y. Lu et al.

is articulated by the following Equation (1),

te𥠀                                  = TokenEmbedding(𩐀𥠀 ),                                                                (1)

where TE𩐀𥠀 denote the 𩐀𥠀 -th row of the embedding matrix TE which possesses a dimensionality of 𤐀𦠀𧀀𤐀𤠀𦐀 . For our model, the embedded dimension is deined as 768. To encode positional information for each token within the sequence, sinusoidal positional encoding employs both cosine and sine functions. For a given position 𧐀𧀀𨀀, 𧐀𧀀𨀀 ∈ {1, 2, ..., 𦰀} and dimension 𥠀, 𥠀 ∈ {1, 2, ..., 𤐀𦠀𧀀𤐀𤠀𦐀 }, the positional encoding is bifurcated into even indices pe𧐀𧀀𨀀,2𥠀 and odd indices pe𧐀𧀀𨀀,2𥠀+1. These indices correspond to the sine and cosine functions respectively, as indicated in Equation (2),

pe     pe𧐀𧀀𨀀,2𥠀                   = sine( 100002𥠀𥠀𥠀/𤐀𦠀𧀀𤐀𤠀𦐀 ),                                                    (2)
𧐀𧀀𨀀,2𥠀+1                          = cosine( 100002𥠀/𤐀𦠀𧀀𤐀𤠀𦐀 ).

The inal representation e𥠀 of each input token is derived by concatenating the token embedding te𥠀 with the positional encoding pe𧐀𧀀𨀀 . This relationship is expressed as e𥠀 = te𥠀 + pe𧐀𧀀𨀀 . Through the processes of token embedding and positional encoding, the input sequence X ∈ R𦰀×𤐀𨰀𧀀𤀀𣠀𣰀 of the length 𦰀 and a vocabulary size 𤐀𨰀𧀀𤀀𣠀𣰀 is converted into a sequence of continuous embedded representations E ∈ R𦰀×𤐀𦠀𧀀𤐀𤠀𦐀 , which maintains the same length 𦰀 and a dimension of 𤐀𦠀𧀀𤐀𤠀𦐀 . The attention mechanism employed within each transformer block is designed to capture long-range dependencies present in the embedded sequence and to evaluate the signiicance of various components through the computation of attention scores. Each transformer block incorporates a multi-head self-attention layer to derive these attention scores. Speciically, each input embedded matrix is linearly projected into three distinct matrices: query Q, key K, and value V by the projection matrices W𡀀 , W🠀, W𢐀 ∈ R𤐀𦠀𧀀𤐀𤠀𦐀 ×𤐀𦀀 , where 𤐀𦀀 = (𤐀𦠀𧀀𤐀𤠀𦐀 /𞰀 ), 𞰀 denotes the number of self-attention heads, which is established as 4 in the model. The multi-head attention score is expressed by Equation (3) as follows:

                            Atni(Q𥠀, K𥠀, V𥠀 ) = sotmax( Q√𥠀𤐀K𦀀𥠀⊤ )V𥠀,        (3)

MultiHeadAttention(Q, K, V) = Concat(Atn1, ..., AtnH)W𠠀 , where the computation of each self-attention head, the self-attention score is derived by taking the dot product of the query matrix Q𥠀 and key matrix K𥠀 . This result is subsequently scaled by the square root of the key dimension 𤐀𦀀 . The sotmax function is then applied to normalize the scores, ensuring that their sum equals one. Following this step, a weighted sum of the value matrix V𥠀 is calculated to produce the output Atni for each self-attention head. The inal multi-head attention score is formed by concatenating the outputs of 𞰀 self-attention heads, after which these concatenated results are projected using the weight matrix W𠠀 ∈ R𤐀𦠀𧀀𤐀𤠀𦐀 ×𤐀𦠀𧀀𤐀𤠀𦐀 . The output generated by the multi-head self-attention layer undergoes normalization through the function LayerNorm. This layer is designed to normalize inputs across features by utilizing the mean 󑰀 and variance 󓠀 of each hidden state h within a transformer block, as illustrated in Equation (4),

         LayerNorm(h) = √h󓠀−2 +󑰀 󐀀 󏠀 + 󏐀.                                                                                (4)

Upon processing through the multi-head attention layer, the resultant outputs, denoted as Xattn, are sub- sequently directed into the feed-forward network present within each transformer block. This feed-forward network comprises two fully connected layers, fc1 and fc , which operate on each hidden state h. A Gaussian 2

ACM Trans. Des. Autom. Electron. Syst.

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 11

Instruction Instruction Sequence Sequence Tokenizer Transformer J) Transformer 2 Tokenizer

T D || [| Token IDs || <latexitsha1_base64="y9tfvJsYxlYFrN0R1Gx1G1XgXEk=">AAAB8nicbVBNT8JAEJ3iF+IX6tHLRjDxRFoO6JHEi0dMBExKQ7bbLWzY7ja7WxLS8DO8eNAYr/4ab/4bF+hBwZdM8vLeTGbmhSln2rjut1Pa2t7Z3SvvVw4Oj45PqqdnPS0zRWiXSC7VU4g15UzQrmGG06dUUZyEnPbDyd3C70+p0kyKRzNLaZDgkWAxI9hYya9Hw3wqCQ7n9WG15jbcJdAm8QpSgwKdYfVrEEmSJVQYwrHWvuemJsixMoxwOq8MMk1TTCZ4RH1LBU6oDvLlyXN0ZZUIxVLZEgYt1d8TOU60niWh7UywGet1byH+5/mZiW+DnIk0M1SQ1aI448hItPgfRUxRYvjMEkwUs7ciMsYKE2NTqtgQvPWXN0mv2fBajdZDs9ZGRRxluIBLuAYPbqAN99CBLhCQ8Ayv8OYY58V5dz5WrSWnmDmHP3A+fwDUFJDdd (sha1_base64="gC9yTm68IZYr3cDfMlwdTr5l2aY=">AAAB8HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsKtCTaWGIiH4a7kL1lDzbs7l1290zIhV9hY6Extv4cO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTjo5TRWibxDxWvRBrypmkbcMMp71EUSxCTrvh5Hbud5+o0iyWD2aa0EDgkWQRI9hY6bHq37DRyM+qg3LFrbkLoHXi5aQCOVqD8pc/jEkqqDSEY637npuYIMPKMMLprOSnmiaYTPCI9i2VWFAdZIuDZ+jCKkMUxcqWNGih/p7IsNB6KkLbKbAZ61VvLv7n9VMTXQcZk0lqqCTLRVHKkYnR/Hs0ZIoSw6eWYKKYvRWRMVaYGJtRyYbgrb68Tjr1mteoNe7rlWY1j6MIZ3AOl+DBFTThDlrQBgICnuEV3hzlvDjvzseyteDkM6fwB87nD9b5j7M=<latexit

*+,-. E vocab | D | | <latexitsha1_base64="y9tfvJsYxlYFrN0R1Gx1G1XgXEk=">AAAB8nicbVBNT8JAEJ3iF+IX6tHLRjDxRFoO6JHEi0dMBExKQ7bbLWzY7ja7WxLS8DO8eNAYr/4ab/4bF+hBwZdM8vLeTGbmhSln2rjut1Pa2t7Z3SvvVw4Oj45PqqdnPS0zRWiXSC7VU4g15UzQrmGG06dUUZyEnPbDyd3C70+p0kyKRzNLaZDgkWAxI9hYya9Hw3wqCQ7n9WG15jbcJdAm8QpSgwKdYfVrEEmSJVQYwrHWvuemJsixMoxwOq8MMk1TTCZ4RH1LBU6oDvLlyXN0ZZUIxVLZEgYt1d8TOU60niWh7UywGet1byH+5/mZiW+DnIk0M1SQ1aI448hItPgfRUxRYvjMEkwUs7ciMsYKE2NTqtgQvPWXN0mv2fBajdZDs9ZGRRxluIBLuAYPbqAN99CBLhCQ8Ayv8OYY58V5dz5WrSWnmDmHP3A+fwDUFJDddvocab(sha1_base64="gC9yTm68IZYr3cDfMlwdTr5l2aY=">AAAB8HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsKtCTaWGIiH4a7kL1lDzbs7l1290zIhV9hY6Extv4cO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTjo5TRWibxDxWvRBrypmkbcMMp71EUSxCTrvh5Hbud5+o0iyWD2aa0EDgkWQRI9hY6bHq37DRyM+qg3LFrbkLoHXi5aQCOVqD8pc/jEkqqDSEY637npuYIMPKMMLprOSnmiaYTPCI9i2VWFAdZIuDZ+jCKkMUxcqWNGih/p7IsNB6KkLbKbAZ61VvLv7n9VMTXQcZk0lqqCTLRVHKkYnR/Hs0ZIoSw6eWYKKYvRWRMVaYGJtRyYbgrb68Tjr1mteoNe7rlWY1j6MIZ3AOl+DBFTThDlrQBgICnuEV3hzlvDjvzseyteDkM6fwB87nD9b5j7M=<latexit |] Decoder A ! "# $%& '(")' cross A | —t | Predictor entr O T Generation H o | <latexitsha1_base64="y9tfvJsYxlYFrN0R1Gx1G1XgXEk=">AAAB8nicbVBNT8JAEJ3iF+IX6tHLRjDxRFoO6JHEi0dMBExKQ7bbLWzY7ja7WxLS8DO8eNAYr/4ab/4bF+hBwZdM8vLeTGbmhSln2rjut1Pa2t7Z3SvvVw4Oj45PqqdnPS0zRWiXSC7VU4g15UzQrmGG06dUUZyEnPbDyd3C70+p0kyKRzNLaZDgkWAxI9hYya9Hw3wqCQ7n9WG15jbcJdAm8QpSgwKdYfVrEEmSJVQYwrHWvuemJsixMoxwOq8MMk1TTCZ4RH1LBU6oDvLlyXN0ZZUIxVLZEgYt1d8TOU60niWh7UywGet1byH+5/mZiW+DnIk0M1SQ1aI448hItPgfRUxRYvjMEkwUs7ciMsYKE2NTqtgQvPWXN0mv2fBajdZDs9ZGRRxluIBLuAYPbqAN99CBLhCQ8Ayv8OYY58V5dz5WrSWnmDmHP3A+fwDUFJDddvocab(sha1_base64="gC9yTm68IZYr3cDfMlwdTr5l2aY=">AAAB8HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsKtCTaWGIiH4a7kL1lDzbs7l1290zIhV9hY6Extv4cO/+NC1yh4EsmeXlvJjPzwoQzbVz32ylsbG5t7xR3S3v7B4dH5eOTjo5TRWibxDxWvRBrypmkbcMMp71EUSxCTrvh5Hbud5+o0iyWD2aa0EDgkWQRI9hY6bHq37DRyM+qg3LFrbkLoHXi5aQCOVqD8pc/jEkqqDSEY637npuYIMPKMMLprOSnmiaYTPCI9i2VWFAdZIuDZ+jCKkMUxcqWNGih/p7IsNB6KkLbKbAZ61VvLv7n9VMTXQcZk0lqqCTLRVHKkYnR/Hs0ZIoSw6eWYKKYvRWRMVaYGJtRyYbgrb68Tjr1mteoNe7rlWY1j6MIZ3AOl+DBFTThDlrQBgICnuEV3hzlvDjvzseyteDkM6fwB87nD9b5j7M=<latexit H M | Predicted Coverage Update Parameters Update Parameters (a) (b)

Fig. 3. (a) Training a transformer model for a sequential modeling task focused on sequence generation will help the model learn the semantics and syntax of instruction sequences. (b) Performing a supervised prediction task to assess coverage scores using the fine-tuned transformer model from the generation task will enable us to estimate the relationship between the input sequence and its corresponding output.

Error Linear Unit (GELU) activation function, as speciied in Equation (5),

GELU(h) = h · 21 (1 + Φ( √h√2 )) (5) ≈ 0.5h(1 + tanh 2 (h+0.044715h3) ), 󒰀

where Φ is the cumulative distribution function for Gaussian distribution. The following feed-forward network is given by Equation (6), fc1 = GELU(X𣠀𨐀𨐀𦰀W1 + b1), (6) fc2 = GELU(fc W + b2), 1 2

where W1 ∈ R𤐀𦠀𧀀𤐀𤠀𦐀 ×𤐀𤰀𤰀𦰀, W2 ∈ R𤐀𤰀𤰀𦰀×𤐀𦠀𧀀𤐀𤠀𦐀 represent the projection matrices, while b1 ∈ R𤐀𤰀𤰀𦰀, b2 ∈ R𤐀𦠀𧀀𤐀𤠀𦐀 denote the bias vectors corresponding to the irst and second fully-connected layers, respectively. The dimension of the feed-forward network, denoted as 𤐀𤰀𤰀𦰀 , is established as 1024 within the model. The customized transformer model comprises both encoder and decoder components, consisting of two transformer blocks. The decoder utilizes a preceding output matrix Y ∈ R𦠀,𤐀𨰀𧀀𤀀𣠀𣰀 , where 𦠀 represents the length of the output sequence. To prevent the decoder from attending to future tokens, the self-attention mechanism is masked. This is achieved by assigning negative ininity values −∞ to the attention scores of future positions before the application of the sotmax function. The input matrices for the decoder’s self-attention are initialized with the outputs from the encoder’s self-attention. Speciically, the query matrix Q′ is derived from the masked self-attention of the decoder, while the key matrix K′ and value matrix V′ are obtained from the encoder’s outputs.

ACM Trans. Des. Autom. Electron. Syst.

12 • Y. Lu et al.

4.4 Training Tasks By incorporating coverage feedback into the test generation process, our approach captures the sequence pattern and optimizes coverage objectives. These optimization tasks help discover instruction dependencies in a sequence and obtain the coverage feedback in a short time to guide the successive test generation process. To implement our framework practically, we integrate the existing transformer framework and veriication worklows, requiring minimal additional setup while providing substantial beneits in test generation eiciency and coverage improvement. The ine-tuning task illustrated in Fig. 3(a) exposes a transformer model to efectively capture the sequence. Subsequently, we apply the ine-tuned model to conduct the downstream task of predicting coverage scores, as depicted in Fig. 3(b). Unlike traditional BART implementations that use text inilling as a noising method during training, our sequential modeling task is a ine-tuning task that prioritizes maintaining strict RISC-V ISA compliance throughout the training process. The RISC-V instruction sequences must maintain strict syntax and semantics of ISA compliance, as random masking could disrupt critical instruction dependencies and patterns. The primary objective is to optimize coverage scores through valid and executable test sequences. The transformer model needs to learn the precise relationship between instruction patterns and their corresponding coverage metrics. Adding noise during training could potentially interfere with this coverage-guided learning process. Additionally, unlike natural language tasks where partial text corruption is acceptable, hardware veriication demands precise instruction sequences that can be directly compiled and simulated for the coverage analysis. Training Task 1: Sequential Modeling of Test Generation. The representation of instruction sequences serves to maintain the architectural information of each instruction within a test sequence, as well as the execution rules governing the overall test sequence. This information encompasses the syntactic rules of RISC-V ISAs and highlights execution dependencies between instructions within the current test sequence tied to a particular design under evaluation. Moreover, this information relates to the association between test inputs and code coverage scores for a given design. By acquiring an understanding of the syntax and execution rules about tokens in a test sequence, and about existing training samples, the proposed model efectively learns the generation of instruction sequencing and the correlation between sequence patterns and objective coverage scores. Ö𦰀 Ö𦰀 L(x) = 𠰀 (𩐀𥠀 |x<𥠀 ) = 𠰀 (𩐀𥠀 |𩐀1, 𩐀2, ..., 𩐀𥠀 −1). (7) 𥠀=1 𥠀=1 Through model training, we can delineate the relationships between each token 𩐀𥠀 and acquire the execution rules pertinent to the current test sequence by optimizing the objective function. For example, regarding the existing tokens in a test sequence { ADD x1 x2 }, the transformer model predicts the subsequent token from among the valid register tokens in the vocabulary. Furthermore, the ine-tuned transformer model accurately preserves the syntax of the instruction while also ensuring the proper instruction dependencies. By predicting the subsequent tokens for instruction opcodes and operands, we can ascertain valid register symbols and dependencies, the order of memory access, and the integrity of the control low. Additionally, a valid test program should achieve correct termination and maintain the integrity of both computational semantics and architectural constraints. On one hand, the ine-tuned transformer model facilitates the appropriate next tokens and execution low; on the other hand, we have developed a RISC-V instruction syntax interpreter to guarantee the validity of each instruction following successive updates to the sequence. Training of the transformer model is carried out with a sequence generation task that focuses on recovering the input sequence from the inal hidden state, denoted as h𦰀, of the transformer block through transformations applied in the decoder. The objective of this task is to minimize the diference between the input sequence generated by the encoder and the output sequence produced by the decoder. A cross-entropy (CE) loss function

ACM Trans. Des. Autom. Electron. Syst.

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 13

is employed to quantify the discrepancy between the predicted token and the corresponding ground-truth token. The objective function is derived from the likelihood of the token sequence, which facilitates the evaluation of the similarity between the source token sequence x, 𥠀 ∈ {1, 2, ..., 𦰀} of length 𦰀 and the target token sequence y, 𥰀 ∈ {1, 2, ..., 𦠀} of length 𦠀. Furthermore, the transformer model is updated by minimizing the loss function. The transformer model operates by applying a multi-head self-attention mechanism to the input tokens x. This process is followed by an embedding and positional encoding phase, which converts distinct token vectors into continuous embedded representations. These representations are then subjected to position-wise feed-forward layers, ultimately generating an output distribution for target tokens through a sotmax operation. The probability 𠰀 (𩠀𥰀 = 𤀀 |x, y<𥰀 ) of the target token 𩠀𥰀 , which corresponds to a token identiier 𨰀 within the vocabulary, is computed based on the preceding tokens y<𥰀 and the source token sequence x. This calculation involves a linear projection of the last hidden state h𦰀, resulting in the vector z𥰀 containing scores for the potential next tokens in a vocabulary of size 𤐀𨰀𧀀𤀀𣠀𣰀. Subsequently, a sotmax function is applied to convert these scores into a probability distribution. The cross-entropy loss, which quantiies the divergence between a target sequence y and input sequence x, is deined in Equation (8), LossCE = −𦠀1 𥰀𦠀=1 𤐀𨰀𧀀𤀀𣠀𣰀𤀀=1 log 𠰀 (𩠀𥰀 = 𤀀 |𩐀, 𩠀<𥰀 ) (8) = −𦠀1 𥰀𦠀=1 𤐀𨰀𧀀𤀀𣠀𣰀𤀀 log Í𤐀 exp(z𥰀,𤀀 ) , =1 𨰀=𨰀𧀀𤀀𣠀𣰀1 exp(z𥰀,𨰀) where z𥰀,𤀀 represents the score associated with the token ID 𤀀 within the score vector z𥰀 corresponding to the target token 𩠀𥰀 in the sequence y. Training Task 2: Coverage Score Prediction of Test Sequences. Following the training of the sequential model aimed at minimizing cross-entropy loss, we utilize the ine-tuned transformer model denoted as Transformergen as the foundational component. In conjunction, we append a multi-layer neural network model referred to as MLP to serve as the prediction head for a coverage prediction task. This task is characterized as a supervised multi-target regression task focused on forecasting coverage scores s = (𨀀1, 𨀀2, ..., 𨀀𦐀 ), where 𦐀 represents the number of coverage types involved in the measurement, set at 𦐀 = 4 within our framework. The computational process of the transformer body in conjunction with the prediction head model is elucidated in Equation (9), hns = Transformergen(x), (9) = MLP(hn [ 0]) , where x represent the input sequence, while h𦰀 denotes the inal hidden state of the transformer architecture. The irst token of the hidden state, h𦰀 [0], is utilized for subsequent predictions. The objective of the training process is to minimize the mean square error (MSE), as deined in Equation (10), between the predicted coverage scores ˆ𨀀 and the corresponding ground truth coverage scores ¯𨀀.

1 𦐀 LossMSE = 𦐀𥠀=1 (𨀀ˆ𥠀 − 𨀀¯𥠀 )2 . (10) 4.5 Test Sequence Updates and Correction The sequence update and correction process comprises three principal components: (1) A coverage score predictor, which utilizes a transformer-based neural network consisting of two layers to map instruction sequences to their corresponding predicted coverage scores. (2) A gradient-based optimizer employs an iterative reinement ACM Trans. Des. Autom. Electron. Syst.

14 • Y. Lu et al.

approach using gradients derived from the predicted coverage scores. (3) A syntax correction tool, which ensures that the optimized sequences adhere to the ISA compliance and maintain syntactical correctness. Utilizing the transformer-based model, each instruction within the sequence is tokenized and embedded within a continuous representation. Positional encodings are employed to preserve the sequence order, while special tokens indicating the beginning and end of sequences demarcate the boundaries. To achieve a predetermined coverage score that exceeds the original predicted score, a constrained gradient-based optimizer is implemented to update the continuous representation of the instruction sequence iteratively. The sequence update mechanism is illustrated in Fig. 4, where the dotted arrows in the igure indicate the various levels of continuous representations of sequences. The iterative process commences with a randomly initialized continuous representation, from which it computes the gradients of the predicted coverage scores with ISA constraints and syntactical requirements, guiding the modiications to minimize the diference between the predicted and target coverage scores. We implement two-level constraints before and following the gradient updating process to ensure that the optimized sequence remains compliant with the ISA and maintains syntactical accuracy. During each gradient update iteration, we enforce ISA constraints by projecting the gradients into the feasible region deined by the ISA. This projection process is essential to convert calculated gradients into constrained gradients, thereby ensuring valid register utilization by setting gradients of reserved registers to zero and adhering to opcode and immediate value ranges pertinent to the speciic instruction types. Upon completion of the gradient updates, we validate the optimized sequence through a correction tool that assesses legality and rectiies any incompatible elements. A syntax correctness checker evaluates the syntactic integrity of the instruction sequences by examining instruction dependencies, memory access patterns, and control low stability. Furthermore, the correction script interprets each instruction within the optimized sequence, reallocates reserved registers by assigning available alternatives, corrects out-of-range immediate values by generating new values within deined maximum or minimum boundaries, and adjusts opcodes to the nearest similar alternative while incorporating random operands.

Original Updated | Hidden States | | Hidden States | JEN.d<latexitsha1_base64="J7Z4l6l4xpH4EkC/9P/e4FTqF5g=">AAAB8HicbVC7SgNBFJ31GeMrKtjYLAbBKuxaRMsQG8sEzEOSJczO3k2GzGOZmRXCkq+wsVDEVvAv/AI7G7/FyaPQxAMXDufcy733hAmj2njel7Oyura+sZnbym/v7O7tFw4Om1qmikCDSCZVO8QaGBXQMNQwaCcKMA8ZtMLh9cRv3YPSVIpbM0og4LgvaEwJNla6i3oZlxGwca9Q9EreFO4y8eekWDmuf9P36ketV/jsRpKkHIQhDGvd8b3EBBlWhhIG43w31ZBgMsR96FgqMAcdZNODx+6ZVSI3lsqWMO5U/T2RYa71iIe2k2Mz0IveRPzP66QmvgoyKpLUgCCzRXHKXCPdyfduRBUQw0aWYKKovdUlA6wwMTajvA3BX3x5mTQvSn65VK7bNKpohhw6QafoHPnoElXQDaqhBiKIowf0hJ4d5Tw6L87rrHXFmc8coT9w3n4ACl6USA== model ] h<latexitsha1_base64="62kCpsKiMgL44+6ph+oLXCHI1pA=">AAAB63icbVC7SgNBFL0bXzG+ooKNzWAQrMKuRbQMsbFMwDwgWcLsZDYZMjO7zMwKYckv2FgoYmvpX/gFdjZ+i7NJCk08cOFwzr3ce08Qc6aN6345ubX1jc2t/HZhZ3dv/6B4eNTSUaIIbZKIR6oTYE05k7RpmOG0EyuKRcBpOxjfZH77nirNInlnJjH1BR5KFjKCTSaN+rLQL5bcsjsDWiXegpSqJ41v9l77qPeLn71BRBJBpSEca9313Nj4KVaGEU6nhV6iaYzJGA9p11KJBdV+Ort1is6tMkBhpGxJg2bq74kUC60nIrCdApuRXvYy8T+vm5jw2k+ZjBNDJZkvChOOTISyx9GAKUoMn1iCiWL2VkRGWGFibDxZCN7yy6ukdVn2KuVKw6ZRgznycApncAEeXEEVbqEOTSAwggd4gmdHOI/Oi/M6b805i5lj+APn7QdnQ5GJ n Pra d<latexitsha1_base64="J7Z4l6l4xpH4EkC/9P/e4FTqF5g=">AAAB8HicbVC7SgNBFJ31GeMrKtjYLAbBKuxaRMsQG8sEzEOSJczO3k2GzGOZmRXCkq+wsVDEVvAv/AI7G7/FyaPQxAMXDufcy733hAmj2njel7Oyura+sZnbym/v7O7tFw4Om1qmikCDSCZVO8QaGBXQMNQwaCcKMA8ZtMLh9cRv3YPSVIpbM0og4LgvaEwJNla6i3oZlxGwca9Q9EreFO4y8eekWDmuf9P36ketV/jsRpKkHIQhDGvd8b3EBBlWhhIG43w31ZBgMsR96FgqMAcdZNODx+6ZVSI3lsqWMO5U/T2RYa71iIe2k2Mz0IveRPzP66QmvgoyKpLUgCCzRXHKXCPdyfduRBUQw0aWYKKovdUlA6wwMTajvA3BX3x5mTQvSn65VK7bNKpohhw6QafoHPnoElXQDaqhBiKIowf0hJ4d5Tw6L87rrHXFmc8coT9w3n4ACl6USA== model 1 Embeddings | h<latexitsha1_base64="YOsdq4+jboHcekOQ0pE8d5d3qgg=">AAAB+HicbVDLTgIxFO3gC/HBqEs3DcSEFZlxAS6JblxiIo8ExkmndKCh7UzajglO+BI2LjTGrX/h1p3Rj7EDLBQ8yU1Ozrm3vfcEMaNKO86nldvY3Nreye8W9vYPDov20XFbRYnEpIUjFslugBRhVJCWppqRbiwJ4gEjnWB8lfmdeyIVjcStnsTE42goaEgx0kby7eLIT8X0Lu3HknIyLfh22ak6c8B14i5JuVGqfH/V32dN3/7oDyKccCI0ZkipnuvE2kuR1BQz82A/USRGeIyGpGeoQJwoL50vPoVnRhnAMJKmhIZz9fdEirhSEx6YTo70SK16mfif10t0eOGlVMSJJgIvPgoTBnUEsxTggEqCNZsYgrCkZleIR0girE1WWQju6snrpH1edWvV2o1J4xIskAenoAQqwAV10ADXoAlaAIMEzMATeLYerEfrxXpdtOas5cwJ+APr7Qcs1pcq ′n 2 | Embeddings |

IN d<latexitsha1_base64="J7Z4l6l4xpH4EkC/9P/e4FTqF5g=">AAAB8HicbVC7SgNBFJ31GeMrKtjYLAbBKuxaRMsQG8sEzEOSJczO3k2GzGOZmRXCkq+wsVDEVvAv/AI7G7/FyaPQxAMXDufcy733hAmj2njel7Oyura+sZnbym/v7O7tFw4Om1qmikCDSCZVO8QaGBXQMNQwaCcKMA8ZtMLh9cRv3YPSVIpbM0og4LgvaEwJNla6i3oZlxGwca9Q9EreFO4y8eekWDmuf9P36ketV/jsRpKkHIQhDGvd8b3EBBlWhhIG43w31ZBgMsR96FgqMAcdZNODx+6ZVSI3lsqWMO5U/T2RYa71iIe2k2Mz0IveRPzP66QmvgoyKpLUgCCzRXHKXCPdyfduRBUQw0aWYKKovdUlA6wwMTajvA3BX3x5mTQvSn65VK7bNKpohhw6QafoHPnoElXQDaqhBiKIowf0hJ4d5Tw6L87rrHXFmc8coT9w3n4ACl6USA== model : e<latexitsha1_base64="T2juPQ80X1tWe9w/EqTKQqIssi4=">AAAB7HicbVA9SwNBEJ2LXzF+RQUbm8MgWIU7i2gZYmOZgJcEkiPsbeaSJXt7x+6eEI78BhsLRWzt/Bf+Ajsbf4ubj0ITHww83pthZl6QcKa043xZubX1jc2t/HZhZ3dv/6B4eNRUcSopejTmsWwHRCFnAj3NNMd2IpFEAcdWMLqZ+q17lIrF4k6PE/QjMhAsZJRoI3nYy9ikVyw5ZWcGe5W4C1KqnjS+2Xvto94rfnb7MU0jFJpyolTHdRLtZ0RqRjlOCt1UYULoiAywY6ggESo/mx07sc+N0rfDWJoS2p6pvycyEik1jgLTGRE9VMveVPzP66Q6vPYzJpJUo6DzRWHKbR3b08/tPpNINR8bQqhk5labDokkVJt8CiYEd/nlVdK8LLuVcqVh0qjBHHk4hTO4ABeuoAq3UAcPKDB4gCd4toT1aL1Yr/PWnLWYOYY/sN5+AOqAknk= i d<latexitsha1_base64="J7Z4l6l4xpH4EkC/9P/e4FTqF5g=">AAAB8HicbVC7SgNBFJ31GeMrKtjYLAbBKuxaRMsQG8sEzEOSJczO3k2GzGOZmRXCkq+wsVDEVvAv/AI7G7/FyaPQxAMXDufcy733hAmj2njel7Oyura+sZnbym/v7O7tFw4Om1qmikCDSCZVO8QaGBXQMNQwaCcKMA8ZtMLh9cRv3YPSVIpbM0og4LgvaEwJNla6i3oZlxGwca9Q9EreFO4y8eekWDmuf9P36ketV/jsRpKkHIQhDGvd8b3EBBlWhhIG43w31ZBgMsR96FgqMAcdZNODx+6ZVSI3lsqWMO5U/T2RYa71iIe2k2Mz0IveRPzP66QmvgoyKpLUgCCzRXHKXCPdyfduRBUQw0aWYKKovdUlA6wwMTajvA3BX3x5mTQvSn65VK7bNKpohhw6QafoHPnoElXQDaqhBiKIowf0hJ4d5Tw6L87rrHXFmc8coT9w3n4ACl6USA== model 1] Tokens e<latexitsha1_base64="Li1NcBFF6ot97Rg23eRGBKmYF94=">AAAB9XicbVDLSgNBEJyNrxhfUY9ehgQhp7DrIfEY9OIxgnlAsgmzk95kyMzuMjOrhGX/QxAPinj1O7x6E/0YJ4+DJhY0FFXddHd5EWdK2/anlVlb39jcym7ndnb39g/yh0dNFcaSQoOGPJRtjyjgLICGZppDO5JAhMeh5Y0vp37rFqRiYXCjJxG4ggwD5jNKtJF60E9Y2ku6kWQC0n6+aJftGfAqcRakWCuUvr+q7w/1fv6jOwhpLCDQlBOlOo4daTchUjPKIc11YwURoWMyhI6hARGg3GR2dYpPjTLAfihNBRrP1N8TCRFKTYRnOgXRI7XsTcX/vE6s/XM3YUEUawjofJEfc6xDPI0AD5gEqvnEEEIlM7diOiKSUG2CypkQnOWXV0nzrOxUypVrk8YFmiOLTlABlZCDqqiGrlAdNRBFEt2jJ/Rs3VmP1ov1Om/NWIuZY/QH1tsPbneW3Q== ′i Tokens d<latexitsha1_base64="JWl9PS9tIYwHGoDSMVdUiKgIrpA=">AAAB8HicbVC7SgNBFL0bXzG+ooKNzWAQrMKuRbQMsbFMwDwkWcLs7CQZMrOzzMwGwpKvsLFQxFbwL/wCOxu/xcmj0MQDFw7n3Mu99wQxZ9q47peTWVvf2NzKbud2dvf2D/KHRw0tE0VonUguVSvAmnIW0bphhtNWrCgWAafNYHgz9ZsjqjST0Z0Zx9QXuB+xHiPYWOk+7KYjSXAw6eYLbtGdAa0Sb0EK5ZPaN3uvfFS7+c9OKEkiaGQIx1q3PTc2foqVYYTTSa6TaBpjMsR92rY0woJqP50dPEHnVglRTypbkUEz9fdEioXWYxHYToHNQC97U/E/r52Y3rWfsihODI3IfFEv4chINP0ehUxRYvjYEkwUs7ciMsAKE2MzytkQvOWXV0njsuiViqWaTaMCc2ThFM7gAjy4gjLcQhXqQEDAAzzBs6OcR+fFeZ23ZpzFzDH8gfP2AwFelEI= vocab x<latexitsha1_base64="oC/pNPh0fTE9FVUbjZ7SimfnYfo=">AAAB7HicbVC7SgNBFL3rM8ZXVLCxGQyCVdi1iJYhNpYJuEkgWcLsZDYZMjO7zMyKYck32FgoYmvnX/gFdjZ+i5NHoYkHLhzOuZd77wkTzrRx3S9nZXVtfWMzt5Xf3tnd2y8cHDZ0nCpCfRLzWLVCrClnkvqGGU5biaJYhJw2w+H1xG/eUaVZLG/NKKGBwH3JIkawsZJ/383YuFsouiV3CrRMvDkpVo7r3+y9+lHrFj47vZikgkpDONa67bmJCTKsDCOcjvOdVNMEkyHu07alEguqg2x67BidWaWHoljZkgZN1d8TGRZaj0RoOwU2A73oTcT/vHZqoqsgYzJJDZVktihKOTIxmnyOekxRYvjIEkwUs7ciMsAKE2PzydsQvMWXl0njouSVS+W6TaMKM+TgBE7hHDy4hArcQA18IMDgAZ7g2ZHOo/PivM5aV5z5zBH8gfP2Awenkow= ai x<latexitsha1_base64="fnwhUnUJAQfUoRQ7gQy2FY4BWys=">AAAB+HicbVC7TsMwFHXKq5RHA4wsViukTlXCQBkrWBiLRB9SGyLHdVqrthPZDqJE/ZIuDCDEyl+wsiH4GJy2A7Qc6UpH59xr33uCmFGlHefTyq2tb2xu5bcLO7t7+0X74LClokRi0sQRi2QnQIowKkhTU81IJ5YE8YCRdjC6zPz2HZGKRuJGj2PicTQQNKQYaSP5dvHeT+nkNu3FknIyKfh22ak6M8BV4i5IuV6qfH/V3qcN3/7o9SOccCI0ZkipruvE2kuR1BQz82AvUSRGeIQGpGuoQJwoL50tPoEnRunDMJKmhIYz9fdEirhSYx6YTo70UC17mfif1010eO6lVMSJJgLPPwoTBnUEsxRgn0qCNRsbgrCkZleIh0girE1WWQju8smrpHVadc+qZ9cmjQswRx4cgxKoABfUQB1cgQZoAgwSMAVP4Nl6sB6tF+t13pqzFjNH4A+stx8+K5c1 ′i / d<latexitsha1_base64="JWl9PS9tIYwHGoDSMVdUiKgIrpA=">AAAB8HicbVC7SgNBFL0bXzG+ooKNzWAQrMKuRbQMsbFMwDwkWcLs7CQZMrOzzMwGwpKvsLFQxFbwL/wCOxu/xcmj0MQDFw7n3Mu99wQxZ9q47peTWVvf2NzKbud2dvf2D/KHRw0tE0VonUguVSvAmnIW0bphhtNWrCgWAafNYHgz9ZsjqjST0Z0Zx9QXuB+xHiPYWOk+7KYjSXAw6eYLbtGdAa0Sb0EK5ZPaN3uvfFS7+c9OKEkiaGQIx1q3PTc2foqVYYTTSa6TaBpjMsR92rY0woJqP50dPEHnVglRTypbkUEz9fdEioXWYxHYToHNQC97U/E/r52Y3rWfsihODI3IfFEv4chINP0ehUxRYvjYEkwUs7ciMsAKE2MzytkQvOWXV0njsuiViqWaTaMCc2ThFM7gAjy4gjLcQhXqQEDAAzzBs6OcR+fFeZ23ZpzFzDH8gfP2AwFelEI= vocab ADD x0, x17, x26 | ADD x8, x17, x26 | | JAL x23, 13356 || JAL x27, 64308 | BLT x19, x1, 768 | | … 1 | BEQ x14, x31, 1656 … i LW x5, x1, -308 | LW x5, x1, 1618

Fig. 4. The gradient-based sequence update flow via transformer language model.

ACM Trans. Des. Autom. Electron. Syst.

( ( (

( ( (

(<latexitsha1_base64="hwEvS6s7EhEAAl74HbsI950M5gY=">AAAB8HicbVA9T8MwEL2Ur1K+CowsFi0SU5V0gbGChbFI9AM1UeW4TmrVdiLbQaqi/goWBhBi5eew8W9w2wzQ8qSTnt670929MOVMG9f9dkobm1vbO+Xdyt7+weFR9fikq5NMEdohCU9UP8SaciZpxzDDaT9VFIuQ0144uZ37vSeqNEvkg5mmNBA4lixiBBsrPdb9GxbHfl4fVmtuw10ArROvIDUo0B5Wv/xRQjJBpSEcaz3w3NQEOVaGEU5nFT/TNMVkgmM6sFRiQXWQLw6eoQurjFCUKFvSoIX6eyLHQuupCG2nwGasV725+J83yEx0HeRMppmhkiwXRRlHJkHz79GIKUoMn1qCiWL2VkTGWGFibEYVG4K3+vI66TYbntvw7pu1Vr2IowxncA6X4MEVtOAO2tABAgKe4RXeHOW8OO/Ox7K15BQzp/AHzucP02ePqA==<latexitsha1_base64="hwEvS6s7EhEAAl74HbsI950M5gY=">AAAB8HicbVA9T8MwEL2Ur1K+CowsFi0SU5V0gbGChbFI9AM1UeW4TmrVdiLbQaqi/goWBhBi5eew8W9w2wzQ8qSTnt670929MOVMG9f9dkobm1vbO+Xdyt7+weFR9fikq5NMEdohCU9UP8SaciZpxzDDaT9VFIuQ0144uZ37vSeqNEvkg5mmNBA4lixiBBsrPdb9GxbHfl4fVmtuw10ArROvIDUo0B5Wv/xRQjJBpSEcaz3w3NQEOVaGEU5nFT/TNMVkgmM6sFRiQXWQLw6eoQurjFCUKFvSoIX6eyLHQuupCG2nwGasV725+J83yEx0HeRMppmhkiwXRRlHJkHz79GIKUoMn1qCiWL2VkTGWGFibEYVG4K3+vI66TYbntvw7pu1Vr2IowxncA6X4MEVtOAO2tABAgKe4RXeHOW8OO/Ox7K15BQzp/AHzucP02ePqA==<latexitsha1_base64="hwEvS6s7EhEAAl74HbsI950M5gY=">AAAB8HicbVA9T8MwEL2Ur1K+CowsFi0SU5V0gbGChbFI9AM1UeW4TmrVdiLbQaqi/goWBhBi5eew8W9w2wzQ8qSTnt670929MOVMG9f9dkobm1vbO+Xdyt7+weFR9fikq5NMEdohCU9UP8SaciZpxzDDaT9VFIuQ0144uZ37vSeqNEvkg5mmNBA4lixiBBsrPdb9GxbHfl4fVmtuw10ArROvIDUo0B5Wv/xRQjJBpSEcaz3w3NQEOVaGEU5nFT/TNMVkgmM6sFRiQXWQLw6eoQurjFCUKFvSoIX6eyLHQuupCG2nwGasV725+J83yEx0HeRMppmhkiwXRRlHJkHz79GIKUoMn1qCiWL2VkTGWGFibEYVG4K3+vI66TYbntvw7pu1Vr2IowxncA6X4MEVtOAO2tABAgKe4RXeHOW8OO/Ox7K15BQzp/AHzucP02ePqA==<latexitsha1_base64="hwEvS6s7EhEAAl74HbsI950M5gY=">AAAB8HicbVA9T8MwEL2Ur1K+CowsFi0SU5V0gbGChbFI9AM1UeW4TmrVdiLbQaqi/goWBhBi5eew8W9w2wzQ8qSTnt670929MOVMG9f9dkobm1vbO+Xdyt7+weFR9fikq5NMEdohCU9UP8SaciZpxzDDaT9VFIuQ0144uZ37vSeqNEvkg5mmNBA4lixiBBsrPdb9GxbHfl4fVmtuw10ArROvIDUo0B5Wv/xRQjJBpSEcaz3w3NQEOVaGEU5nFT/TNMVkgmM6sFRiQXWQLw6eoQurjFCUKFvSoIX6eyLHQuupCG2nwGasV725+J83yEx0HeRMppmhkiwXRRlHJkHz79GIKUoMn1qCiWL2VkTGWGFibEYVG4K3+vI66TYbntvw7pu1Vr2IowxncA6X4MEVtOAO2tABAgKe4RXeHOW8OO/Ox7K15BQzp/AHzucP02ePqA==

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 15

5 EXPERIMENTS 5.1 Experimental Setings Dataset. The dataset comprises instruction-level test sequences generated by the open-source ISG known as FORCE-RISCV [12]. This generator adheres to the constraints outlined in the RISC-V ISA speciications. FORCE- RISCV employs a random selection process for instructions, registers, addresses, and immediate values to produce valid instructions that fully support the RISC-V ISAs, encompassing RV64G, as well as RV64 IMAFDC, U, S, and M privilege levels, traps, exceptions, and virtual memory systems featuring page sizes of 4 KB, 2 MB, 1 GB, and 512 GB. The instruction stream generator utilizes templates that encompass the instruction set, weights, address constraints, and relative conigurations to create executable iles in ELF format and disassembled text iles of the generated instructions, as depicted in Fig. 5. The test templates, implemented in Python, facilitate control over the instruction generation process by invoking APIs designed for test sequence generation to deine constraints and instruction types. These Python templates are designed to be user-friendly, allowing for easy determination of the sequence of generated instructions and the corresponding constraints. Moreover, the generation engine integrates interfaces with the instruction set simulator [35], which is capable of simulating the behavior of the generated instructions. The resultant output includes a standard ELF ile and a disassembled text ile. Model Conigurations. The transformer language model utilized within our DeepVeriier framework processes input sequences of up to 1024 tokens. It consists of 512 embeddings, 4 blocks of multi-head attention layers with 4 attention heads in each. The model is capable of generating outputs with a maximum length of 1024 tokens, which is subsequently followed by a four-layer feed-forward network. Both the encoder and decoder maintain hidden state dimensions of 512 while the intermediate size of the feed-forward network is set at 1024. The vocabulary encompasses a total of 1216 token types. Key parameters include the maximum input length, embedding dimensions, the number of blocks, the number of attention heads, hidden dimensions, and the number of layers within the feed-forward network. Training Conigurations. We utilize PyTorch version 2.4 .0, Transformer version 4.39.2, and Tokenizers version 0.15.2 to implement the model. The training process is conducted on a single Nvidia GeForce RTX 4090 GPU which has 24 GB of memory, alongside the Nvidia CUDA Toolkit version 12.4. The ine-tuned transformer model

Random Instruction Stream Generator Architecture Model G0;03<9:2; 5; =:;0 Python Test Registers Templates Template Instruction /01 23 4 Paser CS@ 56708 9:2; É Handler @FF001 J ?40 1 J ?4 @FF J ?4 @FF ?0F II:?0F: I:?0F Sequence Data Generator Generator S:1 > ?<923 @BC F 5H I I:?0F É Instruction Set Simulator Fig. 5. Dataset generation via random instruction stream generator. ACM Trans. Des. Autom. Electron. Syst. 16 • Y. Lu et al. Table 1. Descriptions of coverage metrics.

Coverage Descriptions Line Lines or statements were executed. Toggle Signals toggle from 0 to 1 and 1 to 0. Condition Both true and false states of conditions were covered. Branch If, case statements, the ternary operator (?:) branches of execution.

achieves convergence after training for 100 epochs, utilizing a batch size of 2, an initial learning rate of 5𤠀 − 5, and a weight decay of 0.01 as part of the AdamW optimizer. The learning rate is adjusted according to a cosine function, decreasing from the initial value to zero, with several hard restarts after a warm-up period characterized by a linear increase from zero to the initial value. To converge the coverage predictor, we train a neural network with 4 hidden layers, employing an initial learning rate of 1𤠀 − 3, a weight decay of 0.01 as part of the AdamW optimizer, a batch size of 32, and 100 epochs. The training and validation datasets are randomly shuled and split into a ratio of 80% for training and 20% for validation. The training process involves two main components, the pertaining phase of a language model and a down- stream task of the coverage predictor. For the ine-tuning phase using the BART-base model with 139M parameters of open-source Hugging Face [36], each epoch processes approximately 1500 sequences in a total time of 4 hours for the full 100 epochs. The relatively long training time can be attributed to the small batch size of 2, which was necessitated by GPU memory constraints. Despite these limitations, the model successfully converges to achieve optimal performance. The subsequent training of the coverage predictor is considerably faster, completing in approximately 40 minutes due to its simpler architecture and 4 batch size capability. Coverage Metrics. The coverage score serves as a fundamental evaluation criterion for assessing the eicacy of the proposed sequence update strategy utilizing the transformer model of test sequences. The primary aim is to enhance the coverage score of each test sequence. The deinitions of the coverage metrics commonly employed to evaluate the quality of input test sequences are presented in Table 1. By compiling and executing the ELF ile corresponding to each test sequence on a simulated processor platform, we can obtain coverage scores for the generated input test sequences. This simulation utilizes a modest coniguration of the Berkeley Out-of-Order Machine (BOOM) [37] and RocketChip derived from [38], facilitated by the Synopsys VCS [25] and Verilator [24] simulators.

5.2 Experimental Results We compare the proposed test sequence update methodology against three diferent approaches: random gen- eration, mutational fuzzing testing, and methods based on long short-term memory (LSTM). For the random generation, we utilize a template iteratively provided by FORCE-RISCV ISG, employing diverse random seeds to generate 1000 test sequences, each containing between 128 and 256 instructions of the RV64GC ISA. To implement the mutation method, we adopt the same strategy utilized in existing fuzzing test techniques for instructions, as referenced in prior works [20], and [18]. This involves modifying source registers, introducing random immediate values, and generating instructions from a pool of valid instruction candidates for each test sequence within a minute timeframe. To establish a conventional representation model for a test sequence, we utilize an LSTM-based token rep- resentation model [23]. Speciically, we construct a 4-hidden-layer LSTM model that mirrors the embedding dimensions of advanced transformer models, thereby enabling it to share the same coverage score predictor as the transformer model.

ACM Trans. Des. Autom. Electron. Syst.

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 17 Table 2. Average coverage and runtime of diferent test sequence generation methods.

Coverage/Runtime Random [12] Mutation [20] LSTM [23] DeepVeriier Line 76.753122 79.286422 78.791064 83.333333 Branch 71.879106 79.080824 77.777780 79.286422 Toggle 62.054130 66.666670 66.032580 68.977651 Runtime (s) 955 1066 150 176

20.0 DeepVerifier 15.0 Mutation LSTM 10.0 5.0 0.0 −5.0 Test Sequences IDs

Fig. 6. Average coverage improvement of DeepVerifer, LSTM body method, and mutation method over random generation.

In Table 2, we present the accumulation coverage scores and runtime performance across 1000 test sequences. Branch coverage implicitly includes condition coverage since taking both branches requires conditions to be evaluated to be both true and false. The coverage collector combines them as branch coverage. Our approach demonstrates average coverage improvements of 6.97%, 2.19%, and 3.00% compared to random generation, mutation fuzzing, and LSTM representation, respectively. Notably, while our method requires slightly more runtime than the LSTM model, it achieves this superior coverage in signiicantly less time than both random generation and mutation fuzzing, indicating better eiciency in reaching coverage targets. The runtime eiciency of DeepVeriier can be attributed to the ability of a transformer-based model to learn quickly and leverage patterns from existing test sequences and the gradient-based optimization process that eiciently guides sequence generation toward uncovered directions. This is particularly evident when compared to mutation fuzzing, which requires 5.6x more runtime while achieving lower coverage. The comparative analysis in Fig. 6 demonstrates the superior performance stability of DeepVeriier, maintaining an average improvement of 6 − 8% over random generation throughout the sequence range. This consistent performance advantage, combined with the favorable runtime characteristics, validates our transformer-based approach’s eiciency in achieving higher coverage with reasonable computational overhead. To provide deeper insights into the performance dynamics, we visualize the coverage improvements for each approach in Figs. 7 to 9. As shown in Fig. 7, our method maintains consistently positive improvements across test cases. This stable performance contrasts with mutation fuzzing (see Fig. 8), where some test sequences show declining coverage, likely due to imbalances in instruction types. The limitation of human-deined mu- tation strategies, which randomly modify operands and opcodes from a candidate pool, proves insuicient for maintaining diverse test patterns. The performance of LSTM model in Fig. 9 shows limitations in extracting meaningful patterns from assembly test sequences, with many sequences showing negative improvement compared to random generation. This

ACM Trans. Des. Autom. Electron. Syst.

Average Coverage Improvement (%)

100 200 300 400 500 600 700 800 900 1,000 18 • Y. Lu et al. suggests that the simpler sequential modeling of LSTM fails to capture the complex dependencies present in assembly code efectively. 20.0 Line 15.0 Branch Toggle 10.0 5.0 0.0 −5.0 Test Sequences IDs Fig. 7. Coverage improvement of our transformer method over random generation. 20.0 Line 15.0 Branch Toggle 10.0 5.0 0.0 −5.0 Test Sequences IDs Fig. 8. Coverage improvement of the mutation over random generation. 20.0 Line 15.0 Branch Toggle 10.0 5.0 0.0 −5.0 Test Sequences IDs Fig. 9. Coverage improvement of the LSTM body method over random generation. ACM Trans. Des. Autom. Electron. Syst.

Coverage Improvement (%) Coverage Improvement (%) Coverage Improvement (%)

100 200 300 400 500 600 700 800 900 1,000 100 200 300 400 500 600 700 800 900 1,000 100 200 300 400 500 600 700 800 900 1,000 DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 19

1,200.0 DeepVerifier 1,200.0 DeepVerifier 1,000.0 LSTM 1,000.0 LSTM 800.0 Mutation 800.0 Mutation 600.0 Random 600.0 Random 400.0 400.0 200.0 200.0 0.0 0.0

# of Test Cases # of Test Cases (a) (b)

Fig. 10. (a) Accumulated runtime of exercising test cases on RocketChip; (b) Accumulated runtime of exercising test cases on BOOM.

The success of our transformer-based approach stems from the sophisticated representation ability of the transformer language model, which excels in extracting contextual features from instruction-level test sequences, and accurately estimating correlations between input sequences and coverage scores. Furthermore, our approach rapidly evaluates sequence quality during early veriication stages and eiciently optimizes test sequence generation through gradient-based updates. To explore the scalability and generalizability of our approach, we conducted extensive experiments on two diferent RISC-V processor implementations, the in-order RocketChip and the out-of-order BOOM core. Fig. 10(a) illustrates the accumulated runtime trend for executing 1000 test cases on RocketChip, showing a near-linear growth pattern that suggests increasing complexity as more test cases are processed. In parallel, Fig. 10(b) presents the accumulated runtime of exercising 1000 test cases on the BOOM core, exhibiting a similar trend but with higher overhead due to BOOM’s more sophisticated out-of-order execution architecture. The comparative analysis between these two implementations reveals that while both processors show consistent scalability patterns, BOOM requires approximately 1.5x more runtime than RocketChip due to its complex microarchitectural features. This observation validates our approach’s generalizability across diferent processor architectures while highlighting the impact of architectural complexity on veriication overhead.

5.3 Ablation Study The primary objective of ablation analysis is to demonstrate the robustness of our proposed strategy and to conirm the proper functionality of each component. We assess the performance of components that incorporate the semantic representation capabilities of a transformer model, as well as the accuracy of the coverage score predictor. Additionally, we compare the transformer model with LSTM networks to validate its superior capabilities in semantic representations. The evaluation indings will elucidate the novelty and efectiveness of our test sequence generation and updates, which are aimed at enhancing the quality of exercises for processor veriication. In Fig. 11, the variation in loss during the training phase of the transformer model and the downstream coverage predictor is illustrated. Fig. 11(a) depicts the trajectory of the cross-entropy loss for the transformer language

ACM Trans. Des. Autom. Electron. Syst.

Accumulated Runtime (s)

Accumulated Runtime (s)

0 200 400 600 800 1, 000 0 200 400 600 800 1, 000

20 • Y. Lu et al.

model, which shows a decreasing trend that approaches a plateau as training epochs progress. This downward trend suggests that the model is efectively learning from the existing data distribution and demonstrating enhanced capacity to minimize the divergence between the learned distribution and the target distribution. The low loss values observed for both training and validation datasets indicate that the transformer model is proicient in representing and extracting semantic information from the input sequences. By utilizing the embedded hidden state vectors from the transformer model, the coverage predictor associates these represented sequences with their corresponding coverage scores. Fig. 11(b) demonstrates that the MSE of the coverage predictor declines over time throughout both the training and validation phases. This reduction signiies an improvement in the accuracy of predicting coverage scores for test sequences as the number of training epochs increases. To further substantiate the predictive capabilities of the transformer model in the downstream coverage prediction task, Fig. 12 illustrates the deviation of coverage scores generated by the transformer model with the ground truth scores provided by a simulator. On average, the deviation error remains below 1% for each coverage metric assessed. The training metrics detailed in Fig. 11, along with the prediction accuracy demonstrated in Fig. 12, ofer compelling evidence that the transformer method efectively represents test sequences. By deriving embedded vectors for these sequences, the downstream coverage predictor accurately characterizes the correlation between sequence representations and their corresponding coverage scores. Additionally, while LSTM models also generate embedded representations of sequences, Fig. 12(d) presents a comparative performance analysis between the transformer model and LSTM. Through the comprehensive visualizations associated with the training and validation processes of the trans- former model, as well as the evaluation of coverage score predictions, we aim to demonstrate the eicacy of the systematic training methodologies employed, as evidenced by precise loss rates. Furthermore, this analysis collectively substantiates the superiority of the transformer model in comparison to traditional LSTM language representation techniques and random test generation approaches. The transformer model exhibits reduced prediction errors and enhanced coverage scores, which are instrumental in optimizing test sequence generation for processor veriication.

100.0 Training 100.0 Training 80.0 Validation 80.0 Validation 60.0 60.0 40.0 40.0 20.0 20.0 0.0 0 20 40 60 80 100 0.0 0 20 40 60 80 100 Training Epochs Training Epochs (a) (b)

Fig. 11. (a) Training loss of the transformer language model; (b) Training loss of coverage prediction model. ACM Trans. Des. Autom. Electron. Syst.

Sequence Modeling Loss (%)

Coverage PredictionLoss (%)

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 21

GroundTruth GroundTruth Prediction Prediction

Test Sequences IDs Test Sequences IDs (a) (b)

GroundTruth LSTM Model Prediction Transformer Model

Test Sequences IDs Test Sequences IDs (c) (d) Fig. 12. (a) ∼ (c) Deviation of predicted line, branch, and toggle coverage with the ground truth in the test dataset. (d) Comparison of predicted error between transformer and LSTM model body. 6 Conclusion Veriication eforts are heavily based on the quality and diversity of test sequences, which are essential for efective veriication practices. The coverage-directed random generation method produces various test sequences; however, it requires human input to create test templates and conigure architectures. In addition, achieving coverage with simulators can be a time-consuming process. To harness the capabilities of language models for semantic representation and eicient sequence generation, we propose a coverage-directed test generation strategy that employs a transformer model and a coverage predictor based on randomly generated test sequences. This strategy can integrate semantic representations of instructions, quickly align coverage predictions to illustrate the relationship between sequences and coverage, and ultimately enhance sequences for improved coverage. While our current work demonstrates efectiveness on the RISC-V processor implementation, we acknowledge several important directions for future research, to enhance the scalability to evaluate DeepVeriier on larger ACM Trans. Des. Autom. Electron. Syst. Toggle Coverage (%) Line Coverage (%)

Coverage Prediction Error(%) Branch Coverage (%)

100 200 300 400 500 600 700 800 900 1,000 100 200 300 400 500 600 700 800 900 1,000 100 200 300 400 500 600 700 800 900 1,000 100 200 300 400 500 600 700 800 900 1,000

22 • Y. Lu et al.

conigurations with more complex out-of-order execution features, we need to investigate performance on designs with advanced features and handle increased state space in larger processor designs. In addition to exploring the generalizability of DeepVeriier, we need to extend to support other designs and adapt the instruction embedding and tokenization schemes for diferent ISAs. Create a more generic coverage modeling approach that can accommodate diverse architectural features, and the current implementation can serve as a proof-of-concept, and extensions would help establish DeepVeriier as a more comprehensive veriication solution for modern

processor designs.

References [1] Andrew Waterman, Yunsup Lee, David Patterson, and Krste Asanovic. 2016. The RISC-V Instruction Set Manual, Volume I: User-Level ISA Version 2.1. ECS Department, UC Berkeley, Technical Report UCB/EECS-2016-118 (May 2016). [2] Andrew Waterman, Yunsup Lee, Rimas Avižienis, David A Patterson, and Krste Asanovic. 2016. The RISC-V Instruction Set Manual, Volume II: Privileged Architecture Version 1.9. ECS Department, UC Berkeley, Technical Report UCB/EECS-2016-129 (July 2016). [3] Harry Foster. 2020. Wilson Research Group Functional Veriication Study. https://resources.sw.siemens.com/en-US/white-paper-2020- wilson-research-group-functional-veriication-study-ic-asic-fucntional-veriication-trend-report. (2020). [4] Jun Yuan, Carl Pixley, Adnan Aziz, and Ken Albin. 2003. A Framework for Constrained Functional Veriication. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 142ś145. [5] Ilya Wagner, Valeria Bertacco, and Todd Austin. 2005. Stresstest: An Automatic Approach to Test Generation via Activity Monitors. In ACM/IEEE Design Automation Conference (DAC). 783ś788. [6] Nathan Kitchen and Andreas Kuehlmann. 2007. Stimulus Generation for Constrained Tandom Simulation. In ACM/IEEE Design Automation Conference (DAC). 258ś265. [7] Rajdeep Mukherjee, Daniel Kroening, and Tom Melham. 2015. Hardware Veriication Using Software Analyzers. In IEEE Annual Symposium on VLSI (ISVLSI). 7ś12. [8] Jeyavijayan Rajendran, Vivekananda Vedula, and Ramesh Karri. 2015. Detecting Malicious Modiications of Data in Third-party Intellectual Property Cores. In ACM/IEEE Design Automation Conference (DAC). 1ś6. [9] Yanhong Zhou, Tiancheng Wang, Huawei Li, Tao Lv, and Xiaowei Li. 2015. Functional Test Generation for Hard-to-reach States Using Path Constraint Solving. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) 35, 6 (2015), 999ś1011. [10] Rui Zhang, Calvin Deutschbein, Peng Huang, and Cynthia Sturton. 2018. End-to-end Automated Exploit Generation for Validating The Security of Processor Designs. In IEEE/ACM International Symposium on Microarchitecture (MICRO). 815ś827. [11] Nursultan Kabylkas, Tommy Thorn, Shreesha Srinath, Polychronis Xekalakis, and Jose Renau. 2021. Efective Processor Veriication with Logic Fuzzer Enhanced Co-simulation. In IEEE/ACM International Symposium on Microarchitecture (MICRO). 667ś678. [12] Inc. Futurewei Technologies. 2024. FORCE-RISCV: An Instruction Sequence Generator (ISG) for The RISC-V Instruction Set Architecture. https://github.com/openhwgroup/force-riscv. (2024). [13] Google. 2020. RISCV-DV: An SV/UVM-based Open-source Instruction Generator for RISC-V Processor Veriication. https://github.com/ chipsalliance/riscv-dv. (2020). [14] Neel Gala and Marc Karasek. 2020. RISC-V Torture Test Generator. https://github.com/ucb-bar/riscv-torture. (2020). [15] Yangdi Lyu and Prabhat Mishra. 2020. Automated Test Generation for Activation of Assertions in RTL Models. In IEEE/ACM Asia and South Paciic Design Automation Conference (ASPDAC). 223ś228. [16] Rahul Kande, Addison Crump, Garrett Persyn, Patrick Jauernig, Ahmad-Reza Sadeghi, Aakash Tyagi, and Jeyavijayan Rajendran. 2022. heHuzz: Instruction fuzzing of processors using Golden-Reference models for inding Software-Exploitable vulnerabilities. In USENIX Security Symposium. 3219ś3236. [17] Sadullah Canakci, Leila Delshadtehrani, Furkan Eris, Michael Bedford Taylor, Manuel Egele, and Ajay Joshi. 2021. DirectFuzz: Automated Test Generation for RTL Designs Using Directed Graybox Fuzzing. In ACM/IEEE Design Automation Conference (DAC). 529ś534. [18] Vladimir Herdt, Sören Tempel, Daniel Große, and Rolf Drechsler. 2021. Mutation-based Compliance Testing for RISC-V. In IEEE/ACM Asia and South Paciic Design Automation Conference (ASPDAC). 55ś60. [19] Jaewon Hur, Suhwan Song, Dongup Kwon, Eunjin Baek, Jangwoo Kim, and Byoungyoung Lee. 2021. DifuzzRTL: Diferential Fuzz Testing to Find CPU Bugs. In IEEE Symposium on Security and Privacy (SP). 1286ś1303. [20] Kevin Laeufer, Jack Koenig, Donggyu Kim, Jonathan Bachrach, and Koushik Sen. 2018. RFUZZ: Coverage-directed Fuzz Testing of RTL on FPGAs. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1ś8. [21] Shai Fine and Avi Ziv. 2003. Coverage Directed Test Generation for Functional Veriication using Bayesian Networks. In ACM/IEEE Design Automation Conference (DAC). 286ś291. [22] Fanchao Wang, Hanbin Zhu, Pranjay Popli, Yao Xiao, Paul Bodgan, and Shahin Nazarian. 2018. Accelerating Coverage Directed Test Generation for Functional Veriication: A Neural Network-Based Framework. In ACM Great Lakes Symposium on VLSI (GLSVLSI).

ACM Trans. Des. Autom. Electron. Syst.

DeepVerifier: Learning to Update Test Sequences for Coverage-Guided Verification • 23 207ś212.

[23] Shobha Vasudevan, Wenjie Joe Jiang, David Bieber, Rishabh Singh, C Richard Ho, Charles Sutton, and oThers. 2021. Learning Semantic Representations to Verify Hardware Designs. Annual Conference on Neural Information Processing Systems (NIPS) 34 (2021), 23491ś23504. [24] CHIPS Alliance under The Linux Foundation. 2024. Verilator: The Fastest Verilog/SystemVerilog Simulator. https://github.com/verilator/ verilator. (2024). [25] Inc. Synopsys. 2024. Synopsys VCS® Functional Veriication Solution. https://www.synopsys.com/veriication/simulation/vcs.html. (2024). [26] Walker Anderson. 1992. Logical Veriication of The NVAX CPU Chip Design. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 306ś309. [27] David A Wood, Garth A Gibson, and Randy H Katz. 1990. Verifying a Multiprocessor Cache Controller Using Random Test Generation. European Design and Test Conference 7, 4 (1990), 13ś25. [28] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Empirical Methods in Natural Language Processing (EMNLP). 38ś45. [29] Xuezixiang Li, Yu Qu, and Heng Yin. 2021. Palmtree: Learning An Assembly Language Model for Instruction Embedding. In ACM Conference on Computer and Communications Security (CCS). 3236ś3251. [30] Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 4171ś4186. [31] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, and oThers. 2018. Improving Language Understanding by Generative Pre-training. In OpenAI. [32] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A Robustly Optimized Bert Pretraining Approach. In arXiv preprint. arXiv:1907.11692. [33] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2020. Bart: Denoising Sequence-to-sequence Ore-training for Natural Language Generation, Translation, and Comprehension. In Annual Meeting of the Association for Computational Linguistics (ACL). 7871ś7880. [34] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. Annual Conference on Neural Information Processing Systems (NIPS) 30 (2017). [35] RISC-V Software. 2019. Spike RISC-V ISA Simulator. https://github.com/riscv-software-src/riscv-isa-sim. (2019). [36] Facebook. BART(base-size model). https://huggingface.co/facebook/bart-base. ([n. d.]). [37] Christopher Celio, David A Patterson, and Krste Asanovic. 2015. The Berkeley Out-of-order Machine (BOOM): An Industry-competitive, Synthesizable, Parameterized RISC-V Processor. ECS Department, UC Berkeley, Technical Report UCB/EECS-2015-167 (June 2015). [38] Asanovic Krste, Avizienis Rimas, Bachrach Jonathan, Beamer Scott, Biancolin David, Celio Christopher, Cook Henry, Dabbelt Palmer, Hauser John, Izraelevitz Adam, Karandikar Sagar, Keller Benjamin, Kim Donggyu, Koenig John, Lee Yunsup, Love Eric, Maas Martin, Magyar Albert, Mao Howard, Moreto Miquel, Ou Albert, Patterson David, Richards Brian, Schmidt Colin, Twigg Stephen, Vo Huy, and Waterman Andrew. 2016. The Rocket Chip Generator. ECS Department, UC Berkeley, Technical Report UCB/EECS-2016-17 (April 2016).

Received 7 November 2024; revised 21 January 2025; accepted 17 February 2025

ACM Trans. Des. Autom. Electron. Syst.