Source 53f1a998... — STIMSMITH

SOURCE ARCHIVE

SHA256: 53f1a998ba53d91548b0e7d7c2b23f7aba29c61812f3bdfbbfcc5d6e2fcaadf9

URL: https://upcommons.upc.edu/server/api/core/bitstreams/6b9eae74-88d3-45d4-a145-cc2d63edc522/content

TYPE: application/pdf

SIZE: 1517.1 KB

FETCHED: 6/1/2026, 10:08:09 AM

EXTRACTOR: liteparse

CHARS: 39,371


Functional Verification of a RISC-V Vector Accelerator

Victor Jimenez∗, Mario Rodriguez†, Marc Dominguez†, Josep Sans∗, Ivan Diaz‡, Luca Valente§, Vito Luca Guglielmi¶, Josue Quiroga‡, Roberto Genovese‡, Nehir Sonmez‡, Oscar Palomar‡ and Miquel Moreto‡

∗name.surname@semidynamics.com
†name.surname@codasip.com
‡name.surname@bsc.es
§name.surname@unibo.it
¶vitoluca95guglielmi@gmail.com

                                             Validation Team
                                       Computer Sciences Department
                                     Barcelona Supercomputing Center

Abstract—We present the functional verification efforts for an academic RISC-V based vector accelerator, successfully taped out in the context of the European Processor Initiative. For our novel RISC-V based decoupled vector accelerator, we built a verification infrastructure consisting of a UVM environment, performing step-by-step co-simulation of all vector instructions, using the Spike instruction set simulator as a reference model. Furthermore, for validating this complex design connected to a scalar core using a custom interface, we provided automated constrained-random test generation, simulation and error reporting, and CI/CD infrastructure. We found 3005 errors during this process and reached 95.79% functional coverage.

Index Terms—verification, RISC-V, vector accelerator, UVM, coverage, random binary generation

1. INTRODUCTION

Many open source and research hardware projects have emerged in the past decade, in which the main objective was to tape out an entire processing system [8, 9, 4]. To this end, a significant effort in design verification must be made in order to avoid fabricating a prone-to-fail design. However, academic designs are typically not verified at an industrial grade, often due to a lack of resources and experience, and different needs than the industry. Meanwhile, open-source ISAs such as RISC-V favor collaboration between research and industrial entities, also providing independence from non-European computing technologies.

The European Processor Initiative (EPI)1 is a project that embraces this idea, being conceived to create the first European processor and accelerators. Many partners are involved in its development. For example, BSC developed the Vector Accelerator that will be directly connected to a scalar RISC-V core designed by SemiDynamics, while the top-level integration of the test chip is done by EXTOLL and the tape out is coordinated by Fraunhofer.

RISC-V is an open-source Instruction Set Architecture (ISA) [13], which, among others, has a vector extension, currently in version 1.0 [3]. This extension includes the vectorized version of many arithmetic, logical and memory instructions, along with vector-specific instructions such as reductions, and scatter and gather operations. Additionally, the RISC-V Vector extension (RVV) is vector length agnostic and supports different element widths via dedicated configuration registers. Our main goal in this project was to functionally verify our novel, decoupled vector accelerator, which implemented version 0.7.1 of the RVV and was connected to the scalar processor core via the Open Vector Interface (OVI) [12].

With RISC-V, many groups have appeared in the open source world contributing to the community with their projects. Groups like OpenHW have designed and developed verification environments [10] for many designs, such as the Parallel Ultra Low Power (PULP) designs RI5CY, Ariane and Ibex.

The main contributions of this paper are:
• Description of an industrial grade verification approach, with UVM testbench, reference model, assertions and coverage, for a modern RISC-V vector accelerator.
• Implementation of a common UVM testbench for a novel interface design and a large-scale RTL project.
• Result comparison of each completed vector instruction against the reference model via co-simulation of constrained-random binaries and C programs.
• Automated testing and regression infrastructure to reach high levels of functional/code coverage, of up to 95.79%.

This work has been done by the authors in the time they were affiliated to Barcelona Supercomputing Center.
1 https://www.european-processor-initiative.eu/accelerator/

2. BACKGROUND

The Vector Processing Unit (VPU) is based on ISA Vector extension 0.7.1v [2] and has eight vector lanes, supporting large vectors of up to a maximum vector length of 256 elements of 64 bits each (16 Kb total). It has 32 logical and 40 physical vector registers. Each lane has one Fused Multiply Accumulate (FMA) unit capable of calculating two double-precision operations per cycle, for a total maximum throughput of 16 DFlops/cycle. It supports 64 and 32-bit floating-point vector operations, as well as 64, 32, 16 and 8-bit integer vector operations. Memory operations have limited out of order capability, mostly between arithmetic and memory operations.

The VPU has all eight vector lanes connected to the memory operation units, the inter-lane ring and the instruction queues, which serialize the instructions arriving from the scalar core to which the VPU is connected (Figure 1). The scalar core is in charge of executing scalar instructions and sending the vector instructions to the VPU. Memory accesses for the vector memory operations are also performed by the core, through OVI.
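The headline figures quoted in the Background section follow from simple arithmetic, which the short Python sketch below reproduces. The lane count, operations per FMA, vector length, element width and physical register count are taken from the text; the register file size in KiB is an illustrative derivation that assumes every physical register stores one full-length vector, which the paper does not state explicitly.

```python
# Parameters quoted in the Background section.
LANES = 8                 # vector lanes
FLOPS_PER_FMA = 2         # one FMA counts as a multiply plus an add
MAX_VECTOR_LENGTH = 256   # elements
ELEMENT_BITS = 64         # bits per element
PHYSICAL_VREGS = 40       # physical vector registers

# Peak double-precision throughput per cycle: one FMA unit per lane.
peak_dflops_per_cycle = LANES * FLOPS_PER_FMA

# Size of one full-length vector register, in bits and in Kb.
vreg_bits = MAX_VECTOR_LENGTH * ELEMENT_BITS
vreg_kbits = vreg_bits // 1024

# Assumption (not stated in the paper): each physical register holds a
# full-length vector, so the register file stores PHYSICAL_VREGS of them.
vrf_kib = PHYSICAL_VREGS * vreg_bits // 8 // 1024

print(peak_dflops_per_cycle, vreg_kbits, vrf_kib)
```

Under that assumption the physical vector register file would hold 80 KiB of data; the 16 DFlops/cycle and 16 Kb values match the ones given in the text.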

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/ republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI 10.1109/MDAT.2022.3226709

This article has been accepted for publication in IEEE Design & Test. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/MDAT.2022.3226709

[Figure 1: block diagram of the scalar core and the VPU]
Fig. 1. VPU is connected to the Scalar Core through OVI.

OVI contains the following sub-interfaces:
• ISSUE: through which the core sends the request along with the instruction, configuration values and scalar input.
• DISPATCH: all issued instructions are either confirmed or killed. This enables speculative issue of vector instructions.
• COMPLETED: through which the VPU notifies that the instruction has been completed, together with metadata and scalar output.
• MEMOP: start and finish signals of a memory operation are sent using this interface.
• LOAD: used by the core to send the load data and metadata.
• STORE: VPU sends the data of store operations to the core through this interface.
• MASK-INDEX: through which the VPU sends the vector content to generate the addresses of masked and indexed memory instructions.

At each vector instruction, many sub-interfaces must be considered given that some of them can change the instruction's behavior or should be looked at to retrieve results. ISSUE, STORE and MASK-INDEX interfaces use a credit system for handshaking between the VPU and the scalar core.

The VPU was developed in collaboration with another partner, UniZagreb, which was in charge of developing the Floating Point Unit submodule and its verification.

3. DESIGN VERIFICATION METHODOLOGY

We constructed a set of tools and utilities around the design under test (DUT) that facilitated the detection of errors. The tools we developed had to be easy to share with partners as some verification efforts are shared and reusable for the next-generation designs. To meet these requirements, we used the Universal Verification Methodology (UVM) [1], which is built under the premises of creating a modular, scalable and reusable verification environment.

At first, we considered verifying individually each VPU submodule, stimulating them with constrained-random techniques with several UVM environments. As this approach implied an unbearable amount of effort for our team and the final specifications were not ready for all the submodules, we decided to focus at the interface level (OVI) that already had well-defined specifications.

Once we built the UVM that drives instructions to the DUT, we needed a way to evaluate the results of the VPU. For that purpose, we used a UVM scoreboard that compares these results with the ones from the reference model, a software that predicts how the design should behave based on the inputs. Our reference model accepts instructions as an input and generates the expected results. We decided to use the RISC-V ISA simulator Spike [11] for co-simulation in our UVM environment.

Even if detecting a mismatch in the result of an instruction is crucial for our job, it may not point out the cause of the error. We also added SystemVerilog assertions to improve observability.

4. DESIGN VERIFICATION INFRASTRUCTURE

[Figure 2: block diagram; recoverable labels: RISCV-DV, Test harness, Bare metal]
Fig. 2. Verification Environment Overview

The verification environment we implemented is shown in Figure 2 and explained in the following sub-sections.

A. UVM

Our environment is composed of the UVM top module, which instantiates the UVM environment. As we have different semi-independent sub-interfaces, we created one agent for each specific sub-interface. For example, at the issue sub-interface, there is an agent, which contains a sequencer, a driver and a monitor connected to the virtual interface.

Each virtual sequence creates interface-specific transactions that are sent to the corresponding interface. When the driver gets the transaction, it stimulates the corresponding sub-interface with the incoming transaction values. As the virtual sequence does not know when the transaction is driven, we also have a specific monitor that captures the interface state and sends it back to the virtual sequence through the sequencer. The corresponding virtual sequence gets the transaction and reacts to it, producing a new stimulus [7].

All seven sub-interfaces are unique in the environment and constantly communicate with each other (e.g., masked to load and issue to dispatch sub-interfaces). To keep them in synchronization, we use UVM events, which are capable of transmitting





data along with the event trigger. This feature eased the virtual sequences inter-communication.

The high degree of dependence between the sub-interfaces complicated the constrained-random stimulus generation. Therefore, we decided to randomize only the instructions fed to the issue sub-interface and made all the other sub-interfaces react according to the instructions driven.

B. RISCV-DV

RISCV-DV [6] is a SystemVerilog/UVM based open-source instruction generator for RISC-V processor verification developed by Google. RISCV-DV generates random RISC-V assembly tests, which we used to provide vector instructions to test the VPU. RISCV-DV implemented a later RVV than 0.7.1, so we developed and adapted the parts we needed to fit.

The major additions we did to RISCV-DV were:
• Generation of vsetvli instructions through the code and modification of the generation of memory operations to allow the change of element width and vector length.
• An option to select the initialization pattern of the data pages.
• Constraining of the memory addresses accessed by the test to avoid memory exceptions, especially for vector memory indexed instructions.
• Adapting to the 0.7.1 RVV.

Additionally, since some of the design modules were in development for most of the verification process, we had to initially blacklist many of the instructions from our generated tests, to get functional tests at each iteration. When a significant number of errors were fixed, we gradually removed instructions from the blacklist, until all implemented instructions were enabled.

C. Spike

In our environment, Spike has two main roles: 1) As a scalar core, executing scalar instructions and providing the vector ones to the UVM in program order, and 2) As a golden/reference model to check the correctness of the DUT results.

To fulfill these two functions, we performed several modifications to Spike:
• Definition of functions to call Spike in SystemVerilog using Direct Programming Interface (DPI).
• Creation of a method that resumes the simulation until a vector instruction is executed; the reference results are then returned to the UVM to compare against VPU results.
• Functions to read from Spike's memory.
• A function to force the result of reductions into Spike to avoid execution divergence in unordered floating-point reductions.

When a vector instruction is found, Spike provides the instruction, the results and other relevant data to the UVM. The instruction is then packed as a transaction and sent to the issue agent. It arrives at the VPU, it is executed, and the reference model results are compared to those generated by the VPU.

Also, some changes were done to accommodate Spike to the 0.7.1 RISC-V vector specification:
• The implementation of the vector tail zeroing, replaced by a different policy after version 0.7.1.
• Instruction decoding to follow the 0.7.1 specification.
• The requirements of Vector Context Status (VCS) fields in mstatus.

Once the instruction is fed using the issue agent, the UVM follows the VPU instruction execution flow. This involves the stimulation/observation of two interfaces: a) DISPATCH, where we must confirm or discard the execution of each instruction, sending this information in instruction order, and b) COMPLETED, where at the end of execution of a confirmed instruction the completed monitor will observe a flag being set and will create a transaction.

With this UVM setup, we could run simple instructions, which helped in the first stages of the verification process to see that our design was not stalling. However, this does not assert that the result is correct or that the execution of the instruction went well. So, we introduced a UVM component that checks the correctness of the results, the scoreboard.

D. Scoreboard

The scoreboard is connected to the completed monitor and, when an instruction finishes, a method that compares both results is executed. The issue monitor would send the transaction to the scoreboard and the reference model, but we directly take the information coming from Spike, as we need it to feed the instruction to the VPU.

The most interesting results from instruction execution are not always seen as outputs of the VPU. In OVI, the COMPLETED sub-interface includes a scalar output and some flags. In the general case, the VPU will write the result of the instruction in one physical vector register. At instruction completion time, the registers are accessed to get the result of the instruction.

To check whether these results are correct or not, we include the destination vector register value in the information that we extract from Spike.

One particular case that we found is with reduction instructions, more specifically, the floating-point ones. The VPU uses a different reduction algorithm than Spike, which is allowed by the RVV specification. This situation caused two problems. To begin with, we sometimes got a mismatch when executing these instructions even though they were actually correct according to the rounding mode and algorithm used. The second problem was that, even if we knew that the mismatch had been a false positive, the result remained wrong in the Spike vector registers. These values could later be used in other instructions and cause mismatches even if the instruction was executed correctly. We have created an independent reference model in C for the unordered reductions that implements the same exact reduction algorithm as the DUT. The VPU result is compared in these cases against the reduction reference model instead of Spike, and if there is a match, the value is injected into the register in Spike.
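The reduction issue described above comes from floating-point addition not being associative: two correct implementations that add the same elements in a different order can round to different results. The following Python sketch illustrates the effect and the check-then-inject flow. It is only an illustration under stated assumptions: the pairwise tree in `tree_sum` is a hypothetical stand-in for the DUT algorithm (the paper does not describe it), the authors' reference model is written in C, and `spike_regs` is a plain dictionary standing in for the Spike vector registers.

```python
def ordered_sum(values):
    """Strictly sequential accumulation (stand-in for the simulator's order)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def tree_sum(values):
    """Pairwise tree reduction (hypothetical stand-in for the DUT's order)."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def check_unordered_reduction(dut_value, elements, spike_regs, vd):
    """Compare the DUT against the matching-order model and, on a match,
    overwrite the reference register so later instructions stay in sync."""
    if dut_value == tree_sum(elements):
        spike_regs[vd] = dut_value
        return True
    return False


values = [1e16, 1.0, -1e16, 1.0]
spike_regs = {3: ordered_sum(values)}   # what a sequential model would hold
dut_value = tree_sum(values)            # what a tree-based DUT would produce

print(ordered_sum(values), tree_sum(values))
print(check_unordered_reduction(dut_value, values, spike_regs, 3), spike_regs[3])
```

With these inputs the sequential order yields 1.0 and the tree order yields 0.0, so a direct comparison would report a false mismatch, while the check against the matching-order model passes and updates the reference register.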





E. Memory Operations

Memory operations are one of the most delicate parts of our design. The VPU does not have direct access to memory, so it reads and writes data through the scalar core using the memop, load, store and mask interfaces, which require plenty of inter sub-interface communication.

For load operations, we need the data inside memory before the instruction is executed. Therefore, along with the rest of the instruction information from Spike, we get the data that should be read. This data is written in a memory model, based on one from the OpenTitan project [9], which is accessed in the corresponding addresses, and the Spike data is sent through the load sub-interface of the VPU. For store operations, we need the data in memory before the instruction executes to check masked operations, to detect undesired writes. In a store operation, the VPU sends the data, which is stored in the memory model. Later on, these values will be read and compared with the Spike ones.

Masked memory operations involve particular outgoing transactions from the VPU, which sends the masks or indexes. These are needed to execute the instruction on the environment side and compare with the ones in Spike. This comparison helped to detect the origin of the issue in many of the memory instruction errors that were mask related.

One interesting case that OVI presents is what we call retries. Retries occur when the VPU cannot handle all the loaded cache lines the scalar core sent. If this happens, the instruction will complete specifying a vstart value, representing the first element that it could not write in the vector registers, indicating that the instruction did not finish and that it must be re-executed. This implies killing the instructions after the retrying one and re-issuing the instruction starting from the element corresponding to the stored vstart value. Additionally, memory exceptions for multiple in-flight loads and stores were randomized, causing failing instructions to be killed. An added complexity was calculating the correct vstart value, deciding between the lowest received index, mask or data chunk. Retries implied plenty of changes and were one of the primary sources of errors in the VPU. We wanted to have the possibility to increase the chance of causing retries to the VPU, which was randomized using UVM configuration objects.

F. Assertions

One of the critical points of the VPU is the interface, so we decided to run down the OVI specifications and write SystemVerilog Assertions (SVA) [5] that check that it is behaving as expected, implementing more than 50. At the early stages of the UVM testbench development, they helped to identify bugs in the VPU, as well as problems in the UVM stimulation. Most of the asserted properties were targeted to the memory-related sub-interfaces and ensured that the OVI specifications were strictly followed at any point of the project.

G. Coverage

We defined and implemented a functional coverage plan. We mostly checked things that could be directly observed in the VPU interface, like instructions, execution parameters and values in the memory sub-interfaces. This way, we could ensure that we executed all the instructions in all the possible ways in the Accelerator. Furthermore, we gathered coverage metrics of certain internal modules.

For instructions coverage and their different formats in Spec 0.7.1v [2] (Vector Length, Standard Element Width, Rounding Modes, Masks, etc.) we developed a set of ISA tests that quickly tested the key configurations, in parallel with RISCV-DV random tests for further stress. We also implemented the functional coverage for testing diverse load and store scenarios, not only for Vector Length, but also considering the organization of the register file. Loads and their possible retries, as explained in Section E, with different vstart values were covered by directed tests that were added to the regression suite in order to check that these scenarios were still valid on every RTL change.

In addition to functional coverage, we recorded assertions usage (active/passed) and code coverage of the simulations run by continuous integration, which was used to generate and run tests, and to collect coverage metrics.

H. CI Infrastructure

Our Continuous Integration (CI) infrastructure is built using the open-source CI server Jenkins, where we created a set of pipelines that interact with each other to have the most error-free design possible.
We implemented the following pipelines:
1) New tests: Generates random tests with RISCV-DV, compiles the DUT, executes the binaries and does a classification between passed and failed tests, separated later into two directories. The first ones are used to create a regression set and the others are kept for debugging and checking until the error is fixed.
2) Retry: For each change in the main branch of the DUT repository, the set of failed tests is re-executed and classified again into passed and failed.
3) Selection: Every day at midnight, if the number of tests classified as passed is above a certain threshold, tests are ranked by the collected coverage and we create two sets of regressions, a large one and a small one.
4) Regressions: When there is a change in the DUT which is a candidate to be merged, we execute the small set of regressions to check the correctness of these changes. Also, once per week, the large set is executed to ensure that recent changes do not break known-good tests.

We used GitLab for version control and as a way to track down issues in our environment. We also documented the code and all these features so anyone in the project could run a simulation, and added this documentation as guides and tutorials using the Wiki feature from GitLab.

5. EXPERIMENTAL RESULTS

This environment was in use for about a year. In this time, we have managed to find many errors and provide helpful information to the RTL team to debug these. Apart from that, we have provided CI pipelines that allowed both teams (RTL design and verification) to test new features and find new errors.
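As an illustration of the Selection pipeline from Section 4.H, the Python sketch below ranks passing tests by the coverage they add and derives a small and a large regression set. The paper only states that tests are ranked by collected coverage once enough of them pass; the greedy marginal-coverage ranking, the function names, the test names and the coverage bin names used here are all hypothetical choices made for the example.

```python
def rank_by_new_coverage(tests):
    """Order tests greedily by how many not-yet-covered bins each one adds.

    `tests` maps a test name to the set of coverage bins it hit.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    remaining = dict(tests)
    covered = set()
    ranked = []
    while remaining:
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        ranked.append(best)
        covered |= remaining.pop(best)
    return ranked


def select_regressions(passed_tests, threshold, small_size):
    """Build the two regression sets only when enough tests have passed."""
    if len(passed_tests) <= threshold:
        return None                      # not enough passing tests yet
    ranked = rank_by_new_coverage(passed_tests)
    return {"small": ranked[:small_size], "large": ranked}


# Hypothetical passing tests and the coverage bins they exercised.
passed = {
    "rand_017": {"vle", "vse", "vl_256"},
    "rand_042": {"vl_256", "vfredsum"},
    "rand_101": {"vle"},
    "rand_230": {"vsetvli_sew8"},
}

sets = select_regressions(passed, threshold=3, small_size=3)
print(sets["small"])
print(sets["large"])
```

In this toy input the small set keeps the three tests that contribute new bins and leaves out `rand_101`, whose only bin is already covered by `rand_017`; the large set keeps every passing test in ranked order.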


The errors we have encountered in the VPU include design issues and instruction mismatches. Some of these were caused by specification problems. When an error was found, the necessary information to reproduce it (e.g. binary, faulty instruction) was provided. Furthermore, a table summarizing the active errors was used to help focus the debugging effort (e.g. detecting multiple erroneous tests with the same instruction mnemonic, vector length, element width). Once the cause of the error was tentatively fixed, regressions were run before the changes could be merged.

[Figure 3: pie chart of faulting instruction types (legend: Loads, Stores, Widening, Narrowing, Floating Point, Reductions, Vector Mask) and plot of the number of errors found per month]
Fig. 3. Errors and number found per month

In Figure 3, the faulting instruction distribution between the random tests that failed in our daily simulations is shown in the pie chart. We can see the type of instructions that failed the most: memory, narrowing and widening vector instructions. We ran 24 tests every night between April and July, which were increased to 50 tests between August and the end of November, which marked the RTL freeze before the chip tape out. Each of these tests contained approximately 500 vector instructions. We managed to find a total of 3005 errors, and the instructions above constitute around 70% of them.

In Figure 3, we can also see the number of errors per month. Around April, when we first set up the environment and the CI pipeline, we had many errors. In that starting phase, the environment was not complete and the design had many issues, causing different kinds of instructions to fail. Additionally, some instructions were not implemented and had to be blacklisted, which artificially made the number of errors decrease during this phase.

Once the RTL team fixed more errors and finished implementing the missing features, we started whitelisting instructions, which caused an increase of errors found between June and September. During this period, we also managed to execute several vectorized micro-benchmarks like SpMV, Matmul or axpy. While errors increased temporarily, we were also fixing more of them, and this decrease can be seen at the final phase of the plot.

At the end of this testing period, which we called "Night runs", we were not getting any more errors. This was a huge advancement, but we wanted to provide more tests, so we developed a new set of testing pipelines. Combined, these ran around 600 tests every day. We used them for collecting coverage numbers and finding bugs, especially since the development of the VPU still continued with new features.

TABLE I
FUNCTIONAL COVERAGE PER DESIGN UNIT

Design Unit             Coverage    Design Unit             Coverage
OVI/Pre-issue Queue      91.95%     Data Reorder Buffer      88.09%
Instruction Unpacker    100.00%     Ctrl FSM                 92.64%
Instruction Renaming    100.00%     Functional Units        100.00%
Instruction Queue       100.00%     Vector Register File     92.37%
Store Management         87.50%     Inter-lane Ring          99.18%
Load Management         100.00%     Vector Lane              93.61%
Item/Mask Management    100.00%

Table I summarizes the functional coverage achieved, for an average of 95.79%. Regarding code coverage (average of 72.64%), we have achieved 90.90% in Statements and 49.83% in Toggles. We know that we are not driving the design appropriately in some cases, which were difficult to add in the environment, and that is reflected in the coverage numbers. Additionally, the lower code coverage numbers can indicate some unused data structures or conditions in the RTL.

6. CONCLUSIONS AND RELATED WORK

In this paper, we have described the implementation of a verification environment targeting an academic RISC-V based vector accelerator, which was successfully taped out in the context of a European project. The environment is a reusable and extendable UVM environment, which implements the protocol between the OVI and the VPU and checks the correctness of the completed instructions.

Additionally, assertions and coverage were developed to observe better and extract metrics to know how well the verification process was done. Moreover, this environment is complemented by creating random binaries using the RISCV-DV generator and the CI infrastructure, which plays an essential role in code health/maintainability and coverage closure. Thanks to our automated constrained-random test generation, simulation and error reporting and CI/CD infrastructure, through this process, we found 3005 errors and reached 95.79% of functional coverage.
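As a cross-check of Table I, the reported average can be recomputed from the per-unit values. The Python sketch below takes the thirteen percentages from the table and computes their unweighted arithmetic mean; treating the reported average as an unweighted mean is an assumption, since the paper does not say how it was obtained.

```python
from statistics import mean

# Functional coverage per design unit, copied from Table I (percent).
coverage = {
    "OVI/Pre-issue Queue": 91.95,
    "Instruction Unpacker": 100.00,
    "Instruction Renaming": 100.00,
    "Instruction Queue": 100.00,
    "Store Management": 87.50,
    "Load Management": 100.00,
    "Item/Mask Management": 100.00,
    "Data Reorder Buffer": 88.09,
    "Ctrl FSM": 92.64,
    "Functional Units": 100.00,
    "Vector Register File": 92.37,
    "Inter-lane Ring": 99.18,
    "Vector Lane": 93.61,
}

average = mean(coverage.values())
lowest_unit = min(coverage, key=coverage.get)
below_full = sorted(name for name, pct in coverage.items() if pct < 100.0)

print(f"{average:.3f}")
print(lowest_unit, len(below_full))
```

The mean comes out at roughly 95.795, which agrees with the reported 95.79% up to rounding. Seven of the thirteen units are below 100%, with Store Management the lowest.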


Regarding the environment, we have learnt that the implementation of the communication divided among several agents complicates the maintenance, extension and performance. A way to cope with these issues would be a single agent that produces the stimulus. Doing all the interface interaction using a single module could simplify the sub-interfaces communication and possible expansions of the design and the environment, which we will follow as future work.

7. ACKNOWLEDGEMENTS

This research has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 (European Processor Initiative) and Specific Grant Agreement No 101036168 (EPI SGA2). The JU receives support from the European Union's Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Portugal, Spain, Sweden, and Switzerland. The EPI-SGA2 project, PCI2022-132935 is also co-funded by MCIN/AEI /10.13039/501100011033 and by the UE NextGenerationEU/PRTR.

BIBLIOGRAPHY

[1] Accellera. Universal Verification Methodology standard reference. https://www.accellera.org/downloads/standards/uvm. [Online; accessed October 20, 2022].
[2] Amid, Asanovic, et al. Specifications - RISC-V Vector extension 0.7.1. https://github.com/riscv/riscv-v-spec/releases/tag/0.7.1. [Online; accessed October 20, 2022].
[3] Amid, Asanovic, et al. Specifications - RISC-V Vector extension 1.0. https://github.com/riscv/riscv-v-spec/releases/tag/v1.0. [Online; accessed October 20, 2022].
[4] Matheus Cavalcante et al. "Ara: A 1-GHz+ scalable and energy-efficient RISC-V vector processor with multiprecision floating-point support in 22-nm FD-SOI". In: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 28.2 (2019), pp. 530–543.
[5] Cerny, Eduard, et al. SVA: The power of assertions. Springer International Publishing, 2015.
[6] Google. Random instruction generator RISCV-DV.
[7] Mark Litterick, Jeff Montesano, and Taruna Reddy. "Mastering Reactive Slaves in UVM". In: SNUG. 2015.
[8] lowRISC. lowRISC. https://github.com/lowrisc/. [Online; accessed October 20, 2022].
[9] lowRISC. OpenTitan project. https://github.com/lowrisc/opentitan. [Online; accessed October 20, 2022].
[10] OpenHW. PULP-Platform Simulation Verification. https://core-v-docs-verif-strat.readthedocs.io/en/latest/pulp_verif.html. [Online; accessed October 20, 2022].
[11] RISC-V. Spike repository. https://github.com/riscv-software-src/riscv-isa-sim. [Online; accessed October 20, 2022].
[12] Semidynamics. Open Vector Interface specifications. https://github.com/semidynamics/OpenVectorInterface. [Online; accessed October 20, 2022].
[13] A. Waterman and K. Asanović. Specifications - RISC-V International. https://riscv.org/technical/specifications/. [Online; accessed October 20, 2022]. 2019.
