SOURCE ARCHIVE
EXTRACTED CONTENT
45,403 chars Verifying Instruction Set Simulators using
Coverage-guided Fuzzing
Vladimir Herdt1 Daniel Große1,2 Hoang M. Le1 Rolf Drechsler1,2
1Institute of Computer Science, University of Bremen, 28359 Bremen, Germany
2Cyber-Physical Systems, DFKI GmbH, 28359 Bremen, Germany
{vherdt,grosse,hle,drechsle}@informatik.uni-bremen.de
Abstract—Verification of Instruction Set Simulators (ISSs) is limitations (wrt. modeling memory access and loops) on the
crucial. Predominantly simulation-based approaches are used. ISS [10].
They require a comprehensive testset to ensure a thorough Recently, in the SW domain the automated SW testing
verification. We propose a novel coverage-guided fuzzing (CGF) approach technique fuzzing [11] has become a standard in the SW to improve the testcase generation process. In addition to code development flow [12], [13]. Traditional fuzzing methods are coverage we integrate functional coverage and a custom mutation generational, which in spirit are similar to the model-based procedure tailored for ISS verification. As a case-study we apply generation techniques employed in processor-level verification our approach on a set of three publicly available RISC-V ISSs. and consequently have also been adopted for the verification of We found several new errors, including one error in the official RISC-V reference simulator Spike. ISSs [14]. More recent fuzzing approaches in the SW domain employ the so called mutation based technique. It mutates I. INTRODUCTION randomly created data and is guided by code coverage, hence In todays design flow Instruction Set Simulators (ISSs) play avoiding the effort to create an input model. Notable represen- an important role in serving as executable specification for tatives in this Coverage-guided Fuzzing (CGF) category are the subsequent development steps. Thus, ensuring the correctness LLVM-based libFuzzer [15] and AFL [16], which both have of the ISS is crucial as undetected errors will propagate been shown very effective and revealed various new bugs [15], and become very costly. The ISS is an abstract SW model [16]. of a processor, typically implemented in C++ to enable a In this paper we propose to leverage state-of-the-art CGF, high simulation performance. Formal methods do not scale to as used recently in the SW domain, for ISS verification. In complete verification and thus simulation-based methods are addition we propose a novel functional coverage metric (to employed. They require a comprehensive testset (i.e. stimuli). complement code coverage) and mutation procedure tailored Manually writing the testcases is not practical due to the specifically for ISS verification to improve the efficiency of significant effort required and pure random generation offers the fuzzer. As a case-study we have implemented our CGF only very limited coverage. approach with the proposed extensions on top of libFuzzer Various approaches have been proposed to improve random and evaluated it on a set of three publicly available RISC-V generation of processor-level stimuli. Model-based test gener- ISSs. We found new errors in every considered ISS, including ators use an input format specification to guide the generation one error in the official RISC-V reference simulator Spike. process and can integrate constraints processed by CSP/SMT II. BACKGROUND solver [1]–[3]. In [4] an optimized test generation framework, by propagating constraints among multiple instructions in an A. LibFuzzer: LLVM-based Coverage-guided Fuzzing effective way, has been presented. [5] proposed to mine pro- LibFuzzer is an LLVM-based coverage-guided fuzzing en- cessor manuals to obtain an input model automatically. Other gine. It aims to create input data (binary bytestreams) in notable approaches include coverage-guided test generation order to maximize the code coverage of the DUT. Therefore, based on bayesian networks [6] and other machine learning the DUT is instrumented by the clang compiler to report techniques [7]. coverage information that is recognized by libFuzzer. Please Since the ISS is a SW model, semi-formal methods based note, libFuzzer does not use functional coverage metrics. Input on dynamic program analysis and constraint solving are appli- data is transformed by applying a set of pre-defined mutations cable. They provide automatic ways to increase code coverage (shuffle bytes, insert bit, etc.) randomly. The input size is but are still susceptible to scalability issues [8], [9] or impose periodically increased. This work was supported in part by the German Federal Ministry Technically libFuzzer is linked with the DUT, hence per- of Education and Research (BMBF) within the project CONFIRM under forms so called in-process fuzzing, and allows to pass inputs to contract no. 16ES0565 and within the project SATiSFy under contract the DUT as well as receive coverage information back through no. 16KIS0821K, and by the University of Bremen’s Central Research Development Fund, and by the University of Bremen’s graduate school SyDe, specific interface functions. Thus, libFuzzer will call the input funded by the German Excellence Initiative. interface function defined in the DUT (interface shown in
978-3-9819263-2-3/DATE19/ c©2019 EDAA 360
1 extern "C" int LLVMFuzzerTestOneInput(const Step 1: Fuzzing Loop (CGF)
based on Feedback Feedback
2 // uint8_t *Data, size_t Size) { Fuzzer: Update Coverage Execution (Tracing)
allows to run some initialization code
once, first time this function is called Bytestream ELF load ISS-UT
3 static bool initialized = Fuzzer (Instructions) inject (Testcase) and (under Test)
4 do_initialization(); Total collect into ELF exec. instrumented
Coverage Testcase ELF Template Template with tracing
5 // ... process the input ... on new (Exec. Frame)
6 Coverage
7 // tell libFuzzer that the input has been Testset report mismatches
processed ISS-UT Result-1 including crashes
8 return 0; ELF (under Test) Check for
ELF Equality
ELF
9 } (Testcase)
(Testcase)
(Testcase) ISS Result-2
Reference register values
Step 2: Testset Evaluation and selected memory content
Fig. 1. Interface function that the DUT provides and that is repeatedly called
by libFuzzer (many) times during the fuzzing process. Fig. 2. Overview on coverage-guided fuzzing (CGF) for ISS verification
Fig. 1) and the instrumentation performed by clang will call 1) Fuzzing Loop (CGF): The fuzzer starts with an empty
these coverage functions. coverage and empty testset. The fuzzer iterates until the
B. RISC-V coverage goal or the specified time limit is reached. In each
step the fuzzer generates a (binary) bytestream. We interpret
RISC-V is an open and free Instruction Set Architecture this bytestream as a sequence of instructions for the ISS (ISA). The ISA consists of a mandatory base integer instruc- under test (ISS-UT). Every such bytestream is transformed tion, denoted RV32I, RV64I or RV128I with corresponding into an (ELF-)testcase, by embedding the bytestream into a register widths, and various optional extensions denoted as pre-defined ELF-template1. single letters, e.g. M (integer multiplication and division), A The ELF-template contains prefix and suffix code (execution (atomic instructions), etc. Thus, RV32IMA denotes a 32 bit frame) that is supposed to be executed before and after the core with M and A extensions. For the RV32IMA core all actual sequence of instructions. The prefix is responsible to instructions are 32 bit width and have at most two source initialize the ISS into a pre-defined initial state. This includes register RS1 and RS2, and one destination register RD. Control initializing all registers to pre-defined values to ensure that and Status Registers (CSRs) are registers serving a special all ISS implementations start in the same state. The suffix is purpose. For example the mtvec (Machine Trap-Vector Base- responsible to collect results and stop the simulation. It will Address) CSR stores the address of the trap/interrupt handler. write all register values into a pre-defined region in memory For a comprehensive description of the RISC-V ISA please (can contain additional content beside the register values) refer to the official specifications [17], [18]. to enable dumping the result of the execution to a file (an III. COVERAGE-GUIDED FUZZING FOR ISS VERIFICATION ISS typically provides an operation to dump specific memory regions), which can be compared. This section presents our proposed CGF approach and our The testcase is then executed on the ISS-UT. The ISS-UT fuzzing extensions tailored for ISS verification: functional cov- generates execution feedback by tracing relevant information. erage metric (definition in Section III-B and instrumentation The tracing functionality need to be instrumented into the ISS- to measure it in Section III-C) and a specialized mutator UT. The fuzzer will analyze the execution feedback and update (Section III-D). Functional coverage complements code cov- its coverage metrics accordingly. We consider structural and erage and improves the thoroughness of the testset especially functional coverage. As already mentioned, we present our in detecting computational errors (which depend on operand functional coverage metrics and the instrumentation to trace values and structure). The use-case for our mutator is to it in Section III-B and Section III-C, respectively. In case the introduce specific instruction pattern into the fuzzing process, coverage is increased by executing the testcase, the testcase is further guiding the fuzzer and allowing the fuzzer to re-use added to the fuzzer testset. those pattern in its own mutations and cross-over operations. 2) Testset Evaluation: After the testset has been generated, We start with an overview. every testcase is executed one after another on the ISS-UT and A. Overview other reference ISSs. The results are compared and mismatches reported. Please note, not all mismatches are necessarily bugs. An overview of our CGF approach for ISS verification is The reason is that the fuzzer is not constrained to specific well- shown in Fig. 2. Essentially, it consists of two subsequent defined subsets of the instruction set but considers all possible steps: First a testset is generated by the fuzzer (upper half of instructions and sequences of instructions (including illegal in- Fig. 2), then the testset is used to verify the functionality of the structions). This behavior is intended to check specifically for ISS under test (ISS-UT, lower half of Fig. 2) by comparing the 1 execution results with other reference ISSs (can be multiple). Technically, we use a linker script that generates an empty section in the In the following we describe both steps in more detail. ELF-template. Then we use objcopy utility with the –update-section argument to overwrite the empty section with the (binary) instruction bytestream.
Design, Automation And Test in Europe (DATE 2019) 361
uncommon (error-) cases. A common source for mismatches propose an additional mutator tailored for ISS verification.
are differences in configuration. For example the memory Our mutator starts at the beginning of the bytestream and
size can be different or some peripheral mapped into the proceeds forward, applying local mutations, until the end of
address space (which can not always be easily changed without the bytestream is reached. In each step one of the following
intrusive modifications). A load/store instruction might then two mutations is performed (randomly selected):
succeed for one ISS and fail for the other. We do not consider 1) Select a random instruction and inject its opcode bits at
such mismatches to be bugs. Thus, mismatches need to be the current position into the bytestream, overwriting existing
analyzed to check if they relate to bugs in the corresponding data. This ensures that a legal instruction is used, but the
ISS. parameters (source registers, destination register, immediate
B. Functional Coverage Metric values, etc.) are not modified (still randomized by the fuzzer).
For example the ADDI instruction of the RISC-V ISA consists
We define generic functional coverage metrics for registers of 32 bit (4 bytes) where the bits [11,7], [19,15] and [31,20] and immediates that are applicable to a large set of ISAs. In encode the parameters of ADDI: the destination register the following we introduce functional metrics that 1) reason (stores the addition result), source register (first operand) and about the instruction structure, and 2) the operand values: immediate (second operand), respectively. All remaining ten
- Structure Metrics: A testset satisfies the coverage met- bits [14,12] and [6,0] encode the opcode (a fixed value) of
ric R2 iff for every instruction with one source (RS1) and ADDI and hence would be injected in case ADDI is selected.
destination register (RD) at least once the case RD = RS1 Optionally, a randomly selected special immediate value can
and at least once the case RD = RS1 is observed. The also be injected into the instruction.
coverage metric R3 extends R2 to instructions operating on 2) Select a random constrained sequence of instructions
three registers (another source register RS2 is used). For R3, and inject it into the bytestream, overwriting existing data.
each of the four following cases should be observed for every A sequence is a list of concrete instructions with some
such instruction: parameters constrained to fixed values and others randomized.
• RS1 = RD ∧ RS2 = RD A common sequence is to use two instructions to load a large
• RS1 = RD ∧ RS2 = RD ∧ RS1 = RS2 constant value into a register by loading the lower and upper
• RS1 = RD ∧ RS2 = RD half separately (because typically only small immediates can
• RS1 = RD ∧ RS2 = RD be encoded in one instruction). In this case the destination
R3 can be further extended for ISAs involving more source register is constrained to be the same for both instructions,
and/or destination registers. but the actual value is randomized (or selected from a set
- Value Metrics: The metrics V(Rx) and V(Ix) require that of special values). Sequences are defined by the verification the Rx register and Ix immediate are assigned at least once to engineer and can be specialized for each instruction set. each special value in {MIN, -1, 0, 1, MAX}, where MIN and IV. CASE-STUDY: RISC-V ISS VERIFICATION MAX are the smallest and largest possible value of Rx and Ix, respectively. For immediates, that are interpreted as unsigned As a case study we built our CGF approach on top of values, e.g. shift amount, the negative values are omitted from libFuzzer and verify the RV32IMA ISS extracted from the the value set. Both metrics are applicable to every instruction publicly available RISC-V Virtual Prototype (VP) [19]2. We having the Rx and Ix parameter, respectively. For example denote this ISS as ISS-UT. As reference we use the following V(RD), V(RS1) and V(I imm) are applicable for the RISC-V two ISS: ADDI instruction (addition of register with immediate). 1) Spike, the official RISC-V ISA reference simulator [23]. C. Instrumentation for Tracing Functional Coverage 2) Forvis, an ISS implemented in Haskell aiming to be a An ISS typically consists of an execution loop that will formal specification of the RISC-V ISA [24]. fetch the next instruction, decode it, and then execute it. We We have found errors in ISS-UT as well as both reference instrument the execution step to call begin-fcov-trace and end- ISSs. We first describe the evaluation setting and libFuzzer fcov-trace function (before/after the actual execution). Both integration. Then we show our detailed evaluation results. trace functions take three arguments: 1) the instruction fetched A. Evaluation Setting and LibFuzzer Integration from memory, 2) the decoded operation, and 3) a snapshot We have instrumented ISS-UT with the clang compiler of the register values (read-only). Using two trace functions to trace branch coverage information. We manually (can be allows to capture the register state before (source register automated by an LLVM pass) added a call to the begin- values) and after the instruction execution (destination register fcov-trace and end-fcov-trace function to trace functional value). Please note, immediate values are also captured since they are encoded in the instruction itself. 2 The VP is implemented in standard-compliant SystemC and TLM-2.0 [20], D. Custom Mutations [21] and is designed as extensible and configurable platform with a generic bus system [22]. It integrates a RISC-V RV32IMA core and a PLIC-based
A mutation is applied on a bytestream (i.e. instructions interrupt controller together with an essential set of peripherals. The VP is inside a testcase) and returns a modified bytestream. We available under MIT licence. Visit www.systemc-verification.org for our most recent VP-based approaches.
362 Design, Automation And Test in Europe (DATE 2019)
TABLE I
EVALUATION RESULTS - ALL EXECUTION TIMES ARE REPORTED IN SECONDS - [V1..V7] MEANS ALL 7 ERRORS V1 TO V7 HAVE BEEN FOUND.
Tests / Generator Time Coverage (measured by instrumenting ISS-UT) Errors found in ISS
(sec.) Branch Functional Cov. ISS-UT Spike Forvis
Cov. R1 R2 R3 V(RS1) V(RS2) V(RD) V(I imm) V(I shmt)
T1: RISC-V ISA Tests 2 90.24% 58.57% 61.70% 50.00% 14.29% 2.70% 7.55% 8.33% 100.00% [V1..V3] / /
T2.1: RISC-V Torture 1000 5280 74.30% 2.17% 66.67% 69.23% 58.82% 91.43% 52.17% 9.09% 100.00% V1,V2 / H2
T2.2: 5000 26108 74.30% 2.17% 66.67% 69.23% 58.82% 91.43% 56.52% 18.18% 100.00% V1,V2 / H2
T2.3: 10000 52168 74.30% 2.17% 66.67% 69.23% 58.82% 91.43% 56.52% 63.64% 100.00% V1,V2 / H2
T3: Cov.-guided Fuzzing 32492 100.00% 100.00% 100.00% 100.00% 98.21% 100.00% 81.13% 100.00% 100.00% [V1..V7] S1 H1,H2
if ((op == ADDI) && (instr.RD() == instr.RS1())) experiments are performed on a Linux system with an Intel
features.add(1); Core i5 processor with 2.4 GHz and 32 GB RAM.
if ((op == ADDI) && (instr.RD() != instr.RS1()))
if features.add(2); B. Evaluation Results and Discussion
((op == ADDI) && (regs[instr.RS1()] == REG_MAX))
if features.add(3); Table I shows the results. The table is separated into four
((op == ADDI) && (instr.I_imm() == 0)) main columns. The first column (Tests/Generator) reports
features.add(4);
// etc ... which testset is used or how it has been obtained, respectively.
Fig. 3. Concept of mapping functional coverage trace information to features. The second column (Time) shows the time in seconds to
generate (9h timeout for our fuzzer) and execute the respective
testcases. Please note, the RISC-V ISA tests are directed tests
coverage information before/after executing one instruction, that are hand-written and thus do not require a generation step.
respectively. The branch coverage information is already rec- The third column shows the branch- and functional coverage
ognized and collected by libFuzzer, and is internally mapped obtained by running the respective testset. The coverage is
to unique features (i.e. integer values identifying the coverage measured based on the instrumented ISS-UT. The fourth and
information). We have extended libFuzzer to also support the last column shows which (and how many) errors have been
proposed functional coverage information, carefully making found by each approach. Table II lists and describes the errors
sure to generate unique features, i.e. not interfere with the we have found in some more detail.
branch coverage collection3. Conceptually, we implement it Already the RISC-V ISA tests can be very effective in
as a chain of consecutive if-blocks matching the cases of the detecting errors (Table I, row T1). The testset revealed three
functional coverage metrics and mapping each case to a unique errors (V1, V2 and V3) in ISS-UT. Furthermore, the testset
feature for every instruction, as shown in Fig. 3. is very compact and thus can be executed very fast. However,
We generate this code automatically from a python script being hand-written, the testset is susceptible to miss relevant based on the RISC-V ISA specification. We also use a slightly behavior, which is also reflected by the obtained coverage optimized representation based on a switch-case construct to values. Furthermore, significant manual effort is required in distinguish different opcodes more efficiently. order to create the testset. We integrated our custom mutator into libFuzzer in a way Torture tests reveal one additional error in Forvis (Table I, to be randomly selected with the same probability as any of rows T2.X). It can be observed that gradually increasing the the existing mutators. As discussed in Section III-D we define testset from 1,000 (Table I, row T2.1) to 10,000 (Table I, row instruction sequences to load a random or special value into a T2.3) randomly generated tests does only slightly increase the register and also allow to inject special immediate values. coverage. The reason is that Torture receives no execution In addition to the metrics defined in Section III-B, we feedback, hence every test is generated independent of the propose the RISC-V specific metric R1. R1 requires that every previous ones. instruction with a destination register (RD) is executed with Our fuzzer is able to detect all previously shown errors the case RD=0 and with the case RD=0. This metric is useful and finds six additional errors (4 in ISS-UT, 1 in Spike and to check that the hardwired zero register is not erroneously 1 in Forvis), see (Table I, column ”Errors found in ISS”, overwritten for some instruction. The metrics V(I imm) and row T3). Most of these errors relate to dealing with different V(I shmt) cover immediates used in computational operations. forms of illegal instructions in different steps of the execution We integrate the official RISC-V ISA tests [25] and the process. Besides being coverage-guided, a major benefit of our RISC-V Torture testcase generator [26] in the comparison to fuzzer is being not constrained to some specific instruction further evaluate the effectiveness of our fuzzing approach. All subset, as for example Torture (and hence could not detect the 3Technically, we essentially added a new tracing collector class and errors our fuzzer did, independent of the number of testcases extended the size of the feature representation data type. This step required generated). In particular the fuzzer operates on the binary adapting the fuzzer core to use sparse data structures, instead of fixed sized level, thus it can be used to check for errors that might even arrays, due to the increased feature state space. be masked by a compiler/assembler (as they do not generate
Design, Automation And Test in Europe (DATE 2019) 363
TABLE II
DESCRIPTION OF ALL ERRORS WE HAVE FOUND IN EACH ISS.
ISS Error description
Spike S1: Decoder error, which allows illegal instructions to be interpreted as FENCE or FENCE.I instruction. The
reason is that not all opcode bits are present in the decoding mask.
Forvis H1: Erroneously allows to access CSRs, representing counters for hardware performance monitoring (e.g. cycle
CSR), from a lower privileged execution mode without first enabling the access (by setting the mcounteren and
scounteren CSR).
H2: The REMU instruction, which computes the remainder of a division, fails because it performs a 64 bit
operation even though the ISS is configured to run in 32 bit mode.
ISS-UT V1: Multiplication erroneously dropping the upper 32 bit of the multiplication result for large operands due to
working on 32 instead of 64 bit operands.
V2: Overwriting the hardwired zero register with a non-zero value (i.e. instruction with destination register RD=0
and non-zero result). Such an instruction is normally not generated by a compiler and thus can be easily missed.
V3: CSR access instructions fail to update the CSR in case the source (RS1, contains value to be loaded into
CSR) and destination (RD, will receive current value of CSR) register is the same, because in this case RS1=RD
is overwritten before it is read.
V4: Undetected misaligned branch instruction due to not four byte aligned immediates. Again, the compiler will
not generate such branches and thus this error can easily be missed.
V5: Illegal CSR instruction has side-effects. A CSR access instruction reads the current CSR value into register
RD and then writes the RS1 register value into the CSR. In case of a read-only CSR the write access fails and
RD is not allowed to be modified.
V6: Illegal jump instruction has side-effects. The jump instruction safes the current program counter into register
RD before taking the jump. However, in case the jump target address is misaligned, RD is not allowed to be
modified.
V7: Various incomplete decoder checks similar to the error found in Spike. The reason is that ISS-UT only
checks the instruction opcodes as far as necessary to uniquely identify each instruction (which is revealed by
random mutations on the opcode bits).
illegal instructions). This also enables to thoroughly check the result values may not be possible in combination with some
instruction decoder unit, which even revealed an error (S1 specific operations).
in Table II) in the RISC-V reference simulator Spike. Another promising direction to complement CGF is to em-
It can be observed that our fuzzer maximizes most coverage ploy constrained random (CR) techniques. CR techniques [29] metrics to 100% (Table I, row T3). Besides V(RS1), which is have been very effective for various system-level use cases close to 100%, only the V(RD) metric is below at 81%. The covering both functional as well as non-functional aspects, reason is that the value of the destination register depends see for example [30], [31]. It is also possible to integrate on the operand values and the operation. Some result values CR techniques with the fuzzer, by using the CR generated might even be impossible for some operations. We believe that testcases as input seeds for the fuzzer. This might help to guide formal methods are required to fully maximize the V(RD) the fuzzer towards generating longer test sequences (without metric. In total our fuzzer generates a testset with 5,160 illegal instructions) and at the same time reducing the number testcases. The smallest test consists of 1 instruction and the of false-mismatches, i.e. where two ISSs report a difference largest of 23 instructions with an average of 3 instructions. which is due to a configuration mismatch as briefly discussed in Section III-A24. However, the fuzzer still retains the ability V. DISCUSSION AND FUTURE WORK of generating completely random instructions (guided by the In our experiments we observed that our CGF approach has state-of-the-art fuzzing heuristics) hence the ability to cover been very effective in finding various errors in the considered rare corner- and error-cases (which has been very effective ISSs and maximized most coverage metrics to 100%. One in our experiments) that might even be masked by a compil- exception is the V(RD) metric, which is below at 81%. It er/assembler. For example, the RISC-V Torture test generator depends on the input parameters and thus can be very difficult only generates valid instruction sequences and thus would to reach with simulation-based methods. To close this remain- 4 ing coverage gap, we plan to investigate formal (verification) We currently use automated debug scripts in case of a result mismatch that splits the input instruction sequence and repeatedly generates and executes a techniques at the abstraction level of VPs, e.g. [27], [28], to new ELF file on both ISSs adding one instruction after another. This allows us automatically identify inputs that will cover the missing parts to pin-point the precise location of the mismatch, which in turn greatly reduces or infer that the missing parts are indeed unreachable (some the effort to debug/analyze mismatches (in particular for longer instruction sequences).
364 Design, Automation And Test in Europe (DATE 2019)
not be able to detect some of the errors that our fuzzer did, [4] Y. Katz, M. Rimon, and A. Ziv, “Generating instruction streams using
independent of the number of testcases generated by Torture. abstract CSP,” in DATE, 2012, pp. 15–20.
Ideas from [32], to cover the values of output operands, [5] W. Ma, A. Forin, and J. Liu, “Rapid prototyping and compact testing
might also be applicable in this context and help in maximizing [6] of CPU emulators,” in RSP, 2010, pp. 1–7.
S. Fine and A. Ziv, “Coverage directed test generation for functional
our functional V(RD) metric. Also, recently there has been [7] verification using bayesian networks,” in DAC, 2003, pp. 286–291.
a lot of interest in integrating machine learning techniques C. Ioannides, G. Barrett, and K. Eder, “Feedback-based coverage
directed test generation: An industrial evaluation,” in Hardware and
into the fuzzing process, e.g. [33], [34], which seems very Software: Verification and Testing, S. Barner, I. Harris, D. Kroening,
promising to investigate in our application area. and O. Raz, Eds., 2011, pp. 112–128.
Another important direction for future work is to evaluate [8] P. Godefroid, N. Klarlund, and K. Sen, “Dart: Directed automated
the effectiveness of stronger coverage metrics. Path coverage [9] random testing,” SIGPLAN Not., pp. 213–223, 2005.
K. Sen, D. Marinov, and G. Agha, “Cute: A concolic unit testing engine
and cross-coverage of functional metrics can be very effective for c,” SIGSOFT Softw. Eng. Notes, pp. 263–272, 2005.
but at the same time very challenging metrics and often [10] H. Wagstaff, T. Spink, and B. Franke, “Automated ISA branch cov- impractical due to the large feature state space. Therefore, we erage analysis and test case generation for retargetable instruction set simulators,” in CASES, 2014, pp. 1–10. plan to consider selective path coverage and (functional) cross- [11] B. P. Miller, L. Fredriksen, and B. So, “An empirical study of the coverage. These are only applied to selected code regions, e.g. reliability of unix utilities,” Commun. ACM, pp. 32–44, 1990. consider all evaluation paths for each instruction separately in [12] “Oss-fuzz - continuous fuzzing for open source software,” https://github. com/google/oss-fuzz, 2018. the ISS but not across instructions, and input operand values. [13] “Microsoft security development lifecycle,” https://www.microsoft.com/ This can further improve the verification result while still en-us/sdl/process/verification.aspx, 2018. maintaining scalability. [14] L. Martignoni, R. Paleari, G. F. Roglia, and D. Bruschi, “Testing CPU emulators,” in ISSTA, 2009, pp. 261–272. Finally, we plan to broaden the scope of our evaluation to [15] “libFuzzer - a library for coverage-guided fuzz testing,” https://llvm.org/ integrate further architectures and instruction sets, as well as docs/LibFuzzer.html, 2018. apply our CGF approach to analyse the whole platform instead [16] “american fuzzy lop,” http://lcamtuf.coredump.cx/afl/, 2018. [17] A. Waterman and K. Asanovi´c, The RISC-V Instruction Set Manual; Vol- of limiting the analysis to the ISS component. ume I: User-Level ISA, SiFive Inc. and CS Division, EECS Department, University of California, Berkeley, 2017. VI. CONCLUSION [18] ——, The RISC-V Instruction Set Manual; Volume II: Privileged Archi- In this paper we proposed to leverage state-of-the-art Cover- tecture, SiFive Inc. and CS Division, EECS Department, University of California, Berkeley, 2017. age Guided Fuzzing (CGF) for ISS verification. We integrated [19] “RISC-V virtual prototype,” https://github.com/agra-uni-bremen/ a novel functional coverage metric (to complement code [20] riscv-vp. coverage) and mutation procedure tailored specifically for ISS IEEE Standard SystemC Language Reference Manual, IEEE Std. 1666, 2011. verification. Our extensions complement the code coverage [21] D. Große and R. Drechsler, Quality-Driven SystemC Design. Springer, metric used by CGF and thus improve the efficiency of the [22] 2010. fuzzing process. We have implemented our CGF approach V. Herdt, D. Große, H. M. Le, and R. Drechsler, “Extensible and configurable RISC-V based virtual prototype,” in FDL, 2018, pp. 5–16. with the proposed extensions on top of the LLVM-based [23] “Spike RISC-V ISA simulator,” https://github.com/riscv/riscv-isa-sim. libFuzzer and in a case-study evaluated it on a set of three [24] “Forvis: A formal RISC-V ISA specification,” https://github.com/ publicly available RISC-V ISSs. We observed that our fuzzer [25] rsnikhil/RISCV-ISA-Spec. “RISC-V ISA tests,” https://github.com/riscv/riscv-tests. has been very effective in maximizing most coverage metrics [26] “RISC-V torture test generator,” https://github.com/ucb-bar/ and in finding various errors. Fuzzing is particularly useful to [27] riscv-torture. trigger and check for corner- as well as error-cases and can V. Herdt, H. M. Le, D. Große, and R. Drechsler, “Verifying SystemC us- ing intermediate verification language and stateful symbolic simulation,” complement other testcase generation techniques. We found TCAD, 2018, (early access). new errors in every considered ISS, including one error in the [28] ——, “Compiled symbolic simulation for SystemC,” in ICCAD, 2016, official RISC-V reference simulator Spike. In our discussion [29] pp. 52:1–52:8. J. Yuan, C. Pixley, and A. Aziz, Constraint-based Verification. Springer, we sketched various directions for future work to further 2006. improve and complement our CGF. [30] F. Haedicke, H. M. Le, D. Große, and R. Drechsler, “CRAVE: An advanced constrained random verification environment for SystemC,” REFERENCES [31] in SoC, 2012, pp. 1–7. V. Herdt, H. M. Le, D. Große, and R. Drechsler, “Towards early vali- [1] A. Adir, E. Almog, L. Fournier, E. Marcus, M. Rimon, M. Vinov, dation of firmware-based power management using virtual prototypes: and A. Ziv, “Genesys-pro: innovations in test program generation for A constrained random approach,” in FDL, 2017, pp. 1–8. functional processor verification,” D&T, pp. 84–93, 2004. [32] M. Aharoni, S. Asaf, L. Fournier, A. Koifman, and R. Nagel, “Fpgen - [2] B. Campbell and I. Stark, “Randomised testing of a microprocessor a test generation framework for datapath floating-point verification,” in model using SMT-solver state generation,” in Formal Methods for HLDVT, 2003, pp. 17–22. Industrial Critical Systems, F. Lang and F. Flammini, Eds., 2014, pp. [33] “Neural fuzzing: applying DNN to software security testing,” https:// 185–199. www.microsoft.com/en-us/research/blog/neural-fuzzing/. [3] R. Emek, I. Jaeger, Y. Naveh, G. Bergman, G. Aloni, Y. Katz, [34] P. Godefroid, H. Peleg, and R. Singh, “Learn&fuzz: Machine learning M. Farkash, I. Dozoretz, and A. Goldin, “X-gen: a random test-case for input fuzzing,” 2017, pp. 50–59. generator for systems and socs,” in HLDVT, 2002, pp. 145–150.
Design, Automation And Test in Europe (DATE 2019) 365