Source 853ba322... — STIMSMITH

SOURCE ARCHIVE

SHA256: 853ba3220758a760f9939d84927f60a5b68023fda395a54b898f6e28cbe03b7d

URL: https://fmcad10.isec.tugraz.at/Papers/papers/07Session6/018Kuehne.pdf

TYPE: application/pdf

SIZE: 368.4 KB

FETCHED: 6/8/2026, 10:56:05 PM

EXTRACTOR: liteparse

CHARS: 61,009

EXTRACTED CONTENT

61,009 chars

Automated Formal Verification of Processors
    Based on Architectural Models

Ulrich K¨uhne∗ Sven Beyer† J¨org Bormann† John Barstow LSV ENS de Cachan OneSpin Solutions GmbH Abstract RT Solutions GmbH Infineon Technologies Ltd Cachan, France Munich, Germany Munich, Germany Bristol, UK

 Abstract—                                                                 processor pipeline, or an arbitration cycle in an arbiter. IPC
         To keep up with the growing complexity of digital systems, high   has been extended with further proofs which ensure that a set
       level models are used in the design process. In today’s processor   of properties verifies all input/output behavior of a circuit [5].
           design, a comprehensive tool chain can be built automatically   This methodology has already been used in industrial context
        from architectural or transaction level models, but disregarding
            formal verification. We present an approach to automatically   for the verification of a wide variety of designs [12] including
             generate a complete property suite from an architecture de-   small or medium size processors [6].
      scription, that can be used to formally verify a register transfer         However, these projects also illustrate that the integration
           level (RTL) implementation of a processor. The property suite   of a thorough formal examination into industrial verification
     is complete by construction, i.e. an exhaustive verification of all   practice requires larger changes to the education and opin-
         the functionality of the processor is ensured by the method. It
    allows for the efficient verification of single pipeline processors,   ions of verification engineers. Compared to simulation based
           including several advanced processor features like multicycle   approaches, formal verification requires a deep knowledge
         instructions. At the same time, the structured approach reduces   of the internals of the design under verification (DUV) in
          the effort for verification significantly compared to a manual   order to write assertions. An important motivation of the work
        complete formal verification. The presented techniques have been   summarized here and presented in [20] is therefore, that the
             implemented in the tool FISACO, which is demonstrated on an
industrial processor.                                                      automation of the formal verification of some well defined
     I. INTRODUCTION                                                       class of circuits eases the migration from simulation to formal
                                                                           verification and hence helps to introduce this technology. We
                 The complexity of digital hardware systems has shown an   chose smaller single pipeline processors as this class.
              exponential growth over the last decades and it is growing             For processors, a structured manual verification flow is
        still. To keep track of large systems during the design process,   available today [2]. But, automation of the verification is
             high level models are used increasingly. Especially for the   quite low, the more comprehensive the verification is. On the
          design of processors, architecture or transaction level models   other hand, existing approaches for the automatic verification
               form the core of an elaborate tool chain that enables the   of processors (see related work in Sect. II) often require a
            automatic generation of simulators, assemblers or compilers,   background of deep and general insight into verification goals
           like Facile [27] or LISA [9]. However, formal verification of   and correctness criteria.
   the functionality of the design is still not part of this tool chain.               In this paper we present a technique for the automatic
            There exist several techniques for the verification of hard-   generation of a complete property suite for processors. The
          ware designs. In simulation based verification, the outputs of   starting point of the approach is an architecture description
            the implementation are compared to a golden reference model,   of the processor. By defining a number of mapping functions
          that is usually based on a transaction level description. But,   the user captures how the abstract concepts are mapped to the
          simulation is not well suited to cover the whole functionality   register transfer level (RTL) implementation. These mapping
          of a pipelined processor because achieving a sufficient design   functions refer to pipeline stages, stall and cancel signals,
           quality for such a processor requires a huge simulation-based   and similar objects that design and verification engineers are
         verification effort and there is no guarantee that all possible   familiar with. Following this approach, the specification is
         bugs have been considered. In contrast, formal techniques offer   captured in a concise and readable form, while the underlying
the highest quality of verification [15].                                  general processor model enables the verification of several

One successful technique is Interval Property Checking advanced processor features like multicycle instructions, out- (IPC) [23], a technique similar to Bounded Model Checking of-order termination as well as exceptions. The generated [3]. IPC is used to check if a system satisfies a set of properties property suite is complete by construction in the sense of [5]. about the operations of a design like the processing of a As a driving verification engine, the OneSpin 360 MV tool request in a bus bridge, the execution of an instruction in a [24] is used, offering the performance and capacity to formally verify whole processor designs. This work was carried out jointly by the group for computer architecture The main contribution of this work is a well structured yet at University of Bremen∗, Germany and OneSpin Solutions GmbH†. It was supported in part by the German Federal Ministry of Education and Research pragmatic approach to tackle the formal verification of pro- (BMBF) within the project VerisoftXT under contract no. 01 IS 07008 C. cessors. It offers an exhaustive verification for a certain class

of designs, while the automatic generation of the properties            or properties from simulation traces. However, they are only
increases the verification productivity significantly compared    suited for an initial design understanding rather than for a veri-
to manual coding of properties. As the input for the approach     fication against a specification. In contrast, our approach starts
is an abstract architecture description, the method can easily be    with a specification that is then related to the implementation
integrated with existing tool chains for processor design. The         in a well structured way.
automatic generation of the properties is implemented in the
tool FISACO. The approach is demonstrated with an industrial               III. FORMAL VERIFICATION SETTING
control processor used in embedded automotive systems, a               Within the last two decades, there has been a lot of research
domain with particularly high quality requirements.                      in formal verification techniques. Methods based on Boolean
      The paper is structured as follows. Related work is discussed    satisfiability (SAT) have proven to be a robust solution. One
in Sect. II. In Sect. III, our formal verification techniques are            prominent technique is SAT based Bounded Model Checking
reviewed. The automatic property generation is described in              (BMC), that has first been described in [3]. Successive im-
Sect. IV. Sect. V shows the application of the approach to an              provements in performance have made BMC a suitable method
industrial processor design. Sect. VI concludes the work.          for the formal verification of larger scale designs. For the work
II. RELATED WORK                                                          at hand, we use the techniques described in [23], referred
          An approach for the automatic equivalence verification of   to as interval property checking (IPC). In the following, this
general transaction level models (TLM) with timed imple-               verification methodology will be briefly outlined.
mentations is presented in [4], where the different levels of          In contrast to BMC, only safety properties are verified using
abstraction are related by events. However, the lowest level of         IPC. As digital circuits always have a finite response time,
abstraction in [4] is behavioral RTL and it is not clear how     this is not a serious restriction in practice. It is rather natural
the concept of events relates to optimized pipeline designs.             to capture the specification of a design in terms of safety
In other words, an automated equivalence check between a                 properties. Furthermore, using IPC, these properties can be
sequential processor architecture and a pipelined RTL imple-          verified with bounded proofs, which can be checked efficiently
mentation is not feasible for optimized industrial designs.            using a SAT solver.
              There has been work on the formalization of pipelined       The main idea of IPC is to use an arbitrary starting state
designs. Part of the approaches in the literature use formal             instead of the initial state used in BMC. Any property that
models for the automatic design of correct pipelines [19], or        holds starting from an arbitrary state then also holds from any
to accompany the design process [10], [16]. In [10], starting     reachable state and thus, it is exhaustively verified. Conversely,
from a simple model, the design is incrementally refined until            false negatives can occur in IPC, i.e. counterexamples for
a pipelined implementation is obtained. A CTL specification is            properties starting in unreachable states may be produced.
transformed along with the design to prove the correctness of          These false negatives need to be removed by adding invariants
the refinement steps. A similar approach is presented in [22].      in order to restrict the starting state. For more details on the
It decomposes the correctness proof for a complex pipelined            idea of IPC and the following formalization, refer to [23].
machine with branch prediction into several steps, the first              A synchronous circuit is modeled as a finite state machine

of which proves the compliance of a simple version of the (FSM) 푀 = (퐼, 푆, 푆0, Δ, Λ, 푂) with input alphabet 퐼 ⊆ 픹푛, processor with its ISA. The drawback of these approaches is output alphabet 푂 ⊆ 픹푤, a finite set of states 푆 ⊆ 픹푚, output that they cannot handle industrial designs containing legacy function Λ and next state function Δ. The set 푆0 ⊆ 푆 denotes code and manual optimizations that are needed to match hard the initial states. With next state function Δ : 픹푛 ×픹푚 → 픹푚, power and timing constraints. the transition relation of the circuit is given by There are various techniques for the verification of existing 푇 (푠, 푠′) = ∃푥 ∈ 픹푛 : 푠′ ≡ Δ(푥, 푠). (1) processors [1], [13], [14], [30]. In [1], a formal pipeline model A safety property 푓 = AG(휑) is translated to a Boolean is introduced that is based on parcels (instructions) process- function [[푓 ]] ing through the pipeline. By instantiating several predicates point 푡, checking the validity of formula 휑 at time- describing the pipeline, the correctness of the design can 푡. Here, the translation is done such that a satisfying be proved formally. However, the model is rather abstract assignment of [[푓 ]]푡 corresponds to a counterexample of 휑. and the predicates seem difficult to derive. In contrast, we The resulting function depends on the inputs, outputs and provide a clear distinction between the architecture layer and states within a bounded time interval [0, 푐]. IPC searches for the mapping to the implementation. Furthermore, our mapping counterexamples by solving the SAT instance functions have a more intuitive counterpart in the designer’s ^푐 intent of implementing a pipeline. 푇 (푠푡+푖, 푠푡+푖+1) ∧ [[푓 ]]푡. (2) Further approaches for processor verification rely on inter- 푖=0 active theorem proving [18], [26], [29]. This generally offers The transition relation is unrolled within the time interval a high level view on the design. Theorem proving however [0, 푐] and it is connected to the single instantiation of [[푓 ]]푡. requires a significant level of expertise that is usually not In order to avoid unreachable counterexamples, invariants are available to designers or verification engineers in practice. added. In many cases, such invariants can even be generated Approaches for the automatic generation of properties are automatically [31]. In the context of the described method- given in [17], [25]; they are based on learning dependencies ology, the needed invariants are usually less complex than

    Architecture
    Description   Property  Property   Formal                             A    C                                    time
                 Generation  Suite     Equivalence      RTL
      Mapping            Proof                                                              Assertions
    Information
                                                                          reset
        Fig. 1.     Verification Flow with Property Generation                 A        fetch  decode  execute  ...     C
                                                                                                                write−
    the main properties; they can thus be verified using inductive                                               back
   proof techniques like 푘-induction [28]. For more details on the                      A                                    C
method, refer to [23].                                                                                           ...         write−
       IPC is a powerful verification technique, enabling the for-                             fetch   decode   execute      back
       malization of a specification in terms of safety properties                              ...
      and its verification against the implementation. However, to
be  sure  that no    bugs have been    missed,  the   verification            Fig. 2.   Interaction of generated properties
    engineer needs to reason about the completeness of the written
       property suite. A technique to formally check whether a set        After identifying the visible registers and interfaces, the
      of properties forms a complete specification is described in    instruction set and the exception behavior of the processor are
      [5], [8]. These techniques have been successfully applied to        described on the architecture level. The generated property
industrial processor designs [6].                                     suite consists of instruction properties  and a set of  consis-
           Completeness analysis determines whether every possible    tency assertions. While each instruction property describes the
         input scenario—corresponding to a transaction sequence of   processing of a single instruction until it leaves the pipeline,
           the design—can be covered by a chain of properties that       the consistency assertions ensure the correct interaction of
  predicts the value of states and outputs at every point in time.  multiple instructions and the consistent pipeline behavior, if no
     In other words, any two designs fulfilling all the properties   instruction is present in a dedicated stage. The latter includes
         of a complete property suite are formally equivalent. The          e.g. checking that empty stages will not update any state
    completeness analysis basically boils down to check in the end        elements. These assertions also help the user in finding an
    state of each property whether (1) there is always a successor        appropriate mapping by giving him a feedback for debugging.
       property with matching assumptions, (2) the successor prop-       Basically, the equivalence of the property suite and the DUV
       erty is uniquely determined and (3) each property describes      is established by chaining the generated properties, as shown
   the outputs and states of the design uniquely. For more details       in Figure 2. Each property is depicted as a rectangular box,
on the methodology please refer to [5], [8].                               consisting of an assume part (assumption A) and prove part

For the formal verification of the generated property suite (consequent C, shaded gray in the figure). The properties are against the RTL, we use OneSpin 360MV [24]. This com- hooked up at the time point when the processor is ready to mercial solution covers the required spectrum of formal execute the next instruction, represented by the big black dots verification—from the verification of SystemVerilog assertions in the picture. Thus, starting from reset, the first property all the way to the automatic completeness analysis described proves that the new instruction state (NIS) will be reached. above. Among various other proof engines, 360MV also offers Then, the following properties assume the NIS and prove that IPC and 푘-induction with sufficient capacity and performance after fetching the dedicated instruction, NIS will be reached to handle the complete verification of processors [6]. again, enabling the connection to the next instruction property. IV. VERIFICATION USING GENERATED PROPERTIES The basic approach has been described in detail in [20]; it is based on a patent application [7]. Technically, the basis of the approach presented here is to provide a general formal processor model that can be A. General Processor Model customized by the user to match his specific implementation. The approach presented here is limited to a class of pro- The general processor model can be thought of as a tool box cessors that is common in industrial designs. The class is with several design features to be picked out. The customiza- characterized by the following features: tion is done by setting the architecture design parameters, ∙ Single pipeline like the number of pipeline stages and the possible interface ∙ In-order-execution, out-of-order termination transactions. Furthermore the mapping from the architecture ∙ Register files with multiple prioritized write channels description to the RTL has to be established by defining a ∙ Exceptions and interrupts number of mapping functions. The basic flow is shown in ∙ Delayed branch instructions Fig. 1. The general processor model consists of three parts: ∙ Branch prediction 1) The pipeline model describes the movement of the ∙ Multicycle instructions instructions through the stages ∙ Multiple interfaces, including pipelined protocols 2) The datapath model describes register access and data Note that a typical CPU also contains complex data memory forwarding and prefetch logic. With our approach, the core of such a 3) The interface model describes memory and bus ac- CPU can be verified with generated properties, providing cesses exact interface descriptions to the data memory and prefetch.

INSTR 2 INSTR 1 RESET

These modules can in turn be verified manually, thus ensuring              Arithmetic Instruction ADD               opcode = 11000
correctness of the overall CPU. In such manual extensions, the                                                      No memory access.
already established mappings and models can easily be reused.             15  opcode11 10ra 8 7 rb  5 4 rt 2        R[rt] := R[ra] + R[rb];
         The components that are included in the architecture view                                 (a) Specification

are described in the following. A processor receives its instruc- tions via an instruction memory interface, that is addressed a program counter 푃 퐶. The currently processed instruction word is denoted 퐼푊 . There can be an arbitrary number 4 architecture registers and flags. There can be interfaces memories or buses, each associated with a set of transactions, at least containing the idle transaction. The instructions are described on the architecture level. In order to verify them against the RTL implementation, a mapping has to be established by the user. For the components 푃 퐶 and 퐼푊 , the corresponding mapping function is usually pointing to a dedicated signal in the RTL showing the value of the program counter and an instruction register, respectively. However, the mapping of an architecture register file requires a model of the pipelining due to forwarding mechanisms that are part of every pipelined processor design. We first introduce the architecture description, followed by a discussion of the models for the pipeline, the data path, and the interfaces. Finally, the generation of the property suite is described, and the completeness of the model is discussed. B. Architecture Description In our approach, there is a clear distinction between the architecture description and the mapping functions. In this way, a readable and proven correct description of the ISA is obtained. The mapping functions relate the ISA to the RTL. In the first section of the architecture description, the com- ponents of the processor are listed, comprising all architecture registers and flags. Furthermore the interfaces to memories and buses are given, as well as the respective transaction types on these interfaces. The main section of the architecture descrip- tion consists of the ISA description, where all instructions of the processor are defined. In the ISA description, the registers are referred to by their specification name. For each instruction, first the execution condition is given (TRIGGER). Then, the updates of the program counter and the architecture registers and flags need to be defined, followed by the definition of one transaction per interface. The updates are defined by the read registers (VREGISTER), the target register (UREGISTER) and the value that will be written by the instruction (UPDATE). As an example, consider Fig. 3(b) with a simple processor description including an ADD instruction. The triggers for the instruction are divided into two statements, one of which only depends on the architecture state (TRIGGER_STATE), is while the second one depends on the instruction word certain (TRIGGER_IW). Besides the update of the program counter 푖푎, in line 8, there is one update of the register R, where two registers are read addressed by parts of the instruction (lines 9 and 10). The target register is given in line 11 sum of the two source registers is defined in line 12. Finally, there is no transaction on the data memory interface, by the statement DMEM_IDLE in line 13. by 1 registers := R; 2 interfaces := DMEM; of 3 transactions_DMEM := IDLE, READ, WRITE; to data 5 simple_instruction ADD { 6 TRIGGER_STATE := true; 7 TRIGGER_IW := IW[15:11] == ADD_op; 8 UPDATE_PC := (PC + 2)[7:0]; 9 VREGISTER_1 := R(IW[10:8]); 10 VREGISTER_2 := R(IW[7:5]); 11 UREGISTER_1 := R(IW[4:2]); 12 UPDATE_1 := (VREGISTER_1 + VREGISTER_2)[15:0]; 13 DMEM_IDLE; } (b) Architecture description Fig. 3. Informal specification and architecture description example C. Pipeline Model In a pipeline, the processing of instructions is overlapped in order to speed up computations. Thus, a new instruction starts before the preceding one has terminated. For example, a typical simple pipeline would partition an instruction into fetching the instruction word from the memory, decoding it, executing logical and arithmetic operations and writing the result back into the register file. Note that this section only introduces basic pipeline modeling for the control path in order to keep track of the different instructions in the pipeline. The handling of forwarding is part of the data path of a pipeline and discussed in the following Sect. IV-D. The major challenge in designing a correct pipeline are hazards, i.e. conflicts between instructions that are processed at the same time in different stages. If an instruction needs data that is currently being computed by a preceding instruction, a read-after-write conflict occurs and the succeeding instruction needs to wait for the data. Thus, a mechanism to stall a stage is needed. Another hazard is related to branching instructions. When a jump is taken, this is typically detected at a time when subsequent instructions from the sequential program flow already have been fetched. Therefore, the pipeline must possibly be cleaned from wrongly fetched instructions, requir- ing a cancel mechanism. As this may lead to stages that are not processing any instructions, it is desirable to distinguish between empty and full stages to prevent spurious register updates or similar faults. Based on these requirements, we now define our pipeline model. Given the number of pipeline stages 푛, we define the set 풮 = {1, 2, . . . 푛} of pipeline stages. The pipeline architecture further classified by defining some constants that refer to stages like the decode stage 푑푒푐 ∈ 풮 and the stages 푖푣 ∈ 풮 that denote the stage when the instruction memory is accessed and when the instruction word is valid, respec- word tively. The processing of instructions by the pipeline is defined and the by the mapping functions1 푓 푢푙푙, 푠푡푎푙푙, 푐푎푛푐푒푙 : 풮 → 픹. The value of 푓 푢푙푙(푠) reflects if the pipeline stage 푠 currently indicated 1 The state of the design is an implicit parameter of all mapping functions.

    t1      t2      t3                                                   t1,1              t2,1 t3,1                  t2,2 t3,2 t2,3    t3,3
stall(1)                                                             stall(1)
stage 1                                                              stage 1
stall(2)                                                             stall(2)
stage 2                                                              dispatch(2)
stall(3)                                                             stage 2
stage 3                                                              stall(3)
    Fig. 4. Normal processing of instructions by the pipeline        stage3
                                                                         Fig. 6.           Pipeline for a multicycle instruction
    t1      t2  tjmp
stall(1)                                                             ping functions for of out-of-order termination, exceptions, and
stage 1                                                              multicycle instructions need to be defined.
stall(2)                                                                                             푙푎푠푡 푠푡푎푔푒, 푖푛푗푒푐푡, 푑푖푠푝푎푡푐ℎ : 풮 → 픹,         (3)
stage 2                                                              Here, 푙푎푠푡 푠푡푎푔푒(푠) indicates that the instruction in stage                     푠
stall(3)                                                             will leave the pipeline, 푖푛푗푒푐푡(푠) states that an instruction will
stage 3                                                              be injected into stage 푠 in the next cycle due to an exception,
                                                                     and  푑푖푠푝푎푡푐ℎ(푠)                       describes that a multicycle instruction is
    Fig. 5. Pipeline for a taken jump instruction                    started in stage                      푠. The pipeline of a multicycle instruction
    according to our model is shown in Fig. 6. There, the partial

holds an instruction. If 푠푡푎푙푙(푠) is true, the instruction in stage instructions are dispatched in stage 2. 푠 will not proceed to the succeeding stage 푠 + 1. 푐푎푛푐푒푙(푠) indicates that the instruction currently in stage 푠 will be D. Data Path Model removed from the pipeline and will have no more effects Based on the above control path model of the pipeline, we in later stages. The normal processing of two consecutive can now define the data path model, describing the way how instructions is shown in Fig. 4, where time progresses from left data is read, forwarded and stored in the registers. to right. The time-points when the first instruction is allowed For a register file 푅, a mapping function 푐푢푟푟푒푛푡푅 : ℐ푅 → to proceed to the next stage are denoted by 푡1 to 푡3, i.e., 풟푅 is defined that returns the current implementation state of 푠푡푎푙푙(푠) evaluates to false at 푡푠. The boxes indicate the time- the register, where ℐ푅 is the index set and 풟푅 is the data points when the respective stage is processing an instruction, domain of register 푅. For the data path of the register the i.e. 푓 푢푙푙(푠) = 1. The pipeline for a taken jump or mispredicted following mapping functions have to be defined: branch instruction is shown in Fig. 5. At timepoint 푡푗푚푝, the 푤푟푖푡푒 canceling of two succeeding instructions is indicated by the 푅, 푣푎푙푖푑푅 : 풮 → 픹 (4a) dark boxes. After the taken jump, the target instruction is 푑푒푠푡푅 : 풮 → ℐ푅 (4b) fetched from the instruction memory. 푑푎푡푎푅 : 풮 → 풟푅, (4c) The mapping functions have to be defined by the user. This where 푤푟푖푡푒 means for example, that the user needs to identify how the 푅(푠) indicates if the instruction in stage 푠 is going implementation encodes the fact that a stage is full. Since the to update register 푅, while 푑푒푠푡푅(푠) and 푑푎푡푎푅(푠) specify functions are used in the properties, the verification fails as the target register and the data to be written, respectively. By long as the model is not completed properly. 푣푎푙푖푑푅(푠), it is stated if stage 푠 already produces a valid result. In addition to the basic model, further pipeline operations With these building blocks, the forwarding in the pipeline to can be supported. It is common for instructions to leave the some forwarding target stage 푠 ∈ 풮 can easily be captured: the pipeline before the last stage, if no more actions will be value of a register 푅 with index 푖 ∈ ℐ푅 in the forwarding target taken in later stages, in order to prevent conflicts. Thus, a stage 푠 is recursively given by checking whether succeeding last stage can be defined for each instruction. Exceptions are stages write to register 푖; this corresponds to the forwarding a crucial feature for practical applications. By nature, they logic in the pipeline.  interrupt the normal instruction processing. The most general ⎧ 푐푢푟푟푒푛푡 (푖), if 푠 ≥ 푤푟푖푡푒푏푎푐푘 ⎨ 푅 푅; exception model, that is still suited to conform with our  approach, is an injection of a new instruction into the pipeline 퐷푎푡푎푅(푠, 푖) = 푑푎푡푎 푠 + 1), if 푤푟푖푡푒 (푠 + 1)∧  푅 after an exception has been acknowledged. Finally, for more  푅( 푑푒푠푡 (푠푅+ 1) = 푖; ⎩ complex arithmetic operations or interactions with protocol 퐷푎푡푎푅(푠 + 1, 푖), otherwise. driven interfaces, multicycle instructions are frequently used Note that this automatically generated function 퐷푎푡푎푅 in processor designs. Typically, an FSM in an early stage is actually captures the complex mapping of the visible register responsible for dispatching partial instructions in the pipeline. 푅 to the implementation, i.e., the architecture value of 푅 for For these refinements of the simple model, additional map- an instruction in the pipeline is the value of 퐷푎푡푎푅 in the

forwarding target stage of that instruction. Since the value of                                    TABLE I
퐷푎푡푎푅 may be invalid because the result of some instruction is                                             USER INPUT FOR PROPERTY GENERATION
not available yet, we introduce an additional mapping function                                                       (a) Constants
capturing whether the forwarding data is indeed valid:                    Name         Domain               Description
                 ⎧                                                        푛            ℕ                    number of stages
                 
                 false ,                  if 푠 < dec;
                                                                         푑푒푐          풮 = {1, . . . , 푛}   decode stage
                 
                 
                 ⎨true ,                   if 푠 ≥ 푤푟푖푡푒푏푎푐푘푅;             푖푎           풮                    instruction memory access stage
푉 푎푙푖푑푅(푠, 푖) =                           if 푤푟푖푡푒  (푠 + 1)∧             푖푣           풮                    stage in which instr. word is valid
                 푣푎푙푖푑     (푠 + 1),           푅                          푖푛푡          풮                    highest stage for interrupt injection
                 
                 
                      푅                   푑푒푠푡    푠
                 ⎩푉 푎푙푖푑푅(푠 + 1, 푖),            푅(  + 1) = 푖;             푑푎IF         풮                    access stage for interface IF
                                           otherwise.                     푑푣IF         풮                    data valid stage for interface IF
                                                                          푤푟푖푡푒푏푎푐푘푅   풮                    writeback stage for register 푅
E. Interface Transactions                                                                                        (b) Mapping functions
In order    to   verify   the  interfaces of the   processor,   the       Arch.       Function     Signature    Description
following model is used. For each interface 퐼퐹        the constants                                                 Basic components
푑푎퐼퐹 , 푑푣퐼퐹       ∈ 풮 denote the stage when a data access is issued       PC          푝푐           ℕ            program counter
and when valid data is returned, respectively. For each inter-            IW          푖푤           ℕ            instruction word
face, a set of transactions 푇 퐴퐼퐹    is defined by the user with at                                                  Pipeline Model
least 퐼퐷퐿퐸 ∈ 푇 퐴퐼퐹 . For each transaction 푡푎 ∈ 푇 퐴퐼퐹 a func-                          푓 푢푙푙        풮 → 픹        stage active
tion 푡푎퐼퐹          : ℕ × ℕ → 픹 is defined, where 푡푎퐼퐹 (푎푑푑푟, 푤푑푎푡푎)                   푠푡푎푙푙        풮 → 픹        stage stalled
captures    that the  specified  transaction takes place in     the                   푐푎푛푐푒푙       풮 → 픹        stage is canceled
                                                                                      푖푛푗푒푐푡       풮 → 픹        inject launch instr.
design, optionally involving the address 푎푑푑푟 and (for writing                        푑푖푠푝푎푡푐ℎ     풮 → 픹        dispatch micro instr.
transactions) the write data           푤푑푎푡푎. As for the example in                   푙푎푠푡 푠푡푎푔푒   풮 → 픹        instr. leaves pipeline
Fig. 3(b), the three functions                 퐼퐷퐿퐸퐷푀 퐸푀 , 푅퐸퐴퐷푀 퐸푀           8       푐푢푟푟푒푛푡     Datapath Model
                                                                                           푅              푅
and 푊 푅퐼푇 퐸퐷푀 퐸푀 need to be defined, capturing for given                      >                    풟            implementation register
                                                                              >
address and data, if the respective transaction is issued on the          R <         푤푟푖푡푒푅       풮 → 픹        stage will write
                                                                                       푅                    푅   write destination
                                                                              >       푑푒푠푡         풮 → ℐ
                                                                              >
data memory interface.                                                        :       푑푎푡푎푅        풮 → 풟푅       write data
Besides     the  transactions,   for  each interface a      mapping                   푣푎푙푖푑푅       풮 → 픹        data is valid
function    푟푑푎푡푎퐼퐹        points to the implementation port, where                   푖푏푢푠 푟푒푎푑    Interfaces
data is read in to the processor. Finally, there is a static              IF TA       푡푎IF         ℕ → 픹        instruction fetch
interface to the instruction memory, given by the predicate               IF RDATA    푟푑푎푡푎IF      ℕ × ℕ → 픹    transaction
                                                                                                   ℕ            read data
푖푏푢푠 푟푒푎푑 : ℕ → 픹, which checks if the instruction memory
is currently being accessed for a given value of the program
counter.                                                               or stalled, and 푠 is empty or it will proceed to the next stage,
                                                                       then 푠 must be empty in the next cycle. Here, 푓 푡 denotes the
F. Consistency Assertions                                              value of 푓 at timepoint 푡. Other assertions ensure, for example,
         While the above models describe the processing of instruc-    that instructions do not overwrite each other and that empty
tions by the successive pipeline stages, additional assertions         pipeline stages do not have an effect on the visible registers
are needed for the overall correctness of the processor. This          or issue interface transactions.
includes the behavior of empty pipeline stages as well as the
interaction of succeeding instructions. For this purpose, a set        G. Generating The Property Suite
of consistency assertions are automatically generated.                    In order to adapt the general processor model to the actual
    Note that the overall verification is fail safe, i.e. it cannot    DUV, the user needs to specify the mapping functions de-
succeed if the design is not correct. But, even for a correct          scribed in Sections IV-C to IV-E. The user input is summarized
design, finding the appropriate mapping functions can be               in Table I. Besides the basic data on the pipeline, given by
difficult. The consistency assertions provide useful information       a set of constants, the table shows the mapping functions
on the status of the modeling. Failing assertions can point            corresponding to the architectural components of the general
the user to certain mapping functions that need to be revised          processor model.
to complete the verification, thereby guiding the debugging               During the generation of the property suite, an architecture
process.                                                               register              푅(푖) is replaced by an instantiation of the function

We show the following assertion as an example. For a more 퐷푎푡푎푅(푠푓 푤푑, 푖), where 푠푓 푤푑 is the forwarding target stage, detailed description of the consistency assertions, see [20]. which is usually the decode stage. ∀푠, 2 ≤ 푠 ≤ 푛 : The generated properties prove the correctness of the in- ( (¬푓 푢푙푙푡(푠 − 1) ∨ 푠푡푎푙푙푡(푠 − 1))∧ (5) structions on the implementation level. For this, we define 0 to be the timepoint when the respective instruction enters (¬푓 푢푙푙푡(푠) ∨ ¬푠푡푎푙푙푡(푠)) ) ⇒ ¬푓 푢푙푙푡+1(푠) 푡 the pipeline and 푡푖 > 푡푖−1, 1 ≤ 푖 ≤ 푛 to be the timepoints This assertion states that it is illegal to create full stages in when the instruction is allowed to proceed from stage 푖 (see the middle of the pipeline: when the stage before 푠 is empty also Fig. 4–6). The properties have an implication structure

       퐴 ⇒ 퐶. Whenever the assumptions 퐴 evaluate to true, the   given, followed by a presentation of the verification results.
  prove part 퐶 must hold as well. In the following, we give an   Besides, during its development, the method has been applied
   overview of the templates for instruction execution without   for the complete verification of smaller processor designs from
   exceptions. There are similar templates for exceptions; for   the opencores site (www.opencores.org). Details cannot
  more details, see [7]. Note that there are two templates for   be given here due to page limitation.
 conditional branches depending on whether the branch is taken
 or not. In this way, branch prediction can easily be modeled.   A. PCP Processor
The assume part for an instruction 푚 basically consists of the      The PCP processor is a  control   processor that is         part
following assumptions:                                           of automotive systems. Its main purpose is the monitoring
 ∙  The instruction enters the pipeline at 푡0.                   of peripheral components in order    to release the         central
 ∙  In decode, instruction 푚 is triggered.                       CPU [11]. Therefore, a great share of the instruction set
 ∙  The instruction proceeds from stage 푖 at 푡푖, 푖 ≤ 1 ≤ 푛.      is dedicated to data transfer and bit operations, which are

∙ The instruction is not canceled by preceding instructions frequently used in typical control applications. The PCP is and not replaced by an exception call. connected to a data memory and a pipelined FPI bus (Flexible For each instruction, mainly the following will be proved: Peripheral Interface). As the bus operations require complex ∙ The instruction is fetched from the instruction memory protocol transactions, 35% of the instruction set are multicycle ∙ The program counter is updated correctly instructions. In total, the PCP has 66 instructions, divided ∙ The full stages are correctly tagged by the 푓 푢푙푙 function into arithmetic/logic instructions, jump and control, memory ∙ No 푐푎푛푐푒푙 is generated (except for jump instructions) instructions, bus instructions and complex math instructions. ∙ All read registers are valid The processor is implemented as a four stage pipeline. The ∙ The registers will be updated (or remain stable) corre- register file contains 8 registers of 32 bit, where one register is sponding to the ISA; this includes the verification of a special purpose register containing various status flags and correct forwarding the program counter. The whole RTL implementation adds up ∙ The correct transactions will take place on the interfaces to about 17.000 lines of VHDL code, accompanied by a de- tailed informal specification. Regarding the complexity of the H. Completeness design and the quality of the source code and documentation, The pipeline model is built such that the final property suite the time for the formal verification was estimated with 8 to 10 in combination with the consistency assertions is complete person months, needed to manually write a complete property by construction, if some rules are respected for the definition suite using OneSpin 360 MV. of the mapping functions. For a proof for the basic pipeline model, see [7]. B. Results However, the completeness of a concrete generated property The PCP has been verified using the presented approach. suite additionally depends on the proper definition of some of The informal specification was ported to an architecture de- the functions. If, for example, the user defined the function scription. Most of the manual effort was spent for the definition for a read transaction by simply returning true, it is obvious and refinement of the pipeline and datapath model, given by that the interface signals are not checked at all for read the mapping functions explained in Sect. IV-C to IV-E. Using transactions and there is a gap in the verification. In summary, our approach, the instruction set of the PCP could be success- the generation ensures that all possible scenarios are covered fully verified except for two highly complex bus instructions with properties, but not that all transactions verify all outputs. involving nested loops and excluding three complex math However, the automatic gap detection of OneSpin 360 can be instructions. For these instructions, the control mechanisms used to close these gaps as well. of the PCP did not match our general pipeline model. It does V. APPLICATION not seem useful to extend the model for these cases, as they are very specific to the PCP implementation. Instead, having The above method has been implemented as a front-end found a good representation of most of the functionality based for OneSpin 360 MV; we call it FISACO (Formal Instruction on our processor model, the defined functions can be reused Set Architecture Compiler). It takes an architecture description for further manual verification. This has also been done for and automatically generates the instruction properties and the some additional functionality beyond the ISA, like loading consistency assertions in a form readable for 360MV. The and storing full register contexts. The overall verification of the mapping information needs to be supplied in a temporal logic PCP with our methodology took about 5 person months. Thus, format. The processor model formed by both the architecture we could achieve an estimated productivity gain of 100%. description and the mapping information can then be verified The verification has been carried out on 2.2 GHz work- and debugged using 360MV. station with 16 GB memory. Details on the proof times and In the following we will describe the application of the the used memory can be found in Table II. As can be seen, proposed method on an industrial processor design. We suc- the generated consistency assertions could be proved quickly. cessfully verified a control processor that is used in automotive Most of the time was spent for the verification of arithmetic applications, the Peripheral Control Processor (PCP) by In- and bus instructions. The latter ones are mostly multicycle fineon Technologies. First, the basic data of the PCP will be instructions and thus the design needs to be unrolled for up to

          TABLE II
          VERIFICATION RESULTS                                             [7]  J. Bormann, S. Beyer, and S. Skalberg. Equivalence verification between
                                                                                transaction level models and RTL at the example of processors, 2008.
      Category  Properties          Total time  Avg. time      Memory      [8]  European Patent Application, publication number EP1933245.
      assertions    34                   600 s     17.6 s      937 MB           J. Bormann and H. Busch. Method for determining the quality of a set
      arithmetic    15                 18788 s   1252.5 s     2500 MB           of properties, applicable for the verification and specification of circuits,
                                                                                2007. European Patent Application, publication number EP1764715.
      logic/bit     18                  3368 s    187.1 s     2500 MB      [9]  G. Braun, A. Nohl, A. Hoffmann, O. Schliebusch, R. Leupers, and
      jump          7                    220 s     31.4 s     2500 MB           H. Meyr.       A universal technique for fast and flexible instruction-set
      memory        16                  2135 s    133.4 s     2500 MB           architecture simulation. IEEE Transactions on CAD, 23(12):1625–1639,
      bus           10                  3214 s    321.4 s     2500 MB           2004.
      other         3                    147 s     49.1 s     1763 MB     [10]  C. Braunstein and E. Encrenaz. Formalizing the incremental design and
      total        103                  7:54 h                                  verification process of a pipelined protocol converter.      In IEEE Int’l
    Workshop on Rapid System Prototyping (RSP), pages 103–109, 2006.
    [11] S. Brewerton. Dual core processor solutions for IEC61508 SIL3 vehicle

26 cycles. Note that the two most difficult instructions make [12] safety systems. In Embedded World Conference, 2007. R. Brinkmann, P. Johansson, and K. Winkelmann. Advanced Formal up 3 hours and 48 minutes or 48% of the total runtime. Verification, chapter Application of Property Checking and Underlying Techniques, pages 125–166. Kluwer Academic Publishers, 2004. VI. CONCLUSIONS [13] R. Bryant, S. German, and M. Velev. Processor verification using effi- We have presented an approach for the automatic generation cient reductions of the logic of uninterpreted functions to propositional logic. ACM Transactions on Computational Logic, 2(1):93–134, 2001. of a complete property suite from an architecture description [14] J. Burch and D. Dill. Automatic verification of pipelined microprocessor of a processor. There is a clear distinction between the control. In Proc. of the Int’l Conf. on Computer Aided Verification, pages architecture model and the mapping information connecting [15] 68–80, 1994. W. B¨uttner. Complex hardware modules can now be made free of architecture to RTL implementation. The architecture model functional errors without sacrificing productivity. In Proc. of Int’l Conf. can be easily derived from an informal specification. [16] on Abstract State Machines, B and Z, LNCS, pages 1–3. Springer, 2008. The mapping from the specification to the implementation is A. Chattopadhyay, A. Sinha, D. Zhang, R. Leupers, G. Ascheid, and H. Meyr. Integrated verification approach during ADL-driven processor based on a general pipeline model that reflects the designer’s design. In IEEE Int’l Workshop on Rapid System Prototyping (RSP), intent in implementing a correct pipelined processor. A set [17] pages 110–118, 2006. of consistency assertions is automatically generated to check G. Fey and R. Drechsler. Design understanding by automatic property generation. In Workshop on Synthesis And System Integration of Mixed the correctness of the model and helps the user in finding Information technologies (SASIMI), pages 274–281, 2004. a suitable mapping. When the mapping and the architecture [18] R. Hosabettu, G. Gopalakrishnan, and M. Srivas. Verifying advanced description are finished, the generated property suite forms a microarchitectures that support speculation and exceptions. In Proc. of the Int’l Conf. on Computer Aided Verification, pages 521–537, 2000. model of the design, i.e. the verification is exhaustive. [19] D. Kroening and W. J. Paul. Automated pipeline design. In Proc. of the The practicability of the approach has been demonstrated [20] 38th Conference on Design Automation, pages 810–815. ACM, 2001. on an industrial processor design, a control processor from U. K¨uhne. Advanced automation in formal verification of processors. PhD thesis, University of Bremen, 2009. the automotive domain. With the presented methodology, the [21] U. K¨uhne, S. Beyer, and C. Pichler. Generating an efficient instruction estimated verification productivity could be doubled. set simulator from a complete property suite. In Proc. of the IEEE Int’l In the future, we want to integrate this approach with our au- [22] Symposium on Rapid System Prototyping, 2009. P. Manolios and S. Srinivasan. A complete compositional reasoning tomatic generation of efficient instructions set simulators (ISS) framework for the efficient verification of pipelined machines. In Proc. [21]. This allows to generate both a complete property suite [23] of the Int’l Conf. on Computer Aided Design, pages 863–870, 2005. and an efficient ISS from a common architecture description, M. Nguyen, M. Thalmaier, M. Wedler, J. Bormann, D. Stoffel, and W. Kunz. Unbounded protocol compliance verification using inter- ensuring that the generated ISS complies to the verified RTL val property checking with invariants. IEEE Transactions on CAD, code. A complementary extension would be the use of existing [24] 27(11):2068–2082, Nov. 2008. ADL like LISA, facilitating the integration of formal methods OneSpin Solutions GmbH, Munich, Germany. OneSpin Verification Solutions. http://www.onespin-solutions.com, 2009. into the tool chain for processor design. [25] F. Rogin, T. Klotz, G. Fey, R. Drechsler, and S. R¨ulke. Automatic generation of complex properties for hardware designs. In Design, REFERENCES [26] Automation and Test in Europe, 2008. J. Sawada and W. Hunt. Processor verification with precise exceptions [1] M. Aagaard. A hazards-based correctness statement for pipelined and speculative execution. In Proc. of the Int’l Conf. on Computer Aided circuits. In CHARME, pages 66–80, 2003. Verification, pages 135–146, 1998. [2] S. Beyer, C. Jacobi, D. Kroening, D. Leinenbach, and W. Paul. Putting [27] E. Schnarr, M. Hill, and J. Larus. Facile: a language and compiler it all together—formal verification of the VAMP. STTT Journal, 8(4– for high-performance processor simulators. In Proc. of the Conf. on 5):411–430, Aug. 2006. Programming language design and implementation, pages 321–331, [3] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking 2001. without BDDs. In Tools and Algorithms for the Construction and [28] M. Sheeran, S. Singh, and G. St˚almarck. Checking safety properties Analysis of Systems, volume 1579 of LNCS, pages 193–207. Springer using induction and a SAT-solver. In FMCAD, pages 127–144. Springer, Verlag, 1999. 2000. [4] N. Bombieri, F. Fummi, G. Pravadelli, and J. Marques-Silva. Towards [29] J. Skakkebæk, R. Jones, and D. Dill. Formal verification of out-of-order equivalence checking between TLM and RTL models. In Int’l Conf. on execution using incremental flushing. In Proc. of the Int’l Conf. on Formal Methods and Models for Codesign, pages 113–122, 2007. Computer Aided Verification, pages 98–109, 1998. [5] J. Bormann. Vollstaendige funktionale Verifikation. PhD thesis, Univer- [30] M. Velev. Formal verification of pipelined microprocessors with delayed sity of Kaiserslautern, 2009. branches. In Int’l Symp. on Quality Electronic Design, page 4pp., 2006. [6] J. Bormann, S. Beyer, A. Maggiore, M. Siegel, S. Skalberg, T. Black- [31] M. Wedler, D. Stoffel, and W. Kunz. Exploiting state encoding for more, and F. Bruno. Complete formal verification of TriCore2 and other invariant generation in induction-based property checking. In Asia and processors. In Design and Verification Conference (DVCon), 2007. South Pacific Design Automation Conference, pages 424–429, 2004.