Source 2d23ce85... — STIMSMITH

SOURCE ARCHIVE

SHA256: 2d23ce8537e6765c71ad245e36bf2f9a99b258f72cf5808c9cdfad190dcfcf51

URL: https://www.cecs.uci.edu/~papers/compendium94-03/papers/1996/dac96/pdffiles/23_1.pdf

TYPE: application/pdf

SIZE: 85.3 KB

FETCHED: 6/5/2026, 10:33:52 AM

EXTRACTOR: liteparse

CHARS: 44,163

EXTRACTED CONTENT


Code Generation and Analysis for the Functional Verification of Microprocessors

Anoosh Hosseini Dimitrios Mavroidis Pavlos Konas
               Silicon Graphics Inc.
              2011 N. Shoreline Blvd.,
              Mountain View, CA 94043
                   anoosh@sgi.com


Abstract

A collection of code generation tools which assist designers in the functional verification of high performance microprocessors is presented. These tools produce interesting test cases by using a variety of code generation methods including heuristic algorithms, constraint-solving systems, user-provided templates, and pseudo-random selection. Run-time analysis and characterization of the generated programs provide an evaluation of their effectiveness in verifying a microprocessor design, and suggest improvements to the code generation process. An environment combining the code generation tools with the analysis tools has been developed, and it has provided excellent functional coverage for several generations of high-performance microprocessors.

1 Introduction

Functional verification is a vital part in the design and implementation of high performance microprocessors. Both customer confidence and commercial success depend on a defect-free functional product which is introduced into the market in a timely fashion [1]. A design verification team (DVT) presently relies on extensive simulation-based testing of the microprocessor’s RTL model to achieve the functional coverage necessary for a design to be released to the manufacturing process. State-of-the-art microprocessors, however, achieve high performance through several advanced execution mechanisms [5]. The increased complexity introduced by these mechanisms forces DVT teams to increasingly depend on advanced code generation tools for the functional verification of microprocessors [1, 2, 3, 6].

Code generation tools create interesting instruction sequences which when simulated on the microprocessor’s RTL model can expose flaws and errors in the implementation. Code generation tools are divided into three major categories: user-assisting tools, pseudorandom and heuristic-based code generators.

User-assisting tools simplify and automate tedious tasks such as the permutation, iteration, and interleaving of existing instruction sequences into new sequences with interesting properties. Such tools make the generation of diagnostics for known cases easier and less time consuming. Pseudorandom code generators, on the other hand, focus on producing long sequences of legal instructions assuming that the random interaction of these instructions will produce conditions rarely created by compiler-generated code, or conceived by a programmer. Unfortunately, they usually produce code of poor quality. Finally, heuristic-based code generators combine user-provided attributes and properties with knowledge of the architecture and of the design to produce algorithms targeting the most complicated features of the design. They generate code of high quality by intelligently selecting instructions whose execution will create the proper conditions for an interesting case, which has not been previously covered, to arise.

Isolating a design flaw can be accomplished in two ways. The simplest approach is to generate self-checking code. The test program sets up a combination of conditions and then checks whether the RTL model reacted correctly to the given situation. Unfortunately, the state compare instruction sequence is usually too intrusive at the RTL level; it is coarse grain and, thus, not so accurate; it consumes precious simulation cycles; and it may burden the code generation tool by requiring it to maintain an extensive amount of state. The most efficient approach is to non-intrusively compare the traces generated by the simulation of the RTL model with the simulation traces of an architectural reference model. Such an approach frees the diagnostic program from continuously checking the reactions of the design under testing, it is more accurate, it allows for a more powerful comparison process to be employed, and it relieves the code generation tool from computing the results of all the instructions it generates.

The execution of most tool-generated diagnostic programs results in instruction sequences which the designer can usually neither completely anticipate nor fully evaluate. It is important for the designer, therefore, to analyze the sequence of instructions generated by the tool, to characterize their behavior, and to evaluate their effectiveness using several architectural and microarchitectural metrics. Such metrics relate to utilization across the different units of the microprocessor and include instruction histograms, event coverage, and queue sizes. Furthermore, we can use these metrics in subsequent code generations to improve the quality of the generated programs as well as the efficiency of the generators themselves.

This paper presents a collection of advanced code generation tools employed in the functional verification of high-performance microprocessors. In section 2 we briefly outline our verification methodology. In sections 3 through 6 we present a few of our sophisticated code generation tools. In section 7 we present an analysis tool which is used in evaluating diagnostic programs. Finally, section 8 summarizes our approach to simulation-based verification of microprocessor designs.

33rd Design Automation Conference
Permission to make digital/hard copy of all or part of this work for personal or class-room use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DAC 96 - 06/96 Las Vegas, NV, USA ©1996 ACM, Inc. 0-89791-833-9/96/0006..$3.50

2 A Functional Verification Methodology

Functional verification aims at isolating design and implementation flaws so that the design released to the manufacturing process is fully operational; that is, the RTL model exhibits the same behavior as an architectural simulator would when executing the same instruction sequence. As the complexity of new high-performance microprocessors increases, as the quality expectations of new products are rising, and as the time-to-market decreases, functional verification becomes a more difficult process and emerges as the bottleneck of the development cycle.

In order to improve the efficiency and the effectiveness of functional verification, we follow the methodology outlined in Figure 1. First, four different sources (verifiers) generate diagnostic programs. Hand-written directed diagnostics are developed by the members of the DVT team and include architectural (AVP), microarchitectural (MVP), and implementation (IVP) verification programs. These diagnostics set up and check conditions deemed interesting by the developer of each test. Second, advanced pseudorandom code generators produce long instruction sequences which aim at creating complicated interaction patterns among the instructions. Such instruction sequences are rarely conceived by a programmer or generated by a compiler. Third, sophisticated tools generate instruction sequences which stress the microprocessor model in ways that cannot be achieved by the first two code generation approaches. Finally, “real world” software applications are used to ensure that the design implements correctly and efficiently the most common operations.

[Figure 1: Functional Verification Methodology. Diagram labels: hand-written diags (AVP, MVP, IVP); tool-generated diags; randomly-generated diags; real-world applications; recycled diags; RTL simulator; architectural simulator; RTL trace; architectural trace; Streamer; Refdif (compare); Profiler (coverage, attributes, analyze); DDB; X-based debug facilities; other profiler applications; feedback path.]

The diagnostic programs generated in any of the above ways are compiled and provided as input into two simulators. The RTL simulator represents the specific microprocessor’s implementation. The architectural simulator, on the other hand, describes the behavior of any microprocessor design implementing the given architecture as the latter is specified in the architectural manual. The execution of the object code on the two simulators produces two traces. The architectural trace captures how the architecturally visible state changes as a result of executing the instructions in the diagnostic. The RTL trace, on the other hand, captures how the microprocessor’s state changes as a result of executing the same sequence of instructions. However, because of the large number of advanced implementation features contained in state-of-the-art microprocessors the two traces may not be the same. A conversion tool (streamer) transforms the RTL trace into a trace representing the changes in the architectural state as they are deduced from the information in the RTL trace. There are several interesting and hard issues involved in such a conversion process, but they are beyond the scope of this paper.

Once we have obtained an architectural trace from the RTL, we compare it with the trace produced by the architectural simulator, using an architectural comparator (refdif). If the two traces differ, then the model does not behave correctly, and the diagnostic has identified a flaw in the microprocessor’s implementation. A powerful X-based graphical environment which exploits the information provided by the architectural comparator can then be used to debug the identified error.

In addition to identifying flaws in the implementation, traces of diagnostic program executions are also used to analyze the test programs, determine their properties and characteristics, and evaluate their effectiveness (Profiler). The results of this analysis and evaluation are stored in a diagnostic database, and they are used subsequently to improve the quality of the generated code as well as the effectiveness of the code generation tools.

In the following sections we take a closer look at the code generation tools as well as at the analyzer and the diagnostic database. These are the most important parts of our approach to code generation for the functional verification of microprocessors.

3 SBVer: An External Interface Verifier

High-performance microprocessors employ complex external interface units which buffer requests, allow multiple outstanding loads and stores, maintain multi-level caches, and perform cache coherency in multiprocessor configurations. The many states of the external interface, combined with an abundance of asynchronous events from other devices, make the external interface a verification challenge.

For this purpose we have developed SBVer (Store Buffer Verifier), a code generator which focuses on exercising the external interface and the cache management units of the microprocessor. Knowledge about the design of the primary and secondary caches, of the various address spaces, and of the memory management unit has been built into the tool. SBVer, combined with heuristic algorithms, produces sequences of instructions which cause interesting interactions between the processor, the caches, and the main memory. SBVer also has the ability to program external event generators in the system model so that they interact with the processor in a coordinated fashion. For system verification purposes, SBVer may also produce self-checking code based on an internal memory model maintained during code generation. Finally, SBVer has a large number of configuration options in order to provide the user

with control over the tool’s behavior. SBVer has been successful in finding flaws in four generations of microprocessors, and in various hardware systems.

4 BRVer: A Branch Verifier

Many pseudorandom code generators avoid complex branching sequences, especially backward jumps, in order to prevent infinite loops. On the other hand, the length of the produced pseudorandom programs results in the verification engineers having limited knowledge of the program flow, and of whether critical sections of the program have been executed. Furthermore, new microprocessors attempt to predict the direction of branches and execute instructions beyond a branch speculatively. The result of speculative execution is a significant increase in the number of branch related cases which need to be examined. In order to address these issues in a systematic way, we have developed BRVer. Figure 2 shows the various components of BRVer and how the branches are modeled.

[Figure 2: BRVer Design. Diagram labels: random or user-designed Abstract Graph Description; Internal Branch Simulation; Branch.s (code+data to control flow). A branch node consists of filler code, branch setup code, filler code, the branch, the branch delay slot, and filler code. Sample AGD lines as extracted from the figure:
    0 27 FTFTFTTTFTFTTFFFFTTFTFFTFTTFT
    1 10 TTFTTTTFFTTFFFFFFFFFFFFFFFFF
    2 FTTFFFTFTFTFTFFFFFTTTTFFTTFF
    3 13 FFFFFFTTFFFFFFTFFFF
    4 0 FT
    5 FTFTFTFTTTTTFFFF
    6 8 FTFFFTTTTFFFTTTTFF
    7 TTTFFTFTFTFFFTFTTF
    8 28 18 FTFTTFF
    9 FTFTTFTTFFFFTFFTTFFTTTFF]

BRVer accepts as input a large number of configuration parameters and an Abstract Graph Description (AGD) which is either provided by the user or it is generated heuristically. The input AGD contains the number of nodes (effectively branches) in the graph, how the nodes are connected to one another, and for each branch the action to be performed (fall through or take the branch) upon successive arrivals. BRVer “compiles” the AGD input producing an instruction stream whose run time behavior correctly represents the flow described.

BRVer also accepts user provided input streams as filler code in between branches. This proves to be a convenient way to apply the branch management mechanisms to code produced by other tools such as SBVer and Theo.

5 Multiprocessor Verification

Over the last few years, most manufacturers develop multiprocessor ready microprocessors [7, 8]. As a result, it is essential that the DVT team verifies the microprocessor’s mechanisms facilitating the sharing of information across the processors of a multiprocessor (MP) machine. Such a verification process entails two important issues. First, we need to verify the microprocessor’s correct operation under stressful conditions, which rarely, if at all, happen during its operation in a deliverable MP system. Second, we need to verify its functionality and performance when the multiprocessor is running “real world” parallel applications.

5.1 MPVer: A Multiprocessor Verifier

The verification of multiprocessing features is complicated by the interaction between multiple code streams; the unpredictable nature of MP arbitration; and the limited number of MP test suites available to the verification engineer. In order to address these issues, we have used an abundance of asynchronous external events in a uniprocessor environment, as well as developed an MP code generator.

In general, MP verification necessitates the testing of cache coherency protocols and of the correct operation of MP primitives. Generating MP test cases requires the sharing of data between processors combined with locking mechanisms which manage accesses to shared data structures, and which synchronize concurrently executing instruction streams. Computing the expected results of MP test programs is challenging and it is not easily accomplished with a traditional reference machine. MPVer successfully addresses these issues by generating multiple code streams which interact with each other, and yet they are able to verify the produced results with fine granularity. The runtime flow and relationship between the code streams is shown in Figure 3.

[Figure 3: False Sharing in MPVer. Diagram labels: cells numbered 0 through 7; Compute; Cpu ID; CPU 0, CPU 1, CPU 2, CPU 3; Final Check.]

A novel approach is used to exploit the important issue of false sharing. Through this approach we are able to achieve high processor interaction and provide full coverage of the cache coherency mechanisms without using expensive locking and synchronization operations, which interfere with the MP program flow and which even limit the number of interesting situations.

True data sharing is supported and tested through the use of locks. However, because intermediate values are unpredictable, results are checked after all MP operations are guaranteed to have finished. For the verification of a microprocessor in a distributed shared memory system, we have parameterized MPVer with the frequency with which each CPU is to access the different memory segments. Such a parameterization is important because we are able to program different traffic patterns, to stress routing algorithms, and to observe MP system stability.

MPVer produces portable code which can run on either a simulation model or a true MP system. In both environments, MPVer

has been very successful in finding MP related microprocessor and system hardware flaws.

5.2 MPApplicationVerifier

MPApplicationVerifier (MPAV) is an environment for the development and execution of “real world” parallel applications as diagnostics in the MP verification of a microprocessor. The environment supports thread-based parallel execution, and it can be considered as a user-level, bare-minimum operating system [4].

The user of the environment writes a single C program, augmented with directives which support its parallel execution. The C program is compiled into two executables which facilitate three execution modes. In the first mode, the user executes the application natively on a workstation or on an MP system. In that way the user is able to debug the application code, and improve its performance and efficiency. In the other two modes of execution, the parallel program is simulated by an architectural simulator and by the microprocessor’s RTL model. The purpose of these two execution modes is to test the hardware under construction both at the microprocessor level and at the system level. These modes of execution allow us not only to isolate implementation flaws, but also to pinpoint performance problems.

So far we have ported onto this environment several “real world” parallel applications including the SPLASH-2 benchmarks [9]. Other parallel applications including chaotic algorithms and branch-and-bound algorithms are currently being ported. Incorporating a new application into the MPAV environment is simple. The user only needs to write three “interface functions.” Two of these functions perform the initializations of the data structures of the parallel program, whereas the third function provides the environment with the “starting points” of the parallel program’s execution. In addition, we can easily incorporate sequential applications into the MPAV environment, such as the diagnostics programs created by other code generation tools.

A powerful, yet flexible, X-based user interface makes MPAV an easy to use MP code generation and execution environment. The user selects the applications to be included in a particular execution, sets the corresponding input parameters for each included application, and then compiles and executes the resulting suite. MPAV’s user interface makes the construction and execution of MP test programs a simple exercise for the user.

6 Theo: A Sophisticated Code Generator

State-of-the-art microprocessors employ several advanced techniques in order to improve their performance. At any given time several partially executed instructions are active (i.e. at some stage of their execution) in the processor. Instructions move between different units as resources become available. In order to reduce interruptions in the execution pipeline, which result in lost performance, computed results are bypassed to previous pipeline stages, and state is committed to registers or to memory many cycles after the instruction was issued. Historically, most design flaws have been attributed to the implementation of these complex features. The design flaws typically exhibit themselves when sequences of dependent instructions activate a combination of conditions within the design.

Theo is based on the idea that if we focus on instruction sequences to which a particular implementation may be sensitive, then we can reduce the number of test cases examined, as well as improve the quality of the verification code generated. The overall architecture of Theo is shown in Figure 4.

[Figure 4: Theo Architecture. Diagram labels: User Templates; Parsed Instruction Class Tree; ENGINE; Branch Manager; Address Manager; Register Allocation Manager; Event Manager; Data Operand Manager; THEO.s.]

The input to Theo is a collection of templates written in a superset of the assembly language, which permits instruction specification at any level of detail, and, at the same time, allows the use of symbolic notation for operands. These templates define sequences of instructions representing “constraints.” Theo allows the users to focus on developing sequences for their own area of interest, while Theo’s engine searches for their “optimal” placement which satisfies the specified constraints. A typical hand-written diagnostic only stresses a particular unit, while other sections of the microprocessor remain idle. Theo, on the other hand, attempts to combine templates so that all units of the microprocessor are active simultaneously.

Theo uses a constraint solving engine to produce Intermediate Code Representation (ICR) through repetitive application of template instances. Subsequently, it performs instruction assignment, global resource allocation, and condition setup to produce an assembly program ready for simulation [2].

Templates only use symbolic names for registers. The actual register assignment is performed by Theo during one of the last phases in the code generation process. The use of symbolic instruction class names, register names, and operands in templates is encouraged, since this allows Theo to select the actual assembly instructions and operands using sophisticated heuristic algorithms. At the same time, such a notation permits the verification engineer to express the conditions of interest in the most generic way.

Code generation starts with an uninstantiated ICR. Each element in this ICR is a place holder for an instruction which initially has no particular attribute or property. Subsequently, Theo selects one of the user provided templates and applies it to the ICR; that is, the template instruction sequence, its properties, and its constraints are transferred into the ICR. Theo’s template placement algorithm avoids placing templates one after the other. Rather, it strives to achieve overlap between templates while maintaining the requirements of each template. This is accomplished by checking for subset properties, by constraint solving, and by temporary unification in order to verify that an overlap can occur. If all resource requirements are met, then the unification becomes permanent. Successive application of the input templates to the ICR results in the further refinement and growth of the code. Template placement stops when the code size requirement is met.

Theo goes through the ICR assigning actual instructions for any instruction class references that may exist. Then, the engine consults the register allocation manager, the address manager, the branch manager, the operand manager, and the external event manager in order to allocate resources and insert condition setups. Finally, the ICR is translated into assembly code.
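The overlap-seeking placement of templates into the ICR can be illustrated with a small sketch. Everything here is an assumption made for illustration: the slot encoding, the use of `None` as a template gap, the names `try_place` and `place_templates`, and the greedy first-fit search, which stands in for Theo's constraint solver and is not the authors' algorithm.

```python
from typing import List, Optional, Tuple

# An ICR slot is either None (an uninstantiated placeholder) or a
# (instruction_class, symbolic_operands) pair. This encoding is hypothetical.
Instr = Tuple[str, Tuple[str, ...]]
Slot = Optional[Instr]


def try_place(icr: List[Slot], template: List[Slot], start: int) -> Optional[List[Slot]]:
    """Tentatively unify `template` with `icr` at index `start`.

    Returns the unified ICR, or None when the template does not fit or
    collides with an instruction that is already placed.
    """
    if start + len(template) > len(icr):
        return None
    trial = list(icr)  # temporary unification happens on a copy
    for offset, instr in enumerate(template):
        if instr is None:  # template gap: another template may use this slot
            continue
        existing = trial[start + offset]
        if existing is not None and existing != instr:
            return None  # conflicting requirements, reject this position
        trial[start + offset] = instr
    return trial


def place_templates(templates: List[List[Slot]], size: int) -> List[Slot]:
    """First-fit placement: commit the earliest position where unification succeeds."""
    icr: List[Slot] = [None] * size
    for template in templates:
        for start in range(size):
            trial = try_place(icr, template, start)
            if trial is not None:
                icr = trial  # the unification becomes permanent
                break
    return icr


# Two hypothetical templates, each with a gap that the other can fill.
load_use = [("load", ("r_a", "addr0")), None, ("alu", ("r_b", "r_a"))]
fp_chain = [("fpmul", ("f_a", "f_b")), None, ("fpadd", ("f_c", "f_a"))]
icr = place_templates([load_use, fp_chain], size=6)
print([slot[0] if slot else None for slot in icr])
```

In this sketch the second template lands inside the gap of the first, so the two dependent chains interleave instead of running back to back, which is the effect the placement step aims for.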
Though this technique for code generation is complex, it has the unique property that it can create new test sequences from previously independent blocks which now interact with each other. By overlapping templates, we are also able to activate multiple units of the microprocessor while still maintaining the sequence and conditions represented by each template. The various managers utilized by Theo encapsulate heuristic and formal algorithms which may be applied across the entire code stream and which can be tuned with user biasing.
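As an illustration of how user biasing might steer the assignment of actual instructions to symbolic instruction classes, consider the following sketch. The class table, the mnemonics, the `bias` dictionary, and the function name `assign_instructions` are all hypothetical; the paper does not describe the form that biasing takes in Theo's managers.

```python
import random
from typing import Dict, List

# Hypothetical mapping from symbolic instruction classes to concrete mnemonics.
INSTRUCTION_CLASSES: Dict[str, List[str]] = {
    "load": ["lb", "lh", "lw"],
    "alu": ["add", "sub", "and", "or"],
}


def assign_instructions(icr_classes: List[str], bias: Dict[str, float], seed: int = 0) -> List[str]:
    """Replace each symbolic class with a concrete mnemonic.

    `bias` maps a mnemonic to a relative weight; unlisted mnemonics get 1.0.
    A weight of 0.0 excludes a mnemonic, provided its class keeps at least
    one candidate with a positive weight.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    program = []
    for cls in icr_classes:
        candidates = INSTRUCTION_CLASSES[cls]
        weights = [bias.get(mnemonic, 1.0) for mnemonic in candidates]
        program.append(rng.choices(candidates, weights=weights, k=1)[0])
    return program


# Favor "add" five to one over its siblings and never emit "lw".
classes = ["load", "alu", "load", "alu"] * 50
program = assign_instructions(classes, bias={"lw": 0.0, "add": 5.0}, seed=1)
print(program[:8])
```

Seeding the generator keeps a failing diagnostic reproducible, which matters when the same program has to be regenerated for debugging.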
7 Diagnostic Programs Evaluation

7.1 Code Analysis and Diagnostics Retrieval

In their effort to cover as many interesting cases of the given architecture as possible, the code generators presented so far tend to create a large number of lengthy diagnostic programs. This abundance of test programs forces us to seek a systematic and automated way of analyzing the run time behavior of these diagnostics, and post processing this information into concise and meaningful metrics.

[Figure 5: Code Analysis Methodology. Diagram labels: Diagnostic; Simulation(s).]

Several reasons warrant such an evaluation. First, the code generation tools could use the information from the analysis tool as a feedback in order to improve their effectiveness. Given the pseudorandom nature of the code generation tools, such an analysis has been proven extremely useful in creating diagnostic programs which cover in depth specific sets of interesting cases.
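One simple way such feedback could be applied is to weight each interesting event inversely to how often earlier diagnostics have already covered it. The sketch below assumes this inverse-frequency scheme, the event names, and the function name `feedback_weights`; none of them come from the paper, which does not specify how the feedback is computed.

```python
from typing import Dict, Iterable


def feedback_weights(coverage: Dict[str, int], targets: Iterable[str]) -> Dict[str, float]:
    """Return normalized generation weights that favor rarely covered events."""
    # An event never seen so far gets the largest raw weight (1.0).
    raw = {event: 1.0 / (1 + coverage.get(event, 0)) for event in targets}
    total = sum(raw.values())
    return {event: weight / total for event, weight in raw.items()}


# Hypothetical per-event hit counts accumulated from earlier simulations.
coverage = {"dcache_miss": 120, "tlb_refill": 3}
weights = feedback_weights(coverage, ["dcache_miss", "tlb_refill", "fp_exception"])
for event, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{event}: {weight:.3f}")
```

A generator could then sample its next target event in proportion to these weights, so that cases no diagnostic has reached yet are attempted first.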
Second, even though the tools can generate a large number of diagnostics relatively fast, only a limited number of them can actually be simulated daily on the RTL model, because this model is complex and, thus, expensive to run. Code analysis is valuable when trying to decide the subset of the created diagnostics that should be simulated on the RTL model.

Third, as the design evolves the number of accumulated diagnostics continuously increases, and the selection of the diagnostics that cover a specific case hardens. One way to address this issue is to build a diagnostic database (DDB) containing all the test programs, along with some information characterizing their run-time behavior. This information can later be used to retrieve a set of diagnostics with particular characteristics from the DDB.

In the following two sections we describe the two major parts of the evaluation process: the code analysis, which for each diagnostic deduces a set of attribute values, and the systematic storage and retrieval of this information into and from the database. The entire process is outlined in Figure 5.

7.2 Code Analysis - The Profiler

Each generated diagnostic program is currently executed on two simulators. The first one is an architectural simulator which

about the interesting cases covered during the particular simulation. Examples of interesting cases include cache hits and misses, types of exceptions, and queue sizes. This information is later stored in the DDB.

In order to probe into the trace files systematically and extract interesting information quickly, we have developed the Profiler library which is used as an interface between the analysis code and the trace files. It provides the user with a mechanism for “stepping” through the simulation cycles recorded in a trace file, including going forward and backwards in simulation time. At any given “step” (simulation cycle) the user can retrieve the value of any one of the variables which constitute the machine state.

The library approach was chosen mainly because of the flexibility it provides. Due to its object-oriented design, the interface remains the same irrespectively of the type or format of the trace file being processed. This interface allows the user to write C++ programs that are guaranteed to work in current and future simulation environments.

In addition to diagnostic evaluation, the Profiler library has also been used in a number of other tasks. We have used it to check
is used as a reference machine. This simulator is fast and inexpen-     transition coverage in the RTL model; to compare traces from dif-
sive to use. The second one is the RTL simulator, representing the      ferent models; and to verify that certain (illegal) conditions never
particular microprocessor implementation. This simulator is much        arise during the simulation of the model.
slower than the architectural one, and much more expensive to use.      7.3                    Storage and Retrieval of the Results - The Di-
        Whenever a diagnostic is run on any of the two simulators, a        agnostic Database
trace file containing information about each execution cycle of the            Every time a diagnostic program is simulated, a Profiler-based
diagnostic is created. The current model “passes” the specific di-      analysis code is executed on the trace file which represents the par-
agnostic when the RTL and the architectural traces match under the      ticular simulation.    The results of this analysis are typically ex-
architectural comparator (refdif in Figure 1).                          pressed as a set of values for a prespecified, common for all diag-
         In order to analyze the execution of a diagnostic, we post-    nostics, set of attributes. Example of attributes generated during
process the trace file created during the execution of the code on      the analysis include the number of instructions executed, the num-
either of the simulators. By doing so, we can deduce information        ber of cache hits and misses, and the lengths of various queues in










Attribute names      Author      =          Jones  Attribute

(same for all diags) ICount = 709 values Immediate = 457 Analysis Arithmetic = 44 (Profiler application) Exception = 4 CacheError = 1 Diag ExtInt = 3 Data Feedback to tool .......... Attributes Base (Profiler application)

Code Other Profiler Generation Trace Applications Tool File(s) −State coverage

                     Ans −Comparison of traces
           (User provides a query)

the microprocessor.
We developed a tool called Ans which compresses all this information into a highly efficient, object-based diagnostic database (ODDB). Ans both modifies the database and queries it to retrieve a set of objects (diagnostics) that satisfy a given set of criteria. Ans is a general tool, designed to handle any object-based collection of data, in a highly sophisticated and user-friendly way.
When retrieving objects from a database, Ans selects and returns the set of objects which satisfy an input set of criteria. These criteria are usually expressed in some form of equalities or inequalities on the attribute values of the objects stored in the database. For example, the user can ask for all the diagnostics that contain fewer than 2000 instructions (i.e. attribute 'ICount' is less than 2000), and take at least 10 floating point exceptions (i.e. attribute 'FP Exception' is greater than or equal to 10). The following attribute form describes these two constraints: ((ICount < 2000)&&(FP Exception >= 10)). Given this form, Ans would select a subset of the diagnostics currently stored in the given DDB whose attribute values satisfy the given constraints and would return these diagnostics to the user.
8    Summary
In this paper we have presented a collection of advanced code generation tools employed in the simulation-based verification of high-performance microprocessor designs. Each of the presented tools addresses a unit of the microprocessor which historically has been a significant source of hard-to-find flaws. We presented SBVer, a code generator which focuses on exercising the external interface and cache management units of the microprocessor. Then we described BRVer which targets the branch mechanisms of the design; these mechanisms become increasingly complicated as designers attempt to improve the performance of the chip through speculative execution. We addressed MP verification by presenting two tools with complementary roles. MPVer targets the sharing of information across the processors of an MP system as well as the communication between processors. MPAV, on the other hand, provides an environment for the development and execution of “real world” parallel applications on our simulators. Finally, Theo provides a state-of-the-art environment for the generation of diagnostics based on user provided templates, constraint solving systems, and knowledge of the microprocessor design.
We have also presented the Profiler and Diagnostic Database which comprise a set of tools for the analysis of diagnostics and their efficient storage and retrieval. These tools provide us with efficient ways to evaluate the code produced by the generators and to propagate this information back to the tools so that we can improve their effectiveness.
We are currently working on expanding our tool-set with highly specialized code generators as well as powerful generic ones. Furthermore, we are extending our sophisticated heuristic algorithms to cover areas that have not yet been addressed. Finally, we incorporate all our verification tools in an integrated environment which supports the easy and efficient production of high quality diagnostic programs.
Design verification is an important part of the development of a microprocessor. As time-to-market decreases and the complexity of the high-performance microprocessors increases, design verification becomes the bottleneck of the development cycle. Good verification tools become vital to the success of any microprocessor design, and their significance will continue to increase as we move to even higher performance microprocessors.
References
[1] M. Bass, T.W. Blanchard, D.D. Josephson, D. Weir, and D.L. Halperin. Design Methodologies for the PA 7100LC Microprocessor. Hewlett-Packard Journal, 46(2):23–35, April 1995.
[2] A. Chandra et al. AVPGEN – A Test Generator for Architecture Verification. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 3(2):188–200, June 1995.
[3] B. Turumella et al. Design Verification of a Super-Scalar RISC Processor. In Twentyfifth International Symposium on Fault Tolerant Computing, pages 472–477, June 1995.
[4] I. Foster. Designing and Building Parallel Programs. Addison Wesley, 1995.
[5] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.
[6] M. Kantrowitz and L.M. Noack. Functional Verification of a Multi-issue, Pipelined, Superscalar Alpha-Processor – the Alpha 21164 CPU Chip. Digital Technical Journal, 7(1):136–144, August 1995.
[7] D. Marr, S. Thakkar, and R. Zucker. Multiprocessor Validation of the Pentium Pro Microprocessor. In Proceedings of COMPCON ’96, pages 395–400, January 1996.
[8] B. O’Krafka, S. Mandyam, J. Kreulen, R. Raghavan, A. Saha, and N. Malik. MTPG: A Portable Test Generator for Cache-Coherent Multiprocessors. In Fourteenth Annual Phoenix Conference on Computers and Communications, pages 38–44, March 1995.
[9] S.C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd ISCA, pages 24–36, June 1995.
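As an illustration of the evaluation flow described in Section 7 (stepping through a trace, deducing attribute values, and querying the database), a minimal C++ sketch follows. It is not the actual Profiler or Ans interface, which the paper does not show. The names TraceCursor, analyze, Constraint, and selectDiagnostics, the in-memory trace representation, and the state variables "retired" and "fp_exc" are all assumptions made for this example. Only the attribute names 'ICount' and 'FP Exception' and the example query (fewer than 2000 instructions, at least 10 floating point exceptions) come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical machine state for one simulation cycle: variable name -> value.
using Cycle = std::map<std::string, long>;
// Attribute values deduced for one diagnostic: attribute name -> value.
using Attributes = std::map<std::string, long>;

// Hypothetical stand-in for a Profiler-style cursor over an in-memory trace.
class TraceCursor {
public:
    explicit TraceCursor(std::vector<Cycle> trace) : trace_(std::move(trace)) {}
    bool atEnd() const { return pos_ >= trace_.size(); }
    void forward() { if (pos_ < trace_.size()) ++pos_; }   // one cycle later
    void backward() { if (pos_ > 0) --pos_; }              // one cycle earlier
    // Value of a state variable at the current cycle (0 if not recorded).
    long value(const std::string& name) const {
        assert(!atEnd());
        auto it = trace_[pos_].find(name);
        return it == trace_[pos_].end() ? 0 : it->second;
    }
private:
    std::vector<Cycle> trace_;
    std::size_t pos_ = 0;
};

// Walk the trace once and accumulate attribute values.
// "retired" and "fp_exc" are assumed per-cycle variables, not real signal names.
Attributes analyze(const std::vector<Cycle>& trace) {
    Attributes attrs{{"ICount", 0}, {"FP Exception", 0}};
    for (TraceCursor cur(trace); !cur.atEnd(); cur.forward()) {
        attrs["ICount"] += cur.value("retired");
        attrs["FP Exception"] += cur.value("fp_exc");
    }
    return attrs;
}

// One database entry: a diagnostic and its attribute values.
struct Diagnostic {
    std::string name;
    Attributes attrs;
};

// A single equality or inequality on an attribute value.
enum class Op { Less, GreaterEq, Equal };
struct Constraint {
    std::string attr;
    Op op;
    long bound;
};

bool holds(const Constraint& c, const Attributes& a) {
    auto it = a.find(c.attr);
    if (it == a.end()) return false;  // a missing attribute never satisfies a constraint
    switch (c.op) {
        case Op::Less:      return it->second < c.bound;
        case Op::GreaterEq: return it->second >= c.bound;
        case Op::Equal:     return it->second == c.bound;
    }
    return false;
}

// Return the names of all diagnostics that satisfy every constraint (logical AND).
std::vector<std::string> selectDiagnostics(const std::vector<Diagnostic>& ddb,
                                           const std::vector<Constraint>& query) {
    std::vector<std::string> hits;
    for (const auto& d : ddb) {
        bool ok = true;
        for (const auto& c : query) ok = ok && holds(c, d.attrs);
        if (ok) hits.push_back(d.name);
    }
    return hits;
}
```

Under these assumptions, the query from Section 7.3 would be written as the constraint list {{"ICount", Op::Less, 2000}, {"FP Exception", Op::GreaterEq, 10}}. Keeping the cursor separate from the analysis code mirrors the stated motivation for the library approach: the analysis depends only on the stepping interface, not on the trace file format.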