Code Generation and Analysis for the Functional Verification of Microprocessors
Anoosh Hosseini Dimitrios Mavroidis Pavlos Konas
Silicon Graphics Inc.
2011 N. Shoreline Blvd.,
Mountain View, CA 94043
anoosh@sgi.com
Abstract

A collection of code generation tools which assist designers in the functional verification of high performance microprocessors is presented. These tools produce interesting test cases by using a variety of code generation methods including heuristic algorithms, constraint-solving systems, user-provided templates, and pseudorandom selection. Run-time analysis and characterization of the generated programs provide an evaluation of their effectiveness in verifying a microprocessor design, and suggest improvements to the code generation process. An environment combining the code generation tools with the analysis tools has been developed, and it has provided excellent functional coverage for several generations of high-performance microprocessors.

1 Introduction

Functional verification is a vital part of the design and implementation of high performance microprocessors. Both customer confidence and commercial success depend on a defect-free functional product which is introduced into the market in a timely fashion [1]. A design verification team (DVT) presently relies on extensive simulation-based testing of the microprocessor’s RTL model to achieve the functional coverage necessary for a design to be released to the manufacturing process. State-of-the-art microprocessors, however, achieve high performance through several advanced execution mechanisms [5]. The increased complexity introduced by these mechanisms forces DVTs to depend increasingly on advanced code generation tools for the functional verification of microprocessors [1, 2, 3, 6].

Code generation tools create interesting instruction sequences which, when simulated on the microprocessor’s RTL model, can expose flaws and errors in the implementation. Code generation tools are divided into three major categories: user-assisting tools, pseudorandom code generators, and heuristic-based code generators.

User-assisting tools simplify and automate tedious tasks such as the permutation, iteration, and interleaving of existing instruction sequences into new sequences with interesting properties. Such tools make the generation of diagnostics for known cases easier and less time consuming. Pseudorandom code generators, on the other hand, focus on producing long sequences of legal instructions, assuming that the random interaction of these instructions will produce conditions rarely created by compiler-generated code, or conceived by a programmer. Unfortunately, they usually produce code of poor quality. Finally, heuristic-based code generators combine user-provided attributes and properties with knowledge of the architecture and of the design to produce algorithms targeting the most complicated features of the design. They generate code of high quality by intelligently selecting instructions whose execution will create the proper conditions for an interesting case, which has not been previously covered, to arise.

Isolating a design flaw can be accomplished in two ways. The simplest approach is to generate self-checking code. The test program sets up a combination of conditions and then checks whether the RTL model reacted correctly to the given situation. Unfortunately, the state compare instruction sequence is usually too intrusive at the RTL level; it is coarse-grained and, thus, not so accurate; it consumes precious simulation cycles; and it may burden the code generation tool by requiring it to maintain an extensive amount of state. The most efficient approach is to non-intrusively compare the traces generated by the simulation of the RTL model with the simulation traces of an architectural reference model. Such an approach frees the diagnostic program from continuously checking the reactions of the design under test, it is more accurate, it allows for a more powerful comparison process to be employed, and it relieves the code generation tool from computing the results of all the instructions it generates.

The execution of most tool-generated diagnostic programs results in instruction sequences which the designer can usually neither completely anticipate nor fully evaluate. It is important for the designer, therefore, to analyze the sequence of instructions generated by the tool, to characterize their behavior, and to evaluate their effectiveness using several architectural and microarchitectural metrics. Such metrics relate to utilization across the different units of the microprocessor and include instruction histograms, event coverage, and queue sizes. Furthermore, we can use these metrics in subsequent code generations to improve the quality of the generated programs as well as the efficiency of the generators themselves.
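
As a rough illustration of two of the metrics named above, the following Python sketch computes an instruction histogram and a simple event-coverage ratio from a trace. The trace layout (one `(opcode, events)` record per executed instruction), the event names, and both function names are assumptions made for this example; the paper does not describe its trace format at this level.

```python
from collections import Counter

# Hypothetical trace: one (opcode, set_of_events) record per executed instruction.
TRACE = [
    ("lw", {"dcache_miss"}),
    ("add", set()),
    ("lw", {"dcache_hit"}),
    ("beq", {"branch_taken"}),
    ("add", set()),
]

# Hypothetical list of events the verification team wants to see exercised.
EVENTS_OF_INTEREST = {"dcache_hit", "dcache_miss", "branch_taken", "fp_exception"}


def instruction_histogram(trace):
    """Count how often each opcode appears in the trace."""
    return Counter(opcode for opcode, _ in trace)


def event_coverage(trace, events_of_interest):
    """Return (fraction of interesting events seen at least once, unseen events)."""
    seen = set()
    for _, events in trace:
        seen |= events & events_of_interest
    missing = events_of_interest - seen
    return len(seen) / len(events_of_interest), missing


histogram = instruction_histogram(TRACE)
coverage, missing = event_coverage(TRACE, EVENTS_OF_INTEREST)
print(histogram.most_common())
print(f"coverage = {coverage:.2f}, missing = {sorted(missing)}")
```

The set of unseen events is the kind of information that could be fed back to a generator so that later diagnostics target what earlier ones missed.
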
33rd Design Automation Conference. Permission to make digital/hard copy of all or part of this work for personal or class-room use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 96 - 06/96 Las Vegas, NV, USA 1996 ACM, Inc. 0-89791-833-9/96/0006..$3.50

This paper presents a collection of advanced code generation tools employed in the functional verification of high-performance microprocessors. In section 2 we briefly outline our verification methodology. In sections 3 through 6 we present a few of our sophisticated code generation tools. In section 7 we present an analysis tool which is used in evaluating diagnostic programs. Finally,
section 8 summarizes our approach to simulation-based verification of microprocessor designs.

[Figure 1: Functional Verification Methodology. Block labels: hand-written diags (AVP, MVP, IVP), tool-generated diags, randomly-generated diags, real-world applications, recycled diags; RTL simulator and architectural simulator; RTL trace and architectural trace; streamer; refdif (BUG?); X-based debug facilities; Profiler (trace analysis, coverage, compare, other profiler applications); Ans; attributes; DDB; feedback path.]

2 A Functional Verification Methodology

Functional verification aims at isolating design and implementation flaws so that the design released to the manufacturing process is fully operational; that is, the RTL model exhibits the same behavior as an architectural simulator would when executing the same instruction sequence. As the complexity of new high-performance microprocessors increases, as the quality expectations of new products rise, and as the time-to-market decreases, functional verification becomes a more difficult process and emerges as the bottleneck of the development cycle.

In order to improve the efficiency and the effectiveness of functional verification, we follow the methodology outlined in Figure 1. Four different sources (verifiers) generate diagnostic programs. First, hand-written directed diagnostics are developed by the members of the DVT and include architectural (AVP), microarchitectural (MVP), and implementation (IVP) verification programs. These diagnostics set up and check conditions deemed interesting by the developer of each test. Second, advanced pseudorandom code generators produce long instruction sequences which aim at creating complicated interaction patterns among the instructions. Such instruction sequences are rarely conceived by a programmer or generated by a compiler. Third, sophisticated tools generate instruction sequences which stress the microprocessor model in ways that cannot be achieved by the first two code generation approaches. Finally, “real world” software applications are used to ensure that the design implements the most common operations correctly and efficiently.

The diagnostic programs generated in any of the above ways are compiled and provided as input into two simulators. The RTL simulator represents the specific microprocessor’s implementation. The architectural simulator, on the other hand, describes the behavior of any microprocessor design implementing the given architecture as the latter is specified in the architectural manual. The execution of the object code on the two simulators produces two traces. The architectural trace captures how the architecturally visible state changes as a result of executing the instructions in the diagnostic. The RTL trace, on the other hand, captures how the microprocessor’s state changes as a result of executing the same sequence of instructions. However, because of the large number of advanced implementation features contained in state-of-the-art microprocessors, the two traces may not be the same. A conversion tool (streamer) transforms the RTL trace into a trace representing the changes in the architectural state as they are deduced from the information in the RTL trace. There are several interesting and hard issues involved in such a conversion process, but they are beyond the scope of this paper.

Once we have obtained an architectural trace from the RTL, we compare it with the trace produced by the architectural simulator, using an architectural comparator (refdif). If the two traces differ, then the model does not behave correctly, and the diagnostic has identified a flaw in the microprocessor’s implementation. A powerful X-based graphical environment which exploits the information provided by the architectural comparator can then be used to debug the identified error.

In addition to identifying flaws in the implementation, traces of diagnostic program executions are also used to analyze the test programs, determine their properties and characteristics, and evaluate their effectiveness (Profiler). The results of this analysis and evaluation are stored in a diagnostic database, and they are used subsequently to improve the quality of the generated code as well as the effectiveness of the code generation tools.

In the following sections we take a closer look at the code generation tools as well as at the analyzer and the diagnostic database. These are the most important parts of our approach to code generation for the functional verification of microprocessors.

3 SBVer: An External Interface Verifier

High-performance microprocessors employ complex external interface units which buffer requests, allow multiple outstanding loads and stores, maintain multi-level caches, and perform cache coherency in multiprocessor configurations. The many states of the external interface, combined with an abundance of asynchronous events from other devices, make the external interface a verification challenge.

For this purpose we have developed SBVer (Store Buffer Verifier), a code generator which focuses on exercising the external interface and the cache management units of the microprocessor. Knowledge about the design of the primary and secondary caches, of the various address spaces, and of the memory management unit has been built into the tool. SBVer, combined with heuristic algorithms, produces sequences of instructions which cause interesting interactions between the processor, the caches, and the main memory. SBVer can also program external event generators in the system model so that they interact with the processor in a coordinated fashion. For system verification purposes, SBVer may also produce self-checking code based on an internal memory model maintained during code generation. Finally, SBVer has a large number of configuration options in order to provide the user
with control over the tool’s behavior. SBVer has been successful in finding flaws in four generations of microprocessors, and in various hardware systems.

[Figure 2: BRVer Design. Labels: random or user-designed abstract graph description (rows of node numbers followed by T/F strings, e.g. "4 0 FT"); internal simulation; Branch.s (code+data to control flow); branch node composed of filler code, branch setup code, branch, branch delay slot, filler code.]

4 BRVer: A Branch Verifier

Many pseudorandom code generators avoid complex branching sequences, especially backward jumps, in order to prevent infinite loops. On the other hand, the length of the produced pseudorandom programs results in the verification engineers having limited knowledge of the program flow, and of whether critical sections of the program have been executed. Furthermore, new microprocessors attempt to predict the direction of branches and execute instructions beyond a branch speculatively. The result of speculative execution is a significant increase in the number of branch related cases which need to be examined. In order to address these issues in a systematic way, we have developed BRVer. Figure 2 shows the various components of BRVer and how the branches are modeled.

BRVer accepts as input a large number of configuration parameters and an Abstract Graph Description (AGD) which is either provided by the user or generated heuristically. The input AGD contains the number of nodes (effectively branches) in the graph, how the nodes are connected to one another, and for each branch the action to be performed (fall through or take the branch) upon successive arrivals. BRVer “compiles” the AGD input, producing an instruction stream whose run time behavior correctly represents the flow described.

BRVer also accepts user provided input streams as filler code in between branches. This proves to be a convenient way to apply the branch management mechanisms to code produced by other tools such as SBVer and Theo.

5 Multiprocessor Verification

Over the last few years, most manufacturers have developed multiprocessor-ready microprocessors [7, 8]. As a result, it is essential that the DVT verifies the microprocessor’s mechanisms facilitating the sharing of information across the processors of a multiprocessor (MP) machine. Such a verification process entails two important issues. First, we need to verify the microprocessor’s correct operation under stressful conditions, which rarely, if at all, happen during its operation in a deliverable MP system. Second, we need to verify its functionality and performance when the multiprocessor is running “real world” parallel applications.

5.1 MPVer: A Multiprocessor Verifier

The verification of multiprocessing features is complicated by the interaction between multiple code streams; the unpredictable nature of MP arbitration; and the limited number of MP test suites available to the verification engineer. In order to address these issues, we have used an abundance of asynchronous external events in a uniprocessor environment, as well as developed an MP code generator.

In general, MP verification necessitates the testing of cache coherency protocols and of the correct operation of MP primitives. Generating MP test cases requires the sharing of data between processors combined with locking mechanisms which manage accesses to shared data structures, and which synchronize concurrently executing instruction streams. Computing the expected results of MP test programs is challenging and it is not easily accomplished with a traditional reference machine. MPVer successfully addresses these issues by generating multiple code streams which interact with each other, and yet are able to verify the produced results with fine granularity. The runtime flow and relationship between the code streams is shown in Figure 3.

[Figure 3: False Sharing in MPVer. Labels: cells 0 to 7; Compute; Branch; Cpu ID; CPU 0 through CPU 3; Final Check.]

A novel approach is used to exploit the important issue of false sharing. Through this approach we are able to achieve high processor interaction and provide full coverage of the cache coherency mechanisms without using expensive locking and synchronization operations, which interfere with the MP program flow and which even limit the number of interesting situations.

True data sharing is supported and tested through the use of locks. However, because intermediate values are unpredictable, results are checked after all MP operations are guaranteed to have finished. For the verification of a microprocessor in a distributed shared memory system, we have parameterized MPVer with the frequency with which each CPU is to access the different memory segments. Such a parameterization is important because we are able to program different traffic patterns, to stress routing algorithms, and to observe MP system stability.

MPVer produces portable code which can run on either a simulation model or a true MP system. In both environments, MPVer
has been very successful in finding MP related microprocessor and system hardware flaws.

5.2 MPApplicationVerifier

MPApplicationVerifier (MPAV) is an environment for the development and execution of “real world” parallel applications as diagnostics in the MP verification of a microprocessor. The environment supports thread-based parallel execution, and it can be considered as a user-level, bare-minimum operating system [4].

The user of the environment writes a single C program, augmented with directives which support its parallel execution. The C program is compiled into two executables which facilitate three execution modes. In the first mode, the user executes the application natively on a workstation or on an MP system. In that way the user is able to debug the application code, and improve its performance and efficiency. In the other two modes of execution, the parallel program is simulated by an architectural simulator and by the microprocessor’s RTL model. The purpose of these two execution modes is to test the hardware under construction both at the microprocessor level and at the system level. These modes of execution allow us not only to isolate implementation flaws, but also to pinpoint performance problems.

So far we have ported onto this environment several “real world” parallel applications including the SPLASH-2 benchmarks [9]. Other parallel applications including chaotic algorithms and branch-and-bound algorithms are currently being ported. Incorporating a new application into the MPAV environment is simple. The user only needs to write three “interface functions.” Two of these functions perform the initializations of the data structures of the parallel program, whereas the third function provides the environment with the “starting points” of the parallel program’s execution. In addition, we can easily incorporate sequential applications into the MPAV environment, such as the diagnostic programs created by other code generation tools.

A powerful, yet flexible, X-based user interface makes MPAV an easy to use MP code generation and execution environment. The user selects the applications to be included in a particular execution, sets the corresponding input parameters for each included application, and then compiles and executes the resulting suite. MPAV’s user interface makes the construction and execution of MP test programs a simple exercise for the user.

6 Theo: A Sophisticated Code Generator

State-of-the-art microprocessors employ several advanced techniques in order to improve their performance. At any given time several partially executed instructions are active (i.e. at some stage of their execution) in the processor. Instructions move between different units as resources become available. In order to reduce interruptions in the execution pipeline, which result in lost performance, computed results are bypassed to previous pipeline stages, and state is committed to registers or to memory many cycles after the instruction was issued. Historically, most design flaws have been attributed to the implementation of these complex features. The design flaws typically exhibit themselves when sequences of dependent instructions activate a combination of conditions within the design.

Theo is based on the idea that if we focus on instruction sequences to which a particular implementation may be sensitive, then we can reduce the number of test cases examined, as well as improve the quality of the verification code generated. The overall architecture of Theo is shown in Figure 4.

[Figure 4: Theo Architecture. Labels: user templates; parsed instruction class tree; engine; branch manager; address manager; register allocation manager; event manager; data operand manager; primary cache; secondary cache; THEO.s.]

The input to Theo is a collection of templates written in a superset of the assembly language, which permits instruction specification at any level of detail, and, at the same time, allows the use of symbolic notation for operands. These templates define sequences of instructions representing “constraints.” Theo allows the users to focus on developing sequences for their own area of interest, while Theo’s engine searches for their “optimal” placement which satisfies the specified constraints. A typical hand-written diagnostic only stresses a particular unit, while other sections of the microprocessor remain idle. Theo, on the other hand, attempts to combine templates so that all units of the microprocessor are active simultaneously.

Theo uses a constraint solving engine to produce Intermediate Code Representation (ICR) through repetitive application of template instances. Subsequently, it performs instruction assignment, global resource allocation, and condition setup to produce an assembly program ready for simulation [2].

Templates only use symbolic names for registers. The actual register assignment is performed by Theo during one of the last phases in the code generation process. The use of symbolic instruction class names, register names, and operands in templates is encouraged, since this allows Theo to select the actual assembly instructions and operands using sophisticated heuristic algorithms. At the same time, such a notation permits the verification engineer to express the conditions of interest in the most generic way.

Code generation starts with an uninstantiated ICR. Each element in this ICR is a place holder for an instruction which initially has no particular attribute or property. Subsequently, Theo selects one of the user provided templates and applies it to the ICR; that is, the template instruction sequence, its properties, and its constraints are transferred into the ICR. Theo’s template placement algorithm avoids placing templates one after the other. Rather, it strives to achieve overlap between templates while maintaining the requirements of each template. This is accomplished by checking for subset properties, by constraint solving, and by temporary unification in order to verify that an overlap can occur. If all resource requirements are met, then the unification becomes permanent. Successive application of the input templates to the ICR results in the further refinement and growth of the code.

Template placement stops when the code size requirement is met. Theo goes through the ICR assigning actual instructions for any instruction class references that may exist. Then, the engine consults the register allocation manager, the address manager, the branch manager, the operand manager, and the external event manager in order to allocate resources and insert condition setups. Finally, the ICR is translated into assembly code.
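
The placement step described above can be pictured with a much-simplified Python sketch. It is not Theo’s algorithm: it assumes that each ICR slot holds the set of opcodes still allowed there, that a template is a list of symbolic instruction class names, and that "unification" is plain set intersection. The class names, opcodes, and function names are all hypothetical, and resource allocation and condition setup are left out.

```python
import random

# Hypothetical instruction classes; a slot constraint is the set of opcodes still allowed.
CLASSES = {
    "LOAD": {"lw", "lb"},
    "STORE": {"sw", "sb"},
    "MEM": {"lw", "lb", "sw", "sb"},
    "ALU": {"add", "sub", "or"},
}
ANY = set().union(*CLASSES.values())  # an uninstantiated slot allows everything


def try_place(icr, template, offset):
    """Tentatively unify `template` with the ICR at `offset`.

    Returns a new ICR if every overlapped slot stays satisfiable, else None.
    The input ICR is never modified, so a failed attempt needs no rollback.
    """
    if offset + len(template) > len(icr):
        return None
    trial = list(icr)
    for i, class_name in enumerate(template):
        unified = trial[offset + i] & CLASSES[class_name]
        if not unified:  # conflicting constraints: no overlap possible here
            return None
        trial[offset + i] = unified
    return trial


def place_with_overlap(icr, template):
    """Commit the placement that overlaps the most already-constrained slots."""
    best, best_overlap = None, -1
    for offset in range(len(icr) - len(template) + 1):
        trial = try_place(icr, template, offset)
        if trial is None:
            continue
        overlap = sum(1 for i in range(len(template)) if icr[offset + i] != ANY)
        if overlap > best_overlap:
            best, best_overlap = trial, overlap
    return best


def assign_instructions(icr, rng):
    """Replace each slot's constraint set with one concrete opcode."""
    return [rng.choice(sorted(slot)) for slot in icr]


icr = [set(ANY) for _ in range(5)]                # uninstantiated ICR
icr = place_with_overlap(icr, ["MEM", "ALU", "MEM"])
icr = place_with_overlap(icr, ["LOAD", "ALU"])    # lands on top of the first template
program = assign_instructions(icr, random.Random(0))
print(icr)
print(program)
```

In this toy run the second template is merged into the first two slots instead of being appended, so slot 0 is narrowed from the MEM class to the LOAD class while both templates remain satisfied.
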

Though this technique for code generation is complex, it has the unique property that it can create new test sequences from previously independent blocks which now interact with each other. By overlapping templates, we are also able to activate multiple units of the microprocessor while still maintaining the sequence and conditions represented by each template. The various managers utilized by Theo encapsulate heuristic and formal algorithms which may be applied across the entire code stream and which can be tuned with user biasing.
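
The paper does not say how user biasing is implemented. One plausible reading is weighted pseudorandom selection, which the following Python sketch illustrates for a register choice. The function name `biased_pick`, the register names, and the weights are assumptions for illustration only.

```python
import random


def biased_pick(candidates, bias, rng, default_weight=1.0):
    """Pick one candidate; `bias` maps a candidate to a user-supplied relative weight."""
    weights = [bias.get(c, default_weight) for c in candidates]
    if not any(w > 0 for w in weights):
        raise ValueError("all candidates have zero weight")
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical user bias: favour r1 heavily (to create dependencies), never use r3.
registers = ["r1", "r2", "r3", "r4"]
bias = {"r1": 8.0, "r3": 0.0}
rng = random.Random(1996)  # fixed seed so a failing diagnostic can be regenerated
picks = [biased_pick(registers, bias, rng) for _ in range(1000)]
print({r: picks.count(r) for r in registers})
```

A fixed seed keeps the generation reproducible, which matters when a failing diagnostic has to be regenerated for debugging.
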
7 Diagnostic Programs Evaluation

7.1 Code Analysis and Diagnostics Retrieval

In their effort to cover as many interesting cases of the given architecture as possible, the code generators presented so far tend to create a large number of lengthy diagnostic programs. This abundance of test programs forces us to seek a systematic and automated way of analyzing the run time behavior of these diagnostics, and post processing this information into concise and meaningful metrics.

Several reasons warrant such an evaluation. First, the code generation tools could use the information from the analysis tool as feedback in order to improve their effectiveness. Given the pseudorandom nature of the code generation tools, such an analysis has proven extremely useful in creating diagnostic programs which cover in depth specific sets of interesting cases.

Second, even though the tools can generate a large number of diagnostics relatively fast, only a limited number of them can actually be simulated daily on the RTL model, because this model is complex and, thus, expensive to run. Code analysis is valuable when trying to decide the subset of the created diagnostics that should be simulated on the RTL model.

Third, as the design evolves the number of accumulated diagnostics continuously increases, and the selection of the diagnostics that cover a specific case becomes harder. One way to address this issue is to build a diagnostic database (DDB) containing all the test programs, along with some information characterizing their run-time behavior. This information can later be used to retrieve a set of diagnostics with particular characteristics from the DDB.

In the following two sections we describe the two major parts of the evaluation process: the code analysis, which for each diagnostic deduces a set of attribute values, and the systematic storage and retrieval of this information into and from the database. The entire process is outlined in Figure 5.

[Figure 5: Code Analysis Methodology. Labels: code generation tool; diagnostic; simulation(s); trace file(s); analysis (Profiler application); other Profiler applications (state coverage, comparison of traces); attribute names (same for all diags) and attribute values, e.g. Author = Jones, ICount = 709, Immediate = 457, Arithmetic = 44, Exception = 4, CacheError = 1, ExtInt = 3; Ans; diag attributes database (user provides a query); feedback to tool.]

7.2 Code Analysis - The Profiler

Each generated diagnostic program is currently executed on two simulators. The first one is an architectural simulator which is used as a reference machine. This simulator is fast and inexpensive to use. The second one is the RTL simulator, representing the particular microprocessor implementation. This simulator is much slower than the architectural one, and much more expensive to use.

Whenever a diagnostic is run on either of the two simulators, a trace file containing information about each execution cycle of the diagnostic is created. The current model “passes” the specific diagnostic when the RTL and the architectural traces match under the architectural comparator (refdif in Figure 1).

In order to analyze the execution of a diagnostic, we post-process the trace file created during the execution of the code on either of the simulators. By doing so, we can deduce information about the interesting cases covered during the particular simulation. Examples of interesting cases include cache hits and misses, types of exceptions, and queue sizes. This information is later stored in the DDB.

In order to probe into the trace files systematically and extract interesting information quickly, we have developed the Profiler library which is used as an interface between the analysis code and the trace files. It provides the user with a mechanism for “stepping” through the simulation cycles recorded in a trace file, including going forward and backwards in simulation time. At any given “step” (simulation cycle) the user can retrieve the value of any one of the variables which constitute the machine state.

The library approach was chosen mainly because of the flexibility it provides. Due to its object-oriented design, the interface remains the same irrespective of the type or format of the trace file being processed. This interface allows the user to write C++ programs that are guaranteed to work in current and future simulation environments.

In addition to diagnostic evaluation, the Profiler library has also been used in a number of other tasks. We have used it to check transition coverage in the RTL model; to compare traces from different models; and to verify that certain (illegal) conditions never arise during the simulation of the model.

7.3 Storage and Retrieval of the Results - The Diagnostic Database

Every time a diagnostic program is simulated, a Profiler-based analysis code is executed on the trace file which represents the particular simulation. The results of this analysis are typically expressed as a set of values for a prespecified set of attributes which is common to all diagnostics. Examples of attributes generated during the analysis include the number of instructions executed, the number of cache hits and misses, and the lengths of various queues in
the microprocessor.

We developed a tool called Ans which compresses all this information into a highly efficient, object-based diagnostic database (ODDB). Ans both modifies the database and queries it to retrieve a set of objects (diagnostics) that satisfy a given set of criteria. Ans is a general tool, designed to handle any object-based collection of data, in a highly sophisticated and user friendly way.

When retrieving objects from a database, Ans uses an input set of criteria to select and return a set of objects which satisfy the given criteria. These criteria are usually expressed in some form of equalities or inequalities on the attribute values of the objects stored in the database. For example, the user can ask for all the diagnostics that contain at most 2000 instructions (i.e. attribute 'ICount' is less than or equal to 2000), and take at least 10 floating point exceptions (i.e. attribute 'FP Exception' is greater than or equal to 10). The following attribute form describes these two constraints: ((ICount <= 2000) && (FP Exception >= 10)). Given this form, Ans would select a subset of the diagnostics currently stored in the given DDB whose attribute values satisfy the given constraints and would return these diagnostics to the user.

8 Summary

In this paper we have presented a collection of advanced code generation tools employed in the simulation-based verification of high-performance microprocessor designs. Each of the presented tools addresses a unit of the microprocessor which historically has been a significant source of hard to find flaws. We presented SBVer, a code generator which focuses on exercising the external interface and cache management units of the microprocessor. Then we described BRVer which targets the branch mechanisms of the design; these mechanisms become increasingly complicated as designers attempt to improve the performance of the chip through speculative execution. We addressed MP verification by presenting two tools with complementary roles. MPVer targets the sharing of information across the processors of an MP system as well as the communication between processors. MPAV, on the other hand, provides an environment for the development and execution of “real world” parallel applications on our simulators. Finally, Theo provides a state-of-the-art environment for the generation of diagnostics based on user provided templates, constraint solving systems, and knowledge of the microprocessor design.

We have also presented the Profiler and Diagnostic Database which comprise a set of tools for the analysis of diagnostics and their efficient storage and retrieval. These tools provide us with efficient ways to evaluate the code produced by the generators and to propagate this information back to the tools so that we can improve their effectiveness.

We are currently working on expanding our tool-set with highly specialized code generators as well as powerful generic ones. Furthermore, we are extending our sophisticated heuristic algorithms to cover areas that have not yet been addressed. Finally, we are incorporating all our verification tools in an integrated environment which supports the easy and efficient production of high quality diagnostic programs.

Design verification is an important part of the development of a microprocessor. As time-to-market decreases and the complexity of high-performance microprocessors increases, design verification becomes the bottleneck of the development cycle. Good verification tools become vital to the success of any microprocessor design, and their significance will continue to increase as we move to even higher performance microprocessors.

References

[1] M. Bass, T.W. Blanchard, D.D. Josephson, D. Weir, and D.L. Halperin. Design Methodologies for the PA 7100LC Microprocessor. Hewlett-Packard Journal, 46(2):23–35, April 1995.
[2] A. Chandra et al. AVPGEN – A Test Generator for Architecture Verification. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 3(2):188–200, June 1995.
[3] B. Turumella et al. Design Verification of a Super-Scalar RISC Processor. In Twentyfifth International Symposium on Fault Tolerant Computing, pages 472–477, June 1995.
[4] I. Foster. Designing and Building Parallel Programs. Addison Wesley, 1995.
[5] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.
[6] M. Kantrowitz and L.M. Noack. Functional Verification of a Multi-issue, Pipelined, Superscalar Alpha-Processor – the Alpha 21164 CPU Chip. Digital Technical Journal, 7(1):136–144, August 1995.
[7] D. Marr, S. Thakkar, and R. Zucker. Multiprocessor Validation of the Pentium Pro Microprocessor. In Proceedings of COMPCON ’96, pages 395–400, January 1996.
[8] B. O’Krafka, S. Mandyam, J. Kreulen, R. Raghavan, A. Saha, and N. Malik. MTPG: A Portable Test Generator for Cache-Coherent Multiprocessors. In Fourteenth Annual Phoenix Conference on Computers and Communications, pages 38–44, March 1995.
[9] S.C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd ISCA, pages 24–36, June 1995.
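
To make the attribute-form query of Section 7.3 concrete, here is a minimal Python sketch of selecting diagnostics by attribute constraints. It is not the Ans tool: the in-memory list standing in for the DDB, the record contents, and the `query` function are assumptions for illustration, and only the attribute names 'ICount' and 'FP Exception' come from the text.

```python
import operator

# Hypothetical diagnostic records; attribute names follow the example in Section 7.3.
DDB = [
    {"name": "diag_a", "ICount": 709, "FP Exception": 12},
    {"name": "diag_b", "ICount": 1850, "FP Exception": 3},
    {"name": "diag_c", "ICount": 4200, "FP Exception": 25},
    {"name": "diag_d", "ICount": 2000, "FP Exception": 10},
]

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def query(ddb, constraints):
    """Return the diagnostics whose attributes satisfy every (attr, op, value) constraint."""
    def matches(diag):
        # A diagnostic lacking the attribute never matches the constraint.
        return all(attr in diag and OPS[op](diag[attr], value)
                   for attr, op, value in constraints)
    return [diag for diag in ddb if matches(diag)]


# Equivalent of the form ((ICount <= 2000) && (FP Exception >= 10)).
selected = query(DDB, [("ICount", "<=", 2000), ("FP Exception", ">=", 10)])
print([d["name"] for d in selected])
```

Representing a query as a list of constraints that must all hold mirrors the conjunction in the attribute form; disjunctions would need a richer query structure than this sketch provides.
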