Notes
Outline
Pipelining a RISC Architecture
Stepwise Verification of DLX
Egon Börger
Dipartimento di Informatica, Universita di Pisa
http://www.di.unipi.it/~boerger
Slide 2
The Problem
Prove the pipelined version of a sequential architecture to be semantically correct
i.e. show for every program that its execution  under the pipelining discipline is equivalent to its step-by-step (so-called sequential) execution
Tasks:
model the two architectures
define the equivalence notion
state the assumptions on the environment (compiler, hardware, etc.)
prove the correctness
The Case Study
RISC microprocessor architecture DLX (Hennessy/Patterson 1990,1996)
typical instruction set architecture comprising instructions for arithmetical and set operations, jumps, interrupts, memory access
Pipelining techniques resolving structural, data and control hazards:
instruction scheduling, forwarding, new hardware links with additional control logic, stalling
Design and verification technique:
A model for datapath and sequential control of DLX is stepwise refined to its pipelined version which is proven to be correct
using incremental design and modular proof techniques (based upon locality principle)
Modeling and Verification Steps
Ground model DLXseq for sequential DLX providing one-instruction-at-a-time RISC processor view
correct wrt control graph definition by Hennessy/Patterson 1996
Three application domain driven refinement steps (concentrated on control aspects)
parallelization to DLXpar resolving structural hazards
resolving data hazards in DLXdata
resolving control hazards in DLXpipe
Each refined model proven to be correct wrt the preceding model
using inductions and case distinctions which correspond to the pipelining conflict types and standard methods to solve them
Main Theorem
For each DLX program P, the result of its sequential (one-instruction-at-a-time) execution on DLXseq is the same as the result of its pipelined (up to five instructions at-a-time) execution on DLXpipe .
Main contribution of the proof:
from segments of parallel DLXpipe runs, which perform at each step multiple operations for multiple instructions, extracting for each analyzed instruction an equivalent sequential DLXseq subcomputation, which together form an equivalent DLXseq run
defining the appropriate notions of
data dependency and control dependency of instructions
instruction cycle, result locations, relevant locations (per instruction, supporting instructionwise proofs)
Ground Model DLXseq
Each instruction is executed by 5 successive steps
IF: fetching instr from memory to datapath register IR
ID: decoding instr, including extracting its operands from register file registers fstop,scdop into ALU input ports A,B
EX: execution proper,  computing via ALU
ALU operations: value of ALU output port C for given input A,B
data memory or branch address:
update memory address register MAR using A
update memory data register MDR with B
update jump address PC
update by A or transfer  to C interrupt address register IAR
MEM: loading from or storing to mem at MAR via MDR
WB: writing back C to the destination register
Phases reflected in ground model by control states
Uniformity of ground model to architectural features
size of register file using abstract content fct reg
width of datapath using abstract registers PC,IR,A,B,C, MAR,MDR,IAR
 instruction set (in immediate or not immediate form)
Alu È Set
Jump È Branch                        Denote JumpLink = Jump - {TRAP}
Jump consists of absolute jump instructions of three types
TRAP: system jump (save PC in IAR  and update PC)
Link: a set of jump links (save PC in C, to be stored in reg file). The destination register of LINK instructions is assumed to be fixed (in Hennessy/Patterson R31).
a set of other (so-called plain) jumps
Mem = Load È Store
Interrupt = { MOVS2I, MOVI2S } manipulating IAR
instruction format: using abstract parameter functions opcode, fstop, scdop, dest, iop, ival (arg IR suppressed)
memory access (bandwith) using abstract function mem
Slide 9
Ground Model DLXseq Macros
IF: FETCH = IR ¬ mem(PC), PC ¬ next(PC)
ID: OPERAND = A ¬ fstop,     B ¬ scdop
JumpBranch = opcode Î Jump È Branch
AluSet = opcode Î Alu È Set
LoadStore = opcode Î Load È Store
IARS2I/IARI2S = opcode Î { MOVS2I, MOVI2S }
EX: ALU = if iop(opcode) then C ¬opcode(A,ival)
                                             else C ¬opcode(A,B)
MEMADDR = MAR ¬ A + ival
BRANCH = if reg(A) = 0 then PC ¬ PC + ival
TRAP = IAR ¬ PC,    PC  ¬ ival
JUMPLINK = if iop(opcode) then PC ¬ PC + ival
                                                  else PC ¬ A
MEM: LOAD = MDR ¬ mem(MAR)
STORE = mem(MAR) ¬ MDR
SUBWORD = C ¬opcode(MDR)
WB:   WRITEBACK = dest ¬ C
Slide 11
Slide 12
Slide 13
Slide 14
DLXpar  Correctness Theorem
Theorem: For each DLX program P, let Ppar be its transformation obtained by inserting two empty instructions after each occurrence of a jump or branch instruction ( “instruction scheduling” ). Let C be the DLXseq  computation with input P and Cpar the corresponding DLXpar computation with input Ppar. Then C and Cpar have the same result if Cpar  is data-hazard-free.
Tasks to be accomplished:
define “correspondence” of computation
define “ computation result” and “sameness” for them
define “data-hazard” and data-hazard-freeness
prove the statement
justify the assumption on transforming P to Ppar
Instruction scheduling serves here to separate semantics preserving parallelization from hazard prevention
Defining corresponding instruction cycles
instruction cycle for instr
in DLXseq any subcomputation starting with control state FETCH with loading instr from the code memory until the next occurrence of control state FETCH
in DLXseq any subcomputation starting with instr and ending with the first pipe stage of instr at the end of which the values of all result locations of instr are computed and recorded in the appropriate location as defined below
this pipe stage is denoted EX(instr), MEM(instr), WB(instr)  depending on instr (see table below)
corresponding instruction cycles in C and Cpar
ever pair (Ii,Ii’) of instruction cycles I0 , I1, … in C and
I0’ , I1’ ,…in Cpar  where I0 , I’0 are corresponding initial states (with corresponding initialization)
Defining result locations and result of an instr
In DLX every instr has only one result proper (different from <reg,PC>). It is written at the end of the execution of instr.
Defining used locations
Defining data hazards
conflicts can arise in two cases:
data hazard: I’ uses as one of its operands the content of the destination register of a preceding I in the pipe
control hazard: I’ enters the pipe shortly after a jump or branch instruction
I <1,2,3 I’ iff instr cycle for I’ starts 1, 2, or 3 steps after an instr cycle for I
I’ data dependent on I iff I <1,2,3 I’ and one of the following two cases holds:
dest(I) Î {fstop(I’),scdop(I’)} and I’ÏJumpÈBranch
dest(I) = fstop(I’)  and I’ Î JumpÈBranch
DLXpar computation data-hazard-free if it contains no occurrence of an I’ which is in the pipe together with an occurrence of an I on which it is data dependent
Instruction Scheduling
Goal: separate analysis of control hazards from correctness proof for parallelization:
Problem: when a jump or branch instr is fetched, the two immediately following instruction cycles generate results which could spoil the continuation of the computation after the jump
Solution: assume for the correctness proof that the compiler produces Ppar from P placing two empty instructions after each jump or branch instruction.
We define empty instructions NOT to start an instruction cycle.
We assume these new instructions to be linked by the extension of the next function of P in the obvious way, as formalized by the PC- preservation rule PC1  ¬ next(next(PC)).
Assume the compiler to guarantee hazard-freeness
Both assumptions will be discharged in separate refinement steps
Proof of DLXpar  Correctness Theorem
To establish: C and Cpar have the same result (assuming Cpar  is data-hazard-free)
Proof by in induction on the given runs, decomposing computations in instruction cycles so that the proof can be done instructionwise with ‘local’ arguments. This needs a stronger induction hypothesis:
DLXpar Lemma. Let ICn,ICparn be the n-th instruction cycle in C, Cpar. The following holds:
a) Completeness: If Cpar is data-hazard-free, then ICn,ICparn are instruction cycles for the same instr and start with the same values for the “relevant locations” used by instr
b) Correctness: Any instr cycle pairs as in a) for any instr which is not data-dependent on any instruction in the pipe compute the same result
The lemma implies the theorem.
Defining relevant locations for DLXpar
A location l used by instr is a relevant location except in the following two cases:
Irrelev 1.  I = <reg,IAR> and instr = MOVS2I enters the pipe 1, 2 or 3 stages after an occurrence of MOVI2S or of  TRAP
NB. No conflict can arise from using <reg,IAR> because MOVS2I. the only instruction which uses IAR, can never be in conflict with any preceding instruction. If I writes into IAR, then I=TRAP or I=MOVI2S and I writes into IAR in its third pipe stage. Therefore if I <1,2,3 I’ , then I has already written into IAR when I’ uses it.
Irrelev 2. I = <mem,arg> and instr is a Load instruction which enters the pipe 1, 2, or 3 steps after an occurrence of a Store instruction for the same arg (see justification below)
Rational for Irrelev 2 for DLXpar
Irrelev 2. I = <mem,arg> and instr is a Load instruction which enters the pipe 1, 2, or 3 steps after an occurrence of a Store instruction for the same arg
No conflict can arise from using a mem location because load instructions---the only ones which use memory locations---can never be in conflict with preceding store instructions---the only ones which write into mem locations.
If I ÎStore and I' ÎLoad, then I updates its result location < mem, reg (fstop(I)) + ival (I)> in its fourth pipe stage and I' reads the value of the location < mem, reg (fstop (I)) + ival (I)> in its fourth pipe stage too. Therefore if I <1,2,3 I’ and I' loads the value of the result location of I as updated by I, then I has already updated this result location when I' loads from there.
NB. DLX does not support self-modifying code
Proof of DLXpar  Lemma (a)
ad completeness: ICn,ICparn are instruction cycles for the same instr and start with the same values for the “relevant locations” used by instr
n=0: by correspondence of runs and initialization
n+1: by inductive hypothesis, for each i £ n, ICpar’ starts with the same values for the relevant locations as does ICi and they both compute the same result. Therefore ICn+1,ICparn+1 are instruction cycles for the same instruction, say instr, and start with the same values for the relevant locations to be used by that instruction.
Proof of DLXpar  Lemma (b)
ad correctness: any instr cycle pair ICn,ICparn as in a) where instr is not data-dependent on any instruction in the pipe compute the same result
Due to the absence of stalls, the n+1-th instruction cycle in Cpar starts
after the first step of ICparn in case instrn is neither a branch instruction with true branching condition nor a jump
otherwise the n+1-th instruction cycle in Cpar  starts after the third step of ICparn  due to the following Jump Lemma
Jump Lemma. If a jump or branch instruction I is fetched in a DLXpar-computation, then the following two fetched instructions are empty and at stage ID(I) the register PC1 is updated by the correct value to be used for the computation of the possible new PC-value in stage EX(I).
By a case distinction following the instruction type it then follows from the definition of the rule macros that corresponding updates in C and Cpar provide the same values for corresponding result locations.
Proof of Jump Lemma
Jump Lemma. If a jump or branch instruction I is fetched in a DLXpar-computation, then the following two fetched instructions are empty and at stage ID(I) the register PC1 is updated by the correct value to be used for the computation of the possible new PC-value in stage EX(I).
Proof: follows by an induction on the number of jump instructions which are fetched during a run, using the transformation of P to Ppar.
Resolving data hazards for linear code execution
Goal: refine DLXpar  to DLXdata where data hazards are resolved for non-jump/branch instructions so that the compiler assumption of hazard-freeness can be restricted to jump/branch instructions.
I.e. we have to define DLXdata and to prove the following:
Theorem. Let C, Cdata be the computations of DLXseq  , DLXdata started with program P, Ppar . Assume that in Cdata no occurrence of a jump or branch instruction is in the pipe together with an occurrence of an instruction on which it is data dependent. Then C and Cdata compute the same result.
Data hazards for linear code execution
Data dependence condition DataDep(I,I’):
I’ Ï Jump ÈBranch and
I <1,2,3 I’ and dest(I) Î {fstop(I’), scdop(I’) }
NB. no write-after-write hazard can occur in DLX
writing only in stage WB, but together with any stalled instruction every later instruction in the pipe is also stalled
NB. no write-after-read hazard can occur in DLX
read stage ID precedes the write stage WB
Hazard Resolution Method: standard techniques for hazard resolution incorporated into DLXpar  macros
forwarding via
new dataflow connections (hardware links)
additional decoder logic (multiplexing)
stalling
Conservative refinement of DLXpar macros
to resolve data hazards for linear code execution
Idea:
refine macros only for case DataDep(I,I’):
if DataDep(I,I’), then the DLXdata macro refinement guarantees that correct arguments are taken to compute the result of instr, namely during EX(instr)-phase
 otherwise the DLXpar macro is executed
weakening the compiler hazard-freeness assumption
by refining the notion of relevant locations, taking out the hazardous locations since they are taken care of by the refined architecture
NB. This idea realizes an abstract form of the typical implementation using a scoreboard to keep track of data dependencies
Proof of DLXdata  Correctness Theorem
Proof follows from the following lemma:
DLXdata Lemma. Let ICn,ICparn,ICndata be the n-th instruction cycle in C, Cpar, Cdata. Then:
Completeness: If Cdata is free of data hazards for jump/branch instructions, then ICn,ICparn,ICndata are instruction cycles for the same instruction I’ and start with the same values for the relevant locations used by I’.
For any I’-cycles IC, ICpar,ICdata as in a):
Conservativity: if I’ is not data-dependent on any I in the pipe, then ICpar,ICdata compute the same result
Refinement: if I’ Ï Jump ÈBranch is data-dependent on an I <1,2,3 I’ in the pipe, then IC,ICdata compute the same result
Proof of DLXdata  Lemma (1)
By induction on number n of instr cycles:
ad completeness: ICn,ICndata are cycles for the same instr and start with the same values of relevant locs
n=0: by correspondence of computations
n+1: follows from Jump Lemma as for DLXpar
ad conservativity: to show: if I’ is not data-dependent on any I in the pipe, then ICpar,ICdata compute the same result
in case of no data dependence, in every pipe stage rule applied to I’, the DLXpar-macro is executed.  Since by assumption ICpar and ICdata start with the same values for the relevant locations used by I', the effect of the rule applications to I' is the same in ICpar and ICdata, including the result locations of I‘. Then by the DLXpar lemma the result loc values of I’ are the same in IC and ICdata
Proof of DLXdata  Lemma (2)
ad Refinement: Let I’ÏJump È Branch be data-dependent on an I <1,2,3 I’ in the pipe. To show: IC,ICdata compute the same I’-result.
By definition of dest and assumption on I’:
I Î AluÈSet ÈLoad È Link È{ MOVS2I}
I’ Î AluÈSet ÈLoad ÈStore È{ MOVI2S}        since  I’ Ï {MOVS2I} È Jump ÈBranch
Proof of DLXdata  Lemma: Case I Ï Mem
Then
I Î AluÈSet ÈLink È{ MOVS2I}, dest(I) is updated to the value used by I’ in WB(I), namely to the value of C1, which is copied in MEM(I) from C as
computed in EX(I) by an ALU È SET operation
or as content of PC1 (Link) or IAR (MOVS2I)
Case I <2,3 I’: ID(I‘) overlaps with WB(I) or MEM(I) with the expected operand value in C1 or C
Case I <1 I’: expected operand value computed during ID(I‘) and available not before EX(I‘)
DLXdata  Lemma: Case I Ï Mem , I <2,3 I’
Case I <2,3 I’: Refinement of OPERAND
Since ID(I‘) overlaps with WB(I) or MEM(I) with the expected operand value in C1 or C, nthReg=A,B (nth=fst,scd) can be updated with C1 or C (called below C’)
if nthop(IR) Î{dest(IR3),dest(IR2)}       refinement case
    then nthReg ¬ C’
    else nthReg ¬ nthop(IR)                     conservative case
where C’ = if nthop(IR) = dest(IR3) ¹ dest(IR2) then C1
                  if nthop(IR) = dest(IR2) then C  last update counts
Irrelev 3. I = <reg,nthop(I’)> where I’ÏJump È Branch for some I <2,3 I’ such that I Ï Mem, nthop(I’) =dest(I)
Hw Cost: direct link bw register file exits A,B and C,C1
DLXdata  Lemma: Case I Ï Mem , I <1 I’
Case I <1 I’: Refining EX-macros by forwarding
expected operand value computed during ID(I‘) is available at the end of EX(I) in C, to be forwarded directly to ALU for EX(I’), shortcutting transfer from C to register file to A,B
Refine EX-rules for I’ Î Alu È Set ÈLoad ÈStore È{ MOVI2S}
Method: adding to macros a clause which in the data hazard case provides the argument C instead of A,B
Hw Cost: direct link between C and
both ALU ports for I’ Î Alu È Set
IAR for I' = MOVI2S
MAR and SMDR for I’ Î MEM
    together with additional control logic (multiplexers) to select as ALU input the forwarded value rather than the value from the register file
Irrelev 4. <reg,nthop(I’)> for possible combinations for I’
DLXdata  Lemma: Case I Ï Mem , I <1 I’ : Irrelev locs
Irrelev 4. <reg,nthop(I’)> such that for some     I <1 I’ with I Ï Mem one of the following holds:
opcode (I') Î Alu È Set, iop (opcode (I')),          dest (I) = fstop (I'), nth = fst
opcode (I') Î Alu È Set, not iop (opcode (I')),     dest (I) = nthop (I')
opcode (I') Î Mem È {MOVI2S}, nth = fst,        dest (I) = fstop (I')
opcode (I') Î Store, nth = scd, dest (I) = scdop (I')
DLXdata  Lemma: Case I Ï Mem , I <1 I’ : refined rules
If opcode(IR1)ÎLoad È Store then                    MEMADDR
       if fstop(IR1) = dest(IR2)                        refinement case
          then MAR ¬ valfst + ival(IR1)
          else MAR ¬ A + ival(IR1)                 conservative case
where valnth = C                 if nothop (IR1) = dest(IR2)
                    = nthReg        otherwise with fstReg=A,scdReg=B
If opcode(IR1)ÎStore then                                 PassBtoMDR
        if scdop(IR1) = dest(IR2)                           refinement case
          then SMDR ¬ valscd
          else SMDR ¬ B                                     conservative case
Analogously for ALU and MOVI2S rules
Proof of DLXdata  Lemma: Case I Î Mem and I’ Ï Mem
Then IÎLoad, I’Î AluÈSet ÈLinkÈ{ MOVI2S}
expected value, loaded by I, available in LMDR at the end of MEM(I)
Case I <2,3 I’: expected operand value available in EX or ID stage in LMDR
refining ID-rule OPERAND and EX-rules ALU,MOVI2S
Case I <1 I’: stall pipeline 1 clock cycle, at the latest before EX(I’), to let only MEM and WB stages progress
NB. During a pipeline stall a just loaded value in LMDR may be written back; since in the next step it may serve as operand of a stalled instruction, we preserve LMDR
DLXdata  Lemma: Case I Î Mem, I’ Ï Mem, I <2,3 I’: refined C’
Case I <2,3 I’: refining
C’ in OPERAND by LMDR and
valnnth in ALU, MOVI2S by LMDR1
using new  WB-stage preservation rule LMDR1 ¬ LMDR
C’ =
     LMDR last update in ante-ante-preceding load instr
          if nthop(IR) = dest(IR3) ¹ dest(IR2), opcode(IR3) Î LOAD
                                             where LMDR = opcode(IR3) (LMDR)
C1                                                                      previous case
    if nthop(IR) =dest(IR3) ¹ dest(IR2), opcode(IR3) Ï LOAD
C if nthop(IR) = dest(IR2)                                  previous case
DLXdata Lemma:I Î Mem, I’ Ï Mem, I <2,3 I’: refine valnth
Case I <2,3 I’: refining
valnnth in ALU, MOVI2S by LMDR1
new  WB-stage preservation rules LMDR1 ¬ LMDR, IR4 ¬ IR3 in case an LMDR-value, written back during a pipeline stall, in the next step serves as operand of an instruction stalled after ID
 valnnth =
     C           if nthop(IR) = dest(IR2)      previous case
     LMDR    if nthop(IR1) = dest(IR3) ¹ dest(IR2),                                 opcode(IR3) Î LOAD, opcode(IR4) Ï LOAD
                     case of no two successive load instructions
     LMDR1  if nthop(IR1) = dest(IR4) ¹ dest(IR2), dest(IR3),                 opcode(IR3), opcode(IR4) Î LOAD
   nthReg otherwise                             DLXpar case
DLXdata Lemma:I Î Mem, I’ Ï Mem, I <2,3 I’: refined rules
If opcode(IR1)Î Alu È Set then                              ALU macro
       if iop(opcode) then
             if fstop(IR1) = dest(IR2) or for some m Î {3, 4}
                fstop(IR1) = dest(IRm)  and opcode(IRm) Î LOAD
       then C ¬ opcode(IR1) (valfst , ival(IR1))
             else C ¬ opcode(IR1) (A, ival(IR1))
       else if dest(IR2) Î {fstop(IR1), scdop(IR1) } or
                  for some m Î {3, 4} opcode(IRm) Î LOAD and
                                dest(IRm) Î {fstop(IR1), scdop(IR1)}
               then C ¬ opcode(IR1) (valfst , valscd )
               else C ¬ opcode(IR1) (A, B)
Analogously for MOVI2S, MEMADDR, PassBtoMDR
DLXdata Lemma:I Î Mem, I’ Ï Mem, I <2,3 I’: Irrelev locs
Irrelev 5. <reg,nthop(I’)> with I’ÏMemÈBranchÈJump such that for some IÎLoad  with dest (I) = nth op (I'), one of the following holds:
I <3 I’
I <2 I’ , iop (opcode (I')), nth = fst
I <2 I’ , not iop (opcode (I‘))
DLXdata  Lemma: Case I Î Mem, I’ Ï Mem, I <1 I’
Case I <1 I’: stop EX(I’)  together with IF,ID of later instructions  for 1 clock cycle, letting only MEM and WB stages progress, to make value loaded by I available to I’
loadRisk = opcode (IR2) Î LOAD and
                      reg (IR1) Ï Mem È Jump È Branch and
                      dest (IR2) Î {fstop (IR1) , scdop (IR1)}
adding guard not loadRisk to EX,ID,IF stage rules
adding IF stage rule if loadRisk then IR2 ¬ undef
whose execution makes loadRisk false so that the full pipelining will be resumed and this case can be handled as the previous one and is resolved by the refined EX-rules
Irrelev 6. <reg,nthop(I’)> for I’ÏMemÈBranchÈJump and some IÎLoad:  I <1 I’, dest (I) = nth op (I')
Proof of DLXdata  Lemma: Case I,I’ Î Mem
Case I <2,3 I’: expected operand value available in EX or ID stage
forwarding solution by ID-rule OPERAND and refined EX-rules ALU,MOVI2S
hw price: direct links  bw LMDR and MAR, SMDR, new LMDR1
Case I <1 I’:
Subcase 1. I’ uses the loaded val as datum to be stored: dest(I)=scdop(I’)
forwarding solution
Subcase 2. I’ uses the loaded val as address for Mem opn: dest(I)=fstop(I’)
stalling-pipeline solution
DLXdata  Lemma: Case I,I’ Î Mem, I <2,3 I’
Subcase I <3 I’: resolved by ID-rule OPERAND
Subcase I <2 I’: resolved by refining MEMADDR and PassBtoMDR as shown above for ALU
Irrelev 7.
<reg,nthop(I’)> for I’ Î Mem such that for some IÎLoad: I <3 I’, dest (I) = nthop (I')
<reg,fstop(I’)> for I’ Î Mem such that for some IÎLoad: I <2 I’, dest (I) = fstop (I')
<reg,scdop(I’)> for I’ Î Store such that for some IÎLoad: I <2 I’, dest (I) = scdop (I')
DLXdata  Lemma: Case I,I’ Î Mem, I <1 I’
Subcase 1: dest(I)=scdop(I’), i.e.I’ uses the loaded val as datum to be stored
Then I’ Î Store, val loaded by I needed in MEM(I’) where available in LMDR
forwarding solution by refining STORE
hw price: link bw LMDR and memory input port
Irrelev 8. <reg,scdop(I’)> for I’ Î Store such that for some IÎLoad: I <1 I’, dest (I) = scdop (I')
Subcase 2: dest(I)=fstop(I’), i.e. I’ uses the loaded val as address for Mem opn
stalling pipeline: refine loadRisk by clause
        or dest(IR2) = fstop(IR1), IR1 Î Mem
Irrelev 9. <reg,fstop(I’)> for I’ Î Mem such that for some IÎLoad: I <1 I’, dest (I) = fstop (I')
Exercises
Resolve the conflict in case IÎMem, I’Ï Mem above without preserving LMDR.
Hint. Anticipate the stalling from the EX phase to the ID phase, recognizing loadRisk already in phase ID. See Hinrichsen 1998.
Refine DLXdata to DLXpipe and prove it to correctly solve also control hazards (those triggered by jump or branch instructions).
Hint. See Section 5 in Börger and Mazzanti 1997 (LNCS 1212)
References
J.L.Hennessy and D.A.Patterson: Computer Architecture: A Quantitative Approach
Morgan Kaufmann 1990,1996
E. Börger and S. Mazzanti: A practical method for rigorously controllable hardware design
Springer LNCS 1212 (1997) 151-187
H. Hinrichsen: Formally correct construction of a pipelined DLX architecture.
TR 98-5-1, May 1998, Darmstadt University of Technology, Dept. of Electrical and Computer Engineering
S.Tahar.Eine Methode zur formalen Verifikation von RISC-Prozessoren.
Fortschrittberichte VDI, Reihe 10:Informatik/Kommunikationstechnik Nr. 350, VDI-Verlag, Duesseldorf 1995, pp.XIV+162.
E. Börger, R. Stärk: Abstract State Machines.  A Method for High-Level System Design and Analysis Springer-Verlag  2003, see  http://www.di.unipi.it/~AsmBook