|
|
|
Egon Börger |
|
|
|
|
|
Dipartimento di Informatica, Universita di Pisa |
|
http://www.di.unipi.it/~boerger |
|
|
|
|
|
|
|
|
|
|
|
Prove the pipelined version of a sequential
architecture to be semantically correct |
|
i.e. show for every program that its
execution under the pipelining
discipline is equivalent to its step-by-step (so-called sequential)
execution |
|
Tasks: |
|
model the two architectures |
|
define the equivalence notion |
|
state the assumptions on the environment
(compiler, hardware, etc.) |
|
prove the correctness |
|
|
|
|
|
|
RISC microprocessor architecture DLX (Hennessy/Patterson
1990,1996) |
|
typical instruction set architecture comprising
instructions for arithmetical and set operations, jumps, interrupts, memory
access |
|
Pipelining techniques resolving structural, data
and control hazards: |
|
instruction scheduling, forwarding, new hardware
links with additional control logic, stalling |
|
Design and verification technique: |
|
A model for datapath and sequential control of
DLX is stepwise refined to its pipelined version which is proven to be
correct |
|
using incremental design and modular proof
techniques (based upon locality principle) |
|
|
|
|
|
Ground model DLXseq for sequential
DLX providing one-instruction-at-a-time RISC processor view |
|
correct wrt control graph definition by Hennessy/Patterson
1996 |
|
Three application domain driven refinement steps
(concentrated on control aspects) |
|
parallelization to DLXpar resolving
structural hazards |
|
resolving data hazards in DLXdata |
|
resolving control hazards in DLXpipe |
|
Each refined model proven to be correct wrt the
preceding model |
|
using inductions and case distinctions which
correspond to the pipelining conflict types and standard methods to solve
them |
|
|
|
|
|
|
For each DLX program P, the result of its
sequential (one-instruction-at-a-time) execution on DLXseq is
the same as the result of its pipelined (up to five instructions at-a-time)
execution on DLXpipe . |
|
Main contribution of the proof: |
|
from segments of parallel DLXpipe
runs, which perform at each step multiple operations for multiple
instructions, extracting for each analyzed instruction an equivalent
sequential DLXseq subcomputation, which together form an equivalent
DLXseq run |
|
defining the appropriate notions of |
|
data dependency and control dependency of
instructions |
|
instruction cycle, result locations, relevant
locations (per instruction, supporting instructionwise proofs) |
|
|
|
|
|
|
|
Each instruction is executed by 5 successive
steps |
|
IF: fetching instr from memory to datapath
register IR |
|
ID: decoding instr, including extracting its
operands from register file registers fstop,scdop into ALU input ports A,B |
|
EX: execution proper, computing via ALU |
|
ALU operations: value of ALU output port C for
given input A,B |
|
data memory or branch address: |
|
update memory address register MAR using A |
|
update memory data register MDR with B |
|
update jump address PC |
|
update by A or transfer to C interrupt address register IAR |
|
MEM: loading from or storing to mem at MAR via
MDR |
|
WB: writing back C to the destination register |
|
Phases reflected in ground model by control
states |
|
|
|
|
|
|
|
|
size of register file using abstract content fct
reg |
|
width of datapath using abstract registers PC,IR,A,B,C,
MAR,MDR,IAR |
|
instruction set (in immediate or not immediate form) |
|
Alu È Set |
|
Jump È Branch Denote JumpLink = Jump - {TRAP} |
|
Jump consists of absolute jump instructions of
three types |
|
TRAP: system jump (save PC in IAR and update PC) |
|
Link: a set of jump links (save PC in C, to be
stored in reg file). The destination register of LINK instructions is
assumed to be fixed (in Hennessy/Patterson R31). |
|
a set of other (so-called plain) jumps |
|
Mem = Load È Store |
|
Interrupt = { MOVS2I, MOVI2S } manipulating IAR |
|
instruction format: using abstract parameter
functions opcode, fstop, scdop, dest, iop, ival (arg IR suppressed) |
|
memory access (bandwith) using abstract function
mem |
|
|
|
|
|
|
|
|
IF: FETCH = IR ¬ mem(PC), PC ¬ next(PC) |
|
ID: OPERAND = A ¬ fstop, B ¬ scdop |
|
JumpBranch = opcode Î Jump È Branch |
|
AluSet = opcode Î Alu È Set |
|
LoadStore = opcode Î Load È Store |
|
IARS2I/IARI2S = opcode Î { MOVS2I, MOVI2S } |
|
EX: ALU = if iop(opcode) then C ¬opcode(A,ival) |
|
else C ¬opcode(A,B) |
|
MEMADDR = MAR ¬ A + ival |
|
BRANCH = if reg(A) = 0 then PC ¬ PC + ival |
|
TRAP = IAR ¬ PC, PC ¬ ival |
|
JUMPLINK = if iop(opcode) then PC ¬ PC + ival |
|
else PC ¬ A |
|
MEM: LOAD = MDR ¬ mem(MAR) |
|
STORE = mem(MAR) ¬ MDR |
|
SUBWORD = C ¬opcode(MDR) |
|
WB: WRITEBACK
= dest ¬ C |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Theorem: For each DLX program P, let Ppar
be its transformation obtained by inserting two empty instructions after
each occurrence of a jump or branch instruction ( “instruction scheduling”
). Let C be the DLXseq
computation with input P and Cpar the corresponding DLXpar
computation with input Ppar. Then C and Cpar have the
same result if Cpar is
data-hazard-free. |
|
Tasks to be accomplished: |
|
define “correspondence” of computation |
|
define “ computation result” and “sameness” for
them |
|
define “data-hazard” and data-hazard-freeness |
|
prove the statement |
|
justify the assumption on transforming P to Ppar |
|
Instruction scheduling serves here to separate
semantics preserving parallelization from hazard prevention |
|
|
|
|
|
|
instruction cycle for instr |
|
in DLXseq any subcomputation starting
with control state FETCH with loading instr from the code memory until the
next occurrence of control state FETCH |
|
in DLXseq any subcomputation starting
with instr and ending with the first pipe stage of instr at the end of
which the values of all result locations of instr are computed and recorded
in the appropriate location as defined below |
|
this pipe stage is denoted EX(instr),
MEM(instr), WB(instr) depending on
instr (see table below) |
|
corresponding instruction cycles in C and Cpar |
|
ever pair (Ii,Ii’) of
instruction cycles I0 , I1, … in
C and |
|
I0’ , I1’ ,…in Cpar where I0 , I’0
are corresponding initial states (with corresponding initialization) |
|
|
|
|
In DLX every instr has only one result proper
(different from <reg,PC>). It is written at the end of the execution
of instr. |
|
|
|
|
|
|
|
|
conflicts can arise in two cases: |
|
data hazard: I’ uses as one of its operands the
content of the destination register of a preceding I in the pipe |
|
control hazard: I’ enters the pipe shortly after
a jump or branch instruction |
|
I <1,2,3 I’ iff instr cycle for I’
starts 1, 2, or 3 steps after an instr cycle for I |
|
I’ data dependent on I iff I <1,2,3
I’ and one of the following two cases holds: |
|
dest(I) Î {fstop(I’),scdop(I’)} and I’ÏJumpÈBranch |
|
dest(I) = fstop(I’) and I’ Î JumpÈBranch |
|
DLXpar computation data-hazard-free if
it contains no occurrence of an I’ which is in the pipe together with an
occurrence of an I on which it is data dependent |
|
|
|
|
|
|
|
|
|
Goal: separate analysis of control hazards from
correctness proof for parallelization: |
|
Problem: when a jump or branch instr is fetched,
the two immediately following instruction cycles generate results which
could spoil the continuation of the computation after the jump |
|
Solution: assume for the correctness proof that
the compiler produces Ppar from P placing two empty instructions
after each jump or branch instruction. |
|
We define empty instructions NOT to start an
instruction cycle. |
|
We assume these new instructions to be linked by
the extension of the next function of P in the obvious way, as formalized
by the PC- preservation rule PC1 ¬ next(next(PC)). |
|
Assume the compiler to guarantee hazard-freeness |
|
Both assumptions will be discharged in separate
refinement steps |
|
|
|
|
|
|
|
To establish: C and Cpar have the
same result (assuming Cpar is data-hazard-free) |
|
Proof by in induction on the given runs,
decomposing computations in instruction cycles so that the proof can be
done instructionwise with ‘local’ arguments. This needs a stronger
induction hypothesis: |
|
DLXpar Lemma. Let ICn,ICparn
be the n-th instruction cycle in C, Cpar. The following holds: |
|
a) Completeness: If Cpar is
data-hazard-free, then ICn,ICparn are
instruction cycles for the same instr and start with the same values for
the “relevant locations” used by instr |
|
b) Correctness: Any instr cycle pairs as in a)
for any instr which is not data-dependent on any instruction in the pipe
compute the same result |
|
The lemma implies the theorem. |
|
|
|
|
|
|
A location l used by instr is a relevant
location except in the following two cases: |
|
Irrelev 1. I = <reg,IAR> and instr = MOVS2I enters the pipe 1, 2 or
3 stages after an occurrence of MOVI2S or of TRAP |
|
NB. No conflict can arise from using
<reg,IAR> because MOVS2I. the only instruction which uses IAR, can
never be in conflict with any preceding instruction. If I writes into IAR,
then I=TRAP or I=MOVI2S and I writes into IAR in its third pipe stage.
Therefore if I <1,2,3 I’ , then I has already written into
IAR when I’ uses it. |
|
Irrelev 2. I = <mem,arg> and instr is a
Load instruction which enters the pipe 1, 2, or 3 steps after an occurrence
of a Store instruction for the same arg (see justification below) |
|
|
|
|
|
|
Irrelev 2. I = <mem,arg> and instr is a
Load instruction which enters the pipe 1, 2, or 3 steps after an occurrence
of a Store instruction for the same arg |
|
No conflict can arise from using a mem location
because load instructions---the only ones which use memory locations---can never
be in conflict with preceding store instructions---the only ones which
write into mem locations. |
|
If I ÎStore and I' ÎLoad, then I updates its result location < mem, reg
(fstop(I)) + ival (I)> in its fourth pipe stage and I' reads the value
of the location < mem, reg (fstop (I)) + ival (I)> in its fourth pipe
stage too. Therefore if I <1,2,3 I’ and I' loads the value of
the result location of I as updated by I, then I has already updated this
result location when I' loads from there. |
|
NB. DLX does not support self-modifying code |
|
|
|
|
|
ad completeness: ICn,ICparn
are instruction cycles for the same instr and start with the same values
for the “relevant locations” used by instr |
|
n=0: by correspondence of runs and
initialization |
|
n+1: by inductive hypothesis, for each i £ n, ICpar’
starts with the same values for the relevant locations as does ICi
and they both compute the same result. Therefore ICn+1,ICparn+1
are instruction cycles for the same instruction, say instr, and start with
the same values for the relevant locations to be used by that instruction. |
|
|
|
|
|
|
ad correctness: any instr cycle pair ICn,ICparn
as in a) where instr is not data-dependent on any instruction in the pipe
compute the same result |
|
Due to the absence of stalls, the n+1-th
instruction cycle in Cpar starts |
|
after the first step of ICparn
in case instrn is neither a branch instruction with true
branching condition nor a jump |
|
otherwise the n+1-th instruction cycle in Cpar starts after the third step of ICparn due to the following Jump Lemma |
|
Jump Lemma. If a jump or branch instruction I is
fetched in a DLXpar-computation, then the following two fetched
instructions are empty and at stage ID(I) the register PC1 is updated by
the correct value to be used for the computation of the possible new
PC-value in stage EX(I). |
|
By a case distinction following the instruction
type it then follows from the definition of the rule macros that
corresponding updates in C and Cpar provide the same values for
corresponding result locations. |
|
|
|
|
Jump Lemma. If a jump or branch instruction I is
fetched in a DLXpar-computation, then the following two fetched
instructions are empty and at stage ID(I) the register PC1 is updated by
the correct value to be used for the computation of the possible new
PC-value in stage EX(I). |
|
Proof: follows by an induction on the number of
jump instructions which are fetched during a run, using the transformation
of P to Ppar. |
|
|
|
|
|
Goal: refine DLXpar to DLXdata where data
hazards are resolved for non-jump/branch instructions so that the compiler
assumption of hazard-freeness can be restricted to jump/branch
instructions. |
|
I.e. we have to define DLXdata and to
prove the following: |
|
Theorem. Let C, Cdata be the
computations of DLXseq ,
DLXdata started with program P, Ppar . Assume that in
Cdata no occurrence of a jump or branch instruction is in the
pipe together with an occurrence of an instruction on which it is data
dependent. Then C and Cdata compute the same result. |
|
|
|
|
|
|
Data dependence condition DataDep(I,I’): |
|
I’ Ï Jump ÈBranch and |
|
I <1,2,3 I’ and dest(I) Î {fstop(I’), scdop(I’) } |
|
NB. no write-after-write hazard can occur in DLX |
|
writing only in stage WB, but together with any
stalled instruction every later instruction in the pipe is also stalled |
|
NB. no write-after-read hazard can occur in DLX |
|
read stage ID precedes the write stage WB |
|
Hazard Resolution Method: standard techniques
for hazard resolution incorporated into DLXpar macros |
|
forwarding via |
|
new dataflow connections (hardware links) |
|
additional decoder logic (multiplexing) |
|
stalling |
|
|
|
|
|
|
Idea: |
|
refine macros only for case DataDep(I,I’): |
|
if DataDep(I,I’), then the DLXdata
macro refinement guarantees that correct arguments are taken to compute the
result of instr, namely during EX(instr)-phase |
|
otherwise the DLXpar macro is executed |
|
weakening the compiler hazard-freeness
assumption |
|
by refining the notion of relevant locations,
taking out the hazardous locations since they are taken care of by the
refined architecture |
|
NB. This idea realizes an abstract form of the
typical implementation using a scoreboard to keep track of data
dependencies |
|
|
|
|
|
Proof follows from the following lemma: |
|
DLXdata Lemma. Let ICn,ICparn,ICndata
be the n-th instruction cycle in C, Cpar, Cdata.
Then: |
|
Completeness: If Cdata is free of
data hazards for jump/branch instructions, then ICn,ICparn,ICndata
are instruction cycles for the same instruction I’ and start with the same
values for the relevant locations used by I’. |
|
For any I’-cycles IC, ICpar,ICdata
as in a): |
|
Conservativity: if I’ is not data-dependent on
any I in the pipe, then ICpar,ICdata compute the same
result |
|
Refinement: if I’ Ï Jump ÈBranch is data-dependent on an I
<1,2,3 I’ in the pipe, then IC,ICdata compute the
same result |
|
|
|
|
|
By induction on number n of instr cycles: |
|
ad completeness: ICn,ICndata
are cycles for the same instr and start with the same values of relevant
locs |
|
n=0: by correspondence of computations |
|
n+1: follows from Jump Lemma as for DLXpar |
|
ad conservativity: to show: if I’ is not
data-dependent on any I in the pipe, then ICpar,ICdata
compute the same result |
|
in case of no data dependence, in every pipe
stage rule applied to I’, the DLXpar-macro is executed. Since by assumption ICpar and
ICdata start with the same values for the relevant locations
used by I', the effect of the rule applications to I' is the same in ICpar
and ICdata, including the result locations of I‘. Then by the
DLXpar lemma the result loc values of I’ are the same in IC
and ICdata |
|
|
|
|
|
ad Refinement: Let I’ÏJump È Branch be data-dependent on an I
<1,2,3 I’ in the pipe. To show: IC,ICdata compute
the same I’-result. |
|
By definition of dest and assumption on I’: |
|
I Î AluÈSet ÈLoad È Link È{ MOVS2I} |
|
I’ Î AluÈSet ÈLoad ÈStore È{ MOVI2S}
since I’ Ï {MOVS2I} È Jump ÈBranch |
|
|
|
|
|
|
Then |
|
I Î AluÈSet ÈLink È{ MOVS2I}, dest(I) is updated to the value used by I’ in
WB(I), namely to the value of C1, which is copied in MEM(I) from C as |
|
computed in EX(I) by an ALU È SET operation |
|
or as content of PC1 (Link) or IAR (MOVS2I) |
|
Case I <2,3 I’: ID(I‘) overlaps
with WB(I) or MEM(I) with the expected operand value in C1 or C |
|
Case I <1 I’: expected operand
value computed during ID(I‘) and available not before EX(I‘) |
|
|
|
|
|
Case I <2,3 I’: Refinement of
OPERAND |
|
Since ID(I‘) overlaps with WB(I) or MEM(I) with
the expected operand value in C1 or C, nthReg=A,B (nth=fst,scd) can be
updated with C1 or C (called below C’) |
|
if nthop(IR) Î{dest(IR3),dest(IR2)}
refinement case |
|
then
nthReg ¬ C’ |
|
else
nthReg ¬
nthop(IR) conservative
case |
|
where C’ = if nthop(IR) = dest(IR3) ¹ dest(IR2)
then C1 |
|
if nthop(IR) = dest(IR2) then C last update counts |
|
Irrelev 3. I = <reg,nthop(I’)> where I’ÏJump È Branch for some I <2,3
I’ such that I Ï Mem, nthop(I’) =dest(I) |
|
Hw Cost: direct link bw register file exits A,B
and C,C1 |
|
|
|
|
|
|
Case I <1 I’: Refining EX-macros
by forwarding |
|
expected operand value computed during ID(I‘) is
available at the end of EX(I) in C, to be forwarded directly to ALU for
EX(I’), shortcutting transfer from C to register file to A,B |
|
Refine EX-rules for I’ Î Alu È Set ÈLoad ÈStore È{ MOVI2S} |
|
Method: adding to macros a clause which in the
data hazard case provides the argument C instead of A,B |
|
Hw Cost: direct link between C and |
|
both ALU ports for I’ Î Alu È Set |
|
IAR for I' = MOVI2S |
|
MAR and SMDR for I’ Î MEM |
|
together with additional control logic (multiplexers) to select as
ALU input the forwarded value rather than the value from the register file |
|
Irrelev 4. <reg,nthop(I’)> for possible
combinations for I’ |
|
|
|
|
|
Irrelev 4. <reg,nthop(I’)> such that for
some I <1 I’ with
I Ï Mem one
of the following holds: |
|
opcode (I') Î Alu È Set, iop (opcode (I')), dest (I) = fstop (I'), nth = fst |
|
opcode (I') Î Alu È Set, not iop (opcode (I')), dest (I) = nthop (I') |
|
opcode (I') Î Mem È {MOVI2S}, nth = fst, dest (I) = fstop (I') |
|
opcode (I') Î Store, nth = scd, dest (I) = scdop (I') |
|
|
|
|
|
|
If opcode(IR1)ÎLoad È Store then MEMADDR |
|
if
fstop(IR1) = dest(IR2) refinement case |
|
then MAR ¬ valfst + ival(IR1) |
|
else MAR ¬ A + ival(IR1)
conservative case |
|
where valnth = C if nothop (IR1) =
dest(IR2) |
|
= nthReg otherwise with fstReg=A,scdReg=B |
|
|
|
If opcode(IR1)ÎStore then PassBtoMDR |
|
if scdop(IR1) = dest(IR2) refinement case |
|
then SMDR ¬ valscd |
|
else SMDR ¬ B conservative case |
|
Analogously for ALU and MOVI2S rules |
|
|
|
|
|
|
Then IÎLoad, I’Î AluÈSet ÈLinkÈ{ MOVI2S} |
|
expected value, loaded by I, available in LMDR
at the end of MEM(I) |
|
Case I <2,3 I’: expected operand
value available in EX or ID stage in LMDR |
|
refining ID-rule OPERAND and EX-rules ALU,MOVI2S |
|
Case I <1 I’: stall pipeline 1
clock cycle, at the latest before EX(I’), to let only MEM and WB stages
progress |
|
NB. During a pipeline stall a just loaded value
in LMDR may be written back; since in the next step it may serve as operand
of a stalled instruction, we preserve LMDR |
|
|
|
|
|
|
Case I <2,3 I’: refining |
|
C’ in OPERAND by LMDR and |
|
valnnth in ALU, MOVI2S by LMDR1 |
|
using new
WB-stage preservation rule LMDR1 ¬ LMDR |
|
C’ = |
|
LMDR
last update in ante-ante-preceding load instr |
|
if nthop(IR) = dest(IR3) ¹ dest(IR2), opcode(IR3) Î LOAD |
|
where LMDR = opcode(IR3)
(LMDR) |
|
C1
previous case |
|
if
nthop(IR) =dest(IR3) ¹ dest(IR2), opcode(IR3) Ï LOAD |
|
C if nthop(IR) = dest(IR2) previous
case |
|
|
|
|
|
|
Case I <2,3 I’: refining |
|
valnnth in ALU, MOVI2S by LMDR1 |
|
new
WB-stage preservation rules LMDR1 ¬ LMDR, IR4 ¬ IR3 in case an LMDR-value, written back during a
pipeline stall, in the next step serves as operand of an instruction
stalled after ID |
|
valnnth
= |
|
C if nthop(IR) = dest(IR2) previous case |
|
LMDR if nthop(IR1) = dest(IR3) ¹
dest(IR2), opcode(IR3) Î LOAD, opcode(IR4) Ï LOAD |
|
case of
no two successive load instructions |
|
LMDR1 if nthop(IR1) = dest(IR4) ¹ dest(IR2),
dest(IR3), opcode(IR3), opcode(IR4) Î LOAD |
|
nthReg otherwise DLXpar case |
|
|
|
|
|
|
If opcode(IR1)Î Alu È Set then ALU macro |
|
if
iop(opcode) then |
|
if fstop(IR1) = dest(IR2) or for some m Î {3, 4} |
|
fstop(IR1) = dest(IRm) and
opcode(IRm) Î LOAD |
|
then
C ¬ opcode(IR1)
(valfst , ival(IR1)) |
|
else C ¬ opcode(IR1) (A, ival(IR1)) |
|
else if dest(IR2) Î {fstop(IR1), scdop(IR1) } or |
|
for some m Î {3, 4} opcode(IRm) Î LOAD and |
|
dest(IRm) Î {fstop(IR1),
scdop(IR1)} |
|
then C ¬ opcode(IR1) (valfst , valscd ) |
|
else C ¬ opcode(IR1) (A, B) |
|
Analogously for MOVI2S, MEMADDR, PassBtoMDR |
|
|
|
|
|
Irrelev 5. <reg,nthop(I’)> with I’ÏMemÈBranchÈJump such that for some
IÎLoad with dest (I) = nth op (I'), one of the
following holds: |
|
I <3 I’ |
|
I <2 I’ , iop (opcode (I')), nth =
fst |
|
I <2 I’ , not iop (opcode (I‘)) |
|
|
|
|
|
Case I <1 I’: stop EX(I’) together with IF,ID of later
instructions for 1 clock cycle,
letting only MEM and WB stages progress, to make value loaded by I
available to I’ |
|
loadRisk = opcode (IR2) Î LOAD and |
|
reg (IR1) Ï Mem È Jump È Branch and |
|
dest (IR2) Î {fstop (IR1) , scdop (IR1)} |
|
adding guard not loadRisk to EX,ID,IF stage
rules |
|
adding IF stage rule if loadRisk then IR2 ¬ undef |
|
whose execution makes loadRisk false so that the
full pipelining will be resumed and this case can be handled as the
previous one and is resolved by the refined EX-rules |
|
Irrelev 6. <reg,nthop(I’)> for I’ÏMemÈBranchÈJump and some IÎLoad: I <1 I’, dest (I) = nth op
(I') |
|
|
|
|
|
|
Case I <2,3 I’: expected operand
value available in EX or ID stage |
|
forwarding solution by ID-rule OPERAND and
refined EX-rules ALU,MOVI2S |
|
hw price: direct links bw LMDR and MAR, SMDR, new LMDR1 |
|
Case I <1 I’: |
|
Subcase 1. I’ uses the loaded val as datum to be
stored: dest(I)=scdop(I’) |
|
forwarding solution |
|
Subcase 2. I’ uses the loaded val as address for
Mem opn: dest(I)=fstop(I’) |
|
stalling-pipeline solution |
|
|
|
|
|
Subcase I <3 I’: resolved by
ID-rule OPERAND |
|
Subcase I <2 I’: resolved by
refining MEMADDR and PassBtoMDR as shown above for ALU |
|
Irrelev 7. |
|
<reg,nthop(I’)> for I’ Î Mem such that for some
IÎLoad: I
<3 I’, dest (I) = nthop (I') |
|
<reg,fstop(I’)> for I’ Î Mem such that for some
IÎLoad: I
<2 I’, dest (I) = fstop (I') |
|
<reg,scdop(I’)> for I’ Î Store such that for some
IÎLoad: I
<2 I’, dest (I) = scdop (I') |
|
|
|
|
|
|
Subcase 1: dest(I)=scdop(I’), i.e.I’ uses the
loaded val as datum to be stored |
|
Then I’ Î Store, val loaded by I needed in MEM(I’) where available
in LMDR |
|
forwarding solution by refining STORE |
|
hw price: link bw LMDR and memory input port |
|
Irrelev 8. <reg,scdop(I’)> for I’ Î Store such that for some
IÎLoad: I
<1 I’, dest (I) = scdop (I') |
|
Subcase 2: dest(I)=fstop(I’), i.e. I’ uses the
loaded val as address for Mem opn |
|
stalling pipeline: refine loadRisk by clause |
|
or
dest(IR2) = fstop(IR1), IR1 Î Mem |
|
Irrelev 9. <reg,fstop(I’)> for I’ Î Mem such that for some
IÎLoad: I
<1 I’, dest (I) = fstop (I') |
|
|
|
|
|
Resolve the conflict in case IÎMem, I’Ï Mem above without
preserving LMDR. |
|
Hint. Anticipate the stalling from the EX phase
to the ID phase, recognizing loadRisk already in phase ID. See Hinrichsen
1998. |
|
Refine DLXdata to DLXpipe
and prove it to correctly solve also control hazards (those triggered by
jump or branch instructions). |
|
Hint. See Section 5 in Börger and Mazzanti 1997
(LNCS 1212) |
|
|
|
|
|
J.L.Hennessy and D.A.Patterson: Computer
Architecture: A Quantitative Approach |
|
Morgan Kaufmann 1990,1996 |
|
E. Börger and S. Mazzanti: A practical method
for rigorously controllable hardware design |
|
Springer LNCS 1212 (1997) 151-187 |
|
H. Hinrichsen: Formally correct construction of
a pipelined DLX architecture. |
|
TR 98-5-1, May 1998, Darmstadt University of
Technology, Dept. of Electrical and Computer Engineering |
|
S.Tahar.Eine Methode zur formalen Verifikation
von RISC-Prozessoren. |
|
Fortschrittberichte VDI, Reihe
10:Informatik/Kommunikationstechnik Nr. 350, VDI-Verlag, Duesseldorf 1995,
pp.XIV+162. |
|
E. Börger, R. Stärk: Abstract State
Machines. A Method for High-Level
System Design and Analysis Springer-Verlag
2003, see http://www.di.unipi.it/~AsmBook |
|