Computer Organization and Structure

 

Homework #4

Due: 2010/12/21

 

1.      Different instructions utilize different hardware blocks in the basic single-cycle implementation. The next three problems refer to the following instruction:

 

 

Instruction

Interpretation

a.

add Rd, Rs, Rt

Reg[Rd]=Reg[Rs]+Reg[Rt]

b.

lw Rt, Offs(Rs)

Reg[Rt]=Mem[Reg[Rs]+Offs]

 

ΰ–Ύ: f04-02-P374493

Figure 1: The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.

 

a.         What are the values of control signals generated by the control in Figure 1 for this instruction?

b.        Which resources (blocks) perform a useful function for this instruction?

c.         Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

 

Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 1 there are seven kinds of major blocks. Latencies of blocks along the critical (longest-latency) path for an instruction determine the minimum latency of that instruction. For the following three problems, assume the following resource latencies:

 

 

I-Mem

Add

Mux

ALU

Regs

D-Mem

Control

a.

400ps

100ps

30ps

120ps

200ps

350ps

100ps

b.

500ps

150ps

100ps

180ps

220ps

1000ps

65ps

 

d.        What is the critical path for a MIPS AND instruction?

e.         What is the critical path for a MIPS load (LD) instruction?

f.         What is the critical path for a MIPS BEQ instruction?

 

2.      Assuming that logic blocks needed to implement a processorfs datapath have the following latencies:

 

 

I-Mem

Add

Mux

ALU

Regs

D-Mem

Sign-extend

Shift-left-2

a.

400ps

100ps

30ps

120ps

200ps

350ps

20ps

2ps

b.

500ps

150ps

100ps

180ps

220ps

1000ps

90ps

20ps

 

ΰ–Ύ: ΰ–Ύ: f04-06-P374493

Figure 2: A portion of the datapath used for fetching instructions and incrementing the program counter. The fetched instruction is used by other parts of the datapath.

 

a.         If the only thing we need to do in a processor is fetch consecutive instructions (Figure 2), what would the cycle time be?

b.        Consider a datapath similar to the one in Figure 3, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath?

c.         Repeat the above problem, but this time we need to support only conditional PC-relative branches.

 

The remaining three problems refer to the following logic block (resource) in the datapath:

 

 

Resource

a.

Add 4 (to the PC)

b.

Data Memory

 

ΰ–Ύ: ΰ–Ύ: f04-11-P374493

Figure 3: The simple datapath for the MIPS architecture combines the elements required by different instruction classes.This datapath can execute the basic instructions (load-store word, ALU operations, and branches) in a single clock cycle.

 

d.        Which kinds of instructions require this resource?

e.         For which kinds of instructions (if any) is this resource on the critical path?

f.         Assuming that we only support beq and add instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change.

 

3.      The following problems refer to the following sequence of instructions:

 

 

Instruction sequence

a.

lw $1, 40($6)

add $6, $2, $2

sw $6, 50($1)

b.

lw $5, -16($5)

sw $5, -16($5)

add $5, $5, $5

 

a.         Indicate dependences and their type.

b.        Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop instructions to eliminate them.

c.         Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.

 

The remaining problems assume the following clock cycle times:

 

 

Without forwarding

With full forwarding

With ALU-ALU forwarding only

a.

300ps

400ps

360ps

b.

200ps

250ps

220ps

 

d.        What is the total execution time of this instruction sequence without forwarding and with full forwarding? What is the speed-up achieved by adding full forwarding to a pipeline that had no forwarding?

e.         Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only (no forwarding from the MEM to the EX stage)?

f.         What is the total execution time of this instruction sequence with only ALU-ALU forwarding? What is the speed-up over a no-forwarding pipeline?

 

4.      Assuming that the breakdown of dynamic instructions into various instruction categories is as follows:

 

 

R-Type

beq

jmp

lw

sw

a.

50%

15%

10%

15%

10%

b.

30%

10%

5%

35%

20%

 

Also, assume the following branch predictor accurancies:

 

 

Always-taken

Always not-taken

2-bit

a.

40%

60%

80%

b.

60%

40%

95%

 

a.         Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used.

b.        Repeat the above problem for the galways not-takenh predictor.

c.         Repeat the above problem for the g2-bith predictor.

d.        With the 2-bit predictor, what speed-up would be achieved if we could convert half of the branch instructions in a way that replaces a branch instruction with an ALU instruction? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

e.         With the 2-bit predictor, what speed-up would be achieved if we could convert half of the branch instructions in a way that replaced each branch instruction with two ALU instructions? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

f.         Some branch instructions are much more predictable than others. If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the branch instructions?