Computer Organization and Structure

 

Homework #3

Due: 2009/12/15

 

1.      Different instructions utilize different hardware blocks in the basic single-cycle implementation as shown in Figure 1. The next three problems refer to the following instruction:

 

 

Instruction

Interpretation

a.

add Rd, Rs, Rt

Reg[Rd]=Reg[Rs]+Reg[Rt]

b.

lw Rt, Offs(Rs)

Reg[Rt]=Mem[Reg[Rs]+Offs]

 

f04-02-P374493

Figure 1: The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.

 

a.         What are the values of control signals generated by the control in Figure 1 for this instruction?

b.        Which resources (blocks) perform a useful function for this instruction?

c.         Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

 

2.      Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 1 there are seven kinds of major blocks. Latencies of blocks along the critical (longest-latency) path for an instruction determine the minimum latency of that instruction. For the following three problems, assume the following resource latencies:

 

 

I-Mem

Add

Mux

ALU

Regs

D-Mem

Control

a.

400ps

100ps

30ps

120ps

200ps

350ps

100ps

b.

500ps

150ps

100ps

180ps

220ps

1000ps

65ps

 

a.         What is the critical path for a MIPS AND instruction?

b.        What is the critical path for a MIPS load (LD) instruction?

c.         What is the critical path for a MIPS BEQ instruction?

 

3.      The basic single-cycle MIPS implementation in Figure 1 can only implement some instructions. New instructions can be added to an existing ISA, but the decision whether or not to do that depends, among other things, on the cost and complexity such an addition introduces into the processor datapath and control. The next three problems refer to this new instruction:

 

 

Instruction

Interpretation

a.

add3 Rd, Rs, Rt,Rx

Reg[Rd]=Reg[Rs]+Reg[Rt]+Reg[Rx]

b.

sll Rt, Rd, Shift

Reg[Rd]= Reg[Rt] << Shift (shift left by Shift bits)

 

a.         Which existing blocks (if any) can be used for this instruction?

b.        Which new functional blocks (if any) do we need for this instruction?

c.         What new signals do we need (if any) from the control unit to support this instruction?

 

4.      The following problems assume that individual stages of the datapath have the following latencies:

 

 

IF

ID

EX

MEM

WB

a.

300ps

400ps

350ps

500ps

100ps

b.

200ps

150ps

120ps

190ps

140ps

 

a.         What is the clock cycle time in a pipelined and nonpipelined processor?

b.        What is the total latency of a lw instruction in a pipelined and nonpipelined processor?

c.         If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?

 

5.      The following problems assume that instructions executed by the processor are broken down as follows:

 

 

ALU

beq

lw

sw

a.

50%

25%

15%

10%

b.

30%

25%

30%

15%

 

a.         Assuming there are no stalls or hazards, what is the utilization (% of cycles used) of the data memory?

b.        Assuming there are no stalls or hazards, what is the utilization of the write-register port of the “Registers” unit?

c.         Instead of a singl-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction through stages it actually needs (e.g., ST only takes four cycles because it does not need the WB stage). Compare clock cycle times and execution times with single-cycle, multi-cycle, and pipelined organization.

 

6.      The following problems refer to the following sequence of instructions:

 

 

Instruction sequence

a.

lw $1, 40($6)

add $6, $2, $2

sw $6, 50($1)

b.

lw $5, -16($5)

sw $5, -16($5)

add $5, $5, $5

 

a.         Indicate dependences and their type.

b.        Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop instructions to eliminate them.

c.         Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.

 

The remaining problems assume the following clock cycle times:

 

 

Without forwarding

With full forwarding

With ALU-ALU forwarding only

a.

300ps

400ps

360ps

b.

200ps

250ps

220ps

 

d.        What is the total execution time of this instruction sequence without forwarding and with full forwarding? What is the speed-up achieved by adding full forwarding to a pipeline that had no forwarding?

e.         Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only (no forwarding from the MEM to the EX stage)?

f.         What is the total execution time of this instruction sequence with only ALU-ALU forwarding? What is the speed-up over a no-forwarding pipeline?