Computer Organization and Structure

Computer Organization and Structure

Homework #3

Due: 2009/12/15

1. Different instructions utilize different hardware blocks in the basic single-cycle implementation as shown in Figure 1. The next three problems refer to the following instruction:

	Instruction	Interpretation
a.	add Rd, Rs, Rt	Reg[Rd]=Reg[Rs]+Reg[Rt]
b.	lw Rt, Offs(Rs)	Reg[Rt]=Mem[Reg[Rs]+Offs]

f04-02-P374493

Figure 1: The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.

a. What are the values of control signals generated by the control in Figure 1 for this instruction?

b. Which resources (blocks) perform a useful function for this instruction?

c. Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

2. Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 1 there are seven kinds of major blocks. Latencies of blocks along the critical (longest-latency) path for an instruction determine the minimum latency of that instruction. For the following three problems, assume the following resource latencies:

	I-Mem	Add	Mux	ALU	Regs	D-Mem	Control
a.	400ps	100ps	30ps	120ps	200ps	350ps	100ps
b.	500ps	150ps	100ps	180ps	220ps	1000ps	65ps

a. What is the critical path for a MIPS AND instruction?

b. What is the critical path for a MIPS load (LD) instruction?

c. What is the critical path for a MIPS BEQ instruction?

3. The basic single-cycle MIPS implementation in Figure 1 can only implement some instructions. New instructions can be added to an existing ISA, but the decision whether or not to do that depends, among other things, on the cost and complexity such an addition introduces into the processor datapath and control. The next three problems refer to this new instruction:

	Instruction	Interpretation
a.	add3 Rd, Rs, Rt,Rx	Reg[Rd]=Reg[Rs]+Reg[Rt]+Reg[Rx]
b.	sll Rt, Rd, Shift	Reg[Rd]= Reg[Rt] << Shift (shift left by Shift bits)

a. Which existing blocks (if any) can be used for this instruction?

b. Which new functional blocks (if any) do we need for this instruction?

c. What new signals do we need (if any) from the control unit to support this instruction?

4. The following problems assume that individual stages of the datapath have the following latencies:

	IF	ID	EX	MEM	WB
a.	300ps	400ps	350ps	500ps	100ps
b.	200ps	150ps	120ps	190ps	140ps

a. What is the clock cycle time in a pipelined and nonpipelined processor?

b. What is the total latency of a lw instruction in a pipelined and nonpipelined processor?

c. If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?

5. The following problems assume that instructions executed by the processor are broken down as follows:

	ALU	beq	lw	sw
a.	50%	25%	15%	10%
b.	30%	25%	30%	15%

a. Assuming there are no stalls or hazards, what is the utilization (% of cycles used) of the data memory?

b. Assuming there are no stalls or hazards, what is the utilization of the write-register port of the “Registers” unit?

c. Instead of a singl-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction through stages it actually needs (e.g., ST only takes four cycles because it does not need the WB stage). Compare clock cycle times and execution times with single-cycle, multi-cycle, and pipelined organization.

6. The following problems refer to the following sequence of instructions:

Instruction sequence

lw $1, 40($6)

add $6, $2, $2

sw $6, 50($1)

lw $5, -16($5)

sw $5, -16($5)

add $5, $5, $5

a. Indicate dependences and their type.

b. Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop instructions to eliminate them.

c. Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.

The remaining problems assume the following clock cycle times:

	Without forwarding	With full forwarding	With ALU-ALU forwarding only
a.	300ps	400ps	360ps
b.	200ps	250ps	220ps

d. What is the total execution time of this instruction sequence without forwarding and with full forwarding? What is the speed-up achieved by adding full forwarding to a pipeline that had no forwarding?

e. Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only (no forwarding from the MEM to the EX stage)?

f. What is the total execution time of this instruction sequence with only ALU-ALU forwarding? What is the speed-up over a no-forwarding pipeline?