Computer Organization and Structure
1. A computer architect needs to design the pipeline of a new microprocessor. She has an example workload program core with 10^6 instructions. Each instruction takes 100 ps to finish.
a. How long does it take to execute this program core on a nonpipelined processor?
b. The current state-of-the-art microprocessor has about 20 pipeline stages. Assume it is perfectly pipelined. How much speedup will it achieve compared to the nonpipelined processor?
c. Real pipelining isn’t perfect, since implementing pipelining introduces some overhead per pipeline stage. Will this overhead affect instruction latency, instruction throughput, or both?
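The arithmetic behind parts a and b can be sketched in Python as a rough check. This sketch takes the instruction count as 10^6 and assumes that "perfectly pipelined" means the 100 ps of work splits evenly across 20 stages with no per-stage overhead and no stalls. The function names are illustrative and not part of the exercise.

```python
def nonpipelined_time_ps(n_instructions, latency_ps):
    """Each instruction runs to completion before the next one starts."""
    return n_instructions * latency_ps


def pipelined_time_ps(n_instructions, latency_ps, n_stages):
    """Ideal pipeline: equal stages, no overhead, no stalls (assumption)."""
    cycle_ps = latency_ps / n_stages
    # The first instruction needs n_stages cycles; each later one adds a cycle.
    return (n_stages + n_instructions - 1) * cycle_ps


N = 10**6          # instructions in the program core (assumed value)
LATENCY_PS = 100   # time per instruction on the nonpipelined processor
STAGES = 20

t_single = nonpipelined_time_ps(N, LATENCY_PS)
t_pipe = pipelined_time_ps(N, LATENCY_PS, STAGES)
speedup = t_single / t_pipe

print(f"nonpipelined: {t_single / 1e6:.1f} us")
print(f"pipelined:    {t_pipe / 1e6:.4f} us")
print(f"speedup:      {speedup:.4f}")
```

Under these assumptions the speedup falls just short of the stage count, because the pipeline must fill once before it completes one instruction per cycle.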
2. Consider executing the following code on the pipelined datapath shown in the following figure:
add $2, $3, $1
sub $4, $3, $5
add $5, $3, $7
add $7, $6, $1
add $8, $2, $6
a. At the end of the fifth cycle of execution, which registers are being read and which register will be written?
b. Explain what the forwarding unit is doing during the fifth cycle of execution. If any comparisons are being made, mention them.
c. Explain what the hazard detection unit is doing during the fifth cycle of execution. If any comparisons are being made, mention them.
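A useful first step for part a is to work out which instruction occupies which stage in cycle 5. The Python sketch below does this under stated assumptions: a classic five-stage MIPS pipeline (IF, ID, EX, MEM, WB), one instruction fetched per cycle starting in cycle 1, no stalls, registers read in ID and written in WB. The tuple encoding and the helper name `stage_occupancy` are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (opcode, destination, source 1, source 2) for each instruction, in order.
PROGRAM = [
    ("add", 2, 3, 1),
    ("sub", 4, 3, 5),
    ("add", 5, 3, 7),
    ("add", 7, 6, 1),
    ("add", 8, 2, 6),
]


def stage_occupancy(program, cycle):
    """Map each stage name to the instruction in it during `cycle` (1-based)."""
    occupancy = {}
    for index, instr in enumerate(program):
        # Instruction `index` is fetched in cycle index + 1 (no stalls assumed).
        stage_number = cycle - (index + 1)
        if 0 <= stage_number < len(STAGES):
            occupancy[STAGES[stage_number]] = instr
    return occupancy


occ = stage_occupancy(PROGRAM, cycle=5)
for stage in STAGES:
    op, rd, rs, rt = occ[stage]
    print(f"{stage:>3}: {op} ${rd}, ${rs}, ${rt}")

# Registers are read in ID and written in WB in this model.
_, _, id_rs, id_rt = occ["ID"]
_, wb_rd, _, _ = occ["WB"]
print(f"read in ID: ${id_rs}, ${id_rt}; written in WB: ${wb_rd}")
```

The same occupancy table is a starting point for parts b and c: the forwarding unit compares the source registers of the instruction in EX with the destinations of the instructions in MEM and WB, and the hazard detection unit checks whether the instruction in EX is a load whose destination matches a source of the instruction in ID.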
3. The following piece of code is executed using the pipeline shown in the following figure:
lw $5, 40($2)
add $6, $3, $2
or $7, $2, $1
and $8, $4, $3
sub $9, $2, $1
At cycle 5, right before the instructions are executed, the processor state is as follows:
a. The PC has the value 100_ten, the address of the sub instruction.
b. Every register has the initial value 10_ten plus the register number (e.g., register $8 has the initial value 18_ten).
c. Every memory word accessed as data has the initial value 1000_ten plus the byte address of the word (e.g., Memory[8] has the initial value 1008_ten).
Determine the value of every field in the four pipeline registers in cycle 5.
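The two initial-value rules can be written as small helper functions, which makes it easier to fill in the data fields of the pipeline registers. This is a minimal Python sketch; the names `reg_init` and `mem_init` are hypothetical, and the lw example assumes that no earlier instruction has changed $2.

```python
def reg_init(n):
    """Initial value of register $n: 10 plus the register number."""
    return 10 + n


def mem_init(byte_address):
    """Initial value of a data word: 1000 plus its byte address."""
    return 1000 + byte_address


# Examples taken from the problem statement.
print(reg_init(8))   # register $8
print(mem_init(8))   # data word at byte address 8

# lw $5, 40($2): effective address is the offset plus the base register value.
lw_address = 40 + reg_init(2)
lw_data = mem_init(lw_address)
print(lw_address, lw_data)
```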
4. The following code has been unrolled once but not yet scheduled. Assume the loop index is a multiple of two (i.e., $10 is a multiple of eight):
Loop: lw $2, 0($10)
sub $4, $2, $3
sw $4, 0($10)
lw $5, 4($10)
sub $6, $5, $3
sw $6, 4($10)
addi $10, $10, 8
bne $10, $30, Loop
Schedule this code for fast execution on the standard MIPS pipeline (assume that it supports the addi instruction). Assume initially $10 is 0 and $30 is 400 and that branches are resolved in the MEM stage. How does the scheduled code compare against the original unscheduled code?
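Before scheduling, it helps to locate the places where the unscheduled loop uses a loaded value in the very next instruction, since those are the pairs a scheduler would try to separate. The Python sketch below finds them and counts the loop iterations. It assumes a pipeline with forwarding in which only a load followed immediately by a dependent instruction forces a stall, and it does not model the branch penalty. The tuple encoding and the helper name `load_use_pairs` are illustrative.

```python
# Loop body as (opcode, destination register or None, source registers).
LOOP_BODY = [
    ("lw",   2,    [10]),
    ("sub",  4,    [2, 3]),
    ("sw",   None, [4, 10]),
    ("lw",   5,    [10]),
    ("sub",  6,    [5, 3]),
    ("sw",   None, [6, 10]),
    ("addi", 10,   [10]),
    ("bne",  None, [10, 30]),
]


def load_use_pairs(body):
    """Return index pairs (i, i + 1) where a lw result is used immediately."""
    pairs = []
    for i in range(len(body) - 1):
        op, dest, _ = body[i]
        _, _, next_sources = body[i + 1]
        if op == "lw" and dest in next_sources:
            pairs.append((i, i + 1))
    return pairs


# $10 starts at 0 and advances by 8 until it equals $30 (400).
iterations = len(range(0, 400, 8))

print("load-use pairs:", load_use_pairs(LOOP_BODY))
print("iterations:", iterations)
print("instructions executed:", iterations * len(LOOP_BODY))
```

Multiplying the per-iteration stall count by the number of iterations gives one part of the cycle difference between the original and a scheduled version; the branch penalty from resolving bne in the MEM stage has to be added separately.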