Computer Organization and Structure

Computer Organization and Structure

Homework #5

Due: 2010/1/4

1. For a direct-mapped cache design with 32-bit address, the following bits of the address are used to access the cache.

	Tag	Index	Offset
a.	31-10	9-4	3-0
b.	31-13	12-5	4-0

a. What is the cache line size (in words)?

b. How many entries does the cache have?

c. What is the ratio between total bits required for such a cache implementation over the data storage bits?

Starting from power on, the following byte-addressed cache references are recorded.

Address

132

232

160

1024

140

3100

180

2180

d. How many blocks are replaced?

e. What is the hit ratio?

f. List the final state of the cache, with each valid entry represented as a record of <index, tag, data>.

2. In general, cache access time is proportional to capacity. Assume that main memory accesses take 70ns and that memory accesses are 36% of all instructions. The following table shows data for L1 caches attached to each of two processors P1 and P2.

		L1 size	L1 miss rate	L1 hit time
a.	P1	1KB	11.4%	0.62ns
a.	P2	2KB	8.0%	0.66ns
b.	P1	8KB	4.3%	0.96ns
b.	P2	16KB	3.4%	1.08ns

a. Assuming that the L1 hit time determines the cycle times for P1 and P2, what are their respective clock rates?

b. What is the AMAT for each of P1 and P2?

c. Assuming a base CPI of 1.0, what is the total CPI for each of P1 and P2? Which processor is faster?

For the next three sub-problems, we will consider the addition of an L2 cache to P1 to presumably make up for its limited L1 cache capacity. Use the L1 cache capacities and hit times from the previous table when solving these sub-problems. The L2 miss rate indicate is its local miss rate.

	L2 size	L2 miss rate	L2 hit time
a.	512KB	98%	3.22ns
b.	4MB	73%	11.48ns

d. What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2 cache?

e. Assuming a base CPI of 1.0, what is the total CPI for P1 with the addition of an L2 cache?

f. Which processor is faster, now that P1 has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache to match P1’s performance? If P2 is faster, what miss rate would P1 need in its L1 cache to match P2’s performance?

3. To support multiple virtual machines, two levels of memory virtualization are needed. Each virtual machine still controls the mapping of virtual address (VA) to physical address (PA), while the hypervisor maps the physical address (PA) of each virtual machine to the actual machine address (MA). To accelerate such mappings, a software approach called “shadow paging” duplicates each virtual machine’s page tables in the hypervisor, and intercepts VA to PA mapping changes to keep both copies consistent. To remove the complexity of shadow page tables, a hardware approach called nested page table (or extended page table) explicitly support two classes of page tables (VAÞPA and PAÞMA) and can walk such tables purely in hardware. Consider the following sequence of operations:

(1) Create process; (2) TLB miss; (3) page fault; (4) context switch;

a. What would happen for the given operation sequence, for shadow page table, and nested page table respectively?

b. Assuming an x86-based four-level page table in both guest and nested page table, how many memory references are needed to service a TLB miss ofr native versus nested page table?

c. Among TLB miss rate, TLB miss latency, page fault rate, and page fault handler latency, which metrics are more important for shadow page table? What are important for nested page table?

The following table shows parameters for a shadow paging system.

TLB misses per 1000 instruction	NPT TLB miss latency	Page faults per 1000 instruction	Shadowing page fault overhead
0.2	200 cycles	0.001	30000 cycles

d. For a benchmark with native execution CPI of 1, what are the CPI numbers if using shadow page tables versus NPT (assuming only page table virtualization overhead)?

e. What techniques can be used to reduce page table shadowing induced overhead?

f. What techniques can be used to reduce NPT induced overhead?