
Announcements – 6/15/10 



Computer Organization Lecture Set – 30 林惠勇 Huei-Yung Lin

Reading 5.4



Exam II this Friday (6/18), 7:00 – 9:00pm Rooms 127, 227, 225



Optional Final Exam 6/25, 10:00 – 12:00 Room 227

CCUEE

Cache Performance 



CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time

Memory-stall clock cycles = Read-stall cycles + Write-stall cycles

Read-stall cycles = (Reads/Program) × Read miss rate × Read miss penalty

Write-stall cycles = ((Writes/Program) × Write miss rate × Write miss penalty) + Write buffer stalls

Memory-stall clock cycles = (Memory accesses/Program) × Miss rate × Miss penalty

Memory-stall clock cycles = (Instructions/Program) × (Misses/Instruction) × Miss penalty
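The formulas above can be turned into a small calculator. This is a minimal sketch, not part of the original slides: the function names and the workload numbers (instruction count, misses per instruction, clock cycle time) are hypothetical and chosen only for illustration.

```python
def memory_stall_cycles(instructions, misses_per_instruction, miss_penalty):
    """Memory-stall clock cycles = instructions x misses/instruction x miss penalty."""
    return instructions * misses_per_instruction * miss_penalty


def cpu_time(instructions, base_cpi, misses_per_instruction, miss_penalty, clock_cycle_time):
    """CPU time = (execution cycles + memory-stall cycles) x clock cycle time."""
    execution_cycles = instructions * base_cpi
    stall_cycles = memory_stall_cycles(instructions, misses_per_instruction, miss_penalty)
    return (execution_cycles + stall_cycles) * clock_cycle_time


# Hypothetical workload: 1e9 instructions, base CPI of 2, 0.03 misses per
# instruction, 40-cycle miss penalty, 1 ns clock cycle.
t = cpu_time(1e9, 2.0, 0.03, 40, 1e-9)
print(f"CPU time: {t:.2f} s")
```

With these assumed numbers the stall cycles (1.2e9) are more than half the execution cycles (2e9), which is why the miss rate and miss penalty matter so much for overall CPU time.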


Example 

Assume:

Instruction cache miss rate for gcc: 2%
Data cache miss rate for gcc: 4%
CPI: 2 without memory stalls
Miss penalty: 40 cycles for all misses
Percentage of load/store instructions: 36%

How much faster is a machine with a perfect cache (one that never misses)?

Number of stall cycles = I × 2% × 40 (instruction misses) + I × 36% × 4% × 40 (data misses) = 1.38 × I, where I is the instruction count

CPI with stalls = 2 + 1.38 = 3.38

The machine with a perfect cache is 3.38/2 = 1.69 times faster.
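The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative and do not come from the slides.

```python
# Parameters from the gcc example.
instruction_miss_rate = 0.02
data_miss_rate = 0.04
load_store_fraction = 0.36
base_cpi = 2.0
miss_penalty = 40  # clock cycles

# Stall cycles per instruction, split by source of the miss.
instruction_stalls = instruction_miss_rate * miss_penalty
data_stalls = load_store_fraction * data_miss_rate * miss_penalty
stall_cpi = instruction_stalls + data_stalls

cpi_with_stalls = base_cpi + stall_cpi
speedup_with_perfect_cache = cpi_with_stalls / base_cpi

print(f"Stall cycles per instruction: {stall_cpi:.2f}")
print(f"CPI with stalls: {cpi_with_stalls:.2f}")
print(f"Speedup with a perfect cache: {speedup_with_perfect_cache:.2f}")
```

The unrounded stall term is 1.376 cycles per instruction; the slide rounds it to 1.38.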



What If …




What if the processor is made faster, but the memory system stays the same?

The clock rate is doubled, but the memory system stays the same?

Total miss cycles per instruction: 2% × 80 + 36% × 4% × 80 = 2.75 (the miss penalty takes the same time, but twice as many clock cycles)
Therefore, the CPI of the new machine is 2 + 2.75 = 4.75

(Execution time with slow clock) / (Execution time with fast clock) = (IC × CPI_slow × Clock cycle) / (IC × CPI_fast × Clock cycle / 2) = 3.38 / (4.75 × 1/2) = 1.42

The machine is 1.42 times faster, not 2 times faster.

Speed up the machine by improving the CPI from 2 to 1 without increasing the clock rate?

The CPI with stalls becomes 1 + 1.38 = 2.38
The system with a perfect cache would be 2.38/1 = 2.38 times faster
The fraction of time spent on memory stalls rises from 1.38/3.38 = 41% to 1.38/2.38 = 58%
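Both "what if" scenarios reuse the miss rates from the gcc example, so they can be recomputed from one helper. The following is a sketch under that assumption; `stall_cpi` and the other names are hypothetical helpers, not part of the lecture.

```python
def stall_cpi(miss_penalty, instruction_miss_rate=0.02,
              data_miss_rate=0.04, load_store_fraction=0.36):
    """Memory-stall cycles per instruction for a given miss penalty (in cycles)."""
    return (instruction_miss_rate * miss_penalty
            + load_store_fraction * data_miss_rate * miss_penalty)


# Baseline machine: base CPI 2, 40-cycle miss penalty.
baseline_cpi = 2.0 + stall_cpi(40)

# Clock rate doubled: the same memory latency now costs 80 cycles,
# and each cycle takes half as long.
fast_clock_cpi = 2.0 + stall_cpi(80)
clock_speedup = baseline_cpi / (fast_clock_cpi * 0.5)

# Base CPI improved from 2 to 1 with the original clock.
low_cpi = 1.0 + stall_cpi(40)
stall_fraction_before = stall_cpi(40) / baseline_cpi
stall_fraction_after = stall_cpi(40) / low_cpi

print(f"CPI with doubled clock: {fast_clock_cpi:.2f}")
print(f"Speedup from doubling the clock: {clock_speedup:.2f}")
print(f"Stall fraction: {stall_fraction_before:.0%} -> {stall_fraction_after:.0%}")
```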


Our Observations 





Relative cache penalties increase as a processor becomes faster
The lower the CPI, the more pronounced the impact of stall cycles
If the main memory system is the same, a higher CPU clock rate leads to a larger miss penalty (measured in clock cycles)


Associative Cache – Decreasing Miss 

Direct-mapped cache:

Each memory location is mapped to exactly one location in the cache

Set-associative cache:

A cache with a fixed number of locations (at least two) where each block can be placed

Fully associative cache:

A cache structure in which a block can be placed in any location in the cache

[Figure: an eight-block cache configured as one-way set associative (direct mapped, blocks 0 to 7), two-way set associative (sets 0 to 3), four-way set associative (sets 0 and 1), and eight-way set associative (fully associative); every entry holds a tag and data]
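The placement rule behind these organizations can be sketched in a few lines: a block maps to set (block address mod number of sets), where the number of sets is the number of cache blocks divided by the associativity. The function name `set_index` and the choice of block address 12 are assumptions made for illustration.

```python
def set_index(block_address, num_blocks, ways):
    """Set that a block maps to in a cache of `num_blocks` blocks, `ways` blocks per set."""
    num_sets = num_blocks // ways
    return block_address % num_sets


# An eight-block cache under each associativity from the figure.
for ways in (1, 2, 4, 8):
    print(f"{ways}-way: block 12 maps to set {set_index(12, 8, ways)} "
          f"of {8 // ways} set(s)")
```

With one way the set index identifies a single block (direct mapped); with eight ways there is only one set, so the block may go anywhere (fully associative).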



Example (page 482) 





Assume there are three small caches, each consisting of four one-word blocks. One cache is fully associative, a second is two-way set associative, and the third is direct mapped. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, 8

Direct Mapped

Block address    Cache block
0                (0 mod 4) = 0
6                (6 mod 4) = 2
8                (8 mod 4) = 0

Address of memory    Hit or    Contents of cache block after reference
block accessed       miss      Block 0      Block 1      Block 2      Block 3
0                    miss      Memory[0]
8                    miss      Memory[8]
0                    miss      Memory[0]
6                    miss      Memory[0]                 Memory[6]
8                    miss      Memory[8]                 Memory[6]

5 misses

Two-Way Set Associative Cache

Block address    Cache set
0                (0 mod 2) = 0
6                (6 mod 2) = 0
8                (8 mod 2) = 0

Address of memory    Hit or    Contents of cache block after reference
block accessed       miss      Set 0        Set 0        Set 1        Set 1
0                    miss      Memory[0]
8                    miss      Memory[0]    Memory[8]
0                    hit       Memory[0]    Memory[8]
6                    miss      Memory[0]    Memory[6]
8                    miss      Memory[8]    Memory[6]

4 misses

Fully Associative Cache

Address of memory    Hit or    Contents of cache block after reference
block accessed       miss      Block 0      Block 1      Block 2      Block 3
0                    miss      Memory[0]
8                    miss      Memory[0]    Memory[8]
0                    hit       Memory[0]    Memory[8]
6                    miss      Memory[0]    Memory[8]    Memory[6]
8                    hit       Memory[0]    Memory[8]    Memory[6]

3 misses

Increasing degree of associativity → decrease in miss rate

Which block to replace? The commonly used scheme is LRU.

Least recently used (LRU): a replacement scheme in which the block replaced is the one that has been unused for the longest time
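The miss counts for the three organizations in this example can be reproduced with a small simulator. This is a minimal sketch assuming LRU replacement within each set and placement by (block address mod number of sets); `count_misses` is a hypothetical helper written for this illustration.

```python
from collections import OrderedDict


def count_misses(block_addresses, num_blocks, ways):
    """Count misses for a cache of `num_blocks` blocks with `ways` blocks per set (LRU)."""
    num_sets = num_blocks // ways
    # Each set keeps its blocks ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # set is full: evict the LRU block
            lines[block] = True
    return misses


sequence = [0, 8, 0, 6, 8]
for name, ways in (("direct mapped", 1), ("two-way", 2), ("fully associative", 4)):
    print(f"{name}: {count_misses(sequence, 4, ways)} misses")
```

In the two-way case the reference to block 6 evicts block 8, because block 0 was used more recently; that is why the final reference to block 8 misses again.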



Summary: Cache Memory

Speeds up access by storing recently-used data
Structure has a strong impact on performance
Modern microprocessors use on-chip cache (sometimes multilevel caches)

Performance of Multilevel Cache

CPI                                                   1
Clock Rate                                            5 GHz
Memory Access Time                                    100 ns
Miss Rate per instruction at the primary cache        2%
Secondary cache Access Time (Hit or Miss)             5 ns
Miss rate to main memory with the secondary cache     0.5%

Clock cycle time = 1 / (5 GHz) = 0.2 ns
Miss penalty to main memory = 100 / 0.2 = 500 clock cycles
Miss penalty to the secondary cache = 5 / 0.2 = 25 clock cycles

Original Total CPI = 1 + Memory-stall cycles per instruction = 1 + 500 × 2% = 1 + 10 = 11

Multilevel Total CPI = 1 + Primary stalls per instruction + Secondary stalls per instruction = 1 + (25 × 2%) + (500 × 0.5%) = 1 + 0.5 + 2.5 = 4.0

The machine with the secondary cache is 11/4 ≈ 2.8 times faster
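The multilevel cache numbers can be re-derived directly from the clock rate and access times. The sketch below only restates the slide's arithmetic; the variable names are illustrative.

```python
clock_rate_ghz = 5.0
cycle_time_ns = 1.0 / clock_rate_ghz               # 0.2 ns per clock cycle

main_memory_penalty = 100.0 / cycle_time_ns        # cycles for a miss to main memory
secondary_cache_penalty = 5.0 / cycle_time_ns      # cycles for an access to the secondary cache

base_cpi = 1.0
primary_miss_rate = 0.02   # misses per instruction at the primary cache
global_miss_rate = 0.005   # misses per instruction that still go to main memory

# Primary cache only: every primary miss goes to main memory.
cpi_single_level = base_cpi + primary_miss_rate * main_memory_penalty

# With a secondary cache: primary misses pay the secondary access time,
# and the remaining misses also pay the main memory penalty.
cpi_two_level = (base_cpi
                 + primary_miss_rate * secondary_cache_penalty
                 + global_miss_rate * main_memory_penalty)

print(f"CPI without secondary cache: {cpi_single_level:.1f}")
print(f"CPI with secondary cache: {cpi_two_level:.1f}")
print(f"Speedup: {cpi_single_level / cpi_two_level:.2f}")
```

The exact ratio is 2.75, which the slide rounds to 2.8.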


The 3Cs

Compulsory misses

Caused by the first access to a block that has never been in the cache (cold-start misses)
INCREASE THE BLOCK SIZE (increase in miss penalty)

Capacity misses

Caused when the cache cannot contain all the blocks needed by the program. Blocks are being replaced and later retrieved again.
INCREASE THE SIZE (access time increases as well)

Conflict misses

Occur when multiple blocks compete for the same set (collision misses)
INCREASE ASSOCIATIVITY (may slow down access time)

The Design Challenges

Design change             Effect on miss rate                                    Possible negative effect
Increase size             Decrease capacity misses                               Increase access time
Increase associativity    Decrease miss rate due to conflict misses              Increase access time
Increase block size       Decreases miss rate for a wide range of block sizes    Increase miss penalty
