
Prateek Pujara & Aneesh Aggarwal

CDA 5106 Advanced Computer Architecture Fall 2007

Group Members: Raul A. Dookhoo, Abu Rahat Chowdhury


Foreword
• A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch or compute compared to the cost of reading the cache.
• Caches are very important for covering the processor-memory performance gap.

So caches should be utilized very efficiently: fetch only the useful data.


Foreword (cont.)
Paper: Increasing Cache Efficiency by Eliminating Noise, International Conference on High Performance Computer Architecture (HPCA 2006)
• Aneesh Aggarwal (Assistant Professor): his research interests are in energy-efficient and reliable microarchitectural and compiler optimization techniques for high-performance processors.
• Prateek Pujara (PhD student): he is interested in microarchitectural techniques for improving the performance of multi-threaded and multi-core processors.
Both are still at the State University of New York, Binghamton.


Some Related Definitions

Cache Pollution: A situation where an executing program loads data into the CPU cache unnecessarily, causing other needed data to be evicted from the cache into lower levels of the memory hierarchy, potentially all the way down to main memory, and thus causing a performance hit.

Cache Noise: Caches exploit spatial locality by fetching words in the same locality as the word on which the miss occurred. The fetched words that turn out not to be required are cache noise.

Cache Utilization: Percentage of useful words out of the total words fetched into the cache.

Fetch only the useful data
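The definition of cache utilization can be illustrated with a minimal Python sketch. It assumes every block is fetched whole and that each evicted block reports a bitmask of the words that were referenced; the function name `cache_utilization` and the sample masks are hypothetical and not taken from the paper.

```python
def cache_utilization(used_masks, words_per_block):
    """Percentage of fetched words that were referenced before eviction.

    used_masks: one integer per evicted block; bit i set means word i was used.
    Assumes each block was fetched whole (words_per_block words).
    """
    fetched = len(used_masks) * words_per_block
    if fetched == 0:
        return 0.0
    used = sum(bin(mask).count("1") for mask in used_masks)
    return 100.0 * used / fetched


# Three evicted 4-word blocks that used 2, 1, and 2 words respectively.
masks = [0b0011, 0b0001, 0b1010]
print(cache_utilization(masks, 4))  # 5 useful words out of 12 fetched
```

Everything not counted as useful here (7 of the 12 fetched words) is cache noise in the sense defined above.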


Utilization vs Block-Size
• Larger cache blocks
  – Increase bandwidth requirement
  – Reduce utilization
  – High spatial locality
• Smaller cache blocks
  – Reduce bandwidth requirement
  – Increase utilization
  – Lower spatial locality
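The trade-off above can be made concrete with a small sketch. It is a simplification under stated assumptions: each touched block is fetched whole exactly once (no capacity or conflict evictions), words are 4 bytes, and the address trace is synthetic. The function name `utilization_for_block_size` is hypothetical.

```python
def utilization_for_block_size(addresses, block_bytes, word_bytes=4):
    """Utilization if every touched block is fetched whole exactly once."""
    touched_words = {addr // word_bytes for addr in addresses}
    touched_blocks = {addr // block_bytes for addr in addresses}
    fetched_words = len(touched_blocks) * (block_bytes // word_bytes)
    return 100.0 * len(touched_words) / fetched_words


# Synthetic trace: two adjacent 4-byte words are touched every 64 bytes.
trace = [base + offset for base in range(0, 1024, 64) for offset in (0, 4)]

for size in (16, 32, 64):
    print(size, utilization_for_block_size(trace, size))
```

For this trace the number of fetched words, and therefore the bandwidth, doubles with each doubling of the block size while the number of useful words stays fixed, so utilization halves each time.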


Percent Cache Utilization
[Figure: percentage utilization (0 to 100) for a 16KB, 4-way set associative cache with 32 byte block size. Benchmarks: Bzip2, Gcc, Mcf, Parser, Vortex, Vpr, Ammp, Applu, Apsi, Art, Equake, Mgrid, Swim, Wupwise, plus Int Average, FP Average, and Average.]


Methods to improve utilization
• Rearrange data/code
• Dynamically adapt cache line size
• Sub-blocking


Advantages in Utilization Improvement
• Lower energy consumption: by avoiding wasting energy on useless words.
• Improved performance: by better utilizing the available cache space.
• Reduced memory traffic: by not fetching useless words.


The Goal of this Paper
• Improve utilization
  – Predict the to-be-referenced words
  – Avoid cache pollution by fetching only the predicted words


Contributions of this Paper
• Illustrate the high predictability of cache noise
• Propose an efficient cache noise predictor
• Show the potential benefits of cache noise prediction based fetching in terms of
  – Cache utilization
  – Cache power consumption
  – Bandwidth requirement
• Illustrate the benefits of cache noise prediction for prefetching
• Investigate cache noise prediction as an alternative to sub-blocking


Cache Noise Prediction
• Programs repeat their patterns of memory references.
• Predict cache noise based on the history of words accessed in the cache blocks.


Cache Noise Predictors
1) Phase Context Predictor (PCP)
    Records the word usage history of the most recently evicted cache block.
2) Memory Context Predictor (MCP)
    Assumes that data accessed from contiguous memory locations will be accessed in the same fashion.
3) Code Context Predictor (CCP)
    Assumes that instructions in a particular portion of the code will access data in the same fashion.


Cache Noise Predictors
• For code context predictors
  – Use the higher-order bits of the PC as the context
  – Store the context along with the cache block
• Add 2 bit-vectors for each cache block
  – One for identifying the valid words present
  – One for storing the access pattern
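The per-block state described above can be sketched in Python. This is an illustration under assumptions, not the paper's hardware: the 10-bit PC and 6-bit context widths are chosen only to match the example PC used in these slides, bit i of each vector stands for word i, and the names `code_context` and `CacheBlock` are hypothetical.

```python
from dataclasses import dataclass


def code_context(pc, pc_bits=10, context_bits=6):
    """Return the top `context_bits` bits of a `pc_bits`-wide program counter."""
    return pc >> (pc_bits - context_bits)


@dataclass
class CacheBlock:
    context: int         # code context of the instruction that caused the fetch
    valid_words: int     # bit i set: word i is present in the block
    used_words: int = 0  # bit i set: word i has been referenced (access pattern)

    def access(self, word):
        """Record an access; return False if the word was not fetched."""
        if not (self.valid_words >> word) & 1:
            return False
        self.used_words |= 1 << word
        return True


print(bin(code_context(0b1001100100)))  # top 6 bits of the 10-bit PC
block = CacheBlock(context=code_context(0b1001100100), valid_words=0b0011)
block.access(0)
print(bin(block.used_words))
```

When the block is evicted, `used_words` is the access pattern that would be written back to the predictor under the stored `context`.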


Code Context Predictor (CCP): worked example
• Say the PC of an instruction is 1001100100. Its code context is the higher-order bits, 100110 (entry X).

Initial predictor table:

Code Context    Last Word Usage History    Valid-Bit
X (100110)      1100                       1
Y (101001)      1001                       1
Z (xxxxxx)      xxxx                       0

• Miss due to PC 1001100100: context X has history 1100, so only the 1st and 2nd words are brought. The evicted cache block was brought by PC 101110 and used only the 1st word, so entry Z is filled:

Code Context    Last Word Usage History    Valid-Bit
X (100110)      1100                       1
Y (101001)      1001                       1
Z (101110)      1000                       1

• Miss due to PC 1011101100: context Z has history 1000, so only the 1st word is brought. The evicted block was brought by PC 101001 and used the 2nd and 4th words, so entry Y is updated:

Code Context    Last Word Usage History    Valid-Bit
X (100110)      1100                       1
Y (101001)      0101                       1
Z (101110)      1000                       1
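The table updates in this example can be replayed with a short Python sketch. It is a toy model under assumptions: the table is an unbounded dictionary rather than a fixed-size hardware table, histories are kept as strings whose leftmost character is the 1st word (matching the slide notation), and fetching the whole block when no valid entry exists is an assumed fallback. The class name `CodeContextPredictor` is hypothetical.

```python
class CodeContextPredictor:
    """Toy predictor table: code context -> last word usage history."""

    def __init__(self, words_per_block=4, pc_bits=10, context_bits=6):
        self.words = words_per_block
        self.shift = pc_bits - context_bits
        self.table = {}  # context -> history string, e.g. "1100"

    def context(self, pc):
        """Higher-order bits of the PC."""
        return pc >> self.shift

    def predict(self, pc):
        """Words to fetch on a miss; whole block if no entry (assumption)."""
        return self.table.get(self.context(pc), "1" * self.words)

    def update(self, context, used):
        """Store the usage history of a block evicted for `context`."""
        self.table[context] = used


ccp = CodeContextPredictor()
ccp.update(0b100110, "1100")  # entry X
ccp.update(0b101001, "1001")  # entry Y

first = ccp.predict(0b1001100100)   # miss under context X
ccp.update(0b101110, "1000")        # evicted block fills entry Z
second = ccp.predict(0b1011101100)  # miss under context Z
ccp.update(0b101001, "0101")        # evicted block updates entry Y
print(first, second, ccp.table[0b101001])
```

The two predictions correspond to "only 1st and 2nd words are brought" and "only 1st word brought" in the example.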


Predictability of CCP
[Figure: prediction percentage (0 to 100) for CCP(30bits), CCP(28bits), and CCP(26bits). Benchmarks: Bzip2, Gcc, Mcf, Parser, Vortex, Vpr, Ammp, Applu, Apsi, Art, Equake, Mgrid, Swim, Wupwise, plus Int Average, FP Average, and Average.]
Predictability = Correct predictions / Total misses. "No prediction" is almost 0%.


Improving the Predictability
• Miss Initiator Based History (MIBH)
    Word usage history based on the offset of the word that initiated the miss (Miss Initiator Word).
• ORing Previous Two Histories (OPTH)
    Bitwise ORing of the past two histories.
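Both refinements can be combined in one small sketch. The following Python is an illustration under assumptions rather than the paper's implementation: histories are integer bitmasks with bit i standing for word i, the table is an unbounded dictionary, a missing history falls back to fetching the whole block, and the miss initiator word is always included in the prediction. The class name `MibhOpthPredictor` is hypothetical.

```python
from collections import defaultdict, deque


class MibhOpthPredictor:
    """Histories keyed by (context, miss initiator offset); predict by ORing two."""

    def __init__(self, words_per_block=4):
        self.words = words_per_block
        # MIBH: a separate history per (context, miss initiator word offset).
        # OPTH: keep the last two usage histories for each key.
        self.histories = defaultdict(lambda: deque(maxlen=2))

    def update(self, context, miwo, used_mask):
        """Record the usage bitmask of an evicted block."""
        self.histories[(context, miwo)].append(used_mask)

    def predict(self, context, miwo):
        """Bitmask of words to fetch for a miss on word `miwo`."""
        past = self.histories.get((context, miwo))
        if not past:
            return (1 << self.words) - 1  # assumption: fetch the whole block
        mask = 1 << miwo                  # assumption: always fetch the missed word
        for history in past:
            mask |= history               # OPTH: bitwise OR of the past two
        return mask


p = MibhOpthPredictor()
p.update(0b100110, 0, 0b0011)
p.update(0b100110, 0, 0b0101)
print(bin(p.predict(0b100110, 0)))  # OR of the two stored histories
```

ORing makes the prediction more conservative: a word used in either of the last two lifetimes of a block is fetched, which trades some utilization for fewer misses on wrongly omitted words.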


Predictability of CCP
[Figure: prediction percentage (0 to 100) for CCP(30bits) – MIBH and CCP(28bits) – MIBH. Benchmarks: Bzip2, Gcc, Mcf, Parser, Vortex, Vpr, Ammp, Applu, Apsi, Art, Equake, Mgrid, Swim, Wupwise, plus Int Average, FP Average, and Average.]
The predictability of PCP and MCP was about 68% and 75% respectively using both MIBH and OPTH.


CCP Implementation
[Figure: CCP predictor table hardware. Labels: broadcast tag, read/write port, valid-bit, MIWO, context, words usage history, and comparators (=).]
MIWO: Miss Initiator Word Offset


Experimental Setup
• Applied noise prediction to the L1 data cache
• L1 Dcache: 16KB, 4-way associative, 32 byte block size
• Unified L2 cache: 512KB, 8-way associative, 64 byte block size
• L1 Icache: 16KB, direct mapped
• Issue queue: 96 Int / 64 FP


RESULTS
[Figure: four bar charts comparing BASE and CCP: Utilization, Bandwidth, IPC, and Miss Rate.]


Energy Savings
[Figure: Percentage Dynamic Energy Savings. Y-axis: Percentage Savings (-25 to 65). Benchmarks: Bzip2, Gcc, Mcf, Parser, Vortex, Vpr, Ammp, Applu, Apsi, Art, Equake, Mgrid, Swim, Wupwise, plus Int Average, FP Average, and Average.]


Sub-blocking
• Sub-blocking is used to
  – Reduce bandwidth requirement
• Limitations of sub-blocking
  – Increased miss rate

Can we use cache noise prediction as an alternative to sub-blocking?


Cache Noise Prediction vs Sub-blocking
[Figure: four bar charts comparing Sub-block and CCP: Utilization, Miss Rate, Bandwidth, and Energy Savings.]


Conclusion
• Cache noise is highly predictable.
• Proposed cache noise predictors.
• CCP achieves a 75% prediction rate, with 97% of predictions correct, using a small 16-entry table.
• Prediction has no impact on IPC and minimal impact (0.1%) on miss rate.
• Compared to sub-blocking, cache noise prediction based fetching improves
  – Miss rate by 97% and utilization by 10%


QUESTIONS
What are cache noise and cache utilization?
How can we use the code context predictor to get better performance?


Thank You
