
memcache@facebook

Marc Kwiatkowski memcache tech lead QCon 2010



How big is facebook?



400 million active users


[Chart: active users by year, 2008–2010, on a 0M–450M scale, reaching 400M active]


Objects ▪

More than 60 million status updates posted each day ▪ 694/s

More than 3 billion photos uploaded to the site each month ▪ 23/s

More than 5 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each week ▪ 8K/s

Average user has 130 friends on the site ▪

50 billion friend graph edges

Average user clicks the Like button on 9 pieces of content each month



Infrastructure ▪

Thousands of servers in several data centers in two regions ▪

Web servers

DB servers

Memcache Servers

Other services



The scale of memcache @ facebook ▪

Memcache Ops/s ▪

over 400M gets/sec

over 28M sets/sec

over 2T cached items

over 200Tbytes

Network IO ▪

peak rx 530Mpkts/s 60GB/s

peak tx 500Mpkts/s 120GB/s



A typical memcache server’s P.O.V. ▪

Network I/O ▪

rx 90Kpkts/s 9.7MB/s

tx 94Kpkts/s 19MB/s

Memcache OPS ▪

80K gets/s

2K sets/s

200M items


All rates are 1-day moving averages


Evolution of facebook’s architecture




• When Mark Zuckerberg and his roommates started Facebook in a Harvard dorm in 2004, they put everyone on one server
• Then, as Facebook grew, they could scale like a traditional site by just adding servers
• Even as the site grew beyond Harvard to Stanford, Columbia and thousands of other campuses, each was a separate network that could be served on an isolated set of servers
• But as people connected more between schools, the model changed--and the big change came when Facebook opened to everyone in Sept. 2006
• [For globe]: That led to people being connected everywhere around the world--not just on a single college campus.
• [For globe]: This visualization shows accepted friend requests animating from requesting friend to accepting friend




Scaling Facebook: Interconnected data

Bob, Brian, Felicia

•On Facebook, the data required to serve your home page or any other page is incredibly interconnected
•Your data can't sit on one server or cluster of servers because almost every piece of content on Facebook requires information about your network of friends
•And the average user has 130 friends
•As we scale, we have to be able to quickly pull data across all of our servers, wherever it's stored.




Memcache Rules of the Game ▪

GET object from memcache ▪

on miss, query database and SET object to memcache

Update database row and DELETE object in memcache

No derived objects in memcache ▪

Every memcache object maps to persisted data in database
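
A rough sketch of these rules in PHP (a sketch only, using the stock PECL Memcached and PDO clients and a hypothetical users table, not Facebook's in-house client or schema):

    <?php
    // Rules of the game, sketched with the stock PECL Memcached client.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);                    // illustrative pool
    $db = new PDO('mysql:host=dbhost;dbname=app', 'user', 'pass');

    function get_user($mc, $db, $id) {
        $key = "user:$id";
        $obj = $mc->get($key);                             // GET object from memcache
        if ($obj === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
            $stmt = $db->prepare('SELECT * FROM users WHERE id = ?');
            $stmt->execute(array($id));                    // on miss, query the database...
            $obj = $stmt->fetch(PDO::FETCH_ASSOC);
            $mc->set($key, $obj);                          // ...and SET the object to memcache
        }
        return $obj;
    }

    function update_user_name($mc, $db, $id, $name) {
        $stmt = $db->prepare('UPDATE users SET name = ? WHERE id = ?');
        $stmt->execute(array($name, $id));                 // update the database row...
        $mc->delete("user:$id");                           // ...and DELETE the object in memcache
    }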



Scaling memcache



Phatty Phatty Multiget



Phatty Phatty Multiget (notes) ▪

PHP runtime is single threaded and synchronous

To get good performance for data-parallel operations like retrieving info for all friends, it’s necessary to dispatch memcache get requests in parallel

Initially we just used polling I/O in PHP.

Later we switched to true asynchronous I/O in a PHP C extension

In both cases the result was reduced latency through parallelism.
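
A sketch of the data-parallel fetch with the stock PECL client's getMulti (Facebook's client did the parallel dispatch itself, first via polling I/O, later via the async C extension):

    <?php
    // Fetch data for all friends in parallel batches rather than
    // issuing one blocking get() per friend.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);        // illustrative pool

    $friend_ids = array(1001, 1002, 1003);     // hypothetical friend list
    $keys = array();
    foreach ($friend_ids as $id) {
        $keys[] = "user:$id";
    }

    // getMulti batches the keys; the client groups them by owning server
    // and gathers the replies, instead of N sequential round trips.
    $found = $mc->getMulti($keys);

    // Anything not returned is a miss and falls back to the database
    // per the rules of the game.
    $missing = array_diff($keys, array_keys($found));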



Pools and Threads

[Diagram builds: a PHP client fetching keys from a set of memcache servers; keys of different classes (sp:12345, sp:12346, sp:12347 and cs:12345, cs:12346, cs:12347) end up segregated into separate server pools]

Different objects have different sizes and access patterns. We began creating memcache pools to segregate different kinds of objects for better cache efficiency and memory utilization.


Pools and Threads (notes) ▪

Privacy objects are small but have poor hit rates

User-profiles are large but have good hit rates

We achieve better overall caching by segregating different classes of objects into different pools of memcache servers

Memcache was originally a classic single-threaded unix daemon ▪

This meant we needed to run 4 instances with 1/4 the RAM on each memcache server

4X the number of connections to each box

4X the meta-data overhead

We needed a multi-threaded service
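
A minimal sketch of the pool split, assuming two illustrative pools and the stock PECL client (the real pool names and assignments are internal):

    <?php
    // One Memcached handle per pool: small, low-hit-rate privacy objects
    // no longer evict large, high-hit-rate user profiles.
    $privacy_pool = new Memcached();
    $privacy_pool->addServers(array(
        array('privacy-mc-1', 11211),          // hypothetical hosts
        array('privacy-mc-2', 11211),
    ));

    $profile_pool = new Memcached();
    $profile_pool->addServers(array(
        array('profile-mc-1', 11211),
        array('profile-mc-2', 11211),
    ));

    $privacy = $privacy_pool->get('priv:1234');
    $profile = $profile_pool->get('profile:1234');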



Connections and Congestion ▪

[animation]



Connections and Congestion (notes) ▪

As we added web-servers, the connections to each memcache box grew. ▪

Each webserver ran 50-100 PHP processes

Each memcache box has 100K+ TCP connections

UDP could reduce the number of connections

As we added users and features, the number of keys per-multiget increased ▪

Popular people and groups

Platform and FBML

We began to see incast congestion on our ToR switches.
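
To see roughly where the 100K+ figure comes from (illustrative numbers, not the real fleet size): about 1,500 web servers times roughly 75 PHP processes each, with every process holding a persistent TCP connection to every memcache box, gives each box around 1,500 x 75 ≈ 110,000 connections. Connectionless UDP gets remove that per-process connection state.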



Serialization and Compression ▪

We noticed our short profiles weren’t so short ▪

1K PHP serialized object

fb-serialization

based on thrift wire format

3X faster

30% smaller

gzcompress serialized strings
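
fb-serialization itself is internal, but the shape of the change can be sketched with stock PHP functions (the 1 KB threshold is illustrative):

    <?php
    // Pack a profile for memcache: compact serialization, then compress
    // larger blobs before they go on the wire.
    function pack_for_cache($obj) {
        $blob = serialize($obj);           // stand-in for the Thrift-based fb-serialization
        if (strlen($blob) > 1024) {        // illustrative threshold
            return 'z:' . gzcompress($blob);
        }
        return 'p:' . $blob;
    }

    function unpack_from_cache($blob) {
        $tag  = substr($blob, 0, 2);
        $body = substr($blob, 2);
        return unserialize($tag === 'z:' ? gzuncompress($body) : $body);
    }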



Multiple Datacenters

[Diagram builds: the SC Web tier with SC Memcache and SC MySQL; an SF Web tier and SF Memcache tier are added; a Memcache Proxy sits alongside each web/memcache tier, with both tiers backed by SC MySQL]


Multiple Datacenters (notes) ▪

In the early days we had two data-centers ▪

The one we were about to turn off

The one we were about to turn on

Eventually we outgrew a single data-center ▪

Still only one master database tier

Rules of the game require that after an update we need to broadcast deletes to all tiers

The mcproxy era begins
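
The broadcast-delete step might look roughly like this (a sketch only; mcproxy's real interface is internal, and the proxy endpoints below are made up):

    <?php
    // After an update to the master DB, the delete must reach every
    // datacenter's memcache tier, not just the local one.
    $tier_proxies = array(
        'sc' => array('sc-mcproxy', 11211),    // hypothetical proxy endpoints
        'sf' => array('sf-mcproxy', 11211),
    );

    function broadcast_delete($tier_proxies, $key) {
        foreach ($tier_proxies as $tier => $endpoint) {
            $mc = new Memcached();
            $mc->addServer($endpoint[0], $endpoint[1]);
            $mc->delete($key);                 // the proxy routes the delete within its tier
        }
    }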



Multiple Regions

[Diagram builds: West Coast (SC Web and SF Web, each with its own Memcache tier and Memcache Proxy) and East Coast (VA Web, VA Memcache, Memcache Proxy); SC MySQL replicates to VA MySQL via MySQL replication]


Multiple Regions (notes)

Latency to east coast and European users was/is terrible.

So we deployed a slave DB tier in Ashburn VA ▪

Slave DB syncs with the master via the MySQL binlog

This introduces a race condition

mcproxy to the rescue again ▪

Add a memcache delete pragma to MySQL update and insert ops

Added a thread to the slave mysqld to dispatch deletes on the east coast via mcproxy
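
The delete pragma can be pictured as an annotation riding along with the replicated SQL; the comment format below is purely illustrative, not the real syntax:

    <?php
    // The web tier tags the statement with the memcache keys it dirties.
    // The tag replicates through the binlog; a thread on the slave mysqld
    // parses it and issues the deletes in its region via mcproxy.
    $sql = "UPDATE profile SET status = ? WHERE user_id = 1234 " .
           "/* mc-delete: user:1234, profile:1234 */";   // illustrative pragma format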



Replicated Keys

[Diagram builds: three PHP clients and three memcache servers; a single hot key that lives on one server is replaced by aliases key#0, key#1, key#3 spread across the servers]


Replicated Keys (notes) ▪

Viral groups and applications cause hot keys

More gets than a single memcache server can process

(Remember the rules of the game!)

That means more queries than a single DB server can process

That means that group or application is effectively down

Creating key aliases allows us to add server capacity. ▪

Hot keys are published to all web-servers

Each web-server picks an alias for gets ▪

get key:xxx => get key:xxx#N

Each web-server deletes all aliases
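
A sketch of the aliasing scheme, assuming the hot-key list is pushed to web servers as a simple map (the alias count of 4 is illustrative):

    <?php
    // key => number of aliases; published to every web server.
    $hot_keys = array('group:12345' => 4);

    function hot_get($mc, $hot_keys, $key) {
        if (isset($hot_keys[$key])) {
            $n = mt_rand(0, $hot_keys[$key] - 1);
            return $mc->get("$key#$n");        // each web server reads one alias
        }
        return $mc->get($key);
    }

    function hot_delete($mc, $hot_keys, $key) {
        $mc->delete($key);
        if (isset($hot_keys[$key])) {
            for ($i = 0; $i < $hot_keys[$key]; $i++) {
                $mc->delete("$key#$i");        // updates delete every alias
            }
        }
    }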



Memcache Rules of the Game ▪

New Rule ▪

If a key is hot, pick an alias and fetch that for reads

Delete all aliases on updates



Mirrored Pools

[Diagram: a general pool with wide fanout (Shard 1, Shard 2, Shard 3, ... Shard n) plus Specialized Replica 1 and Specialized Replica 2, each mirroring just Shard 1 and Shard 2]


Mirrored Pools (notes) ▪

As our memcache tier grows, the ratio of keys per packet decreases ▪

100 keys/1 server = 1 packet

100 keys / 100 servers = 100 packets

More network traffic

More memcache server kernel interrupts per request

Confirmed Info - critical account meta-data ▪

Have you confirmed your account?

Are you a minor?

Pulled out of the large user-profile objects, since we just need a few bytes of data for many users
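
A sketch of how a request could use the small mirrored pool for the few bytes it needs, keeping the wide general pool for full profiles (pool hosts and key names are illustrative):

    <?php
    // Tiny confirmed-info records live in a small replica pool, so a
    // many-key multiget touches a handful of servers, not the whole tier.
    $confirmed_pool = new Memcached();
    $confirmed_pool->addServer('confirmed-mc-1', 11211);   // hypothetical mirror

    $general_pool = new Memcached();
    $general_pool->addServer('general-mc-1', 11211);       // wide-fanout pool

    $ids = array(1001, 1002, 1003);
    $keys = array();
    foreach ($ids as $id) {
        $keys[] = "confirmed:$id";
    }
    $confirmed = $confirmed_pool->getMulti($keys);          // a few bytes per user

    $profile = $general_pool->get('profile:1001');          // full profile when needed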



Hot Misses ▪

[animation]



Hot Misses (notes) ▪

Remember the rules of the game ▪

update and delete

miss, query, and set

When the object is very, very popular, that query rate can kill a database server

We need flow control!



Memcache Rules of the Game ▪

For hot keys, on miss grab a mutex before issuing db query ▪

memcache-add a per-object mutex ▪

key:xxx => key:xxx#mutex

If add succeeds do the query

If add fails (because mutex already exists) back-off and try again

After the set, delete the mutex
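
A sketch of the mutex rule with the stock PECL client's atomic add; the back-off interval, retry count, and mutex TTL are illustrative:

    <?php
    // $query is a caller-supplied callable that runs the DB lookup.
    function get_with_mutex($mc, $query, $key) {
        for ($attempt = 0; $attempt < 10; $attempt++) {
            $obj = $mc->get($key);
            if ($obj !== false) {
                return $obj;
            }
            // add() is atomic: only one process wins the right to query the DB.
            if ($mc->add("$key#mutex", 1, 10)) {     // 10s TTL as a safety net
                $obj = $query();
                $mc->set($key, $obj);
                $mc->delete("$key#mutex");           // after the set, delete the mutex
                return $obj;
            }
            usleep(50000);                           // lost the race: back off and retry
        }
        return false;
    }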



Hot Deletes ▪

[hot groups graphics]



Hot Deletes (notes) ▪

We’re not out of the woods yet

Cache mutex doesn’t work for frequently updated objects ▪

like membership lists and walls for viral groups and applications.

Each process that acquires a mutex finds that the object has been deleted again ▪

...and again

...and again



Rules of the Game: Caching Intent ▪

Each memcache server is in the perfect position to detect and mitigate contention ▪

Record misses

Record deletes

Serve stale data

Serve lease-ids

Don’t allow updates without a valid lease id



Next Steps



Shaping Memcache Traffic ▪

mcproxy as router ▪

admission control

tunneling inter-datacenter traffic



Cache Hierarchies ▪

Warming up Cold Clusters

Proxies for Cacheless Clusters



Big Low Latency Clusters ▪

Bigger Clusters are Better

Low Latency is Better

L2.5

UDP

Proxy Facebook Architecture



Worse IS better ▪

Richard Gabriel’s famous essay contrasted ▪

ITS and Unix

LISP and C

MIT and New Jersey


http://www.jwz.org/doc/worse-is-better.html


Why Memcache Works ▪

Uniform, low latency with partial results is a better user experience

memcache provides a few robust primitives

key-to-server mapping

parallel I/O

flow-control

traffic shaping

that allow ad hoc solutions to a wide range of scaling issues


We started with simple, obvious improvements. As we grew, we deployed less obvious improvements... but they've remained pretty simple


(c) 2010 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
