can determine the actions of both completed transactions (which must be restored) and incomplete transactions (which must be undone). Typically, the system can achieve this by using a majority protocol (in which writes are applied to a majority of the copies, or quorum, and reads are likewise served from a quorum of the copies). In addition to the added costs incurred during normal execution, these measures can force a block during failures that involve network partitions, compromising availability, as the CAP theorem describes.3,4
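
To make the majority protocol concrete, the following Python sketch shows quorum reads and writes over in-memory copies. It assumes the usual requirement that the write quorum W and read quorum R satisfy W + R > N for N copies; the Replica and QuorumStore names are illustrative rather than taken from any real system, and a production implementation would contact replicas in parallel and tolerate failures.

```python
# Minimal sketch of a majority (quorum) protocol. Assumes N replicas with
# write quorum W and read quorum R chosen so that W + R > N, which forces
# every read quorum to overlap every write quorum. Names are illustrative.

class Replica:
    """A single copy of one object, tagged with a version number."""
    def __init__(self):
        self.version = 0
        self.value = None

    def write(self, version, value):
        # Keep only the newest version seen so far.
        if version > self.version:
            self.version, self.value = version, value

    def read(self):
        return self.version, self.value


class QuorumStore:
    def __init__(self, replicas, w, r):
        assert w + r > len(replicas), "quorums must intersect"
        self.replicas, self.w, self.r = replicas, w, r
        self.next_version = 1

    def write(self, value):
        # Apply the write to W copies before reporting success.
        version, self.next_version = self.next_version, self.next_version + 1
        for replica in self.replicas[:self.w]:
            replica.write(version, value)

    def read(self):
        # Ask R copies and return the value with the highest version;
        # quorum intersection guarantees it reflects the latest write.
        responses = [replica.read() for replica in self.replicas[:self.r]]
        return max(responses, key=lambda vv: vv[0])[1]


store = QuorumStore([Replica() for _ in range(5)], w=3, r=3)
store.write("x = 1")
print(store.read())   # prints "x = 1", even though two replicas never saw the write
```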

Both the database and distributed systems literature offer many alternative proposals for the semantics of concurrent operations. Although the database notions of consistency apply to a distributed setting (even though they can be more expensive to enforce and might introduce availability tradeoffs), they were originally designed to allow interleaving of programs against a centralized database. Thus, the goal was to provide a simple programming abstraction to cope with concurrent executions, rather than to address the challenges of a distributed setting. These differences in setting have influenced how both communities have approached the problem, but the following two differences in perspective are worth emphasizing:

• Unit of consistency. The database perspective, as exemplified by the notion of ACID transactions, focuses on changes to the entire database, spanning multiple objects (typically, records in a relational database). The distributed systems literature generally focuses on changes to a single object.5

• Client- versus data-centric semantics. The database community’s approach to defining semantics is usually through formalizing the effect of concurrent accesses on the database; again, the definition of ACID transactions exemplifies this approach—the effect of interleaved execution on the database must be equivalent to that of some serial execution of the same transactions. But the distributed systems community often takes a client-centric approach, defining consistency levels in terms of what a client that issues reads and writes (potentially against a distributed data store) sees in the presence of other concurrently executing clients.

The notions of consistency proposed in the distributed systems literature focus on a single object and are client-centric definitions. Strong consistency means that once a write request returns successfully to the client, all subsequent reads of the object—by any client—see the effect of the write, regardless of replication, failures, partitions, and so on. Observe that strong consistency does not ensure ACID transactions. For example, client A could read object X once, and then read it again later and see the effects of another client’s intervening write; this interleaving is not equivalent to a serial execution of the two clients’ programs. That said, implementing ACID transactions ensures strong consistency.

The term weak consistency describes any alternative that does not guarantee strong consistency for changes to individual objects. A notable instance of weak consistency is eventual consistency, which is supported by Amazon’s Dynamo system,6 among others.1,5 Intuitively, if an object has multiple copies at different servers, updates are first applied to the local copy and then propagated out; the guarantee offered is that every update is eventually applied to all copies. However, there is no assurance of the order in which the system will apply the updates—in fact, it might apply the updates in different orders on different copies. Unless the nature of the updates makes the ordering immaterial—for example, commutative and associative updates—two copies of the same object could differ in ways that are hard for a programmer to identify.
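
This sensitivity to update ordering can be seen in a few lines of code. The sketch below is a toy illustration in Python, not the representation Dynamo or any particular system uses: two replicas of a counter receive the same increments in different orders and still converge, while two replicas that apply plain overwrites end up with different values.

```python
# Toy illustration of why eventual consistency is easier to live with when
# updates commute. Two replicas receive the same updates in different orders.

class CounterReplica:
    """Increments commute and associate, so the order of application is immaterial."""
    def __init__(self):
        self.total = 0

    def apply(self, delta):
        self.total += delta


class RegisterReplica:
    """Blind overwrites do not commute, so the order of application matters."""
    def __init__(self):
        self.value = None

    def apply(self, new_value):
        self.value = new_value


updates = [1, 2, 3]

a, b = CounterReplica(), CounterReplica()
for u in updates:
    a.apply(u)
for u in reversed(updates):          # same updates, opposite order
    b.apply(u)
assert a.total == b.total == 6       # the copies converge

x, y = RegisterReplica(), RegisterReplica()
for u in updates:
    x.apply(u)
for u in reversed(updates):
    y.apply(u)
assert (x.value, y.value) == (3, 1)  # the copies have silently diverged
```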

Researchers have proposed several versions of weak consistency,5 including

• read-your-writes—a client always sees the effect of its own writes,
• monotonic read—a client that has read a particular value of an object will not see previous values on subsequent accesses, and
• monotonic write—all writes a client issues are applied serially in the issued order.

Each of these versions can help strengthen eventual consistency in terms of the guarantees offered to a client.
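
As an illustration of the first two guarantees, the Python sketch below layers read-your-writes and monotonic reads on top of an eventually consistent store by having each client session remember the newest version it has written or read and reject answers from staler copies. The VersionedReplica interface and the explicit routing of the write are hypothetical simplifications, not the mechanism of any particular system.

```python
# Sketch of client-side session guarantees over an eventually consistent
# store. VersionedReplica is a hypothetical, in-memory stand-in for a real
# replica; production systems carry version or vector-clock metadata instead.

class VersionedReplica:
    def __init__(self):
        self.data = {}        # key -> (version, value)
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.data[key] = (self.clock, value)
        return self.clock     # version assigned to this write

    def read(self, key):
        return self.data.get(key, (0, None))


class Session:
    """Tracks the newest version this client has seen or written, giving it
    read-your-writes and monotonic reads even when some replicas lag."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = 0

    def write(self, replica, key, value):
        # Routing is simulated by passing the target replica explicitly.
        version = replica.write(key, value)
        self.min_version = max(self.min_version, version)

    def read(self, key):
        # Accept only a copy at least as new as anything this session has
        # already observed; remember it so later reads never go backward.
        for replica in self.replicas:
            version, value = replica.read(key)
            if version >= self.min_version:
                self.min_version = version
                return value
        raise RuntimeError("no sufficiently fresh replica reachable")


fresh, stale = VersionedReplica(), VersionedReplica()
session = Session(replicas=[stale, fresh])   # reads try the stale copy first
session.write(fresh, "profile", "v2")        # but the write landed on the fresh copy
print(session.read("profile"))               # prints "v2": the stale copy is skipped
```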

CLOUD DATA MANAGEMENT

Web applications, a major motivator for the development of cloud systems, have grown rapidly in popularity and must be able to scale on demand. Systems must serve requests with low latency (tens of milliseconds) to users worldwide, sustain high throughput (tens of thousands of reads and writes per second), and remain highly available, all at minimal ongoing operational costs. Fortunately, full transactional support typically is not required, and separate systems perform complex analysis tasks—for example, map-reduce platforms such as Hadoop (http://hadoop.apache.org). For many applications, requests are quite simple compared to traditional data management settings—the data

