Circular buffer


A circular buffer, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams.

Uses

An example that could possibly use an overwriting circular buffer is multimedia. If the buffer is used as the bounded buffer in the producer-consumer problem, then it is probably desirable for the producer (e.g., an audio generator) to overwrite old data if the consumer (e.g., the sound card) is momentarily unable to keep up. Another example is the digital waveguide synthesis method, which uses circular buffers to efficiently simulate the sound of vibrating strings or wind instruments.

A ring showing, conceptually, a circular buffer. This visualizes the idea that the buffer has no real end, so reads and writes can loop around it. However, since memory is never physically created as a ring, a linear representation is generally used, as is done below.

The "prized" attribute of a circular buffer is that it does not need to have its elements shuffled around when one is consumed. (If a non-circular buffer were used, it would be necessary to shift all elements when one is consumed.) In other words, the circular buffer is well suited as a FIFO buffer, while a standard, non-circular buffer is well suited as a LIFO buffer. Circular buffering therefore makes a good implementation strategy for a queue with a fixed maximum size: all queue operations are constant time. However, expanding a circular buffer requires shifting memory, which is comparatively costly. For arbitrarily expanding queues, a linked-list approach may be preferred instead.

How it works

A circular buffer starts empty and has some predefined length. For example, this is a 7-element buffer:

Assume that a 1 is written into the middle of the buffer (the exact starting location does not matter in a circular buffer):

Then assume that two more elements are added — 2 & 3 — which get appended after the 1:

If two elements are then removed from the buffer, the oldest values inside the buffer are removed. The two elements removed, in this case, are 1 & 2 leaving the buffer with just a 3:

If the buffer has 7 elements then it is completely full:


A consequence of the circular buffer's design is that when it is full and a subsequent write is performed, it starts overwriting the oldest data. In this case, two more elements — A & B — are added and they overwrite the 3 & 4:

Alternatively, the routines that manage the buffer could prevent overwriting the data and return an error or raise an exception. Whether or not data is overwritten is up to the semantics of the buffer routines or the application using the circular buffer. Finally, if two elements are now removed, what is returned is not 3 & 4 but 5 & 6, because A & B overwrote the 3 & the 4, leaving the buffer with:

Circular buffer mechanics

What is not shown in the example above is the mechanics of how the circular buffer is managed.

Start / End Pointers

Generally, a circular buffer requires three pointers:
• one to the actual buffer in memory
• one to point to the start of valid data
• one to point to the end of valid data

Alternatively, a fixed-length buffer with two integers to keep track of indices can be used in languages that do not have pointers.

Taking a couple of examples from above (while there are numerous ways to label the pointers and the exact semantics can vary, this is one way to do it), this image shows a partially full buffer:

This image shows a full buffer with two elements having been overwritten:

What to note about the second one is that after each element is overwritten, the start pointer is incremented as well.
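The overwrite-and-advance behaviour just described can be sketched in C. This is a minimal illustration with invented names (ring_buf, ring_put, ring_get), not one of the article's own examples; it tracks a fill count alongside the two indices so that a write into a full buffer overwrites the oldest element and advances the start index along with the end index:

```c
#include <stddef.h>

#define RING_SIZE 7 /* matches the 7-element example above */

typedef struct {
    int    buf[RING_SIZE]; /* the actual buffer in memory */
    size_t start;          /* index of the oldest valid element */
    size_t end;            /* index one past the newest valid element */
    size_t count;          /* number of valid elements */
} ring_buf;

/* Write one element; overwrite the oldest when full. */
void ring_put(ring_buf *rb, int value)
{
    rb->buf[rb->end] = value;
    rb->end = (rb->end + 1) % RING_SIZE;
    if (rb->count == RING_SIZE)
        rb->start = (rb->start + 1) % RING_SIZE; /* overwrote oldest: advance start too */
    else
        rb->count++;
}

/* Read (and remove) the oldest element; returns 0 on success, -1 if empty. */
int ring_get(ring_buf *rb, int *value)
{
    if (rb->count == 0)
        return -1;
    *value = rb->buf[rb->start];
    rb->start = (rb->start + 1) % RING_SIZE;
    rb->count--;
    return 0;
}
```

After nine writes into the 7-slot buffer, the first two values have been overwritten and the oldest remaining value is the third one written.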




Difficulties

Full / Empty Buffer Distinction

A small disadvantage of relying on pointers or relative indices to mark the start and end of the data is that when the buffer is entirely full, both pointers point to the same element:

This is exactly the same situation as when the buffer is empty:

To solve this confusion there are a number of solutions:
• Always keep one slot open.
• Use a fill count to distinguish the two cases.
• Use read and write counts from which to derive the fill count.
• Use absolute indices.

Always Keep One Slot Open

This simple solution always keeps one slot unallocated: a full buffer holds at most size - 1 elements. If both pointers are pointing at the same location, the buffer is empty. If the end (write) pointer, plus one, equals the start (read) pointer, then the buffer is full.

The advantages are:
• Very simple and robust.
• You need only the two pointers.

The disadvantages are:
• You can never use the entire buffer.
• You might only be able to access one element at a time, since you won't easily know how many elements are next to each other in memory.

An example implementation in C (keep one slot open):

#include <stdio.h>
#include <stdlib.h>

/*!
 * Circular Buffer Example (keep one slot open)
 * Compile: gcc cbuf.c -o cbuf
 */

#define BUFFER_SIZE  10 /* example buffer size */
#define NUM_OF_ELEMS 9  /* BUFFER_SIZE - 1: one slot is kept open */

/* Circular buffer types */
typedef unsigned char INT8U;
typedef INT8U KeyType;

typedef struct
{
    INT8U writePointer; /* write pointer */
    INT8U readPointer;  /* read pointer */
    INT8U size;         /* size of circular buffer */
    KeyType keys[];     /* elements of circular buffer */
} CircularBuffer;

/* Initialize the circular buffer */
CircularBuffer* CircularBufferInit(CircularBuffer** pQue, int size)
{
    int sz = size*sizeof(KeyType) + sizeof(CircularBuffer);
    *pQue = (CircularBuffer*) malloc(sz);
    if (*pQue)
    {
        printf("Init CircularBuffer: keys[%d] (%d)\n", size, sz);
        (*pQue)->size = size;
        (*pQue)->writePointer = 0;
        (*pQue)->readPointer = 0;
    }
    return *pQue;
}

static int CircularBufferIsFull(CircularBuffer* que)
{
    return (((que->writePointer + 1) % que->size) == que->readPointer);
}

static int CircularBufferIsEmpty(CircularBuffer* que)
{
    return (que->readPointer == que->writePointer);
}

static int CircularBufferEnque(CircularBuffer* que, KeyType k)
{
    int isFull = CircularBufferIsFull(que);
    if (!isFull)
    {
        que->keys[que->writePointer] = k;
        que->writePointer++;
        que->writePointer %= que->size;
    }
    return isFull;
}

static int CircularBufferDeque(CircularBuffer* que, KeyType* pK)
{
    int isEmpty = CircularBufferIsEmpty(que);
    if (!isEmpty)
    {
        *pK = que->keys[que->readPointer];
        que->readPointer++;
        que->readPointer %= que->size;
    }
    return isEmpty;
}

static int CircularBufferPrint(CircularBuffer* que)
{
    int i = 0;
    int isEmpty = CircularBufferIsEmpty(que);
    int isFull = CircularBufferIsFull(que);

    printf("\n==Q: w:%d r:%d f:%d e:%d\n",
           que->writePointer, que->readPointer, isFull, isEmpty);
    for (i = 0; i < que->size; i++)
    {
        printf("%d ", que->keys[i]);
    }
    printf("\n");
    return isEmpty;
}

int main(int argc, char *argv[])
{
    CircularBuffer* que;
    KeyType a = 101;
    int isEmpty, i;

    CircularBufferInit(&que, BUFFER_SIZE);
    CircularBufferPrint(que);

    for (i = 1; i <= 3; i++)
    {
        a = 10*i;
        printf("\n\n===\nTest: Insert %d-%d\n", a, a + NUM_OF_ELEMS - 1);
        while (!CircularBufferEnque(que, a++));
        //CircularBufferPrint(que);

        printf("\nRX%d:", i);
        a = 0;
        isEmpty = CircularBufferDeque(que, &a);
        while (!isEmpty)
        {
            printf("%02d ", a);
            a = 0;
            isEmpty = CircularBufferDeque(que, &a);
        }
        //CircularBufferPrint(que);
    }

    free(que);
    return 0;
}

An example implementation in C that uses all slots (but is dangerous: an attempt to insert an item into a full queue will report success but will, in fact, overwrite the queue):

#include <stdio.h>
#include <stdlib.h>

/*!
 * Circular Buffer Example (use all slots)
 * Compile: gcc cbuf.c -o cbuf
 */

#define BUFFER_SIZE 10 /* example buffer size */

/* Circular buffer types */
typedef unsigned char INT8U;
typedef INT8U KeyType;

typedef struct
{
    INT8U writePointer; /* write pointer */
    INT8U readPointer;  /* read pointer */
    INT8U size;         /* size of circular buffer */
    KeyType keys[];     /* elements of circular buffer */
} CircularBuffer;

/* Initialize the circular buffer */
CircularBuffer* CircularBufferInit(CircularBuffer** pQue, int size)
{
    int sz = size*sizeof(KeyType) + sizeof(CircularBuffer);
    *pQue = (CircularBuffer*) malloc(sz);
    if (*pQue)
    {
        printf("Init CircularBuffer: keys[%d] (%d)\n", size, sz);
        (*pQue)->size = size;
        (*pQue)->writePointer = 0;
        (*pQue)->readPointer = 0;
    }
    return *pQue;
}

static int CircularBufferIsFull(CircularBuffer* que)
{
    return ((que->writePointer + 1) % que->size == que->readPointer);
}

static int CircularBufferIsEmpty(CircularBuffer* que)
{
    return (que->readPointer == que->writePointer);
}

static int CircularBufferEnque(CircularBuffer* que, KeyType k)
{
    int isFull = CircularBufferIsFull(que);
    que->keys[que->writePointer] = k;
    que->writePointer++;
    que->writePointer %= que->size;
    return isFull;
}

static int CircularBufferDeque(CircularBuffer* que, KeyType* pK)
{
    int isEmpty = CircularBufferIsEmpty(que);
    *pK = que->keys[que->readPointer];
    que->readPointer++;
    que->readPointer %= que->size;
    return isEmpty;
}

int main(int argc, char *argv[])
{
    CircularBuffer* que;
    KeyType a = 0;
    int isEmpty;

    CircularBufferInit(&que, BUFFER_SIZE);

    while (!CircularBufferEnque(que, a++));

    do {
        isEmpty = CircularBufferDeque(que, &a);
        printf("%02d ", a);
    } while (!isEmpty);
    printf("\n");

    free(que);
    return 0;
}

Use a Fill Count

The second simplest solution is to use a fill count. The fill count is implemented as an additional variable that holds the number of readable items in the buffer. This variable must be incremented when the write (end) pointer is moved and decremented when the read (start) pointer is moved. When both pointers point at the same location, the fill count distinguishes whether the buffer is empty or full.

• Note: When using semaphores in a producer-consumer model, the semaphores act as a fill count.

The advantages are:
• Simple.
• Needs only one additional variable.

The disadvantage is:
• You need to keep track of a third variable. This can require complex logic, especially if you are working with different threads.

Alternatively, you can replace the second pointer with the fill count and generate the second pointer as required, by incrementing the first pointer by the fill count, modulo buffer size.

The advantages are:
• Simple.
• No additional variables.

The disadvantage is:
• Additional overhead when generating the write pointer.

Read / Write Counts

Another solution is to keep counts of the number of items written to and read from the circular buffer. Both counts are stored in integer variables with numerical limits larger than the number of items that can be stored, and are allowed to wrap freely. The difference (write_count - read_count), computed with unsigned arithmetic, always yields the number of items placed in the buffer and not yet retrieved. This can indicate whether the buffer is empty, partially full, completely full (without wasting a storage location) or in a state of overrun.

The advantage is:
• The source and sink of data can implement independent policies for dealing with a full buffer and overrun, while adhering to the rule that only the source of data modifies the write count and only the sink of data modifies the read count. This can result in elegant and robust circular buffer implementations, even in multi-threaded environments.

The disadvantage is:
• You need two additional variables.
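The fill-count variant described above, replacing the second pointer with a count and generating the write index on demand, can be sketched as follows. This is a minimal single-threaded illustration with invented names, not code from the article's examples:

```c
#include <stddef.h>

#define FC_SIZE 8

typedef struct {
    int    buf[FC_SIZE];
    size_t read; /* index of the oldest element */
    size_t fill; /* number of stored elements */
} fc_buf;

/* The write index is generated on demand from the read index and fill count. */
static size_t fc_write_index(const fc_buf *b)
{
    return (b->read + b->fill) % FC_SIZE;
}

/* Returns 0 on success, -1 if the buffer is full. */
int fc_put(fc_buf *b, int value)
{
    if (b->fill == FC_SIZE)
        return -1;
    b->buf[fc_write_index(b)] = value;
    b->fill++;
    return 0;
}

/* Returns 0 on success, -1 if the buffer is empty. */
int fc_get(fc_buf *b, int *value)
{
    if (b->fill == 0)
        return -1;
    *value = b->buf[b->read];
    b->read = (b->read + 1) % FC_SIZE;
    b->fill--;
    return 0;
}
```

Note that, unlike the keep-one-slot-open scheme, all FC_SIZE slots are usable, at the cost of the extra modulo work when deriving the write index.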


Absolute indices

If indices are used instead of pointers, the indices can store read/write counts instead of offsets from the start of the buffer. This is similar to the previous solution, except that there are no separate variables, and relative indices are obtained on the fly by division modulo the buffer's length.

The advantage is:
• No extra variables are needed.

The disadvantages are:
• Every access needs an additional modulo operation.
• If counter wrap is possible, complex logic can be needed if the buffer's length is not a divisor of the counter's capacity.

On binary computers, both of these disadvantages disappear if the buffer's length is a power of two, at the cost of a constraint on possible buffer lengths.
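On a binary computer with a power-of-two buffer length, the absolute-index scheme reduces to a mask and two free-running counters, as the following single-threaded sketch illustrates (names are invented for this sketch):

```c
#include <stdint.h>

#define AI_SIZE 8u /* must be a power of two */
#define AI_MASK (AI_SIZE - 1u)

typedef struct {
    int      buf[AI_SIZE];
    uint32_t read;  /* free-running count of items read */
    uint32_t write; /* free-running count of items written */
} ai_buf;

/* Because AI_SIZE divides 2^32 exactly, the counters may wrap freely:
 * (write - read) is always the fill count, and (counter & AI_MASK)
 * is always the correct slot, even across the wrap. */
static uint32_t ai_fill(const ai_buf *b) { return b->write - b->read; }

/* Returns 0 on success, -1 if the buffer is full. */
int ai_put(ai_buf *b, int value)
{
    if (ai_fill(b) == AI_SIZE)
        return -1;
    b->buf[b->write & AI_MASK] = value;
    b->write++;
    return 0;
}

/* Returns 0 on success, -1 if the buffer is empty. */
int ai_get(ai_buf *b, int *value)
{
    if (ai_fill(b) == 0)
        return -1;
    *value = b->buf[b->read & AI_MASK];
    b->read++;
    return 0;
}
```

No modulo instruction and no extra variables are needed; the two counters alone distinguish empty (equal counters) from full (difference equals the buffer length).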

Multiple Read Pointers

A little more complex is the use of multiple read pointers on the same circular buffer. This is useful if you have n threads reading from the same buffer, but only one thread writing to it.
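One possible shape for such a structure is sketched below (a single-threaded illustration with invented names; a real n-reader buffer would additionally need synchronization between the threads). Each reader keeps its own read index, and the writer treats the buffer as full as soon as advancing would overtake the slowest reader:

```c
#include <stddef.h>

#define MR_SIZE    8
#define MR_READERS 2

typedef struct {
    int    buf[MR_SIZE];
    size_t write;            /* single write index */
    size_t read[MR_READERS]; /* one read index per reader */
} mr_buf;

/* One slot is kept open; the buffer is "full" if the next write
 * would overtake the slowest reader. */
static int mr_full(const mr_buf *b)
{
    size_t next = (b->write + 1) % MR_SIZE;
    for (int r = 0; r < MR_READERS; r++)
        if (next == b->read[r])
            return 1;
    return 0;
}

/* Returns 0 on success, -1 if the slowest reader would be overtaken. */
int mr_put(mr_buf *b, int value)
{
    if (mr_full(b))
        return -1;
    b->buf[b->write] = value;
    b->write = (b->write + 1) % MR_SIZE;
    return 0;
}

/* Returns 0 on success, -1 if this reader has consumed everything. */
int mr_get(mr_buf *b, int reader, int *value)
{
    if (b->read[reader] == b->write)
        return -1;
    *value = b->buf[b->read[reader]];
    b->read[reader] = (b->read[reader] + 1) % MR_SIZE;
    return 0;
}
```

With this layout every reader sees every element, and an element's slot can only be reused once all readers have moved past it.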

Chunked Buffer

Much more complex is keeping different chunks of data in the same circular buffer. The writer not only writes elements to the buffer, it also assigns these elements to chunks. The reader should not only be able to read from the buffer, it should also be informed about the chunk borders.

Example: the writer reads data from small files and writes it into the same circular buffer. The reader reads the data, but needs to know when a file starts at a given position, and which one.

Optimization

A circular-buffer implementation may be optimized by mapping the underlying buffer to two contiguous regions of virtual memory. (Naturally, the underlying buffer's length must then equal some multiple of the system's page size.) Reading from and writing to the circular buffer may then be carried out with greater efficiency by means of direct memory access; those accesses which fall beyond the end of the first virtual-memory region will automatically wrap around to the beginning of the underlying buffer. When the read offset is advanced into the second virtual-memory region, both offsets (read and write) are decremented by the length of the underlying buffer.

Exemplary POSIX Implementation

#define _GNU_SOURCE /* for MAP_ANONYMOUS */
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>

#define report_exceptional_condition() abort ()

struct ring_buffer
{
  void *address;

  unsigned long count_bytes;
  unsigned long write_offset_bytes;
  unsigned long read_offset_bytes;
};

/* Warning: order should be at least 12 for Linux. */
void
ring_buffer_create (struct ring_buffer *buffer, unsigned long order)
{
  char path[] = "/dev/shm/ring-buffer-XXXXXX";
  int file_descriptor;
  void *address;
  int status;

  file_descriptor = mkstemp (path);
  if (file_descriptor < 0)
    report_exceptional_condition ();

  status = unlink (path);
  if (status)
    report_exceptional_condition ();

  buffer->count_bytes = 1UL << order;
  buffer->write_offset_bytes = 0;
  buffer->read_offset_bytes = 0;

  status = ftruncate (file_descriptor, buffer->count_bytes);
  if (status)
    report_exceptional_condition ();

  buffer->address = mmap (NULL, buffer->count_bytes << 1, PROT_NONE,
                          MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (buffer->address == MAP_FAILED)
    report_exceptional_condition ();

  address = mmap (buffer->address, buffer->count_bytes,
                  PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED,
                  file_descriptor, 0);
  if (address != buffer->address)
    report_exceptional_condition ();

  address = mmap ((char *) buffer->address + buffer->count_bytes,
                  buffer->count_bytes, PROT_READ | PROT_WRITE,
                  MAP_FIXED | MAP_SHARED, file_descriptor, 0);
  if (address != (void *) ((char *) buffer->address + buffer->count_bytes))
    report_exceptional_condition ();

  status = close (file_descriptor);
  if (status)
    report_exceptional_condition ();
}

void
ring_buffer_free (struct ring_buffer *buffer)
{
  int status;

  status = munmap (buffer->address, buffer->count_bytes << 1);
  if (status)
    report_exceptional_condition ();
}

void *
ring_buffer_write_address (struct ring_buffer *buffer)
{
  /* Cast to char * so the pointer arithmetic is well defined. */
  return (char *) buffer->address + buffer->write_offset_bytes;
}

void
ring_buffer_write_advance (struct ring_buffer *buffer,
                           unsigned long count_bytes)
{
  buffer->write_offset_bytes += count_bytes;
}

void *
ring_buffer_read_address (struct ring_buffer *buffer)
{
  return (char *) buffer->address + buffer->read_offset_bytes;
}

void
ring_buffer_read_advance (struct ring_buffer *buffer,
                          unsigned long count_bytes)
{
  buffer->read_offset_bytes += count_bytes;

  if (buffer->read_offset_bytes >= buffer->count_bytes)
    {
      buffer->read_offset_bytes -= buffer->count_bytes;
      buffer->write_offset_bytes -= buffer->count_bytes;
    }
}

unsigned long
ring_buffer_count_bytes (struct ring_buffer *buffer)
{
  return buffer->write_offset_bytes - buffer->read_offset_bytes;
}

unsigned long
ring_buffer_count_free_bytes (struct ring_buffer *buffer)
{
  return buffer->count_bytes - ring_buffer_count_bytes (buffer);
}

void
ring_buffer_clear (struct ring_buffer *buffer)
{
  buffer->write_offset_bytes = 0;
  buffer->read_offset_bytes = 0;
}

//--------------------------------------------------------------------------
// template class Queue
//--------------------------------------------------------------------------
template <class T> class Queue
{
    T *qbuf;   // buffer data
    int qsize; // size of buffer (a power of two)
    int head;  // index of begin of data
    int tail;  // index of end of data

    inline void Free()
    {
        if (qbuf != 0)
        {
            delete []qbuf;
            qbuf= 0;
        }
        qsize= 1;
        head= tail= 0;
    }

public:
    Queue()
    {
        qsize= 32;
        qbuf= new T[qsize];
        head= tail= 0;
    }

    Queue(const int size): qbuf(0), qsize(1), head(0), tail(0)
    {
        if ((size <= 0) || (size & (size - 1)))
        {
            throw "Value is not power of two";
        }
        qsize= size;
        qbuf= new T[qsize];
        head= tail= 0;
    }

    ~Queue()
    {
        Free();
    }

    void Enqueue(const T &p)
    {
        if (IsFull())
        {
            throw "Queue is full";
        }
        qbuf[tail]= p;
        tail= (tail + 1) & (qsize - 1);
    }

    // Retrieve an item from the queue
    void Dequeue(T &p)
    {
        if (IsEmpty())
        {
            throw "Queue is empty";
        }
        p= qbuf[head];
        head= (head + 1) & (qsize - 1);
    }

    // Get the i-th element without removing it
    void Peek(const int i, T &p) const
    {
        int j= 0;
        int k= head;
        while (k != tail)
        {
            if (j == i)
                break;
            j++;
            k= (k + 1) & (qsize - 1);
        }
        if (k == tail)
            throw "Out of range";
        p= qbuf[k];
    }

    // Size must be a power of two: 1, 2, 4, 8, 16, 32, 64, ...
    void Resize(const int size)
    {
        if ((size <= 0) || (size & (size - 1)))
        {
            throw "Value is not power of two";
        }
        Free();
        qsize= size;
        qbuf= new T[qsize];
        head= tail= 0;
    }

    inline void Clear(void) { head= tail= 0; }

    inline int GetCapacity(void) const { return (qsize - 1); }

    // Count of stored elements
    inline int GetBusy(void) const
    { return ((head > tail) ? qsize : 0) + tail - head; }

    // true if the queue is empty
    inline bool IsEmpty(void) const { return (head == tail); }

    // true if the queue is full
    inline bool IsFull(void) const
    { return ( ((tail + 1) & (qsize - 1)) == head ); }
};
//--------------------------------------------------------------------------
// Use:
Queue <int> Q;
Q.Enqueue(5);
Q.Enqueue(100);
Q.Enqueue(13);
int len= Q.GetBusy();
int val;
Q.Dequeue(val);


External links
• Boost: Templated Circular Buffer Container [1]

References
[1] http://www.boost.org/doc/libs/1_39_0/libs/circular_buffer/doc/circular_buffer.html

Non-blocking algorithm

In computer science, a non-blocking algorithm ensures that threads competing for a shared resource do not have their execution indefinitely postponed by mutual exclusion. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress.

Literature up to the turn of the 21st century used "non-blocking" synonymously with lock-free. Since 2003,[1] however, the term has been weakened to exclude only progress-blocking interactions with a preemptive scheduler. In modern usage, therefore, an algorithm is non-blocking if the suspension of one or more threads will not stop the potential progress of the remaining threads. Such algorithms are designed to avoid requiring a critical section; often, they allow multiple processes to make progress on a problem without ever blocking each other. For some operations, these algorithms provide an alternative to locking mechanisms.

Motivation

The traditional approach to multi-threaded programming is to use locks to synchronize access to shared resources. Synchronization primitives such as mutexes, semaphores, and critical sections are all mechanisms by which a programmer can ensure that certain sections of code do not execute concurrently if doing so would corrupt shared memory structures. If one thread attempts to acquire a lock that is already held by another thread, the thread will block until the lock is free.

Blocking a thread is undesirable for many reasons. An obvious reason is that while the thread is blocked, it cannot accomplish anything. If the blocked thread is performing a high-priority or real-time task, it is highly undesirable to halt its progress. Other problems are less obvious. Certain interactions between locks can lead to error conditions such as deadlock, livelock, and priority inversion. Using locks also involves a trade-off between coarse-grained locking, which can significantly reduce opportunities for parallelism, and fine-grained locking, which requires more careful design, increases locking overhead and is more prone to bugs.

Non-blocking algorithms are also safe for use in interrupt handlers: even though the preempted thread cannot be resumed, progress is still possible without it. In contrast, global data structures protected by mutual exclusion cannot safely be accessed in a handler, as the preempted thread may be the one holding the lock.

Non-blocking algorithms have the potential to prevent priority inversion, as no thread is forced to wait for a suspended thread to complete. However, as livelock is still possible, threads have to wait when they encounter

contention; hence, priority inversion is still possible depending upon the contention management system used. Lock-free algorithms, below, avoid priority inversion.

Implementation

With few exceptions, non-blocking algorithms use atomic read-modify-write primitives that the hardware must provide, the most notable of which is compare-and-swap (CAS). Critical sections are almost always implemented using standard interfaces over these primitives. Until recently, all non-blocking algorithms had to be written "natively" with the underlying primitives to achieve acceptable performance. However, the emerging field of software transactional memory promises standard abstractions for writing efficient non-blocking code.

Much research has also been done in providing basic data structures such as stacks, queues, sets, and hash tables. These allow programs to easily exchange data between threads asynchronously.

Additionally, some data structures are weak enough to be implemented without special atomic primitives. These exceptions include:
• single-reader single-writer ring buffer FIFO
• Read-copy-update with a single writer and any number of readers. (The readers are wait-free; the writer is usually lock-free, until it needs to reclaim memory.)
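The first exception can be illustrated with a sketch: in a single-reader single-writer ring buffer FIFO, each side modifies only its own index and merely reads the other's, so plain atomic loads and stores with acquire/release ordering suffice and no read-modify-write primitive is needed. This is a minimal C11 sketch with invented names, not code from any particular library:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SPSC_SIZE 8 /* one slot is kept open to distinguish full from empty */

typedef struct {
    int buf[SPSC_SIZE];
    _Atomic unsigned head; /* written only by the consumer */
    _Atomic unsigned tail; /* written only by the producer */
} spsc_queue;

/* Producer side: returns false when the queue is full. */
bool spsc_push(spsc_queue *q, int value)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned next = (tail + 1) % SPSC_SIZE;
    if (next == atomic_load_explicit(&q->head, memory_order_acquire))
        return false; /* full */
    q->buf[tail] = value;
    /* release: the element write must become visible before the new tail */
    atomic_store_explicit(&q->tail, next, memory_order_release);
    return true;
}

/* Consumer side: returns false when the queue is empty. */
bool spsc_pop(spsc_queue *q, int *value)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
        return false; /* empty */
    *value = q->buf[head];
    atomic_store_explicit(&q->head, (head + 1) % SPSC_SIZE,
                          memory_order_release);
    return true;
}
```

Neither operation ever waits on the other thread, so with exactly one producer and one consumer both sides are wait-free.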

Wait-freedom

Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes. It was shown in the 1980s[2] that all algorithms can be implemented wait-free, and many transformations from serial code, called universal constructions, have been demonstrated. However, the resulting performance does not in general match even naïve blocking designs. It has also been shown[3] that the widely available atomic conditional primitives, CAS and LL/SC, cannot provide starvation-free implementations of many common data structures without memory costs growing linearly in the number of threads. Wait-free algorithms are therefore rare, both in research and in practice.

Lock-freedom

Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run sufficiently long, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are lock-free.

In general, a lock-free algorithm can run in four phases: completing one's own operation, assisting an obstructing operation, aborting an obstructing operation, and waiting. Completing one's own operation is complicated by the possibility of concurrent assistance and abortion, but is invariably the fastest path to completion. The decision about when to assist, abort or wait when an obstruction is met is the responsibility of a contention manager. This may be very simple (assist higher-priority operations, abort lower-priority ones), or may be more optimized to achieve better throughput, or to lower the latency of prioritized operations.

Correct concurrent assistance is typically the most complex part of a lock-free algorithm, and often very costly to execute: not only does the assisting thread slow down, but, thanks to the mechanics of shared memory, the thread being assisted will be slowed too, if it is still running.
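The CAS retry loop at the heart of many lock-free algorithms can be illustrated with a Treiber-style stack sketch in C11 (invented names; the pop side is deliberately simplified and, as the comment notes, would need hazard pointers or similar safe-reclamation machinery before it could tolerate concurrent pops with memory reuse):

```c
#include <stdatomic.h>
#include <stddef.h>

/* A Treiber-style lock-free stack. A push retries its CAS until it wins:
 * a CAS only fails because some other thread's CAS succeeded, so at least
 * one thread always makes progress (lock-freedom), even though an
 * individual thread can in principle retry forever (starvation). */

struct node {
    int value;
    struct node *next;
};

typedef struct {
    _Atomic(struct node *) top;
} lf_stack;

void lf_push(lf_stack *s, struct node *n)
{
    n->next = atomic_load_explicit(&s->top, memory_order_relaxed);
    /* On failure, n->next is updated to the current top; just retry. */
    while (!atomic_compare_exchange_weak_explicit(
               &s->top, &n->next, n,
               memory_order_release, memory_order_relaxed))
        ;
}

/* Caveat: with concurrent pops and memory reuse this is exposed to the
 * ABA problem; real implementations add hazard pointers, epochs, or
 * tagged pointers. Shown only to illustrate the CAS retry loop. */
struct node *lf_pop(lf_stack *s)
{
    struct node *old = atomic_load_explicit(&s->top, memory_order_acquire);
    while (old != NULL &&
           !atomic_compare_exchange_weak_explicit(
               &s->top, &old, old->next,
               memory_order_acquire, memory_order_acquire))
        ; /* old is updated to the current top; retry */
    return old;
}
```

The push loop is the canonical lock-free pattern: read the shared state, compute the desired new state locally, and CAS it in, starting over if the state changed underneath.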



Obstruction-freedom

Obstruction-freedom is possibly the weakest natural non-blocking progress guarantee. An algorithm is obstruction-free if at any point a single thread executed in isolation (i.e., with all obstructing threads suspended) for a bounded number of steps will complete its operation. All lock-free algorithms are obstruction-free.

Obstruction-freedom demands only that any partially completed operation can be aborted and the changes made rolled back. Dropping concurrent assistance can often result in much simpler algorithms that are easier to validate. Preventing the system from continually live-locking is the task of a contention manager. Obstruction-freedom is also called optimistic concurrency control.

Some obstruction-free algorithms use a pair of "consistency markers" in the data structure. Processes reading the data structure first read one consistency marker, then read the relevant data into an internal buffer, then read the other marker, and then compare the markers. The data is consistent if the two markers are identical. Markers may be non-identical when the read is interrupted by another process updating the data structure. In such a case, the process discards the data in the internal buffer and tries again.
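The consistency-marker pattern can be sketched with a single sequence counter doing duty for both markers, in the style of a seqlock; this compact form is an assumption of the sketch rather than something the text prescribes. The writer makes the counter odd while an update is in progress, and a reader accepts its copy only if it observed the same even value before and after:

```c
#include <stdatomic.h>

typedef struct {
    _Atomic unsigned seq; /* odd while an update is in progress */
    int data[2];          /* the protected data */
} marked_pair;

/* Single writer: bump to odd, update, bump back to even. */
void pair_write(marked_pair *p, int a, int b)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 1, memory_order_release); /* odd */
    p->data[0] = a;
    p->data[1] = b;
    atomic_store_explicit(&p->seq, s + 2, memory_order_release); /* even */
}

/* Reader: retries while a writer interferes, but completes in a
 * bounded number of steps once it runs in isolation, which is
 * exactly the obstruction-freedom guarantee. */
void pair_read(marked_pair *p, int *a, int *b)
{
    unsigned before, after;
    do {
        before = atomic_load_explicit(&p->seq, memory_order_acquire);
        *a = p->data[0];
        *b = p->data[1];
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while (before != after || (before & 1u)); /* markers must match and be even */
}
```

The reader never writes to shared state, so any number of readers can run concurrently with the single writer.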

References
[1] M. Herlihy, V. Luchangco and M. Moir. "Obstruction-Free Synchronization: Double-Ended Queues as an Example." (http://www.cs.brown.edu/people/mph/HerlihyLM03/main.pdf) 23rd International Conference on Distributed Computing Systems, 2003, p. 522.
[2] Maurice P. Herlihy. "Impossibility and universality results for wait-free synchronization" (http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=62593) Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing, 1988, pp. 276-290.
[3] F. Fich, D. Hendler, N. Shavit. "On the inherent weakness of conditional synchronization primitives." (http://www.cs.tau.ac.il/~afek/Handler-conditionals.pdf) 23rd Annual ACM Symposium on Principles of Distributed Computing, 2004, pp. 80-87.

External links
• Article "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms (http://www." by Maged M. Michael and Michael L. Scott
• Discussion "Communication between Threads, without blocking (groups?group=comp.programming.threads&threadm=c2s1qn$mrj$"
• Survey "Some Notes on Lock-Free and Wait-Free Algorithms (lockfree/)" by Ross Bencina
• java.util.concurrent.atomic – supports lock-free and thread-safe programming on single variables
• System.Threading.Interlocked (aspx) - Provides atomic operations for variables that are shared by multiple threads (.NET Framework)
• The Jail-Ust Container Library (
• Practical lock-free data structures (
• Thesis "Efficient and Practical Non-Blocking Data Structures (" (1414 KB) by Per Håkan Sundell
• WARPing - Wait-free techniques for Real-time Processing (htm)
• Non-blocking Synchronization: Algorithms and Performance Evaluation. (~tsigas/papers/Yi-Thesis.pdf) (1926 KB) by Yi Zhang
• "Design and verification of lock-free parallel algorithms (2005/h.gao/)" by Hui Gao
• "Asynchronous Data Sharing in Multiprocessor Real-Time Systems Using Process Consensus (http://citeseer." by Jing Chen and Alan Burns
• Discussion "lock-free versus lock-based algorithms (programming.threads&"


• Atomic Ptr Plus Project ( - collection of various lock-free synchronization primitives
• AppCore: A Portable High-Performance Thread Synchronization Library ( ) - An Effective Marriage between Lock-Free and Lock-Based Algorithms
• WaitFreeSynchronization ( and LockFreeSynchronization ( at the Portland Pattern Repository
• Multiplatform library with atomic operations (php4)
• A simple C++ lock-free LIFO implementation (
• Concurrent Data Structures (libcds) ( - C++ library of various lock-free algorithms and GCs
• 1024cores ( - a site devoted to lock-free, wait-free, obstruction-free and just scalable non-blocking synchronization algorithms and related topics

Producer-consumer problem

In computer science, the producer-consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, which share a common, fixed-size buffer. The producer's job is to generate a piece of data, put it into the buffer and start again. At the same time, the consumer is consuming the data (i.e., removing it from the buffer) one piece at a time. The problem is to make sure that the producer won't try to add data into the buffer if it's full and that the consumer won't try to remove data from an empty buffer.

The solution for the producer is to either go to sleep or discard data if the buffer is full. The next time the consumer removes an item from the buffer, it notifies the producer, which starts to fill the buffer again. In the same way, the consumer can go to sleep if it finds the buffer to be empty. The next time the producer puts data into the buffer, it wakes up the sleeping consumer. The solution can be reached by means of inter-process communication, typically using semaphores. An inadequate solution could result in a deadlock where both processes are waiting to be awakened. The problem can also be generalized to have multiple producers and consumers.

Implementations

Inadequate implementation

To solve the problem, a careless programmer might come up with the solution shown below; it contains a race condition. The solution uses two library routines, sleep and wakeup. When sleep is called, the caller is blocked until another process wakes it up by using the wakeup routine. itemCount is the number of items in the buffer.

int itemCount;

procedure producer() {
    while (true) {
        item = produceItem();

        if (itemCount == BUFFER_SIZE) {
            sleep();
        }



        putItemIntoBuffer(item);
        itemCount = itemCount + 1;

        if (itemCount == 1) {
            wakeup(consumer);
        }
    }
}

procedure consumer() {
    while (true) {
        if (itemCount == 0) {
            sleep();
        }

        item = removeItemFromBuffer();
        itemCount = itemCount - 1;

        if (itemCount == BUFFER_SIZE - 1) {
            wakeup(producer);
        }

        consumeItem(item);
    }
}

The problem with this solution is that it contains a race condition that can lead to a deadlock. Consider the following scenario:
1. The consumer has just read the variable itemCount, noticed it's zero, and is just about to move inside the if block.
2. Just before calling sleep, the consumer is interrupted and the producer is resumed.
3. The producer creates an item, puts it into the buffer, and increases itemCount.
4. Because the buffer was empty prior to the last addition, the producer tries to wake up the consumer.
5. Unfortunately, the consumer wasn't yet sleeping, and the wakeup call is lost. When the consumer resumes, it goes to sleep and will never be awakened again. This is because the consumer is only awakened by the producer when itemCount is equal to 1.
6. The producer will loop until the buffer is full, after which it will also go to sleep.

Since both processes will sleep forever, we have run into a deadlock. This solution therefore is unsatisfactory.

An alternative analysis is that if the programming language does not define the semantics of concurrent accesses to shared variables (in this case itemCount) without use of synchronization, then the solution is unsatisfactory for that reason, without needing to explicitly demonstrate a race condition.



Using semaphores

Semaphores solve the problem of lost wakeup calls. In the solution below we use two semaphores, fillCount and emptyCount, to solve the problem. fillCount is the number of items available to be read in the buffer, and emptyCount is the number of available spaces in the buffer where items could be written. fillCount is incremented and emptyCount decremented when a new item has been put into the buffer. If the producer tries to decrement emptyCount while its value is zero, the producer is put to sleep. The next time an item is consumed, emptyCount is incremented and the producer wakes up. The consumer works analogously.

    semaphore fillCount = 0;             // items produced
    semaphore emptyCount = BUFFER_SIZE;  // remaining space

    procedure producer() {
        while (true) {
            item = produceItem();
            down(emptyCount);
            putItemIntoBuffer(item);
            up(fillCount);
        }
    }

    procedure consumer() {
        while (true) {
            down(fillCount);
            item = removeItemFromBuffer();
            up(emptyCount);
            consumeItem(item);
        }
    }

The solution above works fine when there is only one producer and one consumer. Unfortunately, with multiple producers or consumers this solution contains a serious race condition that could result in two or more processes reading from or writing into the same slot at the same time. To understand how this is possible, imagine how the procedure putItemIntoBuffer() can be implemented. It could contain two actions, one determining the next available slot and the other writing into it. If the procedure can be executed concurrently by multiple producers, then the following scenario is possible:

1. Two producers decrement emptyCount.
2. One of the producers determines the next empty slot in the buffer.
3. The second producer determines the next empty slot and gets the same result as the first producer.
4. Both producers write into the same slot.

To overcome this problem, we need a way to make sure that only one producer is executing putItemIntoBuffer() at a time. In other words, we need a way to execute a critical section with mutual exclusion. To accomplish this we use a binary semaphore called mutex. Since the value of a binary semaphore can only be either one or zero, only one process can be executing between down(mutex) and up(mutex). The solution for multiple producers and consumers is shown below.

    semaphore mutex = 1;
    semaphore fillCount = 0;
    semaphore emptyCount = BUFFER_SIZE;



    procedure producer() {
        while (true) {
            item = produceItem();
            down(emptyCount);
            down(mutex);
            putItemIntoBuffer(item);
            up(mutex);
            up(fillCount);
        }
    }

    procedure consumer() {
        while (true) {
            down(fillCount);
            down(mutex);
            item = removeItemFromBuffer();
            up(mutex);
            up(emptyCount);
            consumeItem(item);
        }
    }

Notice that the order in which different semaphores are incremented or decremented is essential: changing the order might result in a deadlock. For example, if the producer performed down(mutex) before down(emptyCount), it could go to sleep on a full buffer while holding the mutex, and no consumer could ever acquire the mutex to free a slot.
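The multiple-producer/multiple-consumer pseudocode above translates almost line for line into C with POSIX semaphores and a pthread mutex. The following is a sketch, not code from the article: the names buf_init, buf_put, and buf_get, the BUFFER_SIZE of 8, and the circular in/out indices are illustrative assumptions.

```c
#include <pthread.h>
#include <semaphore.h>

#define BUFFER_SIZE 8

static int buffer[BUFFER_SIZE];
static int in, out;                 /* next write slot / next read slot */
static sem_t fillCount;             /* items available to read          */
static sem_t emptyCount;            /* free slots available to write    */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void buf_init(void) {
    sem_init(&fillCount, 0, 0);
    sem_init(&emptyCount, 0, BUFFER_SIZE);
}

void buf_put(int item) {
    sem_wait(&emptyCount);          /* down(emptyCount): block if full  */
    pthread_mutex_lock(&mutex);     /* down(mutex): enter critical section */
    buffer[in] = item;
    in = (in + 1) % BUFFER_SIZE;
    pthread_mutex_unlock(&mutex);   /* up(mutex)                        */
    sem_post(&fillCount);           /* up(fillCount): one item ready    */
}

int buf_get(void) {
    sem_wait(&fillCount);           /* down(fillCount): block if empty  */
    pthread_mutex_lock(&mutex);     /* down(mutex)                      */
    int item = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    pthread_mutex_unlock(&mutex);   /* up(mutex)                        */
    sem_post(&emptyCount);          /* up(emptyCount): one slot freed   */
    return item;
}
```

As in the pseudocode, the sem_wait calls must come before the mutex lock, or a full (or empty) buffer would put a thread to sleep while it holds the mutex.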

Using monitors

The following pseudocode shows a solution to the producer-consumer problem using monitors. Since mutual exclusion is implicit with monitors, no extra effort is necessary to protect the critical section. In other words, the solution shown below works with any number of producers and consumers without any modifications. It is also noteworthy that using monitors makes race conditions much less likely than when using semaphores.

    monitor ProducerConsumer {
        int itemCount;
        condition full;
        condition empty;

        procedure add(item) {
            while (itemCount == BUFFER_SIZE) {
                wait(full);
            }
            putItemIntoBuffer(item);
            itemCount = itemCount + 1;
            if (itemCount == 1) {
                notify(empty);
            }
        }

        procedure remove() {
            while (itemCount == 0) {
                wait(empty);
            }
            item = removeItemFromBuffer();
            itemCount = itemCount - 1;
            if (itemCount == BUFFER_SIZE - 1) {
                notify(full);
            }
            return item;
        }
    }

    procedure producer() {
        while (true) {
            item = produceItem();
            ProducerConsumer.add(item);
        }
    }

    procedure consumer() {
        while (true) {
            item = ProducerConsumer.remove();
            consumeItem(item);
        }
    }

Note the use of while statements in the above code, both when testing if the buffer is full and when testing if it is empty. With multiple consumers, there is a race condition where one consumer gets notified that an item has been put into the buffer, but another consumer, already waiting on the monitor, removes it from the buffer first. If the while were instead an if, too many items might be put into the buffer, or a remove might be attempted on an empty buffer.
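Monitors are not a built-in language construct in C, but the same structure can be approximated with a pthread mutex plus two condition variables; the while loops guard the conditions exactly as the text prescribes. This is a sketch under stated assumptions: the names pc_add and pc_remove, the buffer size, and the circular-index layout are all illustrative, not from the article.

```c
#include <pthread.h>

#define BUFFER_SIZE 4

static int buffer[BUFFER_SIZE];
static int itemCount, in, out;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t notFull  = PTHREAD_COND_INITIALIZER;  /* plays the role of `full`  */
static pthread_cond_t notEmpty = PTHREAD_COND_INITIALIZER;  /* plays the role of `empty` */

void pc_add(int item) {
    pthread_mutex_lock(&lock);             /* enter the "monitor" */
    while (itemCount == BUFFER_SIZE)       /* while, not if: guard against stolen wakeups */
        pthread_cond_wait(&notFull, &lock);
    buffer[in] = item;
    in = (in + 1) % BUFFER_SIZE;
    itemCount++;
    pthread_cond_signal(&notEmpty);        /* notify(empty) */
    pthread_mutex_unlock(&lock);
}

int pc_remove(void) {
    pthread_mutex_lock(&lock);
    while (itemCount == 0)
        pthread_cond_wait(&notEmpty, &lock);
    int item = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    itemCount--;
    pthread_cond_signal(&notFull);         /* notify(full) */
    pthread_mutex_unlock(&lock);
    return item;
}
```

pthread_cond_wait atomically releases the mutex while sleeping and reacquires it before returning, which is exactly the behaviour the monitor's wait provides implicitly.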

Without semaphores or monitors

The producer-consumer problem, particularly in the case of a single producer and single consumer, is closely related to implementing a FIFO queue or a communication channel. The producer-consumer pattern can provide highly efficient data communication without relying on semaphores, mutexes, or monitors for data transfer, since those primitives are comparatively expensive. Channels and FIFOs are popular precisely because they avoid the need for end-to-end atomic synchronization. A basic example coded in C is shown below. Note that:

• Atomic read-modify-write access to shared variables is avoided: each of the two count variables is updated by a single thread only.
• This example does not put threads to sleep, which might be acceptable depending on the system context. The sched_yield() call is just to behave nicely and could be removed; thread libraries typically require semaphores or condition variables to control the sleep/wakeup of threads. In a multi-processor environment, thread sleep/wakeup would occur much less frequently than the passing of data tokens, so avoiding atomic operations on data passing is beneficial.

    volatile unsigned int produceCount, consumeCount;
    TokenType buffer[BUFFER_SIZE];

    void producer(void) {
        while (1) {
            while (produceCount - consumeCount == BUFFER_SIZE)
                sched_yield();  // buffer is full
            buffer[produceCount % BUFFER_SIZE] = produceToken();
            produceCount += 1;
        }
    }

    void consumer(void) {
        while (1) {
            while (produceCount - consumeCount == 0)
                sched_yield();  // buffer is empty
            consumeToken(buffer[consumeCount % BUFFER_SIZE]);
            consumeCount += 1;
        }
    }
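One caveat about the example above: in standard C, `volatile` gives the counters no defined cross-thread ordering semantics. The same single-producer/single-consumer queue can be written with C11 atomics, which do define the ordering. The sketch below is an assumption-laden adaptation, not the article's code; the function names and the release/acquire pairing are illustrative.

```c
#include <stdatomic.h>
#include <sched.h>

#define BUFFER_SIZE 16

static int buffer[BUFFER_SIZE];
static atomic_uint produceCount, consumeCount;

void produce(int token) {
    /* acquire on consumeCount: slot reuse must see the consumer's read */
    while (atomic_load_explicit(&produceCount, memory_order_relaxed) -
           atomic_load_explicit(&consumeCount, memory_order_acquire) == BUFFER_SIZE)
        sched_yield();                   /* buffer is full */
    unsigned p = atomic_load_explicit(&produceCount, memory_order_relaxed);
    buffer[p % BUFFER_SIZE] = token;
    /* release: the stored token is visible before the count advances */
    atomic_store_explicit(&produceCount, p + 1, memory_order_release);
}

int consume(void) {
    /* acquire on produceCount: the token must be visible before we read it */
    while (atomic_load_explicit(&produceCount, memory_order_acquire) -
           atomic_load_explicit(&consumeCount, memory_order_relaxed) == 0)
        sched_yield();                   /* buffer is empty */
    unsigned c = atomic_load_explicit(&consumeCount, memory_order_relaxed);
    int token = buffer[c % BUFFER_SIZE];
    /* release: the slot is free for reuse only after the token is read */
    atomic_store_explicit(&consumeCount, c + 1, memory_order_release);
    return token;
}
```

Each counter is still written by only one thread, so no read-modify-write atomics are needed, preserving the efficiency argument made in the text.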

Example in Java

    import java.util.Stack;
    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * One producer and three consumers producing/consuming 10 items.
     *
     * @author pt
     */
    public class ProducerConsumer {

        Stack<Integer> items = new Stack<Integer>();
        final static int NO_ITEMS = 10;

        public static void main(String args[]) {
            ProducerConsumer pc = new ProducerConsumer();
            Thread t1 = new Thread(pc.new Producer());
            Consumer consumer = pc.new Consumer();
            Thread t2 = new Thread(consumer);
            Thread t3 = new Thread(consumer);
            Thread t4 = new Thread(consumer);

            t1.start();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e1) {
                e1.printStackTrace();
            }
            t2.start();
            t3.start();
            t4.start();

            try {
                t2.join();
                t3.join();
                t4.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        class Producer implements Runnable {

            public void produce(int i) {
                System.out.println("Producing " + i);
                items.push(new Integer(i));
            }

            @Override
            public void run() {
                int i = 0;
                // produce 10 items
                while (i++ < NO_ITEMS) {
                    synchronized (items) {
                        produce(i);
                        items.notifyAll();
                    }
                    try {
                        // sleep for some time
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                    }
                }
            }
        }

        class Consumer implements Runnable {

            // consumed counter to allow the threads to stop
            AtomicInteger consumed = new AtomicInteger();

            public void consume() {
                if (!items.isEmpty()) {
                    System.out.println("Consuming " + items.pop());
                    consumed.incrementAndGet();
                }
            }

            private boolean theEnd() {
                return consumed.get() >= NO_ITEMS;
            }

            @Override
            public void run() {
                while (!theEnd()) {
                    synchronized (items) {
                        while (items.isEmpty() && !theEnd()) {
                            try {
                                items.wait(10);
                            } catch (InterruptedException e) {
                                Thread.interrupted();
                            }
                        }
                        consume();
                    }
                }
            }
        }
    }

Reference

• Mark Grand. Patterns in Java, Volume 1: A Catalog of Reusable Design Patterns Illustrated with UML. [1]

References

[1] http://www.mindspring.com/~mgrand/pattern_synopses.htm

Thread pool pattern


In computer programming, the thread pool pattern is where a number of threads are created to perform a number of tasks, which are usually organized in a queue. Typically, there are many more tasks than threads. As soon as a thread completes its task, it requests the next task from the queue, until all tasks have been completed. The thread can then terminate, or sleep until new tasks become available.

A sample thread pool (green boxes) with waiting tasks (blue) and completed tasks (yellow)

The number of threads used is a parameter that can be tuned to provide the best performance. Additionally, the number of threads can be dynamic, based on the number of waiting tasks. For example, a web server can add threads if numerous web page requests come in and can remove threads when those requests taper off. The cost of having a larger thread pool is increased resource usage. The algorithm used to determine when to create or destroy threads has an impact on the overall performance:

• Create too many threads, and resources are wasted, along with the time spent creating the unused threads.
• Destroy too many threads, and more time will be spent later creating them again.
• Creating threads too slowly might result in poor client performance (long wait times).
• Destroying threads too slowly may starve other processes of resources.

The algorithm chosen will depend on the problem and the expected usage patterns. If the number of tasks is very large, then creating a thread for each one may be impractical.

Another advantage of using a thread pool over creating a new thread for each task is that thread creation and destruction overhead is avoided, which may result in better performance and better system stability. Creating and destroying a thread and its associated resources is an expensive process in terms of time. An excessive number of threads will also waste memory, and context switching between the runnable threads also damages performance. For example, a socket connection to another machine (which might take thousands or even millions of cycles to drop and re-establish) can be avoided by associating it with a thread that lives over the course of more than one transaction.

When implementing this pattern, the programmer should ensure thread safety of the queue. Typically, a thread pool executes on a single computer. However, thread pools are conceptually related to server farms, in which a master process distributes tasks to worker processes on different computers in order to increase overall throughput. Embarrassingly parallel problems are highly amenable to this approach.
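The structure described above can be made concrete with a minimal fixed-size thread pool in C: worker threads block on a condition variable and pull function/argument pairs from a shared queue. This is a sketch under simplifying assumptions (a bounded queue that is assumed never to overflow, no dynamic pool resizing), and all names are illustrative.

```c
#include <pthread.h>

#define QUEUE_CAP 64

typedef struct { void (*fn)(void *); void *arg; } task_t;

static task_t queue[QUEUE_CAP];
static int q_head, q_tail, q_len, shutting_down;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;

static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_len == 0 && !shutting_down)
            pthread_cond_wait(&q_nonempty, &q_lock);
        if (q_len == 0) {                 /* shutting down and queue drained */
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        task_t t = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_len--;
        pthread_mutex_unlock(&q_lock);
        t.fn(t.arg);                      /* run the task outside the lock */
    }
}

void pool_submit(void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&q_lock);
    queue[q_tail] = (task_t){ fn, arg };  /* assumes the queue never overflows */
    q_tail = (q_tail + 1) % QUEUE_CAP;
    q_len++;
    pthread_cond_signal(&q_nonempty);     /* wake one idle worker */
    pthread_mutex_unlock(&q_lock);
}

void pool_start(pthread_t *threads, int n) {
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
}

void pool_shutdown(pthread_t *threads, int n) {
    pthread_mutex_lock(&q_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&q_nonempty);  /* wake all workers so they can exit */
    pthread_mutex_unlock(&q_lock);
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);   /* workers drain the queue, then return */
}
```

Running tasks outside the lock is the key design choice: holding the queue lock during task execution would serialize the pool and defeat its purpose.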


External links

• Article "Query by Slice, Parallel Execute, and Join: A Thread Pool Pattern in Java [1]" by Binildas C. A.
• Article "Thread pools and work queues [2]" by Brian Goetz
• Article "A Method of Worker Thread Pooling [3]" by Pradeep Kumar Sahu
• Article "Work Queue [4]" by Uri Twig
• Article "Windows Thread Pooling and Execution Chaining [5]"
• Article "Smart Thread Pool [6]" by Ami Bar
• Article "Programming the Thread Pool in the .NET Framework [7]" by David Carmona
• Article "The Thread Pool and Asynchronous Methods [8]" by Jon Skeet
• Article "Creating a Notifying Blocking Thread Pool in Java [9]" by Amir Kirsh
• Article "Practical Threaded Programming with Python: Thread Pools and Queues [10]" by Noah Gift
• Paper "Optimizing Thread-Pool Strategies for Real-Time CORBA [11]" by Irfan Pyarali, Marina Spivak, Douglas C. Schmidt and Ron Cytron
• Conference paper "Deferred cancellation. A behavioral pattern [12]" by Philipp Bachmann

References

[1] http://today.java.net/pub/a/today/2008/01/31/query-by-slice-parallel-execute-join-thread-pool-pattern.html
[2] http://www.ibm.com/developerworks/java/library/j-jtp0730.html
[3] http://www.codeproject.com/threads/thread_pooling.asp
[4] http://codeproject.com/threads/work_queue.asp
[5] http://codeproject.com/threads/Joshthreadpool.asp
[6] http://www.codeproject.com/KB/threads/smartthreadpool.aspx
[7] http://msdn.microsoft.com/en-us/library/ms973903.aspx
[8] http://www.yoda.arachsys.com/csharp/threads/threadpool.shtml
[9] http://today.java.net/pub/a/today/2008/10/23/creating-a-notifying-blocking-thread-pool-executor.html
[10] http://www.ibm.com/developerworks/aix/library/au-threadingpython/
[11] http://www.cs.wustl.edu/~schmidt/PDF/OM-01.pdf
[12] http://doi.acm.org/10.1145/1753196.1753218







License

Creative Commons Attribution-Share Alike 3.0 Unported

