Department of Information Technology

Split-Ordered Lists: Lock-Free Extensible Hash Tables (J. ACM 53, 3 (May 2006), 379-405)

Ori Shalev (Tel Aviv University) and Nir Shavit (Tel Aviv University and Sun Microsystems Laboratories)


We present the first lock-free implementation of an extensible hash table running on current architectures. Our algorithm provides concurrent insert, delete, and find operations with an expected O(1) cost. It consists of very simple code, easily implementable using only load, store, and compare-and-swap operations. The new mathematical structure at the core of our algorithm is recursive split-ordering, a way of ordering elements in a linked list so that they can be repeatedly "split" using a single compare-and-swap operation. Metaphorically speaking, our algorithm differs from prior known algorithms in that extensibility is derived by "moving the buckets among the items" rather than "the items among the buckets." Though lock-free algorithms are expected to work best in multiprogrammed environments, empirical tests we conducted on a large shared memory multiprocessor show that even in non-multiprogrammed environments, the new algorithm performs as well as the most efficient known lock-based resizable hash-table algorithm, and in high load cases it significantly outperforms it.

Link to the paper

If you want another description of this algorithm see "The Art of Multiprocessor Programming" by Maurice Herlihy and Nir Shavit, pages 309 to 316.
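The recursive split-ordering at the heart of the algorithm can be sketched concretely. A regular node's split-order key is its hash with the most significant bit set and then bit-reversed, so its lowest bit is always set; a bucket's dummy-node key is the plain bit-reversal of the bucket index, so its lowest bit is clear and it precedes every regular key belonging to that bucket. A minimal C sketch (the function names are illustrative, not the paper's):

```c
#include <stdint.h>

/* Reverse the bits of a 32-bit word. */
static uint32_t reverse32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Split-order key of a regular (data) node: set the most significant
   bit of the hash, then bit-reverse; the result always has its lowest
   bit set, so it sorts after its bucket's dummy node. */
static uint32_t so_regular_key(uint32_t hash) {
    return reverse32(hash | 0x80000000u);
}

/* Split-order key of a dummy (bucket) node: plain bit-reversal of the
   bucket index; its lowest bit is clear, so it precedes all regular
   keys belonging to that bucket. */
static uint32_t so_dummy_key(uint32_t bucket) {
    return reverse32(bucket);
}
```

Note how the dummy key of a new bucket such as 6 (binary 110) falls between the keys of bucket 2's items whose next hash bit is 0 and those whose next bit is 1, which is why a split requires only a single dummy-node insertion into the sorted list.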


Question 1:
What assumption(s) has/have to be made so that operations on split-ordered lists work in constant time?

Question 2:
Why is it possible to extend split-ordered lists, but not possible to shrink them back to their original size after use?

Bonus Question:
Did you read the correctness proof section?


Table sizes must be powers of two, and the keys must have a fixed-width binary representation, in order for the bit-reversal to work out.
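The power-of-two requirement can be checked directly: with table sizes 2^i, doubling the table from s to 2s moves key k from bucket k mod s to either k mod s or k mod s + s, depending on one extra hash bit, so each old bucket splits into exactly two new ones. A minimal check in C (the helper names are illustrative):

```c
#include <stdint.h>

/* Bucket of key k in a table of power-of-two size s: just k's low bits. */
static uint32_t bucket_of(uint32_t k, uint32_t s) {
    return k % s;
}

/* When the table doubles from s to 2*s, a key moves from bucket b
   either to b (the next hash bit is 0) or to b + s (the next bit is 1):
   each old bucket splits into exactly two new ones. */
static int splits_in_two(uint32_t k, uint32_t s) {
    uint32_t b = bucket_of(k, s), b2 = bucket_of(k, 2 * s);
    return b2 == b || b2 == b + s;
}
```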

In order to grow a split-ordered list, dummy nodes are inserted, splitting the older buckets' chains of elements. These dummy nodes are never deleted, and without deleting them we cannot shrink the list.
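The dummy-node insertion is driven by a parent relation on bucket indices: when a bucket is first accessed, the paper's initialize_bucket first (recursively) initializes the parent bucket, obtained by clearing the child index's most significant set bit, and then inserts the new dummy node into the parent's chain. That parent computation (the paper's GET_PARENT) can be sketched in C as:

```c
#include <stdint.h>

/* Parent of bucket b: b with its most significant set bit cleared.
   E.g., bucket 6 (binary 110) is split off from bucket 2 (binary 010).
   Assumes b != 0; bucket 0 is the root and has no parent. */
static uint32_t get_parent(uint32_t b) {
    uint32_t msb = 0x80000000u;
    while ((b & msb) == 0)
        msb >>= 1;
    return b & ~msb;
}
```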



Q1: 1. On a machine with finite memory, log(n) is bounded by the word size, so a logarithmic number of operations can be considered constant.
2. The number of retries (loops) in the underlying lock-free linked list is bounded by some constant.



Q1: The hash table is actually hierarchical, but assuming a 64-bit machine a 4-level hierarchy is enough to exhaust the memory, so the depth of this hierarchy can be seen as bounded, and the access time as constant.

Q2: They do not provide any way to shrink them, but I can't find any proof that it would be impossible.


Q1: The hash function provides a uniform distribution.

Q2: When the table is extended, elements that were originally in the same bucket may be split across different buckets. In order to shrink the hash table, we would need to merge some of the buckets, which is not trivial.



1. The hash values of the keys need to be uniformly distributed. The worst case is that all elements are placed in the same bucket and all the other buckets are empty. The complexity of the operations will then be proportional to the size of the set instead of constant.

2. The operations are never starved by operations performed by other threads (the number of retries is constant).

Q2: It might be possible to extend the described algorithm to support decreasing the size of the bucket array and removing sentinel nodes. The price that one would need to pay for such an extension is that the algorithm becomes more complicated and less performant for cases when decreasing the size of the bucket array is not needed.


Q1. Making the standard assumption of a hash function with a uniform distribution, then under any scheduling adversary the new algorithm provides a lock-free extensible hash table with O(1) average cost per operation. The complexity improves to expected constant time if we assume a constant extendibility rate, meaning that the table is never extended (doubled in size) a non-constant number of times while a thread is delayed by the scheduler.

Q2. As this is an extensible hash table, "resizing" means extending the table. It is interesting to note, as argued elsewhere [Hsu and Yang 1986; Lea (e-mail communication 2005)], that many of the standard concurrent applications using hash tables require tables to only increase in size.
Under the constant extendibility rate assumption, if shrinking were also allowed there would be no constant bound on the number of steps, and hence no constant time, for any single operation.


The hash function has a uniform distribution.

short: It's not implemented.
long: It is not really necessary. All the user of the data structure would get is:
+ some reclaimed memory (the dummy nodes)
+ the extra runtime spent on shrinking
+ slightly larger code size


The hash function should distribute the keys uniformly over the buckets, and there should not be arbitrarily many splits of buckets due to insertions.

It can be done, but not really necessary. To have any practical impact, shrinking should delete the dummy nodes. The only benefit of such an action would be the reclaimed memory.


The hash function should have a uniform distribution; otherwise all elements will end up in too few buckets (or all in the same one).

Split-ordered lists are grown by splitting buckets. Elements start in the same bucket, and as the list grows, they gradually move to different buckets. Shrinking a list would require merging buckets, which is not supported by the current algorithm.

Updated  2013-06-03 17:08:02 by Sofia Cassel.