Department of Information Technology

vLock: Lock Virtualization Mechanism for Exploiting Fine-grained Parallelism in Graph Traversal Algorithms


For graph traversal applications, fine-grained synchronization is required to exploit massive fine-grained parallelism. However, in the conventional solution using fine-grained locks, the locks themselves incur a huge memory cost as well as poor locality, due to the inherently irregular accesses to vertices. In this paper, we propose a novel fine-grained lock solution, vLock. The key idea is lock virtualization, which maps the huge logical lock space to a much smaller physical lock space that can reside in cache during the program's life cycle. Lock virtualization effectively reduces the lock-incurred overheads of both memory cost and cache misses. It also achieves high usability in legacy graph programs, since from the user's view vLock behaves the same as the lock methods in Pthreads. We implement vLock as a Pthreads-like library and evaluate its performance on four classical graph algorithms (BFS, SSSP, CC, PageRank). Experiments on an SMP system with two Intel Westmere six-core processors show that, compared to conventional fine-grained locks, vLock significantly reduces the locks' cache misses and has competitive performance. In particular, PageRank with vLock shows about a 20% performance improvement.

Link to the paper


Question 1:
Why are there so few lock conflicts when using any kind of fine-grained locking for concurrent graph traversal?
Question 2:
Why is the performance of vLock-Add so bad for PageRank?


The typical graphs used as input to parallel graph traversal algorithms are scale-free: a few nodes have far more neighbors than the average, while all the others have significantly fewer. This property results in few conflicts, since most nodes are reachable only through a few "hubs", so it is rare for different exploration threads to work in the same region at the same time.

In order to obtain a well-distributed hash value from the memory address of a vertex, a special hash function is used instead of a plain modulo operation. This extra computation is what makes the performance of vLock-Add poor.


As the paper notes, graphs in real-world applications are usually large and sparse. With many nodes, relatively few threads, and each thread locking only a few nodes at a time, it is unlikely that two threads try to lock the same node simultaneously. Of course, it also depends on the application.

It is because the hash function is too expensive.


Q1: The computation time per vertex is small, resulting in short critical sections, which decreases the probability of conflicts. At the same time, there is a large number of nodes.

Q2: "The lost performance of vLock-Add is due to its high cost of hash computation [...]".


Graphs are usually large and scale-free. The critical sections are short, because typically only trivial operations are carried out on vertices and edges. All of these properties decrease the probability of lock conflicts.

The cost of hash computation is much higher when hashing by vertex address (vLock-Add) than when hashing by vertex id (vLock-Vtx). In PageRank, lock requests are more frequent than in the other algorithms, so the more expensive hash computation outweighs the benefits of lock virtualization.


Q1: Many nodes, few threads.

Q2: The hash function is expensive.


Q1: A fine-grained lock is associated with each vertex in the graph. The critical sections, which can cause lock conflicts, are operations on vertices and edges. With very many vertices in sparse graphs and only a few threads, conflicts seldom happen.

Q2: The hash computation cost in vLock-Add is 70-80 times higher than in vLock-Vtx, and the PageRank algorithm requests locks much more frequently than the others.




Updated  2013-06-05 01:11:14 by Yunyun Zhu.