Distributed Shared Memory: Is False Sharing a Problem?

Cristiana Amza, Alan L. Cox, Honghui Lu, Karthick Rajamani, and Willy Zwaenepoel, Rice University
Sandhya Dwarkadas, University of Rochester

Since Li and Hudak's seminal work on distributed shared memory (DSM) in 1985, the ``Battle Against False Sharing'' has been a dominant, if not the dominant, theme of research in this area. Today, it is generally accepted that the ill effects of false sharing can be reduced, but not entirely eliminated, using a relaxed memory consistency model. Despite this, the conventional wisdom remains that the overhead of false sharing, as well as of fine-grained true sharing, in page-based consistency protocols is the primary factor limiting the performance of DSM. Consequently, alternatives to page-based consistency that aim to provide more efficient support for these sharing behaviors are being investigated by several groups.

In contrast, we argue that the greatest performance improvements will come from more efficient handling of coarse-grain sharing. A comparison to well-tuned message-passing code on the same platform is probably the fairest way to evaluate a DSM system, because it provides an upper bound on what the hardware can achieve in the absence of memory consistency overhead. In such a comparison, we found that the largest factor accounting for the difference between the performance of the message-passing codes and the shared-memory codes was the aggregation of data transfer in the message-passing codes. In other words, the fact that the virtual memory page is often too small as the unit of data transfer mattered more than the fact that it is sometimes too large as the unit of data consistency.

Subsequently, we have found that on networks of moderate speed (100 Mbps or greater) and on machines with small virtual memory pages (4 Kbytes), increasing the page size used by the TreadMarks DSM system improves performance more often than not.

In conclusion, we contend that

o better data aggregation by the DSM system is the more critical problem on NOWs, especially since bandwidth is increasing faster than latency is falling (see the cost-model sketch below). Using compiler support is a promising approach.

o researchers should reexamine the benchmarks they use for evaluating DSM systems for NOWs. The most popular shared-memory benchmarks (with their suggested problem sizes) have execution times on the order of ten seconds or less with today's processors. The small problem sizes tend to exaggerate the importance of fine-grain and false sharing. Instead, we should focus on supporting extremely large applications that require more memory than a single machine would typically provide.
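To make the sharing-granularity issue concrete, the following C sketch is our own illustration, not code from TreadMarks; the page size, structure names, and layout are assumptions. It shows two variables that are updated by different workers but happen to lie on the same 4-Kbyte virtual memory page. Under a single-writer, page-based protocol the page would bounce between the two nodes on every write even though no datum is truly shared; a relaxed, multiple-writer protocol reduces, but does not eliminate, this cost, and padding the data out to separate pages is the classic application-level remedy.

/* Illustration only: page-level false sharing and the padding remedy.
 * PAGE_SIZE and the structure layouts are hypothetical examples. */
#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE 4096

struct counters {
    long a;   /* written only by worker 0 */
    long b;   /* written only by worker 1, but on the same page as a */
};

/* Padding places b on a different page, assuming the structure itself
 * is allocated on a page boundary. */
struct padded_counters {
    long a;
    char pad[PAGE_SIZE - sizeof(long)];
    long b;
};

int main(void)
{
    printf("unpadded: a..b span %zu bytes (one page -> falsely shared)\n",
           offsetof(struct counters, b) - offsetof(struct counters, a));
    printf("padded:   a..b span %zu bytes (separate pages)\n",
           offsetof(struct padded_counters, b) - offsetof(struct padded_counters, a));
    return 0;
}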
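The aggregation argument can be made with a back-of-the-envelope cost model. The C sketch below is our own illustration, not a measurement from this work; the latency, bandwidth, page size, and data-set size are assumed values chosen to resemble a 100 Mbps network of workstations with 4-Kbyte pages. It compares fetching 4 Mbytes of data one page at a time, as a page-based DSM does on demand, against a single aggregated message, as a hand-tuned message-passing code would send.

/* Back-of-the-envelope model (illustrative assumptions, not measured data):
 * per-page demand transfers versus one aggregated message. */
#include <stdio.h>

int main(void)
{
    const double latency_s   = 500e-6;            /* assumed per-message latency: 500 us */
    const double bandwidth   = 100e6 / 8.0;       /* 100 Mbps in bytes per second        */
    const double page_bytes  = 4096.0;            /* small virtual memory page           */
    const double total_bytes = 4.0 * 1024 * 1024; /* 4 Mbytes of shared data             */

    double pages = total_bytes / page_bytes;

    /* Demand paging: one request/reply round trip per page. */
    double paged = pages * (latency_s + page_bytes / bandwidth);

    /* Aggregated transfer: one message carrying all of the data. */
    double bulk = latency_s + total_bytes / bandwidth;

    printf("page-at-a-time: %.3f s\n", paged);
    printf("aggregated:     %.3f s\n", bulk);
    printf("ratio:          %.1fx\n", paged / bulk);
    return 0;
}

Under these assumed numbers the page-at-a-time transfer pays the per-message latency 1024 times and runs roughly two to three times slower than the single aggregated message. Because the latency term is paid once per page, its relative cost grows as bandwidth increases while latency stays roughly constant, which is why aggregation becomes the more critical problem on NOWs.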