Wednesday, August 6, 2008

NUMA and the JVM

Good blog entry from Jon Masamitsu about NUMA optimizations in Java 6 on Solaris. NUMA (Non-Uniform Memory Access), vastly simplified, means that on a multi-processor system each processor has a region of memory that is physically closer to it. Reading and writing that local memory has lower latency than going to memory attached to another processor, so the goal is to allocate memory close to the processor that will use it. For Java, the optimizations are made primarily to the Eden (young generation) heap space, as well as tying each thread to a particular CPU.

To enable the feature, pass the -XX:+UseNUMA flag to the JVM at startup.
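A flag only records what you asked for; it doesn't prove that the running collector actually does NUMA-aware allocation. As a small sanity check, here is a sketch (the class name NumaFlagCheck is hypothetical) that reads the JVM's own startup arguments through the standard java.lang.management API and reports whether -XX:+UseNUMA was requested. It assumes the usual HotSpot rule that a later -XX:-UseNUMA overrides an earlier -XX:+UseNUMA. Launch it with something like java -XX:+UseNUMA NumaFlagCheck.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class NumaFlagCheck {
    // Returns true if -XX:+UseNUMA was requested. Assumes the last occurrence
    // of a flag on the command line wins, as is usual for HotSpot -XX options.
    static boolean numaRequested(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+UseNUMA")) {
                enabled = true;
            } else if (arg.equals("-XX:-UseNUMA")) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the arguments passed to the JVM itself, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("UseNUMA requested: " + numaRequested(jvmArgs));
    }
}
```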

Java 6 threading article

An article on Java 6 threading optimizations recently appeared on InfoQ. It's a good two-part article, but here I'm just going to capture some of the interesting notes about the different locking features now in the JVM (most of this entry is paraphrase, that is, notes to myself).

Escape analysis - determine the scope of all references in an app. If HotSpot can determine that the references are limited to local scope and none can escape, the JIT can apply runtime optimizations.
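To make "none can escape" concrete, here is a minimal sketch; the class and method names are illustrative, not from the article. In the first method the Point object never leaves the method, so escape analysis can prove it is local. In the second, storing it in a static field lets it escape. Whether the JIT actually exploits this (for example by replacing the allocation with plain local variables) depends on the JVM version and flags, so the comments describe what the optimizer may do, not what it must do.

```java
class EscapeDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Anything stored in this static field is reachable from other code and threads.
    static Point lastPoint;

    // p never leaves this method: escape analysis can prove it is local, so the JIT
    // may skip the heap allocation and keep x and y in registers.
    static int lengthSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // Publishing p through a static field makes it escape, ruling that optimization out.
    static int lengthSquaredEscaping(int x, int y) {
        Point p = new Point(x, y);
        lastPoint = p;
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4));         // 25
        System.out.println(lengthSquaredEscaping(3, 4)); // 25
    }
}
```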

Lock elision - when references to a lock are limited to local scope (for example, creating and modifying a local StringBuffer), no other thread will ever have access to the object, so the lock is never contended. In that case the lock isn't really needed and can be elided (omitted).
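Roughly what that looks like in code, as a sketch with a made-up method name: StringBuffer's methods are synchronized, but the buffer below is created, used, and discarded inside greet(), so no other thread can ever lock it. That is exactly the situation in which a JIT with escape analysis may drop the locking. The source doesn't change, and nothing in the program can observe whether the elision happened.

```java
class LockElisionDemo {
    // sb is created, used, and discarded here; no other thread can ever see it.
    // Every StringBuffer.append() is synchronized, but since that lock can never be
    // contended, a JIT with escape analysis may elide it.
    static String greet(String name) {
        StringBuffer sb = new StringBuffer();
        sb.append("Hello, ");
        sb.append(name);
        sb.append('!');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greet("JVM")); // Hello, JVM!
    }
}
```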

Biased Locking - most locks are never accessed by more than one thread, and even when multiple threads do share data, access is rarely contended. Long story short, the lock is biased toward the thread that first acquires it: that thread effectively holds onto the lock until somebody else wants it, which makes its subsequent acquisitions less expensive. Java 6 does this by default now.
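The pattern biased locking targets is very ordinary: a synchronized object that, in practice, only one thread ever touches. The counter below (a hypothetical example, not from the article) is such a case; with biasing, the main thread's repeated acquisitions can skip the atomic compare-and-swap an ordinary lock acquisition needs. This is a HotSpot implementation detail rather than anything visible in the code: it could be turned off with -XX:-UseBiasedLocking, and later JDKs (starting with JDK 15) deprecated biased locking and disabled it by default.

```java
class BiasedCounter {
    private int count;

    // synchronized, but in this program only one thread ever calls these methods,
    // which is the uncontended, single-owner case biased locking is designed for.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    public static void main(String[] args) {
        BiasedCounter counter = new BiasedCounter();
        for (int i = 0; i < 1_000_000; i++) {
            counter.increment(); // same thread, same lock, every time
        }
        System.out.println(counter.get()); // 1000000
    }
}
```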

Lock Coarsening (or merging) - occurs when adjacent synchronized blocks that use the same lock can be merged into one, for example when calling a series of StringBuffer append() operations. Locks taken inside a loop are not coarsened, because the lock would (potentially) be held for too long.
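A sketch of the idea, using hypothetical names. Because the StringBuffer here is stored in a field, escape analysis can't prove it's thread-local, so elision doesn't apply and each append() really acquires the buffer's monitor. record() as written does three acquire/release pairs on the same lock; recordCoarsened() writes out by hand what coarsening effectively produces, a single acquire/release around all three calls. The JIT performs the merge on its own; the hand-written version is only there to show the result.

```java
class LockCoarseningDemo {
    // Stored in a field, so escape analysis can't prove the buffer is thread-local;
    // each append() really acquires the buffer's monitor.
    private final StringBuffer log = new StringBuffer();

    // As written: three adjacent acquire/release pairs on the same lock (log).
    void record(String a, String b, String c) {
        log.append(a);
        log.append(b);
        log.append(c);
    }

    // What lock coarsening effectively turns record() into: one acquire/release pair.
    // StringBuffer's methods synchronize on the buffer itself, so the appends inside
    // this block are reentrant acquisitions of a monitor this thread already holds.
    void recordCoarsened(String a, String b, String c) {
        synchronized (log) {
            log.append(a);
            log.append(b);
            log.append(c);
        }
    }

    String contents() {
        return log.toString();
    }

    public static void main(String[] args) {
        LockCoarseningDemo demo = new LockCoarseningDemo();
        demo.record("a", "b", "c");
        demo.recordCoarsened("d", "e", "f");
        System.out.println(demo.contents()); // abcdef
    }
}
```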

Thread suspending versus spinning - When a thread waits for a lock, it is usually suspended by the OS. This involves taking it off the CPU, rescheduling it later, etc. However, profiling shows that most locks are held for very brief periods, so if the second thread just waits a little while without being suspended, it can probably acquire the lock it wants. To wait, it just goes into a busy loop, which is known as spin locking. Spinning was introduced in Java 1.4.2 with a default (fixed) spin of 10 iterations before suspending the thread.
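To illustrate spin-then-suspend, here is a toy lock built on an AtomicBoolean. It is not how HotSpot implements monitors; the class name and structure are assumptions for illustration, and Thread.onSpinWait() needs Java 9 or later. lock() first tries to grab the lock a fixed number of times (10, matching the Java 1.4.2 default described above) in a busy loop, and only if that fails does it block in wait() until an unlock() wakes it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy lock: spin a fixed number of times, then block. Not HotSpot's monitor implementation.
final class SpinThenBlockLock {
    static final int SPIN_LIMIT = 10; // fixed spin count, as in the Java 1.4.2 default

    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final Object waiters = new Object(); // monitor used only to block and wake threads

    void lock() {
        // Phase 1: busy-wait a bounded number of times, hoping the owner releases soon.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (locked.compareAndSet(false, true)) {
                return;
            }
            Thread.onSpinWait(); // hint to the CPU that this is a spin loop (Java 9+)
        }
        // Phase 2: stop spinning and block until an unlock() wakes this thread.
        boolean interrupted = false;
        synchronized (waiters) {
            while (!locked.compareAndSet(false, true)) {
                try {
                    waiters.wait();
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting; restore the flag afterwards
                }
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    void unlock() {
        locked.set(false);
        synchronized (waiters) {
            waiters.notifyAll(); // wake blocked threads so they retry the compareAndSet
        }
    }

    boolean isLocked() {
        return locked.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SpinThenBlockLock lock = new SpinThenBlockLock();
        int[] counter = {0};
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                lock.lock();
                try {
                    counter[0]++;
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter[0]); // 20000
    }
}
```

The while loop around wait() re-checks the lock after every wakeup, which covers spurious wakeups and the case where a spinning thread grabbed the lock first.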

Adaptive Spinning - the spin duration is no longer fixed; instead it follows a policy based on previous spin attempts on the same lock and on the state of the lock owner. If spinning is likely to succeed, the thread will spin for more iterations (say, 100); otherwise, it will bail on spinning altogether and suspend.
Introduced in Java 6.
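The adaptive part can be sketched as a small per-lock policy object. This is a hypothetical illustration, not HotSpot's actual heuristic: the budget doubles after a successful spin (capped at the "say, 100" figure above), halves after a failed one, drops to 0 (suspend immediately) once it falls below the old fixed value of 10, and is ignored entirely when the lock owner isn't running, since an owner that is off the CPU can't release the lock soon.

```java
// Hypothetical per-lock spin policy; HotSpot's real heuristics differ.
final class AdaptiveSpinPolicy {
    static final int MIN_SPINS = 10;  // the old fixed spin count; below this, stop spinning
    static final int MAX_SPINS = 100; // the "say, 100" upper bound from the text

    private int budget = MIN_SPINS;   // current spin budget for this lock; 0 means "don't spin"

    // How many spin iterations to try before suspending on the next contended acquire.
    int spinsFor(boolean ownerIsRunning) {
        // An owner that isn't running on a CPU can't release the lock soon, so don't spin.
        return ownerIsRunning ? budget : 0;
    }

    // Feed back whether the last spin attempt on this lock acquired it.
    void record(boolean acquiredWhileSpinning) {
        if (acquiredWhileSpinning) {
            // Spinning paid off: spin longer next time, up to MAX_SPINS.
            budget = Math.min(MAX_SPINS, Math.max(MIN_SPINS, budget * 2));
        } else {
            // Spinning was wasted: back off, and give up entirely below MIN_SPINS.
            int halved = budget / 2;
            budget = halved < MIN_SPINS ? 0 : halved;
        }
    }
    // Note: once budget reaches 0, nothing here re-enables spinning; a real policy
    // would need some way to retry later (for example, periodic probing).

    public static void main(String[] args) {
        AdaptiveSpinPolicy policy = new AdaptiveSpinPolicy();
        policy.record(true);
        policy.record(true);
        System.out.println(policy.spinsFor(true));  // 40
        System.out.println(policy.spinsFor(false)); // 0
    }
}
```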