Java™ HotSpot Virtual Machine Performance Enhancements
NUMA Collector Enhancements
The Parallel Scavenger garbage collector has been extended to take advantage of machines with a NUMA (Non-Uniform Memory Access) architecture. Many modern multiprocessor machines are based on a NUMA architecture, in which the time needed to access memory depends on which part of memory is accessed. Typically, every processor in the system has local memory that provides low access latency and high bandwidth, and remote memory that is considerably slower to access.
In the Java HotSpot Virtual Machine, the NUMA-aware allocator has been implemented to take advantage of such systems and provide automatic memory placement optimizations for Java applications. The allocator controls the eden space of the young generation of the heap, where most new objects are created. The allocator divides the space into regions, each of which is placed in the memory of a specific node. The allocator relies on the hypothesis that the thread that allocates an object is the thread most likely to use it. To ensure the fastest access to a new object, the allocator places it in the region local to the allocating thread. The regions can be dynamically resized to reflect the allocation rate of the application threads running on different nodes. This can improve the performance of even single-threaded applications. In addition, the "from" and "to" survivor spaces of the young generation, the old generation, and the permanent generation have page interleaving turned on. This ensures that, on average, all threads have equal access latencies to these spaces.
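The resizing idea described above can be pictured with a small toy model. The sketch below is only an illustration, assuming that each node's share of eden is made proportional to the allocation rate observed on that node; the class name EdenRegionSketch, the method resizeRegions, and the sample numbers are hypothetical and do not reflect HotSpot's actual sizing policy.

```java
import java.util.Arrays;

public class EdenRegionSketch {

    /**
     * Toy model: split edenBytes among NUMA nodes in proportion to the
     * allocation rate seen on each node. Not HotSpot's real algorithm.
     */
    public static long[] resizeRegions(long edenBytes, long[] allocRatePerNode) {
        int nodes = allocRatePerNode.length;
        if (nodes == 0) {
            throw new IllegalArgumentException("at least one node is required");
        }
        long totalRate = Arrays.stream(allocRatePerNode).sum();
        long[] sizes = new long[nodes];
        if (totalRate == 0) {
            // No allocation observed yet: fall back to an even split.
            Arrays.fill(sizes, edenBytes / nodes);
            return sizes;
        }
        for (int i = 0; i < nodes; i++) {
            // Integer arithmetic; assumes edenBytes * rate fits in a long.
            sizes[i] = edenBytes * allocRatePerNode[i] / totalRate;
        }
        return sizes;
    }

    public static void main(String[] args) {
        long eden = 1024L * 1024 * 1024; // hypothetical 1 GiB eden
        // Threads on node 0 allocate three times as fast as those on node 1;
        // nodes 2 and 3 are idle.
        long[] rates = {3, 1, 0, 0};
        System.out.println(Arrays.toString(resizeRegions(eden, rates)));
        // prints [805306368, 268435456, 0, 0]
    }
}
```

In this model, a single-threaded application ends up with nearly all of eden in the region local to its one allocating thread, which matches the intuition behind the single-threaded benefit mentioned above.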
The NUMA-aware allocator is available on the Solaris™ operating system starting in Solaris 9 12/02 and on the Linux operating system starting in Linux kernel 2.6.19 and glibc 2.6.1.
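The NUMA-aware allocator can be turned on with the -XX:+UseNUMA flag in conjunction with the selection of the Parallel Scavenger garbage collector. The Parallel Scavenger garbage collector is the default for a server-class machine. It can also be turned on explicitly by specifying the -XX:+UseParallelGC option. The -XX:+UseNUMA flag was added in Java SE 6u2.
Note: There was a known bug in the Linux kernel that may cause the JVM to crash when run with -XX:+UseNUMA. The bug was fixed in 2012, so it should not affect the latest versions of the Linux kernel. To see if your kernel has this bug, you can run the native reproducer.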
NUMA Performance Metrics
When evaluated against the SPEC JBB 2005 benchmark on an 8-chip Opteron machine, NUMA-aware systems showed the following performance increases:
32-bit: about 30 percent increase in performance with the NUMA-aware allocator
64-bit: about 40 percent increase in performance with the NUMA-aware allocator
-XX:+UseNUMA
Enables a JVM heap allocation policy for NUMA systems that reduces the time it takes to fetch data from memory. The policy exploits the relationship between processors and memory nodes by allocating objects in a memory node local to the processor.
Introduced in Java 6 Update 2. As of this writing, it is available with the throughput collector only (-XX:+UseParallelOldGC and -XX:+UseParallelGC). On Oracle Solaris, multiple-JVM deployments in which a JVM spans more than one processor/memory node should also set lgrp_mem_pset_aware=1 in /etc/system.
Linux additionally requires use of the numactl command. Use numactl --interleave=all for single-JVM deployments. For multiple-JVM deployments in which a JVM spans more than one processor/memory node, use numactl --cpunodebind= and --membind=, each followed by the list of nodes assigned to that JVM.
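As a rough illustration of the two Linux cases, the sketch below only assembles the command lines and prints them. The class name NumactlLauncherSketch, the node list "0,1", and app.jar are hypothetical placeholders; the numactl options used are --interleave, --cpunodebind, and --membind.

```java
import java.util.ArrayList;
import java.util.List;

public class NumactlLauncherSketch {

    /** Single-JVM deployment: interleave the process's memory across all nodes. */
    public static List<String> singleJvmCommand(List<String> javaCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("numactl");
        cmd.add("--interleave=all");
        cmd.addAll(javaCommand);
        return cmd;
    }

    /** Multi-JVM deployment: bind one JVM's CPUs and memory to a node list such as "0,1". */
    public static List<String> pinnedJvmCommand(String nodes, List<String> javaCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("numactl");
        cmd.add("--cpunodebind=" + nodes);
        cmd.add("--membind=" + nodes);
        cmd.addAll(javaCommand);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical application; only the two -XX flags come from the text.
        List<String> java = List.of(
                "java", "-XX:+UseParallelGC", "-XX:+UseNUMA", "-jar", "app.jar");

        System.out.println(String.join(" ", singleJvmCommand(java)));
        System.out.println(String.join(" ", pinnedJvmCommand("0,1", java)));

        // To launch for real instead of printing, one option is:
        // new ProcessBuilder(singleJvmCommand(java)).inheritIO().start();
    }
}
```

Building the argument list separately from launching keeps the sketch safe to run on machines where numactl is not installed.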
Windows on AMD hardware additionally requires enabling node interleaving in the BIOS for single-JVM deployments. All Windows multiple-JVM deployments in which a JVM spans more than one processor/memory node should use processor affinity, set with the SET AFFINITY [mask] command.
-XX:+UseNUMA is useful in JVM deployments that span processor/memory nodes on a NUMA system.
-XX:+UseNUMA should not be used in JVM deployments where the JVM does not span processor/memory nodes.
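When checking a deployment, it can help to confirm whether a running JVM was actually started with the flag. The following is a minimal sketch using the standard RuntimeMXBean.getInputArguments() call; the class name NumaFlagCheck and the helper isNumaRequested are made up for illustration. It only inspects the arguments the JVM reports and says nothing about whether the operating system actually exposes more than one memory node.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class NumaFlagCheck {

    /**
     * Returns true if the last UseNUMA switch in the argument list enables it.
     * A later -XX:-UseNUMA overrides an earlier -XX:+UseNUMA, and vice versa.
     */
    public static boolean isNumaRequested(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+UseNUMA")) {
                enabled = true;
            } else if (arg.equals("-XX:-UseNUMA")) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, excluding the arguments to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("UseNUMA requested: " + isNumaRequested(jvmArgs));
    }
}
```

Run it once as java -XX:+UseParallelGC -XX:+UseNUMA NumaFlagCheck and once without the flags to compare the two results.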
http://www.techpaste.com/2012/02/java-command-line-options-jvm-performance-improvement/