javaewah - A compressed alternative to the Java BitSet class - Google Project Hosting

The bit array data structure is implemented in Java as the BitSet class. Unfortunately, BitSet does not compress its data, so its memory usage grows with the largest set index, and it fails to scale to large, sparse bitmaps.
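To see why an uncompressed bitmap does not scale, the following sketch (standard library only; the class and method names are hypothetical, chosen for illustration) counts how many 64-bit words java.util.BitSet keeps when a single bit is set. The count grows with the index of that bit, not with the number of set bits.

```java
import java.util.BitSet;

class BitSetFootprint {
    // Number of 64-bit words a fresh java.util.BitSet holds when only bit 'index' is set.
    static int wordsUsed(int index) {
        BitSet bits = new BitSet();
        bits.set(index);
        // toLongArray() returns every word up to and including the last nonzero one
        return bits.toLongArray().length;
    }

    public static void main(String[] args) {
        for (int index : new int[] {63, 64, 1 << 20}) {
            System.out.println("bit " + index + " -> " + wordsUsed(index) + " words");
        }
    }
}
```

A single set bit at position 1 << 20 already costs 16,385 words. With bit 1<<30 alone, as in the usage example further down, BitSet would need 2^24 + 1 words, about 128 MiB.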
JavaEWAH is a word-aligned compressed variant of the Java BitSet class. It uses a 64-bit run-length encoding (RLE) compression scheme. We trade off some compression for better processing speed. We also have a 32-bit version, which compresses better but is not as fast.
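The core idea of a word-aligned RLE scheme such as EWAH can be sketched in a few lines: the bitmap is split into 64-bit words, each run of clean words (all zeros or all ones) is replaced by a single marker word recording the run's value, its length and the number of dirty words that follow, and the dirty words are stored verbatim. The sketch below is a simplified illustration with a hypothetical marker layout (1 bit for the run value, 32 bits for the run length, 31 bits for the literal count); the class name EwahSketch is ours, and this is not JavaEWAH's code, nor necessarily its exact format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EwahSketch {
    static final long ALL_ZEROS = 0L;
    static final long ALL_ONES = -1L;

    // Hypothetical marker layout (an assumption for illustration):
    // bit 0 = value of the clean run, bits 1..32 = run length in words,
    // bits 33..63 = number of literal (dirty) words stored after the marker.
    static long marker(boolean runBit, long runLength, long literalCount) {
        return (runBit ? 1L : 0L) | (runLength << 1) | (literalCount << 33);
    }

    static long[] encode(long[] words) {
        List<Long> out = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            // A run of clean words sharing the same value (possibly empty).
            boolean runBit = words[i] == ALL_ONES;
            long clean = runBit ? ALL_ONES : ALL_ZEROS;
            long runLength = 0;
            while (i < words.length && words[i] == clean) { runLength++; i++; }
            // The dirty (mixed 0/1) words that follow are kept verbatim.
            int start = i;
            while (i < words.length && words[i] != ALL_ZEROS && words[i] != ALL_ONES) i++;
            out.add(marker(runBit, runLength, i - start));
            for (int k = start; k < i; k++) out.add(words[k]);
        }
        return out.stream().mapToLong(Long::longValue).toArray();
    }

    static long[] decode(long[] encoded) {
        List<Long> out = new ArrayList<>();
        int i = 0;
        while (i < encoded.length) {
            long m = encoded[i++];
            long clean = (m & 1L) != 0 ? ALL_ONES : ALL_ZEROS;
            long runLength = (m >>> 1) & 0xFFFFFFFFL;
            long literalCount = m >>> 33;
            for (long r = 0; r < runLength; r++) out.add(clean);
            for (long k = 0; k < literalCount; k++) out.add(encoded[i++]);
        }
        return out.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        long[] words = {0L, 0L, 0L, 0L, 5L, -1L, -1L, 7L};
        long[] encoded = encode(words);
        System.out.println(words.length + " words -> " + encoded.length + " words");
        System.out.println(Arrays.equals(words, decode(encoded)));
    }
}
```

Running main prints `8 words -> 4 words` and `true`. Because runs and literals stay aligned on 64-bit word boundaries, both encoding and decoding work a whole word at a time, never bit by bit.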
In general, the goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, possibly at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (as implemented in the BitSet class). Unlike some alternatives, JavaEWAH does not rely on a patented scheme.
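Word alignment is what lets a query save CPU cycles: when both operands hold a clean run at the same position, the result of a logical operation over the whole run is known at once. The sketch below illustrates this for OR on a simplified token list, where a token is either a fill covering many identical clean words or a single literal word. The Token/Cursor representation and all names are hypothetical, not JavaEWAH's internals; it needs Java 16+ for records, and for brevity it only takes the fast path when both sides are fills (a fuller implementation would also skip or copy whole runs when just one side is a fill).

```java
import java.util.ArrayList;
import java.util.List;

class RunOrSketch {
    // Hypothetical token: a fill of 'count' identical clean words (all 0s or all 1s),
    // or a single literal (dirty) 64-bit word.
    record Token(boolean isFill, boolean fillBit, long count, long literal) {
        static Token ofFill(boolean bit, long count) { return new Token(true, bit, count, 0L); }
        static Token ofLiteral(long word) { return new Token(false, false, 1L, word); }
    }

    // Walks a token list; 'left' = words not yet consumed from the current token.
    static final class Cursor {
        private final List<Token> tokens;
        private int i = 0;
        long left;
        Cursor(List<Token> tokens) {
            this.tokens = tokens;
            this.left = tokens.isEmpty() ? 0 : tokens.get(0).count();
        }
        boolean done() { return i >= tokens.size(); }
        Token cur() { return tokens.get(i); }
        void advance(long n) {
            left -= n;
            if (left == 0 && ++i < tokens.size()) left = tokens.get(i).count();
        }
    }

    // Appends a token, turning clean literals into fills and merging adjacent equal fills.
    static void append(List<Token> out, Token t) {
        if (!t.isFill() && (t.literal() == 0L || t.literal() == -1L)) {
            t = Token.ofFill(t.literal() == -1L, 1);
        }
        if (t.isFill() && !out.isEmpty()) {
            Token last = out.get(out.size() - 1);
            if (last.isFill() && last.fillBit() == t.fillBit()) {
                out.set(out.size() - 1, Token.ofFill(t.fillBit(), last.count() + t.count()));
                return;
            }
        }
        out.add(t);
    }

    static List<Token> or(List<Token> a, List<Token> b) {
        List<Token> out = new ArrayList<>();
        Cursor x = new Cursor(a), y = new Cursor(b);
        while (!x.done() && !y.done()) {
            Token tx = x.cur(), ty = y.cur();
            if (tx.isFill() && ty.isFill()) {
                // Two clean runs overlap: OR them in one step, however many words they cover.
                long n = Math.min(x.left, y.left);
                append(out, Token.ofFill(tx.fillBit() || ty.fillBit(), n));
                x.advance(n);
                y.advance(n);
            } else {
                // At least one side is a literal: OR a single 64-bit word.
                long wx = tx.isFill() ? (tx.fillBit() ? -1L : 0L) : tx.literal();
                long wy = ty.isFill() ? (ty.fillBit() ? -1L : 0L) : ty.literal();
                append(out, Token.ofLiteral(wx | wy));
                x.advance(1);
                y.advance(1);
            }
        }
        // OR with an implicit run of zeros: copy whatever remains of the longer input.
        for (Cursor c : new Cursor[] {x, y}) {
            while (!c.done()) {
                Token t = c.cur();
                append(out, t.isFill() ? Token.ofFill(t.fillBit(), c.left) : t);
                c.advance(c.left);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // a: 1000 zero words, then one literal; b: 500 one-words, 500 zero-words, then one literal
        List<Token> a = List.of(Token.ofFill(false, 1000), Token.ofLiteral(0b101));
        List<Token> b = List.of(Token.ofFill(true, 500), Token.ofFill(false, 500), Token.ofLiteral(0b010));
        System.out.println(or(a, b));
    }
}
```

In this example each input covers 1,001 words, yet the loop runs only three times: the two 500-word clean runs are each combined in a single step, and only the final literal words are ORed word by word.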
The library includes exhaustive unit tests. As of November 2011, Apache Maven is used to build and validate the releases. (Building the source code directly with javac remains trivial.)
Usage:
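import com.googlecode.javaewah.EWAHCompressedBitmap; // package name used by recent releases

// set bits at positions 0, 2, 64, and 1<<30 to "true" (all others are false)
EWAHCompressedBitmap ewahBitmap1 = EWAHCompressedBitmap.bitmapOf(0, 2, 64, 1 << 30);
// set bits at positions 1, 3, 64, and 1<<30
EWAHCompressedBitmap ewahBitmap2 = EWAHCompressedBitmap.bitmapOf(1, 3, 64, 1 << 30);
// exclusive or: keeps the bits set in exactly one of the two bitmaps
EWAHCompressedBitmap xorbitmap = ewahBitmap1.xor(ewahBitmap2);
// which bits are set? here: 0, 1, 2, 3
int[] setbits = xorbitmap.toArray();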

See a more complete example below.
Alternatives:
You can compare JavaEWAH with competing Java libraries. Results may vary depending on your application, but here are some recent results we obtained:
Time required to compute the logical OR (union) of many bitmaps (lower values are better):