Jdk 1.6 JUCソース解析（25）-ConcerenthashMap

20412 ワード

もっと読む
Jdk 1.6 JUCソース解析（25）-ConcerenthashMap
作者：大飛

機能概要：

ConccurrenthashMapは、スレッドの安全なHashMapである。HashTableとCollections.synchronized Mapに対して、ConcerenthashMapはより良い性能と伸縮性を持っています。これはセグメントロックの策略を使って、内部データを複数のセグメントに分けて、各セグメントごとに単独でロックをかけて、HashMap全体のロックではなく、多くの不要なロックを減らすことができます。

ソース分析:

ConcerenthashMapはConccurrentMapインターフェースを実現しました。まず、このインターフェースを簡単に理解してください。

public interface ConcurrentMap extends Map {
    /**
     *   map        key，  map key   value；
     *         key，     key value。
     *          ，     ：
     *   if (!map.containsKey(key))
     *       return map.put(key, value);
     *   else
     *       return map.get(key);
     */
    V putIfAbsent(K key, V value);
    /**
     *   map      key，  map    value      value，
     *       key value。
     *         ，     ：
     *   if (map.containsKey(key) && map.get(key).equals(value)) {
     *       map.remove(key);
     *       return true;
     *   } else return false;
     */
    boolean remove(Object key, Object value);
    /**
     *   map      key，  map    value      oldValue，
     *      key   value   newValue。
     *         ，     ：
     *   if (map.containsKey(key) && map.get(key).equals(oldValue)) {
     *       map.put(key, newValue);
     *       return true;
     *   } else return false;
     */
    boolean replace(K key, V oldValue, V newValue);
    /**
     *   map        key， 
     *      key   value      value。
     *         ，     ：
     *   if (map.containsKey(key)) {
     *       return map.put(key, value);
     *   } else return null;
     */
    V replace(K key, V value);
}

ConccurrentMapはMapインターフェースを拡張し，上の4つの原子操作法を定義した。

次に、ConcerenthashMapの内部構造を見ます。

    /**
     * segment  ,    segment    hash table。
     */
    final Segment[] segments;

segmentの実現を重点的に見ましょう。まずデータ構造を見ます。

    static final class Segment extends ReentrantLock implements Serializable {
        private static final long serialVersionUID = 2249069246763182397L;
        /**
         *   segment(   )      。
         *                 count volatile        ，     。
         */
        transient volatile int count;
        /**
         *       ，              。
         *       segment     ，        modCount  
         *       。
         */
        transient int modCount;
        /**
         *               ，    ，          。
         *       ：capacity * loadFactor
         */
        transient int threshold;
        /**
         *         。
         */
        transient volatile HashEntry[] table;
        /**
         *         。
         * @serial
         */
        final float loadFactor;

segmentの構造方法を見てみます。

        Segment(int initialCapacity, float lf) {
            loadFactor = lf;
            setTable(HashEntry.newArray(initialCapacity));
        }
        
        void setTable(HashEntry[] newTable) {
            threshold = (int)(newTable.length * loadFactor);
            table = newTable;
        }
    static final class HashEntry {
        final K key;
        final int hash;
        volatile V value;
        final HashEntry next;
        HashEntry(K key, int hash, HashEntry next, V value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }
	    @SuppressWarnings("unchecked")
	    static final  HashEntry[] newArray(int i) {
	        return new HashEntry[i];
	    }
    }

表示されます。segmentを作成するには初期容量とローディング係数が必要です。segment内部には初期容量サイズのHash Entry配列が作成されます。

ハッシュ・テーブルのデータ構造に詳しいなら、ハッシュ・テーブルの内部には、一般的に初期容量icとローディング係数lfがあり、ハッシュ・テーブルの要素数が（ic＊lf）に達すると、ハッシュ・テーブルがrehashをトリガすることが分かる。これは何の影響がありますか？ハッシュ・テーブルがチェーン・テーブルを使用してハッシュ・衝突を解決すると仮定すると、ロードファクタが大きすぎると、ハッシュ・テーブル内の各バケット内のチェーン・テーブルの平均長さが長すぎると、クエリ・性能に影響を与える。しかし、もし負荷因子が小さすぎると、メモリ空間がもったいないです。したがって、時間と空間のバランスであり、実際の状況に応じて適切な負荷係数を選択する必要があります。

最後にConcerenthashMapの構造方法を見ます。

    /* ---------------- Constants -------------- */
    //  segment hashTable  。
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    //      。
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    //  table     ，    segment    。
    static final int DEFAULT_CONCURRENCY_LEVEL = 16;
    //table     。
    static final int MAXIMUM_CAPACITY = 1 << 30;
    //      segment    。
    static final int MAX_SEGMENTS = 1 << 16; 
    /**
     *  size containsValue   ，           。
     */
    static final int RETRIES_BEFORE_LOCK = 2;
    /* ---------------- Fields -------------- */
    /**
     *   segment     。  key hash code  ( segmentShift  )    segment  。
     */
    final int segmentMask;
    /**
     * segment      。
     */
    final int segmentShift;

    public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (concurrencyLevel > MAX_SEGMENTS)
            concurrencyLevel = MAX_SEGMENTS; //concurrencyLevel       
        // Find power-of-two sizes best matching arguments
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) { 
            ++sshift;
            ssize <<= 1; //ssize    concurrencyLevel     2  。
        }
        /*
         *      concurrencyLevel 50，
         *   ssize  64，sshift  6，segmentMask   00000000 00000000 00000000 00111111*/
        segmentShift = 32 - sshift;
        segmentMask = ssize - 1; 
        this.segments = Segment.newArray(ssize);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity)
            ++c;
        int cap = 1;
        while (cap < c)
            cap <<= 1; //cap               segment        2  ...   ，
        for (int i = 0; i < this.segments.length; ++i)
            this.segments[i] = new Segment(cap, loadFactor); // segment      。
    }

    public ConcurrentHashMap(int initialCapacity, float loadFactor) {
        this(initialCapacity, loadFactor, DEFAULT_CONCURRENCY_LEVEL);
    }

    public ConcurrentHashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL);
    }

構造方法でいくつかの重要な数が計算されていることが見られます。segmentマスク、segmentシフト値、segment配列長、segment内部ハッシュテーブル容量は、注意後の2つとも2のべきものです。何を考えていますか？a&(b-1)ですね。

ここでまとめます。ConcerentMapの内部にsegmentの配列が含まれています。また、segment自体はハッシュ表であり、ロックを持参している。内部ハッシュ・テーブルは、チェーン・テーブルを使用して、ハッシュ・衝突を解決し、各配列要素は、シングル・チェーン・テーブルである。

今は挿入と取得から操作を入力し、ソースを理解します。まず挿入操作を見てください。

    public V put(K key, V value) {
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key.hashCode());
        return segmentFor(hash).put(key, hash, value, false);
    }

注意してください。putの過程で、まずkeyのhashCodeに基づいて、再度hash値を計算します。次に、このhash値に基づいてsegmentを確定し、key-valueをこのsegmentに保存します。

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because ConcurrentHashMap uses power-of-two length hash tables,
     * that otherwise encounter collisions for hashCodes that do not
     * differ in lower or upper bits.
     */
    private static int hash(int h) {
        // Spread bits to regularize both segment and index locations,
        // using variant of single-word Wang/Jenkins hash.
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }
    /**
     * Returns the segment that should be used for key with given hash
     * @param hash the hash code for the key
     * @return the segment
     */
    final Segment segmentFor(int hash) {
        return segments[(hash >>> segmentShift) & segmentMask];
    }

まず、このhashアルゴリズムを見てください。これはkey自身のhashCodeで強化されたものに相当します。もう一度hashして、hash値がよりハッシュされます。このようにする理由は、ConcerenthashMapにおけるハッシュテーブルの長さは、2つのhashCodeの高位が異なるが、低位が同じであるなど、いくつかの衝突確率が増加するためであり、ハッシュテーブルの長さがモードを取る時には、これらの高位を無視して衝突を引き起こす。ここはWang/Jenkinsのハッシュアルゴリズムを採用した一種の変種であり、より多くの関連情報はgoogleのものである。
次にsegmentを確定するステップです。上のConcerenthashMapの構造方法では、shhiftとsegment Maskが関係しています。もしsshift=6なら、segment Maskの後ろには6ビットがあります。実はここはsh値で低shhift位の残りの高位を除いてsegmentの下付きを確定します。

segmentに位置しています。segmentのどのようなput元素を見続けていますか？

        V put(K key, int hash, V value, boolean onlyIfAbsent) {
            lock();//   
            try {
                int c = count;
                if (c++ > threshold) // ensure capacity
                    rehash(); //          ，      ，    rehash。
                HashEntry[] tab = table;
                int index = hash & (tab.length - 1); //  hash    key           。
                HashEntry first = tab[index]; //          
                HashEntry e = first;
                while (e != null && (e.hash != hash || !key.equals(e.key)))
                    e = e.next; //           ，        。
                V oldValue;
                if (e != null) {
                    //         ，    
                    oldValue = e.value;
                    if (!onlyIfAbsent) //        
                        e.value = value; //        
                }
                else {
                    //         。
                    oldValue = null;
                    ++modCount; //             ，  modCount  。
                    //                。
                    tab[index] = new HashEntry(key, hash, first, value); 
                    count = c; //         volatile 。
                }
                return oldValue; //     。
            } finally {
                unlock(); //   。
            }
        }

コードも分かりやすいです。rehashの場合に注意して、rehashの方法を見てください。

        void rehash() {
            HashEntry[] oldTable = table;
            int oldCapacity = oldTable.length;
            if (oldCapacity >= MAXIMUM_CAPACITY)
                return; //        。
            /*
             *                       。
             *         2  ，                   
             *      ，             ，       。
             *       ，         ，             
             *             (      )。
             */
            HashEntry[] newTable = HashEntry.newArray(oldCapacity<<1);
            threshold = (int)(newTable.length * loadFactor);
            int sizeMask = newTable.length - 1;
            for (int i = 0; i < oldCapacity ; i++) {
                HashEntry e = oldTable[i];
                if (e != null) {
                    HashEntry next = e.next;
                    int idx = e.hash & sizeMask;
                    if (next == null)
                        newTable[idx] = e; //         ，      table。
                    else {
                        //             table      ，       。
                        HashEntry lastRun = e;
                        int lastIdx = idx;
                        for (HashEntry last = next;
                             last != null;
                             last = last.next) {
                            int k = last.hash & sizeMask;
                            if (k != lastIdx) {
                                lastIdx = k;
                                lastRun = last;
                            }
                        }
                        newTable[lastIdx] = lastRun;
                        //       copy   。
                        for (HashEntry p = e; p != lastRun; p = p.next) {
                            int k = p.hash & sizeMask;
                            HashEntry n = newTable[k];
                            newTable[k] = new HashEntry(p.key, p.hash,
                                                             n, p.value);
                        }
                    }
                }
            }
            table = newTable;
        }

putの実現は見終わって、引き続きConcerenthashMapからgetの実現を見ます。

    public V get(Object key) {
        int hash = hash(key.hashCode());
        return segmentFor(hash).get(key, hash);
    }

フローはまずsh値を計算してからsegmentに確定してから、segmentのget方法を呼び出します。

        V get(Object key, int hash) {
            if (count != 0) { //      volatile 
                HashEntry e = getFirst(hash); //           。
                while (e != null) {
                    if (e.hash == hash && key.equals(e.key)) {
                        V v = e.value;
                        if (v != null)
                            return v; //      key，  value。
                        return readValueUnderLock(e); // recheck
                    }
                    e = e.next;
                }
            }
            return null;
        }
        HashEntry getFirst(int hash) {
            HashEntry[] tab = table;
            return tab[hash & (tab.length - 1)];
        }
        /**
         *    value。  value nul          。
         *        HashEntry         table        
         *       ，           ，      。
         */
        V readValueUnderLock(HashEntry e) {
            lock();
            try {
                return e.value;
            } finally {
                unlock();
            }
        }

putとgetプロセスを理解しました。他の方法もよく分かりました。

        boolean containsKey(Object key, int hash) {
            if (count != 0) { // read-volatile
                HashEntry e = getFirst(hash);
                while (e != null) {
                    if (e.hash == hash && key.equals(e.key))
                        return true;
                    e = e.next;
                }
            }
            return false;
        }

        V remove(Object key, int hash, Object value) {
            lock();
            try {
                int c = count - 1;
                HashEntry[] tab = table;
                int index = hash & (tab.length - 1);
                HashEntry first = tab[index];
                HashEntry e = first;
                while (e != null && (e.hash != hash || !key.equals(e.key)))
                    e = e.next;
                V oldValue = null;
                if (e != null) {
                    V v = e.value;
                    if (value == null || value.equals(v)) {
                        oldValue = v;
                        // All entries following removed node can stay
                        // in list, but all preceding ones need to be
                        // cloned.
                        ++modCount;
                        HashEntry newFirst = e.next;
                        for (HashEntry p = first; p != e; p = p.next)
                            newFirst = new HashEntry(p.key, p.hash,
                                                          newFirst, p.value);
                        tab[index] = newFirst;
                        count = c; // write-volatile
                    }
                }
                return oldValue;
            } finally {
                unlock();
            }
        }

すべての書き込み操作は最後にcountを書きます。そしてすべての読み取り操作の前でcountを読みます。countはvolatileで修飾されていますので、これはメモリ障壁（volatile書き込みと後ろのvolatile読みをやり直すことはできません。）を加えることに相当します。読み操作は最新の書き込みの変化が見られます。
以上の理解には偏差があります。
ソースのコメントをよく見てください。
All(synchronized)write operations shuld write to the"count"field after structrally change any bin.
つまり、countはbinの構造が変化してから書きます。
ここで訂正します。bin構造を変える書き込み操作は全部countを書いて、Hash Entryの視認性を保証できます。
古い値を上書きする場合はcountは書かれません。Hash Entryのvalue自体もvolatileなので、自分の視認性を保証できます。

上にRETRIES_もあります。BEFOREELOCK値は、この値がどのような役割を果たしているかを見てください。

    public int size() {
        final Segment[] segments = this.segments;
        long sum = 0;
        long check = 0;
        int[] mc = new int[segments.length];
        // Try a few times to get accurate count. On failure due to
        // continuous async changes in table, resort to locking.
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            check = 0;
            sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segments.length; ++i) {
                sum += segments[i].count;
                mcsum += mc[i] = segments[i].modCount;
            }
            if (mcsum != 0) {
                for (int i = 0; i < segments.length; ++i) {
                    check += segments[i].count;
                    if (mc[i] != segments[i].modCount) {
                        check = -1; // force retry
                        break;
                    }
                }
            }
            if (check == sum)
                break;
        }
        if (check != sum) { // Resort to locking all segments
            sum = 0;
            for (int i = 0; i < segments.length; ++i)
                segments[i].lock();
            for (int i = 0; i < segments.length; ++i)
                sum += segments[i].count;
            for (int i = 0; i < segments.length; ++i)
                segments[i].unlock();
        }
        if (sum > Integer.MAX_VALUE)
            return Integer.MAX_VALUE;
        else
            return (int)sum;
    }

上のsizeの過程は、すべてのsegmentのcountを積算して、もし途中でsegmentの要素数が変化したら、やり直します。再試行したらRETRIES_BEFOREELOCK回（デフォルトは2）は全部だめです。segmentをロックして、countを積算してからロックを解除します。containsValueでもこのようにして遊んでいます。コードは付けません。

他のコードも分かりやすくなりました。ここまで分析します。最後に、ConccurrenthashMapもKeyとValueの集合図を提供しています。それらはConcerenthashMapとデータを共有しています。

ConcerenthashMapのコード解析完了！

参照：Jdk 1.6 JUCソース解析（7）-locks-rentrantLock

NHibernate 3.0 rounding a decimal to 5 decimal placces

AutoValue