HashMapのboolean containsKey(Object key)メソッドの時間的複雑さはなぜO(1)なのか.

6742 ワード

最近LeetCodeを塗り始めましたが、アルゴリズムの最初の問題はTwo Sumで、簡単に見えますよね?
Given an array of integers, return indices of the two numbers such that they add up to a specific target. You may assume that each input would have exactly one solution, and you may not use the same element twice. Example: Given nums = [2, 7, 11, 15], target = 9, Because nums[0] + nums[1] = 2 + 7 = 9, return [0, 1].
そこで私の解法も簡単で暴力的でした

public class Solution {
    public int[] twoSum(int[] nums, int target) {
        int[] result = new int[2];
        for (int i = 0; i < nums.length; i++) {
            int a = nums[i];
            for (int j = i + 1; j < nums.length; j++) {
                int b = nums[j];
                if (a + b == target) {
                    result[0] = i;
                    result[1] = j;
                }
            }
        }
        return result;
    }
}

この方法は2層のサイクルをネストし,時間複雑度はそれぞれO(n)であり,従って全時間複雑度はO(n^2)である.Editorial Solutionを開くと,この時間的複雑度はO(n)であり,空間的複雑度はO(n)に増加する解法が見出された.

public int[] twoSum(int[] nums, int target) {
    Map map = new HashMap<>();
    for (int i = 0; i < nums.length; i++) {
        map.put(nums[i], i);
    }
    for (int i = 0; i < nums.length; i++) {
        int complement = target - nums[i];
        if (map.containsKey(complement) && map.get(complement) != i) {
            return new int[] { i, map.get(complement) };
        }
    }
    throw new IllegalArgumentException("No two sum solution");
}

あれ?時間の複雑さはなぜO(n)なのか.HashMapソースを見に来る時だ.まず検索でcontainsKey(Object key)メソッドを見つけます.

    /**
     * Returns true if this map contains a mapping for the
     * specified key.
     *
     * @param   key   The key whose presence in this map is to be tested
     * @return true if this map contains a mapping for the specified
     * key.
     */
    public boolean containsKey(Object key) {
        return getNode(hash(key), key) != null;
    }

containsKeyメソッドはgetNode(hash(key), key)メソッドを呼び出し、結果がnullでなければtrueを返し、そうでなければfalseを返します.getNode(hash(key), key)の方法が問題の鍵です

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node getNode(int hash, Object key) {
        Node[] tab; Node first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

このコードは一見複雑で、原理を理解する鍵は

戻り値Nodeとは何ですか

コード3行目tab = tableのtableは何ですか

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node implements Map.Entry {
        final int hash;
        final K key;
        V value;
        Node next;

        Node(int hash, K key, V value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry,?> e = (Map.Entry,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

Nodeは、KeyとValueが格納されたチェーンテーブルノードであることがわかる.

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node[] table;

tableはNode配列であり、ここでのNodeは、あるhash値チェーンテーブルのヘッダノードである(異なるkeyのhash値は重複し、後続のノードに格納される可能性がある).配列tableの長さは2の倍数であることに留意されたい.getNode(hash(key), key)メソッドに戻ります.まずコードの5行目を見てください.

first = tab[(n - 1) & hash]

そう,配列tableにおけるkey対応ヘッダノードの格納位置,すなわち,(n - 1) & hashというビット演算の結果を示す大きなべき乗を見出した.nがtableの長さ(必ず2の倍数)である場合、n-1はtableの下付きの取値範囲であり、バイナリで表すと1111...、合計log(n)個1.したがって(n - 1) & hashは実際にhashバイナリ形式の後nビット数をとり,配列tableの下付き文字にちょうど対応できる.配列が下付きでNodeにアクセスする時間複雑度はO(1)であり、Nodeがフィールドにアクセスする時間複雑度もO(1)であり、ヘッダノードの後にノードがなければ時間複雑度はO(1)である.ヘッダノードの後にノードが存在する場合,以下のコードでこれらのノードを遍歴し,時間的複雑度はO(1)より大きい.

            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }

同じビット演算値の生成をできるだけ避ける方法については,hashアルゴリズムのことであり,本稿では論じない.実際には良いhashアルゴリズムは平均時間の複雑さをO(1)にすることができる.
これにより,containsKey(Object key)法の時間的複雑さの問題はほぼ解決された.
この文章を書き終わったら、LeetCodeの一番簡単なTwo Sumもこんなにたくさんのことを学ぶことができたので、よくLeetCodeを磨きましょう.

参考資料

https://leetcode.com/articles/two-sum/

JAVA例:フィボナッチ数列

Webリアルタイム通信