Why doesn't String.indexOf in the JDK use lower-complexity algorithms such as KMP or Boyer-Moore?


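Today I was working through problems on LeetCode and ran into a string-search problem, which reminded me of KMP, Boyer-Moore, and similar algorithms I had learned about before. The time complexity of these two, and of algorithms like them, is close to O(n), where n is the length of the text being searched.
Afterwards I looked at how indexOf is implemented in the JDK's String class and found something odd: it simply uses the brute-force approach, the most primitive implementation, whose worst-case time complexity is O(n*m) for a text of length n and a pattern of length m.
The indexOf(String s) method of the String class delegates to the following method.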

    /**
     * Code shared by String and StringBuffer to do searches. The
     * source is the character array being searched, and the target
     * is the string being searched for.
     *
     * @param   source       the characters being searched.
     * @param   sourceOffset offset of the source string.
     * @param   sourceCount  count of the source string.
     * @param   target       the characters being searched for.
     * @param   targetOffset offset of the target string.
     * @param   targetCount  count of the target string.
     * @param   fromIndex    the index to begin searching from.
     */
    static int indexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
        if (fromIndex >= sourceCount) {
            return (targetCount == 0 ? sourceCount : -1);
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (targetCount == 0) {
            return fromIndex;
        }

        char first = target[targetOffset];
        int max = sourceOffset + (sourceCount - targetCount);

        for (int i = sourceOffset + fromIndex; i <= max; i++) {
            /* Look for first character. */
            if (source[i] != first) {
                while (++i <= max && source[i] != first);
            }

            /* Found first character, now look at the rest of v2 */
            if (i <= max) {
                int j = i + 1;
                int end = j + targetCount - 1;
                for (int k = targetOffset + 1; j < end && source[j]
                        == target[k]; j++, k++);

                if (j == end) {
                    /* Found whole string. */
                    return i - sourceOffset;
                }
            }
        }
        return -1;
    }
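
To make the O(n*m) worst case concrete, here is a minimal sketch of the same brute-force idea with a comparison counter added. It is a simplified re-implementation for illustration, not the JDK code above; the class name NaiveSearchSketch and the counter parameter are hypothetical.

```java
class NaiveSearchSketch {
    // Brute-force search: try every start position i and compare the
    // pattern character by character. counter[0] accumulates the number
    // of character comparisons performed (illustration only).
    static int indexOf(String text, String pattern, long[] counter) {
        int n = text.length();
        int m = pattern.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m) {
                counter[0]++;
                if (text.charAt(i + j) != pattern.charAt(j)) {
                    break; // mismatch: give up on this start position
                }
                j++;
            }
            if (j == m) {
                return i; // every pattern character matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Adversarial input: a long run of 'a' and a pattern that only
        // fails on its last character, so almost every start position
        // costs m comparisons.
        String text = new String(new char[1000]).replace('\0', 'a');
        String pattern = "aaaaaaaaab";
        long[] counter = new long[1];
        int pos = indexOf(text, pattern, counter);
        System.out.println("position=" + pos + " comparisons=" + counter[0]);
    }
}
```

On ordinary text the inner loop usually fails on the first or second character, so the cost stays close to n. The quadratic behavior only shows up on repetitive inputs like the one in main.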

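I googled the question and dug through StackOverflow.
According to the answers there, the original JDK authors reasoned that in most cases the strings involved are not long, so the naive implementation is likely to be cheaper. KMP and Boyer-Moore both need a preprocessing pass to build auxiliary arrays, which costs a certain amount of time and space up front; for short string lookups that overhead can exceed the total cost of the naive implementation. In addition, when programmers need to search very large strings they generally turn to other, purpose-built data structures that make the search easier. This is similar to sorting, where quicksort is the usual choice except in certain special cases: different algorithms are chosen for different circumstances.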
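
The preprocessing cost mentioned above can be seen in a minimal KMP sketch. This is a textbook-style version written for illustration, not anything taken from the JDK; the class name KmpSketch and its method names are hypothetical. Note that every call allocates and fills an int array the size of the pattern before a single character of the text is examined.

```java
class KmpSketch {
    // Preprocessing step: fail[i] is the length of the longest proper
    // prefix of pattern[0..i] that is also a suffix of pattern[0..i].
    static int[] buildFailureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0; // length of the currently matched prefix
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1]; // fall back to a shorter border
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Search step: scans the text once and never moves backwards in it.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fail = buildFailureTable(pattern); // extra O(m) time and space
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1; // start index of the match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "world"));
        System.out.println(indexOf("aaaab", "aab"));
    }
}
```

For a short pattern searched in a short text, the table allocation and the extra branching are pure overhead compared with the plain nested loop, which is the trade-off the StackOverflow answers describe.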
Reference:
http://stackoverflow.com/questions/19543547/why-jdks-string-indexof-does-not-use-kmp
