Javaは辞書ツリーTrie Treeを実現します.

4325 ワード

--1.1.Data Structure

アリさんのオンライン筆記試験を準備するために、この数日はデータ構造を振り返ってみました.辞書の木を見て、四六級の高周波語は辞書の木で見つけられます.
辞書ツリーを作成する過程は以下の通りです.
1.まずツリーノードがどのようなデータ構造を使うべきかを確認します.

public class TrieTreeNode {
	/**
	 *     
	 */
	public short depth;
	/**
	 *             
	 */
	public Map children = new HashMap();
	/**
	 *         
	 */
	public boolean isTail = false;
	/**
	 *     
	 */
	public TrieTreeNode parent;
	/**
	 *    a-z      
	 */
	public char value;
	/**
	 *       ,wordCount++,           
	 */
	public int wordCount = 0;
	/**
	 *        
	 */
	public StringBuilder word = new StringBuilder();

	public TrieTreeNode() {
		// TODO Auto-generated constructor stub
	}

}

2.ノードのparent、children属性によって、各ノードを接続します.各単語の開始時に、まずrootノードを両親のノードとして、単語の中の各文字をサブノードとして構成します.

	/**
	 *              ,       
	 * @param data            ,      .
	 */
	public void createTrieTree(String data) {

		//         
		String[] strArr = data.split(regex);
		//         

		// split  str       
		for (String s : strArr) {
			//        
			if (s.length() > 1) {
				//       ,  root         
				TrieTreeNode parent = root;
				//         
				for (short i = 0; i < s.length(); i++) {
					char c = s.charAt(i);
					//       ,    .
					c = (char) (c < 'a' ? c + 32 : c);
					//     
					TrieTreeNode child = createChildTrieTree(c, i, parent);
					int key = child.value * 100 + child.depth;
					parent.children.put(key, child);
					parent = child;
				}
				//          ,            .
				parent.isTail = true;
				//           wordCount++
				int wordCount = ++parent.wordCount;
				String word = parent.word.toString();
				//            map
				wordCountMap.put(word, new WordAndCount(word, wordCount));
			}
		}
	}

	/**
	 *        
	 * @param c
	 * @param depth
	 * @param parent
	 * @return
	 */
	private TrieTreeNode createChildTrieTree(char c, short depth,
			TrieTreeNode parent) {
		//      ,       c,3               ,    ,    .
		TrieTreeNode child = getChild(c, depth, parent);
		//     ,     ,       .
		if (child == null) {
			child = new TrieTreeNode();

			child.depth = depth;
			child.parent = parent;
			child.value = c;
			//      ,  isTail   ,           .
			child.word.append(parent.word).append(c);
		}

		return child;
	}

	/**
	 *     c,        ,     .
	 * @param c
	 * @param depth
	 * @param parent
	 * @return
	 */
	private TrieTreeNode getChild(char c, short depth, TrieTreeNode parent) {
		// TODO Auto-generated method stub
		//       ,    HashMap         ,key   c depth    .
		Map children = parent.children;
		//           26,        depth   ,       c   ,             key     value
		int key = c * 100 + depth;
		return children.get(key);
	}

3.上記の手順により、辞書ツリーの構造は完了しました.次は読み値です.maker.getWord CountMap()を通じて、すべての単語とその出現回数が得られます.並べ替え表示をしたいなら、Trie TreeMakerのsort方法を呼び出して並べ替えられます.また、特別な需要があれば、get方法でrootノードを取得し、再帰的に巡回します.
最後にテストコードとデータを添付します.

@Test
	public void test() {

		String filePath = "C:\\logs\\file.log";

		File srcFile = new File(filePath);

		System.out.println("File Size:" + srcFile.length() / 1024 / 1024.0
				+ "M");

		long end2, end, start = System.nanoTime();

		TrieTreeMaker maker = new TrieTreeMaker();

		maker.setCapacity(1024 * 1024 * 20);
		//             ,      createTrieTree("xxxx")  .
		maker.obtainDataAndCreateTrieTree(srcFile);

		end = System.nanoTime();

		System.out.println("After Creation is accomplished:"
				+ BigDecimal.valueOf(end - start, 9));
		//true    ,false    
		List> list = maker.sort(
				maker.getWordCountMap(), true);

		end2 = System.nanoTime();

		System.out.println("After Sort is accomplished:"
				+ BigDecimal.valueOf(end2 - end, 9));
		//    
		for (Entry entry : list) {
			System.out.println(entry.getValue().word + ":"
					+ entry.getValue().count);
		}
	}

File Size:15.69628906225 M
After Creation is accompplished:2.812941204 s
After Sort is accompplished:0.00912021 s
org:113792
java:74116
at:71064
スプリングフレームワーク:68740
security:626200
web:53172
apache:37100
struts:33516
….結果が多すぎて、省略します.
ここをクリックしてソースをダウンロードします.

ES 6精通のIteratorインターフェースとfor…ofサイクルパーソナルノート