A strange problem with jobs on a Hadoop cluster


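Today a strange problem showed up while running jobs on the cluster. The cluster monitoring looked like this.
Checking the run status on each node, the task log of a reduce attempt reads as follows.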
2013-12-20 06:38:49,580 [Main Thread] INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@81c4d9c
2013-12-20 06:38:49,697 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=1932735232, MaxSingleShuffleLimit=483183808
2013-12-20 06:38:49,714 [Thread for merging in memory files] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Thread started: Thread for merging in memory files
2013-12-20 06:38:49,716 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Need another 104 map output(s) where 0 is already in progress
2013-12-20 06:38:49,717 [Thread for polling Map Completion Events] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Thread started: Thread for polling Map Completion Events
2013-12-20 06:38:49,718 [Thread for merging on-disk files] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Thread started: Thread for merging on-disk files
2013-12-20 06:38:49,718 [Thread for merging on-disk files] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Thread waiting: Thread for merging on-disk files
2013-12-20 06:38:49,719 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:38:54,721 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 12 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:38:55,271 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 12 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:38:55,277 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 2 outputs (0 slow hosts and64 dup hosts)
2013-12-20 06:38:55,278 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 7 outputs (0 slow hosts and15 dup hosts)
2013-12-20 06:38:55,279 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and56 dup hosts)
2013-12-20 06:38:55,281 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and48 dup hosts)
2013-12-20 06:38:55,283 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and53 dup hosts)
2013-12-20 06:38:55,283 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 4 outputs (0 slow hosts and21 dup hosts)
2013-12-20 06:38:55,283 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 3 outputs (0 slow hosts and43 dup hosts)
2013-12-20 06:38:55,284 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and40 dup hosts)
2013-12-20 06:38:55,284 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 3 outputs (0 slow hosts and17 dup hosts)
2013-12-20 06:38:55,284 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and42 dup hosts)
2013-12-20 06:38:55,284 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 3 outputs (0 slow hosts and29 dup hosts)
2013-12-20 06:38:55,285 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and27 dup hosts)
2013-12-20 06:38:55,285 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and38 dup hosts)
2013-12-20 06:38:55,285 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 4 outputs (0 slow hosts and14 dup hosts)
2013-12-20 06:38:55,286 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 3 outputs (0 slow hosts and7 dup hosts)
2013-12-20 06:38:55,286 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and26 dup hosts)
2013-12-20 06:38:55,287 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 3 outputs (0 slow hosts and13 dup hosts)
2013-12-20 06:38:55,287 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and16 dup hosts)
2013-12-20 06:38:55,290 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 4 outputs (0 slow hosts and2 dup hosts)
2013-12-20 06:38:55,291 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 2 outputs (0 slow hosts and11 dup hosts)
2013-12-20 06:38:55,291 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and11 dup hosts)
2013-12-20 06:38:55,292 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 2 outputs (0 slow hosts and10 dup hosts)
2013-12-20 06:38:55,293 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 2 outputs (0 slow hosts and2 dup hosts)
2013-12-20 06:38:55,293 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 2 outputs (0 slow hosts and2 dup hosts)
2013-12-20 06:38:55,294 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and8 dup hosts)
2013-12-20 06:38:55,294 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and6 dup hosts)
2013-12-20 06:38:55,295 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and7 dup hosts)
2013-12-20 06:38:55,295 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup hosts)
2013-12-20 06:38:55,296 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup hosts)
2013-12-20 06:38:55,297 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup hosts)
2013-12-20 06:38:55,298 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup hosts)
2013-12-20 06:38:55,299 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and5 dup hosts)
2013-12-20 06:38:55,307 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and4 dup hosts)
2013-12-20 06:38:55,349 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts)
2013-12-20 06:38:55,358 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts)
2013-12-20 06:38:55,359 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts)
2013-12-20 06:38:55,497 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:39:01,330 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:39:17,041 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 2 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:39:23,312 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:39:38,313 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2013-12-20 06:39:53,316 [Main Thread] INFO org.apache.hadoop.mapred.ReduceTask: attempt_201311152318_24026_r_000000_0 Need another 8 map output(s) where 0 is already in progress

The task keeps printing this log and stays blocked in the shuffle, which slows down the other tasks on the cluster.
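To see how far the shuffle got before it stalled, the log can be summarized with a small script. The following is only a sketch, not part of Hadoop: the function name `summarize_shuffle` and the `SAMPLE` list are made up for illustration, and the regular expressions assume the message format shown in the log above (including the missing space in "and0 dup hosts"). In the excerpt above, the "Scheduled" counts add up to 96, which matches the drop from 104 needed map outputs to 8.

```python
import re

# Message formats as they appear in the ReduceTask log above (assumed stable).
NEED_RE = re.compile(r"Need another (\d+) map output\(s\)")
SCHED_RE = re.compile(
    r"Scheduled (\d+) outputs \((\d+) slow hosts and\s*(\d+) dup hosts\)"
)


def summarize_shuffle(lines):
    """Summarize shuffle progress from ReduceTask log lines (hypothetical helper)."""
    needed = []        # every "Need another N map output(s)" value, in order
    scheduled = 0      # sum of all "Scheduled N outputs" values
    max_dup = 0        # largest "dup hosts" count seen
    for line in lines:
        m = NEED_RE.search(line)
        if m:
            needed.append(int(m.group(1)))
            continue
        m = SCHED_RE.search(line)
        if m:
            scheduled += int(m.group(1))
            max_dup = max(max_dup, int(m.group(3)))
    return {
        "first_needed": needed[0] if needed else None,
        "last_needed": needed[-1] if needed else None,
        "scheduled_total": scheduled,
        "max_dup_hosts": max_dup,
    }


# A few lines copied from the log above, shortened to the message part.
SAMPLE = [
    "attempt_201311152318_24026_r_000000_0 Need another 104 map output(s) where 0 is already in progress",
    "attempt_201311152318_24026_r_000000_0 Scheduled 12 outputs (0 slow hosts and0 dup hosts)",
    "attempt_201311152318_24026_r_000000_0 Scheduled 2 outputs (0 slow hosts and64 dup hosts)",
    "attempt_201311152318_24026_r_000000_0 Need another 8 map output(s) where 0 is already in progress",
]

summary = summarize_shuffle(SAMPLE)
```

For a real task log, pass an open file instead of `SAMPLE`, for example `summarize_shuffle(open(path))`, where `path` is the location of the reduce attempt's log file. If `last_needed` stays above zero while no new "Scheduled" lines appear, the reducer is still waiting for the remaining map outputs.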