What happened when a log analyst tried to compute pi with Hadoop again after a long break... (fully distributed mode edition)


■ My environment
OS: Ubuntu 16 or 18
Hadoop: hadoop-3.2.1.tar.gz
JDK (Java): jdk-8u202-linux-x64.tar.gz

NameNode
192.168.76.216: h-gpu05

DataNodes
192.168.76.210: h-gpu03
192.168.76.212: h-gpu04
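
Every node has to be able to resolve the others by hostname (and start-all.sh assumes passwordless SSH from the NameNode to each worker). If you are not using DNS, /etc/hosts on every machine ends up looking roughly like this:

~$ cat /etc/hosts
...
192.168.76.216  h-gpu05
192.168.76.210  h-gpu03
192.168.76.212  h-gpu04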

There are three files shared by the NameNode and the DataNodes (in other words, three files to push from the standalone machine out to the remote nodes):

  1. hadoop-env.sh
  2. core-site.xml
  3. hdfs-site.xml

Of these, 1 (hadoop-env.sh) and 2 (core-site.xml) are exactly the same on the NameNode and the DataNodes; only hdfs-site.xml differs in content.
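
The actual files are in the repos linked below. Roughly speaking, the shared core-site.xml is the one that points every node at the NameNode; it looks something along these lines (see the repo for the exact values, including the port):

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://h-gpu05:9000</value>
  </property>

</configuration>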

hdfs-site.xml on the DataNodes:


<configuration>

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>file:/home/hadoop/hdfs/datanode</value>
  </property>

</configuration>
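
dfs.data.dir points at a local directory, so it has to exist (and be writable by the hadoop user) on every DataNode before the daemons start; assuming the path above:

~$ mkdir -p /home/hadoop/hdfs/datanode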

hdfs-site.xml on the NameNode:


<configuration>

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>file:/home/hadoop/hdfs/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>file:/home/hadoop/hdfs/datanode</value>
  </property>

</configuration>

The only thing added on the NameNode is the following property...


  <property>
    <name>dfs.name.dir</name>
    <value>file:/home/hadoop/hdfs/namenode</value>
  </property>
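
Likewise, the dfs.name.dir directory should exist on the NameNode before formatting; assuming the path above:

~$ mkdir -p /home/hadoop/hdfs/namenode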

DataNode-side configuration files:
https://github.com/RuoAndo/qiita/tree/master/hadoop/distibuted_config/datanode

On the NameNode side there are six configuration files (* marks the ones specific to the NameNode):

  1. hdfs-site.xml
  2. workers (*)
  3. core-site.xml
  4. mapred-site.xml (*)
  5. hadoop-env.sh
  6. yarn-site.xml (*)

NameNode-side configuration files:
https://github.com/RuoAndo/qiita/tree/master/hadoop/distibuted_config/namenode
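
The repo above has the actual mapred-site.xml and yarn-site.xml; at a minimum they need something along these lines to run the MapReduce examples on YARN (with Hadoop 3.x you may also have to point the MapReduce classpath/HADOOP_MAPRED_HOME properties at your install, so check the repo for the full versions):

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>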

For what it's worth, this is how I copied the files over:


  scp hadoop-env.sh hadoop@h-gpu04:hadoop-3.2.1/etc/hadoop
  scp hadoop-env.sh hadoop@h-gpu03:hadoop-3.2.1/etc/hadoop
  scp hdfs-site.xml hadoop@h-gpu04:hadoop-3.2.1/etc/hadoop
  scp hdfs-site.xml hadoop@h-gpu03:hadoop-3.2.1/etc/hadoop
  scp core-site.xml hadoop@h-gpu03:hadoop-3.2.1/etc/hadoop
  scp core-site.xml hadoop@h-gpu04:hadoop-3.2.1/etc/hadoop
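
The same thing as a single loop over the DataNodes:

~$ for h in h-gpu03 h-gpu04; do scp hadoop-env.sh core-site.xml hdfs-site.xml hadoop@${h}:hadoop-3.2.1/etc/hadoop/; done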

That should have been the end of it... or so I'd like to say, but in my case I ran into a number of snags.

■ Setting up the workers file

In my case, unless h-gpu05 (the NameNode) was also listed, the pi job stalled partway through.

~$ cat hadoop-3.2.1/etc/hadoop/workers

h-gpu05
h-gpu03
h-gpu04

■ Re-formatting the NameNode

When things go wrong, re-formatting with


$ hdfs namenode -format

may well be necessary.

In that case, it is better to wipe all the temporary files first and then re-format.


rm -rf hadooptemp/*; rm -rf hdfs/*; rm -rf /tmp/*
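
Roughly, the whole cycle looks like this (the rm runs on every node; if a DataNode keeps its old data directory across a re-format it will refuse to join because of a clusterID mismatch, and I narrow /tmp/* down to /tmp/hadoop* here to avoid wiping unrelated files):

~$ stop-all.sh
~$ rm -rf hadooptemp/* hdfs/* /tmp/hadoop*   # clear old metadata/data, on every node
~$ hdfs namenode -format                     # on the NameNode only
~$ start-all.sh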

■ hdfs dfsadmin -report is handy

After re-formatting, it is a good idea to run it and check the cluster state.


~$ hdfs dfsadmin -report
Configured Capacity: 349941760000 (325.91 GB)
Present Capacity: 142477877248 (132.69 GB)
DFS Remaining: 142474989568 (132.69 GB)
DFS Used: 2887680 (2.75 MB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 2
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0
Erasure Coded Block Groups: 
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.76.210:9866 (h-gpu03)
Hostname: h-gpu03
Decommission Status : Normal
Configured Capacity: 116918394880 (108.89 GB)
DFS Used: 962560 (940 KB)
Non DFS Used: 62675660800 (58.37 GB)
DFS Remaining: 49171095552 (45.79 GB)
DFS Used%: 0.00%
DFS Remaining%: 42.06%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Dec 03 18:41:00 JST 2020
Last Block Report: Thu Dec 03 18:04:42 JST 2020
Num of Blocks: 17


Name: 192.168.76.212:9866 (h-gpu04)
Hostname: h-gpu04
Decommission Status : Normal
Configured Capacity: 116511682560 (108.51 GB)
DFS Used: 962560 (940 KB)
Non DFS Used: 92455612416 (86.11 GB)
DFS Remaining: 18092593152 (16.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 15.53%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Dec 03 18:40:58 JST 2020
Last Block Report: Thu Dec 03 18:04:31 JST 2020
Num of Blocks: 17


Name: 192.168.76.216:9866 (h-gpu05)
Hostname: h-gpu05
Decommission Status : Normal
Configured Capacity: 116511682560 (108.51 GB)
DFS Used: 962560 (940 KB)
Non DFS Used: 35336904704 (32.91 GB)
DFS Remaining: 75211300864 (70.05 GB)
DFS Used%: 0.00%
DFS Remaining%: 64.55%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Dec 03 18:40:59 JST 2020
Last Block Report: Thu Dec 03 18:04:32 JST 2020
Num of Blocks: 17

■ Restart Hadoop with care

If some processes fail to stop after start-all.sh / stop-all.sh, or if the NameNode comes up in safe mode,
neither the pi job nor any of the other examples will run.
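
Before re-running the job, it is worth checking for leftover daemons with jps and checking safe mode explicitly:

~$ jps                             # on each node: look for stray NameNode/DataNode/ResourceManager/NodeManager processes
~$ hdfs dfsadmin -safemode get     # on the NameNode
~$ hdfs dfsadmin -safemode leave   # only if it is stuck in safe mode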

Let's run it...


$ hadoop-3.2.1/bin/hadoop jar hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 10 10000
Number of Maps  = 10
Samples per Map = 10000
2020-12-03 18:48:54,434 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #0
2020-12-03 18:48:54,689 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #1
2020-12-03 18:48:54,709 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #2
2020-12-03 18:48:54,726 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #3
2020-12-03 18:48:54,747 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #4
2020-12-03 18:48:54,764 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #5
2020-12-03 18:48:54,780 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #6
2020-12-03 18:48:54,796 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #7
2020-12-03 18:48:54,812 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #8
2020-12-03 18:48:54,828 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Wrote input for Map #9
Starting Job
2020-12-03 18:48:54,902 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-12-03 18:48:55,290 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1606986282038_0002
2020-12-03 18:48:55,319 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-03 18:48:55,395 INFO input.FileInputFormat: Total input files to process : 10
2020-12-03 18:48:55,407 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-03 18:48:55,427 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-03 18:48:55,440 INFO mapreduce.JobSubmitter: number of splits:10
2020-12-03 18:48:55,542 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-03 18:48:55,566 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606986282038_0002
2020-12-03 18:48:55,566 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-12-03 18:48:55,737 INFO conf.Configuration: resource-types.xml not found
2020-12-03 18:48:55,737 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-12-03 18:48:55,802 INFO impl.YarnClientImpl: Submitted application application_1606986282038_0002
2020-12-03 18:48:55,837 INFO mapreduce.Job: The url to track the job: http://h-gpu05:8088/proxy/application_1606986282038_0002/
2020-12-03 18:48:55,837 INFO mapreduce.Job: Running job: job_1606986282038_0002
2020-12-03 18:49:00,899 INFO mapreduce.Job: Job job_1606986282038_0002 running in uber mode : false
2020-12-03 18:49:00,900 INFO mapreduce.Job:  map 0% reduce 0%
2020-12-03 18:49:05,947 INFO mapreduce.Job:  map 60% reduce 0%
2020-12-03 18:49:09,966 INFO mapreduce.Job:  map 100% reduce 0%
2020-12-03 18:49:10,971 INFO mapreduce.Job:  map 100% reduce 100%
2020-12-03 18:49:10,977 INFO mapreduce.Job: Job job_1606986282038_0002 completed successfully
2020-12-03 18:49:11,046 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=226
                FILE: Number of bytes written=2496659
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2640
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=45
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
                HDFS: Number of bytes read erasure-coded=0
        Job Counters 
                Launched map tasks=10
                Launched reduce tasks=1
                Data-local map tasks=10
                Total time spent by all maps in occupied slots (ms)=24531
                Total time spent by all reduces in occupied slots (ms)=2316
                Total time spent by all map tasks (ms)=24531
                Total time spent by all reduce tasks (ms)=2316
                Total vcore-milliseconds taken by all map tasks=24531
                Total vcore-milliseconds taken by all reduce tasks=2316
                Total megabyte-milliseconds taken by all map tasks=25119744
                Total megabyte-milliseconds taken by all reduce tasks=2371584
        Map-Reduce Framework
                Map input records=10
                Map output records=20
                Map output bytes=180
                Map output materialized bytes=280
                Input split bytes=1460
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=280
                Reduce input records=20
                Reduce output records=0
                Spilled Records=40
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=2079
                CPU time spent (ms)=14170
                Physical memory (bytes) snapshot=3824275456
                Virtual memory (bytes) snapshot=34022707200
                Total committed heap usage (bytes)=6392643584
                Peak Map Physical memory (bytes)=349265920
                Peak Map Virtual memory (bytes)=3095851008
                Peak Reduce Physical memory (bytes)=369152000
                Peak Reduce Virtual memory (bytes)=3097395200
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=1180
        File Output Format Counters 
                Bytes Written=97
Job Finished in 16.209 seconds
2020-12-03 18:49:11,086 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Estimated value of Pi is 3.14120000000000000000

(`ー´)b