Building a Hadoop Cluster (CentOS 7)


Overview

I built a Hadoop cluster with the following configuration:
・1 master node, 3 slave nodes
・OS: CentOS Linux release 7.5.1804
・Hadoop: 3.1.1

Creating a Filesystem for Hadoop (slave nodes)

・Check the disks

[root@localhost ~]#  lsblk -a
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   50G  0 disk
├─sda1            8:1    0  500M  0 part /boot
└─sda2            8:2    0 49.5G  0 part
  ├─centos-root 253:0    0 45.6G  0 lvm  /
  └─centos-swap 253:1    0  3.9G  0 lvm  [SWAP]
sdb               8:16   0   50G  0 disk
sr0              11:0    1 1024M  0 rom

・Create a partition

[root@localhost ~]# fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x32350de9.

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-104857599, default 2048): 2048
Last sector, +sectors or +size{K,M,G} (2048-104857599, default 104857599): 104857599
Partition 1 of type Linux and of size 50 GiB is set

Command (m for help): p

Disk /dev/sdb: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x32350de9

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   104855551    52426752   83  Linux
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

・Format the device

[root@localhost ~]#  mkfs.xfs /dev/sdb1
meta-data=/dev/sdb1              isize=512    agcount=4, agsize=3276672 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=13106688, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=6399, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

・Mount the filesystem

mkdir /Hadoop
mount /dev/sdb1 /Hadoop
[root@localhost ~]# df -T
Filesystem              Type     1K-blocks    Used Available Use% Mounted on
/dev/mapper/centos-root xfs       47781076 1443976  46337100   4% /
devtmpfs                devtmpfs   1928504       0   1928504   0% /dev
tmpfs                   tmpfs      1940480       0   1940480   0% /dev/shm
tmpfs                   tmpfs      1940480    9032   1931448   1% /run
tmpfs                   tmpfs      1940480       0   1940480   0% /sys/fs/cgroup
/dev/sda1               xfs         508588  288936    219652  57% /boot
tmpfs                   tmpfs       388096       0    388096   0% /run/user/1000
/dev/sdb1               xfs       52401156   32944  52368212   1% /Hadoop

・Make the mount persistent across reboots
Append the following line to /etc/fstab:
/dev/sdb1 /Hadoop xfs defaults 0 0
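
The same step can be done non-interactively, with a quick sanity check afterwards; a minimal sketch assuming the device and mount point above:

echo '/dev/sdb1 /Hadoop xfs defaults 0 0' >> /etc/fstab
mount -a          # re-reads /etc/fstab; an error here means the new entry is wrong
findmnt /Hadoop   # should show /dev/sdb1 mounted as xfs on /Hadoop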

※References
https://qiita.com/aosho235/items/ad9a4764e77ba43c9d76#%E3%83%91%E3%83%BC%E3%83%86%E3%82%A3%E3%82%B7%E3%83%A7%E3%83%B3%E3%82%92%E5%88%87%E3%82%8B
https://kazmax.zpp.jp/linux_beginner/fdisk.html

Disabling the Firewall and SELinux

systemctl stop firewalld
systemctl disable firewalld
vi /etc/selinux/config ← set SELINUX to "disabled"
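
A non-interactive equivalent of the vi edit, as a sketch (assumes the default SELINUX=enforcing line in /etc/selinux/config):

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0   # switch to permissive immediately, without a reboot
getenforce     # reports Permissive now; Disabled after the next reboot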

OS Settings

・Set the hostname
nmcli general hostname test.localdomain
・Configure /etc/hosts
Append the following to /etc/hosts:

192.168.11.237 hadoopmaster.local
192.168.11.238 hadoopslave1.local
192.168.11.239 hadoopslave2.local
192.168.11.240 hadoopslave3.local
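
A quick check that every node resolves and is reachable, as a sketch (assumes /etc/hosts has already been updated on the host you run it from):

for h in hadoopmaster hadoopslave1 hadoopslave2 hadoopslave3; do
  ping -c 1 ${h}.local
done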

・Create the user for the Hadoop cluster

useradd -m hadoop
echo hadoop | passwd hadoop --stdin
chown -R hadoop:hadoop /Hadoop ← allow the hadoop user to write to the data directory

・Generate an SSH key pair on the master node

[hadoop@hadoopmaster ~]$ whoami
hadoop
[hadoop@hadoopmaster ~]$ ssh-keygen -t rsa -P '' -f /home/hadoop/.ssh/id_rsa
[hadoop@hadoopmaster ~]$ ls -l /home/hadoop/.ssh
total 8
-rw-------. 1 hadoop hadoop 1679 Nov 25 07:16 id_rsa
-rw-r--r--. 1 hadoop hadoop  407 Nov 25 07:16 id_rsa.pub

・Copy the public key to the slave nodes
To allow the hadoop user to SSH from the master node into the slave nodes, copy the master node's public key to every slave node.

[hadoop@hadoopmaster ~]$ ssh-copy-id [email protected]
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'hadoopslave1.local (192.168.11.238)' can't be established.
ECDSA key fingerprint is SHA256:/GfKYMZHk4GrIkT7q6cvY/DD4fxWHrQZVEoLay3U6UY.
ECDSA key fingerprint is MD5:50:56:37:b9:3b:a0:b7:12:bf:aa:e2:e3:14:4f:b9:e2.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
[email protected]'s password: ←hadoop

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '[email protected]'"
and check to make sure that only the key(s) you wanted were added.
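
The same step has to be repeated for every slave node; a short loop sketch (assumes the hostnames above, and the password prompt is still answered interactively for each host):

for h in hadoopslave1.local hadoopslave2.local hadoopslave3.local; do
  ssh-copy-id hadoop@${h}
  ssh hadoop@${h} hostname   # should print the slave's hostname without asking for a password
done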

Installing Java

[root@hadoopmaster ~]# yum install -y java
[root@hadoopmaster ~]# yum install -y java-1.7.0-openjdk-devel
[root@hadoopmaster ~]# java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

・Set environment variables
Append the following to /home/hadoop/.bash_profile:

export LANG=en_US.utf8
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/home/hadoop/hadoop-3.1.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
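
A quick way to confirm the new variables are picked up in the current shell, as a sketch (the hadoop commands themselves only become available once the tarball in the next section has been extracted):

source /home/hadoop/.bash_profile
echo $JAVA_HOME $HADOOP_HOME
$JAVA_HOME/bin/java -version   # verifies that JAVA_HOME points at a working JRE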

Hadoop Setup

・Installation

[hadoop@hadoopmaster ~]$ pwd
/home/hadoop
[hadoop@hadoopmaster ~]$ wget http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
[hadoop@hadoopmaster ~]$ tar xzvf hadoop-3.1.1.tar.gz -C ./

・Create the configuration files
①$HADOOP_HOME/etc/hadoop/core-site.xml
Specifies the master node, the I/O buffer size, and so on.
Set "fs.default.name" to "hdfs://hadoopmaster.local:9000". (In Hadoop 3 this key is a deprecated alias of "fs.defaultFS", but it still works.)

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
 <name>fs.default.name</name>
  <value>hdfs://hadoopmaster.local:9000</value>
</property>
</configuration>
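
Once the PATH from .bash_profile is in effect, the effective value can be checked from the command line; a sketch (the deprecated fs.default.name key is mapped onto fs.defaultFS):

hdfs getconf -confKey fs.defaultFS   # should print hdfs://hadoopmaster.local:9000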

②$HADOOP_HOME/etc/hadoop/yarn-site.xml
Specifies the ResourceManager that manages resources across the whole cluster, memory allocation, and so on.
Set hadoopmaster.local as the ResourceManager.

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
 <name>yarn.resourcemanager.hostname</name>
  <value>hadoopmaster.local</value>
</property>
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
</configuration>

③$HADOOP_HOME/etc/hadoop/hdfs-site.xml
Specifies the number of replicas, the directories backing HDFS, and so on.
"dfs.replication" sets the number of replicas.
"dfs.name.dir" sets the directory backing HDFS on the master node (NameNode).
"dfs.data.dir" sets the directory backing HDFS on the data nodes.
(In Hadoop 3 these two are deprecated aliases of "dfs.namenode.name.dir" and "dfs.datanode.data.dir", but they still work.)

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
 <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
 <name>dfs.name.dir</name>
  <value>file:///home/hadoop/hdfs/namenode</value>
</property>
</configuration>
[hadoop@hadoopslave1 ~]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
 <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
 <name>dfs.data.dir</name>
  <value>file:///Hadoop/hdfs/datanode</value>
</property>
</configuration>
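
The configured values can be verified locally with hdfs getconf; a sketch (deprecated keys such as dfs.name.dir are resolved through their current names):

hdfs getconf -confKey dfs.replication         # 3
hdfs getconf -confKey dfs.namenode.name.dir   # file:///home/hadoop/hdfs/namenode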

④$HADOOP_HOME/etc/hadoop/mapred-site.xml
Sets the various parameters for MapReduce, the distributed-processing mechanism.
Here YARN, the framework that schedules hardware resources such as CPU and memory and serves as the platform for distributed applications, is specified as the framework used for distributed processing.

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>
</configuration>
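
core-site.xml, yarn-site.xml, and mapred-site.xml are the same on every node, so one way to keep them in sync is to push them from the master; a sketch (assumes Hadoop is extracted to the same path on each slave; hdfs-site.xml is left out because it differs per role):

for h in hadoopslave1.local hadoopslave2.local hadoopslave3.local; do
  scp $HADOOP_HOME/etc/hadoop/{core-site.xml,yarn-site.xml,mapred-site.xml} \
      hadoop@${h}:$HADOOP_HOME/etc/hadoop/
done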

・Create the slaves file
Create a file listing the hostnames of the slave nodes. (In Hadoop 3.x this file has been renamed to "workers"; see the sketch after the list below.)

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/slaves
hadoopslave1.local
hadoopslave2.local
hadoopslave3.local
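
Because Hadoop 3 reads etc/hadoop/workers rather than the old slaves file, copying the same list there covers both names; a sketch (the manual per-node daemon start used below does not depend on either file):

cp $HADOOP_HOME/etc/hadoop/slaves $HADOOP_HOME/etc/hadoop/workers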

Starting Hadoop

・Format HDFS (master node)

[hadoop@hadoopmaster ~]$ which hdfs
~/hadoop-3.1.1/bin/hdfs
[hadoop@hadoopmaster ~]$ hdfs namenode -format

・Start the NameNode (master node)

[hadoop@hadoopmaster ~]$ hadoop-daemon.sh start namenode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.

Apparently the "hdfs --daemon start" form is now the recommended way to start HDFS daemons.
・Start the ResourceManager (master node)

[hadoop@hadoopmaster ~]$ yarn-daemon.sh start resourcemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.

Apparently the "yarn --daemon start" form is now the recommended way to start YARN daemons.
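
Based on the warnings above, the non-deprecated equivalents on the master node would be the following; jps (from the JDK) can confirm both daemons are up:

hdfs --daemon start namenode          # replaces hadoop-daemon.sh start namenode
yarn --daemon start resourcemanager   # replaces yarn-daemon.sh start resourcemanager
jps                                   # should list NameNode and ResourceManager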

・Check the HDFS metadata (master node)

[hadoop@hadoopmaster ~]$ ls -l /home/hadoop/hdfs/namenode/current/
total 16
-rw-rw-r--. 1 hadoop hadoop 391 Nov 25 08:59 fsimage_0000000000000000000
-rw-rw-r--. 1 hadoop hadoop  62 Nov 25 08:59 fsimage_0000000000000000000.md5
-rw-rw-r--. 1 hadoop hadoop   2 Nov 25 08:59 seen_txid
-rw-rw-r--. 1 hadoop hadoop 217 Nov 25 08:59 VERSION

・Start the DataNode (slave nodes)

[hadoop@hadoopslave1 ~]$ hadoop-daemon.sh start datanode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[hadoop@hadoopslave1 ~]$ ls -l /Hadoop/hdfs/datanode/
total 4
drwxrwxr-x 3 hadoop hadoop 70 Nov 25 09:22 current
-rw-rw-r-- 1 hadoop hadoop 23 Nov 25 09:22 in_use.lock

The "hdfs --daemon start" form is recommended here as well; it is the same command as for the NameNode, just with "datanode" as the daemon name.

・Start the NodeManager (slave nodes)

[hadoop@hadoopslave1 ~]$ yarn-daemon.sh start nodemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.

The "yarn --daemon start" form is recommended here as well; it is the same command as for the ResourceManager, just with "nodemanager" as the daemon name.
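
Likewise, on each slave node the non-deprecated equivalents would be:

hdfs --daemon start datanode      # replaces hadoop-daemon.sh start datanode
yarn --daemon start nodemanager   # replaces yarn-daemon.sh start nodemanager
jps                               # should list DataNode and NodeManager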

・Check HDFS (master node)

[hadoop@hadoopmaster ~]$ hdfs dfsadmin -report
Configured Capacity: 160978448384 (149.92 GB)
Present Capacity: 160876912640 (149.83 GB)
DFS Remaining: 160876900352 (149.83 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3): ← number of data nodes currently running

Name: 192.168.11.238:9866 (hadoopslave1.local)
Hostname: hadoopslave1.local
Decommission Status : Normal
Configured Capacity: 53658783744 (49.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 33845248 (32.28 MB)
DFS Remaining: 53624934400 (49.94 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.94%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 25 09:31:36 EST 2018
Last Block Report: Sun Nov 25 09:22:42 EST 2018
Num of Blocks: 0
(remaining output omitted)

・Check the slave nodes (master node)

[hadoop@hadoopmaster ~]$ yarn node -list
2018-11-25 09:35:21,739 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster.local/192.168.11.237:8032
Total Nodes:3
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
hadoopslave1.local:41836                RUNNING hadoopslave1.local:8042                            0
hadoopslave3.local:36843                RUNNING hadoopslave3.local:8042                            0
hadoopslave2.local:39948                RUNNING hadoopslave2.local:8042                            0

・Web console
http://192.168.11.237:8088/cluster

Testing

・Create test data

[hadoop@hadoopslave1 ~]$ mkdir localdir01
[hadoop@hadoopslave1 ~]$ cat find.sh
#!/bin/sh
# Copy every file under /usr/share/doc into /home/hadoop/localdir01,
# appending a sequence number so duplicate file names do not collide.
n=1
find /usr/share/doc -type f | while read -r i; do
  cp -a "$i" "/home/hadoop/localdir01/$(basename "$i")_${n}"
  n=$(expr ${n} + 1)
done

・Create the destination directory in HDFS

[hadoop@hadoopslave1 ~]$ hdfs dfs -ls /
[hadoop@hadoopslave1 ~]$ hdfs dfs -mkdir -p /user/hadoop/datadir01
[hadoop@hadoopslave1 ~]$ hdfs dfs -ls /user/hadoop
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-11-25 09:57 /user/hadoop/datadir01

・Copy the data into HDFS
hdfs dfs -put /home/hadoop/localdir01/* /user/hadoop/datadir01/
・Check the copied files
hdfs dfs -ls /user/hadoop/datadir01
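
To confirm the files were actually written with the configured replication factor of 3, a quick check with fsck on the master node; a sketch:

hdfs dfs -ls /user/hadoop/datadir01 | head
hdfs fsck /user/hadoop/datadir01 -files -blocks | tail -n 20   # the summary includes the average block replication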