The Road to System Magi-chan, Once Again ~Let's Go with Ansible~


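Well, every machine has now been switched over to Fedora. (Everything was reinstalled from scratch, so SLURM and the rest have to be installed again too.)
The next job is configuration. Lovingly setting up each machine by hand, one at a time, would be fine too, but here I am going with

Configuration management with Ansible

instead.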

Step Zero: Understanding the Setup

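Let's get a picture of the setup. As the name suggests, Magi-chan is a simple little parallel computer built from three compute machines:
- melchior(192.168.1.101)
- balthasar(192.168.1.102)
- casper(192.168.1.103)
plus a management node,
- ctl(192.168.1.100)
and a VPN node,
- vpn(192.168.1.98)
for five machines in total.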

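Each machine gets its IP address automatically: the router is configured to assign addresses based on the MAC address.

This time the configuration management is driven from the VPN machine. The first step is to rewrite /etc/hosts.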

/etc/hosts
[evakichi@localhost ~]$ cat < /etc/hosts
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.1.98    vpn vpn.sc-magi.com
192.168.1.100   ctl ctl.sc-magi.com
192.168.1.110   ctl-wl ctl-wl.sc-magi.com
192.168.1.101   melchior melchior.sc-magi.com system01 system01.sc-magi.com
192.168.1.111   melchior-wl melchior-wl.sc-magi.com system01-wl system01-wl.sc-magi.com
192.168.1.102   balthasar balthasar.sc-magi.com system02 system02.sc-magi.com
192.168.1.112   balthasar-wl balthasar-wl.sc-magi.com system02-wl system02-wl.sc-magi.com
192.168.1.103   casper casper.sc-magi.com system03 system03.sc-magi.com
192.168.1.113   casper-wl casper-wl.sc-magi.com system03-wl system03-wl.sc-magi.com

The names ending in -wl are assigned to the wireless LAN interfaces; I plan to set those up toward the end.

Step One: Creating an SSH Key Pair

The first job is to create an SSH key pair.

bash
[evakichi@localhost ~]$ ssh-keygen -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/evakichi/.ssh/id_rsa): /home/evakichi/.ssh/id_rsa_fedora_magi
Created directory '/home/evakichi/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/evakichi/.ssh/id_rsa_fedora_magi.
Your public key has been saved in /home/evakichi/.ssh/id_rsa_fedora_magi.pub.
The key fingerprint is:
SHA256:LqBYmpK1mZ+uP6QBSyQxA33Ogms1U511R2u6AKOuW04 [email protected]
The key's randomart image is:
+---[RSA 4096]----+
|*o    . o. ..o   |
|.+. .. o  . . .  |
|o. +.  o     o   |
|o..+o . o   o    |
|.++.+.  S. .     |
|oO.=o. .  . .    |
|B =+ E. .  .   . |
|. ..*. .         |
|  .**o           |
+----[SHA256]-----+

That gives us the key pair.
Next up is changing the hostnames and transferring hosts... but before that, the key has to be copied to each machine.

[evakichi@localhost ~]$ ssh-copy-id -i .ssh/id_rsa_fedora_magi casper
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: ".ssh/id_rsa_fedora_magi.pub"
The authenticity of host 'casper (192.168.1.103)' can't be established.
ECDSA key fingerprint is SHA256:13yeYjc0c69tDbRlbCGNpmqvu/7h+jIE8IAhntUKhEg.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
evakichi@casper's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'casper'"
and check to make sure that only the key(s) you wanted were added.

For now, I simply repeat this by hand for all four machines. (Steps omitted.)

Installing Ansible

Now let's install Ansible on vpn, the machine the playbooks are run from. It's easy:

$ sudo dnf install ansible

Next, create an ansible user on each client (one master and three slaves, whose settings differ slightly).
The password is set to something simple as well.

[evakichi@localhost ~]$ sudo useradd ansible
[sudo] password for evakichi:
[evakichi@localhost ~]$ sudo passwd ansible
Changing password for user ansible.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[evakichi@localhost ~]$ sudo usermod -aG wheel ansible

Repeat this on every node.
Oops, I forgot: the ansible user also has to be added on the managing machine itself.

[root@vpn ~]# passwd ansible
Changing password for user ansible.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@vpn ~]# usermod -aG wheel ansible

It is also worth confirming that the user really has been added to the wheel group.

[root@vpn ~]# id ansible
uid=1001(ansible) gid=1001(ansible) groups=1001(ansible),10(wheel)

One more thing I forgot: the SSH public key has to be copied to the ansible user as well...

[evakichi@vpn ~]$ ssh-copy-id -i ./.ssh/id_rsa_fedora_magi ansible@casper
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "./.ssh/id_rsa_fedora_magi.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
ansible@casper's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'ansible@casper'"
and check to make sure that only the key(s) you wanted were added.
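
The manual steps above (creating the ansible user, adding it to wheel, and copying the public key) could also be scripted once the evakichi account is reachable by key. The following is only a sketch, not what was actually run here: the file name bootstrap.yml is hypothetical, it assumes the public key still sits at /home/evakichi/.ssh/id_rsa_fedora_magi.pub on the managing machine, and it does not set a password for the new user.

```yaml
---
# bootstrap.yml (hypothetical file name): automates the manual user setup shown above.
- name: bootstrap the ansible account
  hosts: all
  remote_user: evakichi          # account whose key was installed with ssh-copy-id earlier
  become: yes
  tasks:
    - name: create the ansible user and add it to wheel
      user:
        name: ansible
        groups: wheel
        append: yes              # keep any existing supplementary groups

    - name: install the public key for the ansible user
      authorized_key:
        user: ansible
        state: present
        # Assumed key location on the managing machine.
        key: "{{ lookup('file', '/home/evakichi/.ssh/id_rsa_fedora_magi.pub') }}"
```

It would be run along the lines of `ansible-playbook -i inventory.ini bootstrap.yml --private-key ~/.ssh/id_rsa_fedora_magi --ask-become-pass`, using the inventory described in the next section.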

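Copying hosts with Ansible

I will only write a simple configuration. Today's goal is just to test whether Ansible is usable at all,
so copying /etc/hosts makes a convenient first exercise.

First, create the inventory.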

inventory.ini
[magi-system]
casper ansible_host=192.168.1.103
balthasar ansible_host=192.168.1.102
melchior ansible_host=192.168.1.101
ctl ansible_host=192.168.1.100

[linux-servers:children]
magi-system
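
Because the key pair was saved under a non-default name, Ansible has to be told which private key to use. One way, shown here as a sketch rather than as the setup actually used in this article, is a group_vars/all.yml file next to the inventory. ansible_user and ansible_ssh_private_key_file are standard Ansible connection variables; the file itself and the key path are assumptions about where the private key lives for the user running ansible-playbook.

```yaml
---
# group_vars/all.yml (hypothetical file; applies to every host in inventory.ini)
ansible_user: ansible
# Assumed path: adjust to wherever the private key generated earlier is readable.
ansible_ssh_private_key_file: ~/.ssh/id_rsa_fedora_magi
```

With the user set here, remote_user in the playbooks becomes optional, although leaving it in does no harm.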

Next, write the playbook.

init.yml
---
- name: main.yml
  hosts: linux-servers
  remote_user: ansible
  roles:
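    - copy-hosts

The role's task file follows. Because src is a relative path, the copy module picks up the hosts file from the role's files directory (./copy-hosts/files/hosts).

./copy-hosts/tasks/main.yml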
---
- name: deploy hosts
  become: yes
  copy:
    src: hosts
    dest: /etc/hosts
    owner: root
    group: root
    mode: "0644"

Run the playbook:

[ansible@vpn ansible-playbooks]$ ansible-playbook -i inventory.ini init.yml --ask-become-pass
BECOME password:
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user configurable on deprecation. This feature will be removed in version 2.10.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details


PLAY [main.yml] *****************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************************************************
ok: [melchior]
ok: [ctl]
ok: [casper]
ok: [balthasar]

TASK [copy-hosts : deploy hosts] ************************************************************************************************************************************************************************************************************
changed: [ctl]
changed: [melchior]
changed: [balthasar]
changed: [casper]

PLAY RECAP **********************************************************************************************************************************************************************************************************************************
balthasar                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
casper                     : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
ctl                        : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
melchior                   : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

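It seems to have worked.
The warnings at the top bother me a little, though.

So, as the message itself suggests, I set deprecation_warnings=False in ansible.cfg and ran the playbook again.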

[ansible@vpn ansible-playbooks]$ ansible-playbook -i inventory.ini init.yml --ask-become-pass
BECOME password:
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details


PLAY [main.yml] *****************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************************************************
ok: [ctl]
ok: [balthasar]
ok: [melchior]
ok: [casper]

TASK [copy-hosts : deploy hosts] ************************************************************************************************************************************************************************************************************
ok: [melchior]
ok: [balthasar]
ok: [casper]
ok: [ctl]

PLAY RECAP **********************************************************************************************************************************************************************************************************************************
balthasar                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
casper                     : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
ctl                        : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
melchior                   : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

One warning is gone. I will tidy up the other one later.
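
For reference, the remaining warning is triggered by hyphenated group names such as magi-system: Ansible wants group names made up of letters, digits, and underscores only. A sketch of an inventory that avoids it, written in Ansible's YAML inventory format, might look like this. The underscore names magi_system and linux_servers and the file name inventory.yml are assumptions, and the hosts: line of each playbook would have to be changed to linux_servers to match.

```yaml
---
# inventory.yml (hypothetical file name): same hosts as inventory.ini,
# with group names that contain no hyphens.
all:
  children:
    linux_servers:
      children:
        magi_system:
          hosts:
            casper:
              ansible_host: 192.168.1.103
            balthasar:
              ansible_host: 192.168.1.102
            melchior:
              ansible_host: 192.168.1.101
            ctl:
              ansible_host: 192.168.1.100
```

It would be passed the same way as before, with -i inventory.yml in place of -i inventory.ini.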

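Building the Parallel Computing Environment with Ansible

Either way, it worked. Riding that momentum, let's install the parallel computing environment.
First, create the key for MUNGE.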

 dd if=/dev/random bs=1 count=1024 >./install-MPI_OPENMP/files/munge.key

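With the key created, it is time for the actual install party.
First, write the tasks. They go into a new role, install-MPI_OPENMP, which also has to be added to the roles list in init.yml.

./install-MPI_OPENMP/tasks/main.yml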

[ansible@vpn ansible-playbooks]$ cat install-MPI_OPENMP/tasks/main.yml
---
- name: install mpich
  become: yes
  dnf:
    name: mpich
    state: latest
    update_cache: yes
- name: install mpich-devel
  become: yes
  dnf:
    name: mpich-devel
    state: latest
    update_cache: yes
- name: install mpich-doc
  become: yes
  dnf:
    name: mpich-doc
    state: latest
    update_cache: yes
- name: install openmpi
  become: yes
  dnf:
    name: openmpi
    state: latest
    update_cache: yes
- name: install openmpi
  become: yes
  dnf:
    name: openmpi-devel
    state: latest
    update_cache: yes
- name: install libomp
  become: yes
  dnf:
    name: libomp
    state: latest
    update_cache: yes
- name: install libomp-test
  become: yes
  dnf:
    name: libomp-test
    state: latest
    update_cache: yes
- name: install libomp-devel
  become: yes
  dnf:
    name: libomp-devel
    state: latest
    update_cache: yes
- name: install slurm
  become: yes
  dnf:
    name: slurm*
    state: latest
    update_cache: yes

- name: deploy munge.key
  become: yes
  copy:
    src: munge.key
    dest: /etc/munge/munge.key
    owner: munge
    group: munge
    mode: "0600"

Run the playbook again:

[ansible@vpn ansible-playbooks]$ ansible-playbook -i inventory.ini init.yml --ask-become-pass
BECOME password:
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details


PLAY [main.yml] *****************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************************************************
ok: [ctl]
ok: [balthasar]
ok: [casper]
ok: [melchior]

TASK [copy-hosts : deploy hosts] ************************************************************************************************************************************************************************************************************
ok: [ctl]
ok: [melchior]
ok: [balthasar]
ok: [casper]

TASK [install-MPI_OPENMP : install mpich] ***************************************************************************************************************************************************************************************************
changed: [ctl]
changed: [balthasar]
changed: [casper]
changed: [melchior]

TASK [install-MPI_OPENMP : install mpich-devel] *********************************************************************************************************************************************************************************************
changed: [ctl]
changed: [casper]
changed: [melchior]
changed: [balthasar]

TASK [install-MPI_OPENMP : install mpich-doc] ***********************************************************************************************************************************************************************************************
changed: [ctl]
changed: [melchior]
changed: [casper]
changed: [balthasar]

TASK [install-MPI_OPENMP : install openmpi] *************************************************************************************************************************************************************************************************
changed: [ctl]
changed: [melchior]
changed: [casper]
changed: [balthasar]

TASK [install-MPI_OPENMP : install openmpi] *************************************************************************************************************************************************************************************************
changed: [ctl]
changed: [melchior]
changed: [casper]
changed: [balthasar]

TASK [install-MPI_OPENMP : install libomp] **************************************************************************************************************************************************************************************************
changed: [ctl]
changed: [balthasar]
changed: [casper]
changed: [melchior]

TASK [install-MPI_OPENMP : install libomp-test] *********************************************************************************************************************************************************************************************
changed: [ctl]
changed: [balthasar]
changed: [melchior]
changed: [casper]

TASK [install-MPI_OPENMP : install libomp-devel] ********************************************************************************************************************************************************************************************
ok: [ctl]
ok: [melchior]
ok: [casper]
ok: [balthasar]

TASK [install-MPI_OPENMP : install slurm] ***************************************************************************************************************************************************************************************************
changed: [ctl]
changed: [melchior]
changed: [balthasar]
changed: [casper]

TASK [install-MPI_OPENMP : deploy munge.key] ************************************************************************************************************************************************************************************************
changed: [ctl]
changed: [balthasar]
changed: [melchior]
changed: [casper]

PLAY RECAP **********************************************************************************************************************************************************************************************************************************
balthasar                  : ok=12   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
casper                     : ok=12   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
ctl                        : ok=12   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
melchior                   : ok=12   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
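
The nine package tasks in install-MPI_OPENMP/tasks/main.yml differ only in the package name. Since the dnf module accepts a list for name, they could be collapsed into a single task. This is a sketch of an equivalent task, not the file used for the run above; the package list is taken from the tasks shown earlier.

```yaml
---
# Condensed variant of the package tasks (sketch).
- name: install MPI, OpenMP and SLURM packages
  become: yes
  dnf:
    name:
      - mpich
      - mpich-devel
      - mpich-doc
      - openmpi
      - openmpi-devel
      - libomp
      - libomp-test
      - libomp-devel
      - "slurm*"          # quoted so the glob is passed through as a plain string
    state: latest
    update_cache: yes
```

The trade-off is that the run output then shows one TASK line for all packages instead of one per package, and the cache is refreshed once rather than nine times.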

...and then all that is left is to write a playbook for running updates periodically, run it, and we are done.

./update/tasks/main.yml
---
- name: update packages
  become: yes
  dnf:
    name: "*"
    state: latest
    update_cache: yes

./update.yml
---
- name: main.yml
  hosts: linux-servers
  remote_user: ansible
  roles:
    - update

[ansible@vpn ansible-playbooks]$ ansible-playbook -i inventory.ini update.yml --ask-become-pass
BECOME password:
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details


PLAY [main.yml] *****************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************************************************
ok: [balthasar]
ok: [melchior]
ok: [ctl]
ok: [casper]

PLAY RECAP **********************************************************************************************************************************************************************************************************************************
balthasar                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
casper                     : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
ctl                        : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
melchior                   : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
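
Note that the recap above reports ok=1 for every host, so only the fact-gathering step shows up in that run. As a cross-check, the same update can be written as one self-contained playbook with the task inline instead of in a role. This is a sketch; the file name update-inline.yml is hypothetical, and hosts: all is used on the assumption that the inventory contains only the four managed machines.

```yaml
---
# update-inline.yml (hypothetical file name)
- name: update all packages
  hosts: all                  # every host in inventory.ini
  remote_user: ansible
  become: yes
  tasks:
    - name: update packages
      dnf:
        name: "*"
        state: latest
        update_cache: yes
```

If this variant prints a TASK [update packages] line and the role-based one does not, the role's tasks/main.yml is probably not where Ansible expects it.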

The end.