Installing Facebook ParlAI on a MacBook and getting as far as training a model


Environment

・ Machine: MacBook Pro
・ OS: macOS Catalina
・ Python: Python 3.6.3 (the install logs below show pyenv's Python 3.9.0)

Code used

(GitHub) facebookresearch/ParlAI

( I also referred to the following )

pypi.org parlai 0.10.0
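ParlAI – a framework for training and evaluating AI models on a variety of openly available dialogue datasets
2017-10-19 「対話モデルの訓練/評価フレームワーク ParlAI がすごい」 ("ParlAI, a framework for training and evaluating dialogue models, is amazing")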

Installing via git clone

Terminal
ocean@AfoGuardMacBook-Pro Desktop % git clone https://github.com/facebookresearch/ParlAI.git ~/ParlAI
Cloning into '/Users/ocean/ParlAI'...
remote: Enumerating objects: 174, done.
remote: Counting objects: 100% (174/174), done.
remote: Compressing objects: 100% (123/123), done.
remote: Total 37068 (delta 105), reused 78 (delta 51), pack-reused 36894
Receiving objects: 100% (37068/37068), 64.73 MiB | 8.53 MiB/s, done.
Resolving deltas: 100% (26171/26171), done.
ocean@AfoGuardMacBook-Pro Desktop % 
ocean@AfoGuardMacBook-Pro Desktop % cd ~/ParlAI; python setup.py develop
running develop
running egg_info
creating parlai.egg-info
writing parlai.egg-info/PKG-INFO
writing dependency_links to parlai.egg-info/dependency_links.txt

( ... omitted ... )

Installed /Users/ocean/.pyenv/versions/3.9.0/lib/python3.9/site-packages/jsonlines-1.2.0-py3.9.egg
Searching for websocket-server
Reading https://pypi.org/simple/websocket-server/
Downloading https://files.pythonhosted.org/packages/74/64/e86581ee7775a2e08aca530b41e1a1e3ee6b320233b1eff301dcb86d1636/websocket_server-0.4.tar.gz#sha256=91cd4b565d1e1b00ef107abcb2840a8090868b19543f3b38e1962d5f975d0c04
Best match: websocket-server 0.4
Processing websocket_server-0.4.tar.gz

( ... omitted ... )

Using /Users/ocean/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Finished processing dependencies for parlai==0.10.0
ocean@AfoGuardMacBook-Pro ParlAI %

Directory layout

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % ls
CODE_OF_CONDUCT.md  MANIFEST.in     autoformat.sh       docs            parlai          pyproject.toml      setup.py
CONTRIBUTING.md     NEWS.md         codecov.yml     example_parlai_internal parlai.egg-info     pytest.ini      tests
LICENSE         README.md       conftest.py     mypy.ini        projects        requirements.txt    website
ocean@AfoGuardMacBook-Pro ParlAI %

pip install parlai

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % pip install parlai
Requirement already satisfied: parlai in /Users/ocean/ParlAI (0.10.0)
Requirement already satisfied: boto3 in /Users/ocean/.pyenv/versions/3.9.0/lib/python3.9/site-packages/boto3-1.16.39-py3.9.egg (from parlai) (1.16.39)

( ... omitted ... )

ocean@AfoGuardMacBook-Pro ParlAI % 

Directory layout

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % ls
CODE_OF_CONDUCT.md  MANIFEST.in     autoformat.sh       docs            parlai          pyproject.toml      setup.py
CONTRIBUTING.md     NEWS.md         codecov.yml     example_parlai_internal parlai.egg-info     pytest.ini      tests
LICENSE         README.md       conftest.py     mypy.ini        projects        requirements.txt    website
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai
README.md   __main__.py chat_service    crowdsourcing   nn      tasks       zoo
__init__.py agents      core        mturk       scripts     utils
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai/scripts 
__init__.py             data_stats.py               eval_model.py               multiprocessing_train.py        token_stats.py
build_candidates.py         detect_offensive_language.py        eval_wordstat.py            party.py                train_model.py
build_dict.py               display_data.py             extract_image_feature.py        profile_interactive.py          vacuum.py
compare_opts.py             display_model.py            interactive.py              profile_train.py            verify_data.py
convert_data_to_parlai_format.py    distributed_eval.py         interactive_web.py          safe_interactive.py
convo_render.py             distributed_train.py            multiprocessing_eval.py         self_chat.py
ocean@AfoGuardMacBook-Pro ParlAI %

Trying it out

The following failed.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % parlai parlai/scripts/display_data -t squad
usage: parlai [-h] [--helpall] [--version] COMMAND ...

       _
      /")
     //)
  ==//'=== ParlAI
   /

optional arguments:
  -h, --help               show this help message and exit
  --helpall                List all commands, including advanced ones.
  --version                Prints version info and exit.

Commands:

  display_data (dd)        Display data from a task
  display_model (dm)       Display model predictions.
  eval_model (em, eval)    Evaluate a model
  train_model (tm, train)  Train a model
  interactive (i)          Interactive chat with a model on the command line
  safe_interactive         Like interactive, but adds a safety filter
  self_chat                Generate self-chats of a model

Parse Error: argument COMMAND: invalid choice: 'parlai/scripts/display_data' (choose from 'help', 'h', 'helpall', 'build_candidates', 'build_dict', 'convert_to_parlai', 'convo_render', 'data_stats', 'detect_offensive', 'display_data', 'dd', 'display_model', 'dm', 'eval_model', 'em', 'eval', 'train_model', 'tm', 'train', 'eval_wordstat', 'extract_image_feature', 'interactive', 'i', 'interactive_web', 'iweb', 'multiprocessing_eval', 'mp_eval', 'multiprocessing_train', 'mp_train', 'party', 'parrot', 'profile_interactive', 'profile_train', 'safe_interactive', 'self_chat', 'token_stats', 'vacuum', 'verify_data')
ocean@AfoGuardMacBook-Pro ParlAI %

According to the error message, the COMMAND argument must be one of the following names (a script path is not accepted):

  • 'help'
  • 'h'
  • 'helpall'
  • 'build_candidates'
  • 'build_dict'
  • 'convert_to_parlai'
  • 'convo_render'
  • 'data_stats'
  • 'detect_offensive'
  • 'display_data'
  • 'dd'
  • 'display_model'
  • 'dm'
  • 'eval_model'
  • 'em'
  • 'eval'
  • 'train_model'
  • 'tm'
  • 'train'
  • 'eval_wordstat'
  • 'extract_image_feature'
  • 'interactive'
  • 'i'
  • 'interactive_web'
  • 'iweb'
  • 'multiprocessing_eval'
  • 'mp_eval'
  • 'multiprocessing_train'
  • 'mp_train'
  • 'party'
  • 'parrot'
  • 'profile_interactive'
  • 'profile_train'
  • 'safe_interactive'
  • 'self_chat'
  • 'token_stats'
  • 'vacuum'
  • 'verify_data'
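
The error text has the shape of the "invalid choice" message that Python's argparse produces for an unknown subcommand. The sketch below reproduces that behaviour with a toy parser. The program name `toy`, the three registered commands, and the helpers `build_parser` and `parse` are hypothetical illustrations, not ParlAI's actual implementation.

```python
import argparse


class QuietParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the error can be inspected."""

    def error(self, message):
        raise ValueError(message)


def build_parser():
    # Hypothetical toy CLI: subcommands are registered by name, like COMMAND above.
    parser = QuietParser(prog="toy")
    sub = parser.add_subparsers(dest="command", metavar="COMMAND")
    sub.required = True
    for name in ("display_data", "eval_model", "train_model"):
        cmd = sub.add_parser(name)
        cmd.add_argument("-t", "--task", default=None)
    return parser


def parse(argv):
    """Return (command, task) on success, or the error message on failure."""
    try:
        ns = build_parser().parse_args(argv)
        return ns.command, ns.task
    except ValueError as exc:
        return str(exc)


# A registered command name is accepted.
print(parse(["display_data", "-t", "squad"]))
# A file path is not a registered name, so it is rejected as an invalid choice.
print(parse(["parlai/scripts/display_data", "-t", "squad"]))
```

In this toy version, as in the transcript above, the first positional argument is looked up in a table of registered names, so a path to the script file can never match.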

The following also failed.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % parlai parlai/scripts/display_data         
usage: parlai [-h] [--helpall] [--version] COMMAND ...

       _
      /")
     //)
  ==//'=== ParlAI
   /

optional arguments:
  -h, --help               show this help message and exit
  --helpall                List all commands, including advanced ones.
  --version                Prints version info and exit.

Commands:

  display_data (dd)        Display data from a task
  display_model (dm)       Display model predictions.
  eval_model (em, eval)    Evaluate a model
  train_model (tm, train)  Train a model
  interactive (i)          Interactive chat with a model on the command line
  safe_interactive         Like interactive, but adds a safety filter
  self_chat                Generate self-chats of a model

Parse Error: argument COMMAND: invalid choice: 'parlai/scripts/display_data' (choose from 'help', 'h', 'helpall', 'build_candidates', 'build_dict', 'convert_to_parlai', 'convo_render', 'data_stats', 'detect_offensive', 'display_data', 'dd', 'display_model', 'dm', 'eval_model', 'em', 'eval', 'train_model', 'tm', 'train', 'eval_wordstat', 'extract_image_feature', 'interactive', 'i', 'interactive_web', 'iweb', 'multiprocessing_eval', 'mp_eval', 'multiprocessing_train', 'mp_train', 'party', 'parrot', 'profile_interactive', 'profile_train', 'safe_interactive', 'self_chat', 'token_stats', 'vacuum', 'verify_data')
ocean@AfoGuardMacBook-Pro ParlAI %

The following did launch the script, but I forgot to pass a task.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % parlai display_data 
21:22:17 | Opt:
21:22:17 |     allow_missing_init_opts: False
21:22:17 |     batchsize: 1
21:22:17 |     datapath: /Users/ocean/ParlAI/data
21:22:17 |     datatype: train:ordered
21:22:17 |     dict_class: None
21:22:17 |     display_add_fields: 
21:22:17 |     download_path: None
21:22:17 |     dynamic_batching: None
21:22:17 |     hide_labels: False
21:22:17 |     ignore_agent_reply: True
21:22:17 |     image_cropsize: 224
21:22:17 |     image_mode: raw
21:22:17 |     image_size: 256
21:22:17 |     init_model: None
21:22:17 |     init_opt: None
21:22:17 |     loglevel: info
21:22:17 |     max_display_len: 1000
21:22:17 |     model: None
21:22:17 |     model_file: None
21:22:17 |     multitask_weights: [1]
21:22:17 |     num_examples: 10
21:22:17 |     override: {}
21:22:17 |     parlai_home: /Users/ocean/ParlAI
21:22:17 |     starttime: Dec18_21-22
21:22:17 |     task: None
21:22:17 |     verbose: False
21:22:17 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
Traceback (most recent call last):
  File "/Users/ocean/.pyenv/versions/3.9.0/bin/parlai", line 33, in <module>
    sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
  File "/Users/ocean/ParlAI/parlai/__main__.py", line 14, in main
    superscript_main()
  File "/Users/ocean/ParlAI/parlai/core/script.py", line 307, in superscript_main
    return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
  File "/Users/ocean/ParlAI/parlai/core/script.py", line 90, in _run_from_parser_and_opt
    return script.run()
  File "/Users/ocean/ParlAI/parlai/scripts/display_data.py", line 108, in run
    return display_data(self.opt)
  File "/Users/ocean/ParlAI/parlai/scripts/display_data.py", line 70, in display_data
    world = create_task(opt, agent)
  File "/Users/ocean/ParlAI/parlai/core/worlds.py", line 1249, in create_task
    raise RuntimeError(
RuntimeError: No task specified. Please select a task with --task {task_name}.
ocean@AfoGuardMacBook-Pro ParlAI %

The following ran the script successfully.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % parlai display_data -t squad
21:22:34 | Opt:
21:22:34 |     allow_missing_init_opts: False
21:22:34 |     batchsize: 1
21:22:34 |     datapath: /Users/ocean/ParlAI/data
21:22:34 |     datatype: train:ordered
21:22:34 |     dict_class: None
21:22:34 |     display_add_fields: 
21:22:34 |     download_path: None
21:22:34 |     dynamic_batching: None
21:22:34 |     hide_labels: False
21:22:34 |     ignore_agent_reply: True
21:22:34 |     image_cropsize: 224
21:22:34 |     image_mode: raw
21:22:34 |     image_size: 256
21:22:34 |     init_model: None
21:22:34 |     init_opt: None
21:22:34 |     loglevel: info
21:22:34 |     max_display_len: 1000
21:22:34 |     model: None
21:22:34 |     model_file: None
21:22:34 |     multitask_weights: [1]
21:22:34 |     num_examples: 10
21:22:34 |     override: "{'task': 'squad'}"
21:22:34 |     parlai_home: /Users/ocean/ParlAI
21:22:34 |     starttime: Dec18_21-22
21:22:34 |     task: squad
21:22:34 |     verbose: False
21:22:34 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
21:22:34 | creating task(s): squad
[building data: /Users/ocean/ParlAI/data/SQuAD]
21:22:34 | Downloading https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json to /Users/ocean/ParlAI/data/SQuAD/train-v1.1.json
Downloading train-v1.1.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30.3M/30.3M [00:02<00:00, 14.4MB/s]
21:22:37 | Downloading https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json to /Users/ocean/ParlAI/data/SQuAD/dev-v1.1.json
Downloading dev-v1.1.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.85M/4.85M [00:01<00:00, 4.60MB/s]
loading: /Users/ocean/ParlAI/data/SQuAD/train-v1.1.json
- - - NEW EPISODE: squad - - -
Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?
   Saint Bernadette Soubirous
- - - NEW EPISODE: squad - - -
Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
What is in front of the Notre Dame Main Building?
   a copper statue of Christ
- - - NEW EPISODE: squad - - -
Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
The Basilica of the Sacred heart at Notre Dame is beside to which structure?
   the Main Building
- - - NEW EPISODE: squad - - -
Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
What is the Grotto at Notre Dame?
   a Marian place of prayer and reflection
- - - NEW EPISODE: squad - - -
Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
What sits on top of the Main Building at Notre Dame?
   a golden statue of the Virgin Mary
- - - NEW EPISODE: squad - - -
As at most other universities, Notre Dame's students run a number of news media outlets. The nine student-run outlets include three newspapers, both a radio and television station, and several magazines and journals. Begun as a one-page journal in September 1876, the Scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the United States. The other magazine, The Juggler, is released twice a year and focuses on student literature and artwork. The Dome yearbook is published annually. The newspapers have varying publication interests, with The Observer published daily and mainly reporting university and other news, and staffed by students from both Notre Dame and Saint Mary's College. Unlike Scholastic and The Dome, The Observer is an independent publication and does not have a faculty advisor or any editorial oversight from the University. In 1987, when some students believed that The Observer began to show a conservative bias, a liberal newspaper, Common Sense was published. Likewise, in 2003, when other students believed that the paper showed a liberal bias, the conservative paper Irish Rover went into production. Neither paper is published as often as The Observer; however, all three are distributed to all students. Finally, in Spring 2008 an undergraduate journal for political science research, Beyond Politics, made its debut.
When did the Scholastic Magazine of Notre dame begin publishing?
   September 1876
- - - NEW EPISODE: squad - - -
As at most other universities, Notre Dame's students run a number of news media outlets. The nine student-run outlets include three newspapers, both a radio and television station, and several magazines and journals. Begun as a one-page journal in September 1876, the Scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the United States. The other magazine, The Juggler, is released twice a year and focuses on student literature and artwork. The Dome yearbook is published annually. The newspapers have varying publication interests, with The Observer published daily and mainly reporting university and other news, and staffed by students from both Notre Dame and Saint Mary's College. Unlike Scholastic and The Dome, The Observer is an independent publication and does not have a faculty advisor or any editorial oversight from the University. In 1987, when some students believed that The Observer began to show a conservative bias, a liberal newspaper, Common Sense was published. Likewise, in 2003, when other students believed that the paper showed a liberal bias, the conservative paper Irish Rover went into production. Neither paper is published as often as The Observer; however, all three are distributed to all students. Finally, in Spring 2008 an undergraduate journal for political science research, Beyond Politics, made its debut.
How often is Notre Dame's the Juggler published?
   twice
- - - NEW EPISODE: squad - - -
As at most other universities, Notre Dame's students run a number of news media outlets. The nine student-run outlets include three newspapers, both a radio and television station, and several magazines and journals. Begun as a one-page journal in September 1876, the Scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the United States. The other magazine, The Juggler, is released twice a year and focuses on student literature and artwork. The Dome yearbook is published annually. The newspapers have varying publication interests, with The Observer published daily and mainly reporting university and other news, and staffed by students from both Notre Dame and Saint Mary's College. Unlike Scholastic and The Dome, The Observer is an independent publication and does not have a faculty advisor or any editorial oversight from the University. In 1987, when some students believed that The Observer began to show a conservative bias, a liberal newspaper, Common Sense was published. Likewise, in 2003, when other students believed that the paper showed a liberal bias, the conservative paper Irish Rover went into production. Neither paper is published as often as The Observer; however, all three are distributed to all students. Finally, in Spring 2008 an undergraduate journal for political science research, Beyond Politics, made its debut.
What is the daily student paper at Notre Dame called?
   The Observer
- - - NEW EPISODE: squad - - -
As at most other universities, Notre Dame's students run a number of news media outlets. The nine student-run outlets include three newspapers, both a radio and television station, and several magazines and journals. Begun as a one-page journal in September 1876, the Scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the United States. The other magazine, The Juggler, is released twice a year and focuses on student literature and artwork. The Dome yearbook is published annually. The newspapers have varying publication interests, with The Observer published daily and mainly reporting university and other news, and staffed by students from both Notre Dame and Saint Mary's College. Unlike Scholastic and The Dome, The Observer is an independent publication and does not have a faculty advisor or any editorial oversight from the University. In 1987, when some students believed that The Observer began to show a conservative bias, a liberal newspaper, Common Sense was published. Likewise, in 2003, when other students believed that the paper showed a liberal bias, the conservative paper Irish Rover went into production. Neither paper is published as often as The Observer; however, all three are distributed to all students. Finally, in Spring 2008 an undergraduate journal for political science research, Beyond Politics, made its debut.
How many student news papers are found at Notre Dame?
   three
- - - NEW EPISODE: squad - - -
As at most other universities, Notre Dame's students run a number of news media outlets. The nine student-run outlets include three newspapers, both a radio and television station, and several magazines and journals. Begun as a one-page journal in September 1876, the Scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the United States. The other magazine, The Juggler, is released twice a year and focuses on student literature and artwork. The Dome yearbook is published annually. The newspapers have varying publication interests, with The Observer published daily and mainly reporting university and other news, and staffed by students from both Notre Dame and Saint Mary's College. Unlike Scholastic and The Dome, The Observer is an independent publication and does not have a faculty advisor or any editorial oversight from the University. In 1987, when some students believed that The Observer began to show a conservative bias, a liberal newspaper, Common Sense was published. Likewise, in 2003, when other students believed that the paper showed a liberal bias, the conservative paper Irish Rover went into production. Neither paper is published as often as The Observer; however, all three are distributed to all students. Finally, in Spring 2008 an undergraduate journal for political science research, Beyond Politics, made its debut.
In what year did the student paper Common Sense begin publication at Notre Dame?
   1987
21:22:38 | loaded 87599 episodes with a total of 87599 examples
ocean@AfoGuardMacBook-Pro ParlAI % 
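
The same passage appears in several consecutive episodes because of how SQuAD v1.1 is stored: each paragraph holds one context and several question/answer pairs, and the output above shows each pair as its own episode. The sketch below flattens that JSON nesting. The inline sample is an abbreviated, hand-made stand-in for data/SQuAD/train-v1.1.json (real entries carry more fields), and `qa` and `flatten_squad` are hypothetical helpers, not part of ParlAI.

```python
import json

# Abbreviated stand-in for data/SQuAD/train-v1.1.json (same nesting, one paragraph).
context = (
    "Immediately behind the basilica is the Grotto, a Marian place of prayer "
    "and reflection. It is a replica of the grotto at Lourdes, France where the "
    "Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858."
)


def qa(question, answer):
    # answer_start is the character offset of the answer inside the context.
    return {
        "question": question,
        "answers": [{"text": answer, "answer_start": context.index(answer)}],
    }


sample = json.dumps({
    "version": "1.1",
    "data": [{
        "title": "University_of_Notre_Dame",
        "paragraphs": [{
            "context": context,
            "qas": [
                qa("To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
                   "Saint Bernadette Soubirous"),
                qa("What is the Grotto at Notre Dame?",
                   "a Marian place of prayer and reflection"),
            ],
        }],
    }],
})


def flatten_squad(raw_json):
    """Yield one (context, question, first answer text) triple per question."""
    for article in json.loads(raw_json)["data"]:
        for paragraph in article["paragraphs"]:
            for item in paragraph["qas"]:
                yield paragraph["context"], item["question"], item["answers"][0]["text"]


examples = list(flatten_squad(sample))
for ctx, question, answer in examples:
    print(question, "->", answer)
```

Both flattened examples share the same context string, which matches the repeated passages in the terminal output.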

Next, I try the following example, which is listed at
https://github.com/facebookresearch/ParlAI

Evaluate an IR baseline model on the validation set of the Personachat task:

> parlai eval_model -m ir_baseline -t personachat -dt valid

It ran successfully.
Applying the IR baseline model to the validation set printed the accuracy and the other metrics in the following form.

    accuracy  bleu-4  exs    f1  hits@1  hits@10  hits@100  hits@5
       .1595   .1597 7801 .2540   .1595    .6499         1   .4299
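
To work with these numbers in a script, the two-line table can be turned into a dictionary. This is a minimal sketch that assumes the report is always one whitespace-separated header row followed by one value row. `parse_report` is a hypothetical helper, not a ParlAI function.

```python
# The report exactly as printed above.
report = """\
    accuracy  bleu-4  exs    f1  hits@1  hits@10  hits@100  hits@5
       .1595   .1597 7801 .2540   .1595    .6499         1   .4299
"""


def parse_report(text):
    """Map each column name to a number (int when the cell has no decimal point)."""
    header, values = (line.split() for line in text.strip().splitlines())
    return {
        name: float(cell) if "." in cell else int(cell)
        for name, cell in zip(header, values)
    }


metrics = parse_report(report)
print(metrics["accuracy"], metrics["hits@10"], metrics["exs"])
```

Read this way, accuracy and hits@1 are the same value (.1595) in this run, and exs is the number of evaluated examples (7801).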

The full output of that run follows.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % parlai eval_model -m ir_baseline -t personachat -dt valid
21:35:05 | Opt:
21:35:05 |     aggregate_micro: False
21:35:05 |     allow_missing_init_opts: False
21:35:05 |     batchsize: 1
21:35:05 |     bpe_add_prefix_space: None
21:35:05 |     bpe_debug: False
21:35:05 |     bpe_dropout: None
21:35:05 |     bpe_merge: None
21:35:05 |     bpe_vocab: None
21:35:05 |     datapath: /Users/ocean/ParlAI/data
21:35:05 |     datatype: valid
21:35:05 |     dict_class: None
21:35:05 |     dict_endtoken: __end__
21:35:05 |     dict_file: None
21:35:05 |     dict_initpath: None
21:35:05 |     dict_language: english
21:35:05 |     dict_loaded: False
21:35:05 |     dict_lower: False
21:35:05 |     dict_max_ngram_size: -1
21:35:05 |     dict_maxtokens: -1
21:35:05 |     dict_minfreq: 0
21:35:05 |     dict_nulltoken: __null__
21:35:05 |     dict_starttoken: __start__
21:35:05 |     dict_textfields: text,labels
21:35:05 |     dict_tokenizer: re
21:35:05 |     dict_unktoken: __unk__
21:35:05 |     display_examples: False
21:35:05 |     download_path: None
21:35:05 |     dynamic_batching: None
21:35:05 |     hide_labels: False
21:35:05 |     history_size: 1
21:35:05 |     image_cropsize: 224
21:35:05 |     image_mode: raw
21:35:05 |     image_size: 256
21:35:05 |     init_model: None
21:35:05 |     init_opt: None
21:35:05 |     label_candidates_file: None
21:35:05 |     length_penalty: 0.5
21:35:05 |     log_every_n_secs: 10
21:35:05 |     log_keep_fields: all
21:35:05 |     loglevel: info
21:35:05 |     metrics: default
21:35:05 |     model: ir_baseline
21:35:05 |     model_file: None
21:35:05 |     multitask_weights: [1]
21:35:05 |     num_examples: -1
21:35:05 |     override: "{'model': 'ir_baseline', 'task': 'personachat', 'datatype': 'valid'}"
21:35:05 |     parlai_home: /Users/ocean/ParlAI
21:35:05 |     report_filename: 
21:35:05 |     save_format: conversations
21:35:05 |     save_world_logs: False
21:35:05 |     starttime: Dec18_21-35
21:35:05 |     task: personachat
21:35:05 |     tensorboard_log: False
21:35:05 |     tensorboard_logdir: None
21:35:05 |     verbose: False
21:35:05 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
21:35:05 | Evaluating task personachat using datatype valid.
21:35:05 | creating task(s): personachat
[building data: /Users/ocean/ParlAI/data/Persona-Chat]
21:35:05 | Downloading http://parl.ai/downloads/personachat/personachat.tgz to /Users/ocean/ParlAI/data/Persona-Chat/personachat.tgz
Downloading personachat.tgz: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 223M/223M [00:17<00:00, 12.7MB/s]
21:35:31 | loading fbdialog data: /Users/ocean/ParlAI/data/Persona-Chat/personachat/valid_self_original.txt
21:35:36 | Finished evaluating tasks ['personachat'] using datatype valid
    accuracy  bleu-4  exs    f1  hits@1  hits@10  hits@100  hits@5
       .1595   .1597 7801 .2540   .1595    .6499         1   .4299
ocean@AfoGuardMacBook-Pro ParlAI % 

Next, I run this one.

Train a single layer transformer on PersonaChat (requires pytorch and torchtext). Detail: embedding size 300, 4 attention heads, 2 epochs using batchsize 64, word vectors are initialized with fasttext and the other elements of the batch are used as negative during training.

> parlai train_model -t personachat -m transformer/ranker -mf /tmp/model_tr6 --n-layers 1 --embedding-size 300 --ffn-size 600 --n-heads 4 --num-epochs 2 -veps 0.25 -bs 64 -lr 0.001 --dropout 0.1 --embedding-type fasttext_cc --candidates batch
Terminal
ocean@AfoGuardMacBook-Pro ParlAI % parlai train_model -t personachat -m transformer/ranker -mf /tmp/model_tr6 --n-layers 1 --embedding-size 300 --ffn-size 600 --n-heads 4 --num-epochs 2 -veps 0.25 -bs 64 -lr 0.001 --dropout 0.1 --embedding-type fasttext_cc --candidates batch
21:39:21 | building dictionary first...
21:39:21 | Opt:
21:39:21 |     activation: relu
21:39:21 |     adafactor_eps: '(1e-30, 0.001)'
21:39:21 |     adam_eps: 1e-08

( ... omitted ... )

21:39:21 |     update_freq: 1
21:39:21 |     use_memories: False
21:39:21 |     use_reply: label
21:39:21 |     validation_cutoff: 1.0
21:39:21 |     validation_every_n_epochs: 0.25
21:39:21 |     validation_every_n_secs: -1
21:39:21 |     validation_max_exs: -1
21:39:21 |     validation_metric: accuracy
21:39:21 |     validation_metric_mode: None
21:39:21 |     validation_patience: 10
21:39:21 |     validation_share_agent: False
21:39:21 |     variant: aiayn
21:39:21 |     verbose: False
21:39:21 |     warmup_rate: 0.0001
21:39:21 |     warmup_updates: -1
21:39:21 |     weight_decay: None
21:39:21 |     wrap_memory_encoder: False
21:39:21 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
21:39:21 | creating task(s): personachat
21:39:21 | loading fbdialog data: /Users/ocean/ParlAI/data/Persona-Chat/personachat/train_self_original.txt
Building dictionary:   0%|                                                                                                                                                                 | 0.00/65.7k [00:00<?, ?ex/s]21:39:22 | loading fbdialog data: /Users/ocean/ParlAI/data/Persona-Chat/personachat/train_self_original.txt
Building dictionary: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65.7k/65.7k [00:03<00:00, 17.7kex/s]
21:39:25 | Saving dictionary to /tmp/model_tr6.dict
21:39:26 | dictionary built with 18745 tokens in 0.0s
21:39:26 | No model with opt yet at: /tmp/model_tr6(.opt)
21:39:26 | loading dictionary from /tmp/model_tr6.dict
21:39:26 | num words = 18745
/Users/ocean/ParlAI/data/models/fasttext_cc_vectors/crawl-300d-2M.vec.zip: 1.52GB [01:51, 13.7MB/s]                                                                                                                     
  0%|                                                                                                                                                                                       | 0/1999995 [00:00<?, ?it/s]Skipping token b'1999995' with 1-dimensional vector [b'300']; likely a header
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1999995/1999995 [02:47<00:00, 11923.81it/s]
21:44:41 | Initialized embeddings for 18284 tokens (97.5%) from fasttext_cc.
21:44:41 | Total parameters: 6,834,600 (6,527,400 trainable)
21:44:42 | Opt:
21:44:42 |     activation: relu
21:44:42 |     adafactor_eps: '(1e-30, 0.001)'

( ... omitted ... )

21:44:42 |     validation_max_exs: -1
21:44:42 |     validation_metric: accuracy
21:44:42 |     validation_metric_mode: None
21:44:42 |     validation_patience: 10
21:44:42 |     validation_share_agent: False
21:44:42 |     variant: aiayn
21:44:42 |     verbose: False
21:44:42 |     warmup_rate: 0.0001
21:44:42 |     warmup_updates: -1
21:44:42 |     weight_decay: None
21:44:42 |     wrap_memory_encoder: False
21:44:42 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
21:44:42 | creating task(s): personachat
21:44:42 | loading fbdialog data: /Users/ocean/ParlAI/data/Persona-Chat/personachat/train_self_original.txt
21:44:43 | training...
21:44:43 | [ Executing train mode with batch labels as set of candidates. ]
21:44:54 | time:11s total_exs:960 epochs:0.01 time_left:1450s
    clip  ctpb  ctps  exps  exs  gnorm    lr  ltpb  ltps  mean_loss    mrr  rank  total_train_updates  tpb   tps  train_accuracy  ups
       1  8873 12513 90.12  960   4.92 .0010 782.4  1103      4.345 .07379 32.19                   15 9656 13616          .01667 1.41

21:45:04 | time:21s total_exs:1792 epochs:0.03 time_left:1543s
    clip  ctpb  ctps  exps  exs  gnorm    lr  ltpb  ltps  mean_loss    mrr  rank  total_train_updates  tpb   tps  train_accuracy   ups
       1  9062 11620 82.06  832  1.767 .0010 772.4 990.3       4.15 .08851 29.61                   28 9835 12610          .01803 1.282

21:45:15 | time:32s total_exs:2624 epochs:0.04 time_left:1570s
    clip  ctpb  ctps  exps  exs  gnorm    lr  ltpb  ltps  mean_loss   mrr  rank  total_train_updates  tpb   tps  train_accuracy   ups
       1  8964 10958 78.23  832  1.784 .0010 760.2 929.3      4.102 .1094  26.9                   41 9724 11887          .03005 1.223

21:45:25 | time:43s total_exs:3392 epochs:0.05 time_left:1604s
    clip  ctpb  ctps  exps  exs  gnorm    lr  ltpb  ltps  mean_loss   mrr  rank  total_train_updates   tpb   tps  train_accuracy   ups
       1  9238 10561 73.15  768  2.906 .0010 763.8   873      4.049 .1223  24.6                   53 10002 11434          .04297 1.143

( ... omitted ... )

21:47:42 | time:179s total_exs:12544 epochs:0.19 time_left:1700s
    clip  ctpb  ctps  exps  exs  gnorm    lr  ltpb  ltps  mean_loss   mrr  rank  total_train_updates  tpb  tps  train_accuracy   ups
       1  9145  7998 55.96  576  4.974 .0010   764   668      3.614 .2696 16.51                  196 9909 8666           .1597 .8746

21:47:53 | time:190s total_exs:13184 epochs:0.20 time_left:1705s
    clip  ctpb  ctps  exps  exs  gnorm    lr  ltpb  ltps  mean_loss   mrr  rank  total_train_updates   tpb  tps  train_accuracy   ups
       1  9284  8636 59.53  640  5.013 .0010 733.2   682      3.659 .2405 16.55                  206 10017 9318           .1281 .9303

( Suspended with Ctrl-Z )

zsh: suspended  parlai train_model -t personachat -m transformer/ranker -mf /tmp/model_tr6  1
ocean@AfoGuardMacBook-Pro ParlAI % 
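
The status lines in the log above (time, total_exs, epochs, time_left) are easy to pull apart when you want to track progress from a saved log. The following is a minimal sketch using only the standard library. The helper name `parse_status` is hypothetical and not part of ParlAI, and the line format is assumed from the output shown above.

```python
import re
from typing import Dict

# Matches fields such as "time:179s" or "epochs:0.19"; a trailing "s" unit is dropped.
STATUS_FIELD = re.compile(r"(\w+):([0-9.]+)s?")


def parse_status(line: str) -> Dict[str, float]:
    """Extract key:value pairs from a training status line.

    Hypothetical helper for illustration; it is not part of ParlAI.
    """
    # Drop the "HH:MM:SS | " prefix so the clock time is not parsed as a field.
    _, _, body = line.partition(" | ")
    return {key: float(value) for key, value in STATUS_FIELD.findall(body)}


if __name__ == "__main__":
    # Status line copied from the log above.
    line = "21:47:42 | time:179s total_exs:12544 epochs:0.19 time_left:1700s"
    status = parse_status(line)
    print(status)
    # epochs is rounded to two decimals in the log, so this estimate is coarse.
    print("approx. examples per epoch:", round(status["total_exs"] / status["epochs"]))
```

Dividing total_exs by epochs gives only a rough size for one epoch, because the logged epochs value is rounded.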

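Next steps

Following the tutorial site below, I will try training with the Python scripts.

HatenaBlog, Deep Learning Blog, 2017-10-19, "ParlAI, a training/evaluation framework for dialogue models, is amazing"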

Terminal
pip install -r requirements.txt
# Install PyTorch by following http://pytorch.org/
# Install pyzmq, regex, and spacy (install them individually if requirements_ext.txt does not exist)
pip install -r requirements_ext.txt
# Install ParlAI
sudo python setup.py develop
# Run ParlAI
python examples/display_data.py -t babi:task1k:1

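Running the last command in the sample code fails with an error saying that display_data.py cannot be found.
The path to display_data.py given in the tutorial does not exist in this checkout.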

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % python examples/display_data.py -t babi:task1k:1
/Users/ocean/.pyenv/versions/3.9.0/bin/python: can't open file '/Users/ocean/ParlAI/examples/display_data.py': [Errno 2] No such file or directory
ocean@AfoGuardMacBook-Pro ParlAI %

display_data.py is located under ParlAI/parlai/scripts/.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % ls
CODE_OF_CONDUCT.md  MANIFEST.in     autoformat.sh       data            mypy.ini        projects        requirements.txt    website
CONTRIBUTING.md     NEWS.md         codecov.yml     docs            parlai          pyproject.toml      setup.py
LICENSE         README.md       conftest.py     example_parlai_internal parlai.egg-info     pytest.ini      tests
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai
README.md   __main__.py agents      core        mturk       scripts     utils
__init__.py __pycache__ chat_service    crowdsourcing   nn      tasks       zoo
ocean@AfoGuardMacBook-Pro ParlAI %
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai/scripts
__init__.py             convo_render.py             distributed_train.py            multiprocessing_eval.py         self_chat.py
__pycache__             data_stats.py               eval_model.py               multiprocessing_train.py        token_stats.py
build_candidates.py         detect_offensive_language.py        eval_wordstat.py            party.py                train_model.py
build_dict.py               display_data.py             extract_image_feature.py        profile_interactive.py          vacuum.py
compare_opts.py             display_model.py            interactive.py              profile_train.py            verify_data.py
convert_data_to_parlai_format.py    distributed_eval.py         interactive_web.py          safe_interactive.py
ocean@AfoGuardMacBook-Pro ParlAI %
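
When a script path from an older tutorial no longer exists, a recursive search of the checkout finds the new location faster than listing directories one by one. The following is a minimal sketch using only the standard library. The helper name `find_script` is hypothetical and not part of ParlAI, and the checkout location `~/ParlAI` is assumed from the git clone step.

```python
from pathlib import Path
from typing import List


def find_script(repo_root: Path, filename: str) -> List[Path]:
    """Return every file under repo_root whose name equals filename.

    Hypothetical helper for illustration; it is not part of ParlAI.
    """
    # rglob walks the tree recursively; sorting makes the result deterministic.
    return sorted(p for p in repo_root.rglob(filename) if p.is_file())


if __name__ == "__main__":
    # Assumed checkout location, matching the `git clone ... ~/ParlAI` step.
    root = Path.home() / "ParlAI"
    if root.is_dir():
        for match in find_script(root, "display_data.py"):
            print(match.relative_to(root))
```

The same call with "train_model.py" would locate the training script used later in this post.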

The following command runs successfully.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % python parlai/scripts/display_data.py -t babi:task1k:1
22:20:18 | Opt:
22:20:18 |     allow_missing_init_opts: False
22:20:18 |     batchsize: 1
22:20:18 |     datapath: /Users/ocean/ParlAI/data
22:20:18 |     datatype: train:ordered
22:20:18 |     dict_class: None
22:20:18 |     display_add_fields: 
22:20:18 |     download_path: None
22:20:18 |     dynamic_batching: None
22:20:18 |     hide_labels: False
22:20:18 |     ignore_agent_reply: True
22:20:18 |     image_cropsize: 224
22:20:18 |     image_mode: raw
22:20:18 |     image_size: 256
22:20:18 |     init_model: None
22:20:18 |     init_opt: None
22:20:18 |     loglevel: info
22:20:18 |     max_display_len: 1000
22:20:18 |     model: None
22:20:18 |     model_file: None
22:20:18 |     multitask_weights: [1]
22:20:18 |     num_examples: 10
22:20:18 |     override: "{'task': 'babi:task1k:1'}"
22:20:18 |     parlai_home: /Users/ocean/ParlAI
22:20:18 |     starttime: Dec18_22-20
22:20:18 |     task: babi:task1k:1
22:20:18 |     verbose: False
22:20:18 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
22:20:18 | creating task(s): babi:task1k:1
[building data: /Users/ocean/ParlAI/data/bAbI]
22:20:18 | Downloading http://parl.ai/downloads/babi/babi.tar.gz to /Users/ocean/ParlAI/data/bAbI/babi.tar.gz
Downloading babi.tar.gz: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19.2M/19.2M [00:04<00:00, 4.46MB/s]
22:20:24 | loading fbdialog data: /Users/ocean/ParlAI/data/bAbI/tasks_1-20_v1-2/en-valid-nosf/qa1_train.txt
- - - NEW EPISODE: babi:task1k:1 - - -
Mary moved to the bathroom.
John went to the hallway.
Where is Mary?
   bathroom
Daniel went back to the hallway.
Sandra moved to the garden.
Where is Daniel?
   hallway
John moved to the office.
Sandra journeyed to the bathroom.
Where is Daniel?
   hallway
Mary moved to the hallway.
Daniel travelled to the office.
Where is Daniel?
   office
John went back to the garden.
John moved to the bedroom.
Where is Sandra?
   bathroom
- - - NEW EPISODE: babi:task1k:1 - - -
Sandra travelled to the office.
Sandra went to the bathroom.
Where is Sandra?
   bathroom
Mary went to the bedroom.
Daniel moved to the hallway.
Where is Sandra?
   bathroom
John went to the garden.
John travelled to the office.
Where is Sandra?
   bathroom
Daniel journeyed to the bedroom.
Daniel travelled to the hallway.
Where is John?
   office
John went to the bedroom.
John travelled to the office.
Where is Daniel?
   hallway
22:20:24 | loaded 180 episodes with a total of 900 examples
ocean@AfoGuardMacBook-Pro ParlAI % 

Training and evaluating seq2seq on bAbI

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % mkdir -p ./parlai/agents/seq2seq/model_file           
ocean@AfoGuardMacBook-Pro ParlAI % ls model_file
ls: model_file: No such file or directory
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls ./parlai/agents/seq2seq/model_file 
ocean@AfoGuardMacBook-Pro ParlAI % ls
CODE_OF_CONDUCT.md  MANIFEST.in     autoformat.sh       data            mypy.ini        projects        requirements.txt    website
CONTRIBUTING.md     NEWS.md         codecov.yml     docs            parlai          pyproject.toml      setup.py
LICENSE         README.md       conftest.py     example_parlai_internal parlai.egg-info     pytest.ini      tests
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai
README.md   __main__.py agents      core        mturk       scripts     utils
__init__.py __pycache__ chat_service    crowdsourcing   nn      tasks       zoo
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai/agents
README.md       alice           bert_ranker     fixed_response      image_seq2seq       memnn           repeat_query        seq2seq         tfidf_retriever
__init__.py     bart            drqa            hred            ir_baseline     random_candidate    retriever_reader    starspace       transformer
__pycache__     bert_classifier     examples        hugging_face        local_human     repeat_label        safe_local_human    test_agents     unigram
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai/agents/seq2seq 
README.md   __init__.py model_file  modules.py  seq2seq.py
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % ls parlai/agents/seq2seq/model_file 
ocean@AfoGuardMacBook-Pro ParlAI % 
Terminal
ocean@AfoGuardMacBook-Pro ParlAI % python examples/train_model.py -m seq2seq -t babi:Task10k:1 -mf './parlai/agents/seq2seq/model_file/babi' -e 20 -lr 0.5 -bs 32 -hs 128 -ltim 2 -vtim 20
/Users/ocean/.pyenv/versions/3.9.0/bin/python: can't open file '/Users/ocean/ParlAI/examples/train_model.py': [Errno 2] No such file or directory
ocean@AfoGuardMacBook-Pro ParlAI % 
ocean@AfoGuardMacBook-Pro ParlAI % python parlai/scripts/train_model.py -m seq2seq -t babi:Task10k:1 -mf './parlai/agents/seq2seq/model_file/babi' -e 20 -lr 0.5 -bs 32 -hs 128 -ltim 2 -vtim 20
usage: train_model.py [-h] [--helpall] [-o INIT_OPT] [--allow-missing-init-opts ALLOW_MISSING_INIT_OPTS] [-t TASK] [-dt DATATYPE] [-bs BATCHSIZE] [-dynb {None,batchsort,full}] [-v] [-dp DATAPATH] [-m MODEL]
                      [-mf MODEL_FILE] [-im INIT_MODEL] [-et EVALTASK] [-eps NUM_EPOCHS] [-ttim MAX_TRAIN_TIME] [-vtim VALIDATION_EVERY_N_SECS] [-stim SAVE_EVERY_N_SECS] [-sval SAVE_AFTER_VALID]
                      [-veps VALIDATION_EVERY_N_EPOCHS] [-vp VALIDATION_PATIENCE] [-vmt VALIDATION_METRIC] [-vmm {max,min}] [-mcs METRICS] [-micro AGGREGATE_MICRO] [-tblog TENSORBOARD_LOG]
                      [-tblogdir TENSORBOARD_LOGDIR] [--bpe-vocab BPE_VOCAB] [--bpe-merge BPE_MERGE] [--bpe-dropout BPE_DROPOUT]

Train a model

optional arguments:
  -h, --help
        show this help message and exit
  --helpall
        Show usage, including advanced arguments.

Main ParlAI Arguments:
  -o, --init-opt INIT_OPT
        Path to json file of options. Note: Further Command-line arguments override file-based options. (default: None)
  --allow-missing-init-opts ALLOW_MISSING_INIT_OPTS
        Warn instead of raising if an argument passed in with --init-opt is not in the target opt. (default: False)
  -t, --task TASK
        ParlAI task(s), e.g. "babi:Task1" or "babi,cbt" (default: None)
  -dt, --datatype DATATYPE
        choose from: train, train:ordered, valid, test. to stream data add ":stream" to any option (e.g., train:stream). by default train is random with replacement, valid is ordered, test is ordered. (default:
        train)
  -bs, --batchsize BATCHSIZE
        batch size for minibatch training schemes (default: 1)
  -dynb, --dynamic-batching {None,batchsort,full}
        Use dynamic batching (default: None)
  -v, --verbose
        Print all messages
  -dp, --datapath DATAPATH
        path to datasets, defaults to {parlai_dir}/data (default: None)

ParlAI Model Arguments:
  -m, --model MODEL
        the model class name. can match parlai/agents/<model> for agents in that directory, or can provide a fully specified module for `from X import Y` via `-m X:Y` (e.g. `-m
        parlai.agents.seq2seq.seq2seq:Seq2SeqAgent`) (default: None)
  -mf, --model-file MODEL_FILE
        model file name for loading and saving models (default: None)
  -im, --init-model INIT_MODEL
        Initialize model weights and dict from this file (default: None)

Training Loop Arguments:
  -et, --evaltask EVALTASK
        task to use for valid/test (defaults to the one used for training) (default: None)
  -eps, --num-epochs NUM_EPOCHS
  -ttim, --max-train-time MAX_TRAIN_TIME
  -vtim, --validation-every-n-secs VALIDATION_EVERY_N_SECS
        Validate every n seconds. Saves model to model_file (if set) whenever best val metric is found (default: -1)
  -stim, --save-every-n-secs SAVE_EVERY_N_SECS
        Saves the model to model_file.checkpoint after every n seconds (default -1, never). (default: -1)
  -sval, --save-after-valid SAVE_AFTER_VALID
        Saves the model to model_file.checkpoint after every validation (default False).
  -veps, --validation-every-n-epochs VALIDATION_EVERY_N_EPOCHS
        Validate every n epochs. Saves model to model_file (if set) whenever best val metric is found (default: -1)
  -vp, --validation-patience VALIDATION_PATIENCE
        number of iterations of validation where result does not improve before we stop training (default: 10)
  -vmt, --validation-metric VALIDATION_METRIC
        key into report table for selecting best validation (default: accuracy)
  -vmm, --validation-metric-mode {max,min}
        how to optimize validation metric (max or min) (default: None)
  -mcs, --metrics METRICS
        list of metrics to show/compute, e.g. all, default,or give a list split by , like ppl,f1,accuracy,hits@1,rouge,bleuthe rouge metrics will be computed as rouge-1, rouge-2 and rouge-l (default: default)
  -micro, --aggregate-micro AGGREGATE_MICRO
        Report micro-averaged metrics instead of macro averaged metrics. (default: False)

Tensorboard Arguments:
  -tblog, --tensorboard-log TENSORBOARD_LOG
        Tensorboard logging of metrics, default is False
  -tblogdir, --tensorboard-logdir TENSORBOARD_LOGDIR
        Tensorboard logging directory, defaults to model_file.tensorboard (default: None)

BPEHelper Arguments:
  --bpe-vocab BPE_VOCAB
        path to pre-trained tokenizer vocab (default: None)
  --bpe-merge BPE_MERGE
        path to pre-trained tokenizer merge (default: None)
  --bpe-dropout BPE_DROPOUT
        Use BPE dropout during training. (default: None)

Parse Error: ambiguous option: -e could match -et, -eps
ocean@AfoGuardMacBook-Pro ParlAI % 

The following message appeared.

Parse Error: ambiguous option: -e could match -et, -eps

-et and -eps apparently mean the following.

-et, --evaltask EVALTASK
  -eps, --num-epochs NUM_EPOCHS

According to http://deeplearning.hatenablog.com/entry/parlai,
-e was the option for the number of epochs, given as -e {number of epochs}.

So I changed -e to -eps.
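
The "ambiguous option" message is standard behavior of Python's argparse: an option string that is only a prefix of several registered options cannot be resolved. The following is a minimal sketch that reproduces it with a stand-in parser. Only the two options named in the error message are registered, the `RaisingParser` class and `build_parser` function are hypothetical names, and the float type for the epoch count is an assumption. This is not ParlAI's actual parser.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so errors can be inspected."""

    def error(self, message):
        raise ValueError(message)


def build_parser() -> argparse.ArgumentParser:
    # Stand-in for illustration: only the options named in the error message.
    parser = RaisingParser(prog="train_model.py")
    parser.add_argument("-et", "--evaltask", type=str, default=None)
    parser.add_argument("-eps", "--num-epochs", type=float, default=-1)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    try:
        # "-e" is a prefix of both "-et" and "-eps", so argparse refuses to guess.
        parser.parse_args(["-e", "20"])
    except ValueError as err:
        print("Parse Error:", err)
    # Spelling the option out in full resolves the ambiguity.
    print(parser.parse_args(["-eps", "20"]).num_epochs)
```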

This produced another error.

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % python parlai/scripts/train_model.py -m seq2seq -t babi:Task10k:1 -mf './parlai/agents/seq2seq/model_file/babi' -eps 20 -lr 0.5 -bs 32 -hs 128 -ltim 2 -vtim 20
usage: train_model.py [-h] [--helpall] [-o INIT_OPT] [--allow-missing-init-opts ALLOW_MISSING_INIT_OPTS] [-t TASK] [-dt DATATYPE] [-bs BATCHSIZE] [-dynb {None,batchsort,full}] [-v] [-dp DATAPATH] [-m MODEL]
                      [-mf MODEL_FILE] [-im INIT_MODEL] [-et EVALTASK] [-eps NUM_EPOCHS] [-ttim MAX_TRAIN_TIME] [-vtim VALIDATION_EVERY_N_SECS] [-stim SAVE_EVERY_N_SECS] [-sval SAVE_AFTER_VALID]
                      [-veps VALIDATION_EVERY_N_EPOCHS] [-vp VALIDATION_PATIENCE] [-vmt VALIDATION_METRIC] [-vmm {max,min}] [-mcs METRICS] [-micro AGGREGATE_MICRO] [-tblog TENSORBOARD_LOG]
                      [-tblogdir TENSORBOARD_LOGDIR] [--bpe-vocab BPE_VOCAB] [--bpe-merge BPE_MERGE] [--bpe-dropout BPE_DROPOUT]

Train a model

optional arguments:
  -h, --help
        show this help message and exit
  --helpall
        Show usage, including advanced arguments.

Main ParlAI Arguments:
  -o, --init-opt INIT_OPT
        Path to json file of options. Note: Further Command-line arguments override file-based options. (default: None)
  --allow-missing-init-opts ALLOW_MISSING_INIT_OPTS
        Warn instead of raising if an argument passed in with --init-opt is not in the target opt. (default: False)
  -t, --task TASK
        ParlAI task(s), e.g. "babi:Task1" or "babi,cbt" (default: None)
  -dt, --datatype DATATYPE
        choose from: train, train:ordered, valid, test. to stream data add ":stream" to any option (e.g., train:stream). by default train is random with replacement, valid is ordered, test is ordered. (default:
        train)
  -bs, --batchsize BATCHSIZE
        batch size for minibatch training schemes (default: 1)
  -dynb, --dynamic-batching {None,batchsort,full}
        Use dynamic batching (default: None)
  -v, --verbose
        Print all messages
  -dp, --datapath DATAPATH
        path to datasets, defaults to {parlai_dir}/data (default: None)

ParlAI Model Arguments:
  -m, --model MODEL
        the model class name. can match parlai/agents/<model> for agents in that directory, or can provide a fully specified module for `from X import Y` via `-m X:Y` (e.g. `-m
        parlai.agents.seq2seq.seq2seq:Seq2SeqAgent`) (default: None)
  -mf, --model-file MODEL_FILE
        model file name for loading and saving models (default: None)
  -im, --init-model INIT_MODEL
        Initialize model weights and dict from this file (default: None)

Training Loop Arguments:
  -et, --evaltask EVALTASK
        task to use for valid/test (defaults to the one used for training) (default: None)
  -eps, --num-epochs NUM_EPOCHS
  -ttim, --max-train-time MAX_TRAIN_TIME
  -vtim, --validation-every-n-secs VALIDATION_EVERY_N_SECS
        Validate every n seconds. Saves model to model_file (if set) whenever best val metric is found (default: -1)
  -stim, --save-every-n-secs SAVE_EVERY_N_SECS
        Saves the model to model_file.checkpoint after every n seconds (default -1, never). (default: -1)
  -sval, --save-after-valid SAVE_AFTER_VALID
        Saves the model to model_file.checkpoint after every validation (default False).
  -veps, --validation-every-n-epochs VALIDATION_EVERY_N_EPOCHS
        Validate every n epochs. Saves model to model_file (if set) whenever best val metric is found (default: -1)
  -vp, --validation-patience VALIDATION_PATIENCE
        number of iterations of validation where result does not improve before we stop training (default: 10)
  -vmt, --validation-metric VALIDATION_METRIC
        key into report table for selecting best validation (default: accuracy)
  -vmm, --validation-metric-mode {max,min}
        how to optimize validation metric (max or min) (default: None)
  -mcs, --metrics METRICS
        list of metrics to show/compute, e.g. all, default,or give a list split by , like ppl,f1,accuracy,hits@1,rouge,bleuthe rouge metrics will be computed as rouge-1, rouge-2 and rouge-l (default: default)
  -micro, --aggregate-micro AGGREGATE_MICRO
        Report micro-averaged metrics instead of macro averaged metrics. (default: False)

Tensorboard Arguments:
  -tblog, --tensorboard-log TENSORBOARD_LOG
        Tensorboard logging of metrics, default is False
  -tblogdir, --tensorboard-logdir TENSORBOARD_LOGDIR
        Tensorboard logging directory, defaults to model_file.tensorboard (default: None)

BPEHelper Arguments:
  --bpe-vocab BPE_VOCAB
        path to pre-trained tokenizer vocab (default: None)
  --bpe-merge BPE_MERGE
        path to pre-trained tokenizer merge (default: None)
  --bpe-dropout BPE_DROPOUT
        Use BPE dropout during training. (default: None)

Parse Error: argument -h/--help: ignored explicit argument 's'
ocean@AfoGuardMacBook-Pro ParlAI %

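The message above suggests that -hs was read as -h followed by a stray "s". This time I ran the command without the optional arguments (-eps, -lr, -bs, -hs, -ltim, -vtim), and it worked.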

Terminal
ocean@AfoGuardMacBook-Pro ParlAI % python parlai/scripts/train_model.py -m seq2seq -t babi:Task10k:1 -mf './parlai/agents/seq2seq/model_file/babi'                                                
22:33:17 | building dictionary first...
22:33:17 | Opt:
22:33:17 |     adafactor_eps: '(1e-30, 0.001)'
22:33:17 |     adam_eps: 1e-08
22:33:17 |     add_p1_after_newln: False

( ... omitted ... )

22:33:17 |     verbose: False
22:33:17 |     warmup_rate: 0.0001
22:33:17 |     warmup_updates: -1
22:33:17 |     weight_decay: None
22:33:17 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
22:33:17 | creating task(s): babi:Task10k:1
22:33:17 | loading fbdialog data: /Users/ocean/ParlAI/data/bAbI/tasks_1-20_v1-2/en-valid-10k-nosf/qa1_train.txt
Building dictionary:   0%|                                                                                                                                                                 | 0.00/9.00k [00:00<?, ?ex/s]22:33:17 | loading fbdialog data: /Users/ocean/ParlAI/data/bAbI/tasks_1-20_v1-2/en-valid-10k-nosf/qa1_train.txt
Building dictionary: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00k/9.00k [00:00<00:00, 21.3kex/s]
22:33:18 | Saving dictionary to ./parlai/agents/seq2seq/model_file/babi.dict
22:33:18 | dictionary built with 26 tokens in 0.0s
22:33:18 | No model with opt yet at: ./parlai/agents/seq2seq/model_file/babi(.opt)
22:33:18 | loading dictionary from ./parlai/agents/seq2seq/model_file/babi.dict
22:33:18 | num words = 26

( ... omitted ... )

22:33:18 |     validation_max_exs: -1
22:33:18 |     validation_metric: accuracy
22:33:18 |     validation_metric_mode: None
22:33:18 |     validation_patience: 10
22:33:18 |     validation_share_agent: False
22:33:18 |     verbose: False
22:33:18 |     warmup_rate: 0.0001
22:33:18 |     warmup_updates: -1
22:33:18 |     weight_decay: None
22:33:18 | Current ParlAI commit: 99160674564847c8ed68bc21437eab8c9301e95d
22:33:18 | creating task(s): babi:Task10k:1
22:33:18 | loading fbdialog data: /Users/ocean/ParlAI/data/bAbI/tasks_1-20_v1-2/en-valid-10k-nosf/qa1_train.txt
22:33:18 | training...
22:33:28 | time:10s total_exs:260 epochs:0.03
    clip  ctpb  ctps  exps  exs  gnorm  loss  lr  ltpb  ltps   ppl  token_acc  total_train_updates   tpb  tps   ups
       1 60.99  1580 25.91  260  5.732 1.307   1     2 51.81 3.694      .5673                  260 62.99 1632 25.91

22:35:59 | time:161s total_exs:4166 epochs:0.46
    clip  ctpb  ctps  exps  exs  gnorm  loss  lr  ltpb  ltps   ppl  token_acc  total_train_updates  tpb  tps   ups
   .8595  61.4  1485 24.19  242  5.142 1.253   1     2 48.37 3.501      .7293                 4166 63.4 1534 24.19

22:36:09 | time:171s total_exs:4419 epochs:0.49
    clip  ctpb  ctps  exps  exs  gnorm  loss  lr  ltpb  ltps   ppl  token_acc  total_train_updates   tpb  tps   ups
   .8458 61.14  1543 25.24  253  5.152 1.265   1     2 50.48 3.543      .7490                 4419 63.14 1594 25.24

22:36:19 | time:181s total_exs:4690 epochs:0.52
    clip  ctpb  ctps  exps  exs  gnorm  loss  lr  ltpb  ltps   ppl  token_acc  total_train_updates   tpb  tps   ups
   .8376 61.08  1647 26.96  271  5.386 1.275   1     2 53.91 3.577      .7509                 4690 63.08 1701 26.96

22:36:29 | time:191s total_exs:4947 epochs:0.55
    clip  ctpb  ctps  exps  exs  gnorm  loss  lr  ltpb  ltps   ppl  token_acc  total_train_updates   tpb  tps   ups
   .7782 61.04  1567 25.67  257  4.867 1.219   1     2 51.33 3.385      .7879                 4947 63.04 1618 25.67

22:36:39 | time:201s total_exs:5214 epochs:0.58
    clip  ctpb  ctps  exps  exs  gnorm  loss  lr  ltpb  ltps   ppl  token_acc  total_train_updates   tpb  tps  ups
   .7416 60.98  1628 26.69  267    5.2 1.211   1     2 53.39 3.356      .7640                 5214 62.98 1681 26.7

( Suspended with Ctrl-Z )

^Z
zsh: suspended  python parlai/scripts/train_model.py -m seq2seq -t babi:Task10k:1 -mf 
ocean@AfoGuardMacBook-Pro ParlAI % 
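
In the seq2seq report rows above, the ppl column tracks the exponential of the loss column (for example, loss 1.307 and ppl 3.694), which is the usual definition of perplexity. The following is a minimal sketch that pairs a header row with its value row and checks that relationship. The helper name `parse_metrics` is hypothetical and not part of ParlAI, and treating ppl as exp(loss) is an assumption inferred from the logged numbers.

```python
import math
from typing import Dict


def parse_metrics(header: str, values: str) -> Dict[str, float]:
    """Pair whitespace-separated column names with their numeric values.

    Hypothetical helper for illustration; it is not part of ParlAI.
    """
    names = header.split()
    numbers = [float(v) for v in values.split()]
    if len(names) != len(numbers):
        raise ValueError("header and value rows have different column counts")
    return dict(zip(names, numbers))


# Header and value rows copied from the 22:33:28 report above.
HEADER = "clip  ctpb  ctps  exps  exs  gnorm  loss  lr  ltpb  ltps   ppl  token_acc  total_train_updates   tpb  tps   ups"
VALUES = "1 60.99  1580 25.91  260  5.732 1.307   1     2 51.81 3.694      .5673                  260 62.99 1632 25.91"

if __name__ == "__main__":
    row = parse_metrics(HEADER, VALUES)
    # The logged loss is rounded, so exp(loss) matches ppl only approximately.
    print(f"loss={row['loss']} ppl={row['ppl']} exp(loss)={math.exp(row['loss']):.3f}")
```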

Next steps

I was able to run the sample code in my local environment.

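Next, I want to understand how ParlAI is organized, with two AIs, one acting as teacher and one as student, learning repeatedly inside an environment, and then train models and run inference on various datasets on Google Colaboratory.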

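The following article tries training Blenderbot, newly released by Facebook, in a Google Colaboratory environment.

"Running Facebook's SOTA chatbot on Colab, and making it speak Japanese while I was at it"

There also seems to be something called Light, which is apparently a text game.

@alphasoba, "I'm a loner, so I play by talking to my PC. ParlAI Light."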