Trying out the Animal AI Olympics environment


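Basic information

There is a video on the top page, but the competition is not about solving the task shown there.
The actual tasks fall into the following 10 categories.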

Food
Preferences
Obstacles
Avoidance
Spatial Reasoning
Generalization
Internal Models
Object Permanence
Advanced Preferences
Causal Reasoning

The rules are here:
http://animalaiolympics.com/rules.html

Rankings are handled on a platform called EvalAI.

EvalAI has a CLI, which is easy to install with pip install evalai.

After you create an EvalAI account you get a token, and the CLI uses that token to log in.

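How to submit for evaluation

On EvalAI, evaluation runs automatically when you push either a script or a Docker image.
Animal AI Olympics uses Docker images. The trained model must be included in the Docker image. (As an aside, it apparently runs on AWS; the Docker image is pushed to ECR.)

The submission procedure is documented here: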
https://github.com/beyretb/AnimalAI-Olympics/blob/master/documentation/submission.md

A sample submission is available at the following location; modify it to build your own.
https://github.com/beyretb/AnimalAI-Olympics/tree/master/examples/submission

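The Dockerfile is where you install the required libraries and so on.
Put the trained model under data, and modify agent.py so that it infers an Action at every step.
The contents of data and agent.py are copied into the Docker image.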
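
To make the role of agent.py more concrete, here is a minimal sketch of an agent that returns one action per step. The class name Agent, the reset(t) and step(obs, reward, done, info) signatures, and the seed argument are assumptions modeled on the general shape of the sample, not a confirmed interface, so check them against the sample agent.py before relying on them. The random policy is only a placeholder for where a trained model loaded from data would be called.

```python
import random


class Agent(object):
    """Placeholder agent: picks a random action at every step.

    Method names and signatures are assumptions; verify them against
    the sample agent.py in the submission example.
    """

    def __init__(self, seed=None):
        # A real agent would load its trained model from the data
        # directory here. This sketch only creates a random generator.
        self._rng = random.Random(seed)
        self.steps_left = 0

    def reset(self, t=250):
        # Assumed to be called at the start of each episode with the
        # episode length t.
        self.steps_left = t

    def step(self, obs, reward, done, info):
        # Replace this with model inference on obs.
        # Each action is [move, turn], with both entries in {0, 1, 2}.
        self.steps_left -= 1
        move = self._rng.randrange(3)
        turn = self._rng.randrange(3)
        return [move, turn]
```

Keeping the policy behind a single step method means the random choice can later be swapped for model inference without touching the rest of the class.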

How to run

$ git clone https://github.com/beyretb/AnimalAI-Olympics.git
$ pip install animalai

Download the environment from the "Environment link", choosing the build for your own platform.

Place the downloaded executable in 'env/' inside the cloned directory.

Run visualizeArena.py.

$ cd AnimalAI-Olympics/examples
$ python visualizeArena.py

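Press c to switch to the first-person view.

Use w, a, s, d to move.

The configs directory contains samples for tasks 1 to 7. These are samples of the evaluation arenas. Tasks 8 to 10 are apparently kept secret.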

$ python visualizeArena.py configs/6-Generalization.yaml

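Notes on the environment

Observation and Action are described in the environment documentation.
The action space is [3, 3]: the first entry means (0: nothing, 1: forward, 2: backward) and the second means (0: nothing, 1: right, 2: left). In other words, an action is two-dimensional.

However, for use through OpenAI Gym a one-dimensional discrete action is needed, so there is a wrapper called AnimalAIEnv. It flattens the [3, 3] action space into a single action space of size 9, as follows: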

0: [0, 0] nothing, nothing
1: [0, 1] nothing, right
2: [0, 2] nothing, left
3: [1, 0] forward, nothing
4: [1, 1] forward, right
5: [1, 2] forward, left
6: [2, 0] backward, nothing
7: [2, 1] backward, right
8: [2, 2] backward, left
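
The table above follows a simple rule: the flat index equals move * 3 + turn. The sketch below reproduces that mapping in both directions. The function names flatten_action and unflatten_action are hypothetical helpers written for illustration; they are not part of the AnimalAIEnv wrapper.

```python
BRANCH_SIZE = 3  # each of the two action branches has 3 choices
MOVE_NAMES = ("nothing", "forward", "backward")
TURN_NAMES = ("nothing", "right", "left")


def flatten_action(move, turn):
    """Map a [move, turn] pair to its flat index in 0..8."""
    if move not in range(BRANCH_SIZE) or turn not in range(BRANCH_SIZE):
        raise ValueError("move and turn must each be 0, 1 or 2")
    return move * BRANCH_SIZE + turn


def unflatten_action(flat):
    """Map a flat index in 0..8 back to a [move, turn] pair."""
    if flat not in range(BRANCH_SIZE * BRANCH_SIZE):
        raise ValueError("flat action must be in 0..8")
    move, turn = divmod(flat, BRANCH_SIZE)
    return [move, turn]


if __name__ == "__main__":
    # Print the same table as above.
    for flat in range(BRANCH_SIZE * BRANCH_SIZE):
        move, turn = unflatten_action(flat)
        print(f"{flat}: [{move}, {turn}] {MOVE_NAMES[move]}, {TURN_NAMES[turn]}")
```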

Author and source

Original article: https://qiita.com/ikeyasu/items/4c93dd9a579fbb7cfeec
