Trying Speech Recognition with a Custom Dictionary in IBM Watson

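I previously tried Watson's speech recognition, so this time I will try registering a custom dictionary.

This should let the service recognize phrases that it previously got wrong.

Preparation

A Watson Speech to Text (STT) service instance must already exist. As a baseline, I first run recognition on the audio data I created last time, unchanged.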

ubuntu@test1:~$ curl -X POST -u "{username}:{password}" --header "Content-Type: audio/mp3" --data-binary "@rd319_16000.mp3" "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=ja-JP_BroadbandModel" | jq '.results[].alternatives[].transcript' -r
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  779k    0  5100  100  774k    184  28705  0:00:27  0:00:27 --:--:--     0
小川 未明 白
海 と 太陽
笑 わ
調べる
夜 も 寝る
黄金 を
いびき を かいて 寝て
昔 昔
大昔
海 が
初めて
開けて 笑った とき に
太陽 は
目 を 回して 驚いた
かわいい 花屋
人たち を
海 が
飲んで しまおう と
やさしく 光る 太陽 は
魔術
海 を 眠ら した
海 は
昼寝 る
夜 も 眠れ
多く の
いびき を かいて 寝て

The original text is shown below, so I will try registering it as it is.

小川 未明 作

海と太陽
海は昼眠る、夜も眠る、
ごうごう、いびきをかいて眠る。
昔、昔、おお昔
海がはじめて、口開けて、
笑ったときに、太陽は、
目をまわして驚いた。
かわいい花や、人たちを、
海がのんでしまおうと、
やさしく光る太陽は、
魔術で、海を眠らした。
海は昼眠る、夜も眠る。
ごうごう、いびきをかいて眠る。

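Preparing the custom model

Rather than relying on the standard recognition alone, create a separate model for customization and register the data in it.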

ubuntu@test1:~$ curl  -X POST -u "{username}:{password}" --header "Content-Type: application/json" --data "{\"name\": \"Custom_Model1\",  \"base_model_name\": \"ja-JP_BroadbandModel\",\"description\": \"Test Custom model\"}" "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations"
{"customization_id": "xxxxxx"}
ubuntu@test1:~$
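
The hand-escaped JSON in the command above is easy to get wrong. As a minimal sketch, assuming Python is available on the machine, the same request body could be generated with the standard json module. The helper name build_customization_payload is hypothetical; the field names and values are the ones used in the curl command.

```python
import json


def build_customization_payload(name, base_model_name, description=""):
    """Return the JSON request body for creating a custom model.

    The keys mirror the ones passed with --data in the curl command.
    """
    payload = {
        "name": name,
        "base_model_name": base_model_name,
        "description": description,
    }
    # ensure_ascii=False keeps any Japanese text readable in the output.
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(build_customization_payload(
        "Custom_Model1", "ja-JP_BroadbandModel", "Test Custom model"))
```

The printed string could be saved to a file and passed to curl with --data "@file", the same way custom_data.txt is passed later in this article.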

Training the custom dictionary

I created the following data and used it for training.

{
   "words":[
      {
         "word":"2",
         "sounds_like":[
            "ヨルモネムル"
         ],
         "display_as":"夜も眠る、"
      },
     {
         "word":"3",
         "sounds_like":[
            "ゴウゴウイビキヲカイテネムル"
         ],
         "display_as":"ごうごう、いびきをかいて眠る。"
      },
     {
         "word":"4",
         "sounds_like":[
            "ムカシムカシオオムカシ"
         ],
         "display_as":"昔、昔、おお昔。"
      },
     {
         "word":"5",
         "sounds_like":[
            "ウミガハジメテ"
         ],
         "display_as":"海がはじめて、"
      },
     {
         "word":"6",
         "sounds_like":[
            "クチアケテ"
         ],
         "display_as":"口開けて、"
      },
     {
         "word":"7",
         "sounds_like":[
            "ワラッタトキニ"
         ],
         "display_as":"笑ったときに、"
      },
     {
         "word":"8",
         "sounds_like":[
            "タイヨウハ"
         ],
         "display_as":"太陽は、"
      },
     {
         "word":"9",
         "sounds_like":[
            "メヲマワシテオドロイタ"
         ],
         "display_as":"目をまわして驚いた。"
      },
     {
         "word":"10",
         "sounds_like":[
            "カワイイハナヤ"
         ],
         "display_as":"かわいい花や、"
      },
     {
         "word":"11",
         "sounds_like":[
            "ヒトタチヲ"
         ],
         "display_as":"人たちを、"
      },
     {
         "word":"12",
         "sounds_like":[
            "ウミガノンデシマオウト"
         ],
         "display_as":"海がのんでしまおうと、"
      },
     {
         "word":"13",
         "sounds_like":[
            "ヤサシクヒカルタイヨウハ"
         ],
         "display_as":"やさしく光る太陽は、"
      },
     {
         "word":"14",
         "sounds_like":[
            "マジュツデウミヲネムラシタ"
         ],
         "display_as":"魔術で、海を眠らした。"
      },
     {
         "word":"15",
         "sounds_like":[
            "ウミハヒルネムルヨルモネムル"
         ],
         "display_as":"海は昼眠る、夜も眠る。"
      },
     {
         "word":"16",
         "sounds_like":[
            "ゴウゴウキビキヲカイテネムル"
         ],
         "display_as":"ごうごう、いびきをかいて眠る。"
      }
   ]
}

The point to note is that I registered both entries that contain a single phrase and entries that combine two phrases.
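
Writing that many entries by hand is repetitive, so here is a minimal sketch of generating the same structure from a list of (reading, display text) pairs. The function name build_words_payload and the start parameter are hypothetical; the keys word, sounds_like, and display_as and the numeric word labels follow the data file above.

```python
import json


def build_words_payload(entries, start=1):
    """Build the {"words": [...]} structure from (sounds_like, display_as) pairs.

    Each entry is labeled with a running number, as in the data file above.
    """
    words = []
    for number, (sounds_like, display_as) in enumerate(entries, start=start):
        words.append({
            "word": str(number),
            "sounds_like": [sounds_like],
            "display_as": display_as,
        })
    return {"words": words}


if __name__ == "__main__":
    # Two entries copied from the data file above.
    entries = [
        ("ヨルモネムル", "夜も眠る、"),
        ("クチアケテ", "口開けて、"),
    ]
    payload = build_words_payload(entries, start=2)
    print(json.dumps(payload, ensure_ascii=False, indent=3))
```

Keeping the pairs in one list also makes it easier to spot a typo in a katakana reading before the data is uploaded.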
Register the words and train as follows. Check the status: once progress reaches 100 and status becomes available, training is complete.

ubuntu@test1:~$ curl  -X GET -u "{username}:{password}" --header "Content-Type: application/json"  "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/xxxxxx"
{
   "owner": "xxxxxx",
   "base_model_name": "ja-JP_BroadbandModel",
   "customization_id": "xxxxxx",
   "dialect": "ja-JP",
   "created": "2017-09-24T00:29:17.989Z",
   "name": "Custom_Model1",
   "description": "Test Custom model",
   "progress": 0,
   "language": "ja-JP",
   "status": "ready"
}
ubuntu@test1:~$ curl -X POST -u "{username}:{password}"  --header "Content-Type: application/json"  --data "@custom_data.txt" "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/xxxxxx/words"
{}
ubuntu@test1:~$

ubuntu@test1:~$ curl  -X POST -u "{username}:{password}" --header "Content-Type: application/json" --data "{}"  "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/xxxxxx/train"
{}
ubuntu@test1:~$ curl  -X GET -u "{username}:{password}" --header "Content-Type: application/json"  "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/xxxxxx"
{
   "owner": "xxxxxx",
   "base_model_name": "ja-JP_BroadbandModel",
   "customization_id": "xxxxxx",
   "dialect": "ja-JP",
   "created": "2017-09-24T00:29:17.989Z",
   "name": "Custom_Model1",
   "description": "Test Custom model",
   "progress": 0,
   "language": "ja-JP",
   "status": "training"
}
ubuntu@test1:~$
ubuntu@test1:~$ curl  -X GET -u "{username}:{password}" --header "Content-Type: application/json"  "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/xxxxxx"
{
   "owner": "xxxxxx",
   "base_model_name": "ja-JP_BroadbandModel",
   "customization_id": "xxxxxx",
   "dialect": "ja-JP",
   "created": "2017-09-24T00:29:17.989Z",
   "name": "Custom_Model1",
   "description": "Test Custom model",
   "progress": 100,
   "language": "ja-JP",
   "status": "available"
}
ubuntu@test1:~$
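
The status checks above were run by hand until the model became available. A minimal sketch of automating that wait is shown below. It assumes a caller-supplied fetch_status function (hypothetical) that performs the GET request and returns the parsed JSON as a dict; the polling interval and attempt limit are arbitrary choices, not values from the service.

```python
import time


def wait_until_available(fetch_status, interval_seconds=10.0,
                         max_attempts=60, sleep=time.sleep):
    """Poll until status is "available" and progress is 100.

    fetch_status: callable returning the customization info as a dict.
    sleep: injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(max_attempts):
        info = fetch_status()
        if info.get("status") == "available" and info.get("progress") == 100:
            return info
        if attempt + 1 < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError("custom model did not become available in time")


if __name__ == "__main__":
    # Simulated responses following the sequence seen in the transcript.
    fake = iter([
        {"status": "training", "progress": 0},
        {"status": "available", "progress": 100},
    ])
    final = wait_until_available(lambda: next(fake), sleep=lambda s: None)
    print(final["status"], final["progress"])
```

In real use, fetch_status would wrap the same GET request to the customizations endpoint that the curl commands issue.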

Now run recognition again, this time specifying the custom model.

ubuntu@test1:~$ curl -X POST -u "{username}:{password}"  --header "Content-Type: audio/mp3" --data-binary "@rd319_16000.mp3" "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=ja-JP_BroadbandModel&customization_id=xxxxxx" | jq '.results[].alternatives[].transcript' -r
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  779k    0  5101  100  774k    209  32619  0:00:24  0:00:24 --:--:--     0
小川 未明 白
海 と 太陽
海 は
調べる
夜も眠る、
高校
いびき を かいて 寝て
昔々
大昔
海 が
初めて
口開けて、 笑ったときに、
太陽は、 は
目をまわして驚いた。
かわいい花や、
人たち を
海 が
飲んで しまおう と
やさしく光る太陽は、
魔術
海 を 眠ら した
海 は
昼寝 る
夜も眠る、
多く の
いびき を かいて 寝て
ubuntu@test1:~$
ubuntu@test1:~$
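
The only difference from the first recognition request is the customization_id query parameter. As a small sketch, the two URLs could be built from one helper so that the baseline and customized runs stay otherwise identical. The function name build_recognize_url is hypothetical; the endpoint and parameter names are the ones used in the curl commands in this article.

```python
from urllib.parse import urlencode

# Endpoint as used in the curl commands in this article.
RECOGNIZE_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"


def build_recognize_url(model, customization_id=None):
    """Return the recognize URL, adding customization_id only when given."""
    params = {"model": model}
    if customization_id:
        params["customization_id"] = customization_id
    return RECOGNIZE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_recognize_url("ja-JP_BroadbandModel"))
    print(build_recognize_url("ja-JP_BroadbandModel", "xxxxxx"))
```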

Accuracy improved compared with the first run, but it is still not 100%.
Perhaps the way I built the custom model was the problem.
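
To put a rough number on "improved", each transcript could be compared with the original text. The sketch below uses difflib.SequenceMatcher from the Python standard library as a character-level similarity score. This is an assumption for illustration, not a standard speech recognition metric, and the helper names normalize and similarity are hypothetical. Whitespace and the punctuation marks 、 and 。 are removed first, because the service separates tokens with spaces and only the custom entries carry punctuation.

```python
import difflib


def normalize(text):
    """Remove whitespace and Japanese punctuation before comparing."""
    without_spaces = "".join(text.split())
    return without_spaces.replace("、", "").replace("。", "")


def similarity(reference, hypothesis):
    """Return a 0.0 to 1.0 character-level similarity ratio."""
    matcher = difflib.SequenceMatcher(
        None, normalize(reference), normalize(hypothesis))
    return matcher.ratio()


if __name__ == "__main__":
    reference = "目をまわして驚いた。"
    # One line from each recognition run in this article.
    print(similarity(reference, "目 を 回して 驚いた"))
    print(similarity(reference, "目をまわして驚いた。"))
```

Applying the same function to the full transcripts from both runs would allow the two results to be compared on the same scale.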

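Author and Source

The original article (Trying Speech Recognition with a Custom Dictionary in Watson) is available at https://qiita.com/sou-kun/items/75339fc9c745ed50a5ae

Attribution: information about the original author is available at the original URL. Copyright belongs to the original author.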
