ESPnet speech processing (TTS): working notes (work in progress)


ESPNET-TTS

ESPnet is an end-to-end speech processing toolkit, focused mainly on end-to-end speech recognition and end-to-end text-to-speech.

Testing on Colab

https://colab.research.google.com/github/espnet/notebook/blob/master/tts_realtime_demo.ipynb#scrollTo=C1a5CgX1AHXJ
The demo can be run from the notebook above.

Two demos are provided: an English demo and a Japanese demo.

English demo

Download pretrained models
You can select one from three models. Please only run the selected model cells.

Just work through these cells in order:

install
[Tacotron2] [Transformer] [FastSpeech]
Setup
Synthesis

You can choose one of three models: [Tacotron2], [Transformer], or [FastSpeech].
Enter any text in the Synthesis cell and it is converted to speech.

As a trial, I generated speech with Tacotron2.

This is a computer
https://soundcloud.com/jg1-wwk/e2e-tts-en-lang-test-this-is-a-computer

You can listen to the result at the link above.

Japanese demo

Install Japanese dependencies
(a) Tacotron 2
(b) Transformer (this one can also be selected)
Setup
Synthesis

Again, it was just a matter of running the cells in order.

The test audio is here:

https://soundcloud.com/jg1-wwk/e2e-tts-demo-jp-lang-test
"計算機最上川" KEISANKIMOGAMIGAWA

Testing on a real machine (on-premises)

Colab runs in the cloud, so it comes with some restrictions.
Here I try running inference on an on-premises machine instead.

Python environment: miniconda + Python 3.6
CUDA version: 10.2

I added the following paths to ~/.bashrc:

export CUDAROOT=/usr/local/cuda
export PATH=$CUDAROOT/bin:$PATH
export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH
export CFLAGS="-I$CUDAROOT/include $CFLAGS"
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
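
A quick way to confirm that these exports are actually visible in a new shell is to look them up from Python. This is only a sketch: `EXPECTED_VARS` and `missing_cuda_vars` are names made up for this memo, not part of ESPnet, and the check only tells you whether each variable is set, not whether the path it points to is correct.

```python
import os

# Names of the variables exported in ~/.bashrc above.
EXPECTED_VARS = ["CUDAROOT", "PATH", "LD_LIBRARY_PATH", "CFLAGS", "CUDA_HOME", "CUDA_PATH"]


def missing_cuda_vars(env=None):
    """Return the expected variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_cuda_vars()
    print("missing:", missing if missing else "none")
```

Run it from a freshly opened terminal (so that ~/.bashrc has been re-read); an empty list means all six variables are set.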

System packages, conda environment, and git clone (the text for inference is set in a later step):

sudo apt-get install libsndfile1-dev
sudo apt-get install libprotobuf9v5 protobuf-compiler libprotobuf-dev
conda create -n espnet python=3.7
conda activate espnet
git clone git@github.com:espnet/espnet.git

Installing Kaldi

Method 1

cd tools
make KALDI=/home/rocm/miniconda3/envs/esp/bin/python

Method 2 (CPU only)

cd tools
make CUPY_VERSION='' -j 10

Installation check

make check_install

The errors would not go away, so the environment setup did not succeed:

ERROR: tensorboardx 1.9 has requirement protobuf>=3.8.0, but you'll have protobuf 3.0.0 which is incompatible.
Installing collected packages: PyYAML, filelock, typing, typing-extensions, chainer, configargparse, editdistance, funcsigs, more-itertools, zipp, importlib-metadata, inflect, nltk, distance, g2p-en, h5py, jaconv, kaldiio, scipy, librosa, matplotlib, pandas, attrs, pyrsistent, jsonschema, stempeg, pyaml, musdb, museval, bottleneck, nara-wpe, pysptk, sklearn, fastdtw, nnmnkwii, pillow, pystoi, pytorch-wpe, sentencepiece, tensorboardX, torch-complex, unidecode, espnet
  Found existing installation: PyYAML 3.12
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Makefile:70: recipe for target 'espnet.done' failed
make: *** [espnet.done] Error 1

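It looks like two errors are occurring at the same time: pip cannot uninstall the existing PyYAML in order to reinstall it, and tensorboardX has a version conflict with the installed protobuf.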
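
The tensorboardX message is a plain version constraint: protobuf 3.0.0 is installed, but 3.8.0 or newer is required. The sketch below shows the comparison with the two versions from the log. `parse_version` and `satisfies_minimum` are hypothetical helpers written for this memo, and they assume purely numeric dotted versions (no `rc1`, `.post1`, and so on), so they are not a substitute for pip's own resolver.

```python
def parse_version(text):
    """Turn a plain dotted version such as '3.8.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric versions only)."""
    return parse_version(installed) >= parse_version(minimum)


# Versions taken from the pip error above: protobuf found vs. protobuf required.
print(satisfies_minimum("3.0.0", "3.8.0"))  # False
```

Comparing tuples of integers rather than raw strings matters here: as strings, "3.11.2" sorts before "3.8.0", which would wrongly flag a newer protobuf as too old.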

Inference test

(has not run to completion yet)

cd ..
cd ./egs/ljspeech/tts1/
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example.txt
../../../utils/synth_wav.sh --vocoder_models ljspeech.wavenet.mol.v1 example.txt

make more or less finished, so I attempted a test run, but this is what happened:

$ ../../../utils/synth_wav.sh --models ljspeech.fastspeech.v1 example.txt
--2019-12-02 18:42:25--  https://drive.google.com/uc?export=download&id=17RUNFLP4SSTbGA01xWRJo7RkR876xM0i
Resolving drive.google.com (drive.google.com)... 2404:6800:4004:800::200e, 216.58.197.142
Connecting to drive.google.com (drive.google.com)|2404:6800:4004:800::200e|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'decode/download/ljspeech.fastspeech.v1/kDGmpO..tar.gz'

decode/download/ljspeech.fastspeech.v1/     [ <=>                                                                          ]   3.21K  --.-KB/s    in 0s

2019-12-02 18:42:26 (32.3 MB/s) - 'decode/download/ljspeech.fastspeech.v1/kDGmpO..tar.gz' saved [3292]


gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3292    0  3292    0     0   9119      0 --:--:-- --:--:-- --:--:--  9093
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   388    0   388    0     0   1158      0 --:--:-- --:--:-- --:--:--  1158
100 92.1M    0 92.1M    0     0  35.5M      0 --:--:--  0:00:02 --:--:-- 55.0M
conf/tuning/train_fastspeech.v1.yaml
conf/decode.yaml
data/train_no_dev/cmvn.ark
exp/train_no_dev_pytorch_train_fastspeech.v1/results/model.last1.avg.best
exp/train_no_dev_pytorch_train_fastspeech.v1/results/model.json
data/lang_1char/train_no_dev_units.txt
Sucessfully downloaded zip file from https://drive.google.com/open?id=17RUNFLP4SSTbGA01xWRJo7RkR876xM0i
stage 0: Data preparation
/home/rocm/espnet/egs/ljspeech/tts1/../../../utils/data2json.sh --trans_type char decode/example/data decode/download/ljspeech.fastspeech.v1/data/lang_1char/train_no_dev_units.txt
Traceback (most recent call last):
  File "/home/rocm/espnet/egs/ljspeech/tts1/../../../utils/merge_scp2json.py", line 15, in <module>

    from espnet.utils.cli_utils import get_commandline_args
ModuleNotFoundError: No module named 'espnet'
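
The first failure in this log ("gzip: stdin: not in gzip format") comes right after wget saved a 3,292-byte file of type text/html under a .tar.gz name, so what tar received was presumably an HTML page rather than the archive. The second download (92.1M via curl) then succeeded. A minimal sketch for checking a downloaded file before unpacking it is to look at the gzip magic number; `looks_like_gzip` and `write_temp` are helper names invented for this memo, and the temporary files only stand in for real downloads.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def write_temp(data):
    """Write bytes to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Stand-ins for a real archive and for an HTML page saved as .tar.gz.
real = write_temp(gzip.compress(b"model weights"))
html = write_temp(b"<!DOCTYPE html><html>...</html>")
print(looks_like_gzip(real), looks_like_gzip(html))  # True False
os.remove(real)
os.remove(html)
```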

At present it does not run to completion.
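
The final `ModuleNotFoundError: No module named 'espnet'` suggests that the Python interpreter running utils/merge_scp2json.py cannot see an installed espnet package, which would be consistent with the `espnet.done` target failing during make. This is my guess, not something I have confirmed. A small sketch to check which interpreter is in use and whether it can locate the module (`module_available` is a hypothetical helper name):

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which python is actually running, e.g. the conda env or the system one.
    print("interpreter:", sys.executable)
    print("espnet importable:", module_available("espnet"))
```

Running this with the same `python` that the recipe scripts pick up shows whether the conda environment and the ESPnet install line up.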

todo

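I have tried quite a lot of things, but TTS still does not run to completion, so I may have to change my approach.
Get TTS working in the on-premises environment. (It will probably be necessary to try the Dockerfile or something similar.)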