Launching jupyter nteract on a heroku server (the steps are the same for jupyter notebook)



Motivation

I had been slowly building text-mining and visualization web apps with flask and heroku.
Then one day I came across nteract.

nteract
　https://github.com/nteract/nteract
　https://blog.nteract.io/designing-the-nteract-data-explorer-f4476d53f897
　・Think of it as jupyter notebook with automatic interactive visualization built in.

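"Oh, so you get this much visualization for the same effort as displaying a pandas DataFrame? I have to give this a try... (trying it out)... well, it does have a few weak points...
　Weak points or not, if I can run it as a web app, I can visualize things quickly from anywhere, on any hardware, whenever I want. In some ways that might be more convenient than Google Colaboratory with all its restrictions."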

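So,
I tried running nteract on heroku.

References

＊For some reason there is not much information about nteract out there...

To study machine learning with Python, set up Jupyter Notebook on Heroku's free tier so you can run and share it even from a smartphone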
https://qiita.com/G-awa/items/8530a10cb847e4080df3

Deploy a Jupyter Notebook Online with Voila and Heroku
https://pythonforundergradengineers.com/deploy-jupyter-notebook-voila-heroku.html

Setup

heroku: application server
nteract: runs on heroku
github private repository: file server

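Code

Create the following files and deploy them. Once deployed, open the address you are given and it works.
Note: heroku setup and its explanation are omitted.
Note: passwords and other security-related handling are omitted. I strongly recommend adding them.
Note: a private github repository is used in place of a file server, but the related handling is omitted.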
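
Since the file-server handling is omitted above, here is a minimal sketch of how a notebook might pull one file from a private repository using PyGithub (already listed in requirements.txt). The repository name, file paths, and the `GITHUB_TOKEN` environment variable are hypothetical placeholders, not the author's actual setup.

```python
from pathlib import Path


def open_repo(token, full_name):
    """Return a PyGithub Repository for a name like "your-user/your-private-repo"."""
    from github import Github  # provided by the PyGithub package

    return Github(token).get_repo(full_name)


def download_file(repo, remote_path, local_path):
    """Copy a single file from the repository onto the dyno's local disk."""
    content = repo.get_contents(remote_path)  # a ContentFile when remote_path is a file
    target = Path(local_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content.decoded_content)  # decoded_content is bytes
    return target
```

In a notebook cell this could be used as `repo = open_repo(os.environ["GITHUB_TOKEN"], "your-user/your-private-repo")` followed by `download_file(repo, "data/sample.csv", "data/sample.csv")`, with the token stored as a heroku config var rather than written into the notebook.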

Directory layout

xxxxxx (any name)
 ┣  Procfile
 ┣  requirements.txt
 ┣  start_jupyter
 ┗  (any ipynb files, etc.)

requirements.txt (add or remove packages as needed)

gunicorn==19.9.0
click==7.1.1
Flask==1.1.2
itsdangerous==1.1.0
Jinja2==2.11.2
MarkupSafe==1.1.1
numpy==1.18.3
pandas==1.0.3
plotly==4.6.0
python-dateutil==2.8.1
pytz==2019.3
retrying==1.3.3
six==1.14.0
Werkzeug==1.0.1
xlrd==1.2.0

nteract_on_jupyter
matplotlib
PyGithub

Procfile

web: chmod +x start_jupyter && ./start_jupyter

start_jupyter

#!/usr/bin/env bash
jupyter nteract --no-browser --ip=0.0.0.0 --port="$PORT"

$ cd xxxxxx
$ git init
$ heroku create xxxxxx 
$ git add .
$ git commit -m "first"
$ git push heroku master
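
Because the security handling is left out of the files above, one possible stopgap is to protect the server with a login token. The sketch below only generates a random token. It assumes that nteract_on_jupyter, like the classic notebook server it builds on, picks up the `JUPYTER_TOKEN` environment variable, so please confirm that in your own environment before relying on it.

```python
import secrets


def make_token(n_bytes=32):
    """Return a URL-safe random string that can serve as a login token."""
    return secrets.token_urlsafe(n_bytes)


if __name__ == "__main__":
    # Copy the printed value into: heroku config:set JUPYTER_TOKEN=<value>
    print(make_token())
```

Setting the value with `heroku config:set` keeps the token out of the git repository.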

Example in operation

If you keep an ipynb file with your always-up-to-date preprocessing code on the heroku server, you can run an analysis simply by executing it.
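
As a rough illustration of what the first cell of such a notebook might look like, the sketch below turns on the pandas Table Schema output, which nteract's data explorer uses to render a DataFrame interactively. The sample data is made up and merely stands in for the result of the preprocessing code.

```python
import pandas as pd

# Ask pandas to also emit Table Schema output, which nteract's data explorer renders.
pd.options.display.html.table_schema = True

# Hypothetical sample data standing in for the output of the preprocessing code.
df = pd.DataFrame({"word": ["patent", "claim", "method"], "count": [12, 7, 5]})
df  # as the last expression in a notebook cell, this displays the frame
```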

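Usage notes

Because of how heroku works, data and files you create on the server disappear after a while (the filesystem is discarded whenever the dyno restarts). If there are files you need, remember to save them somewhere outside the heroku server.
(One option is to write code that emails the file to you on every run. That is what I do.)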
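
The email approach could be sketched as follows using only the standard library. This is not the author's actual code: the SMTP host, port, credentials, and addresses are all placeholders you would supply yourself, and STARTTLS on a submission port is an assumption about your mail provider.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(sender, recipient, file_path):
    """Build an email with the given file attached."""
    path = Path(file_path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Heroku output: {path.name}"
    msg.set_content("Attached is a file generated on the heroku server.")
    # Guess the MIME type from the file name, falling back to a generic binary type.
    ctype, _ = mimetypes.guess_type(path.name)
    maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
    msg.add_attachment(path.read_bytes(), maintype=maintype,
                       subtype=subtype, filename=path.name)
    return msg


def send_file(host, port, user, password, sender, recipient, file_path):
    """Send the file through an SMTP server that supports STARTTLS (assumed)."""
    msg = build_message(sender, recipient, file_path)
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Calling `send_file(...)` at the end of a notebook run, with the credentials read from heroku config vars, gets the result off the server before the files are discarded.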

Notes

＊You may also want to install mecab and similar tools as needed.
　A record of getting stuck while installing MeCab on Heroku
　　https://qiita.com/kzuzuo/items/1b3e9c9af57bd4464690

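＊Next, I might switch the following over to an nteract-based version...
　Visualizing similarity between relatively long documents such as patents: tfidf/cluster vis: tfidf-word2vec-clustering visualization
　　https://qiita.com/kzuzuo/items/8a80d8974bf3a7db7e54

＊It also works well to include ipynb files that need no file input or output, such as the following
　e-Gov Law API and XML: extracting articles that contain specific words with Python
　　https://qiita.com/kzuzuo/items/d53ff2e092a69424fea0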

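＊The steps are the same with jupyter notebook instead of nteract (use `jupyter notebook` in place of `jupyter nteract` in start_jupyter).
＊If the goal is a standalone web app, Dash may well be enough.
＊Pandas-Bokeh is apparently another pandas-based interactive visualization option.
　https://github.com/PatrikHlobil/Pandas-Bokeh/blob/master/README.md
＊Cufflinks is another pandas-based interactive visualization option.
　https://medium.com/@ozan/interactive-plots-with-plotly-and-cufflinks-on-pandas-dataframes-af6f86f62d94
→ Apparently plotly can now be called directly from pandas' plot().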

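import pandas as pd
# Needs a plotly version with pandas backend support (4.8 or later, newer than the 4.6.0 pinned above)
pd.options.plotting.backend = "plotly"
df.plot.bar()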

＊And of course, plotly itself is also an option for interactive visualization.

Author And Source

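More information on this topic (Launching jupyter nteract on a heroku server (the steps are the same for jupyter notebook)) can be found at https://qiita.com/kzuzuo/items/7336fa6ac991e7e5e82a

Attribution: information about the original author is available at the original URL. Copyright belongs to the original author.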