Choosing Among Common Visualization Frameworks for Jupyter

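For data scientists whose technology stack is Python, Jupyter is the standard reporting tool. In the R community the well-known ggplot2 is the usual visualization framework, but many people are less familiar with the options for interactive report visualization built around Python and Jupyter. This article compares several common solutions to make the choice easier.
Selection criteria
Declarative or imperative
The charts that data practitioners use fall into three categories: GIS visualization, network visualization, and statistical charts. In most scenarios we do not want to deal with very low-level commands based on points, lines, and areas, so choosing a well-encapsulated framework matters.
The widely recognized model for good encapsulation is the book "The Grammar of Graphics (Statistics and Computing)", and R's ggplot2 is essentially a good implementation of it. Its plotting commands can be used almost like natural language. Borrowing a term from computer science, we call this style of plotting "declarative".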
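Conversely, in the following cases you may not want this style of plotting command:

The chart is quite simple and rendering speed matters; large frameworks tend to be heavy (relatively speaking, of course).

You want to fine-tune details very precisely; large frameworks tend to make fine-tuning complicated, or fall back to imperative commands for it.

You are an innovator in statistical graphics and want to try new visualization practices.

In these cases a framework that offers simple operations and the lowest-level drawing commands is clearly the better fit. By analogy with the above, we borrow the term "imperative" to describe such frameworks.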
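The distinction can be sketched with a small example. This is only an illustration under stated assumptions: the data values are made up, and `declarative_scatter` is a hypothetical helper written for this sketch, not part of matplotlib or any other library. The imperative function spells out grouping, colours, and labels step by step, while the declarative one only takes a mapping from fields to visual channels.

```python
import matplotlib.pyplot as plt

# Toy data, made up for illustration: weight, fuel economy, gear count
cars = [
    {"wt": 2.6, "mpg": 21.0, "gear": 4},
    {"wt": 3.2, "mpg": 21.4, "gear": 3},
    {"wt": 3.4, "mpg": 18.7, "gear": 3},
    {"wt": 2.3, "mpg": 22.8, "gear": 4},
    {"wt": 2.1, "mpg": 26.0, "gear": 5},
]


def imperative_scatter(rows):
    """Imperative style: grouping, colours, and labels are chosen by hand."""
    fig, ax = plt.subplots()
    colors = {3: "tab:blue", 4: "tab:orange", 5: "tab:green"}
    for gear, color in colors.items():
        subset = [r for r in rows if r["gear"] == gear]
        ax.scatter([r["wt"] for r in subset], [r["mpg"] for r in subset],
                   color=color, label=f"gear={gear}")
    ax.set_xlabel("wt")
    ax.set_ylabel("mpg")
    ax.legend()
    return fig


def declarative_scatter(rows, x, y, color):
    """Declarative style (hypothetical helper): the caller only states which
    field maps to which channel; groups and labels are derived from the data."""
    fig, ax = plt.subplots()
    for level in sorted({r[color] for r in rows}):
        subset = [r for r in rows if r[color] == level]
        ax.scatter([r[x] for r in subset], [r[y] for r in subset],
                   label=f"{color}={level}")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return fig


fig_imperative = imperative_scatter(cars)
fig_declarative = declarative_scatter(cars, x="wt", y="mpg", color="gear")
```

Both calls produce a scatter plot with one group per gear count. The declarative call keeps working unchanged if a new gear value appears in the data, whereas the imperative version needs its colour table edited.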
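Interactive or not
Unlike traditionally delivered static charts, a major advantage of the web-based Jupyter is that it can draw interactive charts (recently also available in R Notebook), so whether to go interactive is another point to consider.
Advantages of interactive charts:

They can present more data dimensions and more information.

Users can perform more operations on their side, such as zooming, selecting, and exporting.

The corresponding JavaScript code can be handed to BI engineers to turn into a production feature.

They look impressive; whether that matters depends on who will receive the report.

Advantages of non-interactive charts:

When the report is exported directly to a static file, no information is lost in the conversion.

Images can be separated from the report and reused as standalone deliverables when needed.

Running the Notebook does not require loading a large amount of front-end framework code.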
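As a minimal sketch of the second point, a static matplotlib figure can be written to image files and reused outside the report. The file names and the temporary output directory here are assumptions made for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file-only backend; no notebook or display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 256)
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), label="cos")
ax.plot(x, np.sin(x), label="sin")
ax.legend()

# Hypothetical output location: a fresh temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "trig.png")
svg_path = os.path.join(out_dir, "trig.svg")

fig.savefig(png_path, dpi=150)  # raster image for slides or documents
fig.savefig(svg_path)           # vector image that scales without blurring
```

The saved files carry everything the chart shows, which is why exporting a report of static charts loses nothing.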

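Kernel interaction or not
Most commands in Jupyter get their data through the Notebook's exchange with the kernel, and most plotting approaches simply display the output produced after code in the Notebook has been run by the kernel. The ipywidgets framework goes further: it binds code in a Code Cell to front-end controls in the Notebook (buttons and so on), so that the controls drive the kernel and produce different plots. In some plotting frameworks every chart element can interact with the kernel directly.

These frameworks let you build more complex visualization applications inside a Notebook. Because they depend on the kernel, however, the interactions stop working when the report is delivered or viewed as an offline file.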
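A minimal sketch of this binding, assuming ipywidgets is installed and the code runs in a live Notebook. The function names `make_wave_figure`, `show_wave`, and `build_ui` are made up for this example; `interact` and `FloatSlider` are the ipywidgets calls that create the slider and re-run the plotting function in the kernel whenever it moves.

```python
import matplotlib.pyplot as plt
import numpy as np


def make_wave_figure(freq=1.0):
    """Build a sine-wave figure for the given frequency."""
    x = np.linspace(0, 2 * np.pi, 200)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(freq * x))
    ax.set_title(f"sin({freq:g} x)")
    return fig


def show_wave(freq=1.0):
    """Callback for the slider: rebuild the figure and display it."""
    make_wave_figure(freq)
    plt.show()


def build_ui():
    """Call this from a notebook cell; it needs ipywidgets and a running kernel."""
    from ipywidgets import FloatSlider, interact

    # Each slider movement sends the new value to the kernel, which re-runs show_wave
    return interact(show_wave, freq=FloatSlider(min=0.5, max=5.0, step=0.5, value=1.0))
```

Calling `build_ui()` in a notebook cell renders the slider above the plot. In an exported HTML file the slider has no kernel to talk to, which is the limitation described above.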
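Framework overview
matplotlib
The best-known plotting framework is matplotlib, which provides the low-level commands underlying most static plotting frameworks in Python. By the classification above, matplotlib is a non-interactive, imperative plotting framework.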

## matplotlib    
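import numpy as np
import matplotlib.pyplot as plt

# Sample 256 points over [-pi, pi]
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
C, S = np.cos(X), np.sin(X)

# Draw cosine and sine on the same axes
plt.plot(X, C)
plt.plot(X, S)

plt.show()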

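Its advantages are that it is relatively fast and exposes many low-level operations. Its drawbacks are verbose syntax and unattractive built-in default styles. In Jupyter, matplotlib needs some configuration to look its best; see this article for details.

ggplot and plotnine
For people migrating from R, ggplot and plotnine are good news: they clone essentially all of ggplot2's grammar. Compared side by side, plotnine gives the better results. Both packages are built on matplotlib, so remember to run %matplotlib inline before using them. plotnine has also carried over ggplot2's well-designed configuration syntax and logic.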

## plotnine  
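from plotnine import ggplot, aes, geom_point, stat_smooth, facet_wrap
from plotnine.data import mtcars  # sample dataset bundled with plotnine

(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))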
 + geom_point()
 + stat_smooth(method='lm')
 + facet_wrap('~gear'))

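seaborn
Strictly speaking, seaborn is an extension package built on matplotlib. It adds many very useful wrappers on top of it and covers most statistical plotting needs. matplotlib plus seaborn handles most business scenarios, and the syntax is also more declarative.
The drawback is the high level of encapsulation: charts that the API does not provide essentially cannot be drawn, and it is poorly suited to combining different kinds of charts. In addition, the configuration syntax falls back to the imperative style, which is comparatively complex and inconsistent.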

## seaborn  
import seaborn as sns; sns.set(color_codes=True)
iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)

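plotly
plotly is a cross-platform JavaScript interactive graphics package. Because its developers' core product is JavaScript, the overall syntax resembles writing a JSON configuration, and its character sits between declarative and imperative. The version that does not use the hosted service is free.
Its advantages are a modest learning curve and code that can be ported quickly to the JavaScript version. Its drawback is that the syntax is relatively verbose.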

##plotly  
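# `plotly.plotly` moved to the separate chart_studio package in plotly 4;
# the offline module renders in the notebook without an account.
import plotly.offline as py
import plotly.graph_objs as go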

# Add data
month = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
         'August', 'September', 'October', 'November', 'December']
high_2000 = [32.5, 37.6, 49.9, 53.0, 69.1, 75.4, 76.5, 76.6, 70.7, 60.6, 45.1, 29.3]
low_2000 = [13.8, 22.3, 32.5, 37.2, 49.9, 56.1, 57.7, 58.3, 51.2, 42.8, 31.6, 15.9]
high_2007 = [36.5, 26.6, 43.6, 52.3, 71.5, 81.4, 80.5, 82.2, 76.0, 67.3, 46.1, 35.0]
low_2007 = [23.6, 14.0, 27.0, 36.8, 47.6, 57.7, 58.9, 61.2, 53.3, 48.5, 31.0, 23.6]
high_2014 = [28.8, 28.5, 37.0, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9]
low_2014 = [12.7, 14.3, 18.6, 35.5, 49.9, 58.0, 60.0, 58.6, 51.7, 45.2, 32.2, 29.1]

# Create and style traces
trace0 = go.Scatter(
    x = month,
    y = high_2014,
    name = 'High 2014',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4)
)
trace1 = go.Scatter(
    x = month,
    y = low_2014,
    name = 'Low 2014',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,)
)
trace2 = go.Scatter(
    x = month,
    y = high_2007,
    name = 'High 2007',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4,
        dash = 'dash') # dash options include 'dash', 'dot', and 'dashdot'
)
trace3 = go.Scatter(
    x = month,
    y = low_2007,
    name = 'Low 2007',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,
        dash = 'dash')
)
trace4 = go.Scatter(
    x = month,
    y = high_2000,
    name = 'High 2000',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4,
        dash = 'dot')
)
trace5 = go.Scatter(
    x = month,
    y = low_2000,
    name = 'Low 2000',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,
        dash = 'dot')
)
data = [trace0, trace1, trace2, trace3, trace4, trace5]

# Edit the layout
layout = dict(title = 'Average High and Low Temperatures in New York',
              xaxis = dict(title = 'Month'),
              yaxis = dict(title = 'Temperature (degrees F)'),
              )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-line')

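Note: to use this framework in Jupyter, call init_notebook_mode() to load the JavaScript framework.

bokeh
bokeh is a fairly promising open-source interactive visualization framework maintained by pydata.
Notably, this framework provides both low-level statements and declarative plotting commands. Its syntax is relatively clear, but its configuration statements still share a problem common to visualization frameworks: they are inconsistent with the declarative commands and lack a sensible structure. Some common interaction effects are also available only through the low-level commands, so quickly building a dashboard or producing a chart is inconvenient.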

## Bokeh  
import numpy as np
import scipy.special

from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file

p1 = figure(title="Normal Distribution (μ=0, σ=0.5)",tools="save",
            background_fill_color="#E8DDCB")

mu, sigma = 0, 0.5

measured = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(-2, 2, 1000)
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2

p1.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
# legend_label replaces the legend= keyword, which was deprecated in Bokeh 1.4
p1.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p1.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p1.legend.location = "center_right"
p1.legend.background_fill_color = "darkgrey"
p1.xaxis.axis_label = 'x'
p1.yaxis.axis_label = 'Pr(x)'



p2 = figure(title="Log Normal Distribution (μ=0, σ=0.5)", tools="save",
            background_fill_color="#E8DDCB")

mu, sigma = 0, 0.5

measured = np.random.lognormal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 8.0, 1000)
pdf = 1/(x* sigma * np.sqrt(2*np.pi)) * np.exp(-(np.log(x)-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((np.log(x)-mu)/(np.sqrt(2)*sigma)))/2

p2.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
p2.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p2.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p2.legend.location = "center_right"
p2.legend.background_fill_color = "darkgrey"
p2.xaxis.axis_label = 'x'
p2.yaxis.axis_label = 'Pr(x)'



p3 = figure(title="Gamma Distribution (k=1, θ=2)", tools="save",
            background_fill_color="#E8DDCB")

k, theta = 1.0, 2.0

measured = np.random.gamma(k, theta, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 20.0, 1000)
pdf = x**(k-1) * np.exp(-x/theta) / (theta**k * scipy.special.gamma(k))
# scipy's gammainc is already the regularized incomplete gamma function,
# so it must not be divided by gamma(k) again
cdf = scipy.special.gammainc(k, x/theta)

p3.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
p3.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p3.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p3.legend.location = "center_right"
p3.legend.background_fill_color = "darkgrey"
p3.xaxis.axis_label = 'x'
p3.yaxis.axis_label = 'Pr(x)'



p4 = figure(title="Weibull Distribution (λ=1, k=1.25)", tools="save",
            background_fill_color="#E8DDCB")

lam, k = 1, 1.25

measured = lam*(-np.log(np.random.uniform(0, 1, 1000)))**(1/k)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 8, 1000)
pdf = (k/lam)*(x/lam)**(k-1) * np.exp(-(x/lam)**k)
cdf = 1 - np.exp(-(x/lam)**k)

p4.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
       fill_color="#036564", line_color="#033649")
p4.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p4.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p4.legend.location = "center_right"
p4.legend.background_fill_color = "darkgrey"
p4.xaxis.axis_label = 'x'
p4.yaxis.axis_label = 'Pr(x)'



output_file('histogram.html', title="histogram.py example")

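# Pass the figures as a list; on Bokeh 3.x use width=/height= instead of plot_width=/plot_height=
show(gridplot([p1, p2, p3, p4], ncols=2, plot_width=400, plot_height=400, toolbar_location=None))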

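bqplot
bqplot is a kernel-interactive visualization framework built on the combination of ipywidgets and d3.js. Syntactically it offers both a matplotlib-like interface and a more highly encapsulated declarative syntax. Its advantage is direct interaction with the kernel, so many controls can be used to manipulate the chart in richer ways. Its drawback is just as direct: in offline documents the charts and controls are not displayed and stop working.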

## bqplot  
import numpy as np
from IPython.display import display
from bqplot import (
    OrdinalScale, LinearScale, Bars, Lines, Axis, Figure
)

size = 20
np.random.seed(0)

x_data = np.arange(size)

x_ord = OrdinalScale()
y_sc = LinearScale()

bar = Bars(x=x_data, y=np.random.randn(2, size),
           scales={'x': x_ord, 'y': y_sc}, type='stacked')
line = Lines(x=x_data, y=np.random.randn(size), scales={'x': x_ord, 'y': y_sc},
             stroke_width=3, colors=['red'], display_legend=True, labels=['Line chart'])

ax_x = Axis(scale=x_ord, grid_lines='solid', label='X')
ax_y = Axis(scale=y_sc, orientation='vertical', tick_format='0.2f',
            grid_lines='solid', label='Y')

Figure(marks=[bar, line], axes=[ax_x, ax_y], title='API Example',
       legend_location='bottom-right')

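Charts for other special needs
Besides statistical charts, network visualization and GIS visualization are also commonly used. Only a brief list is given here.
GIS:

gmap: interactive, uses the Google Maps interface

ipyleaflet: interactive, uses the leaflet interface

Network:

networkx: built on matplotlib

plotly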

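Summary

| Framework | Underlying implementation | Interaction | Syntax style | Language level | Notes | Rating |
| matplotlib | - | None | Imperative | Low-level | Allows complex low-level operations | ★★★ |
| ggplot | matplotlib | None | Declarative | ggplot2-like | plotnine is recommended instead | ★★ |
| plotnine | matplotlib | None | Declarative | ggplot2-like | Full port of ggplot2 | ★★★★★ |
| seaborn | matplotlib | None | Declarative | High-level | Many useful wrappers for statistical charts; not suited to composing charts | ★★★★★ |
| plotly | plotly.js | Front-end interaction | Between imperative and declarative | JavaScript-like | Syntax resembles a JSON configuration | ★★★★ |
| bokeh | - | Front-end interaction | Imperative | Both low-level and high-level | The community has potential | ★★★ |
| bqplot | d3.js | Kernel interaction | Imperative | matplotlib-like low-level plus an encapsulated high-level layer | Kernel interaction | ★★★★ |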
