Topic Model Visualization
http://blog.csdn.net/pipisorry
Browse LDA Topic Models
This package allows you to create a set of HTML files to browse a topic model. It creates a word cloud and time-graph per topic, and annotates a selection of documents with the topic for each word.
Installation
Enter the following at the R command line:
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
install_github("vanatteveldt/topicbrowser")
library(topicbrowser)
If Rtools is missing, loading devtools prints a warning:
Loading required package: devtools
WARNING: Rtools is required to build R packages, but is not currently installed.
Please download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
> find_rtools()
[1] TRUE
...
Notes:
1. With R version 3.3.2 installed, Rtools version 3.3 produces an error; Rtools 3.1 has to be installed instead.
[http://cran.r-project.org/bin/windows/Rtools/]
2. Once the Rtools and R versions match, an error-free run looks like this:
> if (!require(devtools)) {install.packages("devtools"); library(devtools)}
> install_github("vanatteveldt/topicbrowser")
Downloading github repo vanatteveldt/topicbrowser@master
Installing topicbrowser
"C:/PROGRA~1/R/R-32~1.0/bin/x64/R" --vanilla CMD INSTALL \
"C:/Users/pi/AppData/Local/Temp/RtmpcvsU6M/devtools11d0fc638d5/vanatteveldt-topicbrowser-cfa62a3" \
--library="C:/Users/pi/Documents/R/win-library/3.2" --install-tests
* installing *source* package 'topicbrowser' ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (topicbrowser)
Reloading installed topicbrowser
> library(topicbrowser)
>
Creating a topic browser
1. First, install the R topicmodels package:
> install.packages("topicmodels")
Installing package into 'C:/Users/pi/Documents/R/win-library/3.2'
(as 'lib' is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/topicmodels_0.2-1.zip'
Content type 'application/zip' length 1308321 bytes (1.2 MB)
downloaded 1.2 MB
package 'topicmodels' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\pi\AppData\Local\Temp\RtmpcvsU6M\downloaded_packages
[How can I install topicmodels package in R?]
2. To create a topic browser, you need to have:
- A model fit using topicmodels::LDA
- The set of original tokens used to create the document term matrix, and the document ids these tokens are from
- The metadata of the documents, containing aid, headline, and date
Note: one suggested fix for the error "Failed with error: 'package 'topicmodels' was built before R 3.0.0: please re-install it'" is to run the following sequence of commands from the R console:
require(devtools)
install_url("http://cran.r-project.org/src/contrib/topicmodels_0.2-1.tar.gz")
require(topicmodels)
ls("package:topicmodels")
[Failed with error: 'package 'sentiment' was built before R 3.0.0: please re-install it']
[topicmodels: Topic models]
[topicmodels: An R Package for Fitting Topic Models]
However, installing the topicmodels R package from source this way fails with:
ERROR: compilation failed for package 'topicmodels'
3. The provided data file 'sotu' contains this data from the State of the Union addresses. Make sure that the tokens are ordered in the way they appeared in the article:
> data(sotu)
> tokens = tokens[order(tokens$aid, tokens$id), ]
> class(m)
[1] "LDA_Gibbs"
attr(,"package")
[1] "topicmodels"
> head(tokens)
aid lemma word sentence pos offset id pos1 freq
20 111541965 it It 1 PRP 0 1 O 1
10 111541965 be is 1 VBZ 3 2 V 1
40 111541965 we our 1 PRP$ 6 3 O 1
39 111541965 unfinished unfinished 1 JJ 10 4 A 1
32 111541965 task task 1 NN 21 5 N 1
38 111541965 to to 1 TO 26 6 ? 1
> head(meta)
id date medium headline
1 111541965 2013-02-12 Speeches Barack Obama
2 111541995 2013-02-12 Speeches Barack Obama
3 111542001 2013-02-12 Speeches Barack Obama
4 111542006 2013-02-12 Speeches Barack Obama
5 111542013 2013-02-12 Speeches Barack Obama
6 111542018 2013-02-12 Speeches Barack Obama
4. With these data, you can create a topic browser as follows:
output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
## Writing html to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
## Preparing variables
## Rendering overview
## Rendering topic 1
## Rendering topic 2
## Rendering topic 3
## Rendering topic 4
## Rendering topic 5
## Rendering topic 6
## Rendering topic 7
## Rendering topic 8
## Rendering topic 9
## Rendering topic 10
## HTML written to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
You can also publish the output file directly using markdown::rpubsUpload:
library(markdown)
result = rpubsUpload("Example topic browser", output)
browseURL(result$continueUrl)
See [the example](http://rpubs.com/vanatteveldt/topicbrowser) for a collection of State of the Union addresses.
[vanatteveldt/topicbrowser]
Complete code:
# download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
find_rtools()
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
#install_github("vanatteveldt/topicbrowser")
library(topicbrowser)
#install.packages("topicmodels")
library(topicmodels)
topicmodels::LDA
data(sotu)
tokens = tokens[order(tokens$aid, tokens$id), ]
class(m)
head(tokens)
head(meta)
output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
:…
Word cloud
Test (simple.py)
1. Download the font you want to use.
2. When running on Windows, change font_path to point at that font:
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\DejaVuSansMono.ttf', ranks_only=True).generate(text)
(wordcloud.py)
FONT_PATH = os.environ.get("FONT_PATH", "/usr/share/fonts/truetype/droid/DroidSansMono.ttf")
STOPWORDS = set([x.strip() for x in open(os.path.join(os.path.dirname(__file__), 'stopwords')).read().split('\n')])
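The wordcloud.py excerpt reads the font path from the FONT_PATH environment variable and falls back to a Linux-only default, which is why the Windows run needs an explicit font_path. The following is a minimal sketch of one way to handle this without editing the library. The helper names resolve_font_path and load_stopwords and the candidate list are assumptions made for illustration; they are not part of the wordcloud package.

```python
import os

# Hypothetical candidate list: adjust it to the fonts actually installed locally.
CANDIDATE_FONTS = [
    r"C:\Windows\Fonts\DejaVuSansMono.ttf",
    "/usr/share/fonts/truetype/droid/DroidSansMono.ttf",
]


def resolve_font_path(candidates=CANDIDATE_FONTS, environ=os.environ):
    """Return FONT_PATH from the environment if set, else the first existing candidate."""
    override = environ.get("FONT_PATH")
    if override:
        return override
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("no usable font found; set FONT_PATH or pass font_path explicitly")


def load_stopwords(path):
    """Read one stopword per line, skipping blank lines and closing the file afterwards."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}
```

The result could then be passed on as WordCloud(font_path=resolve_font_path()).generate(text), so the same script should run on both Windows and Linux as long as one of the candidate fonts exists.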