TopicModel: Visualizing Topic Models



http://blog.csdn.net/pipisorry
Browse LDA Topic Models: this package allows you to create a set of HTML files to browse a topic model. It creates a word cloud and a time graph per topic, and annotates a selection of documents with the topic of each word.
Installation
Enter the following at the R command line:

if (!require(devtools)) {install.packages("devtools"); library(devtools)}
install_github("vanatteveldt/topicbrowser")
library(topicbrowser)

Rtools:
Loading required package: devtools
WARNING: Rtools is required to build R packages, but is not currently installed.
Please download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
> find_rtools()
[1] TRUE
...
Notes:
1. With R 3.3.2 and Rtools 3.3 installed, I got an error; install Rtools 3.1 instead. Also, …!!!

[http://cran.r-project.org/bin/windows/Rtools/]
2. Make sure the Rtools version matches your R version. Output when everything installs without errors:
> if (!require(devtools)) {install.packages("devtools"); library(devtools)}
> install_github("vanatteveldt/topicbrowser")
Downloading github repo vanatteveldt/topicbrowser@master
Installing topicbrowser
"C:/PROGRA~1/R/R-32~1.0/bin/x64/R" --vanilla CMD INSTALL  \
  "C:/Users/pi/AppData/Local/Temp/RtmpcvsU6M/devtools11d0fc638d5/vanatteveldt-topicbrowser-cfa62a3"  \
  --library="C:/Users/pi/Documents/R/win-library/3.2" --install-tests 

* installing *source* package 'topicbrowser' ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (topicbrowser)
Reloading installed topicbrowser
> library(topicbrowser)
>
Creating a topic browser
1. First, install the R topicmodels package:
> install.packages("topicmodels")
Installing package into ‘C:/Users/pi/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/topicmodels_0.2-1.zip'
Content type 'application/zip' length 1308321 bytes (1.2 MB)
downloaded 1.2 MB

package ‘topicmodels’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\pi\AppData\Local\Temp\RtmpcvsU6M\downloaded_packages
[How can I install topicmodels package in R?]
2. To create a topic browser, you need to have:
 A model fit using topicmodels::LDA
 The set of original tokens used to create the document-term matrix, and the document ids these tokens are from
 The metadata of the documents, containing aid, headline, and date
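The first two requirements are linked: the document-term matrix an LDA model is fit on is a count of each token per document id. The Python sketch below is not part of topicbrowser; it only illustrates that relationship, with plain lists standing in for the `tokens$lemma` and `tokens$aid` columns. The function name `document_term_matrix` is hypothetical.

```python
from collections import Counter, defaultdict


def document_term_matrix(tokens, doc_ids):
    """Count how often each token occurs in each document.

    tokens and doc_ids are parallel lists: tokens[i] belongs to doc_ids[i].
    Returns (docs, terms, rows), where rows[d][t] is the number of times
    terms[t] occurs in docs[d].
    """
    if len(tokens) != len(doc_ids):
        raise ValueError("tokens and doc_ids must have the same length")
    counts = defaultdict(Counter)
    for doc, tok in zip(doc_ids, tokens):
        counts[doc][tok] += 1
    docs = sorted(counts)            # one row per document id
    terms = sorted(set(tokens))      # one column per distinct token
    rows = [[counts[d][t] for t in terms] for d in docs]
    return docs, terms, rows


# Toy data (hypothetical): two documents with ids 1 and 2.
docs, terms, rows = document_term_matrix(["it", "be", "we", "task", "be"],
                                         [1, 1, 1, 2, 2])
print(docs, terms, rows)
```

Because the matrix is derived from the tokens, passing the same token and document-id columns to the browser lets it map every original word back to the model's vocabulary.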
Note:
Solution for the error "Failed with error: ‘package ‘topicmodels’ was built before R 3.0: please re-install it’": run the following sequence of commands from the R console:
require(devtools)
install_url("http://cran.r-project.org/src/contrib/topicmodels_0.2-1.tar.gz")
require(topicmodels)
ls("package:topicmodels")
[Failed with error: ‘package ‘sentiment’ was built before R 3.0: please re-install it’]
[topicmodels: Topic models]
[topicmodels: An R Package for Fitting Topic Models]
However, installing the topicmodels R package this way produces an error:
ERROR: compilation failed for package 'topicmodels'
3. The provided data file 'sotu' contains this data from the State of the Union address. Make sure that the tokens are sorted in the order they appeared in the article:
> data(sotu)
> tokens = tokens[order(tokens$aid, tokens$id), ]

> class(m)
[1] "LDA_Gibbs"
attr(,"package")
[1] "topicmodels"

> head(tokens)
         aid      lemma       word sentence  pos offset id pos1 freq
20 111541965         it         It        1  PRP      0  1    O    1
10 111541965         be         is        1  VBZ      3  2    V    1
40 111541965         we        our        1 PRP$      6  3    O    1
39 111541965 unfinished unfinished        1   JJ     10  4    A    1
32 111541965       task       task        1   NN     21  5    N    1
38 111541965         to         to        1   TO     26  6    ?    1
> head(meta)
         id       date   medium     headline
1 111541965 2013-02-12 Speeches Barack Obama
2 111541995 2013-02-12 Speeches Barack Obama
3 111542001 2013-02-12 Speeches Barack Obama
4 111542006 2013-02-12 Speeches Barack Obama
5 111542013 2013-02-12 Speeches Barack Obama
6 111542018 2013-02-12 Speeches Barack Obama
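Step 3 sorts the tokens with `tokens[order(tokens$aid, tokens$id), ]`, that is, by article id and then by the token's position id, so each article's tokens are contiguous and in reading order (presumably what lets the browser rebuild the annotated text). For readers preparing the data outside R, the Python sketch below applies the same ordering to a list of dicts. The row layout, the function name `sort_tokens`, and the third toy row are hypothetical.

```python
from operator import itemgetter


def sort_tokens(rows):
    """Return token rows ordered by article id, then by position within the article.

    Same ordering as R's tokens[order(tokens$aid, tokens$id), ]. Python's sort
    is stable, so rows with identical (aid, id) keep their original relative order.
    """
    return sorted(rows, key=itemgetter("aid", "id"))


rows = [
    {"aid": 111541995, "id": 1, "word": "next"},  # hypothetical toy row
    {"aid": 111541965, "id": 2, "word": "is"},
    {"aid": 111541965, "id": 1, "word": "It"},
]
print([r["word"] for r in sort_tokens(rows)])
```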
4. With these data, you can create a topic browser as follows:
output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
## Writing html to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
## Preparing variables
## Rendering overview
## Rendering topic 1
## Rendering topic 2
## Rendering topic 3
## Rendering topic 4
## Rendering topic 5
## Rendering topic 6
## Rendering topic 7
## Rendering topic 8
## Rendering topic 9
## Rendering topic 10
## HTML written to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
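Per the package description, each rendered topic page includes a word cloud and a time graph. How topicbrowser computes that graph is not shown in this post; the Python sketch below is one plausible aggregation, assuming you have a topic assignment per token (as a Gibbs-sampled LDA model provides) and the `date` column of the metadata. The function name `topic_time_series` and the toy data are hypothetical.

```python
from collections import Counter


def topic_time_series(doc_ids, topics, doc_dates):
    """Count tokens assigned to each topic per date.

    doc_ids[i] and topics[i] describe token i; doc_dates maps a document id
    to its date string (e.g. from the metadata's 'date' column).
    Returns {topic: {date: token_count}} with dates in ascending order.
    """
    series = {}
    for doc, topic in zip(doc_ids, topics):
        date = doc_dates[doc]
        series.setdefault(topic, Counter())[date] += 1
    return {t: dict(sorted(c.items())) for t, c in series.items()}


# Toy data (hypothetical): five tokens from two documents, topics 3 and 5.
print(topic_time_series([1, 1, 2, 2, 2], [3, 5, 3, 3, 5],
                        {1: "2013-02-12", 2: "2014-01-28"}))
```

Raw counts favour long documents; dividing by the total tokens per date is a common variation if you want topic shares instead.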
You can also publish the output file directly using markdown::rpubsUpload:
library(markdown)
result = rpubsUpload("Example topic browser", output)
browseURL(result$continueUrl)
See [the example](http://rpubs.com/vanatteveldt/topicbrowser) for a collection of State of the Union addresses.
[vanatteveldt/topicbrowser]
All code:
#download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
find_rtools()
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
#install_github("vanatteveldt/topicbrowser")
library(topicbrowser)
#install.packages("topicmodels")
library(topicmodels)
topicmodels::LDA
data(sotu)  # loads the example objects used below: the fitted LDA model m, tokens, and meta
tokens = tokens[order(tokens$aid, tokens$id), ]
class(m)
head(tokens)
head(meta)
output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
  :…
Word cloud
Test (simple.py)
1. Download the font you want to use.
2. On Windows, change font_path:
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\DejaVuSansMono.ttf', ranks_only=True).generate(text)
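Step 2 is needed because wordcloud.py falls back to a Linux font path (`/usr/share/fonts/truetype/droid/DroidSansMono.ttf`) unless the `FONT_PATH` environment variable is set, and that file does not exist on Windows. The Python sketch below picks a font file the same way, with an extra Windows branch; the helper name `resolve_font_path` is hypothetical, and the Windows path assumes the font from step 1 has been installed there. Its result can be passed as `WordCloud(font_path=...)`.

```python
import os
import sys

# Default used by wordcloud.py when FONT_PATH is not set (quoted in this post).
LINUX_DEFAULT = "/usr/share/fonts/truetype/droid/DroidSansMono.ttf"
# Font from the simple.py example; assumed to be installed in step 1.
WINDOWS_DEFAULT = r"C:\Windows\Fonts\DejaVuSansMono.ttf"


def resolve_font_path(environ=None, platform=None):
    """Pick a font file: a non-empty FONT_PATH env var first, then a per-platform default."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if environ.get("FONT_PATH"):
        return environ["FONT_PATH"]
    return WINDOWS_DEFAULT if platform.startswith("win") else LINUX_DEFAULT


print(resolve_font_path())
```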
In the source (wordcloud.py):
FONT_PATH = os.environ.get("FONT_PATH", "/usr/share/fonts/truetype/droid/DroidSansMono.ttf")
STOPWORDS = set([x.strip() for x in open(os.path.join(os.path.dirname(__file__), 'stopwords')).read().split('\n')])
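The `STOPWORDS` line above reads a one-word-per-line file; those words are dropped before word frequencies are counted, and the frequencies decide how large each word is drawn. The Python sketch below shows that filtering and counting in simplified form. It is not the wordcloud library's own tokenizer (which handles more cases), and the function names `load_stopwords` and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter


def load_stopwords(raw):
    """Parse one-word-per-line stopword file contents, skipping blank lines."""
    return {line.strip() for line in raw.split("\n") if line.strip()}


def word_frequencies(text, stopwords, top_n=10):
    """Lower-case the text, split it into words, drop stopwords, count the rest.

    Returns the top_n (word, count) pairs, most frequent first.
    """
    stop = {w.lower() for w in stopwords}
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop).most_common(top_n)


stop = load_stopwords("it\nis\nour\n")
print(word_frequencies("It is our unfinished task. Our task!", stop))
```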
