TopicModel: Visualizing Topic Models


http://blog.csdn.net/pipisorry
Browse LDA Topic Models
This package allows you to create a set of HTML files to browse a topic model. It creates a word cloud and a time graph per topic, and annotates a selection of documents with the topic for each word.
Installation
Enter the following at the R command line:
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
install_github("vanatteveldt/topicbrowser")
library(topicbrowser)
If Rtools is missing, the following warning is printed:
Loading required package: devtools
WARNING: Rtools is required to build R packages, but is not currently installed.
Please download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
> find_rtools()
[1] TRUE
...
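Notes:
1. With R version 3.3.2 and Rtools version 3.3 installed, an error is reported asking for Rtools 3.1 to be installed. The error persisted even after trying again!
[http://cran.r-project.org/bin/windows/Rtools/]
2. When the Rtools and R versions match, the output without errors looks like this: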
> if (!require(devtools)) {install.packages("devtools"); library(devtools)}
> install_github("vanatteveldt/topicbrowser")
Downloading github repo vanatteveldt/topicbrowser@master
Installing topicbrowser
"C:/PROGRA~1/R/R-32~1.0/bin/x64/R" --vanilla CMD INSTALL  \
  "C:/Users/pi/AppData/Local/Temp/RtmpcvsU6M/devtools11d0fc638d5/vanatteveldt-topicbrowser-cfa62a3"  \
  --library="C:/Users/pi/Documents/R/win-library/3.2" --install-tests 

* installing *source* package 'topicbrowser' ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (topicbrowser)
Reloading installed topicbrowser
> library(topicbrowser)
>
Creating a topic browser
1. First, install the topicmodels package for R.
> install.packages("topicmodels")
Installing package into 'C:/Users/pi/Documents/R/win-library/3.2'
(as 'lib' is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/topicmodels_0.2-1.zip'
Content type 'application/zip' length 1308321 bytes (1.2 MB)
downloaded 1.2 MB

package 'topicmodels' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\pi\AppData\Local\Temp\RtmpcvsU6M\downloaded_packages
[How can I install topicmodels package in R?]
2. To create a topic browser, you need to have:
  • A model fit using topicmodels::LDA
  • The set of original tokens used to create the document-term matrix, and the document ids these tokens are from
  • The metadata of the documents, containing aid, headline, and date
  • Note:
    A solution for the problem "Failed with error: 'package 'topicmodels' was built before R 3.0: please re-install it'" is to run the following sequence of commands from the R console:
    require(devtools)
    install_url("http://cran.r-project.org/src/contrib/topicmodels_0.2-1.tar.gz")
    require(topicmodels)
    ls("package:topicmodels")
    
    [Failed with error: 'package 'sentiment' was built before R 3.0: please re-install it']
    [topicmodels: Topic models]
    [topicmodels: An R Package for Fitting Topic Models]
    However, installing the topicmodels R package this way reports an error: ERROR: compilation failed for package 'topicmodels'
    3. The provided data file 'sotu' contains this data from the State of the Union address. Make sure that the tokens are ordered in the way they appeared in the article:
    > data(sotu)
    > tokens = tokens[order(tokens$aid, tokens$id), ]
    
    > class(m)
    [1] "LDA_Gibbs"
    attr(,"package")
    [1] "topicmodels"
    
    
    > head(tokens)
             aid      lemma       word sentence  pos offset id pos1 freq
    20 111541965         it         It        1  PRP      0  1    O    1
    10 111541965         be         is        1  VBZ      3  2    V    1
    40 111541965         we        our        1 PRP$      6  3    O    1
    39 111541965 unfinished unfinished        1   JJ     10  4    A    1
    32 111541965       task       task        1   NN     21  5    N    1
    38 111541965         to         to        1   TO     26  6    ?    1
    > head(meta)
             id       date   medium     headline
    1 111541965 2013-02-12 Speeches Barack Obama
    2 111541995 2013-02-12 Speeches Barack Obama
    3 111542001 2013-02-12 Speeches Barack Obama
    4 111542006 2013-02-12 Speeches Barack Obama
    5 111542013 2013-02-12 Speeches Barack Obama
    6 111542018 2013-02-12 Speeches Barack Obama
    4. With these data, you can create a topic browser as follows:
    output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
    ## Writing html to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
    ## Preparing variables
    ## Rendering overview
    ## Rendering topic 1
    ## Rendering topic 2
    ## Rendering topic 3
    ## Rendering topic 4
    ## Rendering topic 5
    ## Rendering topic 6
    ## Rendering topic 7
    ## Rendering topic 8
    ## Rendering topic 9
    ## Rendering topic 10
    ## HTML written to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
    
    You can also publish the output file directly using markdown::rpubsUpload:
    library(markdown)
    result = rpubsUpload("Example topic browser", output)
    browseURL(result$continueUrl)
    See [the example](http://rpubs.com/vanatteveldt/topicbrowser) for a collection of State of the Union addresses.
    [vanatteveldt/topicbrowser]
    Complete code:
    #download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run 
    find_rtools()
    if (!require(devtools)) {install.packages("devtools"); library(devtools)}
    #install_github("vanatteveldt/topicbrowser")
    library(topicbrowser)
    #install.packages("topicmodels")
    library(topicmodels)
    topicmodels::LDA
    data(sotu)
    tokens = tokens[order(tokens$aid, tokens$id), ]
    class(m)
    head(tokens)
    head(meta)
    output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
    
    :…
    Word cloud
    Test (simple.py)
    1. Download the font to be used.
    2. When running on Windows, change the font_path setting:
    wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\DejaVuSansMono.ttf', ranks_only=True).generate(text)
    (wordcloud.py)
    FONT_PATH = os.environ.get("FONT_PATH", "/usr/share/fonts/truetype/droid/DroidSansMono.ttf")
    STOPWORDS = set([x.strip() for x in open(os.path.join(os.path.dirname(__file__), 'stopwords')).read().split('\n')])
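    The two settings above (a font path that can be overridden through the FONT_PATH environment variable, and a stopword file read one word per line) can be sketched with the standard library alone. This is only an illustration under assumptions: `resolve_font_path` and `load_stopwords` are hypothetical helper names and are not part of the wordcloud package, and the candidate font locations are whatever the caller passes in.

```python
import os


def resolve_font_path(candidates, env_var="FONT_PATH"):
    """Return the first font file that exists, trying the environment override first.

    Hypothetical helper: not part of wordcloud.
    """
    override = os.environ.get(env_var)
    paths = ([override] if override else []) + list(candidates)
    for path in paths:
        if os.path.isfile(path):
            return path
    return None  # no usable font found; the caller decides what to do


def load_stopwords(path):
    """Read one stopword per line, skipping blank lines and stripping whitespace.

    Hypothetical helper: not part of wordcloud.
    """
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}
```

    The value returned by `resolve_font_path` could then be passed as `font_path` to `WordCloud`, so the same script can run on Windows and Linux without editing the path by hand.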