Solr: Clustering documents with Carrot2
1. Configure clustering in solrconfig.xml
<searchComponent name="clustering"
                 enable="true"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
  <lst name="engine">
    <str name="name">stc</str>
    <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
  </lst>
  <lst name="engine">
    <str name="name">kmeans</str>
    <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
  </lst>
</searchComponent>
<requestHandler name="/clustering"
                startup="lazy"
                enable="true"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">lingo</str>
    <bool name="clustering.results">true</bool>
    <!-- Field holding the logical "title" of each document (optional) -->
    <str name="carrot.title">content</str>
    <!-- Field holding the logical "URL" of each document (optional) -->
    <str name="carrot.url">id</str>
    <!-- Field holding the logical "content" of each document (optional) -->
    <str name="carrot.snippet">content</str>
    <!-- Apply the highlighter to the title/content fields and cluster on the highlighted summaries. -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- The maximum number of labels per cluster -->
    <!--<int name="carrot.numDescriptions">5</int>-->
    <!-- Produce sub-clusters -->
    <bool name="carrot.outputSubClusters">false</bool>
    <!-- Configure the remaining request handler parameters. -->
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
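The `solr.clustering.ClusteringComponent` class and the Carrot2 libraries are not part of the core Solr webapp. They ship in the clustering contrib, so the component above only loads if solrconfig.xml also points Solr at those jars. The sketch below shows the `<lib>` directives for this, placed at the top level of `<config>`. It assumes the stock Solr 4.x example layout, where paths are resolved relative to the core's instanceDir (for example `example/solr/collection1`). The `dir` values are assumptions to adjust for your installation.

```xml
<!-- Assumption: stock Solr 4.x example layout; dir is relative to the
     core's instanceDir (e.g. example/solr/collection1). -->
<!-- Carrot2 and its third-party dependencies -->
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<!-- The Solr jar that contains ClusteringComponent itself -->
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
```

If Solr reports a `ClassNotFoundException` for the clustering component at startup, these paths are the first thing to check.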
2. Edit clustering/carrot2/lingo-attributes.xml
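For Chinese documents, the change in lingo-attributes.xml is usually the default language that Carrot2 assumes for the input. The sketch below assumes the Carrot2 3.x attribute-set format used by the files under Solr 4.x's `clustering/carrot2` directory. It also assumes the `CHINESE_SIMPLIFIED` value of `org.carrot2.core.LanguageCode`. The `desiredClusterCountBase` value is illustrative only. Verify the attribute keys against the Carrot2 manual for the version you deploy.

```xml
<attribute-sets default="attributes">
  <attribute-set id="attributes">
    <value-set>
      <label>attributes</label>
      <!-- Assumption: treat clustered documents as Simplified Chinese.
           Carrot2's Chinese support relies on Lucene's smartcn analyzer. -->
      <attribute key="MultilingualClustering.defaultLanguage">
        <value type="org.carrot2.core.LanguageCode" value="CHINESE_SIMPLIFIED"/>
      </attribute>
      <!-- Illustrative value: base factor Lingo uses to derive how many
           clusters to create (larger means more clusters). -->
      <attribute key="LingoClusteringAlgorithm.desiredClusterCountBase">
        <value type="java.lang.Integer" value="20"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>
```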
3. Add the Chinese tokenizer jar (lucene-analyzers-smartcn-4.7.0.jar) to the classpath in solrconfig.xml
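In Solr 4.x, Lucene's smartcn analyzer ships in the analysis-extras contrib rather than on the default classpath. It is therefore typically loaded with a `<lib>` directive at the top level of `<config>` in solrconfig.xml. The sketch below assumes the stock Solr 4.7.0 directory layout, with the core's instanceDir three levels below the Solr install (as with `example/solr/collection1`). The `dir` path is an assumption to adjust for your setup.

```xml
<!-- Assumption: dir is relative to the core's instanceDir in the stock
     Solr 4.7.0 layout; it should resolve to the folder that holds
     lucene-analyzers-smartcn-4.7.0.jar. -->
<lib dir="../../../contrib/analysis-extras/lucene-libs/"
     regex="lucene-analyzers-smartcn-\d.*\.jar" />
```

Matching with a regex rather than the exact file name keeps the directive valid across minor Solr upgrades, at the cost of loading whichever smartcn version is present in that folder.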
References
http://wiki.apache.org/solr/ClusteringComponent
http://www.cnblogs.com/tomcattd/archive/2013/08/20/3270143.html
http://carrot2.github.io/solr-integration-strategies/carrot2-3.6.3/index.html