Index Modules - Synonym Token Filter

Synonym Token Filter
The synonym token filter makes it easy to handle synonyms during the analysis process. Synonyms are configured using a configuration file. Here is an example:
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "synonym" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["synonym"]
                }
            },
            "filter" : {
                "synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "analysis/synonym.txt"
                }
            }
        }
    }
}

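The above configures a synonym filter with a path of analysis/synonym.txt (relative to the config location). The synonym analyzer is then configured with the filter. Additional settings are ignore_case (defaults to false) and expand (defaults to true).
The tokenizer parameter controls the tokenizer used to tokenize the synonyms, and defaults to the whitespace tokenizer.
As of elasticsearch 0.17.9, two synonym formats are supported: Solr and WordNet.
Solr synonyms
The following is a sample format of the file: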
# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit

#Equivalent synonyms may be separated with commas and give
#no explicit mapping.  In this case the mapping behavior will
#be taken from the expand parameter in the schema.  This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod

#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz

You can also define synonyms for the filter directly in the configuration file (note the use of synonyms instead of synonyms_path):
{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "synonyms" : [
                "i-pod, i pod => ipod",
                "universe, cosmos"
            ] 
        }
    }
}

However, it is recommended to define large synonym sets in a file using synonyms_path.
WordNet synonyms
Synonyms based on the WordNet format can be declared using format:
{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "format" : "wordnet",
            "synonyms" : [
                "s(100000001,1,'abstain',v,1,0).",
                "s(100000001,2,'refrain',v,1,0).",
                "s(100000001,3,'desist',v,1,0)."
            ]
        }
    }
}
Using synonyms_path to define WordNet synonyms in a file is supported as well.
 
Synonym Token Filter
The synonym token filter makes it easy to handle synonyms during the analysis process. Synonyms are configured using a configuration file. Here is an example:
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "synonym" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["synonym"]
                }
            },
            "filter" : {
                "synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "analysis/synonym.txt"
                }
            }
        }
    }
}

The above configures a synonym filter with a path of analysis/synonym.txt (relative to the config location). The synonym analyzer is then configured with the filter. Additional settings are ignore_case (defaults to false) and expand (defaults to true).
The tokenizer parameter controls the tokenizer used to tokenize the synonyms, and defaults to the whitespace tokenizer.
As of elasticsearch 0.17.9, two synonym formats are supported: Solr and WordNet.
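The settings described above can also be assembled programmatically. The following is a minimal sketch in Python that builds the same settings body and makes the ignore_case, expand and tokenizer options explicit. The helper name build_synonym_settings is hypothetical and only used for illustration; the setting names and defaults are the ones listed in the text.

```python
import json


def build_synonym_settings(synonyms_path, ignore_case=False, expand=True,
                           tokenizer="whitespace"):
    """Return an index settings body for a file-based synonym filter.

    Hypothetical helper: only the setting names and their defaults
    come from the documentation text above.
    """
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    # Analyzer named "synonym" that applies the filter below.
                    "synonym": {
                        "tokenizer": "whitespace",
                        "filter": ["synonym"],
                    }
                },
                "filter": {
                    "synonym": {
                        "type": "synonym",
                        # Path is relative to the config location.
                        "synonyms_path": synonyms_path,
                        "ignore_case": ignore_case,
                        "expand": expand,
                        # Tokenizer used to tokenize the synonym rules.
                        "tokenizer": tokenizer,
                    }
                },
            }
        }
    }


def to_json(settings):
    """Serialise the settings with the indentation used in this document."""
    return json.dumps(settings, indent=4)
```

Calling to_json(build_synonym_settings("analysis/synonym.txt", ignore_case=True)) yields a body equivalent to the example above, with the optional settings spelled out.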
Solr synonyms
The following is a sample format of the file:
# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit

#Equivalent synonyms may be separated with commas and give
#no explicit mapping.  In this case the mapping behavior will
#be taken from the expand parameter in the schema.  This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod

#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
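
To make the rules in the sample file concrete, here is a small Python sketch that models how explicit mappings, equivalent synonyms, the expand parameter and rule merging interact. It is a simplified model written for illustration, not the parser that elasticsearch or Lucene actually uses: it ignores ignore_case, tokenization and any escaping, and the function names are hypothetical.

```python
def split_terms(text):
    """Split a comma-separated list of terms, dropping empty entries."""
    return [t.strip() for t in text.split(",") if t.strip()]


def parse_solr_rules(lines, expand=True):
    """Map every input term to the list of terms it is replaced with.

    Simplified model of the Solr synonym format described above.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        # Blank lines and lines starting with '#' are comments.
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: the expand parameter is ignored.
            lhs, rhs = line.split("=>", 1)
            inputs = split_terms(lhs)
            outputs = split_terms(rhs)
        else:
            # Equivalent synonyms: behaviour depends on expand.
            inputs = split_terms(line)
            outputs = inputs if expand else inputs[:1]
        for term in inputs:
            # Multiple entries for the same input term are merged.
            merged = mapping.setdefault(term, [])
            for out in outputs:
                if out not in merged:
                    merged.append(out)
    return mapping
```

With the line "ipod, i-pod, i pod", this sketch maps each of the three terms to all three when expand is true, and to "ipod" only when expand is false, which matches the equivalences stated in the sample file.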

You can also define synonyms for the filter directly in the configuration file (note the use of synonyms instead of synonyms_path):
{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "synonyms" : [
                "i-pod, i pod => ipod",
                "universe, cosmos"
            ] 
        }
    }
}

However, it is recommended to define large synonym sets in a file using synonyms_path.
WordNet synonyms
Synonyms based on the WordNet format can be declared using format:
{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "format" : "wordnet",
            "synonyms" : [
                "s(100000001,1,'abstain',v,1,0).",
                "s(100000001,2,'refrain',v,1,0).",
                "s(100000001,3,'desist',v,1,0)."
            ]
        }
    }
}

Using synonyms_path to define WordNet synonyms in a file is supported as well.
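
As an illustration of how the WordNet entries above relate to each other, the following Python sketch groups words by the first field of each s(...) line. It assumes the usual WordNet Prolog layout, in which the first field is the synset id and the third field is the quoted word, so entries sharing an id are treated as synonyms. The function name and regular expression are hypothetical, and this is not the parser used by elasticsearch.

```python
import re
from collections import defaultdict

# Assumed layout: s(synset_id,word_number,'word',type,sense_number,tag_count).
S_LINE = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def group_wordnet_synonyms(lines):
    """Group the words of WordNet s(...) entries by synset id."""
    groups = defaultdict(list)
    for raw in lines:
        match = S_LINE.match(raw.strip())
        if match is None:
            # Skip anything that is not a well-formed s(...) entry.
            continue
        synset_id = int(match.group(1))
        # A doubled single quote inside the word stands for one quote.
        word = match.group(3).replace("''", "'")
        groups[synset_id].append(word)
    return dict(groups)
```

Applied to the three entries in the example, the sketch returns a single group for id 100000001 containing abstain, refrain and desist.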