Elasticsearch: Payload Score Query Plugin Development Notes

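Overview
For T-shirt products, how can we factor the sales count or the stock level (high or low) of a selected option into the search ranking (score)?
Lucene provides PayloadScoreQuery, which can score individual terms differently. Lucene reads the payload we stored with each term (the number after the delimiter that follows the word) and applies it as a weight when computing the score.
Unfortunately, Elasticsearch provides the delimited payload token filter but no query that uses the payload for scoring the way PayloadScoreQuery does.
Elasticsearch official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.15/analysis-delimited-payload-tokenfilter.html#analysis-delimited-payload-tokenfilter
To meet the T-shirt search requirement, let's see how to expose PayloadScoreQuery in Elasticsearch through a plugin.
The API examples below were run in Kibana Dev Tools, and the complete code can be viewed here.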
Environment
  • open jdk 11
  • gradle 7.1
  • elasticsearch 7.15.1
    Adding the analyzer:
    Create the paylaod_score_query sample index with an analyzer named payload_delimiter.
    PUT paylaod_score_query
    {
      "mappings": {
        "properties": {
          "color": {
            "type": "text",
            "term_vector": "with_positions_payloads",
            "analyzer": "payload_delimiter"
          }
        }
      },
      "settings": {
        "analysis": {
          "analyzer": {
            "payload_delimiter": {
              "tokenizer": "whitespace",
              "filter": [ "delimited_payload" ]
            }
          }
        }
      }
    }
    Index three test documents into the paylaod_score_query sample index.
    POST paylaod_score_query/_doc/1
    {
      "name" : "T-shirt S",
      "color" : "blue|1 green|2 yellow|3"
    }
    
    POST paylaod_score_query/_doc/2
    {
      "name" : "T-shirt M",
      "color" : "blue|1 green|2 red|3"
    }
    
    POST paylaod_score_query/_doc/3
    {
      "name" : "T-shirt XL",
      "color" : "blue|1 yellow|2"
    }
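To make the "term|number" notation concrete, here is a minimal sketch of what the delimited_payload filter does to each token, assuming the default "|" delimiter and float encoding. The class and method names are hypothetical and use only the JDK; this is not the filter's actual source.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical helper that mimics the delimited_payload filter with its
// default settings ('|' delimiter, float encoding). Not Elasticsearch code.
final class PayloadEncodeSketch {

    // Returns the indexed term: everything before the first delimiter.
    static String term(String token, char delimiter) {
        int idx = token.indexOf(delimiter);
        return idx < 0 ? token : token.substring(0, idx);
    }

    // Encodes the number after the delimiter as a 4-byte big-endian float and
    // returns it as Base64, or null when the token carries no payload.
    static String encodePayload(String token, char delimiter) {
        int idx = token.indexOf(delimiter);
        if (idx < 0) {
            return null;
        }
        float weight = Float.parseFloat(token.substring(idx + 1));
        byte[] bytes = ByteBuffer.allocate(4).putFloat(weight).array();
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // The whitespace tokenizer splits the field value on spaces.
        for (String token : "blue|1 green|2 yellow|3".split(" ")) {
            System.out.println(term(token, '|') + " -> " + encodePayload(token, '|'));
        }
        // blue -> P4AAAA==
        // green -> QAAAAA==
        // yellow -> QEAAAA==
    }
}
```

The Base64 strings are what a term vectors request shows for a payload, so they can be compared directly with the stored values.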
    Confirm that the document's tokens carry Base64-encoded payloads.
    GET paylaod_score_query/_termvectors/1?fields=color
    {
      "_index" : "paylaod_score_query",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 2,
      "found" : true,
      "took" : 26,
      "term_vectors" : {
        "color" : {
          "field_statistics" : {
            "sum_doc_freq" : 11,
            "doc_count" : 4,
            "sum_ttf" : 11
          },
          "terms" : {
            "blue" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "payload" : "P4AAAA=="
                }
              ]
            },
            "green" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "payload" : "QAAAAA=="
                }
              ]
            },
            "yellow" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 2,
                  "payload" : "QEAAAA=="
                }
              ]
            }
          }
        }
      }
    }
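The payload strings in this response can be checked by hand. The following sketch decodes them with the JDK only, assuming the payloads are 4-byte big-endian floats (the default encoding of delimited_payload). The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical helper for inspecting payloads returned by _termvectors.
final class PayloadDecodeSketch {

    // Decodes a Base64 payload that holds a single 4-byte big-endian float.
    static float decode(String base64Payload) {
        byte[] bytes = Base64.getDecoder().decode(base64Payload);
        if (bytes.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getFloat();
    }

    public static void main(String[] args) {
        System.out.println("blue   = " + decode("P4AAAA==")); // blue   = 1.0
        System.out.println("green  = " + decode("QAAAAA==")); // green  = 2.0
        System.out.println("yellow = " + decode("QEAAAA==")); // yellow = 3.0
    }
}
```

The decoded values 1.0, 2.0 and 3.0 match the numbers written after "|" in document 1.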
    Check the result of a Span Query without the plugin.
    Run a span query against the color field, which uses the payload_delimiter analyzer.
    GET paylaod_score_query/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "name": "t-shirt"
              }
            },
            {
              "span_or": {
                "clauses": [
                  {
                    "span_term": {
                      "color": "yellow"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
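    Because Elasticsearch has no payload score query, the payload is ignored: document id 3, whose color field contains yellow|2, scores higher than document id 1, which contains yellow|3. The difference presumably comes from field length alone, since document 3 has two color tokens and document 1 has three.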
    {
      "took" : 845,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 0.6121877,
        "hits" : [
          {
            "_index" : "paylaod_score_query",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 0.6121877,
            "_source" : {
              "name" : "T-shirt XL",
              "color" : "blue|1 yellow|2"
            }
          },
          {
            "_index" : "paylaod_score_query",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.5546068,
            "_source" : {
              "name" : "T-shirt S",
              "color" : "blue|1 green|2 yellow|3"
            }
          }
        ]
      }
    }
    
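    Next, we walk through the classes and main methods of the Elasticsearch plugin that factors payload data into the search score, then install the plugin and verify the results.
    Lucene PayloadScoreQuery:
    First, look at the constructor of Lucene's PayloadScoreQuery.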
      /**
       * Creates a new PayloadScoreQuery
       * @param wrappedQuery the query to wrap
       * @param function a PayloadFunction to use to modify the scores
       * @param decoder a PayloadDecoder to convert payloads into float values
       * @param includeSpanScore include both span score and payload score in the scoring algorithm
       */
      public PayloadScoreQuery(SpanQuery wrappedQuery, PayloadFunction function, PayloadDecoder decoder, boolean includeSpanScore) {
        this.wrappedQuery = Objects.requireNonNull(wrappedQuery);
        this.function = Objects.requireNonNull(function);
        this.decoder = Objects.requireNonNull(decoder);
        this.includeSpanScore = includeSpanScore;
      }
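    The constructor takes four parameters.
  • SpanQuery wrappedQuery. Must be a SpanQuery.
  • PayloadFunction function. Defines how payloads are combined when several occurrences match: sum, max, min, or average.
  • PayloadDecoder decoder. Converts each payload into a float value; the stored payload must be an int or a float.
  • boolean includeSpanScore. Whether to include the span score in addition to the payload score.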
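To illustrate how the function and includeSpanScore parameters interact, here is a simplified model in plain Java. It is a sketch of my understanding of Lucene's behavior (payloads are reduced per document, then either used alone or multiplied by the span score), not Lucene's code, and every name in it is hypothetical.

```java
// Simplified, hypothetical model of payload scoring; not Lucene's implementation.
final class PayloadFunctionSketch {

    enum Func { SUM, MAX, MIN, AVERAGE }

    // Reduces the decoded payloads of all matching occurrences in one document.
    static float payloadScore(Func func, float... payloads) {
        if (payloads.length == 0) {
            return 1f; // assumption: no payload seen yields a neutral score of 1
        }
        float sum = 0f;
        float max = payloads[0];
        float min = payloads[0];
        for (float p : payloads) {
            sum += p;
            max = Math.max(max, p);
            min = Math.min(min, p);
        }
        switch (func) {
            case SUM: return sum;
            case MAX: return max;
            case MIN: return min;
            case AVERAGE: return sum / payloads.length;
            default: throw new IllegalArgumentException("unknown function: " + func);
        }
    }

    // Final score of the wrapped query for one document.
    static float docScore(Func func, boolean includeSpanScore, float spanScore, float... payloads) {
        float payload = payloadScore(func, payloads);
        return includeSpanScore ? spanScore * payload : payload;
    }

    public static void main(String[] args) {
        System.out.println(payloadScore(Func.SUM, 1f, 3f));     // 4.0
        System.out.println(payloadScore(Func.AVERAGE, 1f, 3f)); // 2.0
        System.out.println(docScore(Func.SUM, false, 0.5f, 3f)); // 3.0
        System.out.println(docScore(Func.SUM, true, 0.5f, 3f));  // 1.5
    }
}
```

With a single matching occurrence, as in the T-shirt documents, all four functions return the payload itself, so the choice only matters when a term occurs more than once in the field.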
    CustomPayloadScoreQueryPlugin
    Add code to the CustomPayloadScoreQueryPlugin class that registers CustomPayloadScoreQueryBuilder as a query.
    public class CustomPayloadScoreQueryPlugin extends Plugin implements SearchPlugin {
        @Override
        public List<QuerySpec<?>> getQueries() {
            return Collections.singletonList(
                new QuerySpec<>(CustomPayloadScoreQueryBuilder.NAME, CustomPayloadScoreQueryBuilder::new, CustomPayloadScoreQueryBuilder::fromXContent)
            );
        }
    }
    CustomPayloadScoreQueryBuilder
    Implementing the fromXContent method
    public static QueryBuilder fromXContent(XContentParser parser) throws IOException {
        String currentFieldName = null;
        XContentParser.Token token;
        QueryBuilder iqb = null;
    
        String func = null;
        String calc = null;
        boolean includeSpanScore = false;
        while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
            if (token == XContentParser.Token.FIELD_NAME) {
                currentFieldName = parser.currentName();
            } else if (token == XContentParser.Token.START_OBJECT) {
                if (QUERY_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    iqb = parseInnerQueryBuilder(parser);
                } else {
                    throw new ParsingException(parser.getTokenLocation(),
                        "[" + NAME + "] query does not support [" + currentFieldName + "]");
                }
            } else if (token.isValue()) {
                if (FUNC_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    func = parser.text();
                } else if (CALC_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    calc = parser.text();
                } else if (INCLUDE_SPAN_SCORE_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    includeSpanScore = parser.booleanValue();
                } else {
                    throw new ParsingException(parser.getTokenLocation(),
                        "[" + NAME + "] query does not support [" + currentFieldName + "]");
                }
            }
        }
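        // The wrapped span query is mandatory; fail early if it was not supplied.
        if (iqb == null) {
            throw new ParsingException(parser.getTokenLocation(),
                "[" + NAME + "] query requires [" + QUERY_FIELD.getPreferredName() + "]");
        }
        return new CustomPayloadScoreQueryBuilder(iqb, func, calc, includeSpanScore);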
    }
    Constructing PayloadScoreQuery in the doToQuery method
    @Override
    protected Query doToQuery(SearchExecutionContext context) throws IOException {
        // Build the wrapped Lucene query; PayloadScoreQuery only accepts a SpanQuery.
        Query innerQuery = query.toQuery(context);
        if (!(innerQuery instanceof SpanQuery)) {
            throw new IllegalArgumentException("[" + NAME + "] inner query must be a span query");
        }
        SpanQuery spanQuery = (SpanQuery) innerQuery;

        // Resolve the payload function (for example "sum") by name.
        PayloadFunction payloadFunction = CustomPayloadUtils.getPayloadFunction(this.func);
        if (payloadFunction == null) {
            throw new IllegalArgumentException("Unknown payload function: " + func);
        }
        // Payloads are decoded as floats, matching the default encoding of delimited_payload.
        PayloadDecoder payloadDecoder = CustomPayloadUtils.getPayloadDecoder("float");

        return new PayloadScoreQuery(spanQuery, payloadFunction, payloadDecoder, this.includeSpanScore);
    }
    Build source code
    $ gradle clean build
    Install plugin
    $ cd $ES_HOME
    $ ./bin/elasticsearch-plugin install file:///$PROJECT/build/distributions/payload-score-0.1.zip
    Run Elasticsearch
    $ cd $ES_HOME
    $ ./bin/elasticsearch
    Running the sample API
    Run the span query through the payload_score query provided by the custom plugin.
    GET /paylaod_score_query/_search
    {
      "explain": false, 
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "name": "t-shirt"
              }
            },
            {
              "payload_score": {
                "func": "sum",
                "calc": "sum",
                "includeSpanScore": "false",
                "query": {
                  "span_or": {
                    "clauses": [
                      {
                        "span_term": {
                          "color": "yellow"
                        }
                      }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
    In the API response below, unlike the plain Span Query result, the payload is applied to the score, so document id 1, which contains yellow|3, now ranks first.
    {
      "took" : 14,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 3.210721,
        "hits" : [
          {
            "_index" : "paylaod_score_query",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 3.210721,
            "_source" : {
              "name" : "T-shirt S",
              "color" : "blue|1 green|2 yellow|3"
            }
          },
          {
            "_index" : "paylaod_score_query",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 2.210721,
            "_source" : {
              "name" : "T-shirt XL",
              "color" : "blue|1 yellow|2"
            }
          }
        ]
      }
    }
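The scores in this response can be broken down by hand. The sketch below assumes that a bool query adds up the scores of its must clauses and that, with func "sum" and includeSpanScore false, the payload_score clause contributes exactly the decoded payload. The name-clause value is inferred from the response, not reported by Elasticsearch, and the class name is hypothetical.

```java
// Hypothetical breakdown of the _score values shown in the response.
final class ScoreBreakdownSketch {

    // Score left for the "name" match clause once the payload clause is removed.
    static float nameClauseScore(float totalScore, float payload) {
        return totalScore - payload;
    }

    public static void main(String[] args) {
        float doc1Total = 3.210721f; // _score of document 1 (yellow|3)
        float doc3Total = 2.210721f; // _score of document 3 (yellow|2)

        float doc1Name = nameClauseScore(doc1Total, 3f);
        float doc3Name = nameClauseScore(doc3Total, 2f);

        // Both documents should be left with the same name-clause score,
        // so the ranking is decided by the payload alone.
        System.out.println("doc 1 name clause: " + doc1Name);
        System.out.println("doc 3 name clause: " + doc3Name);
        System.out.println("payload difference: " + (doc1Total - doc3Total));
    }
}
```

Both documents end up with the same remainder of about 0.21, which is consistent with the match on "t-shirt" scoring them equally and the payloads 3 and 2 deciding the order.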