Extractive QA with txtai



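This article is part of a tutorial series on txtai, an AI-powered semantic search platform.
Parts 1 through 4 gave a general overview of txtai, the technology backing it, and examples of how to use it for similarity search. This article builds on that foundation and extends it to an extractive question-answering system.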

Install dependencies

Install txtai and all dependencies.
pip install txtai

Create an Embeddings and Extractor instance


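The Embeddings instance is the main entry point for txtai. It defines the method used to tokenize a segment of text and convert it into an embeddings vector.
The Extractor instance is the entry point for extractive question answering.
Both the Embeddings and Extractor instances take a path to a transformer model. Any model on the Hugging Face model hub can be used in place of the models below.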
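As a rough mental model before the txtai code, the Extractor can be thought of as working in two stages: it first uses the Embeddings instance to narrow the data down to the rows most relevant to a query, then runs the question-answering model over that context. The sketch below illustrates only the first stage, and it is a toy under stated assumptions: `tokens` and `top_context` are hypothetical helpers, and plain token overlap stands in for real embeddings similarity. It is not how txtai is implemented.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_context(query, rows, n=3):
    """Rank rows by token overlap with the query and keep the best n.

    Token overlap is only a placeholder for embeddings similarity.
    """
    q = tokens(query)
    ranked = sorted(rows, key=lambda row: len(q & tokens(row)), reverse=True)
    return ranked[:n]

rows = [
    "Giants 5 Dodgers 4 final",
    "Blue Jays beat Red Sox final score 2-1",
    "Red Sox lost to the Blue Jays, 2-1",
    "Phillies 5 Braves 0 final",
]

# Keep the two rows that best match the query; a QA model would then
# read only this context when answering "What team won the game?"
context = top_context("Red Sox - Blue Jays", rows, n=2)
print(context)
```

With a real embeddings model the ranking is semantic rather than lexical, so a query does not need to share exact words with the matching rows.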
from txtai.embeddings import Embeddings
from txtai.pipeline import Extractor

# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

# Create extractor instance
extractor = Extractor(embeddings, "distilbert-base-cased-distilled-squad")
data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays beat Red Sox final score 2-1",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Lightning goaltender pulled, lose to Flyers 4-1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]

questions = ["What team won the game?", "What was score?"]

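# Each queue entry is (name, query, question, snippet): the query selects the context rows,
# the question is asked of that context, and snippet=False returns only the answer text
execute = lambda query: extractor([(question, query, question, False) for question in questions], data)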

for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()

# Ad-hoc questions
question = "What hockey team won?"

print("----", question, "----")
print(extractor([(question, question, question, False)], data))
---- Red Sox - Blue Jays ----
('What team won the game?', 'Blue Jays')
('What was score?', '2-1')

---- Phillies - Braves ----
('What team won the game?', 'Phillies')
('What was score?', '5-0')

---- Dodgers - Giants ----
('What team won the game?', 'Giants')
('What was score?', '5-4')

---- Flyers - Lightning ----
('What team won the game?', 'Flyers')
('What was score?', '4-1')

---- What hockey team won? ----
[('What hockey team won?', 'Flyers')]
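
The tuples passed to the extractor in the code above pack four values into each entry. To my understanding of the Extractor interface used in this article, the layout is (name, query, question, snippet). The sketch below builds the same queue with a small helper so each position is easier to see. `build_queue` is a hypothetical name introduced here for illustration and is not part of txtai.

```python
def build_queue(query, questions, snippet=False):
    """Build Extractor-style input tuples: (name, query, question, snippet).

    name     - label returned alongside the answer (here, the question itself)
    query    - text used to select the relevant context rows
    question - question asked of the selected context
    snippet  - assumed flag: False asks for the answer text only
    """
    return [(question, query, question, snippet) for question in questions]

questions = ["What team won the game?", "What was score?"]
queue = build_queue("Red Sox - Blue Jays", questions)

for name, query, question, snippet in queue:
    print(f"name={name!r} query={query!r} question={question!r} snippet={snippet}")
```

The resulting list could be passed as the first argument in place of the inline list comprehension, for example `extractor(queue, data)`. The ad-hoc example works the same way, with the question reused as the query.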