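Getting to know spaCy's pipeline (or at least trying to)
Note
What follows was written in a Jupyter notebook; I downloaded the output as markdown and pasted it here.
https://github.com/booink/spacy-trial1/tree/master
The working environment is available in the public repository above.
I only spent about 30 minutes on this and exported it to markdown as a trial, so it is thin on content. Apologies that there is not much to read.
I'll work through the documentation from the top, typing out the examples as I go.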
When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the default models consists of a tagger, a parser and an entity recognizer. Each pipeline component returns the processed Doc, which is then passed on to the next component.
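In short, when you call nlp on a text, it tokenizes the text and returns the result in an object of the Doc class.
That Doc object then goes through a mechanism called the pipeline: each step processes the Doc and hands it on to the next step, like a bucket relay, I suppose.
The pipeline has a tagger, a parser, and an entity recognizer (ner).
I see.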
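To make the bucket-relay idea concrete for myself, here is a toy sketch in plain Python. It does not use spaCy at all: ToyDoc, toy_tokenizer, toy_tagger, toy_counter, and toy_nlp are hypothetical names I made up, and the "tagging" is a placeholder. It only shows the shape of the idea: tokenize once, then pass the same document object through each component in order.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Hypothetical stand-in for spaCy's Doc: tokens plus annotations."""
    tokens: list
    annotations: dict = field(default_factory=dict)


def toy_tokenizer(text):
    # Whitespace split only; spaCy's real tokenizer is far more elaborate.
    return ToyDoc(tokens=text.split())


def toy_tagger(doc):
    # Marks capitalized tokens; a placeholder for real part-of-speech tagging.
    doc.annotations["tags"] = [
        "CAP" if token[:1].isupper() else "low" for token in doc.tokens
    ]
    return doc


def toy_counter(doc):
    # A second component that adds its own annotation to the same doc.
    doc.annotations["n_tokens"] = len(doc.tokens)
    return doc


# A list of (name, component) pairs, processed in order.
toy_pipeline = [("tagger", toy_tagger), ("counter", toy_counter)]


def toy_nlp(text):
    doc = toy_tokenizer(text)  # step 1: text -> doc
    for _name, component in toy_pipeline:
        doc = component(doc)  # later steps: doc -> doc
    return doc


doc = toy_nlp("This is a text")
print(doc.tokens)
print(doc.annotations)
```

Each component receives the document, adds its own annotations, and returns it, which is the contract the quoted paragraph describes.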
Let's look at the type of the doc object.
import spacy
nlp = spacy.load("en")
doc = nlp("This is a text")
type(doc)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-4-69cc80a89d2d> in <module>
1 import spacy
2
----> 3 nlp = spacy.load("en")
4 doc = nlp("This is a text")
5 type(doc)
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
It complains that there is no en model.
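Since the failure surfaces as an OSError, one way to handle it is to catch that exception and print a hint instead of a full traceback. This is a minimal sketch, not something from the spaCy docs: load_model_or_none is a hypothetical helper name, and the lazy import is only there so the snippet also runs where spaCy is not installed.

```python
def load_model_or_none(name):
    """Return a loaded spaCy pipeline, or None if spaCy or the model is missing."""
    try:
        import spacy  # imported lazily so the sketch also runs without spaCy
    except ImportError:
        return None
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raised OSError (E050) in the traceback above
        return None


nlp = load_model_or_none("en_core_web_sm")
if nlp is None:
    print("Model not available; try: python -m spacy download en_core_web_sm")
```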
Let's do it the way the QuickStart says.
!python -m spacy download en_core_web_sm
Collecting en_core_web_sm==2.2.5
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
Requirement already satisfied: spacy>=2.2.2 in /usr/local/lib/python3.7/site-packages (from en_core_web_sm==2.2.5) (2.2.4)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.0.3)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (4.44.1)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.18.2)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.23.0)
Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (7.4.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.2)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.6.0)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.1.3)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (46.0.0)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.4.1)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2019.11.28)
Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.6.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.1.0)
Building wheels for collected packages: en-core-web-sm
Building wheel for en-core-web-sm (setup.py) ... done
Created wheel for en-core-web-sm: filename=en_core_web_sm-2.2.5-py3-none-any.whl size=12011738 sha256=4e741a4ef6924b14806dc4789ff4156bf93b98c79d33f5959516f6a04c73f4bb
Stored in directory: /tmp/pip-ephem-wheel-cache-yazrb305/wheels/51/19/da/a3885266a3c241aff0ad2eb674ae058fd34a4870fef1c0a5a0
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.2.5
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
The download worked.
Let's run the code.
import spacy
nlp = spacy.load("en_core_web_sm")
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-6-14d257ed08ca> in <module>
1 import spacy
----> 2 nlp = spacy.load("en_core_web_sm")
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Hmm.
Does it not work from inside jupyter notebook?
I'll put the download step in the Dockerfile and rebuild.
Rebuilt.
Run it again.
import spacy
nlp = spacy.load("en_core_web_sm")
No errors at all. Success, I guess.
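I never pinned down why the first attempt failed. One plausible explanation (an assumption on my part, not something verified here) is that the already-running kernel did not see a package installed after it started. A quick check that uses only the standard library is to ask Python's import system whether the model package can be found. model_is_importable is a hypothetical helper name, and this only tests importability; it does not reproduce whatever lookup spacy.load performs internally.

```python
import importlib
import importlib.util


def model_is_importable(package_name):
    """Return True if the running interpreter can find `package_name`."""
    # Drop cached directory listings so packages installed after startup
    # have a chance of being found.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None


print(model_is_importable("en_core_web_sm"))
```

If this prints False right after a download, restarting the kernel (or rebuilding the image, as I did) is the next thing to try.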
Let's look at the type of doc.
doc = nlp("This is a text")
type(doc)
spacy.tokens.doc.Doc
I see.
Which components are set up in the pipeline?
for p in nlp.pipeline:
    print(p)
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3c78613d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc39292ede0>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928c5360>)
Right, right.
tagger, parser, ner: sure enough.
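Each entry printed above is a (name, component) tuple, so it can be unpacked directly. A minimal sketch, assuming the en_core_web_sm package is installed; the try/except fallback is my own addition so the snippet still runs where the model is missing. nlp.pipe_names is the attribute spaCy provides for just the names.

```python
try:
    import en_core_web_sm  # the model package downloaded earlier

    nlp = en_core_web_sm.load()
except (ImportError, OSError):
    nlp = None  # model (or spaCy) not available in this environment

names = []
if nlp is not None:
    for name, component in nlp.pipeline:
        # each entry is a (name, component) pair
        print(name, type(component).__name__)
        names.append(name)
    # pipe_names holds the same names in the same order
    print(names == nlp.pipe_names)
```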
By the way, looking at the model QuickStart, it seems you can also write it like this:
import en_core_web_sm  # besides naming the model as a string, it can apparently be imported as a module
nlp = en_core_web_sm.load()  # so load() with no arguments returns nlp
doc = nlp("This is a text")
print(doc)
for p in nlp.pipeline:
    print(p)
This is a text
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3903805d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc3928bad70>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928ba9f0>)
So what is the nlp object?
type(nlp)
spacy.lang.en.English
Huh.