Getting OCR processing status asynchronously with the Cloud Vision API PHP client library


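Prerequisites

I want to build a PHP application that converts PDF documents into data.
The official Google Cloud Vision API PHP client libraries are installed with composer:
- google/cloud-vision
- google/cloud-storage (the PDF files and the converted JSON data are stored in Cloud Storage)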

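Situation

When you want to use OCR with the Cloud Vision API, most people will probably start from the official sample code on GitHub.

That sample code runs OCR with the asyncBatchAnnotateFiles method.
The results are written as JSON files under the specified path in Google Cloud Storage.

The OCR itself runs asynchronously, but the sample waits for it to finish with the pollUntilComplete method.
If you drop this into a web application as-is, the page keeps loading until OCR completes.
When the PDF has a lot to recognize (text, images, paper size, page count, and so on), this takes quite a long time.
There will be cases that extending the caller's timeout cannot handle (I ran into one).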

Waiting for Cloud Vision API completion asynchronously

The documentation for the asyncBatchAnnotateImages method contains the following:

ImageAnnotatorGapicClient.php

    /*
     *     // Alternatively:
     *
     *     // start the operation, keep the operation name, and resume later
     *     $operationResponse = $imageAnnotatorClient->asyncBatchAnnotateImages($requests, $outputConfig);
     *     $operationName = $operationResponse->getName();
     *     // ... do other work
     *     $newOperationResponse = $imageAnnotatorClient->resumeOperation($operationName, 'asyncBatchAnnotateImages');
     *     while (!$newOperationResponse->isDone()) {
     *         // ... do other work
     *         $newOperationResponse->reload();
     *     }
     *     if ($newOperationResponse->operationSucceeded()) {
     *       $result = $newOperationResponse->getResult();
     *       // doSomethingWith($result)
     *     } else {
     *       $error = $newOperationResponse->getError();
     *       // handleError($error)
     *     }
     */

The getName method of the OperationResponse object returned by asyncBatchAnnotateImages gives you an identifier.
If you later pass that value to resumeOperation, you apparently get the corresponding OperationResponse object back, and reload updates its state.

When I ran it, $operationName was a string in the following format:

projects/my-project-name/operations/xxxxxxxxxxxxxxxx

It looks like it can be saved to a database or similar and loaded again later.
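
As a minimal sketch of that idea, the operation name can be stored under an application-defined job ID and looked up in a later request. Everything here is an assumption for illustration: the helper names (saveOperationName, loadOperationName, loadAllJobs), the job ID, and the JSON file used as storage are hypothetical and not part of google/cloud-vision. A database table with the same two columns would serve the same purpose.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers (not part of google/cloud-vision): keep the operation
// name under an application-defined job ID so that a later request can pass
// it to resumeOperation().

function loadAllJobs(string $storePath): array
{
    if (!is_file($storePath)) {
        return [];
    }
    $decoded = json_decode((string) file_get_contents($storePath), true);
    return is_array($decoded) ? $decoded : [];
}

function saveOperationName(string $storePath, string $jobId, string $operationName): void
{
    $jobs = loadAllJobs($storePath);
    $jobs[$jobId] = $operationName;
    // LOCK_EX takes an exclusive lock for the write itself. The read-modify-write
    // sequence as a whole is still not atomic, so a real application would
    // more likely use a database.
    file_put_contents($storePath, json_encode($jobs, JSON_PRETTY_PRINT), LOCK_EX);
}

function loadOperationName(string $storePath, string $jobId): ?string
{
    return loadAllJobs($storePath)[$jobId] ?? null;
}
```

The request that starts OCR would call saveOperationName() with the value of getName(), and the request that checks progress would call loadOperationName() and hand the result to resumeOperation().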

Problem

However, using this source code as-is, the call to resumeOperation threw an exception.

Google\Protobuf\Internal\GPBDecodeException: Error occurred during parsing: Class google.cloud.vision.v1.OperationMetadata hasn't been added to descriptor pool in /var/www/html/my-project/vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:1327

Solution

Searching for similar cases turned up the following page.

The reference page describes the problem for SpeechClient, but substituting ImageAnnotator, I added a call to initOnce.

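    // Assumes this import at the top of the file:
    // use GPBMetadata\Google\Cloud\Vision\V1\ImageAnnotator;
    ImageAnnotator::initOnce(); // register the Vision protobuf descriptors before resuming
    $newOperationResponse = $imageAnnotatorClient->resumeOperation($operationName, 'asyncBatchAnnotateImages');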

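With this, the OperationResponse was retrieved successfully.
Once isDone() and operationSucceeded() confirm that the operation finished successfully, you can move on to the follow-up processing.

Since this was an ordinary web page, I ended up using ajax to check the status periodically and, once the job completed, navigating to the next page to display the data.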
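
The status check behind such an ajax call could look roughly like the sketch below. The function name operationStatus and the shape of the returned array are assumptions made for illustration. The function only relies on the isDone(), operationSucceeded() and getError() methods shown in the docblock quoted earlier, and it accepts any object exposing them, so it can be tried without the client library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn an operation object into a small array that an
 * ajax endpoint can return as JSON. $operation is expected to expose
 * isDone(), operationSucceeded() and getError().
 */
function operationStatus(object $operation): array
{
    if (!$operation->isDone()) {
        return ['state' => 'running'];
    }
    if ($operation->operationSucceeded()) {
        return ['state' => 'succeeded'];
    }
    $error = $operation->getError();
    // Use the error's message if it offers getMessage(); otherwise fall back
    // to a generic text instead of guessing at its structure.
    $message = (is_object($error) && method_exists($error, 'getMessage'))
        ? (string) $error->getMessage()
        : 'unknown error';
    return ['state' => 'failed', 'message' => $message];
}
```

An endpoint would pass the object obtained from resumeOperation() to this function and echo json_encode() of the result. The page script keeps polling while the state is running and moves to the next page once it is succeeded.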

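References

The Vision API is not just OCR; it is a broad set of image analysis APIs, including face detection.
It has so many features that the overall picture is hard to grasp at first sight.
If you want to focus on OCR, the explanation at the following site is compact and was a great help.

Vision API OCR事始め(1)：TEXT_DETECTIONとDOCUMENT_TEXT_DETECTIONの違い (Getting started with Vision API OCR (1): the difference between TEXT_DETECTION and DOCUMENT_TEXT_DETECTION)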

Author And Source

More information on this topic (Getting OCR processing status asynchronously with the Cloud Vision API PHP client library) can be found at https://qiita.com/okazbb/items/1b7b706bf5ce1b4b6abc

