Building an OCR Application Using the Google Vision API

In this tutorial, we will build an OCR application in Node.js using the Google Vision API.
An OCR app performs text recognition on an image, so it can be used to extract the text that appears in a picture.

Google Vision API

To get started with the Google Vision API, visit the link below:
https://cloud.google.com/vision/docs/setup
Follow the instructions there to set up the Google Vision API and obtain your Google application credentials, a JSON file that contains your service account key. Once setup is finished, the file is downloaded to your computer. These credentials are essential: the app we are building cannot work without them.
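
Before going further, it can help to confirm that the downloaded file really looks like a service account key. The sketch below is only a sanity check under stated assumptions: the helper names `missingKeyFields` and `checkKeyFile` are made up for this example, and the field list covers only a few of the fields such a key file normally carries.

```javascript
const fs = require('fs');

// A few of the fields a Google service account key file normally contains.
const REQUIRED_FIELDS = ['type', 'project_id', 'private_key', 'client_email'];

// Returns the names of required fields that are missing or empty in a parsed key object.
function missingKeyFields(key) {
  return REQUIRED_FIELDS.filter(
    (field) => !key || typeof key[field] !== 'string' || key[field] === ''
  );
}

// Hypothetical helper: reads the key file from disk and reports what is missing.
// It throws if the file does not exist or is not valid JSON.
function checkKeyFile(filePath) {
  const key = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  return missingKeyFields(key);
}
```

Calling `checkKeyFile` with the path of your own key file should return an empty array when all of the listed fields are present.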

Using the Node.js Client Library

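To use the Node.js client library, see the link below:
https://cloud.google.com/vision/docs/quickstart-client-libraries
The page shows how to use the Google Vision API in your preferred programming language. Now that we have seen what the page covers, we can go straight to implementing it in our own code.
Create a directory for the project and open it in your favorite code editor.
Run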

npm init -y

to create a package.json file. Then run

npm install --save @google-cloud/vision

to install the Google Vision API client. Create a resources folder, download the image wakeupcat.jpg into it, then create an index.js file and fill it with the following code:

process.env.GOOGLE_APPLICATION_CREDENTIALS = 'C:/Users/lenovo/Documents/readText-f042075d9787.json'

async function quickstart() {
  // Imports the Google Cloud client library
  const vision = require('@google-cloud/vision');

  // Creates a client
  const client = new vision.ImageAnnotatorClient();

  // Performs label detection on the image file
  const [result] = await client.labelDetection('./resources/wakeupcat.jpg');
  const labels = result.labelAnnotations;
  console.log('Labels:');
  labels.forEach(label => console.log(label.description));
}

quickstart()

The first line sets the GOOGLE_APPLICATION_CREDENTIALS environment variable to the JSON file we downloaded earlier. The async function quickstart contains the Google logic, and the last line calls the function.
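
The path in the first line is specific to one machine. A minimal sketch of an alternative, assuming you would rather let the shell supply the variable and only fall back to a path in code: the function name `resolveCredentialsPath` and the fallback file name are illustrative and not part of the tutorial.

```javascript
// Picks the credentials path: a variable already set in the environment wins,
// otherwise the fallback path is used.
function resolveCredentialsPath(env, fallbackPath) {
  return env.GOOGLE_APPLICATION_CREDENTIALS || fallbackPath;
}

// Only fill in the variable when the shell has not provided it.
process.env.GOOGLE_APPLICATION_CREDENTIALS = resolveCredentialsPath(
  process.env,
  './service-account-key.json' // hypothetical location of the downloaded key
);
```

With this in place, the same index.js can run on another computer by setting GOOGLE_APPLICATION_CREDENTIALS in the shell before starting the app.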
Run

node index.js

to process the image. The labels detected in the image are printed to the console.

That looks good, but we do not want to work with label detection, so update index.js as follows:

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');


process.env.GOOGLE_APPLICATION_CREDENTIALS = 'C:/Users/lenovo/Documents/readText-f042075d9787.json'

async function quickstart() {
    try {
        // Creates a client
        const client = new vision.ImageAnnotatorClient();

        // Performs text detection on the local file
        const [result] = await client.textDetection('./resources/wakeupcat.jpg');
        const detections = result.textAnnotations;
        const [ text, ...others ] = detections
        console.log(`Text: ${ text.description }`);
    } catch (error) {
        console.log(error)
    }

}

quickstart()

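The logic above returns the text found in the image.

We now use the client.textDetection method instead of client.labelDetection.

We destructure the detections array into two parts, text and others. The text variable holds the complete text from the image.
Now, run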

node index.js

to print the text found in the image.
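
The destructuring step can be tried without calling the API. The sketch below uses a hand-written array shaped like result.textAnnotations (it is not real API output) and a hypothetical helper, `extractFullText`, which also covers the case where no text was detected and the array is empty.

```javascript
// Returns the full detected text, or an empty string when nothing was found.
function extractFullText(detections) {
  // The first element carries the complete text; the rest are ignored here.
  const [text] = detections || [];
  return text ? text.description : '';
}

// Hand-written sample shaped like result.textAnnotations (not real API output).
const sampleDetections = [
  { description: 'WAKE UP\nCAT' },
  { description: 'WAKE' },
  { description: 'UP' },
  { description: 'CAT' },
];

console.log(extractFullText(sampleDetections));
console.log(extractFullText([]));
```

Guarding against an empty array avoids reading description from undefined when an image contains no text.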

Installing and Using Express.js

We need to install Express.js to create a server and an API that sends requests to the Google Vision API.

npm install express --save

Now we can update index.js:

const express = require('express');
// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');
const app = express();

const port = 3000

process.env.GOOGLE_APPLICATION_CREDENTIALS = 'C:/Users/lenovo/Documents/readText-f042075d9787.json'

app.use(express.json())

async function quickstart(req, res) {
    try {
        // Creates a client
        const client = new vision.ImageAnnotatorClient();

        // Performs text detection on the local file
        const [result] = await client.textDetection('./resources/wakeupcat.jpg');
        const detections = result.textAnnotations;
        const [ text, ...others ] = detections
        console.log(`Text: ${ text.description }`);
        res.send(`Text: ${ text.description }`)
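    } catch (error) {
        console.log(error)
        // Reply on failure too, so the request does not hang
        if (!res.headersSent) res.status(500).send('Text detection failed')
    }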

}

app.get('/detectText', async(req, res) => {
    res.send('welcome to the homepage')
})

app.post('/detectText', quickstart)

//listen on port
app.listen(port, () => {
    console.log(`app is listening on ${port}`)
})

Open Insomnia and make a POST request to http://localhost:3000/detectText. The text found in the image is sent back as the response.

Image Upload

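This app would be no fun if it could only work with one image, or if we had to edit the code in the backend every time we wanted to process a different one. We want to upload any image to the route for processing, and to do that we use an npm package called multer. Multer lets us send images to the route.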

npm install multer --save

To configure multer, create a file called multerLogic.js and add the following code to it:

const multer = require('multer')
const path = require('path')

const storage = multer.diskStorage({
    destination: function (req, file, cb) {
      cb(null, path.join(process.cwd() + '/resources'))
    },
    filename: function (req, file, cb) {
      cb(null, file.fieldname + '-' + Date.now() + path.extname(file.originalname))
    }
})

const upload = multer( { storage: storage, fileFilter } ).single('image')

function fileFilter(req, file, cb) {
    const fileType = /jpg|jpeg|png/;

    const extname = fileType.test(path.extname(file.originalname).toLowerCase())

    const mimeType = fileType.test(file.mimetype)

    if(mimeType && extname){
        return cb(null, true)
    } else {
        cb('Error: images only')
    }
}

const checkError = (req, res, next) => {
    return new Promise((resolve, reject) => {
        upload(req, res, (err) => {
            if(err) {
                res.send(err)
            } 
            else if (req.file === undefined){
                res.send('no file selected')
            }
            resolve(req.file)
        })
    }) 
}

module.exports = { 
  checkError
}

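Let's take a minute to understand the logic above. This is all multer logic, the logic that lets us send an image to the detectText route. We specify a storage with two properties:

destination: this specifies where the uploaded file will be stored, and

filename: this lets us rename the file before it is stored. Here we rename the file by concatenating the field name (literally the name of the form field, here image), the current timestamp from Date.now(), and the extension of the original file.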
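
The renaming rule can be checked on its own. This is a minimal sketch that repeats the same concatenation outside multer; `buildFileName` is a name chosen for this example, and the timestamp is passed in as a parameter so the result is predictable.

```javascript
const path = require('path');

// Builds the stored file name: field name, a dash, a timestamp, then the original extension.
function buildFileName(fieldname, originalname, timestamp) {
  return fieldname + '-' + timestamp + path.extname(originalname);
}

// The same rule the storage configuration applies to an upload in the "image" field.
console.log(buildFileName('image', 'wakeupcat.jpg', Date.now()));
```

Because the timestamp changes on every upload, two uploads of the same original file get different names and do not overwrite each other, unless they arrive in the same millisecond.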

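We create a variable upload that is equal to multer called with an object containing storage and fileFilter. After that, we create a function fileFilter that checks the file type (here we allow the png, jpg, and jpeg file types).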
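
The file type check can also be written with exact matches instead of a regular expression. The sketch below is a stricter variant and not the tutorial's own filter: the name `isAllowedImage` and the two allow lists are assumptions made for this example.

```javascript
const path = require('path');

// Allow lists for this example; extend them if you accept more formats.
const ALLOWED_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png']);
const ALLOWED_MIME_TYPES = new Set(['image/jpeg', 'image/png']);

// True only when both the extension and the MIME type are on the allow lists.
function isAllowedImage(originalname, mimetype) {
  const ext = path.extname(originalname).toLowerCase();
  return ALLOWED_EXTENSIONS.has(ext) && ALLOWED_MIME_TYPES.has(mimetype);
}

console.log(isAllowedImage('wakeupcat.jpg', 'image/jpeg'));
console.log(isAllowedImage('notes.txt', 'text/plain'));
```

Exact matching rejects names such as cat.png.exe, which a pattern that only looks for "png" somewhere in the string could let through.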
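Next, we create a function checkError that checks for errors. It returns a promise that resolves with req.file when there are no errors; otherwise the error is sent back in the response. That was a lot of explanation, so let's move on with our code.
To use checkError, require it in index.js as follows: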

const { checkError } = require('./multerLogic')
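
The promise wrapper inside checkError can also be written so that failures reject instead of being answered inside the wrapper, which leaves the caller's try/catch in charge of the response. This is a sketch of that alternative and not the tutorial's code: `uploadAsPromise` and `fakeUpload` are hypothetical names, and `fakeUpload` only stands in for multer's upload so the example runs without any packages.

```javascript
// Wraps a callback-style middleware (such as multer's upload) in a promise.
function uploadAsPromise(middleware, req, res) {
  return new Promise((resolve, reject) => {
    middleware(req, res, (err) => {
      if (err) return reject(err);
      if (req.file === undefined) return reject(new Error('no file selected'));
      resolve(req.file);
    });
  });
}

// Stand-in for multer's upload: pretends a file was stored, then calls back.
function fakeUpload(req, res, done) {
  req.file = { path: 'resources/image-1.jpg' }; // hypothetical stored file
  done();
}

uploadAsPromise(fakeUpload, {}, {}).then((file) => console.log(file.path));
```

With this shape, the route handler awaits the promise and sends the error response from its own catch block.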

Next, edit the quickstart function as follows:

async function quickstart(req, res) {
    try {

        //Creates a client
        const client = new vision.ImageAnnotatorClient();
        const imageDesc = await checkError(req, res)
        console.log(imageDesc)
        // Performs text detection on the local file
        // const [result] = await client.textDetection('');
        // const detections = result.textAnnotations;
        // const [ text, ...others ] = detections
        // console.log(`Text: ${ text.description }`);
        // res.send(`Text: ${ text.description }`)

    } catch (error) {
        console.log(error)
    }

}

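We call the checkError function (which returns a promise) and assign the resolved req.file to imageDesc, then print imageDesc to the console. Make a POST request with Insomnia.

The console prints the resolved file object, whose path property we use in the next step.

Now that image upload is up and running, it is time to update our code to work with the uploaded image. Edit the body of the try block in the quickstart function with the following code: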

        //Creates a client
        const client = new vision.ImageAnnotatorClient();
        const imageDesc = await checkError(req, res)
        console.log(imageDesc)
        // checkError has already replied if the upload failed or no file was sent
        if (!imageDesc) return
        //Performs text detection on the uploaded file
        const [result] = await client.textDetection(imageDesc.path);
        const detections = result.textAnnotations;
        const [ text, ...others ] = detections
        res.send(`Text: ${ text.description }`)

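Finally, make a POST request to our route using Insomnia with an image attached in the image field. The response should contain the text detected in the uploaded image.

This tutorial is a very simple example of what can be built with the Google Vision API.
For a more robust version, see here.
Follow me on Twitter. Thanks, and have a wonderful day.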

Reference

For more on this topic (Building an OCR Application Using the Google Vision API), see the original article at https://dev.to/oviecodes/building-an-ocr-app-using-google-vision-api-1mg0
