Analyzing images in PDF using Microsoft Cognitive Services

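Analyzing the content of PDF documents is a very common task. Until recently, users were mostly interested in getting the text out of a document; now they want to look deeper.
For example, suppose we have several documents that contain images of tourist attractions or points of interest. Or we have a PDF catalog of clothes and want to know which of the documents contain T-shirts.
Modern AI services can determine the type and nature of the images they are given, and they can help us solve this task.
In this article, we consider an example of integrating Aspose.PDF for .NET with Microsoft Cognitive Services.
So, assume we have a bunch of PDF documents with images and we want to know what is depicted in them. We then need to add or update the keywords in each document.
We split the task into three subtasks: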

Extract the images

Analyze the images and get tags

Add/update the meta information in the PDF

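Image extraction
To extract images from a PDF document, we use the ImagePlacementAbsorber class. First, we create an instance of ImagePlacementAbsorber, then collect the images from the document with the Visit method, and finally filter out small images so that we do not analyze decorative or otherwise non-informative images.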

var fileNames = new StringCollection();
// Create an instance of ImagePlacementAbsorber
var abs = new ImagePlacementAbsorber();
// Fill ImagePlacements collection with images from PDF
abs.Visit(new Aspose.Pdf.Document(pdfFileName));
// Filter small images and make an array for the future handling
var imagePlacements = abs.ImagePlacements
    .Where(i => (i.Rectangle.Width > 50) && (i.Rectangle.Height > 50))
    .ToArray();

For later analysis, we save each image to a temporary directory.

for (var i = 0; i < imagePlacements.Length; i++)
{
    // Path.Combine avoids doubled separators when TempDir already ends with one
    var fileName = Path.Combine(TempDir, $"image_{i}.jpg");
    // Create a file stream to store the image in the temporary directory;
    // the using block closes the stream, so no explicit Close() is needed
    using (var stream = new FileStream(fileName, FileMode.Create))
    {
        imagePlacements[i].Image.Save(stream, ImageFormat.Jpeg);
    }
    // Add filename to the collection
    fileNames.Add(fileName);
}

To save an extracted image, we create a file stream and call the Save method (here on the Image of each ImagePlacement).

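Image recognition
As mentioned above, we use the Microsoft Computer Vision service.
In the previous step, we saved the image files to a temporary directory and stored the list of file names in a variable. Now we upload each image to the Microsoft Computer Vision service and get tags for the recognized objects. Each tag has a Name and a Confidence property. Confidence is the probability that Name matches the image content, so we can use it to filter out less valuable tags.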

private static IEnumerable<string> CollectImageTags(
    StringCollection imageFileNames,
    double confidence)
{
    // Create a client for Computer Vision service
    var client = Authenticate(Endpoint, SubscriptionKey);
    // Create a set of tags
    var tags = new HashSet<string>();

    foreach (var fileName in imageFileNames)
    {
        // Upload image and recognize it
        var result = AnalyzeImage(client, fileName).Result;
        // Get the tags collection
        var imageTags =
            result.Tags
                    // filter less valuable tags
                .Where(iTag => iTag.Confidence > confidence)
                    // and select only names
                .Select(iTag => iTag.Name);
        // add new tags into tag's set
        tags.UnionWith(imageTags);
    }
    return tags;
}
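
The confidence filter can be tried without an Azure subscription. The following is a minimal sketch that applies the same Where/Select/UnionWith logic to hand-written sample data. The SampleTag type, the TagFilterSketch class, and the sample values in the usage note are assumptions made for illustration; in the real code the tags come from result.Tags.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the SDK's tag type (not part of the original code)
public sealed class SampleTag
{
    public string Name { get; }
    public double Confidence { get; }

    public SampleTag(string name, double confidence)
    {
        Name = name;
        Confidence = confidence;
    }
}

public static class TagFilterSketch
{
    // Keep only tag names whose confidence exceeds the threshold,
    // merged across all images into one set (duplicates collapse).
    public static HashSet<string> Collect(
        IEnumerable<IEnumerable<SampleTag>> tagsPerImage,
        double threshold)
    {
        var tags = new HashSet<string>();
        foreach (var imageTags in tagsPerImage)
        {
            tags.UnionWith(
                imageTags
                    .Where(t => t.Confidence > threshold)
                    .Select(t => t.Name));
        }
        return tags;
    }
}
```

With a threshold of 0.9, an image tagged ("t-shirt", 0.97) and ("person", 0.55) contributes only "t-shirt". A lower threshold yields more keywords but also more noise, so the value is worth tuning against your own documents.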

CollectImageTags relies on two helper methods. The first creates a client for the Computer Vision API; the second uploads an image and analyzes it.

public static ComputerVisionClient Authenticate(string endpoint, string key)
{
    var client =
        new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
            {Endpoint = endpoint};
    return client;
}

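public static async Task<TagResult> AnalyzeImage(ComputerVisionClient client, string imagePath)
{
    // Validate the argument before it is used anywhere
    if (imagePath == null) throw new ArgumentNullException(nameof(imagePath));
    Console.WriteLine($"Analyzing the image {Path.GetFileName(imagePath)}...");
    Console.WriteLine();
    // Upload the local image file as a stream and request tags for it;
    // the using block releases the file handle after the call completes
    using (var stream = File.OpenRead(imagePath))
    {
        return await client.TagImageInStreamAsync(stream);
    }
}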

The ComputerVisionClient class is part of the Microsoft.Azure.CognitiveServices.Vision.ComputerVision library. If you are interested in how to work with Microsoft Computer Vision, see the Microsoft Cognitive Services Documentation.
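
CollectImageTags reads the result with .Result, which blocks the calling thread until each request finishes. That is acceptable in a small console program, but the same loop can also be written with await. The sketch below shows the idea under stated assumptions: AsyncTagSketch and the analyze delegate are hypothetical names, and the delegate stands in for a call such as AnalyzeImage(client, fileName) followed by the confidence filter, so that the example runs without the Azure SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncTagSketch
{
    // analyze is a hypothetical stand-in for the real service call;
    // it returns the tag names recognized for one image file.
    public static async Task<HashSet<string>> CollectAsync(
        IEnumerable<string> fileNames,
        Func<string, Task<IEnumerable<string>>> analyze)
    {
        var tags = new HashSet<string>();
        foreach (var fileName in fileNames)
        {
            // await instead of .Result: the thread is not blocked
            // while the request is in flight
            var names = await analyze(fileName);
            tags.UnionWith(names);
        }
        return tags;
    }
}
```

Images are still processed one at a time here, which keeps the request rate low; whether that matters depends on the limits of your service tier.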

Updating the meta information
To work with meta information, the Aspose.PDF library provides the DocumentInfo class. For our task, we use the DocumentInfo.Keywords property.

private static void SaveMetaData(
    string pdfFileName,
    IEnumerable<string> tags)
{
    var document = new Aspose.Pdf.Document(pdfFileName);
    _ = new DocumentInfo(document)
    {
        Keywords = string.Join("; ", tags)
    };
    document.Save(pdfFileName.Replace(".pdf","_tags.pdf"));
}
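
SaveMetaData overwrites whatever keywords the document already has. If the goal is to update rather than replace them, the existing value can be merged with the new tags first. The following is a minimal sketch of such a merge on plain strings; KeywordMergeSketch and Merge are hypothetical names, and the "; " separator is assumed to match the one used in SaveMetaData. The existing value would be read from DocumentInfo.Keywords before calling it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordMergeSketch
{
    // Merge an existing ";"-separated keyword string with new tags,
    // dropping empty entries and case-insensitive duplicates while
    // keeping the order of first appearance.
    public static string Merge(string existingKeywords, IEnumerable<string> newTags)
    {
        var existing = (existingKeywords ?? string.Empty)
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(k => k.Trim());

        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var merged = new List<string>();
        foreach (var keyword in existing.Concat(newTags ?? Enumerable.Empty<string>()))
        {
            if (string.IsNullOrWhiteSpace(keyword)) continue;
            if (seen.Add(keyword)) merged.Add(keyword);
        }
        return string.Join("; ", merged);
    }
}
```

For example, Merge("travel; Tower", new[] { "tower", "bridge" }) returns "travel; Tower; bridge": the existing spelling wins and the duplicate is dropped.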

So, let's look at the whole code.

using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Drawing.Imaging;
using System.Threading.Tasks;
using System.IO;
using System.Linq;

using Aspose.Pdf;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;


namespace Aspose.PDF.Demo.ImageClassification
{
    class Program
    {
        private const string SubscriptionKey = "<add key here>";
        private const string Endpoint = "https://<add endpoint here>.cognitiveservices.azure.com/";
        private const string LicenseFileName = @"<add license file here>";
        private const string PdfFileName = @"C:\tmp\<file>";

        private const string TempDir = "C:\\tmp\\extracted_images\\";
        private static readonly License License = new Aspose.Pdf.License();

        static void Main()
        {
            // You can use a trial version, but please note
            // that you will be limited to 4 images.
            //License.SetLicense(LicenseFileName);
            AnalyzeImageContent(PdfFileName);
        }

        private static void AnalyzeImageContent(string pdfFileName)
        {
            // Specify the directories you want to manipulate.
            var di = new DirectoryInfo(@TempDir);

            try
            {
                // Determine whether the directory exists.
                if (!di.Exists)
                    // Try to create the directory.
                    di.Create();

                var images = ExtractImages(pdfFileName);
                var tags = CollectImageTags(images, 0.9);
                SaveMetaData(pdfFileName, tags);

                // Delete the directory.
                di.Delete(true);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        private static void SaveMetaData(string pdfFileName, IEnumerable<string> tags)
        {
            var document = new Aspose.Pdf.Document(pdfFileName);
            _ = new DocumentInfo(document)
            {
                Keywords = string.Join("; ", tags)
            };
            document.Save(pdfFileName.Replace(".pdf","_tags.pdf"));
        }

        private static IEnumerable<string> CollectImageTags(StringCollection imageFileNames, double confidence)
        {
            // Create a client for Computer Vision service
            var client = Authenticate(Endpoint, SubscriptionKey);
            // Create a set of tags
            var tags = new HashSet<string>();

            foreach (var fileName in imageFileNames)
            {
                // Upload image and recognize it
                var result = AnalyzeImage(client, fileName).Result;
                // Get the tags collection
                var imageTags =
                    result.Tags
                            // filter less valuable tags
                        .Where(iTag => iTag.Confidence > confidence)
                            // and select only names
                        .Select(iTag => iTag.Name);
                // add new tags into tag's set
                tags.UnionWith(imageTags);
            }
            return tags;
        }

        private static StringCollection ExtractImages(string pdfFileName)
        {
            var fileNames = new StringCollection();
            // Create an instance of ImagePlacementAbsorber
            var abs = new ImagePlacementAbsorber();
            // Fill ImagePlacements collection with images from PDF
            abs.Visit(new Aspose.Pdf.Document(pdfFileName));
            // Filter small images and make an array for the future handling
            var imagePlacements = abs.ImagePlacements
                .Where(i => (i.Rectangle.Width > 50) && (i.Rectangle.Height > 50))
                .ToArray();

            for (var i = 0; i < imagePlacements.Length; i++)
            {
                // Path.Combine avoids doubled separators when TempDir already ends with one
                var fileName = Path.Combine(TempDir, $"image_{i}.jpg");
                // Create a file stream to store the image in the temporary directory;
                // the using block closes the stream, so no explicit Close() is needed
                using (var stream = new FileStream(fileName, FileMode.Create))
                {
                    imagePlacements[i].Image.Save(stream, ImageFormat.Jpeg);
                }
                // Add filename to the collection
                fileNames.Add(fileName);
            }
            return fileNames;
        }

        public static ComputerVisionClient Authenticate(string endpoint, string key)
        {
            var client =
                new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
                    {Endpoint = endpoint};
            return client;
        }

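        public static async Task<TagResult> AnalyzeImage(
            ComputerVisionClient client,
            string imagePath)
        {
            // Validate the argument before it is used anywhere
            if (imagePath == null) throw new ArgumentNullException(nameof(imagePath));
            Console.WriteLine($"Analyzing the image {Path.GetFileName(imagePath)}...");
            Console.WriteLine();
            // Upload the local image file as a stream and request tags for it;
            // the using block releases the file handle after the call completes
            using (var stream = File.OpenRead(imagePath))
            {
                return await client.TagImageInStreamAsync(stream);
            }
        }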
    }
}

Additional resources

Extract Images using Aspose.PDF Facades

Setting PDF File Information using Aspose.PDF Facades

What is Computer Vision?

Reference

More information on this topic (Analyzing images in PDF using Microsoft Cognitive Services) can be found at https://dev.to/andruhovski/analyzing-images-in-pdf-using-microsoft-cognitive-services-np6
