Manual for CRE2, the C interface to the RE2 regular expression matching engine


Preface
Official RE2 repository: https://github.com/google/re2
Official CRE2 repository: https://github.com/marcomaggi/cre2
1 Basic type definitions

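Opaque Typedef: cre2_regexp_t

Opaque type for regular expression objects; we declare pointers to objects of this type. An instance of this type can be used for any number of matching operations and is safe for concurrent use by multiple threads.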

Struct Typedef: cre2_string_t

A simple data structure used to reference a portion of another string.

It has the following fields:

 'const char * data'
      Pointer to the first byte in the referenced substring.

 'int length'
      The number of bytes in the referenced substring.
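
Because a 'cre2_string_t' carries a pointer and a byte count rather than a NUL-terminated string, its bytes usually have to be copied before they are handed to functions that expect a C string. The following is a minimal sketch of that idea. 'substring_view_t' and 'view_to_cstring()' are hypothetical names introduced here for illustration (they are not part of CRE2); the struct only mirrors the two fields listed above so that the sketch compiles without the CRE2 headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same two fields as cre2_string_t. */
typedef struct {
  const char *data;    /* first byte of the referenced substring */
  int         length;  /* number of bytes in the referenced substring */
} substring_view_t;

/* Copy the referenced bytes into a freshly allocated, NUL-terminated
   C string.  Returns NULL if the length is negative or the allocation
   fails; the caller releases the result with free(). */
char *
view_to_cstring (const substring_view_t *view)
{
  assert(view != NULL);
  if (view->length < 0)
    return NULL;
  char *copy = malloc((size_t)view->length + 1);
  if (NULL == copy)
    return NULL;
  if (view->length > 0)
    memcpy(copy, view->data, (size_t)view->length);
  copy[view->length] = '\0';
  return copy;
}
```

When the bytes only need to be printed, the copy can be avoided with 'fwrite()' or the '%.*s' conversion of 'printf()', as the examples in section 5 do.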

Enumeration Typedef: cre2_error_code_t

Enumeration type for the error codes returned by 'cre2_error_code()'.

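2 Using the functions

Function: cre2_regexp_t * cre2_new (const char * PATTERN, int PATTERN_LEN, const cre2_options_t * OPT)

Build and return a new regular expression object representing the PATTERN of PATTERN_LEN bytes; the object is configured with the options in OPT. If memory allocation fails: the return value is a 'NULL' pointer.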

void cre2_delete (cre2_regexp_t * REX)

Finalise a regular expression object, releasing all the associated resources.

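const char * cre2_pattern (const cre2_regexp_t * REX)

Whether or not REX is a successfully built regular expression object: return a pointer to the pattern string. The returned pointer is valid only while REX is alive: once 'cre2_delete()' is applied to REX, the pointer becomes invalid.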

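int cre2_num_capturing_groups (const cre2_regexp_t * REX)

If REX is a successfully built regular expression object: return a non-negative integer representing the number of capturing groups (parenthetical subexpressions) in the pattern. If an error occurred while building REX: return '-1'.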

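int cre2_find_named_capturing_groups (const cre2_regexp_t * REX, const char * NAME)

If REX is a successfully built regular expression object: return a non-negative integer representing the index of the capturing group named NAME. If an error occurred while building REX, or the name is invalid: return '-1'.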

3 Usage

      const char *      pattern = "from (?P<S>.*) to (?P<D>.*)";
      cre2_options_t *  opt     = cre2_opt_new();
      cre2_regexp_t *   rex     = cre2_new(pattern, strlen(pattern),
                                           opt);
      {
        if (cre2_error_code(rex))
          { /* handle the error */ }
        int nmatch = cre2_num_capturing_groups(rex) + 1;
        cre2_string_t strings[nmatch];
        int e, SIndex, DIndex;

        const char * text = \
           "from Montreal, Canada to Lausanne, Switzerland";
        int text_len = strlen(text);

        e = cre2_match(rex, text, text_len, 0, text_len,
                       CRE2_UNANCHORED, strings, nmatch);
        if (0 == e)
          { /* handle the error */ }

        SIndex = cre2_find_named_capturing_groups(rex, "S");
        if (0 != strncmp("Montreal, Canada",
                         strings[SIndex].data, strings[SIndex].length))
          { /* handle the error */ }

        DIndex = cre2_find_named_capturing_groups(rex, "D");
        if (0 != strncmp("Lausanne, Switzerland",
                         strings[DIndex].data, strings[DIndex].length))
          { /* handle the error */ }
      }
      cre2_delete(rex);
      cre2_opt_delete(opt);

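int cre2_program_size (const cre2_regexp_t * REX)

If REX is a successfully built regular expression object: return a non-negative integer representing the program size, a very approximate measure of the "cost" of a regular expression; larger numbers are more expensive than smaller ones. If an error occurred while building REX: return '-1'.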

int cre2_error_code (const cre2_regexp_t * REX)

If an error occurred while building REX: return an integer representing the associated error code. If there was no error: return zero.

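const char * cre2_error_string (const cre2_regexp_t * REX)

If an error occurred while building REX: return a pointer to an ASCIIZ string representing the associated error message. The returned pointer is valid only while REX is alive: once 'cre2_delete()' is applied to REX, the pointer becomes invalid.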

Demo

 If REX is a successfully built regular expression object: return a
 pointer to an empty string.

 The following code:

      cre2_regexp_t *   rex;

      rex = cre2_new("ci(ao", 5, NULL);
      {
        printf("error: code=%d, msg=\"%s\"\n",
               cre2_error_code(rex),
               cre2_error_string(rex));
      }
      cre2_delete(rex);

 prints:

      error: code=6, msg="missing ): ci(ao"

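void cre2_error_arg (const cre2_regexp_t * REX, cre2_string_t * ARG)

If an error occurred while building REX: fill the structure referenced by ARG with the interval of bytes representing the offending portion of the pattern.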

demo

 If REX is a successfully built regular expression object: ARG
 references an empty string.

 The following code:

      cre2_regexp_t *   rex;
      cre2_string_t     S;

      rex = cre2_new("ci(ao", 5, NULL);
      {
        cre2_error_arg(rex, &S);
        printf("arg: len=%d, data=\"%s\"\n", S.length, S.data);
      }
      cre2_delete(rex);

 prints:

      arg: len=5, data="ci(ao"

4 Matching options

 cre2_options_t *  opt;

 opt = cre2_opt_new();
 cre2_opt_set_log_errors(opt, 0);

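Opaque Typedef: cre2_options_t

Opaque type for options objects; we declare pointers to objects of this type. An instance of this type can be used to configure any number of regular expression objects.

Enumeration Typedef: cre2_encoding_t

Enumeration type for the constants that select the encoding.

It contains the following values: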

      CRE2_UNKNOWN
      CRE2_UTF8
      CRE2_Latin1

 The value 'CRE2_UNKNOWN' should never be used: it exists only in
 case there is a mismatch between the definitions of RE2 and CRE2.

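cre2_options_t * cre2_opt_new (void)

Allocate and return a new options object. If memory allocation fails: the return value is a 'NULL' pointer.

Function: void cre2_opt_delete (cre2_options_t * OPT)

Finalise an options object, releasing all the associated resources. Regular expressions that were configured with this object are not affected by its destruction.

All the following functions are getters and setters for regular expression options. The FLAG argument of a setter must be false to disable the option and true to enable it. Unless otherwise specified, the 'int' return value of a getter is true if the option is enabled and false if it is disabled.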

void cre2_opt_set_encoding (cre2_options_t * OPT, cre2_encoding_t ENC)

By default, the regular expression pattern and the input text are interpreted as UTF-8. With the 'CRE2_Latin1' encoding they are interpreted as Latin-1.

int cre2_opt_posix_syntax (cre2_options_t * OPT)
void cre2_opt_set_posix_syntax (cre2_options_t * OPT, int FLAG)

Restrict regular expressions to POSIX egrep syntax. Defaults to disabled.

Function: int cre2_opt_longest_match (cre2_options_t * OPT)
Function: void cre2_opt_set_longest_match (cre2_options_t * OPT, int FLAG)

Search for the longest match rather than the first match. Defaults to disabled.

Function: int cre2_opt_log_errors (cre2_options_t * OPT)
Function: void cre2_opt_set_log_errors (cre2_options_t * OPT, int FLAG)

Log syntax and execution errors to 'stderr'. Defaults to enabled.

Function: int cre2_opt_literal (cre2_options_t * OPT)
void cre2_opt_set_literal (cre2_options_t * OPT, int FLAG)

Interpret the pattern string as a literal, not as a regular expression. Defaults to disabled.

demo

 Setting this option is equivalent to quoting all the special
 characters defining a regular expression pattern:

      cre2_regexp_t *   rex;
      cre2_options_t *  opt;
      const char *      pattern = "(ciao) (hello)";
      const char *      text    = pattern;
      int               len     = strlen(pattern);

      opt = cre2_opt_new();
      cre2_opt_set_literal(opt, 1);
      rex = cre2_new(pattern, len, opt);
      {
        /* successful match */
        cre2_match(rex, text, len, 0, len,
                   CRE2_UNANCHORED, NULL, 0);
      }
      cre2_delete(rex);
      cre2_opt_delete(opt);

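Function: int cre2_opt_never_nl (cre2_options_t * OPT)
void cre2_opt_set_never_nl (cre2_options_t * OPT, int FLAG)

Never match a newline character, even if it appears in the regular expression pattern. Defaults to disabled. Enabling this option lets us attempt a partial match against the beginning of a multi-line text without using subpatterns to exclude the newline in the pattern.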

int cre2_opt_dot_nl (cre2_options_t * OPT)
void cre2_opt_set_dot_nl (cre2_options_t * OPT, int FLAG)

The dot matches everything, including the newline character. Defaults to disabled.

Function: int cre2_opt_never_capture (cre2_options_t * OPT)
void cre2_opt_set_never_capture (cre2_options_t * OPT, int FLAG)

Parse all parentheses as non-capturing. Defaults to disabled.

Function: int cre2_opt_case_sensitive (cre2_options_t * OPT)
void cre2_opt_set_case_sensitive (cre2_options_t * OPT, int FLAG)

Matching is case-sensitive. The regular expression pattern can override this setting with '(?i)', unless the object is configured in POSIX syntax mode. Defaults to enabled.

Function: int cre2_opt_max_mem (cre2_options_t * OPT)
void cre2_opt_set_max_mem (cre2_options_t * OPT, int M)

The max memory option controls how much memory can be used to hold the compiled form of the regular expression and its cached DFA graphs. These functions set and get that amount of memory. See the RE2 documentation for details.

The following options are consulted only when POSIX syntax is enabled. When POSIX syntax is disabled, these features are always enabled and cannot be turned off.

Function: int cre2_opt_perl_classes (cre2_options_t * OPT)
void cre2_opt_set_perl_classes (cre2_options_t * OPT, int FLAG)

Allow Perl's '\d', '\s', '\w', '\D', '\S', '\W'. Defaults to disabled.

int cre2_opt_word_boundary (cre2_options_t * OPT)
void cre2_opt_set_word_boundary (cre2_options_t * OPT, int FLAG)

Allow Perl's '\b' (word boundary) and '\B' (not a word boundary). Defaults to disabled.

int cre2_opt_one_line (cre2_options_t * OPT) 
void cre2_opt_set_one_line (cre2_options_t * OPT, int 
FLAG)

The patterns '^' and '$' match only at the beginning and end of the text. Defaults to disabled.

5 Matching regular expressions
Basic pattern matching goes as follows:

 cre2_regexp_t *   rex;
 cre2_options_t *  opt;
 const char *      pattern = "(ciao) (hello)";

 opt = cre2_opt_new();
 cre2_opt_set_posix_syntax(opt, 1);

 rex = cre2_new(pattern, strlen(pattern), opt);
 {
   const char *   text     = "ciao hello";
   int            text_len = strlen(text);
   int            nmatch   = 3;
   cre2_string_t  match[nmatch];

   cre2_match(rex, text, text_len, 0, text_len, CRE2_UNANCHORED,
              match, nmatch);

   /* prints: full match: ciao hello */
   printf("full match: ");
   fwrite(match[0].data, match[0].length, 1, stdout);
   printf("\n");

   /* prints: first group: ciao */
   printf("first group: ");
   fwrite(match[1].data, match[1].length, 1, stdout);
   printf("\n");

   /* prints: second group: hello */
   printf("second group: ");
   fwrite(match[2].data, match[2].length, 1, stdout);
   printf("\n");
 }
 cre2_delete(rex);
 cre2_opt_delete(opt);

Enumeration Typedef: cre2_anchor_t

Enumeration type for the anchor point of matching operations.

It contains the following constants:

      CRE2_UNANCHORED
      CRE2_ANCHOR_START
      CRE2_ANCHOR_BOTH

int cre2_match (const cre2_regexp_t * REX, const char * TEXT, int TEXT_LEN, int START_POS, int END_POS, cre2_anchor_t ANCHOR, cre2_string_t * MATCH, int NMATCH)

Match a substring of the text referenced by TEXT, which holds TEXT_LEN bytes, against the regular expression object REX. Return true if the text matched, false otherwise.

 The zero-based indices START_POS (inclusive) and END_POS
 (exclusive) select the substring of TEXT to be examined.  ANCHOR
 selects the anchor point for the matching operation.

 Data about the matching groups is stored in the array MATCH, which
 must have at least NMATCH entries; the referenced substrings are
 portions of the TEXT buffer.  If we are only interested in
 verifying if the text matches or not (ignoring the matching
 portions of text): we can use 'NULL' as MATCH argument and 0 as
 NMATCH argument.

 The first element of MATCH (index 0) references the full portion of
 the substring of TEXT matching the pattern; the second element of
 MATCH (index 1) references the portion of text matching the first
 parenthetical subexpression, the third element of MATCH (index 2)
 references the portion of text matching the second parenthetical
 subexpression; and so on.

int cre2_easy_match (const char * PATTERN, int PATTERN_LEN, const char * TEXT, int TEXT_LEN, cre2_string_t * MATCH, int NMATCH)

Like 'cre2_match()', but the pattern is specified as a string PATTERN holding PATTERN_LEN bytes. In addition, the whole text is examined and no anchoring is applied.

If the text matches the pattern: the return value is 1. If the text does not match the pattern: the return value is 0. If the pattern is invalid: the return value is 2.
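
Since the value 2 is also "true" in C, a plain 'if (cre2_easy_match(...))' test would treat an invalid pattern as a match. The sketch below shows one way to keep the three documented outcomes apart. 'easy_outcome_t' and 'classify_easy_match()' are hypothetical names introduced for illustration, not part of CRE2; the function takes the integer returned by 'cre2_easy_match()' as its argument, so it compiles without the CRE2 headers.

```c
/* Hypothetical names for the three documented return values, plus one
   catch-all for anything else. */
typedef enum {
  EASY_UNEXPECTED  = -1,
  EASY_NO_MATCH    = 0,
  EASY_MATCH       = 1,
  EASY_BAD_PATTERN = 2
} easy_outcome_t;

/* Map the integer returned by cre2_easy_match() to a named outcome. */
easy_outcome_t
classify_easy_match (int rv)
{
  switch (rv) {
  case 0:  return EASY_NO_MATCH;     /* text does not match the pattern */
  case 1:  return EASY_MATCH;        /* text matches the pattern */
  case 2:  return EASY_BAD_PATTERN;  /* pattern is invalid */
  default: return EASY_UNEXPECTED;   /* not a documented value */
  }
}
```

A caller would then compare the result against 'EASY_MATCH' explicitly instead of relying on truthiness.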

Struct Typedef: cre2_range_t

Structure type used to represent a substring of the matched text as a pair of starting and ending indices.

It has the following fields:

 'long start'
      Inclusive start byte index.

 'long past'
      Exclusive end byte index.

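void cre2_strings_to_ranges (const char * TEXT, cre2_range_t * RANGES, cre2_string_t * STRINGS, int NMATCH)

Given an array STRINGS of NMATCH elements, holding the result of matching TEXT against a regular expression: fill the array RANGES with the index intervals in the TEXT buffer that represent the same results.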
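
The conversion amounts to pointer arithmetic: each substring starts at some address inside TEXT, so its start index is the distance from the beginning of TEXT and its past index is the start plus the length. The following sketch shows that arithmetic under stated assumptions; it is not taken from the CRE2 sources. 'substring_view_t', 'byte_range_t' and 'views_to_ranges()' are hypothetical stand-ins that mirror the fields of 'cre2_string_t' and 'cre2_range_t', and every view is assumed to point inside the same TEXT buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring cre2_string_t and cre2_range_t. */
typedef struct { const char *data; int length; } substring_view_t;
typedef struct { long start; long past; }        byte_range_t;

/* Fill RANGES with the byte intervals of TEXT referenced by VIEWS.
   Assumes every views[i].data points inside the TEXT buffer. */
void
views_to_ranges (const char *text, byte_range_t *ranges,
                 const substring_view_t *views, int nmatch)
{
  assert(text != NULL && ranges != NULL && views != NULL);
  for (int i = 0; i < nmatch; ++i) {
    ranges[i].start = (long)(views[i].data - text);           /* inclusive */
    ranges[i].past  = ranges[i].start + views[i].length;      /* exclusive */
  }
}
```

For the text "ciao hello" and the pattern "(ciao) (hello)", this arithmetic yields the intervals [0, 10), [0, 4) and [5, 10) for the full match and the two groups.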

demo

    cre2_regexp_t *	rex;
    cre2_options_t *	opt;
    const char *		pattern;
    pattern = "(ciao) (hello)";
    opt = cre2_opt_new();
    rex = cre2_new(pattern, strlen(pattern), opt);
    {
      if (cre2_error_code(rex))
        printf("rex error \n");
      int			nmatch = 3;
      cre2_string_t	strings[nmatch];
      cre2_range_t	ranges[nmatch];
      int			e;
      const char *	text = "ciao hello";
      int			text_len = strlen(text);

      e = cre2_match(rex, text, text_len, 0, text_len, CRE2_UNANCHORED, strings, nmatch);
      if (1 != e)
        printf("match error \n");
      cre2_strings_to_ranges(text, ranges, strings, nmatch);
      /* '%.*s' expects an int precision, while the range fields are long */
      printf("full match: ");
      printf("%.*s\n", (int)(ranges[0].past - ranges[0].start), text + ranges[0].start);
      printf("\n");
      printf("first group: ");
      printf("%.*s\n", (int)(ranges[1].past - ranges[1].start), text + ranges[1].start);
      printf("\n");
      printf("second group: ");
      printf("%.*s\n", (int)(ranges[2].past - ranges[2].start), text + ranges[2].start);
      printf("\n");
    }
    cre2_delete(rex);
    cre2_opt_delete(opt);

Result:

full match: ciao hello

first group: ciao

second group: hello

Examples
The following example is a successful match:

 const char *   pattern = "ci.*ut";
 const char *   text    = "ciao salut";
 cre2_string_t  input   = {
   .data   = text,
   .length = strlen(text)
 };
 int            result;
 result = cre2_full_match(pattern, &input, NULL, 0);

 result => 1

The following example is a successful match in which the parenthetical subexpression is ignored:

 const char *   pattern = "(ciao) salut";
 const char *   text    = "ciao salut";
 cre2_string_t  input   = {
   .data   = text,
   .length = strlen(text)
 };
 int            result;
 result = cre2_full_match(pattern, &input, NULL, 0);

 result => 1

The following example is a successful match in which the portion of text matching the parenthetical subexpression is reported:

 const char *   pattern = "(ciao) salut";
 const char *   text    = "ciao salut";
 cre2_string_t  input   = {
   .data   = text,
   .length = strlen(text)
 };
 int            nmatch  = 1;
 cre2_string_t  match[nmatch];
 int            result;
 result = cre2_full_match(pattern, &input, match, nmatch);

 result => 1
 strncmp(text, input.data, input.length)         => 0
 strncmp("ciao", match[0].data, match[0].length) => 0

int cre2_full_match (const char * PATTERN, const cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
int cre2_full_match_re (cre2_regexp_t * REX, const cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)

Match the zero-terminated string PATTERN, or the precompiled regular expression REX, against the full buffer TEXT.

 For example: the text 'abcdef' matches the pattern 'abcdef'
 according to this function, but neither the pattern 'abc' nor the
 pattern 'def' will match.

int cre2_partial_match (const char * PATTERN, const cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
int cre2_partial_match_re (cre2_regexp_t * REX, const cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)

Match the zero-terminated string PATTERN, or the precompiled regular expression REX, against the buffer TEXT; the match succeeds if any substring of TEXT matches. These functions behave like the full match ones, but the matching text does not need to be anchored to the beginning and end.

 For example: the text 'abcDEFghi' matches the pattern 'DEF'
 according to this function.

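int cre2_consume (const char * PATTERN, cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
int cre2_consume_re (cre2_regexp_t * REX, cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)

Match the zero-terminated string PATTERN, or the precompiled regular expression REX, against the buffer TEXT; the match succeeds if a prefix of TEXT matches. The data structure referenced by TEXT is updated to reference the text right after the last byte that matched the pattern.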

For example: the text 'abcDEF' matches the pattern 'abc' according
 to this function; after the call TEXT will reference the text
 'DEF'.

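int cre2_find_and_consume (const char * PATTERN, cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
int cre2_find_and_consume_re (cre2_regexp_t * REX, cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)

Match the zero-terminated string PATTERN, or the precompiled regular expression REX, against the buffer TEXT; the match succeeds if, after skipping a non-matching prefix of TEXT, a substring of TEXT matches. The data structure referenced by TEXT is updated to reference the text right after the last byte that matched the pattern.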

 For example: the text 'abcDEFghi' matches the pattern 'DEF'
 according to this function; the prefix 'abc' is skipped; after the
 call TEXT will reference the text 'ghi'.
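
The difference between the consume and find-and-consume families is only where the match may begin; in both cases the TEXT view is advanced past the matched bytes. The sketch below illustrates that bookkeeping with a literal byte comparison in place of a regular expression, so it is an analogy and not how CRE2 or RE2 implement matching. 'substring_view_t', 'consume_literal()' and 'find_and_consume_literal()' are hypothetical names that do not exist in CRE2.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in mirroring the fields of cre2_string_t. */
typedef struct { const char *data; int length; } substring_view_t;

/* Advance the view past SKIP ignored bytes plus N matched bytes. */
static void
advance_view (substring_view_t *text, size_t skip, size_t n)
{
  text->data   += skip + n;
  text->length -= (int)(skip + n);
}

/* Succeed only if TEXT starts with LITERAL; on success TEXT references
   what follows the literal, otherwise it is left unchanged. */
int
consume_literal (substring_view_t *text, const char *literal)
{
  assert(text != NULL && literal != NULL && text->length >= 0);
  size_t n = strlen(literal);
  if (n > (size_t)text->length || 0 != memcmp(text->data, literal, n))
    return 0;
  advance_view(text, 0, n);
  return 1;
}

/* Succeed if LITERAL occurs anywhere in TEXT; the non-matching prefix
   is skipped and TEXT references what follows the first occurrence. */
int
find_and_consume_literal (substring_view_t *text, const char *literal)
{
  assert(text != NULL && literal != NULL && text->length >= 0);
  size_t n   = strlen(literal);
  size_t len = (size_t)text->length;
  for (size_t skip = 0; skip + n <= len; ++skip) {
    if (0 == memcmp(text->data + skip, literal, n)) {
      advance_view(text, skip, n);
      return 1;
    }
  }
  return 0;
}
```

Calling such a function repeatedly on the same view walks through the buffer one match at a time, which is the usual reason for choosing the consuming variants over a plain partial match.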

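int cre2_replace_re (cre2_regexp_t * REX, cre2_string_t * TEXT, cre2_string_t * replace)

Replace the text matching the pattern with the replacement string. In the example below, after a successful call TEXT references the rewritten string, which is then released with 'free()'.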

    cre2_regexp_t *	rex;
    const char *	pattern	= "hello";
    const char *	text	= "ciao hello salut";
    const char *	replace	= "ohayo";
    cre2_string_t	target	= {
      .data   = text,
      .length = strlen(text)
    };
    cre2_string_t	rewrite	= {
      .data   = replace,
      .length = strlen(replace)
    };
    int			result;
    rex = cre2_new(pattern, strlen(pattern), NULL);
    {
      result = cre2_replace_re(rex, &target, &rewrite);
      if (1 != result)
	goto error;
      if (0 != strncmp("ciao ohayo salut", target.data, target.length))
	goto error;
      if ('\0' != target.data[target.length])
	goto error;
      printf("rewritten to: ");
      fwrite(target.data, target.length, 1, stdout);
      printf("\n");
    }
    cre2_delete(rex);
    free((void *)target.data);

Global replacement is also supported:

    cre2_regexp_t *	rex;
    const char *	pattern	= "( | | | | | | | | | | | | | | | )";
    const char *	text	= "ciao   salut     ";
    const char *	replace	= "sty";
    cre2_string_t	target	= {
      .data   = text,
      .length = strlen(text)
    };
    cre2_string_t	rewrite	= {
      .data   = replace,
      .length = strlen(replace)
    };
    int			result;
    rex = cre2_new(pattern, strlen(pattern), NULL);
    {
      result = cre2_global_replace_re(rex, &target, &rewrite);
      printf("result is %d\n", result);
      if (1 != result)
        printf("replace error \n");
      if (0 != strncmp("ciao sty salut sty", target.data, target.length))
        printf("cmp error \n");
      if ('\0' != target.data[target.length])
        printf("target error \n");
      printf("rewritten to: ");
      printf("%.*s\n", target.length, target.data);
      printf("\n");
    }
    cre2_delete(rex);
    free((void *)target.data);
