Regular Expressions in Perl 6

Perl 5 is good at processing text; Perl 6 is designed to process languages. It has several built-in data types related to language processing:

Regex Match Grammar AST Macro

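Regex: describes a regular expression (a pattern).

Match: the data structure that describes the result of a match.

Grammar: a set of named regexes that together describe the grammar of a language.

AST: abstract syntax tree, the data structure produced by parsing the text of a language.

Macro: macros, a set of methods that operate on the abstract syntax tree.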

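Regex: regular expressions
If you already know Perl 5, you are in luck: by default, a Perl 6 regex behaves much like a Perl 5 regex with the /x and /s modifiers. Whitespace in the pattern is ignored and . matches any character, including newline. ^ and $ always anchor to the start and end of the string; use ^^ and $$ for the start and end of a line.
Perl 6 uses the ~~ smartmatch operator to match a string against a regex: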

> if "string" ~~ / \w+ / { say 'string matches \w+' }

A regex can be written in several equivalent ways:

> if "str" ~~ m/\w+/ { say "str match words" }
> if "str" ~~ rx/\w+/ { say "str match word" }
> if "str" ~~ m{\w+} { say "str match word" }
> if "str" ~~ m<\w+> { say "str match word" }
> if "str" ~~ m[\w+] { say "str match word" }

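In Perl 6 regexes, whitespace in the pattern is ignored. \s matches any whitespace character, including newline, and . (dot) matches any character, including newline.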

> if "a\nb" ~~ / ... / { say "dot matches any character, including newline" }
> if " \t\n" ~~ / ^ \s+ $ / { say '\s matches space, tab and newline' }

After each match, Perl 6 stores information about the result in the special variable $/, a Match object:

if 'abcdef' ~~ / de / {
    say ~$/;          # de
    say $/.prematch;  # abc
    say $/.postmatch; # f
    say $/.from;      # 3
    say $/.to;        # 5
};

Perl 6 still uses parentheses (...) to capture, but captures are numbered from 0. The first group is therefore $0, not Perl 5's $1 (or \1 as a backreference).

> if "hello hello" ~~ / (\w+) <ws> $0 / { say "matched the same word twice" }

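Captured values are not stored in separate variables. They all go into one array-like structure, the match object $/, so $0 is shorthand for $/[0]: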

> if "hello" ~~ / (\w+) / { say "match $/[0]" }
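The following is a slightly larger sketch of numbered captures. It assumes a current Rakudo (Raku) compiler, and the date string and variable names are illustrative only. It shows that $0, $1, ... are entries of the match object, and that a backreference inside the regex uses the same 0-based numbering.

```raku
# Illustrative input: any string with three dash-separated numbers works.
my $date = '2024-05-17';
my $m = $date ~~ / (\d+) '-' (\d+) '-' (\d+) /;

say ~$m[0];                   # 2024
say ~$0;                      # 2024 ($0 is shorthand for $/[0])
my @parts = $m.list.map(~*);  # every positional capture, as strings
say @parts;                   # [2024 05 17]

# A backreference inside the regex also counts from 0.
my $repeat = 'hello hello' ~~ / (\w+) \s+ $0 /;
say so $repeat;               # True
```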

The following Perl 5 character-class shorthands still work:

\d and \D

'ab 42' ~~ /\d/ and say ~$/;   # 4
'ab 42' ~~ /\D/ and say ~$/;   # a

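In Perl 6, these shorthands match across the whole Unicode range, not just ASCII. For example, \d matches U+0035 (DIGIT FIVE), U+07C2 (NKO DIGIT TWO) and U+0E53 (THAI DIGIT THREE):

"\x[0035]" ~~ /\d/ and say "match"; # match
"\x[07C2]" ~~ /\d/ and say "match"; # match
"\x[0E53]" ~~ /\d/ and say "match"; # match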

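The sketch below shows how the Unicode-wide \d differs from an explicit ASCII range. It assumes a current Rakudo compiler, and the mixed test string is an illustrative input.

```raku
# "\x[...]" builds a character from its hexadecimal codepoint (illustrative input).
my $mixed = "7\x[0E53]\x[07C2]x";   # '7', THAI DIGIT THREE, NKO DIGIT TWO, 'x'

my @any-digit = $mixed.comb(/\d/);
say @any-digit.elems;               # 3

# An explicit ASCII range is needed to accept only 0-9.
my @ascii-digit = $mixed.comb(/<[0..9]>/);
say @ascii-digit.elems;             # 1
```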
\w and \W

"abc123ABC_" ~~ /^\w+$/ and say "match"; # match

\h and \H

Matches horizontal whitespace, such as space and tab.

\v and \V

Matches vertical whitespace:

"\x[000A]" ~~ /\v/ and say "match"; # match
"\x[000B]" ~~ /\v/ and say "match"; # match
"\x[000C]" ~~ /\v/ and say "match"; # match
"\x[0085]" ~~ /\v/ and say "match"; # match
"\x[2029]" ~~ /\v/ and say "match"; # match

\n and \N

Matches a newline. On Windows, the two-character sequence CR LF is matched as a single newline.
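The following is a small check of the CR LF behaviour. It assumes a current Rakudo compiler, where CR LF is stored as a single grapheme. The string is an illustrative input.

```raku
my $crlf = "line1\r\nline2";          # illustrative input with a Windows line ending
say "\r\n".chars;                     # 1: CR LF is a single grapheme
my $joined = ?($crlf ~~ / 1 \n line2 /);
say $joined;                          # True: \n matches the CR LF pair
say $crlf.lines.elems;                # 2
```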

\t and \T

Matches a tab (U+0009).

\s and \S

Matches any whitespace character, horizontal or vertical.

Unicode properties
<:L>   Letter
<:LC>  Cased_Letter
<:Lu>  Uppercase_Letter
<:Ll>  Lowercase_Letter
<:Lt>  Titlecase_Letter
<:Lm>  Modifier_Letter
<:Lo>  Other_Letter
<:M>   Mark
<:Mn>  Nonspacing_Mark
<:Mc>  Spacing_Mark
<:Me>  Enclosing_Mark
<:N>   Number
<:Nd>  Decimal_Number (also Digit)
<:Nl>  Letter_Number

Each property also has a negated (complement) form, written with ! after the colon: <:!L>, <:!LC>, ….
Inside a character class, the following set operators are available:

+ | - & ^

+   union
|   union
&   intersection
-   set difference (in the first set but not in the second)
^   symmetric difference, XOR (in exactly one of the two sets)

Example: <:Letter+:Number>
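The sketch below exercises the union and difference operators on Unicode properties. It assumes a current Rakudo compiler and only covers + and -. The input string is illustrative.

```raku
my $text  = 'ab12 CD_!';                  # illustrative input
my $union = $text.comb(/<:L+:N>/).join;   # letters or numbers
my $lower = $text.comb(/<:L-:Lu>/).join;  # letters, minus uppercase letters
say $union;   # ab12CD
say $lower;   # ab
```

The underscore is connector punctuation (Pc), so neither class accepts it.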

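User-defined character classes <[...]>

<[a..c123]>        # a, b, c, 1, 2 or 3
<[\d] - [13579]>   # a digit other than 1, 3, 5, 7 or 9
<[02468]>          # an even ASCII digit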

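To see what these classes accept, the following sketch filters an illustrative string with each one. It assumes a current Rakudo compiler. The negated form <-[...]> is an extra illustration that is not in the list.

```raku
my $s        = 'a1b2c3d4';                        # illustrative input
my $listed   = $s.comb(/<[a..c123]>/).join;
my $even     = $s.comb(/<[\d] - [13579]>/).join;
my $no-vowel = 'raku'.comb(/<-[aeiou]>/).join;   # a leading - negates the class
say $listed;     # a1b2c3
say $even;       # 24
say $no-vowel;   # rk
```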
Quantifiers

+ \w+ one or more
* \w* zero or more
? \w? zero or one match
**min..max \w**3..5
**min..*  \w**4..*
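The following sketch runs the quantifiers above against a few illustrative strings, assuming a current Rakudo compiler. The frugal form (a trailing ?) is an extra illustration.

```raku
my $greedy = ~('aaaaaa' ~~ / a ** 3..5 /);   # greedy: takes as many as allowed
my $exact  = ~('aaaaaa' ~~ / a ** 2 /);
my $short  = ?('aa' ~~ / ^ a ** 3..* $ /);   # at least three are required
my $frugal = ~('xyz' ~~ / \w+? /);           # a trailing ? makes it frugal (non-greedy)
say $greedy;   # aaaaa
say $exact;    # aa
say $short;    # False
say $frugal;   # x
```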

Literal strings
To match characters literally, you don't need \Q...\E as in Perl 5; just write them as a quoted string:

'[[]]' ~~ / '[[]]' / and say "match"; # match
'{()}' ~~ / '{()}' / and say "match"; # match (single quotes: in double quotes, {...} would run as code)

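The short sketch below contrasts a quoted dot with a bare one, assuming a current Rakudo compiler. The test strings are illustrative.

```raku
my $same   = ?('a.b' ~~ / 'a.b' /);   # True
my $quoted = ?('axb' ~~ / 'a.b' /);   # False: inside quotes the dot is literal
my $bare   = ?('axb' ~~ / a.b /);     # True: an unquoted dot matches any character
say ($same, $quoted, $bare);          # (True False True)
```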
Grouping
Parentheses create capturing groups. There are also two ways to group without capturing: square brackets [...] and quoted strings:

/ f[oo]* / # will match "f", "foo", "foooo"
/ f'oo'* / # same as above
/ f"oo"* / # same as above
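To make the capturing difference visible, the following sketch compares [ ] and ( ) on the same illustrative input. It assumes a current Rakudo compiler, where a quantified capture group yields an array of matches.

```raku
my $nocap = 'foooo' ~~ / f[oo]* /;
my $cap   = 'foooo' ~~ / f(oo)* /;
my $whole = ~('fooo' ~~ / f'oo'* /);   # the quantifier applies to the whole quoted string

say ~$nocap;            # foooo
say $nocap.list.elems;  # 0: [ ] groups without capturing
say $cap[0].elems;      # 2: the quantified ( ) captured 'oo' twice
say $whole;             # foo
```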

Alternation and conjunction

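/f|fo|foo/             # alternation: the longest matching alternative wins
/f||fo||foo/           # alternation: the first matching alternative wins
/<[a..z]>+ & [...]/    # conjunction: both sides must match the same substring
/<[a..z]>+ && [...]/   # sequential conjunction: like &, but evaluated left to right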

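The difference between | and || shows up as soon as several alternatives can match. The sketch below assumes a current Rakudo compiler and uses an illustrative word.

```raku
my $word    = 'food';                        # illustrative input
my $longest = ~($word ~~ / f | fo | foo /);  # | uses longest-token matching
my $first   = ~($word ~~ / f || fo || foo /); # || tries alternatives in order
say $longest;   # foo
say $first;     # f
```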
Zero-width assertions

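Variable interpolation
When a variable is interpolated into a Perl 6 regex, its value is matched as a literal string rather than compiled as regex syntax. An interpolated array matches any one of its elements: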

 my $foo = "ab*c";
 my @bar = <one two three>;

 /$foo @bar/   # matches exactly like /'ab*c' [one|two|three]/
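The sketch below reuses the same $foo and @bar to show literal interpolation. It also shows <$foo>, the form that compiles the string as a regex instead. It assumes a current Rakudo compiler, and the test strings are illustrative.

```raku
my $foo = 'ab*c';
my @bar = <one two three>;

my $literal  = ?('xab*c' ~~ / $foo /);    # the string is matched literally
my $not-re   = ?('abbbc' ~~ / $foo /);    # '*' is not a quantifier here
my $as-regex = ?('abbbc' ~~ / <$foo> /);  # <$foo> compiles the string as a regex
my $picked   = ~('a two b' ~~ / @bar /);  # an array matches any one of its elements
say ($literal, $not-re, $as-regex);       # (True False True)
say $picked;                              # two
```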

Regex modifiers (adverbs)

$foo ~~ m:i/ foo /          # case-insensitive: matches "foo", "FOO", ...
$foo ~~ m:P5/[a-z]/         # use Perl 5 regex syntax
$foo ~~ m:g/ foo /          # global: find all matches
$foo ~~ m:s/ foo /          # whitespace in the pattern is significant
$foo ~~ m:ratchet/foo|ddd/  # don't backtrack
m:pos($p)/ pattern /        # match only at position $p
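The following sketch tries :i, :g and :ratchet on illustrative inputs, assuming a current Rakudo compiler. For :g it uses the method form .match(..., :g), which takes the same adverb.

```raku
my $ci      = ?('FOO' ~~ m:i/ foo /);                       # case-insensitive
my @numbers = 'x1y22z333'.match(/ \d+ /, :g).map(~*);       # every match, not just the first
my $backtracks = ?('aaa' ~~ / a* a /);                      # a* gives back one 'a'
my $ratchets   = ?('aaa' ~~ m:ratchet/ a* a /);             # a* keeps every 'a', so the final a fails
say $ci;          # True
say @numbers;     # [1 22 333]
say $backtracks;  # True
say $ratchets;    # False
```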

Other modifiers:

:basechar  ignore accents and other marks
:continue  continue matching from where the previous match ended
:byte      dot matches bytes
:codes     dot matches codepoints
:chars     dot matches "characters" at current

Some modifiers select which matches are returned:

$_ = "foo bar baz blat";
m:3x/ a /        # matches exactly three times: the "a" in bar, baz and blat
m:nth(3)/ \w+ /  # matches the third word, "baz"
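The following sketch runs both adverbs on the same string and prints where each "a" was found. It assumes a current Rakudo compiler.

```raku
$_ = 'foo bar baz blat';
my @as    = m:3x/ a /;        # exactly three matches, or no match at all
my $third = m:nth(3)/ \w+ /;  # only the third match
say @as.map(*.from);          # (5 9 14)
say ~$third;                  # baz
```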

Modifiers can also be placed inside a regex, at the start of a group:

/ a [ :i foo ] z/ # matches "afooz", "aFooz",...

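The :sigspace modifier (short form :s) is very useful. Whitespace in the pattern becomes significant and matches whitespace in the string (roughly \s+ between words).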

m:sigspace/One small step/   # roughly /One\s+small\s+step/
ms/One small step/           # same as above: ms// is short for m:sigspace//
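The sketch below shows what :s changes, using illustrative phrases and assuming a current Rakudo compiler. Rakudo may warn at compile time that the space in the plain m// pattern is not significant, which is exactly the point being shown.

```raku
my $phrase   = 'One   small step';                          # illustrative input
my $spaced   = ?($phrase ~~ m:s/One small step/);          # any run of whitespace is accepted
my $squashed = ?('Onesmallstep' ~~ m:s/One small step/);   # whitespace is required between words
my $plain    = ?($phrase ~~ m/One small step/);            # without :s this means 'Onesmallstep'
my $short    = ?($phrase ~~ ms/One small step/);           # ms// is short for m:sigspace//
say ($spaced, $squashed, $plain, $short);                  # (True False False True)
```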

Custom rules (named regexes)

my regex identifier { \w+ }
/ <identifier> /   # matches the same text as / \w+ /, and also captures it as $<identifier>

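The following sketch shows the named capture that a subrule call produces. It assumes a current Rakudo compiler, where a lexical `my regex` can be called as <name>. The input string is illustrative.

```raku
my regex identifier { \w+ }

my $m    = 'my $count = 42' ~~ / '$' <identifier> /;   # illustrative input
my $name = ~$m<identifier>;    # a named subrule call captures under its own name
say $name;                     # count
```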
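Predefined character classes (subrules)

<alpha>  an alphabetic character (a letter or underscore)
<digit>  a decimal digit (same as \d)
<ident>  an identifier: <alpha> followed by \w*
<sp>
<ws>     an arbitrary amount of whitespace
<dot>    a period (same as '.')
<lt>     a less-than character (same as '<')
<gt>     a greater-than character (same as '>')
<null>   matches nothing (useful in alternations that may be empty)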

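Lookahead and lookbehind

<before ...>   # lookahead: succeeds if ... matches at the current position, without consuming it
<after  ...>   # lookbehind: succeeds if ... matches just before the current position

/ foo <before \d+> /   # matches "foo" only when it is followed by digits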
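The following sketch checks that the lookahead consumes nothing. It also tries the negated <!before ...> form and a lookbehind. It assumes a current Rakudo compiler, and the inputs are illustrative.

```raku
my $m = 'foo42' ~~ / foo <before \d+> /;
say ~$m;      # foo
say $m.to;    # 3: the digits were checked but not consumed

my $not-digit = ?('foobar' ~~ / foo <!before \d> /);     # negated lookahead
my $price     = ~('price: $42' ~~ / <after '$'> \d+ /);  # lookbehind
say $not-digit;   # True
say $price;       # 42
```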
