Regular Expressions: Pattern


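There are generally two ways to use regular expressions in Java:

1. Call the String methods directly:

string.matches(regex); // match

string.replaceFirst(regex, replacement); // replace

string.replaceAll(regex, replacement);

string.replace(target, replacement);

The difference between replace and replaceAll: replace treats target as literal text, while replaceAll treats its first argument as a regular expression.

2. Use Pattern + Matcher. Basic usage: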

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();
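
To make the two approaches concrete, here is a minimal sketch that runs the same whole-string match both ways and contrasts replace with replaceAll. The class name StringVsPatternDemo and its helper methods are illustrative assumptions, not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StringVsPatternDemo {
    // Compiled once so the same Pattern can be reused across calls.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    /** Whole-string match through the String shortcut. */
    static boolean matchesViaString(String input) {
        return input.matches("a*b");
    }

    /** Whole-string match through an explicit Pattern + Matcher. */
    static boolean matchesViaPattern(String input) {
        Matcher m = A_STAR_B.matcher(input);
        return m.matches();
    }

    /** replace takes the target literally. */
    static String literalReplace(String s) {
        return s.replace(".", "-");
    }

    /** replaceAll interprets "." as "any character". */
    static String regexReplace(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(matchesViaString("aaaaab"));   // true
        System.out.println(matchesViaPattern("aaaaab"));  // true
        System.out.println(literalReplace("a.b.c"));      // a-b-c
        System.out.println(regexReplace("a.b.c"));        // -----
    }
}
```

When the same expression is applied many times, keeping the compiled Pattern in a constant avoids recompiling it on every call, which is what the String shortcuts do internally.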

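Pattern is basically used to compile the regular expression; the main operations such as searching, matching, and replacing are performed on a Matcher instance. For Matcher's methods, see this article: http://blog.csdn.net/cclovett/article/details/12448843
Note that String's regex-based matching and replacement methods (matches, replaceFirst, replaceAll) are implemented on top of Pattern + Matcher, whereas indexOf performs a direct literal search. The constructs commonly used in regex pattern strings are listed below.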

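Characters and predefined character classes:
\\    The backslash character
\t    The tab character ('\u0009')
\n    The newline (line feed) character ('\u000A')
\r    The carriage-return character ('\u000D')
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]
\f    The form-feed character ('\u000C')
\e    The escape character ('\u001B')
\b    A word boundary
\B    A non-word boundary
\G    The end of the previous match
\h    A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H    A non-horizontal whitespace character: [^\h]
\v    A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V    A non-vertical whitespace character: [^\v]
^     The beginning of a line
      ^java restricts the match to input that starts with "java"
$     The end of a line
      java$ restricts the match to input that ends with "java"
.     Any single character (line terminators are excluded by default)
      java.. matches "java" followed by any two characters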
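
As a rough illustration of the predefined classes and anchors listed above, the sketch below checks "java" plus two characters, an all-digits string, and whitespace splitting. The class PredefinedClassDemo and its method names are assumptions made for this example.

```java
import java.util.regex.Pattern;

final class PredefinedClassDemo {
    // "java" followed by exactly two more characters, anchored at both ends.
    private static final Pattern JAVA_PLUS_TWO = Pattern.compile("^java..$");
    // One or more ASCII digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean isJavaPlusTwo(String s) {
        return JAVA_PLUS_TWO.matcher(s).matches();
    }

    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    /** Splits on runs of whitespace (\s+) after trimming both ends. */
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(isJavaPlusTwo("java11"));       // true
        System.out.println(isJavaPlusTwo("java1"));        // false
        System.out.println(isAllDigits("2024"));           // true
        System.out.println(words(" a \t b\nc ").length);   // 3
    }
}
```

Note that every backslash in the table has to be doubled inside a Java string literal, so \d is written "\\d".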


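Character classes: "[]"
[a-z]          Any one character from a to z
[A-Z]          Any one character from A to Z
[a-zA-Z]       Any one character from a to z or A to Z
[0-9]          Any one character from 0 to 9
[0-9a-z]       Any one character from 0 to 9 or a to z
[0-9[a-z]]     Any one character from 0 to 9 or a to z (union)

Negation, with ^ inside []: "[^]"
[^a-z]         Any one character other than a to z
[^A-Z]         Any one character other than A to Z
[^a-zA-Z]      Any one character other than a to z and A to Z
[^0-9]         Any one character other than 0 to 9
[^0-9a-z]      Any one character other than 0 to 9 and a to z
[^0-9[a-z]]    Any one character other than 0 to 9 and a to z (union; on Java 8 and earlier the negation did not cover the nested class)

[a-d[m-p]] a to d, or m to p; same as [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)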
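
The union, intersection, and subtraction forms can be checked one character at a time. This is a small sketch under the assumption of a hypothetical helper, CharClassDemo.matchesClass, which simply wraps Pattern.matches.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    /** Returns true when the single character c is accepted by the given class. */
    static boolean matchesClass(String regexClass, char c) {
        return Pattern.matches(regexClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[a-d[m-p]]", 'n'));     // true  (union)
        System.out.println(matchesClass("[a-d[m-p]]", 'e'));     // false
        System.out.println(matchesClass("[a-z&&[def]]", 'e'));   // true  (intersection)
        System.out.println(matchesClass("[a-z&&[^bc]]", 'b'));   // false (subtraction)
        System.out.println(matchesClass("[a-z&&[^m-p]]", 'q'));  // true
    }
}
```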

POSIX character classes (US-ASCII only)
(Note, usage example: Pattern pattern = Pattern.compile("\\p{Digit}*");)
\p{Lower} A lower-case alphabetic character: [a-z] 
\p{Upper} An upper-case alphabetic character:[A-Z] 
\p{ASCII} All ASCII:[\x00-\x7F] 
\p{Alpha} An alphabetic character:[\p{Lower}\p{Upper}] 
\p{Digit} A decimal digit: [0-9] 
\p{Alnum} An alphanumeric character:[\p{Alpha}\p{Digit}] 
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}] 
\p{Print} A printable character: [\p{Graph}\x20] 
\p{Blank} A space or a tab: [ \t] 
\p{Cntrl} A control character: [\x00-\x1F\x7F] 
\p{XDigit} A hexadecimal digit: [0-9a-fA-F] 
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
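
A brief sketch of the POSIX classes in use, assuming default flags (so the classes stay ASCII-only). The class PosixClassDemo and its methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

final class PosixClassDemo {
    // Zero or more ASCII decimal digits, as in the usage note above.
    private static final Pattern DIGITS = Pattern.compile("\\p{Digit}*");
    // A single ASCII punctuation character.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    static boolean isAsciiDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    static String stripPunctuation(String s) {
        return PUNCT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isAsciiDigits("2024"));              // true
        System.out.println(isAsciiDigits(""));                  // true, because * allows zero digits
        System.out.println(isAsciiDigits("\uFF11\uFF12"));      // false, full-width digits are not ASCII
        System.out.println(stripPunctuation("Hello, world!"));  // Hello world
    }
}
```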

java.lang.Character classes (simple Java character type)
\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase() 
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase() 
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace() 
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored() 

Classes for Unicode scripts, blocks, categories and binary properties 
\p{IsLatin} A Latin script character (script) 
\p{InGreek} A character in the Greek block (block) 
\p{Lu} An uppercase letter (category) 
\p{IsAlphabetic} An alphabetic character (binary property) 
\p{Sc} A currency symbol 
\P{InGreek} Any character except one in the Greek block (negation) 
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction) 
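
The Unicode classes above can be tried out with a thin wrapper around Pattern.matches. This is only a sketch; UnicodeClassDemo is an assumed name, and the sample characters are arbitrary choices.

```java
import java.util.regex.Pattern;

final class UnicodeClassDemo {
    /** Whole-input match of regex against s. */
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{IsLatin}+", "abc"));          // true  (script)
        System.out.println(matches("\\p{InGreek}", "\u03B1"));        // true  (block, Greek small alpha)
        System.out.println(matches("\\P{InGreek}", "a"));             // true  (negation)
        System.out.println(matches("\\p{Lu}", "A"));                  // true  (category)
        System.out.println(matches("\\p{Sc}", "$"));                  // true  (currency symbol)
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A"));     // false (subtraction)
    }
}
```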

Boundary matchers 
\b A word boundary 
\B A non-word boundary 
\A The beginning of the input 
\G The end of the previous match 
\Z The end of the input but for the final terminator, if any 
\z The end of the input 

Linebreak matcher 
\R Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]  
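
The boundary and linebreak matchers are easier to tell apart with a few inputs. The following sketch assumes Java 8 or later (for \R); BoundaryDemo and its helpers are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryDemo {
    /** Counts occurrences of word that stand alone between word boundaries. */
    static int countWholeWord(String word, String text) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** True when regex is found anywhere in input. */
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** Splits text on any Unicode linebreak sequence. */
    static String[] lines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("cat", "cat concat cat."));  // 2
        System.out.println(found("line\\Z", "line\n"));                // true, \Z ignores the final terminator
        System.out.println(found("line\\z", "line\n"));                // false, \z is the very end
        System.out.println(lines("a\r\nb\nc").length);                 // 3
    }
}
```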

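Greedy quantifiers
X? X (a single character, a predefined class, a [] class, or a () group), once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times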

Reluctant quantifiers 
X?? X, once or not at all 
X*? X, zero or more times 
X+? X, one or more times 
X{n}? X, exactly n times 
X{n,}? X, at least n times 
X{n,m}? X, at least n but not more than m times 

Possessive quantifiers 
X?+ X, once or not at all 
X*+ X, zero or more times 
X++ X, one or more times 
X{n}+ X, exactly n times 
X{n,}+ X, at least n times 
X{n,m}+ X, at least n but not more than m times
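
The three quantifier families differ in how they backtrack, which shows up clearly on one input. A minimal sketch, assuming a hypothetical helper firstMatch that returns the first match or null:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierDemo {
    /** Returns the first substring matched by regex, or null when nothing matches. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off just enough for "foo".
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible.
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never gives it back.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```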

I do not understand the constructs below.

Back references
\n Whatever the nth capturing group matched
\k<name> Whatever the named-capturing group "name" matched 
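
One common use of back references is spotting a word that is repeated immediately. The sketch below is an assumed example (BackReferenceDemo and its methods are not from the original post); the \b anchors keep "the there" from being treated as a repeat.

```java
import java.util.regex.Pattern;

final class BackReferenceDemo {
    // Named group "word", followed by a space and the same text again.
    private static final Pattern DOUBLED =
            Pattern.compile("\\b(?<word>\\w+) \\k<word>\\b");

    static boolean hasDoubledWord(String s) {
        return DOUBLED.matcher(s).find();
    }

    /** Collapses runs of the same word into one, using numbered back reference \1. */
    static String collapseRepeats(String s) {
        return s.replaceAll("\\b(\\w+)( \\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledWord("the the cat"));                    // true
        System.out.println(hasDoubledWord("the there"));                      // false
        System.out.println(collapseRepeats("this is is a test test test"));   // this is a test
    }
}
```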

Quotation 
\ Nothing, but quotes the following character 
\Q Nothing, but quotes all characters until \E 
\E Nothing, but ends quoting started by \Q 
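
Quotation matters when the text to match contains metacharacters such as +. A small sketch, where QuotationDemo is a hypothetical wrapper and Pattern.quote is the standard library method that wraps its argument in \Q...\E:

```java
import java.util.regex.Pattern;

final class QuotationDemo {
    /** Whole-input match of a raw regex. */
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    /** Whole-input match where literal is quoted so no character is special. */
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(matches("1+1=2", "1+1=2"));            // false, + is a quantifier here
        System.out.println(matches("1\\+1=2", "1+1=2"));          // true, single character quoted with \
        System.out.println(matches("\\Q1+1=2\\E", "1+1=2"));      // true, whole span quoted
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));   // true
    }
}
```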

Special constructs (named-capturing and non-capturing) 
(?<name>X) X, as a named-capturing group 
(?:X) X, as a non-capturing group 
(?idmsuxU-idmsuxU)  Nothing, but turns match flags i d m s u x U on - off 
(?idmsux-idmsux:X)   X, as a non-capturing group with the given flags i d m s u x on - off 
(?=X) X, via zero-width positive lookahead 
(?!X) X, via zero-width negative lookahead 
(?<=X) X, via zero-width positive lookbehind 
(?<!X) X, via zero-width negative lookbehind 
(?>X) X, as an independent, non-capturing group
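
Several of the special constructs can be exercised with one helper. This is a sketch under stated assumptions: SpecialConstructDemo, yearOf, and firstMatch are hypothetical names, and the sample inputs are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpecialConstructDemo {
    // Two named-capturing groups: year and month.
    private static final Pattern YEAR_MONTH =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

    /** Returns the text of the "year" group, or null when there is no match. */
    static String yearOf(String s) {
        Matcher m = YEAR_MONTH.matcher(s);
        return m.find() ? m.group("year") : null;
    }

    /** Returns the first substring matched by regex, or null when nothing matches. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(yearOf("released 2024-05"));                     // 2024
        System.out.println(firstMatch("\\w+(?=:)", "key:value"));           // key    (positive lookahead)
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $42"));         // 42     (positive lookbehind)
        System.out.println(firstMatch("foo(?!bar)\\w+", "foobar foobaz"));  // foobaz (negative lookahead)
        System.out.println(firstMatch("(?i)java", "I like JAVA"));          // JAVA   (inline flag i)
        System.out.println(firstMatch("(?>a+)ab", "aaab"));                 // null   (independent group keeps all a's)
    }
}
```

The lookaround constructs are zero-width, so the colon and the dollar sign in the examples are checked but not included in the returned match.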
