Python Regular Expressions


1.  References
http://docs.python.org/library/re.html#module-re
http://docs.python.org/library/re.html#raw-string-notation
http://docs.python.org/howto/regex.html#regex-howto
"Mastering Regular Expressions" by Jeffrey E. F. Friedl (O'Reilly)
 
2. An example in BitBake
'(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'
This regular expression matches strings of the form 'xxx_append', 'xxx_prepend', 'xxx_append_yyy' or 'xxx_prepend_yyy'.
Here is a brief analysis of it.
REGEXP = '(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?' = ABC? (matches AB or ABC)
A = '(?P<base>.*?)' matches the same set of strings as '.*?', but the matched substring can be accessed through the identifier base.
B = '(?P<keyword>_append|_prepend)' matches '_append' or '_prepend'; the matched substring can be accessed through the identifier keyword.
C = '(_(?P<add>.*))' matches a string like '_xxxxxxxxx'; the part after the underscore can be accessed through the identifier add.
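The reason part A uses the non-greedy '.*?' rather than '.*' becomes visible on an input that contains both keywords. The following is a minimal sketch, not BitBake code: the greedy variant, the helper split_name and the sample string 'a_append_b_prepend' are made up for illustration.

```python
import re

# Pattern from the text (non-greedy base) and a hypothetical greedy variant.
NON_GREEDY = r'(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'
GREEDY = r'(?P<base>.*)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'

def split_name(pattern, string):
    """Return the named groups as a dict, or None when there is no match."""
    result = re.match(pattern, string)
    return result.groupdict() if result is not None else None

sample = 'a_append_b_prepend'  # illustrative input, not taken from BitBake

# Non-greedy: base stops at the first keyword, the rest lands in add.
print(split_name(NON_GREEDY, sample))
# Greedy: base swallows everything up to the last keyword, add stays None.
print(split_name(GREEDY, sample))
```

With the non-greedy form the first keyword in the string wins; with the greedy form the last one does.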
The test code for this regular expression and the corresponding shell output follow.
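#!/usr/bin/env python

# test_regexp.py: try the pattern against a few sample strings
import sys
import re

# Uncomment to take the pattern and string from the command line instead.
#pattern = sys.argv[1]
#string = sys.argv[2]

def test(pattern, string):
    # re.match() anchors at the start of the string and returns None on failure.
    result = re.match(pattern, string)
    if result is None:
        print((pattern, string, None))
    else:
        # Print a single tuple so the output looks the same under Python 2 and 3.
        print((pattern, string, result.group('keyword'), result.group('add'), result.group(0)))

# Raw string notation keeps any backslashes intact for the re module.
pattern = r'(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'
test(pattern, 'hello_append')
test(pattern, 'hello')
test(pattern, 'hello_prepend')
test(pattern, 'hello_append_add_package')
test(pattern, 'hello_append_world_package')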

chenqi@chenqi-OptiPlex-760:~/mypro/python$ ./test_regexp.py
('(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?', 'hello_append', '_append', None, 'hello_append')
('(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?', 'hello', None)
('(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?', 'hello_prepend', '_prepend', None, 'hello_prepend')
('(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?', 'hello_append_add_package', '_append', 'add_package', 'hello_append_add_package')
('(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?', 'hello_append_world_package', '_append', 'world_package', 'hello_append_world_package')
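
One point worth keeping in mind with this test is that re.match anchors only at the start of the string, so a name that merely begins with a valid form is accepted too. A minimal sketch, assuming Python 3.4 or later for re.fullmatch; the input 'hello_appendix' is a made-up example, not a BitBake variable name.

```python
import re

PATTERN = r'(?P<base>.*?)(?P<keyword>_append|_prepend)(_(?P<add>.*))?'

name = 'hello_appendix'  # hypothetical input

# re.match() succeeds because a prefix of the string fits the pattern.
prefix = re.match(PATTERN, name)
# re.fullmatch() requires the whole string to fit, so 'ix' makes it fail.
whole = re.fullmatch(PATTERN, name)

print(prefix.group(0))        # the part of the string that matched
print(prefix.group('add'))    # the optional part C did not take part
print(whole)
```

Whether prefix matching is wanted depends on the caller; ending the pattern with '$' is another way to require a full match.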
 
3. A complete list of metacharacters
. ^ $ * + ? { } [ ] \ | ( )
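
Because these characters have a special meaning, a pattern that should match them literally has to escape them. A small sketch using re.escape from the standard library; the string '1+1=2' is an arbitrary example, and re.fullmatch assumes Python 3.4 or later.

```python
import re

literal = '1+1=2'  # contains the metacharacter '+'

# Used directly as a pattern, '1+' means "one or more '1' characters",
# so the pattern does not match its own text.
unescaped = re.fullmatch(literal, literal)

# The unescaped pattern matches a different string instead.
other = re.fullmatch(literal, '11=2')

# re.escape() backslash-escapes metacharacters so they match literally.
escaped = re.fullmatch(re.escape(literal), literal)

print(unescaped)
print(other.group(0))
print(escaped.group(0))
```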

 
 
4. Predefined special sequences
\d
Matches any decimal digit; this is equivalent to the class [0-9].
\D
Matches any non-digit character; this is equivalent to the class [^0-9].
\s
Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S
Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w
Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W
Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
 
To be continued ...