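3. A Regular Expression That Matches a Word
As we have seen, the regular expression above is good enough if all we want is to find the string to in a text. If we want to match the word to, however, the pattern to is no longer sufficient. For example, change the definition of the string s in the code above to the following: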
s = '''In company or association with respect to place or time;
as, to live together in one house; to live together in the
same age; they walked together to the town.'''
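In this text, the string to appears not only as the word to but also inside words such as together and town. If we keep using the pattern to to look for the "word" to, we get false matches. Running the modified code produces the following result: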
In company or association with respect {to} place or time;
as, {to} live {to}gether in one house; {to} live {to}gether in the
same age; they walked {to}gether {to} the {to}wn.
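To find exactly the word to, we should use \bto\b. Here \b is a special code defined by regular expressions, also called a metacharacter. It stands for the beginning or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation marks, or line breaks, \b does not match any of these separator characters; it matches only a position.
To repeat: the metacharacter \b matches a position, not a character. It is a position where the character before and the character after are not both \w characters: one of them is, and the other either is not or does not exist (as at the start or end of the string). \w is itself a metacharacter and is covered later.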
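The idea that \b matches a position rather than a character can be checked with a small sketch. The sample strings "to town" and "go to town" are made up for this illustration and are not part of the original example; the code only uses re.finditer and re.search from Python's standard re module.

```python
import re

# Every match of \b is an empty match; collect where each one starts.
positions = [m.start() for m in re.finditer(r"\b", "to town")]
print(positions)   # [0, 2, 3, 7]

# \bto\b matches the two letters only; the surrounding spaces
# are not part of the match.
m = re.search(r"\bto\b", "go to town")
print(m.span())    # (3, 5)
print(repr(m.group()))   # 'to'
```

The four positions are the start and end of each of the two words. The match found by \bto\b has length 2, so the boundaries themselves consume no characters.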
The modified code is shown below:
import re

def re_show(pat, s):
    # Wrap every match of pat in braces so that it stands out in the output.
    print(re.compile(pat, re.M).sub(r"{\g<0>}", s.rstrip()), '\n')

s = '''In company or association with respect to place or time;
as, to live together in one house; to live together in the
same age; they walked together to the town.'''
re_show(r"\bto\b", s)
Let's look at its output:
In company or association with respect {to} place or time;
as, {to} live together in one house; {to} live together in the
same age; they walked together {to} the town.
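As a rough cross-check of the two results, the matches can also be counted instead of highlighted. This sketch uses re.findall from the standard library; the variable names plain and whole are chosen here for illustration only.

```python
import re

s = '''In company or association with respect to place or time;
as, to live together in one house; to live together in the
same age; they walked together to the town.'''

# Every occurrence of the two letters "to", including those inside words.
plain = re.findall(r"to", s)
# Only occurrences that stand alone as a word.
whole = re.findall(r"\bto\b", s)

print(len(plain))   # 8
print(len(whole))   # 4
```

The difference of four corresponds to the three occurrences of together and the one occurrence of town.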
With the example above, the reader should now have an intuitive feel for regular expressions. Next we turn to metacharacters in detail.