python 正则表达式语法学习笔记
正则表达式(regularexpression)描述了一种字符串匹配的模式(pattern),可以用来检查一个串是否含有某种子串、将匹配的子串替换或者从某个串中取出符合某个条件的子串等。
Python自1.5版本起增加了re模块,它提供Perl风格的正则表达式模式。
re模块使Python语言拥有全部的正则表达式功能。
compile函数根据一个模式字符串和可选的标志参数生成一个正则表达式对象。该对象拥有一系列方法用于正则表达式匹配和替换。
本文重点给大家介绍python正则表达式语法。
Thespecialcharactersare:
"." Matchesanycharacterexceptanewline.
"^" Matchesthestartofthestring.
"$" Matchestheendofthestringorjustbeforethenewlineat
theendofthestring.
"*" Matches0ormore(greedy)repetitionsoftheprecedingRE.
Greedymeansthatitwillmatchasmanyrepetitionsaspossible.
"+" Matches1ormore(greedy)repetitionsoftheprecedingRE.
"?" Matches0or1(greedy)oftheprecedingRE.
*?,+?,??Non-greedyversionsofthepreviousthreespecialcharacters.
{m,n} MatchesfrommtonrepetitionsoftheprecedingRE.
{m,n}? Non-greedyversionoftheabove.
"\\" Eitherescapesspecialcharactersorsignalsaspecialsequence.
[] Indicatesasetofcharacters.
A"^"asthefirstcharacterindicatesacomplementingset.
"|" A|B,createsanREthatwillmatcheitherAorB.
(...) MatchestheREinsidetheparentheses.
Thecontentscanberetrievedormatchedlaterinthestring.
(?aiLmsux)SettheA,I,L,M,S,U,orXflagfortheRE(seebelow).
(?:...) Non-groupingversionofregularparentheses.
(?P...)Thesubstringmatchedbythegroupisaccessiblebyname.
(?P=name) Matchesthetextmatchedearlierbythegroupnamedname.
(?#...) Acomment;ignored.
(?=...) Matchesif...matchesnext,butdoesn'tconsumethestring.
(?!...) Matchesif...doesn'tmatchnext.
(?<=...)Matchesifprecededby...(mustbefixedlength).
(? (?(id/name)yes|no)Matchesyespatternifthegroupwithid/namematched,
the(optional)nopatternotherwise.Thespecialsequencesconsistof"\\"andacharacterfromthelist
below. Iftheordinarycharacterisnotonthelist,thenthe
resultingREwillmatchthesecondcharacter.
\number Matchesthecontentsofthegroupofthesamenumber.
\A Matchesonlyatthestartofthestring.
\Z Matchesonlyattheendofthestring.
\b Matchestheemptystring,butonlyatthestartorendofaword.
\B Matchestheemptystring,butnotatthestartorendofaword.
\d Matchesanydecimaldigit;equivalenttotheset[0-9]in
bytespatternsorstringpatternswiththeASCIIflag.
InstringpatternswithouttheASCIIflag,itwillmatchthewhole
rangeofUnicodedigits.
\D Matchesanynon-digitcharacter;equivalentto[^\d].
\s Matchesanywhitespacecharacter;equivalentto[\t\n\r\f\v]in
bytespatternsorstringpatternswiththeASCIIflag.
InstringpatternswithouttheASCIIflag,itwillmatchthewhole
rangeofUnicodewhitespacecharacters.
\S Matchesanynon-whitespacecharacter;equivalentto[^\s].
\w Matchesanyalphanumericcharacter;equivalentto[a-zA-Z0-9_]
inbytespatternsorstringpatternswiththeASCIIflag.
InstringpatternswithouttheASCIIflag,itwillmatchthe
rangeofUnicodealphanumericcharacters(lettersplusdigits
plusunderscore).
WithLOCALE,itwillmatchtheset[0-9_]pluscharactersdefined
aslettersforthecurrentlocale.
\W Matchesthecomplementof\w.
\\ Matchesaliteralbackslash.Thismoduleexportsthefollowingfunctions:
match Matcharegularexpressionpatterntothebeginningofastring.
fullmatchMatcharegularexpressionpatterntoallofastring.
search Searchastringforthepresenceofapattern.
sub Substituteoccurrencesofapatternfoundinastring.
subn Sameassub,butalsoreturnthenumberofsubstitutionsmade.
split Splitastringbytheoccurrencesofapattern.
findall Findalloccurrencesofapatterninastring.
finditer Returnaniteratoryieldingamatchobjectforeachmatch.
compile CompileapatternintoaRegexObject.
purge Cleartheregularexpressioncache.
escape Backslashallnon-alphanumericsinastring.Someofthefunctionsinthismoduletakesflagsasoptionalparameters:
A ASCII Forstringpatterns,make\w,\W,\b,\B,\d,\D
matchthecorrespondingASCIIcharactercategories
(ratherthanthewholeUnicodecategories,whichisthe
default).
Forbytespatterns,thisflagistheonlyavailable
behaviourandneedn'tbespecified.
I IGNORECASE Performcase-insensitivematching.
L LOCALE Make\w,\W,\b,\B,dependentonthecurrentlocale.
M MULTILINE "^"matchesthebeginningoflines(afteranewline)
aswellasthestring.
"$"matchestheendoflines(beforeanewline)aswell
astheendofthestring.
S DOTALL "."matchesanycharacteratall,includingthenewline.
X VERBOSE IgnorewhitespaceandcommentsfornicerlookingRE's.
U UNICODE Forcompatibilityonly.Ignoredforstringpatterns(it
isthedefault),andforbiddenforbytespatterns.
下面看下正则表达式匹配的流程:
正则表达式的大致匹配过程是:依次拿出表达式和文本中的字符比较,如果每一个字符都能匹配,则匹配成功;一旦有匹配不成功的字符则匹配失败。如果表达式中有量词或边界,这个过程会稍微有一些不同,但也是很好理解的,自己多使用几次就能明白。
总结
到此这篇关于python正则表达式语法记录的文章就介绍到这了,更多相关python正则表达式语法记录内容请搜索毛票票以前的文章或继续浏览下面的相关文章希望大家以后多多支持毛票票!
声明:本文内容来源于网络,版权归原作者所有,内容由互联网用户自发贡献自行上传,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任。如果您发现有涉嫌版权的内容,欢迎发送邮件至:czq8825#qq.com(发邮件时,请将#更换为@)进行举报,并提供相关证据,一经查实,本站将立刻删除涉嫌侵权内容。