Python Forum
Use of raw strings in regular expressions
#1
I am studying regular expressions in an introductory course and I don't completely understand the use of raw strings. I will illustrate my question with the most basic example:
Quote:
import re

text = "This is a text with a dot."
pattern = r"\."
match = re.findall(pattern, text)

print(match)
The r prefix before the regular expression "\." should prevent the backslash from escaping the ".", so I expected the pattern to search for any character except a newline (the meaning of "." in regular expressions). It doesn't: the expression is still interpreted as an escape and searches for the literal character ".".
I get the same result as without the r prefix, so I'm confused about what r is for.

Note: with other escapes such as "\n", it does work as I expected: the r prefix prevents the escape and the search is for the literal string "\n".
Reply
#2
It is important to know that re uses its own parser to look at string expressions. Python first turns the string literal (raw or not) into a string object, passes that object to re, and re takes over.

Try this:

Quote:
>>> pattern = '\n'
>>> pattern
'\n'
>>> pattern = r'\n'
>>> pattern
'\\n'

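Whichever pattern you feed to re, re finds a newline: it receives either an actual newline character or the two characters \n, which its own parser reads as a newline.

. is a special character within re, so \. is a re escape that removes that special meaning. Unlike \n, \. is not an escape sequence in a Python string literal.

In re, . represents any character except a newline. If you feed '.' to re, you will get back every character except newlines, but if you escape . as \., re will find just the dot.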
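To make the points above concrete, here is a small sketch. The sample text and variable names are made up for illustration; only re.findall comes from the standard library.

```python
import re

# Hypothetical sample text with one newline and two dots.
text = "Line one.\nLine two."

# "\n" reaches re as a real newline character; r"\n" reaches re as
# backslash + n, which re's own parser also reads as a newline.
newline_from_plain = re.findall("\n", text)
newline_from_raw = re.findall(r"\n", text)

# An unescaped dot matches every character except the newline,
# while an escaped dot matches only literal dots.
any_char = re.findall(".", text)
literal_dot = re.findall(r"\.", text)

print(newline_from_plain == newline_from_raw)
print(len(any_char), len(text))
print(literal_dot)
```

Both newline patterns should give the same matches, the bare dot should return one match fewer than the length of the text (the newline is skipped), and the escaped dot should return only the two dots.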

To quote the re docs:

Quote:Raw String Notation

Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:

re.match(r"\W(.)\1\W", " ff ")

re.match("\\W(.)\\1\\W", " ff ")
When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\", making the following lines of code functionally identical:

re.match(r"\\", r"\\")

re.match("\\\\", r"\\")
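
Outside the docs quote, a quick way to check the last claim yourself. This is only a sketch; the path string is a made-up example and the variable names are mine.

```python
import re

raw_pattern = r"\\"      # two characters: backslash, backslash
plain_pattern = "\\\\"   # the same two characters, written without the r prefix

# Hypothetical sample text containing two backslashes.
path = r"C:\temp\notes.txt"

# Both literals produce the identical string object contents.
same_pattern = raw_pattern == plain_pattern

# re reads the two-character pattern as "one literal backslash".
backslashes = re.findall(raw_pattern, path)

print(same_pattern, len(raw_pattern))
print(len(backslashes))
```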
Reply
#3
\. is not an escape sequence. \n is an escape sequence. A backslash by itself does not mean there is an escape sequence. The next character has to be one that Python recognizes, such as n, b, t, x or backslash, for it to be an escape sequence. I may have missed some escape characters in that list, which is why I use raw strings any time I write regex patterns.
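For reference, a rough check of which backslash combinations Python collapses into a single character. The list below is my own selection of escapes I know Python string literals recognize, not necessarily the complete set, and the variable names are just for illustration.

```python
# Each literal below contains one recognized escape sequence, so the
# resulting string holds exactly one character.
recognized = [
    "\n", "\t", "\b", "\r", "\a", "\f", "\v",   # control characters
    "\\", "\'", "\"",                           # backslash and quotes
    "\x41", "\101", "\u0041",                   # hex, octal, Unicode code point
    "\N{LATIN CAPITAL LETTER A}",               # Unicode character by name
]
lengths = [len(s) for s in recognized]

# With the r prefix nothing is collapsed: every typed character is kept.
raw_versions = [r"\n", r"\t", r"\x41"]
raw_lengths = [len(s) for s in raw_versions]

print(lengths)
print(raw_lengths)
```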
Reply
#4
(May-09-2024, 12:16 PM)deanhystad Wrote: \. is not an escape sequence. \n is an escape sequence. A backslash by itself does not mean there is an escape sequence. The next character has to be one that Python recognizes, such as n, b, t, x or backslash, for it to be an escape sequence. I may have missed some escape characters in that list, which is why I use raw strings any time I write regex patterns.

Thanks for the quick answer guys.

\. is not an escape sequence? If I want to search for a literal "." I have to escape it with "\.", am I wrong? The same goes for "\n": if I want to search for a literal "\n" I have to escape it with "\\n" or r"\n".
Sorry if this is a very basic question, I just find it very confusing.
Reply
#5
\ is used to start an ASCII escape sequence in a string literal, and \ also has special meaning in a regular expression. Raw strings only affect the ASCII escape sequences. You don't need a raw string for "\." because this is not an ASCII escape sequence. "\n" is an ASCII escape sequence. If you want re to receive the two characters \ and n, you need to use a raw string or a double backslash.

ASCII escape sequences only have meaning in string literals. When your program is parsed, each escape sequence is replaced with the character it stands for (often a non-visible one). In your example:
pattern = r"\."
Using a raw string has no effect. There are no ASCII escape sequences in your string literal, so pattern contains a backslash followed by a dot with or without the raw prefix. The raw prefix does make a difference if the string literal contains an ASCII escape sequence.
pattern = r"\n"
With the raw prefix, pattern == "\\n" (a backslash followed by n); without it, pattern is a single newline character.
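You can check this without re at all. A minimal sketch follows, with variable names of my own choosing. I wrote the non-raw dot pattern with a doubled backslash because recent Python versions emit a warning for unrecognized escapes such as "\." in a non-raw literal.

```python
# "\." is not a recognized escape, so the raw literal and the explicitly
# doubled backslash both give the same two characters: backslash, dot.
dot_raw = r"\."
dot_escaped = "\\."

# "\n" is a recognized escape, so here the raw prefix changes the content.
newline_raw = r"\n"     # two characters: backslash, n
newline_plain = "\n"    # one character: newline

print(dot_raw == dot_escaped)
print(len(newline_raw), len(newline_plain))
print(repr(newline_raw), repr(newline_plain))
```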
Reply
#6
I see! So the key is to differentiate between a regular ASCII escape and a special RegEx escape. Raw strings only affect regular ASCII escapes. Got it!

Thanks a lot.
Reply

