Python Forum
Do regular expressions still need raw strings?
#1
Hello everyone (1st post!)

When I started using regular expressions back in Python 2, it seemed that best practice was to use raw strings in order to avoid having escape characters all over the place.

But recently, I've done a few patterns that I thought would need "escaped escapes" and the like, but they seemed to work fine as normal text strings.

Has the regular expression syntax been streamlined, or did I just not make a regex that needed a raw string?
(sorry, no, I don't remember the specific pattern at the moment.)
Reply
#2
(May-02-2024, 03:43 PM)bobmon Wrote: Has the regular expression syntax been streamlined, or did I just not make a regex that needed a raw string?
Not much has changed. You made a regex that did not need a raw string.

Now let's see this from a higher point of view: regexes in Python need to be written in the language of Python regexes, which is described in the documentation of the re module. The problem is that this language uses the backslash character as a special character, and it turns out that the backslash character is ALSO used as a special character in Python's string LITERALS.

The consequence is that when you write a regex in a literal string, a backslash character from the regex language needs to be escaped when there is an ambiguity. For example, the regex \b, which matches a word boundary (the beginning or end of a word), must be written in a literal string as "\\b" or r"\b", because \b in an ordinary literal string means an ASCII backspace character, which is quite different.
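To make the \b ambiguity concrete, here is a minimal sketch. The sample text "a cat sat" and the variable names are made up for illustration; only the standard re module is used.

```python
import re

# In an ordinary literal, "\b" is one character: ASCII backspace (code point 8).
plain = "\b"
# A raw literal keeps the backslash, so the regex engine receives \ followed by b.
raw = r"\b"
# Doubling the backslash in an ordinary literal yields the same two characters.
escaped = "\\b"

print(len(plain), ord(plain))   # one character, code point 8
print(raw == escaped, len(raw))  # same string, two characters

text = "a cat sat"
# Word-boundary pattern: finds the standalone word "cat".
found_raw = re.search(r"\bcat\b", text)
# Backspace pattern: looks for backspace + "cat" + backspace, absent from the text.
found_plain = re.search("\bcat\b", text)
print(found_raw)
print(found_plain)
```

The first search succeeds and the second returns None, even though the two patterns look identical apart from the r prefix.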

So strictly speaking, regexes DON'T need raw strings, but raw strings in this context are a handy tool to avoid incorrect interpretation of literal strings. To be safe, use raw strings to write literal regexes.
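As a rough illustration of why some patterns seem to work fine without the r prefix while others silently break, here is a small sketch. The patterns and the sample string "hello" are hypothetical examples, not the original poster's patterns.

```python
import re

# No backslash in the pattern: raw or ordinary literal, the string is identical.
no_backslash_same = "[0-9]+" == r"[0-9]+"

# "\n" gives different strings (1 character vs 2), yet both patterns match a newline:
# Python resolves the escape in one case, the regex engine resolves it in the other.
same_a = re.search("a\nb", "a\nb")
same_b = re.search(r"a\nb", "a\nb")

# "\1" changes meaning: the ordinary literal produces chr(1),
# while the raw literal reaches the regex engine as a backreference to group 1.
backref = re.search(r"(.)\1", "hello")     # a character followed by itself
not_backref = re.search("(.)\1", "hello")  # a character followed by chr(1)

print(no_backslash_same)
print(same_a is not None, same_b is not None)
print(backref)
print(not_backref)
```

Escapes that Python does not recognize, such as \d, are a separate case: Python leaves the backslash in place, so they also happen to work in ordinary literals, though recent Python versions emit a warning for them.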

Can you match an ASCII backspace in a regex written as a raw literal string? Yes, use \x08:
>>> import re
>>> r = re.compile(r'spa\x08m')
>>> r.search('hello spa\bm')
<re.Match object; span=(6, 11), match='spa\x08m'>
>>> 
« We can solve any problem by introducing an extra level of indirection »
Reply
#3
Hmm. Okay, thanks!
Reply
#4
Yes, you can match an ASCII backspace character (\x08) in a regular expression written as a raw literal string. Here's how you can do it:

import re

# Define the regular expression pattern using a raw string
pattern = r'spa\x08m'

# Compile the regular expression
regex = re.compile(pattern)

# Search for the pattern in the input string
match = regex.search('hello spa\bm')

# Print the match
print(match)

This code will output:

<re.Match object; span=(6, 11), match='spa\x08m'>
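
For comparison, here is a short sketch of three spellings that should all match the backspace in the same input. It relies on the re module treating \b inside a character class as a backspace; the variable names are only for illustration.

```python
import re

target = "hello spa\bm"

# \x08 in a raw pattern: the regex engine resolves the hex escape itself.
by_hex = re.search(r"spa\x08m", target)
# [\b] in a raw pattern: inside a character class, \b means backspace, not word boundary.
by_class = re.search(r"spa[\b]m", target)
# Ordinary literal: Python turns \b into a backspace before re ever sees the pattern.
by_literal = re.search("spa\bm", target)

spans = [m.span() for m in (by_hex, by_class, by_literal)]
print(spans)
```

All three searches find the same span, so the choice between them is mostly about which one reads most clearly.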
Reply

