Python Forum
Regular Expression (re module)
Regular expressions are an excellent tool that all programmers should learn. They provide a compact syntax for expressing complicated text searches. This is a short tutorial on how to use regular expressions in Python.

For a starting example, let's look at searching descriptions of injuries for fall-related injuries. With normal Python, you might do it like this:

if 'fall' in narrative:
    do_something()
To do it with regular expressions, you would use the re module:

import re

fall_re = re.compile('fall')
if fall_re.search(narrative):
    do_something()
So the first step is to compile the regular expression. Then you can use the compiled regular expression to search the narrative for a match. The search method returns a match object, which we will get into later. For now, you just need to know that if the search fails to find anything, it returns None, which evaluates as false.
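To make the return value of search concrete, here is a minimal sketch. The sample narratives are made up for illustration and are not from any real data set.

```python
import re

# Hypothetical sample narratives, invented for this illustration.
narratives = [
    'patient had a fall from a ladder',
    'patient cut finger with a knife',
]

fall_re = re.compile('fall')

# search() returns a match object on success and None on failure.
results = [fall_re.search(text) for text in narratives]

# A match object is truthy and None is falsy, so the result works in an 'if'.
found = [result is not None for result in results]
print(found)
```

The first narrative produces a match object and the second produces None, which is why the result can be used directly as the condition of an if statement.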

In this simple case, that's extra typing with no added benefit. But note that the narrative might be written in the past tense, so you have to search for 'fell' as well. With normal Python, that would be:

if 'fall' in narrative or 'fell' in narrative:
    do_something()
With regular expressions, it would be:

import re

fall_re = re.compile('f(a|e)ll')
if fall_re.search(narrative):
    do_something()
Here we used a pipe character (|) in the regular expression to indicate 'or'. So the regular expression searches for an 'f', followed by an 'a' or an 'e', followed by 'll'. Note the parentheses. Without them, the or condition goes to the ends of the regular expression. So 'fa|ell' would search for 'fa' or 'ell'. That would still match what we are looking for, but it would also match a lot of stuff we aren't looking for. The parentheses also make a group, which is something we will make more use of later.
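The difference the parentheses make can be checked with a quick sketch. The word list here is made up for illustration.

```python
import re

grouped_re = re.compile('f(a|e)ll')   # 'f', then 'a' or 'e', then 'll'
ungrouped_re = re.compile('fa|ell')   # 'fa' or 'ell'

# Hypothetical test words: two we want to match and two we do not.
words = ['fall', 'fell', 'fabric', 'spelling']

grouped_hits = [word for word in words if grouped_re.search(word)]
ungrouped_hits = [word for word in words if ungrouped_re.search(word)]

print(grouped_hits)
print(ungrouped_hits)
```

With the parentheses, only 'fall' and 'fell' are found. Without them, 'fabric' matches because it contains 'fa', and 'spelling' matches because it contains 'ell'.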

There's another way we could catch fall or fell: 'f.ll'. A period in a regular expression matches any one character (except a newline). So 'f.ll' would match 'fall' or 'fell', but it would also match 'fill', 'full', and part of 'of llamas'. So it is not the best choice here.
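Here is a small sketch of how loosely the period matches. The sample strings are invented for illustration, and the last one contains a newline to show the one character the period skips by default.

```python
import re

dot_re = re.compile('f.ll')

# Hypothetical samples; the last has a newline between 'f' and 'll'.
samples = ['fall', 'fell', 'fill', 'full', 'a herd of llamas', 'f\nll']

dot_hits = [sample for sample in samples if dot_re.search(sample)]
print(dot_hits)

# In 'a herd of llamas', the period matches the space between 'of' and 'llamas'.
llama_match = dot_re.search('a herd of llamas')
print(repr(llama_match.group()))
```

Every sample except the one with the newline is matched, including the llamas.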

Of course, the narrative might just mention that the person tripped. With the 'in' operator, you have to add another 'or' clause:

if 'fall' in narrative or 'fell' in narrative or 'trip' in narrative:
    do_something()
With regular expressions, we can just add another |, again being careful to use parentheses to indicate exactly what we want to be 'or'ed:

import re

fall_re = re.compile('(f(a|e)ll)|(trip)')
if fall_re.search(narrative):
    do_something()
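As a quick check of the combined pattern, here is a sketch run against a few narratives that are made up for illustration. It also tries a version without the outer parentheses, which should behave the same way here because | already extends to the ends of the expression.

```python
import re

fall_re = re.compile('(f(a|e)ll)|(trip)')
# Same alternatives without the outer parentheses, for comparison.
plain_re = re.compile('f(a|e)ll|trip')

# Hypothetical sample narratives, invented for this illustration.
narratives = [
    'fell off the porch',
    'tripped over a rug',
    'fall from a ladder',
    'burned hand on stove',
]

fall_hits = [bool(fall_re.search(text)) for text in narratives]
plain_hits = [bool(plain_re.search(text)) for text in narratives]

print(fall_hits)
print(plain_hits)
```

Both patterns flag the first three narratives and skip the last one. The outer parentheses do not change what matches in this case, but they make the intended grouping explicit.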
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Messages In This Thread
Regular Expression (re module) - by ichabod801 - Jan-06-2017, 04:53 PM
RE: Regular Expression (re module) - by ichabod801 - Jan-06-2017, 04:54 PM
RE: Regular Expression (re module) - by ichabod801 - Jan-06-2017, 04:59 PM
