Python Forum
Regex to find triple characters
#1
I am looking for a regex pattern to find the instance of 3 matching characters in series in a string. The characters can be letters or numbers, but can only be 3 characters in length and all be the same character. For example, the string 'ab999thc7' would result in '999' being found, but 'abddddthc7' would not be valid due to the repeated character, 'd', being a length of 4. Any assistance would be greatly appreciated.
Reply
#2
import re
matches = [match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']
Reply
#3
This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>> 
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD']
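To make the pieces of that pattern easier to follow, here is a minimal sketch that spells it out with re.VERBOSE. The names TRIPLE and find_triples are made up for illustration and are not part of the reply. Note that \w also matches the underscore, so a stricter character class may be wanted if only letters and digits should count.

```python
import re

# Same pattern as in the reply, written with re.VERBOSE so each part can be commented.
TRIPLE = re.compile(
    r"""
    (\w)        # group 1: first character of a candidate run
    (?<!\1\1)   # the two characters just consumed must not both be that
                # character, i.e. the character before the run is different
    \1\1        # the same character two more times
    (?!\1)      # the character after the run must differ as well
    """,
    re.VERBOSE,
)


def find_triples(text):
    # Hypothetical helper: every run of exactly three identical word characters.
    return [m.group() for m in TRIPLE.finditer(text)]


print(find_triples("ab999thc7"))        # ['999']
print(find_triples("abddddthc7"))       # []
print(find_triples("AAAbbcDDDEDGGGG"))  # ['AAA', 'DDD']
```

The lookbehind is placed after group 1 because a back-reference can only refer to a group that has already matched.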
« We can solve any problem by introducing an extra level of indirection »
Reply
#4
This seems to work, although it contains three opening brackets, (, but only two closing brackets, ), which is weird!

import re
string = 'AAABBCCCDDEEEFFGGGHHIIIJJKKK'
# findall returns (whole run, repeated character) tuples; match[0] is the whole run
result = [match[0] for match in re.findall(r'((\w)\2{2,})', string)]
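The extra pair of brackets is there because re.findall returns one tuple per match when a pattern has more than one capturing group. A small sketch of that behaviour, with an alternative using re.finditer that needs only one group (the variable names pairs and runs are illustrative):

```python
import re

string = 'AAABBCCCDDEEEFFGGGHHIIIJJKKK'

# Two capturing groups: findall yields (whole run, repeated character) tuples.
pairs = re.findall(r'((\w)\2{2,})', string)
print(pairs[:2])  # [('AAA', 'A'), ('CCC', 'C')]

# finditer yields match objects, so group() already gives the whole match
# and the outer group is not needed.
runs = [m.group() for m in re.finditer(r'(\w)\1{2,}', string)]
print(runs)  # ['AAA', 'CCC', 'EEE', 'GGG', 'III', 'KKK']
```

Both versions use {2,}, so they accept runs of three or more characters, not only runs of exactly three.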
Reply
#5
(May-14-2024, 05:59 AM)deanhystad Wrote:
import re
matches = [match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']

Thank you, but this expression still allows matching consecutive characters other than exactly 3. Doubles, quads or others beside triples should not be found.
Reply
#6
(May-14-2024, 08:14 AM)Gribouillis Wrote: This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>> 
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD']
Thank you, this solution does work, but there is a compilation error stating that groups are not supported in lookbehinds. Is there any way to clear that up? I appreciate it.
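If the regex engine in use rejects back-references inside a lookbehind, one possible workaround is to avoid lookbehind entirely: match each maximal run of a repeated character and keep only the runs of the wanted length. This is a sketch under that assumption, not something posted in the thread, and exact_runs is a hypothetical helper name.

```python
import re


def exact_runs(text, length=3):
    # (\w)\1* is greedy, so each match is a maximal run of one word character.
    # Filtering on the run length afterwards replaces both lookarounds.
    return [m.group() for m in re.finditer(r'(\w)\1*', text)
            if len(m.group()) == length]


print(exact_runs('ab999thc7'))        # ['999']
print(exact_runs('abddddthc7'))       # []
print(exact_runs('AAAbbcDDDEDGGGG'))  # ['AAA', 'DDD']
```

Because the length check happens in Python rather than in the pattern, the same pattern should work in any engine that supports plain back-references.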
Reply
#7
Thank you, but this solution allows consecutive characters of more than 3 to slip through. I need the expression to specifically look for exactly 3 consecutive characters.
Reply
#8
(May-14-2024, 12:29 PM)bfallert Wrote: there is a compilation error stating that groups are not supported in lookbehinds.
Which version of Python are you using? It works fine here in Python 3.10. The latest Python is 3.12 as of May 2024. Group references of fixed length have been allowed in lookbehind assertions since Python 3.5 (2015).
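A quick way to check both points at once is to print the interpreter version and try to compile the pattern. This is only a diagnostic sketch; the names PATTERN and compiled are illustrative.

```python
import re
import sys

PATTERN = r'(\w)(?<!\1\1)\1\1(?!\1)'

# Show which interpreter is actually running the script.
print(sys.version_info[:3])

try:
    compiled = re.compile(PATTERN)
except re.error as exc:
    # An engine without group references in lookbehind ends up here.
    compiled = None
    print('pattern rejected:', exc)
else:
    print([m.group() for m in compiled.finditer('ab999thc7')])
```

If the error only appears in an editor, linter, or online regex tester, the message may be coming from a different regex engine rather than from Python's re module.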
Reply
#9
Quote: I am looking for a regex pattern to find the instance of 3 matching characters in series in a string.

This finds all instances of "3 matching characters in series", as stated above.

Is that NOT what you want?

import re
string = 'AAAABBBBCCCC11212222333444455555666666'
result = [match[0] for match in re.findall(r'((\w)\2{2,2})', string)]
print('result', result)
Output:
result ['AAA', 'BBB', 'CCC', '222', '333', '444', '555', '666', '666']
Reply
#10
(May-14-2024, 12:27 PM)bfallert Wrote:
(May-14-2024, 05:59 AM)deanhystad Wrote:
import re
matches = [match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']

Thank you, but this expression still allows matching consecutive characters other than exactly 3. Doubles, quads or others beside triples should not be found.

Change the repeat count from {1,} to {2}.
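A short sketch of what that change does on the same test string. With {2} the pattern requires three identical characters, but it still matches the first three characters of a longer run. The second pattern is one possible tightening that borrows the lookarounds from the earlier reply; the variable names are illustrative.

```python
import re

text = "AAAbbcDDDEDGGGG"

# Repeat count changed from {1,} to {2}: doubles are gone, but 'GGGG' still yields 'GGG'.
loose = [m.group() for m in re.finditer(r"(.)\1{2}", text)]
print(loose)  # ['AAA', 'DDD', 'GGG']

# Adding a negative lookbehind and lookahead rejects runs longer than three.
strict = [m.group() for m in re.finditer(r"(.)(?<!\1\1)\1{2}(?!\1)", text)]
print(strict)  # ['AAA', 'DDD']
```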
Reply

