Regex: Remove all match plus one char before all

***Ofnuts*** · (This post was last modified: Feb-21-2017, 09:22 PM by Ofnuts.)

Warning, aspirin required.

This is quite tricky because you can have anything before |BS|, including another |BS|. Covering your rear with something such as [^|]\|BS\| isn't general enough, because it prevents backspacing over a literal |. And a regexp cannot easily express something like "not this string"...
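To make that limitation concrete, here is a small sketch comparing a guarded pattern, written here as [^|]\|BS\|, with a plain any-character pattern. The sample string is made up for illustration; the character to erase is itself a |.

```python
import re

# Hypothetical sample: 'a', a literal '|', one |BS| marker, then 'b'.
sample = "a||BS|b"

# Guarded pattern: refuses to treat '|' as the character being erased.
guarded = re.sub(r'[^|]\|BS\|', '', sample)

# Unguarded pattern: any character may be erased, including '|'.
unguarded = re.sub(r'.\|BS\|', '', sample)

print(guarded)    # a||BS|b  (nothing matched, the '|' was not erased)
print(unguarded)  # ab
```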

So, you have to attack at the other end: use a regexp that will match any character followed by a whole sequence of consecutive |BS|. Due to the greedy way things are matched, the match will always include the whole sequence of consecutive |BS|, so your initial character cannot itself be part of a |BS|.
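A quick way to see the greedy behaviour, as a sketch only (the pattern below is a simplified stand-in with a non-capturing group, and the input string is invented):

```python
import re

# One character followed by one or more consecutive |BS| markers.
run = re.compile(r'.(?:\|BS\|)+')

m = run.search("one|BS||BS||BS|an")

# The match starts at the 'e' and swallows the whole run of markers.
print(m.start())   # 2
print(m.group(0))  # e|BS||BS||BS|
```

Because the whole run is consumed in one match, no later match can start in the middle of it.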

Then look at the fine print in the specs of re.sub(): it looks for non-overlapping occurrences of the pattern, so the search for the next match starts after the end of the current match... which is after the end of the sequence of |BS|. So in a sequence of |BS| you will only process one per call to sub().
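A small sketch of that behaviour, using the same pattern as the code further down and a made-up input string with two consecutive |BS|:

```python
import re

pattern = re.compile(r'.\|BS\|((\|BS\|)*)')

s = "abc|BS||BS|d"

# First call: erases 'c' and consumes one |BS|, the other one is put back.
once = pattern.sub(r'\1', s)

# Second call: erases 'b' and consumes the remaining |BS|.
twice = pattern.sub(r'\1', once)

print(once)   # ab|BS|d
print(twice)  # ad
```

So a run of N markers needs N calls to sub() before it is fully processed.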

So in practice, we look for a character followed by a |BS| followed by zero or more other |BS| (captured in a group) and replace that by just that captured group:

import re

# One character, one |BS|, then any further consecutive |BS| (captured in group 1)
pattern = re.compile(r'.\|BS\|((\|BS\|)*)')

def noBS(s):
    print('------------')
    previous = ''
    # Repeat until a pass changes nothing: each pass removes one |BS| per sequence
    while s != previous:
        previous = s
        s = pattern.sub(r'\1', s)
        print(s)  # this shows that the two sequences of |BS| are processed in parallel
    return s

print(noBS("it |BS||BS||BS|this is one|BS||BS||BS|an example"))
print(noBS("it |BS||BS||BS| |BS|this is one|BS||BS||BS|an example"))
print(noBS("it |BS||BS||BS| |BS|this is o n  e|BS||BS||BS||BS||BS||BS|an example"))
# The first 'BS|' gets backspaced over due to missing leading '|'...
print(noBS("it BS||BS||BS||BS||BS||BS||BS|this is o n  e|BS||BS||BS||BS||BS||BS|an example"))

Output for the last one (separator line and repeated final lines omitted):

it BS|BS||BS||BS||BS||BS|this is o n  |BS||BS||BS||BS||BS|an example
it B|BS||BS||BS||BS|this is o n |BS||BS||BS||BS|an example
it |BS||BS||BS|this is o n|BS||BS||BS|an example
it|BS||BS|this is o |BS||BS|an example
i|BS|this is o|BS|an example
this is an example

Unfortunately, I don't think you can avoid an explicit iteration.
