Python Forum
Regex: Remove all match plus one char before all
#11
Warning, aspirin required.

This is quite tricky because you can have anything before |BS|, including another |BS|. Covering your rear with something such as [^\|]\|BS\| isn't general enough, because it prevents backspacing over a literal |. And in a plain regexp there is no simple way to express something like "not this string"...
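To make the problem concrete, here is a small sketch of both dead ends. The sample strings and the variable names are mine, not from the original question, and I'm assuming Python 3.

```python
import re

s = "abc|BS||BS||BS|d"  # three backspaces, so the intended result is "d"

# Naive pattern: any character followed by one |BS|.
# The second match erases the closing '|' of a marker and corrupts the text.
naive = re.sub(r'.\|BS\|', '', s)
print(naive)  # ab|BSd

# Guarded pattern: refuse a '|' as the character to erase.
guarded = re.compile(r'[^|]\|BS\|')
safe_pass = guarded.sub('', s)
print(safe_pass)  # ab|BS||BS|d (intact, ready for another pass)

# ...but a literal '|' in the text can no longer be backspaced over.
stuck = guarded.sub('', "a||BS|b")
print(stuck)  # a||BS|b (unchanged, although "ab" was intended)
```

The naive pattern breaks as soon as three markers are adjacent, and the guarded one trades that bug for another.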

So you have to attack from the other end: use a regexp that matches any character followed by a whole sequence of consecutive |BS|. Because matching is greedy, the match always includes the whole sequence of consecutive |BS|, so your initial character cannot itself be part of a |BS|.

Then look at the fine print in the specs of re.sub(): it looks for non-overlapping occurrences of the pattern, so the search for the next match starts after the end of the current match... which is after the end of the sequence of |BS|. So in a sequence of |BS| you will only process one per call to sub().
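A quick way to check the "one per call" behaviour is re.subn(), which returns the number of substitutions along with the result. This is only a sketch: the pattern is the one this post builds up to, the sample strings are made up, and Python 3 is assumed.

```python
import re

# One character, one |BS|, then any further |BS| kept in group 1.
pattern = re.compile(r'.\|BS\|((\|BS\|)*)')

# A single run of three markers: one substitution, one marker consumed.
one_run, n1 = pattern.subn(r'\1', "abc|BS||BS||BS|d")
print(one_run, n1)  # ab|BS||BS|d 1

# Two separate runs: each run loses exactly one marker in the same pass.
two_runs, n2 = pattern.subn(r'\1', "ab|BS||BS|cd|BS|e")
print(two_runs, n2)  # a|BS|ce 2
```

The count equals the number of runs, not the number of markers, which is why the call has to be repeated until nothing changes.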

So in practice, we look for a character followed by a |BS| followed by zero or more other |BS| (captured in a group) and replace that by just that captured group:

import re

# One character, one |BS|, then any further |BS| captured in group 1
pattern = re.compile(r'.\|BS\|((\|BS\|)*)')

def noBS(s):
    print('------------')
    previous = ''
    # Repeat until a pass changes nothing: each pass removes one |BS| per run
    while s != previous:
        previous = s
        s = pattern.sub(r'\1', s)
        print(s)  # this shows that the two sequences of |BS| are processed in parallel
    return s

print(noBS("it |BS||BS||BS|this is one|BS||BS||BS|an example"))
print(noBS("it |BS||BS||BS| |BS|this is one|BS||BS||BS|an example"))
print(noBS("it |BS||BS||BS| |BS|this is o n  e|BS||BS||BS||BS||BS||BS|an example"))
# The first 'BS|' gets backspaced over due to missing leading '|'...
print(noBS("it BS||BS||BS||BS||BS||BS||BS|this is o n  e|BS||BS||BS||BS||BS||BS|an example"))
Output for the last one (without the separator line and the repeated final line):
it BS|BS||BS||BS||BS||BS|this is o n  |BS||BS||BS||BS||BS|an example
it B|BS||BS||BS||BS|this is o n |BS||BS||BS||BS|an example
it |BS||BS||BS|this is o n|BS||BS||BS|an example
it|BS||BS|this is o |BS||BS|an example
i|BS|this is o|BS|an example
this is an example
Unfortunately, I don't think you can avoid an explicit iteration.
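For comparison, the iteration can also be moved out of the regexp entirely: scan the string once, left to right, and keep the output on a stack. This is a different technique from the re.sub() loop, not a variant of it, and the function name apply_backspaces is made up for this sketch (Python 3 assumed). It also behaves differently on edge cases: a |BS| with nothing left to erase is simply dropped.

```python
BS = "|BS|"  # backspace marker used in this thread

def apply_backspaces(text, marker=BS):
    """Apply every marker in one left-to-right scan using a stack."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(marker, i):
            if out:          # nothing to erase at the start: drop the marker
                out.pop()
            i += len(marker)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(apply_backspaces("it |BS||BS||BS|this is one|BS||BS||BS|an example"))
print(apply_backspaces("a||BS|b"))  # a literal '|' can be erased too
```

It is still an explicit loop, just over characters instead of over sub() passes.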
Messages In This Thread
RE: Regex: Remove all match plus one char before all - by Ofnuts - Feb-21-2017, 09:22 PM