python - Using regex, extract quoted strings that may contain nested quotes
I have the following string:
'well, i've tried "how doth little busy bee," came different!' alice replied in melancholy voice. continued, 'i'll try again.'
Now, I wish to extract the following quotes:
1. well, i've tried "how doth little busy bee," came different!
2. how doth little busy bee,
3. i'll try again.
I tried the following code, but I'm not getting what I want. Is [^\1]* not working as expected, or is the problem elsewhere?
import re

s = "'well, i've tried \"how doth little busy bee,\" came different!' alice replied in melancholy voice. continued, 'i'll try again.'"

# Attempt: capture the opening quote, skip apostrophes in contractions,
# then look ahead for everything up to the matching closing quote.
for i, m in enumerate(re.finditer(r'([\'"])(?!(?:ve|m|re|s|t|d|ll))(?=([^\1]*)\1)', s)):
    print("\ngroup {:d}: ".format(i+1))
    for g in m.groups():
        print(' '+g)
If you really need to return the results from a single regular expression applied only once, it is necessary to use a lookahead, (?=findme), so that the finding position goes back to the start after each match - see this answer for a more detailed explanation.
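To illustrate why the lookahead matters, here is a minimal sketch on a toy string (the string "abcd" and the variable names are made up for this example). A consuming pattern uses up the characters it matches, whereas a capture group inside a zero-width lookahead lets the search resume one character later, so overlapping or nested results can all be reported.

```python
import re

text = "abcd"  # toy input, chosen only for illustration

# Consuming pattern: each match uses up its characters, so overlaps are skipped.
consuming = re.findall(r"\w\w", text)

# Capture group inside a lookahead: the match itself is zero-width, so the
# search resumes one character later and every overlapping pair is reported.
overlapping = re.findall(r"(?=(\w\w))", text)

print(consuming)    # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```

The same mechanism is what allows a quote nested inside another quote to be found after the outer one has already been reported.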
To prevent false matches, some clauses are needed regarding the quotes, which adds complexity: e.g. the apostrophe in i've shouldn't count as an opening or closing quote. There's no single clear-cut way of doing this, but the rules I've gone for are:
- An opening quote must not be preceded by a word character (e.g. a letter). For example, a" does not count as an opening quote, but ," does count.
- A closing quote must not be followed by a word character (e.g. a letter). For example, 'b does not count as a closing quote, but '. does count.
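The two rules above can be sketched in isolation with a negative lookbehind and a negative lookahead. The names OPENING, CLOSING and positions are hypothetical helpers for this illustration, and the sketch applies each rule to both quote characters, which is an assumption about how the rules generalise.

```python
import re

# Hypothetical helper patterns, one per rule (names are illustrative only).
OPENING = re.compile(r"""(?<!\w)['"]""")  # quote NOT preceded by a word character
CLOSING = re.compile(r"""['"](?!\w)""")   # quote NOT followed by a word character

def positions(pattern, text):
    """Return the index of every quote character accepted by the pattern."""
    return [m.start() for m in pattern.finditer(text)]

sample = "'i've tried'"
print(positions(OPENING, sample))  # [0]   the apostrophe at index 2 is rejected
print(positions(CLOSING, sample))  # [11]  the apostrophe at index 2 is rejected
```

On their own these rules still accept a closing quote that follows punctuation as a possible opening quote, so a complete pattern needs a further condition on what comes after the opening quote.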
Applying the above rules leads to the following regular expression:
(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))
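As a rough sketch of how this expression could be used from Python, the snippet below applies it to the string from the question with re.finditer. The names QUOTED and extract_quotes are made up for this example. Note that, as written, only the single-quote branch carries the (?<!\w) lookbehind, and the \w right after each opening quote is what stops a closing quote from being treated as an opening one.

```python
import re

s = "'well, i've tried \"how doth little busy bee,\" came different!' alice replied in melancholy voice. continued, 'i'll try again.'"

# The expression from the answer, unchanged. Group 1 holds text found between
# single quotes, group 2 holds text found between double quotes.
QUOTED = re.compile(r"""(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))""")

def extract_quotes(text):
    """Return every quoted string, including quotes nested inside other quotes."""
    # Each zero-width match fills exactly one of the two groups.
    return [m.group(1) or m.group(2) for m in QUOTED.finditer(text)]

for i, quote in enumerate(extract_quotes(s), 1):
    print("{:d}. {}".format(i, quote))
# 1. well, i've tried "how doth little busy bee," came different!
# 2. how doth little busy bee,
# 3. i'll try again.
```

Because . does not match a newline by default, a quote spanning several lines would need the re.DOTALL flag; that is outside what the question asks for.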
A quick sanity check on a possible candidate regular expression is to reverse the quotes. This has been done in the demo here: https://regex101.com/r/vx4cl9/1
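The same check can be sketched locally. The reversed string below is an assumption about what "reverse the quotes" means here: the outer and inner quotation marks trade places while the apostrophes in i've and i'll are left alone. It is not taken from the linked demo, and the names QUOTED, extract_quotes and reversed_s are hypothetical.

```python
import re

# The expression from the answer, unchanged.
QUOTED = re.compile(r"""(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))""")

def extract_quotes(text):
    """Return every quoted string found by the lookahead expression."""
    return [m.group(1) or m.group(2) for m in QUOTED.finditer(text)]

# Assumed reversal: double quotes outside, single quotes inside,
# apostrophes untouched.
reversed_s = "\"well, i've tried 'how doth little busy bee,' came different!\" alice replied in melancholy voice. continued, \"i'll try again.\""

for quote in extract_quotes(reversed_s):
    print(quote)
# well, i've tried 'how doth little busy bee,' came different!
# how doth little busy bee,
# i'll try again.
```

Getting the same three quotes back, with only the nesting marks swapped, suggests the expression does not depend on which quote character is the outer one for this input.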