Python - Using regex, extract quoted strings that may contain nested quotes
I have the following string:
'well, i've tried "how doth little busy bee," came different!' alice replied in melancholy voice. continued, 'i'll try again.'

Now I wish to extract the following quotes:
1. well, i've tried "how doth little busy bee," came different!
2. how doth little busy bee,
3. i'll try again.

I tried the following code, but I'm not getting what I want. Is the [^\1]* not working as expected, or is the problem elsewhere?
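import re

s = "'well, i've tried \"how doth little busy bee,\" came different!' alice replied in melancholy voice. continued, 'i'll try again.'"

# Intended: capture an opening quote, skip contractions such as 've or 'll,
# then read "anything except that quote" up to the matching quote.
for i, m in enumerate(re.finditer(r'([\'"])(?!(?:ve|m|re|s|t|d|ll))(?=([^\1]*)\1)', s)):
    print("\ngroup {:d}: ".format(i+1))
    for g in m.groups():
        print(' '+g)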
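On the question of [^\1]* specifically: in Python's re module, numeric escapes inside a character class are treated as characters, not as backreferences, so [^\1] means "any character except U+0001" and never stops at the matching quote. The sketch below demonstrates this and then shows a tempered token, (?:(?!\1).)*, which is a common idiom swapped in here rather than something from the original post. On its own it only fixes the backreference problem; it does not handle nesting or apostrophes. The sample string is made up for illustration.

```python
import re

# Inside [...], a numeric escape such as \1 is read as a character
# (an octal escape, here U+0001), never as a backreference to group 1.
# So [^\1] means "any character except U+0001", including both quotes.
assert re.fullmatch(r'[^\1]', '\x01') is None      # U+0001 is the only excluded char
assert re.fullmatch(r'[^\1]', "'") is not None     # a single quote still matches
assert re.fullmatch(r'[^\1]', '"') is not None     # so does a double quote

# Tempered token: consume one character at a time, but only while the
# quote captured in group 1 does not start at the current position.
pattern = re.compile(r'([\'"])((?:(?!\1).)*)\1')
found = [m.group(2) for m in pattern.finditer('say "hi" and \'bye\'')]
print(found)  # ['hi', 'bye']
```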
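If you really need to return all the results from a single regular expression applied only once, you will need to wrap it in a lookahead ((?=findme)) so that each match is zero-width and the search resumes just after the start of the previous match rather than after its end (see this answer for a more detailed explanation).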
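A minimal illustration of why the lookahead matters, using a toy string rather than quotes: a consuming pattern skips overlapping candidates, while the same pattern wrapped in (?=...) reports each of them. The nested "how doth little busy bee," sits inside the outer quote, so it is exactly this kind of overlapping match.

```python
import re

text = "abcd"

# A normal match consumes what it matches, so the scan resumes after it
# and overlapping candidates such as "bc" are never tried.
consuming = re.findall(r'\w\w', text)

# Wrapped in a lookahead, each match is zero-width: the scan only moves
# forward one character, and the capture group still records the text.
overlapping = re.findall(r'(?=(\w\w))', text)

print(consuming)    # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```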
To prevent false matches, some clauses are also needed regarding the quotes, and these add complexity: for example, the apostrophe in i've shouldn't count as an opening or closing quote. There's no single clear-cut way of doing this, but the rules I've gone for are:
- An opening quote must not be preceded by a word character (e.g. a letter). For example, a" would not count as an opening quote, but ," would count.
- A closing quote must not be followed by a word character (e.g. a letter). For example, 'b would not count as a closing quote, but '. would count.
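The two rules translate directly into a negative lookbehind and a negative lookahead. This sketch checks them on single quotes only, using a hypothetical sample string, 'i've' x, chosen because its apostrophe should qualify as neither an opening nor a closing quote.

```python
import re

# Rule 1: a quote may open only if it is NOT preceded by a word character.
opening = re.compile(r"(?<!\w)'")
# Rule 2: a quote may close only if it is NOT followed by a word character.
closing = re.compile(r"'(?!\w)")

sample = "'i've' x"  # hypothetical test string

opening_positions = [m.start() for m in opening.finditer(sample)]
closing_positions = [m.start() for m in closing.finditer(sample)]

print(opening_positions)  # [0]  (the apostrophe at index 2 is excluded)
print(closing_positions)  # [5]  (the apostrophe at index 2 is excluded)
```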
Applying the above rules leads to the following regular expression:
(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w))) 
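A sketch of applying this pattern in Python to the string from the question. Because every match is zero-width, re.finditer is used and the text is read from whichever group took part in the match (group 1 for single quotes, group 2 for double quotes). The helper name extract_quotes is hypothetical.

```python
import re

s = "'well, i've tried \"how doth little busy bee,\" came different!' alice replied in melancholy voice. continued, 'i'll try again.'"

# The whole pattern sits inside a lookahead, so matches are zero-width and
# the nested double-quoted text is still found inside the outer quote.
QUOTE_RE = re.compile(r"""(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))""")

def extract_quotes(text):
    # Exactly one of the two groups participates in each match; the other is None.
    return [m.group(1) or m.group(2) for m in QUOTE_RE.finditer(text)]

for n, quote in enumerate(extract_quotes(s), start=1):
    print(f"{n}. {quote}")
# 1. well, i've tried "how doth little busy bee," came different!
# 2. how doth little busy bee,
# 3. i'll try again.
```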
A quick sanity check on any candidate regular expression is to reverse the quotes. This has been done in the demo here: https://regex101.com/r/vx4cl9/1
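The reversed-quote check can also be scripted. This sketch swaps only the quoting characters in the question's string (the apostrophes in i've and i'll stay as they are) and expects the same three quotes back; extract_quotes is a hypothetical helper name.

```python
import re

# The answer's pattern, unchanged.
QUOTE_RE = re.compile(r"""(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))""")

def extract_quotes(text):
    # Exactly one of the two groups participates in each match.
    return [m.group(1) or m.group(2) for m in QUOTE_RE.finditer(text)]

# Question's string with the quoting characters reversed; the apostrophes
# in i've and i'll are left alone because they are not quotes.
reversed_quotes = ("\"well, i've tried 'how doth little busy bee,' came different!\" "
                   "alice replied in melancholy voice. continued, \"i'll try again.\"")

for n, quote in enumerate(extract_quotes(reversed_quotes), start=1):
    print(f"{n}. {quote}")
```

Only the quoting characters are swapped here: the double-quote branch of the pattern has no (?<!\w) lookbehind, so turning the apostrophes into double quotes as well (for example with str.translate) would produce a false match starting inside i"ve.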