python - Using regex to extract quoted strings that may contain nested quotes


I have the following string:

'well, i've tried "how doth little busy bee," came different!' alice replied in melancholy voice. continued, 'i'll try again.' 

Now, I wish to extract the following quotes:

1. well, i've tried "how doth little busy bee," came different!
2. how doth little busy bee,
3. i'll try again.

I tried the following code, but I'm not getting what I want. The [^\1]* part is not working as expected. Or is the problem elsewhere?

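    import re

    s = "'well, i've tried \"how doth little busy bee,\" came different!' alice replied in melancholy voice. continued, 'i'll try again.'"

    # Intent: group 1 captures an opening quote (skipping apostrophes such as 've, 'm, 'll),
    # and [^\1]* is meant to match everything up to the same quote character again.
    for i, m in enumerate(re.finditer(r'([\'"])(?!(?:ve|m|re|s|t|d|ll))(?=([^\1]*)\1)', s)):
        print("\ngroup {:d}: ".format(i+1))
        for g in m.groups():
            print('  '+g)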
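On the `[^\1]*` question: in Python's `re`, a backslash followed by a digit inside a character class is read as an octal escape, not as a backreference. The sketch below checks that and shows the consequence for the pattern above; the variable names `pattern`, `first` and `tempered` are illustrative only. It also tries the common "tempered token" `(?:(?!\1).)*` as a substitute, which does stop at the captured quote but then also stops at the apostrophe in i've.

```python
import re

s = "'well, i've tried \"how doth little busy bee,\" came different!' alice replied in melancholy voice. continued, 'i'll try again.'"

# Inside a character class, Python's re reads \1 as an octal escape for
# chr(1), not as a backreference to group 1.
assert re.fullmatch(r"[\1]", "\x01") is not None

# So [^\1]* means "any run of characters except chr(1)" and runs straight
# past quotes of either kind.
assert re.fullmatch(r"[^\1]*", "a'b\"c") is not None

# Consequence for the question's pattern: the greedy [^\1]* runs to the end of
# the string and backtracks only as far as the LAST matching quote.
pattern = r'([\'"])(?!(?:ve|m|re|s|t|d|ll))(?=([^\1]*)\1)'
first = next(re.finditer(pattern, s))
print(repr(first.group(2)))

# A tempered token (?:(?!\1).)* really does stop at the captured quote
# character, but then it also stops at the apostrophe in "i've".
tempered = re.match(r"(['\"])(?=((?:(?!\1).)*)\1)", s)
print(repr(tempered.group(2)))  # 'well, i'
```

This is why the apostrophe handling has to be addressed separately, rather than only fixing the "anything but the opening quote" part.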

If you really need to return the results from a single regular expression applied only once, it is necessary to use a lookahead ((?=findme)) so that the finding position goes back to the start after each match (see this answer for a more detailed explanation).
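A small illustration of the lookahead point, assuming a simplified quote pattern (the names `SINGLE`, `DOUBLE`, `consuming` and `overlapping` are hypothetical and not from the answer): a normal, consuming match swallows the inner quote, while the same alternatives wrapped in `(?=...)` consume nothing, so the scan restarts one character later and the nested quote is found as well.

```python
import re

# Hypothetical sub-patterns for one '...' or "..." quoted string.
SINGLE = r"'([^']*)'"
DOUBLE = r'"([^"]*)"'

# Consuming version: scanning resumes after the end of each match.
consuming = SINGLE + "|" + DOUBLE
# Same alternatives inside a zero-width lookahead: nothing is consumed,
# so every starting position gets tried and nested quotes are found.
overlapping = "(?=(?:" + SINGLE + "|" + DOUBLE + "))"

text = "'say \"hi\" now'"
print(re.findall(consuming, text))    # [('say "hi" now', '')]
print(re.findall(overlapping, text))  # [('say "hi" now', ''), ('', 'hi')]
```

`findall` returns an empty string for whichever group did not take part in a given match.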

To prevent false matches, some clauses are needed regarding the quotes, which adds complexity; e.g. the apostrophe in i've shouldn't count as an opening or closing quote. There's no single clear-cut way of doing this, but the rules I've gone with are:

  1. An opening quote must not be preceded by a word character (e.g. a letter). For example, a" would not count as an opening quote, but ," would count.
  2. A closing quote must not be followed by a word character (e.g. a letter). For example, 'b would not count as a closing quote, but '. would count.
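The two rules map directly onto a negative lookbehind and a negative lookahead. A minimal sketch for single quotes only (the same lookarounds apply to "); `OPENING`, `CLOSING` and `classify` are hypothetical names used just for this illustration:

```python
import re

# Rule 1: a ' counts as an opening quote only if it is
# not preceded by a word character.
OPENING = re.compile(r"(?<!\w)'")
# Rule 2: a ' counts as a closing quote only if it is
# not followed by a word character.
CLOSING = re.compile(r"'(?!\w)")

def classify(text):
    """Return (opening_positions, closing_positions) of ' characters in text."""
    opening = [m.start() for m in OPENING.finditer(text)]
    closing = [m.start() for m in CLOSING.finditer(text)]
    return opening, closing

# The apostrophe in "i've" (index 1) has letters on both sides,
# so it qualifies as neither an opening nor a closing quote.
print(classify("i've"))    # ([], [])
# Index 0 can open (nothing before it); index 4 can close ('.' after it).
print(classify("'b c'."))  # ([0], [4])
```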

Applying the above rules leads to the following regular expression:

(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w))) 
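A sketch of using this expression from Python on the question's string. Note that each match fills only one of the two groups (group 1 for '...', group 2 for "..."); the other is None, so the question's `print('  '+g)` loop would raise a TypeError with this pattern. The helper name `extract_quotes` is hypothetical.

```python
import re

# The answer's regex: a zero-width lookahead capturing either a
# '...'-quoted string (group 1) or a "..."-quoted string (group 2).
PATTERN = re.compile(r"""(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))""")

def extract_quotes(text):
    """Return every quoted string, nested ones included, in order of their opening quote."""
    results = []
    for m in PATTERN.finditer(text):
        # Exactly one of the two groups participates in each match.
        single, double = m.groups()
        results.append(single if single is not None else double)
    return results

s = "'well, i've tried \"how doth little busy bee,\" came different!' alice replied in melancholy voice. continued, 'i'll try again.'"
for i, quote in enumerate(extract_quotes(s), start=1):
    print(f"{i}. {quote}")
```

On this input it yields the three quotes listed in the question, in that order.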


A quick sanity check of the candidate regular expression, with the quotes reversed, has been done in the demo here: https://regex101.com/r/vx4cl9/1

