Python: I want to check for the count of the words in the string


I managed the basic case, but I'm struggling when I have to consider 'color' equal to 'colour' for such words and return the count accordingly. For this, I wrote a dictionary of common words whose spelling changes between American and GB English, but I'm pretty sure this isn't the right approach.

    ukus = {'colour':'color', 'cheque':'check', 'programme':'program', 'grey':'gray',
            'jewellery':'jewelry', 'aluminium':'aluminum', 'theater':'theatre',
            'license':'licence', 'armour':'armor', 'artefact':'artifact', 'centre':'center',
            'cypher':'cipher', 'disc':'disk', 'fibre':'fiber', 'fulfill':'fulfil',
            'metre':'meter', 'savoury':'savory', 'tonne':'ton', 'tyre':'tire',
            'color':'colour', 'check':'cheque', 'program':'programme', 'gray':'grey',
            'jewelry':'jewellery', 'aluminum':'aluminium', 'theatre':'theater',
            'licence':'license', 'armor':'armour', 'artifact':'artefact', 'center':'centre',
            'cipher':'cypher', 'disk':'disc', 'fiber':'fibre', 'fulfil':'fulfill',
            'meter':'metre', 'savory':'savoury', 'ton':'tonne', 'tire':'tyre'}

With this dictionary I wrote code to check the values, but I can see that it is degrading performance. PyEnchant isn't available for 64-bit Python. Please help me out. Thank you in advance.

Okay, I think I know enough from the comments to provide a solution. The function below allows you to choose either UK or US replacement (it uses US by default, but you can of course flip that) and allows you to optionally perform minor hygiene on the string.

    import re

    # UK -> US spellings. Caveat: 'theater', 'license' and 'fulfill' are the usual
    # US spellings, so those three pairs run opposite to the rest.
    ukus = {'colour':'color', 'cheque':'check', 'programme':'program', 'grey':'gray',
            'jewellery':'jewelry', 'aluminium':'aluminum', 'theater':'theatre',
            'license':'licence', 'armour':'armor', 'artefact':'artifact', 'centre':'center',
            'cypher':'cipher', 'disc':'disk', 'fibre':'fiber', 'fulfill':'fulfil',
            'metre':'meter', 'savoury':'savory', 'tonne':'ton', 'tyre':'tire'}

    # US -> UK spellings (the reverse mapping)
    usuk = {'color':'colour', 'check':'cheque', 'program':'programme', 'gray':'grey',
            'jewelry':'jewellery', 'aluminum':'aluminium', 'theatre':'theater',
            'licence':'license', 'armor':'armour', 'artifact':'artefact', 'center':'centre',
            'cipher':'cypher', 'disk':'disc', 'fiber':'fibre', 'fulfil':'fulfill',
            'meter':'metre', 'savory':'savoury', 'ton':'tonne', 'tire':'tyre'}

    def str_wd_count(my_string, uk=False, hygiene=True):
        us = not uk
        # if the uk flag is True, default to the UK version, else default to the US version
        print("using " + uk*"uk" + us*"us" + " dictionary for default words")

        # optional hygiene: replace non-alphanumeric characters for pure word counting
        if hygiene:
            my_string = re.sub(r'[^ \w]', ' ', my_string)

        # normalize each word to the chosen spelling; the keys are lowercase,
        # so lowercase the text, and split() absorbs runs of spaces
        words = my_string.lower().split()
        ttl_wds = [ukus.get(w, w) if us else usuk.get(w, w) for w in words]

        # count the normalized words
        wd_counts = {}
        for wd in ttl_wds:
            wd_counts[wd] = wd_counts.get(wd, 0) + 1

        return wd_counts

As a sample of use, consider the string:

    str1 = 'The colour of the dog is not the same as the color of the tire, or is it tyre, I can never tell which one will fulfill'

    # resulting sorted dict.items() with default settings
    '[(the,5),(tire,2),(color,2),(of,2),(is,2),(fulfil,1),(never,1),(dog,1),(same,1),(it,1),(will,1),(i,1),(as,1),(can,1),(which,1),(tell,1),(not,1),(one,1),(or,1)]'

    # resulting sorted dict.items() with hygiene=False
    '[(the,5),(color,2),(of,2),(is,2),(fulfil,1),(never,1),(dog,1),(same,1),(tire,,1),(will,1),(i,1),(as,1),(can,1),(which,1),(tell,1),(not,1),(one,1),(or,1),(it,1),(tyre,,1)]'

    # resulting sorted dict.items() with UK swap, hygiene=True
    '[(the,5),(of,2),(is,2),(tyre,2),(colour,2),(which,1),(i,1),(never,1),(dog,1),(same,1),(or,1),(will,1),(as,1),(can,1),(tell,1),(not,1),(fulfill,1),(one,1),(it,1)]'

    # resulting sorted dict.items() with UK swap, hygiene=False
    '[(the,5),(of,2),(is,2),(colour,2),(one,1),(i,1),(never,1),(dog,1),(same,1),(tire,,1),(will,1),(as,1),(can,1),(which,1),(tell,1),(not,1),(fulfill,1),(tyre,,1),(it,1),(or,1)]'
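The listings above are ordered by count. One way to get that ordering is to sort the items on the count, highest first. This is only a sketch: the small `wd_counts` dict is a hypothetical stand-in for the function's return value. Words with equal counts keep the dict's insertion order, so ties may come out in a different order than in the listings.

```python
# Hypothetical result, standing in for what str_wd_count returns
wd_counts = {'the': 5, 'color': 2, 'dog': 1, 'of': 2, 'tire': 2}

# Sort (word, count) pairs by count, highest first; sorted() is stable,
# so words with equal counts keep their insertion order
sorted_counts = sorted(wd_counts.items(), key=lambda item: item[1], reverse=True)
print(sorted_counts)
```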

You can use the resulting dictionary of word counts in any way you'd like, and if you need the original string with the modifications added, it is easy enough to modify the function to return that.
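If the normalized text is needed as well, a variant along these lines could return both. This is a minimal sketch rather than the function above: `count_and_normalize` and the small `UK_TO_US` table are hypothetical names chosen for illustration, and `collections.Counter` stands in for the manual counting loop.

```python
import re
from collections import Counter

# Small illustrative table (an assumption; extend with the full dictionary)
UK_TO_US = {'colour': 'color', 'tyre': 'tire', 'grey': 'gray'}

def count_and_normalize(text, table=UK_TO_US):
    """Return (word counts, normalized string); a hypothetical variant."""
    # Replace anything that is not a word character or whitespace with a space
    cleaned = re.sub(r'[^\w\s]', ' ', text.lower())
    # Map each word to its preferred spelling, leaving unknown words alone
    words = [table.get(w, w) for w in cleaned.split()]
    return Counter(words), ' '.join(words)

counts, normalized = count_and_normalize('The colour of the tyre, or tire')
print(counts['tire'], normalized)
```

Since `Counter` is a `dict` subclass, the counts can be used the same way as the plain dictionary returned above.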

