Python: I want to check for the count of the words in the string
I managed the basic case, but I'm struggling when I have to consider 'color' equal to 'colour' for such words and return the count accordingly. To do this, I wrote a dictionary of common words whose spelling changes between American and GB English, but I'm pretty sure this isn't the right approach.
ukus = {'colour':'color', 'cheque':'check', 'programme':'program', 'grey':'gray',
        'jewellery':'jewelry', 'aluminium':'aluminum', 'theater':'theatre',
        'license':'licence', 'armour':'armor', 'artefact':'artifact',
        'centre':'center', 'cypher':'cipher', 'disc':'disk', 'fibre':'fiber',
        'fulfill':'fulfil', 'metre':'meter', 'savoury':'savory', 'tonne':'ton',
        'tyre':'tire',
        'color':'colour', 'check':'cheque', 'program':'programme', 'gray':'grey',
        'jewelry':'jewellery', 'aluminum':'aluminium', 'theatre':'theater',
        'licence':'license', 'armor':'armour', 'artifact':'artefact',
        'center':'centre', 'cipher':'cypher', 'disk':'disc', 'fiber':'fibre',
        'fulfil':'fulfill', 'meter':'metre', 'savory':'savoury', 'ton':'tonne',
        'tire':'tyre'}
This is the dictionary I wrote to check the values. I can see this degrading performance. PyEnchant isn't available for 64-bit Python. Please help me out. Thank you in advance.
Okay, I think I know enough from your comments to provide a solution. The function below allows you to choose either UK or US replacement (it uses US by default, but you can of course flip that), and allows you to optionally perform minor hygiene on the string.
import re

# UK -> US replacements
# (note: 'theater', 'license' and 'fulfill' are keyed by their US spelling here,
# so those three map in the opposite direction)
ukus = {'colour':'color', 'cheque':'check', 'programme':'program', 'grey':'gray',
        'jewellery':'jewelry', 'aluminium':'aluminum', 'theater':'theatre',
        'license':'licence', 'armour':'armor', 'artefact':'artifact',
        'centre':'center', 'cypher':'cipher', 'disc':'disk', 'fibre':'fiber',
        'fulfill':'fulfil', 'metre':'meter', 'savoury':'savory', 'tonne':'ton',
        'tyre':'tire'}

# US -> UK replacements (the inverse of ukus)
usuk = {'color':'colour', 'check':'cheque', 'program':'programme', 'gray':'grey',
        'jewelry':'jewellery', 'aluminum':'aluminium', 'theatre':'theater',
        'licence':'license', 'armor':'armour', 'artifact':'artefact',
        'center':'centre', 'cipher':'cypher', 'disk':'disc', 'fiber':'fibre',
        'fulfil':'fulfill', 'meter':'metre', 'savory':'savoury', 'ton':'tonne',
        'tire':'tyre'}

def str_wd_count(my_string, uk=False, hygiene=True):
    us = not uk
    # if the uk flag is True, default to the UK version, else default to the US version
    print("using " + uk*"uk" + us*"us" + " dictionary for default words")

    # optional hygiene: replace non-alphanumeric characters for pure word counting
    if hygiene:
        my_string = re.sub(r'[^ \d\w]', ' ', my_string)
        my_string = re.sub(r' {1,}', ' ', my_string)

    # map every word in the text to the chosen spelling
    # (lower-cased so that the words match the lower-case dictionary keys)
    ttl_wds = [ukus.get(w, w) if us else usuk.get(w, w)
               for w in my_string.lower().split()]

    # tally the occurrences of each normalised word
    wd_counts = {}
    for wd in ttl_wds:
        wd_counts[wd] = wd_counts.get(wd, 0) + 1

    return wd_counts
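On Python 3 the same idea can also be sketched with collections.Counter from the standard library. This is only an illustration, not part of the answer's function: the name count_words and the tiny UK_TO_US mapping are made up for the example, and only one direction (UK to US) is stored, so each word needs a single dictionary lookup.

```python
import re
from collections import Counter

# Hypothetical, deliberately small UK -> US mapping (an assumption for this sketch).
UK_TO_US = {
    "colour": "color",
    "tyre": "tire",
    "grey": "gray",
    "centre": "center",
}

def count_words(text, mapping=UK_TO_US):
    """Count words case-insensitively, folding each UK spelling onto its US form."""
    # \w+ extracts runs of letters, digits and underscores, so punctuation is dropped
    words = re.findall(r"\w+", text.lower())
    # one dictionary lookup per word; unknown words are kept unchanged
    return Counter(mapping.get(w, w) for w in words)

counts = count_words("The colour is grey, the color is gray.")
print(counts["color"], counts["gray"])  # 2 2
```

A plain dict lookup takes roughly constant time per word, so the dictionary itself is unlikely to be what slows the counting down.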
As a sample of use of str_wd_count, consider the string:
str1 = 'the colour of the dog is not the same as the color of the tire, or is it tyre, i can never tell which one will fulfill'

# resulting sorted dict.items() with default settings
'[(the,5),(tire,2),(color,2),(of,2),(is,2),(fulfil,1),(never,1),(dog,1),(same,1),(it,1),(will,1),(i,1),(as,1),(can,1),(which,1),(tell,1),(not,1),(one,1),(or,1)]'

# resulting sorted dict.items() with hygiene=False
'[(the,5),(color,2),(of,2),(is,2),(fulfil,1),(never,1),(dog,1),(same,1),(tire,,1),(will,1),(i,1),(as,1),(can,1),(which,1),(tell,1),(not,1),(one,1),(or,1),(it,1),(tyre,,1)]'

# resulting sorted dict.items() with UK swap, hygiene=True
'[(the,5),(of,2),(is,2),(tyre,2),(colour,2),(which,1),(i,1),(never,1),(dog,1),(same,1),(or,1),(will,1),(as,1),(can,1),(tell,1),(not,1),(fulfill,1),(one,1),(it,1)]'

# resulting sorted dict.items() with UK swap, hygiene=False
'[(the,5),(of,2),(is,2),(colour,2),(one,1),(i,1),(never,1),(dog,1),(same,1),(tire,,1),(will,1),(as,1),(can,1),(which,1),(tell,1),(not,1),(fulfill,1),(tyre,,1),(it,1),(or,1)]'
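The listings above appear to be ordered by descending count, but the sorting step itself isn't shown. One way to get such an ordering is sketched below; the wd_counts literal is a made-up stand-in for the function's return value, and the alphabetical tie-break is an assumption added so the order is deterministic.

```python
# Stand-in for a dictionary of word counts (hypothetical values for illustration).
wd_counts = {"the": 5, "of": 2, "color": 2, "dog": 1}

# Sort by count (highest first), then alphabetically to break ties.
by_count = sorted(wd_counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(by_count)  # [('the', 5), ('color', 2), ('of', 2), ('dog', 1)]
```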
You can use the resulting dictionary of word counts in any way you'd like, and if you need the original string with the modifications added, it is easy enough to modify the function to return that as well.
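A minimal sketch of that modification might look like the following. The name normalise_and_count and the two-entry mapping are hypothetical, and the function is written as a standalone example rather than as an edit of the code above: it returns the rewritten text together with the counts.

```python
import re

# Hypothetical two-entry UK -> US mapping, just for this example.
UK_TO_US = {"colour": "color", "tyre": "tire"}

def normalise_and_count(text, mapping=UK_TO_US):
    """Return (normalised text, word counts) for the given text."""
    # lower-case and keep only word characters, dropping punctuation
    words = re.findall(r"\w+", text.lower())
    # replace each word with its mapped spelling where one exists
    normalised = [mapping.get(w, w) for w in words]
    counts = {}
    for w in normalised:
        counts[w] = counts.get(w, 0) + 1
    return " ".join(normalised), counts

new_text, counts = normalise_and_count("Colour of the tyre, color of the tire")
print(new_text)  # color of the tire color of the tire
```

Note that this version discards the original punctuation and casing; keeping them would need a substitution on the original string instead of a rebuild from the word list.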