python - Cross-Lingual Word Sense Disambiguation

I am a beginner in computer programming and am completing an essay on parallel corpora in word sense disambiguation. Basically, I intend to show that substituting the sense of a word with its translation simplifies the process of identifying the meaning of ambiguous words. I have already word-aligned my parallel corpus (Europarl English-Spanish) with GIZA++, but I don't know what to do with the output files. My intention is to build a classifier that calculates the probability of a translation word given the contextual features of the tokens that surround the ambiguous word in the source text. So, my question is: how do you extract instances of an ambiguous word from a parallel corpus together with its aligned translation?
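One possible way to get at the aligned translations is to read the alignment file that GIZA++ writes (usually named something like `*.A3.final`). The sketch below is only a minimal illustration and rests on several assumptions: that the file uses the usual three-lines-per-sentence-pair layout (a `#` comment line, one plain sentence, then the other sentence annotated with `({ ... })` sets of 1-based positions into the plain sentence), and that English is the annotated side, which depends on the direction GIZA++ was run in. The names `read_a3` and `find_instances` and the sample sentence pair are made up for illustration and are not part of GIZA++.

```python
import re

# Matches "token ({ 1 2 })" pairs on the annotated line of an A3 file.
ALIGN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def read_a3(lines):
    """Yield (annotated_tokens, plain_tokens, links) for each sentence pair.

    links[i] lists the 0-based positions in plain_tokens that are aligned
    to annotated_tokens[i]. Assumes three lines per sentence pair.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for start in range(0, len(lines) - 2, 3):
        plain_tokens = lines[start + 1].split()
        pairs = ALIGN_RE.findall(lines[start + 2])
        # The first entry is the artificial NULL word; skip it.
        if pairs and pairs[0][0] == "NULL":
            pairs = pairs[1:]
        annotated_tokens = [token for token, _ in pairs]
        links = [[int(p) - 1 for p in positions.split()] for _, positions in pairs]
        yield annotated_tokens, plain_tokens, links


def find_instances(lines, word):
    """Yield (sentence_tokens, index, translation_tokens) for each hit of word."""
    for annotated, plain, links in read_a3(lines):
        for i, token in enumerate(annotated):
            if token.lower() == word:
                yield annotated, i, [plain[j] for j in links[i]]


# Hand-made sample in the assumed layout (not real Europarl output).
sample = [
    "# Sentence pair (1) source length 5 target length 4 alignment score : 0.001",
    "recibí una carta ayer",
    "NULL ({ }) I ({ }) received ({ 1 }) a ({ 2 }) letter ({ 3 }) yesterday ({ 4 })",
]

for sentence, index, translation in find_instances(sample, "letter"):
    print(" ".join(sentence), "|", sentence[index], "->", " ".join(translation))
```

This matches surface forms only; lemmatizing each token before the comparison would also catch inflected forms such as "letters". With a real file, `open(path, encoding="utf-8")` can be passed in place of `sample`.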

I have tried various scripts in Python, but these run on the assumption that 1) the English and Spanish texts are in separate corpora and 2) the English and Spanish sentences share the same indexes, which does not work. E.g.

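    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    w_lemma = WordNetLemmatizer()

    def ambigu_word2(document, document2):
        words = ['letter']
        for sentences in document:
            tokens = word_tokenize(sentences)
            for item in tokens:
                x = w_lemma.lemmatize(item)
                for w in words:
                    if w == x:
                        # Pairs sentences by position, i.e. assumes shared indexes
                        print(sentences, document2[document.index(sentences)])

    ambigu_word2(raw1, raw2)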

I would be grateful if anyone could provide guidance on this matter.
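For the classifier step described in the question, one simple option is a Naive Bayes model that estimates the probability of each translation given the words around the ambiguous word. The following is a rough sketch under stated assumptions, not a recommended final design: the features are a bag of words in a two-token window, add-one smoothing is used, and `context_features`, `TranslationNB`, and the four training sentences are hypothetical names and toy data invented for this example.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, index, window=2):
    """Lowercased words within `window` positions of the ambiguous word."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return [t.lower() for t in left + right]


class TranslationNB:
    """Naive Bayes over context words, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for features, label in examples:
            self.label_counts[label] += 1
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def predict_proba(self, features):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for f in features:
                if f in self.vocab:  # ignore words never seen in training
                    score += math.log((self.feature_counts[label][f] + 1) / denom)
            log_scores[label] = score
        # Convert log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


# Toy training data: (context features, aligned Spanish translation).
training = [
    (context_features("I posted the letter to my aunt".split(), 3), "carta"),
    (context_features("she wrote a long letter yesterday".split(), 4), "carta"),
    (context_features("the word begins with the letter b".split(), 5), "letra"),
    (context_features("each capital letter is printed in bold".split(), 2), "letra"),
]

model = TranslationNB()
model.train(training)
probs = model.predict_proba(context_features("he wrote a letter to her".split(), 3))
print(probs)
```

In practice the training pairs would come from the instances extracted from the aligned corpus rather than from hand-written sentences, and richer features (lemmas, part-of-speech tags, wider windows) may be worth trying.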
