python - Cross-Lingual Word Sense Disambiguation -
i beginner in computer programming , completing essay on parallel corpora in word sense disambiguation. basically, intend show substituting sense word translation simplifies process of identifying meaning of ambiguous words. have word-aligned parallel corpus (europarl english-spanish) giza++, don't know output files. intention build classifier calculate probability of translation word given contextual features of tokens surround ambiguous word in source text. so, question is: how extract instances of ambiguous word parallel corpus aligned translation?
i have tried various scripts on python, these run on assumption 1) english , spanish texts in separate corpora , 2) english , spanish sentences share same indexes, not work. e.g.
def ambigu_word2(document, document2): words = ['letter'] sentences in document: tokens = word_tokenize(sentences) item in tokens: x = w_lemma.lemmatize(item) w in words: if w == x in sentences: print (sentences, document2[document.index(sentences)]) print (ambigu_word2(raw1, raw2))
i grateful if provide guidance on matter.
Comments
Post a Comment