Persian Word Sense Disambiguation Using LDA Topic Model

Babak Masoudi (1), Aboozar Zandvakili (2)

1) Department of Information Technology, Payame Noor University (PNU), P.O. Box 19395-3697, Tehran, I.R. of Iran. Email:
2) Department of Computer Engineering, College of Engineering, Jiroft Branch, Islamic Azad University, Jiroft, Iran. Email:

Publication: International Conference on Science and Engineering (icesconf.com)
Abstract:
Word sense disambiguation is a prominent problem in natural language processing. This paper proposes a model for Persian word sense disambiguation based on the extraction of new features. The model uses two groups of features: the words and signs that accompany the ambiguous word, and features derived from topic modeling. A topic model is a probabilistic model for extracting the abstract topics contained in the documents of a corpus. This paper uses the unsupervised Latent Dirichlet Allocation (LDA) method. Experimental results for four common ambiguous Persian words, extracted from the corpus of the Research Center of Intelligent Signal Processing, show a precision of 97%, which demonstrates the effectiveness of the method in finding the proper sense of words.
Keywords: Latent Dirichlet Allocation, multi-sense, word sense disambiguation, topic modeling
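The two feature groups described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentences (English glosses around the Persian homograph "shir", which can mean milk or lion), the window size, the number of topics, the hyperparameters, and the function names `lda_gibbs` and `context_features` are all assumptions made for illustration. LDA is fitted here with a small collapsed Gibbs sampler; the paper does not state which inference procedure was used.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k overall
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed topic proportions for each document.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def context_features(tokens, target, vocab, window=2):
    """Bag-of-words counts of the tokens within `window` positions of the target."""
    pos = tokens.index(target)
    lo, hi = max(0, pos - window), pos + window + 1
    neighbours = Counter(tokens[lo:pos] + tokens[pos + 1:hi])
    return [neighbours[w] for w in vocab]


# Hypothetical toy data: English glosses around the ambiguous word "shir".
sentences = [
    "the child drank warm shir with bread".split(),
    "she bought shir and cheese and bread".split(),
    "the hunter saw a shir in the jungle".split(),
    "a shir roared in the jungle at night".split(),
]
N_TOPICS = 2  # assumed value, chosen only for the toy example
vocab = sorted({w for s in sentences for w in s})
theta = lda_gibbs(sentences, N_TOPICS)

# One feature vector per occurrence: context counts followed by topic proportions.
features = [context_features(s, "shir", vocab) + theta[d] for d, s in enumerate(sentences)]
```

Each resulting vector concatenates the local context counts with the document's topic proportions, and could then be passed to a supervised classifier of the reader's choice to predict the sense label.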