Mixed Script Information Retrieval


Zine Research Lab, NIT Jaipur


Language Identification (LI), Named Entity Recognition (NER), and named-entity subclassification on a limited corpus covering English and 8 Indic languages.


A hierarchical classification model that combines distinct supervised classifiers for LI and NER with semi-supervised correction based on search engine rankings and Wikipedia-based keyword scoring for named-entity subclassification.
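The two-level structure can be illustrated with a minimal sketch. This is not the project's actual code: the lexicons, keyword lists, tag names such as `NE_location`, and every function name below are hypothetical stand-ins. Simple dictionary lookups take the place of the trained supervised classifiers, a hand-written description takes the place of Wikipedia text, and the search-engine-based correction step is left out entirely.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy lexicons standing in for trained supervised classifiers.
LANG_LEXICON: Dict[str, str] = {"hai": "hi", "kya": "hi", "is": "en", "the": "en"}
NE_LEXICON = {"delhi", "sachin"}

# Hypothetical keyword lists standing in for Wikipedia-derived keywords.
SUBCLASS_KEYWORDS: Dict[str, set] = {
    "location": {"city", "capital", "india"},
    "person": {"cricketer", "born", "player"},
}


def identify_language(token: str) -> str:
    """Level 1 (LI): return a language code, or 'unk' for unseen tokens."""
    return LANG_LEXICON.get(token.lower(), "unk")


def is_named_entity(token: str) -> bool:
    """Level 1 (NER): decide whether the token is a named entity."""
    return token.lower() in NE_LEXICON


def subclassify_entity(description: str) -> str:
    """Level 2: pick the subclass whose keywords occur most often in the text."""
    words = Counter(description.lower().split())
    scores = {
        label: sum(words[keyword] for keyword in keywords)
        for label, keywords in SUBCLASS_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a generic label when no keyword matched at all.
    return best if scores[best] > 0 else "other"


def tag_tokens(tokens: List[str], descriptions: Dict[str, str]) -> List[str]:
    """Route each token through level 1, and entities on to level 2."""
    tags = []
    for token in tokens:
        if is_named_entity(token):
            subclass = subclassify_entity(descriptions.get(token.lower(), ""))
            tags.append(f"NE_{subclass}")
        else:
            tags.append(identify_language(token))
    return tags


if __name__ == "__main__":
    toy_descriptions = {"delhi": "capital city of india"}
    print(tag_tokens(["Delhi", "is", "the", "capital"], toy_descriptions))
```

Only tokens flagged as named entities reach the second level, so the subclassifier never has to compete with the language labels; that routing is the sense in which the sketch is hierarchical.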


  1. Weighted F-score of 0.8082.
  2. Ranked 2nd among 10 participating teams.
  3. Django application implementing the above model.
HLine Web App


Saatvik Shah, Vaibhav Jain, Sarthak Jain, Anshul Mittal, Jatin Verma, Shubham Tripathi, and Dr. Rajesh Kumar. “Hierarchical classification for multilingual language identification and named entity recognition.” Proceedings of the Forum for Information Retrieval Evaluation, 2015. (In press) URL


Web Application

Running live at this link. It is currently hosted on a free Heroku plan, hence a bit slow :/