Mixed Script Information Retrieval

Where

Zine Research Lab, NIT Jaipur

Objective

Language Identification (LI), Named Entity Recognition (NER), and named-entity subclassification on a limited corpus of English plus 8 Indic languages.

Methodology

A hierarchical classification model that combines distinct supervised classifiers for LI and NER with semi-supervised correction based on search-engine ranking, and Wikipedia-based keyword scoring for named-entity subclassification.
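
To make the hierarchy concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the published implementation: the rule-based functions, the `TOY_*` tables, and the `NE_` tag prefix are all hypothetical stand-ins for the trained classifiers and the Wikipedia text, and the search-engine correction step is left out entirely.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a trained NER classifier (toy gazetteer lookup).
TOY_ENTITIES: Set[str] = {"jaipur", "delhi", "sachin"}

# Hypothetical stand-in for text fetched from Wikipedia about each entity.
TOY_DESCRIPTIONS: Dict[str, str] = {
    "jaipur": "capital city of the state of rajasthan",
    "sachin": "indian cricket player born in mumbai",
}

# Hypothetical keyword lists used to score each named-entity subclass.
TOY_KEYWORDS: Dict[str, Set[str]] = {
    "location": {"city", "capital", "state"},
    "person": {"born", "player", "actor"},
}


def toy_is_named_entity(token: str) -> bool:
    """Stage 1: decide whether the token is a named entity (toy rule)."""
    return token.lower() in TOY_ENTITIES


def toy_language_id(token: str) -> str:
    """Stage 2a: guess a language tag from the script (toy rule)."""
    # Any character in the Devanagari block is treated as Hindi here;
    # a real system would use a supervised classifier instead.
    if any("\u0900" <= ch <= "\u097f" for ch in token):
        return "hi"
    return "en"


def subclassify(context_words: List[str]) -> str:
    """Stage 2b: pick the subclass whose keywords occur most often."""
    scores = {
        label: sum(1 for word in context_words if word.lower() in keywords)
        for label, keywords in TOY_KEYWORDS.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label if scores[best_label] > 0 else "other"


def tag_token(token: str) -> str:
    """Route a token down the hierarchy: NE subclass or language tag."""
    if toy_is_named_entity(token):
        description = TOY_DESCRIPTIONS.get(token.lower(), "")
        return "NE_" + subclassify(description.split())
    return toy_language_id(token)


if __name__ == "__main__":
    for tok in ["Jaipur", "Sachin", "Delhi", "hello", "नमस्ते"]:
        print(tok, "->", tag_token(tok))
```

The point of the hierarchy is that each stage only has to solve a narrow problem: the first decides between named entity and ordinary word, and only then is a language tag or an entity subclass chosen.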

Results

  1. Weighted F-score of 0.8082.
  2. 2nd amongst 10 participating teams.
  3. Django application implementing the above model.
HLine Web App
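
For readers unfamiliar with the metric, the sketch below shows one common way to compute a weighted F-score: per-class F1 averaged with each class weighted by its number of gold tokens. The exact weighting behind the reported 0.8082 is not stated here, so this definition is an assumption, and the counts in the example are made up purely for illustration.

```python
from typing import Dict, Tuple

# Each class maps to (true positives, false positives, false negatives).
Counts = Dict[str, Tuple[int, int, int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 if the class is empty."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(counts: Counts) -> float:
    """Average per-class F1, weighting each class by its support (TP + FN).

    Assumption: support weighting; other shared tasks may weight differently.
    """
    total_support = sum(tp + fn for tp, _fp, fn in counts.values())
    if total_support == 0:
        return 0.0
    weighted_sum = sum((tp + fn) * f1(tp, fp, fn) for tp, fp, fn in counts.values())
    return weighted_sum / total_support


if __name__ == "__main__":
    # Made-up counts for two tags, for illustration only.
    example: Counts = {"en": (8, 2, 2), "hi": (3, 1, 1)}
    print(round(weighted_f1(example), 4))
```

With support weighting, frequent tags dominate the score, so a system can score well overall while still doing poorly on rare tags.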

Publication

Saatvik Shah, Vaibhav Jain, Sarthak Jain, Anshul Mittal, Jatin Verma, Shubham Tripathi and Dr. Rajesh Kumar. “Hierarchical classification for multilingual language identification and named entity recognition.” Proceedings of the Forum for Information Retrieval Evaluation, 2015. (In Press) URL

Code @ GitHub

Web Application

Running live at this link. It is currently hosted on a free Heroku plan, hence a bit slow :/