Data Analytics Tool for Educational Data


Fundamental Research Group, IIT Bombay


750 GB - that’s the whopping amount of data generated by an average EdX course per week.

Our objective was to create a research workbench that automates the extract-transform-load (ETL) pipeline, data analysis, and visualization of student behavior from user logs generated on IITBombayX, IIT Bombay’s version of EdX.


We (a team of 5) churned out a working prototype on Django. First, it cleaned and organized large volumes of unstructured log data using a combination of shell and Python scripts. The cleaned data was then loaded into separate Hive tables. From there, Spark ran customized queries on the data, and the query results were processed further to draw useful inferences. These inferences were then visualized using D3.js.
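As a rough illustration of the cleaning step, here is a minimal Python sketch, not our actual scripts. It assumes the raw logs are JSON lines and that each event has `username`, `event_type`, and `time` fields. Those field names, the helper functions, and the sample data are assumptions made for this example. It drops malformed or incomplete lines and writes the rest as tab-separated text, a format a Hive table can be configured to load.

```python
import csv
import io
import json

# Hypothetical field names; the real IITBombayX log schema may differ.
FIELDS = ("username", "event_type", "time")


def clean_log_lines(lines):
    """Yield one flat record per well-formed JSON log line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue  # skip JSON values that are not objects
        # Keep only events that carry a non-empty value for every field.
        if all(event.get(f) for f in FIELDS):
            yield {f: str(event[f]) for f in FIELDS}


def to_tsv(records):
    """Serialize records as tab-separated text, one record per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for rec in records:
        writer.writerow([rec[f] for f in FIELDS])
    return buf.getvalue()


# Made-up sample lines, for illustration only.
sample = [
    '{"username": "alice", "event_type": "problem_check", "time": "2015-06-01T10:00:00"}',
    "not json at all",
    '{"username": "", "event_type": "page_close", "time": "2015-06-01T10:01:00"}',
    '{"username": "bob", "event_type": "play_video", "time": "2015-06-01T10:02:00"}',
]
cleaned = list(clean_log_lines(sample))
tsv = to_tsv(cleaned)
print(tsv, end="")
```

Only the two complete events survive in this sample. The malformed line and the event with an empty username are silently dropped, which keeps the downstream tables free of partial rows.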

The Backend


  1. Django web application running on a multinode Hadoop cluster
  2. Our research was carried forward by the FRG team and graduate students to create a personalized tutoring system and improve IITBombayX.

Visualizing student question-answering time


  1. Report URL
  2. Presentation URL
  3. Certificate URL


P.S.: To get selected for this internship, I had to create teaching software in different categories (Physics, Chemistry, Maths, English) for the IIT Bombay EkShiksha initiative. The code lives here.