Fundamental Research Group, IIT Bombay
750 GB - that’s the whopping amount of data generated by an average EdX course per week.
Our objective was to create a research workbench that automates the extract-transform-load (ETL) pipeline, data analysis and visualization of student behavior from user logs generated on IITBombayX: IIT Bombay’s version of EdX.
We, a team of 5, churned out a working prototype on Django. First, it iterated through the raw logs to clean and organize large volumes of unstructured data, using a combination of shell and Python scripts. The resulting data was then loaded into separate Hive tables. From there, Spark was used to run customized queries on the data, and the query results were processed further to draw useful inferences. These inferences were then visualized using D3.js.
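The clean, load and query steps can be illustrated with a minimal sketch in plain Python. It is not the code from the prototype: it stands in for the shell, Hive and Spark stages using only the standard library, and it assumes that each log line is a JSON object. The field names (`username`, `event_type`, `time`) and the helper names are hypothetical and chosen only for illustration.

```python
import json
from collections import Counter

# Hypothetical field names; the real IITBombayX log schema may differ.
FIELDS = ("username", "event_type", "time")


def clean_log_lines(lines):
    """Yield one (username, event_type, time) tuple per usable log line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        if not all(record.get(field) for field in FIELDS):
            continue  # skip records with missing or empty fields
        # Replace tabs so the values are safe in a tab-separated file.
        yield tuple(str(record[field]).replace("\t", " ") for field in FIELDS)


def to_tsv(rows):
    """Format rows as tab-separated text, a layout a Hive table can read."""
    return "\n".join("\t".join(row) for row in rows)


def events_per_type(rows):
    """Count events per type, standing in for a query over the loaded data."""
    counts = Counter(event_type for _, event_type, _ in rows)
    return [{"event_type": name, "count": n} for name, n in sorted(counts.items())]


def to_chart_json(rows):
    """Serialize the counts as a JSON array that a D3.js chart could consume."""
    return json.dumps(events_per_type(rows))
```

In this sketch, malformed lines are dropped silently. On real data, counting the rejected lines would show how much of the input is lost during cleaning.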
- Django web application running on a multinode Hadoop cluster
- Our research was carried forward by the FRG team and graduate students to create a personalized tutoring system and improve IITBombayX.