Data Ecosystem Infrastructure LA4LD

I’m currently working on my graduate internship. My project is part of Marcel’s PhD research project about learning analytics, learning dashboards, and learning design.


One of the problems faced by faculties, course designers, and teachers is the lack of insight into the student experience. Faculties are rated based on two factors: the time it takes for students to get their degree and the student experience. The first factor is obvious and easy to measure, but student experience is harder.

Currently, the faculty of ICT within Zuyd University of Applied Sciences has two ways of measuring student experience. The first is the Nationale Studenten Enquête (NSE), a national questionnaire filled in by students from all universities and universities of applied sciences. The second is a faculty-specific questionnaire at the end of every course.

The results of both questionnaires only come in after the course has ended. Students also get little feedback on what is done with the results, which probably (based on anecdotal evidence) contributes to lower participation numbers. All in all, not enough data is available to improve the student experience, and students do not see enough actionable feedback to become more engaged with the courses and the faculty.

The Project

To solve this problem, Marcel has suggested creating a data ecosystem in which students, teachers, and course designers participate to collect more (useful) data. Several projects to develop systems for collecting data have been completed or are currently under way (for example the IoT projects). Another project is looking at ways to present the gathered data in a collection of dashboards.

My project fits neatly between all projects mentioned before. I will be developing an open-source infrastructure that can catch, clean, structure and store all data gathered while also delivering the underlying services needed to present the data to the users through dashboards.

Because this system sits at the core of the data ecosystem and must be able to support many different kinds of systems, both current and future, it is vital to make the entire infrastructure modular. During my internship, a couple of modules will be developed.

The first module will be an endpoint for collecting information on student attendance. This could be an RFID reader on which students swipe their student card. Another module connects to the digital learning environment, in this case Moodle, and collects data on how students use the provided course material. A third module imports student results from a file. Finally, a last module will collect and store data from questionnaires.
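To make the modular idea a bit more concrete, here is a minimal sketch in Python of how such modules could plug into a shared core. Everything in it is an assumption made for illustration: the names `Record`, `CollectorModule`, `ResultsFileModule`, `AttendanceModule`, and `Infrastructure`, the CSV column names, and the in-memory list standing in for real storage. It is not the project's actual design.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Record:
    """One cleaned data point; the field names are illustrative."""
    student_id: str
    source: str
    payload: dict


class CollectorModule(ABC):
    """Hypothetical interface that every data-collection module implements."""
    name: str

    @abstractmethod
    def collect(self) -> Iterable[Record]:
        """Yield cleaned, structured records from this module's data source."""


class ResultsFileModule(CollectorModule):
    """Imports student results from CSV text (assumed columns: student_id, course, grade)."""
    name = "results_file"

    def __init__(self, csv_text: str) -> None:
        self._csv_text = csv_text

    def collect(self) -> Iterable[Record]:
        for row in csv.DictReader(io.StringIO(self._csv_text)):
            payload = {"course": row["course"], "grade": float(row["grade"])}
            yield Record(row["student_id"], self.name, payload)


class AttendanceModule(CollectorModule):
    """Turns (student_id, timestamp) swipes, e.g. from an RFID reader, into records."""
    name = "attendance"

    def __init__(self, swipes: List[Tuple[str, str]]) -> None:
        self._swipes = swipes

    def collect(self) -> Iterable[Record]:
        for student_id, timestamp in self._swipes:
            yield Record(student_id, self.name, {"swiped_at": timestamp})


class Infrastructure:
    """Core that only knows the CollectorModule interface, not the concrete modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, CollectorModule] = {}
        self.store: List[Record] = []  # stand-in for a real database

    def register(self, module: CollectorModule) -> None:
        if module.name in self._modules:
            raise ValueError(f"module {module.name!r} is already registered")
        self._modules[module.name] = module

    def run(self) -> None:
        # Modules run in registration order; each one contributes its records.
        for module in self._modules.values():
            self.store.extend(module.collect())
```

Because the core only depends on the interface, a future module (classroom sensors, for example) would just be another subclass that gets registered, without changes to the core.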

As said before, it has to be possible to develop more modules later down the line, for example one that adds environmental variables from the classroom or from a student’s study room at home. Another example would be tracking students’ gaze in the classroom: where they are looking on the slides and what draws their attention.

Any system that collects this amount of data, especially potentially sensitive private data, has to consider the privacy of its participants and thus the security of the system. A way has to be found to ensure that no one but the students themselves can see their own personalized data. Teachers and course designers will only see anonymised group data. The general security of the system also has to be considered.
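As a rough illustration of the split between personal and group views, the Python sketch below returns a student's own data only to that student and gives everyone else an aggregate. The function names, the data layout, and the minimum group size of 5 are all assumptions for the example; how the real system will enforce this still has to be worked out.

```python
from statistics import mean
from typing import Dict, List, Optional

# Assumed threshold: groups smaller than this are not reported at all,
# because an average over very few students can point to individuals.
MIN_GROUP_SIZE = 5


def personal_view(grades: Dict[str, List[float]], requester_id: str) -> List[float]:
    """Return only the requesting student's own grades (empty if there are none)."""
    return list(grades.get(requester_id, []))


def group_view(
    grades: Dict[str, List[float]], min_group_size: int = MIN_GROUP_SIZE
) -> Optional[dict]:
    """Return an aggregate without student ids, or None if the group is too small."""
    students_with_data = [g for g in grades.values() if g]
    if len(students_with_data) < min_group_size:
        return None
    all_grades = [g for per_student in students_with_data for g in per_student]
    return {
        "students": len(students_with_data),
        "average": round(mean(all_grades), 2),
    }
```

Note that this only covers what a view returns. Authenticating the requester and securing the stored data are separate concerns that the sketch does not address.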


During the development of the system I will be using a couple of different methodologies.

Design-Based Research Process

This project will use the design-based research methodology [1]. The first three phases (problem definition & motivation, objectives of solution, and design & development) will be completed during the project; the fourth (demonstration) will be started.

Systematic Mapping Review

At the start of the project, a systematic mapping review [2] will be done to see in which fields data ecosystems have been suggested and maybe even deployed. It would also be interesting to know whether any effect studies have been published for cases where data ecosystems have been deployed.

Scrum and GitHub

For managing the project I will be using a slightly modified version of Scrum, modified because I’m the only person in the development team. For tracking all Scrum-related information I will be using the issues, pull requests, projects, and wiki pages on the GitHub page for the project. I wanted to figure out how to automate the entire Scrum workflow on GitHub. I have made some decent progress on it, good enough for this project, but I still have to move “to-do” items manually to “in progress” and after that to “in review”. If you read my post on converting exam questions to flashcards, you know I’m lazy (hopefully in a good way), and I will be looking to automate as much of the workflow as possible, so maybe I will find a solution to these two manual actions.

Test Driven Development

For the development of the system I’ll be using Test Driven Development (TDD). The basic idea of TDD is to make testing an integral part of the development cycle. You start by writing automated functional and/or integration tests, followed by smaller unit tests. At first these tests should fail (it would be weird if they didn’t). Only then do you write just enough code to get the tests to pass (or at least to progress to the next step). Once some tests pass, you can refactor (improve) the code while using those passing tests to make sure the program does not regress. This process is often called Red, Green, Refactor.

Figure: TDD - Red, Green, Refactor
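A tiny example of the cycle, using Python's standard `unittest` module. The function `attendance_rate` and its behaviour are made up for illustration and are not part of the project's code. In practice the test class is written first and fails (red); the function is then added with just enough code to make the tests pass (green).

```python
import unittest


# Green: just enough code to make the tests below pass.
# During the red step this function does not exist yet, so the tests fail.
def attendance_rate(sessions_attended: int, sessions_total: int) -> float:
    """Fraction of sessions a student attended (hypothetical helper)."""
    if sessions_total <= 0:
        raise ValueError("sessions_total must be positive")
    return sessions_attended / sessions_total


# Red: these tests are written before the function above.
class AttendanceRateTest(unittest.TestCase):
    def test_half_of_the_sessions_attended(self):
        self.assertEqual(attendance_rate(5, 10), 0.5)

    def test_zero_sessions_is_an_error(self):
        with self.assertRaises(ValueError):
            attendance_rate(0, 0)
```

Saved as a file, the tests can be run with `python -m unittest <file>`. The refactor step would then change the implementation (for example, adding a check that attended sessions do not exceed the total, with its own failing test first) while the existing tests guard against regressions.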


My internship lasts half a year (20 school weeks). The first 3 weeks are spent on clearly defining the project, choosing the methodologies, and planning the phases of the project. Weeks 4 and 5 are used for requirements analysis and the systematic review. From week 6 until week 16 the system will be designed & developed in a couple of Scrum sprints. The last 4 weeks are used to prepare for the presentation at the end of the internship and to finish up the project in general.


The design & development phase consists of a number of sprints:

  1. Setup (software architecture & base functionality like logins, database connections, etc.) – 2 weeks

  2. Importing student results from file – 1 week

  3. Student attendance – 1 week

  4. Moodle/xAPI connection – 3 weeks

  5. MSLQ or other questionnaire connection – 1 week

  6. Admin panel – 2 weeks

  7. Wrapping up (extended testing, deployment considerations, etc.) – 2 weeks

What now?

In about 5 or 6 weeks I’ll post a status update on where I am with the project. Another 5 or 6 weeks after that, I will present my results. Finally, when I’m (almost) done with my internship, I’ll write a post about my experiences.

The repository for this project can be found on GitHub.