ProfAnalytics

Friday 28 September 2012

ProfAnalytics Results

The analysis of the results has finished and we are wrapping up the project. Over the summer holiday, the data have been analyzed and we have had an expert meeting in which we discussed the results.

As indicated in previous blogs, we intended to study the data using Process Mining. We were able to transform the Piwik data into the XES format in several ways and to read the data into PROM6.1. The next job was to figure out which of the many analysis tools in PROM were best suited to our data. After reading the book "Process Mining" by Wil van der Aalst (ISBN 978-3-642-19344-6), we decided on Fuzzy Mining (Günther and van der Aalst, 2007). This technique makes it possible to abstract processes from spaghetti-like traces, like the ones we produced in ProF. We generated datasets for three groups of students, depending on their results on the progress test. However, the fuzzy miner was not able to generate a meaningful process from these. What we saw were pictures like:

These pictures were clear representations of the navigational structure of the ProF application, with the arrows roughly indicating the preferred navigation routes. In short, this did not help us much in understanding the data.

We then decided to switch back to R and analyze the data classically by counting, neglecting the order of the pages viewed. This, fortunately, produced more useful results. To summarize, the most important finding was that students with poor results (blue below) used ProF less often than others (red and green), but when they used ProF, they tended to stay longer in the application. They also spent a bit longer looking at score details.
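The counting approach can be sketched as follows. This is a minimal Python sketch, not the R script we used. The field names (`student`, `group`, `duration`) and all rows are invented for illustration and are not project data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session records: field names and values are invented, not project data.
sessions = [
    {"student": "s1", "group": "poor", "duration": 600},
    {"student": "s2", "group": "sufficient", "duration": 120},
    {"student": "s2", "group": "sufficient", "duration": 180},
    {"student": "s3", "group": "good", "duration": 150},
    {"student": "s3", "group": "good", "duration": 90},
]

def summarize_by_group(rows):
    """Count sessions per student and average session duration per group.

    The order of the pages within a session is ignored entirely.
    Only students with at least one session appear in `rows`, so students
    who never used the tool are not counted here.
    """
    per_group = defaultdict(lambda: {"students": set(), "durations": []})
    for row in rows:
        entry = per_group[row["group"]]
        entry["students"].add(row["student"])
        entry["durations"].append(row["duration"])
    summary = {}
    for group, entry in per_group.items():
        summary[group] = {
            "sessions_per_student": len(entry["durations"]) / len(entry["students"]),
            "mean_duration": mean(entry["durations"]),
        }
    return summary

if __name__ == "__main__":
    for group, stats in summarize_by_group(sessions).items():
        print(group, stats)
```

Because the grouping key is just a column, the same function can be reused for other breakdowns (for example by study year) by changing the key that is read from each row.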

Other findings were that too many students viewed only a single page in ProF, that students did not use the cumulative view of their results very often, and that they did not study the detailed score types enough.

During an expert meeting we discussed the findings and came up with a number of recommendations for improvement. The first one was that we need to change the opening page of ProF, so that students are stimulated to look further. Second, we need to develop instruction materials to teach students how to make better use of the possibilities of ProF.

The story does not end here. We intend to do follow-up research in which we ask students for their opinion on using the tool and on its effectiveness. In the meantime we are continuing to harvest data using Piwik so that longitudinal studies will become possible.

Wednesday 11 July 2012

Data collection has ended, first analyses ongoing

The data collection using Piwik has been successful. We used a Perl script to download a separate Piwik logfile per week (for security). This produced on average 4 MB of data per week (9 weeks in May-June). Here is a snippet of Perl code to download the Piwik data:

use LWP::UserAgent;

# $siteid, $date1 and $date2 must be set before this point;
# MYFILE must be an output filehandle that is already open.
my $auth   = "XXXSECRETXXX";      # Piwik API token, keep this secret
my $period = "range";
my $date   = $date1.",".$date2;   # first and last day of the period
my $url    = "https://oururl/piwik/index.php?module=API&method=Live.getLastVisitsDetails&format=XML&idSite=".$siteid."&period=".$period."&date=".$date."&expanded=1&filter_limit=1000&token_auth=".$auth;
my $ua = LWP::UserAgent->new;
$ua->timeout(120);
my $response = $ua->get($url);
die "Download failed: ".$response->status_line."\n" unless $response->is_success;
print MYFILE $response->content;
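
The weekly loop around this request could look roughly like the sketch below. It is written in Python rather than Perl and only builds the request URLs, without any network access. The start date and site id are placeholders, not our real settings; the query parameters are the ones from the Perl snippet.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

def weekly_ranges(start, weeks):
    """Yield (first_day, last_day) for consecutive 7-day periods."""
    for i in range(weeks):
        first = start + timedelta(weeks=i)
        yield first, first + timedelta(days=6)

def build_url(base, site_id, first, last, token):
    """Build a Live.getLastVisitsDetails request for one date range."""
    params = {
        "module": "API",
        "method": "Live.getLastVisitsDetails",
        "format": "XML",
        "idSite": site_id,
        "period": "range",
        "date": f"{first.isoformat()},{last.isoformat()}",
        "expanded": 1,
        "filter_limit": 1000,
        "token_auth": token,
    }
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    # Placeholder values: the start date and site id are assumptions.
    for first, last in weekly_ranges(date(2012, 5, 7), 9):
        print(build_url("https://oururl/piwik/index.php", 1, first, last, "XXXSECRETXXX"))
```

Note that `filter_limit=1000` caps the number of visits returned per request, so a week with more visits than that would need a higher limit or a shorter period.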

Before we go into the more fancy analysis methods, we first study our data in a more traditional way. Again using Perl, we collected the Piwik data, linked it to the outcome of the progress test that was administered in May, and produced two comma-separated reports: one with data per student and one with data per session.

Per student we collected the number of sessions, the result of the progress test and the university, year and program for all students that participated in the progress test. This allows us to count the number of students that used ProF.

Per session we collected the number of pages, the duration of the visit and the number of pages with specific settings (such as “cumulative score”, “longitudinal view” etc.). We also stored per session the test result and the student’s university, year and program. This allows us to characterize the sessions and study distributions over the sessions.
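
As an illustration of how one visit is reduced to one row of the session report, here is a minimal Python sketch (our actual script is in Perl). The input format, a list of `(unix_timestamp, url)` pairs, and the query parameter names `score` and `view` are assumptions made for this example; they are not ProF's real parameter names.

```python
from urllib.parse import urlparse, parse_qs

def summarize_session(actions):
    """Reduce one visit, given as (unix_timestamp, url) pairs, to a report row."""
    timestamps = [t for t, _ in actions]
    row = {
        "pages": len(actions),
        # Time between first and last page view; an empty visit gets 0.
        "duration": max(timestamps) - min(timestamps) if actions else 0,
        "cumulative_pages": 0,
        "longitudinal_pages": 0,
    }
    for _, url in actions:
        query = parse_qs(urlparse(url).query)
        # 'score' and 'view' are hypothetical parameter names.
        if query.get("score") == ["cumulative"]:
            row["cumulative_pages"] += 1
        if query.get("view") == ["longitudinal"]:
            row["longitudinal_pages"] += 1
    return row

if __name__ == "__main__":
    visit = [
        (1000, "https://prof.example/index.php?view=overview"),
        (1060, "https://prof.example/index.php?view=longitudinal&score=cumulative"),
        (1180, "https://prof.example/index.php?view=detail&score=cumulative"),
    ]
    print(summarize_session(visit))
```

The duration computed this way ignores the time spent on the last page, so it may differ from the visit duration that Piwik itself reports.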

These two comma-separated files can be analyzed with many standard tools. We started using RapidMiner for analyzing the student-report data. Although it is quite possible to do all the filtering and aggregation we need, it is rather clumsy to change the filtering and collect the results in a report every time you want a different selection. Therefore we decided to switch to R. This is a script-based environment, and changing a script to get a different selection is much easier than changing the interactive flowchart of RapidMiner. It is true that RapidMiner offers a nice interactive module to inspect data visually, but the quality and flexibility of graphs in R are much better, and honestly, producing different graphs is just as easy.

The analysis shows that the usage of the ProF system is a little higher than we expected. It is clearly visible that when students are stimulated to use ProF, for instance, because they have to include an analysis in their portfolio, the usage of ProF is much higher.

We also see differences in use between students with an insufficient grade and students with sufficient or good grades, not only in the number of times ProF is used (good students use ProF the most), but also in the number of details they study.

The Perl script that collects our data also produces XES files for process mining in Prom. Now that we have analyzed the data, we can decide how to filter and preprocess the data in a meaningful way for Prom.

Tuesday 8 May 2012

Piwik is running!

Changes made to the ProF application

The ProF application has been adapted so that it uses Piwik to track the behaviour of users. The adaptations we made were the following:

In the configuration of a ProF instance we store the URL for the Piwik tracking and the site id. This makes it possible to set up a separate Piwik per ProF instance. ProF only tracks with Piwik if this configuration is present.

We made sure that all GET parameters needed for navigation appear in the same order on all pages, because Piwik uses the complete URL to recognize pages.

We defined a set of user variables that are sent to Piwik at each tracking event: the user id, the user authorisation (student, staff), the institution, and some more.

We forced Piwik to use new cookies every day per user in order to be able to separate sessions of different users on the same machine.

Security and privacy enhancement

All communication with our installation of Piwik is through a secure HTTPS connection. Moreover, the Piwik console and API are only reachable from a restricted set of IP addresses. The tracking API, however, must be accessible to all users of ProF. The user id we send to Piwik is not recognisable to anybody who might still be able to eavesdrop: it is an internal database id. The id can only be translated to a student number with the help of the original database. We also implemented the possibility to set an opt-out flag for each individual user.

Logging

We use the API of Piwik to download a log-file in XML format. Piwik offers a nice dashboard and many different reports, but for our analysis purposes we are going to use the setting "&method=Live.getLastVisitsDetails&format=XML". This format can be translated one-to-one into the XES format needed for process mining in PROM.

Thursday 5 April 2012

Some first ideas about Piwik

Piwik is an open-source web analytics tool that we are currently investigating as a way of collecting usage information about the ProF system. Peter Bex has set up a test environment with ProF and Piwik and this morning we discussed the first findings.

Stabilizing the page URLs in ProF

In order for Piwik to be able to recognize pages, the way in which ProF sets up the navigational information within the URL had to be stabilized, such that the same page with the same settings always has the same URL.
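
One way to get such stable URLs is to always emit the query parameters in a fixed (here: sorted) order. The Python sketch below only illustrates the idea; it is not the code used in ProF, and the host and parameter names in the example are made up.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonical_url(url):
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))

if __name__ == "__main__":
    a = canonical_url("https://prof.example/page?year=2&view=cumulative")
    b = canonical_url("https://prof.example/page?view=cumulative&year=2")
    print(a == b, a)
```

With this in place, two links to the same page with the same settings are counted by Piwik as one page, regardless of the order in which the application happened to add the parameters.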

Activating Piwik and using custom variables: privacy aspects

We chose to invoke Piwik from within ProF by using the JavaScript API of Piwik. It is possible to add custom variables in this procedure: either for the visitor (e.g. user id, university id, authorisation) or for the page. In our project we do need to know the visitor's identity because we want to link usage to learning outcomes. However, we should store as little information about the visitor as possible in Piwik to avoid privacy issues. This means that we do not store the public student id in Piwik, but an internal user id which can only be understood in connection with the result tables in ProF.
We also decided to mask the IP address of the visitor (leaving only the first two numbers visible). We do not have to know the location, so we should not store it.
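
The effect of this masking can be shown with a small Python sketch. It only illustrates what is kept and what is dropped; it is not the code that runs inside Piwik, and the function name `mask_ip` is our own.

```python
import ipaddress

def mask_ip(ip, keep=2):
    """Zero all but the first `keep` numbers of a dotted IPv4 address."""
    # IPv4Address raises ValueError for anything that is not a valid IPv4 address.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(octets[:keep] + ["0"] * (4 - keep))

if __name__ == "__main__":
    print(mask_ip("192.0.2.123"))
```

Keeping two numbers still identifies the network roughly (for instance, an institution), but no longer an individual machine.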

Preparation to process mining

It seems that one of the XML reports that Piwik generates (the detailed user report) maps nicely onto the .XES format that the process mining tool Prom uses. In XES the information is organised in so-called traces that contain a chronological list of events. In the Piwik report the information is stored in so-called visits that contain a chronological list of actions. This means that the Piwik report can be translated linearly into a .XES file. Maybe we can even add a module to Piwik that directly generates the .XES files, but in this project we first have to find out what information from Piwik is really necessary to do our process mining.
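
The visit-to-trace mapping can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not our converter: the element names in the Piwik export (`row`, `idVisit`, `actionDetails`, `url`, `timestamp`) are what we expect the report to contain and should be checked against a real download. The sample data is invented, and a complete XES file would normally also declare the extensions it uses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Invented sample in the assumed layout of the Piwik visit report.
SAMPLE = """<result>
  <row>
    <idVisit>1</idVisit>
    <actionDetails>
      <row><url>https://prof.example/a</url><timestamp>1336464000</timestamp></row>
      <row><url>https://prof.example/b</url><timestamp>1336464060</timestamp></row>
    </actionDetails>
  </row>
</result>"""

def piwik_to_xes(piwik_xml):
    """Translate Piwik visits into XES traces, one event per action."""
    log = ET.Element("log", {"xes.version": "1.0"})
    for visit in ET.fromstring(piwik_xml).findall("row"):
        trace = ET.SubElement(log, "trace")
        # The visit id becomes the name of the trace.
        ET.SubElement(trace, "string", key="concept:name",
                      value=visit.findtext("idVisit", ""))
        for action in visit.findall("actionDetails/row"):
            event = ET.SubElement(trace, "event")
            # The page URL becomes the name of the event.
            ET.SubElement(event, "string", key="concept:name",
                          value=action.findtext("url", ""))
            stamp = datetime.fromtimestamp(int(action.findtext("timestamp")),
                                           tz=timezone.utc)
            ET.SubElement(event, "date", key="time:timestamp",
                          value=stamp.isoformat())
    return ET.tostring(log, encoding="unicode")

if __name__ == "__main__":
    print(piwik_to_xes(SAMPLE))
```

Using the full URL as the event name gives very fine-grained events; for process mining it will probably be necessary to map URLs to a smaller set of page types first.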

One issue: when is a visit started?

We discovered that when a user logs out from ProF and then logs in again using a different account, Piwik still observes this as a single visit and the second account is taken as the visitor. This is not a problem for our regular students, because they use Surffederatie to log in and this forces them to close the browser when they are finished. For other users we want ProF to force Piwik to start a new visit whenever a user logs in.

All in all, it looks as if Piwik is the correct tool to collect and store usage data from ProF in this project.

Thursday 22 March 2012

Brainstorm meeting, 22 March 2012

On 22 March a brainstorm meeting was organised with all team members. The project team is quite multidisciplinary, which was appreciated by all participants:

Jeroen Donkers, Jean van Berlo: Knowledge engineer, data scientist
Arno Muijtjens: statistician, psychometrics
Danielle Verstegen and Bill Wrigley: educational scientists
Guido Tans: study advisor
Robert Peperkamp, Peter Bex, Eric Sol: developers

(Erik Sol and Peter Bex are hired from CaseBuilders. They developed the user interface of the ProF system and will, during this project, make some changes to the system to collect usage data.)

Expected usage patterns

During the meeting we discussed what usage patterns could exist for students and what factors can influence those. We distinguished the following usage patterns:

Quick orientation (look at the main page and browse through a few details)
Study the profiles (look at the momentary scores for all categories or disciplines at different points in time)
Search for issues (systematically browse through all categories and disciplines)
Look at test-making strategy (look at different score types: correct score, question-mark score)
Look for knowledge development (use the cumulative scores, compare different background populations)

Factors that could influence the use of these patterns are:

What is at stake for the student. (For some students the progress test is a bottleneck at the end of the bachelor and threatens to prevent them from entering the master. Those students will use ProF much more seriously than others.) This factor is important, and can be computed from previous progress test results. The rules, however, differ per university.
The study year of the student
The level of the student (we take the score of the student at the progress test)
The medical school

A complicating factor is that sometimes a student logs on to the system together with a study advisor or mentor. In this case, usage will be quite different from normal usage. It will be difficult to filter out these occasions.

Next to looking at usage patterns during a session, it is interesting to know how often and when students use ProF.

We also discussed that we could look at the usage patterns of staff members.

Finally, we agreed that the usage of the accompanying website at prof.ivtg.nl is worthwhile to investigate, using standard web analysis. The problem then is that this website is public, so we cannot link its usage to individual students.

Technical issues

With the technical subgroup (Robert, Peter, Erik and Jeroen) we discussed how we could log the usage in such a way that these patterns can be studied. For this, we have to know which student views which pages, with which options selected, at what point in time and during which session.

Possible options are to use the web access log, to create a special table in the ProF database, or to use the CAM schema. We decided, however, to look into the open source web analytics tool Piwik. It appears feasible to link the ProF application to Piwik. The advantages are that the log data is kept separate from the ProF application, that Piwik allows for user-defined data (such as the user id we need for linking to progress test data), and that Piwik offers a dashboard and an API that we could use, in the future, to visualize patterns for students and staff. We decided to perform desk research and to discuss the findings in the first week of April.

Future directions

During the meeting we also looked forward to future directions. An important concern is that usage data alone is not enough to know why a student is using ProF in a certain way. Interviews and observations (with think-aloud protocols) with students and study advisers would be needed to find out more about this. Moreover, measuring the effect of using ProF in a certain way requires a longitudinal study. The patterns that we hope to find in this project might help us to set up detailed research projects.

The results of this project could also lead to changes in the ProF system itself. If it appears that some pages are never used, we might decide to remove those. It could also mean that the instructions for using ProF have to be improved.

We also speculated about how results might be presented to users. For now it is very unclear how students themselves could profit, but Guido indicated that he would very much like to see the ProF usage pattern of students that visit him about progress test problems.

Welcome

Welcome to our project blog. ProfAnalytics is a project of Maastricht University funded by the Surf innovation program Learning Analytics. The project runs from March till October 2012.

In this project we relate usage data of the already existing feedback system for progress testing, ProF, to the results that medical students obtain in these progress tests. We want to find out whether a certain type of usage of the system is related to better or worse results in the progress test. This knowledge can lead to advice to students, study advisers and academic organisations.

To be able to achieve this, we have to change the ProF system in such a way that it can collect the required data. Next, data will be collected during a period of two months. During the analysis phase, techniques from data mining and statistics are applied to search for possible links between usage patterns and results. Finally, the patterns and models found will be translated, whenever possible, into practical advice. Follow-up projects will be needed to further investigate how these patterns can be visualized interactively for students and teachers.

The project team consists of: Jeroen Donkers, Arno Muijtjens, Jean van Berlo, Daniëlle Verstegen, Robert Peperkamp, Guido Tans, Bill Wrigley, Eric Sol, and Peter Bex.

A screenshot of the ProF system.