Thursday, 5 April 2012

Some first ideas about Piwik

Piwik is an open-source web analytics tool that we are currently investigating as a way of collecting information about the usage of the ProF system. Peter Bex has set up a test environment with ProF and Piwik, and this morning we discussed the first findings.

Stabilizing the page URLs in ProF

For Piwik to be able to recognize pages, the way in which ProF encodes navigational information in the URL had to be stabilized, so that the same page with the same settings always has the same URL.
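
As an illustration of what "stable" means here, the sketch below normalizes a URL by sorting its query parameters, so that two URLs with the same settings in a different order become identical. This is only one possible approach and not necessarily how ProF does it; the function name canonical_url and the example host are assumptions made for this sketch.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_url(url: str) -> str:
    """Return the URL with its query parameters in sorted order.

    Hypothetical helper: ProF's real URL scheme is not shown in this post.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so that a parameter like "flag=" is not dropped
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    # The fragment is deliberately discarded (last element is empty),
    # because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


# Two orderings of the same settings yield one and the same URL.
first = canonical_url("http://prof.example/page?b=2&a=1")
second = canonical_url("http://prof.example/page?a=1&b=2")
```

With a canonical form like this, Piwik would count both requests as views of a single page.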

Activating Piwik and using custom variables: privacy aspects

We chose to invoke Piwik from within ProF through Piwik's JavaScript API. This API allows custom variables to be attached either to the visitor (e.g. user id, university id, authorisation) or to the page. In our project we do need to know the visitor's identity, because we want to link usage to learning outcomes. However, we should store as little information about the visitor in Piwik as possible to avoid privacy issues. This means that we do not store the public student id in Piwik, but an internal user id which can only be interpreted in connection with the result tables in ProF.
We also decided to mask the visitor's IP address, leaving only the first two numbers visible. We do not need to know the visitor's location, so we should not store it.
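
To make the effect of the masking concrete, here is a minimal sketch of what "only the first two numbers visible" means for an IPv4 address. In practice the masking is a Piwik setting; the function mask_ipv4 below is a hypothetical helper that only illustrates the result.

```python
import ipaddress


def mask_ipv4(address: str, keep_octets: int = 2) -> str:
    """Zero out all but the first keep_octets numbers of an IPv4 address.

    Hypothetical helper for illustration; not part of Piwik or ProF.
    """
    # Validates the input; raises ValueError for a malformed address.
    ip = ipaddress.IPv4Address(address)
    octets = str(ip).split(".")
    return ".".join(octets[:keep_octets] + ["0"] * (4 - keep_octets))


masked = mask_ipv4("192.168.12.34")
```

Here masked is "192.168.0.0": enough to tell visitors apart roughly, but not enough to locate an individual machine.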

Preparing for process mining

It seems that one of the XML reports that Piwik generates (the detailed user report) maps nicely onto the .XES format used by the process mining tool ProM. In XES the information is organised in so-called traces, each containing a chronological list of events. In the Piwik report the information is stored in so-called visits, each containing a chronological list of actions. This means that the Piwik report can be translated one-to-one into a .XES file: each visit becomes a trace and each action becomes an event. We might even be able to add a module to Piwik that generates the .XES files directly, but in this project we first have to find out which information from Piwik is really necessary for our process mining.
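
The visit-to-trace mapping described above can be sketched as follows. The input is a simplified list of dictionaries, not the real Piwik report: the keys visit_id, actions, url and timestamp are assumptions for this sketch, and reading the actual XML report is left out because its element names depend on the report used. The output uses the standard XES attribute keys concept:name and time:timestamp.

```python
import xml.etree.ElementTree as ET


def visits_to_xes(visits):
    """Translate visits (each with a chronological action list) into XES.

    The input structure is hypothetical; only the XES side follows the
    published format (log > trace > event, with typed attributes).
    """
    log = ET.Element("log", {"xes.version": "1.0"})
    for visit in visits:
        # One Piwik visit becomes one XES trace.
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", key="concept:name", value=str(visit["visit_id"]))
        for action in visit["actions"]:
            # One Piwik action becomes one XES event, in the same order.
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", key="concept:name", value=action["url"])
            ET.SubElement(event, "date", key="time:timestamp", value=action["timestamp"])
    return ET.tostring(log, encoding="unicode")


example_visits = [
    {
        "visit_id": 1,
        "actions": [
            {"url": "http://prof.example/start", "timestamp": "2012-04-05T09:00:00+02:00"},
            {"url": "http://prof.example/results", "timestamp": "2012-04-05T09:03:10+02:00"},
        ],
    }
]
xes_text = visits_to_xes(example_visits)
```

Which further attributes (for example the internal user id) should be copied into each trace depends on what the process mining turns out to need.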

One issue: when is a visit started?

We discovered that when a user logs out of ProF and then logs in again using a different account, Piwik still treats this as a single visit, and the second account is recorded as the visitor. This is not a problem for our regular students, because they use Surffederatie to log in, and this forces them to close the browser when they are finished. For other users we want ProF to force Piwik to start a new visit whenever a user logs in.
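
One possible way to force a new visit is sketched below. It builds a request for Piwik's HTTP tracking endpoint (piwik.php) and sets the new_visit parameter. This is not what our test setup does today: we call Piwik through the JavaScript API, and whether new_visit is supported depends on the Piwik version in use, so treat this as an assumption to verify. The function name and the example hosts are hypothetical.

```python
from urllib.parse import urlencode


def login_tracking_url(piwik_base: str, site_id: int, page_url: str) -> str:
    """Build a tracking request that asks Piwik to open a new visit.

    Hypothetical helper; assumes the installed Piwik honours new_visit=1.
    """
    params = {
        "idsite": site_id,   # id of the tracked site in Piwik
        "rec": 1,            # required for the request to be recorded
        "url": page_url,     # page shown right after login
        "new_visit": 1,      # ask Piwik to start a fresh visit
    }
    return f"{piwik_base}/piwik.php?{urlencode(params)}"


tracking_url = login_tracking_url("http://piwik.example", 1, "http://prof.example/start")
```

ProF would issue such a request once, directly after a successful login, and continue with normal tracking afterwards.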

All in all, it looks as if Piwik is the right tool to collect and store usage data from ProF in this project.