
Bug 5236 - Update OJS "Visits and Usage" statistics tracking and reporting
Product: OJS
Classification: Unclassified
Component: General
Version: To be determined
Hardware: PC Mac OS X 10.3
Importance: P5 enhancement
Assigned To: PKP Support
Depends on:
Reported: 2010-03-18 14:33 PDT by James MacGregor
Modified: 2013-05-29 16:19 PDT
CC List: 5 users

See Also:
Version Reported In:
Also Affects:


Description James MacGregor 2010-03-18 14:33:32 PDT
The following bug report is the product of the statistics work Andrea and I did for the Synergies Statistics Working Group [1], and of conversations with Colin Prince, Juan Pablo, Alec, Sioux at INASP, and many others. I will be sending the URL for this report to interested groups for further comment. Thanks to everyone who has commented to date, and to everyone who has remained patient while this report was being developed.

OJS needs a more comprehensive system for collecting, analysing, and reporting statistics on site visits and article usage. I see the issue as involving three sub-issues: information storage (what is stored and where); information analysis/metrics (how the information is used); and information representation (how the information is displayed).

1. Information Storage

OJS should store the following information at a minimum: 

- Visitor IP address/geographic location;
- text of search queries (both internal, using OJS's search function, and external, e.g. from Google);
- number of abstract/metadata (record) views;
- number of full record (galley) views.

Some other possibly valuable information, off the top of my head: 
- is the visitor logged in;
- is the visitor a subscriber (both institutional and individual);
- session length.

And there's probably more. Comments here are especially welcome. 

This information should be fully sortable, i.e. it should be possible to say that article X is popular in region Y, or that search terms a, b, and c most often led to full-text views.
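To make the "fully sortable" requirement concrete, here is a minimal Python sketch, assuming each tracked request is stored as one record carrying the fields listed above. The `UsageEvent` class, its field names, and the `count_by` helper are hypothetical illustrations, not an existing OJS schema or API. Grouping records by any combination of fields answers questions like "which articles are popular in region Y" or "which search terms led to full-text views":

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical record for one tracked request; field names are illustrative only.
@dataclass(frozen=True)
class UsageEvent:
    ip_address: str
    region: str                   # geographic location derived from the IP (e.g. country code)
    article_id: int
    view_type: str                # "abstract" (record view) or "galley" (full-text view)
    search_query: Optional[str]   # internal or external search terms, if known
    logged_in: bool = False
    subscriber: bool = False


def count_by(events: Iterable[UsageEvent], *fields: str,
             view_type: Optional[str] = None) -> Counter:
    """Count events grouped by the named fields, optionally limited to one view type."""
    counts: Counter = Counter()
    for event in events:
        if view_type is not None and event.view_type != view_type:
            continue
        key = tuple(getattr(event, name) for name in fields)
        counts[key] += 1
    return counts


events = [
    UsageEvent("192.0.2.1", "CA", 7, "galley", "open access"),
    UsageEvent("192.0.2.2", "CA", 7, "abstract", None),
    UsageEvent("198.51.100.5", "KE", 7, "galley", "open access"),
    UsageEvent("198.51.100.6", "KE", 9, "galley", "citation metrics"),
]

# Full-text views of each article per region.
print(count_by(events, "article_id", "region", view_type="galley").most_common())
# Search terms that led to full-text views.
print(count_by(events, "search_query", view_type="galley").most_common())
```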

There has been a lot of discussion about how OJS should store or otherwise track this information. The main suggestions so far include:

a) querying server logs (if they exist) to retrieve relevant information:
- this may not be a good option for users who don't have easy access to their server logs (e.g. GoDaddy customers);
- this may be resource-intensive if we need to run large queries over long date spans, unless we occasionally store compiled results, which isn't all that different from b) or c).
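As an illustration of option a), the sketch below pulls the relevant fields out of one web server log line. It assumes the Apache/NCSA "combined" log format and assumes article views appear at paths containing `/article/view/<id>`; both depend on the server and OJS configuration, and `parse_line` is a hypothetical helper. External search terms can only be recovered from the referrer when the search engine passes them (here, a Google-style `q=` parameter):

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Apache/NCSA "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Assumed OJS-style article URL; adjust to the installation's URL scheme.
ARTICLE_PATH = re.compile(r"/article/view/(\d+)")


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one successful article request, or None otherwise."""
    match = COMBINED.match(line)
    if match is None or match["status"] != "200":
        return None
    article = ARTICLE_PATH.search(match["path"])
    if article is None:
        return None
    # External search terms, if the referring search engine passed a q= parameter.
    terms = parse_qs(urlparse(match["referer"]).query).get("q", [None])[0]
    return {
        "ip": match["ip"],
        "time": match["time"],
        "article_id": int(article.group(1)),
        "search_query": terms,
        "user_agent": match["agent"],
    }


line = ('203.0.113.9 - - [18/Mar/2010:14:33:32 -0700] '
        '"GET /index.php/jrnl/article/view/42 HTTP/1.1" 200 5120 '
        '"http://www.google.com/search?q=open+access+statistics" "Mozilla/5.0"')
print(parse_line(line))
```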

b) creating an OJS log of its own to write to:
- we've already done this with the COUNTER log, and ran into size problems. At the very least, we'd have to provide documentation/support for log file rotation/management.
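For reference, size-capped rotation of this kind is a standard facility in many environments. The sketch below uses Python's standard `logging.handlers.RotatingFileHandler` purely to illustrate the behaviour (roll the file over at a size limit and keep a fixed number of archives); the `usage.log` name, the tab-separated record layout, and the limits are assumptions, and OJS would need its own equivalent or an external tool such as logrotate:

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_usage_logger(path: str, max_bytes: int = 1_000_000, backups: int = 5) -> logging.Logger:
    """Logger that rolls `path` over to path.1 ... path.N once it would exceed max_bytes."""
    logger = logging.getLogger("usage")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep usage records out of the application's main log
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups,
                                  encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s\t%(message)s"))
    logger.addHandler(handler)
    return logger


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "usage.log")
logger = make_usage_logger(log_path, max_bytes=200, backups=2)
for i in range(50):
    logger.info("203.0.113.9\tarticle/view/%d\tgalley", i)
print(sorted(os.listdir(log_dir)))  # ['usage.log', 'usage.log.1', 'usage.log.2']
```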

c) storing the information in the database:
- I don't know how resource-intensive this would be compared to the log file. Colin suggested we might be able to keep the information in monthly "buckets" and rotate them, since this report only needs to be date-limited by month (see below).
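Colin's bucket idea can be sketched as follows, assuming a hypothetical store keyed by calendar month ("YYYY-MM") with a fixed retention window. `MonthlyBuckets` and `retain_months` are illustrative names; in the database each bucket would more likely be a set of rows or a table per month, but the rotation logic is the same:

```python
from collections import Counter
from datetime import date
from typing import Dict


class MonthlyBuckets:
    """Per-month view counters keyed by "YYYY-MM"; only the newest months are retained."""

    def __init__(self, retain_months: int = 24) -> None:
        self.retain_months = retain_months
        # month ("YYYY-MM") -> Counter mapping (article_id, view_type) -> views
        self.buckets: Dict[str, Counter] = {}

    def record(self, day: date, article_id: int, view_type: str) -> None:
        month = day.strftime("%Y-%m")
        self.buckets.setdefault(month, Counter())[(article_id, view_type)] += 1
        # "YYYY-MM" strings sort chronologically, so the oldest bucket is the smallest key.
        # Once the window is full, a view older than every retained month is dropped here.
        while len(self.buckets) > self.retain_months:
            del self.buckets[min(self.buckets)]


store = MonthlyBuckets(retain_months=2)
store.record(date(2010, 1, 5), 7, "galley")
store.record(date(2010, 2, 9), 7, "galley")
store.record(date(2010, 2, 11), 7, "abstract")
store.record(date(2010, 3, 1), 9, "galley")
print(sorted(store.buckets))  # ['2010-02', '2010-03']
```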

Further notes on storage: 

- this aspect of the system must be extensible so that other statistical information can be tracked if and when necessary.
- provisions should be made to filter out extraneous (bot, spam, etc.) traffic.
- incidentally, COUNTER stats could probably eventually be culled from this pool, as could the article view counts shown in various parts of the site.
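One common way to handle extraneous traffic, and the approach COUNTER takes, is to discard requests from user agents on a list of known robots and to collapse rapid repeat clicks. The sketch below assumes a short, hypothetical pattern list (a real deployment would load a maintained list, such as the one COUNTER publishes) and an assumed 30-second window; its repeat-click rule counts the first request and skips repeats by the same IP on the same article within the window of the previous click, which is a simplification and not COUNTER's exact double-click rule:

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical sample patterns; a real deployment would load a maintained robot list.
ROBOT_PATTERNS = re.compile(r"bot|crawler|spider|slurp|wget|curl", re.IGNORECASE)

Request = Tuple[str, str, int, datetime]  # (ip, user_agent, article_id, timestamp)


def filter_requests(requests: Iterable[Request], window_seconds: int = 30) -> Iterator[Request]:
    """Drop robot traffic and repeat clicks by one IP on one article within the window."""
    last_seen: Dict[Tuple[str, int], datetime] = {}
    window = timedelta(seconds=window_seconds)
    for ip, agent, article_id, when in sorted(requests, key=lambda r: r[3]):
        if ROBOT_PATTERNS.search(agent):
            continue
        previous = last_seen.get((ip, article_id))
        last_seen[(ip, article_id)] = when
        if previous is not None and when - previous < window:
            continue  # treat as a repeat click on the same article
        yield (ip, agent, article_id, when)


t0 = datetime(2010, 3, 18, 14, 0, 0)
requests = [
    ("203.0.113.9", "Mozilla/5.0", 42, t0),
    ("203.0.113.9", "Mozilla/5.0", 42, t0 + timedelta(seconds=10)),
    ("203.0.113.9", "Mozilla/5.0", 42, t0 + timedelta(minutes=5)),
    ("198.51.100.7", "Googlebot/2.1", 42, t0),
]
kept = list(filter_requests(requests))
print(len(kept))  # 2
```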

2. Information Analysis/Metrics

Ideally, the Journal Manager should be able to sort and represent information by type against type; that is, sort by date/time, by article, by IP address, etc.

This information should be available in monthly and yearly time blocks: Andrea's and my survey findings indicate that Journal Managers are primarily interested in monthly stats and almost entirely uninterested in daily ones. Those who want daily reporting can always use AWStats, Google Analytics, etc.
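Because only monthly and yearly granularity is needed, it may be enough to store counts per month and derive yearly figures by summing them. A minimal sketch, assuming hypothetical monthly counts keyed by ("YYYY-MM", article ID); `yearly_totals` and `top_articles` are illustrative helpers, not existing OJS functions:

```python
from collections import Counter
from typing import Dict, List, Tuple

MonthlyCounts = Dict[Tuple[str, int], int]  # ("YYYY-MM", article_id) -> views


def yearly_totals(monthly: MonthlyCounts) -> Counter:
    """Roll monthly per-article counts up into ("YYYY", article_id) totals."""
    totals: Counter = Counter()
    for (month, article_id), views in monthly.items():
        totals[(month[:4], article_id)] += views
    return totals


def top_articles(monthly: MonthlyCounts, period: str, limit: int = 10) -> List[Tuple[int, int]]:
    """Most-viewed (article_id, views) pairs for a month ("YYYY-MM") or a year ("YYYY")."""
    source = monthly if len(period) == 7 else yearly_totals(monthly)
    rows = [(article_id, views) for (p, article_id), views in source.items() if p == period]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


monthly = {("2010-01", 7): 120, ("2010-02", 7): 95, ("2010-02", 9): 310, ("2011-01", 7): 40}
print(top_articles(monthly, "2010-02"))  # [(9, 310), (7, 95)]
print(top_articles(monthly, "2010"))     # [(9, 310), (7, 215)]
```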

3. Information Representation

This information should be available in CSV format at the very least. In order of preference, other formats should include: 

- XML (integrated with SUSHI)

Again, reports should be available on a monthly and/or yearly basis. 
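Producing the CSV itself needs little more than a standard library. The sketch below assumes a hypothetical column layout (period, article ID, metric, count); the real columns would follow whatever the storage layer ends up tracking. XML/SUSHI output is not sketched, since SUSHI and COUNTER define their own XML schemas:

```python
import csv
import io
from typing import Iterable, Tuple

Row = Tuple[str, int, str, int]  # (period, article_id, metric, count)


def report_csv(rows: Iterable[Row]) -> str:
    """Render report rows as CSV text with a header line, sorted by period then article."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["period", "article_id", "metric", "count"])
    for row in sorted(rows):
        writer.writerow(row)
    return buffer.getvalue()


print(report_csv([("2010-02", 9, "galley", 310), ("2010-02", 7, "abstract", 95)]))
```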

This is a big topic, and I myself will probably have more to add. Further comments welcome!

[1] Executive Summary of WG: https://secur.erudit.org/confluence/download/attachments/425999/statisticsExecutiveSummary-1.0.2.doc?version=1&modificationDate=1264207866000
Comment 1 ushasharma84 2010-11-19 02:20:48 PST
We also want to track the kinds of users who are downloading our journals. We want to capture their biography and country, so it would be great if that kind of reporting were possible.
Comment 2 Yan Han 2012-05-10 11:06:46 PDT
The University of Arizona would also like to have this feature.

Over the past few years, we have had subscribing institutions/libraries/users request their usage statistics. This feature is important for subscribing institutions/libraries to evaluate use and ROI.
Comment 3 Alec Smecher 2013-05-29 16:19:10 PDT
Superseded by stats work for OJS 3.0.