Bug 6835 - published_articles 'views' count inconsistent
Status: RESOLVED WONTFIX
Product: OJS
Classification: Unclassified
Component: General
Version: 2.4.x
Platform: All All
Importance: P3 normal
Assigned To: PKP Support
Depends on:
Blocks:
Reported: 2011-08-22 12:03 PDT by James MacGregor
Modified: 2014-02-12 08:56 PST (History)

Description James MacGregor 2011-08-22 12:03:53 PDT
It appears that the views count in the published_articles table is updated inconsistently. In some cases in my local install, I'm seeing a single galley view add two increments to an article view count, and in other cases I'm not seeing any increment at all. 

Also, there doesn't seem to be any relation between what you see in published_articles.views and article_galleys.views, even taking into consideration that published_articles.views *should* include abstract counts, while article_galleys.views *doesn't* include abstract counts and separates views distinctly by galley type. 

I've been doing a bit of testing on nepjol for this. Here are the results of two queries: 

SELECT article_id, views FROM article_galleys ORDER BY views DESC LIMIT 10;

1362	78926
290	18900
2088	17312
247	12889
34	12222
484	11277
491	9960
2693	9684
37	9676
43	9562

SELECT article_id, views FROM published_articles ORDER BY views DESC LIMIT 10;

4100	19830
4099	9718
1029	6657
1031	5164
159	3972
1363	3426
596	3388
1034	2163
1618	2156
34	2059

Note a few things: 1) Nepjol only publishes PDFs, so you don't have to worry about adding different galley view counts together to approximate the published article view count; 2) even though IIRC the published article view count should include abstract views, in this case the view counts are actually LOWER across the board for published_articles; 3) the article_ids don't match up, making me think that one or the other view count algorithm is markedly inconsistent, not just incorrect -- if it were merely incorrect (e.g. inflated or deflated one way or another), you'd probably see a similar article_id order.

From what little testing I've done, my suspicion is that the article_galleys.views counts are correct, but that something's borked with the published_articles.views count algorithm. I can provide a DB dump if need be; alternatively, nepjol is probably a good place to take a look.
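
For a comparison that doesn't rely on eyeballing two separate top-10 lists, both counters can be lined up per article. What follows is only a rough sketch, not anything taken from OJS itself: it uses Python's sqlite3 module with a cut-down, hypothetical schema (only the columns used in the queries above, plus a galley_id key), and the rows are invented apart from article 34, whose two counts come from the results above. The helper names build_sample_db and compare_view_counts are made up for illustration. Against a real install, the same SELECT would be run on the OJS database instead.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE published_articles (
            article_id INTEGER PRIMARY KEY,
            views      INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE article_galleys (
            galley_id  INTEGER PRIMARY KEY,
            article_id INTEGER NOT NULL,
            views      INTEGER NOT NULL DEFAULT 0
        );
        -- Article 34 uses the counts quoted in this report; the rest is made up.
        INSERT INTO published_articles (article_id, views)
            VALUES (34, 2059), (101, 50), (102, 7);
        INSERT INTO article_galleys (galley_id, article_id, views)
            VALUES (1, 34, 12222), (2, 101, 30), (3, 101, 5);
        """
    )
    return conn


def compare_view_counts(conn):
    """Return (article_id, article_views, galley_views) per published article.

    Galley views are summed per article; articles without any galley row
    are kept (LEFT JOIN) and reported with a galley total of 0.
    """
    query = """
        SELECT pa.article_id,
               pa.views                  AS article_views,
               COALESCE(SUM(ag.views), 0) AS galley_views
        FROM published_articles pa
        LEFT JOIN article_galleys ag ON ag.article_id = pa.article_id
        GROUP BY pa.article_id, pa.views
        ORDER BY galley_views DESC, pa.article_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in compare_view_counts(build_sample_db()):
        print(row)
```

With both numbers on one row per article, it is easier to tell a systematic offset (for example, doubled counts) from counters that simply measure different things.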
Comment 1 Alec Smecher 2011-08-22 12:54:05 PDT
James, I think NepJOL is going to be too confusing a data source for chasing down this bug. The install was updated from an older version, and IIRC there was a problem in OJS < 2.3.3-2 with counts being recorded twice; that bug will still be reflected in view counts. I also wouldn't expect any correlation between high numbers of abstract views and high numbers of PDF views; a high search engine ranking of one over the other would explain widely diverging numbers, and therefore the different order of article IDs you're getting from your queries (since you're sorting by view counts).

If possible, I'd suggest doing some testing with a less convoluted source of data to make sure you're not too bogged down in legacy stuff.
Comment 2 James MacGregor 2011-08-23 22:09:25 PDT
(In reply to comment #1)
> James, I think NepJOL is going to be too confusing a data source for chasing
> down this bug. The install was updated from an older version, and IIRC there
> was a problem in OJS < 2.3.3-2 with counts being recorded twice; that bug will
> still be reflected in view counts. I also wouldn't expect any correlation
> between high numbers of abstract views and high numbers of PDF views; a high
> search engine ranking of one over the other would explain widely diverging
> numbers, and therefore the different order of article IDs you're getting from
> your queries (since you're sorting by view counts).
> 
> If possible, I'd suggest doing some testing with a less convoluted source of
> data to make sure you're not too bogged down in legacy stuff.

Points taken, Alec -- I just tried from a totally clean install, using an imported issue (from our demo journal) as data. It appears that the public_galleys.views field is only incremented when an abstract is viewed -- before I continue testing any further, could you maybe give me a pointer as to where this behaviour is controlled in the code, and possibly whether the public_galleys.views count is meant to only count abstract views (although with a pointer to the code I should be able to figure that out)?
Comment 3 Alec Smecher 2011-08-23 22:10:31 PDT
...public_galleys?
Comment 4 James MacGregor 2011-08-23 22:25:57 PDT
(In reply to comment #3)
> ...public_galleys?

What the? Yeah, I meant published_articles.
Comment 5 Alec Smecher 2011-08-23 22:27:35 PDT
That's right, published_articles.views is only supposed to count abstract views.
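
To confirm this on a clean install, one option is to snapshot both counters, load a single page, and see which rows moved. The sketch below assumes the same simplified, hypothetical schema as the queries earlier in this report and runs against an in-memory sqlite3 database; the helper names (snapshot, changed, make_test_db) are invented for illustration, and the UPDATE statement is only a stand-in for actually loading an abstract page in a browser.

```python
import sqlite3

# One query per counter table, each returning (row id, views).
COUNTER_QUERIES = {
    "published_articles": "SELECT article_id, views FROM published_articles",
    "article_galleys": "SELECT galley_id, views FROM article_galleys",
}


def snapshot(conn):
    """Return {(table, row_id): views} for every counter row."""
    counts = {}
    for table, query in COUNTER_QUERIES.items():
        for row_id, views in conn.execute(query):
            counts[(table, row_id)] = views
    return counts


def changed(before, after):
    """Return {(table, row_id): delta} for counters that differ between snapshots."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }


def make_test_db():
    """Build a tiny hypothetical database: one article with one galley."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE published_articles (
            article_id INTEGER PRIMARY KEY,
            views      INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE article_galleys (
            galley_id  INTEGER PRIMARY KEY,
            article_id INTEGER NOT NULL,
            views      INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO published_articles (article_id) VALUES (1);
        INSERT INTO article_galleys (galley_id, article_id) VALUES (10, 1);
        """
    )
    return conn


if __name__ == "__main__":
    conn = make_test_db()
    before = snapshot(conn)
    # Stand-in for viewing the abstract page once in a real install.
    conn.execute(
        "UPDATE published_articles SET views = views + 1 WHERE article_id = 1"
    )
    print(changed(before, snapshot(conn)))
```

Repeating the same before/after check around a single galley view should show whether that action touches published_articles.views at all, and whether any counter moves by more than one.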
Comment 6 Alec Smecher 2012-03-12 12:55:40 PDT
Needs discussion.
Comment 7 Alec Smecher 2014-02-12 08:56:34 PST
Obsolete -- replaced by the stats overhaul.