Bug 7166 - Investigate web cache file name generation algorithm
Status: RESOLVED FIXED
Product: OJS
Classification: Unclassified
Component: General
Version: 2.4.2
Hardware: All All
Importance: P3 normal
Assigned To: Alec Smecher
Depends on:
Blocks:
Reported: 2012-02-21 17:10 PST by Matthew Crider
Modified: 2013-01-02 13:45 PST

Description Matthew Crider 2012-02-21 17:10:58 PST
It seems like the method used to generate web cache files is not working properly -- one journal using the feature has generated over 250k cache files.
Comment 1 Matthew Crider 2012-02-23 11:27:22 PST
From what I can tell, the caching is working fine and can't be optimized any further (but I'm hardly an expert with this code).  The only problem I see is that expired cache files just hang around until they are regenerated, which may never happen.  This might be a good candidate for a scheduled task (periodic cleanup of cache files older than web_cache_hours).  Though I think only megajournals (like the one I was working with) will really see major disk usage from web caching.
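A minimal sketch of the periodic cleanup suggested above, assuming the cache files sit directly in one directory and use the wc-*.html naming that appears later in this thread. The function name purge_web_cache and its two arguments are hypothetical; the age limit is passed in by hand rather than read from web_cache_hours, and -maxdepth and -mmin are GNU/BSD find extensions rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: delete web cache files older than a given number of hours.
#   $1 = cache directory
#   $2 = maximum age in hours (set by hand to mirror web_cache_hours)
purge_web_cache() {
    cache_dir=$1
    max_hours=$2
    # -mmin works in minutes, so convert the hour limit before comparing.
    find "$cache_dir" -maxdepth 1 -type f -name 'wc-*.html' \
        -mmin +"$((max_hours * 60))" -exec rm -f {} +
}
```

Called as purge_web_cache ~/ojs/cache 24 from a scheduled job, this would leave other files in the cache directory untouched.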
Comment 2 Alec Smecher 2012-02-23 11:46:05 PST
Even for a megajournal, it does seem like there were an unreasonable number of files there.

In any case, I think a periodic cleanup as you describe would be helpful.
Comment 3 Alec Smecher 2013-01-02 12:01:30 PST
(Started logging the bd install in /tmp/wc.log; see lib/pkp/classes/core/PKPPageRouter.inc.php for an error_log statement. Will check back after some time has elapsed to log weird boundary cases. Suspect there are 404s getting logged or something similar that causes the potential URL space to be limitless.)
Comment 4 Alec Smecher 2013-01-02 13:30:02 PST
Added note for web_cache CRON setup
https://github.com/pkp/ojs/commit/76de03f284fb1fb3dd69282e1ecaf840e0ad629f
Comment 5 Alec Smecher 2013-01-02 13:30:02 PST
Added note for web_cache CRON setup
https://github.com/pkp/ojs/commit/8858f0686a6bef69d938ce3a7b3fcf2ef34794ed
Comment 6 Alec Smecher 2013-01-02 13:40:30 PST
This looks to be working OK to me too. For journals that use the web cache, I'm adding the following line to their crontabs:

@daily find ~/ojs/cache -maxdepth 1 -name wc-\*.html -mtime +1 -exec rm "{}" ";"

This will remove files older than 24 hours on a daily basis and solve the current cache maintenance headaches.

I've installed this in all accounts using web_cache on lib-journals[x].

I also added a note in config.TEMPLATE.inc.php to help others with this config.
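One detail worth checking before relying on the crontab line above: find ignores fractional days when evaluating -mtime, so -mtime +1 only matches files at least two full days old. The sketch below is a dry run that lists, without deleting, the files a strict 24-hour cutoff would catch. The function name list_stale_cache is hypothetical, and -mmin is a GNU/BSD find extension.

```shell
#!/bin/sh
# Hypothetical dry run: print wc-*.html files in directory $1 that are more
# than 24 hours (1440 minutes) old. Nothing is removed.
list_stale_cache() {
    find "$1" -maxdepth 1 -type f -name 'wc-*.html' -mmin +1440 -print
}
```

Running list_stale_cache ~/ojs/cache next to the same find with -mtime +1 shows which files fall into the gap between one and two days old.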
Comment 10 Alec Smecher 2013-01-02 13:45:02 PST
Added note for web_cache CRON setup
https://github.com/pkp/omp/commit/ae699c0b9df3f476b89a2a847e89a7b51d0bf8d8