Bug 7531 - Add search path disallow to robots.txt
Status: NEW
Product: OJS
Classification: Unclassified
Component: General
Version: 2.4.x
Hardware: All All
Importance: P3 normal
Assigned To: PKP Support
Depends on:
Blocks:
Reported: 2012-05-31 13:14 PDT by James MacGregor
Modified: 2012-08-20 11:46 PDT

Description James MacGregor 2012-05-31 13:14:15 PDT
Disallow Google et al. from crawling search pages; this should reduce unnecessary server load.
Comment 1 James MacGregor 2012-08-17 13:37:21 PDT
Opening up to the group as I'm not entirely sure what the best approach is here. 

From Jason (email shared with pkp-support on July 30 2012): 

Typically, with Disallow, a pattern that begins with a slash is anchored to the beginning of the URL path, so /search only matches paths that start with /search. Wildcards are not part of the official robots.txt specification, but some crawlers (including Google, MSN, and Yahoo) do support them. Most sites use URL rewriting to handle the other edge cases.
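
To illustrate the prefix anchoring described above, here is a minimal sketch using Python's standard-library urllib.robotparser, which implements plain prefix matching without wildcard support. The journal path /index.php/myjournal/ is a hypothetical example of an install-specific URL, not something taken from this report.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt with a single prefix rule, as discussed above.
RULES = """\
User-agent: *
Disallow: /search
"""

def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# The rule is anchored to the start of the path, so this is blocked:
print(is_allowed(RULES, "/search/results"))                      # False

# A search page nested under a journal path (hypothetical) is NOT blocked,
# because its path does not start with /search:
print(is_allowed(RULES, "/index.php/myjournal/search/results"))  # True
```

Under these assumptions, a bare "Disallow: /search" would miss any search URL that sits below a journal-specific prefix unless the crawler honours wildcards or the site rewrites its URLs.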
Comment 2 James MacGregor 2012-08-17 13:38:06 PDT
Also reassigning to 2.4.0, though not against seeing it deferred (esp. if it needs testing).
Comment 3 Alec Smecher 2012-08-20 11:46:45 PDT
Unfortunately we can't use robots.txt to disallow search pages because the URLs are dynamic, i.e. we can't know in advance what they'll be in order to ship them. I suggest looking into a header-based alternative, if it's possible, or making sure search functions require a POST request rather than URL parameters. In any case, deferring until there's more time or this becomes a higher priority.
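
One possible shape for the header-based alternative is the X-Robots-Tag response header, which Google and some other crawlers honour. The sketch below is only an illustration in Python, not OJS code: the function name robots_headers and the assumption that search pages contain a "search" path segment are both hypothetical. Note also that a crawler still has to request the page to see the header, so this limits indexing rather than crawl load by itself.

```python
def robots_headers(path: str) -> dict:
    """Return extra response headers for a request path (hypothetical helper).

    Assumes search pages are identified by a path segment named "search",
    e.g. /index.php/myjournal/search/results (an illustrative URL).
    """
    # Drop any query string, then split the path into non-empty segments.
    segments = [s for s in path.split("?", 1)[0].split("/") if s]
    if "search" in segments:
        # Ask compliant crawlers not to index the page or follow its links.
        return {"X-Robots-Tag": "noindex, nofollow"}
    return {}

print(robots_headers("/index.php/myjournal/search/results?query=x"))
print(robots_headers("/index.php/myjournal/article/view/1"))
```

Because the check works on path segments rather than a fixed prefix, it does not depend on knowing the journal path in advance, which is the limitation noted for robots.txt.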