PKP Bugzilla – Bug 7531
Add search path disallow to robots.txt
Last modified: 2012-08-20 11:46:45 PDT
Disallow Google et al. from crawling search pages; this should minimize unnecessary load.
Opening up to the group as I'm not entirely sure what the best approach is here.
From Jason (email shared with pkp-support on July 30 2012):
Typically, with Disallow, URLs that begin with a slash anchor the pattern to the beginning of the URL, so /search would mean that the path would need to start with that. Wildcards are not part of the official robots.txt specification, but some crawlers (including Google, MSN, and Yahoo) do support them. Most sites use URL rewriting to handle the other edge cases.
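As an illustration of the two pattern styles described above (the paths here are hypothetical — actual OJS search URLs vary by installation, which is the problem noted below):

```
User-agent: *
# Anchored rule (official spec): matches URLs whose path begins with /search
Disallow: /search
# Wildcard rule (Google/MSN/Yahoo extension, not in the official spec):
# matches a search path nested under any journal prefix
Disallow: /*/search
```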
Also reassigning to 2.4.0, though not against seeing it deferred (esp. if it needs testing).
Unfortunately, we can't use robots.txt to disallow search pages because the URLs are dynamic, i.e. we can't know in advance what they'll be in order to ship them. I suggest looking into a header-based alternative, if possible, or requiring search functions to use POST requests rather than URL parameters. In any case, deferring until there's more time or this becomes a higher priority.
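For reference, one header-based option (not confirmed as the approach chosen here) is the X-Robots-Tag response header, which Google and other major crawlers honor and which works for dynamically generated URLs since it is sent with the response itself:

```
X-Robots-Tag: noindex, nofollow
```

The equivalent directive can also be emitted in page markup as <meta name="robots" content="noindex, nofollow">, which could be added to the search result templates without enumerating URLs in advance.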