PKP Bugzilla – Bug 3838
XML Sitemap generator
Last modified: 2009-05-22 09:45:01 PDT
I would like to see a plugin that generates XML sitemaps complying with the Google Sitemap protocol (which, I think, is the same one used by Yahoo! and some other search engines). This would be a nice SEO enhancement.
I know that in WordPress there's an excellent plugin to do this: http://wordpress.org/extend/plugins/google-sitemap-generator/
Maybe it could be a starting point in terms of code and design objectives.
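For reference, here is a minimal sketch of the document such a plugin would emit. It is written in Python purely for illustration (OJS itself is PHP). It assumes the sitemaps.org protocol, version 0.9, in which each <url> needs a <loc> while <lastmod>, <changefreq> and <priority> are optional. `build_sitemap` and the entry-dict format are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol, version 0.9 (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return a <urlset> document for a list of dicts such as {"loc": "..."}."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        if "loc" not in entry:
            raise ValueError("every sitemap entry needs a 'loc'")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # Optional tags are emitted only when the caller supplies them.
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree escapes "&" and "<" in text, as the protocol requires.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://example.org/index.php/tj/issue/archive"},
        {"loc": "http://example.org/index.php/tj/about", "changefreq": "monthly"},
    ]))
```

Leaving out the optional tags keeps the output valid; search engines treat them as hints only.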
Created attachment 1143
Patch against OJS pre-2.3 CVS
-This might be amenable to a scheduled task, so that the sitemap can be updated regularly.
-Only fills in the <loc> tag, not the others (which are optional or subjective).
-Scans all of the pages listed on the About This Journal->Sitemap page, with higher granularity on the About page, and also includes all published issues and articles, along with their abstract and galley view pages.
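The page list described above could be assembled roughly as follows. This is a sketch under stated assumptions: the `Issue`/`Article` classes, `STATIC_PAGES` and `collect_urls` are hypothetical stand-ins for OJS's own data access code, and the `issue/view/<id>` and `article/view/<article>/<galley>` URL shapes are modeled on OJS 2 conventions rather than taken from the patch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    article_id: int
    galley_ids: List[int] = field(default_factory=list)


@dataclass
class Issue:
    issue_id: int
    articles: List[Article] = field(default_factory=list)


# Hypothetical static pages, standing in for the About This Journal->Sitemap links.
STATIC_PAGES = ["index", "about", "about/editorialPolicies", "about/submissions", "issue/archive"]


def collect_urls(base_url, journal_path, published_issues):
    """Return every URL the sitemap should list (only <loc> values), in order."""
    root = f"{base_url.rstrip('/')}/index.php/{journal_path}"
    urls = [f"{root}/{page}" for page in STATIC_PAGES]
    for issue in published_issues:
        urls.append(f"{root}/issue/view/{issue.issue_id}")
        for article in issue.articles:
            # Abstract page first, then one view page per galley (PDF, HTML, ...).
            urls.append(f"{root}/article/view/{article.article_id}")
            urls.extend(
                f"{root}/article/view/{article.article_id}/{galley_id}"
                for galley_id in article.galley_ids
            )
    return urls
```

Only published issues should be passed in, so unpublished content never leaks into the sitemap.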
This needs to be backported to OCS and Harvester (?) once it's approved.
Matt, I don't think this will get used if it requires manual intervention. I'd suggest shipping OJS with a sitemap index that refers to a dynamically generated sitemap (i.e., one generated on the fly by OJS). There's no need for it to be a plug-in.
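Here is a sketch of the sitemap-index idea. It assumes the index and each per-journal sitemap are rendered on request rather than written to disk. The `index.php/<journal>/sitemap` URL shape follows the example URLs posted later in this thread. `build_sitemap_index` is a hypothetical helper, shown in Python for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, journal_paths):
    """Return a <sitemapindex> that points at one sitemap per journal."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    root = base_url.rstrip("/")
    for path in journal_paths:
        entry = ET.SubElement(index, "sitemap")
        # Each child sitemap is a dynamic page, so there is no file to maintain.
        ET.SubElement(entry, "loc").text = f"{root}/index.php/{path}/sitemap"
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap_index("http://example.org/ojs", ["tj"]))
```

Because both levels are generated per request, newly published issues show up without anyone rerunning a script.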
OK, but what if people want to opt out of having a sitemap? Would they just delete the index file from OJS's root directory?
The site map under the "about" link cannot be disabled without modifying OJS, and I can't think of a good reason to add an option to disable it, so I'd suggest the same for this one.
My only concern is that the automatically created sitemap will be crawled even when the user doesn't want crawlers to use it (preferring that the site be crawled the old-fashioned way), or wants to customize their sitemap (e.g., by adding nodes that OJS won't produce), in which case the automatically generated sitemap would override their own.
What if this is set as an option in the Journal setup wizard?
Maybe in the Management section, just as a single checkbox: "Generate XML Sitemap?"
Agreed. Or just make it a setting in config.inc.php, with a default of "Yes".
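If the switch lives in config.inc.php (an INI-style file), the check could look like the following sketch. The `[general]` section and `generate_sitemap` key are hypothetical names, not existing OJS settings. A missing key falls back to the proposed default of "Yes".

```python
import configparser

# Hypothetical config excerpt; the sitemap key is deliberately absent here.
SAMPLE_CONFIG = """
[general]
installed = On
"""


def sitemap_enabled(config_text):
    """Return True unless the administrator explicitly turned the sitemap off."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # getboolean accepts On/Off, Yes/No, True/False, 1/0 (case-insensitive);
    # a missing section or key falls back to the agreed default of "Yes".
    return parser.getboolean("general", "generate_sitemap", fallback=True)
```

Defaulting to enabled means existing installations get a sitemap on upgrade, while anyone with a hand-made sitemap can switch it off in one line.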
Created attachment 1171
Patch against OJS pre-2.3 CVS
Created attachment 1172
Patch against OCS pre-2.3 CVS
Created attachment 1173
Looks good, Matt. Have you tried submitting one of these maps to Google? I'm curious to see how it ranks and filters the large number of links that a live journal may produce.
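One practical limit applies to large journals: the Sitemap protocol allows at most 50,000 URLs in a single sitemap file, so a big journal would need several sitemaps referenced from the index. Below is a minimal sketch of that split, with `chunk_urls` as a hypothetical helper. It says nothing about how Google ranks or filters the links.

```python
from typing import Iterable, List

# The Sitemap protocol caps a single sitemap file at 50,000 URLs.
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: Iterable[str], limit: int = MAX_URLS_PER_SITEMAP) -> List[List[str]]:
    """Split URLs into groups of at most `limit`, preserving their order."""
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks: List[List[str]] = []
    current: List[str] = []
    for url in urls:
        current.append(url)
        if len(current) == limit:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then become its own <urlset> and get one <sitemap> entry in the index.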
I have, but it doesn't provide any in-depth statistics. It essentially lists all of the sitemaps in the sitemap index, says how many URLs are in each sitemap, and tells you whether it is valid, and that's about it. The pages aren't in Google's index yet, so I'm not sure how it affects search results.
If possible, could you get one of these submitted to Google and tested? Once that's done, if it looks OK, go ahead and commit.
Alec, it might take some time for the site to get indexed. I submitted it yesterday and it still hasn't been indexed, and I've read that it could take up to 6 months! I can give it some more time if you like, though.
OK, let's just leave the entry open; if we get to release time and it still hasn't been indexed, we won't worry about it.
One of my installations has been indexed:
The descriptions aren't particularly useful. Maybe we can develop some generic meta description tags for each of these pages?
Matt, do you have an example of the sitemap hierarchy getting indexed?
http://memescheme.net/pkp/ojs2-devel/index.php/index/sitemap is the sitemap index; in this case it only refers to one sitemap at http://memescheme.net/pkp/ojs2-devel/index.php/tj/sitemap
What I mean is: does the hierarchy show up in search results? Google "apache", for example, and you'll see a bunch of deep links on the search results page.
(In reply to comment #19)
@Alec: As far as I know, sitemaps aren't necessarily what gets you that kind of listing in Google. I think it's as mysterious as PageRank (we know a few of the factors that influence it, but the formula is unknown).
Here's an example: http://www.google.ca/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=Kkk&q=test+journal+memescheme.net%2Fpkp%2Fojs2-devel&btnG=Search&meta=
But I think Felipe is right: hierarchies depend more on PageRank and (probably) can't be forced by creating a sitemap.
(In reply to comment #16)
I think one option is to create some generic meta descriptions. Another option would be to write a function that takes the first, say, 250 words of the #content element and uses them as the meta description (it would probably be a good idea to apply some filters first, to strip HTML tags and quotes). For articles, the description should be the first words of the abstract (this is already used in "DC.Description", but it should be duplicated for search engines), and of course the article keywords should go in the keywords meta tag.
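Here is a rough sketch of the second option. It uses Python's standard-library `html.parser` for tag stripping instead of regular expressions. `meta_description` and `description_for` are hypothetical names. The 250-word default is the figure suggested in the comment above, and for articles the abstract is used when one is available.

```python
import html
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def meta_description(content_html, max_words=250):
    """Strip tags and double quotes, keep the first max_words words, escape for an attribute."""
    extractor = _TextExtractor()
    extractor.feed(content_html)
    extractor.close()
    # Join text nodes with spaces so "<h2>About</h2><p>A" does not become "AboutA".
    text = " ".join(extractor.parts)
    text = re.sub('["\u201c\u201d]', "", text)
    words = text.split()
    return html.escape(" ".join(words[:max_words]), quote=True)


def description_for(page_html, abstract_html=None, max_words=250):
    # Articles: prefer the abstract, the same text already used for DC.Description.
    return meta_description(abstract_html or page_html, max_words)
```

The result is safe to drop into `<meta name="description" content="...">` because the remaining `&`, `<`, `>` and `'` characters are escaped.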
Committed. Documentation added in docs/README-SITEMAP. Closing.