
Bug 3838 - XML Sitemap generator
Product: OJS
Classification: Unclassified
Component: Plug-ins
To be determined
All All
Importance: P1 enhancement
Assigned To: PKP Support
Depends on:
Reported: 2008-10-27 14:56 PDT by Felipe Lav
Modified: 2009-05-22 09:45 PDT (History)

See Also:
Version Reported In:
Also Affects:

Patch against OJS pre-2.3 CVS (13.87 KB, patch)
2008-11-20 11:35 PST, Matthew Crider
Patch against OJS pre-2.3 CVS (7.22 KB, patch)
2008-12-01 10:08 PST, Matthew Crider
Patch against OCS pre-2.3 CVS (9.11 KB, patch)
2008-12-01 10:19 PST, Matthew Crider
Documentation (2.66 KB, text/plain)
2008-12-01 11:36 PST, Matthew Crider

Description Felipe Lav 2008-10-27 14:56:11 PDT
I would like to see a plugin to generate XML sitemaps that comply with the Google Sitemap protocol (which, I think, is the same that's used by Yahoo! and some other search engines). This should be a nice SEO enhancement.

I know that in WordPress there's an excellent plugin to do this: http://wordpress.org/extend/plugins/google-sitemap-generator/

Maybe it could be a starting point in terms of code and design objectives.
Comment 1 Matthew Crider 2008-11-20 11:35:24 PST
Created attachment 1143 [details]
Patch against OJS pre-2.3 CVS

Some comments:
-This might be amenable to a scheduled task, so the sitemap can be updated regularly
-Only fills in the <loc> tag, not the others (which are not really necessary, or are subjective)
-Scans all of the pages on the About This Journal->Sitemap page, with higher granularity on the About page, and also includes all published issues and articles, along with their abstract/galley view pages.

Needs to be backported to OCS and Harvester(?) once this is okayed.
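
As a rough illustration of the "<loc> only" approach described above, here is a minimal Python sketch (not the attached PHP patch). The function name build_sitemap and the example URLs are hypothetical; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document with one <url><loc> entry per URL.

    Only <loc> is emitted; the optional <lastmod>, <changefreq> and
    <priority> tags are left out, as in the comment above.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page URLs, for illustration only.
    print(build_sitemap([
        "http://example.org/index.php/tj/about",
        "http://example.org/index.php/tj/issue/archive",
    ]))
```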
Comment 2 Alec Smecher 2008-11-21 10:36:40 PST
Matt, I don't think this will get used if it requires manual intervention. I'd suggest shipping OJS with a sitemap index that refers to a dynamically-generated sitemap (i.e. that's generated on-the-fly by OJS). No need for it to be a plug-in.
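
A sitemap index of this kind could look roughly like the following Python sketch (illustrative only, not OJS code). The helper name build_sitemap_index and the "index.php/<journal path>/sitemap" URL layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, journal_paths):
    """Return a <sitemapindex> pointing at one generated sitemap per journal.

    Assumption: each journal serves its own sitemap on request at
    <base_url>/index.php/<journal path>/sitemap.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    base = base_url.rstrip("/")
    for path in journal_paths:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = f"{base}/index.php/{path}/sitemap"
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical installation with two journals.
    print(build_sitemap_index("http://example.org/ojs/", ["tj", "other"]))
```

Because the index only lists URLs, each per-journal sitemap can be produced at request time, so nothing has to be regenerated by hand when new issues are published.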
Comment 3 Matthew Crider 2008-11-21 12:16:04 PST
OK, but what if people want to opt out of having a sitemap? Would they just delete the index file from OJS's root directory?
Comment 4 Alec Smecher 2008-11-21 12:26:32 PST
The site map under the "about" link cannot be disabled without modifying OJS, and I can't think of a good reason to add an option to disable it, so I'd suggest the same for this one.
Comment 5 Matthew Crider 2008-11-21 13:03:27 PST
My only concern is that the automatically created sitemap will be crawled, and the user might not want the crawler to use the sitemap (that is, to instead crawl the site the old-fashioned way), or would want to customize their sitemap (e.g. by adding the nodes that OJS won't produce), and the automatically generated sitemap would override their own.
Comment 6 Felipe Lav 2008-11-21 15:26:49 PST
What if this is set as an option in the Journal setup wizard?
Maybe in the Management section, just as a single checkbox: "Generate XML Sitemap?"
Comment 7 Matthew Crider 2008-11-21 15:30:43 PST
Agreed.  Or, just in config.inc.php, with a default of "Yes".
Comment 8 Matthew Crider 2008-12-01 10:08:50 PST
Created attachment 1171 [details]
Patch against OJS pre-2.3 CVS
Comment 9 Matthew Crider 2008-12-01 10:19:06 PST
Created attachment 1172 [details]
Patch against OCS pre-2.3 CVS
Comment 10 Matthew Crider 2008-12-01 11:36:02 PST
Created attachment 1173 [details]
Comment 11 Alec Smecher 2008-12-01 12:13:19 PST
Looks good, Matt. Have you tried submitting one of these maps to Google? I'm curious to see how it ranks and filters the large numbers of links that may result from a live journal.
Comment 12 Matthew Crider 2008-12-01 12:26:47 PST
I have, but it doesn't provide any in-depth statistics.  It essentially lists all of the sitemaps in the sitemap index, says how many URLs are in each sitemap, and tells you whether it is valid or not, and that's about it. They aren't in Google's index yet, so I'm not sure how it affects search results.
Comment 13 Alec Smecher 2008-12-08 08:46:27 PST
If possible, could you get one of these submitted to Google and tested? Once that's done, if it looks OK, go ahead and commit.
Comment 14 Matthew Crider 2008-12-09 17:54:19 PST
Alec, it might take some time for the site to get indexed. I submitted it yesterday and it still hasn't been indexed, and I've read that it could take up to 6 months!  I can give it some more time if you like, though.
Comment 15 Alec Smecher 2008-12-09 18:01:51 PST
OK, let's just leave the entry open; if we get to release time and it still hasn't been indexed, we won't worry about it.
Comment 16 Matthew Crider 2008-12-19 10:34:37 PST
One of my installations has been indexed:

The descriptions aren't particularly useful--maybe we can develop some generic meta description tags for each of these pages?
Comment 17 Alec Smecher 2008-12-19 10:37:44 PST
Matt, do you have an example of the sitemap hierarchy getting indexed?
Comment 18 Matthew Crider 2008-12-19 10:40:33 PST
http://memescheme.net/pkp/ojs2-devel/index.php/index/sitemap is the sitemap index; in this case it only refers to one sitemap at http://memescheme.net/pkp/ojs2-devel/index.php/tj/sitemap
Comment 19 Alec Smecher 2008-12-19 10:46:48 PST
What I mean is -- does the hierarchy show up in search results? Google "apache", for example, and you'll see a bunch of deep links in the search results page.
Comment 20 Felipe Lav 2008-12-19 10:55:52 PST
(In reply to comment #19)

@Alec: As far as I know, sitemaps are not necessarily related to getting that kind of listing in Google... I think it's as mysterious as PageRank (we know a few of the factors that influence it, but the formula is unknown).
Comment 21 Matthew Crider 2008-12-19 10:58:17 PST
Here's an example: http://www.google.ca/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=Kkk&q=test+journal+memescheme.net%2Fpkp%2Fojs2-devel&btnG=Search&meta=

But I think Felipe is right, hierarchies are developed more on page rank, and (probably) can't be forced by creating a sitemap.
Comment 22 Felipe Lav 2008-12-19 11:03:55 PST
(In reply to comment #16)

I think one option is to create some generic meta descriptions. Another option would be to develop a function that takes the first, say, 250 words of the #content and displays them as the meta description (it may be a good idea to apply some filters first, to strip HTML tags and quotes). In the case of articles, the description should be the first words of the abstract (this is already used in "DC.Description", but it should be duplicated for search engines), and of course the keywords should be used as the meta keywords.
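
The function suggested here might be sketched as follows, in Python for illustration only (OJS itself is PHP). The names make_meta_description and meta_tag are hypothetical, the regex tag stripping is a simplification that will not handle every HTML edge case, and only double quotes are removed so that apostrophes inside words survive.

```python
import html
import re


def make_meta_description(content_html, max_words=250):
    """Reduce an HTML fragment to at most max_words words of plain text."""
    text = re.sub(r"<[^>]+>", " ", content_html)  # strip HTML tags (simplified)
    text = html.unescape(text)                    # decode entities such as &amp;
    text = text.replace('"', "")                  # drop double quotes
    words = text.split()                          # also collapses whitespace
    return " ".join(words[:max_words])


def meta_tag(description):
    """Wrap a description in a <meta> tag, escaping it for the attribute."""
    return '<meta name="description" content="%s" />' % html.escape(description, quote=True)


if __name__ == "__main__":
    abstract = '<p>He said "hello" to <b>the</b> world</p>'
    print(meta_tag(make_meta_description(abstract, max_words=4)))
```

For article pages, the abstract would be passed in as content_html; for other pages, whatever markup fills the main content area.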
Comment 23 Matthew Crider 2009-05-22 09:45:01 PDT
Committed -- Documentation added into docs/README-SITEMAP.  Closing.