Fix for charset bugs found at some archives

Open Harvester Systems support questions and answers, bug reports, and development issues.


Post by ozp » Wed Jun 13, 2007 2:55 pm

I found many archives (most of them using OJS) whose records have bugs related to charset and HTML tags.

We made a filter for those errors.

In plugins/preprocessors/regex/, add this to the preprocessEntry() function:


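   // Fields whose values should be cleaned up.
   $fieldsToChange = array('title', 'description', 'creator', 'rights', 'type', 'source', 'subject');

   // Remove stray HTML tags from the value.
   $value = strip_tags($value);

   if (in_array($field->getName(), $fieldsToChange)) {
      // Convert from UTF-8 to ISO-8859-1 only when "encode" is passed on the command line.
      if (in_array('encode', $_SERVER['argv'])) {
         $value = utf8_decode($value);
      }

      // Convert HTML entities (e.g. &eacute;) to UTF-8 characters, leaving quote entities untouched.
      $value = html_entity_decode($value, ENT_NOQUOTES, 'UTF-8');
   }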

When you call harvest.php, you have to add "encode" after the command to enable the utf8_decode() step.
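
For anyone who wants to try the filter outside the Harvester, here is a minimal standalone sketch of the same three steps (strip tags, optional UTF-8 to ISO-8859-1 conversion, entity decoding). The function name cleanFieldValue() and its parameters are made up for illustration and are not part of the Harvester code. It also assumes a newer PHP, so it uses mb_convert_encoding() in place of utf8_decode() (deprecated since PHP 8.2) when the mbstring extension is available.

```php
<?php
// Hypothetical helper, not part of the Harvester: mirrors the filter above.
function cleanFieldValue(string $value, string $fieldName, bool $decodeUtf8): string
{
    $fieldsToChange = array('title', 'description', 'creator', 'rights', 'type', 'source', 'subject');

    // Step 1: remove stray HTML tags from the value.
    $value = strip_tags($value);

    if (in_array($fieldName, $fieldsToChange)) {
        // Step 2 (optional): convert UTF-8 to ISO-8859-1, which repairs
        // text that was encoded as UTF-8 twice.
        if ($decodeUtf8) {
            $value = function_exists('mb_convert_encoding')
                ? mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8')
                : utf8_decode($value);
        }

        // Step 3: turn HTML entities into UTF-8 characters.
        $value = html_entity_decode($value, ENT_NOQUOTES, 'UTF-8');
    }

    return $value;
}

// An entity inside a tag: the tag is stripped and &eacute; becomes "é".
echo cleanFieldValue('<b>Caf&eacute;</b>', 'title', false), "\n"; // Café

// A double-encoded "é" (bytes C3 83 C2 A9) is repaired when the flag is on.
echo cleanFieldValue("Caf\xC3\x83\xC2\xA9", 'title', true), "\n"; // Café
```

The boolean flag stands in for the "encode" command-line argument, so the conversion can be left off for archives that already send clean UTF-8.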

Post by asmecher » Wed Jun 13, 2007 3:57 pm

Hi ozp,

Thanks -- FYI, we're working on a general solution for problems with illegal characters for the Harvester, OJS 2.x, and OCS 2.x. It's still under development, but it should be included in the next release of each.

Alec Smecher
Public Knowledge Project Team
