PKP Support Forum



getting an Invalid character error...

Open Harvester Systems support questions and answers, bug reports, and development issues.



Post by mikel » Thu Mar 04, 2010 3:02 pm

Hi PKP team,

I'm getting an "Invalid character" error when harvesting some Brazilian and Catalan repositories, e.g. http://ojs.c3sl.ufpr.br/ojs2/index.php/ ... fix=oai_dc

I thought the cause might be the database character set and collation, so I changed them to utf8 and utf8_unicode, but the error persists.
In my config.inc.php all my charsets are set to utf8.

The XML response contains a lot of HTML tags and entities, and I thought this could be the reason, so I created some records with HTML tags in a local DSpace and harvested them, but everything worked well.

I also validated the response using xmllint and got no validation errors.

I'm using:

harvester 2.3.0
php 5.2.6
mysql 5.0.75

Any suggestions?

Post by asmecher » Thu Mar 11, 2010 8:21 pm

Hi mikel,

That error message is coming from PHP's XML parser, so there must be an invalid character or character set configuration somewhere... If you like, post the OAI URL and I'll see if I can find it.

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Mon May 03, 2010 7:52 am

I have the same problem when harvesting
http://eprints.zu.edu.ua/cgi/oai2
* Invalid character

Additional information: if I add this line to XMLParser.inc.php
$this->addError(xml_get_current_line_number($parser));
it returns
line 290

Current version: Harvester 2.3.0

This problem did not occur in the previous version.
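
For anyone wanting to reproduce this kind of diagnostic outside the Harvester, here is a minimal sketch in Python (not the Harvester's PHP code). It assumes Python's bundled expat parser is a reasonable stand-in for PHP's XML parser for this purpose, which may not hold in every detail, and the function name find_parse_error is hypothetical. It reports the message, line, and column of the first parse error, similar in spirit to calling xml_get_current_line_number() after a failed parse.

```python
import xml.parsers.expat


def find_parse_error(data: bytes):
    """Parse `data` as XML; return None on success,
    otherwise a (message, line, column) tuple for the first error.

    Hypothetical helper for illustration only.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (and only) chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # err.lineno is 1-based; err.offset is the column within that line.
        message = xml.parsers.expat.ErrorString(err.code)
        return (message, err.lineno, err.offset)
    return None
```

Running it on a saved copy of the OAI response gives a position to inspect in the dump. Messages and positions may differ from what PHP reports.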

Post by asmecher » Mon May 03, 2010 9:05 am

Hi alexukua,

Can you try harvesting with the command line using the "verbose" option? This will dump the harvesting URLs as they are fetched and will help narrow down where the Harvester is encountering an invalid character.

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Mon May 03, 2010 9:32 am

php /usr/share/harvester/new/tools/harvest.php all from=last verbose
Selected archive: Zhytomyr State University Library
Fetching records...
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... fix=oai_dc
Finished:
0 records indexed
9 seconds elapsed
0.00 records per second
45 records kept from past harvests
45 records total.
Errors/Warnings:
Invalid character

The problem may be in the item http://eprints.zu.edu.ua/187/ (see dump file).
Attachment: temp.zip (dump data for function xml_parse, 30.14 KiB)

Post by asmecher » Mon May 03, 2010 11:34 am

Hi alexukua,

Indeed, that's invalid XML data. Can you try dumping instead what's being read from the remote server, as close to the source as possible?
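
A minimal sketch, in Python rather than the Harvester's PHP, of one way to check a raw dump for bytes that are not valid UTF-8 before any normalization has touched them. The helper name first_invalid_utf8 is hypothetical, and the sketch assumes the response is meant to be UTF-8.

```python
def first_invalid_utf8(raw: bytes):
    """Return None if `raw` is valid UTF-8, otherwise a tuple
    (byte_offset, line, column) locating the first offending byte.

    `line` is 1-based; `column` is a 0-based byte offset within the line.
    Hypothetical helper for illustration only.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Count newlines before the bad byte to get the line number.
        line = raw.count(b"\n", 0, err.start) + 1
        # rfind returns -1 when there is no earlier newline, so +1 gives 0.
        line_start = raw.rfind(b"\n", 0, err.start) + 1
        return (err.start, line, err.start - line_start)
    return None
```

If the bytes saved straight from the remote server pass this check but the data handed to the parser does not, the corruption is probably happening on the harvesting side rather than in the repository.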

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Mon May 03, 2010 12:46 pm

Hi Alec,

The error appears in a different place each time, and it is unclear what causes it. For example:
xml_get_error_code Invalid character
xml_get_current_line_number 182
xml_get_current_column_number 172
(see temp3.zip dump file)

The error position apparently changes depending on the mbstring settings (php.ini). The actual error is probably to be found at:
xml_get_error_code Invalid character
xml_get_current_line_number 290
xml_get_current_column_number 110
(see temp4.zip)
Attachments: temp4.zip (29.58 KiB), temp3.zip (33.66 KiB)

Post by asmecher » Mon May 03, 2010 3:59 pm

Hi alexukua,

Do the line numbers only move when you change the mbstring settings, or do they change even without any configuration changes? Does the behavior change when you turn on the "allow_url_fopen" option in config.inc.php?

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Mon May 03, 2010 10:10 pm

Hi Alec

allow_url_fopen = Off
Invalid character
line 290
column 110

allow_url_fopen = On
Invalid character
line 337
column 100

Post by asmecher » Tue May 04, 2010 8:05 am

Hi alexukua,

Could you try again with the charset_normalization option disabled in config.inc.php?

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Tue May 04, 2010 8:52 am

Hi Alec,

If I turn charset_normalization off, I get a different problem: the harvested items are empty. For example:

Central and Eastern European Marine Repository (CEEMAR)
View Archive Info

Metadata

Field    Value

Names

OAI Base URL http://www.ceemar.org/dspace-oai/request

When harvesting EPrints data, everything works.

(see http://oai.org.ua/new/index.php/browse)

Post by asmecher » Tue May 04, 2010 9:39 am

Hi alexukua,

OK, this is progress. I suspect the UTF8 normalization code was incorrectly concatenating data when a split in the read buffer fell in the middle of a multibyte UTF8 character. What metadata format are you using to perform the harvest?
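
To illustrate the kind of bug suspected here, the Python sketch below decodes the same UTF-8 bytes in fixed-size chunks two ways. This is not the Harvester's actual code (which is PHP), and all function names are hypothetical. The first version converts each chunk on its own, so a character split across two reads is damaged. The second keeps the incomplete trailing bytes and joins them to the next chunk before decoding.

```python
import codecs


def split_bytes(data: bytes, size: int):
    """Cut `data` into consecutive chunks of at most `size` bytes,
    imitating reads from a fixed-size buffer."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_chunks_naively(chunks):
    """Decode every chunk independently. A multibyte character that
    straddles a chunk boundary turns into U+FFFD replacement characters."""
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_chunks_incrementally(chunks):
    """Decode with an incremental decoder, which buffers an incomplete
    trailing sequence until the rest arrives in the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises an error if the input ends mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

With a Cyrillic string, where each letter takes two bytes, an odd chunk size such as 3 forces a split inside a character. The naive version then returns damaged text while the incremental one returns the original string. The position of the damage depends on where the buffer boundaries fall, which would be consistent with the error moving when the read settings change.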

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Tue May 04, 2010 9:43 am

We use the DC metadata format for EPrints archives and MODS for DSpace.

Post by alexukua » Wed May 05, 2010 2:15 pm

Can you help me? The error probably lies in the MODS plugin.

Post by asmecher » Wed May 05, 2010 2:28 pm

Hi alexukua,

Try applying the patch at http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=4123.

Regards,
Alec Smecher
Public Knowledge Project Team