PKP Support Forum, OJS Technical Support


error in finish submission


Post by vazquezm » Tue Mar 14, 2006 1:39 pm

Hi,

Authors get this error when they press "finish submission":

DB Error: ERROR: invalid UTF-8 byte sequence detected near byte 0xe3

I can see that the paper is received OK.

What can I do to fix this?
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by asmecher » Tue Mar 14, 2006 2:29 pm

Hi Vazquezm,

This is probably caused during the searching and indexing phase of the submission; when indexing a PDF, for example, OJS uses an external tool as configured in config.inc.php (search for "pdf" to find the appropriate line). The error message you're receiving probably indicates that the external tool is providing information that's not valid UTF-8. Try extracting the text from the submission by hand and testing to see if the resulting output is UTF-8; you may need to configure and/or update your extraction tool.
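One way to test whether extracted text is valid UTF-8 is a short script like the sketch below. It is written in Python, which is not part of OJS, and the function name `first_invalid_utf8_offset` and the sample text are made up for illustration. It reports the offset of the first byte that breaks UTF-8 decoding, which is roughly the information the database error gives.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if data is valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start
    return None


# Hypothetical sample text, encoded two different ways
sample_ok = "São Paulo".encode("utf-8")
sample_bad = "São Paulo".encode("latin-1")  # contains the lone byte 0xe3

print(first_invalid_utf8_offset(sample_ok))   # None
print(first_invalid_utf8_offset(sample_bad))  # 1
```

To check real output, read the extracted text in binary mode and pass the bytes to the function.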

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8585
Joined: Wed Aug 10, 2005 12:56 pm

Post by vazquezm » Wed Mar 15, 2006 4:09 am

Hi,

The PDF lines are commented out, so this should not be the problem, should it?
I have checked config.inc.php and I have this:

[search]

; Minimum indexed word length
min_word_length = 3

; The maximum number of search results fetched per keyword. These results
; are fetched and merged to provide results for searches with several keywords.
results_per_keyword = 500

; The number of hours for which keyword search results are cached.
result_cache_hours = 1

; Paths to helper programs for indexing non-text files.
; Programs are assumed to output the converted text to stdout, and "%s" is
; replaced by the file argument.
; Note that using full paths to the binaries is recommended.
; Uncomment applicable lines to enable (at most one per file type).
; Additional "index[MIME_TYPE]" lines can be added for any mime type to be
; indexed.

; PDF
; index[application/pdf] = "/usr/bin/pstotext %s"
; index[application/pdf] = "/usr/bin/pdftotext %s -"

; PostScript
; index[application/postscript] = "/usr/bin/pstotext %s"
; index[application/postscript] = "/usr/bin/ps2ascii %s"

; Microsoft Word
; index[application/msword] = "/usr/bin/antiword %s"
; index[application/msword] = "/usr/bin/catdoc %s"
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by asmecher » Wed Mar 15, 2006 10:39 am

Hi vazquezm,

That's correct: as long as the PDF extraction tool's configuration is commented out, it will not be used.

The next most likely cause is the same problem in your HTML files; are you sure they're saved as UTF-8 text? You can use tools like iconv to convert them.

If this isn't the problem, you'll need to investigate further -- I'd suggest turning on logging for your database and determining the query that's causing the problem. The invalid text should be clearly visible in the query.

PostgreSQL seems to be less stringent when the database character set option in config.inc.php is disabled; if you're unable to correct the problem in another way, try disabling the option and trying again.

If the problem is indeed the full-text indexing, you can try regenerating the index using the external tool tools/rebuildSearchIndex.php; this will make debugging easier.

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8585
Joined: Wed Aug 10, 2005 12:56 pm

Change in coding charset

Post by vazquezm » Thu Mar 16, 2006 1:42 am

Hi,

One question related to this:
If I started OJS using the UTF-8 charset, can I now change to LATIN1? Where do I need to make the change?

I think that with this change I can solve two problems: the strange symbols that appear instead of accents in the emails, and the error on finishing a submission.

Thank you for your time.
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by asmecher » Thu Mar 16, 2006 10:52 am

Hi Vazquezm,

To change OJS to Latin1, simply change the character sets in config.inc.php -- for example, change client_charset to "latin1". You'll need to transcode the database contents; I'm not sure if PostgreSQL has any transcoding tools, but you can accomplish this externally by dumping the database to a text file, running iconv to convert the contents, and loading the converted database dump back into PostgreSQL.

However, I think it's preferable to use UTF-8; have you been able to locate the source of the incorrect codes? It may be as simple as running your HTML through iconv.
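For the dump, convert, and reload route described above, the middle step can also be done without iconv. The following is only a sketch in Python (not something OJS ships); `transcode_file` and the file names are hypothetical. Note two assumptions: converting from UTF-8 to Latin1 will raise an error if the dump contains characters that Latin1 cannot represent, and a dump may contain its own encoding declaration that would need adjusting by hand before reloading.

```python
import os
import shutil
import tempfile


def transcode_file(src_path, dst_path, src_encoding, dst_encoding):
    """Copy a text file, re-encoding it from src_encoding to dst_encoding."""
    # newline="" keeps the line endings exactly as they are in the source file
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # copyfileobj works in chunks, so a large dump is not loaded all at once
        shutil.copyfileobj(src, dst)


# Self-check with a throwaway file standing in for a real database dump
with tempfile.TemporaryDirectory() as tmp:
    utf8_dump = os.path.join(tmp, "dump-utf8.sql")
    latin1_dump = os.path.join(tmp, "dump-latin1.sql")
    with open(utf8_dump, "w", encoding="utf-8", newline="") as f:
        f.write("INSERT INTO t VALUES ('São Paulo');\n")
    transcode_file(utf8_dump, latin1_dump, "utf-8", "latin-1")
    with open(latin1_dump, "rb") as f:
        print(f.read())
```

Keeping the original dump untouched and writing to a new file means the conversion can be repeated if the source encoding turns out to be something else.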

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8585
Joined: Wed Aug 10, 2005 12:56 pm

Post by vazquezm » Thu Mar 16, 2006 12:28 pm

Dear Alec,

I am sorry, but my knowledge of programming is limited.

If you think that it's preferable to use UTF-8, I want to use it.

But please give me detailed instructions for locating the source of the incorrect codes.
How can I run my HTML through iconv? First of all, what is iconv, and where do I find it?
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by asmecher » Thu Mar 16, 2006 1:48 pm

Hi vazquezm,

First, make sure that your HTML is actually the problem -- you'll have to examine your database log to make sure. What is logged and where it is logged depend on your database configuration, which is out of OJS's control. Find out what character sequence is causing the problem, and then check to see if that sequence appears in your article. However, I think non-UTF8 in your HTML is the most likely cause.

If so, the next step is to convert the HTML from its current character set (probably Latin1) to UTF-8. This is what iconv does.
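For illustration only, here is what that conversion does to the byte from the error message, sketched in Python (an assumption; OJS does not use it, and `latin1_to_utf8` is a hypothetical helper). In Latin1, the single byte 0xe3 is the letter ã. In UTF-8 the same letter is the two bytes 0xc3 0xa3, while a lone 0xe3 followed by an ordinary letter is invalid.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret Latin1 bytes as text, then encode that text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


original = b"S\xe3o Paulo"           # "São Paulo" as Latin1 bytes (sample text)
converted = latin1_to_utf8(original)

print(converted)                     # b'S\xc3\xa3o Paulo'
print(converted.decode("utf-8"))     # São Paulo
```

This only gives the right result if the input really is Latin1; running it on text that is already UTF-8 would double-encode the accented characters.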

If you're running a Linux or Unix system, you should have the iconv tool already installed. If you're using Windows, you can get the iconv tool from http://gnuwin32.sourceforge.net/packages/libiconv.htm; alternately, your HTML editor might support UTF-8 natively. To convert from Latin1 to UTF-8:

iconv -t UTF-8 -f LATIN1 -o output-file.html input-file.html

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8585
Joined: Wed Aug 10, 2005 12:56 pm

Post by vazquezm » Thu Mar 16, 2006 2:56 pm

Hi,
Thank you for the detailed explanation. I have understood the process.
I am using Windows, and I have downloaded the iconv tool.

Now, maybe this is a stupid question, but which are my .html files?
I have extracted OJS into the public directory of my server. I see only .php files and an index.html in the /public/ directory. Is this the HTML file?

Manuel
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by asmecher » Thu Mar 16, 2006 3:26 pm

Hi vazquezm,

You can find your HTML files in your files_dir, as configured in config.inc.php. For example, for journal ID 12 and article ID 3074, your HTML file is in the (files_dir)/journals/12/articles/3074/public directory.
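If there are many articles, checking each directory by hand gets tedious. As a rough sketch (Python, not an OJS tool; `find_non_utf8_html` and the path in the usage lines are hypothetical), the following walks a files_dir and lists HTML files that are not valid UTF-8, along with the offset of the first bad byte.

```python
import os


def find_non_utf8_html(files_dir):
    """Yield (path, offset) for each .html/.htm file under files_dir that is not valid UTF-8."""
    for root, _dirs, names in os.walk(files_dir):
        for name in names:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte
                yield path, err.start


# Placeholder path: replace with the files_dir value from config.inc.php
for path, offset in find_non_utf8_html("/path/to/files_dir"):
    print(f"{path}: invalid UTF-8 at byte offset {offset}")
```

Files reported here are candidates for conversion with iconv; files that are not listed are already valid UTF-8 and should be left alone.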

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8585
Joined: Wed Aug 10, 2005 12:56 pm

Post by vazquezm » Thu Mar 16, 2006 5:11 pm

Hi,

I found the files_dir.
But in the files_dir all files are .doc or .pdf; there are no HTML files.
I am not using HTML files for articles. All the articles are .doc files (MS Word) sent by authors, and I create PDF files to send to reviewers. The final journal will consist of PDF files alone.

So is the problem the indexing of .doc and PDF files?
Should I uncomment some of these lines in config.inc.php?

; PDF
; index[application/pdf] = "/usr/bin/pstotext %s"
; index[application/pdf] = "/usr/bin/pdftotext %s -"

; Microsoft Word
; index[application/msword] = "/usr/bin/antiword %s"
; index[application/msword] = "/usr/bin/catdoc %s"

Maybe the problem is that the system gives a null output (not UTF-8) to the indexer, and PostgreSQL responds with the error.

But I am using Windows. Where do I get antiword, catdoc, and pdftotext, and where do I install them? Can't I use the original word.exe or acrobat.exe?

Thanks
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by vazquezm » Thu Mar 23, 2006 2:06 am

Hi,

I think that I have found the origin of the problem.
To recap the problem (using OJS on Windows 2000):

Authors get this error when they press "finish submission":
DB Error: ERROR: invalid UTF-8 byte sequence detected near byte 0xe3

I get the same error when I try to save metadata.

I think that this is because Windows NT and Windows 2000 use UCS-2 instead of UTF-8 internally.

I found this at the link
http://support.microsoft.com/kb/232580/en-us

"UCS-2 and UTF-8 are two common ways to store bit patterns that represent Unicode characters. Microsoft Windows NT, SQL Server, Java, COM, and the SQL Server ODBC driver and OLEDB provider all internally represent Unicode data as UCS-2".
"Any UTF-8 data sent from the client to the server via GET or POST is also converted to UCS-2 automatically"

As I am not an expert in computer science, please read the link and suggest a solution to the problem.

Thank you
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by asmecher » Thu Mar 23, 2006 11:40 am

Hi Vazquezm,

Hmm, what a mess -- that might be the problem you're encountering. It's not clear to me from the description whether this affects PHP on IIS. I found some discussion of the issue (http://bugs.php.net/bug.php?id=18169), and it's unclear whether it's being addressed as a Windows bug, a PHP work-around, or application-level work-arounds. Try your earlier solution of using Latin1 instead of UTF-8, or consider using MySQL instead of PostgreSQL as it's less fastidious about its encodings.

Please report your results back here in case other users encounter the same problem.

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8585
Joined: Wed Aug 10, 2005 12:56 pm

Post by vazquezm » Wed Mar 29, 2006 12:51 pm

I want to test OJS on Windows with MySQL and the Latin1 character set.
But during installation I get these errors:

Warning: xml_parser_create() [function.xml-parser-create]: unsupported source encoding "latin1" in C:\www\journal\classes\xml\XMLParser.inc.php on line 138

Warning: xml_parser_set_option(): supplied argument is not a valid XML Parser resource in C:\www\journal\classes\xml\XMLParser.inc.php on line 139

Warning: xml_parser_set_option(): supplied argument is not a valid XML Parser resource in C:\www\journal\classes\xml\XMLParser.inc.php on line 140

Warning: xml_parse_into_struct(): supplied argument is not a valid XML Parser resource in C:\www\journal\classes\xml\XMLParser.inc.php on line 107

Warning: xml_parser_free(): supplied argument is not a valid XML Parser resource in C:\www\journal\classes\xml\XMLParser.inc.php on line 149

Warning: Cannot modify header information - headers already sent by (output started at C:\www\journal\classes\xml\XMLParser.inc.php:138) in C:\www\journal\classes\template\TemplateManager.inc.php on line 179


What is the problem?
vazquezm
 
Posts: 35
Joined: Mon Mar 13, 2006 7:35 am

Post by mjordan » Wed Mar 29, 2006 1:49 pm

Hi vazquezm,

OJS has been tested on a variety of Windows platforms. Can you let us know which version of Windows (XP, Server 2003, etc.) and also what version of PHP you are using? I can't offer any specific advice, but knowing this information will help Alec troubleshoot when he returns on Monday.

Mark
mjordan
 
Posts: 21
Joined: Wed Mar 17, 2004 10:59 pm
Location: Vancouver, BC, Canada
