I have made some headway on the problem with the special characters in the papers report.
On this page:
http://kwizcom.blogspot.com/2007/05/utf-8-with-signature.html
I found this:
My client had to display some non-English characters that were all UTF-8 encoded. When saving the resulting HTML in Notepad as UTF-8, Excel was able to display the special chars OK, but when using the code sample above, we got some wrong data.
After googling around for a while I managed to understand that Excel must use UTF-8 with signature text and I had to add a signature to it.
So, how do I add a UTF-8 signature to my file?
Some more googling allowed me to learn that all I needed to do is add these bytes to the start of the file:
0xEF, 0xBB, 0xBF
Which led me to include this:
fwrite($fp, chr(0xEF).chr(0xBB).chr(0xBF));
in plugins/reports/papers/PaperReportPlugin.inc.php, right after:
$fp = fopen('php://output', 'wt');
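To check the signature idea outside the plugin, here is a minimal standalone sketch. The function name writeCsvWithBom and the sample rows are made up for illustration and are not part of PaperReportPlugin.inc.php; it writes to php://memory instead of php://output so the bytes can be inspected.

```php
<?php
// Hypothetical helper, not part of the plugin: writes the UTF-8 signature
// (EF BB BF) once at the start of the stream, then the rows as CSV.
function writeCsvWithBom($fp, array $rows)
{
    fwrite($fp, "\xEF\xBB\xBF");
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are passed explicitly.
        fputcsv($fp, $row, ',', '"', '\\');
    }
}

// Sample data is an assumption; "Caf\xC3\xA9" is "Café" in UTF-8 bytes.
$rows = array(
    array('title', 'abstract'),
    array("Caf\xC3\xA9", "R\xC3\xA9sum\xC3\xA9"),
);

$fp = fopen('php://memory', 'w+');
writeCsvWithBom($fp, $rows);
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);

// Hex of the first three bytes of the output.
$signature = bin2hex(substr($csv, 0, 3));
```

The signature has to be written exactly once, before any other output, otherwise it ends up in the middle of the data.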
It still wasn't all the way there. My output file still contained HTML entities, even though
html_entity_decode was used for a number of fields.
Back to Google... and on the PHP documentation page for the
html_entity_decode function:
http://php.net/manual/en/function.html-entity-decode.php
I found this:
I wrote in a previous comment that html_entity_decode() only handled about 100 characters. That's not quite true; it only handles entities that exist in the output character set (the third argument). If you want to get ALL HTML entities, make sure you use ENT_QUOTES and set the third argument to 'UTF-8'.
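A small sketch of what that comment describes, using a sample string I made up. It decodes the same text once with 'ISO-8859-1' and once with 'UTF-8' as the third argument. As far as I can tell, newer PHP versions (5.4 and later) already default to UTF-8 here, so the difference mainly shows up on older installs or when the charset is set explicitly.

```php
<?php
// Sample input (an assumption): an accented letter, the euro sign,
// and numeric entities for single quotes.
$encoded = 'Caf&eacute; &euro;5 &#039;special&#039;';

// ISO-8859-1 has no euro sign, so &euro; is left as an entity.
$latin1 = html_entity_decode($encoded, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 can represent every named entity, so all of them are decoded.
$utf8 = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// True when the Latin-1 result still contains the undecoded entity.
$euroLeftEncoded = (strpos($latin1, '&euro;') !== false);
```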
So I tried adding these two arguments for the abstract field, and that seems to do the trick, i.e. this:
} elseif ($index == 'abstract') {
$columns[$index] = html_entity_decode(strip_tags($row[$index]));
becomes this:
} elseif ($index == 'abstract') {
$columns[$index] = html_entity_decode(strip_tags($row[$index]), ENT_QUOTES, 'UTF-8');
I think that perhaps we should do the same for at minimum the title field as well, so maybe it should be this:
} elseif (($index == 'abstract') || ($index == 'title')) {
$columns[$index] = html_entity_decode(strip_tags($row[$index]), ENT_QUOTES, 'UTF-8');
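If more fields turn out to need the same treatment, one possible way to avoid a growing chain of || conditions is a small helper with a list of field names. This is only a sketch: decodeReportField, the field list, and the sample row are assumptions for illustration and do not exist in the plugin.

```php
<?php
// Hypothetical helper: strips tags and decodes entities to UTF-8 for the
// listed fields only; every other field is returned unchanged.
function decodeReportField($index, $value, array $htmlFields = array('abstract', 'title'))
{
    if (in_array($index, $htmlFields, true)) {
        return html_entity_decode(strip_tags($value), ENT_QUOTES, 'UTF-8');
    }
    return $value;
}

// Sample row (made up) standing in for one database row of the report.
$row = array(
    'title'    => '<b>Caf&eacute;</b> culture',
    'abstract' => '<p>It&#039;s &ldquo;quoted&rdquo;</p>',
    'track'    => 'R&amp;D',
);

$columns = array();
foreach ($row as $index => $value) {
    $columns[$index] = decodeReportField($index, $value);
}
```

Tags are stripped before decoding, same order as in the plugin code above, so an encoded &lt; in the data does not get treated as markup.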
With these changes the file opens fine in Excel for Windows, but I still get messed-up characters in Excel on my Mac. Even without the UTF-8 signature fix, the file opens fine in OpenOffice.