mj wrote:
Hi delong,
One of the features we're working on for OJS 2.2 is integration with a web service that will allow document conversion from MS-Word (and others) to XML, which in turn can be rendered into (consistent, UTF-8) XHTML galleys.
I have done a fair amount of work in this area, mostly using OpenOffice.org for the conversion, which I can't recommend highly enough over MS-Office. Character conversion issues often come from custom (Microsoft) fonts -- the Symbol font comes to mind. If you're previewing the galley file in IE, it may appear correct; once it's published through the web server, however, the character encodings can go funny.
A few recommendations on converting MS-Word to HTML:
- use OpenOffice instead of MS-Word; it creates *much* better HTML (actual valid HTML 4.0 rather than proprietary Word-HTML)
- try running your HTML files through HTMLTidy and/or converting everything to HTML entities rather than UTF-8; some browsers (notably IE) don't handle UTF-8 HTML very well.
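For the entity conversion, here is a minimal sketch in Python (the language choice and the helper name to_html_entities are my own assumptions; this is not part of OJS or HTMLTidy). It replaces every non-ASCII character with a numeric character reference, so the resulting file is plain ASCII and no longer depends on which charset the web server announces. It assumes the input has already been decoded correctly (e.g. read as UTF-8), and it deliberately leaves markup characters such as < and & alone, since the input is already HTML.

```python
def to_html_entities(html_text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference.

    ASCII characters, including HTML markup such as < > &, pass through
    unchanged, because the input is assumed to be HTML already.
    """
    # The "xmlcharrefreplace" error handler emits &#NNNN; for every
    # character that cannot be encoded as ASCII.
    return html_text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    sample = "<p>Caf\u00e9 \u03b1-particles</p>"
    print(to_html_entities(sample))
```

To convert a whole galley file, you would read it with open(path, encoding="utf-8"), pass the text through the helper, and write the result back out as ASCII.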
There seem to be an increasing number of people who are going in this direction for their HTML galleys, so we are trying to get the XML facilities out there as quickly (and reliably) as possible.
Hope this helps,