PKP Support Forum


initiating on harvester2: first questions

Postby ozp » Tue May 01, 2007 4:24 pm

Base:
http://biblioteca.redepsi.com.br/

Locale:
Does anyone have a pt_BR locale?
I made a copy of the en_US locale named pt_BR, but it does not appear in the locale options. How do I enable it?
If no one has this locale, I'll translate it in steps, but I want it configured in my system while the translation is being done.


char problems:
http://biblioteca.redepsi.com.br/index. ... se/index/2
how to fix this?
eg:
Trabalho, gestão e subjetividade

<p align="justify">As atuais pr&aacute;ticas sociais de trabalho e de ...

Is this a problem in the source archive or at my harvester?


search:
Even with one source updated, search returns no results.

I did not read all the docs, but I wonder if I have to do anything to enable search.

Reading Tools:
I did not understand what this is about.
Is it about Acrobat, Word, Firefox, or is it about knowledge areas?

Because I tried to edit an archive and there were options like business, health...

My harvester will be mostly about psychology and social sciences.



Regards
ozp
 
Posts: 51
Joined: Sat Apr 28, 2007 9:01 pm

Postby asmecher » Wed May 02, 2007 2:58 pm

Hi ozp,

The problem is at the data source end. An OAI data feed shouldn't include HTML (e.g. the <p> tags) or HTML entities (e.g. the &aacute;). Try fixing these at the source end; also note that OJS 2.x gives cleaner OAI feeds than OJS 1.x did.

Regards,
Alec Smecher
Public Knowledge Project Team
---
Don't miss the First International PKP Scholarly Publishing Conference
July 11 - 13, 2007, Vancouver, BC, Canada
http://ocs.sfu.ca/pkp2007/
asmecher
 
Posts: 7692
Joined: Wed Aug 10, 2005 12:56 pm

Postby ozp » Wed May 02, 2007 6:19 pm

By "data source end", do you mean the source (magazine) that I've added to my harvester2?

Can only they fix this?


Can you help with my other problems?

Regards
ozp
 

Postby ozp » Thu May 03, 2007 8:01 am

At this harvester http://www.ibict.br/oasis.br/

I found many other sources with character set problems.

Could this be related to ISO or UTF stuff?

This is a problem that happens with many apps in Portuguese.
ozp
 

Postby asmecher » Thu May 03, 2007 8:58 am

Hi ozp,

This is often because of character encoding problems. OJS 1.x used ISO-8859-1 by default for character encodings; character set conversion code exists in OJS 1.x's OAI feed software to provide standard UTF-8-encoded output, but there are often problems with character set conversions.
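To make the encoding mismatch concrete, here is a small Python sketch (Python is only a stand-in for illustration; it is not part of the Harvester or OJS). It uses the title from the first post and shows what happens when UTF-8 bytes are read as ISO-8859-1, and the reverse. The helper `repair_mojibake` is a hypothetical name, and the round-trip trick it uses only works for text that was misread in exactly this way.

```python
# Illustration only: how a UTF-8 string becomes garbled when read as ISO-8859-1.
title = "Trabalho, gestão e subjetividade"

# Case 1: the source sends UTF-8, but the reader assumes ISO-8859-1.
utf8_bytes = title.encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Case 2: the source sends ISO-8859-1, but the reader expects UTF-8.
latin1_bytes = title.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)


def repair_mojibake(text):
    """Hypothetical helper: undo a UTF-8-read-as-ISO-8859-1 mistake.

    If the text cannot be round-tripped this way, it is returned unchanged.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake(misread))
```

A filter along these lines can only guess at what went wrong, so it should be tried on real records from each source before being trusted.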

You can check whether a problem exists at the data source end by using tools like "wget" and "xmllint"; for example, you can often fetch a bunch of records from an OAI data source using:
Code:
wget -O tmp.xml "http://url-to-oai-data-source?metadataPrefix=oai_dc&verb=ListRecords"
Then, check the XML using:
Code:
xmllint --noout tmp.xml
Often xmllint will report errors caused by invalid character encoding; this confirms that the problem lies with the data source.
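If xmllint is not available, a similar check can be approximated with the Python standard library. This is a rough sketch, not a replacement for xmllint: the function name `check_oai_response` is made up for this example, and it only tests whether the bytes decode as UTF-8 and whether the document is well-formed XML.

```python
import xml.etree.ElementTree as ET


def check_oai_response(raw):
    """Return a list of problems found in a saved OAI response (empty if none).

    `raw` is the response as bytes, e.g. the contents of tmp.xml.
    """
    problems = []

    # Check 1: do the bytes decode as UTF-8 at all?
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")

    # Check 2: is the document well-formed XML?
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")

    return problems


# Tiny made-up records, not real OAI responses.
good = b'<?xml version="1.0" encoding="UTF-8"?><record><title>gest\xc3\xa3o</title></record>'
bad_bytes = b'<?xml version="1.0" encoding="UTF-8"?><record><title>gest\xe3o</title></record>'
bad_entity = b'<record><title>pr&aacute;ticas</title></record>'

for sample in (good, bad_bytes, bad_entity):
    print(check_oai_response(sample))
```

To check a downloaded file, read it in binary mode and pass the bytes, for example `check_oai_response(open("tmp.xml", "rb").read())`.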

As for the reading tools -- this is a sidebar allowing users to interact with other resources such as search engines and dictionaries. When enabled, a sidebar will appear when viewing a record. There is a different set of reading tools available for each discipline included (e.g. Social Sciences) or you can customize your own.

Regards,
Alec Smecher
Public Knowledge Project Team

Postby ozp » Thu May 03, 2007 10:20 am

OK!

And do you think there is a way to fix this, regardless of the source?

I don't think those sources will fix this, because if they were going to, it would be fixed already.


And what about the pt_BR locale? Do you know if someone has it?
ozp
 

Postby asmecher » Thu May 03, 2007 12:34 pm

Hi ozp,

If you want to strip HTML tags and otherwise preprocess data coming into the harvester, modify the preprocessEntry function in plugins/preprocessors/regex/RegexPreprocessorPlugin.inc.php to include e.g.:
Code:
$value = strip_tags($value);
Note that this will cause the Harvester to remove anything it interprets as an HTML tag, so data that legitimately contains "<" or ">" symbols might be adversely affected. You'll have to experiment with your particular sets of data.
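The trade-off can be tried out on sample strings before touching the plugin. The sketch below is in Python purely for illustration and is not the Harvester's code: `clean_field` and `TAG_RE` are made-up names, and the regular expression is a crude approximation, so PHP's strip_tags may behave differently in detail. It strips tag-like text first and then decodes HTML entities such as &aacute;.

```python
import html
import re

# Crude stand-in for tag stripping: anything between "<" and the next ">".
TAG_RE = re.compile(r"<[^>]*>")


def clean_field(value):
    """Strip tag-like text, then decode HTML entities (illustrative only)."""
    value = TAG_RE.sub("", value)   # remove anything that looks like a tag
    value = html.unescape(value)    # e.g. &aacute; becomes á
    return value.strip()


# The kind of value shown in the first post.
print(clean_field('<p align="justify">As atuais pr&aacute;ticas sociais</p>'))

# Legitimate "<" and ">" characters are lost along with the text between them.
print(clean_field("if a<b and c>d"))

# Escaped symbols survive, because entities are decoded after stripping.
print(clean_field("a &lt; b"))
```

Stripping before decoding is a deliberate choice in this sketch: decoding first would turn an escaped `&lt;` into a real `<`, which could then be removed as part of a tag.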

I've contacted a user who may have a pt_BR translation and hope to hear back from them soon.

Regards,
Alec Smecher
Open Journal Systems Team

Postby ozp » Mon May 28, 2007 9:06 am

Hello asmecher

We are trying to solve this problem.

I'm not a developer, so a friend of mine is doing the job for me.

Many Brazilian OJS sites have this problem (char set bugs).
I don't think they will do anything about this, and even if they did, it would take too long to wait.

We are trying to create a filter to fix the records before they are inserted into the harvester database.

This particular journal still uses OJS 1:
http://146.164.3.26/seer/lab19/ojs/oai/ ... istRecords

It's a pretty good journal (Quality "A"), but I think they don't have the staff resources to keep their system up to date and bug free.

If we succeed in making a filter for those chars, we plan to post it here.

This Brazilian government harvester also has a lot of resources with charset problems:
http://oasisbr.ibict.br/
ozp
 

