Suggestions for PKP Harvester

Open Harvester Systems support questions and answers, bug reports, and development issues.



Post by ramon » Wed Dec 16, 2009 6:42 am

Hello all,

I'm sure I've mentioned this to you during the PKP Conferences I attended, but it's always best to put requests on record.
Our experience at IBICT harvesting theses and dissertations via OAI-PMH has taught us a lot:
  1. Enable a "review" process within PKP Harvester that lets the Harvester administrator and data providers check the quality of the indexed data before the repository is added. Most data providers do not know or care about data quality. Many do not provide correct or complete metadata, or do not use the correct character set. Depending on the database used (mainly Oracle), the XML may not be accepted. This review process would let the administrator exchange messages with the people responsible for the original data to fix issues such as invalid characters, invalid data and strange data mappings (especially when custom software is used instead of PKP's suite). For our BDTD, we have developed control features that include detecting such characters and other types of errors. I don't know the full list of checks, as I don't work directly on that project, but I could eventually provide it.
    • The "review" process could (or should) include a standard review form to check compliance with quality guidelines by field (such as our Capes/Qualis, or ISI).
    • Include registration metadata for the repository (description, policies, owners, contact, access rights, etc.), editable by the owner or the harvester admin, with email verification.
    • Enable harvesting of full text as well, for searching, if the data provider allows it. This should be governed by harvester policies (as in a digital repository: minimum image resolution, file type and size), especially for digital preservation.
  2. Enable registering multiple "users" for each repository, with different roles (e.g. system admin, repository manager or owner), in addition to the PKP Harvester admin (and other roles). These could support future reports of some kind for institutional use.
  3. I would recommend using two separate databases, one for harvesting and one for searching, to reduce the load on the server when users search. This could be very handy, as we could set up the two databases on separate remote servers and sync the data repository by repository, for example.
  4. Search results and the repository itself should support content categories by discipline, institution and repository type (journals, conferences, websites, etc.), as well as quality classification (by administrators and users) and ratings (by users).
  5. Plan for RSS feeds, like the SmartFeeds used in phpBB, as well as a "save search" feature. Repositories and harvesters should be reliable information sources, and should let users come back to their research and use the data for reference, creating new knowledge more easily. You could consider adding a reference manager feature (such as Zotero, the Firefox plugin).
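As an illustration of the kind of automated check described in item 1, here is a minimal Python sketch that flags common problems in a harvested record before it is accepted. It assumes the record has already been parsed into a dict mapping Dublin Core field names to lists of values. The `REQUIRED_FIELDS` policy, the `check_record` and `Issue` names, and the mojibake heuristic are hypothetical; they are not part of PKP Harvester or of the BDTD tools mentioned above, whose actual checks are not listed here.

```python
import re
from dataclasses import dataclass

# Characters NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

# Heuristic (assumption): UTF-8 text decoded as Latin-1 turns "ã" into "Ã£",
# i.e. "Ã" followed by a character in U+0080..U+00BF.
MOJIBAKE = re.compile("\u00c3[\u0080-\u00bf]")

# Hypothetical policy: fields a harvester might require in every record.
REQUIRED_FIELDS = ("title", "creator", "date")


@dataclass
class Issue:
    field: str
    severity: str  # "error" blocks the record; "warning" goes back for review
    message: str


def check_record(record: dict[str, list[str]]) -> list[Issue]:
    """Return quality issues for one record (field name -> list of values)."""
    issues: list[Issue] = []
    # Missing or blank required fields.
    for name in REQUIRED_FIELDS:
        if not any(v.strip() for v in record.get(name, [])):
            issues.append(Issue(name, "error", "required field missing or empty"))
    # Character-level problems in every value.
    for name, values in record.items():
        for value in values:
            bad = INVALID_XML_CHAR.search(value)
            if bad:
                issues.append(Issue(name, "error",
                    f"character U+{ord(bad.group()):04X} is not allowed in XML 1.0"))
            if "\ufffd" in value:
                issues.append(Issue(name, "warning",
                    "replacement character U+FFFD: lossy charset conversion upstream?"))
            if MOJIBAKE.search(value):
                issues.append(Issue(name, "warning",
                    "looks like UTF-8 decoded as Latin-1 (mojibake)"))
    return issues


if __name__ == "__main__":
    record = {
        "title": ["Estudo sobre reposit\u00f3rios\x0b digitais"],  # stray vertical tab
        "creator": ["Silva, Jo\u00c3\u00a3o"],  # "João" mis-decoded
        "date": [""],
    }
    for issue in check_record(record):
        print(issue.severity, issue.field, issue.message)
    # error date required field missing or empty
    # error title character U+000B is not allowed in XML 1.0
    # warning creator looks like UTF-8 decoded as Latin-1 (mojibake)
```

Separating hard errors (which would make the XML unparseable) from warnings (which only suggest a charset problem) fits the review workflow proposed above: errors could block the record, while warnings could be sent back to the data provider for confirmation.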

As I come up with new ideas, I'll add them to the list. Let me know if I'm completely off here...
