
Duplicate raw_field_id for the same record in entries table

Open Harvester Systems support questions and answers, bug reports, and development issues.


Post by ovz » Sun Jun 08, 2008 4:14 pm

Hi All,

I'm trying to build a custom template to get search results in XML. I'm using the following code for harvester/plugins/schemas/dc/summary.tpl

<record>

{foreach from=$entries item=entry key=name name=entries}

<{$name}>{foreach from=$entry item=value}{$value.value|escape|default:""}{/foreach}</{$name}>

{/foreach}
</record>

For now I would just like to print the entire record in XML format. I've just discovered the following set of rows in the entries table:

entry_id  record_id  raw_field_id  value
4723822   20594      1             A DATA PARTITIONING APPROACH TO FREQUENT PATTERN M...
4723823   20594      16            Nguyen, Son Nhu
4723824   20594      16            Not available
4723828   20594      19            2007

Note that there are two rows with the same record_id _AND_ raw_field_id, one with the value 'Nguyen, Son Nhu' and one with the value 'Not available'. As a result I get a node like this:

<creator>Nguyen, Son NhuNot available</creator>

Of course I would like a cleaner node. Does it make sense to write a more sophisticated template script to handle such situations, or should I just remove the offending row from the entries table manually and move on?
ovz
 
Posts: 6
Joined: Thu Jun 05, 2008 10:10 am
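
Before choosing between a template fix and manual cleanup, it may help to see how many records have a repeated field. A minimal sketch, assuming an in-memory SQLite copy of the four rows from the post; the column names come from the table shown above, while the real Harvester database engine and full schema are not shown in this thread and may differ:

```python
import sqlite3

# In-memory stand-in for the entries table (columns as listed in the post).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "entry_id INTEGER, record_id INTEGER, raw_field_id INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (4723822, 20594, 1, "A DATA PARTITIONING APPROACH TO FREQUENT PATTERN M..."),
        (4723823, 20594, 16, "Nguyen, Son Nhu"),
        (4723824, 20594, 16, "Not available"),
        (4723828, 20594, 19, "2007"),
    ],
)

# Fields that occur more than once within the same record.
repeated = conn.execute(
    """
    SELECT record_id, raw_field_id, COUNT(*) AS n
    FROM entries
    GROUP BY record_id, raw_field_id
    HAVING COUNT(*) > 1
    ORDER BY record_id, raw_field_id
    """
).fetchall()

print(repeated)  # [(20594, 16, 2)]
```

A repeated field is not necessarily an error, since a record may legitimately carry several values for one field; the query only shows where to look.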

Re: Duplicate raw_field_id for the same record in entries table

Post by asmecher » Mon Jun 09, 2008 9:54 am

Hi ovz,

Have you looked at the source XML that resulted in this entry? I suspect there were two separate entries for the <creator> field -- if so, you should be able to simply reverse the order of the two Smarty foreach calls, which will lead to...
Code:
<creator>Nguyen, Son Nhu</creator>
<creator>Not available</creator>
Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8470
Joined: Wed Aug 10, 2005 12:56 pm

Re: Duplicate raw_field_id for the same record in entries table

Post by ovz » Wed Jun 11, 2008 7:41 am

Yes, the code that produces two adjacent <creator> nodes looks like this

Code:
{foreach from=$entry item=value}<{$name}>{$value.value|escape|default:""}</{$name}>{/foreach}
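
The difference between the two loop arrangements can be modelled outside Smarty. The following Python sketch is only an illustration: `entries` is a hypothetical stand-in for the template's `$entries` variable, and `xml.sax.saxutils.escape` only approximates Smarty's `escape` modifier.

```python
from xml.sax.saxutils import escape

# Hypothetical stand-in for $entries: field name -> list of value dicts,
# mirroring the {$value.value} access in the template.
entries = {
    "creator": [{"value": "Nguyen, Son Nhu"}, {"value": "Not available"}],
    "date": [{"value": "2007"}],
}


def render_concatenated(entries):
    """Element outside the inner loop: all values land in one node."""
    return [
        "<{0}>{1}</{0}>".format(name, "".join(escape(v["value"]) for v in values))
        for name, values in entries.items()
    ]


def render_repeated(entries):
    """Element inside the inner loop: one node per value."""
    return [
        "<{0}>{1}</{0}>".format(name, escape(v["value"]))
        for name, values in entries.items()
        for v in values
    ]


print(render_concatenated(entries))
print(render_repeated(entries))
```

The first call reproduces the run-together creator node from the original post; the second yields two adjacent creator nodes.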


But I would also like to know what the second creator field in the database is for. As I mentioned in my original post, sometimes there's just one creator field and sometimes there are two, with the second one set to "Not Available".

Re: Duplicate raw_field_id for the same record in entries table

Post by asmecher » Wed Jun 11, 2008 8:49 am

Hi ovz,

You'll have to investigate your data source on that one -- I don't think it's a PKP product (i.e. OJS or OCS). One option to filter this out in the Harvester would be to write a preprocessor plugin to strip out the "Not Available" during the harvest.

Regards,
Alec Smecher
Public Knowledge Project Team
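
The filtering idea can be illustrated independently of the Harvester's plugin interface, which is not shown in this thread. A minimal Python sketch, not a Harvester preprocessor plugin: `PLACEHOLDERS` and `strip_placeholders` are hypothetical names, and the case-insensitive match is an assumption made because the thread spells the value both "Not available" and "Not Available".

```python
# Hypothetical set of placeholder strings to drop, compared in lower case.
PLACEHOLDERS = {"not available"}


def strip_placeholders(values):
    """Return the values with placeholder strings removed, order preserved."""
    return [v for v in values if v.strip().lower() not in PLACEHOLDERS]


print(strip_placeholders(["Nguyen, Son Nhu", "Not available"]))
```

Applied to the creator values of the record above, only the author's name would remain.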

Re: Duplicate raw_field_id for the same record in entries table

Post by ovz » Wed Jun 11, 2008 5:15 pm

OK, now I know what the "second creator" is. In some records the author's e-mail is stored there. It's trivial to distinguish an e-mail address from "Not Available", but I would still like to know: are the "Not Available" records any different from the "one creator" records?

Re: Duplicate raw_field_id for the same record in entries table

Post by asmecher » Wed Jun 11, 2008 8:54 pm

Hi ovz,

There are still some details missing here -- what application are you harvesting data from? It's worth checking the XML that application is serving to see if it's differentiating between the two records somehow. If it's simply serving up two <creator> nodes, with the first containing the name and the second containing the email address, that's out of OAI spec and should be addressed at the source.

Regards,
Alec Smecher
Public Knowledge Project Team
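
One way to inspect what the source is serving is to list the creator elements from a saved copy of the record. A minimal Python sketch, assuming the record uses the standard Dublin Core element namespace; the `<record>` wrapper is hypothetical, and the two creator values are the ones from the entries table earlier in the thread:

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical wrapper element around the two creator values from the thread.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Nguyen, Son Nhu</dc:creator>
  <dc:creator>Not available</dc:creator>
</record>"""


def creator_values(xml_text):
    """Return the text of every dc:creator element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter("{%s}creator" % DC_NS)]


print(creator_values(SAMPLE))  # ['Nguyen, Son Nhu', 'Not available']
```

If the source attached a type or scheme attribute to either element, it would show up on the corresponding `el.attrib`.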

Re: Duplicate raw_field_id for the same record in entries table

Post by ovz » Thu Jun 12, 2008 8:52 am

asmecher wrote: Hi ovz,

There are still some details missing here -- what application are you harvesting data from? It's worth checking the XML that application is serving to see if it's differentiating between the two records somehow. If it's simply serving up two <creator> nodes, with the first containing the name and the second containing the email address, that's out of OAI spec and should be addressed at the source.



I suppose by "application" you mean the archive I'm trying to index. The archive is NDLTD, http://alcme.oclc.org/ndltd/, and the offending record in particular can be found at

http://alcme.oclc.org/srw/search/NDL/Se ... d/XNDL.xsl

And indeed, the XML does contain duplicate <creator> fields:

<dc:creator>Nguyen, Son Nhu</dc:creator>
<dc:creator>Not available</dc:creator>

As there can indeed be several authors for one paper, as in

http://pkp.sfu.ca/harvester2/demo/index ... iew/461345

it seems OK to use several <dc:creator> elements to reflect this. But do I understand you correctly that having "Not available" or an e-mail address in <dc:creator> is against the OAI spec?

Re: Duplicate raw_field_id for the same record in entries table

Post by asmecher » Thu Jun 12, 2008 12:01 pm

Hi ovz,

Sorry, that's my typo -- I meant the DC spec, not the OAI spec. I would expect multiple <creator> nodes to indicate that there were multiple creators, not multiple pieces of information referring to the same creator. See, for example, http://www.ariadne.ac.uk/issue8/canberra-metadata/ for a bit of discussion on shoehorning data into the creator field.

Regards,
Alec Smecher
Public Knowledge Project Team
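
When the source cannot be fixed, the harvested creator values can at least be sorted into names, e-mail addresses, and placeholders before display. A minimal Python sketch under stated assumptions: `classify_creator` is a hypothetical helper, the e-mail pattern is deliberately loose, and anything that is neither an address nor the placeholder is simply treated as a name.

```python
import re

# Deliberately loose pattern: something@something.tld with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_creator(value):
    """Label a creator value as 'placeholder', 'email', or 'name'."""
    text = value.strip()
    if text.lower() == "not available":
        return "placeholder"
    if EMAIL_RE.match(text):
        return "email"
    return "name"


for sample in ["Nguyen, Son Nhu", "Not available", "author@example.org"]:
    print(sample, "->", classify_creator(sample))
```

The address `author@example.org` is made up for the example; no real address appears in the thread.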

Re: Duplicate raw_field_id for the same record in entries table

Post by ovz » Thu Jun 12, 2008 3:13 pm

Too bad this particular archive (and, I suspect, the other archives too) does not make use of the Type or Scheme sub-qualifiers. But the "Not available" creator fields are clearly a glitch in the archive's software.

Thanks for clarifying this for me!

