PKP Support Forum

Exporting contents field of Records table into an outfile

Open Harvester Systems support questions and answers, bug reports, and development issues.

Postby singhkarki » Tue Jun 10, 2014 12:29 am

Hello,
When I export the contents field of the records table into an outfile, I find that the generated file is larger than I expected. Is this because of repeated indexing runs on the PKP database?
Also, when I open that outfile, there are many unrecognizable characters.
Is there a way I can clean the PKP database?

Thanks and best regards,

Vijay
singhkarki
 
Posts: 38
Joined: Fri Nov 29, 2013 3:28 pm

Re: Exporting contents field of Records table into an outfile

Postby asmecher » Tue Jun 10, 2014 7:42 am

Hi Vijay,

I'm not sure, but I think you're referring to UTF-8 characters. If you're using UTF-8, certain characters will take more than 1 byte to store. Make sure when you extract and work with the data that you're using UTF-8 capable software.
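To illustrate the point above, here is a small Python 3 sketch (not part of OHS) using the Devanagari title from a record quoted later in this thread. It shows that the UTF-8 byte count exceeds the character count, and that decoding the same bytes with a non-UTF-8 codec (Latin-1 here, chosen only as an example of "non-UTF-8 capable" software) produces garbled characters of the kind described in the question.

```python
# Sketch: why a UTF-8 export can look larger than expected and show odd characters.
# The sample title comes from a harvested record quoted in this thread.
title = "योग चिकित्सा Yog chikitsa"

char_count = len(title)                  # Unicode code points
byte_count = len(title.encode("utf-8"))  # bytes actually written to the outfile

# Devanagari code points (U+0900..U+097F) take 3 bytes each in UTF-8,
# so the byte count is larger than the character count.
print(f"{char_count} characters, {byte_count} bytes")

# Reading the UTF-8 bytes with a different codec (Latin-1 here) produces mojibake.
garbled = title.encode("utf-8").decode("latin-1")
print(ascii(garbled[:12]))  # ascii() escapes non-printable characters safely

# Decoding with the correct codec recovers the original text.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == title)
```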

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8575
Joined: Wed Aug 10, 2005 12:56 pm

Re: Exporting contents field of Records table into an outfile

Postby singhkarki » Thu Jun 12, 2014 11:26 pm

Hello Sir,
I have been working on Z39.50 functionality for the PKP Harvester, and I would like some help.
The PKP Harvester harvests data from an OAI repository, and the data in the repository is stored in the following format:


<OAI-PMH xsi:schemaLocation="openarchives.org/OAI/2.0/ openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2014-06-06T10:19:32Z</responseDate><request verb="ListRecords" metadataPrefix="oai_dc">http://192.168.8.140/cgi-bin/koha/oai.pl</request><ListRecords><record><header><identifier>CFTRI:2</identifier><datestamp>2013-12-17T15:25:11Z</datestamp></header><metadata><oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>योग चिकित्सा Yog chikitsa</dc:title><dc:creator>
मिश्र (रा) Misra (R)
</dc:creator><dc:type/><dc:publisher>नई दिल्ली यूनिवर्सिटी पुब्लिकेन्स</dc:publisher><dc:date>2008</dc:date><dc:language>eng</dc:language><dc:identifier>http://192.168.8.140:80/cgi-bin/koha/opac-detail.pl?biblionumber=2</dc:identifier><dc:identifier>URN:ISBN:ISBN 978-81-7555-210-4</dc:identifier></oai_dc:dc></metadata></record><record><header><identifier>CFTRI:3</identifier><datestamp>2013-12-17T15:25:11Z</datestamp></header><metadata><oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>विज्ञान के अनन्य पथिक (विदेशी वैज्ञानिक) Vigyan ke ananya pathik (Videshi vaigyanik)</dc:title><dc:creator>
महंती (सु) Mahanti (S)
</dc:creator><dc:type/><dc:publisher>दिल्ली मेधा बुक्स</dc:publisher><dc:date>2009</dc:date><dc:language>eng</dc:language><dc:identifier>http://192.168.8.140:80/cgi-bin/koha/opac-detail.pl?biblionumber=3</dc:identifier><dc:identifier>URN:ISBN:ISBN 978-81-8166-276-6</dc:identifier></oai_dc:dc></metadata></record><record><header><identifier>CFTRI:4</identifier><datestamp>2013-12-17T15:25:11Z</datestamp></header><metadata><oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>विज्ञान के अनन्य पथिक (भारतीय वैज्ञानिक) Vigyan ke ananya pathik (Bharathiya vaigyanik)</dc:title><dc:creator>
महंती (सु) Mahanti (S)
</dc:creator><dc:type/><dc:publisher>दिल्ली मेधा बुक्स</dc:publisher><dc:date>2009</dc:date><dc:language>eng</dc:language><dc:identifier>http://192.168.8.140:80/cgi-bin/koha/opac-detail.pl?biblionumber=4</dc:identifier><dc:identifier>URN:ISBN:ISBN 978-81-8166-276-8</dc:identifier></oai_dc:dc></metadata></record>

Here every record follows the OAI-PMH schema and carries <record>, <header>, <identifier>, <datestamp>, and <metadata> tags.

In the contents field of the records table, however, records are stored as follows:

<oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc=".openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Ciba Foundation Colloquia on Endocrinology,Vol.10</dc:title><dc:creator>
Wolstenholme,G.E.W.
</dc:creator><dc:type>text</dc:type><dc:publisher>J &amp; A Churchill Ltd. London</dc:publisher><dc:date>1957</dc:date><dc:language>und</dc:language><dc:identifier>http://192.168.8.143:83/cgi-bin/koha/opac-detail.pl?biblionumber=1994</dc:identifier></oai_dc:dc>

Here, the OAI tags that accompany every record (<record>, <header>, <identifier>, <datestamp>, <metadata>) are not present.

Now, when I try to transform the oai_dc XML file created from the contents field back into MARCXML, and from there into MARC, using the stylesheet DC2MARC21slim.xsl, it cannot validate the oai_dc file against the schema, because these tags are missing.

Please guide me here.

My aim is to make the PKP database available through the Z39.50 protocol.

Thanks and best regards,

PS: The PKP forum says: "Your message contains too many URLs. The maximum number of URLs allowed is 4."
So I have removed a few http:// and www. prefixes from the original message.
Vijay

Re: Exporting contents field of Records table into an outfile

Postby asmecher » Fri Jun 13, 2014 2:40 pm

Hi Vijay,

The OAI-centric data for each record is stored in the "records" table, outside the Dublin Core XML. See the identifier and datestamp columns, for example.
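To make that concrete: the OAI envelope can be rebuilt outside the database by combining those columns with the stored contents. Below is a minimal Python 3 sketch, assuming you have already read one row's identifier, datestamp, and contents values (for example with SELECT identifier, datestamp, contents FROM records). wrap_record and wrap_list are hypothetical helpers, not part of OHS, and the sample values are illustrative. If datestamp is NULL (as it turns out to be later in this thread), you would need to substitute a value before calling wrap_record. The output is a minimal envelope without responseDate and request, so it is not schema-valid OAI-PMH as-is.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_record(identifier: str, datestamp: str, contents: str) -> str:
    """Hypothetical helper: rebuild an OAI-PMH <record> around stored oai_dc XML.

    identifier and datestamp are values read from the records table columns;
    contents is the oai_dc:dc XML from the contents column, inserted verbatim.
    """
    record = (
        "<record><header>"
        f"<identifier>{escape(identifier)}</identifier>"
        f"<datestamp>{escape(datestamp)}</datestamp>"
        "</header>"
        f"<metadata>{contents}</metadata>"
        "</record>"
    )
    ET.fromstring(record)  # raises ET.ParseError if the result is not well-formed
    return record


def wrap_list(records: list[str]) -> str:
    """Wrap records in a minimal OAI-PMH/ListRecords envelope.

    responseDate and request are omitted, so this is not schema-valid OAI-PMH.
    """
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        + "".join(records)
        + "</ListRecords></OAI-PMH>"
    )


# Illustrative values: identifier and datestamp copied from the Koha output
# quoted in this thread; the oai_dc XML is a trimmed stand-in for a contents value.
contents = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Yog chikitsa</dc:title></oai_dc:dc>"
)
record_xml = wrap_record("CFTRI:2", "2013-12-17T15:25:11Z", contents)
print(wrap_list([record_xml]))
```

The resulting document can then be fed to an XSLT processor; whether DC2MARC21slim.xsl accepts it unchanged depends on which elements that stylesheet matches.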

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Exporting contents field of Records table into an outfile

Postby singhkarki » Mon Jun 16, 2014 4:53 am

Hello Sir,
Thanks for your reply.
I need a little guidance here. Suppose my record in the repository has the following format before harvesting:

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2014-06-06T10:19:32Z</responseDate><request verb="ListRecords" metadataPrefix="oai_dc">http://192.168.8.140/cgi-bin/koha/oai.pl</request><ListRecords><record><header><identifier>CFTRI:2</identifier><datestamp>2013-12-17T15:25:11Z</datestamp></header><metadata><oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>योग चिकित्सा Yog chikitsa</dc:title><dc:creator>
मिश्र (रा) Misra (R)
</dc:creator><dc:type/><dc:publisher>नई दिल्ली यूनिवर्सिटी पुब्लिकेन्स</dc:publisher><dc:date>2008</dc:date><dc:language>eng</dc:language><dc:identifier>http://192.168.8.140:80/cgi-bin/koha/opac-detail.pl?biblionumber=2</dc:identifier><dc:identifier>URN:ISBN:ISBN 978-81-7555-210-4</dc:identifier></oai_dc:dc></metadata></record>


And after harvesting, the same record is stored in the contents field of the records table in the OHS database as follows:

<oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>योग चिकित्सा Yog chikitsa</dc:title><dc:creator>
मिश्र (रा) Misra (R)
</dc:creator><dc:type/><dc:publisher>नई दिल्ली यूनिवर्सिटी पुब्लिकेन्स</dc:publisher><dc:date>2008</dc:date><dc:language>eng</dc:language><dc:identifier>http://192.168.8.140:80/cgi-bin/koha/opac-detail.pl?biblionumber=2</dc:identifier><dc:identifier>URN:ISBN:ISBN 978-81-7555-210-4</dc:identifier></oai_dc:dc>


Sir, do you have an SQL query that, when run on my OHS database, would return the record in the same format in which it was originally stored in the repository?

PS: Surprisingly, the count of datestamp is zero:
mysql> select count(datestamp) from records;
+------------------+
| count(datestamp) |
+------------------+
|                0 |
+------------------+

By contrast, the count of identifier is nonzero:

mysql> select count(identifier) from records;
+-------------------+
| count(identifier) |
+-------------------+
|            369497 |
+-------------------+
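Side note on the zero count: in SQL, COUNT(column) counts only non-NULL values (COUNT(*) counts rows), so a result of 0 means datestamp is NULL in every row rather than that the column is missing. The sketch below shows this with Python's built-in sqlite3 module standing in for MySQL; the simplified two-column table is an assumption, not the real OHS schema. On MySQL the direct check is SELECT COUNT(*) FROM records WHERE datestamp IS NULL;

```python
import sqlite3

# In-memory table standing in for the OHS records table (columns simplified).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (identifier TEXT, datestamp TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("CFTRI:2", None), ("CFTRI:3", None), ("CFTRI:4", None)],
)

# COUNT(column) skips NULLs; COUNT(*) counts every row.
total, with_identifier, with_datestamp = conn.execute(
    "SELECT COUNT(*), COUNT(identifier), COUNT(datestamp) FROM records"
).fetchone()
print(total, with_identifier, with_datestamp)

# Direct check for rows whose datestamp is NULL.
null_datestamps = conn.execute(
    "SELECT COUNT(*) FROM records WHERE datestamp IS NULL"
).fetchone()[0]
print(null_datestamps)
conn.close()
```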


Please guide me.
I would be really grateful.

Thanks and best regards,

Vijay

Re: Exporting contents field of Records table into an outfile

Postby asmecher » Mon Jun 16, 2014 8:31 am

Hi Vijay,

I don't have any SQL on hand that would do that, but I suggest taking a look at the OAI service provided by OHS -- this is separate from the one OHS harvests. The URL should be http://your-domain.com/index.php/oai.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Exporting contents field of Records table into an outfile

Postby singhkarki » Wed Jun 18, 2014 2:59 am

Hello Sir,
Thanks for your reply; it works quite nicely. However, after showing 100 records, the HTML page gives me a resumption token for the next set of 100 OAI records, as follows:

OAI Record: oai::record/281016
OAI Record Header
OAI Identifier oai::record/281016 oai_dc formats
Datestamp 2014-06-18T09:59:46Z
setSpec 35 Identifiers Records

Dublin Core Metadata (oai_dc)
Title Chemistry of Penicillin
Author or Creator Clarke,H.T.
Resource Type text
Publisher Oxford University Press Princeton
Date 1949
Language und
Resource Identifier http://192.168.8.143:83/cgi-bin/koha/op ... number=100

There are more results.
resumptionToken: 8c673840f808a0ba3c18ad55c83e5a26 Resume


For these 100 records I can apply XSLT and get the records in the required format.
But my record count is huge and will grow by several million more, so instead of a resumption token for every 100 records, I would like to get all the records on one page, without a resumption token. How can I achieve that? Please suggest.

PS: After my XSL transformations on the records, I get some 'BAD offsets in data' errors. Do the offsets change when records are harvested from the repository?

Thanks and best regards,
Vijay

Re: Exporting contents field of Records table into an outfile

Postby asmecher » Wed Jun 18, 2014 5:43 am

Hi Vijay,

Exporting all the records at once will certainly lead to a server timeout or an out-of-memory error; that's what resumption tokens were introduced into the OAI-PMH standard to avoid. However, you can increase the limit by specifying an oai_max_records directive in your config.inc.php's [oai] section.
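The client-side alternative is to follow the resumption tokens rather than raise the limit. Below is a minimal Python 3 sketch using only the standard library, assuming the OHS OAI endpoint mentioned earlier in this thread (http://your-domain.com/index.php/oai is the placeholder from that post). list_records and http_fetch are hypothetical helper names, not part of OHS. Each page is handed to the caller before the next token is requested, so memory use is bounded by the page size rather than the total record count.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(url: str) -> bytes:
    """Download one OAI-PMH response page."""
    with urllib.request.urlopen(url, timeout=120) as response:
        return response.read()


def list_records(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    fetch: Callable[[str], bytes] = http_fetch,
) -> Iterator[ET.Element]:
    """Yield every <record> from a ListRecords harvest, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        # Hand each record on to the caller before requesting the next page.
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)

        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken element marks the last page.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Usage: iterate over list_records("http://your-domain.com/index.php/oai") and apply the XSLT to each record as it arrives. The sketch does not inspect OAI-PMH <error> responses (for example badResumptionToken); a production harvester should.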

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Exporting contents field of Records table into an outfile

Postby singhkarki » Wed Jun 25, 2014 5:15 am

Hello Sir, thanks a lot for your invaluable reply; that really helped.
Sir, my search queries in PKP are taking longer than before. I believe this is due to the repeated indexing.
How can I make my search queries faster?

I have created twelve archives in my instance of PKP.
I have a record titled 'Comdex Computer Course Kit (Hindi)' in one of the archives. When I search for this record by its title using the search tab in the breadcrumb, I get the record. But when I search for the same title, 'Comdex Computer Course Kit (Hindi)', using the main search box, the record does not appear in the search results. Which file do I need to change so that the record is retrieved when searching through the main search box?

Thanks and best regards,

Vijay

Re: Exporting contents field of Records table into an outfile

Postby asmecher » Mon Jul 07, 2014 2:18 pm

Hi Vijay,

I'd suggest using the "Debug" option in config.inc.php to turn on the dump of database queries used to generate a page. That way you can compare the queries used for the sidebar search to those used by the main search form.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Exporting contents field of Records table into an outfile

Postby singhkarki » Tue Jul 08, 2014 1:48 am

Hello Sir,
Thanks for your reply.
If I want to see the archives sorted alphabetically, where do I need to make the change?

Thanks and best regards,

Vijay

Re: Exporting contents field of Records table into an outfile

Postby asmecher » Tue Jul 08, 2014 9:00 am

Hi Vijay,

You can specify a sort order in the call to ArchiveDAO::getArchives as the optional 3rd parameter. You can specify "title", "url", or "manager". For the public listing in the Browse area, see pages/browse/BrowseHandler.inc.php in the "index" function.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Exporting contents field of Records table into an outfile

Postby singhkarki » Wed Jul 16, 2014 1:59 am

Hello Sir, thanks for replying.
Sir, the repository (Koha) from which my instance of PKP OHS harvests records sometimes deletes some of its records later. Is there any provision in PKP OHS to find the records that have been deleted in the source repository, so that we can then delete the same records from the harvester too?

Thanks and best regards,


Vijay.

Re: Exporting contents field of Records table into an outfile

Postby asmecher » Wed Jul 16, 2014 11:58 am

Hi Vijay,

Yes, OHS supports record deletion. See plugins/harvesters/oai/OAIHarvester.inc.php in the handleRecordNode function.
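For background: in OAI-PMH, a data provider reports a removed record by returning its header with status="deleted" (and no metadata), which is the signal a harvester can act on. The Python sketch below only illustrates that protocol-level signal; it is not the OHS handleRecordNode code, and split_deleted and the sample response are hypothetical (the sample datestamp for the deleted record is invented for illustration).

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def split_deleted(response_xml: bytes) -> tuple[list[str], list[str]]:
    """Hypothetical helper: separate live and deleted identifiers in a ListRecords page."""
    root = ET.fromstring(response_xml)
    live: list[str] = []
    deleted: list[str] = []
    for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        # OAI-PMH flags a withdrawn record with status="deleted" on its <header>;
        # such records are returned without a <metadata> element.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted


# Trimmed, illustrative ListRecords page in which CFTRI:3 is reported as deleted.
SAMPLE_PAGE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<ListRecords>
  <record>
    <header><identifier>CFTRI:2</identifier><datestamp>2013-12-17T15:25:11Z</datestamp></header>
    <metadata/>
  </record>
  <record>
    <header status="deleted"><identifier>CFTRI:3</identifier><datestamp>2014-07-16T00:00:00Z</datestamp></header>
  </record>
</ListRecords>
</OAI-PMH>"""

live, deleted = split_deleted(SAMPLE_PAGE)
print("live:", live)
print("deleted:", deleted)
```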

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Exporting contents field of Records table into an outfile

Postby singhkarki » Wed Jul 16, 2014 9:28 pm

Hello Sir, thanks for your reply.
Could you please explain how PKP OHS keeps track of records that have been deleted from the data provider repository? Since the records archived in the OHS database are not live records, how does OHS make sure that deleted records are updated?
I have set up a cron job on my PKP system, and it periodically updates the records in different archives. Will the PKP system itself delete the records that have been deleted in the source repository?

Sir, I added a new record in my data provider (the Koha repository) and updated my metadata index in OHS; OHS quickly added the new record to its database. Later, I deleted the same record in my data provider (the Koha repository) and once again updated the metadata index in OHS to see whether the record deleted from the repository was also deleted from OHS.
The record was not deleted from OHS. So where in the handleRecordNode() function in ohs/plugins/harvesters/oai/OAIHarvester.inc.php do I need to make changes to get the desired functionality?

Thanks and best regards,

Vijay
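One thing worth checking, per the OAI-PMH specification: a repository declares its deletion support in the <deletedRecord> element of its Identify response, with the value no, transient, or persistent. A repository that reports no must not reveal a deleted status in any response, so a harvester has nothing to act on. A live check would request verb=Identify from the Koha OAI endpoint quoted earlier in this thread. The minimal Python sketch below only reads that value; deleted_record_policy and the trimmed sample response are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
POLICIES = {"no", "transient", "persistent"}  # values allowed by OAI-PMH 2.0


def deleted_record_policy(identify_xml: bytes) -> str:
    """Return the <deletedRecord> value from an OAI-PMH Identify response."""
    root = ET.fromstring(identify_xml)
    policy = root.findtext(
        "oai:Identify/oai:deletedRecord", default="", namespaces=OAI_NS
    ).strip()
    if policy not in POLICIES:
        raise ValueError(f"unexpected deletedRecord value: {policy!r}")
    return policy


# Trimmed, illustrative Identify response (a real one has more required elements).
SAMPLE_IDENTIFY = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<Identify>
  <repositoryName>Example Koha repository</repositoryName>
  <deletedRecord>no</deletedRecord>
</Identify>
</OAI-PMH>"""

policy = deleted_record_policy(SAMPLE_IDENTIFY)
if policy == "no":
    print("The repository does not report deletions, so a harvester cannot see them.")
else:
    print(f"The repository reports deletions ({policy}): look for status=\"deleted\" headers.")
```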