OJS OCS OMP OHS

You are viewing the PKP Support Forum | PKP Home Wiki



slow query

Open Harvester Systems support questions and answers, bug reports, and development issues.

Moderators: jmacgreg, michael, John

Forum rules
Developer Resources:

Git: You can access our public Git Repository here. Comprehensive Git usage instructions are available on the wiki.

Bugzilla: You can access our Bugzilla report tracker here.

Search: You can use our Google Custom Search to search across our main website, the support forum, and Bugzilla.

Questions and discussion are welcome.

slow query

Postby xmu_chen » Mon Mar 16, 2009 7:18 pm

hi,
we have harvested about 2,000,000 records from several archives with PKP harvester 2.0.1, and the system is running well. recently, the system is running slowly when searching with words. At the begining, it spent less 1 seconds searching a keyword, and now it spends more than 100 seconds.
we have make a slow query report with mysqldumpslow, the most visited SQL statement is following:
Code: Select all
Count: 1093  Time=541.93s (592329s)  Lock=0.19s (205s)  Rows=114.8 (125450), root[root]@localhost
  SELECT
  o.record_id AS record_id,
  MAX(o.raw_field_id) AS raw_field_id,
  COUNT(*) AS count
  FROM
  records r LEFT JOIN archives a ON (r.archive_id = a.archive_id), search_objects o NATURAL JOIN search_object_keywords o0 NATURAL JOIN search_keyword_list k0
  WHERE k0.keyword_text LIKE 'S' AND o.raw_field_id = N AND (r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S' OR r.archive_id = 'S')
  AND r.record_id = o.record_id AND a.enabled = N
  GROUP BY o.record_id
  ORDER BY count DESC
  LIMIT N


and the second most visted SQL statement is following:
Code: Select all
Count: 592  Time=186.29s (110284s)  Lock=3.84s (2274s)  Rows=903586.0 (534922898), root[root]@localhost
  SELECT DISTINCT r.*, o.object_time FROM records r LEFT JOIN search_objects o ON (o.record_id = r.record_id AND (o.raw_field_id = 'S')) WHERE r.archive_id = 'S'


do you have any advise?
xmu_chen
 
Posts: 22
Joined: Tue Oct 30, 2007 6:07 am

Re: slow query

Postby asmecher » Mon Mar 16, 2009 9:28 pm

Hi xmu_chen,

Double-check to make sure that all the appropriate indexes have been created and run an optimize on the database; you should also be able to drop most foreign keys down to a smaller data type (i.e. int4 instead of int8). If you're still suffering from slow queries, I'd suggest moving to a Lucene-backed index rather than a MySQL-backed one. A plugin for each ships with Harvester 2.3.

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8470
Joined: Wed Aug 10, 2005 12:56 pm

Re: slow query

Postby xmu_chen » Wed Mar 18, 2009 12:50 am

hi Smecher,

Could you say more about how to moving to a Lucene-backed index from MySQL-backed one, or do you have any document which i can add the function based on harvester2.0.1 according to?

Thank you !

xmu_chen
xmu_chen
 
Posts: 22
Joined: Tue Oct 30, 2007 6:07 am

Re: slow query

Postby asmecher » Wed Mar 18, 2009 2:21 am

Hi xmu_chen,

It's not possible to easily back-port Lucene indexing to anything older than Harvester 2.3; this was a major rewrite. I'd suggest upgrading if possible. If that's not possible, you should be able to improve performance by optimizing the database as I suggested above -- but for large collections, a MySQL-based index may simply be too slow.

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8470
Joined: Wed Aug 10, 2005 12:56 pm

Re: slow query

Postby xmu_chen » Wed Mar 18, 2009 6:15 pm

hi Smecher,

Thank you for your advise. I will try install harvester 2.3.0 and test it.

xmu_chen
xmu_chen
 
Posts: 22
Joined: Tue Oct 30, 2007 6:07 am


Return to Open Harvester Systems Support and Development

Who is online

Users browsing this forum: No registered users and 3 guests