
Searching across the harvested archives


Post by prcgian » Fri Oct 13, 2006 6:35 am

Hi, how is searching across the harvested archives performed?
What is the algorithm? What is the ranking function?

If I use MySQL, does Harvester2 use MySQL's own search functions (i.e. full-text search)?

Post by asmecher » Fri Oct 13, 2006 6:56 am

Hi prcgian,

Searching and indexing are implemented in classes/search/*.php using an inverted index. The keywords and indexing information are stored in the MySQL tables called search_keyword_list, search_objects, and search_object_keywords.
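To make the inverted-index idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. Only the three table names come from the description above; the column names (keyword_id, keyword_text, object_id, pos) and the helper functions build_index and objects_for_keyword are assumptions made for illustration, not the Harvester's actual schema or code.

```python
import sqlite3


def build_index(documents):
    """Build a tiny inverted index from {object_id: text}.

    Table names follow the forum post; all column names are hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE search_keyword_list (
            keyword_id INTEGER PRIMARY KEY,
            keyword_text TEXT UNIQUE
        );
        CREATE TABLE search_objects (object_id INTEGER PRIMARY KEY);
        CREATE TABLE search_object_keywords (
            object_id INTEGER,
            keyword_id INTEGER,
            pos INTEGER
        );
        """
    )
    for object_id, text in documents.items():
        db.execute("INSERT INTO search_objects (object_id) VALUES (?)", (object_id,))
        for pos, word in enumerate(text.lower().split()):
            # Each distinct keyword is stored once; repeats are ignored.
            db.execute(
                "INSERT OR IGNORE INTO search_keyword_list (keyword_text) VALUES (?)",
                (word,),
            )
            (keyword_id,) = db.execute(
                "SELECT keyword_id FROM search_keyword_list WHERE keyword_text = ?",
                (word,),
            ).fetchone()
            # One row per occurrence links a keyword to the object containing it.
            db.execute(
                "INSERT INTO search_object_keywords VALUES (?, ?, ?)",
                (object_id, keyword_id, pos),
            )
    return db


def objects_for_keyword(db, keyword):
    """Return the sorted ids of all objects that contain the keyword."""
    rows = db.execute(
        """
        SELECT DISTINCT ok.object_id
        FROM search_object_keywords AS ok
        JOIN search_keyword_list AS kl ON kl.keyword_id = ok.keyword_id
        WHERE kl.keyword_text = ?
        ORDER BY ok.object_id
        """,
        (keyword.lower(),),
    )
    return [row[0] for row in rows]
```

The point of the layout is that a lookup starts from the keyword and follows the link table to the objects, so a search never has to scan the harvested records themselves.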

The algorithm works very roughly as follows: the search string is split into keywords (with a quoted phrase being treated as a single "keyword"). The numbers of results for each keyword (with a maximum number as defined in the configuration file) are added and the final ranking is calculated based on those totals.

The search algorithm itself can be found in classes/search/Search.inc.php in the retrieveResults function.
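The rough description above can be sketched in a few lines of Python. This is an illustration of the general idea only, not a port of the PHP code: the function names, the in-memory dictionary of documents, the max_results_per_keyword parameter, and the reading of "maximum number" as a per-keyword cap on the best matches are all assumptions.

```python
import re
from collections import Counter


def split_query(query):
    """Split a search string into keywords; a quoted phrase is one keyword."""
    parts = re.findall(r'"([^"]+)"|(\S+)', query)
    return [(phrase or word).lower() for phrase, word in parts]


def count_matches(text, keyword):
    """Count whole-word occurrences of a keyword or phrase in the text."""
    words = text.lower().split()
    target = keyword.split()
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def retrieve_results(documents, query, max_results_per_keyword=500):
    """Rank object ids by the summed match counts of all keywords.

    `documents` maps object id -> text. The cap stands in for the limit
    read from the configuration file (assumed semantics).
    """
    totals = Counter()
    for keyword in split_query(query):
        hits = {}
        for object_id, text in documents.items():
            count = count_matches(text, keyword)
            if count > 0:
                hits[object_id] = count
        # Keep only the strongest matches for this keyword.
        best = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
        for object_id, count in best[:max_results_per_keyword]:
            totals[object_id] += count
    # Highest total first; ties are broken by object id.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

For example, with documents {1: "open archives harvester open", 2: "open access journals"} and the query open harvester, object 1 collects three matches and object 2 one, so object 1 ranks first.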

Alec Smecher
Public Knowledge Project Team


Post by prcgian » Fri Oct 13, 2006 10:50 am

Thank you, your information is very useful!
