Searching across the harvested archives

Open Harvester Systems support questions and answers, bug reports, and development issues.

Posts: 3
Joined: Fri Oct 13, 2006 5:10 am

Post by prcgian » Fri Oct 13, 2006 6:35 am

Hi, how is searching across the harvested archives performed?
What is the algorithm? What is the ranking function?

If I use MySQL, does Harvester2 use MySQL's own search functions (i.e. full-text search)?

Posts: 10015
Joined: Wed Aug 10, 2005 12:56 pm

Post by asmecher » Fri Oct 13, 2006 6:56 am

Hi prcgian,

Searching and indexing are implemented in classes/search/*.php using an inverted index. The keywords and indexing information are stored in the MySQL tables called search_keyword_list, search_objects, and search_object_keywords.
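To make the inverted-index idea concrete, here is a minimal in-memory sketch in Python. It is not the Harvester's PHP code. The three dictionaries only loosely mirror the three tables named above, and their layout (ids, positions, the `InvertedIndex` class and its method names) is an assumption made for illustration.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy stand-in for the three search tables; the layout is assumed."""

    def __init__(self):
        # Plays the role of search_keyword_list: keyword text -> keyword id
        self.keyword_ids = {}
        # Plays the role of search_objects: object id -> harvested record id
        self.objects = {}
        # Plays the role of search_object_keywords:
        # keyword id -> list of (object id, word position)
        self.postings = defaultdict(list)

    def add(self, record_id, text):
        """Index one piece of text and return the new object id."""
        object_id = len(self.objects) + 1
        self.objects[object_id] = record_id
        for position, word in enumerate(text.lower().split()):
            # Reuse the keyword id if the word is known, else assign the next one
            keyword_id = self.keyword_ids.setdefault(word, len(self.keyword_ids) + 1)
            self.postings[keyword_id].append((object_id, position))
        return object_id

    def lookup(self, word):
        """Return the (object id, position) pairs for a single keyword."""
        keyword_id = self.keyword_ids.get(word.lower())
        if keyword_id is None:
            return []
        return self.postings[keyword_id]


index = InvertedIndex()
index.add("record-1", "open archives harvesting")
index.add("record-2", "harvesting metadata from open archives")
print(index.lookup("archives"))  # → [(1, 1), (2, 4)]
```

The point of the structure is that a search only has to read the postings for the keywords in the query, rather than scanning every harvested record.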

The algorithm works very roughly as follows: the search string is split into keywords (with a quoted phrase being treated as a single "keyword"). The numbers of results for each keyword (with a maximum number as defined in the configuration file) are added and the final ranking is calculated based on those totals.
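The steps above can be sketched in Python as follows. This is one reading of the rough description, not the actual retrieveResults code: the names `split_query`, `count_matches`, `search` and `MAX_RESULTS_PER_KEYWORD` are hypothetical, the limit value is made up, and plain substring counting stands in for a real index lookup.

```python
import re
from collections import Counter

# Hypothetical stand-in for the per-keyword limit set in the configuration file
MAX_RESULTS_PER_KEYWORD = 500


def split_query(query):
    """Split a search string into keywords; a quoted phrase stays one keyword."""
    parts = re.findall(r'"([^"]+)"|(\S+)', query)
    return [(phrase or word).lower() for phrase, word in parts]


def count_matches(keyword, documents):
    """Return {doc_id: occurrences} using substring counts (a simplification)."""
    counts = {}
    for doc_id, text in documents.items():
        n = text.lower().count(keyword)
        if n:
            counts[doc_id] = n
    return counts


def search(query, documents, limit=MAX_RESULTS_PER_KEYWORD):
    """Add up per-keyword counts for each document and rank by the totals."""
    totals = Counter()
    for keyword in split_query(query):
        matches = count_matches(keyword, documents)
        # Keep at most `limit` matches per keyword, highest counts first
        top = sorted(matches.items(), key=lambda item: (-item[1], item[0]))[:limit]
        for doc_id, n in top:
            totals[doc_id] += n
    # Highest total first; ties are broken by document id
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


docs = {
    1: "Open archives",
    2: "A digital library of open archives",
    3: "Digital preservation",
}
print(search('open "digital library"', docs))  # → [(2, 2), (1, 1)]
```

Document 2 ranks first because it matches both the keyword and the quoted phrase, so its counts add up to a higher total.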

The search algorithm itself can be found in classes/search/ in the retrieveResults function.

Alec Smecher
Public Knowledge Project Team


Post by prcgian » Fri Oct 13, 2006 10:50 am

Thank you, your information is very useful!
