Thanks for your detailed reply! I used to use PDF creation tools a while ago, and I do recall that their quality and their capacity can vary widely. One tool that my old institution used in their PDF creation workflow was ABBYY Reader -- http://www.abbyy.com/. It's definitely not free, but I do believe they had success running old PDFs through it to generate full text for indexing. If this sounds like it might be something that could help you, I can ask them to provide more detail.
Regarding your other points: that's a good observation that this is an issue with metadata not being searchable. I had included that other thread not to point out the possibility of non-valid spaces being the culprit; rather, to point out some useful checkpoints for eftekharb to check against:
1. check that your browser is submitting forms in UTF-8 (although this usually isn't a problem)
2. check that your config.inc.php settings are using utf8 as the connection_charset
3. check that your database collation is set to UTF-8 (utf8_general_ci or utf8_unicode_ci)
4. check that all of your tables are set to the same collation as the database
5. check that all of your columns in the tables are likewise set to the same collation
6. check the article_search_keyword_list table and perhaps search the keyword_text column to see if the string you're looking for is there
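If it helps, checks 3 to 5 can be done in one pass rather than table by table. The following is only a rough sketch, assuming a MySQL database: the two queries read the standard information_schema tables, while the function name, the helper logic, and the sample rows are made up for illustration. You would run the queries with whatever MySQL client you prefer, passing your own database name, and feed the resulting rows to the function.

```python
# Queries against MySQL's information_schema; %s is a placeholder for
# your database name (an assumption: fill it in via your client library).
TABLE_COLLATIONS_SQL = (
    "SELECT TABLE_NAME, NULL, TABLE_COLLATION "
    "FROM information_schema.TABLES WHERE TABLE_SCHEMA = %s"
)
COLUMN_COLLATIONS_SQL = (
    "SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME "
    "FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = %s"
)


def find_collation_mismatches(expected, rows):
    """Return the (table, column, collation) rows that differ from `expected`.

    `rows` is any iterable of 3-tuples as produced by the queries above.
    A column of None means the row describes the table itself.
    """
    mismatches = []
    for table, column, collation in rows:
        # Non-text columns (integers, dates, ...) report no collation.
        if collation is None:
            continue
        if collation != expected:
            mismatches.append((table, column, collation))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample rows, standing in for real query results.
    sample = [
        ("article_search_keyword_list", None, "utf8_general_ci"),
        ("article_search_keyword_list", "keyword_text", "latin1_swedish_ci"),
        ("article_search_keyword_list", "keyword_id", None),
    ]
    for row in find_collation_mismatches("utf8_general_ci", sample):
        print(row)
```

Anything this prints is a table or column whose collation differs from the one you expect, which would be the first place to look.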
My apologies for not making it clear why I included that link. Since non-full-text search is working for you (correct?) but not for eftekharb, my inclination is that eftekharb should check through that list first, and then get back to this thread with results.