We are moving to Git Issues for bug tracking in future releases. During transition, content will be in both tools. If you'd like to file a new bug, please create an issue.

Bug 5256 - Reduce include path to a single entry and change all import statements
Reduce include path to a single entry and change all import statements
Status: RESOLVED FIXED
Product: PKP General
Classification: Unclassified
Component: General
unspecified
PC Windows XP
: P3 normal
Assigned To: jerico
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-03-24 17:50 PDT by jerico
Modified: 2010-05-03 04:55 PDT (History)
2 users (show)

See Also:
Version Reported In:
Also Affects:


Attachments
A list of import statements that need to be corrected (semi-)manually (27.73 KB, text/plain)
2010-03-25 07:50 PDT, jerico
Details
Contains an analysis of all include and require statements in the application (13.79 KB, text/plain)
2010-03-25 07:58 PDT, jerico
Details
the final script that was applied to the apps (4.27 KB, text/plain)
2010-05-03 04:51 PDT, jerico
Details

Note You need to log in before you can comment on or make changes to this bug.
Description jerico 2010-03-24 17:50:41 PDT
Profiling shows that the long include path may be the principal factor contributing to the increased execution times from OJS 2.2.x to OJS 2.3.

I'll try to reduce the include path to a single entry and change all import, include* and require* statements accordingly to absolute paths. Then I'll check application performance again. Only when the expected improvements occur (>25% execution time reduction) and I've tested the change for a while locally will this refactoring be made permanent.
Comment 1 jerico 2010-03-25 07:50:10 PDT
Created attachment 3061 [details]
A list of import statements that need to be corrected (semi-)manually
Comment 2 jerico 2010-03-25 07:58:57 PDT
Created attachment 3062 [details]
Contains an analysis of all include and require statements in the application
Comment 3 jerico 2010-03-25 08:00:32 PDT
I'm documenting my approach here for review.

Command that extracts import names from file name:
 find * -type f -name '*.inc.php' | sed ';s%^lib/pkp/lib/smarty/%%;s%^lib/pkp/lib/pqp/classes/%%;s%^lib/pkp/lib/phputf8/%%;s%^lib/pkp/lib/adodb/%%;s%^lib/pkp/pages/%%;s%^lib/pkp/includes/%%;s%^lib/pkp/%%;s%lib/pkp/classes/%%;s%^classes/%%;s%^pages/%%;s%/%.%g;s%.inc.php$%%' >../import-names.txt

Make sure that we do not have duplicates:
 sort -u ../import-names.txt >../import-names-unique.txt
 sort ../import-names.txt >../import-names-sorted.txt
 diff ../import-names-unique.txt ../import-names-sorted.txt
 rm ../import-names*
see bug #5275: found duplicate class AuthSource

Command that maps file names to import names:
 find * -type f -name '*.inc.php' | sed -n 'h;s%^lib/pkp/lib/smarty/%%;s%^lib/pkp/lib/pqp/classes/%%;s%^lib/pkp/lib/phputf8/%%;s%^lib/pkp/lib/adodb/%%;s%^lib/pkp/pages/%%;s%^lib/pkp/includes/%%;s%^lib/pkp/%%;s%^lib/pkp/classes/%%;s%^classes/%%;s%^pages/%%;x;H;x;s/\n/ : /;s%/%.%g;s%.inc.php%%gp' >../import-names.txt
 
Remove duplicates discovered earlier
 grep -v ' classes.security.AuthSource$' ../import-names.txt >../import-names-unique.txt
 mv ../import-names-unique.txt ../import-names.txt
 
Explore statements containing the old class names:
 grep -o '^[^ ]\+' ../import-names.txt | sed -r $'s/\./\\\\./g;s/(^|$)/["\']/g' >../import-names-old.txt
 find * -type f -name '*.php' -exec grep -Hnif ../import-names-old.txt '{}' \; >../import-names-occurrences.txt
 rm ../import-names-old.txt
 
Find all import/require/include statements for comparison:
 find * -type f -name '*.php' -exec egrep -Hni '(^|[[:space:]])import[[:space:]]*\(' '{}' \; >../all-imports.txt
 find * -type f -name '*.php' -exec egrep -Hni $'(^|[[:space:]])(include(_once)?|require(_once)?)[^;]+;' '{}' \; >../all-include-and-require.txt
 
Remove cache and lib entries from lists:
 sed -i '/^cache/d;\%^lib/pkp/lib%d' ../import-names-occurrences.txt ../all-imports.txt ../all-include-and-require.txt

Which imports did we miss?
 diff ../import-names-occurrences.txt ../all-imports.txt >../unmatched-imports.txt

Reasons for unmatched imports:
* DAO configuration in Application needs to be updated
* Router names need to be updated in PKPApplication
* One unknown class (XCacheCache, see bug #5260)
* Dynamic imports, i.e. import($...);
* A few classes are imported with an absolute file name already, e.g. import('pages....'); -> check manually whether path is complete
* Some erroneous imports, see bug #5261
* a few false positives
* see attachment unmatched-imports.txt for the complete results

Refactoring approach:
* automatically correct all matched imports using sed
* (semi-)manually address unmatched import statements (see attachment 'unmatched-imports.txt' and results above)
* correct include and require statements (semi-)manually (see attachment 'all-include-and-require.txt')
Comment 4 jerico 2010-03-25 08:02:56 PDT
We might have to leave libs like adodb and smarty in the include path if they use non-local includes. We'll have to check that for all libraries separately.
Comment 5 Alec Smecher 2010-03-26 09:45:16 PDT
Florian, looks good. A few comments:
- Bug #5275 does not (currently) exist -- suspect it's the wrong bug number
- We'll probably have some areas where there isn't a 1:1 correspondence between the filename and the class name, and a few where the filename doesn't match up against the classname. Have a look at classes/rt/RTStruct in the PKP library.

Are you planning to replace the import() function entirely with require_once? I'd suggest doing so, since we're fully qualifying names.

There are also a number of places where we could reduce the amount of code brought in for every request by changing the import strategy from one where imports occur at the top of the file, to one where imports happen where the other class is needed. Example: specific form validators. But that can wait for another bugzilla entry.
Comment 6 jerico 2010-03-26 11:10:52 PDT
Yes sorry, it was bug #5257 which you already fixed.
Comment 7 jerico 2010-03-26 14:02:02 PDT
Thanks a lot for your feedback!

> We'll probably have some areas where there isn't a 1:1 correspondence between the filename and the class name

All functions I need to refactor (import, include*, require*) make no assumptions about classes. So this shouldn't be a problem.

> Are you planning to replace the import() function entirely with require_once?

I didn't plan that. I found it actually quite nice to have a place to intercept file inclusion. I'm using it for unit tests to insert mock implementations of classes that rely on static method calls. The overhead of the import() function is so small according to the profiles I have seen that I would just leave it.

> Example: specific form validators. But that can wait for another bugzilla entry.

Yes, I've already remarked that specific case when I did the validator re-factoring. What's especially bad in that case is that a parent class imports subclasses. I just didn't include it in my re-factoring as there are so very many places where validators need to be included. An alternative approach would be a classloader. But I don't think that the performance or maintenance benefit of this change is worth our attention currently.

If you agree with my analysis then I'd start implementing this re-factoring and do some testing with it. I think together with the HookRegistry cache we're currently discussing this could already make up the 25% overall performance improvement that I think is possible without too much effort. From my profiles I cannot see any other such low-hanging fruit. But I might be mistaken.
Comment 8 Alec Smecher 2010-03-26 14:19:06 PDT
Florian, that all sounds reasonable to me. Hold off on the import refactoring (at least as far as they affect the applications) until we get OJS 2.3.2 and OCS 2.3.1 out, but as soon as that's done, this will be a great improvement.
Comment 9 jerico 2010-03-26 14:39:07 PDT
ok, I'll just do the refactoring locally so that I can see whether it holds what it promises. Then we'll have all the commands ready to do this quickly in all applications when we're free to commit larger changes.
Comment 10 jerico 2010-03-26 18:28:25 PDT
More sed magic, this time to actually edit all files that contain import statements...

Manually inspect the entries in import-names-occurrences.txt that are not import statements and delete false positives, e.g. comments, translation ids, etc. (don't delete Router and DAO entries):
 vim ../import-names-occurrences.txt
 /\v^(.*import)@!.*$

Create sed statements to substitute old import for new import:
 sed -r $'h;s/.* : //;s/^/\'/;s/$/\'%/;x;s/ : .*//;s/\\./\\\\./g;s/^/s%[\'"]/;s/$/[\'"]%/;G;s/\\n//' ../import-names.txt >../translate-old-new.sed

Create sed statements for editing the import statements:
 sed -r 's/([^:]+):([0-9]+):(.+)$/sed -i $:\2 s%^.*$%\3%: \1/' ../import-names-occurrences.txt | sed -rf ../translate-old-new.sed | sed $'s/\'/\\\\\'/g;s/:/\'/g' >../import-refactoring.sh

Make sure that we've got all occurrences captured:
 wc -l ../import-names-occurrences.txt
 grep '^sed' ../import-refactoring.sh | wc -l
 
If there is a difference inspect with:
 grep -v '^sed' ../import-refactoring.sh
and return to edit import-names-occurrences.txt

Inspect generated script:
 less ../import-refactoring.sh
 
Execute generated script:
 chmod u+x ../import-refactoring.sh
 bash -x ../import-refactoring.sh
 
Check the result:
 git diff
Comment 11 jerico 2010-03-26 19:27:01 PDT
Check partially qualified import statements in unmatched-imports.txt...

Manually copy partially qualified import statements to a new file: partially-qualified-imports.txt

Delete all lines that already contain fully qualified import statements:
 sed -r $'s/.* : //;s/\./\\\\\./g;s%(^|$)%/%g;s/$/d/' ../import-names.txt >../mark-fully-qualified.sed
 sed -i -f ../mark-fully-qualified.sed ../partially-qualified-imports.txt

Replace remaining import statements in file: 
 sed -r $'s/^..//;s/([^:]+):([0-9]+):([^\']+)\'([^\']+)\'(.+)/sed -i $:\\2 s%^.*$%\\3\'lib.pkp.\\4\'\\5%: \\1/;s/\'/\\\\\'/g;s/:/\'/g' ../partially-qualified-imports.txt >../import-refactoring.sh
 bash -x ../import-refactoring.sh
Comment 12 jerico 2010-03-26 21:24:22 PDT
I've executed this change locally and on my Windows machine I get about 30%-40% faster execution time overall! :-)

BUT: This is not representative because I've got exceptionally high costs for file system accesses:
1) I'm on NTFS
2) I'm on an encrypted drive
3) I've got weak hardware

Anyway, promising results so far. OJS 2.3.1 is now only little slower only than OJS 2.2.4 on my machine.

I've uploaded this change to my personal git repository (ojs and pkp-lib, branch "performance"). I'd be very happy if some of you could benchmark this change on other architectures (above all: Linux).
Comment 13 jerico 2010-04-12 07:36:26 PDT
We also have to change local includes (e.g. plug-in wrappers) from non-qualified paths to qualified paths, e.g.

require_once('./CoinsPlugin.inc.php');
Comment 14 jerico 2010-05-03 04:51:24 PDT
Created attachment 3088 [details]
the final script that was applied to the apps