PKP Bugzilla – Bug 5256
Reduce include path to a single entry and change all import statements
Last modified: 2010-05-03 04:55:55 PDT
Profiling shows that the long include path may be the principal factor contributing to the increased execution times from OJS 2.2.x to OJS 2.3. I'll try to reduce the include path to a single entry and change all import, include* and require* statements accordingly to absolute paths. Then I'll check application performance again. Only when the expected improvements occur (>25% execution time reduction) and I've tested the change for a while locally will this refactoring be made permanent.
Created attachment 3061 [details] A list of import statements that need to be corrected (semi-)manually
Created attachment 3062 [details] Contains an analysis of all include and require statements in the application
I'm documenting my approach here for review. Command that extracts import names from file name: find * -type f -name '*.inc.php' | sed ';s%^lib/pkp/lib/smarty/%%;s%^lib/pkp/lib/pqp/classes/%%;s%^lib/pkp/lib/phputf8/%%;s%^lib/pkp/lib/adodb/%%;s%^lib/pkp/pages/%%;s%^lib/pkp/includes/%%;s%^lib/pkp/%%;s%lib/pkp/classes/%%;s%^classes/%%;s%^pages/%%;s%/%.%g;s%.inc.php$%%' >../import-names.txt Make sure that we do not have duplicates: sort -u ../import-names.txt >../import-names-unique.txt sort ../import-names.txt >../import-names-sorted.txt diff ../import-names-unique.txt ../import-names-sorted.txt rm ../import-names* see bug #5275: found duplicate class AuthSource Command that maps file names to import names: find * -type f -name '*.inc.php' | sed -n 'h;s%^lib/pkp/lib/smarty/%%;s%^lib/pkp/lib/pqp/classes/%%;s%^lib/pkp/lib/phputf8/%%;s%^lib/pkp/lib/adodb/%%;s%^lib/pkp/pages/%%;s%^lib/pkp/includes/%%;s%^lib/pkp/%%;s%^lib/pkp/classes/%%;s%^classes/%%;s%^pages/%%;x;H;x;s/\n/ : /;s%/%.%g;s%.inc.php%%gp' >../import-names.txt Remove duplicates discovered earlier grep -v ' classes.security.AuthSource$' ../import-names.txt >../import-names-unique.txt mv ../import-names-unique.txt ../import-names.txt Explore statements containing the old class names: grep -o '^[^ ]\+' ../import-names.txt | sed -r $'s/\./\\\\./g;s/(^|$)/["\']/g' >../import-names-old.txt find * -type f -name '*.php' -exec grep -Hnif ../import-names-old.txt '{}' \; >../import-names-occurrences.txt rm ../import-names-old.txt Find all import/require/include statements for comparison: find * -type f -name '*.php' -exec egrep -Hni '(^|[[:space:]])import[[:space:]]*\(' '{}' \; >../all-imports.txt find * -type f -name '*.php' -exec egrep -Hni $'(^|[[:space:]])(include(_once)?|require(_once)?)[^;]+;' '{}' \; >../all-include-and-require.txt Remove cache and lib entries from lists: sed -i '/^cache/d;\%^lib/pkp/lib%d' ../import-names-occurrences.txt ../all-imports.txt ../all-include-and-require.txt Which imports did we miss? diff ../import-names-occurrences.txt ../all-imports.txt >../unmatched-imports.txt Reasons for unmatched imports: * DAO configuration in Application needs to be updated * Router names need to be updated in PKPApplication * One unknown class (XCacheCache, see bug #5260) * Dynamic imports, i.e. import($...); * A few classes are imported with an absolute file name already, e.g. import('pages....'); -> check manually whether path is complete * Some erroneous imports, see bug #5261 * a few false positives * see attachment unmatched-imports.txt for the complete results Refactoring approach: * automatically correct all matched imports using sed * (semi-)manually address unmatched import statements (see attachment 'unmatched-imports.txt' and results above) * correct include and require statements (semi-)manually (see attachment 'all-include-and-require.txt')
We might have to leave libs like adodb and smarty in the include path if they use non-local includes. We'll have to check that for all libraries separately.
Florian, looks good. A few comments: - Bug #5275 does not (currently) exist -- suspect it's the wrong bug number - We'll probably have some areas where there isn't a 1:1 correspondence between the filename and the class name, and a few where the filename doesn't match up against the classname. Have a look at classes/rt/RTStruct in the PKP library. Are you planning to replace the import() function entirely with require_once? I'd suggest doing so, since we're fully qualifying names. There are also a number of places where we could reduce the amount of code brought in for every request by changing the import strategy from one where imports occur at the top of the file, to one where imports happen where the other class is needed. Example: specific form validators. But that can wait for another bugzilla entry.
Yes sorry, it was bug #5257 which you already fixed.
Thanks a lot for your feedback! > We'll probably have some areas where there isn't a 1:1 correspondence between the filename and the class name All functions I need to refactor (import, include*, require*) make no assumptions about classes. So this shouldn't be a problem. > Are you planning to replace the import() function entirely with require_once? I didn't plan that. I found it actually quite nice to have a place to intercept file inclusion. I'm using it for unit tests to insert mock implementations of classes that rely on static method calls. The overhead of the import() function is so small according to the profiles I have seen that I would just leave it. > Example: specific form validators. But that can wait for another bugzilla entry. Yes, I've already remarked that specific case when I did the validator re-factoring. What's especially bad in that case is that a parent class imports subclasses. I just didn't include it in my re-factoring as there are so very many places where validators need to be included. An alternative approach would be a classloader. But I don't think that the performance or maintenance benefit of this change is worth our attention currently. If you agree with my analysis then I'd start implementing this re-factoring and do some testing with it. I think together with the HookRegistry cache we're currently discussing this could already make up the 25% overall performance improvement that I think is possible without too much effort. From my profiles I cannot see any other such low-hanging fruit. But I might be mistaken.
Florian, that all sounds reasonable to me. Hold off on the import refactoring (at least as far as they affect the applications) until we get OJS 2.3.2 and OCS 2.3.1 out, but as soon as that's done, this will be a great improvement.
ok, I'll just do the refactoring locally so that I can see whether it holds what it promises. Then we'll have all the commands ready to do this quickly in all applications when we're free to commit larger changes.
More sed magic, this time to actually edit all files that contain import statements... Manually inspect the entries in import-names-occurrences.txt that are not import statements and delete false positives, e.g. comments, translation ids, etc. (don't delete Router and DAO entries): vim ../import-names-occurrences.txt /\v^(.*import)@!.*$ Create sed statements to substitute old import for new import: sed -r $'h;s/.* : //;s/^/\'/;s/$/\'%/;x;s/ : .*//;s/\\./\\\\./g;s/^/s%[\'"]/;s/$/[\'"]%/;G;s/\\n//' ../import-names.txt >../translate-old-new.sed Create sed statements for editing the import statements: sed -r 's/([^:]+):([0-9]+):(.+)$/sed -i $:\2 s%^.*$%\3%: \1/' ../import-names-occurrences.txt | sed -rf ../translate-old-new.sed | sed $'s/\'/\\\\\'/g;s/:/\'/g' >../import-refactoring.sh Make sure that we've got all occurrences captured: wc -l ../import-names-occurrences.txt grep '^sed' ../import-refactoring.sh | wc -l If there is a difference inspect with: grep -v '^sed' ../import-refactoring.sh and return to edit import-names-occurrences.txt Inspect generated script: less ../import-refactoring.sh Execute generated script: chmod u+x ../import-refactoring.sh bash -x ../import-refactoring.sh Check the result: git diff
Check partially qualified import statements in unmatched-imports.txt... Manually copy partially qualified import statements to a new file: partially-qualified-imports.txt Delete all lines that already contain fully qualified import statements: sed -r $'s/.* : //;s/\./\\\\\./g;s%(^|$)%/%g;s/$/d/' ../import-names.txt >../mark-fully-qualified.sed sed -i -f ../mark-fully-qualified.sed ../partially-qualified-imports.txt Replace remaining import statements in file: sed -r $'s/^..//;s/([^:]+):([0-9]+):([^\']+)\'([^\']+)\'(.+)/sed -i $:\\2 s%^.*$%\\3\'lib.pkp.\\4\'\\5%: \\1/;s/\'/\\\\\'/g;s/:/\'/g' ../partially-qualified-imports.txt >../import-refactoring.sh bash -x ../import-refactoring.sh
I've executed this change locally and on my Windows machine I get about 30%-40% faster execution time overall! :-) BUT: This is not representative because I've got exceptionally high costs for file system accesses: 1) I'm on NTFS 2) I'm on an encrypted drive 3) I've got weak hardware Anyway, promising results so far. OJS 2.3.1 is now only little slower only than OJS 2.2.4 on my machine. I've uploaded this change to my personal git repository (ojs and pkp-lib, branch "performance"). I'd be very happy if some of you could benchmark this change on other architectures (above all: Linux).
We also have to change local includes (e.g. plug-in wrappers) from non-qualified paths to qualified paths, e.g. require_once('./CoinsPlugin.inc.php');
Created attachment 3088 [details] the final script that was applied to the apps
Committed, see: http://github.com/pkp/pkp-lib/commit/3ed6257dc05c0176fde26b7914ed1965fe2bc79a http://github.com/pkp/pkp-lib/commit/7d80daab3146db4a3b7eccee2042c2702ad590fa http://github.com/pkp/harvester/commit/6dbf577e829ece8eb57e815d923a6acf119fefad http://github.com/pkp/ocs/commit/d96690dda93807a047ff6585c056c8a834360285 http://github.com/pkp/ojs/commit/f09032187e2866ec5173123827c5c523b22edd81 http://github.com/pkp/omp/commit/c44935d9d6a9724ddb1da7edd319b1bebdd10e5b