Proxy problems with Harvester?

Postby fredriley » Tue Sep 25, 2007 8:03 am

Sorry, it looks like I'm monopolising this forum. I have RTFM, honestly. I've been able to install Harvester on my account on our Unix system (http://www.nottingham.ac.uk/~ntzfr/test ... ter-2.0.1/), but now I find that I can't connect to any OAI archives. I've just now tried the following OAI services which I know for sure work, because I've been able to harvest from them in the Harvester installation on my Macbook at home:

http://www.nla.gov.au/apps/oaicat/servlet/OAIHandler
http://www.rlo-cetl.ac.uk:8080/test/IntraLibrary-OAI

In both cases, I go into Add Archive, add the URL to the OAI Base URL field, press Fetch Archive Metadata, twiddle my thumbs for a couple of minutes, then get no result - no metadata, no error message, nowt, just the same Add Archive form on screen. If I fill in the data and press Save, more thumb-twiddling, then I get the error message "DB Error: Lost connection to MySQL server during query". The database is accessible and editable via MySQL GUI tools. All permissions are set as per the README, with all of /cache and /public set to 777 - all I've not yet done is to create a directory to store uploaded files. This problem occurs on two different PCs running Firefox, which can get any other site fine - I even tried IE with, unsurprisingly, the same result.

This problem might be down to our university using a proxy script (http://wwwcache.nottingham.ac.uk/proxy.pac) for web traffic as I've had similar thumb-twiddling problems with other applications (BOINC, for instance). Might this be the case, and if so can I tweak the Harvester config to use the proxy? I've searched for "proxy" in this forum and TFM without result. If it's not a proxy, can anyone suggest a possible cause? I'm close to committing some serious mouse and keyboard abuse here... ;-\ - it was so easy to set up the software on my home Mac with OS X, but I've now spent hours on this installation and feel like Zeno's tortoise, getting nowhere very very slowly :(

Cheers

Fred

Is there a FAQ for Harvester anywhere? The
fredriley
 
Posts: 27
Joined: Fri Sep 14, 2007 10:47 am

Re: Proxy problems with Harvester?

Postby asmecher » Tue Sep 25, 2007 8:56 am

Hi Fred,

I suspect you're dealing with two problems; an HTTP proxy shouldn't be causing the MySQL connection to drop, so that may be an unrelated problem. However, you can configure an HTTP proxy in config.inc.php if you're using at least version 2.0.1 of the Harvester. See the http_host, http_port, http_username and http_password settings in config.inc.php.

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8869
Joined: Wed Aug 10, 2005 12:56 pm

Re: Proxy problems with Harvester?

Postby fredriley » Wed Sep 26, 2007 10:26 am

Thanks for the tip, Alec. I've edited the config.inc.php file so that the proxy bit reads:

[proxy]

; Note that allow_url_fopen must be set to Off before these proxy settings
; will take effect.

; The HTTP proxy configuration to use
http_host = 128.243.220.20
http_port = 3128
; proxy_username = username
; proxy_password = password

The university proxy doesn't need authentication, and I've used the above settings to get BOINC working ok so I know they work. These settings are visible in the university's proxy config file (http://wwwcache.nottingham.ac.uk/proxy.pac).
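For anyone who wants to pull the host and port out of a PAC file without reading it by eye, here is a minimal shell sketch. It assumes the file uses the standard "PROXY host:port" return strings; the function name pac_proxies and the local file name proxy.pac are made up for illustration.

```shell
#!/bin/sh
# pac_proxies FILE: list the unique host:port pairs named in PROXY directives.
# Assumes entries look like: return "PROXY 192.0.2.10:3128; DIRECT";
pac_proxies() {
    grep -Eo 'PROXY +[A-Za-z0-9.-]+:[0-9]+' "$1" | awk '{ print $2 }' | sort -u
}

# Example (not run here): save the PAC file first, then list its proxies.
#   curl -s http://wwwcache.nottingham.ac.uk/proxy.pac -o proxy.pac
#   pac_proxies proxy.pac
```

Each line of output is one candidate for http_host and http_port; a PAC file may name several proxies, so which one applies to a given URL still has to be read from the script's logic.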

I've also set:

allow_url_fopen = Off

as the comments in the proxy section advise. The thumb-twiddling has now gone, but instead we've other strange behaviour. If I put an OAI URL (e.g. http://www.rlo-cetl.ac.uk:8080/test/IntraLibrary-OAI) into the OAI Base URL field and hit Fetch Archive Metadata, the page immediately reloads but with no archive metadata. If I fill in the mandatory fields manually and hit Save, I get:

Errors occurred processing this form:
The specified OAI URL is not valid. Please check the URL and try again.


I've also tried, out of desperation,

http_host = http://128.243.220.20
http_host = 'http://128.243.220.20'

No effect. I've logged out, closed and reopened browsers to kill any sessions, but no effect either. config.inc.php is set to 755. I'm fresh out of ideas, and am beginning to wonder whether the Harvester can only be installed on a server which doesn't have its traffic routed through a proxy. Anyone got any other ideas? Can the Harvester be used with proxies? Have I spent some 6 hours on this in vain? No rush now, mind - I'm off on holiday for a week, a long, blessed way away from computers and the Internet.
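One way to separate the proxy question from the Harvester configuration is to request the OAI endpoint through the proxy by hand, from the machine that hosts Harvester (the harvest request is made by PHP on the server, not by the browser). A rough sketch, assuming curl is installed there; the function names are hypothetical and not part of Harvester, and "verb=Identify" is the standard OAI-PMH request for repository details.

```shell
#!/bin/sh
# oai_identify_url BASE_URL: append the OAI-PMH Identify verb to a base URL.
oai_identify_url() {
    printf '%s?verb=Identify\n' "$1"
}

# check_oai_via_proxy BASE_URL PROXY_HOST PROXY_PORT
# Prints the HTTP status code returned through the proxy, or an error on failure.
check_oai_via_proxy() {
    url=$(oai_identify_url "$1")
    if code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 20 \
            -x "http://$2:$3" "$url"); then
        echo "proxy answered with HTTP $code for $url"
    else
        echo "could not fetch $url through $2:$3" >&2
        return 1
    fi
}

# Example (not run here), using the values quoted in this thread:
#   check_oai_via_proxy http://www.rlo-cetl.ac.uk:8080/test/IntraLibrary-OAI 128.243.220.20 3128
```

If this fails from the server's shell while the same URL loads in a desktop browser, the problem is more likely between the server and the proxy than in config.inc.php.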

Cheers

Fred

Re: Proxy problems with Harvester?

Postby asmecher » Thu Sep 27, 2007 3:09 pm

Hi Fred,

Sorry to hear you're having trouble. I've added one code tune-up that might help the harvester work with your proxy -- see http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=3012 for a patch. Otherwise we'll have to do some further debugging.

The first setup format you were using is correct -- you shouldn't need to include any quoting or a http://... prefix.

Let me know if it works.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Proxy problems with Harvester?

Postby fredriley » Tue Oct 09, 2007 6:29 am

Thanks again, Alec, and sorry for the late reply but I've been on holiday these last 10 days or so. I'm not familiar with Bugzilla, so could you tell me what I need to do to implement your patch? Do I need to download and reinstall the Harvester? Do I need to copy your code somewhere? I've never been part of a formal software development team so all this CVS, Bugzilla and the rest of it is new to me.

Cheers

Fred

Re: Proxy problems with Harvester?

Postby asmecher » Tue Oct 09, 2007 11:32 am

Hi Fred,

What you need to do is download the patch linked from the Bugzilla entry above and apply it on your server using the "patch" tool on the command line. If you're running a UNIX-like (e.g. Linux) server, the patch tool should be installed; if you're running a Windows server, you can download the patch tool by searching Google for something like "GNU patch tool windows". The patch tool will take the patch file you've just saved, and modify your installation according to the changes described in the patch file.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Proxy problems with Harvester?

Postby fredriley » Wed Oct 10, 2007 3:46 am

Ok, Alec, looks like this has set me off on another wee learning curve, never having used 'patch' before. I read what I could of the man pages on it before my eyes started swimming and even looked at the Wikipedia page (http://en.wikipedia.org/wiki/Patch_(Unix)), then looked at the code in your patch file, and it looks like I have to place the patch file in the root of the Harvester installation on my Unix account, which I've done, and run patch <patchfile> on the command line, which I've also done, but it just hung on me until I Ctrl-C'd it. I saved the patchfile to disk as a couple of different names - I don't know if it should have a particular extension or obey a particular filenaming convention.

So, looking at the patch file itself, it's plainly a patch for /classes/file/FileWrapper.inc.php with a bunch of @, ---, + and - characters which presumably relate to patch. Should I just edit the file() function, scratch out these patch characters, and copy it into FileWrapper.inc.php? I dare say it would be better, quicker and less prone to error to use patch, but I'd welcome some guidance on that - I can see the point of using it, and see what it does, but it's not clear to me how it's used, not least because I'm not familiar with diff.

Questions, questions. Nothing's ever simple, eh? ;-|

Cheers

Fred

Re: Proxy problems with Harvester?

Postby ramon » Wed Oct 10, 2007 6:16 am

Fred,

Depending on how many files the patch affects, the process is quite simple.

What the patch tool does is edit the files listed in the patch, removing the lines marked with "-" and adding the lines marked with "+"...

You can edit the files manually if you wish...

The patch tool has options, such as -p, that control how the file paths in the patch are resolved when you apply it from the root of the installation.

Search this forum for patch and you will find quite a few tips from Alec, with commands to execute...
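To make the markers concrete, here is a small sketch that builds two versions of a throwaway file and prints the unified diff between them. The file names and contents are invented for the demo; only diff -u itself is standard.

```shell
#!/bin/sh
# Build two versions of a throwaway file (names are made up for this demo).
demo=$(mktemp -d)
printf 'first\nold line\nlast\n' > "$demo/before.txt"
printf 'first\nnew line\nlast\n' > "$demo/after.txt"

# diff exits with status 1 when the files differ, which is expected here.
patch_text=$(diff -u "$demo/before.txt" "$demo/after.txt" || true)
printf '%s\n' "$patch_text"
rm -rf "$demo"

# Reading the output:
#   "---" and "+++" lines name the old and the new file
#   "@@ -1,3 +1,3 @@" gives the line range the hunk covers in each file
#   lines starting with a space are unchanged context
#   lines starting with "-" are removed, lines starting with "+" are added
```

Editing by hand means doing exactly what those markers say: delete each "-" line from the target file and put the "+" lines in its place, leaving the context lines alone.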
ramon
 
Posts: 931
Joined: Wed Oct 15, 2003 6:15 am
Location: Brasília/DF - Brasil

Re: Proxy problems with Harvester?

Postby fredriley » Wed Oct 10, 2007 8:48 am

No, sorry, that's just too much extra work - I've already spent many hours trying to get Harvester to work on our Unix system with considerable fiddling around with permissions, includes, install parameters, and the rest of it, then today I spent a good couple of hours trying to figure out how patch works and what it was trying to do. This is only a small part of my regular job and I can't afford to devote any more time to it when a myriad other things are clamouring for my attention. I'll try manually editing the patch file and copying the function into the FileWrapper.inc.php include, and if that fails then I'll cut my losses and assume that Harvester is too tricky to run on a proxied system. I'll try instead to run it off my personal hosting account which AFAIK isn't proxied. Hell, all I'm trying to do is get a test installation online for colleagues to try out, and see if they like the look of it ;-\

Such a shame. Harvester works like a dream on my Macbook at home when I've got root privileges to everything and my ADSL connection is unproxied. As ever, though, once you try to run these things on a server with all the proxies, firewalls, blocks, hidden policies, etc, etc, you find yourself going down blind and time-consuming alleys... :(

Cheers

Fred

Re: Proxy problems with Harvester?

Postby asmecher » Wed Oct 10, 2007 8:58 am

Hi Fred,

I can help you through the patch process, as it should just be a matter of using the right command line. In the posting above, where the patch tool appeared to hang, it sounds like you're missing the input redirection "<" in the command. Without it, patch treats the file name you gave as the file to be patched and waits for the patch itself on standard input. This is a typical way of applying a patch:
Code:
patch -p0 --dry-run < /path/to/patch.file
The --dry-run option instructs the patch tool *not* to actually apply the patch, but to run through the process anyway without actually changing anything. This is useful for checking to make sure that it will apply successfully without messing up the code. If it looks like it'll apply successfully, run it again without the --dry-run option.
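Some patch builds do not recognise --dry-run. In that case a similar rehearsal can be done by applying the patch to a scratch copy of the installation first. The sketch below is one way to do that; try_patch is a hypothetical helper name, and it assumes the patch uses paths relative to the installation root, as with -p0 above.

```shell
#!/bin/sh
# try_patch DIR PATCHFILE: apply PATCHFILE to a temporary copy of DIR.
# Exit status 0 means the patch applied cleanly; DIR itself is never touched.
try_patch() (
    src=$1
    # Make the patch path absolute, because we change directory below.
    case $2 in
        /*) patchfile=$2 ;;
        *)  patchfile=$PWD/$2 ;;
    esac
    scratch=$(mktemp -d) || exit 1
    cp -R "$src/." "$scratch/" &&
        cd "$scratch" &&
        patch -p0 < "$patchfile" >/dev/null 2>&1
    status=$?
    cd / && rm -rf "$scratch"
    exit "$status"
)

# Example (not run here): rehearse, then apply for real from the install root.
#   try_patch /path/to/harvester /path/to/patch.file &&
#       (cd /path/to/harvester && patch -p0 < /path/to/patch.file)
```

Copying the whole installation is slower than a dry run, but it only needs options that every patch tool supports.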

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Proxy problems with Harvester?

Postby fredriley » Wed Oct 10, 2007 9:52 am

Before Alec's last post, I manually edited the classes/file/FileWrapper.inc.php file adding and removing the few lines as specified in the diff file, such that the HTTPFileWrapper.open() method now reads as at the end of this message, with lines added/edited in bold. Same problem as before, folks - sorry. I'll try running the patch command as Alec suggests in case I've edited something wrong.

I would upload the config.inc.php and FileWrapper.inc.php files, but this BBS won't allow .php files (or .xx, or .txt, and I gave up after that) extensions to be uploaded. I've copied the config.inc.php proxy section below.

It's looking as if I'll just not be able to get Harvester working on our system - that's $%@#!! proxies for you ;-\. I'm going to spend two hours more max on this, then I'll be into double figures and will give in. If it doesn't work out then it's not PKP's fault - it's impossible to code for every system out there. I suspect the trick is to use a standalone *nix box with unproxied connections and full root access to everything - any time you have to go through sysadmins it's trouble and hassle.

Cheers

Fred

-----------------

config.inc.php proxy settings;

;;;;;;;;;;;;;;;;;;
; Proxy Settings ;
;;;;;;;;;;;;;;;;;;

[proxy]

; Note that allow_url_fopen must be set to Off before these proxy settings
; will take effect.

; The HTTP proxy configuration to use
http_host = 128.243.220.20
http_port = 3128
; proxy_username = username
; proxy_password = password

~~~~~~~~~~~~~~~~~~~~~~~~~

HTTPFileWrapper.open() method:

function open($mode = 'r') {
    // Target host, port and path come from the parsed URL, with class defaults as fallback.
    $realHost = $host = isset($this->info['host']) ? $this->info['host'] : $this->defaultHost;

    $port = isset($this->info['port']) ? (int)$this->info['port'] : $this->defaultPort;
    $path = isset($this->info['path']) ? $this->info['path'] : $this->defaultPath;
    if (isset($this->info['query'])) $path .= '?' . $this->info['query'];

    // With a proxy configured, connect to the proxy but keep the real host for the Host header.
    if (!empty($this->proxyHost)) {
        $realHost = $host;
        $host = $this->proxyHost;
        $port = $this->proxyPort;
        if (!empty($this->proxyUsername)) {
            $this->headers['Proxy-Authorization'] = 'Basic ' . base64_encode($this->proxyUsername . ':' . $this->proxyPassword);
        }
    }

    // fsockopen() returns false if the connection cannot be opened.
    if (!($this->fp = fsockopen($host, $port, $errno, $errstr)))
        return false;

    $additionalHeadersString = '';
    if (is_array($this->headers)) foreach ($this->headers as $name => $value) {
        $additionalHeadersString .= "$name: $value\r\n";
    }

    // A proxy expects the full URL in the request line; a direct request uses only the path.
    $request = 'GET ' . (empty($this->proxyHost)?$path:$this->url) . " HTTP/1.0\r\n" .
        "Host: $realHost\r\n" .
        $additionalHeadersString .
        "Connection: Close\r\n\r\n";
    fwrite($this->fp, $request);

    // CODE AFTER THIS UNALTERED SO NOT QUOTED - FR
}

Re: Proxy problems with Harvester?

Postby fredriley » Wed Oct 10, 2007 10:04 am

For info, I managed to run the patch with patch -p0 < FileWrapper.inc.php.patch - the dry-run parameter wasn't recognised, so perhaps this is an earlier version of patch. It would have been nice had the man page mentioned that the "<" redirection be used... ;-|

Anyway, file patched, same errors, I'm going home in a bate, though at least I've learnt another tiny abstruse part of the behemoth that is Unix...

Cheers

Fred

Re: Proxy problems with Harvester?

Postby asmecher » Wed Oct 10, 2007 11:08 am

Hi Fred,

It must be some peculiarity of the proxy software you're using; I've tested using Tinyproxy and it seems to work OK. What is the proxy app you're using?

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Proxy problems with Harvester?

Postby fredriley » Wed Oct 10, 2007 2:51 pm

Nothing at all, Alec - all HTTP traffic going out of our university, AFAIK, goes via the proxy config script at http://wwwcache.nottingham.ac.uk/proxy.pac - I've just put Harvester on my Unix account and am accessing it from Firefox and IE, without any intervening software. I've no idea what's on the university web server and can't access the Apache config - I'm just an ordinary staff user. I've just had a bright, though depressing, idea - maybe our sysadmins have blocked traffic on port 3128? Sysadmins are notoriously paranoid and will block any port they can - hell, they'd block all ports and cut off systems from the world if they were allowed to in the search for total security. I'll email them now and ask if there's a block on this port, and if so would they remove it, and get back here with their reply. The more I think of it, the more this seems the likeliest cause of the problem, but if it is I'll be fuming... :(

Cheers

Fred

Re: Proxy problems with Harvester?

Postby fredriley » Wed Oct 17, 2007 9:35 am

Sadly, I was right - our sodding systems lot have blocked port 3128, so that's tens of hours of mine and your time wasted, though at least it explains the problem. If possible, it might be worth trying to cater for that in the error message that Harvester gives out - is there a way in PHP to test if a port is open? I've had a quick scan of the PHP networking functions (http://uk3.php.net/manual/en/ref.network.php) and nothing jumps out at me, though maybe you could check for the FALSE result from fsockopen(). An error message, rather than just a blank output, might have made diagnosis of the problem easier. Given how paranoid sysadmins are, and thus how prevalent institutional/corporate firewalls are, this issue might occur elsewhere.
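This is not the PHP-side check asked about above, but the same test can be run from the server's shell before touching Harvester at all. A minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the server; port_open and report_port are made-up names.

```shell
#!/bin/sh
# port_open HOST PORT [SECONDS]: succeed only if a TCP connection can be opened.
# Relies on bash's /dev/tcp redirection and the coreutils "timeout" command.
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>/dev/tcp/$0/$1' "$1" "$2" 2>/dev/null
}

# report_port HOST PORT: print an explicit message instead of failing silently.
report_port() {
    if port_open "$1" "$2"; then
        echo "$1:$2 is reachable"
    else
        echo "cannot connect to $1:$2 (closed, filtered or timed out)"
        return 1
    fi
}

# Example (not run here), using the proxy address quoted in this thread:
#   report_port 128.243.220.20 3128
```

A blocked port usually shows up as a timeout rather than an immediate refusal, which would also fit the long waits described earlier in the thread.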

Anyway, I'll try to install the harvester on another non-firewalled server and see how that goes. I'll also try to get our sysadmins to unblock 3128, but that would probably mean undergoing the Spanish Inquisition.... ;-\

Cheers

Fred
