[PATCH] Make possible to harvest through a http proxy


Post by kerphi » Wed Jun 28, 2006 1:10 am

Hello,

Here is a small patch, based on version 2.0.0, which makes it possible to harvest through an HTTP proxy.
It adds 4 parameters to the config file:
http_host
http_port
http_username
http_password

Here is the patch.
Code:
diff -Naur harvester-2.0.0.orig/classes/file/FileWrapper.inc.php harvester-2.0.0/classes/file/FileWrapper.inc.php
--- harvester-2.0.0.orig/classes/file/FileWrapper.inc.php   2006-05-05 22:35:16.000000000 +0200
+++ harvester-2.0.0/classes/file/FileWrapper.inc.php   2006-06-27 08:52:11.000000000 +0200
@@ -168,7 +168,18 @@
       $port = isset($this->info['port']) ? (int)$this->info['port'] : $this->defaultPort;
       $path = isset($this->info['path']) ? $this->info['path'] : $this->defaultPath;
       if (isset($this->info['query'])) $path .= '?' . $this->info['query'];
+      $url = "http://".$host.":".$port.$path;
       
+      $proxy_host     = Config::getVar('proxy', 'http_host');
+      $proxy_port     = Config::getVar('proxy', 'http_port');
+      $proxy_username = Config::getVar('proxy', 'http_username');
+      $proxy_password = Config::getVar('proxy', 'http_password');
+      if ($proxy_host != "")
+      {
+         $host = $proxy_host;
+         $port = $proxy_port;
+         $this->headers["Proxy-Authorization"] = "Basic ".base64_encode ("$proxy_username:$proxy_password");
+      }
       if (!($this->fp = fsockopen($host, $port, $errno, $errstr)))
          return false;
 
@@ -177,7 +188,7 @@
          $additionalHeadersString .= "$name: $value\r\n";
       }
 
-      $request = "GET $path HTTP/1.0\r\n" .
+      $request = "GET $url HTTP/1.0\r\n" .
          "Host: $host\r\n" .
          $additionalHeadersString .
          "Connection: Close\r\n\r\n";
diff -Naur harvester-2.0.0.orig/config.TEMPLATE.inc.php harvester-2.0.0/config.TEMPLATE.inc.php
--- harvester-2.0.0.orig/config.TEMPLATE.inc.php   2006-05-08 21:31:44.000000000 +0200
+++ harvester-2.0.0/config.TEMPLATE.inc.php   2006-06-27 23:26:34.000000000 +0200
@@ -216,3 +216,14 @@
 ; for any production system.
 show_stacktrace = Off
 
+;;;;;;;;;;;;;;;;;;
+; Proxy settings ;
+;;;;;;;;;;;;;;;;;;
+
+[proxy]
+
+;http_host = localhost
+;http_port = 3128
+;http_username = username
+;http_password = password
+
diff -Naur harvester-2.0.0.orig/pages/rtadmin/RTAdminHandler.inc.php harvester-2.0.0/pages/rtadmin/RTAdminHandler.inc.php
--- harvester-2.0.0.orig/pages/rtadmin/RTAdminHandler.inc.php   2006-05-06 00:53:22.000000000 +0200
+++ harvester-2.0.0/pages/rtadmin/RTAdminHandler.inc.php   2006-06-27 23:24:46.000000000 +0200
@@ -282,13 +282,28 @@
       return false;
    }
 
-   $fp = @ fsockopen($data['host'], isset($data['port']) && !empty($data['port']) ? $data['port'] : 80, $errno, $errstr, 10);
-   if (!$fp) {
-      return false;
+   $fp = NULL;
+   $proxy_host     = Config::getVar('proxy', 'http_host');
+   $proxy_port     = Config::getVar('proxy', 'http_port');
+   $proxy_username = Config::getVar('proxy', 'http_username');
+   $proxy_password = Config::getVar('proxy', 'http_password');
+   if ($proxy_host != "")
+   {
+      $fp = @ fsockopen($proxy_host, $proxy_port != "" ? $proxy_port : 3128, $errno, $errstr, 10);
+      if (!$fp) {
+         return false;
+      }
+      $req = sprintf("%s %s HTTP/1.0\r\nHost: %s\r\nProxy-Authorization: Basic %s\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516\r\n\r\n", ($useGet ? 'GET' : 'HEAD'), $url, $proxy_host, base64_encode("$proxy_username:$proxy_password"));
+   }
+   else
+   {
+      $fp = @ fsockopen($data['host'], isset($data['port']) && !empty($data['port']) ? $data['port'] : 80, $errno, $errstr, 10);
+      if (!$fp) {
+         return false;
+      }
+      $req = sprintf("%s %s HTTP/1.0\r\nHost: %s\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516\r\n\r\n", ($useGet ? 'GET' : 'HEAD'), (isset($data['path']) && $data['path'] !== '' ? $data['path'] : '/') .  (isset($data['query']) && $data['query'] !== '' ? '?' .  $data['query'] : ''), $data['host']);
    }
 
-   $req = sprintf("%s %s HTTP/1.0\r\nHost: %s\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516\r\n\r\n", ($useGet ? 'GET' : 'HEAD'), (isset($data['path']) && $data['path'] !== '' ? $data['path'] : '/') .  (isset($data['query']) && $data['query'] !== '' ? '?' .  $data['query'] : ''), $data['host']);
-
    fputs($fp, $req);
 
    for($res = '', $time = time(); !feof($fp) && $time >= time() - 15; ) {
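
One detail worth checking when applying the patch: in the FileWrapper hunk, $host is overwritten with the proxy host before the "Host:" header is built, so the header names the proxy, while HTTP expects it to name the origin server. The sketch below shows one way the request could be assembled so that only the socket target changes when a proxy is configured. It is an illustration under stated assumptions, not part of the patch or of the Harvester code: buildHarvestRequest and the keys of its return value are hypothetical, the $proxy array simply mirrors the four config keys above, and the 3128 fallback is taken from the RTAdminHandler hunk.

```php
<?php
// Hypothetical helper (not part of the patch or of Open Harvester Systems).
// Returns where to open the socket and the raw HTTP/1.0 request to send.
function buildHarvestRequest(string $url, array $proxy = []): array
{
    $parts = parse_url($url);
    $host  = $parts['host'];
    $port  = isset($parts['port']) ? (int) $parts['port'] : 80;
    $path  = isset($parts['path']) && $parts['path'] !== '' ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    $useProxy = !empty($proxy['http_host']);

    // The Host header always names the origin server, never the proxy.
    $headers = ['Host' => $port === 80 ? $host : "$host:$port"];

    // Send credentials only when a proxy is used and a username is configured.
    if ($useProxy && !empty($proxy['http_username'])) {
        $password = $proxy['http_password'] ?? '';
        $headers['Proxy-Authorization'] =
            'Basic ' . base64_encode($proxy['http_username'] . ':' . $password);
    }
    $headers['Connection'] = 'Close';

    // A proxy needs the absolute URL; a direct request uses the path only.
    $target  = $useProxy ? "http://$host:$port$path" : $path;
    $request = "GET $target HTTP/1.0\r\n";
    foreach ($headers as $name => $value) {
        $request .= "$name: $value\r\n";
    }
    $request .= "\r\n";

    return [
        'connect_host' => $useProxy ? $proxy['http_host'] : $host,
        'connect_port' => $useProxy
            ? (!empty($proxy['http_port']) ? (int) $proxy['http_port'] : 3128)
            : $port,
        'request'      => $request,
    ];
}
```

The caller would pass connect_host and connect_port to fsockopen() and write request to the socket, as the patched code already does. With an empty $proxy array the function falls back to a plain direct request, so the unproxied case keeps sending a path-only request line.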


Hope that helps.

best regards,

Stéphane Gully.
kerphi
 
Posts: 1
Joined: Wed Jun 28, 2006 1:04 am
Location: France

Post by asmecher » Thu Jun 29, 2006 12:32 pm

Hi Stéphane,

Thanks -- we'll definitely consider this for inclusion in the next release.

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8470
Joined: Wed Aug 10, 2005 12:56 pm

data-grabbing & mining - need script-help

Post by ethno_researcher » Sun Jul 23, 2006 2:37 am

Hello Alec, good day. Hello Stéphane,

asmecher wrote: Hi Stéphane,

Thanks -- we'll definitely consider this for inclusion in the next release.

Regards,
Alec Smecher
Open Journal Systems Team



Hello all,

This is probably one of the best places to ask such questions, so I will ask now.

First, some background: I have to collect data from a phpBB forum for some field research. I need the data from a forum that is run by a user community, so that I can analyze the discussions.

To give an example, let us take this forum here. How can I harvest all the data from this forum, store it locally, and then load it into the database of a local phpBB forum? Is this possible?

What I have in mind is nothing harmful or dangerous. I simply have to get the data.

I need to copy forum messages and other data (forum topics, users) into a database. Purpose: create a copy of the forum for text analysis. Does anyone have an approximate solution?

The data needs to be fetched over HTTP and written to CSV, to produce a dump that can fill the local database of a phpBB board.

I need the data in an almost full and complete format, so I need all the fields, such as:

username
forum
thread
topic
text of the posting, and so on.

How can I do that?

I need some kind of grabbing tool. Can I do it with that kind of tool? And how do I solve the issue of storing the data in the local MySQL database?

As you can see, this is tricky work, and I am pretty sure that I can get help here. I am very thankful for any and all help.

Many thanks in advance,


- an ethno-researcher
ethno_researcher
 
Posts: 2
Joined: Sun Jul 23, 2006 2:34 am

Post by asmecher » Sun Jul 23, 2006 6:49 pm

Hi ethno_researcher,

This is off-topic -- you'd be best off trying the phpBB support forum, where you might be able to get a good answer. I've responded to you in a private message with a suggestion, but unfortunately we won't be able to help you any further here; we just use phpBB for our support forum.

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8470
Joined: Wed Aug 10, 2005 12:56 pm

