This is what the XML files look like after gunzip and untar (this is explained at
http://www.oaister.org/sru.html):
<?xml version="1.0" encoding="UTF-8"?>
<BIBDB><GROUP NAME="zas">
<A ID="oai:ZASPIL:zp28001" DT="2005-09-08T13:31:30Z"><B><K>Acoustic Cues for the Korean Stop Contrast - Dialectal Variation</K><L>Choi, Hansook</L></B><E><YR>2002</YR><X>ZAS-Berlin</X></E><G><AA>http://www.zas.gwz-berlin.de/papers/zaspil/articles/28-1-choi.pdf</AA></G><J><URL>http://www.zas.gwz-berlin.de/papers/zaspil/articles/28-1-choi.pdf</URL></J><FMT>application/pdf</FMT><LANG>English</LANG><TYPE>arcticle</TYPE><INST>Zentrum fur Allgemeine Sprachwissenschaft, Typologie und Universalienforschung (ZAS) Archive</INST></A>
<A ID="oai:ZASPIL:zp29002" DT="2005-09-08T13:31:30Z"><B>..........</A>
........
</GROUP></BIBDB>
Harvesting from OAIster.org is only available via FTP. The metadata is packed as .tar.gz archives. I was thinking of using DOM/PHP5 to load the data from the XML files into Harvester2.
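To make the idea concrete, here is a rough sketch of reading the records out of such an archive. It is written in Python rather than DOM/PHP5, purely as an illustration. The function names iter_records and iter_archive are made up for this example, and the meaning I give to the elements (K as title, L as author, YR as year) is only my guess from the sample record above, not something taken from a schema.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def iter_records(xml_bytes):
    """Yield one dict per <A> record in a BIBDB document.

    The field meanings are assumptions read off the sample record.
    """
    root = ET.fromstring(xml_bytes)
    for rec in root.iter("A"):
        yield {
            "id": rec.get("ID"),
            "datestamp": rec.get("DT"),
            "title": rec.findtext("B/K"),
            "author": rec.findtext("B/L"),
            "year": rec.findtext("E/YR"),
            "url": rec.findtext("J/URL"),
            "format": rec.findtext("FMT"),
            "language": rec.findtext("LANG"),
        }


def iter_archive(fileobj):
    """Yield records from every .xml member of a .tar.gz file object."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".xml"):
                yield from iter_records(tar.extractfile(member).read())
```

Each dict could then be handed to whatever routine inserts a record into Harvester2. Fields missing from a record simply come back as None.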
Currently Harvester2 opens a socket and reads from it to access remote HTTP servers. Would it be simple to add an extension that allows the harvester to access FTP on port 21?
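For comparison, this is roughly what the FTP side looks like outside Harvester2, again as a Python sketch and not a proposal for the actual extension. It uses the standard ftplib module, which connects to port 21 by default and handles the separate data connection itself. The names fetch_file and harvest are invented for the example, and the host, directory and file name are placeholders, since I do not want to guess the real ones here.

```python
from ftplib import FTP


def fetch_file(ftp, remote_name, local_path):
    """Download remote_name in binary mode over an already-connected session."""
    with open(local_path, "wb") as out:
        ftp.retrbinary("RETR " + remote_name, out.write)
    return local_path


def harvest(host, remote_dir, remote_name, local_path):
    """Log in anonymously on the default port (21) and fetch one archive."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        ftp.cwd(remote_dir)
        return fetch_file(ftp, remote_name, local_path)
```

A call would look like harvest("ftp.example.org", "/pub", "records.tar.gz", "records.tar.gz"), with every argument replaced by the real values. The downloaded file can then be unpacked and parsed locally.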
Cheers
Obi