by asmecher » Thu Feb 16, 2012 10:30 pm
Hi tlchristian,
Unfortunately, OHS's database isn't well suited to this kind of analysis directly. You can get the raw XML for each record by querying the contents column of the records table, but you'll still need to parse that XML for the particular fields you're looking for. (If you need some data crosswalked into Dublin Core, you're better off getting the XML from the OAI interface as described above.)
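As a rough illustration of that parsing step, here is a minimal Python sketch that pulls one field out of a record's XML. It assumes you have already fetched the contents value as a string. The sample record and the element names in it are made up for the example; real records will follow whatever schema was harvested. The helper names (local_name, extract_field) are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def extract_field(xml_text, field):
    """Return the text of every element named `field`, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    return [
        (el.text or "").strip()
        for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == field
    ]


# Hypothetical record for illustration only; not an actual OHS record.
sample = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
</record>"""

print(extract_field(sample, "creator"))  # ['Doe, Jane', 'Roe, Richard']
```

Matching on the local name rather than the full namespaced tag is a convenience here, so the same call works whether or not the record uses a namespace prefix.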
I tend to use command-line tools like grep, sort, uniq, and wc (available on most *NIX and Mac OS X systems, and on Windows via e.g. Cygwin) to do basic analysis from there, as they can operate directly on the XML files. However, they aren't particularly intuitive.
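If those tools feel awkward, the same kind of tally can be sketched in Python. The snippet below is only an approximation of a grep / sort / uniq -c style count: it uses a regular expression rather than a real XML parser, so it will miss nested or unusual markup. The function name count_field_values and the sample records are hypothetical.

```python
import re
from collections import Counter


def count_field_values(xml_texts, field):
    """Tally the text inside <field> or <prefix:field> elements across XML strings."""
    name = r"(?:[\w.-]+:)?" + re.escape(field)
    pattern = re.compile(
        r"<" + name + r"(?:\s[^>]*)?>(.*?)</" + name + r">",
        re.DOTALL,
    )
    counts = Counter()
    for text in xml_texts:
        for value in pattern.findall(text):
            counts[value.strip()] += 1
    return counts


# Hypothetical records for illustration only.
records = [
    "<record><dc:language>en</dc:language></record>",
    "<record><dc:language>fr</dc:language></record>",
    "<record><dc:language>en</dc:language></record>",
]

# Most frequent values first, similar in spirit to piping through sort and uniq -c.
for value, n in count_field_values(records, "language").most_common():
    print(n, value)
```

In practice the list of strings would come from the XML files on disk or from the contents column, one string per record.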
Regards,
Alec Smecher
Public Knowledge Project Team