Archive for August, 2012

Serial publications with editors at Notre Dame

Posted on August 22, 2012 in Uncategorized

This is a list of serial publications (journals, yearbooks, magazines, newsletters, etc.) whose editorial board includes at least one person from the University of Notre Dame. This is not a complete list, and if you know of other titles, then please drop me a line:

  1. Actroterion
  2. American Journal Of Jurisprudence
  3. American Midland Naturalist
  4. American Political Thought
  5. Analyst
  6. Analytical Chemistry
  7. Applied Preventative Psychology
  8. Attention, Perception, & Psychophysics
  9. Behavior Genetics
  10. Beyond Politics
  11. Biomicrofluidics
  12. Bulletin De Philosophie Médiévale
  13. Cognitive Psychology
  14. Conflict Resolution
  15. Current Drug Targets
  16. Faith And Philosophy
  17. International Yearbook Of German Idealism
  18. Journal of Autism and Developmental Disorders
  19. Journal Of Biblical Literature
  20. Journal of Chemical & Engineering Data
  21. Journal Of College and University Law
  22. Journal of Experimental Psychology: Learning, Memory & Cognition
  23. Journal of Hindu-Christian Studies
  24. Journal Of Legislation
  25. Journal Of Modern Russian History and Historiography
  26. Journal of Moral Education
  27. Journal of Multivariate Analysis
  28. Journal of Organic Chemistry
  29. Journal Of Philosophical Research
  30. Journal of Physical Chemistry A
  31. Journal of Physical Chemistry Letters
  32. Journal Of Religion and Literature
  33. Journal Of Undergraduate Research
  34. Kellogg Institute Working Papers
  35. Mobilization
  36. Mobilizing Ideas
  37. Multivariate Behavioral Research
  38. Nineteenth Century Context
  39. Notre Dame Journal of Formal Logic
  40. Notre Dame Journal of International & Comparative Law
  41. Notre Dame Journal Of Law, Ethics, And Public Policy
  42. Notre Dame Law Review
  43. Notre Dame Philosophical Reviews
  44. Notre Dame Review
  45. Psychological Methods
  46. Quarterly Journal Of Experimental Psychology
  47. Re: Visions
  48. Reilly Center Reports
  49. Rethinking Marxism
  50. Review Of Politics
  51. Scholastic
  52. Scientia
  53. Sociological Voices
  54. Studies in History and Philosophy of Science
  55. Sullivan Prize series
  56. The Bend
  57. The Hub
  58. The Juggler
  59. Through Gendered Lenses
  60. William And Katherine Devers Series in Dante Studies

This is a list of titles that may or may not have had an editor from Notre Dame at one time; to the best of my ability, I could not find one.

  1. Catholic Education
  2. Comparative Politics Newsletter
  3. International Journal Of Ecology
  4. International Journal Of Industrial Organization
  5. Medieval Philosophy And Theology
  6. Memory And Cognition

Again, this is not necessarily a complete list, and if you know of other titles, then please drop me a line.

Last updated: October 1, 2012

Exploiting the content of the HathiTrust, epilogue

Posted on August 14, 2012 in Uncategorized

This blog posting simply points to a browsable and downloadable set of MARC records describing a set of books held in both the HathiTrust and the Hesburgh Libraries at the University of Notre Dame.

In a previous blog posting I described how I downloaded about 25,000 MARC records that:

  1. were denoted as in the public domain
  2. described books published prior to 1924
  3. were denoted as a part of the Hesburgh Libraries at the University of Notre Dame
  4. were denoted as a part of the HathiTrust
  5. had a one-to-one correspondence between OCLC number and digitized item

This list of MARC records is not, nor was it ever intended to be, a comprehensive list of overlapping materials between the Hesburgh Libraries collection and the HathiTrust. Instead, this list is intended to be a set of unambiguous sample data allowing us to import and assimilate HathiTrust records into our library catalog and/or “discovery system” on an experimental basis.

The browsable interface is rudimentary. Simply point your browser to the interface and a list of ten randomly selected titles from the MARC record set will be displayed. Each title will be associated with the date of publication and three links. The first link points to the HathiTrust catalog record where you will be able to read/view the item’s bibliographic data. The second link points to the digitized version of the item complete with its searching/browsing interface. The third and final link queries OCLC for libraries owning the print version of the item. This last link is here to prove that the item is owned by the Hesburgh Libraries.

Screen shot of browsable interface

For a good time, you can also download the MARC records as a batch.

Finally, why did I create this interface? Because people will want to get a feel for the items in question before the items’ descriptions and/or URLs become integrated into our local system(s). Creating a browsable interface seemed to be one of the easier ways I could accomplish that goal.

Fun with MARC records, the HathiTrust, and application programmer interfaces.

Exploiting the content of the HathiTrust, continued

Posted on August 11, 2012 in Uncategorized

This blog posting describes how I created a set of MARC records representing public domain content that is in both the University of Notre Dame’s collection as well as in the HathiTrust.

Background

In a previous posting I described how I learned about the amount of overlap between my library’s collection and the ‘Trust. There is about a 33% overlap. In other words, about one out of every three books owned by the Hesburgh Libraries has also been digitized and is in the ‘Trust. I wondered how our collections and services could be improved if hypertext links between our catalog and the ‘Trust could be created.

In order to create links between our catalog and the ‘Trust, I needed to identify overlapping titles and remote ‘Trust URLs. Because they originally wrote the report which started the whole thing, OCLC had to have the necessary information. Consequently I got in touch with the author of the original OCLC report (Constance Malpas) who in turn sent me a list of Notre Dame holdings complete with the most rudimentary of bibliographic data. We then had a conference call with two others: Roy Tennant from OCLC and Lisa Stienbarger from Notre Dame. As a group we discussed the challenges of creating an authoritative overlap list. While we all agreed the creation of links would be beneficial to my local readers, we also agreed to limit what gets linked, specifically to public domain items associated with single digitized items. Links to copyrighted materials were deemed more useless than useful. One can’t download the content, and searching the content is limited. Similarly, any OCLC number (the key I planned to use to identify overlapping materials) can be associated with more than one digitized item. “To which digitized item should I link?” Trying to programmatically disambiguate between one digitized item and another was seen as too difficult to handle at the present time.

The hacking

I then read the documentation for the HathiTrust Bib API, and I learned it was simple. Construct a URL denoting the type of control number one wants to search with, as well as whether one wants full or brief output. (Full output is just like brief output except full output includes a stream of MARCXML.) Send the URL off to the ‘Trust and get back a JSON stream of text. The programmer is then expected to read, parse, and analyze the result.
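The URL construction described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Perl used for this project. The base URL, the `oclc` identifier type, and the `.json` suffix reflect the Bib API's URL pattern as commonly documented and should be checked against the current documentation; the function name `bib_api_url` is hypothetical.

```python
from urllib.parse import quote

# Assumed base URL for the HathiTrust Bib API; verify against current docs.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type, identifier, full=False):
    """Build a Bib API URL for one control number.

    id_type    -- kind of control number, e.g. "oclc" (assumed value)
    identifier -- the control number itself
    full       -- True for full output (includes MARCXML), False for brief
    """
    detail = "full" if full else "brief"
    safe_id = quote(str(identifier), safe="")
    return f"{BASE_URL}/{detail}/{quote(id_type)}/{safe_id}.json"
```

Calling `bib_api_url("oclc", 424023)` yields a URL requesting brief output; passing `full=True` switches the path segment to `full`. Fetching the URL and decoding the JSON is left to whatever HTTP client is at hand.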

Energized with a self-imposed goal, I ran off to my text editor to hack a program. Given the list of OCLC numbers provided by OCLC, I wrote a Perl program that queries the ‘Trust for a single record. I then made sure the resulting record was: 1) denoted as in the public domain, 2) published prior to 1924, and 3) associated with a single digitized item. When records matched these criteria I wrote the OCLC number, the title, and the ‘Trust URL pointing to the digitized item to a tab-delimited file. After looping through all the records I identified about 25,000 fitting my criteria. I then wrote another program which looped through the 25,000 items and created a local MARC file describing each item complete with remote HathiTrust URL. (Both of my scripts, filter-pd.pl and get-marcxml.pl, can be used by just about any library. All you need is a list of OCLC numbers.) It is now possible for us here at Notre Dame to pour these MARC records into our catalog or “discovery system”. Doing so is not always straightforward, and if they so desire, I’ll leave that work to others.
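The three filtering criteria above can be sketched as a small function over an already-parsed JSON response. This is a Python illustration under stated assumptions, not a port of filter-pd.pl: the response keys (`records`, `items`, `rightsCode`, `publishDates`, `titles`, `itemURL`, `fromRecord`) are assumed from the Bib API's documented output and should be verified, and the function names are hypothetical.

```python
def public_domain_match(response, cutoff_year=1924):
    """Return (title, item_url) if the response meets all three criteria,
    otherwise None. `response` is the decoded JSON from one Bib API query.
    """
    records = response.get("records", {})
    items = response.get("items", [])

    # Criterion 3: exactly one digitized item, so the link is unambiguous.
    if len(items) != 1:
        return None
    item = items[0]

    # Criterion 1: rights code marks the item as public domain (assumed "pd").
    if item.get("rightsCode") != "pd":
        return None

    # Find the bibliographic record the item belongs to.
    record = records.get(item.get("fromRecord")) or next(iter(records.values()), {})

    # Criterion 2: published before the cutoff year.
    years = [int(d) for d in record.get("publishDates", []) if str(d).isdigit()]
    if not years or min(years) >= cutoff_year:
        return None

    titles = record.get("titles", [])
    return (titles[0] if titles else "", item.get("itemURL", ""))


def tsv_row(oclc_number, match):
    """Format one matching record as a tab-delimited line."""
    title, url = match
    return "\t".join([str(oclc_number), title, url])
```

Looping over a list of OCLC numbers, fetching each response, and writing `tsv_row(...)` for every non-`None` match would produce a tab-delimited file like the one described above.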

What I learned

This process has been interesting. I learned that a lot of our library’s content exists in digital form, and copyright is getting in the way of making it as useful as it could be. I learned the feasibility of improving our library collections and services by linking between our catalog and remote repositories. The feasibility is high, but the process of implementation is not straightforward. I learned how to programmatically query the HathiTrust. It is simple and easy to use. And I learned how the process of mass digitization has been a boon as well as a bit of a bust: the result is sometimes ambiguous.

It is now our job as librarians to figure out how to exploit this environment and fulfill our mission at the same time. Hopefully, this posting will help somebody else take the next step.

Exploiting the content of the HathiTrust

Posted on August 10, 2012 in Uncategorized

I have been exploring possibilities of exploiting to a greater degree the content in the HathiTrust. This blog posting outlines some of my initial ideas.

The OCLC Research Library Partnership program recently sent us here at the University of Notre Dame a report describing and illustrating the number and types of materials held by both the University of Notre Dame and the HathiTrust — an overlap report.

As illustrated by the pie chart from the report, approximately 1/3 of our collection is in the HathiTrust. It might be interesting to link our local library catalog records to the records in the ‘Trust. I suppose the people who wrote the original report would be able to supply us with a list of our overlapping titles. Links could be added to our local records facilitating enhanced services to our readers. “Service excellence.”

pie chart

Percentage of University of Notre Dame and HathiTrust overlap

According to the second chart, of our approximately 1,000,000 overlapping titles, about 121,000 (5%) are in the public domain. The majority of the public domain documents are government documents. On the other hand about 55,000 of our overlapping titles are both in the public domain and a part of our collection’s strengths (literature, philosophy, and history). It might be interesting to mirror any or all of these public domain documents locally. This would enable us to enhance our local collections and possibly provide services (text mining, printing, etc.) against them. “Lots of copies keep stuff safe.”

subjects

Subject coverage of the overlapping materials

According to the HathiTrust website, about 250,000 items in the ‘Trust are freely available via the public domain. For example, somebody has created a collection of public domain titles called English Short Title Catalog, which is apparently the basis of EEBO and in the public domain. [2] Maybe we could query the ‘Trust for public domain items of interest, and mirror them locally too? Maybe we could “simply” add those public domain records to our catalog? The same process could be applied to collections from the Internet Archive.

The primary purpose of the HathiTrust is to archive digitized items for its membership. A secondary purpose is to provide some public access to the materials. After a bit of creative thinking on our parts, I believe it is possible to extend the definition of public access and provide enhanced services against some of the content in the archive as well as fulfill our mission as a research library.

I think I will spend some time trying to get a better idea of exactly what public domain titles are in our collection as well as in the HathiTrust. Wish me luck.