Catholic pamphlets workflow

Posted on April 12, 2013 in Uncategorized by Eric Lease Morgan

Gratuitous eye candy by Matisse

This is an outline of how we here at Notre Dame have been making digitized versions of our Catholic pamphlets available on the Web — a workflow:

  1. Save PDF files to a common file system – This can be as simple as a shared hard disk or removable media.
  2. Ingest PDF files into Fedora to generate URLs – The PDF files are saved in Fedora for the long haul.
  3. Create persistent URLs and return a list of system numbers and… URLs – Each PDF file is given a PURL for the long haul. Output a delimited file containing system numbers in one column and PURLs in another. (Steps #2 and #3 are implemented with a number of Ruby scripts: batch_ingester.rb, book.rb, mint_purl.rb, purl_config.rb, purl.rb, repo_object.rb.)
  4. Update Filemaker database with URLs for quality assurance purposes – Use the PURLs from the previous step and update the local database so we can check the digitization process.
  5. Start quality assurance process and cook until done – Look at each PDF file making sure it has been digitized correctly and thoroughly. Return poorly digitized items back to the digitization process.
  6. Use system numbers to extract MARC records from Aleph – The file names of each original PDF document should be an Aleph system number. Use the list of numbers to get the associated bibliographic data from the integrated library system.
  7. Edit MARC records to include copyright information and URLs to PDF file – Update the bibliographic records using scripts called list-copyright.pl and update-marc.pl. The first script outputs a list of copyright information that is used as input for the second script, which adds the copyright information as well as pointers to the PDF documents.
  8. Duplicate MARC records and edit them to create electronic resource records – Much of this work is done using MarcEdit.
  9. Put newly edited records into Aleph test – Ingest the newly created records into a staging area.
  10. Check records for correctness – Given enough eyes, all bugs are shallow.
  11. Put newly edited records into Aleph production – Make the newly created records available to the public.
  12. Extract newly created MARC records with new system numbers – These numbers are needed for the concordance program — a way to link back from the concordance to the full bibliographic record.
  13. Update concordance database and texts – Use something like pdftotext to extract the OCR from the scanned PDF documents. Save the text files in a place where the concordance program can find them. Update the concordance’s database linking keys to bibliographic information as well as locations of the text files. All of this is done with a script called extract.pl.
  14. Create Aleph Sequential File to add concordance links – This script (marc2aleph.pl) will output something that can be used to update the bibliographic records with concordance URLs — an Aleph Sequential File.
  15. Run Sequential File to update MARC records with concordance link – This updates the bibliographic information accordingly.
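
Steps #3 and #6 both lean on the convention that each PDF file name is an Aleph system number. The following is a minimal sketch in Python of how such a delimited file could be assembled; it is not one of the Ruby or Perl scripts named above. The base URL is a placeholder, the function names are made up for illustration, and real PURLs would come from the minting step rather than from string concatenation.

```python
import csv
import io
from pathlib import Path

# Placeholder only; real PURLs are minted in step #3, not built like this.
PURL_BASE = "http://example.org/purl/"


def system_number(pdf_path):
    """Return the Aleph system number, assumed to be the file name minus its extension."""
    return Path(pdf_path).stem


def purl_table(pdf_paths, base=PURL_BASE):
    """Build tab-delimited text: system number in column one, URL in column two."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for path in sorted(pdf_paths):
        number = system_number(path)
        writer.writerow([number, base + number])
    return buffer.getvalue()
```

The first column of the output can then double as the list of system numbers used to pull records from Aleph in step #6.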

Done, but I’m sure your mileage will vary.

Digital Scholarship Grilled Cheese Lunch

Posted on April 5, 2013 in Uncategorized by Eric Lease Morgan

Grilled Cheese Lunch Attendees

In the Fall the Libraries will be opening a thing tentatively called The Hesburgh Center for Digital Scholarship. The purpose of the Center will be to facilitate learning, teaching, and research across campus through the use of digital technology.

For the past few months I have been visiting other centers across campus in order to learn what they do and how we can work collaboratively with them. These centers included the Center for Social Research, the Center for Creative Computing, the Center for Research Computing, the Kaneb Center, Academic Technologies, as well as a number of computer labs/classrooms. Since we all have more things in common than differences, I recently tried to build a bit of community through a grilled cheese lunch. The event was an unqualified success, and pictured are some of the attendees.

Fun with conversation and food.

Editors across campus: A reverse travelogue

Posted on March 8, 2013 in Uncategorized by Eric Lease Morgan

Some attending editors

On Friday, February 8, an ad hoc library group called The Willing sponsored a lunch for editors of serial titles from across campus. This is but the tiniest of “reverse travelogues” documenting that experience and the scholarly communications process surrounding it.

Professionally, I began to experience changes in the scholarly communications process almost twenty years ago when I learned how the cost of academic journals was increasing by as much as 5% to 7% every year. With the advent of globally networked computers, the scholarly communications process is now affecting academics more directly.

In an effort to raise the awareness of the issues and provide a forum for discussing them, The Willing first compiled a list of academic journals whose editors were employed by the University. There are/were about sixty journals. Being good librarians, we subdivided these journals into smaller piles based on various characteristics. We then invited subsets of the journal editors to a lunch to discuss common problems and solutions.

The lunch was attended by sixteen people, and they were from all over the campus wearing the widest variety of hats. Humanists, scientists, and social scientists. Undergraduate students, junior faculty, staff, senior faculty. Each of us, including myself, had a lot to say about our individual experiences. We barely got around the room with our introductions in the allotted hour. Despite this fact, a number of common themes — listed below in more or less priority order — became readily apparent:

  • facilitating the peer-review process
  • going digital
  • understanding open access publishing models
  • garnering University support
  • balancing copyrights (often called “ownership” by attendees)
  • being financially sustainable
  • combatting plagiarism
  • facilitating community building around and commenting on journal content
  • soliciting submissions

With such a wide variety of topics it was difficult to have a focused discussion on any one of them in the given time and still allow everybody to express their most important concerns. Consequently the group decided to select individual themes and sponsor additional get-togethers, each devoted to one selected theme and only that theme. We will see what we can do.

Appreciation goes to The Willing (Kenneth Kinslow, Parker Ladwig, Collette Mak, Cheryl Smith, Lisa Welty, Marsha Stevenson, and myself) as well as all the attending editors. “Thanks! It could not have happened without you.”

Editors Across The Campus

Posted on January 18, 2013 in Uncategorized by Eric Lease Morgan

gratuitous “eye candy” by Matisse


In an effort to make life easier for people who edit serial literature here at Notre Dame, we are organizing an informal lunch called Editors Across The Campus. We hope you can join us:

  • Who: Anybody and everybody who edits a journal here at Notre Dame
  • What: An informal lunch and opportunity for discussion
  • When: 11:45 to no later than 1 o’clock, Friday, February 8
  • Where: Room 248 of the Hesburgh Libraries
  • Why: Because we all have something to learn from each other

Here at the University quite a number of journals, magazines, and various other types of serial literature are edited by local faculty, students, and staff; based on our investigations there are more than one hundred editors who have their hands in more than sixty serial titles.

Bringing editors together from across campus will build community. It will foster the creation of a support network. It will also make it easier for people interested in scholarly communication to hear, learn, and prioritize issues and challenges facing editors. Once these issues are identified and possibly prioritized, then plans can be made to address the issues effectively. Thus, the purpose of the lunch/discussion combination is to begin to share “war stories” in the hopes of at least finding some common ground. Issues and challenges might include but are certainly not limited to:

  • balancing the costs of publication
  • dealing with copyright issues
  • deciding between electronic and paper-based distribution
  • determining the feasibility of open access publishing
  • finding and identifying qualified authors
  • finding and identifying qualified publishers
  • finding and identifying qualified reviewers
  • implementing a searchable/browsable archive of previous content
  • increasing impact factors
  • increasing readership
  • learning how to use computer technology to manage workflows
  • moving from one publisher to another

We sincerely believe we all have more things in common than differences. If you are an editor or someone who is keenly interested in the scholarly communications process, then drop us a line (Eric Lease Morgan <emorgan@nd.edu>, 631-8604), come to the lunch, and participate in the discussion. We hope to see you there.

A couple of Open Access Week events

Posted on November 17, 2012 in Uncategorized by Eric Lease Morgan

A couple of Open Access Week events were sponsored here at Notre Dame on October 31, and this posting summarizes my experiences.

Many of The Willing plus Nick Shockey and José E. Limón

Morning session

In the morning there was a presentation to library faculty by Nick Shockey (SPARC), specifically on the process of increasing open access publishing, and he outlined five different tactics:

  1. Simple advocacy – Describing what open access publishing is and its philosophical advantages. Unfortunately this approach does not always resonate with the practicalities of everyday promotion and tenure processes.
  2. Education – This goes hand-in-hand with advocacy but may also include how open access has more things in common with traditional publishing than differences. For example, Shockey pointed out the increasing number of mandates from funders to have the results of research funded by them become available via open access. Another success factor in education involves getting a deep level of understanding in faculty. Once this is done resistance is much lower.
  3. Engage scholarly societies – For example, ask the society to open up their back log of published materials as open access materials.
  4. Educate friends and colleagues – We have to understand that not everybody sees the whole problem. There is the perspective of the author, the publisher, and the librarian. Each is needed in the scholarly communications process, yet not everybody completely understands the issues of the others. Build relationships between all three of these communities. He also advocated educating students because they can be a catalyst for change.
  5. Make your work open access – This means know your rights, keep your rights, and use your rights. The process is increasingly negotiable.

Finally, Shockey insisted on engaging authors on real-world problems instead of philosophical issues such as expanding the sphere of knowledge. “Look for and point out tangible benefits of open access including higher citation counts, wider distribution, and the possibility of massive textual analysis.”

Afternoon session

The afternoon session was co-presented by Nick Shockey and José E. Limón. The topic was authors’ rights.

Shockey began by outlining the origins of scholarly journals and how they were originally non-profit enterprises. But as time went on and publishing increasingly became profit-based, a question needed to be asked, “How well does this new model really serve the people for whom it is needed?” When the prices of some chemistry journals approach $4,200/year, there has got to be a better way.

Knowing authors’ rights can help. For example, by knowing, understanding, and acting upon the self-archiving rights associated with many journals nowadays, it is possible to make versions of published materials available much more widely than ever before. It does require some extra work, but it is systematic extra work that could be done by libraries.

Shockey also advocated contractual amendments like the one called the Scholar’s Copyright Addendum Engine [1]. Complete the form with your name, title, and journal. Click the button. Print the form. Sign it and send it to the publisher, automatically retaining many of one’s rights.

Finally, Shockey advocated university-wide institutional policies for retaining authors’ rights. “These policies create a broader and wider audience which is not limited and offer greater visibility.”

José E. Limón (American Studies at the University of Notre Dame) began by confessing the idea of authors’ rights has been rather foreign to him, and at the same time the ante is going up in terms of tenure and promotion. No longer is it about publishing a single book. Consequently he believes his knowledge regarding authors’ rights needs to be increased.

Limón went on to relate a personal story about authors’ rights. It began when he discovered an unpublished manuscript at Texas A&M University. It was a novel coauthored by Jovita González and Margaret Eimer which he edited and eventually published under the title of Caballero. Written in the 1930s, this historical novel is set during the Mexican American War and is sometimes called Texas’s Gone with the Wind. After the book was published Limón was approached by Steven Spielberg’s company about movie rights, but after a bit of investigation he discovered he had no rights to the book. The rights remained with Texas A&M. To many in the audience, the story was a bit alarming.

In the end, he had one thing to say, “Academics just do not know.”

Kudos

Kudos to Nick Shockey and José E. Limón for sharing some of their experiences. “Thank you!” Thanks also go to the ad hoc group in the Hesburgh Libraries who call themselves “The Willing” (Kenneth Kinslow, Parker Ladwig, Collette Mak, Cheryl Smith, Marsha Stevenson, Lisa Welty, and Eric Lease Morgan). Without their help none of this would have happened.

New Media From the Middle Ages To The Digital Age

Posted on November 7, 2012 in Uncategorized by Eric Lease Morgan

new and old teaching tools

I attended an interesting lecture yesterday from a series called New Media From the Middle Ages to the Digital Age, and here are a few of my take-aways.

Peter Holland (Film, Television, and Theatre) began by giving an overview of his academic career. He noted how his technology of the time was a portable typewriter. He then went on to compare and contrast scholarship then and now. From what I could tell, he did not think there was a significant difference, with the exception of one thing: the role and definition of community. In the past community meant going to conferences and writing letters every once in a while. Nowadays, conferences are still important, letters have been replaced by email, but things like mailing lists play a much larger role in community. This sort of technology has made it possible to communicate with a much wider audience much faster than in previous times. The SHAKSPER mailing list was his best example.

The next presentation was by Elliott Visconsi (English). While the foundation of his presentation was his The Tempest for iPad project, he was really focused on how technology can be used to enhance learning, teaching, and research. He believed portable Web apps represent a convergence of new and old technologies. I believe he called them “magic books”. One of his best examples was how the application can support dynamic and multiple commentaries on particular passages as well as dynamic and different ways speeches can be vocalized. This, combined with social media, gives Web applications some distinct advantages over traditional pedagogical approaches.

From my point of view, both approaches have their distinct advantages and disadvantages. Traditional teaching and learning tools are less fragile, less mutable. But at the same time they rely very much on the work of a single individual. On the other hand, new technology is expensive to create and keep up-to-date while offering a richer learning experience that is easier to use in groups. “Two heads are better than one.”

So many editors!

Posted on September 22, 2012 in Uncategorized by Eric Lease Morgan

There are so many editors of serial content here at the University of Notre Dame!

In a previous posting I listed the titles of serials content with editors here at Notre Dame. I identified about fifty-nine titles. I then read more about each serial title and created a sub-list of editors which resulted in about 113 names. The original idea was to gather as many of the editors together as possible and facilitate a discussion on scholarly communication, but alas, the number of 113 people is far too many for a chat.

Being a good librarian, I commenced to classify my list of serials hoping to create smaller, more cohesive groups of people. I used facets such as student-run, peer-reviewed, open access, journal (as opposed to blog), and subjects. This being done I was able to create subsets of the titles with much more manageable numbers of editors. For example:

  • 15 science publications (19 editors)
  • 10 student-run publications (24 editors)
  • 12 open access publications (26 editors)
  • 17 humanities publications (41 editors)
  • 31 peer-reviewed publications (43 editors)
  • 26 social science publications (50 editors)
  • 28 published here at Notre Dame (56 editors)
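
The counts above come from tagging each title with facets and then tallying titles and distinct editors per facet. Here is a minimal sketch of that sort of tally in Python, assuming each serial is a small record of facets and editor names. The sample records are invented purely for illustration and are not the real list.

```python
from collections import defaultdict

# Hypothetical sample records; titles, facets, and editors are made up.
SERIALS = [
    {"title": "Journal A", "facets": {"science", "peer-reviewed"}, "editors": {"Ann", "Bob"}},
    {"title": "Journal B", "facets": {"student-run", "open access"}, "editors": {"Cy"}},
    {"title": "Journal C", "facets": {"science", "open access"}, "editors": {"Bob", "Dee"}},
]


def summarize(serials):
    """Map each facet to (number of titles, number of distinct editors)."""
    titles = defaultdict(int)
    editors = defaultdict(set)
    for serial in serials:
        for facet in serial["facets"]:
            titles[facet] += 1
            # A set keeps an editor of two titles from being counted twice.
            editors[facet] |= serial["editors"]
    return {facet: (titles[facet], len(editors[facet])) for facet in titles}
```

Because a title can carry more than one facet, the subsets overlap, so the per-facet counts add up to more than the total number of titles or editors.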

One of our goals here in the Libraries is to play a role in the local scholarly communication process. Exactly what that role entails is yet to be determined. Bringing together editors from across campus could build community. It could also make it easier for us to hear, learn, and prioritize issues facing editors. Once we know what those issues are, we might be able to figure out a role for ourselves. Maybe there isn’t a role. On the other hand, maybe there is something significant we can do.

The next step is to figure out whether or not to bring subsets of these editors together, and if so, then how. We’ll see what happens.

Yet more about HathiTrust items

Posted on September 14, 2012 in Uncategorized by Eric Lease Morgan

This directory includes the files necessary to determine what downloadable public domain items in the HathiTrust are also in the Notre Dame collection.

In previous postings I described some investigations regarding HathiTrust and Notre Dame collections. [1, 2, 3] Just yesterday I got back from a HathiTrust meeting and learned that even the Google digitized items in the public domain are not really downloadable without signing some sort of contract.

Consequently, I downloaded a very large list of 100% downloadable public domain items from the HathiTrust (pd.xml). I then extracted the identifiers from the list using a stylesheet (pd.xsl). The result is pd.txt. Starting with the local MARC records created in those earlier blog postings (nd.marc), I wrote a Perl script (nd.pl) to extract all the identifiers (nd.txt). Lastly, I computed the intersection of the two lists using a second Perl script (compare.pl), resulting in a third text file (both.txt). The result is a list of public domain items that are in the HathiTrust, are in the collection here at Notre Dame, and require no disambiguation because each item has not been digitized more than once. (“Confused yet?”)
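
The intersection step is simple set arithmetic. The following is a minimal sketch in Python of the idea behind compare.pl, not a transcription of the Perl script itself. It assumes pd.txt and nd.txt each hold one identifier per line, and the function names are made up for illustration.

```python
def read_identifiers(lines):
    """Return the set of non-empty, whitespace-trimmed identifiers."""
    return {line.strip() for line in lines if line.strip()}


def intersection(hathi_lines, local_lines):
    """Return the identifiers found in both lists, sorted for stable output."""
    return sorted(read_identifiers(hathi_lines) & read_identifiers(local_lines))
```

Passing the open file handles for pd.txt and nd.txt to intersection() and writing the result one identifier per line would produce something like both.txt.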

It is now possible to download the entire digitized book through the HathiTrust Data API via a Web form. [4] Or you can use something like the following URL:

http://babel.hathitrust.org/cgi/htd/aggregate/<ID>

where <ID> is a HathiTrust identifier. For example:

http://babel.hathitrust.org/cgi/htd/aggregate/mdp.39015003700393
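
Given that URL pattern, turning a file of identifiers such as both.txt into a list of download links takes only a few lines. This is a minimal sketch in Python, not part of the original scripts. The function names are made up for illustration, and it assumes identifiers can be appended to the URL without any escaping, which may not hold for every HathiTrust identifier.

```python
# Base URL taken from the pattern shown above.
AGGREGATE_BASE = "http://babel.hathitrust.org/cgi/htd/aggregate/"


def aggregate_url(identifier):
    """Return the download URL for one HathiTrust identifier."""
    return AGGREGATE_BASE + identifier.strip()


def aggregate_urls(lines):
    """Return one URL per non-empty line of identifiers."""
    return [aggregate_url(line) for line in lines if line.strip()]
```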

Of the approximately 20,000 items previously “freely” available, it seems that there are now just over 2,000. In other words, about 18,000 of the items I previously thought were freely available for our catalog are not really “free”. Instead, permissions still need to be garnered in order to get these free items.

I swear we are presently creating a Digital Dark Age!

Links

  1. http://blogs.nd.edu/emorgan/2012/08/hathitrust/
  2. http://blogs.nd.edu/emorgan/2012/08/hathitrust-continued/
  3. http://blogs.nd.edu/emorgan/2012/08/hathi-epilogue/
  4. https://babel.hathitrust.org/shcgi/htdc

Serial publications with editors at Notre Dame

Posted on August 22, 2012 in Uncategorized by Eric Lease Morgan

This is a list of serial publications (journals, yearbooks, magazines, newsletters, etc.) whose editorial board includes at least one person from the University of Notre Dame. This is not a complete list, and if you know of other titles, then please drop me a line:

  1. Actroterion
  2. American Journal Of Jurisprudence
  3. American Midland Naturalist
  4. American Political Thought
  5. Analyst
  6. Analytical Chemistry
  7. Applied Preventative Psychology
  8. Attention, Perception, & Psychophysics
  9. Behavior Genetics
  10. Beyond Politics
  11. Biomicrofluidics
  12. Bulletin De Philosophie Médiévale
  13. Cognitive Psychology
  14. Conflict Resolution
  15. Current Drug Targets
  16. Faith And Philosophy
  17. International Yearbook Of German Idealism
  18. Journal of Autism and Developmental Disorders
  19. Journal Of Biblical Literature
  20. Journal of Chemical & Engineering Data
  21. Journal Of College and University Law
  22. Journal of Experimental Psychology: Learning, Memory & Cognition
  23. Journal of Hindu-Christian Studies
  24. Journal Of Legislation
  25. Journal Of Modern Russian History and Historiography
  26. Journal of Moral Education
  27. Journal of Multivariate Analysis
  28. Journal of Organic Chemistry
  29. Journal Of Philosophical Research
  30. Journal of Physical Chemistry A
  31. Journal of Physical Chemistry Letters
  32. Journal Of Religion and Literature
  33. Journal Of Undergraduate Research
  34. Kellogg Institute Working Papers
  35. Mobilization
  36. Mobilizing Ideas
  37. Multivariate Behavioral Research
  38. Nineteenth Century Context
  39. Notre Dame Journal of Formal Logic
  40. Notre Dame Journal of International & Comparative Law
  41. Notre Dame Journal Of Law, Ethics, And Public Policy
  42. Notre Dame Law Review
  43. Notre Dame Philosophical Reviews
  44. Notre Dame Review
  45. Psychological Methods
  46. Quarterly Journal Of Experimental Psychology
  47. Re: Visions
  48. Reilly Center Reports
  49. Rethinking Marxism
  50. Review Of Politics
  51. Scholastic
  52. Scientia
  53. Sociological Voices
  54. Studies in History and Philosophy of Science
  55. Sullivan Prize series
  56. The Bend
  57. The Hub
  58. The Juggler
  59. Through Gendered Lenses
  60. William And Katherine Devers Series in Dante Studies

This is a list of titles that may or may not have had an editor from Notre Dame at one time, but to the best of my ability I could not find one.

  1. Catholic Education
  2. Comparative Politics Newsletter
  3. International Journal Of Ecology
  4. International Journal Of Industrial Organization
  5. Medieval Philosophy And Theology
  6. Memory And Cognition

Again, this is not necessarily a complete list, and if you know of other titles, then please drop me a line.

Last updated: October 1, 2012

Exploiting the content of the HathiTrust, epilogue

Posted on August 14, 2012 in Uncategorized by Eric Lease Morgan

This blog posting simply points to a browsable and downloadable set of MARC records describing a set of books both in the HathiTrust as well as the Hesburgh Libraries at the University of Notre Dame.

In a previous blog posting I described how I downloaded about 25,000 MARC records that:

  1. were denoted as in the public domain
  2. described books published prior to 1924
  3. were denoted as a part of the Hesburgh Libraries at the University of Notre Dame
  4. were denoted as a part of the HathiTrust
  5. had a one-to-one correspondence between OCLC number and digitized item

This list of MARC records is not, nor was it ever intended to be, a comprehensive list of overlapping materials between the Hesburgh Libraries collection and the HathiTrust. Instead, this list is intended to be a set of unambiguous sample data allowing us to import and assimilate HathiTrust records into our library catalog and/or “discovery system” on an experimental basis.
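
The fifth criterion, a one-to-one correspondence between OCLC number and digitized item, is what makes the data unambiguous. The following is a minimal sketch of such a filter in Python, assuming the input is a list of (OCLC number, HathiTrust identifier) pairs. The function name and the sample values in the usage note are made up for illustration.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep pairs whose OCLC number maps to exactly one digitized item."""
    # Drop exact duplicate pairs first so a repeated row is not mistaken
    # for a second digitization of the same title.
    unique = sorted(set(pairs))
    counts = Counter(oclc for oclc, _ in unique)
    return [(oclc, item) for oclc, item in unique if counts[oclc] == 1]
```

For example, if OCLC number "200" were paired with both "mdp.b" and "uc1.c", both of those pairs would be dropped, while a number paired with a single item would be kept.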

The browsable interface is rudimentary. Simply point your browser to the interface and a list of ten randomly selected titles from the MARC record set will be displayed. Each title will be associated with the date of publication and three links. The first link points to the HathiTrust catalog record where you will be able to read/view the item’s bibliographic data. The second link points to the digitized version of the item complete with its searching/browsing interface. The third and final link queries OCLC for libraries owning the print version of the item. This last link is here to prove that the item is owned by the Hesburgh Libraries.

Screen shot of browsable interface

For a good time, you can also download the MARC records as a batch.

Finally, why did I create this interface? Because people will want to get a feel for the items in question before the items’ descriptions and/or URLs become integrated into our local system(s). Creating a browsable interface seemed to be one of the easier ways I could accomplish that goal.

Fun with MARC records, the HathiTrust, and application programmer interfaces.