ODNP: What Are We Working On?
Color photograph of stacks of newspapers.
Wikimedia Commons, File:Newspaper_20250330_120826.jpg. Ka23 13, CC BY 4.0.

As the program manager for the Oregon Digital Newspaper Program (ODNP), I’m often asked what newspaper titles we’re currently working on. We are busy!

Sandy, Newberg, Eugene, Coos Bay, Fossil, Monroe, Junction City, Eagle Point, Lakeview, Portland — we’ll be digitizing newspapers from these and other places in 2026.

We currently have 13 digitization projects in progress, with another 11 confirmed projects in the queue for FY26. Also underway is a large, federally funded project to digitize 100,000 pages from about 20 titles, which will go into both the Library of Congress’s Chronicling America and our own Historic Oregon Newspapers.

Visit our website for a complete list of projects in the ODNP digitization queue.

Querying Oregon Digital RDF*, Part 2

*And some other RDF, too

Last year around this time I published a blog post about querying RDF digital-collection metadata description sets from Oregon Digital, a digital collections repository run by Oregon State University and the University of Oregon. It was fun to write, and as a relatively new Oregon Digital team member, I gained a basic understanding of Oregon Digital metadata expressed as Resource Description Framework (RDF) triples [RDF]. (We don’t use RDF in any production workflows right now, and one could work full-time loading and remediating metadata without ever looking at it.)

The queries I used for that post don’t do anything that couldn’t be done some other way, and often much more quickly and easily, like by using our Solr search index. So, I wanted to return to the topic and show some queries which do a bit more to demonstrate why we go to the trouble of recording URIs in metadata description sets and configuring predicate URIs for metadata fields in the first place—you know, linked-data stuff, like aggregating metadata about the same resources from different data stores! Looking back at last year’s post I realize I said, as if to set the stage for further exploration:

I think [Samvera] Hyrax’s implementation of elements of RDF and the ability to record URIs in metadata descriptions have benefits for both users and administrators!

Tools

I would like to use only implementations of RDF standards, like a SPARQL query processor [SPARQL], to accomplish these tasks. To my mind, one benefit of implementing RDF technologies for creating, storing, and/or serving metadata is relying more on the related standards and spending less time on tool- or product-specific details. (Of course, I’ve spoken to library software developers who point out drawbacks of using RDF for some of these purposes, too.) Another benefit, as I mentioned above, is the ability to aggregate resource descriptions from different sources.

But between the absence of a SPARQL endpoint for some data (including Oregon Digital, where RDF is currently only available for download by administrators) and my novice-level skill with SPARQL and programming, it was helpful for me to use Python tools [PYTHON] to process SPARQL queries on data stored on my computer, send queries to an endpoint where available, and pass results from one set of queries to the next. I ran my code in a Jupyter Notebook (.ipynb) file [JUPYTER].

Data

The goal for this demonstration was to gather data of interest which could be used to create something like a browse-by-topic interface for the collection. Note that I didn’t build anything with the data I collected (I resisted the temptation to go down a rabbit hole and begin learning Python web templating). I’m currently working with subject specialists here to expand our Gertrude Bass Warner Collection of Japanese Votive Slips (nōsatsu), 1850s to 1930s, so I selected Library of Congress Subject Headings present in this collection as a test set for RDF data aggregation.

Queries and some details

This first query of collection metadata yielded the LCSH URIs recorded as subjects, and for each, the number of times it occurs and the persistent identifier and model information for each resource where it appears in metadata, which can be used to construct display-page URLs [ODRDF]. Running this query in Python code, I narrowed down the results (arbitrarily, to make managing my data a little easier) to only those headings recorded between 10 and 100 times in the collection.
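
That query is odrdf.rq in the Gist; here is a rough sketch of its shape (the FILTER string and the way ?odWorks is assembled are my guesses, per the [ODRDF] note below, so treat this as an illustration rather than the real query):

PREFIX dct: <http://purl.org/dc/terms/>

# a sketch, not the real odrdf.rq: count each LCSH URI recorded as a
# dct:subject value and gather the works it appears on into one
# "|"-delimited string for the Python below to split
SELECT ?lcsh (COUNT(?work) AS ?lcshCount)
       (GROUP_CONCAT(STR(?work); SEPARATOR="|") AS ?odWorks)
WHERE {
  ?work dct:subject ?lcsh .
  FILTER(STRSTARTS(STR(?lcsh), "http://id.loc.gov/authorities/subjects/"))
}
GROUP BY ?lcsh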

import rdflib

g = rdflib.Graph().parse("gb-warner-nosatsu_0.nt")
# your RDF file in any serialization that can be parsed by rdflib goes here
# see details at https://rdflib.readthedocs.io/en/stable/plugins/#plugin-parsers
data = {}
with open("odrdf.rq", "r") as query1:
    result = g.query(query1.read())
    for row in result:
        # keep only headings recorded between 10 and 100 times
        if 10 <= int(row.lcshCount) <= 100:
            data.update({row.lcsh: {
                "count": row.lcshCount,
                "odworks": str(row.odWorks).split("|")
            }})

Having a list of Library of Congress Subject Headings (LCSH), I next retrieved and queried data from the Library of Congress Linked Data Service. This query retrieves human-readable labels for each heading, and where they are available, Wikidata items and subject headings from the National Diet Library (NDL) of Japan which have been mapped as equivalent.

The syntax here and for the query that follows is slightly different from that which will be executed. It’s not actually SPARQL—it would raise an error if passed to a SPARQL endpoint as-is, and I probably shouldn’t really use the .rq file extension here—because it includes some Python string accommodations that allow for passing LCSH and Wikidata URIs into the query strings.
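
For a rough idea of what lcsh.rq might contain (the real query is in the Gist; the SKOS properties here are an assumption on my part, and the LC data may use MADS/RDF properties instead):

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# a sketch: pull the heading's label plus any mapped external URIs
# (Wikidata items, NDL subject headings); {0} is filled in with the LCSH
# URI by str.format() below, and doubled braces survive as literal braces
SELECT ?label ?closeMatch
WHERE {{
  <{0}> skos:prefLabel ?label .
  OPTIONAL {{ <{0}> skos:closeMatch ?closeMatch }}
}}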

import requests

# assumption: ask id.loc.gov for RDF/XML via content negotiation
headers = {"Accept": "application/rdf+xml"}
for iri in data:
    response = requests.get(iri, headers=headers)
    g = rdflib.Graph().parse(data=response.text, format="xml")
    with open("lcsh.rq", "r") as rqfile:
        result = g.query(rqfile.read().format(iri, iri, iri))  # passing in the URI here

I think the retrieved NDL subject headings have the potential to connect users of this collection to even more data of interest, but they are included as a bit of a placeholder for now, as I haven’t yet had the opportunity to dive into the documentation on Web NDL Authorities and do more with them.

Next, I passed the list of Wikidata URIs to the Wikidata SPARQL endpoint for one final set of queries to gather more information. This yielded, where available, English- and Japanese-language labels for the mapped Wikidata items and URLs for articles from English- and Japanese-language Wikipedia covering these topics.
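
Roughly, that last step might look like the sketch below (the real queries are in the Gist; the wikidata_uris list is hypothetical). The Wikidata Query Service pre-declares the rdfs: and schema: prefixes, and sitelinks to Wikipedia articles are modeled with schema:about and schema:isPartOf.

import requests

# a sketch of the final step: for each mapped Wikidata item, fetch labels
# and Wikipedia article URLs from the Wikidata SPARQL endpoint
query_template = """
SELECT ?enLabel ?jaLabel ?enArticle ?jaArticle WHERE {{
  OPTIONAL {{ <{0}> rdfs:label ?enLabel . FILTER(LANG(?enLabel) = "en") }}
  OPTIONAL {{ <{0}> rdfs:label ?jaLabel . FILTER(LANG(?jaLabel) = "ja") }}
  OPTIONAL {{ ?enArticle schema:about <{0}> ;
                         schema:isPartOf <https://en.wikipedia.org/> . }}
  OPTIONAL {{ ?jaArticle schema:about <{0}> ;
                         schema:isPartOf <https://ja.wikipedia.org/> . }}
}}
"""
for item in wikidata_uris:  # hypothetical: the Wikidata URIs from the LCSH step
    response = requests.get("https://query.wikidata.org/sparql",
                            params={"query": query_template.format(item),
                                    "format": "json"})
    bindings = response.json()["results"]["bindings"]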

Results

The diagram below outlines the data I aggregated and the RDF property relationships I used to do so, and the JSON code snippet shows aggregated data for one example LCSH heading. This example heading ("Bonsai"@en) had relationships to everything I looked for, but results were varied in terms of which LCSH terms had been mapped to NDL subject headings and to Wikidata, and of those Wikidata entities, which had both English- and Japanese-language labels and corresponding English- and/or Japanese-language Wikipedia articles.

A diagram showing the RDF properties which link other RDF entities to subject headings present in Oregon Digital metadata.
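
The Gist has the real JSON snippet; the sketch below only illustrates the general shape of one aggregated record. Every identifier and value in it is a placeholder rather than real data, and the key names are mine, too:

# the shape of one aggregated record (all identifiers are placeholders)
{
    "http://id.loc.gov/authorities/subjects/sh00000000": {
        "count": 14,
        "odworks": ["https://oregondigital.org/concern/images/xxxxxxxxx"],
        "label": "Bonsai",
        "ndl": "http://id.ndl.go.jp/auth/ndlsh/00000000",
        "wikidata": "http://www.wikidata.org/entity/Q0000",
        "labels": {"en": "bonsai", "ja": "盆栽"},
        "wikipedia": {
            "en": "https://en.wikipedia.org/wiki/Bonsai",
            "ja": "https://ja.wikipedia.org/wiki/盆栽"
        }
    }
}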

This code is available in this GitHub Gist.

Notes

[RDF] The Resource Description Framework is a data model that “can be used to publish and interlink data on the Web,” according to the W3C RDF 1.1 Primer.

[SPARQL] SPARQL is a query language (and more!) for use with RDF data. As the SPARQL 1.1 Overview puts it, “SPARQL 1.1 is a set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store.”

[PYTHON] Alongside some modules from the Python standard library and the Requests HTTP library, I rely heavily here—and in general, for working with RDF—on the rdflib package. I also often use the rdflib-endpoint module to work on queries over local data in a GUI interface. Thank you to the developers of these and other open-source software tools!

[JUPYTER] See this very brief README for some information about running Jupyter notebooks and installing the rdflib and requests modules.

[ODRDF] The workflow here is written specifically for data coming out of our digital collections platform—for example, in odrdf.rq, I’m retrieving LCSH URIs recorded as values for triples with predicate dct:subject, but they might not be there in other datasets. Also, I’m getting the information I need to construct URLs (?odWorks) by picking out part of subject URI strings and combining these with other information. I don’t expect this pattern would work for other data. For a general-purpose query to pull and count LCSH URIs from any metadata description set where they are recorded as values, see simplelcsh.rq.

AMIA report back

I was awarded a Professional and Organizational Development Fund (PODF) grant from the Library Grants and Awards Committee to attend the Association of Moving Image Archivists annual conference, which took place December 2–5 in Baltimore. I am grateful for the opportunity to spend a few inspiring days surrounded by smart people doing very cool work.

Before the main conference sessions began, I attended an all-day workshop, Film Restoration Essentials for Small Archives and Non-Profits, taught by Fabio Paul Bedoya Huerta. This hands-on workshop covered post-preservation film restoration workflows in DaVinci Resolve, including stabilization, deflickering, cleanup and recovery, grain management, and color correction.

I spent a good amount of time with my BTAA AV digitization and preservation interest group co-chair from Ohio State. We attended the College & University Archives Interest Group as well as the Magnetic Media Crisis Committee (MMCC), and we had a meeting with Iron Mountain about AV preservation storage. I networked with our regional colleagues from Moving Image Preservation of Puget Sound (MIPoPS), and I was able to meet with our vendors from Aviary, George Blood, and Colorlab with specific questions that are easier to ask and answer in a face-to-face setting.

Presenter slide showing steps in a pre-digitization workflow for video tapes.

I attended some thought-provoking sessions, including:

  • Mapping the Magnetic Media Landscape: Report from the National Survey
  • Looking Forward at the Virtual Film Bench Grant Project
  • Legal Brief: AI, Fair Use, and the Copyright Office
  • From Degralescence to Collective Action: Community-Driven Responses to the Magnetic Media Crisis
  • Advancing Digital Preservation Education in the U.S.: Results from DPOE-N
  • Preserving Local Broadcast History: Digitizing and Managing the WMAR-TV News Collection (1948–1993)
  • Be Kind, Rewind: VHS Culture, Community, and the Archive

I was fortunate to hear inspiring keynotes from Robert Newlen, Acting Librarian of Congress, and Carla Hayden, the outgoing Librarian of Congress. Dr. Hayden spoke at Archival Screening Night and received a long standing ovation before she even spoke, which reminded me of the tradition at the Cannes Film Festival. I even attended a “funeral” for U-matic 3/4″ video, which is now considered a dead format.

Presenter slide showing tape baking recipes over time.

One particular topic I was hoping to research at AMIA was pre-digitization workflows to stabilize magnetic media and improve outcomes. Happily, some of the best information on this came from our regional neighbors at MIPoPS, and I look forward to learning more from them in the future. My main takeaway was the need to add tape baking and cleaning to our workflow before digitization. While this will increase the time it takes to produce a file, it will help recover content that may already be damaged due to deteriorating media carriers. Thanks to colleagues for offering equipment recommendations and baking recipes—those casual, in-person exchanges are what make attending a conference like this so valuable.

DNA data capsule

A final note from the frontiers of science fiction: DNA-encoded data storage. It’s real, it’s here, and it doesn’t make a lot of sense. Apparently this capsule can hold 50 TB of data. The makers insist that playback devices for this medium, DNA sequencers, will be available “forever”.

“A Rather Ambitious Microfilming Project”: A History of the Oregon (Digital) Newspaper Program 

How did the UO Libraries come to have such a large and comprehensive collection of Oregon newspapers on microfilm? What did it take to preserve thousands of newspapers scattered around the state? Why does it still matter? And how does a 1952 Oldsmobile fit in?  

Learn more about the history of the Libraries’ newspaper microfilming program and how it provided the basis for the current Oregon Digital Newspaper Program. ODNP Program Manager Elizabeth Peterson tells the story on the ODNP blog. 

Newspaper article from the Oregon Daily Emerald titled "National Project Archives Oregon's Newspapers."
Oregon Daily Emerald, Feb. 19, 1998, p. 9
Querying Oregon Digital RDF, Part 1

Oregon Digital is a digital asset management system jointly developed by staff at the Oregon State University Libraries and Press and the University of Oregon Libraries. It is built using the Samvera Hyrax digital repository framework, which was chosen (among other reasons) because of its support for Resource Description Framework (RDF) data. Uniform Resource Identifiers (URIs) can be recorded as values in Oregon Digital metadata descriptions, and collection metadata can be exported by administrators in the RDF N-Triples format. In this blog post I will take a look at the RDF metadata description sets which can be exported from Oregon Digital and share two SPARQL queries which can be run on this data.

A visualization of an RDF triple having a subject, directional predicate, and object

You might ask why I’d use RDF and SPARQL when Solr queries can be run against all our metadata without any need to generate exports or manage individual RDF files for collections. I see the value of RDF and SPARQL in the ability to make use of other data sources. RDF – a foundational model for sharing linked open data – makes use of unambiguous identifiers for resources of interest, and these identifiers can be shared and reused across the web. So, for example, SPARQL queries can be run against Oregon Digital RDF and information in other linked open data repositories using federated queries, a powerful extension of the SPARQL query language.
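
As a toy illustration of the idea (a hypothetical example, assuming a local graph whose subject values are Wikidata URIs), a federated query uses the SPARQL SERVICE keyword to evaluate part of the graph pattern at a remote endpoint:

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# toy example: for each subject URI in a local graph, ask the Wikidata
# endpoint for an English label (assumes the subjects are Wikidata URIs)
SELECT ?work ?subject ?label WHERE {
  ?work dct:subject ?subject .
  SERVICE <https://query.wikidata.org/sparql> {
    ?subject rdfs:label ?label .
    FILTER(LANG(?label) = "en")
  }
}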

Very brief technical information

The queries shown below use the SPARQL 1.1 query language. All of the queries and other code snippets shown here, as well as a Jupyter notebook which can be used to run them, are available in this GitHub Gist. To run queries on RDF data, a SPARQL endpoint or query interface is needed; the Gist includes the Python code I use for this purpose. The materials don’t provide a detailed tutorial on setting up the software needed to use SPARQL for querying RDF data, but they do include some information and links to additional resources.

Looking closely at the data

Even though I can include URIs in Oregon Digital metadata and export metadata as RDF, I don’t describe it as linked open data for a few reasons:

  • Subject URIs are not persistent or dereferenceable—I expect subject URIs to look different in RDF exports from other Hyrax instances where they are available, and Oregon Digital subject URIs (and other aspects of the RDF) will change in the future when Oregon Digital data storage changes
  • It isn’t possible to language-tag text values in our Samvera Hyrax instance, and there are no language tags in exported RDF
  • Oregon Digital RDF isn’t currently available to all users

Another interesting point for me—someone still relatively new to the Samvera Hyrax user community—is that RDF exported from Oregon Digital seems very “noisy” from the perspective of an end-user interested in descriptive metadata. Many or most of the triples in each description set aren’t descriptive—technical-administrative metadata takes up a lot of space. For example, each resource is classed in six distinct ways (that is, has six distinct values for the rdf:type predicate), as can be seen in the snippet of Oregon Digital RDF available in the online materials—see od_rdf_excerpt.ttl. To be clear, despite this, I think Hyrax’s implementation of elements of RDF and the ability to record URIs in metadata descriptions have benefits for both users and administrators!

Getting a useful subset of the metadata…

When I’m investigating RDF from a source I’ve never used before, I often download some data in Turtle serialization just to look at it in a text editor. Sometimes I need to convert whatever serialization is available into Turtle myself—it is much easier for humans to read—and the Python rdflib library and many other tools make this easy to do.
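
For example, converting a downloaded file takes just a couple of lines with rdflib (the filename here is hypothetical):

import rdflib

# parse any serialization rdflib recognizes, then print it as Turtle
g = rdflib.Graph().parse("collection-export.nt")
print(g.serialize(format="turtle"))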

I knew from looking at the data that a portion of subject URIs—the nine-character persistent identifier (PID)—is present in the URL for viewing the object and metadata in a web browser. Viewing metadata in the browser helps me understand how metadata descriptions look and function for a user, so this seems like a valuable piece of information to create from the RDF. The following query can be run against RDF for a collection to yield a title, PID, and detailed web view URL for each object in it.

title_pid_showpage.rq
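
The file above is in the Gist; as a sketch of its shape (the regular expression and the URL pattern are my guesses, based on the Details that follow):

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX fedora: <info:fedora/fedora-system:def/model#>

# a sketch, not the real query: pull the nine-character PID out of each
# subject URI and combine it with the object model name to build a URL
SELECT ?title ?pid ?showpage WHERE {
  ?work dct:title ?title ;
        fedora:hasModel ?model .
  BIND(REPLACE(STR(?work), "^.*/([a-z0-9]{9})$", "$1") AS ?pid)
  BIND(CONCAT("https://oregondigital.org/concern/",
              LCASE(STR(?model)), "s/", ?pid) AS ?showpage)
}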

Details

For this to function with RDF coming from a different Samvera Hyrax instance it would be necessary to change at least two components:

  • The regular expression used as the second argument for the REPLACE function has been written to match the structure of subject URIs coming from our instance, and would need to be changed
  • The use of an object-model name (?model in this query) in web view URLs is expected to be common across Hyrax instances, but the location of this data in exported RDF may vary; here it appears as the value of the property with URI info:fedora/fedora-system:def/model#hasModel

…for resources of interest

Now that I have the query syntax needed to distill a PID and create a web view URL for RDF resources, I can add some search terms to retrieve this information for the objects I’m interested in. In this query, I use the SPARQL UNION keyword to retrieve objects for which metadata contains either a particular string, or one of two subject URIs.

match_on_string_or_uri.rq
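
Again, the real query is in the Gist; here is a sketch of the pattern (the two subject URIs below are placeholders, not real headings):

PREFIX dct: <http://purl.org/dc/terms/>

# a sketch: match works whose metadata contains a string, or which carry
# one of two subject URIs; "quilt" also matches "quilting" as a substring
SELECT DISTINCT ?work ?title WHERE {
  ?work dct:title ?title .
  {
    ?work ?p ?value .
    FILTER(isLiteral(?value) && regex(STR(?value), "quilt", "i"))
  }
  UNION
  { ?work dct:subject <http://id.loc.gov/authorities/subjects/shEXAMPLE1> }
  UNION
  { ?work dct:subject <http://id.loc.gov/authorities/subjects/shEXAMPLE2> }
}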

Details

This query doesn’t come close to taking full advantage of SPARQL’s regex function, but it will match on “quilt” or “quilting”, and the “i” flag allows matching regardless of capitalization.

What Are Digital Library Services?

Welcome to the University of Oregon Digital Library Services blog

This inaugural post describes some of what we do—a lot of different things! We hope you’ll return to learn more in future posts as we dive into different facets of digital libraries in greater detail.

The University of Oregon Libraries Digital Library Services (DLS) department was formed in 2023 as part of a strategic design process at the University of Oregon (UO) Libraries. It brings together staff with expertise in multiple areas of digital stewardship to create, provide access to, and preserve digital objects in support of learning at UO, in Oregon, and around the world. Our areas of specialization include digitization, digital object curation and management, project management, metadata, accessibility, discoverability, semantic web technologies, information architecture, user experience, digital preservation, copyright and intellectual property, and more. You can see the products of our work in various platforms on the web, whether you are a student or researcher at UO or halfway around the world.

A view of the Knight Library from the Memorial Quad, showing green trees and people sitting and walking in the quad.
UO Stock Photos, University of Oregon. “Facing Knight Library.” Oregon Digital, https://oregondigital.org/concern/images/df70hp06d. License: https://creativecommons.org/licenses/by-nc-nd/4.0/

Oregon Digital

Oregon Digital is a collaboration between the University of Oregon Libraries and Oregon State University Libraries and Press. Comprising more than 500,000 digitized and born-digital objects, this platform is a resource for scholarship and learning in diverse fields. One of the biggest drivers of growth for the UO collections in Oregon Digital is user requests for digitization of materials in the UO Libraries Special Collections and University Archives.

Historic Oregon Newspapers

Historic Oregon Newspapers is a free online database where you can search and access the complete content of over 2.4 million pages from 375 Oregon newspapers published from 1846 through the present. These newspapers include titles published in cities and small towns from the Oregon coast to the eastern border, at high schools and colleges, and by and for African Americans, Native Americans, migrant workers, labor unions, and many other Oregon communities, and we are continually adding new titles.

The historic newspapers are digitized through the Oregon Digital Newspaper Program, which also provides access to materials through a born-digital preservation program. More than 30 publishers currently participate in this program, and it has made nearly 200,000 pages dating from 2015 onward available online.

The Oregon Digital Newspaper Program is the newspaper digitization and digital preservation program for the state of Oregon. Based here at the University of Oregon Libraries in Eugene, the program has been digitizing Oregon newspapers since 2009, initially with funding from the National Endowment for the Humanities and the Library of Congress; an in-house, self-sustaining digitization program was created in 2015 with funding from the State Library of Oregon. We work with partners across the state, including public libraries, historical societies, museums, local groups, and private donors, to digitize content with support from private, local, and state-level funding sources.

Over the last five years, our digital newspapers program has garnered over 10 million page views from users around the state, country, and world, completed 65 digitization projects, raised over $115,000 in funding with partners, and added over 80 titles to the Historic Oregon Newspapers platform.

Scholars’ Bank

Scholars’ Bank is an open-access digital archive for the scholarly and creative output of the University of Oregon community. Its mission is to preserve and disseminate the intellectual and creative output of UO’s faculty, staff, and students. UO faculty, staff, and graduate students can deposit published and unpublished research and other scholarly output in Scholars’ Bank, and departments can distribute working papers, technical reports, and other research material. If you’d like to share your work, please contact us.

Digital Exhibits

DLS also supports the creation of digital exhibits showcasing unique collections at the University of Oregon, with past support for exhibits funded by the Andrew W. Mellon Foundation and showcasing items from the UO Libraries and the Jordan Schnitzer Museum of Art. Current DLS exhibits focus on fields including sports, religious history, and labor history. We currently use the open-source Spotlight software platform for digital exhibits.

Digital Production

Our digitization services unit creates high-quality digital versions of a wide variety of library resources. These digital assets are published online in platforms like those described here and preserved securely for the long term. In fiscal year 2024, this unit generated more than 50,000 digital images from 56 archival collections and 72 bound volumes, growing collections in Oregon Digital as well as Scholars’ Bank, for which they digitized 49 theses and dissertations that had previously been available only in print. Many of these digital resources were also provided directly to more than 150 UO Libraries patrons, fulfilling requests for remote access to archival materials. Digitized content is created in compliance with the Federal Agencies Digital Guidelines Initiative four-star performance level and the Library of Congress Recommended Formats Statement.


Thank you to the following people who contributed to this post:

  • Catherine Flynn-Purvis, Scholars’ Bank Manager
  • Elizabeth Peterson, Digital Collections Librarian and Program Manager, Oregon Digital Newspaper Program
  • Emily Young, Digitization Specialist
  • Julia Simic, Director, Digital Library Services