Gretchen Gueguen on the DPLA Archival Description White Paper

As a Pennsylvania native (Kittanning, PA), educated by our flagship institution (Penn State), and current resident (Erie, PA), I can’t help but be a little partial to the PA Digital Hub. So I am delighted to have the opportunity to discuss collections, description, and access at DPLA on this blog, and to particularly talk about Aggregating and Representing Collections in the Digital Public Library of America.


I am the Data Services Coordinator at DPLA, which means that I am in charge of managing our data aggregation services. This involves working with our Content and Service Hubs in their early stages to ensure that their data will be interoperable with our data set and then managing the quality review and remediation process for the first data ingest. After that, I work closely with the technology team to schedule re-ingest and maintenance of the data. I have several ongoing initiatives to analyze and improve our overall data quality that I’ll be working on (with the help of our Hubs) over the coming months. There’s always more improvement to be made, which is a great challenge!

Putting together the Archival Description Working Group

Throughout 2015 several themes kept popping up in questions and conversations at DPLA: How do we communicate the context of materials that are closely related to each other? How do we take advantage of information that is created about collections while still keeping DPLA a library of digital materials only? How can materials that come from different descriptive traditions (libraries, archives, museums) be reconciled?

DPLA had always had metadata fields for collection name and description in our item-level records, but at that point they had never been tightly controlled. We focused far more on the basic item descriptors (title, subjects, description) and let institutions do what they wished with collection. Since “collection” is actually a term with a lot of different meanings, we ended up with a hodge-podge of data that could at times seem misleading or redundant. Some Hubs organized all the content from each partner into an institution-based collection, others had very broad subject- or format-based collections, while still others had very specific provenance-based collections. In short, we felt that the collection data was not ready for prime time, so while it was retained and indexed in the record for searching, it was not featured in the website version of the metadata record.

In late 2015, DPLA also decided that, in order to continue to collaborate with the community effectively, the time had come to move on from the very broad open committees that had been created during the planning stages of DPLA to a series of more focused working groups that could help solve specific issues. One of the first working groups we decided to put together addressed the issues of collections data, specifically data created in archival description; hence the Archival Description Working Group. While archives were the initial focus, the work of the group ended up being a more all-encompassing analysis of collections in DPLA regardless of the type of originating institution.

The recommendations the working group came up with were published as a whitepaper in 2016. They addressed five areas:

  • Recommendations for representing objects (item vs. aggregate)
  • Recommendations for relationship of object to collection
  • Recommendations for creating and sharing collection data
  • Recommendations for user interface
  • Recommendations for process

The Methodology of the Working Group

An important first step the group tackled was to define what we meant by the word “collection” in our context. From the whitepaper:

The term is used loosely by the working group to mean any intentionally created grouping of materials. This could include, but is not limited to: materials that are related by provenance, and described and managed using archival control; materials intentionally assembled based on a theme, era, etc.; and groupings of materials gathered together to showcase a particular topic (e.g., digital exhibits or DPLA primary source sets). Not included in this definition are assemblages of digital objects that are not the result of some sort of intentional selection. For example, all of the objects that are exposed to DPLA by a particular institution would, generally speaking, not be a collection in this sense. All of the digital objects that belong to a specific type or form/genre – maps, for instance – would also not be a collection in the context of this white paper.

In order to develop our recommendations we took a three-phased approach. First, we did research. We read as many reports and articles as we could find on combining materials described at item- and collection- or aggregate-level, and we reviewed several digital library sites that did something innovative in this area. After the research phase, we synthesized our findings and created a list of user-based scenarios that we thought DPLA should support:

  1. It should be apparent to users when they find an item/s that these materials are part of a collection if appropriate.
  2. Users should know as soon as they search that items are part of collections and should be able to act on that knowledge.
  3. Users should be able to refine and limit their searches by membership in collections.
  4. Users should understand when objects are described using traditional component-level, archival-style descriptions, i.e., one object that represents many items.
  5. Users should be presented with appropriate metadata for objects, and this level of metadata and context may not be the same for all objects and collections. This could result in many items with the same description.
  6. Users may be presented with information that helps them make sense of where the item belongs within a collection if the collection structure or arrangement is meaningful.
  7. Collection/context information applies to different types of collections including exhibitions and primary source sets.
  8. Users should be able to go to DPLA and find a collection that interests them without doing an item search.
  9. Users should be able to find similar materials related to a retrieved item by their membership in the same collection.

We then used the scenarios to guide us through the process of making recommendations for changes to DPLA metadata, workflow, and interface.
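To make a couple of these scenarios concrete, here is a minimal sketch of scenarios 3 and 9 (limiting a search by collection membership, and finding related items through a shared collection). It assumes each item-level record carries a single collection title; the records, field names, and function names are hypothetical illustrations, not DPLA's actual data model or API.

```python
from collections import defaultdict

# Hypothetical item-level records; the field names are illustrative only.
records = [
    {"id": "item-1", "title": "Letter to the mayor", "collection": "Smith Family Papers"},
    {"id": "item-2", "title": "Family portrait", "collection": "Smith Family Papers"},
    {"id": "item-3", "title": "City map", "collection": None},
]


def index_by_collection(records):
    """Map each collection title to the ids of its member items."""
    index = defaultdict(list)
    for record in records:
        if record.get("collection"):
            index[record["collection"]].append(record["id"])
    return dict(index)


def filter_by_collection(records, collection):
    """Scenario 3: limit a result set to the members of one collection."""
    return [r for r in records if r.get("collection") == collection]


def related_items(records, item_id):
    """Scenario 9: ids of other items in the same collection as item_id."""
    by_id = {r["id"]: r for r in records}
    collection = by_id[item_id].get("collection")
    if not collection:
        return []  # an item with no collection has no collection-mates
    return [
        r["id"]
        for r in records
        if r.get("collection") == collection and r["id"] != item_id
    ]
```

Both lookups only work if the collection value is applied consistently across records, which is one reason loosely controlled collection data is hard to surface to users.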

Recommendations for item and aggregate objects

Rather than write at length here about each of the areas of recommendations, I’d like to just address one of the areas of biggest discussion: Recommendations for representing objects (item vs. aggregate).

The question that drove this discussion was how data created about materials in the aggregate can be used in DPLA. A prime example of this kind of data is a folder-level description that an archive might create. In the past decade in particular this kind of practice has increased in the archival community, largely inspired by the landmark publication of “More Product, Less Process.” DPLA has increasingly gotten records from contributing institutions that reflect things like folders of materials rather than the individual items within them. In the archival community the finding aid, which contextualizes and describes an entire collection, is the norm. However, DPLA doesn’t just serve archives. We have a huge collection of books, films, reports, journals, etc., all of which are individual items. Our searching, indexing, and presentation designs are all based on the idea that each record corresponds to a single individual object. Since DPLA can’t just adopt an archival, collection-based description approach, the working group focused part of its efforts on thinking through how aggregate-level descriptions could be combined with the existing item-level paradigm.

The two solutions usually adopted when faced with the question of how to translate metadata for a folder to DPLA were basically either to create one description that described a bunch of items in the aggregate, or to create a bunch of really minimal records for each item reusing similar data in each one. Either of these approaches might be best in particular situations. For example, in specific cases of unique visual materials an institution may want to opt for lots of individual items with minimal metadata. Even though the metadata is similar for each, this would allow the visual material to be discretely findable.

On the other hand, the search experience of seeing record after record for textual materials with indistinct images that are virtually identical would not suit the majority of those types of materials. In this case, the experience of finding a basic, high-level description of a folder and then following the link back to the originating institution to examine the materials in depth seemed to be the best fit.
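The two options can be sketched side by side. This is only an illustration under assumed names: the folder and collection dictionaries, their field names, and the example.org URL are all hypothetical, and neither function reflects an actual DPLA mapping.

```python
# Hypothetical folder-level description and its parent collection.
folder = {
    "title": "Correspondence, 1861",
    "description": "Letters received by the Smith family during 1861.",
    "items": ["scan-001", "scan-002", "scan-003"],
}
collection = {
    "title": "Smith Family Papers",
    "url": "https://example.org/collections/smith-family-papers",  # placeholder URL
}


def aggregate_record(folder, collection):
    """Option 1: one record that describes the whole folder."""
    return {
        "title": folder["title"],
        "description": folder["description"],
        "extent": f"{len(folder['items'])} items",
        "collection_title": collection["title"],
        "collection_url": collection["url"],
    }


def item_records(folder, collection):
    """Option 2: one minimal record per item, reusing the folder description."""
    return [
        {
            "title": f"{folder['title']}, item {n}",
            "description": folder["description"],
            "identifier": item_id,
            "collection_title": collection["title"],
            "collection_url": collection["url"],
        }
        for n, item_id in enumerate(folder["items"], start=1)
    ]
```

Calling `aggregate_record(folder, collection)` yields a single search result for the folder, while `item_records(folder, collection)` yields three near-identical results, which is the trade-off weighed above.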

The working group actually doesn’t want to recommend one style of description over another. Instead we want to work with the kinds of descriptive practices that professionals are already using. We want DPLA to fit into the accepted professional practice, not create yet another new approach that may or may not be adopted. We think that having an infrastructure that can be flexible enough to encompass aggregated objects and item-level objects, while also communicating relationships between materials, will serve the user scenarios we came up with best. Furthermore, we wanted to rely on the judgement of the librarians, archivists, and other professionals on the best way to describe and provide access to their own materials rather than dictate something to them.

In both of these types of description, though, the working group members agreed that the addition of collection titles, descriptions, and links back to collection-level descriptions or home pages at the original institution would help greatly in communicating what these objects are to their audience. The other sections of the whitepaper go into detail on how that collection-level information can be gathered, stored, and displayed effectively in DPLA. Combined with the flexible approach to item description described above, the working group felt that these changes would best achieve the goals of the user scenarios.

Recommendations have context too

It’s important to remember that this and the other recommendations in the whitepaper are for DPLA, in other words, they pertain to the handling of collection and aggregate-object metadata in a heterogeneous, large-scale aggregated environment. They should not be read as recommendations for every cultural heritage institution everywhere. Those submitting data to DPLA would need to publish it in a way that we could use, but within their own context, their own repository or website, it may be best served by being put up another way. I would encourage anyone involved in a DPLA contributing institution or interested in metadata aggregation overall to read the whitepaper and think about how these recommendations might or might not fit in their own institution.

It’s also important to remember that these are recommendations only. DPLA is in the process of implementing a number of them, but some have turned out to be infeasible or are affected by other DPLA initiatives. In particular, recommendations around representing objects and process are being implemented, but those around creating and sharing data and user interface have been refined. Another working group working on overall revisions to DPLA’s metadata application profile is suggesting further refinements of collection data, and the interface is being worked out through an overall DPLA website redesign. In the end, I feel like the spirit of the recommendations will definitely be adopted, but with a few tweaks.

MARAC Spring 2017 Conference

This guest post is by Linda Ballinger, Metadata Strategist @ the Pennsylvania State University, Chair of PA Digital Metadata Team Rights Subgroup

Last month I had the pleasure of presenting at the Mid-Atlantic Regional Archives Conference (MARAC) spring 2017 conference in Newark, NJ, along with PA Digital colleagues, Doreva Belfiore (HSLC) and Kelsey Duinkerken (Thomas Jefferson University). I was glad to have this opportunity to talk about the work of PA Digital and DPLA. But I especially enjoyed getting to know some of the galleries, libraries, archives, and museums (GLAM) communities in the area, and exploring the archives side of GLAM. As a cataloging librarian, I’ve often worked with archivists on projects, but I’ve never attended an archives-centered conference before.

I was excited to see many sessions focused on cultural awareness, diversity, and inclusion. I started with the “Deconstructing Whiteness in Archives” session led by Sam Winn (Virginia Tech), where we held small group discussions based on Roadside Theater’s Story Circle method. These discussions helped set the stage for “Radical Honesty in Descriptive Practice,” a session composed of three presentations on a topic of great interest to me – bringing greater diversity and inclusiveness to descriptive metadata. Sam Winn (Virginia Tech) challenged us to stop assuming we can be completely objective and to consider ways in which archives (and, I would add, the rest of GLAM) contribute to the erasure of underrepresented communities. She pointed to the Knowledge River Institute at the University of Arizona as an example of what can be done to humanize descriptive practice by elevating community expertise and participation. Christiana Dobrzynski (Bryn Mawr College) also talked about partnering with the communities being described, but cautioned against doing so in ways that perpetuate colonialism and tokenism. She also emphasized the importance of documenting descriptive practices for greater transparency. Michael Andrec (Ukrainian Historical and Educational Center of New Jersey) pointed out that many researchers, especially those new to archival research, don’t read the notes in finding aids, so they miss out on a lot of the context archivists provide. He proposed putting more descriptive notes in the container lists, so researchers don’t miss out on valuable information.

I also attended “ArchivesSpace and Metadata: Using Creative Tools and Workflows for Archival Management Systems,” which began with a presentation by Jessica Wagner Webster (Baruch College, City University of New York) on converting EAD XML metadata into spreadsheets for ingestion into Omeka and conversion to Dublin Core. I look forward to exploring Webster’s technique further to see if it can help prepare some Penn State collections for PA Digital and DPLA. I will also be looking more closely at the presentations by Lora J. Davis (Johns Hopkins University) on using the ArchivesSpace API, and by Bria Parker (University of Maryland) on normalizing archival metadata with OpenRefine.

Our own session, “Adaptable DPLA: Repurposing Data with PA Digital and the Digital Public Library of America,” was one of the last sessions of the conference, but was well attended, with at least 45 attendees. Doreva Belfiore provided an introduction, including the history of the PA Digital Service Hub and the process of adding collections to DPLA via the PA Digital Aggregator. She outlined the many ways DPLA enhances the discovery and use of member collections, such as clickable map and timeline interfaces, virtual exhibits, and primary source sets for K-12 teachers. She showed how the metadata normalization process that PA Digital provides for member institutions enables such discovery tools, and how DPLA’s efforts to standardize rights information make it easier for researchers to know how they can use the resources they discover. She also talked about how preparing collections from Temple University for PA Digital and DPLA made those collections easier to share with other discovery portals, such as Umbra Search African American History. Next, Kelsey Duinkerken talked about her experiences at Thomas Jefferson University as a PA Digital contributor. She described the support they received from PA Digital and its Metadata Team to prepare their collections for sharing with DPLA. Finally, I described Penn State’s experiences with using standardized rights statements from and our interest in the recommendations of the DPLA Archival Description Working Group in their whitepaper, “Aggregating and Representing Collections in the Digital Public Library of America”. The whitepaper addresses the need for DPLA to allow some collection-level metadata, and offers ways to give researchers enough collection-level description to help them understand the context of digital objects in DPLA. After the session, the three of us answered questions from attendees contemplating participating in a DPLA Service Hub and questions about aggregating metadata in other contexts.

Any conference is enhanced by the kind of informal networking and idea sharing that takes place between sessions and during breaks. I learned a lot by having the chance to get to know archivists and other cultural heritage organizational professionals outside my usual conference routine, and I hope to attend other MARAC conferences in the future.

DPLAFest 2017, Brandy Karl

This guest blog post is written by Brandy Karl, Copyright Officer and Affiliate Law Library Faculty @ the Pennsylvania State University, and PA Digital Metadata Team Rights Subgroup Member

I attended DPLAfest in April on behalf of the PA Digital DPLA Hub & PSU Libraries and spoke on a panel sharing the experiences of metadata teams: Managing Relationships, Managing Metadata: Digital Library Collaborations Between Institutions and Across Sectors.  

  • I worked with the PA Digital Metadata Rights Subgroup team to present Anastasia Chiu’s analysis of rights statements in metadata associated with PA Digital objects. A few other hubs had the same idea – we all believe that this data is incredibly important to demonstrate our progress, the work that needs to be done to implement normalized rights statements, and to provide a deeper understanding of the overall DPLA metadata analysis, which is tilted heavily towards a few institutions with many DPLA contributions.
  • I also presented insights from our work on the Metadata Rights Subgroup – how we share cross-institutional workload and collaborate effectively with different systems and technologies.
  • Finally, I called upon the attendees to brainstorm technical ideas to combat static rights statements. That is to say that a rights statement is only good so long as the copyright term status hasn’t changed or the copyright law hasn’t changed. DPLA leadership was excited and I continue to receive questions and interest in resolving this big issue.


  • I was really struck by the multiple structural forms of the Hubs – I hadn’t realized that some hubs had their own staff.
  • DPLA is interested in forming a national working group to create Rights Statement & Metadata training, but doesn’t seem to be moving fast. It is my opinion that it should be a separately funded position (to create training); currently, it’s still falling on hubs to build their own, separate wheels (rather than sharing creation of the wheel together). But we are moving forward with that at PA Digital, and I think the Metadata Team’s work is showing true leadership in this area.
  • It’s clear that the DPLA is valuable – there were many sessions on the projects that started with access to the materials that DPLA has enabled, with an extremely strong emphasis on social engagement.
  • Everyone was very excited about the idea of creating a risk management toolkit. Understanding copyright and convincing administrations that it’s actually not very risky to engage in the sort of digitization most small institutions want to do should be top priority.

Also I had a great time connecting with other PA Digital participants in person! Tara took all the pictures, and I think this is the first time my name has been a hashtag!

Lackawanna Valley Digital Archives: Our Partnership with Local Community Organizations

Headshot, Martina Soden


This guest blog post was written by Martina Soden, Collection/Metadata Manager of the Lackawanna Valley Digital Archives


Scranton Public Library takes a large role in our community in Northeastern Pennsylvania. We have a strong, competent, skilled staff who jump at the chance to create or organize something first. In 2008 the Scranton Public Library applied for and received an LSTA grant to form a group to explore creating an online digital collection of items. A small group of local historical societies, museums and government groups were invited to attend. With the help of Lyrasis, these organizations determined that the period of 1850-1865 was perfect for all the parties.

This partnership includes The Lackawanna Historical Society, the Anthracite Heritage Museum, The Scranton Times-Tribune, Steamtown National Historic Site, the University of Scranton Weinberg Library, and the Scranton Public Library. While we were unable to get a second LSTA grant to fund the digitization and creation of the Lackawanna Valley Digital Archives, we were able to receive funds from an outside funding organization. These funds were used to scan many documents and to buy the software and license for ContentDM. The Lackawanna Valley Digital Archives was created, and since that first collection we have added nine new collections using our connections with our original partners.  Please visit us at or find our items on the Digital Public Library of America at

Book cover, This is Waverly by Mildred Mumford