As a Pennsylvania native (Kittanning, PA), a graduate of our flagship institution (Penn State), and a current resident (Erie, PA), I can’t help but be a little partial to the PA Digital Hub. So I am delighted to have the opportunity to discuss collections, description, and access at DPLA on this blog, and particularly to talk about Aggregating and Representing Collections in the Digital Public Library of America.
I am the Data Services Coordinator at DPLA, which means that I am in charge of managing our data aggregation services. This involves working with our Content and Service Hubs in their early stages to ensure that their data will be interoperable with our data set and then managing the quality review and remediation process for the first data ingest. After that, I work closely with the technology team to schedule re-ingest and maintenance of the data. I have several ongoing initiatives to analyze and improve our overall data quality that I’ll be working on (with the help of our Hubs) over the coming months. There’s always more improvement to be made, which is a great challenge!
Putting together the Archival Description Working Group
Throughout 2015 several themes kept popping up in questions and conversations at DPLA: How do we communicate the context of materials that are closely related to each other? How do we take advantage of information that is created about collections while still keeping DPLA a library of digital materials only? How can materials that come from different descriptive traditions (libraries, archives, museums) be reconciled?
DPLA had always had metadata fields for collection name and description in our item-level records, but at that point they had never been tightly controlled. We focused far more on the basic item descriptors (title, subjects, description) and let institutions do what they wished with collection. Since “collection” is actually a term with a lot of different meanings, we ended up with a hodge-podge of data that could at times seem misleading or redundant. Some Hubs organized all the content from each partner into an institution-based collection, others had very broad subject- or format-based collections, while still others had very specific provenance-based collections. In short, we felt that the collection data was not ready for prime time, so while it was retained and indexed in the record for searching, it was not featured in the website version of the metadata record.
In late 2015, DPLA also decided that, in order to continue to collaborate with the community effectively, the time had come to move on from the very broad open committees that had been created during the planning stages of DPLA to a series of more focused working groups that could help solve specific issues. One of the first of these working groups we decided to put together was one related to the issues of collections data, specifically data created in archival description; hence, the Archival Description Working Group. While archives were the initial focus, the work of the group ended up being a more all-encompassing analysis of collections in DPLA regardless of the type of originating institution.
The recommendations the working group came up with were published as a whitepaper in 2016. They addressed five areas:
- Recommendations for representing objects (item vs. aggregate)
- Recommendations for relationship of object to collection
- Recommendations for creating and sharing collection data
- Recommendations for user interface
- Recommendations for process
The Methodology of the Working Group
An important first step the group tackled was to define what we meant by the word “collection” in our context. From the whitepaper:
The term is used loosely by the working group to mean any intentionally created grouping of materials. This could include, but is not limited to: materials that are related by provenance, and described and managed using archival control; materials intentionally assembled based on a theme, era, etc.; and groupings of materials gathered together to showcase a particular topic (e.g., digital exhibits or DPLA primary source sets). Not included in this definition are assemblages of digital objects that are not the result of some sort of intentional selection. For example, all of the objects that are exposed to DPLA by a particular institution would, generally speaking, not be a collection in this sense. All of the digital objects that belong to a specific type or form/genre – maps, for instance – would also not be a collection in the context of this white paper.
In order to develop our recommendations we took a three-phase approach. First, we did research. We read as many reports and articles as we could find on combining materials described at item- and collection- or aggregate-level, and we reviewed several digital library sites that did something innovative in this area. After the research phase, we synthesized our findings and created a list of user-based scenarios that we thought DPLA should support:
- It should be apparent to users, when they find an item or items, that these materials are part of a collection, if appropriate.
- Users should know as soon as they search that items are part of collections and should be able to act on that knowledge.
- Users should be able to refine and limit their searches by membership in collections.
- Users should understand when objects are described using traditional component-level, archival-style description, i.e., one object that represents many items.
- Users should be presented with appropriate metadata for objects, and this level of metadata and context may not be the same for all objects and collections. This could result in many items with the same description.
- Users may be presented with information that helps them make sense of where the item belongs within a collection if the collection structure or arrangement is meaningful.
- Collection/context information applies to different types of collections including exhibitions and primary source sets.
- Users should be able to go to DPLA and find a collection that interests them without doing an item search.
- Users should be able to find similar materials related to a retrieved item by their membership in the same collection.
We then used the scenarios to guide us through the process of making recommendations for changes to DPLA metadata, workflow, and interface.
Recommendations for item and aggregate objects
Rather than write at length here about each of the areas of recommendations, I’d like to just address one of the areas of biggest discussion: Recommendations for representing objects (item vs. aggregate).
The question that drove this discussion was how data created about materials in the aggregate can be used in DPLA. A prime example of this kind of data is a folder-level description that an archive might create. In the past decade in particular this kind of practice has increased in the archival community, largely inspired by the landmark publication of More Product, Less Process. DPLA has increasingly gotten records from contributing institutions that reflect things like folders of materials rather than the individual items within them. In the archival community the finding aid, which contextualizes and describes an entire collection, is the norm. However, DPLA doesn’t just serve archives. We have a huge collection of books, films, reports, journals, etc., all of which are individual items. Our searching, indexing, and presentation designs are all based on the idea that each record corresponds to a single individual object. Since DPLA can’t just adopt an archival, collection-based description approach, the working group focused part of its efforts on thinking through how aggregate-level descriptions could be combined with the existing item-level paradigm.
The two solutions usually adopted when faced with the question of how to translate metadata for a folder to DPLA were basically either to create one description that described a bunch of items in the aggregate, or to create a bunch of really minimal records for each item reusing similar data in each one. Either of these approaches might be best in particular situations. For example, in specific cases of unique visual materials an institution may want to opt for lots of individual items with minimal metadata. Even though the metadata is similar for each, this would allow the visual material to be discretely findable.
On the other hand, the search experience of seeing record after record for textual materials with indistinct images that are virtually identical would not suit the majority of those types of materials. In this case, the experience of finding a basic, high-level description of a folder and then following the link back to the originating institution to examine the materials in depth seemed to be the best fit.
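To make the contrast concrete, here is a minimal sketch of the two approaches applied to a single folder. The `Record` class, its field names, and both helper functions are hypothetical illustrations, not DPLA's actual metadata application profile or ingestion code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a metadata record."""
    title: str
    description: str
    source_url: str  # link back to the originating institution


def as_aggregate(folder_title, folder_description, folder_url, item_count):
    """Approach 1: a single record that describes the folder as a whole."""
    return [
        Record(
            title=folder_title,
            description=f"{folder_description} ({item_count} items)",
            source_url=folder_url,
        )
    ]


def as_minimal_items(folder_title, folder_description, item_urls):
    """Approach 2: one minimal record per item, reusing the folder's metadata."""
    return [
        Record(
            title=f"{folder_title}, item {n}",
            description=folder_description,
            source_url=url,
        )
        for n, url in enumerate(item_urls, start=1)
    ]
```

With the first function a folder of three letters yields one search result that links to the folder at the home institution; with the second it yields three results that share a description but can each be found and viewed on their own.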
The working group actually doesn’t want to recommend one style of description over another. Instead we want to work with the kinds of descriptive practices that professionals are already using. We want DPLA to fit into the accepted professional practice, not create yet another new approach that may or may not be adopted. We think that having an infrastructure that can be flexible enough to encompass aggregated objects and item-level objects, while also communicating relationships between materials, will serve the user scenarios we came up with best. Furthermore, we wanted to rely on the judgement of the librarians, archivists, and other professionals on the best way to describe and provide access to their own materials rather than dictate something to them.
In both of these types of description, though, the working group members agreed that the addition of collection titles, descriptions, and links back to collection-level descriptions or home pages at the original institution would help greatly in communicating what these objects are to their audience. The other sections of the whitepaper go into detail on how that collection-level information can be gathered, stored, and displayed effectively in DPLA. Combined with the flexible approach to item description described above, the working group felt that these changes would best achieve the goals of the user scenarios.
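As a rough illustration of how shared collection information could support the user scenarios, the sketch below attaches the same collection title, description, and link to each record and then uses it to find related materials. All class, field, and function names here are assumptions made for the example; they do not reflect how DPLA stores or indexes collection data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectionInfo:
    """Hypothetical collection-level context shared by many records."""
    title: str
    description: str
    url: str  # collection home page or finding aid at the institution


@dataclass
class ObjectRecord:
    """Hypothetical record; it may describe one item or a whole folder."""
    title: str
    collection: Optional[CollectionInfo] = None


def group_by_collection(records):
    """Group records by collection URL, e.g. to let users limit a search."""
    groups = defaultdict(list)
    for record in records:
        if record.collection is not None:
            groups[record.collection.url].append(record)
    return dict(groups)


def related_records(record, records):
    """Return the other records that belong to the same collection."""
    if record.collection is None:
        return []
    return [
        other
        for other in records
        if other is not record and other.collection == record.collection
    ]
```

Because the collection context sits alongside the record rather than inside its description, the same lookup works whether a record stands for a single photograph or an entire folder.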
Recommendations have context too
It’s important to remember that this and the other recommendations in the whitepaper are for DPLA; in other words, they pertain to the handling of collection and aggregate-object metadata in a heterogeneous, large-scale aggregated environment. They should not be read as recommendations for every cultural heritage institution everywhere. Those submitting data to DPLA would need to publish it in a way that we could use, but within their own context, in their own repository or on their own website, that data may be better served by being presented another way. I would encourage anyone involved in a DPLA contributing institution or interested in metadata aggregation overall to read the whitepaper and think about how these recommendations might or might not fit in their own institution.
It’s also important to remember that these are recommendations only. DPLA is in the process of implementing a number of them, but some have turned out to be infeasible or are affected by other DPLA initiatives. In particular, recommendations around representing objects and process are being implemented, but those around creating and sharing data and user interface have been refined. Another working group working on overall revisions to DPLA’s metadata application profile is suggesting further refinements of collection data, and the interface is being worked out through an overall DPLA website redesign. In the end, I feel like the spirit of the recommendations will definitely be adopted, but with a few tweaks.