About the PA Digital Aggregator

The PA Digital aggregator is the suite of systems that we use to harvest, gather, and distribute records from contributing cultural heritage institutions, with the goal of making their digital collections more widely discoverable. This digital content is contributed to the Digital Public Library of America (DPLA) as part of our role as the Pennsylvania service hub for the initiative.

What is Funnel Cake?

Funnel Cake is the search interface where you can preview the metadata included in the PA Digital aggregator.

The content you see:

Was harvested from a number of Pennsylvania institutions’ digital library repositories
- It is in turn harvested by the DPLA as a single “feed” or by other distribution partners with the permission of the contributor.
May or may not be contributed to the DPLA
- We use Funnel Cake as a staging area where we assess metadata to prepare for harvest into the DPLA. Depending on where we are in our workflow, collections viewable in Funnel Cake may or may not be in the DPLA yet.
Includes metadata that has been “normalized” or is in the process of being “normalized”
- With our help, the various contributing institutions have worked hard to prepare their metadata to be more compatible with the DPLA.
- Aggregated records are normalized to align with DPLA standards. This may include normalizing object types and language strings, stripping leading and trailing whitespace, and creating PA Digital identifiers, among other steps.
Does not include the actual digital objects (e.g., image files)
- We harvest only the metadata, display thumbnails when available, and provide a link back to the original repository hosting the content.
- This is a metadata aggregator, and NOT a repository hosting content.
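
To make the normalization steps mentioned above more concrete, here is a minimal Python sketch of what stripping whitespace, normalizing type and language strings, and creating an identifier could look like. It is an illustration only, not PA Digital's actual transformation code: the lookup tables, the `normalize_record` function, the `padigital_id` field, and the `padig:` identifier format are all assumptions made for this example.

```python
import hashlib

# Hypothetical lookup tables. The real PA Digital mappings are not
# described in this document and are likely far more extensive.
TYPE_MAP = {"photograph": "image", "still image": "image", "manuscript": "text"}
LANGUAGE_MAP = {"en": "English", "eng": "English", "de": "German", "deu": "German"}


def normalize_record(record, contributor):
    """Return a cleaned copy of a record given as {field: [values]}."""
    cleaned = {}
    for field, values in record.items():
        # Strip leading and trailing whitespace and drop empty values.
        cleaned[field] = [v.strip() for v in values if v.strip()]

    # Map known variants of type and language strings to one preferred
    # form; values not in the tables pass through unchanged.
    cleaned["type"] = [TYPE_MAP.get(v.lower(), v) for v in cleaned.get("type", [])]
    cleaned["language"] = [
        LANGUAGE_MAP.get(v.lower(), v) for v in cleaned.get("language", [])
    ]

    # Build a stable identifier from the contributor and the source
    # identifier. This format is an assumption for illustration.
    source_ids = cleaned.get("identifier") or [""]
    digest = hashlib.md5(f"{contributor}:{source_ids[0]}".encode("utf-8")).hexdigest()
    cleaned["padigital_id"] = [f"padig:{contributor}-{digest[:12]}"]
    return cleaned


raw = {
    "title": ["  Main Street, looking north  "],
    "type": ["Photograph "],
    "language": [" eng"],
    "identifier": ["coll1/item42"],
}
record = normalize_record(raw, "EXAMPLE")
print(record["title"], record["type"], record["language"])
```

Because the identifier is derived from the contributor and source identifier rather than generated randomly, re-harvesting the same record would produce the same identifier in this sketch.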

Technical information

Funnel Cake is the frontend interface used to search and view PA Digital aggregated metadata for quality assessment. It is built using Blacklight, an open-source discovery framework. Funnel Cake also serves as the development and production endpoint for our OAI-PMH feed.
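
For readers unfamiliar with OAI-PMH, the following Python sketch shows how a harvester might build `ListRecords` requests against such a feed and follow the resumption tokens that the protocol uses for paging. The verb, argument names, and XML namespace come from the OAI-PMH 2.0 specification. The base URL is a placeholder, since the address of the Funnel Cake endpoint is not given here, and the helper names `list_records_url` and `next_token` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is
        # sent with the verb alone, without metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token in a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Placeholder endpoint, not the real Funnel Cake address.
BASE_URL = "https://example.org/oai"
first_url = list_records_url(BASE_URL)

# A trimmed, made-up response body used only to exercise next_token.
sample_response = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>page2</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)
second_url = list_records_url(BASE_URL, resumption_token=next_token(sample_response))
print(first_url)
print(second_url)
```

A full harvester would fetch each URL, process the returned records, and repeat until `next_token` returns `None`.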
Tasks associated with harvesting and aggregating metadata are managed and executed with the open-source platform Apache Airflow, using automated workflows specific to each institution. This set of processes is nicknamed Shoofly Pie.
Shoofly Pie and Funnel Cake were introduced in 2019, replacing PA Digital’s original aggregation software from 2014. Between 2019 and 2021, we migrated our existing workflows for contributing institutions to the new systems, and we retired the old software in 2021.
Our code is accessible in the following GitHub repositories:
- Shoofly Pie
  - PA Digital metadata transformations and validations (executed as part of Shoofly Pie workflows)
- Funnel Cake
- DPLAH, our original aggregation software