The PA Digital Metadata Rights Subgroup Team is excited to present the second of three video modules on copyright and rights statements. The second module, “What is a Rights Statement?”, provides an overview of rights statements and their application to digitized cultural heritage collections. The video covers what rights statements are and what they are not, a history of RightsStatements.org, and an overview of the statements. It also covers licenses, a comparison of rights statements and licenses, the benefits of normalized rights statements, the challenges and risks, and sources for more information.
This is Gabe Galson. I work here at Temple University on the PA Digital Metadata team and would like to share with you some tricks of the trade. Shhhh! These are the exclusive Metadata Cleanup secrets the pros don’t want you to know about. Field value normalization life hacks that, after this exclusive blog post, will go back into the PA Digital Vault forever.
Are you frustrated because your local repository doesn’t feature expandable and collapsible facets that can be sorted either alphabetically or by frequency of occurrence, enabling effortless detection of slightly divergent values? Don’t be. Now there’s OpenRefine.
OpenRefine is a powerful metadata cleanup tool that allows you to replicate our PA Digital aggregator’s key functionalities on any standard computer. All you need is an export of your data in a tabular format, that is to say, your metadata as an Excel, tab-separated value [TSV], or comma-separated value [CSV] file. Refine will ‘Hoover’ up such data, display it as a spreadsheet, then allow you to view any field’s constituent values via a facet box, as in our aggregator. When sorted alphabetically, Refine’s facets let you eyeball inconsistent values. Check out ‘Philadelphia (Pa.)’ vs ‘Philadelphia, (PA)’ in this screenshot of a Refine facet:
From there the values can be quickly standardized.
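A text facet is, at heart, a count of each distinct value in a column. The following minimal Python sketch approximates that idea for a CSV export; it is not OpenRefine code, and the sample data, the `coverage` column name, and the `text_facet` function are hypothetical. For a TSV export, `csv.DictReader(..., delimiter="\t")` would be used instead.

```python
import csv
import io
from collections import Counter


def text_facet(rows, column, sort_by="name"):
    """Count each distinct value in `column`, like a facet box.

    sort_by="name" sorts alphabetically (case-insensitive) so near-duplicates
    sit next to each other; sort_by="count" sorts by frequency, then name.
    """
    counts = Counter(row.get(column) or "" for row in rows)
    if sort_by == "count":
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0].casefold()))
    return sorted(counts.items(), key=lambda kv: kv[0].casefold())


# Hypothetical export; in practice, open your own CSV file instead.
SAMPLE = """identifier,coverage
rec1,Philadelphia (Pa.)
rec2,"Philadelphia, (PA)"
rec3,Philadelphia (Pa.)
rec4,Pittsburgh (Pa.)
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for value, count in text_facet(rows, "coverage"):
    print(f"{count:>3}  {value}")
```

Sorting alphabetically places the two Philadelphia variants side by side, which is what makes the inconsistency easy to spot.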
Refine’s clustering functionality will detect, in a dataset of any size, slightly divergent values that may not be obvious from a simple browse. These values can then be standardized en masse from within the clustering tool. All 145 Philadelphia variants detected in the screenshot below can be standardized with a single click. Wow!
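OpenRefine’s clustering offers several methods; its default, key collision with the “fingerprint” keying function, groups values that reduce to the same normalized key. The sketch below is a simplified Python approximation of that idea (folding accents, lowercasing, stripping punctuation, and sorting unique tokens), not Refine’s actual implementation; the function names and sample values are hypothetical. Picking the most frequent variant as the merged value is likewise just one illustrative choice, since in Refine you choose the new cell value yourself.

```python
import string
import unicodedata
from collections import Counter, defaultdict

_PUNCT = str.maketrans("", "", string.punctuation)


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: fold accents, lowercase, drop punctuation, sort unique tokens."""
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = folded.strip().lower().translate(_PUNCT)
    return " ".join(sorted(set(cleaned.split())))


def find_clusters(values):
    """Return groups of distinct values that share a fingerprint key."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(members) for members in groups.values() if len(members) > 1]


def merge_clusters(values):
    """Replace every value with the most frequent variant in its cluster."""
    counts = defaultdict(Counter)
    for value in values:
        counts[fingerprint(value)][value] += 1
    canonical = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    return [canonical[fingerprint(value)] for value in values]


values = ["Philadelphia (Pa.)", "Philadelphia, (PA)", "Pittsburgh (Pa.)", "Philadelphia (Pa.)"]
print(find_clusters(values))
print(merge_clusters(values))
```

Both Philadelphia variants reduce to the key `pa philadelphia`, so they land in the same cluster, while the Pittsburgh value stays on its own.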
Traditional spreadsheet programs don’t make it easy to work with multiple values contained in a single cell. With Refine it’s a breeze. Assuming the values are separated by a single delimiter, you can split each value into its own row, then fold the rows back into the original record when cleanup is complete.
Go from this…
… and then back again whenever you want!
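Refine’s split and join operations for multi-valued cells can be pictured as the round trip below. This is a hypothetical Python sketch rather than Refine’s own behavior: Refine keeps the split values in blank rows beneath the original record, whereas this sketch tags each new row with a `_record` index so the values can be folded back together. The `;` delimiter, the `subject` column, and the sample values are assumptions.

```python
def split_multivalued(rows, column, sep=";"):
    """Explode one row per value, remembering which record each came from."""
    exploded = []
    for index, row in enumerate(rows):
        for part in row[column].split(sep):
            exploded.append({**row, column: part.strip(), "_record": index})
    return exploded


def join_multivalued(exploded, column, sep="; "):
    """Fold exploded rows back into one row per original record."""
    records = {}
    for row in exploded:
        index = row["_record"]
        if index not in records:
            base = {k: v for k, v in row.items() if k != "_record"}
            base[column] = []
            records[index] = base
        records[index][column].append(row[column])
    joined = []
    for index in sorted(records):
        record = records[index]
        record[column] = sep.join(record[column])
        joined.append(record)
    return joined


rows = [
    {"id": "rec1", "subject": "farms; Barns ;horses"},
    {"id": "rec2", "subject": "bridges"},
]

exploded = split_multivalued(rows, "subject")
for row in exploded:
    row["subject"] = row["subject"].capitalize()  # example cleanup step on individual values
print(join_multivalued(exploded, "subject"))
```

Cleaning values one per row means a fix like capitalization or a spelling correction applies uniformly, no matter where a value sat inside the original cell.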
But wait, there’s more! Refine also sports an array of other features useful to anyone with messy data on their hands. It lets you structure unstructured data, converting text blocks into spreadsheets. It allows you to stack multiple facets to your heart’s content, star individual records of interest and then facet on the star marker, or isolate a particular subset of your data and perform complex operations on that subset alone. Refine offers a robust undo/redo interface, allowing you to test out complex transformations without risk. It lets you integrate regular expressions, GREL commands, and Python scripts into your basic cleanup operations, making it extremely flexible and powerful. GREL (the Google Refine Expression Language, a simple programming language native to Refine) is no more complicated than Excel’s formulas; it lets those with no coding experience perform fairly sophisticated data cleanup operations. Refine will also allow you to populate columns with data fetched from websites or APIs. For example, the Google Maps API can be called to return the geographic coordinates corresponding to individual street addresses in your dataset.
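Refine’s facet-then-transform workflow, in which a transformation touches only the rows currently selected, can be approximated outside Refine. The sketch below is a hypothetical Python stand-in, not GREL: it applies a regular-expression replacement only to rows flagged as starred and returns a new list, leaving the original rows untouched so the change can be discarded, loosely mirroring Refine’s undo history. The `starred` field, the sample rows, the helper names, and the `(Pa.)` target form are illustrative assumptions.

```python
import re

# Hypothetical rows; "starred" stands in for Refine's star marker.
rows = [
    {"subject": "Philadelphia (Pa.)", "starred": True},
    {"subject": "Philadelphia, (PA)", "starred": True},
    {"subject": "Pittsburgh, (PA)", "starred": False},
]

# Matches a trailing state qualifier such as " (Pa.)", ", (PA)" or " (pa)".
STATE_QUALIFIER = re.compile(r",?\s*\(\s*pa\.?\s*\)\s*$", re.IGNORECASE)


def normalize_place(value: str) -> str:
    """Rewrite a trailing Pennsylvania qualifier to the form ' (Pa.)'."""
    return STATE_QUALIFIER.sub(" (Pa.)", value).strip()


def transform_subset(rows, column, predicate, fn):
    """Apply fn to `column` only where predicate(row) is true.

    Returns new row dicts, so the original list survives as an 'undo' point.
    """
    return [
        {**row, column: fn(row[column])} if predicate(row) else dict(row)
        for row in rows
    ]


result = transform_subset(rows, "subject", lambda r: r["starred"], normalize_place)
for row in result:
    print(row["subject"])
```

Only the two starred rows are rewritten; the unstarred Pittsburgh row keeps its original form, just as an operation run on a faceted subset in Refine leaves the rest of the data alone.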
As you can see, Refine, which is open source, free to download, and fairly easy to pick up, will put many of our aggregator’s functions at your fingertips, allowing you to independently prepare your data for ingestion into the DPLA.
If you’d like to learn more about OpenRefine feel free to sign up for the PA Digital team’s upcoming in-person Metadata Anonymous workshop, which will feature a hands-on, in-depth intro to Refine that will take you from zero to 100 in no time at all! All of the operations illustrated or described in this blog post will be covered. Sign up here if you’re interested!
If you want to dive right in, take a look at this index of OpenRefine resources on the web.
Overview of GREL syntax:
Comprehensive index of GREL functions by type:
OpenRefine listserv/discussion group. If I have an issue I can’t solve on my own, I ask for help here. Many advanced Refine users monitor the board and are often happy to help.
Regex cheat sheet focusing on OpenRefine users:
Good basic OpenRefine introduction:
Another good OpenRefine introduction:
Another good introduction:
Treasure trove of advanced OpenRefine recipes:
The recipes found on this page are especially useful to Refine novices:
The “OpenRefine for Metadata Cleanup” PDF that can be found on this page is an excellent OpenRefine tutorial that includes many useful recipes, including the date transformations featured in our own cookbook:
Book: Verborgh, R., & De Wilde, M. (2013). Using OpenRefine: The essential OpenRefine guide that takes you from data analysis and error fixing to linking your dataset to the Web.
Free Your Metadata OpenRefine Reference:
Especially for Archivists: Chaos —> Order is a great blog about data manipulation and cleanup tools for archival data. The authors have a few very useful posts about OpenRefine, especially on dealing with dates and duplicate subjects.
Refine posts: https://icantiemyownshoes.wordpress.com/tag/openrefine/
Especially for Archivists: The Bentley Historical Library at the University of Michigan maintains an excellent blog about their experiences integrating ArchivesSpace, Archivematica, and DSpace. The authors have a few very interesting posts about using OpenRefine and Python to clean their EAD files.
Regular expression (regex) tutorials:
The PA Digital Metadata Rights Subgroup Team is excited to present the first of three video modules on copyright and rights statements. The first module, “Copyright 101,” provides a basic introduction for library and information professionals considering copyright and rights issues in digitized cultural heritage collections.
2017 has already been an amazing year for PA Digital!
We began our year in January with a webinar, “Highlights of DPLA Whitepapers Webinar,” which gave an overview of three complex documents for our existing and prospective contributors. During this webinar, we summarized Aggregating and Representing Collections in the Digital Public Library of America, which explored the possibility of including more collection-level description within the DPLA. The second white paper, RightsStatements.org White Paper: Recommendations for Standardized International Rights Statements, serves as documentation of and background for RightsStatements.org. Lastly, we discussed the DPLA Metadata Quality Guidelines, which refresh the DPLA’s metadata requirements and recommendations for better data quality. View our slides here!
We have had two harvests so far this year. Our April harvest saw the inclusion of Bryn Mawr College, Bloomsburg University, Montgomery County Community College, Slippery Rock University, Ursinus College, Philadelphia University, and the Pennsylvania Horticultural Society. This harvest included 19 new collections and 18,480 digital objects (records).
PA Digital was well represented at DPLAFest 2017 in Chicago. Brandy Karl, Copyright Officer at Pennsylvania State University, presented on “Implementing Rights Statements @ PSU and PA Digital” (part of Turn the Rights On: a RightsStatements.org Update and Comparison of Regional Rights Standardization Projects). View her slides here!
Delphine Khanna and I presented on “Reaching Out to Potential DPLA Hub Contributors: PA Digital’s Communication Strategy and Plan, or ‘The Accidental Public Relations Manager.’” View our slides here!
Our June 2017 harvest saw the inclusion of West Chester University, Pennsylvania State Archives, La Salle University, Millersville University, Sewickley Public Library, and Carnegie Library of Pittsburgh. This harvest also added 48 new collections and 27,780 digital objects (records).
We would like to extend warm thanks to all who worked with us to bring in new collections.
You can see all of PA Digital’s records in the DPLA by searching or faceting on our name PA Digital: PA Digital Records in the DPLA.
View our progress since we went live in the DPLA:
We also revamped our website recently. Check it out: https://padigital.org/
In addition to new contributors and records, we are planning:
- Two metadata workshops:
  - Metadata Anonymous Webinar, 8/23 at 1pm
  - If You Liked it Then You Should Have Put Metadata On It: Descriptive Cataloging and Selecting Rights Statements for Digital Collections, at the 2017 Pennsylvania Library Association (PaLA) conference, 10/18 at 9am
- Two orientation webinars:
  - Knight Orientation Webinar, 7/20 at 1pm
  - Fall Webinar TBD
- Three educational online modules on rights statements for this summer and fall:
  - What is Copyright?
  - What is a Rights Statement?
  - Implementing Rights Statements
We are looking forward to presenting our work and onboarding more institutions and more content from current contributors within the coming year. Stay tuned for more details.
As usual, for information about our project, or about how you can participate in PA Digital and the DPLA, please email us anytime at firstname.lastname@example.org.
Rachel Appel, Co-Manager, PA Digital, on behalf of the PA Digital Team
In July, PA Digital’s Metadata Team began regular Virtual Office Hours. Inspired by instructors in our previous educational environments, we offer Virtual Office Hours as a time and space for open conversation and information-sharing on digital collections and participation in PA Digital & the Digital Public Library of America (DPLA). The sessions are not recorded.
In Virtual Office Hours, we hope to hear questions and thoughts that our partners and potential partners from all over Pennsylvania have about the entire process of bringing digital collections to PA Digital and the DPLA. This includes not only steps for already-digitized collections, but also steps as early as planning digitization and creating metadata practices. We would also love to hear about what works and what doesn’t work for our partners in local contexts.
We will announce details (including dates, times, and direct links) of Virtual Office Hours regularly in several places:
- on the PA Digital Presentations/Events page
- by email to listservs, including the open-membership pa-digital listserv hosted by HSLC
- on Twitter
We hope to hear from you soon!
(Image credit: Mark Moz, https://www.flickr.com/photos/106574022@N04/10797544894)
We recognize that one of the greatest obstacles to bringing cultural heritage collections into digital spaces like PA Digital and the DPLA is the large step of initial digitization, including forming a plan and a workflow for digitization, and executing them. Here are a few select resources that can help your institution’s digitization planning and implementation. The concise list appears at the end of this post.
Planning & Workflow
Recently, the DPLA offered a digital projects training program (the Public Library Partnerships Project), and its self-guided curriculum remains available, along with a gallery of projects completed by participants. This curriculum introduces guidelines and topics for planning new digitization projects. Additionally, Franky Abbott (DPLA), Jennifer Birnel (Montana Memory Project), and Sarah Hawkins (East Central Regional Library) presented a webinar on the topic for TechSoup, based on their collaborations within the Public Library Partnerships Project:
For financial planning stages, the Digital Library Federation’s Assessment Interest Group recently developed and released a Library Digitization Cost Calculator, currently in beta. Once you can roughly determine the total cost of a project of interest, it becomes a little easier to determine what grants you can apply to; there are many out there, including CLIR’s Hidden Special Collections and Archives competition, and multiple grants from the NEH such as Common Heritage, Humanities Open Book, and more.
Hardware & Hosting (In-State!)
Within Pennsylvania, the State Library offers a lending program for their portable tabletop Scribe Scanner. Our partners at the University of Scranton and Scranton Public Library engaged in a great community project with it; you can also read more about the scanner’s specifications here and here. The loan application process for the State Library’s Scribe Scanner is as follows:
- Verify you can meet the State Library’s requirements
- Complete an application form
- Complete an Internet Archive Partner form (send to RAemail@example.com)
Additionally, our partner HSLC via the POWER Library offers PA Photos and Documents, a content management and hosting service that doubles as a union catalog. That is to say, POWER Library aggregates participant collections together in a searchable database, and provides the hosting and content management service to participants for free or very low cost (contingent on some guidelines). The application to participate is available online.
Format & Metadata Guidelines
The Federal Agencies Digitization Guidelines Initiative (FADGI) has drafted some general guidelines and resources on digitization and digital-object metadata, including standards (like their Digital Imaging Standards), as well as explorations of specific topics (like their file format comparisons).
If your institution’s goals include exposing your digital materials in the Digital Public Library of America, we at PA Digital are very happy to help! We suggest that you take a look at our PA Digital Readiness guidelines and our metadata guidelines, and feel free to email (firstname.lastname@example.org) or tweet (@PADigitalNews) the PA Digital team with any questions.
Planning & Workflow
- DPLA Public Library Partnerships Project
- Digital Library Federation
Hardware & Hosting
- State Library of Pennsylvania
- POWER Library
- Hosting & Content Management (PA Photos & Documents)
Format & Metadata Guidelines
Please share any other resources that you may know of with the PA Digital community in the comments below!
On May 10 and 17, 2016, Emily Gore (DPLA) and Greg Cram (NYPL) presented a two-part webinar on RightsStatements.org, a joint initiative of DPLA and Europeana that provides standardized rights statements for cultural heritage institutions and aggregators to apply to digital objects. RightsStatements.org launched on April 14, 2016, and currently provides 11 statements that institutions can use to communicate the usage rights status of their digital objects.
In the first half of “RightsStatements.org: Why We Need It, What It Is (and Isn’t) and What Does It Mean for the DPLA Network and Beyond?” (5/10/2016), Emily and Greg spoke about the background and philosophy behind RightsStatements.org’s creation. They pointed out the vast variety of statements currently describing digital objects, and the potential for users to be confused or misled regarding restrictions in using the objects; they also covered a basic primer on copyright and fair use.
The second half of the webinar (5/17/2016) focused on the statements themselves, which are separated into three types: In Copyright, No Copyright, and Other. Emily and Greg covered each of the 11 statements (and a potential 12th addition), and described the difference between rights statements (which institutions may apply) and the licensing tools of Creative Commons (which, aside from the public domain mark, only original copyright holders may apply). They also spoke about implementing the rights statements in the DPLA, noting that the overall goal is to let users “know, as accurately as possible, what they can and cannot do with materials that they find,” and acknowledging that implementation will be time-consuming and resource-intensive.
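The three-way grouping is also visible in the statements’ identifiers, which makes it easy to check how a collection’s rights values break down. As a minimal sketch, assuming URIs of the form `http://rightsstatements.org/vocab/<ID>/1.0/` and the convention that In Copyright identifiers begin with `InC` (e.g. `InC-EDU`) and No Copyright identifiers with `NoC` (e.g. `NoC-NC`), with the remaining statements such as `CNE`, `UND`, and `NKC` falling under Other, a script could sort the rights values in a metadata export into the three types. The function `statement_category` is hypothetical, not part of any official library.

```python
from urllib.parse import urlparse


def statement_category(uri: str) -> str:
    """Map a RightsStatements.org URI to one of its three statement types.

    Assumes URIs shaped like http://rightsstatements.org/vocab/InC-EDU/1.0/
    (hypothetical helper for illustration only).
    """
    parsed = urlparse(uri.strip())
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc.lower() != "rightsstatements.org" or len(parts) < 2 or parts[0] != "vocab":
        raise ValueError(f"not a RightsStatements.org statement URI: {uri!r}")
    identifier = parts[1]  # e.g. "InC", "NoC-US", "CNE"
    if identifier == "InC" or identifier.startswith("InC-"):
        return "In Copyright"
    if identifier.startswith("NoC-"):
        return "No Copyright"
    # Remaining statements (e.g. CNE, UND, NKC) form the "Other" group.
    return "Other"


for uri in [
    "http://rightsstatements.org/vocab/InC-EDU/1.0/",
    "http://rightsstatements.org/vocab/NoC-US/1.0/",
    "http://rightsstatements.org/vocab/UND/1.0/",
]:
    print(uri, "->", statement_category(uri))
```

Rejecting non-RightsStatements.org URIs, such as a Creative Commons license URL, keeps the distinction between rights statements and licenses explicit rather than silently lumping licenses into Other.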
International Rights Statements Working Group (2016), “RightsStatements.org White Paper: Recommendations for Standardized International Rights Statements.”
DPLA (2016), “Announcing the Launch of RightsStatements.org,” DPLAblog.
Sarah Shreeves (2016), “Clarity for Our Users,” In the Open.