National Register and the National Archives

From OpenPreservation.xyz
Revision as of 16:04, 31 July 2025 by Matt

Overview

This page describes a data analysis project to understand the scope of the National Register nomination form files made available on the National Archives website.

The National Park Service provides an Excel file titled 'Everything' on its Data Downloads page. This file includes all National Register of Historic Places Properties ("Listed/Returned/Removed/eligible/ineligible/Approved/Accepted/Rejected"). The spreadsheet has the following column headings: Property Name, State, County, City, Street & Number, Status, Listing Type, Status Date, Restricted Address, Area of Significance, Category of Property, External Link, Level of Significance, Listed Date, Name of Multiple Property Listing, NHL Designated Date, Other Names, Park Name, and Property ID. Other metadata, such as architect name or the applicable National Register Criteria for Evaluation (A, B, C, or D), is not stored in this file.

The 'External Link' column has an entry for some (but not all) properties. Each entry links to the corresponding National Register nomination form file stored on the National Archives (NARA) website. For example, the entry for the Wainwright Building in St. Louis, Missouri, listed in 1968, points to https://catalog.archives.gov/id/63818176. On the NARA page, the PDF is loaded within a frame, and the user can also download the file directly.
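As a minimal sketch of how the 'External Link' values could be handled in a script, the function below pulls the numeric catalog ID out of a link. The URL pattern is inferred from the single example above, and the function name `nara_id` is hypothetical; links in other formats would simply return None.

```python
import re
from typing import Optional

# Pattern inferred from the example URL above (an assumption, not a NARA spec).
_CATALOG_RE = re.compile(r"^https?://catalog\.archives\.gov/id/(\d+)/?$")


def nara_id(external_link: Optional[str]) -> Optional[str]:
    """Return the NARA catalog ID from an 'External Link' value, or None."""
    if not external_link:
        return None  # empty cell: the property has no link
    match = _CATALOG_RE.match(external_link.strip())
    return match.group(1) if match else None


print(nara_id("https://catalog.archives.gov/id/63818176"))  # 63818176
print(nara_id(""))  # None
```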


Steps taken:

For each listing in the national-register-everything-20240710.xlsx file available at https://www.nps.gov/subjects/nationalregister/data-downloads.htm, look up the corresponding NARA address.

For each NARA address, record the direct link to the PDF download, which is hosted on AWS servers (see the 'NARA PDF Links' tab in the Excel file).

Example: 9001229 https://catalog.archives.gov/id/75320568 https://s3.amazonaws.com/NARAprodstorage/lz/electronic-records/rg-079/NPS_NY/09001229.pdf
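In the example row, the leading identifier (9001229) and the PDF file name (09001229.pdf) differ only by a leading zero. A simple consistency check along these lines could flag rows where the recorded PDF link does not match its identifier. This is a sketch based on that one row: whether every row follows the same pattern is an assumption, and the function names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def pdf_stem(pdf_url: str) -> str:
    """Return the PDF file name without its extension, e.g. '09001229'."""
    return PurePosixPath(urlparse(pdf_url).path).stem


def matches_identifier(identifier: str, pdf_url: str) -> bool:
    """True if the numeric file name equals the identifier, ignoring leading zeros."""
    stem = pdf_stem(pdf_url)
    identifier = identifier.strip()
    if not (stem.isdigit() and identifier.isdigit()):
        return False  # non-numeric names need manual review
    return int(stem) == int(identifier)


url = ("https://s3.amazonaws.com/NARAprodstorage/lz/electronic-records/"
       "rg-079/NPS_NY/09001229.pdf")
print(matches_identifier("9001229", url))  # True
```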

The resulting Excel file can be downloaded here: https://openpreservation.xyz/wiki/File:NRHP_NARA_PDF_download_2025.04.18.xlsx

Automating the download of all PDFs

Automate the download of the PDFs for all public, unrestricted National Register listings.
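The script used for this project is not reproduced on this page. As an illustration only, a download loop could look like the sketch below, which uses the Python standard library, skips files already on disk so an interrupted run can resume, and pauses between requests. The function names, the one-second delay, and the assumption that PDF file names are unique across folders are all hypothetical choices.

```python
import shutil
import time
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_path(pdf_url: str, out_dir: Path) -> Path:
    """Map a PDF URL to a local path named after the remote file."""
    return out_dir / PurePosixPath(urlparse(pdf_url).path).name


def download_pdf(pdf_url: str, out_dir: Path, timeout: float = 60.0) -> Path:
    """Download one PDF unless it is already on disk; return its local path."""
    target = local_path(pdf_url, out_dir)
    if target.exists():
        return target
    out_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(pdf_url, timeout=timeout) as response, \
            open(partial, "wb") as fh:
        shutil.copyfileobj(response, fh)  # stream to disk in chunks
    partial.replace(target)  # rename only after a complete download
    return target


def download_all(pdf_urls, out_dir: Path, delay_seconds: float = 1.0) -> None:
    """Download every URL, logging failures instead of stopping."""
    for url in pdf_urls:
        if local_path(url, out_dir).exists():
            continue  # already downloaded in an earlier run
        try:
            download_pdf(url, out_dir)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"failed: {url}: {exc}")
        time.sleep(delay_seconds)  # be polite to the server
```

Calling `download_all(urls, Path("pdfs"))` with the links from the 'NARA PDF Links' tab would fill a local `pdfs` folder. Given the total size reported below, check available disk space first.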

Once stored on a local disk, use a script to extract information about each PDF, including:

-file size

-page count

-other PDF metadata (title, subject, keywords, file creation and modification date, etc).
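One way to collect these fields is sketched below. It assumes the third-party pypdf package (`pip install pypdf`), which is not named on this page and may not be what the project used. The function names and the returned dictionary keys are hypothetical.

```python
import os


def file_size_bytes(path: str) -> int:
    """Return the size of the file on disk in bytes."""
    return os.path.getsize(path)


def pdf_summary(path: str) -> dict:
    """Collect file size, page count, and basic metadata for one PDF."""
    # pypdf is a third-party package; it is imported here so that
    # file_size_bytes() still works when pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    meta = reader.metadata  # None when the PDF has no info dictionary
    return {
        "path": path,
        "size_bytes": file_size_bytes(path),
        "page_count": len(reader.pages),
        "title": meta.title if meta else None,
        "subject": meta.subject if meta else None,
        # Raw dictionary entries, kept as stored in the PDF (dates unparsed).
        "keywords": meta.get("/Keywords") if meta else None,
        "created": meta.get("/CreationDate") if meta else None,
        "modified": meta.get("/ModDate") if meta else None,
    }
```

Running `pdf_summary` over every downloaded file and writing the results to a spreadsheet would produce the kind of per-document table that summary statistics can be computed from. Damaged or encrypted PDFs may raise errors and would need separate handling.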

The 'Calculations' tab includes a chart of Page Count distribution, as well as some basic summary statistics:

-Total page count: 3,124,141

-Total documents: 76,092

-Average pages per document: 41

-Total file size: 3,759 GiB

-Average file size: 51 MiB
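The two averages follow directly from the totals, as this short check shows (1 GiB = 1,024 MiB):

```python
total_pages = 3_124_141
total_documents = 76_092
total_size_gib = 3_759

avg_pages = total_pages / total_documents
avg_size_mib = total_size_gib * 1024 / total_documents  # convert GiB to MiB

print(f"{avg_pages:.1f} pages per document")    # 41.1, reported as 41
print(f"{avg_size_mib:.1f} MiB per document")   # 50.6, reported as 51
```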

This page will be updated once a more in-depth analysis of the content of the PDFs is available.

Future work

The data analysis above was carried out in April 2025, using the data made available by NPS through July 2024.

Reach out to hello@openpreservation.xyz if you have data analysis ideas about the 76,092 PDFs of National Register nomination forms available from the NARA website.