National Register and the National Archives: Difference between revisions

From OpenPreservation.xyz

Latest revision as of 11:02, 18 April 2025

The spreadsheet can be downloaded here: https://openpreservation.xyz/wiki/File:NRHP_NARA_PDF_download_2025.04.18.xlsx

Steps taken:

For each listing in the national-register-everything-20240710.xlsx file available at https://www.nps.gov/subjects/nationalregister/data-downloads.htm, look up the corresponding NARA address.

For each NARA address, record the direct link to the PDF download, which is hosted on AWS servers (see the 'NARA PDF Links' tab in the Excel file).

Example: reference number 9001229 has the NARA address https://catalog.archives.gov/id/75320568 and the direct PDF link https://s3.amazonaws.com/NARAprodstorage/lz/electronic-records/rg-079/NPS_NY/09001229.pdf
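
In the example above, the PDF file name looks like the reference number padded with a leading zero to eight digits. The sketch below checks a recorded PDF link against its reference number under that assumption. The padding rule is inferred from this single example, and the helper names normalize_ref and url_matches_ref are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def normalize_ref(ref) -> str:
    """Zero-pad a reference number to 8 digits (assumption based on one example)."""
    return str(ref).strip().zfill(8)


def url_matches_ref(ref, pdf_url: str) -> bool:
    """Return True if the PDF file name in the URL equals the padded reference number."""
    stem = PurePosixPath(urlparse(pdf_url).path).stem
    return stem == normalize_ref(ref)


example_url = (
    "https://s3.amazonaws.com/NARAprodstorage/lz/electronic-records/"
    "rg-079/NPS_NY/09001229.pdf"
)
print(url_matches_ref(9001229, example_url))
```

A check like this can flag rows where the recorded link and the reference number disagree before any download starts.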

Automate the download of all PDFs of public, unrestricted National Register listings.
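
A minimal sketch of how the download step could be automated is shown here. It is not the script actually used for this page. It assumes the direct PDF links have already been read from the spreadsheet into a list, and that file names are unique across listings. The function names fetch_to_file and download_all are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def fetch_to_file(url: str, dest: Path) -> None:
    """Stream one URL to disk, writing to a temporary name first."""
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=60) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)  # rename only after the full file has been written


def download_all(urls, dest_dir, fetch=fetch_to_file):
    """Download every URL into dest_dir, skipping files that already exist.

    Returns a list of (url, error message) pairs for downloads that failed.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        dest = dest_dir / PurePosixPath(urlparse(url).path).name
        if dest.exists():
            continue  # already downloaded in an earlier run
        try:
            fetch(url, dest)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            failed.append((url, str(exc)))
    return failed
```

Skipping existing files makes the run resumable, which matters for a collection of this size. A real run would probably also add a short pause between requests and retry the failed list.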

Once the PDFs are stored on a local disk, use a script to extract information about each one, including:

-file size

-page count

-other PDF metadata (title, subject, keywords, file creation and modification date, etc.).
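
The extraction step could look roughly like the sketch below. It assumes the third-party pypdf library is installed. The page does not say which tool was actually used, so this is only one possible approach, and the function names file_size_bytes and pdf_info are hypothetical.

```python
from pathlib import Path


def file_size_bytes(path) -> int:
    """Size of the file on disk, in bytes."""
    return Path(path).stat().st_size


def pdf_info(path) -> dict:
    """Collect file size, page count, and document metadata for one PDF."""
    from pypdf import PdfReader  # third-party; imported here so the rest works without it

    path = Path(path)
    reader = PdfReader(str(path))
    meta = reader.metadata or {}  # metadata is None when the PDF has no info dictionary

    def field(key):
        value = meta.get(key)
        return None if value is None else str(value)

    return {
        "file": path.name,
        "size_bytes": file_size_bytes(path),
        "page_count": len(reader.pages),
        "title": field("/Title"),
        "subject": field("/Subject"),
        "keywords": field("/Keywords"),
        "created": field("/CreationDate"),
        "modified": field("/ModDate"),
    }
```

Calling pdf_info on every file in the download folder and writing the results with csv.DictWriter would give one row per PDF. Damaged or encrypted files may raise an error and would need their own handling.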

The 'Calculations' tab includes a chart of Page Count distribution, as well as some basic summary statistics:

-Total page count: 3,124,141

-Total documents: 76,092

-Average pages per document: 41

-Total file size: 3,759 GiB

-Average file size: 51 MiB.
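
The two averages follow from the totals above. A short check, assuming 1 GiB = 1,024 MiB:

```python
total_pages = 3_124_141
total_documents = 76_092
total_size_gib = 3_759

# Average pages per document
avg_pages = total_pages / total_documents

# Average file size in MiB (1 GiB = 1024 MiB)
avg_size_mib = total_size_gib * 1024 / total_documents

print(f"{avg_pages:.1f} pages, {avg_size_mib:.1f} MiB")
```

Rounded to whole numbers, these come to 41 pages and 51 MiB, matching the figures listed.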

This page will be updated once a more in-depth analysis of the content of the PDFs is available.

Reach out to hello@openpreservation.xyz if you have data analysis ideas about the 76,092 PDFs of National Register nomination forms available from the NARA website.