Name File Type Size Last Modified
  replication_package 09/24/2021 09:36 AM

Project Citation: 

Abramitzky, Ran, Boustan, Leah, Eriksson, Katherine, Feigenbaum, James, and Pérez, Santiago. Data and Code for: Automated Linking of Historical Data. Nashville, TN: American Economic Association [publisher], 2021. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-09-24. https://doi.org/10.3886/E133781V1

Project Description

Summary: The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.

Scope of Project

Subject Terms: census data; record linking
JEL Classification:
      N00 Economic History: General
      N01 Development of the Discipline: Historiographical; Sources and Methods
Geographic Coverage: United States; Norway
Time Period(s): 1850 – 1940
Collection Date(s): 1850 – 1940
Data Type(s): census/enumeration data

Methodology

Data Source: Census data; Genealogical data
Unit(s) of Observation: Individual

This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.