Files:
  • Maltz---Targonski-2002-A-Note-on-the-Use-of-County-Level-UCR-Data.pdf (application/pdf, 3.9 MB, last modified 01/21/2019 12:18 PM)

Project Citation: 

Kaplan, Jacob. Jacob Kaplan’s Concatenated Files: Uniform Crime Reporting (UCR) Program Data: County-Level Detailed Arrest and Offense Data. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2020-10-15. https://doi.org/10.3886/E108164V4

Project Description

Summary:
Version 4 release notes:
  • I am retiring this dataset; please do not use it.
  • I made this dataset because I had seen many recent articles using the NACJD version of the data and had received several requests to make a concatenated version myself. This data is heavily flawed, as noted in Maltz and Targonski's (2002) excellent paper (the PDF is available to download here, and an important paragraph from that article is quoted below), and I was worried that people were using the data without considering these flaws. So the data available here carried the warning below this section (originally at the top of these notes so it was the most prominent thing), and the Maltz & Targonski PDF was included in the zip file so people would be aware of it.
  • There are two reasons that I am retiring it.
    • First, I still see papers and other non-peer-reviewed reports published using this data without addressing the main flaws noted by Maltz and Targonski. I don't want my work to contribute to research that I think is fundamentally flawed.
    • Second, this data is actually more flawed than I originally understood. The imputation process used to replace missing data is based on a poor design; Maltz and Targonski discuss this in detail, so I won't say much about it here. The additional problem is that the variable that determines whether an agency has missing data is fatally flawed. That variable, "number_of_months_reported", actually records only the last month reported. So if an agency reports only in December, it will show 12 months reported instead of 1. Even a good imputation process would therefore rest on such a flawed measure of missingness that its results would be wrong. How big an issue is this? I haven't yet looked into it in enough detail to be sure, but it is enough of a problem that I no longer want to release this kind of data. (The UCR data do contain other variables you can use to try to determine the actual number of months reported, but they stopped being useful after the FBI changed the data in 2018, and even that measure is not always accurate for years before 2018.)
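To make the months-reported problem concrete, here is a minimal Python sketch. It assumes, purely for illustration, that an agency's submissions for one year are available as a hypothetical list of 12 booleans (January through December); the function names are also hypothetical and come neither from the UCR files nor from county_data.R. It contrasts a "last month reported" measure, which is how number_of_months_reported behaves, with an actual count of months that contain data.

```python
from typing import Sequence


def last_month_reported(reported: Sequence[bool]) -> int:
    """Return the 1-based number of the last month with data, or 0 if none.

    This mimics how number_of_months_reported actually behaves.
    """
    last = 0
    for month, has_data in enumerate(reported, start=1):
        if has_data:
            last = month
    return last


def actual_months_reported(reported: Sequence[bool]) -> int:
    """Count the months that actually contain data."""
    return sum(1 for has_data in reported if has_data)


# Hypothetical agency that submitted data only for December.
december_only = [False] * 11 + [True]
print(last_month_reported(december_only))     # 12
print(actual_months_reported(december_only))  # 1
```

Under the imputation rules described further down, the December-only agency looks like a complete 12-month reporter, so its single month of data is used unscaled (and would also feed into the full-year means used for low-reporting agencies), whereas its true count of 1 month would have sent it to the fewer-than-3-months rule.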
!!! Important Note: There are a number of flaws in the imputation process used to make these county-level files. Included as one of the files to download (and also in every zip file) is Maltz & Targonski's 2002 paper on these flaws and why they are such an issue. I very strongly recommend that you read this paper in its entirety before working with this data. I am only publishing this data because people use county-level data anyway, and I want them to know the risks. Important Note !!!

The following paragraph is the abstract to Maltz & Targonski's paper:
County-level crime data have major gaps, and the imputation schemes for filling in the gaps are inadequate and inconsistent. Such data were used in a recent study of guns and crime without considering the errors resulting from imputation. This note describes the errors and how they may have affected this study. Until improved methods of imputing county-level crime data are developed, tested, and implemented, they should not be used, especially in policy studies.

Version 3 release notes:
  • Adds a variable to all data sets indicating the "coverage", which is the proportion of the county's data in that county-year that was actually reported rather than imputed, with each agency weighted by its population (100 = no imputation; 0 = all agencies imputed for all months in that year). Thanks to Dr. Monica Deza for the suggestion. The following is taken directly from NACJD's codebook for county data and is an excellent explanation of this variable.
    • The Coverage Indicator variable represents the proportion of county data that is reported for a given year. The indicator ranges from 0 to 100. A value of 0 indicates that no data for the county were reported and all data have been imputed. A value of 100 indicates that all ORIs in the county reported for all 12 months in the year.
      •  Coverage Indicator is calculated as follows: CI_x = 100 * ( 1 - SUM_i { [ORIPOP_i/COUNTYPOP] * [ (12 - MONTHSREPORTED_i)/12 ] } )
        •  where CI = Coverage Indicator
        •  x = county
        •  i = ORI within county
  • Reorders data so it's sorted by year and then county, rather than by county and then year as before.
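The Coverage Indicator formula above can be written as a short Python function. This is a minimal sketch, not NACJD's or my production code: the `Agency` record, its field names, and the example ORIs are hypothetical, and it assumes the agencies passed in are exactly the ORIs assigned to one county-year, with `county_population` playing the role of COUNTYPOP.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Agency:
    ori: str
    population: int       # ORIPOP_i
    months_reported: int  # MONTHSREPORTED_i, from 0 to 12


def coverage_indicator(agencies: Iterable[Agency], county_population: int) -> float:
    """CI_x = 100 * (1 - SUM_i [ORIPOP_i / COUNTYPOP] * [(12 - MONTHSREPORTED_i) / 12])."""
    missing_share = sum(
        (a.population / county_population) * ((12 - a.months_reported) / 12)
        for a in agencies
    )
    return 100 * (1 - missing_share)


# Hypothetical county of 100,000 people served by two agencies.
county = [
    Agency("XX0000001", 60_000, 12),  # reported every month
    Agency("XX0000002", 40_000, 6),   # reported half the year
]
print(round(coverage_indicator(county, 100_000), 2))  # 80.0
```

Only the second agency has missing months; it covers 40% of the county's population with half its months missing, so the indicator drops by 100 * 0.4 * 0.5 = 20 points. If MONTHSREPORTED_i is taken from number_of_months_reported, the indicator inherits the months-reported problem described in the Version 4 notes.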
Version 2 release notes: 
  • Fixes a bug where Butler University (ORI = IN04940) had the wrong FIPS state and FIPS state+county codes in the LEAIC crosswalk, causing it to be counted in the wrong location. This agency has been removed entirely from the county data. Thanks to Dr. Wade Jacobsen for finding this bug.

The agency-level data used to make these files are from Jacob Kaplan's Concatenated Files: Offenses Known and Clearances by Arrest 1960-2017 (https://www.openicpsr.org/openicpsr/project/100707/version/V12/view) and Jacob Kaplan's Concatenated Files: Arrests by Age, Sex, and Race 1974-2016 (https://www.openicpsr.org/openicpsr/project/102263/version/V8/view), which I have also released. For the code I used to create these files, please see https://github.com/jacobkap/crime_data/blob/master/R/county_data.R.

This data aggregates agency-level crime and arrest data into county-level counts. Each agency is assigned to a county using the FIPS state+county code in the LEAIC (crosswalk) file, which is already joined with the agency-level data. I also add a column with the county name based on the census data set Annual Survey of Public Employment & Payroll (ASPEP) (https://www.openicpsr.org/openicpsr/project/101399/version/V5/view). For agencies that do not report, or that report fewer than all 12 months of the year, I use the following imputation procedure designed by NACJD. The process is the same as NACJD's except that NACJD excludes offenses with zero months reported, while I include them.

  • Agencies reporting between 3 and 11 months have their crimes/arrests multiplied by 12 / (number of months reported). For example, an agency that reports only 6 months of the year and records 10 murders would be estimated to have had 20 murders in the year (10 murders * 12/6 months reported = 10 * 2 = 20).
  • Agencies reporting fewer than 3 months are simply assigned the average (mean) number of crimes/arrests for agencies in the same state, year, and population group (e.g. cities with population 250,000+, cities with population 10,000-24,999). This average is computed only from agencies that reported all 12 months of the year. For example, if an agency reported 15 murders but reported only 2 months of the year, that agency would instead get the average number of murders for similarly sized agencies (same population group) in that state during that year.
  • Agencies with a population of 0 (common in special agencies such as state police, universities, and park police) and fewer than 3 months reported are dropped because they have no population group to match to.
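The imputation rules above, followed by the county aggregation described earlier, can be sketched in Python. This is a hedged illustration, not the code in county_data.R: it handles a single offense (murders) for brevity, and the `AgencyYear` record, its field names, the ORIs, and the population-group labels are all hypothetical. These notes do not say what happens when a state-year-population group has no full-year agencies, so the sketch simply leaves such agencies out (an assumption).

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class AgencyYear:
    ori: str
    state: str
    year: int
    county_fips: str       # hypothetical name for the LEAIC FIPS state+county code
    population: int
    population_group: str  # hypothetical label, e.g. "city 250,000+"
    months_reported: int
    murders: int


def impute(rows):
    """Yield (row, imputed murders) pairs using the three rules above."""
    # Group means are computed only from agencies reporting all 12 months.
    full_year = defaultdict(list)
    for r in rows:
        if r.months_reported == 12:
            full_year[(r.state, r.year, r.population_group)].append(r.murders)

    for r in rows:
        if r.months_reported == 12:
            yield r, float(r.murders)
        elif r.months_reported >= 3:
            # 3 to 11 months: scale up to a full year.
            yield r, r.murders * 12 / r.months_reported
        elif r.population == 0:
            continue  # fewer than 3 months and no population group: dropped
        else:
            peers = full_year.get((r.state, r.year, r.population_group))
            if peers:  # assumption: skip the agency if no full-year peers exist
                yield r, float(mean(peers))


def county_totals(rows):
    """Sum imputed agency counts into (county FIPS, year) totals."""
    totals = defaultdict(float)
    for r, value in impute(rows):
        totals[(r.county_fips, r.year)] += value
    return dict(totals)


# Hypothetical agencies in one state-year.
example_rows = [
    AgencyYear("XX0000001", "XX", 2017, "01001", 300_000, "city 250,000+", 12, 30),
    AgencyYear("XX0000002", "XX", 2017, "01003", 280_000, "city 250,000+", 12, 10),
    AgencyYear("XX0000003", "XX", 2017, "01001", 15_000, "city 10,000-24,999", 6, 10),
    AgencyYear("XX0000004", "XX", 2017, "01001", 260_000, "city 250,000+", 2, 15),
    AgencyYear("XX0000005", "XX", 2017, "01005", 0, "none", 1, 5),
]
print(county_totals(example_rows))
# {('01001', 2017): 70.0, ('01003', 2017): 10.0}
```

In this example, agency XX0000003's 10 murders in 6 months become 20, agency XX0000004's own 15 murders from 2 months are replaced by the mean of the two full-year agencies in its group, (30 + 10) / 2 = 20, and the zero-population agency XX0000005 disappears from the county totals entirely.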

Scope of Project

Subject Terms: crime; violent crime statistics; victimless crimes; national crime statistics (USA); ucr; Uniform Crime Reports; arrest; arrest rates
Geographic Coverage: Counties in the United States
Time Period(s): 1960 – 2017 (1960-2017 for crime data, 1974-2016 for arrest data)
Data Type(s): administrative records data

Methodology

Unit(s) of Observation: County
Geographic Unit: County

Related Publications

This study is unpublished.

This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.