A newer version of this project is available. See below for other available versions.
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: County-Level Detailed Arrest and Offense Data
Principal Investigator(s): Jacob Kaplan, University of Pennsylvania
Version: V4
| Name | File Type | Size | Last Modified |
|---|---|---|---|
| | application/pdf | 3.9 MB | 01/21/2019 12:18 PM |
Project Citation:
Project Description
- I am retiring this dataset - please do not use it.
- I made this dataset because I had seen many recent articles using the NACJD version of the data and had received several requests to make a concatenated version myself. This data is heavily flawed, as noted in Maltz & Targonski's excellent (2002) paper (see the PDF available to download here and the important paragraph from that article below), and I was worried that people were using the data without considering these flaws. So the data available here carried the warning below this section (originally at the top of these notes so that it was the most prominent thing) and included the Maltz & Targonski PDF in the zip file so that people were aware of it.
- There are two reasons that I am retiring it.
- First, I still see papers and non-peer-reviewed reports published using this data without addressing the main flaws noted by Maltz and Targonski. I don't want my work to contribute to research that I think is fundamentally flawed.
- Second, this data is actually more flawed than I originally understood. The imputation process to replace missing data is based on a bad design; Maltz and Targonski discuss this in detail, so I won't cover it at length. The additional problem is that the variable that determines whether an agency has missing data is fatally flawed. That variable is "number_of_months_reported", which is actually just the last month reported. So an agency that reports only in December is recorded as having 12 months reported instead of 1. Even a good imputation process would therefore be based on such a flawed measure of missingness that it would be wrong. How big of an issue is this? I haven't yet looked into it in enough detail to be sure, but it's enough of a problem that I no longer want to release this kind of data. (Within the UCR data there are variables that you can use to try to determine the actual number of months reported, but they stopped being useful after the FBI changed the data in 2018, and even that measure is not always accurate for years before 2018.)
- Adds a variable to all data sets indicating the "coverage", which is the proportion of the agencies in that county-year that report complete data (i.e. that aren't imputed; 100 = no imputation, 0 = all agencies imputed for all months in that year). Thanks to Dr. Monica Deza for the suggestion. The following is taken directly from NACJD's codebook for county data and is an excellent explainer of this variable.
- The Coverage Indicator variable represents the proportion of county data that is reported for a given year. The indicator ranges from 0 to 100. A value of 0 indicates that no data for the county were reported and all data have been imputed. A value of 100 indicates that all ORIs in the county reported for all 12 months in the year.
- Coverage Indicator is calculated as follows: CI_x = 100 * ( 1 - SUM_i { [ORIPOP_i/COUNTYPOP] * [ (12 - MONTHSREPORTED_i)/12 ] } )
- where CI = Coverage Indicator
- x = county
- i = ORI within county
- Reorders data so it's sorted by year then county rather than vice versa as before.
- Fixes a bug where Butler University (ORI = IN04940) had the wrong FIPS state and FIPS state+county codes from the LEAIC crosswalk, causing it to be counted in the wrong location. This agency has been removed entirely from the county data. Thanks to Dr. Wade Jacobsen for finding this bug.
- Agencies reporting between 3 and 11 months have their crimes/arrests multiplied by 12/number of months reported. For example, an agency that reports only 6 months of the year and records 10 murders would be estimated to have had 20 murders in the year (10 murders * 12/6 months reported = 10 * 2 = 20).
- Agencies reporting fewer than 3 months are simply assigned the average (mean) number of crimes/arrests for agencies in that state, year, and population group (e.g. cities population 250,000+, cities population 10,000-24,999). This average is generated only by agencies that reported all 12 months of the year! For example, if an agency reported 15 murders but reported only 2 months of the year, that agency would get the average number of murders for similar-sized agencies (same population group) in that state during that year.
- Agencies with a population of 0 (common in special agencies such as state police, universities, park police) and fewer than 3 months reported are dropped as they have no population group to match to.
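The imputation rules and the Coverage Indicator formula described above can be sketched in a few lines of Python. This is only an illustration of the rules as worded in these notes, not the code used to build the dataset. The function names, argument names, and the input layout (a list of population and months-reported pairs) are hypothetical. Leaving agencies with all 12 months unchanged is an assumption; the notes only describe the 3 to 11 month case.

```python
from typing import Iterable, Optional, Tuple


def impute_annual_count(
    reported_count: float,
    months_reported: int,
    population: int,
    full_year_group_mean: float,
) -> Optional[float]:
    """Estimate one agency's annual count under the rules described above.

    Returns None when the agency would be dropped. All names here are
    hypothetical and chosen for illustration only.
    """
    if not 0 <= months_reported <= 12:
        raise ValueError("months_reported must be between 0 and 12")
    if months_reported >= 3:
        # 3-11 months: scale the reported count up to a full year.
        # Assumption: 12-month reporters are left unchanged, which the
        # same factor (12/12 = 1) produces.
        return reported_count * 12 / months_reported
    if population == 0:
        # Fewer than 3 months and no population group to match to:
        # the agency is dropped.
        return None
    # Fewer than 3 months: the reported count is discarded and replaced by
    # the mean of full-year reporters in the same state, year, and
    # population group (computed elsewhere and passed in).
    return full_year_group_mean


def coverage_indicator(
    agencies: Iterable[Tuple[int, int]], county_population: int
) -> float:
    """Coverage Indicator for one county-year.

    agencies: (ORIPOP_i, MONTHSREPORTED_i) pairs for each ORI in the county.
    Implements CI_x = 100 * (1 - SUM_i[(ORIPOP_i / COUNTYPOP)
                                       * ((12 - MONTHSREPORTED_i) / 12)]).
    """
    missing_share = sum(
        (ori_pop / county_population) * ((12 - months) / 12)
        for ori_pop, months in agencies
    )
    return 100 * (1 - missing_share)


# 10 murders reported over 6 months are scaled to 10 * 12 / 6 = 20.0.
print(impute_annual_count(10, 6, population=5000, full_year_group_mean=3.0))

# A county where one ORI reports all year and another reports 6 months.
print(coverage_indicator([(600, 12), (400, 6)], county_population=1000))
```

Note that both functions take the months-reported value at face value. Since "number_of_months_reported" is actually the last month reported, an agency that reported only in December would be passed in with 12 and treated as a complete reporter, which is the flaw described above.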
Scope of Project
Methodology
Related Publications
Published Versions
This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.