A newer version of this project is available.
AP VoteCast 2018
Principal Investigator(s): Trevor Tompson, NORC at the University of Chicago; Jennifer Benz, NORC at the University of Chicago
Version: V1
| Name | File Type | Size | Last Modified |
|---|---|---|---|
| | application/zip | 61.9 MB | 05/13/2019 10:19 AM |
Project Description
AP VoteCast combines three sources: interviews with a random sample of registered voters drawn from state voter files; interviews with self-identified registered voters conducted using NORC's probability-based AmeriSpeak® panel, which is designed to be representative of the U.S. population; and interviews with self-identified registered voters selected from nonprobability online panels. Interviews were conducted in English and Spanish. Respondents received a small monetary incentive for completing the survey. Participants selected from state voter files were contacted by phone and mail, and had the opportunity to take the survey by phone or online.
Note that the data file(s), codebook, and questionnaire used are all included in the .zip file.
Scope of Project
The VoteCast survey of voters and nonvoters nationwide is compiled from results of the 50 state-based surveys and a nationally representative survey of 4,913 registered voters conducted on the probability-based AmeriSpeak panel (4,413 completed online and 500 via phone). It includes 40,692 probability interviews completed online (30,133) and via telephone (10,559), and 93,324 nonprobability interviews completed online. The margin of sampling error is plus or minus 0.8 percentage points for voters (n=116,792) and 1.8 percentage points for nonvoters (n=22,137). Registered voters in the District of Columbia were not included. The overall response rate for the probability sample drawn from the state voter files was 4.2 percent.
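As a rough check on the figures above, the sketch below confirms that the interview counts reconcile and compares the stated margins with what a simple random sample of the same size would give at 95 percent confidence and p = 0.5. The squared ratio is the implied design effect. This is an illustration only: the source does not say how the margins were computed, and the function names are hypothetical.

```python
import math

def srs_margin(n, p=0.5, z=1.96):
    """Margin of error, in percentage points, for a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

def implied_design_effect(stated_margin, n):
    """Design effect implied by a stated margin, ASSUMING the stated margin
    equals the simple-random-sample margin times sqrt(design effect)."""
    return (stated_margin / srs_margin(n)) ** 2

# Counts and margins are taken from the paragraph above.
probability_n, nonprobability_n, amerispeak_n = 40_692, 93_324, 4_913
voters_n, nonvoters_n = 116_792, 22_137

# The three interview sources add up to voters plus nonvoters.
assert probability_n + nonprobability_n + amerispeak_n == voters_n + nonvoters_n

print(round(srs_margin(voters_n), 2), round(implied_design_effect(0.8, voters_n), 1))
print(round(srs_margin(nonvoters_n), 2), round(implied_design_effect(1.8, nonvoters_n), 1))
```

Under these assumptions the stated margins are several times wider than the simple-random-sample margins, which is what one would expect from a heavily weighted, blended sample.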
VoteCast State Surveys
In 25 states, VoteCast is based on roughly 1,000 probability-based interviews conducted online and via phone, and roughly 3,000 nonprobability interviews conducted online. In these states, the margin of sampling error is estimated to be plus or minus 3.5 percentage points for voters and 8.8 percentage points for nonvoters.
In 25 additional states, VoteCast is based on between 475 and 1,000 nonprobability interviews conducted online. In these states, the margin of sampling error is estimated to be plus or minus 8.7 percentage points for voters and 19.2 percentage points for nonvoters.
Methodology
In each of the 25 states in which VoteCast includes a probability-based sample, NORC obtained a sample of registered voters from Catalist LLC’s registered voter database. This database includes demographic information, as well as addresses and phone numbers for registered voters, allowing potential respondents to be contacted via mail and telephone. The sample was stratified by state, partisanship, age and race. In addition, NORC attempted to match sampled records to a registered voter database maintained by L2, which provided additional phone numbers and demographic information. After the matching, NORC had phone numbers for 86 percent of sampled records, including cell phone numbers for 60 percent of records with a phone number. Prior to dialing, all probability sample records are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Postcards are addressed by name to the sampled registered voter if that individual is under age 35; postcards are addressed to “registered voter” in all other cases. Telephone interviews are conducted with the adult who answers the phone. Both online and telephone respondents provided confirmation of registered voter status in the state.
Nonprobability Sample
Nonprobability participants were provided via the Harris Panel, including members of its third-party panels. Digital fingerprint software and panel-level ID validation are used to prevent respondents from completing the VoteCast survey multiple times. Nonprobability respondents provided confirmation of registered voter status in the state.
AmeriSpeak Sample
During the initial recruitment phase of the AmeriSpeak panel, randomly selected U.S. households were sampled with a known, non-zero probability of selection from the NORC National Sample Frame and then contacted by U.S. mail, email, telephone and field interviewers (face-to-face). The panel provides sample coverage of approximately 97 percent of the U.S. household population. Those excluded from the sample include people with P.O. Box-only addresses, some addresses not listed in the USPS Delivery Sequence File and some newly constructed dwellings. AmeriSpeak panelists provided confirmation of registered voter status in the state.
Weighting
VoteCast employs a four-step weighting approach that combines the probability sample with the nonprobability sample, and refines estimates at a subregional level within each state. The 50 state surveys and the AmeriSpeak survey are weighted separately and then combined into a survey representative of voters in all 50 states.
State Surveys
First, weights are constructed separately for the probability sample (when available) and the nonprobability sample for each state survey. These weights are adjusted to population totals to correct for demographic imbalances of the responding sample compared to the population of registered voters in each state. The adjustment targets are derived from a combination of data from the U.S. Census Bureau’s November 2016 Current Population Survey Voting and Registration Supplement, Catalist’s voter file and the Census Bureau’s 2017 American Community Survey. The variables used were:
- Sex (male, female)
- Age (18-34, 35-64, 65+)
- Race/ethnicity (Hispanic, NH-White, NH-Black, All Other)
- Education (less than high school/high school grad, some college, 4-year college grad, post-graduate)
- Age * race/ethnicity (18-34, 35-54, 55+ * NH-White, All Other)
- Education * race/ethnicity (less than HS/HS grad, some college, 4-year college grad+ * NH-White, All Other)
- Partisanship model score (strong Republican, lean Republican, lean Democrat, strong Democrat); probability sample only
- Income (<= 25K, 25-50K, 50-75K, 75-100K, 100+K); nonprobability sample only
- County grouping using AP’s party grouping (variable “AP_PARTY_REGION”); nonprobability sample only
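Adjusting weights to several sets of population totals at once is commonly done by raking (iterative proportional fitting). The source does not name the algorithm NORC used, so the sketch below is an assumption. It uses only two of the listed variables, and the respondents and target shares are made up for illustration.

```python
def rake(rows, targets, weight_key="weight", max_iter=100, tol=1e-9):
    """Iteratively scale weights so weighted shares match each target margin.

    rows    -- list of dicts, one per respondent (weights are updated in place)
    targets -- {variable: {category: target share}}; shares sum to 1 per variable
    """
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(r[weight_key] for r in rows)
            # Current weighted total in each category of this variable.
            current = {cat: 0.0 for cat in shares}
            for r in rows:
                current[r[var]] += r[weight_key]
            # Scale every respondent so the category hits its target total.
            for r in rows:
                factor = shares[r[var]] * total / current[r[var]]
                r[weight_key] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # every margin already matches; stop early
            break
    return rows

def weighted_share(rows, var, cat, weight_key="weight"):
    """Weighted share of respondents whose `var` equals `cat`."""
    total = sum(r[weight_key] for r in rows)
    return sum(r[weight_key] for r in rows if r[var] == cat) / total

# Hypothetical respondents (one per sex-by-age cell) and hypothetical targets.
respondents = [
    {"sex": s, "age": a, "weight": 1.0}
    for s in ("male", "female")
    for a in ("18-34", "35-64", "65+")
]
hypothetical_targets = {
    "sex": {"male": 0.48, "female": 0.52},
    "age": {"18-34": 0.25, "35-64": 0.50, "65+": 0.25},
}
rake(respondents, hypothetical_targets)
print(round(weighted_share(respondents, "sex", "female"), 3))
print(round(weighted_share(respondents, "age", "35-64"), 3))
```

Interaction terms such as age by race/ethnicity can be handled the same way by treating each combined cell as one category of an additional variable.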
Second, all non-probability sample respondents receive a calibration weight. The calibration weight is designed to ensure the non-probability sample is similar to a probability sample in regard to variables that are predictive of vote choice that cannot be fully captured through the prior demographic adjustments. The calibration benchmarks are based on county-level estimates from a multilevel regression and poststratification model that incorporates all probability and non-probability cases nationwide. A national-level logistic regression model was fitted using data from all states (both probability and non-probability samples) and AmeriSpeak to make predictions for registered voters at the state level for Party ID (Democrat, Independent, Republican) and Country on Right/Wrong Track. These state-level predicted estimates are used as calibration benchmarks for the non-probability sample for all states. For Party ID, separate models were fitted for predicting the proportion of Democrats and proportion of Republicans. In addition, five separate models were fitted based on how the county voted in the 2016 Presidential election (i.e., based on % Trump vote for county/town). Models included the following individual-level variables and county/town-level variables:
- Flag for 18-34 year old registered voter
- Flag for 65+ year old registered voter
- Flag for female registered voter
- Flag for voting for Trump in 2016 Presidential election
- Proportion of non-Hispanic non-White in county/town
- Proportion 25+ years who are college educated in county/town
- Population density in county/town
- Median household income in county/town
Third, estimates are refined at a subregional level within each state. For each state, there were two models: 1) predicting percent of vote share that goes for either of the two major parties’ candidates, 2) predicting percent of major party vote share that goes for the Democratic/Republican candidate. The following variables were used as potential covariates in the model: 2016 Presidential election results, population density, median income, percent below poverty line, percent unemployed, percent college degree, portion on public assistance, percent insurance coverage, percent nonwhite, percent citizen, percent 18-34 years old, percent 65 and older, and percent who have not moved in last year. For each state, we included in the models: 1) the 2016 presidential vote choice, and based on model fit, 2) a measure of socioeconomic status, 3) at least one demographic or geographic measure.
Fourth, the survey results are weighted to the actual vote count following the completion of the election. This weighting is done in 8-30 sub-state regions within each state.
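In its simplest form, weighting to the vote count is a post-stratification within each region: weights are scaled so that the weighted vote shares equal the counted vote shares. The sketch below is a minimal version under stated assumptions: it covers one contest and voters only, the field names (`region`, `vote`, `weight`) are hypothetical, and the counts are made up. The source does not describe the actual procedure at this level of detail.

```python
from collections import defaultdict

def weight_to_vote_count(rows, actual_votes):
    """Scale weights so that, within each region, the weighted share for each
    candidate equals that candidate's share of the counted vote.

    actual_votes -- {region: {candidate: counted votes}}
    Returns new row dicts; the input rows are left unchanged.
    """
    weighted = defaultdict(float)      # (region, candidate) -> weighted total
    region_total = defaultdict(float)  # region -> weighted total
    for r in rows:
        weighted[(r["region"], r["vote"])] += r["weight"]
        region_total[r["region"]] += r["weight"]

    adjusted = []
    for r in rows:
        counts = actual_votes[r["region"]]
        target_share = counts[r["vote"]] / sum(counts.values())
        survey_share = weighted[(r["region"], r["vote"])] / region_total[r["region"]]
        adjusted.append({**r, "weight": r["weight"] * target_share / survey_share})
    return adjusted

# Hypothetical survey rows and hypothetical counted votes for two regions.
survey = [
    {"region": "north", "vote": "Dem", "weight": 1.0},
    {"region": "north", "vote": "Dem", "weight": 1.0},
    {"region": "north", "vote": "Dem", "weight": 1.0},
    {"region": "north", "vote": "Rep", "weight": 1.0},
    {"region": "south", "vote": "Dem", "weight": 2.0},
    {"region": "south", "vote": "Rep", "weight": 2.0},
]
actual = {"north": {"Dem": 600, "Rep": 400}, "south": {"Dem": 450, "Rep": 550}}
adjusted = weight_to_vote_count(survey, actual)

north = [r for r in adjusted if r["region"] == "north"]
north_dem_share = (
    sum(r["weight"] for r in north if r["vote"] == "Dem")
    / sum(r["weight"] for r in north)
)
print(round(north_dem_share, 3))  # 0.6, the hypothetical counted share
```

Each region's total weight is unchanged by the adjustment; only the balance between candidates within the region moves.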
National Survey
The national survey is weighted to combine the 50 state surveys with the nationwide AmeriSpeak survey. Each of the state surveys is weighted as described. The AmeriSpeak survey receives a nonresponse-adjusted weight that is then adjusted to national totals for registered voters derived from the U.S. Census Bureau’s November 2016 Current Population Survey Voting and Registration Supplement, the Catalist voter file and the Census Bureau’s 2017 American Community Survey. The state surveys are further adjusted to represent their appropriate proportion of the registered voter population for the country and combined with the AmeriSpeak survey. After all votes are counted, the national data file is adjusted to match the national vote for members of the U.S. House of Representatives within each state.
Using Weights
AP VoteCast is designed to be analyzed using weighted data. The data file includes different weights for different types of analyses.
- To produce estimates at the state level (e.g., percent of Californians who approve of President Trump), the state weights should be used.
- To produce estimates at the national level (e.g., the percent of registered voters nationwide who voted for a Democratic candidate for the House), the national-level weights should be used.
- The FINALVOTE weights should be used to produce estimates that are adjusted to reflect the final vote counts in addition to demographic, geographic, and calibration adjustments. Certified vote count data was provided by AP. AP VoteCast recommends using these weights for most analyses.
- The POLLCLOSE weights can be used to produce estimates prior to any adjustments to final vote counts. These weights are provided for transparency of the methodology to permit comparison of the survey’s estimates at poll close but prior to adjusting the survey outcome to match the final vote count.
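A weighted estimate is the sum of the weights for respondents in a category divided by the sum of the weights for all respondents with a valid answer. The sketch below shows that calculation with made-up rows. The column names `approve` and `state_weight` are placeholders, not names from the data file; consult the codebook in the .zip file for the actual weight variables.

```python
def weighted_percent(rows, variable, category, weight_key):
    """Weighted percentage of respondents whose `variable` equals `category`.

    Rows with a missing answer (None) are excluded from the base.
    """
    valid = [r for r in rows if r.get(variable) is not None]
    base = sum(r[weight_key] for r in valid)
    if base == 0:
        raise ValueError("no weighted respondents with a valid answer")
    hits = sum(r[weight_key] for r in valid if r[variable] == category)
    return 100 * hits / base

# Made-up rows; column names are placeholders.
sample = [
    {"approve": "yes", "state_weight": 0.5},
    {"approve": "no", "state_weight": 1.5},
    {"approve": "yes", "state_weight": 1.0},
    {"approve": None, "state_weight": 2.0},
]
print(weighted_percent(sample, "approve", "yes", "state_weight"))  # 50.0
```

In this toy data the unweighted share answering "yes" is two of three valid responses, while the weighted share is 50 percent, which shows why unweighted tabulations of the file can mislead.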
This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.