Name                                File Type              Size     Last Modified
Major_Forecasting_ICPSROPEN.zip     application/zip        41.8 KB  07/17/2022 11:11:PM
README.Rmd                          text/plain             1.3 KB   08/12/2022 02:17:PM
data_availability_statement.html    application/xhtml+xml  5.4 KB   08/12/2022 02:14:PM

Project Citation: 

Lang, David, Wang, Alexander, Dalal, Nathan, Paepcke, Andreas, and Stevens, Mitchell. Forecasting Undergraduate Majors: A Natural Language Approach (AERA OPEN). Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2022-08-12. https://doi.org/10.3886/E175541V1

Project Description

Summary: This repository contains the code for the AERA Open major forecasting paper.

Abstract: Commitment to a major is a fateful step in an undergraduate education, yet the relationship between courses taken early in an academic career and ultimate major issuance remains little studied at scale. Using transcript data capturing the academic careers of 26,892 undergraduates enrolled at a private university between 2000 and 2020, we describe enrollment histories using natural-language methods and vector embeddings to forecast terminal major on the basis of course sequences beginning at college entry. We find that (I) a student's very first enrolled course predicts their major thirty times better than random guessing and more than a third better than majority-class voting, (II) modeling strategies substantially influence forecasting metrics, and (III) course portfolios vary substantially within majors, such that students with the same major exhibit relatively modest overlap.
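
The two baselines named in finding (I), random guessing and majority-class voting, can be illustrated with a minimal sketch. This is not the authors' code or method: the paper uses vector embeddings, whereas this sketch swaps in a simple frequency lookup, and the records, course labels, and function names below are hypothetical. Accuracy is computed in-sample on made-up toy data, so the numbers say nothing about the paper's results.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (first enrolled course, terminal major).
# These are invented for illustration; the real transcript data are restricted.
records = [
    ("COURSE_1", "Major A"),
    ("COURSE_1", "Major A"),
    ("COURSE_1", "Major B"),
    ("COURSE_2", "Major C"),
    ("COURSE_2", "Major A"),
    ("COURSE_3", "Major B"),
    ("COURSE_3", "Major B"),
    ("COURSE_4", "Major D"),
]


def random_guess_accuracy(records):
    """Expected accuracy of guessing uniformly among the observed majors."""
    majors = {major for _, major in records}
    return 1 / len(majors)


def majority_class_accuracy(records):
    """Accuracy of always predicting the single most common major."""
    counts = Counter(major for _, major in records)
    return counts.most_common(1)[0][1] / len(records)


def first_course_accuracy(records):
    """In-sample accuracy of predicting, for each first course,
    the most common major among students who started with that course."""
    by_course = defaultdict(Counter)
    for course, major in records:
        by_course[course][major] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_course.values())
    return correct / len(records)


if __name__ == "__main__":
    print("random guess  :", random_guess_accuracy(records))
    print("majority class:", majority_class_accuracy(records))
    print("first course  :", first_course_accuracy(records))
```

A real evaluation would hold out students for testing rather than scoring on the same records used to build the lookup.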


Because the data contain personally identifiable information (PII), and to protect the underlying institution, the data are not available in this repository.
To request the data, contact Mitchell Stevens (stevens4@stanford.edu).

See the README in the code folder for additional details on how to run the code.

Scope of Project

Subject Terms: major forecasting; higher education; natural language processing; word2vec; course2vec


Request Information

This material is sensitive in nature and is available as restricted data through ICPSR. Users are required to apply for access, will be required to pay a fee, and will experience a wait time before access is given. The material will be distributed exactly as it arrived from the data depositor. ICPSR does not check or process the material.

This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.