Jan 1, 2006

How does your archive prepare a data collection for public release?

Our staff have established a series of steps for acquiring and archiving each new data set. Each study is assessed for respondent confidentiality issues, and checks are made for problems arising from either direct or indirect identification. In addition, the technical characteristics of the documentation are verified against the data to ensure that the data and documentation match. For intensively processed studies, all variables are examined to ensure that each is thoroughly identified and labeled. When variables are not thoroughly described, our staff consult the documentation and/or questionnaires. In some cases it may be necessary to contact the principal investigator(s) to remedy problems uncovered during the review of the data. Data definition statements for SPSS and SAS are prepared for all data sets, and value labels are often added when they were not part of the files as received. After the initial processing is completed, further quality checks are made. For example, the observed frequencies are verified against the reported frequencies, and survey responses are sometimes checked for consistency with each other and with the skip patterns. Data files are also reformatted to the smallest possible size to speed file transfer over the Internet.
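
The frequency and skip-pattern checks mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the archive's actual processing software. It assumes the reported frequencies have already been transcribed from the documentation into a dictionary of code to count, that each record is a dictionary keyed by variable name, and that skipped questions carry a single "inapplicable" code. The function names, the variables EMPLOYED and HOURS, and the code -4 are all hypothetical.

```python
from collections import Counter


def frequency_mismatches(values, reported):
    """Compare observed counts in the data with counts reported in the documentation.

    values   -- the coded values of one variable, one per case
    reported -- {code: count} as transcribed from the codebook (an assumption)
    Returns {code: (observed, reported)} for every code whose counts differ,
    including codes that appear in only one of the two sources.
    """
    observed = Counter(values)
    codes = sorted(set(observed) | set(reported))
    return {
        code: (observed.get(code, 0), reported.get(code, 0))
        for code in codes
        if observed.get(code, 0) != reported.get(code, 0)
    }


def skip_pattern_violations(records, gate, gate_values, follow_up, inapplicable):
    """Return indices of records that break a simple skip rule.

    If a record's gate answer is in gate_values, the follow-up question should
    have been asked, so it must not hold the inapplicable code; otherwise the
    follow-up should have been skipped, so it must hold that code.
    """
    violations = []
    for i, record in enumerate(records):
        should_ask = record[gate] in gate_values
        was_skipped = record[follow_up] == inapplicable
        # Asked-but-skipped and not-asked-but-answered are both inconsistent.
        if should_ask == was_skipped:
            violations.append(i)
    return violations


if __name__ == "__main__":
    # Hypothetical responses: codes 1-5, with -1 = Refusal.
    values = [1, 2, 2, 3, -1, 5]
    reported = {1: 1, 2: 2, 3: 1, 4: 1, 5: 1, -1: 1}
    print(frequency_mismatches(values, reported))  # {4: (0, 1)}

    # Hypothetical skip rule: HOURS is asked only when EMPLOYED == 1,
    # and -4 marks "inapplicable" for cases that skipped it.
    records = [
        {"EMPLOYED": 1, "HOURS": 40},
        {"EMPLOYED": 2, "HOURS": -4},
        {"EMPLOYED": 2, "HOURS": 20},  # answered a question that should be skipped
    ]
    print(skip_pattern_violations(records, "EMPLOYED", {1}, "HOURS", -4))  # [2]
```

Real skip logic is often nested across several questions; a rule-by-rule check like this only covers the simplest one-gate case.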

How do I obtain data from the Restricted Access Archive?

In order to ensure respondent confidentiality, certain sensitive data collections are available only through a restricted access archive. While these data are still available free of charge, users must specifically request these collections from the archive. More information about this process and the necessary Data Transfer Agreement Form can be found in our Restricted Use Archive.

Why are no data files listed for certain data collections?

If you find a data collection that you would like to download but are offered only the documentation and no data, read the README or Abstract files for information about how to obtain the data. To ensure respondent confidentiality, certain sensitive data collections are available only through the restricted access archive. These data are still free of charge, but users must request them specifically, as described under "How do I obtain data from the Restricted Access Archive?" More information about this process and the necessary Data Transfer Agreement Form can be found in our Restricted Use Archive.

I don't want to run my own statistics. Where can I get reports or statistics that have already been generated?

The National Archive of Criminal Justice Data (NACJD) does not generally archive, produce, or distribute published reports, charts, or other analyses based on its data holdings. Users interested in published reports can search our online publications bibliography for publications related to NACJD data collections. Many of the publications listed in the online database are available in paper or electronic form from the National Criminal Justice Reference Service. Another online source for compiled statistics is the Sourcebook of Criminal Justice Statistics, which brings together statistical information about all aspects of criminal justice in the United States. Additionally, you may want to try our online data analysis system, which lets you run certain statistical procedures on several NACJD studies, create custom subsets, or browse the codebook online, without downloading the entire collection and importing the data into a statistical package.

Where can I find technical support for using statistical packages?

General Statistical Resources

Statistical Packages Commonly Used with NACJD Data

SAS

SPSS

Stata

Other Statistical Packages

Does your archive keep the original version of the files that I submit for archiving?

Yes. The archive keeps several copies of the original files as submitted by the data depositor, including copies stored offsite. If you are in need of these files, please contact deposit@icpsr.umich.edu.

Once I've deposited my data, I do not want changes made to my data collection without my permission. Will you contact me prior to altering the files?

For most data collections, the archive distributes data and documentation in essentially the same form in which they were received. When appropriate, documentation is converted to Portable Document Format (PDF), data files are converted to non-platform-specific formats, and variables are recoded to ensure respondents' anonymity. Our staff will generally contact you regarding any suggested changes after an initial assessment of your data collection. Regardless of changes made, the archive keeps several copies of all files in the form in which they were submitted.

What is the preferred submission format for data files?

ASCII, SAS Transport, SPSS Portable, and Stata data files are all accepted, although the non-ASCII formats are preferred.

I prefer not to submit my data collection via the Electronic Deposit Form. Do I have other options?

Yes. ICPSR welcomes data collections submitted on CD-ROM or DVD. Removable media can be sent to the following addresses:

U.S. Mail:
ICPSR Acquisitions
P.O. Box 1248
Ann Arbor, MI 48106

UPS, FedEx, etc.:
ICPSR Acquisitions
Institute for Social Research
426 Thompson Street
Ann Arbor, MI 48104-2321

Additional information regarding contacting ICPSR may be found online.

Please note that ICPSR now discourages the submission of data via email.

When will I know if my data collection will be archived?

ICPSR staff will contact you immediately to confirm the receipt of all materials submitted and to confirm the processing plan.

How do I deposit data in the ICPSR archive?

ICPSR has published a Guide to Social Science Data Preparation and Archiving, 4th Edition, to assist data producers in preparing their collections for deposit in a public archive. The document offers guidelines and suggestions that should be useful for anyone engaged in the creation of a dataset. Individuals interested in depositing data should use the ICPSR Data Deposit Form, which supplies essential information to staff in the ICPSR Data Archive and allows for secure transmission of files via a Web browser. For more detailed information, see Deposit Data or contact ICPSR staff at deposit@icpsr.umich.edu.

What is a codebook?

A codebook describes the contents, structure, and layout of a data collection. A well-documented codebook "contains information intended to be complete and self-explanatory for each variable in a data file" [1].

Codebooks begin with basic front matter, including the study title, the name of the principal investigator(s), a table of contents, and an introduction describing the purpose and format of the codebook. Some codebooks also include methodological details, such as how weights were computed, and the data collection instruments, while others, especially those for larger or more complex data collections, leave those details to a separate user guide and/or data collection instrument.

The main body of a codebook contains unambiguous variable-level details. As shown in the example below from the National Longitudinal Survey of Youth, 1979 [2], these include the following:


[Example codebook entry: Assessment of R's General Health]

  • Variable name: The name or number assigned to each variable in the data collection. Some researchers prefer to use mnemonic abbreviations (e.g., EMPLOY1), while others use alphanumeric patterns (e.g., VAR001). For survey data, try to name variables after the question numbers, e.g., Q1, Q2b. [In the example above, H40-SF12-2]
  • Variable label: A brief description to identify the variable for the user. Where possible, use the exact question or research wording. ["SF12 - ASSESSMENT OF R'S GENERAL HEALTH"]
  • Question text: Where applicable, the exact wording from survey questions. ["In general, would you say your health is . . ."]
  • Values: The actual coded values in the data for this variable. [1, 2, 3, 4, 5]
  • Value labels: The textual descriptions of the codes. [Excellent, Very Good, Good, Fair, Poor]
  • Summary statistics: Where appropriate and depending on the type of variable, provide unweighted summary statistics for quick reference. For categorical variables, for instance, frequency counts showing the number of times a value occurs and the percentage of cases that value represents for the variable are appropriate. For continuous variables, minimum, maximum, and median values are relevant.
  • Missing data: Where applicable, the values and labels of missing data. Because missing data can bias an analysis, they must be conveyed clearly in the study documentation. Remember to describe all missing codes, including "system missing" and blank. [e.g., Refusal (-1)]
  • Universe and skip patterns: Where applicable, information about the population to which the variable refers, as well as the preceding and following variables. [e.g., Default Next Question: H00035.00]
  • Notes: Additional notes, remarks, or comments that contextualize the information conveyed in the variable or relay special instructions. For measures or questions from copyrighted instruments, the notes field is the appropriate location to cite the source.
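
The fields listed above map naturally onto a small record type. The Python sketch below is one hypothetical way to hold a codebook entry and to produce the unweighted summary statistics described under "Summary statistics": counts and percentages for a categorical variable, and minimum, maximum, and median for a continuous one. The class and function names are illustrative assumptions. The field values come from the NLSY79 example above, but the ten responses are made up for demonstration and are not drawn from the actual data.

```python
import statistics
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CodebookVariable:
    """One codebook entry holding the variable-level fields listed above."""
    name: str
    label: str
    question_text: str = ""
    value_labels: dict = field(default_factory=dict)   # {code: label}
    missing_codes: dict = field(default_factory=dict)  # {code: label}
    universe: str = ""
    notes: str = ""


def frequency_table(var, values):
    """Unweighted counts and percentages for a categorical variable.

    Rows follow value_labels, then missing_codes; percentages are taken over
    all cases, including missing ones, and rounded to one decimal place.
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    for code, label in {**var.value_labels, **var.missing_codes}.items():
        n = counts.get(code, 0)
        pct = round(100.0 * n / total, 1) if total else 0.0
        rows.append((code, label, n, pct))
    return rows


def continuous_summary(values, missing_codes):
    """Minimum, maximum, and median of a continuous variable, excluding missing codes."""
    valid = [v for v in values if v not in missing_codes]
    if not valid:
        return None
    return {"min": min(valid), "max": max(valid), "median": statistics.median(valid)}


def render(var, values):
    """Format a plain-text codebook entry for a categorical variable."""
    lines = [f"{var.name}  {var.label}"]
    if var.question_text:
        lines.append(f"  Question: {var.question_text}")
    for code, label, n, pct in frequency_table(var, values):
        lines.append(f"  {code:>3}  {label:<10} {n:>4}  {pct:5.1f}%")
    if var.universe:
        lines.append(f"  Universe: {var.universe}")
    if var.notes:
        lines.append(f"  Notes: {var.notes}")
    return "\n".join(lines)


if __name__ == "__main__":
    health = CodebookVariable(
        name="H40-SF12-2",
        label="SF12 - ASSESSMENT OF R'S GENERAL HEALTH",
        question_text="In general, would you say your health is . . .",
        value_labels={1: "Excellent", 2: "Very Good", 3: "Good", 4: "Fair", 5: "Poor"},
        missing_codes={-1: "Refusal"},
        universe="Default Next Question: H00035.00",
    )
    # Hypothetical responses for ten cases, not real NLSY79 data.
    responses = [1, 2, 2, 3, 3, 3, 4, 5, -1, 2]
    print(render(health, responses))
```

Listing missing codes in the same table as valid codes follows the advice above to describe every missing code; whether percentages should include or exclude missing cases is a design choice that a real codebook should state explicitly.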

For variables that are compiled, created, or constructed, such as the examples below from the Aging of Veterans of the Union Army: Military, Pension, and Medical Records, 1820-1940 [3] study and the Welfare, Children, and Families: A Three-City Study [4], fewer details are needed: the variable name and label, plus a description of how the data were compiled or created.


[Example entries for constructed variables: Siblings; Illegal Activities]
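
For a constructed variable, one way to keep the codebook description and the data in step is to store the plain-language derivation next to the rule that computes it. The Python sketch below is hypothetical throughout: the variable ANYILLEG, its label, and the source items Q12A-Q12C (coded 1 = Yes, 2 = No) are invented for illustration and are not taken from the Three-City Study or any other collection cited here.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class ConstructedVariable:
    """Name, label, and derivation: the details a codebook needs for a constructed variable."""
    name: str
    label: str
    derivation: str                    # plain-language description printed in the codebook
    compute: Callable[[Mapping], int]  # the rule that the derivation text describes

    def codebook_entry(self) -> str:
        return f"{self.name}  {self.label}\n  Derivation: {self.derivation}"


# Hypothetical source items, each coded 1 = Yes, 2 = No.
SOURCE_ITEMS = ("Q12A", "Q12B", "Q12C")

ANYILLEG = ConstructedVariable(
    name="ANYILLEG",
    label="Any illegal activity reported",
    derivation="1 if any of Q12A-Q12C equals 1 (Yes); otherwise 0.",
    compute=lambda record: int(any(record[item] == 1 for item in SOURCE_ITEMS)),
)

if __name__ == "__main__":
    print(ANYILLEG.compute({"Q12A": 2, "Q12B": 1, "Q12C": 2}))  # 1
    print(ANYILLEG.codebook_entry())
```

Keeping the description and the rule in one object makes it harder for the two to drift apart when the construction is revised.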

The order of variable descriptions in the codebook usually matches the order of the variables in the data file. To make complex or large data collections easier to use, researchers sometimes add appendices listing variable names and labels alphabetically, by sample characteristic, or by the substantive groups to which they belong, e.g., Demographic Variables or Health Status Variables. These appendices help users locate variables of interest.
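
Index appendices of the kind just described can be generated from the same variable list that drives the codebook. The Python sketch below is a minimal, hypothetical example: it assumes variables are available as (name, label) pairs in data-file order and that each name has been assigned to a substantive group by hand. The variable names, labels, and group assignments are invented for illustration.

```python
from collections import defaultdict


def alphabetical_index(variables):
    """Sort (name, label) pairs by variable name, ignoring case."""
    return sorted(variables, key=lambda pair: pair[0].lower())


def grouped_index(variables, group_of):
    """Group (name, label) pairs by substantive group.

    Groups appear in the order they are first encountered, and variables keep
    their data-file order within each group.
    """
    groups = defaultdict(list)
    for name, label in variables:
        groups[group_of[name]].append((name, label))
    return dict(groups)


def render_index(groups):
    """Plain-text appendix: one heading per group, then its variables."""
    lines = []
    for group, members in groups.items():
        lines.append(group)
        lines.extend(f"  {name:<10} {label}" for name, label in members)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical variables, listed in data-file order.
    variables = [
        ("SEX", "Sex of respondent"),
        ("GENHLTH", "General health rating"),
        ("AGE", "Age in years"),
    ]
    group_of = {
        "SEX": "Demographic Variables",
        "AGE": "Demographic Variables",
        "GENHLTH": "Health Status Variables",
    }
    print([name for name, _ in alphabetical_index(variables)])  # ['AGE', 'GENHLTH', 'SEX']
    print(render_index(grouped_index(variables, group_of)))
```

Preserving data-file order inside each group keeps the appendix easy to cross-check against the main body of the codebook.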

Codebooks come in a variety of shapes and formats. As long as the content is complete and self-explanatory, the stylistic touches can match the needs of the research project.


Additional Examples

Below are additional examples of variable level details from a wide variety of research codebooks.

American National Election Study, 2008-2009 Panel Study [5]

[Example codebook entry: Does R like or dislike Joe Biden]

National Longitudinal Study of Adolescent Health (Add Health), 1994-1995 [6]

General Social Surveys, 1972-2008 [7]

National Survey on Drug Use and Health, 2009 [8]

Capital Punishment in the United States, 1973-2008 [9]

Resources

UK Data Archive, "Documenting Your Data/Data Level/Structured Tabular Data"

http://www.data-archive.ac.uk/create-manage/document/data-level?index=1

Institute for Health and Care Research Quality Handbook

http://www.emgo.nl/kc/codebook/

Princeton University Data and Statistical Services, "How to Use a Codebook"

http://dss.princeton.edu/online_help/analysis/codebook.htm

UCLA Social Science Data Archive, "Codebooks"

http://dataarchives.ss.ucla.edu/tutor/tutcode.htm



References


[1] Guide to the NLSY97 Data. Retrieved August 1, 2011, from http://www.nlsinfo.org/nlsy97/97guide/chap3.htm#threethree

[2] Ohio State University. Center for Human Resource Research. National Longitudinal Survey of Youth, 1979 [Computer file]. ICPSR04683-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2007-09-17. doi:10.3886/ICPSR04683

[3] Fogel, Robert W., et al. Aging of Veterans of the Union Army: Military, Pension, and Medical Records, 1820-1940 [Computer file]. ICPSR06837-v6. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2006-06-05. doi:10.3886/ICPSR06837

[4] Angel, Ronald, Linda Burton, P. Lindsay Chase-Lansdale, Andrew Cherlin, and Robert Moffitt. Welfare, Children, and Families: A Three-City Study [Computer file]. ICPSR04701-v7. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-02-10. doi:10.3886/ICPSR04701

[5] American National Election Study, 2008-2009 Panel Study Frequency codebook, version 20090903. Retrieved August 1, 2011, from http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_fcodebook.txt

[6] National Longitudinal Study of Adolescent Health (Add Health), Wave I School Administrator Codebook. Retrieved August 1, 2011, from http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html

[7] Davis, James A., Tom W. Smith, and Peter V. Marsden. General Social Surveys, 1972-2008 [Cumulative File] [Computer file]. ICPSR25962-v2. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-02-08. doi:10.3886/ICPSR25962

[8] United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2009 [Computer file]. ICPSR29621-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-16. doi:10.3886/ICPSR29621

[9] United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. Capital Punishment in the United States, 1973-2008 [Computer file]. ICPSR27982-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-09-07. doi:10.3886/ICPSR27982