Jun 26, 2009

How does ICPSR manage versioning?

  • What triggers a new edition or version of a study?

    • A change in any of the data and/or documentation files.

    • The addition or withdrawal of data and/or documentation files.

  • How and where is such a change to a study documented?

    In the metadata record:

    • Version field (using the notation "ICPSRXXXXX-v3")

    • Version history field (collect.changes), which provides a text description of what has changed along with a datestamp

    • Citation display includes the version statement

  • What happens to unchanged files (if changes don't apply to all files)?

    ICPSR does not currently version at the individual file level; our version statement refers to the collection as a whole. If only one file in a multiple-file collection changes, the version of the entire collection changes.

  • Are previous editions/versions kept?

    Yes, through a back-up system and a searchable 'browse archive' feature available to authorized staff.

  • Are these made available to users?

    Previous versions can be made available to users, but only upon request.

  • Clarification on terminology: do we use 'edition', 'version', or other terms?

    • ICPSR uses 'version' exclusively. (Historically, ICPSR used three different terms: edition, version, and release, but these have all been rolled into the single term "version" and the notation "ICPSRXXXXX-v3").

    • What do we mean by "version"? A form or variant of the original ICPSR-archived data collection.
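Because every version is written in the same notation, an identifier can be read mechanically. The following minimal Python sketch splits an identifier such as "ICPSR24588-v1" into its study number and version number. The function names `parse_version` and `is_newer` are hypothetical, and the assumption that study numbers are always five digits is drawn only from the examples on this page, not from any ICPSR specification.

```python
import re

# Assumption: identifiers follow the "ICPSRXXXXX-vN" notation described above,
# with a five-digit study number (kept as a string to preserve leading zeros).
VERSION_PATTERN = re.compile(r"ICPSR(\d{5})-v(\d+)")

def parse_version(identifier):
    """Split an identifier like "ICPSR08863-v2" into ("08863", 2)."""
    match = VERSION_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not in ICPSRXXXXX-vN form: {identifier!r}")
    return match.group(1), int(match.group(2))

def is_newer(identifier_a, identifier_b):
    """True if identifier_a is a later version of the same collection as identifier_b."""
    study_a, version_a = parse_version(identifier_a)
    study_b, version_b = parse_version(identifier_b)
    if study_a != study_b:
        raise ValueError("identifiers refer to different collections")
    # Compare numerically so that v10 counts as newer than v9.
    return version_a > version_b
```

Since versions apply to the collection as a whole, comparing two identifiers tells you whether the collection changed, not which file changed.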

Jun 23, 2009

Why and how should I cite data?

Why should I cite data?

Citing data files in publications based on those data is important for several reasons:

  • Other researchers may want to replicate research findings and need the bibliographic information provided in citations to identify and locate the referenced data.

  • Citations appearing in publication references are harvested by key electronic social science indexes, such as Web of Science, providing credit to the researchers.

  • Data producers, funding agencies, and others can track citations to specific collections to determine types and levels of usage, thus measuring impact.

Where do I find the citation?

Citations for ICPSR data can be found in the following locations:

  1. Study descriptions that appear on the Web site
  2. File manifest
  3. PDF study description file

Both the file manifest and the PDF study description file are automatically included with every download. Thus, every download is accompanied by a copy of the standard citation that can be copied and pasted with ease.

What do the citations look like?

Here are some examples:

ABC News, and The Washington Post. ABC News/Washington Post Poll, May 2007 [Computer file]. ICPSR24588-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-04-17. doi:10.3886/ICPSR24588

United States Department of Commerce. Bureau of the Census, and United States Department of Labor. Bureau of Labor Statistics. Current Population Survey: Annual Demographic File, 1987 [Computer file]. ICPSR08863-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-02-03. doi:10.3886/ICPSR08863

Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O'Malley, and John E. Schulenberg. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2007 [Computer file]. ICPSR22480-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2008-10-29. doi:10.3886/ICPSR22480

Hall, David, Clement Leduka, Michael Bratton, E. Gyimah-Boadi, and Robert Mattes. Afrobarometer Round 3: The Quality of Democracy and Governance in Lesotho, 2005 [Computer file]. ICPSR22203-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-05-19. doi:10.3886/ICPSR22203

Note that we also include a DOI (Digital Object Identifier) at the end of each citation. A DOI is a unique persistent identifier for a published digital object, such as an article or a study, that provides a lasting link to that object. This means that if you publish an article using ICPSR data and include the DOI in the data citation, you make it easy for other researchers to get back to the original data.
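The example citations share a fixed pattern: authors, title, "[Computer file]", the versioned identifier, the distributor statement, a release date, and the DOI. The minimal Python sketch below assembles a citation in that pattern and builds a resolvable link for the DOI through the standard https://doi.org/ resolver. The function names are hypothetical, and the assumption that the DOI suffix is always "ICPSR" plus the study number is inferred from the four examples above; always prefer the citation ICPSR supplies in the file manifest.

```python
def format_citation(authors, title, study, version, release_date):
    """Assemble a citation following the pattern of the examples above.

    study: five-digit ICPSR study number as a string, e.g. "24588".
    release_date: ISO date string, e.g. "2009-04-17".
    """
    return (
        f"{authors}. {title} [Computer file]. ICPSR{study}-v{version}. "
        "Ann Arbor, MI: Inter-university Consortium for Political and Social "
        f"Research [distributor], {release_date}. doi:10.3886/ICPSR{study}"
    )

def doi_url(study):
    """Turn the DOI at the end of a citation into a clickable link."""
    return f"https://doi.org/10.3886/ICPSR{study}"
```

For example, `format_citation("ABC News, and The Washington Post", "ABC News/Washington Post Poll, May 2007", "24588", 1, "2009-04-17")` reproduces the first example citation above.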

How can I let ICPSR know about my publication?

Users of ICPSR data are required to send us bibliographic citations for each completed manuscript or thesis abstract. This allows us to provide funding agencies with essential information about use of archival resources and facilitates the exchange of information about the research activities of principal investigators.

Email bibliography@icpsr.umich.edu to submit citations for inclusion in our Bibliography.

How do I submit a citation for a publication I have written using your data?

If you have published work based on our data, or if you know of data-related literature that is not in our bibliography, please send the citation to bibliography@icpsr.umich.edu.

View the citations in the Bibliography of Data-Related Publications.

Jun 18, 2009

When I attempt to uncompress the files I downloaded from your site, WinZip complains that the file name is insensible. How can I uncompress the file?

The total path length (not file name length) has to be less than 255 characters. Our file names can be lengthy. If the path to which you wish to extract your files is also lengthy, then WinZip will fail.

Extract your files to the root directory of your hard drive. For example, extract the files to C:\ instead of C:\User\My Documents\Various Social Science Projects On Which I Work\ICPSR Data\.
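If you want to know in advance whether a download will run into this limit, you can add the destination folder's length to each file name stored in the archive. The minimal Python sketch below does that, using the 255-character figure quoted above (Windows itself commonly enforces a slightly different limit, so treat the result as a guide). The function names and the member names in the usage example are hypothetical; `ntpath` is used so that the paths are joined with Windows backslashes on any operating system.

```python
import ntpath
import zipfile

MAX_PATH_LENGTH = 255  # limit quoted above: the total path must be under 255 characters

def long_extraction_paths(extract_dir, member_names, limit=MAX_PATH_LENGTH):
    """Return (path, length) pairs for archive members whose full extracted
    path would be `limit` characters or longer."""
    problems = []
    for name in member_names:
        # Zip archives store member names with forward slashes.
        full_path = ntpath.join(extract_dir, *name.split("/"))
        if len(full_path) >= limit:
            problems.append((full_path, len(full_path)))
    return problems

def check_archive(zip_path, extract_dir):
    """Check every member of a downloaded zip file against the limit."""
    with zipfile.ZipFile(zip_path) as archive:
        return long_extraction_paths(extract_dir, archive.namelist())
```

Running `check_archive(r"C:\Downloads\study.zip", "C:\\")` (hypothetical file name) and getting an empty list means extracting to the root directory should stay within the limit.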

How do I use a SAS setup file to import ASCII data?

Setup files contain the syntax or program code to read raw data (ASCII) into a statistical package. The instructions below demonstrate how to use SAS setup files in a Windows environment.

These instructions assume that you have already downloaded the ASCII data and SAS setup file from the Internet. If you have a compressed version of a file, you will have to decompress it before using the setup file.

Note: In order to successfully use setup files, you must know the exact location (i.e., full pathname, such as C:\My Documents\Data) and filename (e.g., da9999.txt) of the files that you obtained from ICPSR.

Instructions

  1. Download the SAS setup file from the ICPSR Web site.

  2. Most of the files downloaded from the ICPSR Web site will be compressed. You will have to decompress the files using WinZip or other decompression software. More information about decompressing files can be found at the help page, How do I decompress the files I download from your site? Once the SAS setup file has been downloaded and decompressed, rename the file to add a '.sas' extension. This will allow SAS to recognize the file as a SAS syntax file.

  3. Open SAS for Windows.

  4. Open the SAS setup file in the SAS Program Editor window.

    • Click on File and then Open to get an Open File dialog box.

    • At the top of the box, where it says Look In, choose the path where the SAS setup file is located.

    • At the bottom of the box, set Files of Type to All Files.

    • You will then see a list of all files in the directory you selected. Either double-click on the SAS setup file or click once on the name of your chosen file (the name will appear after File Name) and then click on Open.

    • Since the SAS setup file is a text file, SAS will display the file in the SAS Program Editor.

  5. Most ICPSR setup files contain a header that describes the contents of the file. Once you have opened the setup file in the SAS Program Editor, read the ICPSR header, if present, for important information about the file.

  6. After reading the header, scroll to the DATA command. Add a dataset name for your data to this command line, if you want it located in the temporary SAS library 'work.' Please consult SAS documentation if you want the dataset saved in a permanent SAS library.

  7. Scroll to the INFILE command. Replace the text that says physical-filename or file-specification with the full path and name of the data file you extracted from the downloaded file.

    • It is important that you include the full path (e.g., C:\My Documents\Data); otherwise SAS may not be able to locate the file. For example, if you downloaded the data for ICPSR 2992 into the directory C:\My Documents\Data and you called the file da2992.txt, then the INFILE command should read:

      INFILE 'C:\My Documents\Data\da2992.txt' LRECL=30;

      (Note that the LRECL varies by study and the correct number will already be provided in the SAS setup file.)

  8. If there are PROC FORMAT, FORMAT, or MISSING VALUE RECODE commands in the setup file, ICPSR usually places SAS comment delimiters before (/*) and after (*/) the appropriate section, which means that SAS will not automatically read these commands. If you want SAS to read and execute these commands, you should remove the set of comment markers for each section.

  9. Scroll to the end of the setup file. If a RUN command is not already there, then type one in. Make sure the command ends with a semicolon.

  10. You are now finished editing the SAS setup file. Run the statements by clicking on Run > Submit.

  11. The log file will show the commands that SAS processed, as well as any error messages.

  12. The data can now be used for analysis. If you are using SAS System for Windows Release 7.0 or higher, you can view the data file in the SAS Table Editor. Go to the Tools menu bar and select Table Editor. Once the Table Editor window appears, click on File and Open to open your newly-created data file. The data file will be located in the Work library unless you changed the library reference prior to running the setup file. Click on the data file and then on Open to see the data displayed in the Table Editor.

  13. Users should be aware that a SAS dataset created in the Work library will be discarded at the end of the SAS session. To save a SAS dataset for subsequent SAS sessions you must assign the file a two-level name. The first level is the library name and the second level is the dataset name. This can be done in Windows by selecting Save As... under the File menu in the VIEWTABLE window, creating a new library using the Create New Library icon, then specifying a data table name and clicking on Save. Please refer to your SAS manual or SAS System Help for more information about saving SAS data sets.

  14. For further help with the SAS System for Windows, consult the HELP menu on the top toolbar of SAS or refer to your SAS manual.
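If you edit many setup files, steps 7 and 9 can be scripted instead of done by hand in the Program Editor. The minimal Python sketch below replaces the placeholder text with the full path to your data file and appends a RUN statement if the program does not already end with one. The function name is hypothetical, and it assumes the placeholder appears literally as "physical-filename" or "file-specification" inside the quotes of the INFILE statement; check the edited file in SAS before running it.

```python
import re

def prepare_sas_setup(setup_text, data_path):
    """Apply steps 7 and 9 above to the text of a SAS setup file."""
    # Step 7: swap the placeholder for the full path to the data file.
    # A function is used as the replacement so that backslashes in Windows
    # paths are inserted literally rather than read as regex escapes.
    edited = re.sub(r"physical-filename|file-specification",
                    lambda _match: data_path, setup_text)
    # Step 9: append RUN; unless the program already ends with it.
    if not re.search(r"\bRUN\s*;\s*$", edited, flags=re.IGNORECASE):
        edited = edited.rstrip() + "\nRUN;\n"
    return edited
```

Step 8 (removing the /* and */ comment markers) is deliberately left manual, because whether to run PROC FORMAT, FORMAT, or the missing value recodes is an analysis decision rather than a mechanical edit.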

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

How do I use an SPSS setup file to import ASCII data?

Setup files contain the syntax or program code to read undelimited data (ASCII) into a statistical package. The instructions below demonstrate how to use SPSS setup files in a Windows environment.

These instructions assume that you have already downloaded the ASCII data and SPSS setup file from the Internet. If you have a compressed version of a file, you will have to decompress it before using the setup files.

Note: In order to successfully use setup files, you must know the exact location (i.e., full pathname, such as C:\My Documents\Data) and filename (e.g., da9999.txt) of the files that you downloaded.

Instructions

  1. Download the SPSS setup file from the ICPSR Web site.

  2. Most of the files downloaded from the ICPSR Web site will be compressed. You will have to decompress the files using WinZip or other decompression software. Once the SPSS setup file has been downloaded and decompressed, rename the file to add the '.sps' extension. This will allow SPSS to recognize the file as an SPSS syntax file. Do not save the setup file on your local machine with the ".txt" extension because SPSS for Windows will try to read it as a data file rather than a syntax file.

  3. Open SPSS for Windows.

  4. Open the SPSS setup file in the SPSS for Windows Syntax Editor.

    • If a dialog window of shortcuts opens, close it by clicking the 'Cancel' button.

    • Click on File and then Open to get an Open File dialog box.

    • At the top of the box, where it says Look In, choose the path where the SPSS setup file is located.

    • At the bottom of the box, set Files of Type to All Files to see a listing of all files in a particular directory or to Syntax (*.sps) if you saved the setup file with an '.sps' extension.

    • You will then see a list of files in the directory you selected. Either double-click on the SPSS setup file or click once on the name of your chosen file (the name will appear after File Name) and then click on Open.

    • Since the SPSS setup file is a text file, SPSS will open a new Syntax Editor window to display the file.

  5. If you try to open the SPSS setup file and you are prompted with a dialog box that says Opening File Options, then press Cancel. SPSS is trying to read the setup file as a data file rather than a syntax file. This is likely to happen if your setup file has a ".txt" extension. You can either rename the file and remove the ".txt" extension or you can open the setup file in an editing program and copy and paste the text into the SPSS for Windows Syntax Editor.

  6. Most ICPSR setup files contain a header that describes the contents of the file. Once you have opened the setup file in the SPSS for Windows Syntax Editor, read the ICPSR header, if present, for important information about what is contained in the file.

  7. After reading the header, scroll to the DATA LIST command. Replace the text that says physical-filename or file-specification with the full path and name of the data file extracted from the downloaded file.

    • It is important that you include the full path (e.g., C:\My Documents\Data); otherwise SPSS may not be able to locate the file. For example, if you extracted the data for ICPSR 2992 into the directory C:\My Documents\Data and you called the file da2992.txt, then the DATA LIST command should read:

      DATA LIST FILE="C:\My Documents\Data\da2992.txt" /

  8. If there is a MISSING VALUES command in the setup file, ICPSR usually places an SPSS comment delimiter (*) before the command line, which means that SPSS will not read this command. If you want SPSS to read this command, you should delete the asterisk and be sure that the command starts in the first column of the line.

    Some SPSS setup files also contain a missing value RECODE command. This command may also have an SPSS comment marker (*) at the beginning of the line. If you want SPSS to read this command, you should delete the asterisk and be sure that the command starts in the first column of the line. When both a MISSING VALUES command and a missing value RECODE command are present in the same SPSS setup file, only one of the two commands should be executed. Choose MISSING VALUES if you want to retain the missing values in the data but have SPSS designate them as missing for analysis purposes. Choose missing value RECODE if you want missing values converted to system missing. Please note that the missing value RECODE command may collapse several different missing values for one variable into system missing.

  9. Scroll to the end of the setup file. If an EXECUTE command is not already there, then type one in. Start the command in the first column of a new line and end the line with a period.

  10. You are now finished editing the SPSS setup file. Run the statements by clicking on Run -> All. The status bar at the bottom of the screen will show the commands that SPSS is processing. When SPSS has completed executing the commands, the status bar will display the message "SPSS for Windows Processor is ready."

  11. When the processor is finished, go to the Window menu and choose SPSS for Windows Data Editor to see the data. Any error messages will be printed in a log file in the SPSS Output window.

  12. If you do not see the data appear in the SPSS Data Editor, check the status bar in the lower left corner of the screen. If the status bar says Transformations Pending, go to the Transform menu and click on Run Pending Transformations. This is usually necessary when there is no EXECUTE command at the end of the setup file.

  13. Once you have read the data into the SPSS Data Editor, you may then start subsequent sessions using the imported data. You can avoid importing the data with the SPSS setup file every time you want to access it by saving the imported data: go to the File menu and click on Save As ... to save the file as either an SPSS system file or an SPSS portable file. Specify the directory where you would like to store the file using the Save in: box, enter a Filename, and choose the type of file you would like the data saved as. You can then begin subsequent SPSS sessions by opening the saved file from the Data Editor window.

  14. For further help with SPSS for Windows, consult the HELP menu on the top toolbar of SPSS or refer to the SPSS for Windows Base System User's Guide.
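The edits in steps 7 through 9 can also be applied with a short script before you open the file in SPSS. The minimal Python sketch below replaces the placeholder with the full data path, optionally removes the asterisk from a commented-out MISSING VALUES command so that it starts in the first column, and appends an EXECUTE command if one is missing. The function name is hypothetical, and it assumes the placeholder text appears literally as "physical-filename" or "file-specification". It leaves any missing value RECODE command commented out, since only one of the two missing-value commands should run.

```python
def prepare_spss_setup(setup_text, data_path, use_missing_values=True):
    """Apply steps 7-9 above to the text of an SPSS setup file."""
    edited = []
    for line in setup_text.splitlines():
        # Step 7: point DATA LIST at the extracted data file.
        # str.replace inserts the Windows backslashes literally.
        line = line.replace("physical-filename", data_path)
        line = line.replace("file-specification", data_path)
        # Step 8 (optional): drop the leading "*" from a commented-out
        # MISSING VALUES command so that the command starts in column 1.
        stripped = line.lstrip()
        if (use_missing_values and stripped.startswith("*")
                and stripped[1:].lstrip().upper().startswith("MISSING VALUES")):
            line = stripped[1:].lstrip()
        edited.append(line)
    # Step 9: make sure the last non-blank line is an EXECUTE command.
    non_blank = [line for line in edited if line.strip()]
    if not non_blank or non_blank[-1].strip().upper() != "EXECUTE.":
        edited.append("EXECUTE.")
    return "\n".join(edited) + "\n"
```

Save the result with an .sps extension so that SPSS opens it as syntax rather than data.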

What are setup files?

Many of our data collections that contain ASCII data files are accompanied by setup files that allow users to read the text files into statistical software packages. Because visually interpreting alphanumeric data files is inefficient, statistical software is needed to define, manipulate, extract, and analyze the variables and cases within data files. For many of our data collections, we currently provide setup files for SAS, SPSS, and Stata, three of the statistical software packages most commonly used in the social sciences.

The following instructions explain the different components of SAS, SPSS, and Stata setup files. Setup files for certain collections may not contain all of the commands listed below.

SAS Setup Files

SAS setup files can be used to generate native SAS file formats such as SAS datasets, SAS xport libraries, and transport files. Our SAS setup files generally include the following SAS sections. Click on each section to see an example taken from ICPSR 6512 (Capital Punishment in the United States, 1973-1993).

  1. PROC FORMAT: Creates user-defined formats for the variables. Formats replace original value codes with value code descriptions. Not all variables necessarily have user-defined formats.
  2. DATA: Begins a SAS data step and names an output SAS dataset.
  3. INFILE: Identifies the input data file to be read with the input statement. Users must replace the "physical-filename" with host computer-specific input file specifications. For example, users on Windows platforms should replace "physical-filename" with "C:\06512-0001-Data.txt" for the data file named "06512-0001-Data.txt" located on the root directory "C:\".
  4. INPUT: Assigns the name, type, decimal specification (if any), and specifies the beginning and ending column locations for each variable in the data file.
  5. LABEL: Assigns descriptive labels to all variables. Variable labels and variable names may be identical for some variables.
  6. FORMAT: Associates the formats created by the PROC FORMAT step with the variables named in the INPUT statement.
  7. MISSING VALUE RECODES: Sets user-defined numeric missing values to missing as interpreted by the SAS system. Only variables with user-defined missing values are included in the statements.

SPSS Setup Files

SPSS setup files can be used to generate native SPSS file formats such as SPSS system files and SPSS portable files. SPSS setup files produced by ICPSR generally include the following SPSS sections. Click on each section to see an example taken from ICPSR 6512 (Capital Punishment in the United States, 1973-1993).

  1. DATA LIST: Assigns the name, type, decimal specification (if any), and specifies the beginning and ending column locations for each variable in the data file. Users must replace the "physical-filename" with host computer-specific input file specifications. For example, users on Windows platforms should replace "physical-filename" with "C:\06512-0001-Data.txt" for the data file named "06512-0001-Data.txt" located on the root directory "C:\".
  2. VARIABLE LABELS: Assigns descriptive labels to all variables. Variable labels and variable names may be identical for some variables.
  3. VALUE LABELS: Assigns descriptive labels to codes in the data file. Not all variables necessarily have assigned value labels.
  4. MISSING VALUES: Declares user-defined missing values. Not all variables in the data file necessarily have user-defined missing values. These values can be treated specially in data transformations, statistical calculations, and case selection.
  5. MISSING VALUE RECODE: Sets user-defined numeric missing values to missing as interpreted by the SPSS system. Only variables with user-defined missing values are included in the statements.

Stata Setup Files

Stata setup files can be used to generate native Stata DTA files. Stata setup files produced by ICPSR generally include the following Stata sections. Click on each section to see an example taken from ICPSR 6512 (Capital Punishment in the United States, 1973-1993).

  1. FILE SPECIFICATIONS: Assigns values to local macros that specify the locations of the files used to build a Stata system file. Users must replace the placeholder file names with host computer-specific file specifications. For example, users on Windows platforms should replace "raw-datafile-name" with "C:\06512-0001-Data.txt" for the data file named "06512-0001-Data.txt" located in the root directory of "C:\". Similarly, "dictionary-filename" should be replaced with "C:\06512-0001-Stata_dictionary.dct". The "stata-datafile" specification should be replaced with the location where you wish to store the Stata system file.
  2. INFILE COMMAND: Reads the columnar ASCII data into a Stata system file.
  3. VALUE LABEL DEFINITIONS: Defines descriptive labels for the individual values of each variable.
  4. MISSING VALUES: Replaces numeric missing values (e.g., -9) with the generic system missing value ".". By default, the code in this section is commented out. Users wishing to apply the generic missing values should remove the comment markers at the beginning and end of this section. Note that Stata allows you to specify up to 27 unique missing value codes.
  5. SAVE OUTFILE: This section saves out a Stata system format file. There is no reason to modify it if the macros in Section 1 were specified correctly.
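The difference between declaring user-defined missing values (the SPSS MISSING VALUES section) and recoding them to a single generic missing value (the SPSS MISSING VALUE RECODE section, or the Stata MISSING VALUES section) can be illustrated with a minimal Python sketch. Here `None` stands in for system missing, the function names are hypothetical, and the codes -8 and -9 in the usage example are illustrative only; the real codes for each variable are listed in the codebook and setup file.

```python
def declare_missing(values, codes):
    """MISSING VALUES style: keep every original code, but flag the
    user-defined missing ones so analyses can exclude them."""
    return [(value, value in codes) for value in values]

def recode_to_system_missing(values, codes):
    """MISSING VALUE RECODE style: replace every user-defined missing code
    with one generic missing value (None), so distinct codes collapse."""
    return [None if value in codes else value for value in values]

def valid_mean(values, codes=frozenset()):
    """Mean of the values that are neither None nor a declared missing code."""
    kept = [value for value in values if value is not None and value not in codes]
    return sum(kept) / len(kept) if kept else None
```

With `data = [34, -9, 51, -8]` and `codes = {-8, -9}`, both approaches give the same mean of valid values, 42.5, but only the declared version still records which code each missing case carried; after the recode, -8 and -9 are indistinguishable.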

How do I interpret a record from an ASCII data file?

Our data files are usually distributed as columnar ASCII files that consist of rows and columns of alphanumeric characters. Since ASCII data files are simply text files, they can be opened in any word processing program or Internet browser. However, the alphanumeric characters are not meaningful without the help of a codebook or setup files to identify the columns of the ASCII data file as particular variables.

This example illustrates how to interpret an ASCII data file for ICPSR 2737, Capital Punishment in the United States, 1973-1997.

The data file consists of 6,819 cases or observations, which in this example are inmates under sentence of death or inmates who were executed. Example 1 shows the first 10 lines of data in this file. The first observation, or line of data, is highlighted in red.

Example 1: The first case or line of data in the data file

Screen shot of columns of numbers, first row highlighted in red

The data file is a fixed-format data file with a logical record length of 81. This means that each line consists of 81 characters, which correspond to 37 variables or data items. Example 2 illustrates that each line of data in the file is 81 characters long.

Example 2: Each record is the same length (81 characters wide)

Screen shot of columns of numbers, first and last columns highlighted in yellow
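Because every record in a fixed-format file has the same length, a quick length check can reveal a truncated or damaged download before you run a setup file. The minimal Python sketch below reports the line numbers whose length differs from the logical record length (81 for this study). The function names are hypothetical, and a non-empty result is a prompt to investigate rather than proof of damage, since some tools strip trailing blanks from the ends of lines.

```python
def bad_record_lengths(lines, lrecl=81):
    """Return the 1-based numbers of lines whose length is not `lrecl`."""
    return [number for number, line in enumerate(lines, start=1)
            if len(line.rstrip("\r\n")) != lrecl]

def check_data_file(path, lrecl=81):
    """Check every record of an ASCII data file on disk."""
    # Text mode with the default newline handling turns "\r\n" into "\n".
    with open(path, encoding="ascii") as data_file:
        return bad_record_lengths(data_file, lrecl)
```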

In order to know which columns comprise particular variables, it is necessary to refer to the PDF codebook. The following examples illustrate how to read the first ten variables from this ASCII data file, beginning with the first record (row) and counting from left to right:

VARIABLE 1

V1-ICPSR STUDY NUMBER: This variable is positioned in column locations 1 through 4 and contains the value "2737" for each record. This value represents the 4-digit ICPSR archival study number assigned to this data collection.

Example 3: Variable 1 in Columns 1-4

Screen shot of columns of numbers, first four characters highlighted in yellow

VARIABLE 2

V2-ICPSR EDITION NUMBER: This variable is positioned in column location 5 and contains the value "1" for each record. This value represents the ICPSR edition number assigned to the data collection.

Example 4: Variable 2 in Column 5

Screen shot of columns of numbers, fifth character in each row highlighted in yellow

VARIABLE 3

V3-ICPSR PART NUMBER: This variable is positioned in column location 6 and contains the value "1" for each record. This value represents the ICPSR part number assigned to the data file within the data collection.

Example 5: Variable 3 in Column 6

Screen shot of columns of numbers, sixth character in each row highlighted in yellow

VARIABLE 4

V4-ICPSR SEQUENTIAL ID: This variable is positioned in column locations 7 through 10 and contains the value "1" for the first record. This value represents the first sequential case identification number and is used to uniquely identify a given record in the data file.

Example 6: Variable 4 in Columns 7-10

Screen shot of columns of numbers, second column highlighted in yellow

VARIABLE 5

V5-REPORT YEAR: This variable is positioned in column locations 11 through 14 and represents the reporting year. The first record, highlighted in red, contains the value "0", which represents a reporting year prior to 1973. The fifth record, also highlighted in red, contains the value "1973", which represents the actual year of the event.

Example 7: Variable 5 in Columns 11-14

Screen shot of columns of numbers, third column highlighted in yellow

VARIABLE 6

V6-INMATE ID: This variable is positioned in column locations 15 through 18 and contains the value "8" for the first record. This value represents a four-digit inmate identification number.

Example 8: Variable 6 in Columns 15-18

Screen shot of columns of numbers, fourth column highlighted in yellow

VARIABLE 7

V7-STATE: This variable is positioned in column locations 19 through 20 and contains the value "1" for all 10 records in this example. This value represents the FIPS state code for Alabama.

Example 9: Variable 7 in Columns 19-20

Screen shot of columns of numbers, first character of fifth column highlighted in yellow

VARIABLE 8

V8-Q3 SEX: This variable is positioned in column location 21 and contains the value "1" for the first 10 records. This code identifies the sex of these inmates as "male".

Example 10: Variable 8 in Column 21

Screen shot of columns of numbers, second character of fifth column highlighted in yellow

VARIABLE 9

V9-Q4A RACE: This variable is positioned in column 22 and contains the value "2" for the first record. This code identifies the race of this inmate as "Black".

Example 11: Variable 9 in Column 22

Screen shot of columns of numbers, third character of fifth column highlighted in yellow

VARIABLE 10

V10-HISPANIC ORIGIN: This variable is positioned in column 23 and contains the value "2" for the first record. This code identifies the Hispanic origin of this inmate as "Non-Hispanic".

Example 12: Variable 10 in Column 23

Screen shot of columns of numbers, fourth character of fifth column highlighted in yellow
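The column-by-column reading above is exactly what a setup file automates. The minimal Python sketch below slices a record using the column positions of the first ten variables and attaches the few value labels explained above. Because the actual record appears only in the screenshot, `SAMPLE` is a synthetic record assembled from the values described for the first case; the spacing within each field and the blank padding to 81 columns are assumptions, and the names `parse_record` and `label` are hypothetical.

```python
# Column positions (1-based, inclusive) of the first ten variables of
# ICPSR 2737, as given in the walkthrough above.
COLUMNS = [
    ("V1", 1, 4),     # ICPSR study number
    ("V2", 5, 5),     # ICPSR edition number
    ("V3", 6, 6),     # ICPSR part number
    ("V4", 7, 10),    # ICPSR sequential ID
    ("V5", 11, 14),   # report year
    ("V6", 15, 18),   # inmate ID
    ("V7", 19, 20),   # state (FIPS code)
    ("V8", 21, 21),   # sex
    ("V9", 22, 22),   # race
    ("V10", 23, 23),  # Hispanic origin
]

# Only the codes explained above; the codebook lists the rest.
VALUE_LABELS = {
    "V7": {1: "Alabama"},
    "V8": {1: "male"},
    "V9": {2: "Black"},
    "V10": {2: "Non-Hispanic"},
}

def parse_record(record, columns=COLUMNS):
    """Slice one fixed-format record into {variable name: integer value}."""
    values = {}
    for name, start, end in columns:
        field = record[start - 1:end]  # 1-based inclusive columns -> Python slice
        values[name] = int(field) if field.strip() else None  # blank -> missing
    return values

def label(name, value):
    """Return the value label if one is known, else the raw code."""
    return VALUE_LABELS.get(name, {}).get(value, value)

# Synthetic first record (see the assumptions noted above).
SAMPLE = ("2737" + "1" + "1" + "   1" + "   0" + "   8" + " 1" + "1" + "2" + "2").ljust(81)
```

`parse_record(SAMPLE)` yields V1 = 2737, V5 = 0, V6 = 8, and V9 = 2, and `label("V9", 2)` gives "Black", matching the readings described for the first record.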

To locate the column positions for the remaining variables for this study, see the codebook for CAPITAL PUNISHMENT IN THE UNITED STATES, 1973-1997.

This example illustrates that visually interpreting the data record is inefficient. Commercially available statistical software packages such as SAS, SPSS, and Stata can interpret data files and subset the variables and/or cases as needed.

Jun 17, 2009

I've forgotten my password. How do I get a new one?

Just go to the MyData login page. At the bottom of the page is a link titled Forgot your password? You'll be asked to enter your email address, and then a new password will be sent to you. After you get the email, you'll probably want to change your password to something easily remembered.

Why can't I print my PDF codebook?

Older versions of the Acrobat software may not properly handle PDFs created with newer versions of the software. Our processors test documentation files to ensure compatibility with the latest version of the free Acrobat Reader, so we suggest downloading the latest version from Adobe's Web site.

How do I use Excel to import tab-delimited ASCII data?

SAMHDA produces and makes available for download ASCII data files in two formats. The first of these is a fixed-format data file (da99999-9999.txt) to be used in conjunction with a setup file for SAS, SPSS, or Stata. The second format is a tab-delimited data file (da99999-9999.tsv).

Note: The Import Wizard for SAS, SPSS, and Stata can read the tab-delimited file into the statistical package. However, if you are using one of these statistical packages, SAMHDA encourages you to read in the fixed-format (.txt) data file with its accompanying setup file.

Warning: An error will occur if you try to read in a data file with more than 65,536 cases or 256 variables. These are the maximum limits that an Excel spreadsheet can handle.
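Before opening a large file in Excel, you can count its cases and variables to see whether it fits. The minimal Python sketch below reads a tab-delimited file whose first row holds the variable names and compares its size with the limits quoted above, which are those of Excel 2003 and earlier (Excel 2007 raised them to 1,048,576 rows and 16,384 columns). The function names are hypothetical, and the sketch counts the header row against the row limit, which is slightly stricter than the case count in the warning. The same counts can be compared with the numbers in the file manifest after the import.

```python
import csv

# Worksheet limits of Excel 2003 and earlier.
EXCEL_MAX_ROWS = 65_536
EXCEL_MAX_COLUMNS = 256

def tsv_dimensions(lines):
    """Return (cases, variables) for tab-delimited text whose first row
    holds the variable names."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    cases = sum(1 for row in reader if row)  # skip blank lines
    return cases, len(header)

def fits_in_excel(cases, variables):
    """The header row occupies one of the worksheet's rows."""
    return cases + 1 <= EXCEL_MAX_ROWS and variables <= EXCEL_MAX_COLUMNS
```

To check a download, open the .tsv file with `open(path, newline="")` and pass the file object to `tsv_dimensions`.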

Instructions

1. Download the tab-delimited ASCII data file from the SAMHDA Web site.

2. Most of the files downloaded from the ICPSR Web site will be compressed. You will have to decompress the files using WinZip or other decompression software. More information about decompressing files can be found at the help page, How do I decompress the files I download from your site?

3. Open Excel for Windows.

4. Open the tab-delimited ASCII data file.

  • Click on File and then Open to get an Open File dialog box.
  • At the top of the box, where it says Look In, choose the path where the tab-delimited ASCII data file is located.
  • At the bottom of the box, set Files of Type to All Files.
  • You will then see a list of all files in the directory you selected. Either double-click on the .tsv file or click once on the name of your chosen file (the name will appear after File Name) and then click on Open.

5. This will open Step 1 of 3 of Excel's Text Import Wizard.

  • Make sure the button for Delimited is marked and the box for "Start import at row" is set to 1.
  • Then click on Next.

6. Go to Import Wizard Step 2 of 3.

  • Select Tab in the Delimiters option box.
  • Then click on Next.

7. Go to Import Wizard Step 3 of 3.

  • Leave every column set to General. You do not have to do anything in this step. SAMHDA studies do not contain string or date variables.
  • Then click on Finish.

8. Review imported data file.

You have now completed importing the data file. Row 1 will contain the names of the variables. Column A will be the CASEID variable. To confirm that the import worked properly, scroll across and down to check the number of variables and cases imported. Compare these numbers against those provided by SAMHDA in the file manifest. This file can be accessed by going to the bottom of the study's Description and Citation or Browse Documentation pages.

Creative Commons License This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

Whenever I try to login, I get a message stating that I don't have cookies enabled. What's going on?

Under normal operation the Web site uses the HTTP protocol to deliver content. In a small number of cases we use HTTP with SSL encryption (commonly referred to as HTTPS) to protect the security of the data moving across the network. Our login procedure is one such case.

When one tries to access a resource on the Web site, the system checks for the presence of a login ticket (or, more generically, a cookie). If there is no ticket available, the browser is redirected to a URL (using HTTP) where the person enters a login and password. When the person clicks the Log In button, the login and password are delivered to the Web site via HTTPS, and the Web site returns a login ticket via HTTPS. Finally, the Web site returns the person to the original Web page or resource via HTTP. If anything goes wrong during this process, the site displays an error message about cookies not being enabled, because disabled cookies are the most common cause of failure. The next most common cause is a proxy server or firewall that proxies HTTP but not HTTPS, which makes the transaction fail.

The workaround is to force the entire transaction through HTTPS. Here's an easy way to do that:

  1. Go to this page: http://www.icpsr.umich.edu/mydata?path=ICPSR

  2. This should redirect you to the error page about cookies not being enabled. The URL in the "Address bar" should look like this:

    http://www.icpsr.umich.edu/ticketlogin

  3. Modify the URL above by adding an "s" after "http", just before the colon. It should now look like this:

    https://www.icpsr.umich.edu/ticketlogin

  4. Enter your MyData login and password and click the Log In button, or click the Log In Anonymously button.

  5. You should now see this URL in your "Address bar" (http://www.icpsr.umich.edu/mydata?path=ICPSR), along with a list of account-related actions you can take. The important thing, though, is that you now have a ticket (cookie) and should be able to download resources. If you click the logo at the top of the page, you will return to the home page and can then use the Web site as usual.

Please note that you cannot combine steps (1) and (3) by starting at an HTTPS-delivered version of the home page. You will still be redirected to an HTTP link for the login page after step (2), so step (3) will still be necessary, and browsing over HTTPS from the start also forces many Web fetches to incur the SSL encryption overhead on both our server and your desktop machine.

What's the date/number listed to the right of each search result?

The value in the right column changes based upon the current sort.

  • For most results, it displays the last date in the time period field, the date on which the study was undertaken.

  • If you sort by "Most Cited in ICPSR Bibliography," it will display the number of related citations that ICPSR has found in its bibliographic searches.

  • The "Most Downloaded" sort will display the number of unique users who have downloaded this study in the last 90 days.

  • "Released/Updated" will display the date on which the study was last updated or the release date (whichever is later).

On the previous version of the site, this was the date on which ICPSR released/updated the data, but we changed that in April 2010, as the time period is more useful to researchers.

Can I use an asterisk in the search to match partial words?

No, an asterisk is not needed. Search terms are stemmed automatically, meaning that words sharing the same stem as a search term (for example, plural or past-tense forms) will appear in the search results.

Example: a search for "network" will also turn up "networks" and "networked".
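To illustrate the idea of stemming, here is a toy suffix-stripping sketch in R. It is not the algorithm ICPSR's search engine uses (the text does not name one); the function toy_stem and the study titles are hypothetical, and real stemmers handle many more word forms.

```r
# Toy stemmer: strips one common English suffix ("ing", "ed" or "s") when at
# least three letters remain. NOT ICPSR's search algorithm, only an illustration.
toy_stem <- function(words) {
  sub("^(.{3,})(ing|ed|s)$", "\\1", tolower(words))
}

# All three forms reduce to the same stem, "network".
toy_stem(c("network", "networks", "networked"))

# Hypothetical study titles: does a search for "network" match each one?
titles <- c("Social Networks of Older Adults",
            "Networked Schools Survey",
            "Household Income Panel")
hits <- sapply(strsplit(titles, " "),
               function(w) toy_stem("network") %in% toy_stem(w))
hits  # TRUE TRUE FALSE
```

Because both the query and the indexed words are reduced to stems before comparison, no wildcard is needed to catch the plural or past-tense form.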

How can I search a particular field in the study descriptions?

With the addition of faceted searching on the ICPSR Web site, we no longer need or provide field searching. On the search results screen you will see a number of fields/facets in the right-hand column. You can use these to narrow your search using specific metadata fields, such as investigator, subject term, geography, time period, or series.

If you have a suggestion for a new facet, please contact us at web-support@icpsr.umich.edu.

A data collection instrument is included in the documentation for a study. Can I use the data collection instrument for my project?

Some instruments used in the data collection process for a project deposited with ICPSR may contain, in whole or in part, content from copyrighted instruments. Reproductions of such instruments are provided as documentation for analyzing the data in the associated collection. Fair use restrictions apply to all copyrighted content.

Circular 21 from the U.S. Copyright Office provides basic information on fair use and several important legislative provisions and other documents addressing reproduction of copyrighted materials by librarians and educators.

What are the consequences of violating the terms of use agreement for ICPSR data?

Subjects who participate in surveys and other research instruments distributed by ICPSR expect their responses to remain confidential. The data distributed by ICPSR are for statistical analysis, and they may not be used to identify specific individuals or organizations. Although ICPSR takes steps to ensure that subjects cannot be identified, users are also obligated to act responsibly and not to violate the privacy of subjects intentionally or unintentionally.

If ICPSR determines that the terms of use agreement has been violated, one or more steps will be taken, which may include:

  • ICPSR may revoke the existing agreement, demand the return of the data in question, and deny all future access to ICPSR data.

  • The violation may be reported to the Research Integrity Officer, Institutional Review Board, or Human Subjects Review Committee of the user's institution. A range of sanctions is available to institutions, including revocation of tenure and termination.

  • If the confidentiality of human subjects has been violated, the case may be reported to the Federal Office for Human Research Protections. This may result in an investigation of the user's institution, which can result in institution-wide sanctions including the suspension of all research grants.

  • A court may award the payment of damages to any group(s) or individual(s) harmed by the breach of the agreement.

How do I find data referenced in a journal as being available at your archive?

You may search our holdings by title, principal investigator, or other information related to the data.

What kind of documentation do you provide, and in what formats?

For any study, there are several possible types of documentation files for data collections:

  • Codebook: Information on the structure, contents, and layout of a data file. The codebook may also contain information on study design and methodology.

  • Dictionary file: Information on column locations and labeling of variables.

  • Data collection instrument: Original survey instrument or questionnaire.

  • Data map: Similar to a dictionary file.

  • Errata file: Errors noted for a particular collection, usually supplied by the principal investigator.

  • Frequency file: Frequency of response or descriptive statistics for selected variables in a collection.

  • Crosstabulation file: Crosstabulations for some or all variables in a collection.

  • User Guide: More detailed information about a particular collection, often provided by the principal investigator.

  • Manual: Instructions prepared by the principal investigator on some aspect of the data collection.

  • Appendices: Additional documentation.

  • Reports: Description of findings or results based on analysis of a dataset. Prepared by the principal investigator.

  • Record layout file: Similar to a dictionary file.

  • Tables/Crosstables: Similar to frequency files but presented in tabular format.

Our standard for documentation is Portable Document Format (PDF), and we are moving toward compliance with the PDF/A standard. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using PDF reader software, such as the Adobe Acrobat Reader.

Some older studies may have ASCII or Word-processed documentation, but those formats are being converted to PDF.

How do I decompress the files I download from your site?

Files distributed via the Internet are compressed using Zip data compression software such as WinZip, and compressed files have the .zip file name extension. Users who download compressed files will have to decompress them before using them.

Please note that files downloaded prior to November 29, 2004, may have the Gzip compression format.

Windows

Windows XP has a built-in decompression tool that decompresses .zip files. Users with other Windows versions may need to download the utility from the WinZip Web site.

WinZip and the Saved Files Utility

WinZip users (and those who use the built-in decompression tool in Windows XP) should be aware that WinZip has two ways to extract files: drag-and-drop, or choosing "Extract" from the "Actions" menu. These two methods produce different results. If you use drag-and-drop, you will get only the files, not the folders that enclose them, so you will lose the folder hierarchy that ICPSR has set up (including folders titled with study names and dataset names). If you use the "Extract" command from the "Actions" menu, the folder hierarchy is preserved if "Use Folder Names" is specified in the extraction dialog box.
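R offers the same two behaviors through the junkpaths argument of its unzip function (in the standard utils package), which works on Windows, Mac, and UNIX/Linux alike. A minimal sketch; the helper name extract_study and the file names are hypothetical:

```r
# Extract a downloaded .zip file into `exdir`, keeping its folder hierarchy.
extract_study <- function(zipfile, exdir = "icpsr_study") {
  # junkpaths = FALSE keeps the study and dataset folders, like WinZip's
  # Extract command with "Use Folder Names" checked; junkpaths = TRUE would
  # flatten everything into one folder, like drag-and-drop.
  unzip(zipfile, exdir = exdir, junkpaths = FALSE)
}

# Usage (hypothetical file name):
# extract_study("C:/mydata/ICPSR_download.zip", exdir = "C:/mydata")
```

unzip invisibly returns the paths of the extracted files, which can be printed to confirm that the folders came through.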

Macintosh

For Mac OS X users, decompression software is built into the operating system; you can open compressed files by double-clicking on the .zip file.

If you encounter problems with the Mac OS X built-in decompression software, you may wish to download StuffIt Expander.

UNIX/Linux

Users in the UNIX/Linux environment can simply use the unzip command to decompress .zip files.

Once you have the appropriate software on your local machine, follow the instructions supplied by your software to decompress the zipped files.
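For the older Gzip-compressed files mentioned above, R can read the data directly, without a separate decompression step, through its gzfile connection. A minimal sketch, assuming (hypothetically) that the .gz file holds a single tab-delimited text file with variable names in row 1; the function name read_gz_tsv is also hypothetical:

```r
# Read a Gzip-compressed, tab-delimited file without unpacking it first.
read_gz_tsv <- function(path) {
  con <- gzfile(path, open = "rt")  # decompress on the fly, read as text
  on.exit(close(con))               # close the connection when the function exits
  read.delim(con)                   # tab-separated fields, variable names in row 1
}

# Usage (hypothetical file name):
# dat <- read_gz_tsv("C:/mydata/survey.tsv.gz")
```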

Jun 4, 2009

How do I read data into R?

R has no standard system file for data comparable to a Stata .dta or an SPSS .sav file. (R can save objects in its own .RData format with the save function, but data usually arrive in other formats.) Instead, R reads data from a variety of formats, including files created in other statistical packages, directly into working memory. R generally lacks intuitive commands for data management, so many users prefer to clean and prepare data with SAS, Stata, or SPSS. Once the data are ready, several functions are available for getting the data into R.

Reading Data Files in SPSS, Stata, and SAS formats

The foreign package can be used to read data stored as SPSS .sav files, Stata .dta files, or SAS XPORT libraries. If foreign is not already installed on your local computer, go to the Packages menu and choose Install package(s).

If prompted, choose the closest CRAN mirror. When the Packages dialog box appears, scroll down to choose foreign and then click OK.
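The same step can be done from the R prompt, which also works where the Windows Packages menu is not available. A minimal sketch; foreign normally ships with R as a recommended package, so in most installations the install.packages line is skipped:

```r
# Install foreign only if it is not already available, then attach it.
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")  # may prompt for a CRAN mirror
}
library(foreign)
```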

To use the commands in foreign, one must first attach the package using the library function. At the prompt, type

> library(foreign)

As an example of reading data from other formats, assume that there is an SPSS file called survey.sav saved in the directory C:\mydata. The read.spss function from the foreign library will read the file into R.

> dataSPSS<-read.spss("C:/mydata/survey.sav", to.data.frame=TRUE)

This creates a data object called dataSPSS that is ready for analysis. The to.data.frame argument, whose default value is FALSE, tells R to return the data as a data frame rather than as a list. Note that R treats the backslash as an escape character inside strings, so Windows pathnames must be written with forward slashes, as above, or with doubled backslashes ("C:\\mydata\\survey.sav"). If it is necessary to read in several data files from the same directory, the amount of typing can be reduced by first setting the working directory and then using relative pathnames. For example,

> setwd("C:/mydata")

> dataSPSS<-read.spss("survey.sav", to.data.frame=TRUE)

Alternatively, if one prefers to search for the location of a data file, one can type

> dataSPSS<-read.spss(file.choose(), to.data.frame=TRUE)

This will open a dialog box that can be used to navigate to the appropriate folder.
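When a directory holds several data files, reading them can also be automated. A minimal sketch; read_folder is a hypothetical helper, not part of foreign, and it assumes every matching file in the folder can be read by the same function:

```r
# Read every file in `folder` whose name matches `pattern` with the same reader
# function, returning a list of data frames named after the files.
read_folder <- function(folder, pattern, reader, ...) {
  files <- list.files(folder, pattern = pattern, full.names = TRUE)
  dfs <- lapply(files, reader, ...)  # extra arguments are passed to the reader
  names(dfs) <- basename(files)
  dfs
}

# Usage with the example folder from the text:
# library(foreign)
# spss_files <- read_folder("C:/mydata", "\\.sav$", read.spss, to.data.frame = TRUE)
```

Passing the reader as an argument keeps the same helper usable for read.dta, read.csv, or any other import function.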

R will assume that any value labels recorded in the SPSS file refer to factors (categorical variables) and will store the labels rather than the original number. For example, a variable named gender may be coded 0=male and 1=female, and the labels are saved in the .sav file. When R reads in the data from SPSS, the values of the variable will be "male" and "female" rather than "0" and "1". This is the default behavior, but it can be changed in the call to the read.spss function:

> dataSPSS<-read.spss(file.choose(), use.value.labels=FALSE)
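One consequence of the default behavior is worth knowing: converting such a factor back to numbers does not recover the original codes. A small sketch with hypothetical data (coded 0 = male and 1 = female in the SPSS file):

```r
# A factor like the one created from value labels (hypothetical data).
gender <- factor(c("male", "female", "female"), levels = c("male", "female"))

as.integer(gender)  # 1 2 2: positions in levels(gender), not the 0/1 codes
levels(gender)      # "male" "female"
```

So if the original numeric codes matter, it is safer to read the file with use.value.labels=FALSE (or convert.factors=FALSE for Stata files) than to convert the factor afterwards.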

Reading Stata files is equally straightforward using the read.dta function. Assuming there is a Stata data file survey.dta in the C:\mydata folder, the appropriate syntax is

> dataStata<-read.dta("C:/mydata/survey.dta")

or

> dataStata<-read.dta(file.choose())

The created object is automatically a data frame. The default is to convert value labels into factor levels ("male" and "female" rather than "0" and "1"), but this can be turned off.

> dataStata<-read.dta(file.choose(), convert.factors=FALSE)

Note that Stata sometimes changes how it stores data files from one version to the next, and the foreign package may lag a little behind. If read.dta returns an error, try saving the data in Stata with the saveold command. This creates a .dta file in an earlier Stata format that read.dta is more likely to recognize.

R can also read SAS XPORT libraries. The function takes only a single argument, the pathname:

> dataXPORT<-read.xport("C:/mydata/survey")

The function returns a data frame if there is a single dataset in the library or a list of data frames if there are multiple datasets.
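Because the return type depends on the library's contents, code that handles both cases is slightly simpler if the result is always a list. A minimal sketch; as_dataset_list is a hypothetical helper name:

```r
# read.xport() returns a data frame for a one-dataset library and a list of
# data frames otherwise; wrap the result so later code can always loop over a list.
as_dataset_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

# Usage with the example path from the text:
# library(foreign)
# datasets <- as_dataset_list(read.xport("C:/mydata/survey"))
# sapply(datasets, nrow)  # number of cases in each dataset
```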

Reading in ASCII files

R can also easily read in space-, tab-, and comma-delimited text files. The read.table function handles the first two cases; read.csv handles the other. Say there is an ASCII data file survey.dat in which white space separates the values for each variable. The following syntax reads in this data.

> dataTEXT<-read.table("C:/mydata/survey.dat", header=TRUE, sep="")

The header argument tells R that the first row includes variable names; its default is FALSE. The sep argument specifies the separator: sep="" (the default) means values are separated by any amount of white space, whereas sep=" " would treat every single space as a separator and turn runs of spaces into empty fields. If the values are separated by tabs, change the sep argument to "\t":

> dataTAB<-read.table("C:/mydata/survey.dat", header=TRUE, sep= "\t")

The read.csv command is available for reading data files with comma-separated values.

> dataCOM<-read.csv("C:/mydata/survey.csv", header=TRUE)

The following are also equivalent:

> setwd("C:/mydata")

> dataCOM<-read.csv("survey.csv", header=TRUE)

and

> dataCOM<-read.csv(file.choose(), header=TRUE)

It is also possible to read fixed format ASCII files – those with pre-specified columns and no delimiters – using the read.fwf function. However, this task is tedious (as it is in any package). For ICPSR data it is recommended to use the available setup files to read fixed format data into another package and then use the commands in R's foreign library.
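For a small file whose column positions are known, read.fwf can still be used directly. A minimal sketch; the three-variable record layout below is hypothetical (real positions come from the study's codebook or setup files), read_fixed is a hypothetical helper, and it assumes the fields are contiguous. Columns to skip would need negative widths.

```r
# Hypothetical record layout: CASEID in columns 1-4, AGE in 5-6, GENDER in 7.
col_layout <- data.frame(
  name  = c("CASEID", "AGE", "GENDER"),
  start = c(1, 5, 7),
  end   = c(4, 6, 7),
  stringsAsFactors = FALSE
)

# read.fwf() wants field widths, so convert start/end positions to widths.
read_fixed <- function(path, col_layout) {
  read.fwf(path,
           widths    = col_layout$end - col_layout$start + 1,
           col.names = col_layout$name)
}

# Usage (hypothetical file name):
# dataFWF <- read_fixed("C:/mydata/survey.dat", col_layout)
```

Keeping the layout in a small table mirrors how codebooks list start and end columns, which makes typing errors easier to spot.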

Data in Excel Format

The easiest way to get Excel data into R is to save the spreadsheet as a comma-separated file and use R's read.csv function. The file type can be altered in Excel by changing the Save as type option to CSV (Comma Delimited).


Can I use R without having to learn the details of the R language?

Yes, there are a number of "front ends" that have been constructed in order to make it easier for users to interact with the R statistical computing environment. For example, a graphical user interface (or "GUI") allows the analyst to carry out data analysis tasks by selecting items from menus and lists, rather than entering commands.

One such GUI is the R Commander, written by John Fox. The R Commander is accessed by installing and loading the Rcmdr package within R. The R Commander provides an easy-to-use, menu-based system for loading data into R, manipulating data values, performing statistical analyses, creating graphical displays, and carrying out diagnostic tests on statistical models. Documentation for the R Commander is available on John Fox's Web site and in the following paper:

Fox, John. 2005. "The R Commander: A Basic-Statistics Graphical User Interface to R." Journal of Statistical Software 14(9).

There are several other GUI systems, in addition to the R Commander, for interacting with R. A useful discussion of R and GUIs, along with a list of current GUI projects for R, is available.

The advantage provided by the R Commander or another GUI is that the user does not need to learn a language in order to carry out his or her analysis. Instead, each step is taken by making one or more selections from a menu of available options. The disadvantage of interacting with the R environment through a GUI is that the course of the analysis is limited to those actions that have been programmed into the GUI. Thus, one could argue that using a GUI removes much of the flexibility that is inherent in the R environment.

In order to overcome the preceding limitation, the R Commander and most other GUIs allow the user to employ both methods of interacting with the environment within a single R session. For example, one could invoke the R Commander, and use its GUI to read the contents of an external file and create an R data frame. For many types of analyses, other features of the R Commander could be used to estimate model parameters, construct graphical displays, and so on. But, if the user wanted to carry out a task that is not available in the R Commander (e.g., a multidimensional scaling analysis), then the data frame created in the GUI could still be treated like any other currently defined R object (say as an argument to a function or the target of an assignment) on the R command line. In this manner, a user could exploit the advantages of both the GUI and the command-line interface.
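The command-line side of such a session might look like the following minimal sketch. The data frame name Dataset and its values are hypothetical stand-ins for a data frame built through the R Commander menus; dist and cmdscale (classical multidimensional scaling) come from R's standard stats package.

```r
# Hypothetical stand-in for a data frame created in the R Commander GUI.
Dataset <- data.frame(x = c(1, 2, 4, 7),
                      y = c(1, 3, 2, 5),
                      z = c(0, 1, 1, 3))

# Classical multidimensional scaling, run from the command line:
d   <- dist(scale(Dataset))  # Euclidean distances between cases, standardized variables
mds <- cmdscale(d, k = 2)    # 2-dimensional configuration, one row per case
mds
```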

Jun 3, 2009

What is R?

R is an alternative to traditional statistical packages such as SPSS, SAS, and Stata: an extensible, open-source language and computing environment for Windows, Macintosh, UNIX, and Linux platforms. It is distributed under the Free Software Foundation’s GNU General Public License, which allows users to freely distribute, study, change, and improve the software. R is a free implementation of the S programming language, which was originally created at Bell Labs, and most code written in S will run successfully in the R environment. R performs a wide variety of basic to advanced statistical and graphical techniques at no cost to the user. These advantages over other statistical software encourage the growing use of R in cutting-edge social science research.

Where can I obtain R?

Installation files for Windows, Mac, and Linux can be found at the Web site for the Comprehensive R Archive Network, http://cran.r-project.org/. The site also contains documentation for downloading and installing the software on different operating systems. There is no cost for downloading and using R.

Where can I find more information on R?

Books

Braun, W., and Murdoch, D. (2007). A First Course in Statistical Programming with R. Cambridge, UK: Cambridge University Press.
Chambers, J. M. (1998). Programming with Data: A Guide to the S Language. New York: Springer.
Dalgaard, P. (2008). Introductory Statistics with R (2nd edition). New York: Springer.
Everitt, B., and Hothorn, T. (2006). A Handbook of Statistical Analyses Using R. Boca Raton, FL: Chapman & Hall/CRC.
Faraway, J. J. (2005). Linear Models with R. Boca Raton, FL: Chapman & Hall/CRC.
Faraway, J. J. (2006). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Boca Raton, FL: Chapman & Hall/CRC.
Fox, J. (2002). An R and S-Plus Companion to Applied Regression. Thousand Oaks, CA: Sage Publications.
Muenchen, R. A. (2009). R for SAS and SPSS Users. Springer Series in Statistics and Computing. New York: Springer.
Murrell, P. (2005). R Graphics. Boca Raton, FL: Chapman & Hall/CRC.
Pinheiro, J. C., and Bates, D. M. (2004). Mixed Effects Models in S and S-Plus. New York: Springer.
Spector, P. (2008). Data Manipulation with R. New York: Springer.
Venables, W. N., and Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth Edition. New York: Springer.
Zuur, A. F., Ieno, E. N., and Meesters, E. H. W. G. (2009). A Beginner's Guide to R. Use R! New York: Springer.

Web Resources

Quick-R site
The Omega Project for Statistical Computing
The R Project for Statistical Computing
The R Journal

Seminal journal article

Ihaka, R., and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299-314.


What is a SAS CPORT file? How do I use it?

The CIMPORT procedure imports a CPORT transport file that was created (exported) by the CPORT procedure. PROC CIMPORT will extract SAS datasets and catalogs from the .CPT file. ICPSR does not currently distribute SAS libraries in .CPT files.

SAS 9.2 installations on Windows operating systems support opening .STC files directly from either Windows Explorer or My Computer. With other versions of SAS or other operating systems, edit the following statements (replacing the placeholder path and file name with your own) and submit them to the SAS processor. Libraries other than the WORK library can be specified if they are already defined.

/* Import (restore) the contents of a CPORT transport file into a library */
proc cimport
 infile = '<drive:><\PATH>dannnnn-nnnn.stc'
 library = WORK
;
run ;