Feb 22, 2007

What are the components of a data collection in HMCA?

A data collection comprises one or more data files, plus technical documentation that describes the data. SAS, SPSS, and/or Stata setups are included with many collections.

Data files are often provided in more than one format. Every data file is supplied as an ASCII text file, and many collections also include at least one other format, such as Stata files, SPSS portable files, or SAS transport files (generated by the SAS XPORT engine or the SAS CPORT procedure). After ASCII, SPSS portable and SAS transport files are the most common formats.

Technical documentation typically includes the following:

  • study description that summarizes the collection
  • file manifest
  • bibliography of related literature
  • description of the study's methodology
  • data collection instrument(s)
  • data map/record layout of the ASCII data file(s)
  • variable descriptions
  • univariate frequencies (for most collections)

Study descriptions, file manifests, and bibliographies of related literature are presented as separate files. Other components of the documentation may be bundled in a single file or distributed among multiple files. Documentation files are provided in Portable Document Format (PDF) and/or as ASCII text files.

The setups can be used to create software-specific system files (e.g., SAS datasets) from the ASCII data files. They usually contain complete variable and value labels and often include missing value declarations or recodes.
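
For readers without SAS, SPSS, or Stata, the role of a setup can be illustrated with a minimal Python sketch that reads fixed-width ASCII records using a record layout, applies missing value declarations and value labels, and tallies univariate frequencies. Everything in it is an assumption made for illustration: the variable names, column positions, codes, and sample records are hypothetical and do not come from any HMCA collection. The real layout and codes must be taken from the collection's own documentation.

```python
from collections import Counter

# Hypothetical record layout: (variable name, 1-based start column, width).
# A real layout comes from the data map in the collection's documentation.
LAYOUT = [
    ("CASEID", 1, 4),
    ("SEX", 5, 1),
    ("AGE", 6, 2),
]

# Hypothetical value labels and missing value declarations,
# standing in for what a setup file would normally supply.
VALUE_LABELS = {"SEX": {1: "Male", 2: "Female"}}
MISSING = {"SEX": {9}, "AGE": {99}}


def parse_record(line):
    """Slice one fixed-width record into a dict of integer values."""
    row = {}
    for name, start, width in LAYOUT:
        raw = line[start - 1:start - 1 + width].strip()
        value = int(raw) if raw else None
        # Codes declared as missing are stored as None.
        if value in MISSING.get(name, ()):
            value = None
        row[name] = value
    return row


def label(name, value):
    """Return the value label if one is defined, else the raw value."""
    return VALUE_LABELS.get(name, {}).get(value, value)


def frequencies(rows, name):
    """Univariate frequencies for one variable, keyed by label."""
    return Counter(label(name, row[name]) for row in rows)


# Made-up sample records that follow the hypothetical layout above.
SAMPLE = ["0001134", "0002299", "0003945"]
rows = [parse_record(line) for line in SAMPLE]

if __name__ == "__main__":
    for row in rows:
        print(row)
    print(frequencies(rows, "SEX"))
```

In practice the sample list would be replaced by the lines of the ASCII data file, and the layout, labels, and missing codes would be transcribed from the data map and variable descriptions.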