May 26, 2010

My data are collected from very vulnerable populations. How can I prevent these data from being used to portray them in an injurious way?

It is the policy of ICPSR that responsible science, which includes appropriate analytic methods and peer-reviewed venues for research results, is adequate to protect vulnerable populations from inappropriate, unfair, and inaccurate portrayals. In order to participate in a valid scientific discussion of the issues that face vulnerable populations, researchers must be willing to share their data and methods in an ethically responsible manner with other researchers who wish to replicate or refute their findings. One must be willing to trust the peer review process to screen out analyses that do not conform to methods appropriate to the question at hand. ICPSR is strongly committed to protecting vulnerable individuals from being identified by data analyses, but the scientific process must be used to protect vulnerable populations from inaccurate representations.

I don't mind depositing the baseline study from my longitudinal data system, but is it possible to delay release of subsequent waves of data?

The utility of longitudinal studies lies primarily in the follow-up embedded in the research design. The baseline data are valuable in the short run, but NAHDAP will work with depositors on a time frame for acquiring and releasing the subsequent waves of data. Without a reasonable time frame, baseline studies will not be acquired for NAHDAP. Depositors can work with NAHDAP staff to develop a method for acquiring and releasing the additional waves under a delayed-dissemination agreement. These agreements allow the subsequent waves to be acquired and prepared but not released for secondary analysis until the appropriate time.

Can my data be embargoed until I or my research team finish all our planned analyses?

ICPSR has a delayed-dissemination policy that allows researchers to deposit data earlier in the research process so that they may benefit from the data and documentation preparation services offered by staff. Delayed-dissemination contracts require depositors to commit to a timeline, which is usually two years from deposit to data release. Depositors have access to ICPSR files as soon as they are prepared and need not wait for the public release. They must, however, be willing to commit to the timeline for release.

Is it possible for me to read and approve research proposals based on my data? I wish to determine the nature of the research done with my data.

The policy of ICPSR is that responsible use of secondary data should be unfettered by the research agenda of the original data producer. When the data are distributed under restricted-use contracts, a research proposal is required in order to screen users for a credible research agenda and to ascertain whether the data will meet their research needs. The proposal, however, is screened only by the contract administrator at NAHDAP.

If I deposit data with NAHDAP, who owns the data?

ICPSR asks only for the right to redistribute the data; it does not acquire or retain the original copyright or transfer rights. To download data, ICPSR users must sign a terms-of-use agreement that includes a clause prohibiting redistribution of the data for commercial purposes. The original owner of the data, which is usually the university or not-for-profit that received the grant or contract, retains copyright and other legal rights associated with the data.

In the informed consent documents, I promised the data would only be used by an approved research team. How can I now share my data with others?

Unless the informed consent document names the members of the research team specifically, an amended Institutional Review Board application that includes a plan for data protection and dissemination can be filed with the lead institution to define the research team. Restrictive informed consent documents may prevent the release of data in purely public releases, but do not preclude the possibility of a research team that is defined by a group of restricted- or limited-use contract holders. The research team may be defined as those persons known to the original researchers. In the case of restricted-use or limited-use contracts, the researchers using the data are known to ICPSR and to the original research team.

My data are on very sensitive topics; the risk to participants is very high should they be re-identified. How can I protect the respondents?

ICPSR evaluates all data files for disclosure risk using state-of-the-art techniques developed under a grant from the National Institutes of Health. From this evaluation, staff recommend a method of data release that protects the respondents from re-identification while retaining the analytic utility of the data. Release options include public release; public release with disclosure control practices put in place; restricted release with a user contract; enclave-only release; and online analysis only with no microdata download. A full public release is warranted only when there is little risk of re-identification or the data have been sufficiently transformed to substantially reduce that risk.

My data are very complicated. I am not sure users will be able to use the data. Will NAHDAP staff provide user support?

ICPSR has three levels of user support. Our central email and telephone service uses help desk software to track and prioritize all user support inquiries. Technical questions about data downloading and software issues are answered by tier 1 support staff. Questions about specific data files are sent to the NAHDAP staff who prepared the data for release, who provide tier 2 support on data content and structure. The NAHDAP director and manager provide more sophisticated, tier 3 support for complex technical questions. Depositors are not expected to provide ongoing user support, but rather to provide all the documentation necessary for secondary data users to make sense of the original data collection. The ICPSR archival collection includes many very complex data systems that have been successfully analyzed by responsible researchers.

My data/documentation are not in a format that can be released to secondary users. How do I find the resources to prepare it for broader distribution?

The National Institute on Drug Abuse has funded the National Addiction & HIV Data Archive Program (NAHDAP) to assist grant recipients in preparing data for release. NAHDAP staff will help clean and prepare data files, metadata, and documentation in consultation with the grant staff. NAHDAP is built on the infrastructure of the Inter-university Consortium for Political and Social Research (ICPSR), which is designed to easily create standardized, digitally stable data files and to disseminate SAS, Stata, and SPSS files along with searchable PDF codebooks and documentation. The staff of NAHDAP will standardize the data and documentation with input from the original data producers.

Why is sharing data useful to me? Why should I share data that I have worked very hard to collect and analyze?

While data sharing is primarily useful for expanding scientific knowledge, it also provides benefits for individual researchers. Data systems that are in the public domain often generate additional research that is credited to the original source. For instance, the National Longitudinal Study of Adolescent Health, which has been in the public domain since its inception, has generated over 3,000 publications in the last 20 years authored by persons not on the original research team. In addition, data citation practices and the norms of scientific practice have changed substantially in the past 20 years, so that the production of data is now considered a scholarly pursuit. A 2009 committee report by the National Academy of Sciences emphasized the emerging role of data sharing both in science and in the careers of scholars.