SIPP Synthetic Beta v6.02

View Variables (123 variables)

Last update to metadata: 2016-06-06 23:30:52 (upload date)

Document Date: November 12, 2015

Codebook prepared by: Cornell NSF-Census Research Network

Data prepared by:

Principal Investigator(s): United States Department of Commerce. Bureau of the Census; Social Security Administration; Internal Revenue Service; and Cornell University. Labor Dynamics Institute.

Citation

Please cite this codebook as:

Lori B. Reeder, Martha Stinson, Kelly E. Trageser, and Lars Vilhuber. Codebook for the SIPP Synthetic Beta 6.0.2 [Codebook file]. Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY, 2015.

Please cite this dataset as:

U.S. Census Bureau. SIPP Synthetic Beta: Version 6.0.2 [Computer file]. Washington DC; Cornell University, Synthetic Data Server [distributor], Ithaca, NY, 2015

Abstract

The SIPP Synthetic Beta (SSB) is a Census Bureau product that integrates person-level micro-data from a household survey with administrative tax and benefit data. These data link respondents from the Survey of Income and Program Participation (SIPP) to Social Security Administration (SSA)/Internal Revenue Service (IRS) Form W-2 records and SSA records of retirement and disability benefit receip ...

Datasets

Terms of Use

Access Levels

released

The data can only be used on the VirtualRDC Synthetic Data Server at Cornell University. While no SSB data downloads are permitted at this time, users do not have to operate behind the Census Bureau firewall to access this server.

restricted

No description given

Access Restrictions (Default)

The data can only be used on the VirtualRDC Synthetic Data Server at Cornell University. While no SSB data downloads are permitted at this time, users do not have to operate behind the Census Bureau firewall to access this server.

Access Requirements

Researchers interested in using the SSB can submit an application to the Census Bureau. The application form and instructions can be downloaded from http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html. Applications will be judged solely on the feasibility of the proposed project (i.e., that the necessary variables are available on the SSB). Once an application has been accepted, the new user will be given an account on a server where the data can be accessed and analyzed.
Additional information: http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html

Access Conditions

You will need to use an NX client to log on to the Synthetic Data Server. Once Census staff have approved your access, the staff who maintain the server will send you information on how to set up your account and use the Synthetic Data Server.

Access Permission Requirements

The SSB files have been cleared by the Census Bureau Disclosure Review Board, SSA, and IRS for use by individuals without Census Bureau Special Sworn Status and outside of Census Bureau facilities.

Citation Requirements

We request that researchers who publish results from analyses done using these data cite the SSB as their data source and acknowledge the use of the SDS server at Cornell and the support of Census staff in running any validation programs. These citations will help ensure continued funding for the SDS server and the creation of the Gold Standard File and the SSB.

Suggested acknowledgement:

This analysis was first performed using the SIPP Synthetic Beta (SSB) on the Synthetic Data Server housed at Cornell University which is funded by NSF Grants SES-1042181 and BCS-0941226, and through a grant from the Alfred P. Sloan Foundation. These data are public use and may be accessed by researchers outside secure Census facilities. For more information, visit http://www.census.gov/sipp/synth_data.html. Final results for this paper were obtained from a validation analysis conducted by Census Bureau staff using the SIPP Completed Gold Standard Files and the programs written by this author and originally run on the SSB. The validation analysis does not imply endorsement by the Census Bureau of any methods, results, opinions, or views presented in this paper.

Disclaimer

The data synthesis process employed by Census to protect the linked data from the risk of disclosing the identity of individuals is relatively new and substantially changes both the survey and administrative data. The intent of the modeling done as part of the synthesis is to preserve relationships among variables that are of interest to researchers while ensuring that personally identifiable information is not revealed to the data user. It has not been feasible to ensure accuracy by comparing every relationship among SSB variables with the corresponding relationship in the underlying confidential micro-data. Hence, we strongly urge researchers not to publish results produced from the SSB without first requesting that Census validate these results with confidential data housed in a secure environment at the Census Bureau. Census will perform this validation free of charge to researchers, as resources permit and according to the protocol established by the three agencies involved and outlined below. Without validation of results, Census, SSA, and IRS make no guarantee of the validity of the SSB for any research purpose. See http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html for validation conditions.

Contact

For questions regarding this data collection, please contact: sehsd.synthetic.data.use.list@census.gov

Additional Information

Related Material

  1. Using SSB:

    The GSF and Completed Data implicates contain personally identifiable information protected by Titles 13, 26, and 42 and cannot be accessed without Census Bureau Special Sworn Status or outside of Census Bureau facilities. The SSB files, however, have been cleared by the Census Bureau Disclosure Review Board, SSA, and IRS for use by individuals without Census Bureau Special Sworn Status and outside of Census Bureau facilities.

    Researchers interested in using the SSB can submit an application to the Census Bureau. The application form and instructions can be downloaded from http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html. Applications will be judged solely on the feasibility of the proposed project (i.e., that the necessary variables are available on the SSB). Once an application has been accepted, the new user will be given an account on a server where the data can be accessed and analyzed. While no SSB data downloads are permitted at this time, users do not have to operate behind the Census Bureau firewall to access this server.

    The SSB is designed to be analytically valid in the sense that point estimates should be unbiased and estimated variances should lead to inferences similar to those that would be drawn from an identical analysis on the Completed Data implicates. Initial tests of the analytic validity of the SSB have been promising. All SSB users are invited to help further test the analytic validity of the SSB by submitting programs used to analyze the SSB to be run on the Completed Data and/or Gold Standard files. Users need only inform Census Bureau staff of the location on the server of such programs and work with Census Bureau staff to ensure that the programs run without error. Census Bureau staff will run the programs on the confidential data and release to the user any resulting output that is cleared for release by the Census Bureau Disclosure Review Board. To evaluate the effects of the data synthesis separately from the effect of imputing missing data, comparisons should be made between results from the SSB and the Completed Data. To evaluate the effects of missing data imputation, comparisons should be made between results from the Completed Data and the Gold Standard.

  2. When analyzing the SSB, users should account for the multiple imputation aspect of the SSB by averaging statistics of interest across all sixteen implicates. Variance measures should be created following the appropriate multiple imputation formulae as described in the document Using the SIPP Synthetic Beta for Analysis.
  3. Protocol for Validation of Results:

    Census will validate results obtained from the SSB on the internal, confidential version of these data (Completed Gold Standard Files). Users who wish to obtain validated results should follow the protocol outlined here. The restricted access site will provide SAS and Stata analysis software and a computing environment similar to the one used to analyze the confidential Completed Gold Standard data on Census Bureau internal computers. Researchers should follow the Census Bureau programming requirements described in SSB Validation Request Guidelines to ensure that the programs will successfully transfer to internal Census computers for validation. Researchers should plan to share their results and programs from the synthetic data analysis with Census, ORES/SSA, and SOI/IRS.

    After programs have run without error on the synthetic data, researchers may request that Census run these programs on the Completed Gold Standard Files. Only programs that have run without error on the SDS will be eligible to be run on the confidential data by Census staff. Any programs that produce errors on the Completed Gold Standard Files will be returned to users for correction. Once an analysis has been repeated on the Completed Gold Standard File, the results will be reviewed by Census staff for disclosure concerns. Researchers should familiarize themselves with standard Census disclosure rules for outside projects (see the RDC Researcher Handbook) and should fill out the appropriate memo documenting the requested output (see RDC Disclosure Request Memo). Data products and output approved by Census staff will be released to the users, ORES/SSA, and SOI/IRS.

    The validation process can be accomplished in as little as one week for simple results that are generated by clean code and have no disclosure issues. However, if the code does not run properly, the sample sizes are too small, or the researcher does not accurately fill out the disclosure memo, the process can take much longer. Census makes no guarantee on the length of time between submission of programs and the release of results from the confidential data. For more information about the validation process, including advice on how to make the process go smoothly and quickly, please see SSB Validation Request Guidelines.
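
The averaging step described in item 2 above can be sketched as follows. This is a minimal illustration, not the SSB procedure: the function name `combine_implicates` and its inputs are hypothetical, and the total variance shown is the textbook Rubin combining rule for ordinary multiple imputation, used here only as an assumption. The formulae in Using the SIPP Synthetic Beta for Analysis take precedence and may differ, because the SSB combines missing data imputation with synthesis.

```python
from statistics import mean, variance


def combine_implicates(estimates, variances):
    """Combine one statistic estimated separately on each implicate.

    estimates: point estimates, one per implicate (sixteen for the SSB).
    variances: estimated variance of each point estimate, in the same order.
    Hypothetical helper for illustration; not part of any SSB software.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need matching lists with at least two implicates")
    m = len(estimates)
    q_bar = mean(estimates)   # averaged point estimate across implicates
    u_bar = mean(variances)   # average within-implicate variance
    b = variance(estimates)   # between-implicate variance (divisor m - 1)
    # ASSUMPTION: textbook Rubin rule for standard multiple imputation.
    # The SSB documentation may prescribe a different total variance.
    total = u_bar + (1 + 1 / m) * b
    return {"estimate": q_bar, "within": u_bar, "between": b, "total_rubin": total}


# Usage with made-up numbers standing in for sixteen per-implicate results.
example_estimates = [0.40 + 0.01 * i for i in range(16)]
example_variances = [0.002] * 16
print(combine_implicates(example_estimates, example_variances))
```

Keeping the within and between components separate in the return value makes it straightforward to substitute whichever combining formula the SSB documentation specifies.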

Related Publications

  1. U.S. Census Bureau, "Disclosure Review Board Memo: Second Request for Release of SIPP Synthetic Beta Version 6.0," U.S. Census Bureau 2015.

    Available at http://www.census.gov/content/dam/Census/programs-surveys/sipp/methodology/DRBMemoTablesVersion2SSBv6_0.pdf

  2. J. M. Abowd, M. Stinson, and G. Benedetto, "Final Report to the Social Security Administration on the SIPP/SSA/IRS Public Use File Project," U.S. Census Bureau 2006. Available at https://www2.vrdc.cornell.edu/news/wp-content/papercite-data/pdf/ssafinal.pdf.

Related Studies

  1. L. B. Reeder, M. Stinson, K. E. Trageser, and L. Vilhuber, "Codebook for the SIPP Synthetic Beta v5.1 [Codebook file]," Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY, USA, DDI-C document, 2014. Available at http://www2.ncrn.cornell.edu/ced2ar-web/codebooks/ssb/v/v51.