Brian Tamanaha’s Straw Men (Part 1): Why we used SIPP data from 1996 to 2011
(Reposted from Brian Leiter’s Law School Reports)
BT Claim: We could have used more historical data without introducing continuity and other methodological problems
BT quote: “Although SIPP was redesigned in 1996, there are surveys for 1993 and 1992, which allow continuity . . .”
Response: Using more historical data from SIPP would likely have introduced continuity and other methodological problems
SIPP does indeed go back farther than 1996. We chose that date because it was the beginning of an updated and revitalized SIPP that continues to this day. SIPP was substantially redesigned in 1996 to increase sample size and improve data quality. Combining different versions of SIPP could have introduced methodological problems. That doesn’t mean one could not do it in the future, but it might raise as many questions as it would answer.
Had we used earlier data, it could have been difficult to know to what extent changes in our earnings premium estimates were caused by changes in the real world, and to what extent they were artifacts of changes to the SIPP methodology.
Because SIPP has developed and improved over time, the more recent data is more reliable than older historical data. All else being equal, a larger sample size and more years of data are preferable. However, data quality issues suggest focusing on more recent data.
If older data were included, it probably would have been appropriate to weight more recent and higher quality data more heavily than older and lower quality data. We would likely also have had to make adjustments for differences that might have been caused by changes in survey methodology. Such adjustments would inevitably have been controversial.
Because the sample size increased dramatically after 1996, including a few years of pre-1996 data would not provide as much new data, or have the potential to change our estimates by nearly as much, as Professor Tamanaha believes. There are also gaps in SIPP data from the 1980s because of insufficient funding.
These issues and the 1996 changes are explained at length in the Survey of Income and Program Participation User’s Guide.
Changes to the new 1996 version of SIPP include:
Roughly doubling the sample size. This improves the precision of estimates and shrinks standard errors.
Lengthening the panels from 3 years to 4 years. This reduces the severity of the regression to the median problem.
Introducing computer-assisted interviewing to improve data collection and to reduce both errors and the need to impute missing data.
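To make the sample size point concrete: under simple random sampling, the standard error of a mean scales with one over the square root of the sample size, so doubling the sample shrinks the standard error by a factor of about 0.71. The sketch below is only an illustration. The standard deviation and sample sizes are hypothetical values, not SIPP figures, and SIPP's complex survey design means its actual standard errors will not follow this formula exactly.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean, assuming simple random sampling."""
    return sd / math.sqrt(n)


# Hypothetical values for illustration only; not taken from SIPP.
sd = 40_000.0      # assumed standard deviation of annual earnings
n_old = 20_000     # assumed sample size before the redesign
n_new = 2 * n_old  # "roughly doubling" the sample

se_old = standard_error(sd, n_old)
se_new = standard_error(sd, n_new)

# The ratio depends only on the sample sizes: 1 / sqrt(2).
ratio = se_new / se_old
print(f"Standard error after doubling, relative to before: {ratio:.3f}")
```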
Most government surveys topcode income data—that is, there is a maximum income that they will report. This is done to protect the privacy of high-income individuals who could more easily be identified from ostensibly confidential survey data if their incomes were revealed.
Because law graduates tend to have higher incomes than bachelor’s degree holders, topcoding introduces downward bias to earnings premium estimates. Midstream changes to topcoding procedures can change this bias and create problems with respect to consistency and continuity.
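A small simulation can illustrate why topcoding biases the premium downward and why changing the cap midstream changes the size of that bias. This is a sketch under stated assumptions, not a reconstruction of SIPP's procedure: earnings are drawn from lognormal distributions with made-up medians, the cap values are hypothetical, and topcoding is modeled in its simplest form (any value above the cap is reported as the cap).

```python
import math
import random


def simulate_earnings(rng, median, sigma, n):
    """Draw n lognormal earnings with the given median (hypothetical model)."""
    return [rng.lognormvariate(math.log(median), sigma) for _ in range(n)]


def topcode(values, cap):
    """Simplest form of topcoding: report the cap for anything above it."""
    return [min(v, cap) for v in values]


def mean(values):
    return sum(values) / len(values)


def premium(law, bachelors, cap=None):
    """Difference in mean earnings, optionally after topcoding both groups."""
    if cap is not None:
        law, bachelors = topcode(law, cap), topcode(bachelors, cap)
    return mean(law) - mean(bachelors)


rng = random.Random(0)  # fixed seed so the run is reproducible

# All parameters below are assumptions chosen for illustration.
law = simulate_earnings(rng, median=90_000, sigma=0.7, n=50_000)
bachelors = simulate_earnings(rng, median=50_000, sigma=0.7, n=50_000)

premium_true = premium(law, bachelors)
premium_high_cap = premium(law, bachelors, cap=150_000)
premium_low_cap = premium(law, bachelors, cap=100_000)

print(f"Premium without topcoding: {premium_true:,.0f}")
print(f"Premium with cap 150,000:  {premium_high_cap:,.0f}")
print(f"Premium with cap 100,000:  {premium_low_cap:,.0f}")
```

In this setup the measured premium is smaller under either cap than without one, and smaller still under the lower cap, because more of the higher-earning group's incomes are cut off. A series that switches caps partway through would therefore show a shift in the premium that reflects the procedure rather than the real world.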
Without going into more detail, the topcoding procedure that began in 1996 appears to be an improvement over the earlier topcoding procedure.
These are only a subset of the problems that extending the SIPP data back past 1996 would have introduced. For us, the costs of backfilling data appear to outweigh the benefits. If other parties wish to pursue that course, we’ll be interested in what they find, just as we hope others were interested in our findings.