Balancing Data Needs with Individual Concerns: Participant ID Codes for Pre/Post Test Surveys – by Dan Goldstein

Pre- and post-tests collect and compare individuals’ responses before and after an intervention. You must have a method of identifying each participant so that you can measure the change from the first survey to the second. One potential way to both maintain confidentiality and link pre- and post-test surveys is to anonymously generate ID codes for the participants. However, the codes must be easy to recover if needed and unlikely to be duplicated across respondents. This topic recently came up on an evaluation listserv, where evaluators shared their experiences with different types of participant IDs.

Methods for managing participant IDs

Some methods used to identify respondents while maintaining confidentiality may not be reliable. For example, you could send each participant an individually generated link that is used to match both surveys. This method is not error-free: participants may forward or share the link with others, resulting in two different responses for one ID code. If a participant does not receive the link the first time, re-sending it may introduce another error in matching the pre- and post-test surveys.

If you have access to electronic survey software, such as SNAP (which we use at the Improve Group), you may pre-load the e-mail addresses of your participants. When respondents click on the link from the e-mail, they are matched to the preloaded addresses. In this case, the link itself is less important. However, if a participant’s e-mail address changes, or he or she uses multiple addresses to access the link, you again risk being unable to link the pre- and post-test surveys to one participant.

You may also have your participants generate their own ID codes. An example instruction might be: “Create an ID code using the first three digits of your phone number and the last two digits of your zip code.” However, this presents other challenges: these items may change, and two participants may end up with the same code. To minimize these problems, you could use a more detailed set of instructions that includes only items that won’t change. Be specific about the format of the response you are seeking. If you are asking for birth month, make clear that the response you are looking for is “09” and not just “9.”

Here are some examples of components you may ask for in order to generate an anonymous participant code, as provided by a study on “Improving the Use of Self-Generated Identification Codes”[1]:

First letter of own first name (A–Z)
First letter of father’s first name (A–Z)
First letter of mother’s first name (A–Z)
Day of birth – “01–31”
Birth month – “01–12”
Birth year – “yyyy”
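
As a rough illustration of why the exact response format matters, the components listed above could be combined into one fixed-length code as sketched below. The function name, the order of the components, and the validation rules are assumptions made for this example; they are not taken from the cited study. The sketch pads the day and month to two digits, so a participant who enters “9” and one who enters “09” end up with the same code.

```python
import string


def build_participant_code(first_initial, father_initial, mother_initial,
                           birth_day, birth_month, birth_year):
    """Combine six self-reported components into one code (hypothetical format)."""
    letters = []
    for value in (first_initial, father_initial, mother_initial):
        letter = str(value).strip().upper()
        # Accept exactly one letter A-Z so every code has the same length.
        if len(letter) != 1 or letter not in string.ascii_uppercase:
            raise ValueError(f"expected a single letter A-Z, got {value!r}")
        letters.append(letter)

    # int() accepts both "9" and "09", which removes the padding ambiguity.
    day, month, year = int(birth_day), int(birth_month), int(birth_year)
    if not 1 <= day <= 31:
        raise ValueError(f"day of birth must be 01-31, got {birth_day!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"birth month must be 01-12, got {birth_month!r}")
    if not 1000 <= year <= 9999:
        raise ValueError(f"birth year must have four digits, got {birth_year!r}")

    return "".join(letters) + f"{day:02d}{month:02d}{year:04d}"
```

For example, `build_participant_code("d", "R", "m", "9", 9, 1984)` returns `"DRM09091984"`. Normalizing the input this way reduces mismatches caused by formatting, but it cannot prevent two participants who share all six components from producing the same code.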

Another way to generate an identifying code while preserving participant anonymity is to gather the identifying information you need to generate the code, but keep that information in a database separate from the private data.
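
One way to picture this separation is sketched below. It is only an illustration under assumptions: the function names are hypothetical, the “databases” are plain Python dictionaries and lists, and the codes are random rather than derived from participant information. The linking table (identifier to code) would be stored apart from the survey responses, which carry only the code. The second function pairs pre- and post-test responses by code and flags any code that is missing or duplicated instead of guessing at a match.

```python
import secrets
from collections import Counter


def assign_codes(identifiers):
    """Build a linking table {identifier: random code}; store it apart from responses."""
    link_table = {}
    used = set()
    for identifier in identifiers:
        code = secrets.token_hex(4)  # 8 hexadecimal characters
        while code in used:          # re-draw in the unlikely case of a repeat
            code = secrets.token_hex(4)
        used.add(code)
        link_table[identifier] = code
    return link_table


def match_pre_post(pre_responses, post_responses):
    """Pair (code, score) records; return score changes and codes needing review."""
    pre_counts = Counter(code for code, _ in pre_responses)
    post_counts = Counter(code for code, _ in post_responses)
    pre_by_code = dict(pre_responses)
    post_by_code = dict(post_responses)

    changes, needs_review = {}, set()
    for code in pre_counts.keys() | post_counts.keys():
        # Only a code that appears exactly once in each survey is a safe match.
        if pre_counts[code] == 1 and post_counts[code] == 1:
            changes[code] = post_by_code[code] - pre_by_code[code]
        else:
            needs_review.add(code)
    return changes, needs_review
```

With pre-test records `[("A", 3), ("B", 2), ("C", 4), ("C", 1)]` and post-test records `[("A", 5), ("B", 2), ("D", 3)]`, the function returns the changes `{"A": 2, "B": 0}` and flags `{"C", "D"}` for review, because code C was used twice in the pre-test and code D has no pre-test response.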

These are just a few suggestions that may help you when conducting studies where you’ll need to match responses over time. Do you have any other methods that you have found useful in matching participants’ pre- and post-test surveys? Have some of these methods worked for you? Feel free to share some of your ideas! For more information on data privacy, see my previous blog post, Protecting Sensitive Data.

[1] A synopsis of the work conducted by Rainer Schnell, Tobias Bachteler, and Jörg Reiher of the University of Duisburg-Essen, Germany, can be found at http://erx.sagepub.com/content/34/5/391. Access to the study is limited to subscribers or short-term users.