Internal Validity Tutorial

e. Instrumentation

The reliability of the instrument used to measure the dependent variable or manipulate the independent variable may change in the course of an experiment. Examples include changes in the calibration of a mechanical measuring device and changes in the proficiency of a human observer or interviewer. Suppose that the dependent variable is measured twice for a group of subjects, once at Time A and later at Time B, and that the independent variable is introduced in the interim. Suppose also that the ability of a recording device to detect instances of the target behavior improves (or declines) as the experiment progresses. If scores on the dependent measure differ at these two times, the discrepancy may be due to the independent variable or to more (or less) sensitive recording of the target behavior at Time B than at Time A.

Background Information

Example

The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and children in the Experimental Group to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted. For ease of record keeping, all Control Group children were tested first, then all the Experimental Group children. The student teacher scored children's responses to the confederate's lures. In the beginning, he hid indoors and strained to see and hear through an open window; later on, he discovered he could see and hear better by hiding outside and peeking around a corner. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

Nonexample

The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and children in the Experimental Group to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted, in which children were selected from class to be tested in random order. The student teacher scored each child's response to the confederate's lures. Pilot research at the same school revealed that the best observation procedure was to hide outside and peek around a corner, which the student teacher did consistently throughout testing. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

Analysis

The first item is an example in which instrumentation is a threat to internal validity. Two factors compound the problem. First, the student teacher's ability to detect instances of the target behaviors improved over time. Second, children in the Control Group were tested first. Because the observer's vantage point improved during testing and the Control Group children were all tested before the Experimental Group children, missed observations would fall disproportionately on the Control Group. The higher Generalization Probe score for the Experimental Group may therefore be due to exposure to the interactive video or to fewer missed observations of the target behaviors for the Experimental Group children than for the Control Group children.
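The confound in the first item can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions, not part of the study: the detection rates (0.4 rising to 0.9), the group size of 10, and the value of 4 target behaviors per child are all hypothetical. Every child is given the same true behavior, so the video has no effect at all in this sketch, and any difference in recorded scores comes from the observer alone.

```python
def detection_rate(position, n_total, start=0.4, end=0.9):
    """Hypothetical observer sensitivity that rises linearly during testing."""
    return start + (end - start) * position / (n_total - 1)


def expected_group_means(order, true_behaviors=4.0):
    """Expected recorded score per group for a given testing order.

    order is a list of 'C' (Control) and 'E' (Experimental) labels in the
    sequence in which children are tested. Each child shows the same number
    of target behaviors (hypothetical value), so there is no true effect.
    """
    n_total = len(order)
    recorded = {"C": [], "E": []}
    for position, group in enumerate(order):
        recorded[group].append(true_behaviors * detection_rate(position, n_total))
    return {group: sum(vals) / len(vals) for group, vals in recorded.items()}


# All Control Group children are tested first, as in the example.
blocked_order = ["C"] * 10 + ["E"] * 10
means = expected_group_means(blocked_order)
difference = means["E"] - means["C"]

print("Control mean:", round(means["C"], 2))
print("Experimental mean:", round(means["E"], 2))
print("Spurious difference:", round(difference, 2))
```

Under these assumptions the Experimental Group receives a higher recorded mean even though the two groups behaved identically, which is the pattern the analysis warns about.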

In the second item, because the recording location remained constant throughout probe testing, we expect the rate of missed observations also to be constant. Moreover, even if observer proficiency did change over time for other reasons, perhaps as a result of fatigue, the random ordering during the Generalization Probe made it likely that roughly equal numbers of students from the two groups were tested early and late. Thus, while missed observations may increase as more students are tested, they would be distributed approximately equally between the two groups. With these two procedural changes, we can be more confident that the better Generalization Probe score for the Experimental Group was not the result of instrumentation.
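The two safeguards in the second item can be checked with a short simulation. This is a minimal sketch under hypothetical assumptions: the fatigue curve (detection falling from 0.9 to 0.4), the constant rate of 0.8, the group size of 10, and the 4 target behaviors per child are invented for illustration. As before, every child behaves identically, so any recorded difference is an artifact of observation.

```python
import random

TRUE_BEHAVIORS = 4.0  # hypothetical; identical for every child (no true effect)


def fatigue(position, n_total):
    """Hypothetical observer fatigue: detection falls linearly from 0.9 to 0.4."""
    return 0.9 - 0.5 * position / (n_total - 1)


def constant(position, n_total):
    """Fixed observation post: detection does not change during testing."""
    return 0.8


def observed_difference(order, detect):
    """Experimental minus Control mean of the expected recorded scores."""
    n_total = len(order)
    scores = {"C": [], "E": []}
    for position, group in enumerate(order):
        scores[group].append(TRUE_BEHAVIORS * detect(position, n_total))
    return sum(scores["E"]) / len(scores["E"]) - sum(scores["C"]) / len(scores["C"])


children = ["C"] * 10 + ["E"] * 10
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Random testing order, repeated many times, with a fatigued observer.
differences = []
for _ in range(2000):
    order = children[:]
    rng.shuffle(order)
    differences.append(observed_difference(order, fatigue))
average_bias = sum(differences) / len(differences)

# Control-first order for comparison, with and without observer drift.
blocked_bias = observed_difference(children, fatigue)
constant_bias = observed_difference(children, constant)

print("Average bias, random order with fatigue:", round(average_bias, 3))
print("Bias, Control-first order with fatigue:", round(blocked_bias, 3))
print("Bias, constant observation:", round(constant_bias, 3))
```

In this sketch a constant observation procedure produces no bias in any testing order, and random ordering makes the bias from fatigue cancel on average. Any single random order can still leave a small imbalance, which is why the groups are described as only approximately balanced.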