Final Consensus of Reliability

Inter-rater reliability for the fourth design cycle was generally high, although some code definitions still need to be refined. Overall, the observers had an average agreement of 81.39%. Agreement was strongest for the “Teacher is Resource” and “Student uses Student” codes, which were the most common codes in all videos and the most heavily negotiated (Figures 11, 12).

Figure 11. Sample comparison of “Teacher is Resource” for two observers, showing 95.6% agreement.

Figure 12. Sample comparison of “Student uses Student” for two observers, showing 94.6% agreement.

There was some disagreement on “Student uses Teacher,” but the disagreements appear to be durational. Overall, observers agreed on when “Student uses Teacher” events began. The difference appears to be whether the observer included the teacher’s response in the “Student uses Teacher” code or coded only the student asking the question (Figure 13).

Figure 13. Sample comparison of “Student uses Teacher” showing the longer codes that result from including the teacher’s response. This example has 71.1% agreement.

The greatest disagreement between observers was for the rarely occurring codes, “Teacher points to Resource” and “Student uses Object.” A sample of the disagreement on “Teacher points to Resource” can be found in Figure 14. Only one observer coded “Student uses Object” in the final run. It appears that these rare codes still need to be better negotiated and defined.

Figure 14. Sample comparison of “Teacher points to Resource” showing low agreement. This example shows 57.7% agreement overall. Counting only the durations in which the code was applied, agreement is 5%.
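
The gap between the two figures reported for a rare code can be illustrated with a minimal sketch. It assumes, hypothetically, that each observer's coding is sampled into equal time bins marked 1 when the code is applied and 0 otherwise; the source does not state how agreement was actually computed, and the function names and timelines below are illustrative only, not study data.

```python
def percent_agreement(a, b):
    """Percent of time bins on which two observers made the same 0/1 judgement."""
    if len(a) != len(b) or not a:
        raise ValueError("timelines must be non-empty and equally long")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def agreement_given_code(a, b):
    """Percent agreement restricted to bins where at least one observer applied the code."""
    if len(a) != len(b):
        raise ValueError("timelines must be equally long")
    either = sum(1 for x, y in zip(a, b) if x or y)
    both = sum(1 for x, y in zip(a, b) if x and y)
    if either == 0:
        return None  # neither observer ever applied the code
    return 100.0 * both / either


# Hypothetical 20-bin timelines for a rarely applied code (not study data).
observer_1 = [0] * 10 + [1] * 4 + [0] * 6   # code applied in bins 10-13
observer_2 = [0] * 13 + [1] * 4 + [0] * 3   # code applied in bins 13-16

overall = percent_agreement(observer_1, observer_2)
conditional = agreement_given_code(observer_1, observer_2)
print(f"overall: {overall:.1f}%, given code: {conditional:.1f}%")
```

Under these assumptions, the long stretches where neither observer applies the code count as agreement in the overall figure, so a rare code can score moderately well overall while the observers almost never overlap when the code is actually used.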