Are there differences between behavioral measurement methods? A comparison of the predictive validity of two ratings methods in a working dog program
Wilsson, E and Sinn, DL, Are there differences between behavioral measurement methods? A comparison of the predictive validity of two ratings methods in a working dog program, Applied Animal Behaviour Science, 141, (3-4) pp. 158-172. ISSN 0168-1591 (2012) [Refereed Article]
Consistent behavioral variation within and between individuals is ubiquitous in all working dog populations. Most working dog programs have recognized this fact, and have subsequently attempted to quantify behavior through the use of standardized tests. Standardized tests may employ several measurement methods, but two common ones are behavioral ratings and subjective ratings. The former is characterized by a rating for a behavior (e.g., reaction to a noise) usually based on a single observation or test; the latter is characterized by a rating for a trait (e.g., confidence) that is based across multiple observations of behavior. The main difference between the two rating methods is the level of aggregation, or intuition, that is required by the human observer. Measurement theory predicts that ratings based on multiple observations (i.e., subjective ratings) should be more reliable because measurement error is reduced. However, subjective ratings, by definition, may be susceptible to observer bias, in which case ratings based on fewer, but better-defined observations (i.e., behavioral ratings) could result in greater reliability. In either case, the ultimate criterion of most working dog programs is the predictive validity of measured behaviors in standardized tests. To the best of our knowledge, the relative predictive validity of subjective and behavioral ratings has yet to be tested within the same working dog population. Here we analyzed behavioral test results along with training outcomes from a large sample (∼400) of German shepherd dogs, aged 15–18 months, all bred at the Swedish Armed Forces breeding kennel. Behaviors observed in the test were measured using 25 behavioral and 13 subjective ratings.
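The measurement-theory argument above (aggregating over multiple observations reduces measurement error) is commonly formalized in psychometrics by the Spearman-Brown formula, which predicts the reliability of the mean of k parallel observations from the reliability r of a single one: k·r / (1 + (k − 1)·r). The sketch below is an illustration of that general result only, not an analysis from this study; the function name and the example reliability value are hypothetical. It also makes the observer-bias caveat concrete: the formula assumes independent errors, so a bias shared by every observation is not averaged away.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of the mean of k parallel observations.

    r: reliability of a single observation (0..1).
    k: number of observations aggregated (k > 0).
    Assumes errors are independent across observations; a systematic
    observer bias common to all observations is NOT reduced by averaging.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    single_obs_reliability = 0.4  # hypothetical value, not taken from the study
    for k in (1, 2, 4, 8):
        # Reliability rises with k, but with diminishing returns.
        print(k, round(spearman_brown(single_obs_reliability, k), 3))
```

Running the loop shows reliability climbing toward 1 as k grows, which is the sense in which aggregated (subjective) ratings are expected to be more reliable, provided the independence assumption holds.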
Data reduction and confirmatory techniques identified five underlying dimensions in behavioral ratings (confidence, physical engagement, social engagement, aggression, and environmental sureness) and three in subjective ratings (engagement, confidence, and aggression). Both ratings methods correctly classified a high percentage of dogs that did/did not complete training (70.3–78.3%). However, only minor differences in predictive validity were observed between the two measurement methods (1.7–6.6%). Engagement and confidence, irrespective of measurement method, were the strongest predictors of training completion, but the two ratings methods identified different aspects of engagement and confidence that may be important to training outcomes in this and other working dog programs. Taken together, our results suggest that in some cases, the use of subjective versus behavioral ratings may be inconsequential from the standpoint of prediction to training criterion. Further empirical verification is needed, along with improvements in the explicit definition and measurement of ‘success’ in this and other working dog programs.
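The abstract reports predictive validity as the percentage of dogs correctly classified as completing or not completing training, but does not say which classifier produced those predictions. The sketch below shows only how such a percentage, and a between-method difference in percentage points, is computed from a 2×2 table of predicted versus actual outcomes. The class, function, and all counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class OutcomeTable:
    """Hypothetical 2x2 table: predicted vs. actual training outcome."""
    pred_complete_actual_complete: int
    pred_complete_actual_fail: int
    pred_fail_actual_complete: int
    pred_fail_actual_fail: int

    def total(self) -> int:
        return (self.pred_complete_actual_complete + self.pred_complete_actual_fail
                + self.pred_fail_actual_complete + self.pred_fail_actual_fail)


def percent_correct(t: OutcomeTable) -> float:
    """Percentage of dogs whose predicted outcome matches the actual outcome."""
    n = t.total()
    if n == 0:
        raise ValueError("table is empty")
    correct = t.pred_complete_actual_complete + t.pred_fail_actual_fail
    return 100.0 * correct / n


if __name__ == "__main__":
    # Invented counts for two rating methods applied to the same 250 dogs.
    method_a = OutcomeTable(150, 30, 20, 50)
    method_b = OutcomeTable(140, 35, 30, 45)
    a, b = percent_correct(method_a), percent_correct(method_b)
    print(f"method A: {a:.1f}%  method B: {b:.1f}%  difference: {a - b:.1f} points")
```

Percent correct is easy to read but can look high when one outcome dominates the sample, so per-class rates (e.g., the share of non-completers correctly identified) are worth reporting alongside it.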