Abstract
To test the reliability of musical experts' evaluations of musical performances, the jury protocols of an international music competition were subjected to statistical analysis. The object of the analysis was a set of 2156 ratings, expressed as points and ranks, given by the 28 members of the jury to 77 different performances of one of Fryderyk Chopin's polonaises during the first stage of the competition. The analysis revealed the following: 1. very large interpersonal (inter-rater) differences in the jurors' ratings; 2. despite these differences, the inter-rater agreement of the jurors' evaluations was highly statistically significant (p < .001); 3. despite this highly significant agreement, the agreement accounts for only 1/3 of the total variance of the ratings. Conclusion: individual ratings of musical performance are not a reliable measure of musical achievement, even when given by music experts of the highest level.
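The figures above describe a complete two-way layout: 28 jurors × 77 performances = 2156 ratings. The abstract does not state which statistic produced the "1/3 of the variance" result. One common way to express such a share is sketched below, under the assumption of one rating per juror and performance: compute the proportion of the total sum of squares that is explained by differences between performances, an eta-squared-type measure. The function name `performance_variance_share` and the simulated data are hypothetical illustrations, not the authors' procedure or results.

```python
import numpy as np

def performance_variance_share(ratings):
    """Proportion of total variance explained by differences between performances.

    Assumes a complete matrix: rows = performances, columns = jurors,
    one rating per cell (a hypothetical layout, not taken from the study).
    """
    r = np.asarray(ratings, dtype=float)
    grand_mean = r.mean()
    n_jurors = r.shape[1]
    # Between-performance sum of squares: spread of each performance's mean
    # rating around the grand mean, weighted by the number of jurors.
    ss_performances = n_jurors * ((r.mean(axis=1) - grand_mean) ** 2).sum()
    # Total sum of squares over all individual ratings.
    ss_total = ((r - grand_mean) ** 2).sum()
    return ss_performances / ss_total

if __name__ == "__main__":
    # Simulated (NOT real) data with the study's dimensions: 77 performances x 28 jurors.
    rng = np.random.default_rng(0)
    true_quality = rng.normal(0.0, 1.0, size=(77, 1))    # shared component all jurors perceive
    juror_noise = rng.normal(0.0, 1.5, size=(77, 28))    # idiosyncratic component per juror
    ratings = true_quality + juror_noise
    assert ratings.size == 2156                          # 77 * 28 individual ratings
    print(f"share explained by performances: {performance_variance_share(ratings):.2f}")
```

A share well below 1 means that most of the spread in individual ratings comes from juror-specific differences rather than from differences between the performances themselves. Even so, with many performances and jurors, the agreement can still be highly statistically significant, which is the pattern the abstract reports.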