13. Conclusions

Nathan Divinsky in The Chess Encyclopedia [D] calls the Elo System "a mathematically sound and universally accepted (1970) rating system for chess players."  The year refers to the adoption of the Elo System by FIDE. In fact, aside from Elo's 1965 study in The Journal of Gerontology, there has been virtually no peer review of the system beyond the world of organized chess. One of the few references appears in The Mathematics of Games, by J. D. Beasley [B].  Beasley offers this scathing footnote on the work of the late Professor Elo:

His statistical testing is unsatisfactory to the point of being meaningless; he calculates standard deviations without allowing for draws, he does not always appear to allow for the extent to which his test results have contributed to the ratings which they purport to be testing, and he fails to make the important distinction between proving a proposition true and merely failing to prove it false.

Beasley nevertheless accepts the premise of the Percentage Expectancy Curve.  His basic concern is the difficulty of demonstrating any suitable probability function for that role.  Again the essential incoherence of Elo's position seems to have escaped notice.

The proof of the pudding, it has been said, is the actual operation of a rating system, and the Elo System has been grinding out chess ratings for over four decades now with hardly a grumble from the rating pool.  One is tempted to say that the system works despite its theory rather than because of it.  The reputation of the Elo System, on the other hand, rests largely on its supposed ability to predict chess outcomes.  There is even the occasional inquiry as to whether the system can predict outcomes in sports such as basketball, football, golf and soccer.

The principles of rating theory undoubtedly have applications beyond chess. As Elo said of his own system, it is "applicable to any type of competitive activity in which individuals or teams engage in pairwise competition" [E1].  To this may be added applications to noncompetitive pairwise comparisons, for example in opinion sampling for marketing research. One would hope that the current controversy is resolved before such wholesale applications.  For some, however, the allure of rating theory lies in the controversy itself.  It is a controversy that has not yet been played out in organized chess and a cautionary tale for all involved.