chess rating algorithm that performs better than ELO system

lmader · Post by **lmader** » Thu Aug 05, 2010 12:09 am

Interesting article about finding a chess rating algorithm that performs better than the official Elo rating system:

http://games.slashdot.org/story/10/08/0 ... e-Over-Elo

BB+ · Post by **BB+** » Thu Aug 05, 2010 5:25 am

I don't know about the data pool in this experiment, but just switching the expectation curve from a bell curve (normal distribution) to a logistic one can give a noticeable correction when typical games are between players with a large skill differential.
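
To make the bell-versus-logistic point concrete, here is a minimal sketch in Python comparing the two expectation curves. It assumes the textbook parameters (a per-player standard deviation of 200 points for the normal model, and a base-10 logistic with a scale of 400); the thread does not say which parameters the competition benchmark uses, and the function names are made up for illustration.

```python
import math

def expected_logistic(diff: float) -> float:
    """Expected score for the higher-rated side under the logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def expected_normal(diff: float, sigma: float = 200.0) -> float:
    """Expected score under the normal (bell) model.

    Assumes each player's performance has standard deviation `sigma`,
    so the difference of two performances has deviation sigma * sqrt(2).
    """
    z = diff / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Print both curves side by side for a range of rating differences.
for diff in (0, 100, 200, 400, 600, 800):
    print(diff, round(expected_logistic(diff), 4), round(expected_normal(diff), 4))
```

Under these assumptions the two curves nearly coincide for small rating differences and drift apart in the tails, where the normal model gives the favorite a higher expected score than the logistic one.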

However, Elo Benchmark is back on top (as of right now):


| # | Team Name | RMSE | #Submits | Date of Latest Submission |
|---|---|---|---|---|
| 1 | Elo Benchmark | 0.723834 | 4 | 1:27pm, Thursday 5 August 2010 |
| 2 | EdROpen | 0.729125 | 2 | 3:47pm, Wednesday 4 August 2010 |
| 3 | whiteknightOpen | 0.731656 | 4 | 6:29pm, Wednesday 4 August 2010 |
| 4 | Chris_ROpen | 0.742663 | 2 | 1:27am, Thursday 5 August 2010 |
| 5 | ulvundOpen | 0.742744 | 8 | 10:23am, Thursday 5 August 2010 |
| 6 | FirstTryOpen | 0.833059 | 1 | 2:01pm, Thursday 5 August 2010 |

"Note, the method for creating seed ratings for Elo Benchmark is being refined, so don't be surprised if the benchmark improves a little in the competition's first week."

I don't know if RMSE is the best metric here. Listing it to 6 sigfigs (and with no error bound) with only 781 data points (to be 7809 in the end) is not a good sign from the quality-control standpoint.
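
As a rough illustration of the error-bound concern, the sketch below computes an RMSE together with a percentile bootstrap interval. Everything in it is an assumption made for illustration: the data are synthetic (781 made-up predictions and outcomes, not the contest data), the function names are hypothetical, and the competition may well aggregate predictions differently before scoring.

```python
import math
import random

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def bootstrap_interval(predicted, actual, resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the RMSE, resampling data points."""
    rng = random.Random(seed)
    n = len(predicted)
    stats = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([predicted[i] for i in idx], [actual[i] for i in idx]))
    stats.sort()
    lo = stats[int(resamples * alpha / 2)]
    hi = stats[int(resamples * (1 - alpha / 2)) - 1]
    return lo, hi

# Synthetic data only: made-up predictions and win/loss outcomes.
rng = random.Random(42)
predicted = [rng.uniform(0.2, 0.8) for _ in range(781)]
actual = [1.0 if rng.random() < p else 0.0 for p in predicted]

point = rmse(predicted, actual)
low, high = bootstrap_interval(predicted, actual)
print(f"RMSE = {point:.3f}, 95% bootstrap interval = [{low:.3f}, {high:.3f}]")
```

If the printed interval is wider than the gaps between neighbouring leaderboard entries, the extra digits are not telling those entries apart.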

"We reserve the right to disqualify any competitor who is blatantly attempting to decode the leaderboard portion of the test dataset." Well, at least the RE guys have something to do...

hyatt · Post by **hyatt** » Fri Aug 06, 2010 4:38 pm

My take on this is that the Elo system is nothing more than a curve that approximates expected results, which has been tuned to be as accurate as possible when dealing with humans playing the game of chess. Curve-fitting is taught in most every numerical analysis course. And it's based on the idea of minimizing the error of the curve when matched against observed data. But humans and computers are different. And perhaps humans have changed enough since this was first done to make the system need additional tuning. Maybe the original approach is too simplistic. And then again, the Elo system is abused daily when different sorts of time controls are co-mingled into one rating pool. A human playing game/5min is quite different from a human playing with even a 1 sec increment. This only deals with wins and losses. What about when one player is sick? What happens when one plays several consecutive games? What about an adjourned game that resumes after the final round of the day so that one player stays up late, another does not? All of these influence the games, yet none of them are factored into the rating system to affect future predictions.
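
The curve-fitting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Elo curve was actually tuned: it assumes a logistic expectation curve with a single free scale parameter, uses made-up observations generated from a scale of 400, and picks the scale with the smallest sum of squared errors by a simple grid search. All names are hypothetical.

```python
def expected(diff, scale):
    """Logistic expected score for a rating difference and a curve scale."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

def sum_squared_error(scale, observations):
    """Total squared error of the curve against (difference, score) pairs."""
    return sum((score - expected(diff, scale)) ** 2 for diff, score in observations)

def fit_scale(observations, candidates):
    """Return the candidate scale that minimizes the squared error."""
    return min(candidates, key=lambda s: sum_squared_error(s, observations))

# Made-up, noise-free observations: (rating difference, average score).
observations = [(d, expected(d, 400.0)) for d in range(-600, 601, 100)]

best = fit_scale(observations, range(100, 801, 10))
print(best)  # prints 400, the scale the synthetic data were generated with
```

With real game results the observations would be noisy, and the factors listed above (time control, fatigue, illness) would show up only as unexplained error around the fitted curve.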

benstoker · Post by **benstoker** » Wed Aug 11, 2010 4:16 pm

hyatt wrote: My take on this is that the Elo system is nothing more than a curve that approximates expected results, which has been tuned to be as accurate as possible when dealing with humans playing the game of chess. Curve-fitting is taught in most every numerical analysis course. And it's based on the idea of minimizing the error of the curve when matched against observed data. But humans and computers are different. And perhaps humans have changed enough since this was first done to make the system need additional tuning. Maybe the original approach is too simplistic. And then again, the Elo system is abused daily when different sorts of time controls are co-mingled into one rating pool. A human playing game/5min is quite different from a human playing with even a 1 sec increment. This only deals with wins and losses. What about when one player is sick? What happens when one plays several consecutive games? What about an adjourned game that resumes after the final round of the day so that one player stays up late, another does not? All of these influence the games, yet none of them are factored into the rating system to affect future predictions.

Or if one side is using mind control rays; reference --> Fischer / Spassky

lmader · Post by **lmader** » Wed Aug 11, 2010 7:37 pm

benstoker wrote: Or if one side is using mind control rays; reference --> Fischer / Spassky

Good call. Also Topalov / Kramnik

jcrusious · Post by **jcrusious** » Fri Jul 08, 2011 11:12 am

lmader wrote:
benstoker wrote: Or if one side is using mind control rays; reference --> Fischer / Spassky
Good call. Also Topalov / Kramnik

Yes, I also want to know about that. You have gone into a lot of detail about computers and humans.

noctiferus · Post by **noctiferus** » Sun Jul 10, 2011 5:32 pm

If anybody is interested in more details, this is the site of the rating competition:
http://www.kaggle.com/c/ChessRatings2

(I didn't check if datasets are still available...)

noctiferus · Post by **noctiferus** » Sun Jul 10, 2011 5:38 pm

Of course, for a correction to the Elo system, an evaluation of the modified rating, and a huge rating exercise on old and current players (interesting!), you can also look at Sonas' site:
www.chessmetrics.com

hyatt · Post by **hyatt** » Sun Jul 10, 2011 5:46 pm

The Elo system for humans has worked just fine for years. The problem is that you want relatively slow change after some time, because a human's real skill does not change, although his health and mental acuity vary day by day. Computers are a different kind of player entirely. Their strength is high, compared to humans. Their consistency is off the charts compared to humans: they don't get tired, sick, irritable, distracted, hungry, bored, etc. They can play 100 games in a row, non-stop, and play just as strongly in the last game as they did in the first. No human can do that. So at least the smoothing component of the Elo system is not as well tuned for computers as it could be. But then again, I doubt _any_ rating system will fit humans _and_ computers perfectly. It seems almost impossible.
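
For readers unfamiliar with the update rule, here is a minimal Python sketch of a single Elo rating update. It reads the "smoothing component" as the K-factor, which is an interpretation rather than something stated in the post; the K values 32 and 10 and the function names are illustrative assumptions, not the settings of any particular federation or rating list.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def update_rating(rating_a: float, rating_b: float, score_a: float, k: float) -> float:
    """New rating for A after one game; score_a is 1, 0.5 or 0.

    A larger k reacts faster to each result; a smaller k smooths more.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

# Same result (a win between equally rated players), two illustrative K values.
print(update_rating(1500, 1500, 1.0, k=32))  # 1516.0
print(update_rating(1500, 1500, 1.0, k=10))  # 1505.0
```

A small K suits a player whose results swing from day to day around a stable skill level, while a very consistent player such as an engine could in principle tolerate a different setting.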

noctiferus · Post by **noctiferus** » Sun Jul 10, 2011 8:48 pm

Bob, I fully agree about the differences between human and computer ratings (MHO isn't very relevant, of course).
As far as human performance is concerned, however, it looks like Sonas' method has a less inertial response without being too prone to short-term variations. IMHO, it is a good compromise between a nervous evaluation and a rock-solid one.
Of course, the engine scenario is quite different. Maybe it could be used as a fairly loose trend evaluation for predicting engine strength, but expect a lot of jumps, anomalous predictions, etc.
