Ban: Automatic Learning of Evaluation [...]

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Ban: Automatic Learning of Evaluation [...]

Post by BB+ » Thu May 10, 2012 7:51 am

Automatic Learning of Evaluation, with Applications to Computer Chess, by Amir Ban. It seems to be part of GAMES 2012; the schedule lists it in the poster session.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Ban: Automatic Learning of Evaluation [...]

Post by BB+ » Thu May 10, 2012 9:38 am

Abstract. A new and fast learning method is described in the context of teaching a program to play chess. A theory of the meaning of a position evaluation is developed, and is then confronted with a large collection of games played by masters or other programs. The program learns by fitting its evaluation to better predict the results of the games. The method has been employed by a top-rated program for the past 10 years, and has earned several world championships and successful matches against the world’s best grandmasters for the program. The effectiveness of the method is demonstrated by showing its successful prediction of known playing strength of the programs.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: Ban: Automatic Learning of Evaluation [...]

Post by User923005 » Fri May 11, 2012 7:32 pm

To me, Amir Ban is a hero of computer chess.
He is sharing the innermost secrets of his best ideas about how to make a chess engine play better.
This runs against the prevailing wind, and it is also the way it should be.

I downloaded that paper and I will read it tonight. It looks very interesting. I'm a math geek anyway, so I will enjoy it one way or another due to its mathematical tenor.

mballicora
Posts: 26
Joined: Tue Aug 09, 2011 7:58 pm
Real Name: Miguel A. Ballicora

Re: Ban: Automatic Learning of Evaluation [...]

Post by mballicora » Sat May 12, 2012 7:25 am

I have been doing almost exactly this since ~2007 and got it to work by 2009.
The implementation seems to be different (though they do not describe the improvement part), but the theory is almost identical, even using the same logistic equation (I do not include any drawishness term), which I based on the wiki http://chessprogramming.wikispaces.com/ ... 2C+and+ELO (Adam Hair recently confirmed this relationship with much better data).
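To make the fit concrete, here is a minimal sketch in Python: the evaluation is pushed through an Elo-style logistic curve to get an expected score, and the parameters are judged by how well that prediction matches the actual game results. The scale constant, the squared-error objective, and the evaluate callback are illustrative assumptions, not necessarily the exact formulation in the paper or in any particular program.

[code]
def expected_score(eval_cp, scale=400.0):
    """Elo-style logistic curve: map a centipawn evaluation
    to an expected game score in [0, 1]. The scale constant
    is an assumption, not a value from the paper."""
    return 1.0 / (1.0 + 10.0 ** (-eval_cp / scale))

def fit_error(params, positions, results, evaluate):
    """Mean squared difference between the predicted score of each
    position and the actual game result (1 = win, 0.5 = draw,
    0 = loss). `evaluate` is a hypothetical engine evaluation
    taking (position, params) and returning centipawns."""
    err = 0.0
    for pos, result in zip(positions, results):
        err += (result - expected_score(evaluate(pos, params))) ** 2
    return err / len(positions)
[/code]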

This comment caught my attention:

4.4. Effect on Style
Possibly the most intriguing result of the use of the current method has been on the playing style of the Deep Junior programs: It has become daring, attacking and disdainful of "material" in a way that is entertaining to human observers and often produces brilliancies. Commentators on Deep Junior have called [1] this style "speculative", "human", "understanding of the concept of compensation" and have likened it to the playing styles of former world champion Mikhail Tal and, by former world champion Garry Kasparov, to himself [14].


Compare it with the title of my post here!
http://talkchess.com/forum/viewtopic.ph ... 66&start=0

It seems that the method tends to give a dynamic style?

A little more of the description is here
http://talkchess.com/forum/viewtopic.ph ... =&start=11

A bit more discussion appears in a thread initiated by Marcel about his method:
http://talkchess.com/forum/viewtopic.php?t=40166

I made comments about it later, but almost nobody was interested in talking about it, with very few exceptions. This "tuning" topic has always involved some sort of secrecy, or nobody considered it seriously. Of course, now people will look at it differently.

Miguel

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Ban: Automatic Learning of Evaluation [...]

Post by BB+ » Sun May 13, 2012 4:59 pm

I might note that (from the internal evidence) the paper appears to date from 2008.
mballicora wrote:(though they do not describe the improvement part)
In fact, in practice I might guess that the "improvement part" (step 2 of algorithm 3) could be more relevant than the rest. Maybe the speed of feedback allows one to tune multiple parameters more easily than with game testing, so issues of hill-climbing methodology are not so prevalent?
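As a rough illustration of what such an improvement step might look like when feedback is this cheap, here is a naive coordinate-wise hill climb over a fit objective like the one sketched above; the step size and stopping rule are assumptions, and this is not a description of Ban's actual step 2:

[code]
def hill_climb(params, objective, step=1.0, max_passes=20):
    """Naive coordinate-wise hill climbing: nudge each parameter up
    or down by `step` and keep any change that lowers the objective.
    A placeholder sketch, not Ban's actual improvement step."""
    params = list(params)
    best = objective(params)
    for _ in range(max_passes):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                score = objective(trial)
                if score < best:
                    params, best, improved = trial, score, True
                    break
        if not improved:
            break  # no single-parameter change helps at this step size
    return params, best
[/code]

Each objective evaluation is just a pass over stored positions rather than thousands of games, so many such passes are affordable; that is the feedback-speed advantage over game testing.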
mballicora wrote:I made comments about it later, but almost nobody was interested in talking about it, with very few exceptions. This "tuning" topic has always involved some sort of secrecy, or nobody considered it seriously. Of course, now people will look at it differently.
I too have typically found the silence about "tuning" in general rather odd, with the main chatter being various personae (notably VD, and VR to some extent) claiming that they [or the NSA] have super-secret methods with great superiority over whatever everyone else is using. :? Certainly the mainstream has gone toward game-testing rather than position-testing in recent years, though this could be herd mentality more than anything else. [Incidentally, I would tend to guess (following DD's comment) that temporal-difference learning or something like it would likely be somewhat superior to win-draw-loss accounting.]
mballicora wrote:It seems that the method tends to give a dynamic style?
I had mused that this might be due to the drawishness measurement in Junior, but now you report the same result with no such parameter. MvK says: "I'm speculating that this style is because the tuning now favors positions where humans tend to lose their games" -- but using comp-comp games should reduce that? It is somewhat of a mystery...

Here is another quotable from Ban (emphasis in original):
Amir Ban wrote:[...] Chess programmers still code expert chess knowledge into their evaluation functions, to the best of their understanding and capability, subject to its observed success in improving the results achieved by their programs (itself a time-consuming and statistically error-prone determination).

The field lacks a theory of evaluation which is able to suggest answers to any of the following questions: What is the meaning of the numerical value of the evaluation? What makes an evaluation right? In what sense can it be wrong? Between two evaluation functions, how to judge which is better?

It is the purpose of this article to suggest a theory of evaluation within which these questions and related ones can be answered. Furthermore I will show that once such a theory is formulated, it may immediately and economically be tested vis-a-vis the entire wealth of recorded chess game repertoire. As a result I will derive a novel automatic learning procedure that works by fitting its evaluation to recorded game results. While new to computer games, this method bears similarity to techniques of maximum-likelihood optimization and logistic regression widely used in medical and social sciences.
I tend to be skeptical and think some of this to be hyperbole, but would enjoy being proven incorrect.
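For concreteness, the maximum-likelihood / logistic-regression connection he draws can be written as a cross-entropy objective: score each game result as 1 / 0.5 / 0 and minimize the negative log-likelihood of those results under the logistic prediction. A sketch under the same assumptions as the earlier snippet:

[code]
import math

def neg_log_likelihood(params, positions, results, evaluate, scale=400.0):
    """Negative log-likelihood of the observed results (1 = win,
    0.5 = draw, 0 = loss) under the logistic prediction; minimizing
    this is the maximum-likelihood analogue of the squared-error
    fit sketched earlier. `evaluate` is a hypothetical callback."""
    nll = 0.0
    for pos, result in zip(positions, results):
        p = 1.0 / (1.0 + 10.0 ** (-evaluate(pos, params) / scale))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp away from 0 and 1
        nll -= result * math.log(p) + (1.0 - result) * math.log(1.0 - p)
    return nll
[/code]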

marcelk
Posts: 52
Joined: Fri Jan 28, 2011 10:27 pm
Real Name: Marcel van Kervinck

Re: Ban: Automatic Learning of Evaluation [...]

Post by marcelk » Sun May 13, 2012 11:14 pm

BB+ wrote:
mballicora wrote:It seems that the method tends to give a dynamic style?
I had mused that this might be due to the drawishness measurement in Junior, but now you report the same result with no such parameter. MvK says: "I'm speculating that this style is because the tuning now favors positions where humans tend to lose their games" -- but using comp-comp games should reduce that? It is somewhat of a mystery...
I have recently completed the testing with a reduced number of human input games (45% CCRL, 45% GM, 10% own server games). The resulting style is unchanged, so my speculative hypothesis is rejected.

It is indeed interesting to see three independent reports of the same phenomenon ("dynamic Tal-like play") emerging from comparable tuning methods. (I also do the optimization in the [0..1] domain after applying the logistic function, btw.) Maybe it is the nature of the game exposing itself, and manual tuning has always suppressed it? Who knows.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Ban: Automatic Learning of Evaluation [...]

Post by BB+ » Mon May 14, 2012 10:26 am

marcelk wrote:It is indeed interesting to see three independent reports of the same phenomenon ("dynamic Tal-like play") emerging from comparable tuning methods.
Not to quibble, but I can now count 4: you, MB, AB, and also DD:
Don Dailey wrote: Like your experience, we found that the program was aggressive -- it was making all sorts of interesting sacrifices. I cannot say that they were all sound, but we were testing at limited depth, I think 6 ply. At any depth, a move is a calculated gamble, and being wrong about a sacrifice, whether winning or losing, can cost you. So if there is to be error, then why not err on the side of being too aggressive once in a while? We didn't keep that version, for technical reasons, even though it tested quite well.
