Some tablebase formats

General discussion about computer chess...
BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Some tablebase formats

Post by BB+ » Thu Nov 25, 2010 11:54 pm

An old FAQ: http://www.horizonchess.com/FAQ/Winboard/egtb.html You can read about formats like Ferret, Nimzo, Yace, and maybe more.

I will discuss the more well-known ones, and ones that post-date that page.

Nalimov
By Eugene Nalimov, circa 1998?
Generation code has appeared in various public/private places; the most useful is perhaps the rewriting by Gian-Carlo Pascutto (see http://talkchess.com/forum/viewtopic.php?t=29745 ), though I'm not sure the compression failures were ever resolved.

It takes about 1 day to build a 6-piece ending (not sure if with a pawn?), with it being bound by streaming I/O. Not sure what the RAM requirements are, and it uses the grandfather method I think. The 6 piece do not include 5 vs 1, and take about 1.2TB; the 5 piece are around 7.4GB. Uses 8K blocks with compression from code of Kadatch, and can take about 20 minutes to load the indexing for the 6 piece (I'm not even sure how much RAM this uses, but around 8GB).

One of the main problems is that it is not easy to obtain a license from Nalimov. There is a paper with Nalimov as a co-author that tells about indexing schemes, though in retrospect (due to nearly free compression of broken positions), it is not clear whether these are all that useful, at least for disk storage. It lists some main objectives as: White/Black treated symmetrically, practical and efficient for OTB play (indexing should be compact, 8K blocks are called "time-optimal", side-to-move men are clustered). By far the most used amongst aficionados, even with all the warts.

FEG ftp://ftp.ubisoft.com/games/chessmaster ... Readme.TXT
Made by Johan de Koning (1998-2003). A generator came with ChessMaster. Can build 6 piece in 512MB of RAM, and Bourzutschky built RPP vs R in about a month using 3 desktop computers many years ago. He built a subset of the 6 piece, already taking about 0.5 terabytes.

Says all 5 piece can be built on 1.5GHz machine (2002) in less than 5 days. Unclear if parallel build is possible, but likely I/O bound for 6 piece in any event. I'm not sure which method it uses; in the Readme it says "backmoves" are generated, but that doesn't imply too much. Uses DTM with length-to-mate or not-win (losses/draws are the same) I think, with about 5.7GB for the 5 piece. Has a few ambiguities with mates in more than 254 (only in RN vs NN I think). Some versions of the generator had bugs with en passant. Has a non-public format, in 32K chunks (reverse engineered by Bourzutschky). Can actually build 7 piece, and even 5 vs 5 supposedly. :)

Scorpio bitbases http://cap.connx.com/chess-engines/scorpio/
From Daniel Shawul (2006?), and also used in Toga. These are distributed as 1-sided bases, and use 8K blocks as with Nalimov (it uses some LZ/Huffman method, though not the same of Kadatch's). The generation seems to use the grandfather method. They take around 340MB (though the download is 204MB I think). I do not think they can be loaded efficiently into memory too easily due to the compression method, though presumably a factor of 4 could be saved by decompressing each result into 2 bits rather than a byte. There was a thread about building some 6 piece bases already in 2007, but I don't see where they are available (if at all). It is not clear to me what the current status of the project is.

Shredderbases
By Stefan Meyer-Kahlen (2006?), bitbases for use with the Shredder engine.
These are 441MB for 5 piece, and 157MB for a half-set. I think (not completely sure) they use a method involving capture/check resolution to increase data redundancy, and then some type of run-length encoding. This is similar to the checkers work of Schaeffer et al., and the RobboTripleBases have followed the same path. They can be loaded into memory and then accessed directly w/o any further decompression into a larger table. The 6 piece have been said to be coming soon for a few years.

RobboBases
By "Roberto Pescatore" as it were (Oct 2009?). Open source generator code and bitbases.
First the tablebases. The data format uses 64K blocks (or 1MB for 6 piece I think), and splits up as broken/win/draw/loss-in-X, where distance-to-conversion is used. There is a line in the code: "Suffix_fare (bufI, indici, pro); /* larsson plus sadakane */" so it seems they are using the Larsson/Sadakane method with suffix trees for the compression, in a bzip2-style overall. This seems to work well in practise, as they take 3.4GB for the 5 piece and only 475GB for the 6 piece (including 5 vs 1). The generation code uses the out-counting method, and runs in parallel, building the 5s in around 3 hours on a quad. For 6 piece, they have no disk-build option, and it seems to me that you need about 12GB to build them in RAM, though they say 32GB (and then it takes 3-4 cpu-months). They split pawns up by files in many cases, which allows these to be built in less RAM.

The bitbases are called "TripleBases" (I guess meaning three results), and seem to meant to be used in RAM, at least before the 6 piece became available. There is now some sort of loading option from disk. These use capture-resolution and run-length encoding for compression. There is some stuff about Huffman encoding on top of that in the code, but it seems not to be used (perhaps for speed?). One upshot of this is that, for instance, the 5 piece "TripleBases" use around 570MB in a loadable state, but can be compressed down to about half that (useful when downloading). The 6 piece "TripleBases" are about 100GB on disk. To build a "TripleBase", you need the associated "TotalBase" of above, but for usage you do not.

They provide full download options for everything, including the 6 piece sets. Both types of RobboBases can include something called "BlockedBases", which occur for white/black pawn pairs on say e3/e4. These increase the size of the data by about 40% it seems.

Gaviota http://sites.google.com/site/gaviotache ... blebases-1
By Miguel Ballicora (Feb 2010). This is the newest, and sort of the Swiss army knife of the bunch. For instance, you can plug in your own compression method. The tablebases are distance-to-mate. The 5 piece are currently available, and the plans for 6 piece are unclear. As usual, the concerns about RAM and streaming I/O will come into play, though SMP is already (partially?) feasible in the build process. There is a lot of pawn slicing done for endgames with pawns, though I have not been able to determine the exact impact of this.

I had thought the generation code was available (given that there was a great cry "Finally, Open Source Tablebases!" when the announcement was made), but I didn't find it at http://sites.google.com/site/gaviotache ... e/download and all is said in the original post is the ominous: "PS: The generator code release will come later." With this in mind, I can't really say anything about the bitbases either, other than that they exist, and are said to be "on-the-fly"?!

Josh Shriver has 145 files for 3/4/5s that can be downloaded, and they total up to about 6.5GB, but as I say, the specific
compression can be changed (see http://sites.google.com/site/gaviotachessengine/schemes for more). There seem to be version updates on a fairly regular basis, though it is unclear what the next major development goal is (for instance, is 6 piece being pursued?).

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Some tablebase formats

Post by BB+ » Fri Nov 26, 2010 12:04 am

So, to the best of my knowledge, here is a brief table for formats:

Code: Select all

           Format   5s Size  Has 6 piece    Notes
FEG          DTM     5.7GB    Yes  500GB+
Nalimov      DTM     7.2GB    Yes, 1.2TB    Doesn't have 5 vs 1
Gaviota      DTM     6.5GB     No           Multiple compression methods
RobboBases   DTC     3.4GB    Yes, 475GB    Also with blocked pawns
Scorpio      WLD     340MB*    No           Size is for half the bases
Shredder     WLD     441MB     No           Only 157MB for half the bases
TripleBases  WLD     570MB    Yes, 100GB    Also with blocked pawns
For bitbases, other criteria (like access time) might be relevant also.

And for generation code:

Code: Select all

           Source    Method   Time for 5s   6s?  Notes
FEG           No    unknown     < 1 day?   Yes   Not sure if uses SMP(!)
Nalimov      Yes*   gfather    1-2 days?   Yes   Bound by streaming I/O for 6s
Gaviota       No    unknown    ~36 hours    No
RobboBases   Yes   out-count  12 cpu-hrs   Yes   6 piece requires 12GB+ RAM
Scorpio      Yes    gfather?      ?         No
TripleBases  Yes    derived     ~2 hours   Yes   Requires RobboBases to build
The 5s time might be misleading in extrapolation, as the method for 6s (and SMP scaling) might differ.

With Gaviota, the type of compression used can also affect the timings. Although the tbgen is SMP, the "transitions" phase (more than half the time in some cases, though not overall) seems to use only 1 cpu. It also seems that one cannot generate a desired subset of the tablebases, but the options are just "3", "4", or "5". I was a bit disappointed at slow it was, given that FEG on one processor appears to be faster to finish. However, the question of how it will scale to 6s is perhaps more important to many.

RobboBases seem to be readily parallelisable, at least when pawn promotions (and thus disk access?) are not prevalant. The opposite seems to be the case for TripleBases, as there the disk access throttles most SMP gains. It is by far the quickest to build, presumably due to efficient SMP usage and the out-count method. The "Larsson/Sadakane" compression seems to take nearly 10% of the time, and is also done in parallel. The main two problems I see with them are that it is DTC rather than DTM, and that it takes so much RAM to build 6s [admittedly, either of these could be seen as a design choice, or even as an advantage].

I was not too inclined (or was unable) to time the others, and so just guessed based upon other information.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Some tablebase formats

Post by BB+ » Fri Nov 26, 2010 2:02 am

Ballicora makes some comments about the future of Gaviota bases in http://kirill-kryukov.com/chess/discuss ... 127#p53720

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Some tablebase formats

Post by BB+ » Sat Nov 27, 2010 1:37 am

My initial estimate for the completion time with Gaviota for building the 5 piece bases was wrong. It finished in slightly under 24 hours (with compression then taking a few more hours).

User avatar
IWB
Posts: 195
Joined: Thu Jun 10, 2010 4:10 pm

Re: Some tablebase formats

Post by IWB » Sat Nov 27, 2010 4:22 pm

Hello
BB+ wrote: Shredderbases
By Stefan Meyer-Kahlen (2006?), bitbases for use with the Shredder engine.
These are 441MB for 5 piece, and 157MB for a half-set. I think (not completely sure) they use a method involving capture/check resolution to increase data redundancy, and then some type of run-length encoding. This is similar to the checkers work of Schaeffer et al., and the RobboTripleBases have followed the same path. They can be loaded into memory and then accessed directly w/o any further decompression into a larger table. The 6 piece have been said to be coming soon for a few years.
The 441 and the 157 MB set are identical full 5pc bitbases, neither is a half- or subset. No DTM is stored. The difference between them is just the compression. Therefore the smaller one is slower in access when loaded into memory. With a modern computer the 157 set seems to be obsolet.

Bye
Ingo
Ponder ON rating list: http://www.inwoba.de

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Some tablebase formats

Post by BB+ » Sun Nov 28, 2010 10:59 pm

OK, thanks for this. I was always wondering what the difference was between them. I had thought (given the ratio of 157/441) that the smaller was only half the bases [the only blurb I could find said the 441MB version was 10x faster, which could be for a variety of reasons]. I guess I should have thought of extra compression --- for instance, bzip2 reduces the size of the TripleBases by about 50%, so I guess one can profitably do more compression that just RLE, at the cost of speed.

CarstenL
Posts: 26
Joined: Fri Jun 11, 2010 10:23 am

Re: Some tablebase formats

Post by CarstenL » Wed Dec 01, 2010 11:51 am

BB+ wrote:So, to the best of my knowledge, here is a brief table for formats:

Code: Select all

           Format   5s Size  Has 6 piece    Notes
FEG          DTM     5.7GB    Yes  500GB+
Nalimov      DTM     7.2GB    Yes, 1.2TB    Doesn't have 5 vs 1
Gaviota      DTM     6.5GB     No           Multiple compression methods
RobboBases   DTC     3.4GB    Yes, 475GB    Also with blocked pawns
Scorpio      WLD     340MB*    No           Size is for half the bases
Shredder     WLD     441MB     No           Only 157MB for half the bases
TripleBases  WLD     570MB    Yes, 100GB    Also with blocked pawns
For bitbases, other criteria (like access time) might be relevant also.
.....
What is Shredder Format WLD ?
I know DTM = Distance To Mate, but WLD is what?
.

Odeus37
Posts: 43
Joined: Mon Jun 14, 2010 5:38 pm

Re: Some tablebase formats

Post by Odeus37 » Wed Dec 01, 2010 11:59 am

Win Loss Draw ?

CarstenL
Posts: 26
Joined: Fri Jun 11, 2010 10:23 am

Re: Some tablebase formats

Post by CarstenL » Wed Dec 01, 2010 12:05 pm

Odeus37 wrote:Win Loss Draw ?
Puhhh .....
sounds easy :)

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: Some tablebase formats

Post by BB+ » Fri Dec 03, 2010 11:50 pm

I found that Pepito 1.59 (from 2002) manages to both be licensed under the Gnu GPL and use the Nalimov tablebase code. I have no idea if/how this issuance of the Nalimov code is done in a compatible manner with this license. Even if Nalimov OK's your usage of his (and Kadatch's code), it's not clear to me how you could convey it in conjunction with code that is licensed under the Gnu GPL, as this license is notably "viral" (that is, all code that links in to it must use a compatible license). All the thanx.txt says is "Thanks to Eugene Nalimov for making publicly available your code for probing the endgame tablebases", which if true to nth degree would relief a great hassle for many developers. At the very least, if this distribution was done in a proper manner, it would imply that the Nalimov probing code can be freely used in open source engines/GUIs whose licenses are compatible with the Gnu GPL.

Post Reply