Firstly, with Rybka 1.3 in 2004 the node count was "normal", at over 1 Mnps on the machine tested:
Subject: Rybka @ 64 bit
From: Vasik Rajlich
Message Number: 356880
Date: March 27, 2004 at 04:14:02
Rybka 1.3
---
Athlon 3400+, 32 bit win XP: 1,201,030 nps
Athlon 3400+, 64 bit win server '03, 32 bit .exe: 1,201,030
Athlon 3400+, 64 bit win server '03, 64 bit .exe (win DDK compiler): 1,506,580
The speedup is around 20%.
There is a small hopefully-temporary caveat, something is choking the UCI output
under the 64 bit OS. When Rybka sends all of the UCI "info XXX" commands to the
GUI, the nps rate drops by about 20%! The above is with all UCI "info XXX"
commands disabled.
Vas
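As a quick check on the arithmetic in that post, here is a minimal sketch that recomputes the 64-bit gain from the quoted figures. The nps numbers are taken from the post; the function name and the use of a flat 20% penalty for UCI "info" output are illustrative assumptions, not anything Rybka itself does. On these figures the gain works out to roughly 25%, somewhat above the "around 20%" stated.

```python
# Figures quoted in the post above (nodes per second).
NPS_32BIT = 1_201_030   # 32-bit .exe (same under both operating systems)
NPS_64BIT = 1_506_580   # 64-bit .exe, UCI "info" output disabled


def relative_speedup(baseline: float, improved: float) -> float:
    """Return the fractional gain of `improved` over `baseline`."""
    return improved / baseline - 1.0


speedup = relative_speedup(NPS_32BIT, NPS_64BIT)
print(f"64-bit gain over 32-bit: {speedup:.1%}")

# Assumption: treat the "about 20%" drop with UCI info output enabled
# as a flat multiplier. This is an estimate, not a measured value.
UCI_OUTPUT_PENALTY = 0.20
nps_with_info = NPS_64BIT * (1.0 - UCI_OUTPUT_PENALTY)
print(f"Estimated 64-bit nps with info output: {nps_with_info:,.0f}")
```

Under that assumption, the 64-bit build with "info" output enabled would land close to the 32-bit figure, which is consistent with the caveat being worth mentioning.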
Next (actually from four days earlier) is part of a long post about the "Big Three" engines:
Subject: Re: Shredder 8 secret: search depth?
From: Vasik Rajlich
Message Number: 356131
Date: March 23, 2004 at 05:05:56
True. What exactly the "Big Three" engines do is not 100% clear, however after considerable
playing around I can make some observations/hypotheses.
Shredder is the most aggressively tuned, and the "deepest" searcher.
It's possible that it is not reporting its NPS rates truthfully
[...]
Depth I agree about [that it is hard to compare between engines].
There aren't too many ways to calculate NPS, though, this is real info IMO.
Finally, a reply to Cozzie, who was commenting on Carrisco's observation that the "nodes" count in Rybka 1.0 actually decreased in some examples:
Subject: Re: The Rybka Flamewar & question for Vasik
From: Vasik Rajlich
Message Number: 487297
Date: February 17, 2006 at 04:23:50
[...]
>I would think that no matter how creative your counting scheme is, it should still increase monotonically.
>
>anthony
[...]
Actually, if you go in a debugger, you can trivially see that two quantities are being combined.
One I call "gulp", this is for me the interesting figure (for my private tests).
The second is a simple ticker for the next I/O check.
As I've pointed out elsewhere, the "modern" Rybka node-counting method simply counts White make-moves and divides by 7. Counting only one side's moves already halves the total, so the reported figure is low by a factor of ~14, or maybe 15-16 if you include null moves in your accounting. I've never been interested in figuring out how the parallel speed-up is computed, but I think only the primary CPU counts nodes, and a generic multiplier is folded in.
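The factor of ~14 can be reproduced with a small sketch. It assumes that White makes half of all make-moves and, for the 15-16 range, that null moves add some fraction of extra nodes per regular make-move. The function name, the parameters, and the 10% null-move ratio below are hypothetical, chosen only to illustrate the arithmetic.

```python
def undercount_factor(divisor: int = 7,
                      white_share: float = 0.5,
                      null_per_move: float = 0.0) -> float:
    """Ratio of a conventional node count to the reported figure.

    divisor       -- the division by 7 described above
    white_share   -- assumed fraction of make-moves made by White
    null_per_move -- assumed null moves per regular make-move (hypothetical)
    """
    # Per regular make-move, a conventional count sees (1 + null_per_move)
    # nodes, while the reported count sees only white_share / divisor.
    return (1.0 + null_per_move) * divisor / white_share


# Without null moves: 7 / 0.5
print(undercount_factor())  # 14.0

# With an assumed 10% of extra null-move nodes, the factor falls
# inside the 15-16 range mentioned above.
print(undercount_factor(null_per_move=0.10))
```

Read the other way round, a factor of 15-16 corresponds to null moves amounting to roughly 7% to 14% of regular make-moves under these assumptions.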
