DTS-like SMP

ThinkingALot · Post by **ThinkingALot** » Thu Jul 25, 2013 8:09 am

http://talkchess.com/forum/viewtopic.ph ... highlight=
The algorithm described here is very similar to the one used in Gull. I've got some comments and questions.

Peter Österlund wrote:If the master thread is searching move one at a SplitPoint, and a helper thread is searching move two, and the helper thread fails high, the master thread still finishes searching move one before it finds the fail high result for move two in the transposition table and propagates the fail high result up in the search tree. In this case there is an opportunity for the parallel search to search fewer nodes than the serial search would search, by immediately aborting the master thread. I don't know if the DTS algorithm handles this case but I would guess that it does.

This issue can be solved with the use of setjmp/longjmp. However I found the gain from using such jumps to be minimal (it's buggy in Gull 2.1 but fixing the bug doesn't provide a significant improvement). Although in this particular case (master searching the very first move) such jumps may be useful.

Peter Österlund wrote:If the master thread is searching move one at a SplitPoint, and a helper thread is searching move two, and the master thread fails low on move one, it will start searching move two. When this happens the helper thread is "kicked out" of that branch and has to start searching somewhere else.

Another option is for the master thread to just start searching move three without aborting the helper thread working on move two. The helper thread should be aborted only if the master reached and searched (or skipped due to some helper already working on it) the last move in the list.

Peter Österlund wrote:The idea is that the master thread should quickly get to the same point where the helper thread was interrupted because of transposition table hits. However, there is still some overhead involved

especially if aggressive LMR is used... I think there are two ways of fixing this: 1) using setjmp create a return point for the master thread and have it join the helper as a helper thread; 2) try to obtain the state of a search tree from the helper (easy) and reproduce it within the master (hard). I tried both options but didn't obtain any measurable gain. Perhaps because my implementations were buggy.

Robert Hyatt wrote:The main problem with this is that DTS depended on being able to choose a split point in ANY subtree, of ANY thread

I think this is the very same thing but with depth>=SplitDepth restriction, which is necessary (with any algorithm) for two reasons:
-splitting at depth too low increases overhead;
-if a helper thread chooses a move with a small depth there isn't going to be enough time (cutoffs) for the history table to get up-to-date.

Recursion makes that a pain.

How so? PositionInfo PI[MaxDistanceFromRoot] array will do the job.

Robert Hyatt wrote:In practice, the book-keeping details are simply not worth the effort

At depth >= 5 Ply overhead is negligible.

Robert Hyatt wrote:and are likely non-portable as well since you have to deal with the stack in an ugly way...

No stack: just shared memory.

hyatt · Post by **hyatt** » Sun Jul 28, 2013 5:26 am

There's much more to doing DTS recursively than you imagine. alpha/beta/side to move/etc are generally passed as arguments. Which means they are local variables, along with things like move counters and the like. You can't easily get to them to peek at what is happening at every ply in a single path (searched by a single thread). It is not easy to "split back up there" when you are at depth=N in a recursive search. You already have pointers to the local state for that tree. To search in parallel, each thread needs its own private copy to use when searching below that node, including the original. It is not so easy to go back 10 plies and say "use a different pointer to a different split block now" since that is also a local variable. If you make everything array-based, indexed by ply, then it is no longer a true recursive approach.

I can't see any justification for aborting anything on a fail low. You can't "fail low" until all moves at a particular node have been searched, without changing alpha. So how on earth can one "fail low" at a node after searching just one move? That makes no sense to me at all. Fail high? Yes. But to fail high, you do, by definition, avoid searching all moves at a node anyway.

I also didn't understand your history counter point. History counters are generally global data rather than local, since they are global in nature as opposed to killers that are local to a specific ply.

For your last point, if you do recursive programming, you definitely have a stack. Otherwise you can't possibly have recursion. How do you return from a recursive function if there is no stack, ditto for allocating local variables for each recursive call, etc...

ThinkingALot · Post by **ThinkingALot** » Sun Jul 28, 2013 8:47 am

hyatt wrote:So how on earth can one "fail low" at a node after searching just one move?

"fail low" in the sense of a particular move not producing a cutoff. Better to call it "move fail low".

hyatt wrote:I can't see any justification for aborting anything on a fail low.

Here lies the main drawback of the algorithm discussed if compared with true DTS.
DTS:

...the idle processor specifically chooses a ply=S position and tells the owner that S has been chosen as the split-point ply. The processor with that subtree in progress then copies the complete tree state to a shared memory area called a "split block" (this tree state includes the various search bounds, move lists for each ply under analysis, current board position and other related search data such as the repetition list and so forth). Both processors can now extract moves from this shared data and search in parallel. Whenever one runs out of work, it simply repeats this process. (http://www.cis.uab.edu/hyatt/search.html)

DTS-like:

Code: Select all

If a helper runs out of work it just finds the best available split point.
But if the master runs out of work when searching a split point node it cannot join an arbitrary split point of some helper thread. Instead it finds a move (of the forementioned node), which is currently being searched by some helper, aborts that helper and starts searching the move itself.

In theory if the hash table is perfect and there's no history table, then the master is going to pick the search exactly at the point it was left by the helper. Thus there wouldn't be any nodes searched unnecessarily.
However in reality the master is going to generate moves in different order due to the history heuristic and assign these moves different depths due to LMR. Consequently, some moves are going to be searched deeper than necessary. Hence some slowdown is expected.
It's not obvious that 1) such a slowdown is significant 2) if significant the slowdown results in a measurable ELO drop. Only if the latter is true the DTS-like algorithm is worse than true DTS.

hyatt wrote:History counters are generally global data rather than local

That depends on the implementation. Regardless of the type of history used the MaxPositionalGain array ([piece][from][to]) is local.

hyatt wrote:since they are global in nature

Different branches of the search tree may lead to different types of positions with different moves being preferable. So a fast-updating history table makes sense.

hyatt wrote:For your last point, if you do recursive programming, you definitely have a stack.

Sure. I just meant that everything can be pre-copied into the shared memory so the master needn't do anything with the stack on demand. And saving/restoring the state of the stack is handled by setjmp()/longjmp().

hyatt · Post by **hyatt** » Mon Jul 29, 2013 1:08 am

setjmp/longjmp is not good enough. I need to back up 10 plies (to the place where I want to split) and then set things up. Then I want to go right back to where I was. Not possible after the longjmp. Also, are you going to do a setjmp() at EVERY node? Because you will need to be able to back up to there if that turns out to be the best split point. I don't see how this is going to work at all since setjmp() doesn't allow you to have an array of return points where you can pick just one...

OpenChess

OpenChess

DTS-like SMP

DTS-like SMP

Re: DTS-like SMP

Re: DTS-like SMP

Re: DTS-like SMP