
It is reasonably fast and multi-threaded, and it uses a global hash table that I have made lock-free with C++11 std::atomic.
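The post doesn't say exactly how the table is made lock-free. One well-known way to do it in chess programs is the "XOR trick" (lockless hashing, as described by Hyatt and Mann). Each entry stores `data` and `hash ^ data` as two separate atomic 64-bit words. If another thread's write tears the pair, the XOR check fails on probe and the entry is treated as a miss. The sketch below assumes that technique. `PerftTable`, the 8-bit depth / 56-bit count packing, and the power-of-two table size are all hypothetical and are not taken from juddperft.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical perft hash-table entry: two independently atomic 64-bit words.
struct PerftEntry {
    std::atomic<std::uint64_t> check{0}; // hash ^ data (validates the pair)
    std::atomic<std::uint64_t> data{0};  // packed (depth, node count)
};

// Sketch of a lock-free perft table using the XOR trick (an assumption,
// not necessarily the approach juddperft takes).
class PerftTable {
public:
    explicit PerftTable(std::size_t size_pow2)
        : entries_(size_pow2), mask_(size_pow2 - 1) {
        // Index by masking, so the size must be a nonzero power of two.
        assert(size_pow2 != 0 && (size_pow2 & (size_pow2 - 1)) == 0);
    }

    // Always-replace store; no locks, just two relaxed atomic writes.
    void store(std::uint64_t hash, unsigned depth, std::uint64_t count) {
        const std::uint64_t packed = pack(depth, count);
        PerftEntry& e = entries_[hash & mask_];
        e.data.store(packed, std::memory_order_relaxed);
        e.check.store(hash ^ packed, std::memory_order_relaxed);
    }

    // Returns true and sets `count` only if the entry matches hash and depth.
    bool probe(std::uint64_t hash, unsigned depth, std::uint64_t& count) const {
        if (depth == 0) return false; // depth 0 is never stored (perft(0) == 1)
        const PerftEntry& e = entries_[hash & mask_];
        const std::uint64_t packed = e.data.load(std::memory_order_relaxed);
        const std::uint64_t check = e.check.load(std::memory_order_relaxed);
        if ((check ^ packed) != hash) return false; // other position or torn write
        if (unpack_depth(packed) != depth) return false;
        count = unpack_count(packed);
        return true;
    }

private:
    static constexpr std::uint64_t kCountMask = 0x00FFFFFFFFFFFFFFull; // low 56 bits

    static std::uint64_t pack(unsigned depth, std::uint64_t count) {
        return (static_cast<std::uint64_t>(depth & 0xFFu) << 56) | (count & kCountMask);
    }
    static unsigned unpack_depth(std::uint64_t packed) {
        return static_cast<unsigned>(packed >> 56);
    }
    static std::uint64_t unpack_count(std::uint64_t packed) {
        return packed & kCountMask;
    }

    std::vector<PerftEntry> entries_;
    std::size_t mask_;
};
```

Relaxed ordering is enough here because the XOR check, not a memory fence, is what rejects mismatched `check`/`data` pairs. A false hit would need a coincidental 64-bit match, which is roughly as unlikely as an ordinary hash collision.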
https://github.com/jniemann66/juddperft
Check it out, and feel free to provide feedback.
Sorry, Windows 64-bit only at this stage; it needs VC++ 2015. (I have made a Win32 build, but it is very sloooow...)