The Evidence against Rybka

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

The Evidence against Rybka

Post by BB+ » Mon Jul 04, 2011 2:17 pm

I will be expanding this thread to state specifically what evidence against Rybka was considered, and which parts of it merited inclusion in the Report (which I had no direct part in writing), but for now I'd like to highlight this comment:
This "paper" uses an embarrassingly amateurish methodology for looking for correlation between feature sets.
The EVAL_COMP work aimed not at the most rigorous statistical methods, but rather at something readily understandable to the Panel members (else the final number of opinions might have been even lower than it was). One thought was that additional layers of statistical abstraction wouldn't change the conclusion in the end, but would lead to a general glazing-over of eyes [and those who are more stats-conversant could use the data given in the PDF to run whatever tests they prefer]. Most computer chess programmers will have heard of Gaussian distributions, but perhaps not of more abstruse statistical measures.

Regarding 54% for Crafty/Fruit overlap (in D.2.3 of RYBKA_FRUIT.pdf): this was in a preliminary (and cruder) version of the EVAL_COMP analysis, with the final Crafty/Fruit number being 34% when the whole process of comparison had been decided upon. Furthermore, this 54% is unidirectional (that is, it measures the Fruit/Crafty overlap, but not the Crafty/Fruit overlap, and as mentioned therein, Crafty contains about 10-15 features that Fruit lacks). The Rybka/Fruit number in that comparison was 78.3%, not 73%. Indeed, the whole EVAL_COMP document was produced precisely to quantify a relationship between raw percentages and expected "random" occurrences. As there didn't seem to be any great need in distinguishing between a 1 in 10^6 and a 1 in 10^8 (or more) occurrence, rather crude statistical measures were used (and at any rate, whatever method was chosen would be subject to question, while Rajlich was free to re-analyse the data if he thought he got a raw deal, etc.).

In any event, it's nice to see some discussion of the evidence, and not the Process.

EDIT: On a different note,
I was going to add some proof that Mark was looking at functionality rather than code copying or transliteration, but it was so trivial that I'll just reference one item, where he calls the similarity between Rybka's evaluation function and Fruit's evaluation function his most damning piece of evidence:
Rybka's evaluation has been the subject of much speculation ever since its appearance. Various theories have been put forth about the inner workings of the evaluation, but with the publication of Strelka, it was shown just how wrong
everyone was. It is perhaps ironic that Rybka's evaluation is its most similar part to Fruit; it contains, in my opinion,
the most damning evidence of all.
The inner quotation is from Zach, not from me. As for the question of Rybka/Fruit evaluation feature functionality (and its relation to copyright): I'll probably address that stuff a little bit later.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Tue Jul 05, 2011 10:03 am

This "paper" uses an embarrassingly amateurish methodology for looking for correlation between feature sets. One need only Google on covariance between feature sets to see how this can be done in a reasonable manner. [...] I've seen work by third graders with more statistical rigour...
To the best of my knowledge, none of the top Google™ search results is directly relevant to the current situation. It seems that there has been some sort of misunderstanding (or miscommunication) regarding what was actually done in the EVAL_COMP comparison.

In particular, as can be seen in the Introduction, a "feature" is not a conglomerate like "redness" for which there is a given score [often binary, that is 0 or 1] against some fixed scale, but rather each evaluation feature leads to a pairwise comparison for each engine pair. That is, there are not merely 9 random variables [one for each engine] for each of the 48 "evaluation overlap features" [note that these random variables themselves are sometimes conflatedly also called "features"], but rather 36 random variables, albeit with many (possible) correlations among them. For instance, if one knows that (for a given feature) the A-B overlap is 70% and the A-C overlap is 70%, this [presumably] gives some bounds on the B-C overlap, but doesn't force it to be anything. This is in contrast to what is sometimes meant by "feature set" [where one might have that A gets a 75% redness score, B has 60% redness, and C is 25% -- note that these "redness" scores are on A,B,C themselves, and not on the A-B, A-C, B-C pairings]. An example of this other concept would be in Interspecific analysis of covariance structure in the masticatory apparatus of galagos, where such features (like condylar width) are measured on an absolute scale.

One need only look at many of the examples in the EVAL_COMP (say "backward and weak pawns") to see that this other notion of "feature set" is inapplicable here -- there is no "absolute" measurement to be made concerning a "backward and weak pawns" feature, and so by necessity one measures the relative overlap [getting, as noted above, 36 "random variables" rather than just 9]. A covariance analysis on the 36 random variables of pairwise data could presumably be done, but I am not sure there is much guidance on how to interpret (say) the correlation between the various Rybka/X and Fruit/X comparisons in terms of an overall likelihood that the Rybka/Fruit overlap is "not random" as it were.
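
To make the "pairwise" structure concrete, here is a minimal sketch (in Python, with invented engine names and scores, not the actual EVAL_COMP data) of the kind of accumulation described: each engine pair gets a per-feature overlap score in [0,1], the scores are summed per pair, and a given pair can then be compared against the spread of the others. With 9 engines there are 36 such pairs; the toy below uses 4 engines and 4 features just to show the bookkeeping.

Code:
from itertools import combinations
from statistics import mean, stdev

# Toy data: per-feature overlap scores in [0,1] for each engine pair.
# Engine names and numbers are invented purely for illustration.
engines = ["A", "B", "C", "D"]
overlap = {
    ("A", "B"): [1.0, 0.8, 0.6, 0.9],   # a suspiciously high overlap
    ("A", "C"): [0.2, 0.4, 0.0, 0.3],
    ("A", "D"): [0.3, 0.1, 0.2, 0.4],
    ("B", "C"): [0.1, 0.3, 0.2, 0.2],
    ("B", "D"): [0.4, 0.2, 0.1, 0.3],
    ("C", "D"): [0.2, 0.2, 0.3, 0.1],
}

# "Accumulation of Scores": one number per *pair* of engines, not per engine.
totals = {pair: sum(scores) for pair, scores in overlap.items()}

for pair in combinations(engines, 2):
    others = [t for p, t in totals.items() if p != pair]
    mu, sigma = mean(others), stdev(others)
    print(pair, "total = %.1f" % totals[pair],
          "deviation = %+.1f sigma" % ((totals[pair] - mu) / sigma))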

Again, I don't say that the EVAL_COMP methodology is necessarily the best way to measure overlap in the given case, though Adam Hair (independently, it seems) chose something quite close to it when analysing "similarity data" on TalkChess [there is no particular "bestmove" in any of the positions, so one measures the pairwise overlap between datasets -- one could make a similar statement about, say, DNA sequences (though the "sequencing" aspect plays a role there, as do alignment issues)]. One can argue about the right scaling of the standard deviation [a significant problem with his set, as the engines fit into a number of "similarity" groupings], whether the pairwise correlations themselves have interconnections [such can be analysed, if desired], and, as was pointed out in that thread, whether Student's t (finitely many degrees of freedom) is superior to Gaussian (infinitely many). None of those appear to matter much in the EVAL_COMP analysis, unless you care whether the probability of the Fruit/Rybka evaluation feature overlap was 1 in 10^6 versus 1 in 10^10 or the like.
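
As an aside, the "1 in 10^6 versus 1 in 10^10" remark is just a statement about how far out in the Gaussian tail one sits. A couple of lines of Python (standard library only; the z-scores below are illustrative, not the actual EVAL_COMP values) show the conversion:

Code:
from math import erfc, sqrt

def gaussian_tail(z):
    # One-sided tail probability P(Z > z) for a standard normal variable.
    return 0.5 * erfc(z / sqrt(2))

for z in (3.0, 4.75, 6.4):                 # illustrative z-scores only
    print("z = %.2f -> about 1 in %.1e" % (z, 1 / gaussian_tail(z)))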

My apologies if I have misunderstood what the complaint might be, but attempting to interpret unspecific criticisms is never an easy task. A recent work on the difficulties of statistics with pairwise measurements [which again is one of those phrases that can have variant meanings] is http://www.cs.princeton.edu/~blei/paper ... ng2009.pdf

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Tue Jul 05, 2011 11:46 am

BB+ wrote:My own preference is to adjudge the matter by the "precedent standards of computer chess"
[...] Where are these enumerated?
The ICGA webpage lists a partial selection of various precedents, where at least some of the details of the decisions can be found. Other cases (such as Hyatt/Berliner in 1986) appear in the ICCA Journal. Another writing of interest, perhaps concerning why "evaluation feature overlap" was of import [there was a bit of discussion amongst the Panel regarding its merit], could be World Computer Chess by Hayes and Levy (1974), with again some relevant parts being on the ICGA Wiki.

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair

Re: The Evidence against Rybka

Post by Adam Hair » Tue Jul 05, 2011 12:50 pm

BB+ wrote:
This "paper" uses an embarrassingly amateurish methodology for looking for correlation between feature sets. One need only Google on covariance between feature sets to see how this can be done in a reasonable manner. [...] I've seen work by third graders with more statistical rigour...
To the best of my knowledge, none of the top Google™ search results is directly relevant to the current situation. It seems that there has been some sort of misunderstanding (or miscommunication) regarding what was actually done in the EVAL_COMP comparison.
After perusing the first three pages of links, I concur.

Though you don't need my support, I agree that more sophisticated statistical tools were (and are) unnecessary in the EVAL_COMP comparison. Your rigour is sufficient for the data.

Thank you for your comments concerning the "similarity" data. I am currently preparing for a new run of tests, though Richard Vida's posts on TalkChess (http://talkchess.com/forum/viewtopic.ph ... 71&t=39577) have me questioning what I am actually measuring (the similarity of the entire Eval(), or just piece-square tables + material values).

Adam

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Tue Jul 05, 2011 3:20 pm

I might also explain the purported discrepancy between
the use of exactly the same evaluation features
and the final 74% "evaluation feature overlap" number from EVAL_COMP.

The first comes from the initial RYBKA_FRUIT document, in which there was no "fine-grained" analysis of evaluation features -- that is, every "feature" was considered in bulk, so "overlap" was a yes/no answer (e.g., engine considers bishop pair -- yes or no). It was only later that the process was refined [after such a yes/no procedure was deemed unsuitable], and "partial scores" were given. Furthermore, the "exactly" here was meant to say that everything in Fruit was definitely in Rybka, but not necessarily vice-versa (there are only a few minor exceptions in any event). E.g., "Rybka uses exactly the same 20-odd features that Fruit does, and maybe two or three others." At this level of crudity, the claim is indeed true.

To wit, both of them consider (taking the list from D.2.3 [an early EVAL_COMP version], or from the narrative in Section 3 of the above PDF): knight mobility, bishop mobility, trapped bishops, blocked bishops, opposite bishop endings, rook mobility, rook on open file, rook on semi-open file, rook attacks toward opposing king from such a file, rook on 7th, blocked rooks, queen mobility, queen on 7th, doubled pawns, isolated pawns (open/closed), backward pawns (open/closed), king danger (contact squares), shelter/storm (3 files), candidate pawns, blocked passers, free passers, distance to passers, and opening/endgame interpolation. Rybka additionally has a bonus for being on-move, and lazy evaluation.

Of course, some of these terms are so broad that a refined analysis is needed (such as king danger), and this is what was done with EVAL_COMP. In all cases for the above terms, even when the phrase is quite broad, I think it is the case that the EVAL_COMP analysis found the Rybka 1.0 Beta/Fruit overlap to be at least 0.6 for each of these terms (or their nearest equivalent), with the exception of "free passer" it seems (the preliminary analysis did not take into account that Rybka has a rank-based scaling for "free passers", which differs from the constant bonus of Fruit -- however, Rybka splits the "passer" bonuses up in any event, and meanwhile the "unstoppable passer" was given separate consideration in EVAL_COMP). [There were, however, other features added (such as "bishop pair" and "draw recognition") for which the Rybka/Fruit overlap was less than 0.5].
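
Perhaps a toy computation makes the difference in granularity clearer. The feature names and partial scores below are invented (not the actual EVAL_COMP numbers); the point is only that a coarse yes/no accounting can report "exactly the same" features while the partial-score accounting of those same features lands well below 100%.

Code:
# Invented partial overlap scores in [0,1] for a handful of shared features.
partial = {
    "rook on open file":  1.0,
    "backward pawns":     0.8,
    "king shelter/storm": 0.6,
    "free passers":       0.4,
}

# Coarse (yes/no) view: a feature "overlaps" if both engines have it at all.
coarse = {name: 1.0 if score > 0 else 0.0 for name, score in partial.items()}

print("yes/no overlap : %.0f%%" % (100 * sum(coarse.values()) / len(coarse)))    # 100%
print("partial overlap: %.0f%%" % (100 * sum(partial.values()) / len(partial)))  #  70%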

So one should not mix-and-match statements from different documents, at least not without determining the exact connotations therein. It is indeed the case that Rybka includes the use of exactly the same evaluation features as Fruit in the sense implied by the original RYBKA_FRUIT document, and also that there is (only) 74% evaluation feature overlap in the final analysis, and there is no contradiction here once one understands what each statement means.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Wed Jul 06, 2011 10:06 am

Here I'll start expanding on some of the evidentiary issues that were considered.

First, there was the question of the purpose of the Panel Investigation -- what should come out of it? Two issues were to be addressed, the first being the "hard" evidence of breaking Rule #2 by plagiarising Fruit, and the second being the "soft" evidence -- the latter would include such things as an assessment of Rajlich's credibility, which would aid the Board in some possible circumstances, e.g. (to quote CW): "In the absence of source it would also be possible for VR to simply solemnly swear that he did XYZ."

The evaluation features were analysed after much discussion, with EVAL_COMP being the end result [it can be a bit confusing in that the Report also quotes various prior versions of the "evaluation feature" comparison, the first of which was merely qualitative]. The first question was whether the word "originality" in Rule #2 contained "evaluation features" as a possible subheading, and the second was whether the Rybka/Fruit overlap therein was out of the ordinary. After some discussion, it was agreed that "originality" did include sets (and implementation peculiarities) of evaluation features. The second question demanded quantification, and the final EVAL_COMP document was accepted as showing that there was a vanishingly small chance that Rybka's evaluation features were "independent" of those in Fruit from the standpoint of originality.

Following the outline of the original RYBKA_FRUIT document, I might mention the other concerns. The root search similarities never came up, probably because they were viewed as unneeded given the other evidence, and also because they were not so easy to quantify. Similarly with PST and File/Rank/Line weights.

The data structures with hashing were never discussed, while the quad()-like weightings for passed pawns were (I think) agreed to be insignificant by themselves [as there is a unique parabola that is 0 for ranks 2 and 3 and 100 for rank 7]. The UCI parsing was also not discussed, nor was time management ("0.0"), or the recurrence of parts of the search_root_t structure (which will likely be more significant for copyright infringement, showing fairly definite "direct copying").
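
Just to make the "unique parabola" remark above concrete: requiring p(2) = p(3) = 0 and p(7) = 100 forces p(r) = 5(r-2)(r-3), giving 0, 0, 10, 30, 60, 100 for ranks 2 through 7. The quick check below uses those endpoint values from the bracketed remark, not numbers taken from either code base:

Code:
# The unique quadratic with p(2) = p(3) = 0 and p(7) = 100.
def quad_like(rank):
    return 5 * (rank - 2) * (rank - 3)

print([quad_like(r) for r in range(2, 8)])   # [0, 0, 10, 30, 60, 100]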

For the general question of Rajlich's credibility (see, e.g., the "node counting" section B.3.1 in RYBKA_FRUIT), it seemed that the Crafty evidence already sufficed (which might also indicate why the things mentioned in the last paragraph were never discussed explicitly).

The specific point that any breaking of "Rule #2" must occur in an ICGA event was also discussed, and there was agreement that (unless Rajlich claimed otherwise -- again perhaps a credibility issue) something like Rybka 2.1 sufficed for the 2006 WCCC, while Rybka 2.3.2a was almost contemporaneous with the 2007 WCCC. Almost every version (thanks to Dann's collection) from Rybka 1.0 Beta to Rybka 2.3.2a was at least glanced at, and the history of how the evaluation changed over this period was enumerated. This was sufficient to convince various Panel members that there had indeed been little change (or attempt to remedy the Fruit issue) with the Rybka evaluation function over these 18 months [LK was hired near the end of this, but hadn't done too much concrete work yet, it seems]. Finally, it seemed that Rybka 2.3.2a was a logical stopping point, as this was already a version that had won the WCCC, so that there was definitely something significant for Rajlich to answer (i.e., the evidentiary burden had been shifted).

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Wed Jul 06, 2011 1:01 pm

Another question that has been brought up is whether raw evaluation function output [which can be obtained without too strenuous an effort via RE] is a "better" way to measure engine similarities. This depends on the meaning of the word "better"... :)

Suppose that I copy the evaluation code of engine X [including in particular the "features"], but tune the numbers. Then the raw evaluation function output won't necessarily detect it [e.g., it might only show up at 3 sigma, which might be thought too weak].

Suppose that I copy 50% of the evaluation features of engine X verbatim [say, all the piece evaluation, including the ordering of features, but not the king safety] with the same numbers (or I copy a "large chunk" of the evaluation, like PST and material imbalances). Then the raw evaluation function output won't necessarily detect it.

I would consider both the above scenarios to be "unoriginal" under ICGA Rule #2. In short, any comparison with the raw evaluation function output will detect sufficiently close evaluators, but won't necessarily detect other things that might be considered to be a transgression of "originality". So I would say that the "better" way to measure similarities for the ICGA "originality" definition would be to separate the "evaluation feature comparison" from the "numerology similarities", whereas the raw evaluation function output (which can indeed be quite useful) essentially lumps these together.
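
A toy sketch of the first scenario may help (the weights and random "positions" below are entirely invented, nothing reverse-engineered): engine B uses exactly engine A's features but with independently re-tuned weights, so a raw-output comparison reports only a modest correlation, while a feature-level comparison reports complete overlap.

Code:
import random
from statistics import mean

# Engine B "borrows" engine A's four features but re-tunes the weights.
# Feature values per position are just random numbers for illustration.
random.seed(1)
positions = [[random.random() for _ in range(4)] for _ in range(1000)]

w_a = [1.0, 0.5, 0.25, 2.0]
w_b = [0.2, 2.0, 1.5, 0.1]        # same features, very different tuning

def evaluate(weights, feats):
    return sum(w * f for w, f in zip(weights, feats))

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

evals_a = [evaluate(w_a, p) for p in positions]
evals_b = [evaluate(w_b, p) for p in positions]

print("raw-output correlation: %.2f" % pearson(evals_a, evals_b))  # modest
print("feature overlap       : 4/4 features shared")               # complete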

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Wed Jul 06, 2011 2:19 pm

Regarding EVAL_COMP:
This would seriously penalize someone who threw in every feature they could think of, and then used optimization techniques to assign appropriate weights (this is a pretty standard method for designing automated classifiers). It also overstates similarity in cases where feature weights differ. In fact, in the limiting case, you can have the exact same feature sets, with no overlap at all, because any feature in one set is zero weighted in the other (this is also common in automated classifiers).
The EVAL_COMP methodology would not necessarily penalise such an engine X: though X would contain all the features of Y (for each other engine Y), not all the features of X would be in Y, and EVAL_COMP is bi-directional (admittedly in a somewhat crude manner).
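
A toy illustration of why the "throw in every feature" engine is not rewarded (the feature names are invented, and the symmetric combination below is a simple Dice-style ratio, not EVAL_COMP's actual bookkeeping): if X is a superset of Y, the Y-to-X direction is maximal but the X-to-Y direction is not, and any symmetric measure sits in between.

Code:
# Engine X "throws in every feature it can think of"; engine Y has a subset.
x_features = {"f%d" % i for i in range(1, 11)}   # 10 features
y_features = {"f1", "f2", "f3", "f4"}            #  4 features

shared = x_features & y_features

print("Y -> X direction:", len(shared) / len(y_features))                          # 1.0
print("X -> Y direction:", len(shared) / len(x_features))                          # 0.4
print("symmetric (Dice):", 2 * len(shared) / (len(x_features) + len(y_features)))  # ~0.57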

Furthermore, as noted elsewhere, a "feature" is not a conglomerate, so it is rather hard to "throw in every feature" when many of them do quite redundant things (e.g., every engine has slightly different "backward and weak pawns" -- do you really need/want N versions of this in your engine, all weighted differently?). In the context of a chess engine, this seems unlikely to work without subsequently throwing out some of the redundant/unimportant features (as otherwise evaluation would be hopelessly slow), after which one would have no reason to suspect that the resulting "feature list" of engine X looked much like any of those which it (quite neatly, I agree) harvested. Incidentally, there were examples of engines which had "zero-weighted" features (such as QueenCentreOpening in Fruit, or various commented-out code in Pepito). Such features are somewhat odd in that they simply slow things down (there were some examples in Rybka 3, but it seems that IPPOLIT eliminated them).

If one (either Rajlich or someone else) wants to make a serious argument that Rybka 1.0 Beta could be or was indeed made by the process proposed (throw in all known features, and run an automated classifier), I think the ICGA would consider such evidence. I simply can't see how this could remotely produce the observed great similarity to Fruit's evaluation features, though if a classifier program that could do so was indeed produced, I'd obviously be wrong. Suffice it to say that I don't think the "ball is in my court", as it were.
Aside from being totally arbitrary (how does one determine that a similar feature is worth 0.5 rather than 0.8?), this also doesn't bother to factor in associated weights.
I think this is about as arbitrary as the legal standard for Nevada libel cases. :) I am personally still waiting for anyone to take up the challenge of going through the various open-source engines listed and inventing their own numbers. I suspect that any reasonable attempt that follows the methodology given would lead to the Rybka/Fruit overlap being a 1 in 10^N event, where N is at least 5, likely more. As for the question of "feature weights" being separated from "feature similarity", see my previous post. [In some cases weightings were taken into account in EVAL_COMP, though I agree that in general it excludes them]. If the question is whether the methodology itself is flawed, that's a different question [and indeed, one that was considered in the Panel].
For reasons described above, a maximum of one significant digit should be associated with these numbers of features.
I disagree with this (though it might be that this is talking about accuracy, while I am talking about precision, the latter being what is usually reported as "error"). I think the "observations" are 95% determinable at least to within 0.2, and as there are 48 features, this would indicate a maximal 95% error bar of around 1.4 -- however, no engine has all 48 features (30 is the average), and many of the "observations" have less "error" than 0.2 [when they match exactly, for instance], so something more like 0.8 seems closer to me for a (statistical) 95% confidence bound. That is, if I did the analysis again 6 months from now without looking at the prior numbers, I would expect 95% of the 36 pairwise scores (in other words, all but 2) to be within 0.8 in the Accumulation of Scores. [This assumes that I didn't first re-invent the "features" to be used].
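
Spelling out the arithmetic: independent per-observation uncertainties add in quadrature, which is where the "around 1.4" figure comes from; the smaller per-feature spread used for the 0.8 figure is my own rough guess at how the exactly-matching features reduce it.

Code:
from math import sqrt

print(sqrt(48) * 0.2)    # ~1.39: 48 features, each uncertain to within 0.2
print(sqrt(30) * 0.15)   # ~0.82: ~30 features, many matching exactly (0.15 assumed)
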
However, for some reason, after a ratio is taken, they end up with three significant digits of precision. When you're about ten years old, they teach you that you can't grow your precision by performing arithmetic operations. I think this is still true today...
The Accumulation of Scores lists values to one place after the decimal, as (gasp) each feature does so. It is correct that the Percentage Table should be more circumspect, but my copy editor chose to use the same format as the previous table. :shock: Using sigfigs is a grade-school technique in any event. :) Indeed, I forget whether sigfigs are supposed to be 1-sigma [which is my recollection] or 95% intervals -- given (see previous paragraph) that the sigma for a particular comparison is about 0.4, it seems that the numbers are precise to about 1%, so indeed the 3 sigfigs are (slightly) gratuitous. This does not seem to affect the gravamen of the case against Rybka.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Wed Jul 06, 2011 3:10 pm

Some brief comments from Rajlich toward Cozzie when Strelka was an issue (Apr 2008) have been cited in his defence (not quite a "long back and forth dialog", so maybe I have the wrong stuff -- again, the citation was unreferenced other than "It's all on the record for anyone that wants to read it"). Here is (what I think is) the relevant thread: http://rybkaforum.net/cgi-bin/rybkaforu ... l?tid=3172

I personally think [you can see my comments there] that Cozzie uses a wrong accounting of how much Rybka was optimised (conflating 32-bit and 64-bit numbers a bit).

In any event, I think the relevant comments Rajlich makes are the following.

http://rybkaforum.net/cgi-bin/rybkaforu ... 7#pid42407
Responding to Cozzie's claim that Rybka is "highly optimised", as it were:
Vasik Rajlich wrote:Sorry, this is just nonsense.

In fact, I've never even profiled Rybka and spend zero time on optimization. This should be quite obvious from the sources - there are no unrolled loops or other arcane constructions, assembly sequences which don't map to C, etc.

It just doesn't seem like a productive area, especially long-term.

A few comments for any bitboard fanatics who might be browsing here:

1) I typically put elegance and simplicity before speed. Don't look for too much meaning at the low level. Someone like Gerd Isenberg could probably speed Rybka up by 10-15% without crossing over into any really hard-core optimization.

2) I've always just used plain Crafty-style rotated bitboards, and haven't yet managed to try anything else. My intuition is that the magic number approach (which wasn't around when I started) would be a little bit better. This may be doubly true for Rybka, as I suspect that she pollutes the cache more than most engines. If I started today, this is probably what I'd go with.
Cozzie replied vehemently that what Rajlich said was BS, and I don't think VR much entered into the scuffle after that, other than to say that NPS has little to do with optimisation, and the following:
Ok, if you look hard enough, you'll probably manage to find some examples where I sacrificed elegance for speed. It's apparently not that easy, because in your examples you're 0/3.

I) Separate PV search - this is to help me think. PV nodes and scout nodes are just different conceptually. My way shouldn't be any faster, as the multiple routines will pollute the cache.

II) Moves & scores together - my way is definitely more elegant. The point you're missing here is that in the Rybka code the history stuff is encapsulated. What you're looking at is post-inlining - the hackers just didn't clean it up.

III) SEE - again, my way is just better, simpler, more compact, more to the point. Why output some value which you don't plan to use?

I also don't see why you're getting so worked up about this. You're not taking Knuth literally, are you? :)
http://rybkaforum.net/cgi-bin/rybkaforu ... 3#pid42413
Vasik Rajlich wrote:FWIW, re. search vs eval - I think that they're roughly equally important.

In Rybka, we tend to go through phases.

In late 2005, I went through a big eval phase. I think that this was the strongest point of Rybka 1.0 - although Larry seems to disagree with me about it :)

Throughout 2006, I basically just worked on search. This culminated with Rybka 2.2n2 and WinFinder.

When Larry joined me in 2007, we went back and worked on the eval again, under a completely different philosophy. The Rybka 2.3.X versions were a sort of early prototype of this method. I was quite happy with these steps and now they've been taken much further, to the point that Rybka is nearly unrecognizable.
So I don't see anything much here that addresses (say) the Strelka/Fruit overlap.

There was a branch of this thread with the subject "Sneak a peek into the lab", which did not involve Anthony Cozzie and did not mention Fruit. There was also the Strelka 2.0 thread, where AC famously used his "Fritz5" colloquialism to describe Strelka as a "bean-counter" rather than a "knowledge" engine, but it doesn't seem that Rajlich said anything in that thread.

So I can't find any evidence of any real "dialog" about Fruit/Strelka, let alone Fruit/Rybka. The most VR seems to have said about Strelka is when he claimed it as his own [and Fruit was not mentioned]. There are only a few times I can find where VR marginally addresses the Fruit issue, and usually he is not too direct and gives few details. For instance, he answered Zach's evidence (Aug 2008) by saying that the discussion looked like a mess [rather true], and asking whether someone could please summarise the main points [Zach made an enumerative "Rybka's piece square tables are generated from the same code as Fruit's" post at the Rybka Forum back then, but it now seems findable only via a Google™ search, so maybe it's been hidden -- I don't recall if VR responded, but I think he did, saying about as much as he did with Schüle]. In correspondence with Schüle (Jun 2010), VR did little more than assign percentages to what he thought of Zach's claims.

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: The Evidence against Rybka

Post by BB+ » Thu Jul 07, 2011 1:47 pm

BB+ wrote:
Aside from being totally arbitrary (how does one determine that a similar feature is worth 0.5 rather than 0.8?),
I think this is about as arbitrary as the legal standard for Nevada libel cases. :)
Perhaps not the best place for me to insert a joke. Indeed, the difference between "arbitrary" and "subjective" could be an important one. And while I certainly know persons who like to indulge in a philosophy of scientism/objectivism on occasion, in most life-situations some amount of subjectivity is typically applied in a decision process [in fact, sometimes even an "arbitrator" is used].

The EVAL_COMP methodology was trial-ballooned before being thrust upon the Panel. At least one person was sceptical at first [not sure of the reasons], but later there seemed to be agreement that it was a decent (if imperfect) procedure. The alternative, that is, having a complete lack of quantification, was seen as worse than having something "subjective" (though with enough discussion for someone to replicate the experiment with the same methodology, if desired -- also, I think I still have some "mark-up" pages with various notes about the numbers). The closest judicial analogue to the EVAL_COMP result would seem to be an "expert opinion", which also expresses its evidentiary nature.

It is indeed the case that the value of "expert opinions" can sometimes be diminished via extenuating circumstances [e.g., a major methodological flaw behind the reasoning -- of course, tiresome lawyers will often attack the most benign of errors simply as a tactic of discombobulation], but the more typical way to rebut them is to present an alternative "expert opinion" [a third option, that of simply (and resolutely) labelling the "expert opinion" as "arbitrary", also gets used by a substratum of lawyers, but tends to have a low win%]. In the case at hand, anyone who reads the EVAL_COMP paper can then pretend to be an "expert" [as indeed the writer of it did at one point] by accepting the narrative and filling in their own numbers. As stated previously, I suspect that I can replicate my own numerology within something like 0.8 overall for 95% of the 36 pairwise comparisons -- I would expect something similar to be true of anyone else making a serious effort.

A more burdensome assignment would be to re-do the whole "feature selection" and/or narrate various other programs [yet another level of yoke could be assumed by including some engines that are not open-source]. A completely different project would be to either measure "evaluation feature overlap" [a major onus of "nonoriginality" with Rybka] in an alternative manner, or to make a convincing argument that all such pseudo-measurements of "evaluation feature overlap" are inherently flawed. One could also strive to demonstrate that "evaluation feature overlap" has nothing to do with "originality" (in ICGA Rule #2), but it seems to me that both the Panel and Board were fairly uniform in their acceptance of it as such.

Relatedly, what does "substantial similarity" mean in a copyright case? One typical procedure would be for the plaintiff to present an "expert opinion" that enumerates the various similar aspects: some of them "objective", and some of them "subjective"; some of them "qualitative", and some of them "quantitative". Any decision-maker would take this "expert opinion" into account like any other piece of evidence, possibly accepting it, and possibly rejecting it.
