Replication of the famous rabbis experiment - a reply to Doron Witztum

Brendan McKay, Australian National University
Gil Kalai, Hebrew University of Jerusalem


We answer the claims made by Doron Witztum against the Torah codes experiments conducted by McKay et al. using data provided by an independent expert. We show that, rather than achieving the refutation he wanted, Witztum provided additional direct evidence that his "research" is conducted by scientifically invalid methods.


The paper by McKay, Bar-Natan, Bar-Hillel and Kalai (MBBK) published in Statistical Science in 1999 described several Torah codes experiments conducted using data compiled by an independent expert.

Specifically, we wished to test the claims of Witztum, Rips and Rosenberg (WRR) that the birth and death dates of famous rabbis of the past are encoded in the Hebrew text of Genesis by means of Equidistant Letter Sequences. This claim was supported by their 1994 paper in Statistical Science, which described an experiment using data they say was compiled by their expert Professor Shlomo Havlin of Bar-Ilan University.
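
For readers unfamiliar with the term, an Equidistant Letter Sequence (ELS) is a word read off by taking letters of the text at a fixed spacing, or "skip". The following minimal Python sketch shows only that basic idea. It is not WRR's procedure, which goes further and measures how close the ELSs of appellations lie to those of dates. The function name find_els, the max_skip parameter, the restriction to forward skips and the Latin-letter toy text are all assumptions made for illustration.

```python
def find_els(text, word, max_skip):
    """Return (start, skip) pairs where `word` occurs in `text` as an ELS.

    Illustrative only: forward skips from 1 to max_skip, no spaces or
    punctuation handling, and no proximity measure of any kind.
    """
    hits = []
    n, k = len(text), len(word)
    if k < 2:
        return hits  # a single letter has no meaningful skip
    for skip in range(1, max_skip + 1):
        span = (k - 1) * skip  # distance from first to last letter
        for start in range(n - span):
            if all(text[start + i * skip] == word[i] for i in range(k)):
                hits.append((start, skip))
    return hits


# Toy example: "abc" appears at every second letter, starting at index 1.
print(find_els("xaxbxcx", "abc", 3))  # [(1, 2)]
```

Even this toy version makes one point clear: whether a given spelling is found at all depends on every letter, so a one-letter change in an appellation gives a different set of ELSs.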

WRR did two such experiments, based on lists of names and appellations of famous rabbis, together with the dates of their births or deaths. These are commonly called the "first list" and the "second list". The issues to resolve in preparing the data include determining the correct dates, but the most problematic is that of deciding which names and appellations to use and how to spell them.

To avoid the subjectivity of making these decisions ourselves, and to ensure high quality of the data, we engaged Dr. Simcha Emanuel, a specialist in Rabbinical History at Tel-Aviv University, to prepare names and appellations according to his professional judgement. He also investigated dates of birth and death. For two experiments, Emanuel used his own judgement to choose which appellations to include and how to write them. For a third, Emanuel was shown the first list of WRR and asked to prepare appellations for the second list of rabbis following the same practices.

To cut a long story short, the data used by WRR shows a strong effect consistent with Torah codes existing, but the data supplied by Dr. Emanuel does not. In other words, our replication of WRR's experiment failed to show the phenomenon they claimed.

Doron Witztum has now provided a response to our experiments. Witztum's style is to present argumentation against every syllable we write, but our reply is going to restrict itself to a few of the key issues.

Part A - Witztum makes a remarkable demonstration

Two rabbis that appeared on WRR's second list were not included in our list because they fail the criteria for inclusion. (This is discussed more in Part B.) Witztum wanted to put them back to assist in his "refutation", so he needed to know the appellations Emanuel wrote for them. Here is how Witztum reports on his solution to this problem:

[Witztum:] A conversation with Emanuel helped us to deduce the missing appellations:
For no. 16: RBY YHWDH, @YA$, @AYA$, YHWDH @YA$, YHWDH @AYA$.

[Our transliteration scheme for Hebrew letters is defined here.]

As Witztum states in his article, Dr. Emanuel had already provided us with appellations for those two rabbis. Here they are:

The interesting thing to notice is that, in the appellations he gave to us, Emanuel spelt "Ayash" using two yuds, whereas Witztum spells it with only one yud. (The extra appellations for no. 5 are of minor importance as WRR's experimental method removes appellations with more than 8 letters.)

Note that Witztum does not actually claim that he is using the appellations given to him by Emanuel, merely that Emanuel "helped [him] to deduce" them.

We asked Emanuel whether he had changed his mind, but he still maintains that the correct spellings use two yuds, citing the tombstone of this rabbi and the frontispieces of his books.

Conclusion: Witztum did not use the spellings provided by Emanuel.

Witztum's spellings work better for him than the spellings preferred by Emanuel.

If one decides to do an experiment using an expert to compile the data, the rules of good science obligate one to use the data the expert believes to be correct. This is especially true in a case, like this one, where it has been thoroughly established that different reasonable decisions about the details of the data can drastically affect the results. Even if Witztum feels that his spellings are more correct than Emanuel's, it is incorrect procedure to deny the expert's clear wishes. Otherwise, the assumptions behind the statistical analysis are violated.

We also wonder why Witztum's article fails to mention that he changed the spelling.

Emanuel also reported to us that Witztum tried several times to convince him to change his mind about various spellings, or to add additional appellations. Since the choice and spelling of appellations is the primary determinant of success in the experiment, this attempted interference is a serious methodological mistake. This, and especially the "Ayash" incident, remarkably improve our understanding of WRR's experiments:

Lesson: The entire success of WRR's original famous rabbis experiment can be explained by supposing that Witztum originally interacted with his own expert (Havlin) in the same unscientific manner that he recently interacted with our expert (Emanuel).

In other words, Witztum has just treated us to a simple explanation of why the "Torah codes puzzle" is not such a puzzle after all.

Part B - Answers to some specific claims

Witztum's claim #1: The data provided by Dr. Emanuel was changed without his knowledge.
Reply: The data provided by Emanuel was used exactly as he provided it. Witztum's specific allegation is that we changed the list of rabbis Emanuel provided. However, Emanuel did not provide any list of rabbis. He just provided appellations and dates for rabbis we asked him about and played no other part. The list of rabbis was chosen by us by strictly applying the rules set down by WRR (see claim #2).

In general, Emanuel was only told the bare minimum about how his data would be used, in accordance with good scientific practice. Nevertheless, after the experiment was finished, Emanuel approved the description of it that appeared in our paper.

Witztum's claim #2: In repeating WRR's second experiment, we were obliged to use the same rabbis and the same dates.
Reply: Witztum's reply overlooks the purpose of our experiment. We wished to evaluate his evidence for the codes, which involves all aspects of his experiment and not just the appellations. A proper replication has to be as independent of WRR's experiment as possible. We were obliged to redo (not copy) any process of decision making that may have contributed to WRR's result. That includes the questions of choice of rabbis and choice of dates, not just the choice of appellations.

Concerning the choice of rabbis: When WRR did their experiment, they stated a rule defining which rabbis were to be included: those whose entries occupy from 1.5 to 3 columns in the encyclopedia of Margaliot and for whom a date of birth or death is given. However, WRR applied the rule carelessly and made at least five mistakes (see our paper). Witztum is claiming that we were required to make the same mistakes in our experiment. It is hard to believe that anyone with scientific training could make such a claim. In our replication we tried very hard to apply WRR's rules correctly. One necessary exception was that Rabbi David Ganz had to be excluded because he was already in WRR's first list. (WRR excluded him for the same reason.) Otherwise, neither Witztum nor anyone else has found an error in our selections.
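
The point that the selection rule is mechanical, and leaves no room for judgement once it is stated, can be made concrete with a short Python sketch. Everything in it is hypothetical: the Entry record, the select_rabbis function, the placeholder names and the column counts are invented for illustration and are not taken from Margaliot's encyclopedia. The treatment of the end-points (1.5 included, 3 excluded) is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str                  # placeholder label, not a real entry
    columns: float             # length of the encyclopedia entry, in columns
    birth_date: Optional[str]  # None when no date is given
    death_date: Optional[str]


def select_rabbis(entries, first_list_names, lo=1.5, hi=3.0):
    """Apply the stated rule: entry length in [lo, hi), some date given,
    and not already on the first list. End-point handling is assumed."""
    selected = []
    for e in entries:
        has_date = e.birth_date is not None or e.death_date is not None
        if lo <= e.columns < hi and has_date and e.name not in first_list_names:
            selected.append(e.name)
    return selected


# Invented data, for illustration only.
entries = [
    Entry("Rabbi A", 2.0, None, "date given"),   # eligible
    Entry("Rabbi B", 1.0, "date given", None),   # entry too short
    Entry("Rabbi C", 2.5, None, None),           # no date given
    Entry("Rabbi D", 1.5, "date given", None),   # already on the first list
]
print(select_rabbis(entries, first_list_names={"Rabbi D"}))  # ['Rabbi A']
```

Given the same encyclopedia and the same rule, any two people applying it carefully should arrive at the same list, which is why a replication must apply the rule afresh rather than copy an earlier list.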

The issue of dates is not much different. In their own experiment, WRR replaced or deleted some of Margaliot's dates on the basis of historical evidence. We did not originally ask Emanuel to do the same, but of his own accord he started to make comments about the inaccuracy of some of the dates in the Margaliot encyclopedia. Emanuel is an expert on such historical questions, so we then asked him to check all the dates. This gave us a compilation of dates of the best possible historical accuracy without the need for any subjective choice of our own.

Part C - Emanuel's data versus Havlin's data

Our experiments with Emanuel's data failed to detect any "codes", as is completely obvious from the results, but Witztum has an answer: Emanuel merely found fewer appellations; otherwise they perform about the same as WRR's appellations.

Witztum is correct in observing that Emanuel's appellations are fewer than WRR's, and also that most of Emanuel's appellations were also used by WRR. The situation is illustrated by the following diagram.

Witztum's method is to compare the performance of Emanuel's appellations to that of a random subset of WRR's appellations of the same "size" (where the precise definition of "size" need not concern us here). He finds (he says) that Emanuel's appellations are much the same as a random subset.
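
The general shape of such a random-subset comparison can be sketched in Python as follows. This is only a sketch under loud assumptions: "size" is taken to be a plain count of appellations, the success measure is replaced by a sum of made-up per-appellation scores, and the function random_subset_rank and its parameters are hypothetical. It reproduces neither Witztum's calculation nor WRR's measure.

```python
import random


def random_subset_rank(scores, chosen, trials, seed=0):
    """Fraction of random same-count subsets whose total score is at
    least that of the chosen subset.

    `scores` maps each appellation to a stand-in score (higher = better);
    `chosen` is the subset selected by the expert.
    """
    rng = random.Random(seed)
    names = sorted(scores)  # fixed order so results repeat for a given seed
    target = sum(scores[name] for name in chosen)
    at_least = 0
    for _ in range(trials):
        sample = rng.sample(names, len(chosen))
        if sum(scores[name] for name in sample) >= target:
            at_least += 1
    return at_least / trials


# Invented scores, for illustration only.
toy_scores = {"a": 1, "b": 2, "c": 3, "d": 4}
print(random_subset_rank(toy_scores, ["d"], trials=1000))
```

Note that the sketch treats every appellation as interchangeable with every other, which is exactly the assumption at issue when the chosen subset was written by an expert rather than drawn at random.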

Witztum's calculation has too many severe errors for it to be taken seriously. It relies on using a choice of rabbis that violates his own rules, and some dates that are known to be wrong. Worst of all, there is no mathematical sense in comparing random subsets to a set written by an expert. The latter has strong internal structure that affects its behaviour in a way that random subsets cannot match.

These considerations are enough reason to reject Witztum's conclusions, but they aren't really necessary. Whenever replication of an experiment fails to give the same result as the original, it is always possible for the first experimenter to point to some differences between the experiments and claim that the "explanation" for the failure of the replication lies with those differences. The mere fact of differences existing does not alter the fact that the second experiment failed to confirm the first, and does not alter the fact that the onus of proof has moved back to the first experimenter. This is a standard scientific principle.

However, since Witztum has raised these issues, we wish to point out that his evidence does not even point in the direction he wishes. Appellations are not all equal. Some are well-known and widely used; others are more obscure. It is clear that an expert like Emanuel, who selects only a comparatively small number of appellations, is on average going to favour the important ones and shun the obscure ones. The codes hypothesis promoted by Witztum implies that important appellations should perform better on average than unimportant appellations. (We can see this because he very often invokes the importance or popularity of an appellation in arguing whether it should be included or not.) If so, Emanuel's appellations should perform much better on average than WRR's appellations. They don't, even by Witztum's own defective calculation.

In fact, exactly the opposite is true. The set of appellations chosen by Emanuel has nearly the same "size" (in the same sense as before) as the set of appellations chosen by WRR but not chosen by Emanuel (indicated by blue shading in the diagram). However, according to the same success measure used by Witztum, Emanuel's appellations perform 250 times worse. This flatly contradicts Witztum's conclusion, and shows instead that WRR's appellations are very different from Emanuel's.

We don't know a method of assigning a significance level to the factor of 250 just mentioned that satisfies our strict mathematical standards. However, since someone is sure to ask, we note for the record that Witztum's method of choosing random subsets gives a "significance level" smaller than we could measure: at most 1 in 65,000.

We also note that these experiments directly support our hypothesis that WRR's data was not compiled objectively. The wiggle-room available in selecting appellations is greater amongst the uncommon ones than amongst those which are well-known. This predicts that the "phenomenon" seen by WRR should appear more in the uncommon appellations - the opposite of what the codes hypothesis predicts. And this is exactly what we observe.

Postscript (July 2001)

Witztum published a reply to the above article on July 6, 2001. As a consequence, we make the following comments:

Witztum claims that Emanuel did not approve the description we published of our experiment.
The facts are that on Feb 4 1999, and again on March 2 (after some minor edits), we sent Emanuel the full text of that part of our paper which describes our experiment with him. This was in accordance with a prior agreement that he would approve what we wrote about him before we published his name. Soon afterwards he gave his approval by telephone to our assistant.
[Note that Witztum has not provided the slightest evidence that Emanuel disapproves of anything we wrote. In fact, he does not. The best Witztum can do is to claim that Emanuel did not know about the choice of 33 rabbis. That was completely proper: the choice was not up to Emanuel but was made according to Witztum's own rules (which he doesn't deny). It was also completely proper to ask for appellations of those rabbis which Witztum included incorrectly, as their ineligibility has been public knowledge for years and there was no question but that we were bound to exclude them.]

Witztum claims that we never asked Emanuel to check all the dates.
The facts are that on Feb 4 1999 we wrote to Emanuel: "In the case of rabbis for which you did not previously make a careful investigation of the date of death, please do so. It is very important that we use the best available date for every rabbi." On Feb 20, he replied: "I investigated most of the death dates during my work of the last few months... In last two weeks I investigated the death dates of those names I didn't check before... So, now I [have] investigated the death dates for everyone on the new list."
Let us take this opportunity to ask Witztum again why he did not take Havlin's advice on the dates.

Witztum claims that Emanuel used the wrong spellings for Rabbi Ayash's name, and that he was wrong to claim there was a dispute over the death date of Rabbi Hasid.
The facts are that we are obliged to ignore Witztum's opinions. We had employed an expert to make decisions to protect the data from our bias, according to the best scientific practice, and were obliged to accept his decisions. Witztum himself is obliged to use the appellations in the precise form provided by Emanuel.

Doron Witztum does not seem to understand that when he bases an experiment on data provided by an expert he cannot use his own judgement to edit that data. (This matter was explained to Witztum repeatedly and one of us (GK) personally tried to explain it to him back in 1993.)

As we said above, Witztum's mistake of trying to edit data provided by an independent expert goes a long way towards explaining the "Torah code puzzle". Incidentally, if he was not sure about these spellings he could have asked us. On several occasions (most recently in May 2001) Witztum asked us questions of a similar nature and we provided him with the information he requested.

In summary: We stand by what we wrote in the article above.

