DOWSING -looking at scientific evidence

Very simple graphs of the Scheunen experimental results (plots of dowsers' choices vs. water-pipe locations), when thoughtfully considered, clearly demonstrate that dowsers with their various kinds of witching sticks usually lost their way completely when seeking a hidden pipe in a barn (Figs. 1 to 4 in [3]). Attempts to discredit that graphically obvious conclusion [1, 2] demonstrate that statisticians who search through data, armed with various fancy tests rather than divining rods, can also lose their bearings.

[Footnote]

*: Commentary on "Dowsing Reviewed..." [1] and "The Dowsing Data..." [2]

THE ORIGINAL DATA, WITHOUT STATISTICAL COSMETICS

Anyone who is interested in dowsing and the outcome of the Scheunen experiments should consult the graphs of the results from the six "best" dowsers that are presented as Figs. 1-4 in my review of the experimental data [3]. Those simple plots of the observed dowsers' choices relative to location of the hidden pipe stand on their own, independent of all statistical analysis. For example, it is an empirical fact, and not the outcome of testing some obscure null hypothesis, that 5 of the 6 "best" dowsers (including the famed #99) could have performed better on average if they had simply chosen the midpoint of the test line in each and every trial. It is difficult to imagine results that look more scattered than the summaries that are plotted in [3]-- and those data were chosen for illustration because they came from the "best" dowsers, based on criteria of the investigators themselves [4]. In my opinion, such graphs argue so strongly against a significant dowsing effect ("significant" in its broader meaning, rather than its statistical usage) that a contrary interpretation could only suggest a compulsion to cling tenaciously to a failed hypothesis: nothing less than the will to believe.
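The midpoint comparison can be made concrete with a small calculation. The sketch below is an illustration only and is not taken from the review [3]: it assumes that the pipe is equally likely to sit at any of 100 decimeter positions along the 10-m line, and it compares the average miss of a "dowser" who always picks the midpoint with that of one who guesses blindly. The function names and the position numbering 1..100 are hypothetical.

```python
from fractions import Fraction

# Assumption for illustration: positions 1..100 (decimeters) along the 10-m line,
# with the pipe equally likely to be at any of them.
POSITIONS = range(1, 101)

def mean_abs_error_fixed(guess):
    """Average |pipe - guess| in decimeters when the same guess is made every trial."""
    return Fraction(sum(abs(p - guess) for p in POSITIONS), len(POSITIONS))

def mean_abs_error_random():
    """Average |pipe - guess| when the guess is uniform and independent of the pipe."""
    total = sum(abs(p - g) for p in POSITIONS for g in POSITIONS)
    return Fraction(total, len(POSITIONS) ** 2)

midpoint_error = mean_abs_error_fixed(50)   # always choose the middle of the line
random_error = mean_abs_error_random()      # blind guessing
```

Under these assumptions the fixed midpoint choice misses by 25 dm (2.5 m) on average, against about 33 dm for blind guessing. The comparison in [3] was based on the pipe locations actually used in each series, so its benchmark values need not equal this idealized figure.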

The long and the short of it is that dowsing performance in the Scheunen experiments was not reproducible. It was not reproducible inter-individually: from a pool of some 500 self-proclaimed dowsers, the researchers selected for their critical experiments 43 candidates whom they considered most promising on the basis of preliminary testing; but the investigators themselves ended up being impressed with only a few of the performances of only a small handful from that select group. And, even more troublesome for the hypothesis, dowsing performance was not reproducible intra-individually: those few dowsers, who on one occasion or another seemed to do relatively well, were in their other comparable test series usually no more successful than the rest of the "unskilled" dowsers. (See Figs. 2 and 3 in [3].)

STATISTICAL ANALYSIS

Perhaps the most important caveat about statistics that was emphasized in my review [3] is that proper application of the logic of inferential statistics demands that the test to be applied to a set of results be selected before any of the results have been examined. I expressed this concern about the original analysis [4] as follows: "The statistical procedure used in the Final Report for the Scheunen experiments is a special, unconventional and customized analysis, and the report does not indicate whether this choice of statistical procedures was made before any of the critical experiments was performed." [3] This issue is important because if the chosen statistical procedure was developed or selected with some of the data already in hand, the "probabilities" calculated and the "levels of significance" would lose all objective meaning. As I carefully emphasized, identical reservations apply to the more conventional statistical examination of the data that I myself undertook: with after-the-fact analyses, as were mine (and as are all the more recent analyses of Ertel [2]), no confidence whatever can be placed in probabilities derived from ordinary statistical testing; and I explicitly refrained [3] from drawing any inferential conclusions based on "statistical significance" from my own analyses.

The commentary by Betz, Konig, Kulzer, Tritschler and Wagner [1] provided an ideal opportunity for them to totally disarm such concerns, by asserting that the complex multinomial test used in the original analysis was fully developed and had been chosen for application to those data before any of the critical experiments was conducted, i.e., prior to 9 April 1987. The absence of such a statement in their commentary may represent no more than a simple (but important and alarming) oversight; but if such an assertion cannot honestly be made, then statistical analyses of the Scheunen experiments in the Final Report [4], all of which were based on that test, are no more than an "exploratory" application of statistics, in which probability values are the outcome of an empty formalism.

That distinction is by no means a subtle issue. Exploratory analysis can be compared with using today's newspaper to find a scheme that could, in principle, have "predicted" the outcome of yesterday's horse races or yesterday's lottery: not genuine prediction but "postdiction", with no risk whatever that the wager can be lost. The probabilities associated with proper statistical testing are, instead, applicable only to genuine prediction, made before the race has been run or the lottery numbers drawn.

The meaninglessness of "probabilities" calculated from inappropriate statistical "testing" applies with great emphasis to the reexamination of the Scheunen data described by Ertel [2], in which exploratory statistical analysis was pushed to an astonishing extreme. Some of the "hypotheses" that were derived from the data are so bizarre that no sensible dowsing enthusiast could have dreamed of suggesting them until after carefully sifting through the experimental data, and manipulating them in dozens of ways. This was followed by multiple testing, driven to the nth degree, so as to find the most "significant" way of reorganizing the observations. And yet that article [2] is fairly peppered with probability values and the terminology of inferential testing, leaving the impression that exploratory data analysis (i.e., "postdiction") deserves the same sort of credence and respect for purported "significance levels" that are warranted by the real predictions of legitimate statistical testing! Given enough freedom in the design of customized tests, the imaginative use of exploratory statistics guarantees that any data set can be made to seem non-random in several different ways. In the present context, such an outcome has nothing to do with the skill of the dowsers, as Betz et al. [1] and Ertel [2] contend; instead, it simply demonstrates the persistence and ingenuity of the statisticians. Perhaps the most remarkable aspect of this situation is that statisticians who are so adept at manipulating data seem to have lost their way at the outset, by failing to recognize the fork in the road between exploratory data analysis and genuine statistical testing.

As can easily happen in such after-the-fact analyses, Ertel [2] and the original researchers [4] used the same data to reach qualitatively different sorts of interpretations, based on different criteria-- although they seem not to have noticed that discrepancy [1, 2]. Betz and colleagues [4] concluded that nearly all of their candidates showed no dowsing skill whatever; they decided, instead, that only a very few of the dowsers had (occasionally) shown noteworthy skill -- where skill is (quite reasonably) demonstrated by proximity between hidden pipe and dowser's choice. Ertel [2] ignored those interpretations and instead charged into the data with the assumption that if dowsing skill exists, it would be equitably apportioned among all dowsers at all times-- where skill can also be demonstrated by a "wrong" choice that is located at a mirror image of where the sought-after pipe was placed, reflected around a point that was 34 dm (Yes, 34 dm!) from one end of the 10-m test line. The open-minded, thoughtful reader who takes a careful look at the simple plots of the data presented in my review [3] will, I think, reach a still different conclusion: the recognition that the experiments led only to extremely scattered and unreproducible results, with no persuasive evidence at all for the existence of practical dowsing skill-- no matter how persistent and ingenious the statistician may be who chooses to dissect, manipulate and massage the data.

CAN REASON PREVAIL?

I stand by my conviction: "...the Scheunen experiments are not only the most extensive and careful scientific study of the dowsing problem ever attempted, but-- if reason prevails-- they probably also represent the last major study of this sort that will ever be undertaken." (p. 369 in [3]) To my regret, however, I am not, at this time, completely convinced that reason will prevail.

1. Betz, H.-D., Konig, H. L., Kulzer, R., Tritschler, J., Wagner, H.: Naturwissenschaften, [Editor: please insert vol., pages & date.]

2. Ertel, S.: Naturwissenschaften, [Editor: please insert vol., pages & date.]

3. Enright, J. T.: Naturwissenschaften 82, 360-369 (1995)

4. Wagner, H., Betz, H.-D., Konig, H. L.: Schlußbericht 01 KB 8602. BMFT 1990.

Water Dowsing: The "Scheunen" Experiments

J. T. Enright

Neurobiology Unit 0202

Scripps Institution of Oceanography

La Jolla, CA 92093, USA

Abstract

The most extensive and carefully controlled experimental investigation ever undertaken of whether dowsers can, as they claim, detect hidden water at a distance, was conducted in a barn (German: "Scheune") near Munich in 1987 and 1988 by physicists at nearby universities, a project funded by the German government for 400,000 Marks (about a quarter of a million dollars, at present exchange rates). More than 500 dowsers participated, in more than 10,000 individual, double-blind tests, and the researchers claimed to have demonstrated in these Scheunen experiments that a real dowsing phenomenon "can be regarded as empirically proven." Several years later, Enright undertook a thorough re-examination of the data on which that claim was based, and concluded that the entire research outcome can reasonably be attributed to chance.

Introduction

Public attitudes in North America toward water dowsing (also known as "water witching", or "divining") involve a remarkable ambivalence. As is true of the rest of the world, a majority of the water wells that are drilled there involve consultation with a dowser, and many such "experts" are available: in 1967, for example, the number of practicing dowsers in the USA was estimated to be about 25,000 (1). Nevertheless, most of the educated American public, serious scientists as well as non-scientists, would, I think, characterize water dowsing as a holdover from medieval superstitions: nothing more than unreliable folklore. The usual justification for such skepticism is that no plausible physical explanation has ever been offered for the stimuli to which a dowser, with his "divining rod", might be responding. When considered objectively, however, a rejection of dowsing simply because physics and physiology cannot provide an adequate mechanism to account for the phenomenon can be interpreted as scientific arrogance. An open-minded counter argument is that the tradition and folklore of dowsing are not based on its theoretical underpinnings but on the claimed successes of its practitioners; and if the method "works", but current science cannot explain it, so much the worse for science!

But are water dowsers truly more successful than can be accounted for by pure chance? Anecdotal reports of positive results can of course be found in abundance, but on the other hand, a survey by the U.S. Department of the Interior of more than 500 publications on dowsing led to the following assessment (2): "It is doubtful whether so much investigation and discussion have been bestowed on any other subject with such absolute lack of positive results. It is difficult to see how for practical purposes the entire matter could be more thoroughly discredited, and it should be obvious to everyone that further tests by the United States Geological Survey on this so-called "witching" for water, oil or other minerals would be a misuse of public funds."

In Germany, where water dowsing apparently originated sometime in the sixteenth century, a divining rod-- the tool of the trade-- is called a "Wünschelrute" (literally a "wishing rod"), a term which seems to imply a similarly skeptical attitude. Nevertheless, a considerably more tolerant opinion can often be encountered in Germany than in America among both professional scientists and educated non-scientists: not that dowsing is widely and confidently accepted as a proven phenomenon, but it seems to be tolerated as an expression of the attitude, "There are more things in heaven and earth..." than modern science can fully account for; perhaps dowsers actually ARE able to respond to some stimulus not yet understood by modern science. And in Germany, there is widespread interest among laymen in what dowsers refer to as "Erdstrahlen" -- some sort of radiant energy purportedly associated with underground water (and other minerals as well), to which dowsers are thought to respond. Although the exact physical nature of these rays remains unspecified, their presumed existence is regarded by some as a major gap in current scientific knowledge.

A Scientific Study of Dowsing

In view of that widespread cultural acceptance (or at least tolerance) of water dowsing in Germany, it should not be surprising that in 1986 the BMFT (Bundesministerium für Forschung und Technologie) provided DM 400,000 to several established scientists for an experimental investigation of the phenomenon. The funding was one element of a much larger research program, entitled "Unconventional methods of cancer control", a connection based on the claim by some Radiesthetists (as European dowsers sometimes call themselves) that their services can help protect homeowners from the adverse effects of Erdstrahlen on health. Two of the principal investigators on that grant, entitled "Earthrays and Dowsers", were physicists, one from the University of Munich and one from the Technical University of Munich, and the third was a professor of pharmaceutical biology at the University of Munich. The objective of their study was to determine whether careful scientific research on water dowsing could lead to convincing evidence for or against the existence of real, reproducible effects: an experimental examination of whether dowsers really can somehow detect water with a success rate greater than can be accounted for by chance alone. It was not the intent of the research, even in the event of a favorable conclusion, to provide a theoretical explanation for such effects, and this seems to be an eminently reasonable approach. There is no point in seeking a physical explanation for Earthrays and their possible connection with cancer until persuasive experimental evidence is available to demonstrate that dowsers really can do as they claim, since it is those claims that gave rise to the hypothesis of Earthrays in the first place.

This research project was no doubt the largest carefully controlled scientific study of dowsing ever conducted. Some 500 candidates who claimed skill as dowsers were investigated in some 10,000 individual tests. Much of that effort was devoted to preliminary experiments, which were intended to develop the most sensitive methods available and to select the best candidates for the final stage of the project: "critical" tests with rigorous methods and extremely careful experimental precautions. The most interesting and enlightening of those final experiments, which are to be examined in more detail here, were referred to as the "Scheunen" [barn] Experiments; as the name suggests, these tests were conducted in a two-story building that had previously served as a barn. In these tests, the dowsers were required to select a location, on the second floor of the building, that they thought was directly above a water pipe located on the ground floor. That pipe could be moved back and forth across the floor on the ground level, to a location that was chosen randomly for each test. (More experimental details are given below.)

The Final Report on this research project was submitted to the granting agency (BMFT) in 1990, with a title that almost seems intended to conceal rather than disclose its content (3): "Setting up and operation of test arrays with artificial variable low-energy fields for the study of the response in biological macrosystems." By use of italics, the summary of the report emphasized the two primary conclusions, one negative, the other positive:

I: "The success rate of average dowsers in the tests conducted was poor and in most cases indistinguishable (or nearly so) from chance;"

II: "Some few dowsers, in particular tasks, showed an extraordinarily high rate of success, which can scarcely, if at all, be explained as due to chance." (3, p. 5)

Elsewhere in the abstract, the second, positive conclusion was elaborated: "...in every sort of test conducted, there were some few people who showed location-dependent responses, some with good and some with extraordinarily good reproducibility, which, in their departure from chance expectations, were highly significant."

As a final re-phrasing of that second, positive and widely publicized conclusion, the summary says: "...a real core of dowser-phenomena can be regarded as empirically proven [praktisch nachgewiesen]..." (3, p. 5)

As these quotations demonstrate, the investigators had become thoroughly convinced by their results that dowsing is a reproducible phenomenon; that something truly extraordinary was involved in the performances of at least some few of the water dowsers investigated. A conclusion of this sort has a significance that should not be ignored by the broader scientific community; there are far-reaching implications, not just for dowsing itself, but for both physics and physiology. (What form of energy might arise from flowing water that could serve as an adequate stimulus for precise determination, from a distance, of the direction to its source? What sort of sensory system might be able to respond to such stimuli, which would presumably be exceedingly weak? How can a high-tech instrument be built to amplify those stimuli?) Fortunately, the Final Report also includes extensive appendices in which the experimental observations are summarized, and this permits others to decide for themselves whether such revolutionary conclusions are fully warranted by the data.

The Scheunen Experiments

The overall research program under the "Dowsers and Earthrays" grant involved several different approaches. In one of these, the ability of dowsers to detect artificial magnetic fields was investigated, with completely negative results-- but water dowsers do not necessarily insist that magnetism is involved in their skills. And another set of tests ("Laufbrett" experiments) evaluated whether there was any agreement among dowsers in their tendency independently to choose identical locations along pre-selected outdoor test paths, presumably as a response to some unknown local stimulus, perhaps related to Earthrays. Evidence for non-random agreement among dowsers was reported, but that sort of testing is very difficult to interpret, since no correlation with underground water supplies was investigated, and the actual location and nature of the relevant stimuli were completely unknown to the experimenters. This experiment is somewhat comparable with asking a group of people each to choose ten numbers between 1 and 100. If one then finds that their selections show non-random agreement with each other, no sensible person would insist that mysterious stimuli originating from the preferred numbers themselves ("Zahlenstrahlen"?) were involved. The third component of the research, however, the so-called "Scheunen" experiments, seems to get to the heart of the matter; those tests were designed to determine whether experienced dowsers can do exactly what they claim: localize the presence of water from a distance, in the absence of ordinary clues. Flowing water was actually present at a nearby hidden location, known only to the experimenters, and that portion of the research program seems to have been well designed for its purpose.
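The number-choosing analogy can be illustrated with a small simulation. The sketch below is not based on any data from the Final Report; it simply assumes a hypothetical shared preference (numbers ending in 7 are made ten times as popular as the rest) and counts how many numbers each pair of people has in common, compared with people who choose uniformly at random. All names and the size of the bias are assumptions made for the example.

```python
import random

def pick_numbers(rng, weights, k=10):
    """Draw k distinct numbers from 1..100, with probability proportional to weights."""
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.choices(range(1, 101), weights=weights)[0])
    return chosen

def mean_shared(rng, weights, people=40):
    """Average count of numbers that each pair of people has in common."""
    picks = [pick_numbers(rng, weights) for _ in range(people)]
    pairs = [(a, b) for i, a in enumerate(picks) for b in picks[i + 1:]]
    return sum(len(a & b) for a, b in pairs) / len(pairs)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
uniform = [1.0] * 100
# Hypothetical shared preference: numbers ending in 7 are ten times as popular.
biased = [10.0 if n % 10 == 7 else 1.0 for n in range(1, 101)]

shared_uniform = mean_shared(rng, uniform)  # expected to be close to 1 per pair
shared_biased = mean_shared(rng, biased)    # noticeably more agreement
```

The biased group agrees with itself far more often than chance would predict, although no external stimulus of any kind is present in the simulation; the agreement reflects only the habits that the choosers have in common.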

A ten-meter-long stimulus line was established on the ground floor of a two-story building, and a short water pipe, oriented perpendicular to that line and connected to hoses, was fastened to a wagon that could be moved back and forth along the stimulus line, so that the pipe could be repositioned for each test, to a location determined by the random generator of a computer. In most tests, water was pumped through that pipe at a rate of up to 40 liters per minute, and a geologist might well ask why flowing water was used, rather than a tub of standing water. The answer lies in the tradition of dowsing; the remarkable assumption is that underground water is usually to be found in "water arteries" ["Wasser-Ader"]-- rather than as extensive pools in porous sediment, as most geologists today believe; and within the dowsing tradition, the stimuli to which a dowser and his divining rod respond are often thought to be intimately related to the movement of the water through the Wasser-Ader.

(The widespread existence of constricted, flowing underground streams of water is, of course, central to the dowsing tradition, presumably making it essential to excavate in almost exactly the right place; if, instead, one assumes that water may well be present in extensive, distributed deposits, the services of a skillful dowser would become irrelevant.)

On the second floor of the barn, another ten-meter-long line was established, directly above the ground-floor stimulus line, with measurement markers corresponding to those below; and it was the task of the dowser, using whatever tools of the trade he preferred, to indicate in each trial exactly where along the upstairs test line he "felt" the pipe to be. The standard protocol typically involved a series of 10 tests (sometimes fewer), which were usually completed within about an hour, so the dowser had roughly 5 minutes available to wander back and forth along the test line before making each decision. Between tests, while the pipe on the ground floor was being moved, the dowser was taken to an adjacent upstairs room or outside the building, so as to minimize any opportunity of hearing noises that might offer a hint about where the new pipe location was; and, as a further precaution, a constant time interval between tests was allowed for moving the pipe, so that no conclusions could be drawn from that interval about how far the pipe had been moved. Another extremely important element in all the reported tests is that they were conducted "double blind", meaning that neither the dowser nor the two experimenters, who were in the same room to supervise his activities, knew the actual location of the pipe below. During those "critical" test series, the dowsers received no indication about their success or failure; they were, however, permitted to terminate a session if they became tired or felt they had lost their ability to concentrate, and the published data make it appear that this happened relatively often. Some dowsers often noted more than one location along the line where they "felt" something, and in such cases they were encouraged to select only one of the locations as the main choice, with the other choice(s) being recorded but ignored in the final data analyses. In a small fraction of the tests, no preference between two locations could be made.

Several superficial aspects of the test situation were varied from one dowser to the next; and this was done intentionally so that each person would be given an optimal opportunity to demonstrate his capabilities. In preliminary experiments, as well as in the "training" tests (in which feedback about success or failure was provided), potential variables included the material of which the pipe was constructed, the fluid in the pipe and pumping system (fresh water, salt water or air), the rate of fluid flow, the extent of turbulence in the fluid flow and, within limits of the available space, the exact location of the test and stimulus lines. Dowsers who were accepted into the final, critical-test program were those who had shown some indication of success in the preliminary tests; and for each accepted candidate, those variables from the preliminary tests (type of pipe, type of fluid, etc.) that had led to the best results were used for subsequent critical tests with that individual.

There can be no doubt that the experimenters took generous precautions to avoid the objection that the research was biased against successful dowsing. "Less skillful" dowsers were eliminated at the outset: after preliminary testing of some 500 available candidates, only 50 dowsers were selected to participate in the final critical experiments. Thus, if the preliminary tests gave any basis for judgment, one should expect the participants to be the most skillful 10% of the candidates. The preliminary experiments also assured that each dowser was familiar with the experimental setup, and would be permitted to deal with the stimulus that had previously proven optimal for him. Having been through preliminary testing, the dowsers' voluntary participation in the final, critical tests can be taken to indicate that they considered the experiments a fair test of their abilities, and were convinced that they could be successful.

On the other hand, the experimental design incorporated a variety of features that are important from the viewpoint of a skeptic. The computer-generated random locations assured that the dowsers could not successfully attempt to outguess the experimenters; the double-blind design avoided the possibility that the dowsers would get any unintentional hints about where the pipe was located from subtle behavior of the experimenters; and precautions were taken to prevent accidental transfer of information during displacements of the water pipe. There is, of course, residual and legitimate concern about whether some of the dowsers might have been clever enough to defeat the experimental design; in this kind of project, the temptation to defraud cannot be overlooked. Particularly when remarkable performances were purportedly obtained from only a few individuals, such concern grows, because even a few cheaters would be enough to produce a few surprising results. And when confronted by a clever, strongly motivated candidate, it might be extremely difficult for physical scientists to anticipate all the subtle ways in which they might be deceived. As a sensible precaution against this possibility, the investigators invited a professional magician-- one of those experts of deceit-- to examine the experimental setup for possible weaknesses.

Nevertheless, certain subtle variables were incompletely controlled during the experimentation, the most obvious of which is the sound associated with the circulating water below. In at least some cases, the water flow was intentionally made turbulent, so if one had searched with a very sensitive parabolic microphone along the upstairs test line, while monitoring a broad frequency band of sound, it seems likely that subtle local differences would have been detectable in some parts of the sound spectrum, which could indicate where the pipe was located. As shown below, however, there is very little reason in the experimental results to suspect that any of the dowsers consistently benefited from this sort of noise. Overall, the experimental design indicates that the investigators did their best to give the dowsers a fair chance, but also that they took many reasonable precautions of the sort appropriate for rigorous testing of the purported extraordinary abilities.

The experimental results from the Scheunen experiments were tabulated in an extensive appendix to the Final Report (3, pp. 89-101). The outcomes of more than 800 critical tests, all conducted in the barn, are presented, including data from 43 of the 50 pre-selected dowsers, in a total of 104 test series. For each test series, the participant is identified by number, and the actual locations of the pipe (typically ten values) are given together with the corresponding primary (and sometimes secondary) locations chosen by the dowser. Location data are given to the nearest decimeter, with 100 possible values along each of the two 10-meter lines. Even a first glance at those numbers provides clear support for the first of the general conclusions cited above: that overall, the dowsers did very poorly in matching their choices to the locations of the water pipe.

Some of the observations are presented graphically in Fig. 1, and as can be seen there, the choices were extremely scattered; and the results in Fig. 1 were chosen for plotting here because they represent all the data from those six dowsers who can be judged (see below) to be the "best" of the 43. What can one conclude from such a plot? One can easily notice that a majority of the locations chosen are on the upper half of the graph-- more than 61%-- and such a strong bias is very unlikely to have arisen due to chance, but it is irrelevant to the issue of dowser accuracy. (It is, however, the kind of strong non-randomness that could contribute to impressive results in tests like those of the Laufbrett experiments.) In addition, with a bit of squinting at the graph, one can also notice a weak tendency for the plotted points to aggregate along a 45° diagonal, from lower left to upper right, and this kind of trend indicates a more or less correct match between pipe and chosen location. (This is not surprising, of course, since the data plotted here were selected to include the very best test series.) On the other hand, one might also be able to persuade a willing witness that quite a few points are also distributed along the opposite, downwardly directed diagonal. When data are this scattered relative to expectations, it is not easy to decide, simply from looking at a graph, whether the responses provide persuasive support for the claim that a "real core" of dowsing phenomena has been proven. That kind of question can, instead, be more rigorously examined by statistical analysis.

Conventional Statistical Analysis

The purpose of most statistical analysis is to distinguish objectively between properties of a data set that could well have arisen due to random chance alone, and those that probably represent "real" phenomena. By definition, "real" or "statistically significant" effects are those that are expected to be reproducible, if the experiment were to be carefully replicated by someone else in some other place. It is important to realize, at the outset, that there are dozens of different ways that a body of data like those from the Scheunen experiments might be analyzed statistically, any one of which would be fundamentally correct and appropriate; and in general, those different ways of examining the data can be expected to give somewhat different answers. It would be an abuse of inferential statistical analysis, however, to search through the armory of statistical tests available until one finds a test that, when applied to the available data, gives the answer one is hoping for; instead, rigorous inferential analysis requires that the test to be relied upon has been chosen in advance, before any of the data have been examined. That demand arises because, with a sufficiently extensive search among analytical techniques, almost any set of unrelated numbers will lead to a purportedly "statistically significant" outcome that is spurious. [In "exploratory" data analysis (4, 5, 6), on the other hand, it is entirely appropriate to examine a data set in as many different ways as one chooses. Then, however, levels of "statistical significance" lose all objective meaning.]
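The arithmetic behind that demand can be sketched in a few lines. The example below is an illustration, not part of any analysis of the Scheunen data: it assumes that each test uses a 5% threshold and that the tests are statistically independent, which different tests applied to one data set would satisfy only approximately. The function name is hypothetical.

```python
def chance_of_spurious_hit(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests, applied to
    pure noise, falls below the significance threshold alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests

# How quickly a "significant" result becomes likely as more tests are tried.
for k in (1, 5, 20, 50):
    print(f"{k:3d} tests: {chance_of_spurious_hit(k):.2f}")
```

With a single pre-chosen test the risk of a spurious "significant" result is the nominal 5%; under the stated assumptions it rises to roughly 64% after twenty tries and exceeds 90% after fifty.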

The statistical procedure used in the Final Report for the Scheunen experiments is a special, unconventional and customized analysis* [See footnote.], and the report does not indicate whether this choice of statistical procedures was made before any of the critical experiments was performed. Because of this concern, I have undertaken a variety of other more ordinary analyses, to see whether the conclusions of the researchers will withstand other sorts of scrutiny.

Among the alternatives that I have considered are calculation of correlation coefficients, as well as the fitting of regression lines, for data like those in Figure 1; analyses based on the binomial distribution (with two alternative criteria for "success": a choice within ± one meter of pipe location, or a choice within ± one-half meter); chi-square procedures; and Kolmogorov-Smirnov tests. When such analyses were undertaken, based on all the available data, the results were uniformly discouraging for the dowsing hypothesis. For example, the product-moment correlation between observed and expected, even for the "best-dowser" data in Figure 1, is only about 0.106-- a value that cannot be confidently distinguished (5% level) from zero. This means, then, that by one of the standard tests, there is no convincing indication that the pipe location had any relationship with the dowsers' choices.

Such a calculation cannot, however, be taken as a decisive answer for two reasons: the data in Figure 1 have a greater-than-chance likelihood of being judged "significant" simply because they are not a random sample but a selected subset (the "best" dowsers, as determined by another sort of test on the data); and, more important, my decision to calculate correlation coefficients was made after examining the data, thereby violating the critical rule mentioned above and potentially biasing the outcome by the choice of method. The latter of these objections could be raised about any sort of retrospective re-analysis of the results. The most that can be concluded from my many analyses is that because several ordinary statistical procedures failed to detect unusual consistencies in the dowsers' performances, the interpretations in the Final Report seem to have been critically dependent on the selection of a customized, non-standard method of statistical analysis; the investigators would have been led to an opposite, negative interpretation if they had instead selected any of several more usual methods of data analysis. Hence, all claims about statistical significance of the results are absolutely dependent on the assumption that the choice of statistical method was made before any data had been obtained from the critical experiments (i.e., before 9 April 1987); a later choice of method would represent "exploratory" data analysis, where probability levels lose objective meaning.

Footnote for p. 13

*: In the Final Report (3), the statistical significance of a given test series by a given dowser was calculated as follows: with a set of n tests (5 ≤ n ≤ 10), resulting in n values for the distance, D, between actual pipe location and dowser's chosen location, calculate for each test a score, s, based on the following criteria and categories:

if |D| < 0.2143 m, s = 0.7875;
if 0.2143 < |D| < 0.6429, s = 0.558;
if 0.6429 < |D| < 1.0715, s = 0.1854;
if 1.0715 < |D| < 1.5001, s = 0.0295;
if |D| > 1.5001, s = 0.

Sum the n values of s, to give a test-series score, S; then determine the percentile ranking of this total score in a cumulative distribution of random expectations based on the multinomial expansion, (w + x + y + z)^n, where w, x, y and z are the probabilities of each single value of s, determined as the ratio of its width (0.4286 m or 0.8572 m) to the total length of the test line. Whenever the pipe location was less than 1.5 m from the end of the test line, correction for "end effects" must be incorporated in calculating w, x, y and z.

Statistical Analyses in the Final Report

The Final Report nowhere specifies which of the 43 dowsers should be regarded as the "einige wenige Personen" who were truly skilled at locating water and who constitute the "real core" of the dowsing phenomenon, but a reasonable criterion is available by which to identify them. Each test series was analyzed separately (See footnote to p. 13), and Table 6 of the Final Report summarizes those 104 calculations for the standard barn experiments with a derived "probability" for each test series, along with identity of the dowser. In that Table, three of the test series, each from a different dowser, were assigned probabilities of less than 0.01; and another 4 test series, from 3 other dowsers, were assigned probabilities between 0.01 and 0.03. For purposes of further examining the results, it seems reasonable to assume that the "best" dowsers are the 3 who achieved the 3 most significant test series (Dowsers #18, #99 and #108); and that the 4 next-best results (with assigned probabilities between 0.01 and 0.03) came from the 3 "second-best" dowsers (#23, #89 and #110).
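The scoring procedure described in the footnote to p. 13 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Final Report's own computation: it takes the test line to be 10 m long, ignores the "end effects" correction entirely, assigns a distance that falls exactly on a band boundary to the outer band (the footnote leaves that case open), and reads the "percentile ranking" as an upper-tail chance probability. The function names are hypothetical.

```python
from collections import defaultdict

# (upper bound on |D| in metres, score s), as listed in the footnote.
SCORE_BANDS = [(0.2143, 0.7875), (0.6429, 0.558), (1.0715, 0.1854), (1.5001, 0.0295)]
# Total width of each band on the test line (both sides of the pipe).
BAND_WIDTHS = [0.4286, 0.8572, 0.8572, 0.8572]


def score(distance):
    """Score s for one test, given the signed distance D in metres."""
    magnitude = abs(distance)
    for upper, s in SCORE_BANDS:
        if magnitude < upper:
            return s
    return 0.0


def chance_distribution(n_tests, line_length=10.0):
    """Exact chance distribution of the summed score S over n_tests tests.

    Assumption: no end-effect correction, so every band has its full width.
    """
    single = {s: w / line_length for (_, s), w in zip(SCORE_BANDS, BAND_WIDTHS)}
    single[0.0] = 1.0 - sum(single.values())  # everything beyond 1.5 m
    dist = {0.0: 1.0}
    for _ in range(n_tests):
        nxt = defaultdict(float)
        for total, p in dist.items():
            for s, q in single.items():
                # Round the key so equal sums reached by different paths merge.
                nxt[round(total + s, 4)] += p * q
        dist = dict(nxt)
    return dist


def tail_probability(observed_total, n_tests):
    """Chance probability of a summed score at least as large as observed."""
    dist = chance_distribution(n_tests)
    return sum(p for total, p in dist.items() if total >= observed_total - 1e-9)
```

For a hypothetical series of distances `ds`, the observed score would be `sum(score(d) for d in ds)` and its chance probability `tail_probability(that_sum, len(ds))`.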

The relationships between pipe locations and the dowsers' choices in the 3 "best" test series are illustrated in Fig. 2A. It is evident there that although there were many conspicuous errors, some 14 of the 26 positions chosen agreed with the location of the pipe to within 1 meter, and 11 of the 26 agreed to within 0.5 meter. On the basis of this kind of agreement, and the statistical analysis of the Final Report, the three people who participated in those series can reasonably be regarded as the creme de la creme of dowsers, the three very best of a highly select group: part of the "realer Kern" of the dowsing phenomenon referred to in the Final Report.

The results shown in Fig. 2A do indeed look impressively favorable for the capabilities of these particular dowsers; and in the Final Report, the only graphical presentation of results from the Scheunen experiments (3, Fig. 18) is a plot of those data in Fig. 2A that are shown by filled circles. The customized test utilized in the Final Report classifies the results from each of these three test series as having a probability of less than 0.01 of being due to chance alone. Despite those calculations (and the general impression of non-randomness evident in the plot of Fig. 2A), a skeptic might note that the correlation coefficient between observed and expected, even for the highly selected data in Fig. 2A, is only 0.321, which is not "statistically significant" at even the 10% level. An advocate of the dowsing hypothesis might counter, however, that the correlation coefficient penalizes too heavily for occasional gross errors, thereby ignoring the fact that many of the choices in these tests were remarkably close to the actual pipe location; that, for example, in two of those test series, the dowser chose a location within 50 cm of the pipe's location 4 times out of 10. The skeptic might then reply that according to the binomial expansion (defining "success" as a choice within ±50 cm of correct), it is not at all surprising (p>0.50) to find two such cases (4 successes in 10 attempts) among the 104 test series reported. Such differences of opinion illustrate the ambiguity that arises if one sets aside the requirement, for rigorous statistical inference, of pre-selecting the (single) test to be relied upon.
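The skeptic's remark about the correlation coefficient can be checked with the usual t test for a product-moment correlation. The sketch below assumes that all 26 plotted positions of Fig. 2A enter the calculation (so there are 24 degrees of freedom) and takes the two-tailed 10% critical value of Student's t from standard tables rather than computing it; the function names are hypothetical.

```python
import math

# Two-tailed 10% critical value of Student's t for 24 degrees of freedom,
# copied from standard tables (not computed here).
T_CRIT_10PCT_DF24 = 1.711


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero.

    Not defined for r = +1 or r = -1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    t = t_statistic(0.321, 26)  # r from the text; n = 26 is an assumption
    print(round(t, 2), t > T_CRIT_10PCT_DF24)
```

With r = 0.321 and n = 26 the statistic comes to about 1.66, just below the tabulated 1.711, which agrees with the statement that the correlation falls short of significance at the 10% level.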

In any case, a major problem arises if one wants to interpret the results from these three test series in terms of the special skills of these particular dowsers: the fact is that these same three individuals participated in several other experimental series at other times, and their performances in those other tests were by no means as impressive as those in Fig. 2A. The rest of their results (same dowsers as in Fig. 2A) are shown in a composite plot in Fig. 2B; and the results from all tests in which each of those three dowsers participated are summarized in Fig. 2, parts C, D and E. It is difficult to avoid the impression from these graphs that overall, there was little if any relationship between the locations chosen and the location of the pipe. (The correlation coefficient for Dowser #99 is +0.06, and those for Dowsers #18 and #108 are slightly negative. None of those values would be considered "statistically significant": p>0.50, but of course, such statements about "probability" should be discounted, being based on a posteriori testing of the data.) In any case, it seems quite clear that a dowser who did unusually well on one occasion was not particularly likely to do well in another comparable test series, on another occasion. The scatter in these results demonstrates that it was not three particular DOWSERS ("some few people") who consistently did well in locating the pipe, but instead that within the array of 104 TEST SERIES available, one can find three in which many of the choices were relatively close to the pipe location. Reproducibility by a given individual seems to be acutely lacking.
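The point that a few good-looking series can always be picked out of a large collection can be illustrated by simulation. The sketch below is a toy model, not a re-analysis of the Scheunen data: it assumes that every one of the 104 series has exactly 10 tests (the real series had 5 to 10), that the pipe position and the "choice" are both drawn uniformly at random along a 10-m line, and that a hit means landing within 0.5 m. The function name and its parameters are hypothetical.

```python
import random


def simulate_best_series(n_series=104, n_tests=10, line_length=10.0,
                         tolerance=0.5, top=3, seed=0):
    """Hit counts of the `top` best series when every choice is a pure guess."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    hits_per_series = []
    for _ in range(n_series):
        hits = 0
        for _ in range(n_tests):
            pipe = rng.uniform(0.0, line_length)
            guess = rng.uniform(0.0, line_length)
            if abs(guess - pipe) <= tolerance:
                hits += 1
        hits_per_series.append(hits)
    # Select the best series after the fact, as a retrospective search would.
    return sorted(hits_per_series, reverse=True)[:top]


if __name__ == "__main__":
    for seed in range(5):
        print(seed, simulate_best_series(seed=seed))
```

Running it with several seeds shows how many hits the three "best" guessers collect by luck alone; whatever those counts are, repeating the simulation for the same "subjects" would not be expected to reproduce them.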

What about the overall success rate of the next-best group of dowsers: the three individuals who in single test series achieved results that were assigned "probabilities" between 0.01 and 0.03 (dowsers #23, #89 and #110)? The results from their best performances (two test series by dowser #23) are illustrated in Figure 3A, and again (as in the case of Fig. 2A), these "best" results, when viewed by themselves, look impressive. When such results are placed in the context of what those same individuals did in other test series, however, the evidence in favor of significant abilities blends into a cloud of scattered and seemingly random choices (composite results in all their other tests shown in Fig. 3B, and overall data from each of these three dowsers in Fig. 3, parts C, D and E). From these graphs, it appears that the three "second-best" dowsers were neither appreciably better-- nor appreciably worse-- than the three best.

There is another interesting way of considering the overall performances of these six "outstanding" dowsers, which is shown in Figure 4, where their errors (distances between observed choice and pipe location) are compared with what might have been achieved, if, on every test, the dowsers had simply indicated that they thought the water pipe was located exactly in the middle of the 10-m test line. As that presentation indicates, choosing the midpoint would have been a relatively successful strategy. For 5 of the 6 dowsers, the average distance between the water pipe and the middle of the test line was less than the average error actually made when dowsing, by amounts ranging from 43 cm to 110 cm. The only exception to this trend is dowser #89; his choices averaged slightly closer to the pipe (by 4 mm) than if he had consistently chosen the middle of the test line. The potential advantages that would have been provided by this simple alternative strategy suggest that concentrated searching by the dowsers with their divining rods was a waste of both time and effort.

Discussion

The experiments described here represent the most extensive and carefully conducted study ever undertaken to investigate the capability of dowsers to detect water at nearby, established locations. If water dowsers-- even some small fraction of them-- have the ability that is claimed by so many, this study should have had a very good chance to demonstrate it. The pipe to be detected was only 3 or 4 meters from the dowser, rather than tens to hundreds of meters below the ground, so the task here seems to involve a simpler assignment than the kinds of field problems that water dowsers regularly confront. Those who were actually tested in the final, critical experiments were pre-selected from a much larger pool of candidates on the basis of what was judged to be good performance on preliminary trials, so these should have been the best of available experts. Furthermore, each dowser was permitted modest variations in testing conditions (e.g. velocity of flow, and nature of the fluid in the pipe), which conformed with those situations in which he had done well in the preliminary trials-- again a measure that appears favorable to a successful outcome of the testing. One of the common objections to scientific tests of unusual sensory capacities or extraordinary phenomena based on paranormal abilities (ESP) is that the presence of a hostile audience can make it difficult for specially gifted people to perform at their best; but the entire published description of this experimental study, as well as the conclusions to which the investigators themselves came, indicate that this project was conducted in an atmosphere that was anything but hostile to the claims being tested.

This study also had many features that should please the skeptic. A variety of proper precautions were taken to assure objectivity of the testing procedure: mechanically randomized locations of the target; "double-blind" arrangements, so as to avoid subtle, unintended signals between experimenter and dowser; no feedback to the dowser about the quality of his performance during testing; isolation of the dowser between tests, to eliminate several possible sources of unintended transfer of information about the target location. All these measures represent sound experimental design, and involve the kinds of precaution that a thoughtful skeptic should expect to see-- or at least hope for-- when someone is testing a controversial hypothesis that challenges established scientific principles.

Had the outcome of such a large, well planned study been unequivocally positive, had it demonstrated strong and reproducible skills in most of the dowsers, that outcome should be expected to serve as a springboard for intensive follow-up research by physicists as well as physiologists, to explore what mechanisms might be responsible for those capabilities. Had that kind of data been obtained, then someone who remained a skeptic would be almost forced to invoke undetected cheating by the dowsers to support his position. That sort of success was not, however, achieved; as the Final Report recognizes, the vast majority of the participants in the critical experiments (who had been pre-selected as the best 10% of a much larger group of people, all of whom thought that they had the ability to detect and localize hidden water supplies) did very poorly; most of their performances in the critical tests could not be distinguished from the results of random chance.

Nevertheless, the researchers who conducted the study were persuaded by their data and their analyses that they had uncovered a small but "real core" of the water-dowsing phenomenon-- that some few individuals showed an extraordinarily high success rate. The re-examination of those data described here indicates that this conclusion rests on very flimsy grounds indeed.

In the Final Report, attention was focussed on a small number of unusually good performances (Figs. 2A and 3A). If one relies only on probabilities from the unusual, customized statistical test of the Final Report, then it is relatively unlikely that a small subset of the results could have arisen due to chance. Other, more standard ways of examining the overall data, however, support the opposite interpretation: that obtaining a few such results among the 104 test series is not at all surprising. Thus, the interpretation of those few exceptional test series ("unlikely" or "very likely" to be due to chance) depends more on choice of statistical procedures (and when that choice was made!) than on the data themselves; and there is no objective way of deciding, after the fact, which interpretation is more credible.

If dowsing is a "real" phenomenon, however, the most important, central expectation is that in some way, success must be reproducible-- and the overall results certainly do not meet that expectation. As shown in Figs. 2 and 3, even the most "successful" test series were obtained by people who could not themselves replicate that sort of performance. Overall, those same "experts" did very poorly, with general success rates no better than those of the rest of the dowsers, no better than one might expect due to chance alone. Perhaps the most interesting conclusion that can be drawn from the entire analysis here is that even the dowsers, who, on single occasions, managed to perform at considerably better than chance levels, could on average have done better overall than they in fact did, if they had simply chosen the midpoint of the test line in each and every test.
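The midpoint comparison behind Figure 4 amounts to a very small calculation, sketched below. The 10-m line length comes from the text; the function names are hypothetical, and the numbers in the usage example are invented solely to show the arithmetic and are not data from the experiments.

```python
def mean_abs_error(choices, pipes):
    """Average absolute distance (in metres) between choices and pipe locations."""
    if len(choices) != len(pipes) or not pipes:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(c - p) for c, p in zip(choices, pipes)) / len(pipes)


def midpoint_advantage(choices, pipes, line_length=10.0):
    """How much closer, on average, the fixed midpoint would have been.

    A positive value means that always naming the middle of the test line
    would have beaten the choices actually made.
    """
    midpoint = line_length / 2.0
    dowsing_error = mean_abs_error(choices, pipes)
    midpoint_error = mean_abs_error([midpoint] * len(pipes), pipes)
    return dowsing_error - midpoint_error


if __name__ == "__main__":
    # Invented positions along the line, in metres (illustration only).
    pipes = [1.0, 9.0, 5.0, 3.0]
    choices = [8.0, 2.0, 5.0, 7.0]
    print(midpoint_advantage(choices, pipes))
```

In this invented example the choices miss by 4.5 m on average while the midpoint misses by 2.5 m, so the midpoint strategy wins by 2 m.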

It is my impression that the Scheunen experiments, which were conducted by a research group fully sympathetic to the cause of dowsing, and have been interpreted by them as indicating that successful dowsing is indeed a real phenomenon, in fact have devastating implications for the art and profession of water dowsing. Even the most favorable interpretation of the experiments is that if results appreciably better than chance levels are to be obtained, one must engage a very select dowser (one with skills better than 99% of his competitors) on one of his very good days (and he cannot tell you whether it is a good day or not); and that even then, that super-expert, on his best day, is apt to be badly wrong about half the time at locating a water source that is only 3 or 4 meters away. Would you yourself be willing to pay someone for his advice, if he attempted to demonstrate his competence by showing a graph resembling one of those illustrated in parts C, D and E of Figures 2 and 3? The results shown in Figure 4 could be interpreted as suggesting that in the absence of other information, it would be a better strategy, for someone who is planning to drill a well, to sink his hole right in the middle of the available region, rather than to rely on advice from a dowser, no matter how successful the dowser may claim to have been on certain past occasions. This interpretation depends, of course, on the assumption that there is a single best place to drill, and that the closer one is to that location, the better the expected outcome.

Conclusion

Because the Scheunen experiments involved such a large-scale test program, which incorporated both generous allowances favoring the dowsing hypothesis and a careful, rigorous experimental program, a definitive answer is finally available to the central, age-old questions about water dowsing. Briefly stated, the conclusion is that even with very extensive testing, by researchers sympathetic to the cause, no persuasive evidence could be found for reproducibility of the "dowsing phenomenon", neither inter-individual reproducibility nor intra-individual reproducibility. Instead, it has now been demonstrated that:

I: IF the ability to locate water from a distance by extraordinary stimuli exists, that skill cannot be reproducibly demonstrated across a select group of 43 experts from among 500 dowsers who all THINK that they have the ability (a conclusion consistent with the summary in the Final Report);

II: In those few cases in which a single series of tests suggests that a given dowser may perhaps have better-than-chance abilities, similarly good results are not reproducible by that same individual in other comparable test sessions (a conclusion which contradicts the summary of the Final Report).

Properly considered, then, these are answers as definitive as experimentation could ever provide. Reproducibility lies at the core of successful experimental science; and if a phenomenon is not reproducible, even for select individuals, what possible gain could come from further, similar experiments, no matter how extensive the program? Thus, the Scheunen experiments are not only the most extensive and careful scientific study of the dowsing problem ever attempted, but -- if reason prevails -- they probably also represent the last major study of this sort that will ever be undertaken.

This does not, of course, constitute a rigorous refutation of the dowsing hypothesis. A universal negative can never be proven by observation, and it remains conceivable that individuals exist who can indeed reproducibly detect water from a distance by extraordinary means. But if so, one must assume that they are so rare that none turned up in the sample of 500 candidates, all of whom THOUGHT that they had the required ability. Hence, if unusually talented individuals exist, distinguishing them from among the unskilled appears to be a hopeless task. This leads to a valuable insight: whether one prefers the interpretation that truly skilled dowsers exist, who are so rare that none was found in the Scheunen experiments, or whether one instead prefers the interpretation that the ability claimed by dowsers does not exist, is no longer a question of evidence; the choice is simply a matter of belief, of taste in hypotheses that are indistinguishable in practice.

We recommend that people get up-to-date scientific information from:
Naturwissenschaften Magazine by the good people at Springer-Verlag
Skeptics Dictionary discussion of dowsing.
Another article by Jim is found at: http://www.csicop.org/si/9901/dowsing.html
Also, as a note, Tom Napier ran a statistical study finding one synthesized "subject" out of 300 which exceeded Betz's best subject.

EXTRAORDINARY CLAIMS REQUIRE EXTRAORDINARY EVIDENCE!


created 6-7-97, last updated 12/02 this page found as http://www.phact.org/e/dowsing.htm
