Friday, March 15, 2013

Likelihood Ratio ≠ 1 Journal (LR ≠1J)



LR ≠1J should exist. But it doesn't.

Or at least I don't think LR ≠1J exists! If such a publication has evaded my notice, then my gratitude for having a deficit in my knowledge remedied will more than compensate me for the embarrassment of having the same exposed (happens all the time!). I will be sure to feature it in a follow-up post. 

The basic idea (described more fully in the journal's "mission statement" below) is to promote identification of study designs that scholars who disagree about a proposition would agree would generate evidence relevant to their competing conjectures--regardless of what studies based on such designs actually find. Articles proposing designs of this sort would be selected for publication and only then be carried out, by the proposing researchers with funding from the journal, which would publish the results too.

Now I am aware of a set of real journals that have a similar motivation.

One is the Journal of Articles in Support of the Null Hypothesis, which, as its title implies, publishes papers reporting studies that fail to "reject" the null. Like JASNH, LR ≠1J would try to offset the "file drawer" bias and similar bad consequences associated with the convention of publishing only findings that are "significant at p < 0.05."

But it would try to do more. By publishing studies that are deemed to have valid designs and that have not actually been performed yet, LR ≠1J would seek to change the odd, sad professional sensibility favoring studies that confirm researchers' hypotheses (giving a preference to studies that "reject the null" in favor of the alternative is actually a confirmatory proof strategy--among other bad things). It would also try to neutralize the myriad potential psychological & other biases on the part of reviewers and readers that might impede publication of studies that furnish confirming or disconfirming evidence at odds with propositions that many scholars might have a stake in.

Some additional journals that likewise try (very sensibly) to promote recognition of studies that report unexpected, surprising, or controversial findings include Contradicting Results in Science; Journal of Serendipitous and Unexpected Results; and Journal of Negative Results in Biomedicine.  These journals are very worthwhile, too, but still focus on results, not the identification of designs whose validity would be recognized ex ante by reasonable people who disagree!

I am also aware of the idea to set up registries for designs for studies before they are carried out. See this program, e.g.  A great idea, certainly. But it doesn't seem realistic, since there is little incentive for people to register, even less to report "nonfindings," and no mechanism that steers researchers toward selection of designs that disagreeing scholars would agree in advance will yield knowledge no matter what the resulting studies find.

But if there are additional journals besides these that have objectives parallel to those of LR ≠1J, please tell me about those too (even if they are not identical to LR ≠1J).

I also want to be sure to add -- in case anyone else thinks this is a good idea -- that it occurred to me as a result of the work of, and conversations with, Jay Koehler, who I think was the first person to suggest to me that it would be useful to have a "methods sections only" review process, in which referees reviewed papers based on the methods section without seeing the results. LR ≠1J is like that but says to authors, "Submit before you know the results too."

Actually, there are journals like this in physics. Papers in theoretical physics often describe why observations of a certain sort would answer or resolve a disputed problem well before there exists the requisite apparatus for making the measurements. My favorite example is Bell's inequalities--which were readily understood (by those paying attention, anyway!) to describe the guts of an experiment that couldn't then be carried out but that would settle the issue of whether some as-yet unidentified "hidden variables" alternative to quantum mechanics was possible. A set of increasingly exacting tests began some 15 yrs later--with many, including Bell himself, open to the possibility (maybe even hoping for it!) that they would show Einstein was right to view quantum mechanics as "incomplete" due to its irreducibly probabilistic nature. He wasn't.

Wouldn't it be cool if psychology worked this way?

As you can see, LR ≠1J, as I envision it, would supply funding for studies with a likelihood ratio ≠ 1 on some proposition of general interest on which there is a meaningful division of professional opinion. So likely its coming into being -- assuming it doesn't already exist! -- would involve obtaining support from an enlightened benefactor.  If such a benefactor could be found, though, I have to believe that there would be grateful, public-spirited scholars willing to reciprocate the benefactor's contribution to this collective good by donating the time & care it would take to edit it properly.
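(For anyone who wants the Bayesian bookkeeping behind the title spelled out, here is a minimal sketch. For evidence E bearing on a hypothesis H as against its rival ~H, the likelihood ratio is the relative probability of observing E under each, and Bayes' theorem in odds form says it is the factor by which anyone's odds on H should move, whatever those odds were to start with:)

\[
\mathrm{LR} = \frac{P(E \mid H)}{P(E \mid \neg H)},
\qquad
\frac{P(H \mid E)}{P(\neg H \mid E)} = \mathrm{LR} \times \frac{P(H)}{P(\neg H)} .
\]

A design whose LR is 1 no matter what the data turn out to be moves no one's odds; those are exactly the studies the journal would decline to fund.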

Likelihood Ratio ≠ 1 Journal (LR ≠1J)

The motivation for this journal is to overcome the contribution that a sad and strange collection of psychological dynamics makes to impeding the advancement of knowledge. These dynamics all involve the pressure (usually unconscious) to conform one’s assessment of the validity and evidentiary significance of a study to some stake one has in accepting or rejecting the conclusion.

(1)  Confirmation bias is one of these dynamics, certainly (Koehler 1993). 

(2)  A sort of “exhilaration bias”—one that consists in the (understandable; admirable!) excitement that members of a scholarly enterprise generally experience at discovery of a surprising new result (Wilson 1993)—can distort perceptions of the validity and significance of a study as well. 

(3)  So can motivated reasoning when the study addresses politically charged topics (Lord, Ross & Lepper 1979).   

(4)  Self-serving biases could theoretically motivate some journal referees or scholars assessing studies published in peer-reviewed journals to form negative assessments of the validity or significance of studies that challenge positions associated with their own work.  Note: We stress theoretically; there are no confirmed instances of such an occurrence. But less informed observers understandably worry about this possibility.

 (5) Finally, in an anomalous contradiction of the strictures of valid causal inference (Popper 1959; Wason 1968), the practice of publishing only results that confirm study hypotheses denies researchers and others the opportunity to discount the probability of various plausible conjectures that have not been corroborated by studies that one reasonably would have expected to corroborate them if they were in fact true. 

LR ≠1J will solicit submissions that describe proposed studies that (1) have not yet been carried out but that (2) scholars with opposing priors (ones that assign odds of greater than and less than 1:1, respectively) on some proposition agree would generate a basis for revising their estimation of the probability that the proposition is true regardless of the result. Such proposals will be reviewed by referees who in fact have opposing priors on the proposition in question. Positive consideration will be given to proposals submitted by collaborating scholars who can demonstrate that they have opposing priors. The authors of selected submissions will thereafter be supplied the funding necessary to carry out the study in exchange for agreeing to publication of the results in LR ≠1J. (Papers describing the design and ones reporting the results will be published separately, and in sequence, to promote the success of LR ≠1J's sister journal, "Put Your Money Where Your Mouth Is, Mr./Ms. 'That's Obvious,'" which will conduct on-line prediction markets for "experts" & others willing to bet on the outcome of pending LR ≠1J studies.)

In cases where submissions are “rejected” because of the failure of reviewers with opposing priors to agree on the validity of the design, LR ≠ 1J will publish the proposed study design along with the referee reports. The rationale for doing so is to assure readers that reviewers’ own priors are not unconsciously biasing them toward anticipatory denial of the validity of designs that they fear (unconsciously, of course) might generate evidence that warrants treating the propositions to which they are pre-committed as less probably true than they or others would take them to be.

For comic relief, LR ≠1J will also run a feature that publishes reviews of articles submitted to other journals that LR≠1J referees agree suggest the potential operation of one of the influences identified above.

References

Koehler, J.J. The Influence of Prior Beliefs on Scientific Judgments of Evidence Quality. Org. Behavior & Human Decision Processes 56, 28-55 (1993).

Lord, C.G., Ross, L. & Lepper, M.R. Biased Assimilation and Attitude Polarization: Effects of Prior Theories on Subsequently Considered Evidence. Journal of Personality and Social Psychology 37, 2098-2109 (1979).

Popper, K.R. The Logic of Scientific Discovery. (Basic Books, New York; 1959).

Wason, P.C. Reasoning about a rule. Q. J. Exp. Psychol. 20, 273-281 (1968).

Wilson, T.D., DePaulo, B.M., Mook, D.G. & Klaaren, K.J. Scientists' Evaluations of Research. Psychol Sci 4, 322-325 (1993).

 

 


Reader Comments (14)

Interesting idea. But a key reason behind the existing system is that most questions one can think of (will the sun rise tomorrow?) are not really worth putting much effort into, because the results are too unsurprising. So although such questions are investigated at a preliminary level all the time, the results get published (with the associated extra cost and work load, including for people other than the original investigator) only when they are surprising, and this is where the file drawer effect comes from.

So how would this system decide which questions are worth asking? The problem is how to encourage high risk, high reward studies without generating unnecessary extra work in cases where the reward is not forthcoming.

March 15, 2013 | Unregistered Commenterkonrad

@Konrad: Editors, w/ assistance of reviewers, would need to exercise judgment about the value of the answer to the question. I suppose that there will be a risk of the sorts of biases I'm discussing coming in there too! But in general, there are many *disputed*, *open* propositions in the social sciences, so it shouldn't be difficult in general to find ones that people in theory would agree would benefit from being approached in the manner I'm discussing. Whether they would agree -- in advance -- about *which* studies would yield evidence relevant to adjusting the relative likelihood to assign to their competing conjectures is another matter. Be interesting to find out.

Meanwhile, some people have directed my attention to "adversarial collaboration" -- in which scholars with competing positions collaborate on a design that would satisfy them. That is obviously in the spirit of what I'm describing. I will post something so others can get the benefit of this feedback -- & also explain how it is I see what I'm proposing as systematizing & magnifying this practice.

March 15, 2013 | Registered CommenterDan Kahan

The basic idea (described more fully in the journal's "mission statement" below) is to promote identification of study designs that scholars who disagree about a proposition would agree would generate evidence relevant to their competing conjectures--regardless of what studies based on such designs actually find.


This would be great.

I have thought it would be fun to see a blog working from something of a similar concept - where blog-masters from opposing sides of an issue, who respect and trust each other, would agree to field comments and respond with commentary on whether or not the comments were biased by motivated reasoning.

Articles proposing designs of this sort would be selected for publication and only then be carried out, by the proposing researchers with funding from the journal, which would publish the results too.

Honestly, I'm not at all sure that I would be more interested in the outcome of the studies than in the process that would go into an agreed framework for research that would generate evidence relevant to their competing conjectures -- as you say, "regardless of what studies based on such designs actually find."

March 15, 2013 | Unregistered CommenterJoshua

Love the journal idea. Less sure about the title. I think you'd agree -- if not, let me know! -- that most really awful studies have a likelihood ratio ≠ 1 for the proposition they claim to test. A study that finds a correlation between two items does provide some evidence that one causes the other...the evidence just isn't strong enough that we'd publish it. Maybe call it the Best Practices Journal or the Methodological Review Review, or something. I'll start buying domain names.

March 16, 2013 | Unregistered CommenterMW

@MW:

I think I do disagree.

Usually what makes a study awful is the lack of design geared to generating any particular inference relating to likelihood of competing hypotheses. I.e., LR = 1.

E.g., I have colleagues who collected lots of data on whether users of a social media product would be "surprised" to learn that the contract they executed when they consented to "terms and conditions" in fact contained particular terms that they didn't like. My colleagues then offered the "normative" argument that terms that a "majority" of users would view as "surprising" should be put in a conspicuous red warning box.

To me, this is a truly awful study. It has LR = 1 w/r/t whether existing practice for presenting "terms & conditions" or their alternative would improve efficiency, increase autonomy or advance any other normative goal.

The reason is that the researchers didn't offer up any theory -- much less a defensible one -- about what we should expect to observe under the hypothesis that the existing regime is efficient, autonomy respecting, etc. vs. what we should expect to see under the hypothesis that the existing regime is not any of those things (& *as a result of* some problem likely to be corrected by their proposed remedy).

It's perfectly reasonable for people not to concern themselves with more than they need to know about a transaction or to acquire information that they wouldn't be able to use. If there are terms they'd be surprised by that would reduce their welfare, threaten their autonomy, etc., then there'd be an incentive for a competitor to draw their attention to those terms in the other company's product. So I expect that in an "efficient" & an "autonomy maximizing" regime, there will be terms consumers would be "surprised" by.

At the same time, I know that people don't always make rational decisions, and that there can be market failures that inhibit them from obtaining as much information as it would be optimal for them to have. I expect that consumers will be surprised to discover terms in contracts that exist as a result either of bounded rationality or high transaction costs or like market failures.

So, I expect to observe that people will be "surprised" at some of the terms in their contracts whether the existing regime is normatively good or normatively bad. My colleagues excitedly report *finding* that some fraction of the sample (a majority; why is that threshold important? Why not 25%? Why not 100%?) are "surprised" by certain terms -- at p = 0.00000000001! So what? We know nothing we didn't know before b/c we didn't have any justified expectations about what the data would look like if one hypothesis but not the other were true.

Classic LR = 1. Truly awful.

They also did a multivariate regression to identify what sort of characteristics predict agreeing to a contract that has "surprising" terms. They report the coefficients & their significance levels (w/ lots of asterisks of course).

Again, this information is LR=1 relative to the study hypothesis b/c again they have failed to offer up any theory (much less a plausible one) about what sorts of characteristics *would* predict "surprise" if the surprises are normatively undesirable and which ones would if the surprises are normatively benign.

E.g., by far the biggest predictor (assuming it is valid to estimate the parameters by treating all manner of individual characteristic as "independent variables" as opposed to indicators of some latent profile for "likely to be informed/uninformed consumers"; I would have gotten a dull stare if I'd raised this issue with them) was income. Moreover, the sign of the coefficient was positive, indicating that the wealthier the user was, the more likely he or she was to report being "surprised" by one or more terms.

What does that tell us? Should we think, "if rich consumers would be surprised, there's got to be a problem!" Or instead, "if rich consumers would be surprised, then surely that's the sort of thing a smart consumer wouldn't waste his or her time trying to figure out"?

Beats me. LR =1 b/c no matter what we might observe, we have no more basis for believing one thing than another. Awful.

A good design in contrast is good precisely b/c it is geared to generating an inference that makes a hypothesis more or less likely relative to an alternative no matter what the data show, including if they show that some predictor of interest is "not significantly different from 0 at p < 0.05." Do you agree it is possible to design studies like that?

Another reason to like the title: Studies should be judged by the likelihood ratio, not by any particular reader or set of readers' posteriors. If I give you good reason to take a proposition you don't believe in more seriously than you did before, you should adjust your priors in favor of that proposition; you'd be guilty of confirmation bias if you decided to "take on" (recognize, credit, remember, etc.) the study result only if it was "sufficiently strong" to shift your priors from less than 1:1 to greater than 1:1.
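(A toy numerical illustration of that point -- the numbers are made up, but the arithmetic is just Bayes' rule in odds form:)

# Toy illustration (made-up numbers): a reader should move his or her odds
# by the study's likelihood ratio even when the posterior stays below 1:1.

def posterior_odds(prior_odds, likelihood_ratio):
    # Bayes' rule in odds form: posterior odds = prior odds * LR
    return prior_odds * likelihood_ratio

prior = 0.25   # reader starts out 4:1 against the proposition
lr = 2.0       # the study's design yields LR = 2 in favor of it

print(posterior_odds(prior, lr))
# 0.5, i.e. 1:2 -- still below 1:1, but refusing to credit the study unless
# it pushed the odds past 1:1 would be exactly the confirmation bias described above.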

By the same logic, if you are a reviewer or editor, you shouldn't treat as your publishability criterion whether the study "persuades" you that the hypothesis is true. You should (subject to judgments about whether the hypothesis in question is of any importance & the study thus worthy of displacing attention from whatever else readers might have looked at instead) publish so long as LR ≠1 -- or else you are infecting readers with confirmation bias.

March 16, 2013 | Unregistered CommenterDan Kahan

Not sure that I agree that bad studies usually have "LR=1".

In my experience, they more usually have "LR=?", where the probabilities are indeterminable or based on questionable statistical models of what probability particular events have.

I also don't think it's necessary to have LR≠1 for all possible observations. You can get lucky!

An experiment has three possible outcomes A, B and C. Under hypothesis H1: P(A) = 0.9, P(B) = 0.001, P(C) = 0.099, while under hypothesis H2: P(A) = 0.9, P(B) = 0.099, P(C) = 0.001. You do the experiment and you see outcome A, as expected. What is the LR for H1 over H2? Was the experiment worth doing?

But in general terms, I agree. I do think it would be useful to require an explicit calculation of the LR in a paper that is testing hypotheses. It might make people think more carefully about these things.

March 17, 2013 | Unregistered CommenterNiV

@NiV:

1. I would assimilate LR=? to LR=1. The point is that a study is poorly designed if one can't offer a reasonable account ex ante of how the results -- whatever they are -- would warrant adjusting one's estimated probability that the hypothesis is true. An LR = ? -- which is basically what the situation was for the awful study I described -- is equivalent to saying, "that result gives me no reason to adjust my assessment of anything." In other words, LR = 1.

2. The study you describe is indeed likely a big waste of time. I'd likely say to someone proposing a study like that that he or she can do better than a design that promises a 0.9 chance we'll learn nothing. Obviously, if you fiddle around w/ the expected value of finding out something w/ such a design & the cost of designs that have a likelihood higher than 0.1 of yielding insight, the answer changes -- but that's not usually what we are looking at when we see people concoct harebrained designs (like the one I described above).
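(Putting numbers on it, using the hypothetical probabilities from your example -- a quick sketch:)

# NiV's hypothetical: three possible outcomes, with these probabilities
# under hypotheses H1 and H2 (numbers taken from the comment above).
p_h1 = {"A": 0.9, "B": 0.001, "C": 0.099}
p_h2 = {"A": 0.9, "B": 0.099, "C": 0.001}

for outcome in ("A", "B", "C"):
    lr = p_h1[outcome] / p_h2[outcome]   # LR for H1 over H2 given this outcome
    print(outcome, round(lr, 3))

# A -> LR = 1   (and A occurs with probability 0.9 under either hypothesis,
#                so 90% of the time the study teaches us nothing)
# B -> LR ≈ 0.01 (strong evidence for H2)
# C -> LR = 99   (strong evidence for H1)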

3. I don't trust "getting lucky." If someone runs a study w/o having a sensible causal-inference theory about the results, & then says, "wow! hey! look!," then we are back to Feynman's conversation w/ the rat-maze researcher. If you think you got "lucky" -- discovered something you weren't expecting & could make sense of only after the fact -- better try it again.

March 17, 2013 | Unregistered Commenterdmk38

I agree that the social media study, as described, has an LR = 1 because it gives no account -- not even an incomplete one -- of why we'd be more likely to see its results given the authors' hypothesis than given the alternative hypothesis. Maybe I shouldn't have said "most" bad studies have LR ≠ 1. I don't have any great sense of what percentage of bad studies are bad for certain reasons versus others.

I was thinking more of the many (and more loudly picked-on) studies that fail to eliminate alternative accounts of their data. In those studies, the results usually counsel strongly against one hypothesis -- thereby having an LR > 1 for all other hypotheses relative to that one -- but don't provide a reason for choosing one of those hypotheses over the other. These include the above-mentioned correlation/causation conflation studies (which may be fine and publishable if you have a good reason for believing causation is the only explanation for correlation, but put that aside for now) or other studies that fail to control for important variables (assuming those variables should actually be controlled for).

If I want to find out whether excessive fast food consumption causes early death (my hypothesis), and all I look at is amount of fast food consumption and age of death and find a correlation, doesn't that have an LR > 1 for my hypothesis? Even if, were I to look further, I would find that there were also correlations with low income, lack of health insurance, race, gender, fondness for standing under tall trees during electric storms, etc. etc.? And that purely correlational study would fit your criterion because if it showed no correlation, or an inverse one, it would have a LR < 1 for the hypothesis. Do you disagree that the proposed study will likely show an LR ≠ 1, that it's a bad study, and that it's (loosely) representative of a broad category of bad studies?

March 17, 2013 | Unregistered CommenterMW

1. I see what you mean - but I think you can only put it into a Bayesian framework that way if you count the statistical model as part of the hypothesis. And then you're applying a meta-model that assumes the default distribution in the case of any indeterminate hypothesised model will always be the same. That sounds reasonable, but I remain uneasy about it. I'd be worried that you could come up with some counter-intuitive results if you tried the trick with a set of probabilities that were indeterminate but had internal constraints.

You can get away with it as a formal device to be applied on the condition that you don't ever do anything else with it, but I'd think it would be safer not to risk it, and simply say that without known probabilities you can't apply Bayes rule at all. What if someone tried to implement it in software?!

2/3. Sometimes nature doesn't give you the choice. Take those gravity wave detectors. Prior calculations on the strength of gravitational waves made it doubtful they'd be sensitive enough to detect anything, but we don't have the technology yet to do any better, and the new window on the universe that would be opened up if we could was thought to be worth the risk. In this case we do have a causal inference model about the results. If the experiment is cheap, and the payoff is big, then it generally is worth giving it a go, even if you think it's a long shot.

March 17, 2013 | Unregistered CommenterNiV

@MW:

You said "most really awful studies have a likelihood ratio ≠ 1 for the proposition they claim to test." In one of your latest examples, I think the problem actually is LR = 1. I'm not sure about the other, but it's not my impression that the example is representative of "most really awful studies."

1. The "counsesls strongly against 1 but not others" example makes me assume you are discussing NHT testing, where the "one" is the null. NHT is a useful way to test only if the the design is such that the alternative to the null is more likely than other hypotheses that people would plausibly expect to produce the observed effect. A study that "rejects the null" in favor of a hypothesis that doesn't explain the result any more convincingily than readily imaginable, equally plausible alternatives is said to have a "confound" in its design -- which is a way to say that LR = 1 as between the that hypothesis and the alternative. Crappy papers like that get published. The problem in such cases isn't that LR ≠ 1 failed to screen them out; it's that authors, reviewers & editors aren't thinking clearly.

2. I don't know what to say about your fast food example. I have no idea if that is an awful study. Consider:

a. Two people get in an argument about whether "climate literacy" "causes" perceptions of "climate change" or whether some 3d variable -- cultural predispositions, say -- "causes people both to form perceptions of risk and to learn the basis for those risk perceptions". If the first produces a study that correlates climate change literacy & risk perception at p < .00000000000000000000000000000001, the LR = 1 -- not even 1.0000000000000000000000000000000000000000000001 -- as between those two hypotheses. An observational study is not interesting unless one has a theory that makes the correlation observed more consistent with one hypothesis than another. If there was some alternative hypothesis that a 3d variable caused both fast food consumption & early death, then yes, your example is an "awful study" -- b/c LR = 1.

b. A mysterious disease arises. Most people assume that the disease is just some sort of air-borne virus or bacteria that is being transmitted by chance encounters among infected & uninfected people. Someone else hypothesizes on the basis of some theory that seems plausible that the cause is contamination of drinking water coming from one of the town's wells. She then collects lots of data from which it will be possible to observe whether there is a correlation between contracting the disease and consuming water from that well as opposed to 10 others in the town. So long as we can say what we'd expect the correlation between disease and source of drinking water to look like if the disease were being transmitted via casual contact & what it would look like if the disease were originating in consumption of water from a particular well, then whatever she finds (assuming sufficient statistical power; insufficient power is an LR = 1 problem), LR ≠ 1. If someone has a hypothesis that is consistent with that observation but inconsistent w/ "consumption of contaminated water from that particular well," fine -- he can come up w/ another test w/ LR ≠ 1 & conduct it. Is your example like this? If so, then it could still be an "awful" study, I suppose, if there is something else the initial observational-study researcher could have done just as easily that would have generated an even higher LR or if what she investigated is just plain boring & wasting space in the journal and wasting the attention of anyone who makes the mistake of reading it (I don't think I'd feel that way about a zero-order correlation between eating fast food & dying young!).

But I don't see so many studies that strike me as "truly awful" for that reason. Obviously, we will have a hard time constructing a sample here from which to gauge frequency. But I gave you (in my last response) a real example of what I consider to be a "typical" "awful study" b/c LR = 1; give me some real ones of the sort you have in mind -- ones that really do have LR ≠ 1 but that are just boring, trivial, etc. as opposed to in fact invalid (b/c, e.g., in fact the observations, while inconsistent w/ the "null," are *just as* consistent w/ what a reasonable person would have said ex ante is a plausible alternative).

****

On reflection: I wonder whether what we are disagreeing about is either (a) what proportion of "awful studies" are awful b/c of some aspect of mindlessness of NHT, &/or (b) whether that problem is properly characterized as an "LR = 1" problem. If you accept (b) (as I think you should) but not (a), then I'll be very interested to know why our samples of "truly awful studies" must have such different members.

March 17, 2013 | Unregistered Commenterdmk38

@NiV:
1. I am pretty much using LR = 1 & ≠ 1 as a heuristic for capturing the sort of thought people should give to why a design makes sense -- why the observations that it is making will support inferences relating to the hypothesis. Bad designs are "= 1" in my book, and can be for lots of reasons, including lack of a meaningful causal mechanism, bad sample, insufficient power, etc., as well as the formal "P(E|H) = P(E|~H)" -- although the latter is shockingly common.
2. Yes, you must be right that even if the probability of getting an uninformative result (LR = 1) is much, much higher than getting an informative one (LR ≠ 1), the benefit of getting the informative one relative to the cost of carrying out the test (and relative to the cost of carrying out the test in a way that would have a higher probability of generating an informative result "no matter what") might well make it worth the "gamble." What I see all the time, though, are people designing studies that either will generate some (arguably) informative result if the finding allows them to "reject the null," or else leave them w/ no more information than before (sometimes, embarrassingly, the researcher doesn't understand that a "null" finding was uninformative, given the design, including statistical power). I think that is really a tragedy; it is part of the cost of the mindlessness of NHT. So "no matter what" is a nice corrective -- it actually makes people who might otherwise have overlooked the point realize that it is frequently possible to design a study that will generate an informative result whether one "rejects the null" or not.

March 17, 2013 | Unregistered CommenterDan Kahan

dmkSr (I've decided the 38 stands for Strontium):

In the last post, I backed away from the "most" in "most really awful studies," and I fully stand by that hasty retreat! As I said, I don't have a sense of what proportion of bad studies my example represents.

Responding to all of your points in a jumble: I'm a little confused by what "imaginable, equally plausible alternatives" you're referring to in 1. Do you mean explanations other than the hypothesis for the lack of a null effect (what I'm trying to get at) or explanations other than the hypothesis that show why the experimenter got the data he did while the null is actually true?

I'm picturing a world where there are (say) three plausible hypotheses: (1) No correlation between A and B, (2) correlation between A and B for X reason, and (3) correlation between A and B for Y reason. Study finds correlation between A and B. Author says this study was designed to test X and has provided evidence for X. I think it is true that this study has an LR > 1 for X, even though it fails to distinguish between X and Y and is therefore not a very good study. As you suggest in 2a, this has an LR = 1 in the X v. Y wars. But it has an LR > 1 in the X v. null wars. If you're eliminating one out of three possibilities (hey, maybe there's no correlation between climate literacy and risk perception!) you have an LR > 1 for either of the remaining possibilities.

The more interesting question might be, then: how do you determine which hypotheses it must have an LR > 1 relative to for a study to be publishable? I think this is what you're getting at in 2b: if there's another hypothesis the researcher could have tested his hypothesis against and didn't, this may be a problem. But there's going to be a spectrum of how obvious/tenable/otherwise worthwhile those other hypotheses will be (if nobody had thought of culturally-based risk perception, the correlational study in 2a might be considered decent evidence!). So I don't think it's fair to say that every study with an LR > 1 relative to one hypothesis but LR = 1 relative to another has an LR = 1.

(Did you see the cool new interactive cholera map?)

Also, I'm skeptical of your claim that "insufficient power is an LR = 1 problem." If I ask two people "What are your cultural values? Do you believe anthropogenic climate change poses a threat?" and one says "I'm an EC and I think it's a threat" and the other says "I'm an HI and I don't think it's a threat." -- is that LR = 1 for the hypothesis "there's a correlation between cultural view and climate change risk perceptions"? Are you sure that isn't LR = 1.000000000000000000001 (or something of that sort)? It seems like it to me. Crummy evidence. Evidence that shouldn't change your mind very much at all. But evidence no less.

The primary examples that come to mind (more related to insufficient power than the example I thought of before, although they're related) are neural fishing expeditions. Not the literal neural fishing expedition, but voodoo neuroscience. These studies -- if I'm getting this right through memory and quick review -- indiscriminately survey the brain, come up with some correlation between a psychological state and activation of a certain small area of the brain, and publish that as a neural correlate. The correlations will almost certainly be exaggerated, and there's a none-too-low probability that they arise from noise (see the fish). That's what makes these studies bad. "Invalid," even. But are they actually LR = 1 for the hypothesis, "the urge to do the electric slide resides in amygdala voxel 2974"? LR = 1.00000000001, maybe. But 1? Going into this study, you know you've got an invalid methodology on your hands, but you also know that if you find a correlation between a voxel and the psychological state, LR > 1 for that voxel being a correlate. I think you're going to tell me that's wrong, but I'm not yet completely clear on why.

I'll need to think harder about more direct failure-to-distinguish-between-alternatives examples...they're the ones that journals are on the lookout for, so they may show up in published literature less frequently.

March 17, 2013 | Unregistered CommenterMW

@10^6 watts: I will work through this. For now:

I think there is a danger we will end up in a rut of conceptualism. Let's be sure not to (what a waste).

My Primary Question when I see empirical work: Is it the case that the observations give us reason to have more or less confidence in some proposition (of interest) than we did before? PQ likely seems too basic to be of use? In fact, I am convinced that an astonishing # of researchers reveal by how they proceed that they don't actually ask PQ before they start; or if they do, they don't know how to answer it. Consider my social media "contract surprise" example. In fact, every problem with a study is some special case of the answer "no" to PQ -- so it is useful, when explicating what the particular problem is, to see that it is something that makes it impossible to draw any inference w/r/t the hypothesis (if there even is one...).

Every time the answer is no to PQ, then we (as machines who are perpetually adjusting our estimations of the probability of different propositions of interest; I recognize you as such a machine -- one who claims to have 10^6 W, but probably considerably more, processing power) have "LR = 1" -- in the sense that the result, whatever it is or could have been, gives us no reason to change our assessment of the likelihood ("probability" for those who get agitated when "likelihood" is used in connection with either prior or posterior!) of the proposition in question. Every time the answer is "yes," then we have LR ≠ 1 w/r/t that proposition -- that is, we have more or less reason to believe it.

Which is to say that I'm using "LR = 1 / ≠ 1" heuristically. LR ≠1J is a good name, heuristically, for the proposed journal b/c it helps to teach/remind those who don't ask what I described as the important question that they should. It also teaches those who make the mistake of thinking that studies should be published only when the reviewer's or editor's or imagined readers' posterior odds are greater than 1:1 on the proposition (in other words, when they "believe the hypothesis is true now") that they are making a very bad, but common, mistake from the point of view of machines like us.

More later

March 18, 2013 | Registered CommenterDan Kahan

Cool cool. (When I caught this post earlier, I thought you were responding to Prof. Bilz!) I certainly hope nothing I've written could be taken to mean that a study where LR = 1 could be good. It couldn't! But as discussed above, I think there are studies where LR ≠ 1 that are still bad. What's now intriguing me is whether all of these studies are bad because while LR ≠ 1 in one sense, in some other critical way -- comparing the hypothesis being tested against another plausible hypothesis, not knowing which hypothesis will have its LR changed going into the study (as is the case with the voodoo neuroscience. I see my dead salmon fMRI link got messed up above) -- LR does equal 1, or whether that's not a useful way to understand why these studies are problematic. I look forward to your further thoughts!

March 18, 2013 | Unregistered CommenterMW