When ad hockery is, and is not, a problem

In its most basic form, a hypothesis is a statement of prediction.  One may hypothesize that Apple stock will lose 2 points in the coming week, that a solution will turn blue when exposed to a blood sample, etc.

While one can propose hypotheses haphazardly, they typically arise as predictions derived from an underlying model of the phenomenon.  In other words, the predictions about Apple’s stock and the solution are presumably educated guesses based on prior knowledge about those systems.

Hence, a correct prediction lends its generative model some credibility.  If Apple stock does fall by 2 points, or the solution turns blue, then the models which generated those predictions seem like better representations of the systems they model.

But this next point is not well appreciated: the amount of credibility lent is actually very small.  Why?  Because, even if they are not considered, many alternative models could’ve generated the same prediction.  This means that a correct prediction spreads its credibility among all these models, such that each model receives only a little boost.  Hence, it is wise to be aware of as many models as possible and to test hypotheses on which different models make different predictions.
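
To make that concrete, here is a minimal Python sketch of the underlying bookkeeping, stated in Bayesian terms with made-up priors and likelihoods: when only one candidate model predicts the observed drop, a correct prediction boosts it dramatically, but when ten rival models would have predicted the same thing, the same observation nudges each of them only slightly.

```python
# Minimal sketch (illustrative numbers only): how the credibility earned by a
# correct prediction gets divided among every model that would have made it.

def posteriors(priors, likelihoods):
    """Bayes' rule: updated credibility of each model after the predicted outcome occurs."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Case 1: only my model (vs. a vague "something else") predicts the 2-point drop.
print(posteriors([0.5, 0.5], [0.9, 0.1])[0])      # my model jumps from 0.50 to 0.90

# Case 2: ten rival models would have predicted the same drop.
n_predictors = 10
priors = [1 / (n_predictors + 1)] * (n_predictors + 1)    # ~0.091 each
likelihoods = [0.9] * n_predictors + [0.1]                # ten predict it, one doesn't
print(round(posteriors(priors, likelihoods)[0], 3))       # each predictor reaches only ~0.099
```

The observation is identical in both cases; all that differs is how many models are acknowledged to fit it, and that alone determines how large a boost any one of them receives.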

Another complication arises from the fact that every observation reflects the influence of numerous factors—some we can measure and control, and others we cannot. As a result, each experiment or measurement is vulnerable to confounds that cast doubt on its validity. To mitigate this, researchers typically replicate tests multiple times and across various conditions. The hope is that, although certain confounding factors might bias one particular test, they will not affect all tests in the same way—allowing the true signal to emerge from consistent patterns in the data.
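
As a rough illustration of that logic, the simulation below uses invented numbers: each test measures a true effect of +2.0 plus a confounding bias specific to its condition, so any single test can be badly misled, while the pattern across many differently-confounded tests stays close to the truth.

```python
import random

random.seed(0)
TRUE_EFFECT = 2.0   # invented "true signal" for the illustration

def run_test(confound_bias, noise_sd=0.5, n=30):
    """Average measurement from one experiment run under one condition."""
    samples = [TRUE_EFFECT + confound_bias + random.gauss(0, noise_sd)
               for _ in range(n)]
    return sum(samples) / n

# A single test run under a condition with a strong confound is badly biased:
print(round(run_test(confound_bias=1.5), 2))       # about 3.5, not 2.0

# Fifty replications, each under a condition with its own confound, pushing in
# different directions rather than biasing every test the same way:
estimates = [run_test(confound_bias=random.gauss(0, 1.0)) for _ in range(50)]
print(round(sum(estimates) / len(estimates), 2))   # close to the true 2.0
```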

In all, the process of seeking truth is a process of creating plausible models, deducing testable implications of those models (hypotheses), testing those hypotheses as proxies for the models, slowly and carefully whittling away the bits of the models that don’t seem to fit observation, and modifying what remains to encompass the new observations.

Generally, ad hoc hypotheses—that is, hypotheses formed after the observation has been made—are looked at with suspicion.  Why?  One of the things that strikes us about a correct prediction is its improbability: given that other predictions could’ve been made, the one that was made aligns with what really happened, and that alignment is what lends credibility to the model that generated the prediction.  So passing off a report of what happened as a correct prediction seems to undermine that credibility.

But revisiting an earlier point, how much credibility is really lent?  Is a hypothesis any less true for having been formed after the observation?  Are the models that could generate that ad hoc hypothesis any less supported by the observation?  I think the answers to these questions are likely to leave one wondering why ad hoc hypotheses should be suspicious.

The reason, I think, falls out of a problem with the usual application of traditional frequentist null-hypothesis significance testing with p-values (though the same problem can and does occur in other frameworks, e.g., Bayesianism).  In this framework, only two hypotheses are ever considered.  One of these is the “null” hypothesis, which usually states that the observation is due to randomness; the actually-interesting hypothesis is called the “alternative” hypothesis, which basically states that the observation is due to the factor(s) identified by the generative model.  By default, the null hypothesis is assumed to be true; only if the observation is highly unlikely given this assumption is the null hypothesis rejected and the alternative hypothesis accepted.
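
For concreteness, here is that recipe applied to a toy example I'm inventing for illustration: 100 blood samples are tested and 62 turn the solution blue, the null hypothesis says the solution reacts at chance (p = 0.5), and a hand-rolled one-sided binomial test stands in for whatever statistic a real study would use.

```python
from math import comb

def one_sided_binomial_p(successes, trials, p_null=0.5):
    """P(seeing at least this many successes | the null hypothesis is true)."""
    return sum(comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
               for k in range(successes, trials + 1))

# 62 of 100 samples turned blue; under the chance-only null we'd expect ~50.
p_value = one_sided_binomial_p(62, 100)
print(round(p_value, 4))    # roughly 0.01

# Because data this extreme would arise only about 1% of the time if the null
# were true, the null is rejected and the single named alternative is accepted.
# Note that no third, fourth, or hundredth candidate model ever enters the test.
```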

Here, it’s easy to see how an ad hoc hypothesis is very suspicious.  The framework presents the illusion that only two models are possible: randomness and the interesting hypothesis.  Hence, an ad hoc hypothesis here is basically cheating; it takes all of the credibility lent by the observation and gives it to a single model when it should rightly be divvied up among many nameless models.  But notice that the cheating is only effective if others are unaware of the possibility of other competing models.
