Bits and bickering: Information, discourse, and disagreement
1. Introduction: The Puzzle of Disagreement
Why do people disagree, even when they have access to the same facts? A common assumption—what we might call the “idealist” view—is that disagreement stems from a lack of shared information. According to this perspective, if everyone had all the relevant data and understood it perfectly, no disagreements would remain.
But reality suggests otherwise. Even when parties share the same evidence, they may still reach different conclusions. Consider political debates, scientific controversies, or everyday arguments. Clearly, the problem isn’t just about who knows what.
This essay explores a less obvious but crucial aspect of disagreement: people interpret information through different “models” of the world. A model, built from assumptions and prior knowledge, shapes how we assess probabilities, weigh evidence, and ultimately extract meaning from observations. Because each model frames evidence differently, two people can walk away from the same evidence with different impressions—and enduring disagreement.
We’ll begin by laying some groundwork in information theory, a field that helps us understand what “information” really is and how it relates to uncertainty. From there, we’ll see how the interpretation of information depends on the models we hold, and how differences in models naturally lead to differences in opinion. Ultimately, we’ll find that addressing disagreement isn’t just about giving people more data; it’s about understanding the underlying models through which they view the world.
2. What is information?
Information is the resolution of uncertainty. This definition is intuitive and also technical. To better understand it, we need to measure uncertainty first; the standard measure of uncertainty is called entropy.
2.1 Uncertainty as entropy
Imagine you have a random event with several possible outcomes, each associated with some probability. Before observing the result, you’re uncertain about which outcome will occur. Entropy quantifies this uncertainty. Formally, for a random variable X with possible outcomes x_1, …, x_n and probabilities p(X=x_1), …, p(X=x_n), entropy H(X) is defined as:
H(X) = ∑_{i=1}^{n} p(x_i) × log(1 / p(x_i))
Why this formula? Big X is a category of possible mutually-exclusive events. We generically denote a particular outcome as little x. The subscript i indexes the possible outcomes, so x_i is the i-th possible value of X, and its probability is written p(x_i).
The term log(1/p(x_i)) can be thought of as a measure of “surprise”; call it S(x_i). The essential ingredient is the 1/p(x_i): rare events (low p(x_i)) yield large values of 1/p(x_i), meaning they’re more surprising, while more-probable events yield smaller “surprise” values (the log merely puts this on a convenient additive scale).
How often do we expect to be surprised by the amount S(x_i)? As often as x_i occurs, which is with probability p(x_i). So, we multiply S(x_i) by p(x_i).
How much surprise should we expect per observation? The sum of the expected surprise for each possible value of X (the summation over all n possible outcomes is what the ∑_{i=1}^{n} denotes).
So, entropy is the expected amount of (log) surprise per observation; higher entropy means greater expected surprise, which means more uncertainty. Over a set of independent observations, the expected total surprise is simply the per-observation entropy times the number of observations.
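To make the formula concrete, here is a minimal Python sketch of the expected-surprise calculation just described (the function names surprise and entropy are illustrative labels for this essay, not anything standard):

```python
import math

def surprise(p):
    """Surprise, in bits, of an outcome assigned probability p."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Expected surprise (in bits) over a set of mutually exclusive outcomes.

    `probs` holds one probability per possible outcome and should sum to 1;
    outcomes with probability 0 contribute nothing to the sum.
    """
    return sum(p * surprise(p) for p in probs if p > 0)
```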
2.1.1 An Intuitive Example: The Fair vs. Biased Coin
Consider flipping a coin and recording the results as X∈{heads, tails}. For a fair coin, the entropy per flip is:
H(coin) = p(heads) × log(1 / p(heads)) + p(tails) × log(1 / p(tails))
Since p(heads)=p(tails)=0.50, S(heads)=S(tails)=log(1/0.50)=log(2). We’re free to choose whatever logarithm base we want, though it’s common to use 2 (in which case, the unit of measurement is called a “bit”). So we have S(heads)=S(tails)=log_2(1/0.50)=log_2(2)=1 bit. Weighting each surprise by its probability gives 0.50×1 + 0.50×1 = 1 bit per flip. Thus, over n independent flips, the expected total surprise is n × 1 = n bits.
Now consider a biased coin with p(heads)=0.75 and p(tails)=0.25. Ask yourself: do you expect more or less entropy? Here, heads is less surprising, S(heads)=log_2(1/0.75)≈0.415 bits, while tails is more surprising, S(tails)=log_2(1/0.25)=2 bits. Weighting the surprises by their probabilities gives H(coin) = 0.75×0.415 + 0.25×2 ≈ 0.811 bits per flip—less than the 1 bit per flip for the fair coin—or about 0.811n bits over n flips.
Is that what you expected? When the less-likely outcome (tails) occurs, there is more surprise, but it happens less often; the reverse is true for heads. The overall effect is less entropy: intuitively, because we know the coin is more likely to turn up heads, each flip carries less uncertainty.
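Using the entropy() sketch from above, the two coins work out as follows (per flip; multiply by n for n flips):

```python
import math

def entropy(probs):  # same helper as in the earlier sketch
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

fair = entropy([0.50, 0.50])    # 1.0 bit per flip
biased = entropy([0.75, 0.25])  # ~0.811 bits per flip: less uncertainty per flip
print(fair, biased)
```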
2.1.2 Information as the resolution of uncertainty
The amount of information I contained in evidence (e.g., a set of observations) is the difference in entropy before and after assimilating the evidence:
I(evi)=H(X) - H(X|evi)
2.1.3 Information depends on assumptions
It might seem as though entropy and information are objective facts about events, but notice that p(x_i) depends on how you model the situation. If you think the coin is fair, you’ll assign p(heads)=p(tails)=0.50. If you believe it’s biased, you’ll use a different probability. These probabilities affect how surprising an outcome is and how much information the observation delivers. If the evidence is close to your expectations, it barely reduces uncertainty and thus provides little new information. But if the outcome surprises you—if it was less likely under your assumptions—it resolves more uncertainty, delivering more information. After seeing evi={heads, heads, tails, heads}, and taking the post-evidence model to be the empirical 75-25 split the observations suggest, the fair-coin believer (call her Alice) receives I(evi) = 1×4 − 0.811×4 ≈ 0.755 bits of information, while the 75-25 believer (Bob) receives I(evi) = 0.811×4 − 0.811×4 = 0 bits.
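Here is a minimal sketch of that calculation, treating the post-evidence model as the empirical 75-25 split, as the text does:

```python
import math

def entropy(probs):  # same helper as in section 2.1
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

n_flips = 4
post = entropy([0.75, 0.25]) * n_flips         # ~3.245 bits: entropy after assimilating the evidence

alice_prior = entropy([0.50, 0.50]) * n_flips  # 4 bits under Alice's fair-coin model
bob_prior = entropy([0.75, 0.25]) * n_flips    # ~3.245 bits under Bob's 75-25 model

info_alice = alice_prior - post                # ~0.755 bits of information for Alice
info_bob = bob_prior - post                    # 0 bits for Bob
```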
In the next sections, we’ll see how these underlying models shape interpretation and lead people with different assumptions to disagree about what the “same” evidence means.
3. Something hidden in probability: the subjectivity of information
Probability appears so simple, p(x); where could something be hiding? Answer: behind the curtain; pull back the right-hand parenthesis and you reveal the hidden term: p(x|M). This expression is called a conditional probability; we read it as “the probability of x given that M is true”.
What is M? M stands for model. A model is a representation of how the world works—how likely events are. Models are built from assumptions and prior knowledge. When we mentioned p(heads)=p(tails)=0.50, we should’ve said p(heads|M)=p(tails|M)=0.50, where M encodes the assumption of a fair coin. We didn’t have to assume a fair coin; we could’ve assumed an unfair coin, or a double-headed coin, or a coin with different images on either face, or that you’d be rolling dice rather than flipping coins. In each case, p(heads) would be different.
Probabilities are not handed down by the universe; they reflect our beliefs, assumptions, and knowledge. In other words, when we reference p(x), we are really referencing p(x|M). This is always true, so we usually do not write the M. But omitting it may be a mistake, because the conditioning is easily forgotten—leaving us with the impression that probabilities are objective, handed down by the universe.
3.1 Models shape probabilities
Let’s say this both ways:
The probability of an event p(x|M) varies with the model M.
Different models assign different probabilities to the same event.
Since different models represent different understandings of the world, they assign different probabilities to the same event. What seems surprising to one person may seem completely expected to another—hence, different models receive different amounts of information from the same evidence. If Alice assumes a fair coin, seeing three heads in a row provides some information because it’s somewhat surprising under her assumption. Bob, who believes the coin is biased towards heads, might find the same sequence unremarkable; for Bob, that observation may impart little or no information.
The crucial point is that probability—and, by extension, information—is model-dependent. The same data means different things to different people because they operate from different models. This is disagreement; disagreement is rooted in imperfectly-matched models.
4. When the same evidence means different things
Let’s illustrate how different models assign probabilities and assess evidence differently. We observe evi = {heads, heads, tails, heads}; we encode heads=1, tails=2 such that our evidence is recorded as evi = {1,1,2,1}. Three individuals are shown the data: Alice, Bob, and Cody.
Alice believes the coin is perfectly fair (p(heads|M_A)=p(tails|M_A)=0.5).
Bob believes the coin is biased to land heads 75% of the time (p(heads|M_B)=0.75, p(tails|M_B)=0.25).
Cody doesn’t believe the data comes from coins, but rather from rolling a fair six-sided die, where “1” and “2” are just numbers on its faces.
4.1 When the same evidence implies different amounts of information
We’ve already found, in section 2.1.3, that Alice receives about 0.755 bits of information from this evidence while Bob receives 0 bits. What about Cody?
4.2 When the same evidence is interpreted in different contexts
Information is inherently meaningless; the meaning of information comes from context. Consider, as both analogy and example, words. Words do not get meaning from their spelling or pronunciation; they are only meaningful within the context of a larger language. Further, their meaning varies with the specific context of their usage—who speaks them in what circumstance. The listener, then, uses these factors in combination with their own models of language and the circumstances to decode the message. Do different listeners, having received the same message, interpret it identically? Does any listener interpret the message exactly as the speaker intended? No, because the meaning is based on the models of the individuals, and different individuals have different models.
Let’s return to Cody. Cody observes the evidence in a different context from either Alice or Bob. So not only does Cody receive a different amount of information than either Alice or Bob—his prior entropy over the four observations is 4 × log_2(6) ≈ 10.340 bits (four rolls of a fair six-sided die), and the post-evidence entropy under the empirical 75-25 split is 4 × 0.811 ≈ 3.245 bits, so I(evi|M_C) ≈ 10.340 − 3.245 ≈ 7.09 bits—but that information pertains to a different system: a six-sided die rather than a coin. Whereas the disagreement between Alice and Bob pertains to the value of p(heads), Alice and Bob disagree with Cody about the meaning of the evidence.
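A sketch of Cody’s calculation, under the same convention as in section 2.1.3:

```python
import math

def entropy(probs):  # same helper as in section 2.1
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

n_obs = 4
post = entropy([0.75, 0.25]) * n_obs       # ~3.245 bits: the empirical 75-25 split over the observed symbols

cody_prior = entropy([1 / 6] * 6) * n_obs  # ~10.34 bits: four rolls of a fair six-sided die
info_cody = cody_prior - post              # ~7.09 bits, and about a different system than the coin
```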
These types of arguments are likely to be frustrating if the interlocutors are not explicit about their models and thus do not realize they are arguing about kind rather than degree. They will spend their time arguing about the observation and believing the other side to be incompetent, when they should instead focus on the differences in their models. It is often easier to believe that of others than to entertain the possibility that we are wrong.
5. The Machinery of Models
What exactly is a model made of? Generally, a model has three telescoping components:
Context: The overarching scenario you believe you’re examining (e.g., coin flipping vs. die rolling, or analyzing job markets vs. analyzing social patterns).
Parameters: The features you consider important (like the bias of a coin, the force of a toss, or the number of die faces). The parameters, of course, are determined by the context. In the context of flipping a coin, an important parameter may be the bias of the coin; had you assumed a different context, like rolling dice or the performance of the S&P 500, very different parameters would be needed. A single parameter can be generically labeled θ; a set of parameters may be labeled θ⃗.
Parameter Values: The specific numerical estimates for those parameters. If your context is flipping a coin and you have one parameter, the bias of the coin, and you believe the coin to be fair, then θ_bias = 0 (no deviation from a 50-50 coin).
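As a loose illustration (the class and field names below are one possible encoding chosen for this essay, not anything canonical), the three components might be represented like this:

```python
from dataclasses import dataclass

@dataclass
class Model:
    """A toy container for the three telescoping components of a model."""
    context: str               # the scenario assumed, e.g. "coin flipping"
    parameters: list[str]      # the features considered important
    values: dict[str, float]   # specific numerical estimates for those parameters

alice_model = Model(
    context="coin flipping",
    parameters=["bias"],
    values={"bias": 0.0},      # no deviation from a 50-50 coin, i.e. a fair coin
)
```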
5.1 Models vs beliefs, and holding multiple models
It is true that one’s beliefs about something are represented by a model they hold. But the opposite is not true: holding a model does not necessarily mean believing it. It is possible to hold a model without believing it, to acknowledge that it may be credible without investing any belief in it, to believe it tentatively and with skepticism, or to withhold judgement until presented with evidence or given time to consider.
It is also possible for one person to hold two or more conflicting models. Sometimes, the conflict goes unrecognized. Other times, the conflict is recognized and produces the uncomfortable feelings of cognitive dissonance. Still other times, a person can comfortably acknowledge the conflict by apportioning their belief across the incompatible models.
6. Types of Disagreements
Different models can differ along any of the model dimensions (parameter values, parameters, context), and thereby cause disagreement. People, the possessors of models, can additionally disagree about how much belief should be apportioned among competing models.
Disagreements about parameter values are most easily resolved since they imply general agreement about the context and parameters. Resolution is usually found by collecting more evidence. If Alice believes p(heads|M_A)=0.5 while Bob believes p(heads|M_B)=0.75, their disagreement is resolved as more flips are observed.
Disagreements about parameters are more difficult to resolve because different sets of parameters may tend to make similar predictions such that both models may seem equally valid. Only by testing situations wherein the different models make different predictions can one model be shown to be more appropriate than another. For example, Alice may believe that the outcome of a coin flip is an inherent feature of the coin such that her model M_A contains only a single parameter θ_bias, whereas Bob may believe the outcome was due to both θ_bias and the force with which the coin is flipped, θ_force. To see if either model is better than the other, several coin flips could be undertaken at several recorded flipping forces.
Disagreements about context can be the most difficult to resolve because, unlike the subordinating relationships between parameter values < parameters and parameters < context, there isn’t necessarily an agreed-upon superordinate for context to constrain discourse along agreeable lines. In principle, these disagreements can be resolved in the same fashion as disagreements about parameters—making observations in situations wherein different models will make different predictions—but in practice, the lack of superordinate structure often thwarts this process because either party is able to conceive of some set of parameters that supports their particular context. For instance, say Alice believes a regular coin is being flipped while Bob believes a double-headed coin is being flipped. After 4 flips that all come up heads, Alice may concede that Bob was correct about the double-headed coin—but she may also insist the flips were an improbable streak; after 10 heads, she may add a heavy θ_bias parameter to an otherwise normal coin; after 100, she may suspect a skilled and deceptive coin flipper; and if they are allowed to examine the coin and find it to be, in fact, double-headed, Alice may argue that the coin had been switched.
7. Towards resolving disagreements
Resolution of disagreements can be difficult for several reasons. To start with, all models, even the most scientifically stringent, are (at least partially) founded on one or more untestable assumptions. For example, all models are founded on the assumption that the world is understandable to some extent—otherwise, why would you bother constructing a model? This alone can doom the possibility of resolution in many cases. Somebody starting from the assumption that there is a benevolent God will have disagreements with those who believe differently. The areas of disagreement may be very distant from the differing beliefs about God—say, interpretation of, or preferences for, US immigration policy—but persistent conversation is likely to reveal the source of disagreement to be, in part, these diverging roots. Both sides may marshal arguments for their position about God, but this is ultimately an untestable assumption. The believer may assume that something cannot come from nothing, such that a supernatural being existing outside of time and space is the most reasonable explanation for the existence of anything (and, assuming there is a God, the __(insert religion here)__ God is the most probable one), but the non-believer may be agnostic on this point.
Another difficulty is that probability can be mathematically difficult. I’m not saying that people consciously churn through the calculations to arrive at their model, but some analogous process must be occurring in the machinery of our brains. When computers carry out sufficiently difficult computations, operating thousands of times faster than our brains, they can run for hours to weeks. It’s no wonder, then, that it is so difficult for humans—using comparatively underpowered brains on more complex problems—to quickly update their models and change their minds.
This is especially true for foundational issues that form part of M—changing them requires recalculating many probabilities associated with M—and doubly so if one must adopt an entirely new context or set of parameters, since one must recalculate both p(x|M) and p(x|M') in order to make a proper comparison.
In the extreme, these difficulties can cause disagreement even about observed events and testable assumptions. Given certain models, it can become more believable that an observation was faked or reported dishonestly, or that a test is infeasible or unlikely to be accurate.
Disagreement is resolved insofar as opposing parties move closer to sharing the same model. Ideally, this convergence follows a careful, evidence-based process so that it reflects reality rather than merely bending to social pressure. Complete resolution is rarely attained, as we will discuss, and thus should not be taken for granted. Instead, we can aim for all parties to engage in a principled procedure for resolving disagreements. If they refuse, they act in bad faith, leaving coercion or capitulation as the only routes to “agreement.”
7.1 Updating models and beliefs
We should update models when they do not perfectly fit all available evidence—which includes prior assumptions. We should update our beliefs about models when competing models fit evidence differently. Updating models occurs by finding better parameter values while retaining the same parameters and context. Updating beliefs occurs by weighing the merits and faults of competing models, which differ in parameters or context.
Between these, updating parameter values is more straightforward since possible parameter values are well constrained by the parameters and context of the model. Updating parameter values, then, is akin to adjusting sliders along 1-dimensional lines. The methods for updating parameter values also apply to updating beliefs, but updating beliefs is more difficult.
Whereas parameter values vary only in a single dimension, every parameter within a model represents its own dimension, such that a model varies in as many dimensions as it has parameters. The one-dimensional space of parameter values can be fully explored quickly, but the multidimensional space of models gets multiplicatively more vast with every additional parameter. Competing parameter values are efficiently sorted, so updating moves unambiguously towards a single optimal value that best fits assumptions and observations; the flexibility afforded by multiple dimensions, however, permits similar fit among competing models.
7.1.1 Bayesian Updating
Bayesian updating is a way of updating models and beliefs. It updates towards the optimal compromise between a) models/beliefs prior to receiving new evidence, and b) the model/belief that would perfectly match the new evidence. Notice I said “compromise”. An updated model should not automatically update to perfectly fit new evidence. Why not? Three reasons:
1. Evidence is not perfectly reliable. Evidence can be unclear, misleading, or even faked. Hence, it is desirable to buffer against completely updating to align with every bit of evidence.
2. Received evidence is only a subset of all possible evidence. The evidence you receive is only a small subset of all possible evidence that exists, and a different subset drawn from that pool would imply a different perfectly-fit model. For example, our 4 coin flips suggest that p(heads)=0.75. Say we flipped the coin one more time; regardless of the outcome, this estimate would change: another heads gives 4/5 = 0.80, a tails gives 3/5 = 0.60. So the model suggested by the data is contingent on the particular subset of observations we actually make. This contingency goes by the name of sampling variability.
3. Prior-to-evidence models also contain information that should be considered. Our pre-evidence model is, presumably, informed by something: assumptions and previous evidence. We want the model to fit the total information, not just the most recent evidence. In the case of our coin, the information contained in our pre-evidence model may be based on our intuitive sense of physics and the apparent balance of the coin, or the fact that most coins are roughly fair. It wouldn’t make sense to discard this information in favor of the new data—especially considering the problems of unreliable evidence and sampling variability. Let’s make this clearer with our coin. Say we started with the model p(heads|M)=0.50. We again observe evi={heads, heads, tails, heads}, suggesting p(heads|evi)=0.75. Even though we shouldn’t, say we update our model by completely adopting p(heads|M')=p(heads|evi)=0.75. Now we make another set of observations, evi_2={tails, heads, heads, tails}, such that p(heads|evi_2)=0.50. Should we again discard M' in favor of p(heads|M'')=p(heads|evi_2)=0.50? Of course not, since this would be akin to discarding the information in the first set of observations. By extension, then, we should not have discarded M in the first place in favor of p(heads|evi), since this discards the information present in M. Instead, we want to accumulate all the information as it comes in. This is what Bayesian updating does.
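As a minimal sketch of this accumulation, here is Bayesian updating for the coin’s heads-probability using a Beta prior expressed as pseudo-counts; the particular prior strength (10 and 10) is an assumption chosen for illustration:

```python
def update(prior_heads, prior_tails, heads, tails):
    """Posterior pseudo-counts after observing `heads` and `tails` flips."""
    return prior_heads + heads, prior_tails + tails

# A fairly confident 50-50 prior, encoded as 10 pseudo-heads and 10 pseudo-tails.
post_h, post_t = update(10.0, 10.0, heads=3, tails=1)

p_heads = post_h / (post_h + post_t)  # 13/24 ≈ 0.54: a compromise between the
                                      # 0.50 prior and the 0.75 suggested by the data
```

The pseudo-counts carry the information accumulated so far; a second batch of evidence simply adds to them, so nothing from the first batch (or from the prior) is discarded.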
7.1.2 Prediction and cross validation
Let’s take a step back. What is the goal of our modelling? Is it to describe the observations we’ve made? No; if all our model can do is describe what we’ve already observed, then it contains no insight—literally, no information because it is maximally surprised with every new observation. A model is judged by how well it generalizes to unseen data. (The extent to which a model well fits observed data but poorly fits unseen data is called “overfitting”. We should not expect models to perform as well on unseen observations as they do on previous observations since the previous observations have helped form the current model.)
The most straightforward way to test a model against unseen data is to make predictions and then find new data. But new data is not always easy to get. This leads to an alternative: “cross validation”.
Cross validation is a well-known method in statistics, but I’ve never seen it discussed in terms of general decision making. So what is it? Imagine the parameters and evidence that have informed your model and a competing model. Note the parameters and informing evidence that the models have in common. Now, remove one of the common parameters/observations and reconfigure the models without it; how surprising would that omitted parameter/observation be, presented as a new piece of evidence, to each model? The model for which it is more surprising—that is, the model which receives the more information—is less well fit. Replace the omitted parameter/observation and repeat for each (or at least several) of the common parameters/observations. The model that accumulates more total surprise is less likely to be the better model.
Hopefully, the how makes sense, but the why may yet be mysterious. As a first pass, you can think of this as a way of simulating unseen data. Going a little deeper: parameters/observations that are not well integrated with the rest of the model will incur more surprise in proportion to how much the model relies on them—once when they are omitted, since the remaining model will not easily predict them, and several more times as the other parameters/observations are omitted, since the offending parameter/observation will skew the predictions for those. This is desirable because these poorly integrated parameters/observations—the ones not well implied by the others—are the most likely to be unreliable.
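Here is one simple way this can be carried out for the coin data: a leave-one-out sketch comparing a fixed fair-coin model against a model that refits the heads-probability from the remaining flips (the light smoothing is an assumption so that no outcome is ever assigned probability zero):

```python
import math

def surprise(p):
    """Surprise, in bits, of an outcome assigned probability p."""
    return math.log2(1.0 / p)

def loo_surprise(data, fit):
    """Total leave-one-out surprise: omit each flip, refit on the rest,
    and score how surprising the omitted flip is to the refit model."""
    total = 0.0
    for i, left_out in enumerate(data):
        rest = data[:i] + data[i + 1:]
        p_heads = fit(rest)
        total += surprise(p_heads if left_out == "H" else 1.0 - p_heads)
    return total

data = ["H", "H", "T", "H"]

fair = lambda rest: 0.5                                       # model A: always a fair coin
refit = lambda rest: (rest.count("H") + 1) / (len(rest) + 2)  # model B: refit to the rest, lightly smoothed

print(loo_surprise(data, fair))   # 4.0 bits
print(loo_surprise(data, refit))  # ~4.53 bits: more accumulated surprise
```

With only four flips, the refit model actually accumulates more total surprise than the simple fair-coin model—it chases the small sample—which is the overfitting penalty that cross validation is meant to expose.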
Now, does the model that performs best against unseen data win? No. But it does gain credibility at the expense of the worse-performing model(s). Just as with parameter values, the credibility of models adjusts from the prior credibility. Even after a superior performance from model B, Alice may still give more credibility to model A if her prior credence in model A was high enough; all we can hope for is that Alice’s credence in model A is diminished from what it was.
7.2 Incremental convergence rather than instant agreement
After Alice and Bob, sharing context and parameters but starting from different prior parameter values, observe the same evidence and update their models accordingly, do their posterior models match one another’s? No, and for good reason: even though they both observed the same evidence and updated appropriately, their different priors tend to keep them from converging completely upon viewing the evidence—though their posterior models will tend to be closer than were their prior models. This reflects one of the virtues of Bayesian updating: the prior contains information which should not be immediately discarded but rather continually considered even as more evidence is available. Alice and Bob, then, continue to disagree (albeit to a lesser extent) because, despite observing the same evidence, they still hold different information which is represented in their different priors. But as more evidence accumulates, the information contained in their priors will become dwarfed such that they come ever closer to identical posterior models.
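To see this numerically, here is a sketch using the same pseudo-count updating as before; the two priors, and the assumption that the shared evidence comes out as an even split of heads and tails, are chosen purely for illustration:

```python
alice = (10.0, 10.0)   # prior pseudo-counts leaning 50-50
bob = (15.0, 5.0)      # prior pseudo-counts leaning 75-25

def p_heads(heads, tails):
    return heads / (heads + tails)

for n in (0, 10, 100, 1000):
    h, t = n / 2, n / 2   # shared evidence: n flips, half heads and half tails
    a = p_heads(alice[0] + h, alice[1] + t)
    b = p_heads(bob[0] + h, bob[1] + t)
    print(n, round(a, 3), round(b, 3), round(abs(a - b), 3))

# The gap |a - b| shrinks from 0.25 toward ~0.005 as the shared evidence swamps the priors.
```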
However, convergence is no guarantee: parties that begin with wildly different contexts, parameters, or deeply rooted assumptions may still disagree. Bayesian updating, while systematic, must contend with the complexity of real-world models. If you must adopt an entirely new context—say, deciding that what you thought were coin flips are actually die rolls—the updating process is not just a matter of adjusting one probability; it’s a wholesale restructuring of how you interpret evidence.
8. Obstacles to updating and agreement
As straightforward as I’ve made updating and agreement sound, they are obviously more difficult than I’ve suggested. Here, I’ll discuss some of the difficulties.
8.1 Complexity and cognitive limitations
Real disputes rarely hinge on a single binary event like a coin flip. They involve multifaceted systems—economies, ecosystems, human societies—with numerous interdependent parameters. Approximating these processes is computationally challenging. Humans, with our limited cognitive resources, rely on intuition and shortcuts. Sometimes these shortcuts lead to biases that slow or prevent proper updating. I will not discuss these any further, since there are many resources on the topic—the book “Thinking, Fast and Slow” by Daniel Kahneman being especially valuable.
8.2 Untestable assumptions and value-laden models
Some disagreements stem from assumptions that are beyond empirical testing. If one model assumes a benevolent deity influences all outcomes, while another model denies any supernatural force, no finite set of observations may decisively settle which is correct. The same applies to trust in institutions, or beliefs about human nature—these can shape how we interpret every piece of evidence, creating an interpretive lens that’s hard to remove.
Relatedly, we often build models in our minds to preserve something we value—some cherished belief that somehow improves our sense of wellbeing: the reality and goodness of God, that the federal government cannot be trusted, that it is better to have loved and lost than to have never loved at all. This does not mean that values are, well, not valuable. We may try to model the world so as to maximize realization of the things we value, but mistaking values for hard facts leads to persistent disagreement. These types of disagreement tend to take on the character of ever-retreating context shifting, as Alice did to preserve her belief in a fair coin. It is not improper to build models to support a foregone conclusion—we must often do this to explain something otherwise unexplainable, if only to knock the model down by collecting evidence—but failing to update away from such models will tend to lead to poor predictions and promote disagreement among good-faith interlocutors.
8.3 Deception and the hypothesis thereof
Deception is a malignant tumor within the body of a model. Once detected, it can be nearly impossible to excise perfectly without killing the model; every piece of the model that was informed by the deception is contaminated, and even the remaining tissue that appears healthy must be monitored lest undetected cells lie dormant.
Spreading deception in order to reach agreement—a few lies to patch up differences—is dangerous for this reason. Deception deployed to get others to build a particular model thereby seeds that model with its own destruction.
Deception is so corrosive that even its possibility can destroy the prospect of agreement. In these circumstances, ultra-cynicism becomes increasingly viable and updating all but impossible. Simply put, if deception is probable, then every piece of countervailing evidence may credibly be considered evidence of deception. Many conspiracy theories, in the derogatory sense, have this character: if a government-appointed commission concludes an assassin acted alone, that is taken as only more evidence that the government was involved.
8.4 Reciprocal relationship between models and evidence
One subtle challenge to resolving disagreement is that the credibility of evidence isn’t fixed. Observations and models interact in a reciprocal relationship: evidence updates models, but models also determine how much weight—or skepticism—we apply to evidence.
This idea is almost built into the definition of information: less-probable events are more-surprising events—and improbable events attract more skepticism. This can be an unfortunate and tedious problem, since interlocutors may need to argue about the validity of evidence in addition to the implications of that evidence for different models. Nevertheless, this additional burden should not derail conversation. Sometimes the context of the evidence is so clear as to be beyond skepticism, or it brings about similar skepticism in all parties.
But even when opinions about the evidence’s validity differ, the discounted evidence still constitutes some evidence; it merely increases the amount of evidence the skeptical party requires.
9. A taxonomy of disagreement
Before wrapping up, I’d like to offer an oversimplified framework for categorizing disagreements. Real-life arguments are of course more nuanced, but this taxonomy provides a useful starting point. The approach focuses on two high-level questions: (1) Are participants acting in good or bad faith? and (2) How do they believe truth is found?
Good-Faith vs. Bad-Faith Participants
Good-Faith Participants are motivated by reasons they openly acknowledge. Their primary goal is to discover or defend a version of truth they sincerely believe in.
Bad-Faith Participants have hidden agendas; they may lie, manipulate, or withhold motives to “win” rather than to discover truth. Disagreements with bad-faith participants often resist resolution by evidence or reasoning alone, as the real conflict involves power or deception rather than ideas.
Naturalists vs. Supernaturalists
Among good-faith participants, we can broadly divide worldviews into naturalist and supernaturalist:
Naturalists believe that truth can be discovered through universal, often empirical, means accessible to all (e.g., science, logic, observation).
Rationalists: Emphasize reasoning from prior beliefs or principles. They often want to establish or validate their assumptions before examining evidence.
Empiricists: Emphasize observing data and letting “the facts speak for themselves.” They generally take the stance that evidence is decisive, provided it’s good-quality and sufficiently comprehensive.
Supernaturalists believe that truth may require spiritual, esoteric, or otherwise non-empirical insight beyond straightforward observation and reasoning.
Romantics: Associate “truth” closely with personal values or subjective experiences.
Mystics: See truth as tied to universal or transcendent values, discoverable only after adopting certain spiritual or metaphysical beliefs.
Because this essay largely focuses on how shared information can still yield disagreement, we’ll zero in on naturalists—the group that (at least ostensibly) relies on facts and logic. Even within naturalism, disagreements can take different forms:
Rationalist vs. Rationalist: They generally clash over the validity or weight of assumptions. Productive progress may happen through discussion, but evidence must eventually be factored in to refine those starting assumptions.
Empiricist vs. Empiricist: They usually disagree about how to interpret evidence. Discussion may help clarify which data are relevant or how best to analyze them, but it also calls for acknowledging background assumptions.
Rationalist vs. Empiricist: They often end up talking past each other. The empiricist feels that “the evidence should settle the matter,” while the rationalist thinks “we can’t interpret evidence without making certain assumptions.” Each side struggles to recognize the other’s unspoken premises, which can leave them at an impasse.
This taxonomy doesn’t capture every nuance, but it illustrates how deeper beliefs and motivations shape disagreements—even when people ostensibly share the same facts.
10. Conclusion
We began by asking why disagreements persist even among well-informed people. The simple idea that more evidence automatically resolves conflict doesn’t stand up to scrutiny. Instead, this essay showed that disagreement often arises because people interpret the same information through different models—frameworks shaped by assumptions, context, and values.
Information theory gave us a handle on how uncertainty is measured and how information reduces that uncertainty. But what we learned is that the “amount” of information depends on what you assume in the first place. Two observers can see the same event yet feel very different degrees of surprise and learn different lessons. Without recognizing this model-dependence, it’s easy to assume that those who disagree are just ignorant or irrational.
In reality, they may simply be using different mental maps of the world. Convergence requires careful, incremental updating of these maps, guided by Bayesian reasoning and tempered by considerations of evidence quality, complexity, and deeply held values. While perfect agreement may be out of reach, understanding why disagreement exists and how to approach it makes us more patient, more open-minded, and more inclined to seek common ground.
Ultimately, there is a reality out there, but our interpretations of it vary. Through good-faith communication, rigorous testing of assumptions, and a willingness to update our models, we can move closer to a shared understanding—even if we never fully arrive.