When the Numbers Tell a Different Story
Reading Behavior Instead of Rhetoric
The previous installments established a framework: metric governance produces harm through targets and incentives rather than through decisions and directives, and accountability evaporates because responsibility is distributed across roles that each disclaim ownership of the whole.
But frameworks are only as useful as their application. The critical question is: how do you know? When an institution claims one goal and pursues another, how can an outside observer—or an inside participant—distinguish the stated mission from the actual optimization target?
The answer is deceptively simple: you look at what the system does, not what it says. You examine the behaviors that emerge under pressure. You identify what is rewarded and what is ignored. You trace the gap between claimed objectives and measured outputs.
This is not mind-reading. It is not imputing secret motives to individuals. It is inference about system optimization under given incentives—a method as applicable to institutions as to any other complex system that responds to feedback.
The Gap Between Claims and Measurements
Every institution operates with two parallel descriptions of itself.
The first is rhetorical: the language of mission statements, press releases, legislative intent, and public justification. This description articulates what the institution is for—the values it serves, the problems it solves, the outcomes it seeks.
The second is operational: the metrics by which performance is actually evaluated, the targets that determine budgets and promotions, the numbers that appear on dashboards and in reports to leadership. This description reveals what the institution rewards—the behaviors that are incentivized, the outputs that are counted, the results that matter for institutional survival.
When these two descriptions align, there is no puzzle. An institution that claims to prioritize public safety and measures its success by reductions in crime or improvements in community trust is coherent. Its rhetoric and its operations point in the same direction.
But when these descriptions diverge—when an institution claims one goal and measures something else entirely—the divergence itself becomes data. The gap between what is said and what is counted reveals the actual optimization target, regardless of stated intent.
This is not cynicism. It is Goodhart's Law, applied to institutional behavior: when a measure becomes a target, it ceases to be a good measure. An institution that measures arrests will produce arrests. An institution that measures processing volume will produce processing volume. The measure shapes behavior, and behavior reveals what the system is actually for—whatever its mission statement claims.
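The dynamic can be made concrete with a toy simulation. The sketch below is purely illustrative: the pool sizes, severity scores, and quota are all invented, and the only thing that matters is the shape of the output. Operators scored on a monthly action count keep hitting the count, while the severity of what they process collapses.

```python
# Toy Goodhart simulation: operators are scored on a monthly action
# count, so they fill the quota with whatever cases are cheapest.
# All numbers are synthetic and purely illustrative.
import random

random.seed(0)

# A finite pool of high-severity cases and an effectively unlimited
# supply of low-severity, highly "legible" ones.
serious_pool = [random.uniform(7, 10) for _ in range(300)]  # severity scores
QUOTA = 100                                                 # actions per month

for month in range(1, 13):
    processed = []
    for _ in range(QUOTA):
        # Serious cases are worked first (they are already flagged),
        # but once the pool thins, the quota is met with easy cases.
        if serious_pool and random.random() < len(serious_pool) / 300:
            processed.append(serious_pool.pop())
        else:
            processed.append(random.uniform(0, 3))          # minor/administrative
    mean_sev = sum(processed) / len(processed)
    serious_share = sum(s >= 7 for s in processed) / len(processed)
    print(f"month {month:2d}: volume={len(processed)}, "
          f"mean severity={mean_sev:4.1f}, serious share={serious_share:.0%}")
```

Volume never moves. The metric reports twelve identical months of "success" while the thing the metric was supposed to track quietly disappears from the caseload.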
Falsifiable Predictions
The method of reading behavior instead of rhetoric works by generating falsifiable predictions. If an institution is genuinely optimized for its stated goal, certain patterns should be observable. If it is optimized for something else, different patterns should emerge. The patterns that actually appear tell you which optimization is real.
Consider an enforcement system that claims its goal is public safety—removing dangerous actors who pose genuine threats to communities.
If public safety were the true optimization target, we would expect to observe:
Concentration of resources on high-priority targets. The system would focus its limited capacity on individuals who pose the greatest documented risk. Serious criminal records, violent offenses, active warrants—these would dominate the caseload.
Declining action rates over time as the priority population is addressed. If you are removing the most dangerous actors, and those actors are a finite population, the numbers should decrease as the mission succeeds. A shrinking target population is a sign of effectiveness.
Increasing severity among those processed. As the highest-priority cases are resolved, the average severity of remaining cases should rise, not fall. The easy cases would be handled first, leaving harder but still high-priority cases for later.
Protection of voluntary compliance mechanisms. A system optimized for public safety would want accurate information about who is in the community and where they are. It would protect channels that encourage people to maintain contact with authorities, appear for appointments, and remain legible to the system. Destroying these channels makes the population harder to track and the genuine threats harder to find.
If throughput were the true optimization target, we would expect to observe:
Sustained or increasing action rates regardless of target population changes. The numbers must be hit. If high-priority targets become scarce, the system shifts to lower-priority targets rather than declaring success and scaling down.
Declining severity among those processed. As the hardest cases are depleted or avoided, the system fills its quotas with easier cases. The average severity of offenses among those processed falls over time.
Targeting of legible, accessible populations. People who maintain known addresses, appear for appointments, comply with reporting requirements, and otherwise make themselves visible become preferred targets—not because they are dangerous, but because they are easy to count.
Destruction of compliance channels. If appearing for a scheduled check-in makes you a convenient target, rational actors will stop appearing. The system converts voluntary cooperation into a trap, optimizing for immediate captures at the cost of long-term visibility.
These predictions are falsifiable. The patterns that emerge in actual data reveal which optimization is operative. If the data shows the first set of patterns, the institution is doing what it says. If the data shows the second set, the institution is doing something else—regardless of its rhetoric.
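The predictions reduce to measurable quantities. Here is a minimal sketch, assuming records that carry a month index and a severity score (invented field conventions, not any agency's actual schema), that computes the two trend signatures and names which optimization they match.

```python
# Minimal diagnostic sketch: given enforcement records as
# (month_index, severity_score) pairs, compute the two trend
# signatures the predictions above rely on.
from collections import defaultdict

def trend_signatures(records):
    """Return (volume slope, mean-severity slope) across months."""
    volume = defaultdict(int)
    severity_sum = defaultdict(float)
    for month, severity in records:
        volume[month] += 1
        severity_sum[month] += severity

    months = sorted(volume)
    volumes = [volume[m] for m in months]
    severities = [severity_sum[m] / volume[m] for m in months]

    def slope(ys):
        # Ordinary least-squares slope against month index.
        n = len(ys)
        xbar, ybar = (n - 1) / 2, sum(ys) / n
        num = sum((i - xbar) * (y - ybar) for i, y in enumerate(ys))
        den = sum((i - xbar) ** 2 for i in range(n))
        return num / den

    return slope(volumes), slope(severities)

def classify(records):
    vol_slope, sev_slope = trend_signatures(records)
    if vol_slope >= 0 and sev_slope < 0:
        return "matches throughput-optimization signature"
    if vol_slope < 0 and sev_slope > 0:
        return "matches public-safety-optimization signature"
    return "mixed or inconclusive"

# Usage with synthetic records: twelve months of flat volume and
# steadily declining severity.
demo = [(m, 8 - 0.5 * m) for m in range(12) for _ in range(100)]
print(classify(demo))  # -> matches throughput-optimization signature
```

A real analysis would add uncertainty estimates and controls; the point is only that the predictions are not impressionistic. They reduce to slopes that either appear in the data or do not.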
What the Data Shows
When independent analysts examine enforcement data across multiple systems and contexts, a consistent pattern emerges—and it is not the pattern that public safety optimization would predict.
In system after system, the composition of enforcement actions shifts over time. Early in an enforcement surge, a higher proportion of those processed have serious records—because these are the cases that were already flagged, already prioritized, already in the pipeline. As the surge continues, this proportion declines. The high-priority population is finite; the quotas are not.
The share of enforcement actions involving individuals without serious records grows, often dramatically. Actions against people with pending charges (not convictions) increase. Actions against people whose only violation is administrative rather than criminal increase. Actions against people who were attempting to comply with legal processes increase.
Meanwhile, the overall volume remains stable or grows. The system does not interpret the depletion of high-priority targets as a sign of success. It interprets the depletion as a problem to be solved by expanding the target population.
This pattern is visible in multiple independent data sources: agency statistics (when they are released), court records, journalistic investigations, and civil-society monitoring. The specific numbers vary by context, but the trajectory is consistent. The system sustains its volume by shifting its composition—and the shift is always in the same direction: toward easier targets, not harder ones.
This is throughput optimization. It is not public safety optimization. The two would produce different patterns, and the patterns we observe are the patterns throughput would produce.
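The arithmetic behind that shift is worth spelling out with invented numbers: if volume holds constant while the serious-record share falls, the absolute number of low-priority actions must grow.

```python
# Composition-shift arithmetic with invented, illustrative numbers:
# constant volume plus a falling serious-record share means the
# absolute count of low-priority actions is growing.
volume = 1000                              # actions per month, held constant
serious_share = [0.60, 0.45, 0.30, 0.20]   # hypothetical trajectory

for month, share in enumerate(serious_share, start=1):
    serious = int(volume * share)
    other = volume - serious
    print(f"month {month}: serious={serious:4d}, non-serious={other:4d}")
# Non-serious actions grow from 400 to 800 even though total volume
# never changes: the "stable numbers" conceal the shift.
```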
Inference, Not Accusation
It is important to be precise about what this method does and does not claim.
Reading behavior instead of rhetoric is an inference about system optimization. It is not a claim about anyone's private motives. It does not require assuming that leadership secretly wants to harm innocent people, that operators are deliberately targeting the vulnerable, or that anyone involved is acting in bad faith.
The inference is structural, not personal: given these targets, given these incentives, given these reporting requirements, this is the behavior that would be rewarded—and this is the behavior we observe.
Individuals within the system may genuinely believe they are pursuing public safety. They may be distressed by the patterns they observe. They may try, within their limited sphere, to prioritize serious cases. None of this changes the system-level optimization. Individual intentions are overwhelmed by structural incentives.
This is why "bad apples" explanations fail and why leadership turnover rarely changes outcomes. The individuals are not the problem. The optimization target is the problem. And the optimization target is encoded in what gets measured, what gets rewarded, and what gets ignored—not in anyone's heart.
Refusing to impute motive is not naivety. It is analytical discipline. The system is doing what it is designed to do. The people inside it are responding to the incentives they are given. These are structural claims, and they are stronger than accusations of bad faith precisely because they do not depend on anyone being a villain.
The Metric Is the Message
Once you understand that measurement shapes behavior, certain features of institutional design become legible in new ways.
The choice of what to measure is a policy choice, even when it is framed as technical. An institution that measures arrests has chosen to optimize for arrests. An institution that measures convictions has chosen to optimize for convictions. An institution that measures crime reduction has chosen to optimize for crime reduction. These are not equivalent. They produce different behaviors and different outcomes. The choice among them is a choice about what the institution will actually do—whatever its stated mission.
The choice of what not to measure is equally significant. An institution that does not measure false positives has chosen to accept false positives. An institution that does not measure community trust has chosen to ignore community trust. An institution that does not measure downstream costs has chosen to externalize downstream costs. These absences are not oversights. They are permissions.
Metrics create reality within the institution's self-understanding. What appears on the dashboard is visible, discussable, actionable. What does not appear on the dashboard does not exist in the terms the institution uses to understand itself. If errors are not counted, they do not become a problem to be solved. If harms are not tracked, they do not enter the cost-benefit analysis. The measurement system defines the boundaries of institutional perception.
This means that the design of the measurement system—who creates it, what it includes, what it excludes, how it defines its categories—is where the actual policy decisions are made. The public-facing policy may say one thing. The measurement system says what will actually happen.
To understand what an institution is optimized for, do not read its mission statement. Read its dashboard.
Why This Matters
The ability to infer optimization targets from observed behavior matters because it cuts through strategic communication.
Institutions that produce harm have strong incentives to describe themselves in terms of their stated goals rather than their actual operations. Press releases highlight sympathetic cases. Public statements emphasize dangerous actors removed. Leadership points to the worst examples and implies they are representative.
This communication is not necessarily dishonest. It may accurately describe a subset of cases: the subset that supports the preferred narrative. But a curated subset is not a representative sample. The highlighted cases tell you what the institution wants you to believe. The aggregate patterns tell you what the institution actually does.
When independent data contradicts institutional narrative—when the share of serious offenders is declining even as total enforcement rises, when compliance channels are becoming traps, when the patterns match throughput optimization rather than public safety optimization—the data is more revealing than the press releases.
This is not a matter of taking sides. It is a matter of evidentiary standards. Claims about what an institution does should be evaluated against evidence about what the institution does—not against the institution's description of itself.
The behavior is the testimony. The patterns are the evidence. The gap between rhetoric and measurement is the verdict.
Limits of the Method
Reading behavior instead of rhetoric is a powerful tool, but it has limits.
Data availability constrains analysis. Many institutions do not publish the data needed to assess their actual optimization. They release aggregates without breakdowns by severity, or they define categories in ways that obscure the patterns that would be revealing. The absence of data is itself informative—it suggests the institution does not want the analysis performed—but absence is harder to work with than presence.
Category manipulation can obscure patterns. If "criminal" is defined broadly enough, the share of "criminals" in the processed population can remain high even as the underlying composition shifts toward minor offenses. The metric can be gamed by changing the definitions rather than changing the behavior. This is why examining the metric is insufficient; you must also examine the categories the metric uses.
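A toy example of the definitional game, with invented categories and counts: the same fixed population yields a "criminal" share of 20, 50, or 100 percent depending entirely on where the category boundary is drawn.

```python
# How definition choice moves the headline number while the underlying
# population stays fixed. Categories and counts are invented.
population = {
    "violent conviction":        50,
    "nonviolent conviction":    150,
    "pending charge only":      300,
    "administrative violation": 500,
}
total = sum(population.values())

narrow = population["violent conviction"] + population["nonviolent conviction"]
broad = total - population["administrative violation"]  # pending charges count too
broadest = total                                        # any violation counts

print(f"'criminal' share, convictions only:  {narrow / total:.0%}")    # 20%
print(f"'criminal' share, including pending: {broad / total:.0%}")     # 50%
print(f"'criminal' share, any violation:     {broadest / total:.0%}")  # 100%
```

Nothing about the population changed between the three lines. Only the boundary moved.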
System defenders can always invoke the unobserved. "The truly dangerous actors are still out there—that's why we can't scale back." "You don't see the threats we're preventing." These claims are difficult to falsify precisely because they invoke what cannot be seen. But they should be weighed against what can be seen: if the observable patterns match throughput optimization, the burden is on defenders to explain why the unobservable reality is so different.
Inference is not proof. The method generates strong evidence about what a system is optimized for, but it does not meet the standard of proof required for a criminal conviction or even an administrative finding. It tells you what is most likely true, not what can be proven beyond doubt. This is a limitation in legal contexts, but it is the ordinary condition of understanding complex systems. We rarely have proof; we have patterns and inference.
The Question That Metrics Answer
Every measurement system answers a question, whether it intends to or not.
When an institution measures volume and does not measure accuracy, it is answering the question: "How many?" It is not answering the question: "How well?"
When an institution measures outputs and does not measure outcomes, it is answering the question: "What did we do?" It is not answering the question: "What difference did it make?"
When an institution measures actions taken and does not measure harms produced, it is answering the question: "Are we active?" It is not answering the question: "Are we effective?"
The question that gets measured is the question that gets answered. And the question that gets answered shapes behavior, directs resources, and determines what success looks like.
To change what an institution does, you must change the question it is answering. This means changing what gets measured—adding accuracy to volume, outcomes to outputs, harms to actions. Without that change in measurement, the change in rhetoric is meaningless. The system will continue optimizing for the question the metrics ask, regardless of what the mission statement proclaims.
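In code terms, changing the question means computing a different function over the same records. A minimal sketch with invented outcomes: the volume metric and the accuracy metric disagree about the same eight cases.

```python
# The same records answer different questions depending on what you
# compute. Outcomes here are invented for illustration.
cases = [
    # (action_taken, target_was_high_priority)
    (True, True), (True, False), (True, False), (True, True),
    (True, False), (False, True), (True, False), (True, False),
]

# "How many?" -- the volume question a throughput dashboard answers.
volume = sum(action for action, _ in cases)

# "How well?" -- the accuracy question, answered only if someone asks it:
# of the actions taken, how many hit a high-priority target?
hits = sum(action and priority for action, priority in cases)
precision = hits / volume

print(f"volume: {volume} actions")              # 7
print(f"precision: {precision:.0%} on-target")  # 2/7, about 29%
```

A dashboard that displays only the first number reports a busy, successful month. Adding the second number turns the same month into an open question.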
What Comes Next
We have now established that the gap between rhetoric and measurement reveals the true optimization target, and that observed behavior patterns allow us to infer what systems are actually for.
But measurement does not happen on neutral ground. Before data can be collected and patterns can emerge, categories must be defined. What counts as "criminal"? What counts as "error"? What counts as "force"? These definitional choices operate upstream of any measurement—and they determine what can be seen, compared, and contested.
The next installment examines category control: the hidden battle over definitions that shapes what data can mean.
This essay is part of an ongoing series on metric governance and accountability.