The Hidden Battle Over Categories

How Definitions Determine What Can Be Seen


The previous installment established a method for reading institutional behavior: when rhetoric and measurement diverge, the measurement reveals the true optimization target. Observed patterns—who gets processed, how composition shifts over time, what happens to compliance channels—tell you what a system is actually for, regardless of its stated mission.

But this method has a vulnerability. It assumes that the categories being measured are stable and meaningful—that when we compare "arrests" over time, or examine the share of "criminals" in a population, we are comparing like to like.

This assumption is often false.

Before any data can be collected, before any pattern can emerge, categories must be defined. What counts as an "arrest"? What counts as a "criminal"? What counts as an "error" or a "use of force"? These definitional choices operate upstream of measurement itself. They determine not just what gets counted, but what can be counted—what is visible, comparable, and contestable within the system's official frame.

The battle over categories is quieter than the battle over data. It generates no headlines. It appears in footnotes, in policy manuals, in the fine print of reporting requirements. But it is often decisive. An institution that controls its own definitions can win the measurement contest before it begins.


The Power of Definitions

Consider a simple question: what proportion of enforcement actions involve "criminals"?

This seems like a straightforward empirical matter. Count the total actions. Count those involving people with criminal records. Divide. The answer should be a number.

But the number depends entirely on how "criminal" is defined.

Does "criminal" mean someone with a conviction? Or does it include people with pending charges—accusations that have not been proven and may be dismissed? Does it include people whose only offense is administrative rather than criminal in nature? Does it include decades-old misdemeanors? Does it include offenses that are crimes in some jurisdictions but not others?

Each definitional choice produces a different number. A system that defines "criminal" broadly—including pending charges, administrative violations, and minor offenses from decades past—can report high rates of "criminal" involvement even when very few of those processed have serious or recent convictions. A system that defines "criminal" narrowly would report lower rates from the same underlying population.
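To make the arithmetic concrete, here is a minimal sketch using an invented ten-person population (none of these figures are real) showing how the same underlying records yield very different "criminal" rates depending on the breadth of the definition:

```python
from dataclasses import dataclass

@dataclass
class Person:
    conviction: bool = False        # any criminal conviction on record
    serious_recent: bool = False    # serious conviction in recent years
    pending_charge: bool = False    # accusation not yet adjudicated
    admin_violation: bool = False   # administrative, not criminal, violation

# Hypothetical population of ten people processed by the system
population = (
    [Person(conviction=True, serious_recent=True)] * 1  # 1 serious, recent conviction
    + [Person(conviction=True)] * 2                     # 2 old minor convictions
    + [Person(pending_charge=True)] * 3                 # 3 pending charges only
    + [Person(admin_violation=True)] * 2                # 2 administrative violations only
    + [Person()] * 2                                    # 2 with no record at all
)

def share(population, is_criminal):
    """Fraction of the population counted as 'criminal' under a given definition."""
    return sum(is_criminal(p) for p in population) / len(population)

# Broad definition: any conviction, pending charge, or administrative violation
broad = share(population, lambda p: p.conviction or p.pending_charge or p.admin_violation)

# Narrow definition: serious, recent convictions only
narrow = share(population, lambda p: p.serious_recent)

print(f"broad definition:  {broad:.0%}")   # 80%
print(f"narrow definition: {narrow:.0%}")  # 10%
```

Same people, same records: the reported "criminal" share moves from 10% to 80% purely through the definitional choice.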

The definition is not neutral. It is a policy choice disguised as a technical one. And because it operates upstream of the data, it shapes what conclusions can be drawn from any subsequent analysis.

An institution that wants to claim it is targeting dangerous actors has an incentive to define "criminal" as broadly as possible. An institution that wants to acknowledge errors has an incentive to define "error" as narrowly as possible. An institution that wants to minimize reported use of force has an incentive to set the threshold for reportable force as high as possible.

These incentives do not require conspiracy or deliberate deception. They operate through the ordinary bureaucratic process of establishing reporting requirements. Someone must decide what the categories mean. Those decisions are made by people with institutional interests. The definitions that emerge tend to serve those interests—not because anyone is lying, but because the people making definitional choices are the same people whose performance will be evaluated using those definitions.


Category Manipulation in Practice

The manipulation of categories takes several forms, each with its own logic and effect.

Expansive definitions inflate favorable metrics. When an institution wants to claim success in targeting a particular population, it defines that population broadly. "Criminal aliens" can include anyone with any interaction with the criminal legal system, regardless of outcome. "Public safety threats" can include anyone whose presence is deemed undesirable, regardless of any specific threatening act. "Illegal aliens" can include people with pending legal claims, people in legal gray zones, and people whose status is contested—not just people with clear-cut violations.

The effect is to make the favorable metric—the share of bad actors among those processed—appear as high as possible. The definition does the work that enforcement quality cannot.

Narrow definitions suppress unfavorable metrics. When an institution wants to minimize reported problems, it defines those problems narrowly. "Use of force" might require physical contact causing injury, excluding drawn weapons, physical restraint, or property destruction. "Error" might require formal adjudication, excluding cases where someone was detained and later released without explanation. "Complaint" might require a specific procedural filing, excluding informal reports or concerns raised through other channels.

The effect is to make the unfavorable metric—the rate of problems—appear as low as possible. Incidents that would intuitively count as force, error, or complaint are defined out of the category and thus out of the count.

Categorical ambiguity prevents comparison. When definitions are inconsistent over time, across jurisdictions, or between official and independent sources, comparison becomes impossible. If "arrest" meant one thing last year and something different this year, trend analysis is meaningless. If "criminal conviction" means something different in federal reporting than in state records, reconciliation fails. If official categories do not map onto categories used by researchers, journalists, or advocates, the conversation becomes a dispute about definitions rather than about facts.

This ambiguity can be accidental—the result of evolving bureaucratic practice without coordination. But it can also be strategic. An institution that cannot be pinned down on definitions cannot be pinned down on performance. "You're measuring something different than we are" is a complete defense against any external analysis, regardless of what that analysis shows.


Denominators: The Hidden Half of Every Statistic

Categories define what goes into the numerator—who or what gets counted. But equally important is the denominator—what the count is measured against.

A statistic without a denominator is almost meaningless, yet institutions routinely report numerators alone.

"We arrested 10,000 criminals this year." Is that a lot? It depends on how many criminals there were to arrest, how many resources were deployed, what proportion of total arrests this represents, and how it compares to prior years. Without a denominator, the number is pure rhetoric—impressive-sounding but uninterpretable.

"Complaints decreased by 20%." Did complaints decrease because problems decreased, or because the process for filing complaints became more burdensome? Did the population subject to the institution's authority decrease? Did fear of retaliation increase? The numerator alone cannot answer these questions.

Denominators are where institutions exercise quiet control over what their statistics can mean.

Arrests per capita versus raw counts. An institution can report rising arrest totals as evidence of increasing effectiveness, even if the per-capita rate is falling because the target population has grown. Or it can report falling per-capita rates as evidence of restraint, even if raw totals are increasing. The choice of denominator determines the story.

Convictions versus charges. Reporting the share of those processed who had a "criminal history" sounds rigorous. But if "criminal history" is defined to include pending charges that are later dismissed—accusations that never result in conviction—the statistic overstates the share with actual criminal records. Here the definitional choice inflates the numerator; measured against the same denominator, the same population yields a higher rate.

Complaints received versus complaints substantiated. An institution can report "complaints received" as evidence of transparency and responsiveness. But if most complaints are dismissed without investigation, or if "substantiated" is defined so narrowly that almost nothing qualifies, the number of complaints received tells you nothing about the number of actual problems. The choice of which denominator to emphasize—or whether to report a denominator at all—shapes public perception.

The most powerful move is often to report no denominator at all. Raw numbers, presented without context, allow the audience to supply their own interpretation—which will typically be the interpretation the institution prefers.
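A small sketch, using invented figures, of how the choice between raw counts and per-capita rates reverses the story told by the same arrest data:

```python
# Hypothetical yearly arrest totals and the size of the target population
arrests    = {2022: 8_000, 2023: 9_000, 2024: 10_000}
population = {2022: 400_000, 2023: 500_000, 2024: 625_000}

for year in arrests:
    raw = arrests[year]
    per_capita = arrests[year] / population[year] * 1_000  # arrests per 1,000 people
    print(f"{year}: {raw:>6,} arrests  |  {per_capita:.1f} per 1,000")

# Raw totals rise every year (8,000 -> 10,000): "increasing effectiveness."
# Per-capita rates fall every year (20.0 -> 16.0): the opposite story.
```

Both numbers are "true." Which one an institution reports, and whether it reports a denominator at all, determines which story the audience hears.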


Metric Mimicry: The Absorption of Accountability Language

As external pressure for accountability increases, institutions face a choice. They can resist transparency, which is costly and generates conflict. Or they can adopt the language of transparency while controlling the substance.

This is metric mimicry: the adoption of harm-adjacent metrics in forms that sterilize them.

An institution under pressure to track use of force might establish a "use of force reporting system"—but define reportable force so narrowly that most coercive encounters are excluded. The system exists. Reports are filed. Statistics are published. But the statistics capture only a small and unrepresentative slice of actual force, and the institution can point to its "transparency" as evidence of accountability.

An institution under pressure to address errors might establish an "error review process"—but define reviewable errors so narrowly, and make the review process so cumbersome, that very few cases are ever examined. The process exists. Reviews occasionally occur. But the rate of formally acknowledged errors remains low regardless of the actual rate of mistakes.

An institution under pressure to respond to community concerns might establish a "community feedback mechanism"—but structure it so that feedback is collected, acknowledged, and filed without any requirement for response or action. The mechanism exists. Feedback is received. But nothing changes, because nothing is required to change.

Metric mimicry is particularly effective because it occupies the conceptual space that genuine accountability would occupy. Once an institution has a "use of force reporting system," critics who call for use of force tracking can be told that tracking already exists. The demand has been formally satisfied. The fact that the tracking is designed to minimize what it captures is a technical detail, invisible to anyone who does not examine the definitions closely.

This is absorption as a strategy: adopting the form of accountability to neutralize the substance.


The Upstream Battle

All of this means that the contest over institutional behavior is not just a contest over data. It is a contest over the categories that data can occupy.

When independent analysts attempt to assess an institution's performance, they face a prior problem: they must either accept the institution's categories or construct their own. Each choice has costs.

Accepting official categories means working within a framework designed by the institution being analyzed. If "criminal" is defined to include pending charges, the analyst who uses official data will reproduce that definition in their findings. If "use of force" excludes drawn weapons, the analyst will undercount force. The official categories shape what can be seen, even by critics.

Constructing independent categories means building a parallel measurement system—which requires access to raw data that institutions rarely provide, resources that independent analysts rarely have, and methodological choices that institutions will challenge as illegitimate. "That's not how we define it" is a ready response to any finding that uses non-official categories, and the burden falls on the critic to justify their alternative.

This asymmetry gives institutions a structural advantage. They control the default definitions. They control access to the data that would allow alternatives to be constructed. And they can always retreat to definitional disputes when substantive disputes become uncomfortable.

The battle over categories is upstream of the battle over facts. Winning the definitional contest often makes the factual contest unnecessary.


What Counter-Instrumentation Must Do

For counter-instrumentation to be effective—for independent measurement to genuinely contest official narratives—it must engage at the level of categories, not just data.

This means several things:

Document at the level of incidents, not official categories. If official definitions are unreliable, the raw incidents become the foundation. Time, place, what happened, who was involved, what force was used, what documentation was provided. These specifics can later be aggregated into whatever categories are analytically useful, rather than being pre-filtered through official definitions.

Make definitional choices explicit. Any analysis that counts "criminals" or "errors" or "uses of force" must specify exactly what is being counted. The definition should be stated clearly enough that readers can evaluate whether it is reasonable and that the institution cannot claim the analysis is measuring "something different" without identifying what that difference is.

Propose alternative denominators. If official statistics lack context, supply the context. Arrests per capita. Error rates as a share of total actions. Complaints per exposure to the institution's authority. The denominators force the numerators to mean something—and reveal when official statistics are designed to obscure rather than illuminate.

Track categorical drift. When definitions change over time, document the change. If "criminal" meant one thing in 2020 and something different in 2025, that shift is itself evidence. It suggests the institution is managing its metrics rather than its performance.

Contest metric mimicry directly. When an institution claims to have accountability mechanisms, examine what those mechanisms actually capture. If the "use of force reporting system" excludes most force, say so. If the "error review process" substantiates almost nothing, say so. The existence of a mechanism is not evidence of accountability; the mechanism's design is.
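As a sketch of the second recommendation above, a definition can be made explicit enough to be contested line by line by encoding it as data that travels with the analysis. Every name and threshold here (the field names, the ten-year lookback) is illustrative, not an official category:

```python
# An explicit, inspectable definition: each inclusion choice is a named,
# disputable line item rather than an unstated assumption.
CRIMINAL_RECORD_DEFINITION = {
    "includes_convictions": True,
    "includes_pending_charges": False,   # accusations are not records
    "includes_administrative": False,    # administrative violations are not crimes
    "lookback_years": 10,                # older convictions excluded
}

def has_criminal_record(person, definition, current_year=2025):
    """Apply the stated definition to one incident-level record (a dict)."""
    if definition["includes_pending_charges"] and person.get("pending_charge"):
        return True
    if definition["includes_administrative"] and person.get("admin_violation"):
        return True
    conviction_year = person.get("conviction_year")
    if definition["includes_convictions"] and conviction_year is not None:
        return current_year - conviction_year <= definition["lookback_years"]
    return False

# The definition is published alongside the findings, so a challenge must
# name the specific line it disputes, not a vague "you measured it wrong."
print(has_criminal_record({"conviction_year": 2019}, CRIMINAL_RECORD_DEFINITION))  # True
print(has_criminal_record({"pending_charge": True}, CRIMINAL_RECORD_DEFINITION))   # False
```

The point is not the code but the discipline: an analyst who publishes the definition this explicitly cannot be dismissed with "you're measuring something different" unless the institution identifies what the difference is.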

This is painstaking work. It lacks the drama of confrontation. It requires technical competence and sustained attention. But it is where the real contest often occurs. An institution that controls its categories can survive any amount of data collection. An institution whose categories are successfully contested loses control of its own narrative.


The Ontology of Power

At the deepest level, category control is a form of power that operates prior to politics.

Political contests assume a shared reality—a common set of facts about which people can disagree. But if institutions control the categories that constitute facts, they control what can be contested. Disputes that should be about substance become disputes about definitions. Questions that should be empirical become questions about methodology. The ground shifts beneath the argument.

This is why category control is so valuable and so fiercely protected. An institution that defines its own terms is an institution that can never be proven wrong in its own language. It can only be contested by those who refuse that language—and refusal puts the burden of proof on the challenger.

The vocabulary of accountability—"transparency," "oversight," "reporting," "metrics"—can be adopted by institutions without adopting the substance of accountability. The words remain; the meaning is hollowed out. And because the words are present, critics must do the additional work of showing that the words have been emptied—a harder task than simply pointing to an absence.

Counter-instrumentation, to be effective, must be a contest not just over what is measured but over what measurement means. It must propose alternative categories, defend them against institutional challenge, and insist that the definitions matter as much as the data.

This is the hidden battle. It is fought in footnotes and methodological appendices. It is invisible to most observers. But it determines what the visible battles can possibly achieve.


What Comes Next

We have now seen how systems produce harm through metric optimization, how they evade accountability through distributed responsibility, how behavior reveals true optimization targets, and how category control shapes what data can mean.

The next installment turns from diagnosis to response: what happens when people try to document institutional harm, why documentation so often fails to produce change, and what counter-instrumentation can and cannot realistically accomplish.


This essay is part of an ongoing series on metric governance and accountability.
