
philosophy-of-ai · Essay 6 · 24 min

Call-Center AI: A Case Study in Productivity, Surveillance, and Surplus

A grounded case study in the Philosophy × AI series. Brynjolfsson, Li, and Raymond's randomized rollout of a generative-AI assistant to 5,172 customer-support agents at a large software firm produced a 14 percent productivity gain that fell almost entirely on novice workers. That single result lets the four lenses (Rawls, Foucault, Marx, Beauvoir) bite, disagree, and produce different recommendations in a real domain rather than a constructed example. The essay also introduces the source upgrades the series needed: Selbst and colleagues on fairness and abstraction, Kellogg, Valentine, and Christin on algorithmic control, Ajunwa on the quantified worker, and Gray and Suri on ghost work.

Why a Case Study, and Why This One

The first five essays in this series gave four philosophical lenses for the AI economy and an elenchus on the optimistic productivity claim. The framework is right. The framework is also too clean. Real AI deployments do not respect the symmetry of a four-by-four table.

This essay is the corrective. It takes the strongest published study on workplace generative-AI productivity, names what it actually measured, and runs the four lenses through that one domain in enough depth to produce different recommendations, not one consensus verdict. The reader should leave the essay with a clearer sense of which lens binds in which situation and which questions the published evidence does not answer.

The anchor study is Brynjolfsson, Li, and Raymond's Generative AI at Work (2023), which examined the staggered rollout of a large-language-model-based assistant to 5,172 customer-support agents at a Fortune 500 software firm.1 Headline finding: a 14 percent average increase in issues resolved per hour, with the gains concentrated among the least-skilled agents. The highest-skill agents showed no measurable productivity change. Customer sentiment improved on average. Attrition fell. The study is the cleanest piece of public evidence for the productivity-gains-from-AI claim in customer-facing white-collar work.

This is the right anchor because four things are true at once:

  1. The productivity number is real and randomized.
  2. It is the friendliest case for the optimistic AI-productivity claim.
  3. It is the worst case for treating that claim as a complete answer.
  4. It maps onto a domain where two of the lenses (Foucault and Marx) bind hard regardless of the productivity number's sign.

A case study should expose, not flatter, the framework. This one does.

The Setup, Tersely

Before the lenses, the facts.

| What | Detail |
| --- | --- |
| Firm | A US software company; full identity not disclosed in the published paper |
| Workforce studied | 5,172 customer-support agents |
| Treatment | Phased rollout of an LLM-based "AI assistant" surfacing recommended replies during customer chats |
| Identification | Staggered rollout supplies a difference-in-differences design with phased treatment |
| Headline outcome | ≈ +14% issues resolved per hour, averaged across all agents |
| Heterogeneity | Gain concentrated in the lowest skill quartiles; top-quartile agents essentially unchanged |
| Customer sentiment | Net improvement in chat-transcript sentiment scores |
| Attrition | Lower in the treated arm |
| Open question 1 | Whether the productivity gain persists; published follow-up data are limited |
| Open question 2 | Where the productivity surplus went: agent wages, retained jobs, customer pricing, firm margin, AI-vendor licensing |
| Open question 3 | What logs the assistant produced, who could read them, and whether agents could contest them |
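The difference-in-differences identification in the table can be sketched with simulated numbers. A minimal two-group, two-period version of the design is below; every parameter (baseline rate, trend, lift, group size) is an illustrative assumption, not the study's data. The point is only that the control group absorbs the common time trend, isolating the treatment effect:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic panel: resolutions per hour for treated vs control agents,
# before and after rollout. All parameters are illustrative assumptions.
BASELINE = 2.0                # resolutions/hour before rollout (assumed)
TREND = 0.05                  # common improvement both groups see over time
TRUE_LIFT = 0.14 * BASELINE   # a ~14% treatment effect on the baseline

n = 2000
pre_ctrl = rng.normal(BASELINE, 0.3, n)
post_ctrl = rng.normal(BASELINE + TREND, 0.3, n)
pre_trt = rng.normal(BASELINE, 0.3, n)
post_trt = rng.normal(BASELINE + TREND + TRUE_LIFT, 0.3, n)

# Difference-in-differences: the control group's change nets out the trend.
did = (post_trt.mean() - pre_trt.mean()) - (post_ctrl.mean() - pre_ctrl.mean())
print(f"estimated lift: {did:.3f} resolutions/hour "
      f"({did / BASELINE:.1%} of baseline)")
```

A naive before/after comparison on the treated group alone would attribute the common trend to the assistant; the subtraction of the control group's change is what makes the staggered rollout informative.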

That is the case. Now the four audits.

The Rawlsian Audit

The headline finding looks Rawlsian on its face. The difference principle prefers arrangements where inequalities are arranged to the greatest benefit of the least advantaged. A productivity gain that lands on the lowest-skill quartile points in that direction.

That reading is too quick. Rawls's actual question is structural, not local. Where does the productivity surplus go?

The published paper measures issues resolved per hour and customer ratings. It does not measure how the firm split the gains. The full distribution table has at least six rows:

| Recipient | What "gain" looks like | Was it measured? |
| --- | --- | --- |
| Agents (raises) | Higher wages tied to higher productivity | Not reported |
| Agents (retained jobs) | Headcount that would otherwise have been cut is preserved | Indirectly, via lower attrition; mechanism unclear |
| Customers | Lower prices or improved service | Not reported |
| Firm (shareholders) | Operating margin improvement | Not reported |
| AI vendor | Per-seat licensing revenue | Not reported |
| Public revenue | Tax on increased earnings | Not reported |

None of those gaps is the study's fault; the distribution of the gains was not its research question. They are, however, the structural variables a Rawlsian institutional analysis cares about. "Novice agents do better at the task" and "novice agents end up better off in the basic structure" are different claims; the published evidence supports the first and is silent on the second.

Selbst, Boyd, Friedler, Venkatasubramanian, and Vertesi argue this point sharply for fairness specifically: technical interventions that abstract away the surrounding sociotechnical system can become ineffective or misguided as fairness instruments because the variables that determine real-world outcomes lie at the institutional layer the abstraction excluded.2 The same trap applies to productivity. A randomized field experiment measures the decision-rule-and-tool layer cleanly and is silent on the institutional layer above it.

The Rawlsian recommendation, therefore, is not "do not deploy AI assistants in call centers." It is: the productivity number is informative about the model and uninformative about the institution. The institutional question is a separate empirical project, not a corollary of the model evaluation.

The Foucauldian Audit

Call-center work was already heavily measured before generative-AI assistants arrived. Average handle time. Schedule adherence. Post-call survey scores. Call recording. The supervisor's view of the floor was already a dashboard.

The AI assistant adds two new categories of measurement. First, it sees every keystroke, every recommendation surfaced, every recommendation accepted, every recommendation overridden. The assistant's logs are themselves a worker-monitoring record. Second, "agent uses suggestion X percent of the time" becomes a tractable metric. Once tractable, it tends to become managerial.

Kellogg, Valentine, and Christin synthesize the management-research literature on what they call "algorithmic control": the shift from human supervisors directing work to algorithmic systems directing, monitoring, evaluating, and disciplining workers.3 The call-center assistant is a textbook instance: it directs (recommended replies), monitors (every interaction), evaluates (suggestion-acceptance rate), and ultimately can discipline (the rate becomes a performance metric).
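How quickly such logs become a metric is worth seeing concretely. A minimal sketch follows; the log schema is hypothetical, since the study publishes no schema for the assistant's logs:

```python
from collections import Counter

# Hypothetical assistant log: (agent_id, action), where action records what
# the agent did with each surfaced suggestion. No real schema is published.
log = [
    ("a17", "accepted"), ("a17", "edited"), ("a17", "overridden"),
    ("a17", "accepted"), ("b02", "accepted"), ("b02", "accepted"),
    ("b02", "overridden"),
]

def acceptance_rate(log, agent_id):
    """Share of surfaced suggestions this agent accepted verbatim."""
    counts = Counter(action for aid, action in log if aid == agent_id)
    total = sum(counts.values())
    return counts["accepted"] / total if total else 0.0

# The moment this number exists per agent, it can be ranked, thresholded,
# and attached to a performance review. That is the tractability point:
# the metric costs a few lines; the governance around it costs far more.
print(acceptance_rate(log, "a17"))
```

Nothing in the computation is sinister; the Foucauldian question is entirely about who reads the number, for what, and with what right of reply.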

Ifeoma Ajunwa's The Quantified Worker is the closest book-length treatment of what happens when this kind of measurement scales: every aspect of the worker becomes a continuous data stream, the legal and policy framework lags the technology by years, and the asymmetry between what the firm sees and what the worker can see widens.4

The Foucauldian audit does not say "surveillance bad." Some measurement protects workers. Recording is necessary for safety and for unpaid-overtime claims. The audit asks five harder questions.

| Question | What the published study says |
| --- | --- |
| Who can read the assistant's logs? | Not specified; presumably management. |
| Can the agent inspect the record before a performance review? | Not specified. |
| Can the agent contest a measurement they believe is wrong? | Not specified. |
| Are the logs admissible against the agent in a termination dispute? | Not specified. |
| What measurement boundaries does the firm refuse to cross even with technical capability? | Not specified. |

Five "not specifieds." That is exactly Foucault's point. The productivity result is published; the governance regime around the productivity result is not. The dashboard precedes its rules.

The recommendation that follows is concrete: the assistant's worker-facing artifact is a different system from the assistant's manager-facing artifact, and they should be specified separately. If there is no contestability, do not call it accountability. Call it control. That phrasing is from the framework essay; it lands harder here, on a real product, than it did as an abstract heuristic.

The Marxian Audit

The 14 percent productivity gain was not produced by the model alone. It was produced by:

  • the contact-center agents using the assistant (the workers in the study)
  • the engineers who built the assistant (employees of the AI vendor or of the firm)
  • the data labellers who annotated training examples (often outsourced to lower-wage labour markets)
  • the platform vendor selling the assistant as a per-seat license
  • the cloud provider charging per inference call
  • the prior generation of customer-support workers whose conversation logs trained the assistant in the first place

Mary Gray and Siddharth Suri's Ghost Work is the canonical reference for the data-labelling and prior-worker items: the human labour that is structurally invisible inside an "AI" product, scoped out of the labour-cost line, and counted as part of the model's apparent capability.5 The productivity gain reads cleanly in the headline because the data-labelling layer is already absorbed into the vendor's cost basis and the historical-conversation layer is absorbed into the training data.

A subtler form of absorbed labour shows up at the agents' own desks. The assistant recommends; the agent accepts, edits, or overrides. When the recommendation is wrong and the agent corrects it, the conversation succeeds and the productivity number moves up. The repair work is real labour and it is what made the metric look clean. In the Marxian frame, that repair labour was absorbed into "AI productivity." The agent's intelligence is part of the product specification of the AI assistant whether or not the firm names it as such.

The political question (whose surplus is this?) has a determinate answer only if the institutional context is specified. Three illustrative scenarios on the same productivity number:

| Scenario | Where the 14% lands | Distribution outcome |
| --- | --- | --- |
| Tight labour market, strong worker bargaining | Wages rise; firm absorbs vendor licensing as an ordinary cost | Agents capture a meaningful share |
| Loose labour market, vendor lock-in | Headcount cut by ~14%; remaining wages flat; vendor margin grows | Firm and vendor split the surplus; agents lose |
| Public-sector contracted call centers, multi-year procurement | Firm bills the public buyer a flat rate; productivity gain becomes vendor margin | Public buyer overpays for unchanged service; agents and citizens lose |

The same model and the same productivity number produce three different surplus distributions because the institution sitting around the model is different. That is not a Marxist slogan; it is the clean reading of the published evidence.
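The scenario logic can be made arithmetically explicit. Every number below (the wage bill, the allocation shares) is a hypothetical assumption for illustration; the point is only that the surplus is fixed by the model while its allocation is fixed by the institution:

```python
# Illustrative surplus accounting: the same 14% productivity gain on the same
# cost base, allocated differently by institutional scenario. All shares and
# the wage bill are hypothetical assumptions, not reported figures.

GAIN = 0.14
WAGE_BILL = 10_000_000            # annual agent wage bill (assumed)
SURPLUS = GAIN * WAGE_BILL        # value of labour freed up, crudely proxied

scenarios = {
    "tight labour market":   {"agents": 0.50, "firm": 0.40, "vendor": 0.10},
    "loose market, lock-in": {"agents": 0.00, "firm": 0.55, "vendor": 0.45},
    "platform-mediated":     {"agents": 0.00, "firm": 0.05, "vendor": 0.95},
}

def split(shares):
    """Allocate the fixed surplus according to one scenario's shares."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9  # surplus fully allocated
    return {party: round(share * SURPLUS) for party, share in shares.items()}

for name, shares in scenarios.items():
    print(name, split(shares))
```

The first line of each output is identical in total and radically different in composition, which is the entire Marxian point: nothing in the productivity measurement constrains the allocation.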

The Beauvoir Audit

Two agents at the same call-center see the same tool from different situations.

Agent A is a part-time worker: a single parent with two years of tenure, no four-year degree, a work visa tied to the current employer, and a non-portable schedule because of childcare. Agent B is a college student working evenings: lives at home, treats the part-time wage as supplemental income, and expects to leave the job within a year. Both receive the assistant. Both improve productivity by the average amount.

The productivity number is the same. The freedom that productivity number creates is not.

Agent A's gain looks like a margin of safety: the metric improves, attrition risk falls, the visa renewal is more likely, the schedule stays. Agent A's loss, if the firm uses the productivity gain to consolidate shifts and reduce headcount, is a category beyond compensation. It includes the visa, the school district the kids are in, the proximity to extended family. None of those are reflected in any productivity dashboard.

Agent B's gain looks like upward optionality: the metric improves, the dossier is stronger, the leverage to ask for a transfer or to leave is greater. Agent B's loss is bounded; another job is available, the rent is low, the timeline is short.

Beauvoir's argument is that the average is not wrong; the average is incomplete. Whose situation lets them turn this gain into mobility, and whose situation makes it a tightening of shifts? That is a different question from "did productivity rise?" and it is the question that decides whether the deployment counts as humane. The published study does not have demographic data adequate to answer it; this is a research opportunity, not a published result.

A second-level Beauvoir question: what happens to the agents who would have been hired but no longer are because the AI assistant compressed the headcount needed to handle the same call volume? They are the most invisible category in any AI-productivity study because they were never on the firm's payroll. The productivity number is silent on them by construction.

What the Four Audits Tell You, Together

Two of the lenses (Rawls, Beauvoir) push toward "wait, look at distribution and situation." Two of them (Foucault, Marx) push toward "the productivity number under-describes what changed." That is the right way to use the lenses: as four different audits run on the same firm-level claim, not as one consensus diagnosis.

A summary of where each lens lands:

| Lens | What the headline number does not tell you |
| --- | --- |
| Rawls | Where the surplus went: agent wages, firm margin, customer prices, vendor fees, public revenue |
| Foucault | Whether agents can inspect, contest, or refuse the assistant's logs as performance evidence |
| Marx | Whether the productivity gain absorbs worker repair labour and prior-worker conversation logs as if they were the model's output |
| Beauvoir | Whose situation lets them turn the productivity gain into mobility, and whose situation makes it a tightening of shifts |

The right next deployment question is therefore not "should we ship the assistant." The right deployment question is which of these four columns the firm has specified, and which it is silent on. The silences are policy. Not having a specified contestability path is itself a Foucauldian decision; not specifying where the surplus goes is itself a Marxian decision; not measuring situated impact is itself a Beauvoirian decision.

Three Plausible Futures, Same Productivity Result

The published study is point-in-time. Two years later there are three plausible futures, all consistent with the same underlying technology.

Future 1: integrated augmentation. The firm formalizes the assistant's logs as worker-visible. The contestability path is a real grievance procedure with a human reviewer. The productivity gain is partly absorbed into wage growth and partly into reducing the worst quintile of customer-experience problems. Headcount holds. The vendor licensing is one of several costs the firm renegotiates. The agents who were lowest-skill move up the skill ladder partly because the assistant made the ladder cheaper to climb.

Future 2: managerial intensification. The assistant's logs become the basis of weekly performance reviews. Agents who override the recommendation too often or accept it too often are flagged. The productivity gain accrues mostly to the firm and the vendor. Headcount falls 12 percent. Attrition continues to fall, but partly because the labour-market alternative for the displaced agents is worse. The bottom quartile is helped relative to not using the tool and is hurt relative to a counterfactual deployment where the same gain landed in wages.

Future 3: platform-mediated invisibility. The firm contracts the call-center function to a vendor whose offering bundles the assistant. The vendor charges per resolved ticket. Agents become contractors of the vendor rather than employees of the firm. Almost all of the productivity gain becomes vendor margin. Worker-facing logs are now governed by the vendor's terms of service. The agents are technically more productive, technically less directly supervised, and technically unable to contest a single record.

All three futures are consistent with the published 14 percent. The framework essays in this series explain why. The case study makes the choice between them concrete enough to argue about.

Where the Case Breaks

Limits on the use of this case as a stand-in for "AI in the workplace":

  • The published evidence is one industry, one firm, one type of work, in one labour market. The productivity gain in customer support is unusually clean to measure; in software engineering, the gain is contested and depends heavily on task type; in nursing, the assistant's product-shape would be entirely different.
  • The lowest-skill-gains-most pattern may not reproduce. Subsequent deployments may show productivity gains accruing to higher-skill workers, no aggregate gain, or productivity gains paired with quality declines that the accompanying measurement did not catch.
  • The customer-sentiment improvement is from chat-transcript sentiment scores, which are themselves model outputs and inherit the same alignment limitations as the assistant.
  • The firm's institutional choices in this case (logs, contestability, surplus split) are private; the analysis above describes plausible options, not specific firm policies.

The case study earns its conclusions only when the limits are stated. The four lenses still produce four audits.

The Generalization

A reader can re-run this exercise on three other domains. The same four columns apply; the binding lens changes.

| Domain | Which lens binds hardest |
| --- | --- |
| Résumé screening at scale | Rawls: the system rations opportunity, and the binding question is whether the institutional pipeline that uses the model gives the least-advantaged candidate a route in |
| Warehouse worker monitoring | Foucault: measurement is the product, the contestability path is the design problem, and worker bargaining about which behaviours count is more important than the model's accuracy |
| AI coding assistants | Marx: the question of who captures the productivity gain over five years (junior workers? senior workers? employers? cloud vendors? open-source maintainers?) is genuinely open and depends on bargaining outside the model |
| Care work / nursing assistants | Beauvoir: situated freedom is the binding axis, with the question of whether the assistant gives the worker more discretion or restructures the workflow into more compliance |

That is the practical version of "make the lenses argue." Different domains foreground different lenses. Which domain you choose decides which lens is doing the most work.

Five Hypotheses That the Headline Number Cannot Answer

The case study is most useful as a research design rather than as a settled verdict. Each lens turned a silence into a hypothesis the published evidence cannot adjudicate. Naming the hypotheses explicitly is what makes the framework portable.

| Hypothesis | What it predicts | What would falsify it | Lens |
| --- | --- | --- | --- |
| H1 | AI assistance compresses the novice learning curve: time to a given productivity level shortens. | Treated novices show no faster ramp than untreated novices on tasks the assistant did not help with. | Rawls / Marx |
| H2 | Compression weakens the wage premium for experienced agents: the productivity advantage of experience converges toward zero. | Experienced wages keep their premium; senior agents retain higher per-hour pay despite no productivity gap. | Marx |
| H3 | Suggestion-acceptance logs become performance-management data unless explicitly prohibited by contract or regulation. | Two years post-deployment, no employer in the relevant sector cites suggestion-acceptance rates in promotion or termination decisions. | Foucault |
| H4 | Firms and AI vendors capture most of the productivity surplus unless compensation, bargaining, or regulation intervene. | Wage data over a 3–5 year window shows the surplus split with workers in proportion to the productivity gain. | Marx / Rawls |
| H5 | "Lower attrition" is ambiguous as a worker-welfare signal without satisfaction, outside-option, wage, and scheduling data. | Survey and labour-market data show treated agents report higher satisfaction, comparable outside options, and more predictable schedules. | Beauvoir |

The Brynjolfsson, Li, and Raymond paper does not test any of these directly. It was not its job. The hypotheses are testable, falsifiable, and require longitudinal wage data, contract analysis, regulatory filings, worker surveys, and audit access to suggestion-log usage. They are the kind of follow-up work the productivity literature still owes.
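H1, for instance, is cheap to prototype as a research design. The learning-curve parameters below are simulated assumptions chosen only to show what the test would measure, namely time-to-proficiency under two ramp rates:

```python
import numpy as np

rng = np.random.default_rng(7)

# H1 sketch: weeks for a simulated novice to first reach a target productivity
# level, with and without the assistant. All parameters are assumptions.
TARGET = 2.0  # resolutions/hour counted as "proficient" (assumed threshold)

def weeks_to_target(ramp_rate, n=500):
    """Mean weeks until a simulated novice's productivity reaches TARGET."""
    weeks = []
    for _ in range(n):
        level, week = 1.0, 0
        while level < TARGET and week < 100:
            week += 1
            level += rng.normal(ramp_rate, 0.02)  # noisy weekly improvement
        weeks.append(week)
    return float(np.mean(weeks))

untreated = weeks_to_target(ramp_rate=0.05)  # ~20 weeks to close a 1.0 gap
treated = weeks_to_target(ramp_rate=0.10)    # assistant doubles the ramp (assumed)
print(f"mean weeks to proficiency: untreated {untreated:.1f}, treated {treated:.1f}")
```

H1 predicts the treated ramp is shorter; the falsification column in the table corresponds to the two means converging, especially on tasks the assistant did not touch.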

A reader who comes away with the productivity number alone has read the abstract. A reader who comes away with H1-H5 has the research agenda.

Why This Matters for the Series

This essay is the case study the series needed. The lenses arrive at different recommendations. The published evidence is real and is also incomplete in specific, traceable ways. The framework does not collapse under the weight of a real example; it gets sharper.

The synthesis the series has been building toward fits in one line:

A technical system can be corrected by feedback. A worker needs recourse. Do not confuse correction in the model with correction in the institution.

That distinction is the through-line. Locke / Hume / Kant gave the epistemic correction story (essay 1). Plato gave the source-and-experience correction story (essay 2). Rawls / Foucault / Marx / Beauvoir give the institutional correction story (essays 4 and 5). The case study (this essay) shows the gap between the two senses of "correction" in one concrete domain. The next essay in the series, The Right to Correct the Record (forthcoming), is the explicit synthesis: a model can be corrected by feedback; a worker, applicant, or citizen needs recourse, contestability, and power.

This case study is the bridge.

Sources

The anchor study:

  • Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. "Generative AI at Work." Quarterly Journal of Economics 140, no. 2 (2025): 889–942. Originally NBER Working Paper 31161, 2023. The randomized-rollout study this essay is built on.
Workplace and algorithmic-control sources (new in this essay relative to the rest of the series):

  • Selbst, Andrew D., danah boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi. "Fairness and Abstraction in Sociotechnical Systems." FAT* 2019. The canonical statement of why fairness interventions abstracted from the surrounding sociotechnical system can fail as fairness instruments.
  • Kellogg, Katherine C., Melissa A. Valentine, and Angèle Christin. "Algorithms at Work: The New Contested Terrain of Control." Academy of Management Annals, 2020. Synthesizes the algorithmic-control literature: directing, evaluating, disciplining, and tracking workers.
  • Ajunwa, Ifeoma. The Quantified Worker: Law and Technology in the Modern Workplace. Cambridge University Press, 2023. Book-length treatment of workplace data collection and the legal lag.
  • Gray, Mary L., and Siddharth Suri. Ghost Work. Houghton Mifflin Harcourt, 2019. Hidden human labour inside apparently autonomous AI products.

Footnotes

  1. Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. "Generative AI at Work." Quarterly Journal of Economics 140, no. 2 (2025): 889–942. Originally NBER Working Paper 31161, 2023. https://academic.oup.com/qje/article/140/2/889/7990658

  2. Selbst, Andrew D., danah boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi. "Fairness and Abstraction in Sociotechnical Systems." FAT* 2019, January 2019. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3265913

  3. Kellogg, Katherine C., Melissa A. Valentine, and Angèle Christin. "Algorithms at Work: The New Contested Terrain of Control." Academy of Management Annals 14, no. 1 (2020): 366–410. https://journals.aom.org/doi/10.5465/annals.2018.0174

  4. Ajunwa, Ifeoma. The Quantified Worker: Law and Technology in the Modern Workplace. Cambridge University Press, 2023.

  5. Gray, Mary L., and Siddharth Suri. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Houghton Mifflin Harcourt, 2019.