Evidence (6869 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
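As a rough illustration of how the matrix can be read, the sketch below computes each direction's share of a row and a simple net-direction score. The counts are copied from three rows of the table. The `net_direction` summary and both function names are assumptions introduced here, not a metric defined by the source. Because the Total column can exceed the sum of the four direction columns, shares are taken over the four listed directions only.

```python
# Direction counts copied from three rows of the evidence matrix.
ROWS = {
    "Governance & Regulation": {"positive": 826, "negative": 400, "mixed": 191, "null": 122},
    "AI Safety & Ethics": {"positive": 218, "negative": 277, "mixed": 65, "null": 33},
    "Job Displacement": {"positive": 12, "negative": 80, "mixed": 20, "null": 1},
}


def direction_shares(counts):
    """Return each direction's share of the four listed direction counts."""
    classified = sum(counts.values())
    return {direction: n / classified for direction, n in counts.items()}


def net_direction(counts):
    """Hypothetical summary score: (positive - negative) / (positive + negative).

    Ranges from -1 (all directional claims negative) to +1 (all positive);
    mixed and null claims are ignored. This is an illustrative assumption,
    not a statistic reported by the source.
    """
    pos, neg = counts["positive"], counts["negative"]
    return (pos - neg) / (pos + neg)


if __name__ == "__main__":
    for outcome, counts in ROWS.items():
        shares = direction_shares(counts)
        print(
            f"{outcome}: positive share {shares['positive']:.1%}, "
            f"net direction {net_direction(counts):+.2f}"
        )
```

A score near zero indicates an outcome where positive and negative findings roughly balance, while a strongly negative score (as for Job Displacement) indicates that most directional claims report harm.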
Governance
Automated market and model optimization create economic efficiencies but reduce transparency for buyers, sellers, and regulators (Efficiency vs opacity trade-off).
Auction and market analysis literature and theoretical arguments; examples from RTB market structure and opaque bid optimization policies discussed; no new controlled experiment provided.
More targeted messaging can improve relevance and conversion but increases risks of nudging and informational harms (Relevance vs manipulation trade-off).
Conceptual trade-off illustrated via causal inference and targeting literature; supported by empirical studies in cited literature (not reproduced here) showing higher conversion with targeting and separate literature on persuasion risks.
The economic performance, social impacts, and durability of AI-driven advertising are determined as much by institutional arrangements (platform design, governance, regulation, market structure) as by model accuracy.
Theoretical and institutional analysis, case-study style arguments and literature references; paper does not present new randomized or large-sample empirical results quantifying the relative contribution.
Federated systems can lower barriers for advertisers and publishers who previously lacked aggregated data, but they also create coordination and infrastructure costs that may favor organizations able to invest in shared infrastructures or consortium governance.
Economic analysis and policy discussion outlining effects on entry, competition, and coordination costs. Evidence is conceptual; no empirical market-entry case studies provided.
Automation reshapes job tasks — reducing demand for some routine manual roles while increasing demand for technical, supervisory, logistics-planning, and service roles — implying substantial reskilling needs rather than outright net job collapse.
Labor-market analysis using occupational employment and job-posting data (task content), supplemented by qualitative interviews and surveys tracing task changes and reskilling needs; scenario sensitivity checks on net employment under alternative adoption paths.
Labor market institutions (unions, collective bargaining), education and training systems, social safety nets, and regulations substantially mediate distributional and aggregate outcomes of AI adoption.
Comparative institutional analysis and equilibrium models linking institutional settings to wage-setting and reallocation dynamics, supported by empirical cross-jurisdiction comparisons where available.
Developing economies face different trade-offs from AI adoption than advanced economies, due to different occupational structures and complementarities.
Comparative analyses and sectoral studies drawing on cross-country microdata and institutional comparisons; theoretical models highlighting differences in task composition and absorptive capacity.
Occupational reallocation occurs: declines in some routine occupations alongside growth in AI-complementary roles (e.g., AI maintenance, oversight, and creative tasks).
Administrative and household employment data analyzed with occupational breakdowns, supplemented by task-mapping methods and panel/event-study approaches documenting shifting occupational shares over time.
Lower-skill roles experience mixed outcomes: some see adverse effects from automation while others benefit where AI is complementary to their tasks.
Microdata analyses and case studies showing heterogeneous effects by task complementarity; task-based exposure measures that differentiate which low-skill tasks are automatable versus augmentable.
AI contributes to wage polarization: earnings grow at the top of the distribution and stagnate or fall for middle occupations.
Wage distribution decompositions and panel regression studies that examine percentile-level wage changes, combined with task-based exposure measures linking AI adoption to differential impacts across the wage distribution.
The employment impact of automation depends crucially on labour-market structure (formal vs informal), availability of alternative employment, and social protections.
Theoretical framing supported by secondary literature comparing institutional contexts and their mediating effects on automation outcomes; no primary causal estimates in this paper.
Standard policy responses focused on retraining and active labor-market programs are necessary but insufficient to fully offset structural job losses where K_T substitutes broadly for tasks.
Model simulations and policy experiments in the calibrated dynamic model comparing scenarios with aggressive retraining versus structural fiscal/interventionist reforms; discussion of empirical limits from case studies and historical reskilling outcomes.
Automation of routine drafting tasks by GLAI may reduce demand for junior drafting labor while increasing demand for skilled reviewers, auditors, and legal technologists.
Labor-market reasoning based on task automation literature and illustrative vignettes; no labor-force survey or longitudinal employment data provided.
Improved efficiency of data centres significantly reduces capacity needs and system peaks.
Counterfactual/efficiency-improvement scenarios within the optimisation model showing lower capacity requirements and peak loads.
The Order should be read as policy that privileges state and cloud-provider access over broader democratic accountability and social considerations (labor, education, culture, the commons).
Synthesis of textual absence of social-domain terms in the EO, the EO's access/control provisions, and the paper's political-economic critique.
Structurally, the Order is not deregulation but re-regulation centered on state access and cloud rent—a policy instantiation of technofeudalism with a security face.
Political-economic analysis connecting EO provisions (access, testing, state capabilities) with literature on cloud capital and technofeudalism (e.g., Varoufakis) and the paper's archival operators.
The Order mandates testing for 'advanced cyber capabilities' but omits or fails to adopt benchmark frameworks (e.g., Reasoning Under Load (RUL), PER, DSL, IPF, Diversity Contraction, Constitutive Provenance) that the Crimson Hexagonal Archive has deposited.
Comparative policy analysis between the EO's testing mandate language and the list of evaluation frameworks deposited by the Crimson Hexagonal Archive; textual absence of those benchmarks in the EO.
The Order's call for a 'voluntary' corporate framework operates as a 'Mediation Ratchet' that strengthens corporate governance control rather than providing substantive public protections.
Critical/theoretical reading of the Order's voluntary mechanisms combined with the paper's Mediation Ratchet concept.
The Order formalizes an 'AI caste system' that stratifies access into public tiers (e.g., Opus 4.8) and frontier/privileged tiers (e.g., Mythos Preview / Glasswing).
Policy text read against observed product/access tiers in industry; theoretical framing of access stratification.
The paper presents the 'Anthropic arc' (Feb 27 supply-chain-risk designation → June 1 IPO filing → June 2 EO endorsement) as a worked example of 'Institutional-Prior Foreclosure' via state co-optation of a firm.
Chronological mapping of public events (designation, IPO filing, EO) and interpretive analysis linking them as an example of state-firm coordination/co-optation.
Engagement is systematically tied to the intensive, performative labor of children (the platform rewards commodification of the child's identity and labor over traditional advertising), which challenges policy frameworks focused solely on financial trusts.
Synthesis/interpretation based on observed correlations and within-channel view premiums for performative and emotional-bait content versus lack of premium for explicit product placement; policy implication drawn by authors.
Governance ambiguity is responsible for 61% of hybrid workflow failures (and the framework aims to remediate this).
Paper reports 'governance ambiguity responsible for 61% of hybrid workflow failures' as a documented gap; no methodological details or sample size provided in the abstract.
Attribution failures occur in 68% of organizations (and the framework addresses these attribution failures).
Paper states 'attribution failures in 68% of organizations' as a documented gap the constructs address; abstract does not report study method or sample size behind the 68% figure.
Public discourse often portrays AI as a threat to employment.
Statement in the paper summarizing public/media discourse; no specific survey or corpus size reported in the excerpt.
The negativity asymmetry has both token-level and semantic components, though attributing the balance is exploratory at our sample sizes.
Exploratory decomposition analyses reported as follow-ups suggesting both low-level (token) and higher-level (semantic) contributions to asymmetry; authors note limited sample size for attribution.
Consolidation creates platform monopolies extracting value from professional labour while eliminating the expertise that creates it.
Synthesis of market concentration data and theoretical frameworks (platform capitalism) presented in the paper.
AI implementation serves vendor interests in labour cost reduction rather than improving information access.
Analytic argument supported by synthesis of vendor consolidation data, documented implementations, and theoretical analysis of vendor incentives.
Librarians bear operational accountability for systems they neither control nor can modify.
Critical qualitative synthesis including a revelatory case study of verification infrastructure failure and theoretical framing (platform capitalism, sociology of professions, critical information science).
The tech industry's discourse of exceptionalism obscures its dependence on BPOs to externalise labour costs and accountability.
Argument in paper supported by the authors' GDPR-based document findings that reveal BPO involvement and contract practices; specific linkage details not provided in the excerpt.
Analysis of four additional platforms suggests the attack may generalise across the knowledge-graph ecosystem.
Authors report analysis across four additional platforms and observe indications that the attack generalises; specific platform names and quantitative outcomes not provided in the summary.
An attacker sophistication gradient reveals discrete break points, a minimum skill at which trust flips from 0% to 100%, reframing the attack as a question not of whether but of how much.
Experiments varying attacker sophistication levels reported by authors; observed threshold behavior (discrete break points) in model trust outcomes.
The fragile metric fails manipulation invariance and cannot support the same useful predeclared class-coverage certificate; under the envelope-level certificate, it produces large violations at every tested instance, with a large mean gaming gap across random catalogs at a fixed audit budget.
Empirical/experimental results reported in the paper based on the three verification methods (finite-state enumeration, SMT checks, PRISM-games MDP); claims about 'large violations' and 'large mean gaming gap' are based on tested instances and random catalog experiments described in the paper.
Gradient-based attribution can be inflated by adversarial inputs, and detecting such inflation requires external baseline data.
Adversarial-testing experiments reported in paper that demonstrate inflation of attribution by adversarial inputs and indicate detection depends on availability of external baseline data.
Unless targeted interventions occur — including inclusive education, vocational training, and labor reforms — AI may exacerbate poverty and joblessness.
Inference and policy recommendation based on the systematic review's identification of risks; presented as a conditional/forecast rather than a measured causal estimate in the summary.
Analysis of implementation ambiguities reveals these challenges in practice.
Paper reports analysis of implementation ambiguities (qualitative/examples); no quantitative sample size or systematic empirical evaluation described in the summary.
Because this leakage arises from delegation itself, it cannot be mitigated at the prompt level.
Paper's argument combining theoretical reasoning about delegation-induced channels and experimental evidence showing prompt-level confidentiality instructions do not prevent inference (as implied by the numeric-budget comparison). Specific experimental details not provided in excerpt.
Existing approaches to AI explainability, grounding, and hallucination detection do not address input fidelity because they focus on output quality.
Argument in the paper contrasting prior work on explainability and hallucination detection with the problem of input fidelity; based on literature review and conceptual analysis.
Human advisors suppressed warnings under pressure at two to four times the AI rate.
Comparison between human benchmark (1,201 participants) and LLM outputs (3,360 conversations) in the preregistered experiment; reported suppression rates for humans were 2–4x those for AIs.
Because experienced workers are aging out of the workforce, simultaneous curtailment of formative occupational layers by platforms may create a shortage of workers able to manage complex systems.
Argument combining demographic observation (aging workforce) with the paper's theoretical claim about erosion of entry-level apprenticeship layers; no empirical test or quantified projection provided.
Microsoft's realized routing bias has been voluntarily constrained by a March 2026 multi-model pivot.
Paper's descriptive assessment based on observable product/strategy events (March 2026 pivot) and how that affects routing bias in the comparative mapping.
Models are beginning to be deployed to generate revenue for the companies that created them through advertisements, creating potential conflicts of interest between company incentives and users' best interests.
Conceptual/observational claim advanced in the paper motivated by industry deployment trends and the authors' framework; not a quantified experimental result in the abstract.
Unstructured physical trades and high-stakes caretaking roles exhibit absolute resilience to LLM-driven automation (i.e., very low OAI), quantifying a 'Cognitive Risk Asymmetry.'
Empirical classification from computed OAIs showing low exposure for unstructured physical trades and high-stakes caretaking roles; the excerpt does not provide specific OAI values or counts.
Variance-based Human-in-the-Loop (HITL) validation with an expert panel demonstrates a profound cognitive gap: isolated algorithmic probabilities fail to encapsulate the "institutional premium" imposed by experts bounded by professional liability.
Empirical validation procedure reported: variance-based HITL validation involving an expert panel that compared algorithmic scores and expert adjustments, concluding a systematic difference attributed to institutional liability considerations. The excerpt does not give panel size or quantitative variance statistics.
Industry self-regulation has demonstrably failed, motivating the need for IASCA.
Proposal asserts a 'demonstrated failure of industry self-regulation' as rationale for IASCA; no specific empirical studies, incidents, or metrics are cited in the provided text.
The deployment's measured machine-equivalent work appeared on no financial statement, workforce report, or government statistical return.
Claim about absence of reporting for the deployment's measured work (asserted in the paper for the deployment case).
The emergence and diffusion of these technologies create an era of labor displacement.
Framed in the paper as a premise motivating policy proposals; presented as a conceptual claim rather than supported by original empirical estimates in the text provided.
Technological transformation in agentic finance is economically inevitable, and proactive intervention is critically urgent.
Author claim synthesizing the paper's argument and modeling results (normative conclusion based on earlier analysis and assertions, not a validated empirical finding).
Beyond an environment-specific optimum, scaling further degrades institutional fitness because trust erosion and cost penalties outweigh marginal capability gains.
Analytical argument from the Institutional Scaling Law together with illustrative examples and discussion of mechanisms (trust erosion, cost penalties) in the paper.
Bias effects vary by vulnerability type, with injection flaws being more susceptible to framing bias than memory corruption bugs.
Subgroup analysis in Study 1 comparing framing sensitivity across vulnerability classes (injection vs memory corruption) within the experiment dataset.
Model convergence in DRL can lead to crowded trades, which has implications for market stability and motivates a robust regulatory framework balancing innovation with market stability.
Analytical argument in the paper linking convergence/crowding to systemic effects; the excerpt does not include empirical market-impact studies, simulations, or measured incidence rates of crowding.