Evidence (3470 claims)

Claims by category:

- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5877 claims
- Human-AI Collaboration: 5157 claims
- Innovation: 3492 claims
- Org Design: 3470 claims
- Labor Markets: 3224 claims
- Skills & Training: 2608 claims
- Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Filter: Org Design
A foreign-state-actor threat model for enterprise identity governance, establishing that Silk Typhoon, Salt Typhoon, Volt Typhoon, and North Korean AI-enhanced identity-fraud operations have already operationalized AI identity vulnerabilities as active attack vectors.
Paper claims to provide a threat model and asserts these named actors have operationalized AI identity vulnerabilities; stated grounding implied to be threat intelligence and incident analysis, though not detailed in the excerpt.
Nation-state actors including Silk Typhoon and Salt Typhoon have operationalized ungoverned machine credentials as primary espionage vectors against critical infrastructure.
Asserted in paper and described as grounded in threat intelligence; no specific threats, incidents, or data described in the excerpt.
A single ungoverned automated agent produced $5.4–10 billion in losses in the 2024 CrowdStrike outage.
Statement in paper attributing a $5.4–10B loss to an ungoverned automated agent during the 2024 CrowdStrike outage; no citation or method shown in excerpt.
No integrated framework exists to govern machine identities (AI agents, service accounts, API tokens, automated workflows).
Asserted in paper as a gap in existing governance frameworks; no empirical test or survey reported in the excerpt.
Automated agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1.
Statement in paper (asserted prevalence); no sample size or data source provided in the excerpt.
Foundation-model usage can increase compute-related emissions.
Conceptual/environmental concern highlighted in the paper about the carbon footprint of heavy model use and persistent storage; no quantified emissions analysis or lifecycle assessment presented.
These systems can cause skill atrophy.
Theoretical risk articulated in the paper that reliance on AI assistance may degrade human skills over time; no longitudinal skill-measurement or experimental evidence provided.
The same foundation-model systems can also intensify surveillance.
Cautionary claim in the paper noting the surveillance risk of durable, queryable traces and integrated tooling; presented as a conceptual risk rather than empirically measured increase in surveillance.
Baseline (non-structured) interactions had 16 of 50 (32%) accepted on first pass.
Reported counts in the paper for the baseline group (16 accepted of 50 baseline interactions).
In an observational study of documented interactions across four AI tools (Claude, ChatGPT, Cowork, Codex), incomplete context was associated with 72% of iteration cycles.
Observational study reported in the paper covering interactions across four AI tools; the paper reports the 72% figure.
This combination (rapid but uneven capability advance and lagging knowledge about harms/safeguards) creates a difficult policy condition: governments must decide under uncertainty across multiple plausible technological trajectories through 2030.
Reasoned argument in the article synthesizing foresight scenarios and the literature on uncertainty in AI progress (references to documents like OECD foresight and the International AI Safety Report 2026).
Knowledge about harms, safeguards, and effective interventions remains partial and lagged relative to capability advances.
Analytic claim in the article, supported by cited reports and literature that document gaps in understanding of harms and safeguards.
The expansion of AI in digital health has simultaneously introduced complex governance, privacy, and financial sustainability challenges.
Argument and synthesis across regulatory policy, ethics, and healthcare economics literatures presented in the review (literature review / conceptual synthesis).
Result 2: When managers are short-termist or worker skill has external value, the decision-maker's optimal policy can produce the augmentation trap, leaving the worker worse off than if AI had never been adopted.
Analytical result from the dynamic model comparing planner/objective variations (short-termist manager or externalities) and showing an outcome labeled the 'augmentation trap'.
Result 1: Even a decision-maker who fully anticipates skill erosion rationally adopts AI when front-loaded productivity gains outweigh long-run skill costs, producing steady-state loss: the worker ends up less productive than before adoption.
Analytical result from the dynamic model showing optimal adoption choice can lead to a steady-state where worker productivity is lower than pre-adoption (model-based comparative statics).
Experimental evidence shows that sustained use of AI tools can erode the expertise on which productivity gains depend (deskilling).
Statement in paper referencing experimental studies (no specific study, method, or sample size reported in the excerpt).
Aggressive compression increased total session cost by 67% despite reducing input tokens by 17%, because it shifted interpretive burden to the model's reasoning phase.
Result reported from the controlled experiment comparing log-format conditions; four conditions described but specific number of sessions/replications not provided in the abstract.
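The reported cost reversal can be illustrated with back-of-the-envelope token accounting. A minimal sketch, assuming hypothetical per-token prices and output-token counts (none of these figures are from the paper): when compression strips structure the model must then reconstruct during its reasoning phase, the extra output-side tokens can outweigh the input savings.

```python
# Hypothetical illustration (prices and token counts are assumptions,
# not values from the paper): compressing logs cuts input tokens, but if
# the model spends more reasoning/output tokens reconstructing discarded
# structure, total session cost can rise even though the input shrank.
IN_PRICE, OUT_PRICE = 3e-6, 15e-6   # $/token, assumed pricing

def session_cost(input_tokens, output_tokens):
    return input_tokens * IN_PRICE + output_tokens * OUT_PRICE

baseline = session_cost(100_000, 20_000)     # verbose logs
compressed = session_cost(83_000, 50_200)    # 17% fewer input tokens,
                                             # but more reasoning output
print(f"baseline   ${baseline:.2f}")
print(f"compressed ${compressed:.2f} ({compressed / baseline - 1:+.0%})")
```

With these assumed numbers the session ends up about two-thirds more expensive despite the smaller input, matching the direction of the reported effect.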
The impossibility is structural: transparency, audits, and oversight cannot resolve it without reducing autonomy.
Logical consequence derived from the Accountability Incompleteness Theorem and the formal model; stated directly in the paper.
Accountability Incompleteness Theorem: for any collective whose compound autonomy exceeds the Accountability Horizon and whose interaction graph contains a human-AI feedback cycle, no framework can satisfy all four accountability properties simultaneously.
Central theoretical result stated in the paper; supported by a formal impossibility proof based on the model and axioms.
Agentic AI systems violate the above shared accountability assumption not as an engineering limitation but as a mathematical necessity once autonomy exceeds a computable threshold.
Formal theoretical development in the paper culminating in the Accountability Incompleteness Theorem (mathematical proof based on the introduced formal model and axioms).
OpenAI o3 achieves only 17% of optimal collective performance.
Experimental measurement of collective performance for OpenAI o3 in the paper's multi-agent setup (value reported in abstract; no sample size provided there).
We term this the Logic Monopoly: the agent society's unchecked monopoly over the entire logic chain from planning through execution to evaluation.
Terminology/definition introduced by the authors to describe the conceptual governance problem; definitional claim rather than empirical finding.
When agents from different human principals collaborate at scale, the collective becomes opaque: no single human can observe, audit, or govern the emergent behavior.
Conceptual/analytical claim presented as a security/governance risk in the paper; no empirical study or quantified measurement given in the excerpt.
Participants incentivized for originality incorporate fewer AI suggestions verbatim.
Usage and output-analysis from the pre-registered RCT comparing verbatim incorporation rates of AI suggestions across incentive conditions (no numeric rates provided in excerpt).
Early evidence suggests generative AI increases productivity but does so at the cost of collective diversity, potentially narrowing the set of ideas and perspectives produced.
Statement refers to prior literature/early studies (no specific study, sample size, or method reported in the excerpt).
The study observed errors and limitations in both phases (test generation and refactoring), and manual intervention was necessary at times.
Case study observations reported in the paper describing observed model errors/limitations and instances requiring manual developer intervention.
High-risk agentic systems with untraceable behavioral drift cannot currently satisfy the AI Act's essential requirements.
Authors' legal and normative conclusion based on their regulatory mapping and analysis (argumentative/legal reasoning rather than reported empirical testing).
The paper identifies agent-specific compliance challenges in cybersecurity, human oversight, transparency across multi-party action chains, and runtime behavioral drift.
Author-stated findings from the regulatory mapping and analysis; specific challenge areas listed without reported quantitative measurement.
The EU AI Act (Regulation 2024/1689) regulates these systems through a risk-based framework, but it does not operate in isolation: providers face simultaneous obligations under the GDPR, the Cyber Resilience Act, the Digital Services Act, the Data Act, the Data Governance Act, sector-specific legislation, the NIS2 Directive, and the revised Product Liability Directive.
Legal/regulatory mapping asserted by the authors listing specific EU regulations and directives that impose obligations on providers.
Multiple distinct contexts tend to collapse into one another or 'rot', degrading over time and reducing the utility of efforts to account for context.
Theoretical and empirical claim supported by interviewee reports and the authors' analytic synthesis; presented as observed pattern across cases (qualitative; sample size not specified).
Generative AI tools fail to account for users' context in workplace settings.
Findings from expert interviews reporting concrete examples where tools did not incorporate or respect relevant contextual information; qualitative analysis (sample size not provided in the summary).
Current approaches to account for the contexts in which generative AI technologies are used fall short of users' expectations and needs.
Qualitative empirical study based on expert interviews and analysis of user/developer perspectives (method described as expert interviews; exact sample size not stated in provided summary).
The literature remains fragmented, with limited integrative frameworks to explain how AI-human dynamics and decision-making typologies shape outcomes.
Conclusion drawn from the systematic review and bibliometric analysis of the 627-article corpus as reported in the abstract.
The remaining 26 barriers are carried over from prior digital transformation waves: 22 in amplified form and 4 unchanged.
Comparative coding/classification within the review corpus indicating whether each barrier is novel or carried over, and whether it is amplified versus unchanged.
Three barriers were identified as agentic-specific: error propagation in multi-agent systems, role ambiguity, and accountability diffusion.
Classification of the 29 coded barriers by 'agentic specificity' within the literature review; these three barriers were labeled agentic-specific by the authors.
Occupations whose AI-exposed steps are more dispersed across the production workflow (higher fragmentation) exhibit a substantially lower share of their steps actually executed by AI, conditional on AI exposure share.
Empirical regression analysis controlling for share of AI-exposed steps; uses dataset linking O*NET tasks, human AI exposure assessments, Anthropic Economic Index execution outcomes, and GPT-generated workflow orderings (details in Sections 5.1 and 7).
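The excerpt does not define the fragmentation measure, but a simple proxy conveys the idea. A minimal sketch, assuming a hypothetical `fragmentation` metric (the dispersion of AI-exposed step positions across the ordered workflow, not the paper's actual construction):

```python
# Illustrative proxy only: given an ordered workflow and a boolean mask
# of which steps are AI-exposed, measure how spread out the exposed
# steps are (population std. dev. of their positions, normalized by
# workflow length).  Clustered exposure scores low; dispersed exposure
# scores high.
def fragmentation(exposed_mask):
    """exposed_mask: list of bools, one per ordered workflow step."""
    positions = [i for i, e in enumerate(exposed_mask) if e]
    if len(positions) < 2:
        return 0.0
    mean = sum(positions) / len(positions)
    var = sum((p - mean) ** 2 for p in positions) / len(positions)
    return (var ** 0.5) / len(exposed_mask)

clustered = [True, True, True, False, False, False, False, False]
dispersed = [True, False, False, True, False, False, True, False]
print(fragmentation(clustered), fragmentation(dispersed))
```

Both workflows have three exposed steps out of eight (the "exposure share" held fixed in the regression), but the dispersed pattern scores roughly three times higher on this proxy.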
Treated firms' demand for external capital investment falls by just over $220,000 relative to the control group.
RCT with 515 firms; reported dollar-change in external investment demand between treated and control firms.
Despite faster growth, treated firms do not scale inputs proportionally: their demand for external capital investment falls by 39.5% relative to the control group.
RCT with 515 firms; firms reported external capital demand/investment requests; comparison of investment demand between treatment and control groups.
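The absolute and relative figures can be cross-checked: if the roughly $220,000 fall and the 39.5% fall describe the same treatment-control contrast, they jointly imply the control group's mean external capital demand. A quick sketch (the implied baseline is a derived figure, not one reported in the excerpt):

```python
# Cross-check of the two reported effect sizes: an absolute fall of
# ~$220,000 and a relative fall of 39.5% versus control together imply
# the control-group mean (absolute / relative).
absolute_fall = 220_000   # dollars, as reported
relative_fall = 0.395     # 39.5%, as reported

implied_control_mean = absolute_fall / relative_fall
print(f"implied control-group mean: ${implied_control_mean:,.0f}")
```

The two figures are mutually consistent with a control-group baseline of roughly $557,000 in external capital demand.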
There are macroeconomic risks associated with AI-led unemployment.
Paper's macroeconomic analysis drawing on labor economics and technology adoption research; no quantitative estimates or sample sizes provided in the summary.
Managerial incentives drive premature workforce contraction during AI adoption.
Analytical claim grounded in labor economics and organizational behavior review; the summary indicates examination of managerial incentives but does not report primary empirical tests or sample sizes.
Premature workforce contraction in response to AI adoption foreshadows deeper structural challenges as AI systems mature.
Forward-looking claim based on synthesis of literature and theoretical projection; no empirical quantification or sample provided in the summary.
This pattern of premature workforce reductions reflects longstanding corporate short-termism rather than genuine technological displacement.
The paper's interpretation drawing on labor economics and organizational behavior literature; no empirical study or sample size reported in the summary.
Organizations face mounting pressure to demonstrate immediate returns on AI investments, often through workforce reductions that outpace actual automation capabilities.
Argument in paper citing accelerating AI adoption across sectors and observed managerial responses; no primary dataset or sample size reported in the text.
The interaction between strict algorithmic control and worker counter-strategies leads to persistent limit cycles in strategy frequencies rather than convergence to a stable compliant workforce.
Dynamical systems analysis and simulation trajectories from the EGT model showing limit cycles / oscillatory equilibria in strategy proportions; model-based (no empirical sample).
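The limit-cycle behavior is characteristic of matching-pennies-style monitoring games under replicator dynamics. A minimal sketch, assuming an illustrative 2x2 payoff structure rather than the paper's actual model: strict control pays off only when workers are gaming, while compliance pays off only under strict control, so the two strategy frequencies chase each other indefinitely instead of settling.

```python
# Two-population replicator dynamics for a stylized monitoring game
# (illustrative payoffs, not the paper's model).  Managers play Strict
# vs Trust; workers play Comply vs Counter-strategy.
# A[i][j]: manager payoff, B[i][j]: worker payoff, for manager strategy
# i (0=Strict, 1=Trust) against worker strategy j (0=Comply, 1=Counter).
A = [[-1.0,  1.0],   # Strict: wasted monitoring vs catching gaming
     [ 1.0, -1.0]]   # Trust: rewarded by compliance, exploited by gaming
B = [[ 1.0, -1.0],   # under Strict: complying pays, gaming is punished
     [-1.0,  1.0]]   # under Trust: gaming pays

def step(x, y, dt=0.005):
    """One Euler step; x = share of Strict managers, y = share of compliant workers."""
    f_strict  = A[0][0] * y + A[0][1] * (1 - y)
    f_trust   = A[1][0] * y + A[1][1] * (1 - y)
    f_comply  = B[0][0] * x + B[1][0] * (1 - x)
    f_counter = B[0][1] * x + B[1][1] * (1 - x)
    x += dt * x * (1 - x) * (f_strict - f_trust)
    y += dt * y * (1 - y) * (f_comply - f_counter)
    return x, y

x, y = 0.6, 0.6
trajectory = []
for _ in range(20_000):
    x, y = step(x, y)
    trajectory.append(y)

# Compliance keeps cycling around 0.5 instead of converging.
recent = trajectory[-3000:]
print(f"late-run compliance range: {min(recent):.2f} to {max(recent):.2f}")
```

The interior fixed point at (0.5, 0.5) is a center, not an attractor: the compliance share keeps oscillating at the end of the run, which is the qualitative signature the paper reports.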
The way we're thinking about generative AI right now is fundamentally individual (this appears in how users interact with models, how models are built, how they're benchmarked, and how commercial and research strategies using AI are defined).
Author's observational/descriptive claim supported by argumentative examples (mentions user interaction patterns, model design and benchmarking practices, and commercial/research strategies); no empirical sample or quantitative analysis reported in the excerpt.
The emission-reduction effect of AI innovation is more pronounced for firms located in regions with underdeveloped factor markets.
Heterogeneity (regional subsample/interaction) analysis reported in the paper on the 21,428 firm-year sample, indicating larger AI-related emission reductions in regions with less developed factor markets.
The emission-reduction effect of AI innovation is more pronounced for firms in high-environmental-sensitivity industries.
Heterogeneity (subsample/interaction) analysis in the paper using the 21,428 firm-year observations, showing stronger AI-related emission reductions in industries characterized as high environmental sensitivity.
The emission-reduction effect of AI innovation is more pronounced for enterprises with low supply chain concentration.
Heterogeneity (subsample) analysis reported in the paper using the 21,428 firm-year dataset, comparing effects across firms with different supply chain concentration levels.
Executives' green cognition and government environmental attention together act as internal and external drivers of corporate carbon emission reduction.
Further analysis reported in the paper (moderation/interaction analysis or additional regressions) on the same 21,428 firm-year sample showing these factors strengthen carbon reduction associated with AI innovation.
AI innovation can significantly reduce corporate carbon emission intensity.
Empirical analysis using panel data of 21,428 firm-year observations from Chinese A-share listed manufacturing companies over 2010–2022; result reported in the paper's main regressions (method described as micro-level empirical analysis).