Evidence (13827 claims)
Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 195 | 97 | 889 | 1979 |
| Governance & Regulation | 815 | 391 | 188 | 121 | 1539 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084 |
| Research Productivity | 410 | 121 | 56 | 331 | 929 |
| Output Quality | 466 | 177 | 59 | 47 | 749 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 166 | 122 | 24 | 495 |
| Task Allocation | 206 | 64 | 70 | 31 | 376 |
| Skill Acquisition | 165 | 57 | 60 | 17 | 299 |
| Innovation Output | 201 | 27 | 41 | 18 | 288 |
| Employment Level | 105 | 51 | 107 | 13 | 278 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 149 | 46 | 26 | 3 | 224 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 61 | 20 | 12 | 182 |
| Error Rate | 69 | 91 | 10 | 2 | 172 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 92 | 19 | 13 | 19 | 145 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Skill Obsolescence | 5 | 45 | 6 | 1 | 57 |
| Creative Output | 31 | 16 | 7 | 2 | 57 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
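The direction shares implied by the matrix can be recomputed from the raw counts. The sketch below is a minimal illustration using three rows copied from the table; the helper name `direction_shares` is hypothetical. Shares are taken over the sum of the four direction columns rather than the Total column, because in some rows the four counts do not add up to the listed total (for Other, 749 + 195 + 97 + 889 = 1930 against a listed 1979). Why the totals differ is not stated in the source.

```python
# Counts copied from the Evidence Matrix: (positive, negative, mixed, null).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20),
    "Inequality Measures": (44, 122, 49, 6),
    "Job Displacement": (12, 80, 20, 1),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the summed direction counts."""
    total = sum(counts)
    return {name: count / total for name, count in zip(DIRECTIONS, counts)}


if __name__ == "__main__":
    for outcome, counts in ROWS.items():
        shares = direction_shares(counts)
        print(outcome, {name: round(share, 3) for name, share in shares.items()})
```

Reading the shares rather than the raw counts makes small categories such as Job Displacement comparable with large ones such as Firm Productivity.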
ImplicitMemBench operationalizes three constructs grounded in cognitive science: Procedural Memory (one-shot skill acquisition after interference), Priming (theme-driven bias via paired experimental/control instances), and Classical Conditioning (CS–US associations shaping first decisions).
Paper description of benchmark design explicitly listing the three constructs and brief operational definitions for each.
We introduce ImplicitMemBench, the first systematic benchmark evaluating implicit memory through three cognitively grounded constructs.
Paper claim of introducing a new benchmark named ImplicitMemBench; it states novelty ('first systematic benchmark') and describes design around three constructs (Procedural Memory, Priming, Classical Conditioning).
LLM-driven persuasion nearly triples the rate at which users select sponsored products compared to traditional search placement (61.2% vs. 22.4%).
Randomized comparison between conversational LLM agent conditions and traditional search placement in the preregistered experiments; reported selection rates 61.2% (LLM) vs. 22.4% (search). Total sample N = 2,012.
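The "nearly triples" wording can be checked directly from the two reported rates. The snippet below is a small arithmetic sketch. It computes only the ratio and the absolute gap, because the excerpt gives the total sample (N = 2,012) but not the per-condition group sizes that a significance test would need.

```python
# Selection rates as reported in the claim.
llm_rate = 0.612     # sponsored-product selection, conversational LLM agent
search_rate = 0.224  # sponsored-product selection, traditional search placement

ratio = llm_rate / search_rate                    # relative increase (times)
difference_pp = (llm_rate - search_rate) * 100    # absolute gap in percentage points

print(f"ratio: {ratio:.2f}x, difference: {difference_pp:.1f} p.p.")
```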
Above the Accountability Horizon, distributed accountability mechanisms become necessary.
Derived implication from the Accountability Incompleteness Theorem and the paper's discussion of policy responses; theoretical argument rather than empirical evidence.
Experiments on 3,000 synthetic collectives confirm all predictions with zero violations.
Reported simulation experiments: N = 3,000 synthetic Human-Agent Collectives evaluated against the theoretical predictions; reported outcome was zero violations of the predicted impossibility/conditions.
Below the threshold (Accountability Horizon), legitimate frameworks exist, establishing a sharp phase transition between regimes where the four properties can and cannot be satisfied.
Constructive existence results and theoretical arguments in the paper showing frameworks that satisfy the axioms when compound autonomy is below the defined threshold.
We introduce Human-Agent Collectives, a formalisation of joint human-AI systems where agents are modelled as state-policy tuples within a shared structural causal model.
Paper provides a formal model/definition called Human-Agent Collectives (mathematical formalisation and definitions).
Existing accountability frameworks for AI systems (legal, ethical, and regulatory) rest on a shared assumption: for any consequential outcome, at least one identifiable person had enough involvement and foresight to bear meaningful responsibility.
Stated as background assumption in the paper's introduction/abstract; supported by citation to prior legal/ethical/regulatory frameworks (normative claim about literature). No empirical test reported in this paper.
Tiny sharing incentives improve models with weak cooperation.
Experimental intervention reported in the paper: adding small sharing incentives and observing improved cooperation among weakly cooperative models (stated in abstract; no quantitative effect size or sample size provided there).
Explicit protocols double performance for low-competence models.
Experimental intervention reported in the paper: introducing explicit protocols in the multi-agent setup and observing a doubling of performance for low-competence models (stated in abstract; no sample size reported there).
OpenAI o3-mini reaches 50% of optimal collective performance.
Experimental measurement of collective performance for OpenAI o3-mini in the paper's multi-agent setup (value reported in abstract; no sample size provided there).
The core thesis is alignment-through-accountability: if each agent is aligned with its human owner through the accountability chain, then the collective converges on behavior aligned with human intent, without top-down rules.
Central theoretical thesis of the paper; presented as a hypothesis to be evaluated rather than as an empirically demonstrated result in the excerpt.
We propose the Separation of Power (SoP) model, a constitutional governance architecture deployed on public blockchain that breaks this monopoly through three structural separations: agents legislate operational rules as smart contracts, deterministic software executes within those contracts, and humans adjudicate through a complete ownership chain binding every agent to a responsible principal.
Design proposal / governance architecture presented in the paper; the text asserts that the model 'breaks this monopoly' but provides no experimental results in the excerpt to validate that claim.
Properly designed, agentic copyright offers a path toward scalable, fair, and legally meaningful copyright markets in the age of AI.
Synthesis and prescriptive claim grounded in the paper's conceptual framework; presented as an argument rather than as empirically demonstrated outcome.
AI should be understood not only as a source of disruption, but also as a governance tool capable of restoring market-based ordering in creative industries.
Normative conclusion of the paper based on theoretical reasoning and proposed governance mechanisms; no empirical tests provided.
Embedding normative constraints and monitoring functions into multi-agent architectures can align agent behavior with the underlying values of copyright law.
Conceptual proposal and argumentation in the paper; no experimental or field evidence offered.
The governance framework should emphasize ex ante and ex post coordination mechanisms capable of correcting agentic market failures before they crystallize into systemic harm.
Prescriptive policy/design recommendation grounded in the paper's conceptual analysis; no empirical validation provided.
A supervised multi-agent governance framework that integrates legal rules, technical protocols, and institutional oversight can address agentic market failures.
Framework development and prescriptive argumentation within the paper; proposed design rather than empirically validated solution.
Multi-agent ecosystems promise efficiency gains and reduced transaction costs in creative markets.
Theoretical claim and normative argument in the paper; no empirical measurement or sample provided to quantify efficiency gains.
The paper introduces 'agentic copyright', a model in which AI agents act on behalf of creators and users to negotiate access, attribution, and compensation for copyrighted works.
Conceptual proposal and definitional development within the paper; presented as a new model rather than as empirically validated intervention.
Those incentivized for originality rely on the model more selectively for brainstorming, proofreading, and targeted edits.
Behavioral/usage measures from the RCT indicating task-level patterns of AI use (described qualitatively in excerpt; no quantitative task-level usage breakdown provided).
Participants rewarded for originality relative to peers produce collectively more diverse writing than those rewarded for quality alone.
Randomized assignment to incentive conditions (originality reward vs. quality reward) in the pre-registered RCT on a creative writing task (no sample size or numerical effect provided in excerpt).
Early evidence has shown that generative AI can increase individual-level productivity.
Statement refers to prior literature/early studies (no specific study, sample size, or method reported in the excerpt).
Much of the business and management literature approaches artificial intelligence primarily as a technological capability that enhances efficiency and productivity.
Literature review / characterization of existing business and management literature cited in the paper; no quantitative synthesis or meta-analysis reported.
For non‑tech firms seeking to enhance operational efficiency through digitalization, optimizing internal power structures in response to technological shifts can improve firm performance.
Policy/managerial recommendation based on the study's empirical findings linking digitalization, decentralization, and productivity using China's listed firms data (2009–2020).
Digital technologies operate as an external contingency for non‑tech firms, requiring structural decentralization to align organizational structure with technological shifts.
Theoretical proposition and interpretation of empirical findings in the paper; framed as a contribution to organizational structure theory rather than a separate causal test.
Many non‑technology firms' existing organizational structures fail to accommodate data‑driven digital technologies, creating a need for strategic adaptation to integrate these technologies into business operations.
Argument and literature synthesis presented in the paper motivating the study; descriptive characterization rather than a directly tested empirical claim in the reported analyses.
Shifting power allocation (decentralization to subsidiaries) driven by digitalization significantly enhances firm productivity.
Further (post‑hoc / additional) analyses reported in the paper linking measured shifts in internal power allocation to improvements in firm productivity using the sample of China's listed companies (2009–2020).
The decentralizing effect of digitalization is stronger for firms operating in environments of higher uncertainty.
Moderation analyses in the paper using public data on China's listed companies (2009–2020) showing interaction between digitalization and environmental uncertainty on decentralization outcomes.
The decentralizing effect of digitalization is more pronounced for companies with greater business diversification.
Moderation tests reported in the study using the same dataset of China's listed companies (2009–2020) examining interaction between digitalization and business diversification on subsidiary empowerment.
Firms with higher levels of digitalization tend to decentralize decision‑making authority to their subsidiaries.
Empirical analysis using public data from China's listed companies between 2009 and 2020; paper reports multiple measures of digitalization and tests the relationship between firm digitalization and subsidiary empowerment (decentralization).
The paper documents best practices for iteratively generating tests to capture existing system behavior before model-assisted refactoring.
Methodological contributions in the paper: recommended workflow and practices for iterative test generation to lock down behavior prior to refactoring.
The described workflow constrained refactoring changes and enabled model-assisted refactoring under developer supervision, with proposed code changes validated by passing tests.
Methodological description in the paper: iterative test generation to capture existing behavior, then model-assisted refactoring with developer oversight and test-based validation.
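The workflow described above (capture existing behavior in tests, then accept a refactoring only if those tests still pass) is commonly called characterization testing. The following is a minimal sketch of that idea, not the paper's tooling. Both functions and the input grid are hypothetical, and the tests here are written by hand rather than generated by a coding model.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical legacy function whose behavior we want to preserve."""
    cost = 5.0
    if weight_kg > 10:
        cost += (weight_kg - 10) * 0.5
    if express:
        cost = cost * 2
    return round(cost, 2)


def refactored_shipping_cost(weight_kg, express):
    """Hypothetical proposed refactoring; it must reproduce the legacy outputs."""
    surcharge = max(weight_kg - 10, 0) * 0.5
    multiplier = 2 if express else 1
    return round((5.0 + surcharge) * multiplier, 2)


# Step 1: record the current behavior on a grid of inputs, including the
# boundary at 10 kg, so both branches of each condition are exercised.
CASES = [(w, e) for w in (0, 5, 10, 10.5, 25) for e in (False, True)]
GOLDEN = {case: legacy_shipping_cost(*case) for case in CASES}


class CharacterizationTest(unittest.TestCase):
    # Step 2: the refactored code must match every recorded output.
    def test_refactor_preserves_behavior(self):
        for case, expected in GOLDEN.items():
            with self.subTest(case=case):
                self.assertEqual(refactored_shipping_cost(*case), expected)


if __name__ == "__main__":
    unittest.main()
```

The recorded outputs describe what the code does today, not what it should do, so a failing test signals a behavior change for the developer to review rather than a confirmed bug.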
The generated tests achieved up to 78% branch coverage in critical modules.
Measured branch coverage reported in the case study for critical modules after running the generated tests.
Using coding models, we generated nearly 16,000 lines of reliable unit tests in hours rather than weeks.
Single case study reported in the paper: automated unit test generation using coding models; reported aggregate output of generated tests and a qualitative time comparison (hours vs weeks).
The results confirm the value of concentrating government support on a limited number of industries with the greatest potential for structural transformation.
Policy recommendation derived from the simulated outcomes of the integrated model showing larger productivity, adoption, and employment gains in the identified priority sectors (model calibrated to 2020–2024 Kazakhstan industry data).
Across all sectors, the task-creation effect consistently exceeds the automation effect, reflecting the specifics of a resource-dependent economy with a shortage of qualified personnel.
Model decomposition of task-creation versus automation effects in the Acemoglu–Restrepo task-oriented component of the integrated model, calibrated with 2020–2024 industry data for Kazakhstan.
Total employment increases by 22.4 p.p. (equivalent to +1.3 million jobs) over the simulation period.
Simulated employment outcomes from the integrated dynamic model calibrated to Kazakhstan industry data (2020–2024).
The share of priority industries in the GDP structure increases by 6.3 p.p. (over the simulation horizon).
Modelled structural transformation outcomes from the integrated simulation using Kazakhstan industry data (2020–2024).
AI adoption in priority sectors by 2035 exceeds that of non-priority industries by 13–32 p.p.
Comparison of simulated AI adoption trajectories between priority and non-priority sectors using the integrated model calibrated on 2020–2024 industry data.
The level of AI adoption in priority sectors reaches 86.8–93.8 p.p. by 2035.
Projection from the paper's Bass-model-based diffusion and integrated dynamic model calibrated to Kazakhstan industry data (2020–2024).
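For readers unfamiliar with the diffusion component, the standard Bass model gives the cumulative adoption share as F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), where p is the coefficient of innovation and q the coefficient of imitation. The sketch below evaluates that textbook closed form. The parameter values are hypothetical, chosen only to show the S-shaped curve, and are not the paper's calibration. The paper's integrated model also couples diffusion with other components that are not reproduced here.

```python
import math


def bass_adoption_share(t, p, q):
    """Cumulative adoption share F(t) under the standard Bass diffusion model.

    p: coefficient of innovation, q: coefficient of imitation,
    t: time elapsed since the start of diffusion.
    """
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    p, q = 0.03, 0.4
    for year in range(0, 11, 2):
        print(year, round(bass_adoption_share(year, p, q), 3))
```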
Of the cumulative 35.3 p.p. increase in gross value added for 2025–2035, 16.8 percentage points are attributable to AI.
Decomposition of the simulation results produced by the paper's integrated dynamic model, using industry data for 2020–2024 for calibration.
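The decomposition can be restated as a share of total growth. The snippet below is simple arithmetic on the two figures quoted in the claim. Grouping everything that is not attributed to AI under "other factors" is a simplification made for this illustration, not a category taken from the paper.

```python
total_gva_growth_pp = 35.3   # cumulative GVA increase, 2025-2035 (from the claim)
ai_contribution_pp = 16.8    # part of that increase attributed to AI (from the claim)

# Remainder not attributed to AI, rounded to the precision of the inputs.
other_contribution_pp = round(total_gva_growth_pp - ai_contribution_pp, 1)
ai_share = ai_contribution_pp / total_gva_growth_pp

print(f"AI share of growth: {ai_share:.1%}; other factors: {other_contribution_pp} p.p.")
```

On these figures, AI accounts for a little under half of the simulated increase.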
The cumulative increase in gross value added in the analysed industries will amount to 35.3 p.p. over the period 2025–2035.
Simulation results from the paper's integrated dynamic model (Bass diffusion + expanded production function with endogenous technological progress + task-oriented Acemoglu–Restrepo approach) calibrated to industry data from the Bureau of National Statistics of the Republic of Kazakhstan for 2020–2024.
We illustrate a human-in-the-loop research methodology for LLMs to automatically classify and summarize research descriptions at scale.
Methodological description and application of a human-in-the-loop LLM workflow applied to the 58,746 NIH project descriptions, including labeling, model prompting, and human review steps as described in the paper.
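One common way to organize a human-in-the-loop classification workflow is to accept confident model predictions automatically and route uncertain ones to a reviewer. The sketch below illustrates that routing step only, and it is an assumption that the paper's workflow has this shape. The `stub_classifier` function is a keyword stand-in for an LLM call, and the term lists, confidence values, and threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword lists standing in for a model's judgment.
STRONG_TERMS = ("machine learning", "deep learning", "neural network")
WEAK_TERMS = ("algorithm", "predictive model", "automated")


@dataclass
class Prediction:
    text: str
    label: str         # "AI" or "non-AI"
    confidence: float  # stand-in score in [0, 1]


def stub_classifier(text):
    """Keyword stand-in for an LLM call; a real workflow would prompt a model here."""
    lowered = text.lower()
    if any(term in lowered for term in STRONG_TERMS):
        return Prediction(text, "AI", 0.9)
    if any(term in lowered for term in WEAK_TERMS):
        return Prediction(text, "AI", 0.5)
    return Prediction(text, "non-AI", 0.9)


def triage(texts, threshold=0.7):
    """Split predictions into auto-accepted ones and ones queued for human review."""
    accepted, review_queue = [], []
    for text in texts:
        prediction = stub_classifier(text)
        target = accepted if prediction.confidence >= threshold else review_queue
        target.append(prediction)
    return accepted, review_queue


if __name__ == "__main__":
    # Invented example descriptions, not taken from the NIH dataset.
    samples = [
        "Deep learning for tumor segmentation in MRI",
        "An automated pipeline for sample handling",
        "Longitudinal cohort study of sleep and diet",
    ]
    accepted, review_queue = triage(samples)
    print("accepted:", [(p.label, p.text) for p in accepted])
    print("needs review:", [(p.label, p.text) for p in review_queue])
```

Corrections made during review can then be fed back as labeled examples for the next prompting round.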
AI research is concentrated in discovery, prediction, and data integration across disease domains.
Topic/semantic classification of AI-labelled project descriptions indicating topical concentration in discovery, prediction, and data integration areas across disease domains within the analyzed portfolio.
AI projects receive a 13.4% funding premium.
Comparison of funding amounts for AI-classified projects versus non-AI projects in the analyzed NIH portfolio (sample drawn from the 58,746 projects).
AI constitutes 15.9% of the NIH portfolio.
Classification of the 58,746 NIH project descriptions to identify projects using AI, yielding a reported share of 15.9%.
We present a comprehensive analysis of 58,746 NIH-funded biomedical research projects from 2025.
Enumeration and analysis of NIH project descriptions (n = 58,746) in 2025 using the paper's described dataset and methods.
The United States maintains superior resources by enforcing strict export controls on semiconductor chips and AI models, as well as on outbound investments in these areas.
Stated as a comparative conclusion in the chapter; implies policy analysis of U.S. export-control regimes (e.g., controls on chips, models, outbound investment), but no specific datasets or sample sizes are given in the excerpt.
China's legal environment may offer certain advantages in terms of IP protection.
Asserted in the chapter as part of comparative analysis of IP regimes in the US and China; presented as a conclusion without reported empirical metrics in the excerpt.