Evidence (6491 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Human-AI Collaboration
The per-task ceiling does not bind the windowed measure, though both remain bounded: L_task by per-task novelty, L_window by the stock of accumulated planning investment that pays out within the window.
Theoretical derivation/argument in the paper distinguishing bounds on per-task leverage (L_task) and windowed leverage (L_window) and identifying their respective limiting factors; no empirical evidence provided.
We extend this per-task analysis to a windowed leverage measure that accommodates recurring tasks, spawned subtasks, and amortized system-design investment.
Conceptual/theoretical extension in the paper defining a windowed leverage metric and describing how it accounts for recurring tasks, subtasks, and amortized design investments; no empirical tests reported.
The asymptotic behavior of leverage decomposes into two scaling axes (capability and memory) with a non-zero floor on the planning term set by irreducible task novelty bounded by human throughput.
Mathematical/theoretical asymptotic analysis within the paper; conceptual derivation linking capability and memory as scaling axes and asserting a lower bound on planning cost due to task novelty and human throughput.
Information density itself is directional and bounded by separate ceilings on human-to-agent and agent-to-human flow.
Theoretical argument/derivation in the paper establishing directional information-density and distinct upper bounds for each flow direction; no empirical validation reported.
The denominator decomposes into three channels through which a conserved per-task information requirement must flow, each with its own time-cost scalar (specify the task, resolve mid-run interrupts, and review the result).
Analytic decomposition within the paper's theoretical framework; conceptual argument rather than empirical measurement.
We propose a per-task leverage ratio for human-agent collaboration: human work displaced by an agent, divided by the human time required to specify the task, resolve mid-run interrupts, and review the result.
Theoretical/conceptual proposal and formal definition provided in the paper; no empirical sample or experimental data reported.
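The ratio and its three-channel denominator can be made concrete with a minimal sketch. The claims above give only the structure (work displaced, divided by time to specify, resolve interrupts, and review, with each channel carrying part of a per-task information requirement at its own time cost). The function names, the units, and every number below are assumptions for illustration, not values from the paper.

```python
from typing import Mapping

# The three channels named in the claim; everything else here is illustrative.
CHANNELS = ("specify", "interrupts", "review")


def invested_minutes(info_units: Mapping[str, float],
                     minutes_per_unit: Mapping[str, float]) -> float:
    """Human time in the denominator: information per channel times its time cost."""
    return sum(info_units[c] * minutes_per_unit[c] for c in CHANNELS)


def per_task_leverage(displaced_minutes: float,
                      info_units: Mapping[str, float],
                      minutes_per_unit: Mapping[str, float]) -> float:
    """Human work displaced by the agent divided by human time invested."""
    invested = invested_minutes(info_units, minutes_per_unit)
    if invested <= 0:
        raise ValueError("invested human time must be positive")
    return displaced_minutes / invested


# Hypothetical numbers: a task needing 10 units of information in total.
info = {"specify": 6.0, "interrupts": 1.0, "review": 3.0}
cost = {"specify": 2.0, "interrupts": 5.0, "review": 3.0}  # minutes per unit
print(per_task_leverage(130.0, info, cost))  # 130 / (12 + 5 + 9) = 5.0
```

Under this reading, shifting information from an expensive channel (here, interrupts) to a cheaper one (here, the specification) raises leverage without changing the total information the task requires.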
Grounding recommendations in validated research offers leaders a framework for navigating AI's labor implications responsibly.
Paper asserts that its synthesis and recommendations provide a practical framework for leaders; no empirical validation of the framework is reported in the abstract.
Evidence-based organizational responses (transparent workforce planning, skills investment, redesigned roles, adaptive governance, and long-term capability-building) can mitigate harm and prepare organizations for workplace transformation.
Paper proposes these organizational responses grounded in the synthesized empirical literature; this is a recommendation rather than an empirically tested intervention in the paper abstract.
Our evolved prefetcher achieves a 1.76x geomean IPC speedup over no prefetching, 17% over its VA/AMPM Lite seed (1.59x) and 21% over SMS (1.55x).
Reported experimental result comparing geomean IPC across benchmark set; comparisons made to no prefetching and two specific prefetcher baselines. Benchmark details not included in abstract.
Our evolved branch predictor achieves a 1.100x geomean IPC speedup over Bimodal, 1.5% over its Hashed Perceptron seed (1.085x).
Reported experimental result comparing geomean IPC across benchmark set; compared to Bimodal and Hashed Perceptron seed as baselines. Benchmark details not given in abstract.
Our best evolved cache replacement design achieves a 1.062x geomean IPC speedup over LRU, 0.6% over Mockingjay (1.056x).
Reported experimental result comparing geomean IPC across benchmark set; exact benchmark count/split not provided in abstract. Comparison reported against LRU and Mockingjay baselines.
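The three results above quote a geomean speedup and a percentage gain over a second design. The sketch below shows how a geometric-mean speedup is computed and two ways of comparing a pair of speedups. The quoted percentages (17%, 21%, 1.5%, 0.6%) coincide with the difference between the two speedups expressed in percentage points of the common baseline. The ratio of the two speedups gives a smaller figure for the prefetcher. The abstracts do not state which convention is intended, and the function names and the per-benchmark values are illustrative assumptions.

```python
import math
from typing import Sequence


def geomean(values: Sequence[float]) -> float:
    """Geometric mean of positive per-benchmark speedups."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def point_gap(a: float, b: float) -> float:
    """Gap between two speedups in percentage points of the shared baseline."""
    return (a - b) * 100.0


def relative_gain(a: float, b: float) -> float:
    """Gain of design a over design b, as a percentage of b's performance."""
    return (a / b - 1.0) * 100.0


# Hypothetical per-benchmark IPC speedups, only to show the aggregation step.
print(round(geomean([1.2, 1.5, 2.4]), 3))

# Speedup pairs as reported in the claims above.
reported = {
    "prefetcher vs VA/AMPM Lite": (1.76, 1.59),
    "prefetcher vs SMS": (1.76, 1.55),
    "branch predictor vs Hashed Perceptron": (1.100, 1.085),
    "cache replacement vs Mockingjay": (1.062, 1.056),
}
for name, (new, old) in reported.items():
    print(name, round(point_gap(new, old), 1), round(relative_gain(new, old), 1))
```

For the prefetcher against its seed, the point gap is 17.0 while the relative gain is about 10.7%. For the branch predictor and cache replacement pairs the two conventions nearly agree because both speedups are close to 1.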
Across cache replacement, data prefetching, and branch prediction, Agentic Architect matches or exceeds state-of-the-art designs.
Experimental evaluation across three microarchitectural component domains (cache replacement, prefetching, branch prediction) reported in the paper with comparative performance results versus baselines.
We introduce Agentic Architect, an agentic AI framework for computer architecture design exploration and optimization that combines LLM-driven code evolution with cycle-accurate simulation.
Authors' description of the system and methodology in the paper (introduction and methods). No numeric sample size reported in the abstract; evidence is the implemented framework and accompanying descriptions; authors state it is open-source.
Through targeted prompting inspired by these findings, we modify agents' negotiation behavior and improve win rates from 22.2% to 32.7%.
Intervention experiment reported in the paper where prompts were changed and resulting agent win rates were measured.
In clinical utility evaluation across three abstraction tasks, semantic search reduced time-to-completion by 24 to 89% compared to clinician-performed chart review.
Clinical utility assessment compared chart abstraction efficiency across three tasks and reported percentage reductions in time-to-completion ranging from 24% to 89%.
Qwen3 embeddings with 300-token chunk size achieved 94.6% accuracy on a clinical question-answering benchmark.
Optimization experiment on a physician-authored clinical question-answering benchmark; best-performing configuration reported as qwen3 embeddings with 300-token chunks and 94.6% accuracy.
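The 300-token chunk size refers to how clinical text is split before embedding. A minimal sketch of fixed-size chunking follows. It assumes whitespace-separated words as a stand-in for the embedding model's own tokenizer and assumes no overlap between chunks. Neither detail is stated in the excerpt, and `chunk_tokens` is a hypothetical helper.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 300) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Hypothetical 650-token note; whitespace tokens approximate model tokens.
note = " ".join(f"w{i}" for i in range(650))
chunks = chunk_tokens(note.split())
print([len(c) for c in chunks])  # [300, 300, 50]
```

Each chunk would then be embedded and indexed separately, so chunk size trades retrieval precision against the amount of context each hit carries.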
The system delivers sub-second query latency: median 237 ms single-user, 451 ms at 20-user concurrency.
Full-scale performance characterization reported exact median latencies for single-user and 20-user concurrency.
Technological advancement alone is insufficient—maximizing AI's economic potential requires strategic investments in workforce capability development (e.g., widespread AI fluency programs and targeted cultivation of higher-order judgment skills).
Policy recommendation based on the article's synthesis of task-based models and empirical literature; the excerpt does not report specific interventions, trials, or sample sizes.
The supply of AI-literate workers amplifies productivity gains.
Stated as a mechanism in the task-based model synthesis; described qualitatively in the article without specific empirical method or sample sizes in the excerpt.
Aggregate productivity improvements from AI advancement depend critically on two forms of human capital: specialized AI expertise and complementary non-AI skills.
Claim is presented as a theoretical result drawn from 'task-based economic models' in the article; empirical corroboration is referenced generally but no specific datasets or sample sizes are reported in the excerpt.
Mounting empirical evidence indicates AI primarily functions as augmentation technology—amplifying human capabilities rather than replacing workers.
Article states it draws on 'mounting empirical evidence' and synthesizes recent theoretical and empirical findings; no specific studies, methods, or sample sizes are cited in the excerpt.
The proposed approach reframes AI control from optimizing decisions to governing their admissibility, introducing a protocol-level abstraction that operates independently of model architecture or training methodology.
Conceptual argument and proposal in the paper asserting architecture-agnostic protocol abstraction. No empirical tests across architectures or training methods reported.
Through a scenario-based case study, we demonstrate how identical AI outputs can lead to divergent outcomes when evaluated under a Right-to-Act protocol, preserving reversibility and preventing premature or irreversible actions.
Scenario-based case study (illustrative demonstration). The paper reports example scenarios rather than empirical experiments; no sample size or quantitative evaluation reported.
Unlike compensatory systems, where high-confidence signals can override failed conditions, the proposed framework enforces strict structural constraints: if any required condition is unmet, execution is halted or deferred.
Conceptual distinction and protocol rule specification in the paper (formal description of non-compensatory enforcement). No empirical testing reported.
We introduce the Right-to-Act protocol, a deterministic, non-compensatory pre-execution decision layer that evaluates whether an AI-generated decision is permitted to be realized at all.
Proposed method / conceptual contribution and formal definition provided in the paper (formalization and protocol specification). No empirical validation or sample size reported.
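The non-compensatory rule described above (any unmet required condition halts or defers execution, regardless of confidence) can be sketched as a small pre-execution gate. This is an illustration under stated assumptions, not the paper's specification: the `Verdict` and `Condition` types, the `on_fail` field, and the example conditions are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Mapping, Sequence, Tuple


class Verdict(Enum):
    PERMIT = "permit"
    DEFER = "defer"
    HALT = "halt"


@dataclass(frozen=True)
class Condition:
    name: str
    check: Callable[[Mapping[str, object]], bool]
    on_fail: Verdict = Verdict.HALT  # assumed: each condition halts or defers


def right_to_act(decision: Mapping[str, object],
                 conditions: Sequence[Condition]) -> Tuple[Verdict, List[str]]:
    """Deterministic gate: every condition must hold; nothing offsets a failure."""
    failed = [c for c in conditions if not c.check(decision)]
    if not failed:
        return Verdict.PERMIT, []
    # The strictest failure wins; model confidence is never consulted.
    halt = any(c.on_fail is Verdict.HALT for c in failed)
    return (Verdict.HALT if halt else Verdict.DEFER), [c.name for c in failed]


conditions = [
    Condition("reversible", lambda d: bool(d.get("reversible")), Verdict.DEFER),
    Condition("human_approved", lambda d: bool(d.get("human_approved"))),
]
decision = {"confidence": 0.99, "reversible": False, "human_approved": True}
print(right_to_act(decision, conditions))
```

The example decision carries a confidence of 0.99 but is deferred because the reversibility condition fails. A compensatory scheme would instead fold confidence into a score that could outweigh the failure.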
Taken together, these insights provide theoretical clarity and practical guidance for responsible GenAI integration into creative work.
Authors' stated contribution and practical recommendations derived from the conceptual framework; no empirical evaluation of guidance effectiveness provided.
The study reinterprets process-oriented creativity theories through structural parallels with GenAI.
Conceptual reanalysis and theoretical reinterpretation based on literature synthesis (paper's theoretical contribution).
The authors propose a role-based integration model that aligns GenAI capabilities with key creative functions: idea generation, synthesis, strategic framing, and facilitation.
Presentation of a novel conceptual model / framework in the paper (theoretical design); no empirical validation or measured outcomes reported.
The paper repositions GenAI as a cognitive collaborator rather than merely a productivity tool.
Argumentative / conceptual claim supported by the proposed theoretical reframing and role-based model in the paper; no empirical testing reported.
There are structural parallels between GenAI architectures and human cognition—such as heuristic search, divergent thinking, and iterative refinement.
Conceptual mapping and theoretical comparison between GenAI architecture characteristics and cognitive/creativity constructs presented in the paper (literature synthesis / theoretical argument).
The study revisits foundational creativity theories to develop a framework for integrating GenAI into creative workflows.
Paper describes a conceptual review and theoretical synthesis of foundational creativity theories leading to a proposed integration framework; methodological (theoretical / conceptual) contribution rather than empirical validation.
Generative Artificial Intelligence (GenAI) is reshaping organisational creativity by emulating cognitive processes traditionally associated with human innovation.
Paper's theoretical argument and literature-grounded conceptual claims (conceptual analysis / literature review); no empirical sample or quantitative data reported.
That compliance layer can improve oversight by making departures from law easier to detect.
Claim supported by the paper's analytical argumentation (no empirical evidence reported).
For probabilistic AI to be incorporated into public administration it must be embedded in a compliance layer that makes decisions reviewable, repeatable, and legally defensible.
Stated as a normative/architectural claim in the paper; supported by conceptual argument rather than empirical testing.
Governments are increasingly interested in using AI to make administrative decisions cheaper, more scalable, and more consistent.
Stated as background motivation in the paper (no empirical data or sample size reported).
There is an open opportunity to support collaborative construction where users and AI jointly develop an evolving knowledge representation.
Paper's stated research opportunity and motivation based on gaps identified in prior tools and systems (conceptual argument).
In a user study where 12 participants created slide decks, MindTrellis outperformed retrieval-only baselines in knowledge organization and cognitive load, as measured by expert ratings of content coverage and structural quality.
Controlled user study reported in the paper: N = 12 participants performing slide-deck creation tasks; outcomes assessed via expert ratings of content coverage and structural quality (comparison to retrieval-only baseline).
MindTrellis is an interactive visual system where users and AI collaboratively build a dynamic knowledge graph; users can query the graph for document-grounded information and contribute by introducing new concepts, modifying relationships, and reorganizing the hierarchy.
System design and implementation described in the paper (feature description and demonstration).
Generative artificial intelligence (genAI) is rapidly reshaping how knowledge and culture are produced and consumed.
Author's descriptive statement based on observed changes in production/consumption patterns (no empirical sample reported in paper abstract).
The deployment reduced variability in solder-joint quality and cycle time.
Abstract statement that variability in solder-joint quality and cycle time was reduced during the deployment (no quantitative variability metrics provided in the abstract).
The system maintained near-human takt time.
Abstract claim comparing the system's cycle/takt time to human performance during the deployment (no numeric takt-time comparison provided in the abstract).
The system achieved a 99.4% pass rate on product-level quality-control tests.
Reported QC pass rate from the production run in the abstract (presumably based on the produced motors).
The system operated without physical fencing.
Abstract statement that the run occurred "without physical fencing" (implying operation around people without traditional fences).
The system produced 108 motors.
Count of products produced during the continuous run reported in the abstract.
The system operated continuously for 5 h 10 min.
Reported continuous operation duration from the production run described in the abstract.
Each task required less than 20 min of real-world data.
Reported training data requirement for the deployed tasks in the authors' field experiment (abstract statement).
With less than 20 min of real-world data per task, the system operated continuously for 5 h 10 min, producing 108 motors without physical fencing and achieving a 99.4% pass rate on product-level quality-control tests.
Single field deployment / production run reported in the paper; numbers reported in the abstract (training data time, continuous operation duration, number of motors produced, fencing status, QC pass rate).
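The reported run length and motor count imply an average production rate, which the short calculation below works out. It is an average over the whole run, including any pauses. It is not the line's takt time, for which the abstract gives no figure. Variable names are illustrative.

```python
# Figures reported in the abstract.
run_minutes = 5 * 60 + 10   # 5 h 10 min
motors = 108

seconds_per_motor = run_minutes * 60 / motors
motors_per_hour = motors / (run_minutes / 60)

print(round(seconds_per_motor, 1))  # 172.2
print(round(motors_per_hour, 1))    # 20.9
```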
We deployed the system on an electric-motor production line to automate deformable cable insertion and soldering under real manufacturing constraints, a step previously performed manually by human workers.
Field deployment on an actual electric-motor production line described by the authors (deployment + task specification).
We present Learning-Augmented Robotic Automation, a hybrid system that integrates learned task controllers and a neural 3D safety monitor into conventional industrial workflows.
Description of the system developed by the authors (system design/development reported in the paper).
Self-correction should be treated not as a default behavior, but as a control decision governed by measurable error dynamics.
Synthesis of theoretical framing (Markov model and diagnostic inequality) and empirical results across multiple models/datasets showing thresholds and promptability of EIR.