Evidence (6491 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Human-AI Collaboration

The per-task ceiling does not bind the windowed measure, though both remain bounded: L_task by per-task novelty, L_window by the stock of accumulated planning investment that pays out within the window.

Theoretical derivation/argument in the paper distinguishing bounds on per-task leverage (L_task) and windowed leverage (L_window) and identifying their respective limiting factors; no empirical evidence provided.

high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... bounds on L_task and L_window (per-task novelty and accumulated planning investm...

We extend this per-task analysis to a windowed leverage measure that accommodates recurring tasks, spawned subtasks, and amortized system-design investment.

Conceptual/theoretical extension in the paper defining a windowed leverage metric and describing how it accounts for recurring tasks, subtasks, and amortized design investments; no empirical tests reported.

high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... windowed leverage (aggregated leverage over a time window accounting for amortiz...

The asymptotic behavior of leverage decomposes into two scaling axes (capability and memory) with a non-zero floor on the planning term set by irreducible task novelty bounded by human throughput.

Mathematical/theoretical asymptotic analysis within the paper; conceptual derivation linking capability and memory as scaling axes and asserting a lower bound on planning cost due to task novelty and human throughput.

high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... leverage scaling behavior and lower bound on planning term

Information density itself is directional and bounded by separate ceilings on human-to-agent and agent-to-human flow.

Theoretical argument/derivation in the paper establishing directional information-density and distinct upper bounds for each flow direction; no empirical validation reported.

high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... directional information flow bounds between human and agent

The denominator decomposes into three channels through which a conserved per-task information requirement must flow, each with its own time-cost scalar (specify the task, resolve mid-run interrupts, and review the result).

Analytic decomposition within the paper's theoretical framework; conceptual argument rather than empirical measurement.

high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... components of human time cost (specification, interrupt resolution, review)

We propose a per-task leverage ratio for human-agent collaboration: human work displaced by an agent, divided by the human time required to specify the task, resolve mid-run interrupts, and review the result.

Theoretical/conceptual proposal and formal definition provided in the paper; no empirical sample or experimental data reported.

high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... human work displaced per unit human time (per-task leverage)
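As a rough illustration of the ratio defined in this entry, the sketch below computes per-task leverage from four durations. The function name `per_task_leverage`, its parameter names, the choice of minutes as the unit, and the sample numbers are all hypothetical; the paper's own notation may differ.

```python
def per_task_leverage(displaced_minutes: float,
                      specify_minutes: float,
                      interrupt_minutes: float,
                      review_minutes: float) -> float:
    """Human work displaced, divided by the human time the task still costs.

    The denominator sums the three channels named in the claim:
    specifying the task, resolving mid-run interrupts, and reviewing the result.
    """
    human_cost = specify_minutes + interrupt_minutes + review_minutes
    if human_cost <= 0:
        raise ValueError("human time cost must be positive")
    return displaced_minutes / human_cost


# Hypothetical numbers: the agent displaces 120 minutes of human work and
# costs 10 + 5 + 15 = 30 minutes of human attention.
example = per_task_leverage(120, 10, 5, 15)
print(example)  # 4.0
```

A value above 1 means the agent displaces more human work than it consumes in human attention; shrinking any one of the three denominator terms raises the ratio.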

Grounding recommendations in validated research offers leaders a framework for navigating AI's labor implications responsibly.

Paper asserts that its synthesis and recommendations provide a practical framework for leaders; no empirical validation of the framework is reported in the abstract.

high positive AI Displacement Risk in the Labor Market: Evidence, Exposure... ability of leaders to navigate AI labor implications and mitigate harm

Evidence-based organizational responses (transparent workforce planning, skills investment, redesigned roles, adaptive governance, and long-term capability-building) can mitigate harm and prepare organizations for workplace transformation.

Paper proposes these organizational responses grounded in the synthesized empirical literature; this is a recommendation rather than an empirically tested intervention in the paper abstract.

high positive AI Displacement Risk in the Labor Market: Evidence, Exposure... organizational readiness and mitigation of AI-related harms

Our evolved prefetcher achieves a 1.76x geomean IPC speedup over no prefetching, 17% over its VA/AMPM Lite seed (1.59x) and 21% over SMS (1.55x).

Reported experimental result comparing geomean IPC across benchmark set; comparisons made to no prefetching and two specific prefetcher baselines. Benchmark details not included in abstract.

high positive Agentic Architect: An Agentic AI Framework for Architecture ... task_completion_time
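The percentages in the prefetcher claim above are easiest to reproduce as differences between the quoted speedups (in percentage points of the no-prefetching baseline) rather than as ratios between them. That reading is an inference from the arithmetic, not something the abstract states. The helper names below are hypothetical; only the three speedup figures come from the claim.

```python
def gap_in_points(evolved: float, baseline: float) -> float:
    """Difference between two speedups, in percentage points."""
    return (evolved - baseline) * 100


def relative_gain(evolved: float, baseline: float) -> float:
    """Gain of one speedup over another, as a percentage of the baseline."""
    return (evolved / baseline - 1) * 100


# Geomean IPC speedups over no prefetching, as quoted in the claim.
evolved, seed, sms = 1.76, 1.59, 1.55

# Percentage-point gaps match the quoted 17% and 21%.
print(round(gap_in_points(evolved, seed)), round(gap_in_points(evolved, sms)))

# Relative gains are smaller: roughly 10.7% and 13.5%.
print(round(relative_gain(evolved, seed), 1), round(relative_gain(evolved, sms), 1))
```

The same check applies to the branch-predictor and cache-replacement claims, where the quoted 1.5% and 0.6% also equal the point gaps between the listed speedups.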

Our evolved branch predictor achieves a 1.100x geomean IPC speedup over Bimodal, 1.5% over its Hashed Perceptron seed (1.085x).

Reported experimental result comparing geomean IPC across benchmark set; compared to Bimodal and Hashed Perceptron seed as baselines. Benchmark details not given in abstract.

high positive Agentic Architect: An Agentic AI Framework for Architecture ... task_completion_time

Our best evolved cache replacement design achieves a 1.062x geomean IPC speedup over LRU, 0.6% over Mockingjay (1.056x).

Reported experimental result comparing geomean IPC across benchmark set; exact benchmark count/split not provided in abstract. Comparison reported against LRU and Mockingjay baselines.

high positive Agentic Architect: An Agentic AI Framework for Architecture ... task_completion_time

Across cache replacement, data prefetching, and branch prediction, Agentic Architect matches or exceeds state-of-the-art designs.

Experimental evaluation across three microarchitectural component domains (cache replacement, prefetching, branch prediction) reported in the paper with comparative performance results versus baselines.

high positive Agentic Architect: An Agentic AI Framework for Architecture ... task_completion_time

We introduce Agentic Architect, an agentic AI framework for computer architecture design exploration and optimization that combines LLM-driven code evolution with cycle-accurate simulation.

Authors' description of the system and methodology in the paper (introduction and methods). No numeric sample size reported in the abstract; evidence is the implemented framework and accompanying descriptions; authors state it is open-source.

high positive Agentic Architect: An Agentic AI Framework for Architecture ... innovation_output

Through targeted prompting inspired by these findings, we modify agents' negotiation behavior and improve win rates from 22.2% to 32.7%.

Intervention experiment reported in the paper where prompts were changed and resulting agent win rates were measured.

high positive Cooperate to Compete: Strategic Coordination in Multi-Agent ... agent win rate

In clinical utility evaluation across three abstraction tasks, semantic search reduced time-to-completion by 24 to 89% compared to clinician-performed chart review.

Clinical utility assessment compared chart abstraction efficiency across three tasks and reported percentage reductions in time-to-completion ranging from 24% to 89%.

high positive Health System Scale Semantic Search Across Unstructured Clin... time-to-completion
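A percentage reduction in time-to-completion can be restated as a speedup factor, which makes the two ends of the reported range easier to compare. The sketch below is plain arithmetic on the 24% and 89% figures; the function name is hypothetical and nothing here comes from the paper's own analysis.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage time reduction into a speedup factor.

    If the new time is (1 - r) of the old time, the task runs 1 / (1 - r) times faster.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1 / (1 - reduction_pct / 100)


for pct in (24, 89):
    print(pct, round(speedup_from_reduction(pct), 2))
# 24 1.32
# 89 9.09
```

Under this conversion the reported range spans from about 1.3 times faster to about 9 times faster than clinician-performed chart review.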

Qwen3 embeddings with 300-token chunk size achieved 94.6% accuracy on a clinical question-answering benchmark.

Optimization experiment on a physician-authored clinical question-answering benchmark; best-performing configuration reported as qwen3 embeddings with 300-token chunks and 94.6% accuracy.

high positive Health System Scale Semantic Search Across Unstructured Clin... accuracy_on_clinical_question_answering_benchmark

The system delivers sub-second query latency: median 237 ms single-user, 451 ms at 20-user concurrency.

Full-scale performance characterization reported exact median latencies for single-user and 20-user concurrency.

high positive Health System Scale Semantic Search Across Unstructured Clin... query_latency

Technological advancement alone is insufficient—maximizing AI's economic potential requires strategic investments in workforce capability development (e.g., widespread AI fluency programs and targeted cultivation of higher-order judgment skills).

Policy recommendation based on the article's synthesis of task-based models and empirical literature; the excerpt does not report specific interventions, trials, or sample sizes.

high positive AI as Augmentation: How Human Capital Shapes Technology's Im... effectiveness of workforce capability investments for realizing AI-driven produc...

The supply of AI-literate workers amplifies productivity gains.

Stated as a mechanism in the task-based model synthesis; described qualitatively in the article without specific empirical method or sample sizes in the excerpt.

high positive AI as Augmentation: How Human Capital Shapes Technology's Im... productivity gains from AI adoption

Aggregate productivity improvements from AI advancement depend critically on two forms of human capital: specialized AI expertise and complementary non-AI skills.

Claim is presented as a theoretical result drawn from 'task-based economic models' in the article; empirical corroboration is referenced generally but no specific datasets or sample sizes are reported in the excerpt.

high positive AI as Augmentation: How Human Capital Shapes Technology's Im... aggregate productivity improvements

Mounting empirical evidence indicates AI primarily functions as augmentation technology—amplifying human capabilities rather than replacing workers.

Article states it draws on 'mounting empirical evidence' and synthesizes recent theoretical and empirical findings; no specific studies, methods, or sample sizes are cited in the excerpt.

high positive AI as Augmentation: How Human Capital Shapes Technology's Im... degree of workforce displacement versus augmentation (replacement vs. amplificat...

The proposed approach reframes AI control from optimizing decisions to governing their admissibility, introducing a protocol-level abstraction that operates independently of model architecture or training methodology.

Conceptual argument and proposal in the paper asserting architecture-agnostic protocol abstraction. No empirical tests across architectures or training methods reported.

high positive Right-to-Act: A Pre-Execution Non-Compensatory Decision Prot... shift in control paradigm (from decision optimization to admissibility governanc...

Through a scenario-based case study, we demonstrate how identical AI outputs can lead to divergent outcomes when evaluated under a Right-to-Act protocol, preserving reversibility and preventing premature or irreversible actions.

Scenario-based case study (illustrative demonstration). The paper reports example scenarios rather than empirical experiments; no sample size or quantitative evaluation reported.

high positive Right-to-Act: A Pre-Execution Non-Compensatory Decision Prot... divergent outcomes from identical AI outputs under the protocol; preservation of...

Unlike compensatory systems, where high-confidence signals can override failed conditions, the proposed framework enforces strict structural constraints: if any required condition is unmet, execution is halted or deferred.

Conceptual distinction and protocol rule specification in the paper (formal description of non-compensatory enforcement). No empirical testing reported.

high positive Right-to-Act: A Pre-Execution Non-Compensatory Decision Prot... whether execution proceeds when required conditions are unmet (halt/defer behavi...
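The contrast between compensatory and non-compensatory evaluation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Right-to-Act protocol itself: the `Condition` class, the condition names, the scores, the averaging rule, and the 0.7 threshold are all hypothetical, and deferral is not modeled separately from halting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    satisfied: bool
    score: float  # hypothetical confidence-like signal in [0, 1]


def compensatory_decision(conditions, threshold=0.7):
    """Average the scores, so strong signals can offset a failed condition."""
    mean = sum(c.score for c in conditions) / len(conditions)
    return "execute" if mean >= threshold else "halt"


def non_compensatory_decision(conditions):
    """Halt if any required condition is unmet, regardless of other scores."""
    unmet = [c.name for c in conditions if not c.satisfied]
    return ("halt", unmet) if unmet else ("execute", [])


checks = [
    Condition("authorization", True, 0.99),
    Condition("reversibility", False, 0.30),
    Condition("data_freshness", True, 0.95),
]

print(compensatory_decision(checks))      # execute
print(non_compensatory_decision(checks))  # ('halt', ['reversibility'])
```

With identical inputs the averaged rule lets the action through, while the strict rule stops on the single unmet condition and reports which one failed.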

We introduce the Right-to-Act protocol, a deterministic, non-compensatory pre-execution decision layer that evaluates whether an AI-generated decision is permitted to be realized at all.

Proposed method / conceptual contribution and formal definition provided in the paper (formalization and protocol specification). No empirical validation or sample size reported.

high positive Right-to-Act: A Pre-Execution Non-Compensatory Decision Prot... eligibility/admissibility of AI-generated decisions prior to execution

Taken together, these insights provide theoretical clarity and practical guidance for responsible GenAI integration into creative work.

Authors' stated contribution and practical recommendations derived from the conceptual framework; no empirical evaluation of guidance effectiveness provided.

high positive Beyond the Creativity Paradox: A Theory-informed Framework f... theoretical clarity and practical guidance for responsible GenAI integration

The study reinterprets process-oriented creativity theories through structural parallels with GenAI.

Conceptual reanalysis and theoretical reinterpretation based on literature synthesis (paper's theoretical contribution).

high positive Beyond the Creativity Paradox: A Theory-informed Framework f... process-oriented creativity theory reinterpretation

The authors propose a role-based integration model that aligns GenAI capabilities with key creative functions: idea generation, synthesis, strategic framing, and facilitation.

Presentation of a novel conceptual model / framework in the paper (theoretical design); no empirical validation or measured outcomes reported.

high positive Beyond the Creativity Paradox: A Theory-informed Framework f... alignment of GenAI capabilities with creative functions (idea generation, synthe...

The paper repositions GenAI as a cognitive collaborator rather than merely a productivity tool.

Argumentative / conceptual claim supported by the proposed theoretical reframing and role-based model in the paper; no empirical testing reported.

high positive Beyond the Creativity Paradox: A Theory-informed Framework f... role of GenAI in organizational workflows (cognitive collaborator vs productivit...

There are structural parallels between GenAI architectures and human cognition—such as heuristic search, divergent thinking, and iterative refinement.

Conceptual mapping and theoretical comparison between GenAI architecture characteristics and cognitive/creativity constructs presented in the paper (literature synthesis / theoretical argument).

high positive Beyond the Creativity Paradox: A Theory-informed Framework f... structural parallels between GenAI architectures and human cognition (heuristic ...

The study revisits foundational creativity theories to develop a framework for integrating GenAI into creative workflows.

Paper describes a conceptual review and theoretical synthesis of foundational creativity theories leading to a proposed integration framework; methodological (theoretical / conceptual) contribution rather than empirical validation.

high positive Beyond the Creativity Paradox: A Theory-informed Framework f... framework for integrating GenAI into creative workflows

Generative Artificial Intelligence (GenAI) is reshaping organisational creativity by emulating cognitive processes traditionally associated with human innovation.

Paper's theoretical argument and literature-grounded conceptual claims (conceptual analysis / literature review); no empirical sample or quantitative data reported.

high positive Beyond the Creativity Paradox: A Theory-informed Framework f... organisational creativity

That compliance layer can improve oversight by making departures from law easier to detect.

Claim supported by the paper's analytical argumentation (no empirical evidence reported).

high positive AI Governance under Political Turnover: The Alignment Surfac... detectability of departures from law (oversight effectiveness)

For probabilistic AI to be incorporated into public administration it must be embedded in a compliance layer that makes decisions reviewable, repeatable, and legally defensible.

Stated as a normative/architectural claim in the paper; supported by conceptual argument rather than empirical testing.

high positive AI Governance under Political Turnover: The Alignment Surfac... requirements for legal/administrative incorporation of probabilistic AI

Governments are increasingly interested in using AI to make administrative decisions cheaper, more scalable, and more consistent.

Stated as background motivation in the paper (no empirical data or sample size reported).

high positive AI Governance under Political Turnover: The Alignment Surfac... government interest in AI adoption for administrative decisions (cost, scale, co...

There is an open opportunity to support collaborative construction where users and AI jointly develop an evolving knowledge representation.

Paper's stated research opportunity and motivation based on gaps identified in prior tools and systems (conceptual argument).

high positive MindTrellis: Co-Creating Knowledge Structures with AI throug... potential benefits of joint user-AI collaborative knowledge representation (prop...

In a user study where 12 participants created slide decks, MindTrellis outperformed retrieval-only baselines in knowledge organization and cognitive load, as measured by expert ratings of content coverage and structural quality.

Controlled user study reported in the paper: N = 12 participants performing slide-deck creation tasks; outcomes assessed via expert ratings of content coverage and structural quality (comparison to retrieval-only baseline).

high positive MindTrellis: Co-Creating Knowledge Structures with AI throug... knowledge organization and cognitive load (operationalized via expert ratings of...

MindTrellis is an interactive visual system where users and AI collaboratively build a dynamic knowledge graph; users can query the graph for document-grounded information and contribute by introducing new concepts, modifying relationships, and reorganizing the hierarchy.

System design and implementation described in the paper (feature description and demonstration).

high positive MindTrellis: Co-Creating Knowledge Structures with AI throug... system capability to support collaborative construction and manipulation of a dy...

Generative artificial intelligence (genAI) is rapidly reshaping how knowledge and culture are produced and consumed.

Author's descriptive statement based on observed changes in production/consumption patterns (no empirical sample reported in paper abstract).

high positive Generative artificial intelligence reduces social welfare th... production and consumption of knowledge and culture

The deployment reduced variability in solder-joint quality and cycle time.

Abstract statement that variability in solder-joint quality and cycle time was reduced during the deployment (no quantitative variability metrics provided in the abstract).

high positive Learning-augmented robotic automation for real-world manufac... variability of solder-joint quality; variability of cycle time

The system maintained near-human takt time.

Abstract claim comparing the system's cycle/takt time to human performance during the deployment (no numeric takt-time comparison provided in the abstract).

high positive Learning-augmented robotic automation for real-world manufac... takt time (cycle time) relative to human workers

The system achieved a 99.4% pass rate on product-level quality-control tests.

Reported QC pass rate from the production run in the abstract (presumably based on the produced motors).

high positive Learning-augmented robotic automation for real-world manufac... product-level quality-control pass rate

The system operated without physical fencing.

Abstract statement that the run occurred "without physical fencing" (implying operation around people without traditional fences).

high positive Learning-augmented robotic automation for real-world manufac... use of physical fences for safety (absent)

The system produced 108 motors.

Count of products produced during the continuous run reported in the abstract.

high positive Learning-augmented robotic automation for real-world manufac... number of motors produced during the run

The system operated continuously for 5 h 10 min.

Reported continuous operation duration from the production run described in the abstract.

high positive Learning-augmented robotic automation for real-world manufac... continuous operational time without interruption

The system required less than 20 min of real-world data per task.

Reported training data requirement for the deployed tasks in the authors' field experiment (abstract statement).

high positive Learning-augmented robotic automation for real-world manufac... amount of real-world training data per task

With less than 20 min of real-world data per task, the system operated continuously for 5 h 10 min, producing 108 motors without physical fencing and achieving a 99.4% pass rate on product-level quality-control tests.

Single field deployment / production run reported in the paper; numbers reported in the abstract (training data time, continuous operation duration, number of motors produced, fencing status, QC pass rate).

high positive Learning-augmented robotic automation for real-world manufac... training data required; continuous operational duration; production quantity; pr...
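The run duration and motor count in this claim imply an average throughput, which gives a rough sense of scale for the takt-time claim. The sketch below is simple arithmetic on the two quoted figures; it assumes output was spread evenly over the run, which the abstract does not state, and the variable names are hypothetical.

```python
# Figures quoted in the claim: 5 h 10 min of continuous operation, 108 motors.
run_minutes = 5 * 60 + 10
motors = 108

# Average pace, assuming motors were produced evenly across the run.
minutes_per_motor = run_minutes / motors
motors_per_hour = motors / (run_minutes / 60)

print(round(minutes_per_motor, 2))  # 2.87
print(round(motors_per_hour, 1))    # 20.9
```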

We deployed the system on an electric-motor production line to automate deformable cable insertion and soldering under real manufacturing constraints, a step previously performed manually by human workers.

Field deployment on an actual electric-motor production line described by the authors (deployment + task specification).

high positive Learning-augmented robotic automation for real-world manufac... automation of previously manual deformable cable insertion and soldering tasks

We present Learning-Augmented Robotic Automation, a hybrid system that integrates learned task controllers and a neural 3D safety monitor into conventional industrial workflows.

Description of the system developed by the authors (system design/development reported in the paper).

high positive Learning-augmented robotic automation for real-world manufac... integration of learned controllers and 3D safety monitoring

Self-correction should be treated not as a default behavior, but as a control decision governed by measurable error dynamics.

Synthesis of theoretical framing (Markov model and diagnostic inequality) and empirical results across multiple models/datasets showing thresholds and promptability of EIR.

high positive When Does LLM Self-Correction Help? A Control-Theoretic Mark... policy/recommendation about when to enable iterative self-correction to improve ...
