Evidence (6491 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Human-AI Collab

In the short run, with fixed human capital, wages, and job boundaries, AI raises productivity by reducing the time required to perform steps.

The model distinguishes short-run (fixed job design and skills) from long-run horizons; in the short-run optimization, AI reduces expected execution times for steps, thereby raising productivity.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation time required to complete production steps (task completion time)

Aggregating heterogeneous firms that deploy a commonly available AI technology yields an aggregate production function that admits a constant elasticity of substitution (CES) representation with three inputs: aggregate manual labor, aggregate AI-assisted labor, and aggregate capital.

Theoretical aggregation argument drawing on Houthakker (1955) and Levhari (1968), deriving a macro-level CES representation from a microfounded algorithmic cost function defined by firms' joint optimization over AI deployment and job design.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation form of the aggregate production function (CES representation and separability o...
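
For orientation, a generic three-input CES aggregate can be written as below. This is the textbook form, not the paper's derived expression: the symbols $L_M$ (aggregate manual labor), $L_A$ (aggregate AI-assisted labor), $K$ (aggregate capital), the share parameters $\alpha_M, \alpha_A, \alpha_K$, the scale term $A$, and the substitution parameter $\rho$ are notation assumed here for illustration only.

```latex
Y = A\left(\alpha_M L_M^{\rho} + \alpha_A L_A^{\rho} + \alpha_K K^{\rho}\right)^{1/\rho},
\qquad \sigma = \frac{1}{1-\rho}, \quad \rho < 1,\ \rho \neq 0 .
```

Here $\sigma$ is the elasticity of substitution between any pair of inputs, which is constant by construction in this form.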

Improvements in AI quality generate non-linear effects on labor demand and wages because firms' cost-minimizing AI deployment and job designs change discretely at particular AI quality thresholds (microfoundation for the productivity J-curve).

Theoretical analysis of discrete switches in the cost-minimizing arrangement as AI success probability and execution times change; characterization of threshold effects and discussion linking to the J-curve phenomenon (model results and comparative statics).

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation labor demand and wages response to AI quality improvements (non-linear threshold...

Adjacency to AI-executed steps increases the likelihood that a given step is executed by AI (local complementarities): a step is more likely to be AI-executed in occupations where its neighboring steps are also AI-executed.

Empirical comparisons of conceptually similar steps across occupations paired with workflow adjacency information and realized AI execution outcomes from Anthropic’s Economic Index; statistical tests reported in the paper.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation probability (or likelihood) that a step is AI-executed conditional on neighborin...

AI-executed steps co-occur in contiguous chains rather than being randomly scattered across a production workflow.

Empirical analysis linking O*NET tasks to human assessments of AI exposure (Eloundou et al., 2024), realized AI execution outcomes from Anthropic’s Economic Index (Handa et al., 2025), and GPT-generated workflow orderings for occupations; statistical tests comparing observed contiguity to random/scaled baselines reported in the paper.

high positive Chaining Tasks, Redefining Work: A Theory of AI Automation contiguity of AI-executed steps in occupation workflows
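
One way to picture a contiguity test of this kind is a simple permutation check: count adjacent pairs of AI-executed steps in a workflow and compare that count with random reorderings of the same steps. The sketch below is an illustration under assumptions, not the paper's procedure; the function names, the 10-step workflow, and the choice of adjacent-pair count as the statistic are all hypothetical.

```python
import random


def adjacent_ai_pairs(steps):
    """Count neighboring positions where both steps are AI-executed (True)."""
    return sum(1 for a, b in zip(steps, steps[1:]) if a and b)


def contiguity_p_value(steps, n_shuffles=10_000, seed=0):
    """Return the observed pair count and the share of random reorderings
    that show at least as many adjacent AI-executed pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = adjacent_ai_pairs(steps)
    shuffled = list(steps)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if adjacent_ai_pairs(shuffled) >= observed:
            hits += 1
    return observed, hits / n_shuffles


# Hypothetical workflow: True marks an AI-executed step.
workflow = [False, False, True, True, True, True, False, False, False, False]
observed, p_value = contiguity_p_value(workflow)
```

A small `p_value` here would mean that random placement rarely produces a chain as contiguous as the observed one. The paper's own baselines (described as random/scaled) may be constructed differently.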

Platforms should implement distribution algorithms sensitive to AI-generated content (AIGC) and precise governance frameworks to ensure the long-term health of online content platforms.

Policy/recommendation derived from the paper's empirical findings on consumption preferences, producer behaviors, and the moderating role of distribution algorithms.

high positive Scale over Preference: The Impact of AI-Generated Content on... long-term platform health (qualitative recommendation target)

AIGC creators achieve aggregate engagement comparable to HGC creators by producing content at high volume (a 'scale-over-preference' dynamic).

Analysis of creation and engagement patterns in the dataset showing that AIGC creators compensate for lower per-item engagement by higher production volume, yielding comparable aggregate engagement levels to HGC creators.

high positive Scale over Preference: The Impact of AI-Generated Content on... aggregate engagement per creator (total engagement across produced items)

Consumers show a marked preference for Human-Generated Content (HGC) over Artificial Intelligence-Generated Content (AIGC).

Comparative analysis of consumption behavior in the longitudinal dataset; the paper reports consumption metrics that indicate higher consumer preference for HGC versus AIGC (e.g., relative engagement per item).

high positive Scale over Preference: The Impact of AI-Generated Content on... consumer preference (relative engagement per content type)

AI facilitates access to distant knowledge domains.

Theoretical model (Schumpeterian quality-ladder recombinant-innovation framework). The paper models R&D as recombining ideas across a knowledge space and shows analytically that AI increases firms' ability to combine ideas across longer distances.

high positive Bridging Distant Ideas: the Impact of AI on R&D and Recombin... access to distant knowledge domains (distance of recombinations)

A statistical recalibration technique called conformal prediction can correct this overconfidence, expanding the intervals to achieve the intended coverage.

Application of conformal prediction to the LLM interval outputs in the experiment, resulting in expanded intervals that attain the target coverage.

high positive Bayesian Elicitation with LLMs: Model Size Helps, Extra "Rea... coverage of recalibrated credible intervals (post-conformal prediction)
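
The recalibration idea can be sketched with split conformal prediction applied to stated intervals: on a calibration set with known answers, measure how far each truth falls outside its interval, then widen new intervals by a quantile of those distances. The excerpt does not say which conformal variant the paper uses, so the score below (in the style of conformalized quantile regression), the function names, and the calibration numbers are assumptions for illustration.

```python
import math


def conformal_margin(lowers, uppers, truths, alpha=0.1):
    """Margin q such that [lower - q, upper + q] targets 1 - alpha coverage."""
    # Nonconformity score: distance by which the truth falls outside the
    # stated interval (negative when the truth lies inside it).
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(lowers, uppers, truths)
    )
    n = len(scores)
    # Finite-sample-corrected rank. Capping at n is a simplification: strictly,
    # a rank above n calls for an unbounded interval.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]


def recalibrate(lower, upper, margin):
    """Widen (or, for a negative margin, shrink) an interval symmetrically."""
    return lower - margin, upper + margin


# Hypothetical calibration set: stated intervals and the true values.
lowers = [10, 20, 30, 40, 50]
uppers = [12, 22, 32, 42, 52]
truths = [11, 25, 28, 41, 58]

margin = conformal_margin(lowers, uppers, truths, alpha=0.2)
new_interval = recalibrate(30, 32, margin)
```

With overconfident (too narrow) intervals, many scores are positive, so the margin is positive and the recalibrated intervals expand, which matches the direction of the effect described above.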

Larger, more capable models produce more accurate estimates.

Empirical experiment asking eleven LLMs to estimate population statistics (health prevalence rates, personality trait distributions, labor market figures) and comparing accuracy across models of different capability.

high positive Bayesian Elicitation with LLMs: Model Size Helps, Extra "Rea... accuracy of population-statistic estimates

The paper proposes five architectural requirements for genuine human oversight systems.

Stated methodological/prescriptive contribution of the paper (a proposal rather than an empirical finding); no sample size or empirical validation reported in the provided excerpt.

high positive Beyond Symbolic Control: Societal Consequences of AI-Driven ... design requirements for systems enabling genuine human oversight

The proposed framework outlines a pathway toward large-scale cooperative intelligence and offers a constructive perspective on the coevolution of human and artificial agents in the informational ecosystems of the future.

Claim about the paper's contribution; based on conceptual synthesis and theoretical framing rather than empirical validation.

high positive A Case for Coevolution emergence of large-scale cooperative intelligence

A voluntary ecosystem of free rational agents, human and artificial, who cooperate through transparent and fair exchange of information maximizes their adaptive capacity and long-term well-being.

Normative proposition in the paper derived from theoretical principles (information theory, collective intelligence); presented as a proposed ideal rather than an empirically tested policy.

high positive A Case for Coevolution adaptive capacity and long-term well-being of participating agents

Emerging opportunities exist for stabilizing these ecosystems through new forms of informational verification and monitoring made possible by advanced artificial agents.

Forward-looking claim grounded in conceptual analysis of capabilities of advanced agents; proposed as an opportunity in the paper rather than demonstrated empirically.

high positive A Case for Coevolution stability of informational ecosystems via verification and monitoring tools

Systems that preserve diversity of exploration while minimizing barriers to information exchange exhibit superior capacity for discovery and adaptation in complex environments.

Theoretical claim supported by the paper's appeal to principles from information theory, adaptive systems, and collective intelligence; presented as an argument rather than as empirically validated result.

high positive A Case for Coevolution capacity for discovery and adaptation

Increasing the strictness of algorithmic control paradoxically increases the evolutionary fitness of coordinated resistance (e.g., coordinated log-offs).

Results from the evolutionary game theory (EGT) model and simulations showing fitness/payoff changes for coordinated resistance strategies as the platform surveillance strictness parameter increases; model-only (no empirical N reported).

high positive THE RED QUEEN in the DASHBOARD: CO-EVOLUTIONARY DYNAMICS of ... evolutionary fitness (payoff) of coordinated resistance strategies

The future of transformative transformer-based AI is fundamentally many, not one.

Concluding synthesis and normative prediction based on the paper's theoretical arguments and literature synthesis; no empirical data or quantified projection provided in the excerpt.

high positive The Future of AI is Many, Not One architectural and organizational form of future transformative AI (multi-agent/d...

Developing diverse AI teams addresses critics' concerns that current models are constrained by past data and lack the creative insight required for innovation.

Argumentative claim drawing on conceptual critique of current models and the proposed remedy of diverse AI teams; supported by referenced disciplinary literatures but no empirical validation provided in the excerpt.

high positive The Future of AI is Many, Not One creative insight and capacity for innovation in AI systems

Having a diverse team broadens the search for solutions, delays premature consensus, and allows for the pursuit of unconventional approaches.

Theoretical/argumentative claim referencing literature in complex systems and organizational behavior as support; no quantitative evidence or sample reported in the excerpt.

high positive The Future of AI is Many, Not One search breadth, timing of consensus formation, and pursuit of unconventional sol...

Deep intellectual breakthroughs should be expected to come from epistemically diverse groups of AI agents working together rather than singular superintelligent agents.

Predictive/theoretical claim motivated by referenced research and formal results in complex systems, organizational behavior, and philosophy of science; no empirical experiment or sample size given in the excerpt.

high positive The Future of AI is Many, Not One occurrence of deep intellectual breakthroughs (scientific/innovative discoveries...

We should abandon the individual approach if we're hoping for AI to support groundbreaking innovation and scientific discovery.

Normative prescription based on theoretical argument and synthesis of literature from complex systems, organizational behavior, and philosophy of science; no empirical trial or quantified evaluation reported in the excerpt.

high positive The Future of AI is Many, Not One ability of AI to support groundbreaking innovation and scientific discovery

With further development, this approach may exceed traditional methods in risk-assessment accuracy and help drive innovation in the insurance industry.

Forward-looking claim by the authors extrapolating from current prototype results and potential improvements; no empirical evidence provided that it already exceeds traditional methods.

high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... risk assessment accuracy and industry innovation

ARQuest shows great potential to improve user satisfaction and streamline insurance processes.

Interpretation based on experimental findings (fewer questions, user preference) and the proposed framework; forward-looking claim rather than a fully established empirical result.

high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... user satisfaction and process streamlining

Adaptive versions were preferred by users for their more fluid and engaging experience.

User preference reported from the experiments (qualitative/user feedback or preference metric); specific measures and sample size not provided in excerpt.

high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... user preference / perceived fluidity and engagement

Adaptive versions powered by GPT models required fewer questions.

Experimental result reported in paper comparing question counts between adaptive GPT-powered questionnaires and traditional questionnaires; no numeric counts or sample sizes provided in the excerpt.

high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... number of questions required (survey length / task completion effort)

Techniques such as social media image analysis, geographic data categorization, and Retrieval Augmented Generation (RAG) are used to extract meaningful user insights and guide targeted follow-up questions.

Described methods/techniques used within the ARQuest system implementation in the paper.

high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... ability to extract user insights and guide follow-up questions

The ARQuest framework introduces a new approach to underwriting by using Large Language Models (LLMs) and alternative data sources to create personalized and adaptive questionnaires.

Methodological contribution described in the paper (framework design); description of components and intended function rather than a quantified outcome.

high positive AI in Insurance: Adaptive Questionnaires for Improved Risk P... personalization and adaptiveness of questionnaires

Only interventions that reshape risk allocation can plausibly shift stable system-level behaviour.

Argument based on the paper's game-theoretic reasoning and stylised example (theoretical claim; no empirical testing reported in the abstract).

high positive Incentives, Equilibria, and the Limits of Healthcare AI: A G... ability of interventions to shift stable system-level behaviour

Artificial intelligence (AI) is widely promoted as a promising technological response to healthcare capacity and productivity pressures.

Author assertion in the paper's introduction/abstract, based on literature/policy discourse (no empirical sample or quantitative analysis reported in the abstract).

high positive Incentives, Equilibria, and the Limits of Healthcare AI: A G... promotion of AI as a solution to healthcare capacity and productivity pressures

We open-source the complete benchmark, including scenario specifications, ground truth templates, tool implementations, and evaluation scripts.

Paper statement committing to open-sourcing the benchmark components and artifacts.

high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... availability of open-source benchmark artifacts

We evaluated leading agent frameworks (ReAct, Cursor Agent, Claude Code) paired with frontier LLMs (Claude Sonnet 4.0, GPT-4o, Granite-3.0-8B).

Paper reports extensive evaluations using the listed agent frameworks and LLM models paired together to run the benchmark scenarios.

high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... evaluation coverage across agent frameworks and LLMs

Execution-based evaluators were implemented with task-commensurate metrics: MAE/RMSE for regression, F1-score for classification, and categorical matching for health assessments.

Paper statement describing the evaluation methodology and the specific metrics used for regression, classification and health-assessment tasks.

high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... metricized evaluation of model outputs (MAE/RMSE, F1, categorical matching)
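
The metrics named above are standard and can be written out in a few lines. The sketch below shows their usual definitions in plain Python; it is not the benchmark's evaluation script. The binary form of F1 and the case-insensitive exact match used for categorical matching are assumptions, since the excerpt does not specify averaging or matching rules.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error for regression outputs (e.g. RUL estimates)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def categorical_match(expected, actual):
    """Assumed rule: labels match if equal after trimming and lowercasing."""
    return expected.strip().lower() == actual.strip().lower()
```

For example, `mae([100, 80], [90, 90])` and `rmse([100, 80], [90, 90])` both evaluate to `10.0`, because every error has the same magnitude; the two metrics diverge once errors differ in size.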

We construct 65 specialized tools across two MCP servers to enable interactions for the benchmark.

Paper statement reporting the number of specialized tools (65) and that they are deployed across two MCP servers as part of the benchmark implementation.

high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... number of specialized tools and server deployment

The benchmark encompasses 75 expert-curated scenarios spanning 7 industrial asset classes (including turbofan engines, bearings, electric motors, gearboxes, and aero-engines) across 5 core task categories: Remaining Useful Life (RUL) Prediction, Fault Classification, Engine Health Analysis, Cost-Benefit Analysis, and Safety/Policy Evaluation.

Explicit statement in paper listing the number of scenarios (75), number of asset classes (7) and enumerating the 5 task categories; benchmark construction described by authors.

high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... count and coverage of benchmark scenarios, asset classes, and task categories

PHMForge is the first comprehensive benchmark specifically designed to evaluate LLM agents on Prognostics and Health Management (PHM) tasks through realistic interactions with domain-specific MCP servers.

Paper statement introducing PHMForge as a benchmark and describing its construction to evaluate LLM agents via MCP servers; benchmark implementation is presented in the manuscript.

high positive PHMForge: A Scenario-Driven Agentic Benchmark for Industrial... availability of a domain-specific benchmark for LLM agents

Design implication: adaptive AI coaching systems should align support intensity with individual readiness, rather than assuming universal effectiveness.

Authors' design recommendation derived from experimental results showing heterogeneous effects by personality profile.

high positive Not My Truce: Personality Differences in AI-Mediated Workpla... appropriateness of intervention intensity (design recommendation)

The system is in production, serving 21 industry verticals with 650+ agents.

Deployment claim reported in paper (production system metrics: number of verticals and agents).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... production deployment scale (industry verticals served, agent count)

We propose a framework for output-side ontological validation (response validation, reasoning verification, compliance checking).

Proposed framework described in paper (conceptual/procedural proposal; not described as empirically validated in abstract).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... output-side ontological validation capability

We introduce ontology-constrained tool discovery via SQL-pushdown scoring.

Methodological/implementation contribution described in the paper (technical mechanism introduced).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... tool discovery constrained by ontology using SQL-pushdown scoring

Improvements from ontology coupling are greatest where LLM parametric knowledge is weakest—particularly in Vietnam-localized domains.

Observed pattern reported from the controlled experiment across the five industries, with stronger improvements in Vietnam-localized domains (no per-industry sample sizes reported in abstract).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... relative improvement magnitude by domain / localization

Ontology-coupled agents significantly outperform ungrounded agents on Role Consistency (p < .001, W = .614).

Controlled experiment with 600 runs; statistical test reported (p-value and W statistic provided in abstract).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... Role Consistency

Ontology-coupled agents significantly outperform ungrounded agents on Regulatory Compliance (p = .003, W = .318).

Controlled experiment with 600 runs; statistical test reported (p-value and W statistic provided in abstract).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... Regulatory Compliance

Ontology-coupled agents significantly outperform ungrounded agents on Metric Accuracy (p < .001, W = .460).

Controlled experiment with 600 runs; statistical test reported (p-value and W statistic provided in abstract).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... Metric Accuracy

We formalize the concept of asymmetric neurosymbolic coupling, wherein symbolic ontological knowledge constrains agent inputs (context assembly, tool discovery, governance thresholds) while proposing mechanisms for extending this coupling to constrain agent outputs (response validation, reasoning verification, compliance checking).

Theoretical/formalization contribution described in the paper (conceptual and methodological development).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... asymmetric neurosymbolic coupling formalization and proposed mechanisms

Our approach introduces a three-layer ontological framework--Role, Domain, and Interaction ontologies--that provides formal semantic grounding for LLM-based enterprise agents.

Design contribution described in the paper (formal model specification).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... existence of a formal three-layer ontology for semantic grounding

We present a neurosymbolic architecture implemented within the Foundation AgenticOS (FAOS) platform that addresses these limitations through ontology-constrained neural reasoning.

System design and implementation claim: description of architecture and its implementation in the FAOS platform (technical/design evidence reported in paper).

high positive Ontology-Constrained Neural Reasoning in Enterprise Agentic ... ability to constrain LLM reasoning (reduce hallucination, domain drift, improve ...

The analysis identifies seventeen emerging occupational categories benefiting from reinstatement effects, concentrated in human-AI collaboration, AI governance, and domain-specific AI operations roles.

Modeling/taxonomy result reported in the paper listing 17 emerging occupational categories characterized as benefiting from reinstatement effects (human-AI collaboration, governance, operations).

high positive Agentic AI and Occupational Displacement: A Multi-Regional T... emergence/creation of occupational categories (employment opportunities)

Our findings indicate increasing agent activity in open-source projects.

Trend analysis reported in the paper showing growth in agent-originated activity within the assembled dataset of PRs and associated metadata.

high positive Investigating Autonomous Agent Contributions in the Wild: Ac... agent activity / contributions in open-source projects over time

Effective collaboration with AI on software engineering (SE) tasks may benefit from functional design rather than from replicating human socio-emotional intelligence (SEI) traits, thereby redefining collaboration as functional alignment.

Authors' conclusion and recommendation derived from qualitative interview evidence (10 practitioners) and the proposed concept of functional equivalents.

high positive Bridging the Socio-Emotional Gap: The Functional Dimension o... effectiveness of human-AI collaboration in SE tasks
