Evidence (6491 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
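
The counts above can also be read as shares of each row's Total. The following is a minimal Python sketch using three rows copied from the matrix; the function names are illustrative and not part of the site. In some rows the Total is larger than the sum of the four direction columns, and the page does not say how the remainder is classified, so the sketch only reports it.

```python
# Three rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Governance & Regulation": (826, 400, 191, 122, 1563),
    "AI Safety & Ethics": (218, 277, 65, 33, 599),
    "Job Displacement": (12, 80, 20, 1, 113),
}

def direction_shares(row):
    """Return each direction's share of the row's reported Total."""
    *counts, total = row
    names = ("positive", "negative", "mixed", "null")
    return {name: count / total for name, count in zip(names, counts)}

def unclassified(row):
    """Claims counted in Total but in none of the four direction columns."""
    *counts, total = row
    return total - sum(counts)

for outcome, row in ROWS.items():
    shares = direction_shares(row)
    print(f"{outcome}: {shares['positive']:.1%} positive, "
          f"{shares['negative']:.1%} negative, "
          f"{unclassified(row)} outside the four columns")
```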

Filter: Human-AI Collaboration

The research contributes a novel sociotechnical architecture class that integrates intent interpretation, schema formalization, and supervised agentic decision support, offering a scalable pathway for inclusive AI-driven enterprise transformation.

Concluding/novelty claim presented by the authors describing the paper's contribution; scalability and inclusiveness are asserted conceptually without empirical scaling or adoption evidence in the excerpt.

high positive From Configuration to Cognition: A Self-Configuring Agentic ... scalability and inclusiveness of AI-driven enterprise transformation via the pro...

The architecture incorporates adaptive agentic orchestration and Cognitive Infrastructure Elasticity, enabling dynamic policy adjustment under demand volatility while preserving human-supervisory governance.

Architectural/design claim in the paper describing system capabilities; no experimental or empirical validation provided in the excerpt.

high positive From Configuration to Cognition: A Self-Configuring Agentic ... capacity for dynamic policy adjustment under demand volatility and preservation ...

The framework operationalizes Intent-to-Schema automation, translating natural-language business intent into structured operational models and reducing configuration debt embedded in traditional metadata-driven systems.

Described as a functionality of the proposed framework in the paper (conceptual/technical claim); no quantitative evaluation or measured reduction of 'configuration debt' reported in the excerpt.

high positive From Configuration to Cognition: A Self-Configuring Agentic ... translation of natural-language intent to operational schemas and reduction of c...

This paper introduces a Self-Configuring Agentic CRM (SC-ACRM) architecture designed to eliminate configuration barriers in micro-retail contexts.

Architectural proposal and description presented in the paper (design-level contribution); no field deployment or empirical validation reported in the excerpt.

high positive From Configuration to Cognition: A Self-Configuring Agentic ... elimination/reduction of configuration barriers for micro-retail CRM

Artificial intelligence (AI) has significantly enhanced enterprise-scale customer relationship management (CRM) systems.

Stated as background/claim in the paper's introduction; no empirical data, sample size, or citations provided in the excerpt.

high positive From Configuration to Cognition: A Self-Configuring Agentic ... enhancement of enterprise-scale CRM systems

An AI Workflow Store of hardened and reusable workflows would allow agents to invoke workflows with far greater reliability and security than improvised tool chains.

Vision/proposal in the paper advocating an AI Workflow Store as a solution; presented conceptually without experimental or deployment evidence.

high positive Engineering Robustness into Personal Agents with the AI Work... reliability and security of agent-invoked workflows

Integrating rigorous software engineering processes into the agentic loop will produce production-grade, hardened, and deterministically-constrained agent workflows that substantially outperform brittle on-the-fly synthesis.

Prescriptive claim / proposed hypothesis in the paper advocating integration of SE practices into agent workflows; offered as a reasoned proposal without empirical results.

high positive Engineering Robustness into Personal Agents with the AI Work... workflow reliability/security and overall performance compared to on-the-fly syn...

Unlike existing datasets, our benchmark utilizes a seed-driven architecture to simulate dynamic environment states and unpredictable API failures, ensuring a deterministic yet diverse evaluation.

Methodological description: seed-driven architecture and simulated API failures; claimed as a distinguishing design feature versus prior datasets.

high positive ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... determinism and diversity of environment states / simulated API failure scenario...
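
One way to get an evaluation that is both deterministic and diverse is to derive every random choice, including injected failures, from a per-episode seed. The Python sketch below illustrates that general idea only. The function, its parameters, and the tool names are hypothetical, since the excerpt does not describe the benchmark's actual interface.

```python
import random

def simulate_calls(seed, tools, n_calls=10, failure_rate=0.2):
    """Replay a pseudo-random sequence of tool calls, some of which fail.

    All parameters are hypothetical; the benchmark's real interface is not
    described in the excerpt.
    """
    rng = random.Random(seed)  # private generator: same seed, same episode
    episode = []
    for _ in range(n_calls):
        tool = rng.choice(tools)
        failed = rng.random() < failure_rate  # simulated API failure
        episode.append((tool, "error" if failed else "ok"))
    return episode

# Made-up tool names, for illustration only.
TOOLS = ["calendar.create_event", "ledger.post_entry", "mail.send"]

# Re-running with the same seed reproduces the episode exactly.
assert simulate_calls(7, TOOLS) == simulate_calls(7, TOOLS)
print(simulate_calls(7, TOOLS)[:3])
print(simulate_calls(8, TOOLS)[:3])  # another seed drives a separate episode
```

Using a private `random.Random(seed)` instance rather than the module-level generator keeps an episode reproducible even if other code draws random numbers in between.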

ComplexMCP provides over 300 meticulously tested tools derived from 7 stateful sandboxes, ranging from office suites to financial systems.

Benchmark construction details reported in the paper: >300 tools, 7 stateful sandboxes (explicit counts provided).

high positive ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... number of tools / sandboxes included in the benchmark

We introduce ComplexMCP, a benchmark designed to evaluate agents in rigorous conditions built on the Model Context Protocol (MCP).

Design and construction of the benchmark reported by authors; methodological description (benchmark/tooling claim).

high positive ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdepend... availability of a benchmark implementing MCP for complex, stateful tool evaluati...

The study draws policy implications for EU Cohesion programming and Sustainable Development Goals 4, 8, 9, 10, and 17.

Paper explicitly states policy implications and links to specific SDGs in its conclusions.

high positive Artificial Intelligence, Social Capital, and Sustainable Emp... policy_relevance_to_SDGs_and_cohesion_programming

External technology partnerships, targeted education, and economic incentives operate as enablers [of AI adoption], all mediated by social and human capital availability.

Thematic analysis of interview data identifying these factors as enabling AI adoption, with mediation by social/human capital.

high positive Artificial Intelligence, Social Capital, and Sustainable Emp... enablers_of_AI_adoption

TourMart outputs a sentence a compliance report can quote: 'at this deployment, 7.7 extra commission-steered recommendations per 100 paired traveler sessions.'

Reported output/example from TourMart tool summarizing the measured steering effect (rounded result based on experimental measurement).

high positive TourMart: A Parametric Audit Instrument for Commission Steer... commission-steered recommendations per 100 paired traveler sessions (tool-genera...

An extended-n supplement (n=270) confirms significance for Llama-3.1-8B (+2.96pp, p=0.008).

Larger-sample experimental replication/extension reported in the paper with n=270 and a p-value (p=0.008).

high positive TourMart: A Parametric Audit Instrument for Commission Steer... commission-steered recommendations (percentage-point difference between prompts)

A Llama-3.1-8B reader shows +3.50pp steering in the same direction at n=143 (initial test).

Empirical experiment using TourMart with Llama-3.1-8B at the same deployed settings; sample size explicitly reported as n=143 for this test.

high positive TourMart: A Parametric Audit Instrument for Commission Steer... commission-steered recommendations (percentage-point difference between prompts)

At the deployed settings (lambda=1, kappa=0.05), a Qwen-14B reader shows +7.69pp steering (exact McNemar p=0.003).

Empirical experiment using TourMart at specified governance settings (lambda=1, kappa=0.05) comparing commission-aware vs. minimum-disclosure prompts; statistical test reported (exact McNemar). Sample size not stated in excerpt.

high positive TourMart: A Parametric Audit Instrument for Commission Steer... commission-steered recommendations (percentage-point difference in acceptance be...
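
For readers unfamiliar with the test named in this entry, an exact McNemar test uses only the discordant pairs: sessions where the outcome differs between the two prompts. The Python sketch below shows the standard two-sided exact calculation and the matching percentage-point difference. The counts are hypothetical and are not taken from the paper, and reading a "pair" as one traveler session seen under both prompts is an assumption.

```python
from math import comb

def exact_mcnemar(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: pairs steered only under the commission-aware prompt
    c: pairs steered only under the minimum-disclosure prompt
    Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def steering_pp(b, c, n_pairs):
    """Net steering in percentage points across all paired sessions."""
    return 100 * (b - c) / n_pairs

# Hypothetical counts for illustration, not taken from the paper.
b, c, n_pairs = 15, 4, 200
print(f"steering = {steering_pp(b, c, n_pairs):+.2f}pp, "
      f"p = {exact_mcnemar(b, c):.4f}")
```

Concordant pairs affect the percentage-point estimate through `n_pairs` but not the p-value, which depends only on `b` and `c`.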

Each booking earns the OTA commission and different suppliers pay different rates: the agent has a structural incentive to favor higher-margin recommendations.

Theoretical/structural argument in paper based on commission heterogeneity and revenue incentives; not an experimental measurement in excerpt.

high positive TourMart: A Parametric Audit Instrument for Commission Steer... incentive to favor higher-margin supplier recommendations

BenchCAD positions itself as a benchmark for measuring and improving the industrial readiness of multimodal CAD automation.

Authors' stated goal/purpose in the paper/abstract describing BenchCAD as a benchmark intended to measure and guide improvements towards industrial readiness.

high positive BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... benchmark_intended_impact_on_industrial_readiness

Industrial CAD code generation requires models to produce executable parametric programs from visual or textual inputs and to understand 3D structure, infer engineering parameters, and choose CAD operations that reflect design and manufacture.

Problem definition and motivation provided by the authors in the paper/abstract describing the necessary capabilities for industrial CAD code generation.

high positive BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... required_capabilities_for_task

BenchCAD enables fine-grained analysis across perception, parametric abstraction, and executable program synthesis.

Authors' description of benchmark scope and tasks designed to probe perception (visual understanding), parametric abstraction (inferring engineering parameters), and executable program synthesis (generating runnable CadQuery programs).

high positive BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... analysis_capability_of_benchmark

BenchCAD evaluates models through visual question answering, code question answering, image-to-code generation, and instruction-guided code editing.

Benchmark design described in the paper/abstract listing four evaluation tasks (VQA, code QA, image-to-code, instruction-guided code editing).

high positive BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... evaluation_task_coverage

BenchCAD contains 17,900 execution-verified CadQuery programs across 106 industrial part families.

Dataset construction reported in the paper/abstract: explicit statement of 17,900 execution-verified CadQuery programs spanning 106 industrial part families (e.g., bevel gears, compression springs, twist drills).

high positive BenchCAD: A Comprehensive, Industry-Standard Benchmark for P... presence_and_scope_of_dataset

Alternatives to one-size-fits-all chatbots—such as pluralistic system design, task-specific tools, and institutional safeguards—would better mitigate social and economic harm.

Prescriptive recommendations based on the paper's analysis; not supported by empirical trials or quantified evaluations within the paper.

high positive What if AI systems weren't chatbots? Effectiveness of pluralistic design, task-specific tools, and institutional safe...

Verification Coverage, a six-component reportable standard with a minimum-composition rule, should sit beside capability scores in model cards, leaderboards, and regulatory disclosures.

Author-proposed metric/standard introduced in the paper as a policy/tool recommendation.

high positive The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... inclusion of 'Verification Coverage' standard alongside capability scores in rep...

The gate to deploy should be 'calibrated verification': authorization should be domain-scoped, independently checkable, monitored after release, accountable, contestable, and revocable.

Normative proposal by the authors (prescriptive recommendation presented in the paper).

high positive The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... recommended features of deployment authorization regime

Model capability is uneven across nearby tasks, so authorization must attach to a specific use rather than to a model in general.

Author claim supported by the conceptual point that model capabilities vary across tasks; used as an argument for use-specific authorization.

high positive The Open-Box Fallacy: Why AI Deployment Needs a Calibrated V... appropriateness of use-scoped authorization vs model-wide authorization

Industrial robots influence global value chain length primarily through technological innovation.

Mechanism analysis in the paper linking robot adoption to technological innovation measures and then to GVC length, based on the IFR and 14-subsector panel data; exact innovation indicators and estimation details not provided in the abstract.

high positive Research on the impact of industrial robot application on th... global value chain length (mediated by technological innovation)

Industrial robots influence global value chain length primarily through human capital upgrading.

Mechanism analysis reported in the paper linking robot adoption to changes in human capital (upgrading) and then to changes in GVC length using the same IFR and panel data; specific tests/mediation approaches not detailed in the abstract.

high positive Research on the impact of industrial robot application on th... global value chain length (mediated by human capital upgrading)

Industrial robots promote participation in global production networks within capital-intensive industries (i.e., they increase global value chain length for capital-intensive sectors).

Subsample or heterogeneous-effects analysis across capital-intensive vs. labor-intensive sub-sectors using the panel of 14 Chinese manufacturing sub-sectors; results reported for capital-intensive industries as a positive effect on GVC participation/length.

high positive Research on the impact of industrial robot application on th... participation in global production networks / global value chain length (capital...

The application of industrial robots significantly extends the length of global value chains in manufacturing.

Empirical analysis using IFR robot data and panel data on 14 manufacturing sub-sectors; significance reported in paper (panel regression results). Exact model specifications and significance levels not provided in the abstract.

high positive Research on the impact of industrial robot application on th... global value chain length

Regulatory modernisation, secure national data infrastructure and targeted digital training are essential to enable sustainable innovation in valuation practice.

Policy and practitioner recommendations derived from interview data and thematic analysis; synthesis into prescriptive recommendations.

high positive Exploring barriers to valuation technology adoption in prope... enablers of sustainable VTech innovation

The framework is illustrated with applications in income-based social protection programs and humanitarian demining in Colombia, where the tension between screening costs and allocation efficiency is operationally consequential.

Applied examples / case studies presented in the paper (applications to social protection and humanitarian demining contexts).

high positive The Limits of AI-Driven Allocation: Optimal Screening under ... operational consequences of screening cost vs allocation efficiency trade-off

Efficiency gains from screening grow as the aleatoric uncertainty in the population increases.

Empirical characterization and/or model-based analysis presented in the paper (claims based on theoretical comparative statics and illustrative empirical examples).

high positive The Limits of AI-Driven Allocation: Optimal Screening under ... efficiency gains from screening (improvement in allocation performance)

In a two-stage allocation framework where a screening stage observes true outcomes for a subset of units before a final allocation under a fixed coverage budget, the optimal strategy screens units at the margin of algorithmic allocation while directly targeting the highest-risk units.

Analytical result derived from the paper's two-stage allocation model (theoretical/mathematical analysis of optimal screening and allocation policy).

high positive The Limits of AI-Driven Allocation: Optimal Screening under ... allocation efficiency / optimality of screening and targeting strategy
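
The shape of this strategy can be sketched in a few lines of Python: rank units by predicted risk, allocate directly to the top of the ranking, and spend the screening capacity on units near the coverage cutoff. This is an illustration only. The function name, the symmetric window around the cutoff, and the toy data are assumptions, and the sketch does not model screening costs or reproduce the paper's derived optimum.

```python
def two_stage_allocate(scores, true_need, budget, n_screen):
    """Allocate `budget` slots using scores plus screening at the margin.

    scores: predicted risk per unit; true_need: latent 0/1 status, revealed
    only for screened units. The window placement is an illustrative choice,
    not the paper's derived optimum.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    lo = max(0, budget - n_screen // 2)   # first rank inside the margin
    hi = min(len(ranked), lo + n_screen)  # one past the last screened rank
    direct = ranked[:lo]                  # highest-risk units: no screening
    screened = ranked[lo:hi]              # marginal units: verify true status
    remaining = budget - len(direct)
    # Among screened units, confirmed need comes first, then higher score.
    screened.sort(key=lambda i: (true_need[i], scores[i]), reverse=True)
    return direct + screened[:remaining], screened

# Toy data, invented for illustration.
scores = [0.95, 0.90, 0.62, 0.58, 0.55, 0.51, 0.20, 0.10]
true_need = [1, 1, 0, 1, 0, 1, 0, 0]

chosen, screened = two_stage_allocate(scores, true_need, budget=4, n_screen=4)
score_only = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:4]
print("two-stage reaches", sum(true_need[i] for i in chosen), "units in need")
print("score-only reaches", sum(true_need[i] for i in score_only), "units in need")
```

In this toy case the two highest-scoring units are covered without verification, while screening the four units around the cutoff replaces one false positive with a unit whose need was confirmed.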

Algorithmic targeting is typically cheaper and faster than traditional screening procedures that directly observe the latent vulnerability status through physical verification.

Comparative claim stated in paper introduction; presented as typical advantage of algorithmic targeting (background rationale).

high positive The Limits of AI-Driven Allocation: Optimal Screening under ... cost and speed of targeting procedures

The rise of machine learning has shifted targeted resource allocation in policy and humanitarian settings toward algorithmic targeting based on predicted risk scores.

Descriptive statement in paper introduction; references to the adoption of algorithmic targeting in policy/humanitarian contexts (motivation/background rather than new empirical data).

high positive The Limits of AI-Driven Allocation: Optimal Screening under ... use of algorithmic targeting (shift in allocation method)

The paper articulates a research agenda for how Multi-Agent Social Systems (MASS) should be modeled, evaluated and governed.

Stated in the abstract (position paper concludes with an articulated research agenda); evidence is the discussion and proposed agenda sections in the paper.

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... proposed research directions for modeling, evaluation and governance of MASS

The importance of each structural prior is demonstrated through formal propositions.

Methodological claim in the abstract that the paper provides formal propositions demonstrating the role/importance of the four priors; evidence contained in proofs/propositions within the paper.

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... formal/theoretical demonstration of the role of each structural prior

MASS is represented as a class of dynamical systems of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability.

Descriptive claim from the abstract about the formal structure of MASS; supported by the framework and definitions presented in the paper (formal/modeling content).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... formal representation of multi-agent dynamics via four structural priors

The paper formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact with and influence one another to generate system-level outcomes.

Direct methodological claim in abstract indicating the authors present a formal framework (MASS) in the paper; evidence consists of the formalization provided in the paper (propositions, definitions).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... formal modeling of agent interactions and system-level outcomes

Agentic AI systems must be modeled with social theory as a structural prior.

Normative / prescriptive claim from the paper's abstract (position paper arguing for this modeling choice; supported by the authors' theoretical arguments and formal framework in the paper).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... modeling approach for agentic AI systems (use of social-theory structural priors...

Emergent dynamics of individuals in a social group have long been studied by social scientists in human contexts.

Historical/contextual claim in the abstract; supported by reference to social-science literature (no sample size; general scholarly consensus).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... existence of a body of social-science research on emergent group dynamics

In multi-agent social settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time.

Conceptual claim in the paper's abstract, supported by the paper's argumentation and references to social-science literature on emergent dynamics (formal development likely in main text).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... emergent system-level behavior resulting from agent interactions

Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans (e.g., social media platforms, multi-agent LLM pipelines, autonomous robotics fleets).

Statement from the paper's abstract and motivating examples; implied supporting citation/literature review in the paper (no empirical sample size reported in abstract).

high positive Social Theory Should Be a Structural Prior for Agentic AI: A... deployment prevalence of agentic AI inside social environments (multi-agent sett...

Evaluation indicates improved architectural consistency and deployability compared to general-purpose AI code generation workflows, suggesting that constraint-aware retrieval is essential for aligning AI-assisted service development with production software engineering practices.

Paper reports an evaluation comparing the proposed retrieval-augmented scaffolding approach to general-purpose AI code generation workflows and concludes that the approach improves architectural consistency and deployability; the excerpt does not provide evaluation design details, metrics, or sample size.

high positive Architectural Constraints Alignment in AI-assisted, Platform... architectural consistency and deployability

By combining template retrieval with structured interaction, the method embeds production-relevant considerations during service scaffolding.

Paper's description of the mechanism by which the proposed approach operates (template retrieval + structured interaction) to incorporate production concerns; presented as a design claim without detailed empirical quantification in the excerpt.

high positive Architectural Constraints Alignment in AI-assisted, Platform... embedding of production-relevant considerations in scaffolding

We propose a retrieval-augmented scaffolding approach that combines platform-based code generation with agentic clarification loops to expose and resolve architectural constraint ambiguities.

Methodological contribution described in the paper: a retrieval-augmented scaffolding method combining template retrieval and agentic clarification loops; this is a proposed approach rather than reported empirical proof in the provided text.

high positive Architectural Constraints Alignment in AI-assisted, Platform... exposure and resolution of architectural constraint ambiguities

AI-assisted development tools enable rapid prototyping of services.

Stated assertion in paper's introduction/abstract that AI-assisted tools speed up prototyping; no quantitative evaluation or sample size given in the provided text.

high positive Architectural Constraints Alignment in AI-assisted, Platform... rapid prototyping (development speed/productivity)

The C³ Framework provides implementable design patterns and testable propositions intended to help accounting leaders capture productivity gains from human + AI work while preserving accountability, consistency, and alignment with governance expectations in high-stakes reporting contexts.

Conclusions section stating intended practical utility; presented as intended outcomes of applying the proposed framework, not as empirically demonstrated results in this paper.

high positive Collaborative Intelligence in Accounting: A Human + AI Compl... organizational_efficiency

The paper proposes a role taxonomy that clarifies review responsibility, escalation thresholds, and evidence retention for human–AI collaboration in accounting.

Results section proposing a role taxonomy as part of the C³ Framework; presented as a design artifact derived from synthesis of research and guidance.

high positive Collaborative Intelligence in Accounting: A Human + AI Compl... task_allocation
