Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

We implement an Adversarial Multi-Agent Quality Control (QC) loop in which evaluator agents iteratively critique generated frames and prompt generators to refine outputs until a deterministic consensus is reached.

Method description of a multi-agent adversarial QC loop used in the pipeline; no experimental protocol, number of agents, or sample sizes provided in this sentence.

high positive Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... iterative refinement / consensus-driven quality control
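
The refinement loop described in this entry can be illustrated with a minimal sketch. The entry gives no number of agents, consensus rule, or stopping criterion, so everything below is an assumption for illustration: the two evaluator functions, the dictionary "frame", the unanimity rule, and the `max_rounds` cap are hypothetical stand-ins, not Genflow's actual components.

```python
from typing import Callable, List, Optional, Tuple

Frame = dict                 # hypothetical: a frame is a bag of attributes
Critique = Optional[str]     # None means the evaluator approves the frame


def brand_color_evaluator(frame: Frame) -> Critique:
    """Hypothetical evaluator: objects unless the brand colour is used."""
    return None if frame.get("color") == "brand-blue" else "use brand-blue"


def logo_evaluator(frame: Frame) -> Critique:
    """Hypothetical evaluator: objects unless a logo is present."""
    return None if frame.get("logo") else "add logo"


def refine(frame: Frame, critiques: List[str]) -> Frame:
    """Toy generator: applies each critique deterministically."""
    updated = dict(frame)
    for critique in critiques:
        if critique == "use brand-blue":
            updated["color"] = "brand-blue"
        elif critique == "add logo":
            updated["logo"] = True
    return updated


def qc_loop(
    frame: Frame,
    evaluators: List[Callable[[Frame], Critique]],
    max_rounds: int = 5,  # assumed cap; the source states none
) -> Tuple[Frame, int, bool]:
    """Critique and refine until every evaluator approves (assumed consensus rule)."""
    for round_no in range(1, max_rounds + 1):
        critiques = [c for c in (ev(frame) for ev in evaluators) if c is not None]
        if not critiques:
            return frame, round_no, True
        frame = refine(frame, critiques)
    # The last refinement was never re-evaluated, so report no consensus.
    return frame, max_rounds, False


final, rounds, consensus = qc_loop(
    {"color": "red", "logo": False},
    [brand_color_evaluator, logo_evaluator],
)
print(final, rounds, consensus)
```

Because the toy evaluators and generator are pure functions, the loop returns the same result on every run, which is one plausible reading of "deterministic consensus".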

Our architecture integrates a retrieval-based 'Brand DNA' extraction module to parameterize generation according to established corporate identity guidelines.

Methodological description in paper indicating a retrieval-based module for extracting Brand DNA used to condition generation; no evaluation metrics or sample sizes provided in this statement.

high positive Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... parameterization of generation by brand guidelines

We introduce Genflow, a Compound AI System designed to enforce brand consistency in generative media production.

Paper describes the proposed system architecture (Genflow) as a methodological contribution; description of modules and pipeline provided but no external validation details in this sentence.

high positive Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... ability to enforce brand consistency

Recent advancements in generative video models demonstrate high visual fidelity.

Asserted in paper as a background observation about recent generative video models; no specific dataset, benchmark, or sample size reported.

high positive Genflow Ad Studio: A Compound AI Architecture for Brand-Alig... visual fidelity

Benchmark comparisons of multiple LLM backends (Granite-Docling, Mistral-Small, DeepSeek-OCR) were performed to provide practical insights for production deployment.

Authors state they performed benchmark comparisons of multiple LLM backends (listed in abstract); specifics of metrics and sample sizes not given in abstract.

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... comparative performance of LLM backends for deployment

A comprehensive sustainability analysis shows that the hybrid AI+HITL approach reduces CO2 emissions by 69%, energy consumption by 69%, and water usage by 63% compared to traditional manual processing.

Authors report a sustainability analysis comparing hybrid AI+HITL approach to traditional manual processing (details not provided in abstract).

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... CO2 emissions, energy consumption, water usage

Prompt Fine Tuning with Feedback Inheritance (PFTFI) is a novel approach introduced in this work.

Authors explicitly introduce PFTFI as part of their approach (stated in abstract).

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... novelty of PFTFI method

The system integrates five specialized agents—Classificator, Splitter, Parser, Extraction, and Validator—together with a Human-in-the-Loop mechanism and a Prompt Fine Tuning with Feedback Inheritance (PFTFI) approach.

Authors' architectural description in the abstract specifying the five agents, HITL mechanism, and PFTFI approach.

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... system architecture components (agent integration)

MADP combines deep learning-based classification and parsing with large language model extraction while maintaining accuracy through selective human validation.

System description in paper asserting integration of DL classification/parsing, LLM extraction, and selective human validation; supported by system evaluations reported elsewhere in abstract.

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... maintenance of accuracy via selective human validation

Ablation evaluation on a stratified 100-document subset demonstrates that the full MADP configuration with Human-in-the-Loop supervision attains 98.5% document-level accuracy.

Ablation evaluation on a stratified subset of 100 documents (5 documents per each of 20 supplier/document-type categories) reported by authors.

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... document-level accuracy

Only 3% of documents required non-AI fallback in the production deployment.

Production deployment on 955 real-world documents processed through January 2026 (stated in abstract).

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... proportion requiring non-AI fallback

Production deployment on 955 real-world documents processed through January 2026 achieves a 97.0% full-pipeline automation rate.

Reported production deployment on 955 real-world documents processed through January 2026 (stated in abstract).

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... full-pipeline automation rate
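
As a quick consistency check on the two production figures above, the reported rates can be converted to approximate document counts. These counts are inferred from the percentages, not reported by the authors, and treating the two rates as complementary is an assumption based on their summing to 100%.

```python
total_docs = 955          # production documents (from the abstract)
automation_rate = 0.970   # reported full-pipeline automation rate
fallback_rate = 0.03      # reported share needing non-AI fallback

# Rounded counts implied by the percentages (not reported by the authors).
automated_docs = round(total_docs * automation_rate)
fallback_docs = round(total_docs * fallback_rate)

print(f"automated ~ {automated_docs}, fallback ~ {fallback_docs}")
print("counts cover all documents:", automated_docs + fallback_docs == total_docs)
```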

Operational analysis on a production use-case scenario of 100,000 invoices per year indicates a potential reduction of Full-Time Equivalent (FTE) requirements by approximately 70%.

Operational analysis reported by authors on a production use-case scenario involving 100,000 invoices per year (stated in abstract).

high positive MADP: A Multi-Agent Pipeline for Sustainable Document Proces... Full-Time Equivalent (FTE) requirements

Trace-Prior RL adds bounded adaptation under capacity asymmetry.

Experiments contrasting Trace-Prior RL versus behavior cloning and reward-only approaches in settings with capacity asymmetry, showing Trace-Prior RL permits limited/adaptive deviation while preserving trace alignment.

high positive When Outcome Looks Right But Discipline Fails: Trace-Based E... bounded adaptation (ability to adapt under capacity asymmetry while preserving t...

Pure behavior cloning is nearly enough for symmetric imitation.

Empirical results in symmetric imitation settings (presumably in the two-hotel or bidding benchmarks) showing behavior cloning achieves close imitation without additional RL.

high positive When Outcome Looks Right But Discipline Fails: Trace-Based E... imitation fidelity in symmetric settings

Trace-prior or corrected-history policies better preserve price or bid distributions.

Comparative experiments and ablations across the two-hotel benchmark and hidden-budget bidding task showing trace-prior and corrected-history policies retain price/bid distribution characteristics better than reward-only variants.

high positive When Outcome Looks Right But Discipline Fails: Trace-Based E... preservation of price or bid distributions

Revealing hidden state reduces label uncertainty.

Experiments (hidden-state ablations) in the compact hidden-budget bidding task and/or two-hotel benchmark where providing hidden state information to the learner reduced uncertainty in inferred labels.

high positive When Outcome Looks Right But Discipline Fails: Trace-Based E... label uncertainty

A year-long pilot across three clinical sites executed 8,728 cohort-enrolled workflow runs with a 97.08% completion rate under an early prototype without the verified-core subsystem.

Reported evaluation: year-long pilot conducted across three clinical sites, total workflow runs = 8,728, reported completion rate = 97.08%; prototype lacked verified-core subsystem.

high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... workflow run completion rate

Swimlanes make trust boundaries explicit, separating verified logic from external systems, human judgment, and AI decisions.

Design description in the paper explaining swimlane use to delineate trust boundaries between system components and humans/external systems/AI.

high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... clear trust boundaries / separation of concerns (design feature)

At runtime a durable engine records outcomes in an append-only event log and can enforce contracts at system boundaries, supporting replay, retries, and audit.

System architecture description in the paper describing runtime engine features (append-only log, enforcement, replay/retry, audit support).

high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... runtime durability, auditability, and recoverability (design features)
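
A minimal sketch of how an append-only event log can support replay and audit, in the spirit of the runtime description above. This is not GraphFlow's engine: the `EventLog` class, the `run` function, and the step names are hypothetical, and retries and contract enforcement are left out.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class EventLog:
    """Append-only log: entries are serialized once and never modified."""
    _entries: List[str] = field(default_factory=list)

    def append(self, step: str, outcome: Any) -> None:
        record = {"seq": len(self._entries), "step": step, "outcome": outcome}
        self._entries.append(json.dumps(record))

    def events(self) -> List[dict]:
        return [json.loads(entry) for entry in self._entries]


def run(steps: Dict[str, Callable[[], Any]], log: EventLog) -> Dict[str, Any]:
    """Execute steps in order; steps already in the log are replayed, not re-run."""
    recorded = {e["step"]: e["outcome"] for e in log.events()}
    results: Dict[str, Any] = {}
    for name, fn in steps.items():
        if name in recorded:
            results[name] = recorded[name]   # replay from the log
        else:
            outcome = fn()
            log.append(name, outcome)        # record the outcome durably
            results[name] = outcome
    return results


calls: List[str] = []


def fetch() -> dict:
    calls.append("fetch")
    return {"record_id": 7}


def notify() -> str:
    calls.append("notify")
    return "sent"


log = EventLog()
workflow = {"fetch": fetch, "notify": notify}
first = run(workflow, log)
second = run(workflow, log)   # a restart: nothing is executed again
print(first == second, calls, len(log.events()))
```

The second call returns the same results without invoking either step, and the log itself serves as the audit trail.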

At compile time GraphFlow restricts diagrams to produce reusable automations whose contracts (preconditions, postconditions, and composition obligations) are intended to be proof-checked before admission to a shared library.

Design/specification claim in the paper describing compile-time restrictions and proof-checked admission model (implementation/design detail).

high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... contract verification / reusability (design intention)
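
The admission model above can be sketched with contracts reduced to sets of named facts. This is a deliberate simplification: plain set inclusion stands in for the proof checking the paper intends, and `Automation`, `composition_ok`, `admit`, and the fact names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Automation:
    name: str
    pre: FrozenSet[str]    # facts that must hold before the automation runs
    post: FrozenSet[str]   # facts guaranteed after it runs


def composition_ok(chain: List[Automation], initial: FrozenSet[str]) -> bool:
    """Composition obligation: each precondition is covered by earlier guarantees."""
    known = set(initial)
    for automation in chain:
        if not automation.pre <= known:
            return False
        known |= automation.post
    return True


library: List[List[Automation]] = []


def admit(chain: List[Automation], initial: FrozenSet[str]) -> bool:
    """Admit a chain to the shared library only if its obligations check out."""
    if composition_ok(chain, initial):
        library.append(chain)
        return True
    return False


validate = Automation("validate", frozenset({"raw_input"}), frozenset({"validated"}))
store = Automation("store", frozenset({"validated"}), frozenset({"stored"}))

start = frozenset({"raw_input"})
print(admit([validate, store], start))   # obligations met
print(admit([store, validate], start))   # "store" runs before its precondition holds
```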

GraphFlow treats workflow diagrams as the executable specification — a single artifact defining data scope, execution semantics, and monitoring — to address the gap between durable execution and semantic correctness.

System design description in the paper explaining GraphFlow's design philosophy and intended role of diagrams as executable specifications.

high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... specification completeness / clarity (design intent)

Existing workflow platforms provide durable execution and observability.

Author statement in background/motivation describing properties of existing workflow platforms.

high positive GraphFlow: An Architecture for Formally Verifiable Visual Wo... platform durability and observability (feature presence)

Context engineering (programmatic state abstraction and clean task decomposition) is generally more cost-effective than deeper per-agent deliberation.

Cost-effectiveness measured as returns per token spent (RPTS) across configurations that vary context representation and deliberation; results from the 3,475-episode controlled study indicate context changes yielded larger returns per token than adding deliberation tools.

high positive Context, Reasoning, and Hierarchy: A Cost-Performance Study ... returns per token spent (RPTS)

Programmatic state abstraction delivers the largest returns per token spent (RPTS), improving mean return by up to 76% over raw observations.

Controlled empirical study in the CybORG CAGE-2 POMDP environment comparing context representations (raw observations vs. deterministic state-tracking layer with compressed history) across five model families, six models, and twelve configurations with token-level cost accounting (3,475 episodes).

high positive Context, Reasoning, and Hierarchy: A Cost-Performance Study ... mean return (and returns per token spent, RPTS)

Companies that train workers outperform those that simply cut them.

Claim presented as one of the five lessons, based on historical analogy and emerging workplace evidence (chapter asserts firms that invest in training do better).

high positive 7. AI and the Future of Work firm performance (outperformance of training firms relative to cutting firms)

For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.

Conclusion/recommendation drawn from the paper's modeling results and analysis (argument that installed MW is a poor planning metric compared to time-varying deployable capacity).

high positive Designing Datacenter Power Delivery Hierarchies for the AI E... planning objective (deployable capacity over time vs installed MW)

The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure.

Method claim: framework integrates projection models and operational data from Microsoft Azure (production data grounding); stated in the paper's methods summary.

high positive Designing Datacenter Power Delivery Hierarchies for the AI E... realism/grounding of projection models (use of Azure production data)

We develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences.

Methodological claim describing the paper's core contribution: a simulation/evaluation framework combining throughput, power, and cost metrics with arrival/oversubscription/decommissioning sequences; based on the authors' implementation (details and data referenced in the paper).

high positive Designing Datacenter Power Delivery Hierarchies for the AI E... throughput, power utilization, cost metrics over deployment sequences

Designs must remain efficient over long datacenter lifetimes and multiple hardware generations.

Normative/design recommendation motivated by long asset lifetimes and evolving hardware density; stated as a requirement in the paper.

high positive Designing Datacenter Power Delivery Hierarchies for the AI E... design efficiency over time

Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1 MW per deployment by 2027.

Projection models for GPU deployments described in the paper (projection models combined with industry deployment assumptions); specific provenance referenced in the abstract but no sample size reported.

high positive Designing Datacenter Power Delivery Hierarchies for the AI E... rack power density (MW per deployment)

Countries need coordinated, long-term policy frameworks to adapt to AI-driven structural transformation; trade policy, industrial policy, and digital regulation should be addressed within an integrated strategy.

Conclusion and policy-recommendation section of the study; a normative argument about the need for coordination; no empirical evidence or implementation examples are given.

high positive Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... necessity of national and international policy coordination and an integrated strat...

For developing countries, investing early in digital infrastructure can open new windows of competitiveness.

Conceptual argument; policy orientation and strategic recommendation; no empirical test or quantitative evidence is presented.

high positive Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... competitiveness of countries that invest early in digital infrastructure (new opportunities...

As automation and smart manufacturing systems spread, comparative advantages based on cheap labor are expected to erode, and the trend of production returning to advanced economies or allied countries (reshoring and friendshoring) is expected to gain momentum.

Conceptual analysis and logical inferences about expected technology→consumption/production mechanisms; the study presents no empirical tests or quantitative data.

high positive Yapay Zekâ ve Küresel Değer Zincirleri: Ticaret Politikası v... increase in reshoring and friendshoring trends (geographic relocation of production...

Higher sectoral digitalization potential strongly increased remote work: DiD estimate 40.74 percentage points (p < 0.001); remote work rose from 17.6% to 82.1% in highly digitalized sectors versus 1.3% to 6.6% in less digitalized sectors.

Difference-in-differences (DiD) analysis using the COVID-19 shock as quasi-natural experiment on quarterly panel data for 27 EU Member States (2018–2024), N = 36,685; reported DiD estimate = 40.74 percentage points, p < 0.001; descriptive pre/post shares reported for both groups.

high positive Digital transformation and labor market indicators in the EU... share of remote work (percent of work done remotely)
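
For orientation, the unadjusted 2×2 difference-in-differences can be computed directly from the descriptive shares quoted in this entry. It comes out larger than the reported estimate of 40.74 percentage points. The gap presumably reflects the regression specification (controls, fixed effects, weighting), but that is an assumption; the entry does not describe the specification.

```python
# Remote-work shares (percent) quoted in the entry.
pre_high, post_high = 17.6, 82.1   # highly digitalized sectors
pre_low, post_low = 1.3, 6.6       # less digitalized sectors

change_high = post_high - pre_high      # change in the high group
change_low = post_low - pre_low         # change in the low group
raw_did = change_high - change_low      # unadjusted 2x2 DiD

reported_did = 40.74                    # estimate reported by the authors

print(f"change, highly digitalized: {change_high:.1f} pp")
print(f"change, less digitalized:   {change_low:.1f} pp")
print(f"unadjusted 2x2 DiD:         {raw_did:.1f} pp")
print(f"reported DiD estimate:      {reported_did:.2f} pp")
```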

Higher sectoral digitalization potential has a statistically significant positive effect on wages (hourly wages).

Difference-in-differences (DiD) analysis using the COVID-19 shock as quasi-natural experiment on the same quarterly panel (27 EU Member States, 2018–2024), N = 36,685; reported DiD coefficient = 0.52 €/hour, p < 0.001; authors state this corresponds to ≈4.6% increase in the wage gap between highly and less digitalized activities.

high positive Digital transformation and labor market indicators in the EU... hourly wages

The study's findings offer actionable insights for managers and policymakers to leverage AI for sustainable organizational growth while safeguarding employee well-being.

Authors' concluding statement based on survey findings and analytical results.

high positive Opportunities and Challenges of Human- AI Collaboration in W... practical relevance of findings for management and policy decisions

Successful human–AI collaboration requires a human-centric approach that balances technological advancement with workforce development, ethical governance, and organizational support.

Study conclusion/recommendation based on survey findings (perceptions of opportunities and challenges) and analytical results (correlation/regression).

high positive Opportunities and Challenges of Human- AI Collaboration in W... effective implementation of human–AI collaboration (organizational success facto...

Human–AI collaboration reduces employees' routine workload.

Respondent perceptions collected via the structured questionnaire and analyzed with descriptive statistics and regression in SPSS.

high positive Opportunities and Challenges of Human- AI Collaboration in W... amount of routine work assigned to employees

AI-based systems support better decision-making by providing data-driven insights, allowing employees to focus on higher-level cognitive and strategic activities.

Survey responses (structured questionnaire) analyzed with SPSS (correlation and regression analyses) reporting perceived support for decision-making.

high positive Opportunities and Challenges of Human- AI Collaboration in W... decision-making quality / decision-support

Human–AI collaboration significantly enhances workplace efficiency and productivity by reducing routine workload and improving accuracy and speed in task execution.

Primary data from employees in AI-enabled organizations collected via a structured questionnaire (5-point Likert); analyzed with SPSS using descriptive statistics and regression analysis.

high positive Opportunities and Challenges of Human- AI Collaboration in W... workplace efficiency and productivity (reduction in routine workload, improved a...

Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond the theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand.

Simulation experiments calibrated to a real multifamily rental market; simulations test finite-horizon settings, product heterogeneity, and nonlinear logit demand formulations.

high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... occurrence of supra-competitive prices in simulated market environments

Under symmetric exploration, prices can reach monopoly levels.

Theoretical result derived in the ODE analysis showing convergence to monopoly-level prices in symmetric exploration scenarios.

high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... price level (specifically reaching the monopoly price)

Supra-competitive prices arise when firms explore within similar price ranges on the same side of the Nash price.

Analytical characterization from the fluid-limit ordinary differential equation (ODE) analysis of the explore-then-exploit pipeline with misspecified monopoly-style estimation.

high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... whether equilibrium prices are supra-competitive (above Nash)

Simple algorithmic pricing systems can systematically produce collusive-like (supra-competitive) prices in multi-firm markets.

Theoretical model of multi-firm pricing with an explore-then-exploit pipeline and misspecified monopoly-style demand estimation; fluid-limit ODE analysis characterizing convergence; supporting simulations calibrated to a real multifamily rental market.

high positive Misspecified Explore-then-Exploit Leads to Supra-Competitive... price level relative to the Nash equilibrium
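
The mechanism behind the entries above can be reproduced in a toy setting. The sketch below is not the paper's model or its fluid-limit ODE analysis. It assumes a symmetric duopoly with noise-free linear demand, zero costs, and hypothetical parameters `A`, `B`, `C`, in which both firms explore the same prices above the Nash price and each then fits a demand curve in its own price only (the monopoly-style misspecification).

```python
from typing import List, Tuple

# Hypothetical demand for firm i: q_i = A - B * p_i + C * p_j, with zero cost.
A, B, C = 10.0, 2.0, 1.0


def demand(p_own: float, p_rival: float) -> float:
    return A - B * p_own + C * p_rival


def ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


nash_price = A / (2 * B - C)            # symmetric best-response fixed point
monopoly_price = A / (2 * (B - C))      # joint-profit-maximizing price

# Exploration: both firms post the same prices, all above the Nash price.
explore_prices = [3.5, 4.0, 4.5]
quantities = [demand(p, p) for p in explore_prices]

# Misspecified fit: the firm regresses its quantity on its own price only,
# so the rival's co-moving price is absorbed into the estimated slope.
slope, intercept = ols(explore_prices, quantities)

# Exploit: maximize p * (intercept + slope * p) under the fitted model.
exploit_price = -intercept / (2 * slope)

print(f"Nash price:     {nash_price:.3f}")
print(f"Monopoly price: {monopoly_price:.3f}")
print(f"Exploit price:  {exploit_price:.3f}")
```

Because the rival's price moves one-for-one with the firm's own price during exploration, the fitted slope understates own-price sensitivity (B − C instead of B), and the exploit price lands on the joint-monopoly price rather than the Nash price.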

Continuous, simulation-driven prompt optimization is both tractable and necessary for reliable enterprise conversational AI at scale.

Concluding claim in abstract: 'Our results suggest that continuous, simulation-driven prompt optimization is both tractable and necessary...'.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... feasibility and necessity of continuous simulation-driven prompt optimization

PRISM is designed to run on a scheduled basis (daily), treating LLM behavioral drift as a first-class reliability concern.

Design statement in abstract describing scheduled daily runs to monitor behavioral drift.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... scheduled (daily) monitoring frequency

PRISM diagnoses root causes of failures and surgically repairs the prompt, iterating until all tests pass.

Methodological description in abstract stating diagnosis and iterative repair loop until tests pass.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... automated failure diagnosis and prompt repair (iteration to pass all tests)

PRISM simulates full multi-turn conversations against a platform-faithful LLM environment and evaluates pass/fail using an LLM-as-judge.

Method/architecture claim in abstract describing simulation of multi-turn conversations and LLM-based judging.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... simulation and automated evaluation of conversations

PRISM automatically generates test cases from plain-language agent requirements.

Methodological description in abstract stating PRISM takes plain-language requirements and automatically generates test cases.

high positive PRISM: Prompt Reliability via Iterative Simulation and Monit... test-case generation from requirements
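
Taken together, the PRISM entries above describe a generate-simulate-judge-repair loop. A minimal sketch of that control flow follows. Every function here is a hypothetical stand-in: simple string checks replace the multi-turn simulation and the LLM-as-judge, the repair step just appends an instruction, and the `max_repairs` cap is an assumption the abstract does not mention.

```python
from typing import List, Tuple


def generate_tests(requirements: List[str]) -> List[str]:
    """Stand-in: one test case per plain-language requirement."""
    return list(requirements)


def simulate_and_judge(prompt: str, test: str) -> bool:
    """Stand-in for simulation plus LLM-as-judge: passes if the prompt
    explicitly covers the requirement."""
    return test.lower() in prompt.lower()


def repair(prompt: str, failed: List[str]) -> str:
    """Stand-in for diagnosis and repair: patches the first failure only."""
    return prompt + "\n- " + failed[0]


def optimize(prompt: str, requirements: List[str], max_repairs: int = 10) -> Tuple[str, int]:
    """Repair the prompt until all tests pass or the assumed cap is reached."""
    tests = generate_tests(requirements)
    for repairs_done in range(max_repairs + 1):
        failed = [t for t in tests if not simulate_and_judge(prompt, t)]
        if not failed:
            return prompt, repairs_done
        if repairs_done == max_repairs:
            break
        prompt = repair(prompt, failed)
    raise RuntimeError("tests still failing after max_repairs repairs")


requirements = [
    "Greet the user",
    "Never share account numbers",
    "Offer a human handoff",
]
final_prompt, repairs = optimize("You are a support agent.\n- Greet the user", requirements)
print(repairs)
print(final_prompt)
```

Running the same function on a schedule against a fresh model snapshot would correspond to the daily drift monitoring described above, though the scheduling itself is outside this sketch.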
