Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
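
The matrix can also be read as per-category shares. The following is a minimal Python sketch using three rows copied from the table above. In several rows the Total exceeds the sum of the four direction columns and the page does not explain the gap, so the `unclassified` field is an assumption introduced here, not a column the page reports.

```python
# Rows copied from the matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 55, 88, 20, 604),
    "Error Rate": (69, 91, 10, 2, 172),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def summarize(row):
    """Return direction shares and the unexplained remainder for one row."""
    positive, negative, mixed, null, total = row
    return {
        "positive_share": positive / total,
        "negative_share": negative / total,
        # Assumption: any remainder is claims with no listed direction.
        "unclassified": total - (positive + negative + mixed + null),
    }


SUMMARY = {name: summarize(row) for name, row in ROWS.items()}

for name, s in SUMMARY.items():
    print(
        f"{name}: {s['positive_share']:.1%} positive, "
        f"{s['negative_share']:.1%} negative, "
        f"{s['unclassified']} unclassified"
    )
```

Dividing by the reported Total rather than by the sum of the four columns keeps the shares consistent with the page's own counts.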

Artificial intelligence (AI) has a positive but weaker impact on sustainable development relative to digital transformation, reflecting its complementary and maturity-dependent role within the digital ecosystem.

The same System GMM regressions on a panel of MENA economies (2010–2023), which include measures of both AI and digital transformation; the reported coefficient for AI is positive but smaller.

high positive Digital Transformation, AI Efficiency, and Sustainable Devel... sustainable development

Digital transformation is the primary driver of sustainable development in MENA economies, exerting a stronger and more consistent effect than AI.

Dynamic panel data analysis of MENA economies (2010–2023) using System GMM; reported comparative effect sizes of digital transformation vs. AI in regression results.

high positive Digital Transformation, AI Efficiency, and Sustainable Devel... sustainable development

In the ICT industry, Tobin's Q significantly increased following AI adoption (heterogeneous positive effect).

Subgroup/heterogeneity analysis within the main sample (KOSDAQ firms 2018–2025), estimating the post-adoption effect of AI on Tobin's Q in firms classified as ICT.

high positive The Dynamic Causal Effects of Corporate AI Adoption on Profi... Tobin's Q (market value) in ICT-industry firms

The authors propose corresponding analytical extensions to the framework to address the three structural breaks in agentic systems.

Paper presents proposed analytical extensions (methodological proposals) tied to each identified structural break.

high positive Governed Auditable Decisioning Under Uncertainty: Synthesis ... availability of proposed analytical extensions for governance framework

Cross-architecture comparison reveals a governance coverage gradient: deterministic rule engines achieve full DES-property fillability.

Analytic cross-architecture comparison reported in the paper (comparative analysis across four architectures); deterministic rule engines identified as achieving 'full' fillability of DES-properties.

high positive Governed Auditable Decisioning Under Uncertainty: Synthesis ... DES-property fillability (completeness of governance evidence coverage)

The paper synthesizes an operational governance evidence framework composed of: structural accountability collapse diagnostics, decision trace schemas, evidence sufficiency measurement, and label-free monitoring, integrated into a chain.

Methodological contribution: authors construct and present a synthesized framework from those four components (conceptual/analytical synthesis).

high positive Governed Auditable Decisioning Under Uncertainty: Synthesis ... presence and structure of an operational governance evidence framework

The Barcelona Declaration offers a promising forum for boundary governance.

Policy recommendation pointing to an existing initiative (Barcelona Declaration) as a suitable forum; stated without empirical evaluation in the excerpt.

high positive Market Dynamics, Governance and Open Research Metadata in th... suitability of the Barcelona Declaration as a forum for boundary governance

Governance should calibrate the annulus, not abolish it: thin enough to serve research efficiently, wide enough to sustain innovation.

Normative policy recommendation from the authors; based on their conceptual framework rather than on empirical policy evaluation in the excerpt.

high positive Market Dynamics, Governance and Open Research Metadata in th... optimal governance calibration of the annulus balancing research efficiency and ...

Artificial intelligence reshapes the annulus by lowering barriers to basic structuring.

Conceptual claim in the paper; asserted as an effect of AI on metadata production without empirical estimates in the excerpt.

high positive Market Dynamics, Governance and Open Research Metadata in th... barriers to basic structuring of metadata

The proposed framework is intended to serve as a practical reference for engineering teams and decision-makers navigating enterprise LLM adoption.

Author statement of intent in the paper (qualitative claim about intended audience and utility).

high positive Buy Or Build? A Practitioner’s Framework for Large Language ... practical utility for engineering teams and decision-makers

The buy-versus-build decision should be viewed as a phased continuum: initial API adoption can give way to hybrid architectures as organizational maturity and requirements evolve.

Conceptual argument in the paper, illustrated by the Bills Converter experience (single-case narrative recommending phased/hybrid progression).

high positive Buy Or Build? A Practitioner’s Framework for Large Language ... recommended adoption pathway (phased/API→hybrid)

In the end-to-end development of the Bills Converter, the authors chose a closed-source, API-based approach over self-hosted or custom-built alternatives.

Case study: the Bills Converter system (single end-to-end project described in the paper).

high positive Buy Or Build? A Practitioner’s Framework for Large Language ... adoption decision (choice of architecture: API-based closed-source vs self-hoste...

This paper presents a multi-dimensional decision framework that synthesizes technical, financial, and strategic considerations into a coherent evaluation methodology for enterprise LLM adoption.

The paper is explicitly framed as presenting a decision framework; supported by conceptual synthesis and exposition within the manuscript (no reported quantitative validation).

high positive Buy Or Build? A Practitioner’s Framework for Large Language ... quality/usefulness of decision-making framework for enterprise LLM adoption

At the country level, digitalisation and workplace training provision steepen the exposure–adoption gradient.

Country-level heterogeneity analysis using the 2024 EWCS (35 countries) linking national measures of digitalisation and prevalence of workplace training to stronger occupational exposure–adoption relationships.

high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI (interaction with exposure)

Individual skills, non-routine cognitive job content within occupations, and employee say in organisational decisions steepen the exposure–adoption gradient.

Interaction and stratified analyses from the 2024 EWCS showing stronger exposure–adoption associations among workers with higher individual skills, more non-routine cognitive job content (within occupations), and greater employee influence over organisational decisions; sample >36,600 workers.

high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI (interaction with exposure)

Occupational exposure strongly predicts uptake.

Associational/regression analysis using the 2024 EWCS linking occupation-level measures of AI exposure to individual-level self-reported adoption; sample >36,600 workers across 35 countries.

high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI

Adoption averages 12% but ranges from under 3% to 25% across countries.

Descriptive analysis of the 2024 European Working Conditions Survey (EWCS), sample of more than 36,600 workers in 35 countries; country-level tabulations of self-reported generative AI adoption.

high positive Generative AI at Work: From Exposure to Adoption across 35 E... self-reported adoption of generative AI

Our baseline model finds evidence that AI is productivity enhancing.

Results from the paper's stated baseline empirical model using BEA industry-account-based measures; model specification described by authors.

high positive Early Estimates of the Impact of AI Within BEA’s Industry Ec... productivity

States can adjust their foreign policies to this fact by focusing on resilience, technological sovereignty, strategic decoupling, and coordination through alliances.

Policy-prescriptive recommendations based on the paper's theoretical framework and analysis; no empirical testing or sample size reported in the abstract.

high positive ARTIFICIAL INTELLIGENCE AND THE WEAPONIZATION OF ECONOMIC IN... effectiveness of foreign policy adjustments (resilience, sovereignty, decoupling...

ClawNet enables multiple users to collaborate securely through their respective agents.

Capability claim about the instantiated system (authors assert that ClawNet enables secure multi-user collaboration; excerpt contains no empirical security evaluation or user study).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... secure multi-user collaboration enabled by agent-mediated interactions

We instantiate this paradigm in ClawNet, an identity-governed agent collaboration framework that enforces identity binding and authorization verification through a central orchestrator.

Implementation claim: authors state they built ClawNet as an instantiation of their paradigm (paper describes framework/architecture; no experimental evaluation included in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... existence of an implemented framework (ClawNet) enforcing identity binding and a...

Action-level accountability logs every operation against its owner's identity and authorization, ensuring full auditability.

Design claim describing an accountability primitive (paper asserts logging and auditability as a property; no audit or verification evidence shown in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... auditability of agent actions (logging tied to owner identity/authorization)

Scoped authorization enforces per-identity access control and escalates boundary violations to the owner.

Design/specification claim describing the scoped authorization governance primitive in the proposed paradigm (no empirical or security evaluation provided in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... access control enforcement and escalation behavior
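
The two primitives described above (scoped authorization with escalation, and action-level logging) can be illustrated with a small Python sketch. This is not ClawNet's implementation: the class names, fields, and the `request` method are all hypothetical, and scopes are modeled as a plain set of resource names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityAgent:
    """Hypothetical agent bound to one owner and one set of scopes."""
    owner: str
    identity: str
    scopes: frozenset


@dataclass
class Orchestrator:
    """Hypothetical central orchestrator that checks and logs every action."""
    audit_log: list = field(default_factory=list)
    escalations: list = field(default_factory=list)

    def request(self, agent, action, resource):
        allowed = resource in agent.scopes
        # Action-level accountability: log every operation, allowed or not.
        self.audit_log.append({
            "owner": agent.owner,
            "identity": agent.identity,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        if not allowed:
            # Boundary violation: escalate to the owner instead of proceeding.
            self.escalations.append((agent.owner, action, resource))
        return allowed


orchestrator = Orchestrator()
work_agent = IdentityAgent("alice", "work", frozenset({"calendar"}))
first = orchestrator.request(work_agent, "read", "calendar")
second = orchestrator.request(work_agent, "read", "bank_statements")
print(first, second, orchestrator.escalations)
```

Logging before the authorization decision takes effect means denied requests are as auditable as granted ones.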

The paradigm rests on three governance primitives: (1) a layered identity architecture that separates a Manager Agent from multiple context-specific Identity Agents; the Manager Agent holds global knowledge but is architecturally isolated from external communication.

Architectural/design claim describing the proposed layered identity primitive (presentation of design; no empirical validation in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... identity architecture and information flow constraints

We propose a human-symbiotic agent paradigm in which each user owns a permanently bound agent system that collaborates on the owner's behalf, forming a network whose nodes are humans rather than agents.

Design proposal / conceptual architecture presented in the paper (no large-scale deployment or empirical evaluation described in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... structure of agent networks (human-centric vs agent-centric) and delegation mode...

The next frontier for AI agents lies not in stronger individual capability, but in the digitization of human collaborative relationships.

Normative/strategic claim advanced by the authors as the central thesis (conceptual argument, no empirical test reported).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... focus of AI-agent development (individual capability vs collaboration digitizati...

Human productivity rests on the social and organizational relationships through which people coordinate, negotiate, and delegate.

Theoretical/argumentative claim presented as background motivation (conceptual reasoning, citation not provided in excerpt).

high positive ClawNet: Human-Symbiotic Agent Network for Cross-User Autono... human productivity as mediated by social/organizational relationships

Time Series Augmented Generation (TSAG) enables LLM agents to delegate quantitative tasks to verifiable external tools.

Description of TSAG framework in paper stating delegation mechanism to external verifiable tools for quantitative computations.

high positive Time Series Augmented Generation for Financial Applications delegation capability to external tools
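
Delegation to a verifiable tool can be sketched as a dispatch step: the agent emits a structured call, and a deterministic function computes the number. The Python sketch below is illustrative only. The tool name, the call format, and the `dispatch` helper are assumptions, not the TSAG interface.

```python
def moving_average(prices, window):
    """Deterministic tool: simple moving average over a price series."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[i:i + window]) / window
        for i in range(len(prices) - window + 1)
    ]


# Hypothetical tool registry; a real system would expose many more tools.
TOOLS = {"moving_average": moving_average}


def dispatch(call):
    """Run the tool named in a structured call and keep a checkable record."""
    name = call["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    result = TOOLS[name](**call["args"])
    return {"tool": name, "args": call["args"], "result": result}


# Stand-in for the structured call an agent would emit.
CALL = {
    "tool": "moving_average",
    "args": {"prices": [1.0, 2.0, 3.0, 4.0], "window": 2},
}
RECORD = dispatch(CALL)
print(RECORD["result"])  # [1.5, 2.5, 3.5]
```

Because the record stores the tool name and arguments alongside the result, anyone can rerun the call and confirm the number without trusting the agent's text.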

We publicly release the evaluation framework and empirical insights to foster standardized research on reliable financial AI.

Paper states that the framework, benchmark, and empirical results are released publicly by the authors.

high positive Time Series Augmented Generation for Financial Applications public release of resources

The results demonstrate that capable agents can achieve near-perfect tool-use accuracy with minimal hallucination, validating the tool-augmented paradigm.

Empirical results from the authors' experiments on the 100-question benchmark across multiple agents; paper states agents achieve 'near-perfect' tool-use accuracy and 'minimal' hallucination.

high positive Time Series Augmented Generation for Financial Applications tool-use accuracy; hallucination rate

We apply this methodology in a large-scale empirical study using our framework, Time Series Augmented Generation (TSAG), where an LLM agent delegates quantitative tasks to verifiable, external tools.

Paper reports applying the TSAG framework in an empirical study in which agents call external tools to perform quantitative computations; described as 'large-scale' and implemented by the authors.

high positive Time Series Augmented Generation for Financial Applications use of external/verifiable tools by LLM agents

We introduce a novel evaluation methodology and benchmark designed to rigorously measure an LLM agent's reasoning for financial time-series analysis.

Paper describes a new methodology and benchmark (Time Series Augmented Generation, TSAG) developed by the authors for evaluating LLM reasoning on financial time-series tasks.

high positive Time Series Augmented Generation for Financial Applications existence of a new evaluation methodology / benchmark

Effective evaluation-driven loop scaling is a central axis for advancing LLM-driven scientific discovery, and SimpleTES provides a simple yet practical framework for realizing these gains.

High-level claim supported by the aggregate experimental results and discussion in the paper.

high positive Evaluation-driven Scaling for Scientific Discovery impact of scaling evaluation-driven discovery loops on LLM-driven scientific dis...

When post-trained on successful trajectories, models not only improve efficiency on seen problems but also generalize to unseen problems, discovering solutions that base models fail to uncover.

Experiments in which models were post-trained on successful SimpleTES trajectories and evaluated on both seen and unseen problems (paper claim of improved efficiency and generalization).

high positive Evaluation-driven Scaling for Scientific Discovery post-training efficiency on seen problems and generalization to unseen problems ...

SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning.

Methodological claim and supporting experiments where SimpleTES generates solution trajectories that are then used as supervision for learning.

high positive Evaluation-driven Scaling for Scientific Discovery availability and usefulness of trajectory-level histories for supervision

We discovered new Erdős minimum overlap constructions that surpass the best-known results.

Reported novel combinatorial constructions (Erdős minimum overlap) in the experiments that improve on prior best-known results.

high positive Evaluation-driven Scaling for Scientific Discovery quality of Erdős minimum overlap constructions (best-known benchmarks)

We designed quantum circuit routing policies that reduce gate overhead by 24.5%.

Experimental results reported for quantum circuit routing tasks showing a 24.5% reduction in gate overhead when using SimpleTES-designed policies.

high positive Evaluation-driven Scaling for Scientific Discovery quantum circuit gate overhead

We sped up the widely used LASSO algorithm by over 2x.

Benchmarking experiment reported in the paper comparing LASSO runtime/performance with and without SimpleTES (paper states >2x speedup).

high positive Evaluation-driven Scaling for Scientific Discovery LASSO algorithm runtime / speed

SimpleTES consistently outperforms both frontier-model baselines and sophisticated optimization pipelines.

Comparative experimental evaluation vs. frontier-model baselines and optimization pipelines across the reported problems (paper claim).

high positive Evaluation-driven Scaling for Scientific Discovery performance relative to baselines (solution quality / discovery success)

Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models.

Empirical experiments reported across 21 problems in six domains using gpt-oss models (paper states 21 problems).

high positive Evaluation-driven Scaling for Scientific Discovery ability to discover state-of-the-art solutions (solution quality / discovery suc...

We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection.

Methodological contribution described in the paper (framework design and algorithmic description).

high positive Evaluation-driven Scaling for Scientific Discovery framework design combining parallel exploration, feedback-driven refinement, and...
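
The three ingredients named in the claim (parallel exploration, feedback-driven refinement, local selection) can be illustrated on a toy problem. The Python sketch below is not the SimpleTES algorithm. The objective, the step rule, and the reading of "local selection" as per-chain selection are assumptions made for illustration, and a seeded random proposal stands in for the LLM.

```python
import random


def evaluate(x):
    """Toy evaluator: higher is better, optimum at x = 3."""
    return -(x - 3.0) ** 2


def refine(x, score, rng):
    """Propose a nearby candidate; worse scores trigger larger steps."""
    step = 0.1 + 0.5 * abs(score) ** 0.5
    return x + rng.uniform(-step, step)


def run_loop(n_chains=4, n_rounds=50, seed=0):
    rng = random.Random(seed)
    # Parallel exploration: independent chains from different start points.
    histories = []
    for _ in range(n_chains):
        x = rng.uniform(-10.0, 10.0)
        histories.append([(x, evaluate(x))])
    for _ in range(n_rounds):
        for history in histories:
            # Local selection: each chain continues from its own best so far.
            best_x, best_score = max(history, key=lambda item: item[1])
            # Feedback-driven refinement: the score shapes the next proposal.
            candidate = refine(best_x, best_score, rng)
            history.append((candidate, evaluate(candidate)))
    best = max(
        (item for history in histories for item in history),
        key=lambda item: item[1],
    )
    return best, histories


BEST, HISTORIES = run_loop()
print(f"best x = {BEST[0]:.3f}, score = {BEST[1]:.5f}")
```

Each chain's `history` list is a trajectory of (candidate, score) pairs, which is the kind of record that could later serve as supervision.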

Given historical inequities in housing placement, it is crucial to audit LLM use in this context.

Authors' policy/recommendation motivated by historical inequities in housing placement and their empirical audit findings; presented as an argument in the report rather than a quantified experimental result.

high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... need for auditing LLMs (policy recommendation)

Leveraging LLMs to augment tabular classification with casenote summaries can safely incorporate additional text information with low implementation burden.

Authors' reported experiments and practical assessment on augmenting tabular classifiers with LLM-derived casenote summaries from a nonprofit outreach dataset; described as having low implementation burden and being safe to use. (No sample size given in abstract.)

high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... feasibility/safety of augmenting tabular models with LLM casenote summaries; imp...

A fine-tuned model augmented with casenote summaries can improve accuracy while reducing algorithmic fairness disparities on the housing placement multi-class classification task.

Empirical audit of LLM-based tabular classification on a real housing placement prediction task augmented with street outreach casenotes from a nonprofit partner; authors report multi-class classification experiments comparing fine-tuned models with and without casenote summaries and auditing error disparities across groups. (Sample size not stated in the abstract.)

high positive Auditing LLMs for Algorithmic Fairness in Casenote-Augmented... multi-class classification accuracy; classification error disparities across dem...

There is a positive relationship between disagreement among agents and trading volume in the simulated markets.

Observed correlation in the simulated open-call auction between measured disagreement (e.g., dispersion in beliefs) and trading volume; described as replicating classic experimental findings.

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles relationship between disagreement (belief dispersion) and trading volume

These individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices.

Aggregation of simulated agent behavior in the open-call auction producing market-level time series; comparison of market dynamics to classic experimental benchmark (Smith et al., 1988) and reported finding that excess demand predicts future prices.

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles predictive power of excess demand for future prices

AI agents form recency-weighted extrapolative beliefs (i.e., overweight recent price history when forecasting future prices).

Analysis of agents' forecasts and trading behavior in the simulated open-call auction populated by autonomous LLM agents; identification of extrapolative forecasting patterns reported as a main finding.

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles recency-weighted extrapolative beliefs in price forecasts
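
One common way to formalize recency-weighted extrapolation is to forecast the next price as the last price plus a weighted average of past price changes, with geometrically decaying weights. The Python sketch below assumes that form. The `decay` parameter and the function name are hypothetical, and the paper may specify beliefs differently.

```python
def extrapolative_forecast(prices, decay=0.5):
    """Forecast the next price from recency-weighted past price changes."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    recent_first = changes[::-1]
    # Weight decay**k for the change k steps back; k = 0 is the most recent.
    weights = [decay ** k for k in range(len(recent_first))]
    trend = sum(w * c for w, c in zip(weights, recent_first)) / sum(weights)
    return prices[-1] + trend


PRICES = [100.0, 100.0, 110.0]
print(extrapolative_forecast(PRICES, decay=0.5))  # about 116.67
print(extrapolative_forecast(PRICES, decay=1.0))  # 115.0 with equal weights
```

With `decay=0.5` the latest change of 10 dominates the earlier change of 0, so the forecast sits above the equal-weight forecast.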

AI agents exhibit a pronounced disposition effect.

Simulated open-call auction populated by autonomous LLM agents in experimental asset-market simulations; behavioral trading data showing agents' selling/holding patterns (paper describes this as a main documented finding).

high positive Dissecting AI Trading: Behavioral Finance and Market Bubbles disposition effect (tendency to sell winners and hold losers)
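
A standard way to quantify a disposition effect is to compare the proportion of gains realized (PGR) with the proportion of losses realized (PLR); a positive gap indicates that winners are sold more readily than losers. The Python sketch below uses that measure as an assumption, since the excerpt does not say how the paper computes the effect, and the counts are made up for illustration.

```python
def disposition_gap(realized_gains, paper_gains, realized_losses, paper_losses):
    """Return (PGR, PLR, PGR - PLR) from counts of positions."""
    gain_positions = realized_gains + paper_gains
    loss_positions = realized_losses + paper_losses
    if gain_positions == 0 or loss_positions == 0:
        raise ValueError("need at least one gain and one loss position")
    pgr = realized_gains / gain_positions   # share of winners that were sold
    plr = realized_losses / loss_positions  # share of losers that were sold
    return pgr, plr, pgr - plr


# Illustrative counts only, not data from the paper.
PGR, PLR, GAP = disposition_gap(
    realized_gains=30, paper_gains=70, realized_losses=10, paper_losses=90
)
print(f"PGR={PGR:.2f} PLR={PLR:.2f} gap={GAP:.2f}")
```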

We propose seven interface primitives operationalizing verification-centered HCI.

Design contribution: specification of seven interface primitives within the paper (conceptual/design proposal); no user-study or empirical validation reported.

high positive The Instrumental Dissolution of Typing: Why AI Challenges th... existence and specification of interface primitives for verification-centered HC...

We map synthetic literacy (oral input generating literate output) as the defining feature of this transition.

Conceptual mapping and theoretical framing within the paper; supported by examples from technology trends but no empirical evaluation reported.

high positive The Instrumental Dissolution of Typing: Why AI Challenges th... emergence of synthetic literacy (oral-to-literate workflows)
