Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
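
As a reading aid, the share of claims in one direction for a given outcome can be computed from the rows above. The sketch below is a minimal illustration, not part of the source: it copies three rows from the matrix, treats a dash as zero (an assumption about what the dash means), and divides by the listed Total rather than by the sum of the four direction columns, because the two do not always agree (for Firm Productivity the four counts sum to 598 against a listed total of 604). The function names are hypothetical.

```python
# Three rows copied from the matrix: outcome, positive, negative, mixed, null, total.
ROWS = """Firm Productivity\t435\t55\t88\t20\t604
Job Displacement\t12\t80\t20\t1\t113
Labor Share of Income\t17\t17\t17\t—\t51"""

COLUMNS = ("positive", "negative", "mixed", "null", "total")


def parse_count(cell: str) -> int:
    """Read one cell; a dash is interpreted as zero (assumption)."""
    cell = cell.strip()
    return 0 if cell == "—" else int(cell)


def parse_rows(text: str) -> dict:
    """Turn tab-separated rows into {outcome: {column: count}}."""
    table = {}
    for line in text.splitlines():
        outcome, *cells = line.split("\t")
        table[outcome] = dict(zip(COLUMNS, (parse_count(c) for c in cells)))
    return table


def share(table: dict, outcome: str, direction: str) -> float:
    """Fraction of an outcome's listed total that falls in one direction."""
    row = table[outcome]
    return row[direction] / row["total"]


table = parse_rows(ROWS)
for name in table:
    print(f"{name}: {share(table, name, 'positive'):.1%} positive")
```

Using the listed total as the denominator keeps the shares comparable across rows even where the four direction counts fall short of it.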

Environment creation is framed as a multi-agent task: a coding agent writes setup scripts, downloads real-world data, and configures the software while producing evidence of correct setup; an independent audit agent verifies evidence against a quality checklist.

Method description of multi-agent pipeline (coding agent + audit agent) in the paper.

high positive Gym-Anything: Turn any Software into an Agent Environment reliability/validity of environment setup via multi-agent workflow
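
The division of labour described in this entry (one agent sets up and reports evidence, a second agent independently audits it against a checklist) can be sketched roughly as follows. This is an illustrative toy, not the Gym-Anything implementation: the evidence fields, the checklist items, and both agent functions are hypothetical stand-ins, and the "agents" here are plain functions rather than language models.

```python
from dataclasses import dataclass, field


@dataclass
class SetupEvidence:
    """Evidence the coding agent hands over after attempting setup (hypothetical fields)."""
    scripts_ran: bool
    data_downloaded: bool
    software_configured: bool
    notes: list = field(default_factory=list)


# Hypothetical quality checklist: each item names an evidence field that must be true.
CHECKLIST = ("scripts_ran", "data_downloaded", "software_configured")


def coding_agent(task: str) -> SetupEvidence:
    """Stand-in for the agent that writes setup scripts, fetches data, and configures software."""
    return SetupEvidence(
        scripts_ran=True,
        data_downloaded=True,
        software_configured=False,  # simulate an incomplete setup
        notes=[f"setup attempted for {task}"],
    )


def audit_agent(evidence: SetupEvidence, checklist=CHECKLIST) -> list:
    """Independent check: return the checklist items that the evidence fails."""
    return [item for item in checklist if not getattr(evidence, item)]


def create_environment(task: str) -> dict:
    """Run setup, then accept the environment only if the audit finds no failures."""
    evidence = coding_agent(task)
    failures = audit_agent(evidence)
    return {"task": task, "accepted": not failures, "failures": failures}


print(create_environment("demo-app"))
```

Keeping the auditor separate from the agent that produced the evidence is the point of the design: the setup step cannot certify its own output.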

We introduce Gym-Anything, a framework for converting any software into an interactive computer-use environment.

Methodological contribution described in paper (framework implementation claimed).

high positive Gym-Anything: Turn any Software into an Agent Environment availability of a general framework for environment creation

The study introduces 'career reconfiguration' as a framework explaining intra-role task transformation, extending existing career mobility and job transition theories.

Theoretical/conceptual contribution presented in the paper (framework proposition; not an empirical effect).

high positive Artificial Intelligence Adoption and Career Reconfiguration ... theoretical framing of intra-role task transformation (career reconfiguration)

Mediation analysis confirms that training and organizational support significantly mediate the relationship between AI adoption and career shifts.

Mediation analysis reported in the study (method stated; no mediation coefficients or sample size provided in abstract).

high positive Artificial Intelligence Adoption and Career Reconfiguration ... career shifts (mediated effect of training and organizational support on relatio...

Together, these variables explain 61% of the variance in adaptive outcomes (R² = 0.61).

Multiple regression model summary reported in the paper (R-squared value provided; sample size not stated).

high positive Artificial Intelligence Adoption and Career Reconfiguration ... variance explained in adaptive outcomes (career adaptation)

Readiness to change is a significant predictor of career adaptation (beta = 0.298, p = 0.011).

Multiple regression analysis reported in the paper (predictors of career adaptation; sample size not stated).

high positive Artificial Intelligence Adoption and Career Reconfiguration ... career adaptation / adaptive outcomes

Openness to technology is a significant predictor of career adaptation (beta = 0.367, p = 0.003).

Multiple regression analysis reported in the paper (predictors of career adaptation; sample size not stated).

high positive Artificial Intelligence Adoption and Career Reconfiguration ... career adaptation / adaptive outcomes

Organizational support is a significant predictor of career adaptation (beta = 0.389, p = 0.005).

Multiple regression analysis reported in the paper (predictors of career adaptation; sample size not stated).

high positive Artificial Intelligence Adoption and Career Reconfiguration ... career adaptation / adaptive outcomes

Skills training is the strongest predictor of career adaptation (beta = 0.412, p = 0.002).

Multiple regression analysis reported in the paper (predictors of career adaptation; sample size not stated).

high positive Artificial Intelligence Adoption and Career Reconfiguration ... career adaptation / adaptive outcomes

The proposal outlines a phased implementation roadmap from a voluntary pilot to mandatory certification within five years.

Proposal states a phased implementation timeline moving from voluntary pilot projects to mandatory certification within a five-year period; presented as a planned roadmap rather than a demonstrated outcome.

high positive IASCA: The International AI Safety Certification Authority —... policy adoption timeline (voluntary pilot → mandatory certification within five ...

The governance structure for IASCA will be treaty-based and include anti-capture provisions.

The proposal explicitly describes a treaty-based governance structure and states that it includes anti-capture provisions; this is a design/policy prescription in the document rather than an evidence-based finding.

high positive IASCA: The International AI Safety Certification Authority —... treaty-based governance with anti-capture provisions

IASCA employs a zero-knowledge testing architecture that evaluates model safety through behavioural probing without accessing proprietary weights, training data, or architecture.

Proposal describes a technical design: zero-knowledge testing via behavioural probes that does not require access to model weights, training data, or architecture; presented as a design feature without empirical validation or test results in the excerpt.

high positive IASCA: The International AI Safety Certification Authority —... safety evaluation via behavioural probing without inspecting weights/training da...

The International AI Safety Certification Authority (IASCA) is an independent, internationally governed body for mandatory pre-deployment safety certification of frontier AI models.

Explicit statement in the proposal describing IASCA as an independent, internationally governed authority and its role in mandatory pre-deployment certification; conceptual design, no empirical testing or implementation reported.

high positive IASCA: The International AI Safety Certification Authority —... pre-deployment safety certification of frontier AI models

SWE-bench alignment: Bench is aligned with SWE-bench-Verified and SWE-bench-Pro.

Paper statement that the constructed benchmark is aligned with SWE-bench-Verified and SWE-bench-Pro (methodological/design alignment described).

high positive Does Pass Rate Tell the Whole Story? Evaluating Design Const... benchmark alignment

Bench contains 495 issues and 1,787 validated design constraints across six repositories.

Reported dataset statistics in paper/abstract: explicit counts of issues (495), validated constraints (1,787), and number of repositories (6).

high positive Does Pass Rate Tell the Whole Story? Evaluating Design Const... other

We construct DESIGN-AWARE benchmark (Bench) by mining and validating design constraints from real-world pull requests, linking them to issue instances, and automatically checking patch compliance using an LLM-based verifier.

Method description in paper: dataset created by mining real-world pull requests, validating constraints, linking constraints to issues, and using an LLM-based verifier to check compliance.

high positive Does Pass Rate Tell the Whole Story? Evaluating Design Const... other
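
The construction described in this entry links validated constraints to issue instances and checks each candidate patch against them with a verifier. A minimal sketch of that bookkeeping is shown below, under stated assumptions: the `Constraint` type, the `compliance_rate` function, and the substring-based `keyword_verifier` are all hypothetical, and the toy verifier merely stands in for the LLM-based verifier the paper reports using.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Constraint:
    """One design constraint linked to the issue it applies to (hypothetical schema)."""
    issue_id: str
    text: str


def compliance_rate(
    patches: Dict[str, str],
    constraints: List[Constraint],
    verifier: Callable[[str, str], bool],
) -> float:
    """Fraction of constraints satisfied, counting only issues that have a patch."""
    checked = 0
    satisfied = 0
    for constraint in constraints:
        patch = patches.get(constraint.issue_id)
        if patch is None:
            continue  # no patch submitted for this issue
        checked += 1
        if verifier(patch, constraint.text):
            satisfied += 1
    return satisfied / checked if checked else 0.0


def keyword_verifier(patch: str, constraint_text: str) -> bool:
    """Toy stand-in for an LLM verifier: satisfied if the text appears in the patch."""
    return constraint_text in patch


patches = {"I1": "log via logger.info(msg)", "I2": "print(msg)"}
constraints = [
    Constraint("I1", "logger"),
    Constraint("I2", "logger"),
    Constraint("I3", "logger"),  # skipped: no patch for I3
]
print(compliance_rate(patches, constraints, keyword_verifier))
```

A rate computed this way is independent of whether the patch passes its tests, which is why it can be reported alongside pass rate rather than folded into it.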

Flowr is domain-independent, offering a generalizable blueprint for agentic AI-driven supply chain automation across large-scale enterprise settings.

Claim of generalizability made by the authors in the paper; presented as an assertion rather than demonstrated through multi-industry empirical tests in the excerpt.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... generalizability / applicability across domains

The framework was validated in collaboration with a large-scale supermarket chain.

Claim of field validation stated in the paper; indicates at least one real-world collaboration but provides no further details (e.g., number of stores, duration, metrics) in the excerpt.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... field validation / real-world deployment

Evaluation indicates Flowr enables proactive exception handling at a scale unachievable through manual processes.

Empirical/operational claim based on the paper's evaluation and deployment context; the excerpt asserts this capability but does not provide quantitative performance metrics or comparison details.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... proactive exception handling capability and scale

Evaluation shows Flowr improves demand–supply alignment.

Empirical claim in the paper's evaluation; reported improvement in demand-supply alignment from deployment or testing with a large supermarket chain, but no numerical metrics provided in the excerpt.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... demand–supply alignment

Evaluation demonstrates that Flowr significantly reduces manual coordination overhead.

Empirical claim reported in the paper's evaluation section; the excerpt notes an evaluation and collaboration with a large supermarket chain but provides no sample size figures or quantitative effect sizes.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... manual coordination overhead (effort/time/coordination burden)

Central to the framework is a human-in-the-loop orchestration model in which supply chain managers supervise and intervene across workflow stages via a Model Context Protocol (MCP)-enabled interface, preserving accountability and organizational control.

Design/organizational claim describing human-in-the-loop orchestration and MCP interface; asserted in the paper without empirical measures of accountability or control in the excerpt.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... preservation of accountability and organizational control during automation

To ensure task accuracy and adherence to responsible AI principles, the framework employs a consortium of fine-tuned, domain-specialized large language models coordinated by a central reasoning LLM.

Technical/design claim in the paper describing model architecture and approach; no evaluation metrics or tests of accuracy/responsibility provided in the excerpt.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... task accuracy and adherence to responsible AI principles

Flowr systematically decomposes manual supply chain operations into specialized AI agents, each responsible for a clearly defined cognitive role, enabling automation of processes previously dependent on continuous human coordination.

Architectural claim — asserted mechanism of the framework in the paper; presented as part of the framework design, no quantitative evaluation details in the excerpt.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... task decomposition and automation of previously human-coordinated processes

This paper introduces Flowr, a novel agentic AI framework for automating end-to-end retail supply chain workflows in large-scale supermarket operations.

Design and system-proposal claim in the paper; supported by framework description rather than empirical testing in the provided text.

high positive Flowr -- Scaling Up Retail Supply Chain Operations Through A... ability to automate end-to-end supply chain workflows (task allocation to AI)

The taxonomy, feasibility classification, and mechanism-to-scenario mapping provide a technical foundation for policymakers and identify the R&D investments required before hardware-level governance can support verifiable international agreements.

Authors' synthesis and policy-focused conclusions based on the taxonomy, feasibility ratings, mapping, and threat analyses presented in the paper (conceptual/prescriptive).

high positive Hardware-Level Governance of AI Compute: A Feasibility Taxon... usefulness of the paper's contributions for policy planning and R&D prioritizati...

We present an adversary-tiered threat analysis distinguishing commercial, non-state, and nation-state actors, arguing the appropriate security standard is tamper-evident assurance analogous to IAEA verification rather than absolute tamper-proofing.

Authors' adversary-model classification and normative argument recommending tamper-evident assurance (comparative reasoning with IAEA-style verification). Qualitative policy recommendation; no empirical experiment.

high positive Hardware-Level Governance of AI Compute: A Feasibility Taxon... recommended security standard for hardware-level governance

We map the taxonomy onto four governance scenarios: domestic regulation, bilateral agreements, multilateral treaty verification, and industry self-regulation.

Authors' scenario mapping exercise described in the paper (conceptual mapping of mechanisms to four named governance scenarios).

high positive Hardware-Level Governance of AI Compute: A Feasibility Taxon... mechanism-to-scenario applicability mapping

For each mechanism, we provide a technical description, a feasibility rating, and an identification of adversarial vulnerabilities.

Paper's stated content and structure: per-mechanism entries including technical descriptions, feasibility ratings, and adversarial vulnerability discussion (qualitative documentation).

high positive Hardware-Level Governance of AI Compute: A Feasibility Taxon... completeness of mechanism documentation

This paper proposes a taxonomy of 20 hardware-level governance mechanisms, organised by function (monitoring, verification, enforcement) and assessed for technical feasibility on a four-point scale from currently deployable to speculative.

Authors' methodological contribution: a constructed taxonomy enumerating 20 mechanisms and an assigned four-point feasibility rating (documentation in the paper). No external sample size; based on authors' engineering analysis.

high positive Hardware-Level Governance of AI Compute: A Feasibility Taxon... existence and classification of hardware governance mechanisms

Multimodal GeoAI studies fuse multiple geospatial data modalities to tackle urban mobility tasks including accessibility mapping, demand forecasting, and origin–destination flow prediction.

Categorization of tasks addressed by the included multimodal GeoAI studies (synthesis from the surveyed papers, n=18).

high positive GeoAI and Multimodal Geospatial Data Fusion for Inclusive Ur... types of urban mobility tasks addressed by multimodal GeoAI (accessibility mappi...

To address these challenges, the paper proposes a structured research roadmap including equity-aware loss functions, adaptive multimodal fusion pipelines, participatory and human-in-the-loop workflows, and urban data trusts.

Authors' proposed agenda and recommendations presented in the discussion/conclusion of the paper (proposal, not empirically evaluated).

high positive GeoAI and Multimodal Geospatial Data Fusion for Inclusive Ur... recommended methodological and governance directions to improve inclusiveness an...

The paper examines emerging techniques such as knowledge graphs, federated learning, and explainable AI that support equity-relevant insights across diverse urban contexts.

Discussion and synthesis of methodological developments in the surveyed literature (reported within the review).

high positive GeoAI and Multimodal Geospatial Data Fusion for Inclusive Ur... presence and applicability of emerging techniques (knowledge graphs, federated l...

The review highlights the growing use of deep learning architectures in multimodal GeoAI for urban mobility.

Observed trend reported by the authors based on the systematic review of included studies (n=18).

high positive GeoAI and Multimodal Geospatial Data Fusion for Inclusive Ur... use of deep learning architectures in multimodal GeoAI studies

The integration of artificial intelligence with geographic information science, combined with multimodal geospatial data fusion, provides powerful tools to diagnose and address mobility disparities by integrating heterogeneous data sources (satellite imagery, GPS trajectories, transit records, volunteered geographic information, social sensing).

Theoretical/methodological claim supported by examples and synthesis from the surveyed literature (the paper reviews multimodal GeoAI studies that fuse such data sources).

high positive GeoAI and Multimodal Geospatial Data Fusion for Inclusive Ur... diagnostic and remedial capacity for mobility disparities via multimodal GeoAI

The risk of evolution selecting for deception could be mitigated if reproduction is based on purely objective criteria, rather than human judgment.

Prescriptive implication derived from the model analysis: argument that replacing human-judged fitness with objective criteria would reduce selection for deception (theoretical reasoning, not empirical test).

high positive A mathematical theory of evolution for self-designing AIs reduction in selection for deception under objective reproduction criteria

Assuming bounded fitness and a fixed probability that any AI reproduces a 'locked' copy of itself, fitness concentrates on the maximum reachable value.

Formal theorem/proof within the mathematical model under the stated assumptions (bounded fitness and fixed probability of locked self-reproduction).

high positive A mathematical theory of evolution for self-designing AIs asymptotic distribution of fitness across lineages (concentration on maximum rea...

As artificial intelligence systems (AIs) become increasingly produced by recursive self-improvement, a form of evolution may emerge, in which the traits of AI systems are shaped by the success of earlier AIs in designing and propagating their descendants.

Conceptual argument and motivation in the paper; development of a mathematical model of self-designing AIs to formalize this idea (theoretical, no empirical data or sample).

high positive A mathematical theory of evolution for self-designing AIs emergence of evolutionary dynamics in self-improving AIs (traits shaped by desce...

Generative AI helps users solve problems more efficiently.

Motivating empirical observation stated in the paper (no sample or empirical analysis reported in the provided text); assumption used to motivate the theoretical model.

high positive When AI Improves Answers but Slows Knowledge Creation: Match... problem-solving efficiency (implicit)

By elucidating the mechanisms and trade-offs inherent in AI-human collaboration, this work lays a robust foundation for future research on adaptive decision systems.

Authors' forward-looking claim in the abstract that their synthesis clarifies mechanisms/trade-offs and thus supports subsequent research; based on their review and framework.

high positive Advancing Decision-Making through AI-Human Collaboration: A ... foundation for future research on adaptive decision systems

By synthesizing these paradigms, this research advances the theoretical understanding of hybrid decision-making systems and provides actionable insights for organizations navigating complex and AI-driven environments.

Authors' stated contribution based on the conceptual synthesis of the literature and the proposed framework (as reported in the abstract).

high positive Advancing Decision-Making through AI-Human Collaboration: A ... theoretical advancement and provision of actionable organizational insights

The framework introduces four distinct paradigms of AI-human collaborative decision-making: adaptive intuitive decision, programmed algorithmic decision, interpretive analytical decision and integrative hybrid decision.

Authors' conceptual taxonomy reported in the abstract, produced from synthesis of the reviewed literature (627 articles).

high positive Advancing Decision-Making through AI-Human Collaboration: A ... classification of AI-human collaborative decision-making into four paradigms

We developed a novel conceptual framework that identifies two critical dimensions, AI-human dynamics and decision typologies, that shape decision outcomes.

Authors' reported conceptual synthesis derived from the systematic review/bibliometric analysis of the 627 articles.

high positive Advancing Decision-Making through AI-Human Collaboration: A ... identification of critical dimensions affecting decision outcomes

Prompts can be treated as decision policies that allocate discretion between researcher and system, governing what is executed and when iteration stops.

Methodological framing advanced by the authors describing prompts as decision policies; conceptual claim based on the paper's analytic framework rather than empirical measurement.

high positive On the Carbon Footprint of Economic Research in the Age of G... conceptualization of prompts' role in workflow control and decision allocation
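
One way to read "prompts as decision policies" is that a prompt fixes, in advance, a cap on how much is executed and a rule for when to stop iterating. The sketch below illustrates that reading only; it is not taken from the paper. `PromptPolicy`, its two fields, and the per-iteration quality scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PromptPolicy:
    """Hypothetical policy encoded in a prompt."""
    max_iterations: int     # operational constraint: hard cap on runs
    min_improvement: float  # decision rule: stop once the gain falls below this


def run_with_policy(scores: Iterable[float], policy: PromptPolicy) -> int:
    """Consume per-iteration quality scores and return how many iterations were executed."""
    executed = 0
    best = None
    for score in scores:
        if executed >= policy.max_iterations:
            break  # the cap was reached before this run
        executed += 1
        if best is not None and score - best < policy.min_improvement:
            break  # this run did not improve enough; stop here
        best = score if best is None else max(best, score)
    return executed


scores = [0.5, 0.7, 0.71, 0.9]
print(run_with_policy(scores, PromptPolicy(max_iterations=10, min_improvement=0.05)))
print(run_with_policy(scores, PromptPolicy(max_iterations=2, min_improvement=0.05)))
```

In this toy, the first policy stops after the third run because the gain drops below the threshold, and the second stops at its cap; neither reaches the fourth run, which is where the saved runtime would come from.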

Operational constraints and decision-rule prompts deliver large and stable footprint reductions while preserving decision-equivalent topic outputs.

Experimental comparisons of prompt strategies in the benchmarked workflow, showing reductions in runtime and CO2e, with topic outputs evaluated for decision equivalence (asserted in abstract; no numeric reductions or sample sizes provided).

high positive On the Carbon Footprint of Economic Research in the Age of G... carbon footprint / runtime reductions and preservation of topic output equivalen...

We benchmark a modern economic survey workflow, an LDA-based literature mapping implemented with GenAI-assisted coding and executed in a fixed cloud notebook, measuring runtime and estimated CO2e with CodeCarbon.

Experimental benchmark described in the paper: single implemented workflow (LDA-based literature mapping) executed in a fixed cloud notebook with runtime and CO2e measured using CodeCarbon (methodological claim).

high positive On the Carbon Footprint of Economic Research in the Age of G... runtime and estimated CO2e (carbon footprint) of the benchmarked workflow
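
For orientation, the relationship between runtime and estimated CO2e can be written as runtime times average power draw times grid carbon intensity. The sketch below shows only that arithmetic using the standard library; it does not use or reproduce the CodeCarbon API, and the power and intensity figures in the usage line are placeholder assumptions, not values from the paper.

```python
import time


def estimate_co2e_kg(runtime_s: float, avg_power_w: float, intensity_kg_per_kwh: float) -> float:
    """Estimate emissions as energy used (kWh) times grid carbon intensity (kg CO2e per kWh)."""
    energy_kwh = (runtime_s / 3600.0) * (avg_power_w / 1000.0)
    return energy_kwh * intensity_kg_per_kwh


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Placeholder workload and placeholder constants (assumptions for illustration only).
result, elapsed = timed(sum, range(1_000_000))
print(estimate_co2e_kg(elapsed, avg_power_w=100.0, intensity_kg_per_kwh=0.5))
```

Because the estimate is linear in runtime when hardware and location are held fixed, a fixed notebook environment lets differences in runtime between prompt strategies translate directly into differences in estimated footprint.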

Training footprint is the largest cluster in the mapped Green AI literature.

Result from the paper's literature mapping / clustering (statement in abstract; no numeric cluster sizes given).

high positive On the Carbon Footprint of Economic Research in the Age of G... relative prevalence (cluster size) of 'training footprint' theme

We map the recent Green AI literature into seven themes: training footprint is the largest cluster, while inference efficiency and system-level optimisation are growing rapidly, alongside measurement protocols, green algorithms, governance, and security and efficiency trade-offs.

Bibliometric / thematic mapping of recent Green AI literature described in the paper (method: literature mapping; exact number of papers or mapping procedure not specified in abstract).

high positive On the Carbon Footprint of Economic Research in the Age of G... distribution of themes within Green AI literature (theme prevalence and growth)

Compared to relationship-based debt, stable equity significantly promotes high-quality development in the high-end equipment manufacturing and new energy industries.

Comparative subgroup regression analysis on the same dataset (743 listed enterprises, 2014–2023) indicating that the coefficient for stable equity is significantly larger than that for relationship-based debt in the high-end equipment manufacturing and new energy industry subsamples.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (comparison of effects by financing type...

The effects of two distinct forms of patient capital—stable equity and relationship-based debt—are more pronounced in promoting high-quality development in the new energy vehicle industry, energy conservation and environmental protection industry, biotechnology industry, new materials industry, and next-generation information technology industry.

Industry heterogeneity / subgroup analyses on the 2014–2023 panel of 743 listed firms showing stronger estimated effects of both stable equity and relationship-based debt on firm high-quality development within these specified industries.

high positive The Impact of Patient Capital on the High-Quality Developmen... high-quality development of enterprises (industry-specific stronger effects of t...
