Evidence (13,661 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	740	192	95	871	1945
Governance & Regulation	796	388	185	119	1512
Organizational Efficiency	765	186	123	82	1166
Technology Adoption Rate	610	227	121	95	1061
Research Productivity	409	121	56	331	928
Output Quality	464	174	58	47	743
Decision Quality	318	173	75	42	615
Firm Productivity	432	55	88	20	601
AI Safety & Ethics	214	273	65	33	589
Market Structure	175	165	120	24	489
Task Allocation	206	64	70	31	376
Skill Acquisition	161	57	57	16	291
Innovation Output	201	27	41	18	288
Fiscal & Macroeconomic	130	69	43	26	275
Employment Level	104	50	105	13	274
Consumer Welfare	116	62	42	11	231
Firm Revenue	149	45	26	3	223
Inequality Measures	43	120	49	6	218
Task Completion Time	164	29	8	12	214
Worker Satisfaction	89	60	20	12	181
Error Rate	69	89	9	2	169
Regulatory Compliance	74	67	14	4	159
Training Effectiveness	91	19	13	19	144
Wages & Compensation	77	33	25	6	141
Team Performance	86	17	27	9	140
Automation Exposure	49	50	22	12	136
Developer Productivity	91	17	14	5	128
Job Displacement	12	80	19	1	112
Hiring & Recruitment	51	7	8	3	69
Creative Output	31	16	7	2	57
Skill Obsolescence	5	43	6	1	55
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

The Process Performance Index (PI) is positively associated with abnormal earnings.

Empirical regression results using the panel dataset (3,515 firms; 20,076 firm-year observations) reporting a positive association between PI and abnormal earnings.

high positive A Data-Driven Evaluation Framework for Quantifying the Impac... abnormal earnings

A Process Performance Index (PI) is constructed to measure AI-enabled operational capability across resource allocation efficiency, coordination effectiveness, and production performance dimensions.

Authors describe construction of PI using multi-dimensional indicators and the AHP–EWM weighting plus FCE aggregation procedure.

high positive A Data-Driven Evaluation Framework for Quantifying the Impac... Process Performance Index (PI) as a measure of AI-enabled operational capability
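As a rough illustration of the weighting and aggregation steps named in this entry, the sketch below computes entropy weights (EWM) from an indicator matrix, blends them with expert (AHP) weights, and applies fuzzy comprehensive evaluation (FCE) as a weighted sum over a membership matrix. The function names, the multiplicative blending rule, and any input data are assumptions made for illustration; the listing does not say how the paper combines the two weight sets or which fuzzy operator it uses.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: rows are firms, columns are indicators.

    Assumes all values are positive and already scaled so that larger is better.
    """
    n = len(matrix)
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:
                entropy -= k * p * math.log(p)
        # Indicators that vary more across firms carry more information.
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


def combine_weights(ahp_weights, ewm_weights):
    """Blend subjective and objective weights (multiplicative rule, an assumption)."""
    products = [a * e for a, e in zip(ahp_weights, ewm_weights)]
    norm = sum(products)
    return [p / norm for p in products]


def fuzzy_evaluate(weights, membership):
    """FCE with the weighted-average operator: B = W . R.

    membership[i][g] is the degree to which indicator i belongs to grade g.
    """
    grades = len(membership[0])
    return [sum(w * row[g] for w, row in zip(weights, membership))
            for g in range(grades)]
```

An indicator that takes the same value for every firm receives an entropy weight of approximately zero, which is the intended behaviour of the method: it cannot discriminate between firms.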

This study proposes a data-driven evaluation framework that integrates the Feltham–Ohlson enterprise value assessment with a multi-level performance evaluation framework (hybrid AHP–EWM weighting and Fuzzy Comprehensive Evaluation aggregation) to quantify the impact of AI on industrial process performance and enterprise value creation.

Methodological description in the paper: authors describe integrating Feltham–Ohlson valuation with AHP–EWM weighting and FCE aggregation to form a unified evaluation framework.

high positive A Data-Driven Evaluation Framework for Quantifying the Impac... ability to evaluate AI-driven process performance and enterprise value

New tendencies in managerial AI research and practice include explainable AI, human–AI collaboration, knowledge management, enterprise analytics, and algorithmic management.

Descriptive finding from the paper's literature synthesis (topics emphasized in the review); no quantitative prevalence or counts provided in the abstract.

high positive Artificial intelligence, machine learning, and deep learning... emergent research and practice topics / adoption tendencies

Machine Learning and Deep Learning enhance employee productivity, business intelligence, process mining, and data-driven decision-making by enabling prediction, perception, and adaptive learning solutions.

Claim synthesized in the review from multiple studies identified via PRISMA screening; abstract does not list the number or identity of underlying empirical studies.

high positive Artificial intelligence, machine learning, and deep learning... employee productivity, effectiveness of business intelligence and process mining...

AI-based technologies can greatly enhance managerial efficiency by automating repetitive activities, improving resource allocation, enabling intelligent scheduling, and supporting predictive modelling and strategic planning.

Summary conclusion from the paper's literature review (PRISMA methodology referenced); no quantitative meta-analytic effect sizes provided in abstract.

high positive Artificial intelligence, machine learning, and deep learning... managerial efficiency, automation of repetitive tasks, resource allocation, sche...

Machine Learning, Artificial Intelligence, and Deep Learning are tools that can optimize managerial decisions, enable intelligent automation, streamline workflows, and improve organizational performance.

Synthesis claim from the paper's PRISMA-based literature review (no numeric sample size reported in the abstract).

high positive Artificial intelligence, machine learning, and deep learning... managerial decision quality, automation, workflow streamlining, organizational p...

The paper ends with strategic suggestions to foster inclusive growth and orchestrate disruption, contributing evidence-based insights to the future of work in Africa.

Description of the paper's conclusions/recommendations drawn from its systematic review; represents the paper's stated contribution rather than an empirical claim about external data.

high positive The Impact of AI-Driven Automation on Semi and Unskilled Wor... policy recommendations and strategic guidance for inclusive growth and managed d...

AI and automation technologies are capable of raising productivity.

Synthesis from the paper's systematic review indicating productivity gains associated with AI/automation in the literature; no quantified meta-analytic estimate provided in the summary.

high positive The Impact of AI-Driven Automation on Semi and Unskilled Wor... productivity increases associated with AI adoption

Future research should explore hybrid frameworks that combine LLM reasoning with quantitative optimization for cost-sensitive environments.

Recommendation in conclusion based on observed results (LLMs perform reasonably but lag optimized methods and transaction costs matter).

high positive Few-Shot Portfolio Optimization: Can Large Language Models O... recommended research direction (hybrid LLM + optimization frameworks)

A transaction cost analysis revealed that low-turnover LLM strategies retain their competitiveness post-costs, surpassing cap-weighted benchmarks.

Post-transaction-cost analysis reported in results: LLM strategies with low turnover remained competitive after applying transaction cost assumptions and exceeded performance of cap-weighted benchmark.

high positive Few-Shot Portfolio Optimization: Can Large Language Models O... post-cost portfolio performance relative to cap-weighted benchmark
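To make the link between turnover and post-cost performance concrete, the sketch below charges a proportional cost on the volume traded at a rebalance. The proportional cost model, the function names, and the cost rate passed in are assumptions; the listing does not state which transaction cost assumptions the paper applies.

```python
def turnover(old_weights, new_weights):
    """One-way turnover: half the sum of absolute weight changes."""
    return 0.5 * sum(abs(new - old) for old, new in zip(old_weights, new_weights))


def net_return(gross_return, old_weights, new_weights, cost_rate):
    """Gross period return minus a proportional cost on traded volume."""
    # Buys plus sells equal twice the one-way turnover.
    traded = 2.0 * turnover(old_weights, new_weights)
    return gross_return - cost_rate * traded
```

Under this model a strategy that barely changes its weights gives up almost nothing to costs, which is why low-turnover strategies can stay competitive after costs.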

LLM-generated portfolios outperformed naive diversification (Sharpe ratio up to 0.741).

Backtest results comparing LLM-generated portfolios against naive diversification; reported Sharpe ratio value (up to 0.741) for LLM strategies.

high positive Few-Shot Portfolio Optimization: Can Large Language Models O... Sharpe ratio (risk-adjusted return) of portfolios
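For reference, a minimal sketch of the two quantities in this comparison: the per-period returns of a naive (equal-weight, 1/N) portfolio and the Sharpe ratio of a return series. The function names and the annualisation parameter are assumptions; the listing does not give the paper's return frequency or risk-free rate.

```python
import math
import statistics


def equal_weight_returns(asset_returns):
    """Per-period return of a 1/N portfolio.

    asset_returns is a list with one return series per asset, all of equal length.
    """
    n_assets = len(asset_returns)
    return [sum(period) / n_assets for period in zip(*asset_returns)]


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=1):
    """Mean excess return divided by its sample standard deviation, annualised."""
    excess = [r - risk_free for r in returns]
    return (statistics.mean(excess) / statistics.stdev(excess)
            * math.sqrt(periods_per_year))
```

Passing `periods_per_year=252` for daily data or `12` for monthly data gives the conventional annualised figure.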

Findings underscore the importance of robust evaluation frameworks for deploying VLMs in visually rich and safety-critical environments.

Synthesis/recommendation based on experimental results showing that visual inputs (images and colors) can influence VLM decisions and that mitigation effectiveness varies by model.

high positive The Effects of Visual Priming on Cooperative Behavior in Vis... need for/importance of robust evaluation frameworks for VLM safety and reliabili...

Policy frameworks, reskilling initiatives, and institutional adaptations are required to ensure inclusive technological progress.

Prescriptive conclusion presented in abstract based on the review and synthesis; no empirical validation or sample sizes provided in abstract.

high positive AI and the Transformation of Human Employment: Challenges, O... effectiveness of policy and reskilling to ensure inclusion

AI simultaneously generates demand for higher-order problem solving, emotional intelligence, and human-AI collaboration skills.

Explicit finding reported in abstract from the review of interdisciplinary literature; no quantified effect sizes or sample sizes provided in abstract.

high positive AI and the Transformation of Human Employment: Challenges, O... demand for higher-order skills / skill acquisition requirements

The majority of AI’s effect on potential GDP in the period under review was due to increased labor productivity and the optimization of existing processes.

Attribution/decomposition within the scenario analysis of aggregated industry data indicating productivity and process-optimization channels as principal contributors.

high positive THE IMPACT OF AI ON POTENTIAL GDP AND LONG-TERM ECONOMIC GRO... labor productivity and process optimization contributions to GDP

Artificial intelligence has become a significant factor in the growth of Russia’s potential GDP.

Findings reported from the scenario analysis and aggregated industry data reviewed in the paper and syntheses of Russian analytical sources.

high positive THE IMPACT OF AI ON POTENTIAL GDP AND LONG-TERM ECONOMIC GRO... contribution of AI to potential GDP

AI implementation during 2023–2025 was accompanied by a positive contribution to Russia’s potential GDP.

Analysis of aggregated industry data and a scenario approach using Russian-language sources (Ministry of Digital Development, HSE, Digital Economy ANO, analytical reviews).

high positive THE IMPACT OF AI ON POTENTIAL GDP AND LONG-TERM ECONOMIC GRO... potential GDP growth

For memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.

Conclusion drawn by the authors based on comparative experimental results reported in the paper (xmemory vs retrieval/model-strength baselines); excerpt provides aggregate benchmark comparisons but not full experimental details.

high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... relative importance of system architecture versus retrieval/model strength for m...

On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses.

Empirical evaluation on an application-level task reported in the paper showing 95.2% accuracy for xmemory and claiming it outperforms several classes of alternative systems; excerpt lacks details on the task, dataset size, or baseline numeric results.

high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... accuracy on an application-level memory task

On the end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%–87.24% across the third-party baselines.

Empirical evaluation on the paper's end-to-end memory benchmark reporting F1 scores for xmemory and a range for third-party baselines; the excerpt does not provide dataset size or statistical significance details.

high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... F1 score on an end-to-end memory benchmark
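As a reminder of what the F1 figure measures, the sketch below computes it from counts of true positives, false positives, and false negatives. How the benchmark defines a match, and whether it averages F1 per item or over pooled counts, is not stated in the listing, so the pooled-count form here is an assumption.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```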

On the structured extraction benchmark (judge-in-the-loop configuration), xmemory reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines.

Empirical evaluation on the paper's structured extraction benchmark in the judge-in-the-loop configuration; the excerpt reports the numeric accuracies and states they exceed tested frontier structured-output baselines. The excerpt does not specify dataset size or number of runs.

high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... object-level accuracy and output accuracy on a structured extraction benchmark

This iterative, schema-aware write-path design shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose.

Conceptual claim about how the proposed architecture affects system behavior; supported by the architectural description in the paper rather than explicit quantitative evidence in the excerpt.

high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... nature of read queries (constrained queries over verified records vs repeated in...

We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control.

Description of the proposed method/architecture in the paper (methodological contribution); no numeric evaluation attached to the description in the excerpt.

high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... design/components of memory ingestion pipeline
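The three-stage write path described in this entry can be sketched as a sequence of gated stages, each retried locally until its validator passes. Everything below is a hypothetical illustration: the function names, the callback signatures, the retry limit, and the specific gates (including the rule that an extracted value must appear verbatim in the source text) are assumptions, not the paper's implementation. The attempt number passed to each callback stands in for stateful prompt control.

```python
def run_gated(step, is_valid, max_attempts=3):
    """Run one stage, retrying locally until its validation gate passes."""
    for attempt in range(1, max_attempts + 1):
        result = step(attempt)
        if is_valid(result):
            return result
    raise ValueError(f"stage failed validation after {max_attempts} attempts")


def ingest(text, schema, detect_object, detect_fields, extract_value):
    """Object detection, then field detection, then field-value extraction.

    schema maps an object type to its allowed field names. The three
    callbacks are stand-ins for model calls.
    """
    # Gate 1: the detected object type must exist in the schema.
    obj = run_gated(lambda a: detect_object(text, a),
                    lambda o: o in schema)
    # Gate 2: detected fields must be a non-empty subset of the schema's fields.
    fields = run_gated(lambda a: detect_fields(text, obj, a),
                       lambda fs: len(fs) > 0 and set(fs) <= set(schema[obj]))
    record = {"object": obj}
    for field in fields:
        # Gate 3: values must be quoted from the text, never inferred.
        record[field] = run_gated(
            lambda a, f=field: extract_value(text, obj, f, a),
            lambda v: isinstance(v, str) and v != "" and v in text)
    return record
```

Because each stage retries on its own, a failed field extraction does not force the object and field detection steps to run again.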

Reliable external AI memory must be schema-grounded (schemas define what must be remembered, what may be ignored, and which values must never be inferred).

Normative assertion supported by the paper's proposed design and subsequent experimental results (the paper introduces a schema-grounded approach and evaluates it against benchmarks), though the excerpt does not give full methodological details or sample sizes for this claim alone.

high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... reliability/stability of external AI memory

To manage AI legibility, creators perform four recurring forms of invisible authenticity labor: epistemic verification, linguistic naturalization, narrative restructuring, and performative embodiment.

Authors identify and name four recurrent practices from coding and analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing specific downstream repair and performance work.

high positive AI passing and invisible authenticity labor: trust vulnerabi... types of labor performed to conceal/humanize AI outputs

Creators engage in 'AI passing': strategic efforts to conceal and humanize AI-assisted drafts so that outputs plausibly appear human-authored.

Concept introduced based on analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing tactics to hide AI involvement and present content as human-authored.

high positive AI passing and invisible authenticity labor: trust vulnerabi... use of concealment/humanization strategies for AI outputs

Latency relaxation expands feasible geography for placing inference workloads.

Result reported from the paper's modeling and stylized simulation (energy-latency frontier analysis showing marginal cost/carbon benefits from relaxing latency budgets).

high positive AI Inference as Relocatable Electricity Demand: A Latency-Co... geographic feasibility of relocating inference demand as a function of latency b...
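The mechanism behind this claim can be shown with a small sketch: a region is feasible for a workload only if its latency fits the workload's budget, so a looser budget admits more regions and possibly a cheaper one. The `Region` type, its fields, and the choice of electricity price as the objective are assumptions for illustration and are not taken from the paper's simulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    latency_ms: float      # round-trip latency from the user to this region
    price_per_kwh: float   # electricity price in this region


def feasible_regions(regions, latency_budget_ms):
    """Regions whose latency fits within the workload's budget."""
    return [r for r in regions if r.latency_ms <= latency_budget_ms]


def cheapest_feasible(regions, latency_budget_ms):
    """Lowest-price feasible region, or None when no region qualifies."""
    candidates = feasible_regions(regions, latency_budget_ms)
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.price_per_kwh)
```

The same filter-then-minimise pattern works with carbon intensity in place of price.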

The paper provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers.

Empirical/methodological evidence from a stylized simulation described in the paper; uses representative global compute regions and latency-tolerance heterogeneity to categorize workloads.

high positive AI Inference as Relocatable Electricity Demand: A Latency-Co... assignment of workloads into execution layers (local, regional, energy-oriented)...

AI inference is becoming a persistent and geographically distributed source of electricity demand.

Statement/assertion in the paper's introduction framing the motivation; no empirical sample or experiment reported in the provided text.

high positive AI Inference as Relocatable Electricity Demand: A Latency-Co... electricity demand (geographic distribution and persistence)

Effective governance requires coordinated action across technical, organizational, and regulatory domains (e.g., system-level audits, vendor guidelines, continuous monitoring, documentation across dependency chains) to establish meaningful accountability in distributed development environments.

Policy and technical recommendations derived from literature review, regulatory analysis, and the paper's conceptual findings (recommendation, not empirically validated).

high positive How Supply Chain Dependencies Complicate Bias Measurement an... effectiveness of governance measures in producing meaningful accountability for ...

Claw-Eval-Live suggests that workflow-agent evaluation should be grounded twice, in fresh external demand and in verifiable agent action.

Conclusion/recommendation drawn from the benchmark design and experimental findings; conceptual claim advocating evaluation grounded in external demand signals and verifiable actions.

high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... evaluation grounding (use of fresh external demand signals and verifiable agent ...

The release contains 105 tasks spanning controlled business services and local workspace repair, and evaluates 13 frontier models under a shared public pass rule.

Benchmark release statistics reported in the paper: explicit counts of tasks and evaluated models (105 tasks; 13 models).

high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... benchmark scope (number of tasks) and evaluation breadth (number of models)

For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions.

Grading methodology described in the paper: instrumentation and hybrid deterministic/LLM-judging approach documented by authors (procedural description).

high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... grading/verifiability pipeline (traces, logs, deterministic checks, structured L...
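One way to picture the hybrid grading rule in this entry is a dispatcher that prefers a deterministic check whenever recorded evidence exists for a dimension and falls back to a judge otherwise. The function name, the dictionary shapes, and the `llm_judge` callable are hypothetical; the benchmark's actual grader interface is not described in the listing.

```python
def grade(dimensions, evidence, deterministic_checks, llm_judge):
    """Grade each dimension, recording which kind of grader produced the verdict.

    evidence: dimension name -> recorded artifact (trace, log, service state).
    deterministic_checks: dimension name -> function(artifact) -> bool.
    llm_judge: function(dimension, evidence) -> bool, a stand-in for a
    structured model-based judge.
    """
    results = {}
    for dim in dimensions:
        check = deterministic_checks.get(dim)
        if check is not None and dim in evidence:
            # Enough recorded evidence: decide by rule, no model involved.
            results[dim] = ("deterministic", check(evidence[dim]))
        else:
            # Semantic dimension or missing evidence: defer to the judge.
            results[dim] = ("llm_judge", llm_judge(dim, evidence))
    return results
```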

Each release is constructed from public workflow-demand signals, with ClawHub Top-500 skills used in the current release, and materialized as controlled tasks with fixed fixtures, services, workspaces, and graders.

Description of release construction in the methods: uses public workflow-demand data and ClawHub Top-500 skills; tasks are materialized with controlled fixtures and graders (procedural detail from the paper).

high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... composition of benchmark releases (source signals and materialization strategy)

We introduce Claw-Eval-Live, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases from public workflow-demand signals, from a reproducible, time-stamped release snapshot.

Methodological contribution described in the paper; design and architecture of the benchmark are presented by the authors (design description, no external sample needed).

high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... benchmark design (refreshable signal layer vs. time-stamped snapshot)

LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces.

Framing/background statement in the paper describing expected capabilities of workflow agents; no empirical sample size reported for this expectation.

high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... ability to complete end-to-end units of work

Substantive technological competencies play an important role in shaping network resilience and complement structure-based perspectives in understanding innovation networks.

Synthesis of empirical findings from composite metric identification and disruption simulations on the 282,778-patent-derived networks showing capability-based removals have stronger impacts than structure-only removals.

high positive Technological capability and innovation network resilience: ... role of technological competency in network resilience
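A targeted-disruption simulation of the kind mentioned in this entry can be sketched as follows: rank nodes by some score, remove the top k, and measure the size of the largest connected component that remains. The adjacency-dictionary representation, the function names, and the use of largest component size as the resilience measure are assumptions; the listing does not specify the paper's resilience metric.

```python
from collections import deque


def largest_component_size(adjacency, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def targeted_removal(adjacency, score, k):
    """Remove the k highest-scoring nodes and report the largest remaining component."""
    ranked = sorted(adjacency, key=lambda n: score[n], reverse=True)
    return largest_component_size(adjacency, set(ranked[:k]))
```

Running `targeted_removal` once with a capability score and once with node degree as the score gives a like-for-like comparison of capability-based and structure-only removal strategies.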

A composite technological capability metric can be constructed (from textual and network information) to identify core innovators beyond simple topological measures.

Construction and application of a composite metric combining text-derived technological value and network features on 282,778 patents; used to identify core innovators.

high positive Technological capability and innovation network resilience: ... ability to identify core innovators

Latent Dirichlet Allocation (LDA) on the patent texts delineates fine-grained technological domains within the Chinese AI patent corpus.

Text-mining method applied to a corpus of 282,778 Chinese AI patents using LDA to extract topic/domains.

high positive Technological capability and innovation network resilience: ... granular technological domain delineation

This study develops a multidimensional, knowledge-driven evaluation framework that integrates text mining with complex network analysis to identify core innovators.

Methodological description: framework built using Latent Dirichlet Allocation (LDA) on 282,778 Chinese AI patents, construction of a composite technological capability metric, and simulation of targeted disruptions across collaboration and knowledge networks.

high positive Technological capability and innovation network resilience: ... identification of core innovators

Managing evolutionary dynamics in software is as urgent as AGI alignment for safeguarding society’s co-evolution with its machines.

Author's concluding normative claim in the abstract; argument based on scenario analysis rather than comparative empirical evidence.

high positive Digital Darwinism: steering the evolution of artificial life... relative urgency of managing software evolutionary dynamics versus AGI alignment

Governance should shift focus from aligning goals to steering evolution; the paper proposes four guidance instruments: replication-rate thresholds (modeled on epidemiological R0), a public vulnerability registry for self-modifying code, tiered digital biosafety levels, and adaptive regulatory sandboxes.

Normative policy recommendation spelled out in the abstract; based on the paper's scenario analysis and argumentation rather than empirical validation.

high positive Digital Darwinism: steering the evolution of artificial life... proposed governance instruments to manage software evolutionary dynamics

Cloud platforms, open-source software supply chains, and crypto-economic incentives provide, at electronic speed, the three preconditions of evolution: replication, variation, and differential fitness.

Conceptual/mechanistic claim supported by theoretical argumentation and scenario-building in the paper (no empirical test or sample reported).

high positive Digital Darwinism: steering the evolution of artificial life... presence of replication, variation, and differential fitness in software ecosyst...

The proposed framework balances AI-driven productivity with the epistemic sovereignty necessary to manage increasingly opaque software ecosystems.

Normative/architectural claim about the proposed framework; presented conceptually in the paper without reported empirical testing in the excerpt.

high positive Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... balance between productivity gains and maintenance of epistemic sovereignty (hum...

To preserve long-term resilience, engineering leaders must move beyond prompt-based development to implement rigorous human-in-the-loop pedagogical standards.

Prescriptive recommendation based on the paper's conceptual analysis; no randomized trials or empirical validation of this intervention reported in the excerpt.

high positive Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... long-term resilience of engineering organizations when using human-in-the-loop p...

The findings offer practical insights for construction firms to enhance innovation performance through effective AI integration and help engineers better leverage AI tools in design and project management workflows.

Authors' stated practical implications based on their empirical findings (survey results linking AI capability, decision-making quality, and innovation performance).

high positive AI meets engineering ingenuity: how AI capability enhances i... innovation performance

Algorithmic transparency positively moderates the relationship between AI capability and decision-making quality.

Moderation analysis reported on questionnaire data (Credamo, time-lagged) with n=435; authors state a positive moderating effect of algorithmic transparency.

high positive AI meets engineering ingenuity: how AI capability enhances i... decision-making quality

Decision-making quality mediates the relationship between AI capability and innovation performance.

Mediation analysis reported on the same survey dataset (time-lagged Credamo survey) with n=435 using established measurement scales; stated in results.

high positive AI meets engineering ingenuity: how AI capability enhances i... innovation performance

AI capability is positively associated with innovation performance.

Authors report statistical analysis of questionnaire data collected via the Credamo platform (time-lagged design) using established scales; sample size n=435; result stated in findings.

high positive AI meets engineering ingenuity: how AI capability enhances i... innovation performance
