Evidence (8974 claims)

Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other instead, use the Evidence Explorer.

The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).

Browse by theme

Nine broad, paper-level topics. Click one to filter the claims below.

Human-AI Collaboration

Claims by outcome category

Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.

Outcome	Positive	Negative	Mixed	Null	Total
Other	882	244	117	1097	2424
Governance & Regulation	1010	469	229	135	1875
Organizational Efficiency	977	235	149	90	1462
Technology Adoption Rate	781	299	143	128	1362
Research Productivity	506	155	74	363	1110
Output Quality	555	219	71	70	915
Decision Quality	395	200	95	54	751
Firm Productivity	523	67	101	27	724
AI Safety & Ethics	262	309	75	36	688
Market Structure	195	201	135	30	566
Task Allocation	248	77	96	38	464
Innovation Output	300	34	55	20	411
Skill Acquisition	207	75	65	21	368
Employment Level	138	67	119	24	350
Fiscal & Macroeconomic	156	80	53	33	329
Task Completion Time	211	38	13	16	280
Firm Revenue	183	52	29	5	270
Consumer Welfare	131	77	48	13	269
Inequality Measures	50	141	54	9	254
Worker Satisfaction	104	85	25	13	227
Error Rate	87	112	11	5	215
Automation Exposure	69	69	37	20	198
Wages & Compensation	102	49	31	11	193
Team Performance	115	30	30	11	187
Regulatory Compliance	88	74	17	7	186
Training Effectiveness	109	22	14	21	168
Developer Productivity	116	21	15	8	161
Job Displacement	12	92	26	1	131
Hiring & Recruitment	57	12	9	5	83
Skill Obsolescence	6	59	10	2	77
Social Protection	43	17	8	2	70
Creative Output	35	21	9	4	70
Labor Share of Income	18	23	17	1	59
Worker Turnover	15	16	—	4	35
Industry	—	—	—	1	1
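The direction counts above can be turned into shares. The following is a minimal sketch, assuming the rows are copied as tab-separated text and that a dash means no claims were recorded (an assumption, not something the page states). The function names and the choice of three sample rows are illustrative. In some rows the four direction counts do not sum to Total, so the sketch divides by the Total column rather than by the sum of the directions.

```python
from typing import Dict

# Three rows copied from the table above, tab-separated as on the page.
# "\u2014" is the dash the page shows in the Worker Turnover row.
ROWS = """Wages & Compensation\t102\t49\t31\t11\t193
Job Displacement\t12\t92\t26\t1\t131
Worker Turnover\t15\t16\t\u2014\t4\t35"""

COLUMNS = ["Positive", "Negative", "Mixed", "Null", "Total"]


def parse_count(cell: str) -> int:
    """Read one count; a dash or empty cell is treated as zero (assumption)."""
    cell = cell.strip()
    return 0 if cell in {"\u2014", "-", ""} else int(cell)


def parse_rows(text: str) -> Dict[str, Dict[str, int]]:
    """Map each outcome name to its counts, keyed by column name."""
    table = {}
    for line in text.splitlines():
        name, *cells = line.split("\t")
        table[name] = dict(zip(COLUMNS, map(parse_count, cells)))
    return table


def direction_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each direction relative to the row's Total column."""
    total = counts["Total"]
    return {d: counts[d] / total for d in COLUMNS[:-1]}


table = parse_rows(ROWS)
for outcome, counts in table.items():
    shares = direction_shares(counts)
    print(f"{outcome}: {shares['Positive']:.0%} positive, "
          f"{shares['Negative']:.0%} negative")
```

Dividing by Total keeps the shares comparable across rows even where some claims carry no direction label.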

Active filter: Productivity

The findings underscore the importance of context-specific rollout, role-based training, and governance for the sustained acceptance of generative AI in organizations.

Interpretation/conclusion by the authors based on the survey results and the observed differences between roles and over time (as stated in the abstract).

high positive Generative KI in der Wissensarbeit: Wahrnehmung, Nutzen und ... Recommended implementation measures (context adaptation, training, governance) for...

Copilot delivers the greatest added value for clearly structured, text-based tasks.

Survey results on the estimated benefit for typical knowledge-work activities, as summarized in the abstract (preferred task types: structured, text-based tasks).

high positive Generative KI in der Wissensarbeit: Wahrnehmung, Nutzen und ... Perceived benefit by task type (text-based, structured tasks)

Microsoft 365 Copilot is predominantly perceived as user-friendly and technically reliable.

Self-reported assessments of usability and technical reliability in the repeated cross-sectional survey (as stated in the abstract).

high positive Generative KI in der Wissensarbeit: Wahrnehmung, Nutzen und ... Perceived usability and technical reliability

Academic staff develop more positive assessments over time, particularly regarding productivity and the easing of work through Copilot.

Quasi-longitudinal observation across the repeated cross-sectional surveys; change over time in the self-assessments of academic staff, as described in the abstract.

high positive Generative KI in der Wissensarbeit: Wahrnehmung, Nutzen und ... Perceived productivity and easing of work (self-assessment over time...

Administrative staff rate the usefulness and reliability of Microsoft 365 Copilot higher than academic staff do.

Self-reported ratings in the repeated cross-sectional survey; comparison between job roles (administration vs. academia), as stated in the abstract.

high positive Generative KI in der Wissensarbeit: Wahrnehmung, Nutzen und ... Perceived usefulness and reliability (self-report)

The framework shifts manual harness engineering into automated harness engineering, and takes one step further — automating the design of the automation itself.

Conceptual claim about the scope/implication of the proposed framework stated in the paper; the excerpt contains no empirical measures, experiments, or sample sizes to verify the claim.

high positive The Last Harness You'll Ever Build replacement of manual design processes with automated meta-design (automation of...

The Meta-Evolution Loop optimizes the evolution protocol Λ across diverse tasks, learning a protocol Λ^(best) that enables rapid harness convergence on any new task — so that adapting an agent to a novel domain requires no human harness engineering at all.

Strong methodological claim and intended outcome stated in the paper (formalization and algorithms promised); no empirical validation, benchmarks, or sample sizes given in the excerpt to substantiate the universality or 'no human' guarantee.

high positive The Last Harness You'll Ever Build speed/ability of harness convergence on new tasks and elimination of human harne...

The Harness Evolution Loop optimizes a worker agent's harness H for a single task: a Worker Agent W_H executes the task, an Evaluator Agent V adversarially diagnoses failures and scores performance, and an Evolution Agent E modifies the harness based on the full history of prior attempts.

Description of the proposed algorithmic component/architecture in the paper (conceptual specification); no empirical results or sample size provided in the excerpt.

high positive The Last Harness You'll Ever Build worker agent harness optimization (improvements in agent task performance via it...
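The Harness Evolution Loop described in the entry above can be pictured as a plain control loop. The sketch below is an illustration only, not the paper's implementation: the excerpt gives no code, so `evolve_harness`, the `Attempt` record, and the three toy stand-ins for W_H, V, and E are all hypothetical. The harness is reduced to a single number so the loop can run without any model calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Attempt:
    """One pass through the loop, kept so the evolver sees the full history."""
    harness: Any
    output: Any
    score: float
    diagnosis: str


def evolve_harness(
    harness: Any,
    worker: Callable[[Any], Any],                   # stands in for W_H
    evaluator: Callable[[Any], Tuple[float, str]],  # stands in for V
    evolver: Callable[[List[Attempt]], Any],        # stands in for E
    rounds: int = 10,
) -> Tuple[Any, List[Attempt]]:
    """Run the task, score it, then let the evolver propose the next harness."""
    history: List[Attempt] = []
    for _ in range(rounds):
        output = worker(harness)
        score, diagnosis = evaluator(output)
        history.append(Attempt(harness, output, score, diagnosis))
        harness = evolver(history)
    best = max(history, key=lambda a: a.score)
    return best.harness, history


# Toy stand-ins (hypothetical): the "task" rewards outputs close to 7.
def toy_worker(h: int) -> int:
    return h


def toy_evaluator(out: int) -> Tuple[float, str]:
    if out < 7:
        return -abs(out - 7), "too low"
    if out > 7:
        return -abs(out - 7), "too high"
    return 0.0, "ok"


def toy_evolver(history: List[Attempt]) -> int:
    # Receives the full history; this toy version only reads the last entry.
    last = history[-1]
    step = {"too low": 1, "too high": -1, "ok": 0}[last.diagnosis]
    return last.harness + step


best, history = evolve_harness(0, toy_worker, toy_evaluator, toy_evolver)
print(best, len(history))
```

Passing the whole `history` list to the evolver mirrors the entry's wording that E works from "the full history of prior attempts", even though the toy evolver here needs only the most recent diagnosis.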

We present a two-level framework that automates this process.

Methodological claim: the paper proposes a two-level framework (Harness Evolution Loop and Meta-Evolution Loop) and states it in the text; no experimental validation or sample size reported in the excerpt.

high positive The Last Harness You'll Ever Build automation of harness engineering (replacing manual design)

Traditional software engineering artifacts can serve as effective control mechanisms in AI-assisted development.

Concluding claim in the abstract synthesizing the preliminary evaluation results; presented as the paper's implication/recommendation (based on the exploratory study noted).

high positive Shift-Up: A Framework for Software Engineering Guardrails in... effectiveness of traditional SE artifacts as control mechanisms

Embedding machine-readable requirements and architectural artifacts reduces implementation drift.

Reported as a preliminary finding from the exploratory evaluation; the abstract claims a reduction in implementation drift when using Shift-Up artifacts versus unstructured approaches (no quantification provided).

high positive Shift-Up: A Framework for Software Engineering Guardrails in... implementation drift

This paper proposes Shift-Up, a framework that reinterprets established software engineering practices (executable requirements / BDD, C4 architectural modeling, and architecture decision records / ADRs) as structural guardrails for GenAI-native development.

Design-science research (DSR) artifact: the Shift-Up framework is presented as the paper's primary design contribution (description/proposal in the paper; no broad empirical validation in the abstract).

high positive Shift-Up: A Framework for Software Engineering Guardrails in... use of traditional SE artifacts as structural guardrails

Generative AI (GenAI) is reshaping software engineering by shifting development from manual coding toward agent-driven implementation.

Stated as a high-level premise in the paper's introduction/abstract; presented as an observed trend motivating the research (no empirical sample or quantified measurement reported in the abstract).

high positive Shift-Up: A Framework for Software Engineering Guardrails in... shift toward agent-driven implementation (automation exposure)

The results demonstrate a 'less is more' pattern: simpler combination (memory + reflection) yields better performance than adding architectural complexity.

Authors' interpretation of the ablation study results showing that adding multiple extra mechanisms degraded performance compared to the memory+reflection configuration.

high positive AEL: Agent Evolving Learning for Open-Ended Environments relative performance of simpler vs. more complex agent configurations

A nine-variant ablation reveals that memory and reflection together produce a 58% cumulative improvement over the stateless baseline.

Ablation study with nine variants on the sequential portfolio benchmark; authors report a 58% cumulative improvement when combining memory and reflection versus the stateless baseline.

high positive AEL: Agent Evolving Learning for Open-Ended Environments cumulative improvement in performance relative to stateless baseline

AEL outperforms five published self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches on the benchmark.

Comparative empirical evaluation on the same sequential portfolio benchmark, comparing AEL to five published self-improving methods and multiple non-LLM and LLM baselines (reported relative ranking and variance).

high positive AEL: Agent Evolving Learning for Open-Ended Environments relative performance (ranking) and variance across methods

On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), AEL achieves a Sharpe ratio of 2.13 ± 0.47.

Empirical experiment on the sequential portfolio benchmark with 10 tickers, 208 episodes, evaluated across 5 random seeds (reported Sharpe ratio and standard deviation).

high positive AEL: Agent Evolving Learning for Open-Ended Environments Sharpe ratio (portfolio performance metric)

We introduce Agent Evolving Learning (AEL), a two-timescale framework in which a Thompson Sampling bandit at the fast timescale learns which memory retrieval policy to apply each episode, while LLM-driven reflection at the slow timescale diagnoses failure patterns and injects causal insights into the agent's decision prompt.

Methodological description and proposed algorithmic design in the paper (no additional experimental sample size—design/algorithmic claim).

high positive AEL: Agent Evolving Learning for Open-Ended Environments framework architecture / learning framework
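For readers unfamiliar with the fast-timescale component named above, here is a minimal Beta-Bernoulli Thompson Sampling sketch. It is not AEL's code. The three retrieval-policy names, the binary per-episode reward, and the success probabilities are assumptions made purely for illustration.

```python
import random


class BetaThompsonBandit:
    """Thompson Sampling with a Beta(1, 1) prior per arm and 0/1 rewards."""

    def __init__(self, arms, seed=0):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        self.alpha = {a: 1.0 for a in self.arms}  # 1 + observed successes
        self.beta = {a: 1.0 for a in self.arms}   # 1 + observed failures

    def select(self):
        # Draw one plausible success rate per arm, then act greedily on the draws.
        draws = {a: self.rng.betavariate(self.alpha[a], self.beta[a])
                 for a in self.arms}
        return max(draws, key=draws.get)

    def update(self, arm, success):
        if success:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1


# Hypothetical retrieval policies with made-up success probabilities.
TRUE_RATES = {"recent": 0.2, "similar": 0.8, "none": 0.5}

bandit = BetaThompsonBandit(TRUE_RATES, seed=0)
env = random.Random(1)  # separate generator for the simulated environment
counts = {arm: 0 for arm in TRUE_RATES}

for _ in range(2000):  # one policy choice per episode
    arm = bandit.select()
    success = env.random() < TRUE_RATES[arm]
    bandit.update(arm, success)
    counts[arm] += 1

print(counts)
```

Sampling from each arm's posterior, rather than always picking the current best estimate, is what lets the bandit keep exploring early on and concentrate on the strongest policy as evidence accumulates.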

The sustainability of the algorithmic state rests on a movement from technocratic secrecy to value-based transparency, so that AI-human collaboration is founded on institutional accountability and algorithmic justice.

Authorial conclusion from the systematic review synthesis (2018-2026) advocating a policy/practice shift; presented as normative policy recommendation rather than quantified empirical finding.

high positive Artificial Intelligence, Public Policy and Governance - impl... sustainability of algorithmic/state governance (accountability and algorithmic j...

Empirical evidence shows great gains in efficiency in fiscal forecasting.

Empirical studies included in the PRISMA-guided review (2018-2026) reporting improved fiscal forecasting outcomes; no quantitative effect sizes provided in abstract.

high positive Artificial Intelligence, Public Policy and Governance - impl... accuracy/efficiency of fiscal forecasting

Empirical evidence shows great gains in efficiency in routinised administrative tasks.

Empirical studies reported in the systematic review (2018-2026); the abstract claims empirical evidence of efficiency gains but does not report specific study counts, sample sizes, or effect magnitudes.

high positive Artificial Intelligence, Public Policy and Governance - impl... efficiency in routinised administrative tasks

Digital infrastructure is a primary determinant of both the pace of AI diffusion and its resulting economic returns.

Synthesis of descriptive patterns, difference-in-differences causal estimates, and instrumental-variable results using Turkish administrative and survey data (2021-2024).

high positive Digital Infrastructure, AI Adoption, and Firm Performance * pace of AI diffusion and economic returns (productivity, exports, labor composit...

Infrastructure-driven AI adoption shifts labor composition toward ICT-related roles.

Instrumental-variable estimates showing changes in occupational composition (increase in ICT-related roles) associated with infrastructure-driven AI adoption; based on administrative employment data and enterprise survey (Turkey, 2021-2024).

high positive Digital Infrastructure, AI Adoption, and Firm Performance * share of ICT-related roles in employment (labor composition)

Infrastructure-driven AI adoption raises export intensity.

Instrumental-variable estimates linking infrastructure-driven adoption to firm export intensity using administrative and survey data (Turkey, 2021-2024).

high positive Digital Infrastructure, AI Adoption, and Firm Performance * export intensity

Infrastructure-driven AI adoption raises labor productivity.

Instrumental-variable estimates where infrastructure-driven adoption is instrumented (IV) and linked to firm-level labor productivity measures; data from administrative records and enterprise survey in Turkey (2021-2024).

high positive Digital Infrastructure, AI Adoption, and Firm Performance * labor productivity

Improved connectivity (due to pipeline-driven fiber deployment) significantly increases AI adoption, particularly for software-intensive technologies and among small and medium-sized enterprises.

Causal inference using difference-in-differences estimates exploiting staggered pipeline expansion as variation in connectivity; sample drawn from administrative records and nationally representative enterprise survey (Turkey, 2021-2024).

high positive Digital Infrastructure, AI Adoption, and Firm Performance * AI adoption (change due to improved connectivity)

AI adoption is concentrated among large firms and in regions with high-speed broadband and proximity to data centers, particularly for software-intensive and cloud-based applications.

Descriptive analysis using administrative data and a nationally representative enterprise survey from Turkey (2021-2024).

high positive Digital Infrastructure, AI Adoption, and Firm Performance * AI adoption (concentration by firm size and region)

HAF-DS provides a scalable and adaptable solution for modern textile and PPE supply chains.

Author claim in conclusions indicating scalability and adaptability as properties of the proposed framework; supported implicitly by application to multiple datasets.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... scalability and adaptability of the solution

Coupling predictive forecasting with prescriptive optimization enhances both accuracy and efficiency in textile and PPE supply chains.

Summary conclusion drawn from the reported experimental improvements in forecast errors and operational metrics on textile and PPE datasets.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... forecast accuracy and operational efficiency

Service level rose from 95.5% to 97.8%.

Reported experimental operational metric (service level) improvement values under HAF-DS versus baseline (95.5% -> 97.8%).

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... service level (fill rate / on-time fulfillment)

Stockouts decreased by 27.5%.

Reported experimental operational metric indicating a 27.5% reduction in stockouts under HAF-DS compared to baseline.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... stockout frequency

Inventory cost decreased by 5.4%.

Reported experimental operational metric (inventory cost) showing 5.4% reduction under HAF-DS relative to baseline.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... inventory cost

On the combined dataset, HAF-DS reduced Mean Absolute Percentage Error (MAPE) from 9.5% to 8.1%.

Reported experimental result on the combined dataset comparing MAPE of HAF-DS vs baseline (values given: 9.5% -> 8.1%).

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... Mean Absolute Percentage Error (MAPE)

On the combined dataset, HAF-DS reduced Root Mean Squared Error (RMSE) from 19.53 to 17.11 (12.4%).

Reported experimental result on the combined dataset comparing RMSE of HAF-DS vs baseline (values given: 19.53 -> 17.11, with percent reduction 12.4%).

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... Root Mean Squared Error (RMSE)

On the combined dataset, HAF-DS reduced Mean Absolute Error (MAE) from 15.04 to 12.83 (14.7%).

Reported experimental result on the combined dataset comparing MAE of HAF-DS vs baseline (values given: 15.04 -> 12.83, with percent reduction 14.7%).

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... Mean Absolute Error (MAE)
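The percentages quoted for the error metrics above are relative reductions, which can be checked from the before/after values alone. The sketch below recomputes them; `relative_reduction` is an illustrative helper, and the only inputs are the figures reported in the entries. For MAPE the entry gives no percentage, so the sketch shows both readings: the change in percentage points and the relative reduction.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percent decrease from `before` to `after`."""
    return (before - after) / before * 100


# (before, after, percentage stated in the entry)
reported = {
    "MAE": (15.04, 12.83, 14.7),
    "RMSE": (19.53, 17.11, 12.4),
}

for name, (before, after, stated) in reported.items():
    computed = relative_reduction(before, after)
    print(f"{name}: {before} -> {after} is a {computed:.1f}% reduction "
          f"(stated: {stated}%)")

# MAPE is itself a percentage, so two readings are possible.
mape_points = 9.5 - 8.1                      # change in percentage points
mape_relative = relative_reduction(9.5, 8.1)  # relative change in percent
print(f"MAPE: {mape_points:.1f} points lower, {mape_relative:.1f}% relative reduction")
```

Both stated percentages match the recomputed values to one decimal place.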

Experiments on textile sales and supply chain datasets show significant performance gains over statistical and deep learning baselines.

Empirical evaluation reported on textile sales and supply chain datasets with comparisons to statistical and deep learning baseline models (datasets described broadly; no sample sizes given).

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... forecasting performance relative to baselines

The framework jointly minimizes forecasting error and operational cost through embedding-based feature representation and recurrent neural architectures.

Paper text describing joint objective (minimize forecasting error and operational cost) and the use of embedding-based features plus recurrent networks to accomplish this.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... combined forecasting error and operational cost

The optimization layer prescribes cost-efficient replenishment and allocation decisions (MILP).

Method description stating the use of a MILP optimization layer to produce replenishment/allocation decisions aimed at cost efficiency.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... cost-efficient replenishment and allocation decisions

The LSTM captures temporal and contextual demand dependencies.

Methodological description asserting LSTM's role in modeling temporal and contextual dependencies within the forecasting module.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... ability to capture temporal and contextual demand dependencies

The paper proposes a Hybrid AI Framework for Demand-Supply Forecasting and Optimization (HAF-DS), which integrates a Long Short-Term Memory (LSTM)-based demand forecasting module with a mixed integer linear programming (MILP) optimization layer.

Paper description of the proposed framework (design/architecture). Reports integration of LSTM forecasting module and MILP optimization layer as the core contribution.

high positive Hybrid Deep Learning Approach for Coupled Demand Forecasting... design/architecture integration of forecasting and optimization

Dynamic combinations of AI and organizational structure can help managers overcome traditional trade-offs between scale and scope, opening pathways for scalable, cross-market expansion.

Managerial implication drawn from the paper's longitudinal case study of ByteDance; qualitative inference from observed organizational practices and AI deployment patterns.

high positive Scaling high and wide: How firms leverage AI and organizatio... managerial ability to overcome scale–scope trade-offs and enable cross-market ex...

AI transforms the scale–scope nexus from being a trade-off into a source of strategic advantage.

Synthesis and theoretical claim derived from longitudinal case study of ByteDance showing simultaneous scaling and diversification enabled by AI and organizational design.

high positive Scaling high and wide: How firms leverage AI and organizatio... ability to simultaneously achieve scale and scope (strategic advantage from comb...

AI reverses the conventional logic of the resource-based view: rather than valuable resources enabling diversification, diversification amplifies the value of resources.

Theoretical argument supported by the ByteDance case study; paper presents this as a theorized inversion based on observed patterns in the single-case study.

high positive Scaling high and wide: How firms leverage AI and organizatio... amplification of resource value as a result of diversification

The value of AI learning transfer across domains is contingent on access to structurally related data that allow the learning to carry over.

Claim derived from the ByteDance longitudinal case study showing conditions for successful cross-domain AI transfer (qualitative evidence emphasizing data structure/relatedness).

high positive Scaling high and wide: How firms leverage AI and organizatio... effectiveness of transfer learning across domains (dependence on structurally re...

AI evolves and improves through self-learning and cross-fertilization across domains, becoming increasingly valuable as learning accumulates.

Theoretical claim supported by longitudinal observations from the ByteDance case study (qualitative evidence from repeated AI deployments over time).

high positive Scaling high and wide: How firms leverage AI and organizatio... AI capability improvement/value accumulation over time

ByteDance leveraged AI and adaptive organizational design to scale rapidly and diversify across industries and markets without incurring rising costs or coordination complexity.

Longitudinal single-case (qualitative) study of ByteDance described in the paper; method reported as a longitudinal case study of one firm.

high positive Scaling high and wide: How firms leverage AI and organizatio... ability to scale and diversify across industries and markets (growth and diversi...

Meaningful human oversight of AI agents in knowledge work requires not improved post-hoc review mechanisms, but active participation in decisions as they are made.

Authors' conclusion drawn from the formative (N=8) and summative (N=16) studies and associated observations.

high positive Auditing and Controlling AI Agent Actions in Spreadsheets oversight effectiveness (design implication favoring in-line/active participatio...

Users reported a sense of co-ownership over the resulting output.

Participant self-reports from the formative and/or summative studies (authors report users expressed co-ownership of outputs when participating in execution).

high positive Auditing and Controlling AI Agent Actions in Spreadsheets sense of ownership / co-ownership

Users detected errors that post-hoc review would have failed to surface.

Empirical observation reported from the studies (authors report that active participation allowed users to detect errors that would be missed by post-hoc review).

high positive Auditing and Controlling AI Agent Actions in Spreadsheets error detection (compared to post-hoc review)

Users identified their own intent reflected in the agent's actions.

Reported participant observations/self-reports from the formative (N=8) and/or summative (N=16) studies; claim presented as a finding of the evaluations.

high positive Auditing and Controlling AI Agent Actions in Spreadsheets alignment between user intent and agent actions
