Evidence (3470 claims)
- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5877 claims
- Human-AI Collaboration: 5157 claims
- Innovation: 3492 claims
- Org Design: 3470 claims
- Labor Markets: 3224 claims
- Skills & Training: 2608 claims
- Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Org Design
The paper uses a comprehensive longitudinal dataset comprising tens of millions of users from a leading Chinese video-sharing platform.
Statement in paper summarizing data source: a longitudinal dataset covering 'tens of millions of users' from a major Chinese video-sharing platform; used for descriptive and comparative analyses of creation and consumption behavior.
These chats were committed to public repositories as part of routine development, capturing in-the-wild behavior.
Data collection method: analysis of chat transcripts that were committed to public repositories (authors state collected from repos and describe them as routine commits).
We analyze 74,998 developer messages from 11,579 chat sessions across 1,300 repositories and 899 developers using Cursor and GitHub Copilot.
Reported dataset counts in the paper (message, session, repository, developer counts) drawn from public commit histories of chats.
Conventional microeconomic models often treat interactions between algorithmic platforms and workers as static principal-agent problems.
Literature statement in paper (conceptual framing / literature review); no empirical sample reported.
The study sample comprises 21,428 firm-year observations from Chinese A-share listed manufacturing companies over 2010–2022.
Data description provided in the paper's abstract/introduction specifying the sample frame and time period.
This paper employs a staggered difference-in-differences (DID) model using data from Chinese A-share listed manufacturing companies from 2012 to 2023 and uses the National Artificial Intelligence Innovative Application Pioneer Zone (AIIAPZ) policy as a quasi-natural experiment.
Staggered DID empirical design; sample described as Chinese A-share listed manufacturing firms, 2012–2023; AIIAPZ policy used as treatment assignment (quasi-natural experiment).
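For readers who want the design stated concretely, a minimal two-way fixed-effects sketch of a staggered DID estimation is shown below; the panel, variable names, and figures are invented for illustration and are not the paper's data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical firm-year panel with staggered treatment timing: firm A is
# treated from 2020, firm B from 2021, firm C never (all values illustrative).
rows = []
for firm, first_treat_year in [("A", 2020), ("B", 2021), ("C", None)]:
    for year in range(2018, 2023):
        treated = int(first_treat_year is not None and year >= first_treat_year)
        outcome = 1.0 + 0.3 * treated + 0.05 * (year - 2018) + rng.normal(0, 0.1)
        rows.append({"firm": firm, "year": year, "treated": treated, "outcome": outcome})
panel = pd.DataFrame(rows)

# Two-way fixed-effects DID regression; standard errors clustered by firm.
twfe = smf.ols("outcome ~ treated + C(firm) + C(year)", data=panel).fit(
    cov_type="cluster",
    cov_kwds={"groups": panel["firm"].astype("category").cat.codes},
)
print(round(twfe.params["treated"], 3))  # estimated treatment effect
```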
This study uses semi-structured interviews with 10 practitioners to examine perceptions of collaborating with human versus AI teammates.
Methods statement in the paper: semi-structured interviews; sample size explicitly reported as 10 practitioners.
The study is based on a qualitative analysis of recent academic literature, comparative analysis of sector-specific applications of Big Data technologies, and synthesis of empirical findings from international studies using a systemic and structural analysis approach.
Methodological statement within the paper describing data sources and analytic approach; not an empirical claim about outcomes.
Society 5.0 and Industry 5.0 call for human-centric technology integration, but the concept lacks an operational definition that can be measured, optimized, or evaluated at the firm level.
Motivating claim grounded in literature gap analysis presented in the paper (argument that normative frameworks lack formal, operational metrics at firm level).
We propose the Workplace Augmentation Design Index (WADI), a 36-item theory-grounded instrument for diagnosing human-centricity at the firm level.
Instrument design/proposal presented in the paper (36 items mapped to the five workplace-design dimensions); no validation sample reported in the abstract.
We conducted a PRISMA-guided systematic review of 120 papers (screened from 6,096 records) to map the evidence base for each workplace-design dimension.
Systematic literature review using PRISMA protocol; final sample = 120 papers; initial records screened = 6,096.
Existing models of human-AI complementarity treat the augmentation function φ(D) as exogenous and thus ignore that two firms with identical technology investments can achieve radically different augmentation outcomes depending on workplace organization.
Argument based on literature review of prior models (the paper contrasts its approach with existing complementarity models). No new empirical sample reported for this specific claim.
A subset of four datasets included settings in which the AI provided explanations of its decision.
Paper states that four of the datasets involved AI explanations (explicitly stated in abstract).
The study compared HCT to the AI-as-advisor approach using 10 datasets from various domains, including medical diagnostics and misinformation discernment.
Paper reports an empirical comparison across 10 datasets spanning multiple domains (explicitly stated in abstract).
The hybrid confirmation tree (HCT) elicits a human judgment and an AI judgment independently; if they agree, that decision is accepted, and if they disagree, a second human breaks the tie.
Description of the HCT method in the paper (procedural/design specification).
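The confirmation-tree logic is simple enough to state as a short decision rule; the sketch below is an illustrative reading of the procedure as described above, with hypothetical names, not the paper's implementation.

```python
def hct_decision(human_judgment, ai_judgment, second_human):
    """Minimal sketch of a hybrid confirmation tree (HCT) decision rule.

    The human and AI judgments are elicited independently; if they agree,
    that shared decision is accepted. If they disagree, a second human
    reviewer (the callable `second_human`) breaks the tie.
    """
    if human_judgment == ai_judgment:
        return human_judgment  # agreement: accept the shared decision
    return second_human(human_judgment, ai_judgment)  # disagreement: tie-break


# Example: a tie-breaking reviewer who happens to side with the first human.
decision = hct_decision("positive", "negative", second_human=lambda h, a: h)
```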
We implement a rigorously controlled execution-based testbed featuring Git worktree isolation and explicit global memory to evaluate agent coordination frameworks.
Methodological description in the paper indicating the testbed design choices (Git worktree isolation, explicit global memory) used to ensure controlled, reproducible execution of agent-generated code.
We benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs) using a rigorously controlled, execution-based testbed.
Description of experimental setup in the paper: an execution-based testbed with Git worktree isolation and explicit global memory; experiments explicitly compare single-agent, subagent, and agent-team architectures under fixed computational time budgets.
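As a rough illustration of the isolation mechanism only (the worktree part, not the global-memory component), a sketch of executing a command inside a disposable Git worktree might look like the following; the helper name, paths, and cleanup policy are assumptions, not the testbed's code.

```python
import subprocess
import tempfile


def run_in_isolated_worktree(repo_path, commit, command):
    """Run `command` (a list of args) inside a throwaway Git worktree so that
    concurrent agent runs cannot interfere with each other or the main checkout."""
    workdir = tempfile.mkdtemp(prefix="agent-run-")
    # `git worktree add --detach` checks out `commit` into a fresh directory.
    subprocess.run(
        ["git", "-C", repo_path, "worktree", "add", "--detach", workdir, commit],
        check=True,
    )
    try:
        return subprocess.run(command, cwd=workdir, capture_output=True, text=True)
    finally:
        # Remove the worktree so repeated runs leave no state behind.
        subprocess.run(
            ["git", "-C", repo_path, "worktree", "remove", "--force", workdir],
            check=True,
        )
```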
Methods combine targeted literature synthesis, comparative conceptual analysis, and framework building (with recent scholarly and institutional sources reviewed).
Explicit methodological statement in the paper describing the review and analytic approach; no primary-data methods used.
AI coding assistants are a high-visibility class of corporate AI and are given special attention as an illustrative case in the paper.
Paper specifically calls out AI coding assistants as a focal example in the conceptual analysis and discussion; based on literature review rather than original measurement.
The Article translates these insights into risk-sensitive guideposts for modernizing governance of AI-enabled tools and emerging modalities, from agentic systems to blockchain-deployed smart contracts.
Prescriptive/conceptual policy guidance presented in the Article (normative recommendations; governance framework).
The Innovation Frontier traces LegalTech’s evolution from 2000s-vintage e-discovery to generative AI.
Historical/chronological analysis in the Article (literature review/history of LegalTech provided by authors).
The Legal Services Value Chain disaggregates the lifecycle of a legal matter into five distinct nodes of activity.
Model description in the Article (conceptual architecture; decomposition of legal work).
The Article develops two core organizing models: the Legal Services Value Chain and the Innovation Frontier.
Explicit claim in the Article describing conceptual/model contributions (theoretical/model-building).
This Article provides a practical framework for navigating the shifting terrain of legal innovation and AI.
Statement of purpose in the Article (conceptual contribution; framework development). No empirical validation reported in the excerpt.
Three interlocking threads characterize AI for science: (1) AI as research instrument, (2) AI for research infrastructure, and (3) the reshaping of scholarly profiles and incentives by machine-readable metrics.
Conceptual framework presented in the paper; organization of topics rather than empirical measurement. The paper indicates these threads are followed through historical and contemporary examples.
The history of artificial intelligence for scientific discovery is not a two year story about chatbots learning to write papers; it is a sixty year story beginning with DENDRAL (1965).
Historical narrative / literature review citing early systems such as DENDRAL (1965) and subsequent developments in scholarly infrastructure (arXiv, Google Scholar, ORCID). No empirical sample or statistical test reported.
Both the positive (approach) and negative (avoidance) AI job crafting pathways failed to significantly affect life satisfaction, indicating domain specificity of AI-related psychological mechanisms.
Analysis of the same multi-source, multi-wave dataset of 287 employee–leader dyads; tests of effects on life satisfaction showed non-significant results for both pathways.
For readers less familiar with the Bayesian and decision-theoretic language, key terms are defined in a glossary at the end of the article.
Statement about the article's structure and supporting material (presence of glossary noted in the article).
The gap between a continuously updated belief state and your frozen deployed model is 'learning debt.'
Terminology/definition introduced by the author in the article (glossary and definitional exposition).
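The article's own formalization is not reproduced here; as one hedged illustration, the gap could be proxied by a divergence between the continuously updated belief and the distribution implied by the frozen deployed model, as in the toy sketch below (all numbers invented).

```python
from scipy.stats import entropy

# Hypothetical posterior over some decision-relevant state, kept current as
# new data arrive, versus the distribution implied by the frozen deployed model.
updated_belief = [0.55, 0.30, 0.15]
frozen_model = [0.70, 0.20, 0.10]

# One plausible proxy for "learning debt": the KL divergence between the two.
learning_debt = entropy(updated_belief, frozen_model)
print(round(learning_debt, 4))
```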
Model retraining is usually treated as an ongoing maintenance task.
Author's descriptive claim in the article; presented as an observation about prevailing practice (no empirical sample or data reported).
The study was conducted by the Mohammed bin Rashid School of Government’s Future of Government Center, in collaboration with global AI pioneers.
Authorship and collaboration statement in the report.
The report highlights the key findings of a field study covering ten Arab countries to explore the realities and challenges of AI governance.
Report statement describing the geographic scope of the field study (explicitly: ten Arab countries).
The recommendations are based on regional research that included hundreds of leaders active in the AI domains, from the public and private sectors.
Report statement claiming participant base of the underlying research (described as 'hundreds of leaders').
Data sources include field research conducted in 2024 and public reports from the Ministry of Industry and Information Technology and the National Bureau of Statistics.
Paper statement describing data provenance: field surveys in 2024 (n=326) plus public reports from MIIT and National Bureau of Statistics.
The visualization avoided redistributing value.
Reported result from the within-subjects experiment (N=32) stating that the visualization did not redistribute value between parties (i.e., it improved outcomes/efficiency without changing the value split).
Human-like presentations did not raise conformity pressure.
Reported experimental result: manipulation of presentation style (human-like vs. not) and measurement of conformity pressure; the abstract states that human-like presentation increased perceived usefulness/agency without increasing conformity pressure. No quantitative details provided in the abstract.
Larger panels yielded no gains in accuracy relative to a single AI.
Reported experimental comparison manipulating panel size in the study (three tasks). The abstract states that larger panels did not produce accuracy gains versus a single AI. (No sample size or numerical effect reported in abstract.)
The paper proposes an original 'Revenue-Sharing as Infrastructure' (RSI) model in which the platform offers its AI infrastructure for free and takes a percentage of the revenues generated by developers' applications, reversing the traditional upstream payment logic.
Theoretical model proposal and conceptual description in the paper; presented as original contribution (no empirical implementation reported).
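A back-of-the-envelope comparison illustrates the reversal of payment logic; all figures below are invented for illustration and do not come from the paper.

```python
# Illustrative numbers only: compare the platform's take under a traditional
# pay-per-use (upstream) model with the proposed revenue-sharing (RSI) logic.
api_calls = 2_000_000        # developer's monthly usage
price_per_call = 0.002       # hypothetical upstream price per call
app_revenue = 60_000         # developer's monthly application revenue
revenue_share_rate = 0.15    # hypothetical platform share under RSI

pay_per_use_take = api_calls * price_per_call        # platform charges for infrastructure
rsi_take = revenue_share_rate * app_revenue          # infrastructure free, platform shares downstream revenue

print(pay_per_use_take, rsi_take)  # 4000.0 vs 9000.0 under these assumptions
```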
Recent literature distinguishes three generations of business models: a first generation modeled on cloud computing (pay-per-use), a second characterized by diversification (freemium, subscriptions), and a third, emerging generation exploring multi-layer market architectures with revenue-sharing mechanisms.
Literature review and conceptual synthesis presented in the paper; no empirical study or sample reported.
We evaluate our approach on spapi, a production in-vehicle API system at Volvo Group involving 192 endpoints, 420 properties, and 776 CAN signals across six functional domains.
Case study / evaluation dataset description (explicit counts provided in paper).
We document a systematic pattern we call the 'Intent-Source Divide' (experiential vs transactional intent is associated with different source mixes).
Labeling of the observed consistent association between query intent (experiential vs transactional) and citation-source mix in the audited dataset of Google Gemini responses.
We audit 1,357 grounding citations from Google Gemini across 156 hotel queries in Tokyo.
Manual audit of Google Gemini grounding citations for 156 hotel queries in Tokyo; counted 1,357 grounding citations.
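A sketch of how such an audit could be tabulated (query intent by citation-source type) is shown below; the rows, labels, and the pandas-based tally are assumptions for illustration, not the authors' pipeline.

```python
import pandas as pd

# Hypothetical rows mimicking the audit's unit of analysis: one grounding
# citation per row, labeled with the query's intent and the citation's source type.
citations = pd.DataFrame({
    "query_intent": ["experiential", "experiential", "transactional", "transactional"],
    "source_type": ["editorial", "blog", "booking_site", "booking_site"],
})

# Source-mix shares by intent: the kind of pattern the paper labels the Intent-Source Divide.
mix = pd.crosstab(citations["query_intent"], citations["source_type"], normalize="index")
print(mix)
```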
This study uses a mixed-method research design combining quantitative ROI modelling and cost–benefit analysis, qualitative synthesis of secondary enterprise case studies, and architectural analysis of Azure-native GenAI services.
Explicit methodological description in the abstract of the paper.
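As a minimal illustration of the quantitative ROI component only, with invented figures rather than the study's inputs:

```python
# Simple cost-benefit/ROI sketch; all values are assumed for illustration.
annual_benefit = 480_000  # e.g., hours saved valued at loaded labour cost
annual_cost = 150_000     # e.g., Azure GenAI service spend plus integration upkeep

roi = (annual_benefit - annual_cost) / annual_cost
print(f"ROI: {roi:.0%}")  # 220% under these assumed figures
```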
This Article presents the results of an experiment in which a transcript of a hypothetical client interview involving potential disability discrimination, retaliation, and wrongful termination claims was submitted to each AI system, with prompts requesting identification and assessment of viable legal theories.
Methodological description of the experiment: one hypothetical client interview transcript fed to each of four AI engines with prompts to identify and assess legal theories.
The paper derives formal conditions under which the inversion (smaller, orchestrated models outperforming frontier models) holds.
Mathematical derivations and stated sufficient/necessary conditions presented in the paper.
We develop the Institutional Fitness Manifold, a mathematical framework that evaluates AI systems along four dimensions: capability, institutional trust, affordability, and sovereign compliance.
Theoretical/model development presented in the paper (formal definition of the manifold and its four dimensions).
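The manifold itself is a formal construct in the paper; the toy scoring rule below is only an assumed illustration of evaluating a system along the four named dimensions, with made-up weights and scores.

```python
# Hedged sketch: aggregate per-dimension scores in [0, 1] with illustrative weights.
WEIGHTS = {
    "capability": 0.3,
    "institutional_trust": 0.3,
    "affordability": 0.2,
    "sovereign_compliance": 0.2,
}


def fitness(scores: dict[str, float]) -> float:
    """Weighted average across the four named dimensions."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)


print(fitness({
    "capability": 0.9,
    "institutional_trust": 0.5,
    "affordability": 0.7,
    "sovereign_compliance": 0.8,
}))  # 0.72 under these invented inputs
```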
There have been five eras of AI development since 1943, and within the current Generative AI Era there are four distinct epochs, each initiated by a discontinuous event.
Descriptive/historical classification within the paper (counts of eras and epochs; named initiating events such as the transformer and the 'DeepSeek Moment').
Despite fears of mass unemployment, aggregate labor-market data through 2025 show limited labor-market disruption from generative AI.
Review of aggregate employment and labor-market studies and macro-level data through 2025 cited in the brief; methods include analyses of employment statistics and macro labor indicators (no single sample size reported).
We scored rule-breaking and abuse outcomes with an independent rubric-based judge across 28,112 transcript segments from multi-agent governance simulations.
Reported methodology: multi-agent governance simulations with agents in formal governmental roles, outcomes evaluated by an independent rubric-based judge; explicit sample count of 28,112 transcript segments.
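A hedged sketch of how judge scores over transcript segments might be aggregated into per-outcome rates is given below; the rubric, the judge, and the data are not reproduced, and all values are invented.

```python
# Illustrative aggregation only: roll up binary rubric-judge labels per segment
# into per-outcome rates across the simulation transcripts.
segments = [
    {"id": 1, "rule_breaking": 0, "abuse": 0},
    {"id": 2, "rule_breaking": 1, "abuse": 0},
    {"id": 3, "rule_breaking": 0, "abuse": 1},
]

n = len(segments)
rates = {
    "rule_breaking": sum(s["rule_breaking"] for s in segments) / n,
    "abuse": sum(s["abuse"] for s in segments) / n,
}
print(rates)
```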
Controlled experiments were run with N = 250 across five content types to validate the mechanisms.
Experimental methods reported in the paper: controlled experiments with specified sample size and content-type breakdown.