Evidence (3470 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	609	159	77	736	1615
Governance & Regulation	664	329	160	99	1273
Organizational Efficiency	624	143	105	70	949
Technology Adoption Rate	502	176	98	78	861
Research Productivity	348	109	48	322	836
Output Quality	391	120	44	40	595
Firm Productivity	385	46	85	17	539
Decision Quality	275	143	62	34	521
AI Safety & Ethics	183	241	59	30	517
Market Structure	152	154	109	20	440
Task Allocation	158	50	56	26	295
Innovation Output	178	23	38	17	257
Skill Acquisition	137	52	50	13	252
Fiscal & Macroeconomic	120	64	38	23	252
Employment Level	93	46	96	12	249
Firm Revenue	130	43	26	3	202
Consumer Welfare	99	51	40	11	201
Inequality Measures	36	105	40	6	187
Task Completion Time	134	18	6	5	163
Worker Satisfaction	79	54	16	11	160
Error Rate	64	78	8	1	151
Regulatory Compliance	69	64	14	3	150
Training Effectiveness	81	15	13	18	129
Wages & Compensation	70	25	22	6	123
Team Performance	74	16	21	9	121
Automation Exposure	41	48	19	9	120
Job Displacement	11	71	16	1	99
Developer Productivity	71	14	9	3	98
Hiring & Recruitment	49	7	8	3	67
Social Protection	26	14	8	2	50
Creative Output	26	14	6	2	49
Skill Obsolescence	5	37	5	1	48
Labor Share of Income	12	13	12	—	37
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Org Design

Research agenda: empirical microdata on managerial time use, task-level automation, performance outcomes, and wage impacts are needed to quantify substitution versus complementarity and to evaluate human-in-the-loop designs' effects on firm performance and distributional outcomes.

Explicit methodological recommendation within the paper; identifies gaps due to the paper's conceptual (non-empirical) approach.

high null result Comparative analysis of strategic vs. computational thinking... availability and use of microdata on managerial tasks, automation, firm performa...

There is a need for longitudinal and cross‑country empirical research to measure how hybrid work and AI tools affect promotion rates, network centrality, productivity, privacy harms, trust, and long‑term career trajectories.

Statement of research gaps derived from the paper's methodological approach (conceptual synthesis and secondary case studies) and absence of longitudinal/cross‑cultural primary data.

high null result The Sociology of Remote Work and Organisational Culture: How... research gap existence (need for longitudinal and cross‑country empirical studie...

Practical recommendations for firms and policymakers include investing in training for AI curation/evaluation/coordination, experimenting with decentralised decision rights and governance safeguards, and monitoring competitive dynamics related to model/platform providers.

Policy and practitioner takeaways explicitly presented in the discussion/implications sections, deriving from the conceptual framework and mapped literature.

high null result Generative AI and the algorithmic workplace: a bibliometric ... recommended organisational and policy actions

The paper recommends a research agenda for AI economists: causal microeconometric studies (DiD, IVs, RCTs), structural models with hybrid human–AI agents, measurement work on GenAI use, distributional analysis and policy evaluation.

Explicit recommendations listed in the implications and research agenda sections; logical follow‑on from bibliometric findings about gaps in causal and measurement evidence.

high null result Generative AI and the algorithmic workplace: a bibliometric ... recommended methodological directions for future empirical and theoretical resea...
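Of the causal designs named in this recommendation, difference-in-differences is the simplest to illustrate. The sketch below is a minimal 2x2 version on group means; the function name and the numbers are hypothetical and do not come from the paper, and the estimate is only meaningful under the usual parallel-trends assumption.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on group means.

    Subtracts the control group's change from the treated group's change,
    which removes any time trend shared by both groups.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical mean outcomes (for example, output per worker) before and
# after GenAI adoption; these values are invented for illustration.
effect = did_estimate(
    treated_pre=10.0,
    treated_post=14.0,
    control_pre=9.0,
    control_post=11.0,
)
print(effect)
```

In practice the same quantity is usually estimated with a regression that includes group, period, and interaction terms, so that standard errors and covariates can be handled.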

Bibliometric mapping profiles the intellectual structure and evolution of the field but does not establish causal effects of GenAI on organisational outcomes.

Methodological limitation explicitly stated in the paper; bibliometric approach (co‑word, citation, thematic mapping) is descriptive and historical in scope.

high null result Generative AI and the algorithmic workplace: a bibliometric ... methodological limitation (inability to infer causality from bibliometric mappin...

Co‑word and thematic analyses reveal six coherent conceptual clusters that bridge technical AI topics (e.g., LLMs, GANs) with managerial themes (e.g., autonomy, coordination, decision‑making).

Thematic mapping and co‑word network analysis performed on the 212‑paper corpus; identification of six clusters reported in results.

high null result Generative AI and the algorithmic workplace: a bibliometric ... number and thematic composition of conceptual clusters (six clusters linking tec...

Bibliometric and conceptual tools (VOSviewer, Bibliometrix) were used to identify performance trends, co‑word structures, thematic maps, and conceptual evolution in the GenAI–organisation literature.

Methods section: use of VOSviewer for network visualization and Bibliometrix for bibliometric statistics, co‑word analysis, thematic mapping and Sankey thematic evolution.

high null result Generative AI and the algorithmic workplace: a bibliometric ... types of bibliometric analyses applied (performance trends, co‑word structures, ...

The study analysed a corpus of 212 Scopus‑indexed publications covering 2018–2025 to map emergent literature on Generative AI and organisational change.

Bibliometric dataset constructed from Scopus; sample size = 212 peer‑reviewed articles; time window 2018–2025; analyses performed with Bibliometrix and VOSviewer.

high null result Generative AI and the algorithmic workplace: a bibliometric ... size and timeframe of bibliometric corpus (number of publications, 2018–2025)

Because the study is cross-sectional and relies on self-report, causal claims are limited and generalizability is restricted to Generation Z (limitation noted in the paper).

Authors' limitations: cross-sectional/self-report design and sample restricted to Generation Z; these constraints are reported in the paper.

high null result Trust in AI-Driven Marketing and its Impact on Brand Loyalty... Inference validity / generalizability

Study design: cross-sectional self-report survey of 450 Generation Z consumers analyzed with Structural Equation Modeling (SPSS AMOS).

Methods section reporting sample size (n = 450), target population (Generation Z), cross-sectional survey design, and analysis technique (SEM using SPSS AMOS).

high null result Trust in AI-Driven Marketing and its Impact on Brand Loyalty... Study design / sample

The measurement and structural model show good to excellent fit and reliable constructs (CFI = 0.980, TLI = 0.974, RMSEA = 0.062, SRMR = 0.031).

Reported psychometric/model-fit indices from SEM analysis (SPSS AMOS) on sample of 450 respondents.

high null result Trust in AI-Driven Marketing and its Impact on Brand Loyalty... Model fit / construct validity
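The reported indices can be read against commonly cited rules of thumb. The sketch below assumes cutoffs in the style of Hu and Bentler (CFI and TLI at or above 0.95, RMSEA at or below 0.06, SRMR at or below 0.08); these thresholds are an assumption added here, not taken from the paper, and the function name is hypothetical.

```python
# Rule-of-thumb cutoffs (an assumption; not stated in the source paper).
CUTOFFS = {
    "CFI": (">=", 0.95),
    "TLI": (">=", 0.95),
    "RMSEA": ("<=", 0.06),
    "SRMR": ("<=", 0.08),
}


def check_fit(indices):
    """Return {index_name: bool} for whether each value meets its cutoff."""
    results = {}
    for name, value in indices.items():
        direction, threshold = CUTOFFS[name]
        if direction == ">=":
            results[name] = value >= threshold
        else:
            results[name] = value <= threshold
    return results


# Values as reported in the claim above.
reported = {"CFI": 0.980, "TLI": 0.974, "RMSEA": 0.062, "SRMR": 0.031}
fit = check_fit(reported)
print(fit)
```

Under these assumed cutoffs, CFI, TLI, and SRMR pass, while RMSEA = 0.062 sits just above the strict 0.06 bound but below the more lenient 0.08 bound that is often treated as acceptable, which is consistent with the "good to excellent" wording.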

Outcomes reported are primarily self-reported psychological measures rather than objective productivity metrics.

Paper reports measurement instruments focused on self-reported self-efficacy, psychological ownership, meaningfulness, and enjoyment/satisfaction; no primary objective productivity metrics reported.

high null result Relying on AI at work reduces self-efficacy, ownership, and ... measurement type (self-reported psychological outcomes)

The experiment was pre-registered, used occupation-specific writing tasks, and employed a between-subjects design with three conditions (No-AI, Passive AI, Active collaboration).

Study design reported in the paper: pre-registration statement, N = 269, between-subjects assignment to three conditions using occupation-specific writing tasks.

high null result Relying on AI at work reduces self-efficacy, ownership, and ... n/a (methodological claim)

Active, collaborative AI use preserves perceived meaningfulness of work at levels comparable to independent work and does not produce the lasting psychological costs seen with passive use.

Pre-registered experiment (N = 269) with post-manipulation and post-return measures; Active-collaboration condition matched No-AI on meaningfulness and showed no persistent declines after returning to manual tasks.

high null result Relying on AI at work reduces self-efficacy, ownership, and ... perceived meaningfulness of work (including post-return)

Active, collaborative AI use preserves psychological ownership of outputs at levels comparable to independent work.

Pre-registered experiment (N = 269); Active-collaboration condition reported ownership levels similar to No-AI condition on self-report scales.

high null result Relying on AI at work reduces self-efficacy, ownership, and ... psychological ownership of outputs

Active, collaborative AI use (human drafts first, then uses AI to refine) preserves self-efficacy at levels comparable to independent (no-AI) work.

Pre-registered experiment (N = 269) comparing Active-collaboration and No-AI conditions; no statistically meaningful differences in self-efficacy between them (self-reported measures).

high null result Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy (confidence to complete tasks without AI)

The authors propose research priorities for economists: quantify productivity gains from closing the actionability gap; estimate firm-level heterogeneity in evaluation capability and its effect on adoption; and model investment trade-offs between building evaluation-to-action pipelines versus accepting reduced LLM performance.

Paper's concluding recommendations for future research directions (explicitly listed by the authors).

high null result Results-Actionability Gap: Understanding How Practitioners E... recommended research agenda topics

The paper produces as primary outcomes a taxonomy of ten evaluation practices, the articulation of the results-actionability gap, and recommended strategies observed among successful teams.

Authors report these as the main outcomes of their thematic analysis and syntheses from the 19 interviews.

high null result Results-Actionability Gap: Understanding How Practitioners E... reported study outputs (taxonomy, articulated gap, recommended strategies)

The study method consisted of semi-structured qualitative interviews with 19 practitioners across multiple industries and roles, analyzed via thematic coding.

Explicit methods section of the paper stating sample size (n=19), participant diversity, interview approach, and coding/analysis procedure.

high null result Results-Actionability Gap: Understanding How Practitioners E... study design and sample size

The analysis used sentence‑transformer models to produce dense vector representations of article text and UMAP to project those embeddings into a low‑dimensional thematic map for cluster identification and gap detection.

Methods section specifying use of sentence‑transformer embeddings and UMAP for dimensionality reduction/visualization of article text.

high null result Natural language processing in bank marketing: a systematic ... analytic techniques applied to article abstracts/text (embedding + dimensionalit...

The study followed a PRISMA protocol for literature selection and included peer‑reviewed journal articles published between 2014 and 2024, with a final sample size of n = 109.

Explicit methodological statement in the paper describing the literature search, inclusion/exclusion criteria, and final sample.

high null result Natural language processing in bank marketing: a systematic ... methodological protocol adherence and sample size

Twenty‑seven papers study marketing in banking without using NLP methods.

PRISMA systematic review; categorization of the 109 selected articles into the three coverage groups (8, 74, 27).

high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on marketing in banking that do not use NLP

Seventy‑four papers study NLP in marketing more broadly (not specifically banking).

Same PRISMA‑based systematic review and manual categorization of the final sample n = 109 into topical buckets (NLP in marketing vs. NLP in bank marketing vs. marketing in banking without NLP).

high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles on NLP in marketing (general)

Only 8 peer‑reviewed papers directly examine NLP in bank marketing (out of a final sample of 109 articles published 2014–2024).

Systematic review following PRISMA protocol; final sample n = 109 peer‑reviewed journal articles published 2014–2024; manual screening and categorization yielding counts by topic.

high null result Natural language processing in bank marketing: a systematic ... count of peer‑reviewed articles focused on NLP in bank marketing
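The three category counts reported for this review (8, 74, 27) can be checked against the stated final sample of 109. The short sketch below does that arithmetic and derives each category's share; the dictionary labels are paraphrases introduced here for illustration.

```python
# Category counts as reported in the claims above.
counts = {
    "NLP in bank marketing": 8,
    "NLP in marketing (general)": 74,
    "Marketing in banking without NLP": 27,
}

total = sum(counts.values())

# Share of the final sample in each category, in percent.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The counts sum to 109, matching the reported sample, and the papers that directly examine NLP in bank marketing make up about 7.3% of it.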

The study's findings are qualitative and case-driven (Xiaomi and Deloitte); generalizability is limited by case selection and the absence of standardized quantitative metrics.

Methods section explicitly states case analysis and literature review as primary methods and notes lack of large-scale quantitative measurement.

high null result Explore the Impact of Generative AI on Finance and Taxation external validity/generalizability of results

The study is qualitative and law-focused and uses Vietnam as a focused case study without collecting primary quantitative field data.

Explicit Data & Methods statement in the paper indicating doctrinal legal analysis, comparative institutional analysis, and normative framework development; no primary quantitative sample.

high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... study design/data type (qualitative, doctrinal, comparative; absence of primary ...

The study recommends empirical metrics for future evaluation of reforms, including processing time per case, reversal rates on appeal, administrative litigation frequency, compliance and procurement costs, investment flows into public-sector AI, and changes in labor composition and wages in administrative agencies.

Methodological recommendation arising from the paper's normative and comparative analysis.

high null result ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... recommended empirical metrics (processing time per case; appeal reversal rates; ...

The paper's argument is principally theoretical and prescriptive and requires empirical validation across domains and at scale.

Author-stated limitation in the Data & Methods section noting that the work is primarily conceptual and that empirical validation is needed.

high null result An Alternative Trajectory for Generative AI existence/absence of empirical validation (current lack of cross-domain, large-s...

Operationalizing DSS requires building domain ontologies/knowledge graphs, designing synthetic curricula, training compact domain models, benchmarking against monolithic LLMs, and measuring total cost-of-ownership (energy, latency, bandwidth, infrastructure).

Paper's recommended experimental and measurement agenda (procedural/methodological prescriptions); this is a proposed research plan rather than an empirical result.

high null result An Alternative Trajectory for Generative AI validation metrics proposed by the paper (benchmark performance, energy/inferenc...

The paper reports no proprietary deployment metrics, only qualitative field observations; instead, it provides experimental formalizations for reproducible evaluation.

Authors explicitly note they document how to reproduce experiments but do not claim proprietary deployment metrics beyond qualitative field observations.

high null result Bridging Protocol and Production: Design Patterns for Deploy... degree to which empirical claims are qualitative field observations vs. propriet...

The paper recommends tracking specific operational and economic metrics: MTTR for tool failures, per-invocation latency variance, per-interaction operational cost, frequency of identity-related incidents, human remediation hours per 1,000 incidents, and SLA breach rates.

Explicit list of recommended metrics in the implications and metrics-to-track sections of the paper.

high null result Bridging Protocol and Production: Design Patterns for Deploy... the listed operational/economic metrics (MTTR, latency variance, costs, incident...
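Several of the recommended metrics can be computed from simple incident and latency logs. The sketch below is one possible formulation, not the paper's: the `Incident` record, its field names, and the choice to compute the SLA breach rate per incident are all assumptions made for illustration. Per-interaction operational cost is omitted because it depends on billing data that is not described here.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    repair_minutes: float     # time from tool failure to recovery
    remediation_hours: float  # human effort spent on this incident
    identity_related: bool    # whether the incident involved identity
    sla_breached: bool        # whether the incident breached an SLA


def summarize(incidents, latencies_ms):
    """Compute a few of the recommended metrics from raw records."""
    n = len(incidents)
    total_hours = sum(i.remediation_hours for i in incidents)
    return {
        "mttr_minutes": mean(i.repair_minutes for i in incidents),
        "latency_variance_ms2": pvariance(latencies_ms),
        "identity_incidents": sum(i.identity_related for i in incidents),
        "remediation_hours_per_1000": 1000 * total_hours / n,
        "sla_breach_rate": sum(i.sla_breached for i in incidents) / n,
    }


# Invented sample data for illustration only.
incidents = [
    Incident(10.0, 0.5, False, False),
    Incident(30.0, 1.5, True, True),
    Incident(20.0, 1.0, False, False),
    Incident(20.0, 1.0, True, False),
]
latencies_ms = [100.0, 120.0, 80.0, 100.0]

summary = summarize(incidents, latencies_ms)
print(summary)
```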

The paper provides a production-readiness checklist and instructions for reproducible evaluation alongside the proposed mechanisms.

Deliverables enumerated in the paper include a production-readiness checklist and reproducible experimental methodology.

high null result Bridging Protocol and Production: Design Patterns for Deploy... existence of a production-readiness checklist and reproducible evaluation instru...

All three proposed mechanisms (CABP, ATBA, SERF) are formalized as testable hypotheses with reproducible experimental methodology (benchmarks, latency/error models, broker pipeline semantics).

Paper includes formal descriptions and reproducible evaluation instructions and benchmarks; authors state methods to reproduce experiments are provided.

high null result Bridging Protocol and Production: Design Patterns for Deploy... availability and completeness of reproducible experimental methodology for each ...

The paper organizes production failure modes across five dimensions—server contracts, user context, timeouts, errors, and observability—and provides concrete failure vignettes from an enterprise deployment.

Taxonomy and failure vignettes are listed as design artifacts and deliverables in the paper; derived from observational analysis of production logs and incidents.

high null result Bridging Protocol and Production: Design Patterns for Deploy... classification coverage of failure incidents across the five dimensions

Sample sizes reported: human–AI experiment n = 126; human–human benchmark n = 108.

Study's Data & Methods section reporting sample sizes for the human–AI experiment (n = 126) and citing the human–human benchmark (Dvorak & Fehrler 2024, n = 108).

high null result Playing Against the Machine: Cooperation, Communication, and... reported sample sizes

Experimental design: subjects played an indefinitely repeated Prisoner’s Dilemma in supergames with two between-subjects treatments varying chat timing (chat only before first round of each supergame vs chat before every round); the AI partner was GPT-5.2.

Methods description of the lab experiment reported in the paper: indefinitely repeated PD in supergames, two chat-frequency between-subjects treatments, AI implemented as GPT-5.2; human–AI sample n = 126.

high null result Playing Against the Machine: Cooperation, Communication, and... experimental treatment specification (chat-frequency manipulation; AI identity)
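The structure of an indefinitely repeated Prisoner's Dilemma can be sketched as a loop that continues to another round with a fixed probability. Everything specific below is an assumption for illustration: the payoff values, the continuation probability, and the two scripted strategies are hypothetical, and the sketch does not model the chat stage or the GPT-5.2 partner used in the experiment.

```python
import random

# Illustrative payoffs (assumed; the paper's values are not given here).
# Keys are (action_a, action_b); values are (payoff_a, payoff_b).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def grim_trigger(my_history, their_history):
    """Cooperate until the partner has defected once, then always defect."""
    return "D" if "D" in their_history else "C"


def always_defect(my_history, their_history):
    return "D"


def play_supergame(strategy_a, strategy_b, continue_prob, rng):
    """Play rounds until a random draw ends the supergame."""
    hist_a, hist_b = [], []
    while True:
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
        # After each round, stop with probability 1 - continue_prob.
        if rng.random() >= continue_prob:
            break
    return hist_a, hist_b


def cooperation_rate(history):
    return history.count("C") / len(history)


def total_payoffs(hist_a, hist_b):
    pairs = [PAYOFFS[(a, b)] for a, b in zip(hist_a, hist_b)]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)


rng = random.Random(0)  # fixed seed so the run is repeatable
hist_a, hist_b = play_supergame(grim_trigger, grim_trigger, 0.75, rng)
print(len(hist_a), cooperation_rate(hist_a), total_payoffs(hist_a, hist_b))
```

Because the game ends at a random round rather than a known last round, cooperation can be sustained by strategies such as grim trigger, which is why cooperation rates are the outcome of interest in this design.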

Allowing repeated pre-play communication (chat before every round) has no detectable effect on cooperation rates when the partner is an AI.

Between-subjects manipulation within the human–AI experiment comparing chat-before-first-round vs chat-before-every-round treatments (human–AI n = 126 total); statistical comparison of cooperation rates across the two chat-frequency treatments showed no detectable difference.

high null result Playing Against the Machine: Cooperation, Communication, and... effect of chat frequency on cooperation rate (difference in cooperation between ...

Initial cooperation rates against the AI (GPT-5.2) are high and comparable to initial cooperation in human–human pairs.

Laboratory experiment with human subjects playing an indefinitely repeated Prisoner’s Dilemma against an AI chatbot (GPT-5.2); human–AI sample n = 126; human–human benchmark taken from Dvorak & Fehrler (2024) with n = 108; comparison of initial-round / early-round cooperation rates across conditions.

high null result Playing Against the Machine: Cooperation, Communication, and... initial cooperation rate (cooperation in early rounds / first round of supergame...

Suggested empirical research directions for AI economists include: comparing LLM performance and economic outcomes on rule‑encodable vs tacit tasks; quantifying performance decline when forcing LLMs into interpretable rule representations; studying contracting/pricing where buyers cannot verify internal rules; and measuring returns to scale attributable to tacit capabilities.

Explicitly enumerated recommended research agenda items in the paper; these are proposed studies rather than executed work.

high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... proposed empirical research topics and corresponding outcomes to measure

New metrics are needed to value tacit capabilities — e.g., measures of transfer, generalization under distribution shifts, ease of integrating with human workflows, and irreducibility to compressed rule representations.

Methodological recommendation in the paper listing specific metric categories for future empirical work.

high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... proposed metrics for assessing tacit LLM capabilities

Suggested empirical validations (not performed) include benchmarking LLMs versus rule systems on allegedly rule‑encodable tasks, attempting rule extraction and measuring fidelity loss, and compression/distillation studies to quantify irreducible task performance.

Recommendations and proposed experimental directions listed in the paper; these are proposals, not executed studies.

high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... types of empirical tests recommended for validating the thesis

The paper contains mostly qualitative and historically grounded empirical content and reports no primary datasets or large‑scale experimental results in support of the formal thesis.

Explicit declaration in the Data & Methods section that empirical content is qualitative/historical and no new datasets were collected.

high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... extent of empirical/quantitative evidence presented

The paper's core methodological approach is conceptual and theoretical argumentation (formal/logical proof, historical examples, and philosophical framing), not empirical experimentation.

Stated Data & Methods description indicating reliance on formal logic, historical case analysis, and philosophical argument; absence of primary datasets.

high null result Why the Valuable Capabilities of LLMs Are Precisely the Unex... presence/absence of empirical experiments in the paper

Measuring the marginal cost of runtime governance, the tradeoff curve between task completion and compliance risk, and calibrating violation probabilities are open empirical research questions identified by the paper.

Explicit list of open problems and proposed empirical research agenda in the Implications/Measurement sections of the paper.

high null result Runtime Governance for AI Agents: Policies on Paths existence of empirical research gaps (identified/not identified)

No large empirical dataset or large-scale field experiments were used; the work is primarily theoretical/formal with simulations and worked examples rather than empirical validation.

Paper's Methods/Data section explicitly states the work is theoretical/formal and lists reference implementation and simulations instead of large empirical studies.

high null result Runtime Governance for AI Agents: Policies on Paths use of empirical data (presence/absence of large-scale empirical evaluation)

Risk calibration—mapping violation probabilities to enforcement actions and thresholds—is a key unsolved operational problem for runtime governance.

Paper highlights open problems including risk calibration; argued via conceptual analysis and operational concerns (false positives/negatives, costs of blocking actions).

high null result Runtime Governance for AI Agents: Policies on Paths existence of calibrated thresholds and procedures (presence/absence)
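To make concrete what risk calibration has to decide, the sketch below maps an estimated violation probability to an enforcement action using fixed tiers. The thresholds and action names are placeholders invented for illustration; choosing them well, given the costs of false positives and false negatives, is exactly the open problem the paper identifies.

```python
# Placeholder tiers, highest threshold first (assumed, not from the paper).
THRESHOLDS = [
    (0.8, "block"),
    (0.4, "escalate_to_human"),
    (0.1, "log_and_allow"),
]


def enforcement_action(violation_prob):
    """Return the action for the first tier whose threshold is met."""
    if not 0.0 <= violation_prob <= 1.0:
        raise ValueError("violation_prob must be in [0, 1]")
    for threshold, action in THRESHOLDS:
        if violation_prob >= threshold:
            return action
    return "allow"


for p in (0.9, 0.5, 0.1, 0.05):
    print(p, enforcement_action(p))
```

Lowering the block threshold reduces missed violations but blocks more legitimate actions, so any real calibration would need measured costs on both sides.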

BenchPreS defines two complementary metrics—Misapplication Rate (MR) and Appropriate Application Rate (AAR)—to quantify over‑application and correct personalization, respectively.

Methodological contribution described in the paper: explicit definitions of MR as fraction of inappropriate applications and AAR as fraction of appropriate applications, used to score model behavior.

high null result BenchPreS: A Benchmark for Context-Aware Personalized Prefer... Definition and use of MR and AAR metrics
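A minimal sketch of how the two metrics could be computed is shown below. It assumes one reading of the definitions: MR is the share of cases where applying the preference is inappropriate but the model applied it anyway, and AAR is the share of cases where applying it is appropriate and the model did so. The record format and function name are hypothetical, and the benchmark's exact denominators may differ.

```python
def mr_and_aar(records):
    """Compute (MR, AAR) from a list of (should_apply, did_apply) pairs.

    should_apply: whether applying the stored preference fits the context.
    did_apply:    whether the model actually applied it.
    """
    inappropriate = [did for should, did in records if not should]
    appropriate = [did for should, did in records if should]
    # Guard against empty groups rather than dividing by zero.
    mr = sum(inappropriate) / len(inappropriate) if inappropriate else float("nan")
    aar = sum(appropriate) / len(appropriate) if appropriate else float("nan")
    return mr, aar


# Invented example: four contexts where the preference fits, two where it does not.
records = [
    (True, True),
    (True, False),
    (True, True),
    (True, True),
    (False, True),
    (False, False),
]
mr, aar = mr_and_aar(records)
print(mr, aar)
```

Under this reading, a well-behaved model has a low MR and a high AAR; reporting both prevents a model from scoring well simply by always or never applying preferences.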

Pilot randomized or quasi-experimental implementations of reduced workweeks (across firms, industries, or regions) are needed to measure effects on employment, productivity, wages, and consumption.

Research-design recommendation motivated by lack of contemporary causal evidence; not an empirical finding but a stated priority for rigorous testing.

high null result A Shorter Workweek as a Policy Response to AI-Driven Labor D... measured causal effects of reduced workweeks on employment, productivity, wages,...

There is limited direct causal identification separating technology-driven layoffs from incentive-driven layoffs in current firm-level data, creating a need for new firm-panel datasets linking AI adoption, executive pay/ownership, layoff decisions, and local demand outcomes.

Stated limitation of the paper and research-priority recommendation; assessment based on literature gaps noted in the synthesis rather than empirical gap quantification.

high null result A Shorter Workweek as a Policy Response to AI-Driven Labor D... availability/coverage of firm-level panel data capable of separating AI effects ...

Observed layoffs should be treated in empirical research as outcomes of firm governance and incentive structures; econometric studies estimating displacement from AI must control for managerial incentives and financial pressures.

Methodological recommendation based on the conceptual argument and literature linking governance/incentives to firm behavior; no new empirical demonstration provided.

high null result A Shorter Workweek as a Policy Response to AI-Driven Labor D... bias in estimated causal effect of AI on layoffs when not controlling for manage...
