Evidence (4560 claims)

Claims by category:
- Adoption: 5267 claims
- Productivity: 4560 claims
- Governance: 4137 claims
- Human-AI Collaboration: 3103 claims
- Labor Markets: 2506 claims
- Innovation: 2354 claims
- Org Design: 2340 claims
- Skills & Training: 1945 claims
- Inequality: 1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding. Row totals can exceed the sum of the four listed directions where a claim's direction falls outside these categories.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Productivity
The evidence consists mainly of qualitative arguments drawn from documented advances and discussion of prototypes; no controlled experimental evaluation is presented.
Authors' own description in the Data & Methods section about the nature of evidence supporting their perspective.
This paper is a conceptual perspective/review rather than an original empirical study.
Explicit statement in the Data & Methods section that the contribution is a perspective synthesizing literature and illustrative examples with no controlled experimental evaluation.
Modern microscopes are increasingly software-driven and data-intensive, while existing ML tools for microscopy are task-specific and fragmented.
Synthesis of recent literature on optical microscopes, detectors, and task-specific ML for image analysis referenced in the perspective (descriptive claim; no new empirical data collected).
Techno‑economic assessments (TEA) and life‑cycle analyses (LCA) are necessary research tools to compare bio‑routes to incumbent chemical synthesis on cost and emissions, and current literature is incomplete in this regard.
Review notes the presence of some TEA/LCA studies but highlights gaps and heterogeneity in methods and results across case studies; many processes lack published TEA/LCA at commercial scales.
Analyses were conducted as intent-to-treat comparisons across arms, with hypothesis tests reported (including p-values) and principal stratification used for mechanism decomposition.
Methods statement: intent-to-treat comparisons, reported p-values for score differences, and use of principal stratification for separating total effect into adoption and effectiveness channels in the randomized trial (n = 164).
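As a minimal sketch of the intent-to-treat logic (outcomes compared by assigned arm, regardless of actual LLM use), the code below assumes a tidy data frame with hypothetical `arm` and `score` columns; principal stratification needs considerably more machinery and is not reproduced here.

```python
# Minimal ITT sketch; column names 'arm' and 'score' are hypothetical.
import pandas as pd
from scipy import stats

def itt_contrast(df: pd.DataFrame, arm_a: str, arm_b: str) -> tuple[float, float]:
    """Mean score difference between two *assigned* arms, with a Welch t-test p-value."""
    a = df.loc[df["arm"] == arm_a, "score"]
    b = df.loc[df["arm"] == arm_b, "score"]
    _, p = stats.ttest_ind(a, b, equal_var=False)
    return a.mean() - b.mean(), p
```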
The primary outcomes analyzed were LLM adoption (use), exam score (grade points), and answer length.
Study's stated primary outcomes in methods: an adoption indicator, exam score on an issue-spotting exam, and measured answer length. Sample size n = 164.
The study used a randomized controlled design with three arms: no LLM access, optional LLM access, and optional LLM access plus brief training.
Study methods description: randomized assignment of 164 law students to three experimental conditions as listed.
The intervention consisted of roughly a ten-minute training focused on how to use the LLM effectively.
Study description of the intervention in the randomized experiment (three-arm design with one arm receiving ~10-minute targeted training).
Findings are estimated for Chinese cities and require replication in other institutional contexts to assess external validity.
Scope statement in the paper — primary empirical sample limited to 274 Chinese cities; authors note generalizability limits and call for replication elsewhere.
The paper’s AI exposure index — capturing automation and service-sector transformation — is important for robust measurement in empirical work on AI’s macro and environmental effects.
Methodological claim justified by the paper's construction of the index and its use in the main and robustness regressions; robustness checks reported using alternative index specifications.
The paper constructs an AI exposure index that captures both industrial automation (robots) and AI-enabled transformation of service-sector jobs/tasks.
Methodological construction described in the paper combining measures of industrial robot adoption (sectoral push) and AI-driven changes in service-sector job/task content.
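A minimal sketch of how such a two-component exposure index might be assembled; the z-scoring, column names, and equal default weight are illustrative assumptions, not the paper's formula.

```python
# Hypothetical construction: z-score a robot-adoption measure and a
# service-sector AI exposure measure, then combine with assumed weights.
import pandas as pd

def ai_exposure_index(df: pd.DataFrame,
                      robot_col: str = "robot_adoption",
                      service_col: str = "service_ai_exposure",
                      w_robot: float = 0.5) -> pd.Series:
    z_robot = (df[robot_col] - df[robot_col].mean()) / df[robot_col].std()
    z_service = (df[service_col] - df[service_col].mean()) / df[service_col].std()
    return w_robot * z_robot + (1.0 - w_robot) * z_service
```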
The study uses a panel of 274 Chinese cities from 2007–2021 as the primary empirical sample.
Descriptive dataset information reported in the paper — city-level panel covering 274 cities and the years 2007 through 2021.
Empirical validation of the book’s proposals would require complementary case studies, model documentation, and outcome measurements.
Author/reviewer recommendation in the blurb about methodological limitations and next steps; not an empirical finding.
The book is predominantly conceptual and policy-analytic and uses illustrative case vignettes rather than presenting a single empirical study.
Explicit methodological description in the Data & Methods blurb: synthesis of technical ideas, governance requirements, and illustrative vignettes; no empirical sample or experimental protocol described.
The evidence base is qualitative: the study uses conceptual framework synthesis, comparative analysis of multi-sector implementations, and case examples rather than randomized or large-sample empirical evaluation.
Methods and limitations section of the paper explicitly describing the evidence base and methods (qualitative synthesis, pattern extraction, cross-case lessons).
The paper presents a deployment pattern intended to be adapted by sector and regulatory context rather than a one-size-fits-all blueprint.
Explicit statement in the paper and the described pattern design; based on qualitative pattern extraction and prescriptive guidance.
Methodological claim: combining fixed-effects panel estimation, mediation analysis, and panel threshold models is an effective multi-method approach to (a) estimate average effects, (b) unpack causal channels, and (c) detect nonlinear stage-dependent impacts.
The paper's applied methodology: fixed-effects panel regressions, mediation framework, and panel threshold modeling on the 2012–2022 provincial panel.
The paper constructs a multidimensional digitalization index composed of digital infrastructure, digital service capacity, and the digital development environment.
Index construction described in data/methods: composite indicator combining measures of connectivity/broadband (infrastructure), e-commerce/digital finance (service capacity), and policy/institutional/human capital indicators (development environment).
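A minimal sketch of one common aggregation scheme for such a composite (min-max normalization within each pillar, then assumed equal pillar weights); the paper's actual weighting may differ.

```python
# Hypothetical three-pillar composite: normalize indicators within each pillar,
# average within pillars, then average across pillars with equal weights.
import pandas as pd

def digitalization_index(df: pd.DataFrame, pillars: dict[str, list[str]]) -> pd.Series:
    pillar_scores = []
    for cols in pillars.values():
        norm = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())
        pillar_scores.append(norm.mean(axis=1))
    return sum(pillar_scores) / len(pillar_scores)

# e.g. pillars = {"infrastructure": [...], "service_capacity": [...],
#                 "environment": [...]}  (indicator lists are hypothetical)
```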
The study is observational (panel) and subject to limitations: residual confounding is possible; two-way fixed-effects estimators can be biased with heterogeneous treatment timing or dynamics; external validity beyond China and non-grain crops is not established.
Authors' stated limitations and caveats in the paper regarding identification and generalizability of results from the CLDS 2014–2018 observational panel.
The study uses two-way fixed-effects (household and year) models as the primary identification strategy and employs propensity score matching (PSM) as a robustness check.
Methods section of the paper describing estimation strategy applied to the CLDS 2014–2018 panel of grain-producing households.
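A sketch of the two-way fixed-effects specification under assumed column names, using the `linearmodels` package; the PSM robustness step is omitted.

```python
# TWFE sketch: household and year fixed effects on a long-format panel.
# Columns 'household_id', 'year', 'outcome', 'ai_adoption' are hypothetical.
import pandas as pd
from linearmodels.panel import PanelOLS

def twfe_estimate(panel: pd.DataFrame, controls: list[str]):
    panel = panel.set_index(["household_id", "year"])
    exog = panel[["ai_adoption", *controls]]
    model = PanelOLS(panel["outcome"], exog,
                     entity_effects=True,   # household fixed effects
                     time_effects=True)     # year fixed effects
    return model.fit(cov_type="clustered", cluster_entity=True)
```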
Attributing productivity changes specifically to AI requires causal identification beyond VIS accounting (e.g., experiments, instrumental variables, difference-in-differences).
Paper notes that VIS is an accounting framework and that causal attribution to AI requires econometric/experimental methods beyond input–output accounting.
The method uses BEA for industry output and industry-by-industry transactions, BLS for employment and hours worked, and IMPLAN for detailed input–output structure and sector mapping; coverage period is 2014–2023.
Explicit data sources and time coverage stated: public BEA, BLS, and IMPLAN annual data 2014–2023 used to construct input–output matrices and labor measures.
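For orientation, the sketch below shows the standard Leontief step that input-output accounting of this kind rests on; it is generic textbook machinery, not the VIS framework's own implementation.

```python
# Leontief total requirements: given an industry-by-industry transactions
# matrix Z and gross output vector x, form technical coefficients
# A[i, j] = Z[i, j] / x[j], then L = (I - A)^-1, so x_supported = L @ f.
import numpy as np

def leontief_total_requirements(Z: np.ndarray, x: np.ndarray) -> np.ndarray:
    A = Z / x[np.newaxis, :]              # scale each column by that industry's output
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - A)   # total (direct + indirect) requirements
```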
Limitations of the review include the small sample of studies, uneven geographic coverage, heterogeneity in methods across studies, and limited long‑run evidence (especially on generative AI), which complicate causal aggregation.
Author-reported limitations based on the meta-assessment of the 17 included studies (variation in methods, contexts, and time horizons).
Design of this work: a systematic literature review and meta‑synthesis of empirical findings from peer‑reviewed journals (2020–2025), based on 17 publications.
Stated methods and inclusion criteria of the paper: systematic review of peer‑reviewed literature (sample = 17).
Long-term evidence on generative AI’s structural labor‑market effects is scarce; few longitudinal studies exist.
Assessment of study horizons and methods among the 17 papers indicates limited long-run and longitudinal analyses specifically on generative AI impacts.
Empirical coverage is limited for low‑income countries; evidence from such settings is scarce.
Geographic distribution of the 17 reviewed studies shows concentration in advanced economies with few or no studies focused on low-income countries.
The literature shows a surge in research activity on AI and labor markets in 2023–2025 and a concentration of studies in advanced economies.
Meta-analytic summary of the publication years and geographic focus among the 17 selected publications (temporal and geographic count of included studies).
The paper proposes two conceptual models (AI/ML‑Driven Labor Market Transformation Model and Sectoral Impact and Resilience Model) to organize heterogeneous findings and generate testable hypotheses about how AI reshapes labor across sectors and skill levels.
Conceptual synthesis integrating Technological Determinism, Socio‑Technical Systems Theory (STS), and Skill‑Biased Technological Change (SBTC); the models are theoretical outputs of the review used to map mechanisms and heterogeneity rather than empirical findings.
There are substantial measurement and identification gaps in the literature: heterogeneity in measuring 'AI adoption', limited long‑run causal evidence, and geographic bias toward advanced economies.
Methodological assessment within the review noting variability across studies in AI measures (patents, investment, task exposure proxies), paucity of long‑run causal designs, and concentration of empirical studies in advanced economies; this is a meta‑evidence limitation statement.
The Iceberg Index indicates where capability exists but does not indicate whether or when job losses will occur.
Explicit caution in the paper noting the distinction between technical exposure (capability overlap) and realized labor-market outcomes; methodological limitation described.
The Iceberg Index captures capability overlap but does not capture firm adoption choices, regulatory constraints, social acceptance, complementarity effects, or worker reallocation dynamics.
Limitations section in the paper explicitly listing these omitted factors; methodological boundaries of the Iceberg Index stated.
Model and simulations are implemented with the AgentTorch framework.
Implementation note in the paper indicating AgentTorch was used to build the agent-based models and run simulations.
The simulation model represents 151 million U.S. workers as autonomous agents, covers 32,000+ distinct skills, links agents to thousands of AI tools, and provides county-level resolution (~3,000 U.S. counties).
Model specification described in the paper: large-population agent-based model (AgentTorch) parameterized with occupation, skills portfolios, wages, and county locations; counts provided in the paper.
The Iceberg Index is a skills-centered metric that measures the wage value of specific skills AI systems can perform within each occupation; it quantifies technical exposure (capability overlap), not displacement, adoption timelines, or realized outcomes.
Methodological definition: mapping of ~32,000 skills to occupations with wage-value contributions, summing wages of skills that current AI capabilities cover to compute the index.
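The stated logic reduces to a wage-weighted coverage share, as in the hypothetical sketch below (field names assumed).

```python
# Iceberg-style coverage share for one occupation: wage value of skills that
# current AI capabilities cover, as a share of total wage value.
# 'skill' and 'wage_value' are hypothetical field names.
import pandas as pd

def coverage_share(skill_wage: pd.DataFrame, ai_covered: set[str]) -> float:
    covered = skill_wage["skill"].isin(ai_covered)
    return skill_wage.loc[covered, "wage_value"].sum() / skill_wage["wage_value"].sum()
```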
Key empirical gaps remain: better measurement of K_T (AI/software capital), more granular matched employer‑employee and wealth data, and improved estimates of task-substitution elasticities are required to precisely quantify incidence and policy impacts.
Authors’ stated research agenda and limitations section, including sensitivity analyses showing outcome variation with parameter choices and measurement uncertainty.
The ManagerWorker two-agent pipeline (expensive text-only manager plus cheaper worker with repo access) can substitute for expensive execution: expensive reasoning is concentrated in the manager while the cheaper model handles execution in the worker.
System design description plus empirical results on 200 SWE-bench Lite instances showing parity in success rates between a strong-manager/weak-worker pipeline and a strong single agent while using fewer strong-model tokens.
A minimal review-only manager loop adds only 2 percentage points over the baseline, whereas structured exploration and planning by the manager add 11 percentage points, demonstrating that active direction (not mere reviewing) produces most of the benefit.
Ablation-style comparison of pipeline variants on the 200-instance SWE-bench Lite evaluation: review-only manager loop versus manager with structured exploration and planning; reported improvements in percentage points.
A strong manager directing a weak worker achieves a 62% success rate on software-engineering tasks, matching a strong single agent at 60%, while using a fraction of the strong-model tokens.
Empirical evaluation on 200 instances from SWE-bench Lite across five pipeline configurations and model pairings; measured task success rates and token usage for manager-worker pipelines versus single-agent baselines.
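A schematic of the division of labor the pipeline describes; the interfaces and method names here are invented for illustration, not the authors' code.

```python
# Expensive text-only manager plans and reviews; cheap worker with repo
# access executes. All method names below are hypothetical.
def manager_worker_loop(manager, worker, task: str, max_rounds: int = 5):
    plan = manager.explore_and_plan(task)           # structured exploration + planning
    patch = None
    for _ in range(max_rounds):
        patch = worker.execute(plan)                # cheap model edits the repository
        review = manager.review(task, patch)        # expensive model checks the work
        if review.approved:
            break
        plan = manager.replan(task, patch, review)  # active redirection, not mere review
    return patch
```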
Under economy-wide deployment, the share of computer-vision-exposed labor compensation that is cost-effectively automatable rises sharply (relative to the firm-level 11% estimate).
Model counterfactuals or calibration scenarios comparing firm-level deployment vs economy-wide deployment; qualitative statement that share increases substantially.
At the firm level, cost-effective automation captures approximately 11% of computer-vision-exposed labor compensation.
Calibration and implementation in computer vision; reported firm-level estimate from the framework.
Scale of deployment is a key determinant: AI-as-a-Service and AI agents spread fixed costs across users, sharply expanding economically viable tasks.
Modeling and calibration arguments showing fixed-cost spreading effects increase set of tasks for which automation is cost-effective; qualitative and quantitative comparisons in implementation.
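The fixed-cost-spreading argument is simple arithmetic, illustrated below with hypothetical numbers: a task becomes viable to automate once the per-user share of fixed cost plus marginal cost falls below the human cost of the task.

```python
# Toy viability check for fixed-cost spreading (all figures hypothetical).
def viable(fixed_cost: float, n_users: int, marginal: float, human_cost: float) -> bool:
    return fixed_cost / n_users + marginal < human_cost

print(viable(1_000_000, 1, 10, 100))        # False: one firm bears the full fixed cost
print(viable(1_000_000, 100_000, 10, 100))  # True: AI-as-a-Service spreads it out
```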
Because higher accuracy is disproportionately costly (convex cost), full automation is often not cost-minimizing; partial automation, where firms retain human workers for residual tasks, frequently emerges as the equilibrium.
Theoretical model combined with calibration (scaling laws + task mappings); equilibrium outcomes reported from the framework implementation.
We model automation intensity as a continuous choice in which firms minimize costs by selecting an AI accuracy level, from no automation through partial human-AI collaboration to full automation.
The paper develops a theoretical framework / model that treats automation intensity as a continuous decision variable; described as the central modeling approach.
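A toy version of the continuous-choice problem, with an assumed convex AI cost curve rather than the paper's scaling-law calibration, shows how an interior optimum (partial automation) arises.

```python
# Firms pick automation intensity a in [0, 1): AI cost is convex in accuracy,
# residual human cost falls linearly, so the minimizer is often interior.
from scipy.optimize import minimize_scalar

WAGE = 1.0    # hypothetical cost of human handling per unit of task
KAPPA = 0.05  # hypothetical AI cost scale

def total_cost(a: float) -> float:
    return KAPPA / (1.0 - a) + WAGE * (1.0 - a)  # convex AI cost + residual human cost

res = minimize_scalar(total_cost, bounds=(0.0, 0.999), method="bounded")
print(f"cost-minimizing automation intensity: {res.x:.2f}")  # interior, ~0.78 here
```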
The findings demonstrate that technological innovation strategies, when effectively implemented, provide measurable competitive advantages for banks; the results offer evidence-based insights for policymakers and practitioners.
Authors' interpretation/conclusion drawing on the reported statistically significant relationships between innovation (product and technological) and competitiveness.
Technological innovation is positively and statistically significantly related to bank competitiveness (simple linear regression result reported).
Simple linear regression reported in the paper testing the hypothesis that technological innovation influences competitiveness; data collected from innovation-focused executives across licensed banks (paper states data from 39 licensed banks).
Product innovation strategy has a positive and statistically significant effect on competitiveness (F(1,134) = 74.983, p < .001).
Bivariate regression analysis reported in the paper with F(1,134)=74.983, p < .001; based on survey data from innovation-focused executives (regression degrees of freedom indicate n≈136 observations).
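For concreteness, the reported statistic corresponds to the overall F-test of a bivariate OLS; a sketch with hypothetical column names:

```python
# Bivariate OLS whose overall F-statistic is the reported hypothesis test.
# Columns 'competitiveness' and 'product_innovation' are hypothetical.
import statsmodels.formula.api as smf

def bivariate_f(df):
    fit = smf.ols("competitiveness ~ product_innovation", data=df).fit()
    return fit.fvalue, fit.f_pvalue  # compare to F(1, 134) = 74.983, p < .001
```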
In the user study, AI-expanded 5W3H prompts increase user satisfaction from 3.16 to 4.04.
Reported pre/post or baseline vs AI-expanded satisfaction scores in the N=50 user study with numeric scores 3.16 and 4.04.
In the user study, AI-expanded 5W3H prompts reduce interaction rounds by 60 percent.
Reported comparison in the N=50 user study between baseline interaction rounds and rounds after AI-assisted 5W3H expansion; percentage reduction reported as 60%.
A weak-model compensation pattern was observed: the lowest-baseline model (Gemini) shows a much larger D-A gain (+1.006) than the strongest model (Claude, +0.217).
Model-level comparison of D-A gain (difference between structured and unstructured conditions) across three models (Claude, GPT-4o, Gemini) on the evaluated outputs; reported gains for Gemini and Claude.
The strongest structured conditions reduce cross-language sigma from 0.470 to about 0.020.
Reported numeric comparison of sigma (cross-language dispersion) between the unstructured baseline and the strongest structured prompting conditions across evaluated outputs.