Evidence (13870 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	196	98	892	1984
Governance & Regulation	817	394	188	121	1544
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	627	233	123	96	1088
Research Productivity	411	123	56	332	933
Output Quality	467	178	59	47	751
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	167	122	24	496
Task Allocation	207	64	71	32	379
Skill Acquisition	165	59	60	17	301
Innovation Output	203	27	43	18	292
Employment Level	105	52	107	13	279
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	150	48	26	3	227
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	63	20	12	184
Error Rate	69	92	10	2	173
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	93	21	13	19	148
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Creative Output	31	17	7	3	59
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

The study sample comprises 21,428 firm-year observations from Chinese A-share listed manufacturing companies over 2010–2022.

Data description provided in the paper's abstract/introduction specifying the sample frame and time period.

high null result Artificial Intelligence Innovation, Internal Structure Optim... sample composition (firm-year observations)

We find little evidence of crashing waves (in contrast to recent work by METR).

Analysis of the >3,000 tasks and >17,000 evaluations, which reportedly shows no abrupt, concentrated surges in AI capability on small sets of tasks.

high null result Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... presence of abrupt concentrated capability surges ('crashing waves')

The evaluation is based on more than 17,000 evaluations by workers from these jobs.

Reported sample of >17,000 human evaluations of model outputs.

high null result Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... number of human evaluations

We test for these effects in preliminary evidence from an ongoing evaluation of AI capabilities across over 3,000 broad-based tasks derived from the U.S. Department of Labor O*NET categorization that are text-based and thus LLM-addressable.

Empirical study design reporting an ongoing evaluation covering >3,000 text-based tasks mapped from O*NET.

high null result Crashing Waves vs. Rising Tides: Preliminary Findings on AI ... coverage of LLM-addressable tasks (task sample)

Green innovation does not yet significantly reduce carbon inequality.

Empirical results from the provincial panel analysis (2003–2021) showing that measures of green innovation are not associated with a statistically significant reduction in carbon inequality.

high null result Artificial intelligence, green innovation, and regional carb... carbon inequality

This paper employs a staggered difference-in-differences (DID) model using data from Chinese A-share listed manufacturing companies from 2012 to 2023 and uses the National Artificial Intelligence Innovative Application Pioneer Zone (AIIAPZ) policy as a quasi-natural experiment.

Staggered DID empirical design; sample described as Chinese A-share listed manufacturing firms, 2012–2023; AIIAPZ policy used as treatment assignment (quasi-natural experiment).

high null result Does Artificial Intelligence Improve the Operational Resilie... methodological design / identification strategy (use of staggered DID and policy...

The paper characterizes the symmetric Nash equilibrium in a preemption game of competing frontier-AI firms.

Analytic game-theoretic model and equilibrium derivations presented in the paper (formal characterization/propositions).

high null result Optimal Release Timing of AI Systems: A Strategic Analysis w... strategic equilibrium (symmetric Nash) in release-timing preemption game

The study uses panel data from listed manufacturing firms in China and employs a quasi-natural experiment approach.

Statement in the abstract describing data source (panel of listed manufacturing firms in China) and empirical strategy (quasi-natural experiment).

high null result The impact of R&D innovation strategy on the sustainable... data source and identification strategy

Big data analytics and blockchain technologies show no significant correlations with exports to specific destinations (multivariate probit result).

Multivariate probit model of destination-specific export decisions showing non-significant coefficients for big data analytics and blockchain across destinations (sample size not reported in the summary).

high null result How Digitalization Shapes Export Potential: Firm-Level Insig... exporting to specific destination regions (binary/region-specific firm export de...

Adopting blockchain technologies does not have a statistically significant effect on a firm's likelihood of exporting (probit model result).

Probit regression analysis showing a non-significant coefficient for blockchain adoption (sample size not reported in the summary).

high null result How Digitalization Shapes Export Potential: Firm-Level Insig... likelihood/probability of exporting (firm-level)

Adopting big data analytics does not have a statistically significant effect on a firm's likelihood of exporting (probit model result).

Probit regression analysis showing a non-significant coefficient for big data analytics adoption (sample size not reported in the summary).

high null result How Digitalization Shapes Export Potential: Firm-Level Insig... likelihood/probability of exporting (firm-level)

We introduce the Agentic Task Exposure (ATE) score, a composite measure computed algorithmically from O*NET task data using calibrated adoption parameters (not a regression estimate), incorporating AI capability scores, workflow coverage factors, and logistic adoption velocity.

Methodological description in the paper; algorithmic construction from O*NET task data with specified calibrated adoption parameters and components (AI capability scores, workflow coverage, logistic adoption).

high null result Agentic AI and Occupational Displacement: A Multi-Regional T... NA (methodological construct for measuring exposure/adoption)

Code authoring and review are only a small part of the larger software engineering process; the resulting code must also be maintained and updated over time.

Conceptual/argumentative claim presented in the paper to motivate longitudinal analysis (not presented as an empirical estimate from the dataset).

high null result Investigating Autonomous Agent Contributions in the Wild: Ac... relative share of authoring/review versus maintenance in software development (c...

We offer several longitudinal estimates of survival and churn rates for agent-generated versus human-authored code.

Longitudinal analysis reported in the paper comparing survival and churn for agent-generated and human-authored code over time using the dataset (paper states these estimates were produced).

high null result Investigating Autonomous Agent Contributions in the Wild: Ac... survival rates and churn rates of code contributions

We compare five popular coding agents, including OpenAI Codex, Claude Code, GitHub Copilot, Google Jules, and Devin, examining how their usage differs in various development aspects such as merge frequency, edited file types, and developer interaction signals, including comments and reviews.

Comparative analysis across agents using the constructed dataset of ~110,000 PRs (paper states these five agents were compared on metrics like merge frequency, edited file types, and interaction signals).

high null result Investigating Autonomous Agent Contributions in the Wild: Ac... merge frequency, edited file types, developer interaction signals (comments, rev...

We construct a novel dataset of approximately 110,000 open-source pull requests, including associated commits, comments, reviews, issues, and file changes, collectively representing millions of lines of source code.

Descriptive dataset construction reported in the paper (stated sample size ~110,000 PRs including commits, comments, reviews, issues, file changes; representing millions of lines of code).

high null result Investigating Autonomous Agent Contributions in the Wild: Ac... number of pull requests and total lines of source code in dataset

The paper extends classical (Solow) and endogenous (Romer) growth models to incorporate TAI, producing a dynamic framework for analyzing AI-driven structural change.

Methodological claim: the authors explicitly state they build on Solow (1956) and Romer (1990) to develop an integrated dynamic model that incorporates TAI; the evidence is the model extension and formalization described in the paper.

high null result Transformative AI and the Evolution of Growth Models: Extend... modeling framework / analytical capacity to study AI-driven structural change

The study uses dynamic fixed-effects and dynamic panel threshold regression techniques on a panel of 23 developed and developing countries from 2002 to 2023.

Methodological statement in the abstract specifying the estimation techniques and the dataset: a panel of 23 countries over 2002–2023.

high null result Can AI technology innovation promote national entrepreneursh... entrepreneurship

This study uses semi-structured interviews with 10 practitioners to examine perceptions of collaborating with human versus AI teammates.

Methods statement in the paper: semi-structured interviews; sample size explicitly reported as 10 practitioners.

high null result Bridging the Socio-Emotional Gap: The Functional Dimension o... methodological description (data collection approach)

The study is based on a qualitative analysis of recent academic literature, comparative analysis of sector-specific applications of Big Data technologies, and synthesis of empirical findings from international studies using a systemic and structural analysis approach.

Methodological statement within the paper describing data sources and analytic approach; not an empirical claim about outcomes.

high null result Implications of Big Data Technologies for the Resilience of ... methodological approach (literature synthesis, comparative analysis, systemic/st...

The research documents a transition in the literature (2013–2025) from early 'risk-of-automation' evaluations toward task-based and firm-level econometric models.

Literature review/synthesis across the 2013–2025 body of research as described in the paper.

high null result Impact Of Artificial Intelligence (AI) On Employment research methods / framework change

Society 5.0 and Industry 5.0 call for human-centric technology integration, but the concept lacks an operational definition that can be measured, optimized, or evaluated at the firm level.

Motivating claim grounded in literature gap analysis presented in the paper (argument that normative frameworks lack formal, operational metrics at firm level).

high null result From Automation to Augmentation: A Framework for Designing H... operationalizability/measurability of 'human-centricity' at firm level

We propose the Workplace Augmentation Design Index (WADI), a 36-item theory-grounded instrument for diagnosing human-centricity at the firm level.

Instrument design/proposal presented in the paper (36 items mapped to the five workplace-design dimensions); no validation sample reported in the abstract.

high null result From Automation to Augmentation: A Framework for Designing H... diagnosis/measurement of firm-level human-centric workplace design

We conducted a PRISMA-guided systematic review of 120 papers (screened from 6,096 records) to map the evidence base for each workplace-design dimension.

Systematic literature review using PRISMA protocol; final sample = 120 papers; initial records screened = 6,096.

high null result From Automation to Augmentation: A Framework for Designing H... coverage/evidence for each workplace-design dimension in the literature

Existing models of human-AI complementarity treat the augmentation function phi(D) as exogenous and thus ignore that two firms with identical technology investments can achieve radically different augmentation outcomes depending on workplace organization.

Argument based on literature review of prior models (the paper contrasts its approach with existing complementarity models). No new empirical sample reported for this specific claim.

high null result From Automation to Augmentation: A Framework for Designing H... augmentation outcomes (human-AI augmentation productivity)

The widening effect of AI adoption on the electricity output growth gap diminishes over time and becomes statistically insignificant after approximately three years.

Temporal (dynamic) empirical analysis / event-study-style estimation tracing the AI adoption effect over multiple years post-adoption; statistical significance reported to fade by year ~3. Sample size / exact time windows not provided in the summary.

high null result The Impact of AI Adoption on Electricity Output Growth Gap: ... corporate electricity output growth gap (time-varying effect)

The review employed a systematic analysis of multidisciplinary studies (qualitative, quantitative, and bibliometric) focused on agentic AI technologies in financial domains, covering literature published up to mid-2024.

Stated methodology of the paper (systematic review description).

high null result A Comparative & Systematic Review of Literature on the I... scope and methods of the review itself

A subset of four datasets included settings in which the AI provided explanations of its decision.

Paper states that four of the datasets involved AI explanations (explicitly stated in abstract).

high null result Beyond AI advice -- independent aggregation boosts human-AI ... presence_of_AI_explanation

The study compared HCT to the AI-as-advisor approach using 10 datasets from various domains, including medical diagnostics and misinformation discernment.

Paper reports an empirical comparison across 10 datasets spanning multiple domains (explicitly stated in abstract).

high null result Beyond AI advice -- independent aggregation boosts human-AI ... dataset_scope

The hybrid confirmation tree (HCT) elicits a human judgment and an AI judgment independently; if they agree, that decision is accepted, and if they disagree, a second human breaks the tie.

Description of the HCT method in the paper (procedural/design specification).

high null result Beyond AI advice -- independent aggregation boosts human-AI ... procedure_description
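
The HCT procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `hybrid_confirmation_tree`, the label values, and the use of a callable for the second human (so that the tie-breaker is consulted only on disagreement) are all assumptions made here.

```python
from typing import Callable, Hashable, Tuple


def hybrid_confirmation_tree(
    first_human: Hashable,
    ai: Hashable,
    second_human: Callable[[], Hashable],
) -> Tuple[Hashable, bool]:
    """Combine independent human and AI judgments (hypothetical helper).

    Returns (decision, tie_break_used).
    """
    # Agreement between the first human and the AI is accepted directly.
    if first_human == ai:
        return first_human, False
    # On disagreement, a second human is consulted and breaks the tie.
    return second_human(), True


# Agreement: the second human is never asked.
print(hybrid_confirmation_tree("benign", "benign", lambda: "malignant"))
# Disagreement: the second human's judgment decides.
print(hybrid_confirmation_tree("benign", "malignant", lambda: "malignant"))
```

Passing the second human as a callable reflects one practical property of the design: the extra human effort is spent only on the cases where the first human and the AI disagree.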

The cross-sectional, self-reported survey design prevents strong causal claims about the effect of algorithms or selective exposure on polarization.

Authors explicitly note methodological limitations: cross-sectional survey of N = 450, reliance on self-reported consumption, and lack of platform log or longitudinal/experimental data.

high null result Echo Chambers, Filter Bubbles, and Selective Exposure: Media... causal inference ability (limitation due to design)

The study adopted a positivist philosophy and a descriptive-correlational design.

Methods section statement in the paper describing the research philosophy and study design.

high null result Technology Innovation Strategy and the Competitiveness of Ke... research design / methodology

Data were collected from innovation-focused executives across 39 licensed Kenyan commercial banks.

Paper statement specifying sample source: 'Using data from innovation-focused executives across 39 licensed banks.'

high null result Technology Innovation Strategy and the Competitiveness of Ke... sample composition / data source

Technological innovation was assessed via adoption of new systems, integration of digital channels, and use of Artificial Intelligence and data analytics.

Measurement description provided in the paper listing the components used to operationalize technological innovation.

high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of technological innovation

Competitiveness in the study was measured through market share, return on equity and customer satisfaction.

Measurement description provided in the paper describing dependent variable operationalization (explicit list of three indicators).

high null result Technology Innovation Strategy and the Competitiveness of Ke... measurement/operationalization of competitiveness

The research method is normative legal research using statutory, conceptual, and comparative approaches, supported by a literature analysis of SINTA-indexed national journals and reputable international journals.

Method statement clearly given in the paper's abstract/methodology.

high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... research methodology (normative legal research and literature review)

The study assesses the adequacy of the legal protection available to workers affected by termination of employment (Pemutusan Hubungan Kerja, PHK) resulting from AI adoption.

Statement of the research objective and analytical approach (normative, comparative), supported by a literature review of selected journals.

high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... adequacy of legal protection for AI-affected workers

The study aims to analyze how the Job Creation Law (Undang-Undang Cipta Kerja) and its implementing regulations classify and justify termination of employment (Pemutusan Hubungan Kerja, PHK) resulting from AI adoption.

Statement of the research objective given in the methodology/introduction section; statutory approach within normative legal research.

high null result Reformasi Hukum Ketenagakerjaan di Era Artificial Intelligen... classification and justification of PHK under the Job Creation Law (UU Cipta Kerja)

The user study had N=50 participants.

Reported user study sample size (N=50) used to evaluate AI-assisted intent expansion in ecologically valid settings.

high null result Structured Intent as a Protocol-Like Communication Layer: Cr... user study sample size

Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient.

Controlled comparison between three structured frameworks (5W3H, CO-STAR, RISEN) across the evaluated outputs, with no meaningful differences reported between them.

high null result Structured Intent as a Protocol-Like Communication Layer: Cr... goal-alignment scores

The study evaluated 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks) using an independent judge (DeepSeek-V3).

Reported experimental design and evaluation: 3 languages, 6 conditions, 3 models, 3 domains, 20 tasks; judged by DeepSeek-V3.

high null result Structured Intent as a Protocol-Like Communication Layer: Cr... number of model outputs evaluated / evaluation procedure
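
The reported total follows from the full factorial design, and the arithmetic can be checked directly. The snippet below is only a consistency check on the stated factor counts; the dictionary keys are labels chosen here, not identifiers from the paper.

```python
from itertools import product
from math import prod

# Factor sizes as reported: 3 languages x 6 conditions x 3 models x 3 domains x 20 tasks.
design = {"languages": 3, "conditions": 6, "models": 3, "domains": 3, "tasks": 20}

# Product of the factor sizes.
total_outputs = prod(design.values())

# Enumerating every cell of the factorial grid gives the same count.
cells = list(product(*(range(n) for n in design.values())))

print(total_outputs)  # 3240
print(len(cells) == total_outputs)  # True
```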

The paper frames the LLM-politician relationship through principal-agent theory and bounded rationality, conceptualizing the legislator as a principal delegating advisory tasks to a boundedly rational agent under structural information asymmetry.

Explicit theoretical framing described in the introduction or theory section of the paper.

high null result Can Commercial LLMs Be Parliamentary Political Companions? C... theoretical framing

Model outputs were evaluated using a dual framework combining LLM-as-Judge semantic scoring and programmatic text similarity metrics.

Paper describes the evaluation methodology: semantic scoring via LLM-as-Judge plus programmatic text similarity measures applied to model-generated rationales vs official memoranda.

high null result Can Commercial LLMs Be Parliamentary Political Companions? C... evaluation method / scoring approach
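
The summary does not say which programmatic similarity metrics the paper used. As a rough sketch of what that half of such a dual framework can look like, the example below computes two simple, standard-library scores between a generated rationale and a reference text: token-level Jaccard overlap and the `difflib.SequenceMatcher` ratio. Both metric choices, the function names, and the sample strings are assumptions for illustration; the LLM-as-Judge half is not modelled here.

```python
from difflib import SequenceMatcher


def token_jaccard(generated: str, reference: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens (illustrative metric)."""
    a, b = set(generated.lower().split()), set(reference.lower().split())
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def sequence_ratio(generated: str, reference: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, generated, reference).ratio()


# Hypothetical texts, not drawn from the study's corpus.
generated = "the proposal amends the labour code"
reference = "the proposal amends the fiscal code"
print(token_jaccard(generated, reference))
print(sequence_ratio(generated, reference))
```

Surface metrics like these reward shared wording rather than shared meaning, which is one reason to pair them with a semantic score.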

Six LLMs were evaluated: GPT-5-mini, GPT-5-chat (OpenAI), Claude Haiku 4.5 (Anthropic), and Llama 4 Maverick, Llama 3.3 70B, Llama 3.1 8B (Meta).

Paper explicitly lists the six evaluated models spanning three provider families and multiple capability tiers.

high null result Can Commercial LLMs Be Parliamentary Political Companions? C... models evaluated

The study uses a dataset of 15 Romanian Senate law proposals paired with their official explanatory memoranda (expuneri de motive).

Explicit statement in the paper describing the dataset composition: 15 Romanian Senate law proposals each paired with its official explanatory memorandum.

high null result Can Commercial LLMs Be Parliamentary Political Companions? C... dataset size / data corpus

We implement a rigorously controlled execution-based testbed featuring Git worktree isolation and explicit global memory to evaluate agent coordination frameworks.

Methodological description in the paper indicating the testbed design choices (Git worktree isolation, explicit global memory) used to ensure controlled, reproducible execution of agent-generated code.

high null result An Empirical Study of Multi-Agent Collaboration for Automate... experimental reproducibility and isolation (testbed design)

We benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs) using a rigorously controlled, execution-based testbed.

Description of experimental setup in the paper: an execution-based testbed with Git worktree isolation and explicit global memory; experiments explicitly compare single-agent, subagent, and agent-team architectures under fixed computational time budgets.

high null result An Empirical Study of Multi-Agent Collaboration for Automate... comparative performance of agent architectures (benchmarking setup)

Data construction: The authors treat Wikipedia technology pages as distinct technologies and trace them across patents and job postings from 1976 to 2007, using technical bigrams to identify technologies in texts.

Description of dataset construction building on Kalyani et al. (2025) in Section 2; methodological description of linking Wikipedia pages, patent text, and job postings.

high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE coverage and method of technology identification in data
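
The bigram-based identification described above can be illustrated with a small sketch. It assumes a simple scheme in which each technology carries a set of characteristic bigrams and a text counts as mentioning the technology when it contains at least `min_hits` of them; the tokenizer, the threshold, and the example bigrams are hypothetical and are not taken from the paper or from Kalyani et al. (2025).

```python
import re
from typing import Set, Tuple

Bigram = Tuple[str, str]


def extract_bigrams(text: str) -> Set[Bigram]:
    """Return the set of adjacent lowercase word pairs in a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return set(zip(tokens, tokens[1:]))


def mentions_technology(text: str, tech_bigrams: Set[Bigram], min_hits: int = 1) -> bool:
    """Flag a text that shares at least `min_hits` bigrams with a technology."""
    return len(extract_bigrams(text) & tech_bigrams) >= min_hits


# Hypothetical bigram set for one technology page.
touch_screen = {("touch", "screen"), ("capacitive", "sensing")}

posting = "Seeking engineer with touch screen firmware experience."
print(mentions_technology(posting, touch_screen))  # True
```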

Proposition 1: With a constant pace of technology creation (m(b)=m), the model admits a unique balanced growth path (BGP) along which real wages and output grow at rate g, and the skill premium remains constant and is independent of m.

Analytical result (proposition) proved in the paper's model appendix under model assumptions.

high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE skill premium dependence on pace parameter m along BGP

The modal technology in the top 1% densest locations (e.g., New York, San Francisco) is 34 years old, while the modal technology in the bottom 50% lowest-density locations is 48 years old, indicating sizable diffusion gaps.

Empirical measurement from the text-based technology dataset tracking vintage of technologies across locations; reported modal ages by location density percentile.

high null result THE SKILL PREMIUM IN TIMES OF RAPID TECHNOLOGICAL CHANGE modal technology age by location density
