Evidence (8625 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	761	200	101	904	2020
Governance & Regulation	829	400	191	122	1566
Organizational Efficiency	784	193	125	84	1197
Technology Adoption Rate	637	236	124	97	1103
Research Productivity	431	131	58	340	972
Output Quality	481	183	59	47	770
Decision Quality	332	177	82	49	647
Firm Productivity	439	57	88	20	610
AI Safety & Ethics	218	279	66	33	602
Market Structure	181	170	123	24	503
Task Allocation	214	64	72	33	388
Skill Acquisition	174	62	62	17	315
Innovation Output	204	27	45	18	295
Employment Level	105	54	108	13	282
Fiscal & Macroeconomic	132	69	43	26	277
Consumer Welfare	117	63	42	11	233
Firm Revenue	154	48	26	3	231
Task Completion Time	173	31	8	12	225
Inequality Measures	44	123	50	6	223
Worker Satisfaction	89	65	22	12	188
Error Rate	71	92	10	2	175
Regulatory Compliance	77	69	14	5	165
Automation Exposure	58	56	26	13	156
Training Effectiveness	96	21	14	19	152
Wages & Compensation	77	37	25	6	145
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	81	21	1	115
Hiring & Recruitment	52	7	8	3	70
Creative Output	32	20	8	3	64
Skill Obsolescence	5	47	6	1	59
Social Protection	28	16	8	2	54
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

We propose Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand side (LHS) device tower leverages temporally valid watch-history message passing to capture collaborative signals, while the right-hand side (RHS) content tower is intentionally shallow and encodes content solely from intrinsic features.

Model architecture description in paper (design specification; no numeric evaluation included in excerpt).

high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... model architecture behavior (device tower uses message passing; content tower sh...
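To make the asymmetry in this entry concrete, here is a toy sketch of two such towers. It is not the paper's implementation. It assumes a single linear layer for the content tower, one round of mean aggregation over watch events that occurred strictly before a cutoff for the device tower, and a dot-product score. The function names, the `as_of` cutoff, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def content_tower(features: Vector, weights: List[Vector]) -> Vector:
    """Shallow RHS (assumed form): one linear map of intrinsic features, no graph input."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def device_tower(history: List[Tuple[int, str]],
                 content_emb: Dict[str, Vector],
                 as_of: int) -> Vector:
    """LHS (assumed form): mean of embeddings of items watched strictly before `as_of`."""
    dim = len(next(iter(content_emb.values())))
    valid = [content_emb[cid] for ts, cid in history if ts < as_of]
    if not valid:
        return [0.0] * dim  # device with no temporally valid history
    return [sum(v[i] for v in valid) / len(valid) for i in range(dim)]


def score(device_vec: Vector, content_vec: Vector) -> float:
    """Dot-product link score between a device and a content item."""
    return sum(a * b for a, b in zip(device_vec, content_vec))


# Hypothetical toy data: identity weights and two-dimensional intrinsic features.
W = [[1.0, 0.0], [0.0, 1.0]]
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "new_title": [1.0, 0.0]}
emb = {cid: content_tower(f, W) for cid, f in features.items()}

# The device watched "a" at t=1 and "b" at t=5; only "a" is valid as of t=3.
device_vec = device_tower([(1, "a"), (5, "b")], emb, as_of=3)
new_title_score = score(device_vec, emb["new_title"])
```

Because the content tower reads only intrinsic features, `new_title` receives an embedding without any watch history, which is the cold-start property the entry describes.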

We formulate cold-start recommendation as an inductive graph-completion problem on a temporal bipartite device-content graph.

Methodological framing presented in the paper (problem formulation).

high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... problem formulation (inductive graph-completion on temporal bipartite graph)

In Tubi's production retrieval system, new content must be assigned a standalone embedding immediately, and the model must also produce device embeddings suitable for approximate nearest-neighbor retrieval.

Description of production serving constraints in Tubi stated in paper (system design / operational constraint).

high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... serving/operational constraint: immediate standalone content embedding and devic...

In neither unit did internal control mechanisms identify any information-security incident, sensitive-data leakage, or formal compliance challenge from external oversight bodies during the period examined.

Author reports absence of recorded incidents in internal control mechanisms and no external oversight challenges for both units over the study period; based on internal records and SEI-GDF auditable indicators.

high null result The Main Barrier to AI Adoption in the Public Sector is Lack... information-security incidents / sensitive-data leakage / formal compliance chal...

The aggregate Stanford HAI AI Vibrancy Score shows no significant within-country effect on tourism’s direct GDP share after controlling for macroeconomic factors.

Fixed-effects estimation with clustered standard errors on panel data from 33 countries (2017–2023); reported coefficient β = 0.061, p = 0.622, with macroeconomic controls.

high null result Which dimensions of AI development shape tourism’s direct co... tourism’s direct GDP share
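As a rough illustration of what a within-country (fixed-effects) estimate means, the sketch below demeans a single regressor and the outcome by country and then fits a slope. It is a simplification: it omits the macroeconomic controls and the clustered standard errors the entry mentions, and the data are made up solely to show the mechanics.

```python
from collections import defaultdict
from typing import Sequence


def within_slope(groups: Sequence[str],
                 x: Sequence[float],
                 y: Sequence[float]) -> float:
    """Fixed-effects (within) slope for one regressor: demean by group, then OLS."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_x, sum_y, n]
    for g, xi, yi in zip(groups, x, y):
        sums[g][0] += xi
        sums[g][1] += yi
        sums[g][2] += 1

    num = 0.0
    den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sum_x, sum_y, n = sums[g]
        x_dev = xi - sum_x / n  # deviation from the country mean
        y_dev = yi - sum_y / n
        num += x_dev * y_dev
        den += x_dev * x_dev
    return num / den


# Hypothetical panel: two countries with different levels but the same slope of 2.
countries = ["A", "A", "A", "B", "B", "B"]
x_vals = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y_vals = [12.0, 14.0, 16.0, 2.0, 4.0, 6.0]
beta = within_slope(countries, x_vals, y_vals)
```

Demeaning removes each country's time-invariant level, so the slope reflects only variation over time inside a country.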

The study integrates ICT4D, socio-technical systems theory, and the capability approach as its theoretical framing.

Methodological/theoretical statement in the paper describing the integrative framework used for analysis.

high null result Compressed professionalization in informal economies: a soci... theoretical_integration

While grounded in the DRC, the findings offer broader insights into AI adoption dynamics across informal economies in Sub-Saharan Africa and beyond.

Authors' claim of broader relevance/generalizability based on the DRC case study and theoretical framing.

high null result Compressed professionalization in informal economies: a soci... generalizability of findings to informal economies in Sub-Saharan Africa and bey...

AI adoption in the DRC emerges through hybrid socio-technical interactions between bottom-up youth innovation and weakly coordinated institutional frameworks, rather than following policy-led or infrastructure-first trajectories.

Theoretical integration (ICT4D, socio-technical systems, capability approach) and qualitative interview evidence used to characterize observed adoption pathways.

high null result Compressed professionalization in informal economies: a soci... adoption pathways (hybrid socio-technical, bottom-up)

The article introduces 'compressed professionalization', defined as the accelerated acquisition and immediate market enactment of professional-level digital capabilities outside formal institutional pathways.

Conceptual/theoretical contribution presented and defined in the paper, supported by illustrative field observations from the interviews.

high null result Compressed professionalization in informal economies: a soci... compressed_professionalization (conceptual construct)

The study drew on 125 semi-structured interviews conducted in Kinshasa, Lubumbashi, and Goma.

Primary qualitative fieldwork reported in the paper: 125 semi-structured interviews across three DRC cities (Kinshasa, Lubumbashi, Goma).

high null result Compressed professionalization in informal economies: a soci... number_of_interviews

We scored over 2.1 million twin responses on 500 participants and 183 held-out questions.

Reported evaluation counts in the paper: 2.1M responses, 500 participants, 183 held-out questions.

high null result Synthetic Personalities: How Well Can LLMs Mimic Individual ... number of evaluated twin responses / evaluation scale

The construction-method grid covers three open-weight LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes.

Paper's experimental design specification (methods section).

high null result Synthetic Personalities: How Well Can LLMs Mimic Individual ... experimental factorization of model types, information depths, embedding methods...

We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a 3 × 5 × 2 × 2 construction-method grid.

Methodological description of the study: experimental construction and evaluation on SOEP data.

high null result Synthetic Personalities: How Well Can LLMs Mimic Individual ... feasibility of constructing and evaluating detailed individual-level twins from ...
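The 3 × 5 × 2 × 2 grid described in the two entries above can be enumerated as a Cartesian product. The labels below are placeholders, since the excerpt does not name the models, depths, embedding methods, or reasoning modes.

```python
from itertools import product

# Placeholder labels: the excerpt gives only the number of levels per factor.
llms = ["llm_1", "llm_2", "llm_3"]
info_depths = ["depth_1", "depth_2", "depth_3", "depth_4", "depth_5"]
embedding_methods = ["embedding_1", "embedding_2"]
reasoning_modes = ["mode_1", "mode_2"]

# One tuple per construction-method cell.
grid = list(product(llms, info_depths, embedding_methods, reasoning_modes))
n_cells = len(grid)  # 3 * 5 * 2 * 2
```

Under this reading the grid has 60 construction-method cells.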

There is no evidence of improved win rates for AI-flagged complaints; AI-flagged complaints are more likely to be dismissed and to terminate at earlier procedural phases.

Outcome analysis linking AI-flag status to litigation outcomes (win rates, dismissal rates, termination phase) using case metadata.

high null result The New Pro Se: Generative AI and the Surge in Federal Civil... win rate; dismissal rate; procedural termination phase

The empirical analysis is based on panel data of new energy vehicle firms in the Yangtze River Delta from 2001 to 2023.

Dataset description provided in the paper's abstract/introduction indicating the time span and regional coverage.

high null result Mechanisms and Effects of Artificial Intelligence on New Qua... dataset/time coverage

R&D expenditure does not constitute a significant mediating channel between artificial intelligence and firms' new quality productive forces.

Mediation analysis using the panel data and constructed indicators; reported nonsignificant mediation effect of R&D expenditure (no sample size or statistics reported in excerpt).

high null result Mechanisms and Effects of Artificial Intelligence on New Qua... new quality productive forces (mediating role of R&D expenditure)

The study developed a manufacturing value chain resilience (MVCR) index system based on three dimensions: Readiness, Response, and Recovery, using the CSMAR database.

Methodological description: construction of MVCR index using CSMAR microdata and a three-dimension framework (Readiness, Response, Recovery).

high null result Industrial Robot Application and the Manufacturing Value Cha... manufacturing value chain resilience (MVCR) index

The study constructed indices of industrial robot application at the enterprise-industry-year level by matching industry-level industrial robot data published by the IFR with microdata from Chinese A-share listed companies.

Methodological description in the paper: matching IFR industry-level industrial robot data to microdata from Chinese A-share listed firms to build enterprise-industry-year robot-application indices.

high null result Industrial Robot Application and the Manufacturing Value Cha... index of industrial robot application (enterprise-industry-year)

The study uses listed companies in China's manufacturing industry from 2010 to 2023 as the research sample.

Authors explicitly state the empirical sample: listed manufacturing firms in China covering 2010–2023.

high null result Big data technology application and carbon emission efficien... research sample/time period (data description)

The positive relationship between BDTA and CEE remains robust after a series of robustness tests and endogeneity tests.

Authors state they conducted robustness checks and endogeneity tests (unspecified in the summary) and report that the main regression results remain robust.

high null result Big data technology application and carbon emission efficien... carbon emission efficiency (CEE) (robustness of main effect)

Brain privacy has both personal and social attributes; its protection therefore implicates individual interests and technological development.

Normative/legal argumentation and conceptual analysis presented in the paper (no empirical data reported).

high null result Empowerment or behavioral regulation? governing brain–comput... scope of brain-privacy (personal vs. social) and implicated interests

Greater frontier-level compute does not consistently translate to better performance.

Empirical observation in the paper's findings: increasing compute capacity at the Pareto frontier did not uniformly improve task performance across evaluated tasks.

high null result When Cloud Agents Meet Device Agents: Lessons from Hybrid Mu... task performance as a function of available compute at the frontier

The audit samples 2,000 runs over a design space of 10 personas × 8 prompts × 3 model configurations × N = 10 reps, with the two OpenAI cells at full 8-prompt coverage and the Anthropic sonnet-4.6 / low cell at 4-prompt coverage.

Stated audit design and sample counts in paper (method section describing factorial design and coverage of model/prompt cells).

high null result Persona Conditioning of Brand Recommendations in Retrieval-A... audit sample size and experimental design coverage
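The stated run count can be checked with a few lines of arithmetic. The sketch assumes that every persona-prompt pair in a covered cell receives all 10 repetitions; the cell labels are placeholders.

```python
PERSONAS = 10
REPS = 10
FULL_PROMPTS = 8

# Prompts covered per model configuration (labels are placeholders).
prompts_per_cell = {
    "openai_cell_1": 8,
    "openai_cell_2": 8,
    "anthropic_sonnet_4_6_low": 4,
}

runs_per_cell = {cell: PERSONAS * n * REPS for cell, n in prompts_per_cell.items()}
total_runs = sum(runs_per_cell.values())
full_grid_runs = PERSONAS * FULL_PROMPTS * len(prompts_per_cell) * REPS
```

Under this assumption the two fully covered cells contribute 800 runs each and the half-covered cell 400, giving the reported 2,000 runs against 2,400 for the full design space.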

The paper evaluates the proposed architecture using the outcome metric 'time-to-insight'.

Methodological statement in the paper listing evaluation metrics.

high null result Beyond the Data Mesh Illusion: Designing Modern AI-augmented... time-to-insight (time required to generate actionable insight from data)

The paper evaluates the proposed architecture using the outcome metric 'time-to-find'.

Methodological statement in the paper listing evaluation metrics.

high null result Beyond the Data Mesh Illusion: Designing Modern AI-augmented... time-to-find (time required to locate relevant data/products)

The paper evaluates the proposed architecture using the outcome metric 'data product adoption'.

Methodological statement in the paper listing evaluation metrics.

high null result Beyond the Data Mesh Illusion: Designing Modern AI-augmented... data product adoption

We ran 24 matches pairing 23 expert humans with 16 AI agents, capturing 387 delegation and 1440 adoption decisions.

Author-reported experimental setup and counts from the study (24 matches; 23 human experts; 16 AI agents; counts of delegation and adoption decisions).

high null result AI, Take the Wheel: What Drives Delegation and Trust in Huma... delegation and adoption decisions

Because all observations come from a single practitioner, the inferential statistics are exploratory and hypothesis-generating rather than confirmatory; portability across the full portfolio awaits multi-practitioner replication.

Explicit limitation stated in the paper about the single-practitioner design and its implications for inference.

high null result Augment Engineering: A Methodology for Multi-Tool AI Orchest... generalizability/replicability of the findings

The framework is illustrated with an accounts-payable simulation and a companion spreadsheet.

Empirical illustration: the paper includes, or is accompanied by, an accounts-payable simulation and a spreadsheet that demonstrate the model and estimation approach.

high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... practical illustration of framework through accounts-payable simulation and spre...

The note starts from a compact dashboard expression, expands it into a fuller structural model, defines all variables and parameters, and shows how each cost category can be estimated from operational data.

Methodological description in the paper: construction of dashboard, expansion to structural model, full variable/parameter definitions, and stated procedures for estimating cost categories from operational data; accompanied by worked examples.

high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... methodological capacity to estimate agentic costs from operational data

Agentic Technical Debt is a stock of accumulated design and governance liability.

Definition provided in the paper as part of the conceptual framework that labels Agentic Technical Debt as a stock (accumulated) liability tied to design and governance.

high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... conceptual characterization of Agentic Technical Debt (stock of design and gover...

This note develops a formal and managerially usable model that distinguishes Agentic Technical Debt from Stochastic Tax.

Author states development of a formal, managerially usable model and explicit distinction between the two constructs; supported by model construction in the paper (structural model and dashboard).

high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... ability to distinguish Agentic Technical Debt from Stochastic Tax via a formal m...

Agentic AI systems combine probabilistic reasoning with delegated action through tools, context, memory, orchestration, and external workflow integration.

Conceptual/definitional statement in the paper; presented as the working characterization of 'Agentic AI systems' within the model specification.

high null result Modeling Agentic Technical Debt and Stochastic Tax: A Standa... structural composition of agentic AI systems (probabilistic reasoning + delegate...

The paper proposes a policy framework consisting of six groups of solutions for Vietnam to both promote AI development and control risks in the digital age.

Declared in abstract: the paper presents a six-group policy framework for Vietnam; the framework itself is the paper's output (proposal), not empirically tested in the paper.

high null result Regulatory Policy for the Agent Economy in the Digital Age: ... existence of a six-group policy framework aimed at promoting AI development and ...

This study employs document synthesis and comparative analysis of international policies.

Methodological statement in the paper abstract describing the research approach; no sample size specified beyond document sources.

high null result Regulatory Policy for the Agent Economy in the Digital Age: ... research method used (document synthesis and comparative policy analysis)

The rise of artificial intelligence (AI) is shaping a new Agent Economy (AE), in which autonomous AI agents represent humans in performing a wide range of complex tasks.

Statement in paper abstract/intro (conceptual definition); no empirical data or sample reported.

high null result Regulatory Policy for the Agent Economy in the Digital Age: ... existence/definition of Agent Economy (autonomous AI agents representing humans ...

The study contributes a taxonomy of AI workforce impact, a Workforce Resilience Readiness Score (WRRS), an AI Workforce Trust Index (AWTI), an Ethical Automation Boundary concept, and a pilot empirical validation design.

Declared methodological and conceptual contributions in the paper (these are presented as deliverables of the study; no validated results reported in the excerpt).

high null result From Automation Panic to Workforce Resilience: A Governance ... new measurement/conceptual tools (taxonomy, WRRS, AWTI, Ethical Automation Bound...

The International Labour Organization's 2025 update highlights the need to assess exposure to generative AI at the task level using task data, expert input, and AI model predictions.

Reference to ILO 2025 update recommendation described in the paper (policy/technical guidance rather than primary empirical data in the excerpt).

high null result From Automation Panic to Workforce Resilience: A Governance ... recommended assessment methods for AI exposure (task-level approach)

A path analysis was used to trace structural relationships between HR quality, effectiveness perceptions, and AI readiness.

Paper reports a path analysis linking composite HR quality indices, perceived HR effectiveness, and AI readiness measures; uses same survey sample.

high null result Determinants of Artificial Intelligence Adoption in Public S... AI readiness and perceived HR effectiveness

A binary logistic regression modelling active AI adoption was estimated with McFadden R² = 0.032.

Reported logistic regression model fit (McFadden R² = 0.032) for AI adoption outcome using the survey data.

high null result Determinants of Artificial Intelligence Adoption in Public S... active AI adoption (binary)
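For readers unfamiliar with the fit statistic, McFadden's R² compares the log-likelihood of the fitted model with that of an intercept-only model: R² = 1 − ln L(model) / ln L(null). The sketch below computes it from binary outcomes and predicted probabilities; the data are made up for illustration and are unrelated to the survey.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y: Sequence[int], p: Sequence[float]) -> float:
    """McFadden pseudo-R²: 1 - lnL(model) / lnL(intercept-only model)."""
    base_rate = sum(y) / len(y)  # intercept-only model predicts the sample mean
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


# Hypothetical outcomes with an uninformative and an informative set of predictions.
outcomes = [1, 0, 1, 0]
r2_uninformative = mcfadden_r2(outcomes, [0.5, 0.5, 0.5, 0.5])
r2_informative = mcfadden_r2(outcomes, [0.9, 0.1, 0.9, 0.1])
```

A value of 0.032, as reported in the entry, means the fitted model's log-likelihood is only about 3% better than the intercept-only baseline.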

An OLS regression explaining perceived HR effectiveness was estimated, with R² = 0.446.

Reported OLS model fit statistics in the paper (R-squared = 0.446); model explains perceived HR effectiveness using survey data.

high null result Determinants of Artificial Intelligence Adoption in Public S... perceived HR effectiveness

The study constructed and validated a composite index of external HR quality factors with Cronbach's α = 0.959.

Measurement validation reported in the paper; Cronbach's alpha reported for external HR factors.

high null result Determinants of Artificial Intelligence Adoption in Public S... external HR quality index reliability

The study constructed and validated a composite index of internal HR quality factors with Cronbach's α = 0.924.

Measurement validation reported in the paper; Cronbach's alpha reported for internal HR factors.

high null result Determinants of Artificial Intelligence Adoption in Public S... internal HR quality index reliability
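The reliability coefficient reported for both HR quality indices follows the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed score), where k is the number of items. A minimal sketch, using made-up item scores rather than the survey data:

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `items` holds one sequence of respondent scores per item."""
    k = len(items)
    totals: List[float] = [sum(scores) for scores in zip(*items)]  # per-respondent sum
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical data: three items answered by four respondents.
identical_items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
alpha_identical = cronbach_alpha(identical_items)

noisier_items = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]
alpha_noisier = cronbach_alpha(noisier_items)
```

Perfectly consistent items give α = 1, and less consistent items give a lower value, so coefficients above 0.9 indicate items that move closely together.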

A large-scale empirical survey of 12,562 public servants was conducted in June 2025 in Kazakhstan.

Statement in paper specifying survey sample and date; sample of public servants N = 12,562, June 2025.

high null result Determinants of Artificial Intelligence Adoption in Public S... AI adoption determinants (survey data collection)

Identification limits prevent a strict causal claim; the paper outlines an agenda for cleaner tests.

Authors' explicit caveat in the abstract noting limits to identification and stating they outline future cleaner tests.

high null result Coding Beyond Your Training: Claude Code and the Technologic... causal identification credibility / limitations

The analysis exploits the staggered rollout of Claude Code across GitHub between May 2025 and January 2026, using a panel of 5,838 developers observed monthly over 28 months. Treatment is defined by a developer's first Claude-co-authored commit, not-yet-treated developers serve as controls, and estimates are obtained via the doubly robust Callaway and Sant'Anna (2021) estimator.

Methods and data description as stated in the abstract: staggered rollout timing, sample size (5,838), observation window (28 months), treatment definition (first Claude-co-authored commit), estimator (Callaway & Sant'Anna 2021).

high null result Coding Beyond Your Training: Claude Code and the Technologic... study design / identification strategy
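The core comparison behind a not-yet-treated design can be sketched as a group-time difference-in-differences: for developers first treated in period g, compare their outcome change from period g − 1 to period t with the same change among developers not yet treated by t. The sketch below is the plain unconditional version without covariates, not the doubly robust estimator the paper uses. The panel is made up, and treating developers with no treatment date as valid controls is an assumption.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# developer -> (first treated period, or None if never treated; outcomes by period)
Panel = Dict[str, Tuple[Optional[int], List[float]]]


def att_gt(panel: Panel, g: int, t: int) -> float:
    """Unconditional group-time effect for cohort g at period t (requires t >= g >= 1)."""
    base = g - 1  # last period before cohort g is treated
    treated = [y[t] - y[base] for first, y in panel.values() if first == g]
    # Controls: not yet treated by period t (never-treated included, by assumption).
    controls = [y[t] - y[base] for first, y in panel.values()
                if first is None or first > t]
    return mean(treated) - mean(controls)


# Hypothetical panel with four developers observed over periods 0..3.
toy_panel: Panel = {
    "dev_a": (2, [1.0, 1.0, 4.0, 4.0]),
    "dev_b": (2, [2.0, 2.0, 5.0, 5.0]),
    "dev_c": (None, [0.0, 1.0, 2.0, 3.0]),
    "dev_d": (3, [1.0, 2.0, 3.0, 9.0]),
}
effect = att_gt(toy_panel, g=2, t=2)
```

In the toy panel the treated cohort's outcome rises by 3 between periods 1 and 2, the not-yet-treated developers' outcome rises by 1, and the estimated effect is therefore 2.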

Results are robust to two stricter activity filters.

Robustness checks reported in the paper applying two stricter activity filters to the sample; claim refers to consistency of estimated effects under these alternate sample definitions.

high null result Coding Beyond Your Training: Claude Code and the Technologic... sensitivity/robustness of estimated treatment effects to stricter activity filte...

The analysis is structured across past, present, and future phases using an integrative socio-technical political economy framework and validated secondary sources (OECD, ILO, UNDP, WTO, WEF) alongside official Indian statistics and sector evidence.

Methodological claim stated in abstract describing the approach and data sources used in the paper (OECD, ILO, UNDP, WTO, WEF, MoSPI/NSO, PLFS, HCES, Reuters, Nasscom).

high null result ARTIFICIAL INTELLIGENCE, INEQUALITIES OF KNOWLEDGE AND RESOU... methodological approach and data sources

We analyzed over 1.5M assets and 128K agents in EvoMap.

Descriptive dataset statement in the paper reporting the scope of the empirical analysis (assets and agents counts).

high null result Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent... dataset_size

We conducted a global large-scale randomized field experiment, delivering customized LLM-generated feedback for over 31,000 arXiv preprints across 150 fields and more than 45,000 researchers from 133 geographic regions.

Statement in paper describing experimental design and scale: randomized field experiment; sample described as >31,000 preprints, >45,000 researchers, 150 fields, 133 regions.

high null result Human-AI Collaboration in Science at Scale: A Global Large-s... n/a (description of experimental sample and coverage)
