Evidence (8625 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	761	200	101	904	2020
Governance & Regulation	829	400	191	122	1566
Organizational Efficiency	784	193	125	84	1197
Technology Adoption Rate	637	236	124	97	1103
Research Productivity	431	131	58	340	972
Output Quality	481	183	59	47	770
Decision Quality	332	177	82	49	647
Firm Productivity	439	57	88	20	610
AI Safety & Ethics	218	279	66	33	602
Market Structure	181	170	123	24	503
Task Allocation	214	64	72	33	388
Skill Acquisition	174	62	62	17	315
Innovation Output	204	27	45	18	295
Employment Level	105	54	108	13	282
Fiscal & Macroeconomic	132	69	43	26	277
Consumer Welfare	117	63	42	11	233
Firm Revenue	154	48	26	3	231
Task Completion Time	173	31	8	12	225
Inequality Measures	44	123	50	6	223
Worker Satisfaction	89	65	22	12	188
Error Rate	71	92	10	2	175
Regulatory Compliance	77	69	14	5	165
Automation Exposure	58	56	26	13	156
Training Effectiveness	96	21	14	19	152
Wages & Compensation	77	37	25	6	145
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	81	21	1	115
Hiring & Recruitment	52	7	8	3	70
Creative Output	32	20	8	3	64
Skill Obsolescence	5	47	6	1	59
Social Protection	28	16	8	2	54
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Adoption

SAFI measures LLM performance on text-based representations of skills, not full occupational execution.

Methodological caveat stated by the authors clarifying the scope and limits of SAFI.

high neutral The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... scope of SAFI measure (text-based representations vs full job execution)

We propose an AI Impact Matrix that positions skills into four quadrants: High Displacement Risk, Upskilling Required, AI-Augmented, and Lower Displacement Risk.

Conceptual/interpretive framework introduced by the authors; described in text as proposed by the paper.

high neutral The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... interpretive classification of skills into four impact quadrants

Using a strictly algorithmic baseline (mathematical bottleneck aggregation), we calculate Relative Occupational Automation Indices (OAI) for the U.S. labor market based on the DWA-level scores.

Method and calculation claim: algorithmic baseline aggregation applied across the 923 occupations / 2,087 DWAs to produce OAIs mapped to the U.S. labor market. Specific aggregation formula referenced but not numerically detailed in the excerpt.

high neutral Bounded by Risk, Not Capability: Quantifying AI Occupational... Relative Occupational Automation Index (OAI)
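The excerpt does not give the aggregation formula. The sketch below shows one possible reading of "bottleneck aggregation", assuming an occupation is only as automatable as its least automatable DWA (a minimum over DWA scores) and that "relative" means scaling by the highest-scoring occupation. Both rules, the function names, and the example scores are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List


def bottleneck_score(dwa_scores: List[float]) -> float:
    """Assumed bottleneck rule: the least automatable DWA limits the occupation."""
    if not dwa_scores:
        raise ValueError("an occupation needs at least one DWA score")
    return min(dwa_scores)


def relative_oai(occupations: Dict[str, List[float]]) -> Dict[str, float]:
    """Scale each bottleneck score by the highest one (assumed meaning of 'relative')."""
    raw = {name: bottleneck_score(scores) for name, scores in occupations.items()}
    top = max(raw.values())
    if top == 0:
        return {name: 0.0 for name in raw}
    return {name: value / top for name, value in raw.items()}


# Hypothetical DWA-level automation scores in [0, 1]; not taken from the paper.
example = {"clerk": [0.9, 0.8, 0.6], "nurse": [0.7, 0.2, 0.5]}
oai = relative_oai(example)
print(oai)
```

Under a minimum rule, a single low-scoring activity caps the whole occupation, which is the behaviour the word "bottleneck" suggests. A mean would let highly automatable activities offset it.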

We deconstructed 923 occupations into 2,087 Detailed Work Activities (DWAs).

Explicit data processing claim in the paper: mapping of 923 occupations to 2,087 DWAs for analysis.

high neutral Bounded by Risk, Not Capability: Quantifying AI Occupational... coverage of occupations and DWAs used for analysis

The economic model for IASCA follows the FDA's PDUFA precedent, with progressive certification fees representing 0.1-1% of model training costs.

Proposal specifies that IASCA's funding would mirror the FDA PDUFA model and states a fee range of 0.1–1% of model training costs; this is an asserted financing mechanism, not empirically validated in the excerpt.

high neutral IASCA: The International AI Safety Certification Authority —... progressive certification fees equal to 0.1-1% of model training costs

IASCA is modelled after existing international and national regulatory bodies such as the IAEA, FAA, and FDA.

Proposal explicitly states IASCA is modelled after the IAEA, FAA, and FDA; this is an analogy/organizational design claim rather than an empirical finding.

high neutral IASCA: The International AI Safety Certification Authority —... institutional design modeled on IAEA/FAA/FDA

A life insurance system integrated into an industry partner mobile app was tested in two experiments.

Paper reports two experiments running the ARQuest-enabled life insurance system inside a partner mobile app; experimental setup is stated though sample sizes are not provided in the excerpt.

high neutral AI in Insurance: Adaptive Questionnaires for Improved Risk P... experimental evaluation of system in partner app

BCR is a minimalist, single-stage training paradigm that trains the model to solve N problems simultaneously within a shared context window, rewarded purely by per-instance accuracy.

Methodological description presented in the paper describing the training procedure and objective (single-stage, per-instance accuracy reward, N-problem batching in shared context).

high neutral Batched Contextual Reinforcement: A Task-Scaling Law for Eff... training paradigm characteristics (simplicity, stage count, reward structure)
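As a rough illustration of a reward based purely on per-instance accuracy, the sketch below scores a batch of N answers produced in one shared context. It assumes exact-match grading and a reward equal to the fraction of correct answers. The grading rule and the `batch_reward` name are hypothetical and are not described in the excerpt.

```python
from typing import List


def batch_reward(predictions: List[str], references: List[str]) -> float:
    """Fraction of the N batched problems answered correctly (assumed exact match)."""
    if len(predictions) != len(references):
        raise ValueError("one prediction is required per problem")
    if not references:
        raise ValueError("the batch must contain at least one problem")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Three problems solved in one context; the second answer is wrong.
reward = batch_reward(["4", "9", "12"], ["4", "10", "12"])
print(reward)
```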

The framework is calibrated with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, and implemented in computer vision.

Calibration and empirical implementation using O*NET, a domain expert survey (n=3,778), and GPT-4o task decompositions; applied to computer vision tasks.

high neutral Economics of Human and AI Collaboration: When is Partial Aut... validity of calibration / empirical grounding of the framework

We introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level.

New metric proposed in the paper (entropy-based task complexity) and mapping procedure from accuracy to substitution ratio; implemented in the framework.

high neutral Economics of Human and AI Collaboration: When is Partial Aut... labor substitution ratio (human labor displaced per unit accuracy)
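The excerpt does not define the entropy measure or the mapping from accuracy to the substitution ratio. The sketch below covers only a plausible building block: Shannon entropy, in bits, of a task's outcome distribution, on the assumption that complexity is measured this way. The function name and the example distributions are illustrative. The paper's substitution-ratio mapping is not reproduced here.

```python
import math
from typing import Sequence


def shannon_entropy_bits(probabilities: Sequence[float]) -> float:
    """Shannon entropy H = -sum(p * log2 p) of a discrete distribution, in bits."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing, by the usual convention.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# A task with four equally likely outcomes versus one dominated by a single outcome.
uniform_four = shannon_entropy_bits([0.25, 0.25, 0.25, 0.25])
skewed = shannon_entropy_bits([0.97, 0.01, 0.01, 0.01])
print(uniform_four, skewed)
```

With this measure, a task whose outcome is nearly always the same scores close to zero, while a task with many equally likely outcomes scores highest.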

Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles.

Description of experimental/evaluation setup in the paper: macroscopic evaluation via Fundamental Diagram across varied scenario parameters. No numeric sample size provided in the claim text.

high neutral Macroscopic Characteristics of Mixed Traffic Flow with Deep ... traffic performance (via Fundamental Diagram) under varied heterogeneity and RL ...

CriQ is a sister app to Dream11, India's largest fantasy sports platform with over 250 million users.

Descriptive statement in the paper providing context about the application domain and user base.

high neutral Schema on the Inside: A Two-Phase Fine-Tuning Method for Hig... user base size

In the near term, the most plausible equilibrium is bounded autonomy, in which AI agents operate as supervised co-pilots, monitoring systems, and constrained execution modules embedded within human decision processes.

Theoretical argument and forward-looking assessment by the authors based on the proposed framework and plausibility considerations; not presented as the result of a causal empirical study in the excerpt.

high neutral AI Agents in Financial Markets: Architecture, Applications, ... expected equilibrium mode of AI agent autonomy in finance (bounded autonomy / su...

Economic evaluations of GLAI should account for end-to-end risk externalities (error propagation, institutional trust, rights impacts), not only short-term productivity gains.

Methodological recommendation grounded in conceptual synthesis of technical, behavioral, and legal risks; normative argument rather than empirical result.

high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... comprehensiveness of economic evaluations (inclusion of externalities vs. narrow...

Generative Legal AI (GLAI) systems are built on token-prediction (LLM) architectures rather than formal legal-reasoning architectures.

Conceptual and technical analysis in the paper distinguishing GLAI from other legal-tech; literature synthesis on common LLM architectures. No original empirical dataset or sample size—qualitative/technical review.

high neutral Why Avoid Generative Legal AI Systems? Hallucination, Overre... underlying model architecture type (token-prediction vs. formal-reasoning)

The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.

Formal mapping in the paper that treats prompts as shaping a prior over paths; supported by conceptual argument and illustrative examples.

high neutral Runtime Governance for AI Agents: Policies on Paths degree of control over execution path (distributional shaping vs. path-specific ...

Through a thematic review of existing research, the authors identified recurring themes about incentive schemes: their components, how researchers manipulate them, and their impact on research outcomes.

Authors' stated method and findings: thematic review (the scope and number of reviewed papers are not specified in the excerpt).

high neutral Incentive-Tuning: Understanding and Designing Incentives for... themes in incentive design practices and reported impacts on empirical study out...

A critical aspect of conducting human–AI decision-making studies is the role of participants, often recruited through crowdsourcing platforms.

Claim based on the authors' thematic literature review noting participant sourcing practices (specific studies and counts not given in excerpt).

high neutral Incentive-Tuning: Understanding and Designing Incentives for... participant recruitment source (e.g., crowdsourcing) and its influence on study ...

Researchers conduct empirical studies investigating how humans use AI assistance for decision-making and how this collaboration impacts results.

Statement summarizing the research landscape; supported implicitly by the authors' thematic review of existing empirical studies (number of studies not specified in excerpt).

high neutral Incentive-Tuning: Understanding and Designing Incentives for... human behaviour and decision outcomes when assisted by AI (empirical study outco...

The study provides empirical evidence specific to a small open EU economy (Slovakia) on the relationship between AI adoption and labour productivity.

Use of harmonised Eurostat enterprise and productivity data for Slovakia and EU27 over 2021–2024, analysed with descriptive statistics, gap analysis, dynamics of change, correlation, and an illustrative regression model.

high neutral Artificial Intelligence Adoption and Labour Productivity in ... Empirical characterization of AI adoption and labour productivity relationship f...

Returns to AI are heterogeneous across firms; estimating treatment effects requires attention to selection, complementarities, and dynamic adoption pipelines.

Methodological argument referencing treatment-effect literature and observed firm heterogeneity; supported by conceptual examples rather than a single empirical treatment-effect estimate.

high neutral Modern Management in the Age of Artificial Intelligence: Str... heterogeneity in returns to AI adoption (firm-level productivity or performance ...

The study uses a qualitative, mixed-methods design combining a systematic literature review, secondary evidence from an industry MRO digital survey, five semi-structured expert interviews, and two technical case studies (neural networks for aircraft retirement and an AI-based digital twin for a Power Electronics Cooling System).

Methods description provided in the paper (explicit counts: 5 interviews, 2 case studies); method = author-reported study design.

high null result Aviation 4.0: the impacts of digital transformation on the a... study design and methods employed

After screening, 35 studies were included in the thematic synthesis and supplemented by official regulatory and industry documents.

Review screening result reported in the paper: number of included studies = 35; supplementation by regulatory and industry documents stated.

high null result Artificial Intelligence-Driven Optimization in Pharmacy Inve... number of included studies and supplementary documents

A structured search protocol was designed for Scopus, Web of Science, PubMed, IEEE Xplore, and Google Scholar covering January 2016 to May 2026, English-language records only.

Methods statement in the review describing the databases, date range, and language restriction used for the systematic search.

high null result Artificial Intelligence-Driven Optimization in Pharmacy Inve... search protocol (databases, date range, language)

The implementation literature on AI for pharmacy inventory and pharmaceutical supply chains remains dispersed across pharmacy operations, operations research, health informatics, and supply chain analytics.

The review's thematic synthesis of the searched literature (review methods described below) identified studies across these disciplinary areas.

high null result Artificial Intelligence-Driven Optimization in Pharmacy Inve... disciplinary distribution of implementation literature

Specification, reference implementation, conformance suite, and worked examples are available at: https://github.com/BrightbeamAI/chap

Claim of artifact availability hosted on GitHub (URL provided) as part of the paper's resources.

high null result Collaborative Human-Agent Protocol (CHAP) availability of specification and accompanying artifacts

Two protocol standards address adjacent concerns: MCP standardises agent access to tools and data, and A2A standardises agent-to-agent interoperability.

Factual claim referencing existing standards (MCP and A2A) and their scopes; no citations or supporting documentation included in the provided excerpt.

high null result Collaborative Human-Agent Protocol (CHAP) scope of existing protocol standards

Production deployments are no longer one human supervising one model; they are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries.

Stated as a general characterization of modern production deployments; no quantitative data or case counts provided in the excerpt.

high null result Collaborative Human-Agent Protocol (CHAP) structure of production deployments (multi-human, multi-agent)

The six middle macros form a low-contrast band between the poles; equivalence testing (TOST at d = 0.2) admits only 1 out of 15 macro-pair comparisons as equivalent.

Authors' analysis of pairwise macro comparisons using Two One-Sided Tests (TOST) for equivalence at Cohen's d = 0.2.

high null result Stable Geometry, Reversing Poles: The Bipolar Structure of A... pairwise equivalence among middle macros (TOST results)
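For context, six macros give 6 choose 2 = 15 pairwise comparisons. The sketch below shows how a Two One-Sided Tests (TOST) equivalence check with bounds of ±0.2 pooled standard deviations might look. It uses a large-sample normal approximation instead of the t distribution, an alpha of 0.05, and synthetic data. These are all assumptions; the excerpt does not state the authors' exact procedure.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence


def tost_equivalent(a: Sequence[float], b: Sequence[float],
                    d_bound: float = 0.2, alpha: float = 0.05) -> bool:
    """Return True if both one-sided tests reject, i.e. |d| is credibly below d_bound."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    pooled_sd = sqrt(pooled_var)
    if pooled_sd == 0:
        raise ValueError("samples have zero variance")
    se = pooled_sd * sqrt(1 / n_a + 1 / n_b)
    diff = mean(a) - mean(b)
    margin = d_bound * pooled_sd  # equivalence bound expressed in raw units
    z = NormalDist()  # normal approximation; a t distribution is more exact for small n
    p_lower = 1 - z.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)      # H0: diff >= +margin
    return max(p_lower, p_upper) < alpha


# Synthetic samples: identical groups versus a group shifted by 5 units.
base = [float(i % 10) for i in range(1000)]
same = tost_equivalent(base, list(base))
shifted = tost_equivalent(base, [x + 5 for x in base])
print(same, shifted)
```

Equivalence is declared only when both one-sided nulls are rejected, so a pair with a small but noisy difference is not counted as equivalent. This would be consistent with only 1 of 15 pairs passing even within a low-contrast band.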

We decomposed 1,961 O*NET Detailed Work Activities (DWAs) into 15,817 micro-actions using a multi-agent LLM pipeline with 31-expert human-in-the-loop (HITL) calibration.

Empirical method reported by the authors: automated multi-agent LLM pipeline plus 31-expert HITL calibration producing the stated counts (1,961 DWAs -> 15,817 micro-actions).

high null result Stable Geometry, Reversing Poles: The Bipolar Structure of A... task decomposition (DWAs to micro-actions)

Empirical research since Frey and Osborne (2017) has converged on a continuous-gradient representation in which each occupation is assigned a real-valued exposure score on [0,1] obtained by linear aggregation across capability dimensions.

Literature synthesis / statement in the paper referencing Frey and Osborne (2017) and subsequent empirical work using continuous exposure scores.

high null result Stable Geometry, Reversing Poles: The Bipolar Structure of A... use of continuous-gradient occupational exposure scores (OAI-style representatio...

The findings provide empirical insights for managing employee wellbeing and refining human resource strategies during organizational digital transformation.

Authors' stated implications in the discussion, based on the reported empirical associations and moderation results from the survey of 411 employees.

high null result The impact of artificial intelligence application on employe... managerial implications for employee wellbeing and HR strategies

The study draws on the Conservation of Resources Theory and the Cognitive Appraisal Theory of Stress to explain how AI application influences employees' job insecurity via resource gain and resource threat mechanisms.

Theoretical framing stated in the introduction and discussion explaining the mechanisms (resource gain vs. resource threat) underlying the observed U-shaped association.

high null result The impact of artificial intelligence application on employe... theoretical explanation of mechanisms behind job insecurity

Data were collected via mixed online and offline questionnaires: 453 questionnaires were distributed (242 online, 211 offline); 449 were returned (242 online, 207 offline); following validity screening, 411 valid questionnaires were retained (219 online, 192 offline), yielding an effective response rate of 90.73%.

Reported survey administration and response counts provided in the methods section of the paper.

high null result The impact of artificial intelligence application on employe... survey response / valid sample size / response rate
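The reported counts can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above and assumes the effective response rate is valid questionnaires divided by distributed questionnaires, which reproduces the stated 90.73%.

```python
# Counts as reported in the claim, split by collection channel.
distributed = {"online": 242, "offline": 211}
returned = {"online": 242, "offline": 207}
valid = {"online": 219, "offline": 192}

total_distributed = sum(distributed.values())  # 453
total_returned = sum(returned.values())        # 449
total_valid = sum(valid.values())              # 411

# Assumed definition: valid questionnaires as a share of those distributed.
effective_rate = round(100 * total_valid / total_distributed, 2)
print(total_distributed, total_returned, total_valid, effective_rate)
# prints: 453 449 411 90.73
```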

The paper proposes a five-pillar diagnostic framework combining fundamental valuation, residual-exuberance tests, SADF/GSADF explosive-root procedures, LPPL/HLPPL price-pattern diagnostics, sentiment and issuance measures, and capex-payback analysis.

Methodological proposal presented in the paper (framework description); this is a stated contribution rather than an empirical result.

high null result Boom, Bubble, or Buildout? A Multi-Method Evaluation of Whet... diagnostic framework components for bubble assessment

From Codeforces histories we build an AI-prompt signature characterised by more first-attempt acceptances and fewer attempts and retries, consistent with AI-assisted practice.

Empirical construction from Codeforces (CF) submission histories (pattern: more first-attempt acceptances, fewer retries). Method: analysis of historical submission logs; sample size not stated in the abstract.

high null result When the Scaffold Stays On: AI, Practice Style, and Screenin... submission patterns (first-attempt acceptances, attempts, retries)

The International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all.

Descriptive factual claim about contest rules and formats (institutional description in paper); based on contest rules and organizational formats referenced by authors.

high null result When the Scaffold Stays On: AI, Practice Style, and Screenin... institutional design (proctoring and entry requirements)

We evaluate the system on operator feedback and a question set collected from production usage, graded by human and automated panels.

Paper's stated evaluation methodology: operator feedback + production question set, graded by humans and automated panels.

high null result Archi: Agentic Operations at the CMS Experiment evaluation methodology (feedback and graded question set)

There is a need to examine the impacts of LLMs on workers in jobs where the technology is prominent.

Recommendation in the paper's conclusion based on the observed concentration of LLM exposure in lower-precarity occupations.

high null result Large language model exposure and precarious occupations: Un... research/policy need (recommendation)

These occupations (those with higher LLM exposure and lower precariousness) have previously been sheltered from technological change.

Statement in the paper's conclusion asserting that occupations with higher LLM exposure are ones historically sheltered from technological change (no specific empirical evidence provided in abstract).

high null result Large language model exposure and precarious occupations: Un... historical exposure to technological change (asserted)

The study used Canada's Labour Force Survey, developed a multidimensional index summarizing occupational exposure to precarity (contractual instability, earnings inadequacy, schedule unpredictability, working-time mismatch), and estimated associations using four multivariate linear regression models with cluster-robust standard errors plus a fifth model for the multidimensional index.

Methods description in abstract specifying data source (Canada's Labour Force Survey), index construction, and multivariate linear regression models with cluster-robust standard errors.

high null result Large language model exposure and precarious occupations: Un... multidimensional precarity index / methodological approach

This study benchmarks Algeria’s readiness to adopt AI against Morocco, Egypt, and Turkey using data from the World Bank (2022), the Oxford Insights Government AI Readiness Index, and sector-specific studies.

Methodological statement in the paper specifying data sources used for the comparative assessment (World Bank 2022, Oxford Insights index, sector studies).

high null result Artificial Intelligence and Economic Productivity: A Compara... AI readiness / readiness indicators

The article aims to provide systematic literature support for subsequent research and adaptive policy formulation.

Statement of the paper's objective; a methodological and policy-intent claim from the authors.

high null result Influence of Artificial Intelligence in the Labor Market policy formulation support

This article is based on a systematic literature review and summarizes the four core theoretical mechanisms of substitution, complementarity, new task creation, and skill mismatch.

Methodological claim from the paper: the authors conducted a systematic literature review and identified these four theoretical mechanisms.

high null result Influence of Artificial Intelligence in the Labor Market theoretical mechanisms

Traditional software and agentic systems are distinct: in traditional software code is the carrier of decision logic, whereas in agentic systems code is ephemeral tooling used by an LLM-driven reasoning loop.

Formalization and conceptual definitions developed in the paper (first-principles formal distinction; no empirical sample size reported).

high null result The End of Software Engineering: How AI Agents Are Fundament... architectural role of code (carrier of logic vs ephemeral tool)

For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve.

Historical/descriptive claim presented in the paper's framing and literature review; citation of longstanding software engineering practices (qualitative, no empirical sample size reported).

high null result The End of Software Engineering: How AI Agents Are Fundament... software development practice (human-driven decomposition and static code mainte...

We implement a two-stage processing architecture separating document-level extraction (Stage 1) from claim-level synthesis (Stage 2).

Implementation description in paper: architecture design and pipeline stages described by the authors.

high null result Leveraging LLMs for Unstructured Claims Data Analysis system architecture (document-level vs claim-level processing)

The study introduces a methodological framework for evaluating LLM citation behaviors, integrating information retrieval theory, semantic search optimization, and structured content engineering.

Explicit claim about the paper's contribution: introduction of a methodological framework combining IR theory, semantic search, and structured content engineering. This is a factual statement about the paper's content (no sample size reported in excerpt).

high null result SEARCH ENGINE OPTIMIZATION: HOW LLM-GENERATED SUMMARIES ARE ... methodological framework for evaluating LLM citation behaviors

Traditional SEO strategies have historically focused on keyword density, backlink authority, and ranking positions within search engine results pages (SERPs).

Descriptive claim about historical SEO practices presented as background/context in the paper; based on domain knowledge and literature references (no new empirical data reported in the excerpt).

high null result SEARCH ENGINE OPTIMIZATION: HOW LLM-GENERATED SUMMARIES ARE ... features of historical SEO strategies (keyword density, backlink authority, SERP...

We extend the representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features.

Methodological extension described in paper (approach for device cold-start handled via cohort-based demographic embeddings).

high null result Bridging the Semantic-Collaborative Gap: An Asymmetric Graph... device cold-start embedding construction (cohort-based demographics)
