Evidence (8625 claims)
| Topic | Claims |
|---|---|
| Adoption | 8625 |
| Productivity | 7686 |
| Governance | 6917 |
| Human-AI Collaboration | 6574 |
| Org Design | 4189 |
| Innovation | 4131 |
| Labor Markets | 3588 |
| Skills & Training | 2985 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Adoption
SAFI measures LLM performance on text-based representations of skills, not full occupational execution.
Methodological caveat stated by the authors clarifying the scope and limits of SAFI.
We propose an AI Impact Matrix that positions skills into four quadrants: High Displacement Risk, Upskilling Required, AI-Augmented, and Lower Displacement Risk.
Conceptual/interpretive framework introduced by the authors; described in text as proposed by the paper.
Using a strictly algorithmic baseline (mathematical bottleneck aggregation), we calculate Relative Occupational Automation Indices (OAI) for the U.S. labor market based on the DWA-level scores.
Method and calculation claim: algorithmic baseline aggregation applied across the 923 occupations / 2,087 DWAs to produce OAIs mapped to the U.S. labor market. Specific aggregation formula referenced but not numerically detailed in the excerpt.
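As a rough illustration of what a bottleneck aggregation could look like, the sketch below takes an occupation's index to be the minimum of its DWA-level scores and then rescales each index relative to the highest-scoring occupation. The excerpt does not give the paper's formula, so the min rule, the rescaling step, and all names and scores here are assumptions.

```python
def bottleneck_index(dwa_scores):
    """Occupation-level index as the weakest (minimum) DWA score (assumed rule)."""
    if not dwa_scores:
        raise ValueError("occupation has no DWA scores")
    return min(dwa_scores)


def relative_indices(occupations):
    """Rescale bottleneck indices so the top occupation equals 1.0 (assumed step)."""
    raw = {name: bottleneck_index(scores) for name, scores in occupations.items()}
    top = max(raw.values())
    if top == 0:
        return {name: 0.0 for name in raw}
    return {name: value / top for name, value in raw.items()}


# Hypothetical occupations and DWA scores, for illustration only.
occupations = {
    "occupation_a": [0.9, 0.4, 0.7],
    "occupation_b": [0.8, 0.6],
}
print(relative_indices(occupations))
```

Under a min rule, a single hard-to-automate activity caps the whole occupation's index, which is the intuition behind calling the aggregation a bottleneck.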
We deconstructed 923 occupations into 2,087 Detailed Work Activities (DWAs).
Explicit data processing claim in the paper: mapping of 923 occupations to 2,087 DWAs for analysis.
The economic model for IASCA follows the FDA's PDUFA precedent, with progressive certification fees representing 0.1-1% of model training costs.
Proposal specifies that IASCA's funding would mirror the FDA PDUFA model and states a fee range of 0.1–1% of model training costs; this is an asserted financing mechanism, not empirically validated in the excerpt.
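The stated fee range can be made concrete with a small calculation. The sketch below applies only the 0.1% and 1% bounds to a hypothetical training cost; the 100 million figure is an assumption, and the excerpt does not describe how the progressive schedule moves between the two bounds.

```python
def certification_fee_range(training_cost, low_rate=0.001, high_rate=0.01):
    """Return the (lowest, highest) fee implied by the 0.1%-1% range."""
    if training_cost < 0:
        raise ValueError("training cost must be non-negative")
    return training_cost * low_rate, training_cost * high_rate


# Hypothetical training cost, for illustration only.
low, high = certification_fee_range(100_000_000)
print(f"fee between {low:,.0f} and {high:,.0f}")
```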
IASCA is modelled after existing international and national regulatory bodies such as the IAEA, FAA, and FDA.
Proposal explicitly states IASCA is modelled after the IAEA, FAA, and FDA; this is an analogy/organizational design claim rather than an empirical finding.
A life insurance system integrated into an industry partner mobile app was tested in two experiments.
Paper reports two experiments running the ARQuest-enabled life insurance system inside a partner mobile app; experimental setup is stated though sample sizes are not provided in the excerpt.
BCR is a minimalist, single-stage training paradigm that trains the model to solve N problems simultaneously within a shared context window, rewarded purely by per-instance accuracy.
Methodological description presented in the paper describing the training procedure and objective (single-stage, per-instance accuracy reward, N-problem batching in shared context).
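A minimal sketch of a per-instance accuracy reward for N problems answered in one shared context follows. It assumes answers are already parsed into a list, that correctness is exact string match, and that the batch reward is the mean of the per-problem scores; none of these details are given in the excerpt, and the function names are hypothetical.

```python
def per_instance_rewards(predicted, reference):
    """Score each of the N problems independently: 1.0 if correct, else 0.0."""
    if len(predicted) != len(reference):
        raise ValueError("need exactly one prediction per problem")
    return [1.0 if p == r else 0.0 for p, r in zip(predicted, reference)]


def batch_reward(predicted, reference):
    """Mean per-instance accuracy over the batch (assumed aggregation)."""
    rewards = per_instance_rewards(predicted, reference)
    if not rewards:
        raise ValueError("batch must contain at least one problem")
    return sum(rewards) / len(rewards)


# Hypothetical batch of N = 3 problems; the third answer is wrong.
print(batch_reward(["4", "9", "x"], ["4", "9", "7"]))
```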
The framework is calibrated with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, and implemented in computer vision.
Calibration and empirical implementation using O*NET, a domain expert survey (n=3,778), and GPT-4o task decompositions; applied to computer vision tasks.
We introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level.
New metric proposed in the paper (entropy-based task complexity) and mapping procedure from accuracy to substitution ratio; implemented in the framework.
Traffic performance is evaluated using the Fundamental Diagram (FD) under varying driver heterogeneity, heterogeneous time-gap penetration levels, and different shares of RL-controlled vehicles.
Description of experimental/evaluation setup in the paper: macroscopic evaluation via Fundamental Diagram across varied scenario parameters. No numeric sample size provided in the claim text.
CriQ is a sister app to Dream11, India's largest fantasy sports platform with over 250 million users.
Descriptive statement in the paper providing context about the application domain and user base.
In the near term, the most plausible equilibrium is bounded autonomy, in which AI agents operate as supervised co-pilots, monitoring systems, and constrained execution modules embedded within human decision processes.
Theoretical argument and forward-looking assessment by the authors based on the proposed framework and plausibility considerations; not presented as the result of a causal empirical study in the excerpt.
Economic evaluations of GLAI should account for end-to-end risk externalities (error propagation, institutional trust, rights impacts), not only short-term productivity gains.
Methodological recommendation grounded in conceptual synthesis of technical, behavioral, and legal risks; normative argument rather than empirical result.
Generative Legal AI (GLAI) systems are built on token-prediction (LLM) architectures rather than formal legal-reasoning architectures.
Conceptual and technical analysis in the paper distinguishing GLAI from other legal-tech; literature synthesis on common LLM architectures. No original empirical dataset or sample size; this is a qualitative/technical review.
The paper's formalism shows that prompt/system messages shape distributions over possible execution paths (indirect control) but do not evaluate actual partial paths at runtime.
Formal mapping in the paper that treats prompts as shaping a prior over paths; conceptual argument and illustrative examples.
Through a thematic review of existing research, the authors identified recurring themes about incentive schemes: their components, how researchers manipulate them, and their impact on research outcomes.
Authors' stated method and findings: thematic review (the scope/number of reviewed papers not specified in excerpt).
A critical aspect of conducting human–AI decision-making studies is the role of participants, often recruited through crowdsourcing platforms.
Claim based on the authors' thematic literature review noting participant sourcing practices (specific studies and counts not given in excerpt).
Researchers conduct empirical studies investigating how humans use AI assistance for decision-making and how this collaboration impacts results.
Statement summarizing the research landscape; supported implicitly by the authors' thematic review of existing empirical studies (number of studies not specified in excerpt).
The study provides empirical evidence specific to a small open EU economy (Slovakia) on the relationship between AI adoption and labour productivity.
Use of harmonised Eurostat enterprise and productivity data for Slovakia and EU27 over 2021–2024, analysed with descriptive statistics, gap analysis, dynamics of change, correlation, and an illustrative regression model.
Returns to AI are heterogeneous across firms; estimating treatment effects requires attention to selection, complementarities, and dynamic adoption pipelines.
Methodological argument referencing treatment-effect literature and observed firm heterogeneity; supported by conceptual examples rather than a single empirical treatment-effect estimate.
The study uses a qualitative, mixed-methods design combining a systematic literature review, secondary evidence from an industry MRO digital survey, five semi-structured expert interviews, and two technical case studies (neural networks for aircraft retirement and an AI-based digital twin for a Power Electronics Cooling System).
Methods description provided in the paper (explicit counts: 5 interviews, 2 case studies); method = author-reported study design.
After screening, 35 studies were included in the thematic synthesis and supplemented by official regulatory and industry documents.
Review screening result reported in the paper: number of included studies = 35; supplementation by regulatory and industry documents stated.
A structured search protocol was designed for Scopus, Web of Science, PubMed, IEEE Xplore, and Google Scholar covering January 2016 to May 2026, English-language records only.
Methods statement in the review describing the databases, date range, and language restriction used for the systematic search.
The implementation literature on AI for pharmacy inventory and pharmaceutical supply chains remains dispersed across pharmacy operations, operations research, health informatics, and supply chain analytics.
The review's thematic synthesis of the searched literature (review methods described below) identified studies across these disciplinary areas.
Specification, reference implementation, conformance suite, and worked examples are available at: https://github.com/BrightbeamAI/chap
Claim of artifact availability hosted on GitHub (URL provided) as part of the paper's resources.
Two protocol standards address adjacent concerns: MCP standardises agent access to tools and data, and A2A standardises agent-to-agent interoperability.
Factual claim referencing existing standards (MCP and A2A) and their scopes; no citations or supporting documentation included in the provided excerpt.
Production deployments are no longer one human supervising one model; they are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries.
Stated as a general characterization of modern production deployments; no quantitative data or case counts provided in the excerpt.
The six middle macros form a low-contrast band between the poles; equivalence testing (TOST at d = 0.2) admits only 1 out of 15 macro-pair comparisons as equivalent.
Authors' analysis of pairwise macro comparisons using Two One-Sided Tests (TOST) for equivalence at Cohen's d = 0.2.
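For readers unfamiliar with TOST, the sketch below shows the general logic: two one-sided tests check that a mean difference lies above the lower equivalence bound and below the upper one, and equivalence is declared only if both reject. It uses a large-sample normal approximation, a bound of d = 0.2 expressed in pooled standard deviations, and alpha = 0.05; the approximation, the alpha level, and the input values are assumptions, not the authors' procedure. The count of 15 comparisons is consistent with all pairs among six macros.

```python
from math import comb, sqrt
from statistics import NormalDist


def tost_equivalent(mean_diff, pooled_sd, n1, n2, d_bound=0.2, alpha=0.05):
    """Two one-sided z-tests for equivalence within +/- d_bound pooled SDs."""
    delta = d_bound * pooled_sd                 # equivalence bound in raw units
    se = pooled_sd * sqrt(1 / n1 + 1 / n2)      # standard error of the difference
    z = NormalDist()
    p_lower = 1 - z.cdf((mean_diff + delta) / se)  # H0: diff <= -delta
    p_upper = z.cdf((mean_diff - delta) / se)      # H0: diff >= +delta
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


print(comb(6, 2))  # number of pairwise comparisons among six macros

# Hypothetical inputs: identical means, unit SD, two sample sizes.
print(tost_equivalent(0.0, 1.0, 1000, 1000))
print(tost_equivalent(0.0, 1.0, 20, 20))
```

With a bound as tight as d = 0.2, small samples rarely establish equivalence even when the observed difference is zero, so a low count of equivalent pairs does not by itself show that the pairs differ.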
We decomposed 1,961 O*NET Detailed Work Activities (DWAs) into 15,817 micro-actions using a multi-agent LLM pipeline with 31-expert human-in-the-loop (HITL) calibration.
Empirical method reported by the authors: automated multi-agent LLM pipeline plus 31-expert HITL calibration producing the stated counts (1,961 DWAs -> 15,817 micro-actions).
Empirical research since Frey and Osborne (2017) has converged on a continuous-gradient representation in which each occupation is assigned a real-valued exposure score on [0,1] obtained by linear aggregation across capability dimensions.
Literature synthesis / statement in the paper referencing Frey and Osborne (2017) and subsequent empirical work using continuous exposure scores.
The findings provide empirical insights for managing employee wellbeing and refining human resource strategies during organizational digital transformation.
Authors' stated implications in the discussion, based on the reported empirical associations and moderation results from the survey of 411 employees.
The study draws on the Conservation of Resources Theory and the Cognitive Appraisal Theory of Stress to explain how AI application influences employees' job insecurity via resource gain and resource threat mechanisms.
Theoretical framing stated in the introduction and discussion explaining the mechanisms (resource gain vs. resource threat) underlying the observed U-shaped association.
Data were collected via mixed online and offline questionnaires: 453 questionnaires were distributed (242 online, 211 offline); 449 were returned (242 online, 207 offline); following validity screening, 411 valid questionnaires were retained (219 online, 192 offline), yielding an effective response rate of 90.73%.
Reported survey administration and response counts provided in the methods section of the paper.
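The reported counts can be checked with simple arithmetic. The sketch below recomputes the totals and shows that the 90.73% figure corresponds to valid questionnaires divided by questionnaires distributed (not by questionnaires returned); the variable names are illustrative.

```python
distributed = {"online": 242, "offline": 211}
returned = {"online": 242, "offline": 207}
valid = {"online": 219, "offline": 192}

total_distributed = sum(distributed.values())
total_returned = sum(returned.values())
total_valid = sum(valid.values())

# Effective response rate relative to questionnaires distributed.
effective_rate = round(100 * total_valid / total_distributed, 2)

print(total_distributed, total_returned, total_valid, effective_rate)
```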
The paper proposes a five-pillar diagnostic framework combining fundamental valuation, residual-exuberance tests, SADF/GSADF explosive-root procedures, LPPL/HLPPL price-pattern diagnostics, sentiment and issuance measures, and capex-payback analysis.
Methodological proposal presented in the paper (framework description); this is a stated contribution rather than an empirical result.
From Codeforces histories we build an AI-prompt signature characterised by more first-attempt acceptances and fewer attempts and retries, consistent with AI-assisted practice.
Empirical construction from Codeforces (CF) submission histories (pattern: more first-try acceptances, fewer retries). Method: analysis of historical submission logs; sample size not stated in the abstract.
The International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all.
Descriptive factual claim about contest rules and formats (institutional description in paper); based on contest rules and organizational formats referenced by authors.
We evaluate the system on operator feedback and a question set collected from production usage, graded by human and automated panels.
Paper's stated evaluation methodology: operator feedback + production question set, graded by humans and automated panels.
There is a need to examine the impacts of LLMs on workers in jobs where the technology is prominent.
Recommendation in the paper's conclusion based on the observed concentration of LLM exposure in lower-precarity occupations.
These occupations (those with higher LLM exposure and lower precariousness) have previously been sheltered from technological change.
Statement in the paper's conclusion asserting that occupations with higher LLM exposure are ones historically sheltered from technological change (no specific empirical evidence provided in abstract).
The study used Canada's Labour Force Survey, developed a multidimensional index summarizing occupational exposure to precarity (contractual instability, earnings inadequacy, schedule unpredictability, working-time mismatch), and estimated associations using four multivariate linear regression models with cluster-robust standard errors plus a fifth model for the multidimensional index.
Methods description in abstract specifying data source (Canada's Labour Force Survey), index construction, and multivariate linear regression models with cluster-robust standard errors.
This study benchmarks Algeria’s readiness to adopt AI against Morocco, Egypt, and Turkey using data from the World Bank (2022), the Oxford Insights Government AI Readiness Index, and sector-specific studies.
Methodological statement in the paper specifying data sources used for the comparative assessment (World Bank 2022, Oxford Insights index, sector studies).
The article aims to provide systematic literature support for subsequent research and adaptive policy formulation.
Statement of the paper's stated objective; methodological and policy-intent claim from the authors.
This article is based on a systematic literature review and summarizes the four core theoretical mechanisms of substitution, complementarity, new task creation, and skill mismatch.
Methodological claim from the paper: the authors conducted a systematic literature review and identified these four theoretical mechanisms.
Traditional software and agentic systems are distinct: in traditional software code is the carrier of decision logic, whereas in agentic systems code is ephemeral tooling used by an LLM-driven reasoning loop.
Formalization and conceptual definitions developed in the paper (first-principles formal distinction; no empirical sample size reported).
For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve.
Historical/descriptive claim presented in the paper's framing and literature review; citation of longstanding software engineering practices (qualitative, no empirical sample size reported).
We implement a two-stage processing architecture separating document-level extraction (Stage 1) from claim-level synthesis (Stage 2).
Implementation description in paper: architecture design and pipeline stages described by the authors.
The study introduces a methodological framework for evaluating LLM citation behaviors, integrating information retrieval theory, semantic search optimization, and structured content engineering.
Explicit claim about the paper's contribution: introduction of a methodological framework combining IR theory, semantic search, and structured content engineering. This is a factual statement about the paper's content (no sample size reported in excerpt).
Traditional SEO strategies have historically focused on keyword density, backlink authority, and ranking positions within search engine results pages (SERPs).
Descriptive claim about historical SEO practices presented as background/context in the paper; based on domain knowledge and literature references (no new empirical data reported in the excerpt).
We extend the representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features.
Methodological extension described in the paper: device cold-start is handled via cohort-based embeddings built from demographic features.