Evidence (13870 claims)
Adoption: 8467 claims
Productivity: 7558 claims
Governance: 6805 claims
Human-AI Collaboration: 6363 claims
Org Design: 4132 claims
Innovation: 4065 claims
Labor Markets: 3526 claims
Skills & Training: 2945 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 749 | 196 | 98 | 892 | 1984 |
| Governance & Regulation | 817 | 394 | 188 | 121 | 1544 |
| Organizational Efficiency | 771 | 189 | 124 | 83 | 1177 |
| Technology Adoption Rate | 627 | 233 | 123 | 96 | 1088 |
| Research Productivity | 411 | 123 | 56 | 332 | 933 |
| Output Quality | 467 | 178 | 59 | 47 | 751 |
| Decision Quality | 320 | 174 | 75 | 42 | 618 |
| Firm Productivity | 435 | 55 | 88 | 20 | 604 |
| AI Safety & Ethics | 214 | 276 | 65 | 33 | 593 |
| Market Structure | 178 | 167 | 122 | 24 | 496 |
| Task Allocation | 207 | 64 | 71 | 32 | 379 |
| Skill Acquisition | 165 | 59 | 60 | 17 | 301 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 52 | 107 | 13 | 279 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 116 | 63 | 42 | 11 | 232 |
| Firm Revenue | 150 | 48 | 26 | 3 | 227 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Task Completion Time | 169 | 29 | 8 | 12 | 219 |
| Worker Satisfaction | 89 | 63 | 20 | 12 | 184 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 76 | 68 | 14 | 5 | 163 |
| Training Effectiveness | 93 | 21 | 13 | 19 | 148 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Automation Exposure | 51 | 54 | 22 | 12 | 142 |
| Team Performance | 86 | 17 | 27 | 9 | 140 |
| Developer Productivity | 94 | 17 | 14 | 6 | 132 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 51 | 7 | 8 | 3 | 69 |
| Creative Output | 31 | 17 | 7 | 3 | 59 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 17 | 17 | — | 51 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Our findings highlight the importance of additional research and progress on economic measurement related to AI.
Authors' concluding statement/recommendation based on their results and measurement challenges discussed in the paper.
We use tools to indirectly estimate the impact of AI via the lens of BEA’s industry accounts.
Methodological description in the paper: authors apply indirect estimation methods using BEA industry accounts to infer AI's economic impact.
Currently, there is not a line item in the U.S. national accounts that can be used to identify and measure the economic impact of artificial intelligence (AI).
Statement by authors about the state of U.S. national accounts (BEA) and absence of a specific national-accounts line item for AI.
We compare multiple state-of-the-art agents (e.g., GPT-4o, Llama 3, Qwen2) on metrics assessing tool selection accuracy, faithfulness, and hallucination.
Paper lists evaluated models (GPT-4o, Llama 3, Qwen2) and reports evaluation on metrics including tool selection accuracy, faithfulness, and hallucination across the benchmark.
Our benchmark consists of 100 financial questions.
Paper explicitly states the benchmark contains 100 financial questions.
The outreach casenotes used in the study are fairly short and heavily redacted.
Descriptive statement about the dataset of street outreach casenotes provided by the nonprofit partner used in the audit (direct observation by authors).
LLM zero-shot classification does not introduce additional textual biases beyond the algorithmic biases already present in tabular classification.
Authors' assessment/audit comparing zero-shot LLM classification using casenote text against tabular-only classification, concluding no additional textual bias introduced. (Details and sample size not provided in abstract.)
Under three scenarios (optimistic: 2028-2035; base: 2035-2045; pessimistic: 2045-2060), we specify disconfirmation criteria that would weaken the thesis if observed.
Scenario analysis and specification of disconfirmation criteria by the authors; methodological claim about forecasting structure rather than empirical result.
Converging evidence from history, philosophy, neuroscience, technology, organizational studies, and cultural analysis supports this thesis.
Authors' multidisciplinary literature review and synthesis across the named fields (method: qualitative review); no single empirical dataset or sample size given.
We introduce 'instrumental dissolution' -- loss of institutional-default status while persisting in specialist niches.
Conceptual/theoretical contribution defined by the authors and illustrated via cross-disciplinary examples; no empirical validation sample reported.
Typing's dominance was instrumental, not cognitively necessary.
Argumentative/historical analysis presented in the paper; synthesis of historical and philosophical literature (no empirical sample or experiment reported).
We conducted an in-the-wild evaluation with over 2,200 individuals from heterogeneous organisations and roles in 116 countries, via log analysis, surveys, and 20 interviews.
Reported evaluation methods and sample in the paper's abstract: log analysis, surveys, and 20 interviews with over 2,200 participants across 116 countries.
Participants were retested individually on the programming tasks after a retention interval of one week.
Statement in abstract describing follow-up retest procedure (one-week retention interval, individual retest).
Participants were incentivized by bonus compensation to balance performance with understanding.
Paper description of participant incentives in methods/abstract; compensation scheme used during experiment.
We conducted a controlled pair programming study with 22 participants who wrote Python code under time pressure in teams of two and individually with GitHub Copilot for 20 minutes each.
Statement of study design in the paper's methods/abstract; controlled pair programming experiment with 22 participants, 20-minute tasks in both conditions (human teammate and Copilot).
We measure processes of polarization and integration in global AI research over three decades using large-scale scientific publication data.
Methodological claim describing the study: the analysis spans three decades and uses large-scale publication data and network comparisons to randomized baselines.
A stylized calibration to four providers using April 2026 data treats parameter values as inputs to a comparative risk mapping, not structural estimates.
Paper reports a calibration exercise using data from four providers (April 2026) and emphasizes it is a comparative mapping rather than structural estimation.
Discrimination (QoS gap) vanishes at a joint boundary rather than at a simple threshold in alpha alone.
Analytical result from the model characterizing the boundary conditions for non-discrimination.
The framework is evaluated against forecast-driven base-stock and greedy fulfillment heuristics, and against a perfect-information oracle; pairwise differences are examined using Wilcoxon signed-rank tests.
Experimental evaluation setup described in the paper: comparisons to two heuristic baselines and an oracle, and use of Wilcoxon signed-rank tests for pairwise comparisons.
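For readers unfamiliar with the test named in this claim, here is a minimal pure-Python sketch of the Wilcoxon signed-rank statistic for paired results. The cost figures are made up for illustration, and the function name `wilcoxon_signed_rank` is hypothetical. The paper's own implementation is not described in the excerpt. The z-score uses the plain normal approximation, with no tie or continuity correction.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus, z) for paired samples x and y.

    Zero differences are dropped; tied absolute differences share
    their average rank. z is the uncorrected normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied absolute differences.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd if sd > 0 else 0.0
    return w_plus, w_minus, z

# Made-up per-episode costs for two policies on the same five scenarios.
baseline_cost = [10, 12, 9, 14, 11]
candidate_cost = [8, 11, 10, 10, 11]
w_plus, w_minus, z = wilcoxon_signed_rank(baseline_cost, candidate_cost)
print(w_plus, w_minus)  # 8.5 1.5
```

With samples this small, an exact p-value is preferable to the normal approximation. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used.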
Demand shocks are modeled using two specifications: a mixed profile (half the products follow a uniform demand process and the rest follow a Merton-type jump-diffusion process) and a fully shock-driven profile.
Modeling choices described in the methods: two demand-shock specification setups for simulation experiments.
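The two demand specifications can be sketched as a small simulator. This is an illustration under stated assumptions, not the paper's generator: every parameter value (drift, volatility, jump rate, jump size, uniform bounds) and every function name below is hypothetical. The jump-diffusion is discretized on the log scale with normally distributed log-jumps.

```python
import math
import random

def poisson(rng, lam):
    """Sample a Poisson(lam) count with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def merton_path(rng, s0, mu, sigma, jump_rate, jump_mean, jump_sd, steps, dt=1.0):
    """Discretized Merton-type jump-diffusion; returns strictly positive levels."""
    level = math.log(s0)
    path = []
    for _ in range(steps):
        diffusion = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
        n_jumps = poisson(rng, jump_rate * dt)
        jumps = sum(rng.gauss(jump_mean, jump_sd) for _ in range(n_jumps))
        level += diffusion + jumps
        path.append(math.exp(level))
    return path

def demand_profile(n_products, steps, shock_share, seed=0):
    """shock_share=0.5 mimics the mixed profile, 1.0 the fully shock-driven one."""
    rng = random.Random(seed)
    n_uniform = int(n_products * (1 - shock_share))
    demand = {}
    for p in range(n_products):
        if p < n_uniform:
            # Assumed bounds for the uniform demand process.
            demand[p] = [rng.uniform(50, 150) for _ in range(steps)]
        else:
            # Assumed jump-diffusion parameters.
            demand[p] = merton_path(rng, 100.0, 0.0, 0.05, 0.1, 0.0, 0.3, steps)
    return demand

mixed = demand_profile(n_products=4, steps=30, shock_share=0.5)
shock_only = demand_profile(n_products=4, steps=30, shock_share=1.0)
```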
Policies are learned using Proximal Policy Optimization (PPO) in an actor–critic architecture, with bounded stochastic policies to handle constrained action spaces.
Method description in the paper specifying the use of PPO, actor–critic structure, and bounded stochastic policy parameterization.
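As a rough illustration of the two ingredients named here, the sketch below shows PPO's standard clipped surrogate objective and one common way to keep a stochastic policy inside action bounds: sampling from a Beta distribution and rescaling. The excerpt does not say which bounded parameterization the authors use, so the Beta choice, the function names, and all numbers are assumptions.

```python
import random

def clipped_surrogate(ratios, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios are new-policy / old-policy action probabilities and
    advantages are the critic-based advantage estimates.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1 - eps, min(1 + eps, r))
        total += min(r * a, clipped * a)
    return total / len(ratios)

def sample_bounded_action(rng, alpha, beta, low, high):
    """Draw from Beta(alpha, beta) on [0, 1] and rescale to [low, high]."""
    return low + (high - low) * rng.betavariate(alpha, beta)

rng = random.Random(0)
# Hypothetical order quantity constrained to the range 0..40 units.
order_qty = sample_bounded_action(rng, alpha=2.0, beta=3.0, low=0.0, high=40.0)
objective = clipped_surrogate([1.5, 0.5], [1.0, -1.0])  # about 0.2
```

Clipping removes the incentive to move the probability ratio outside `[1 - eps, 1 + eps]` when doing so would raise the objective, which is what keeps each policy update conservative.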
The study develops a centralized Hierarchical Reinforcement Learning (HRL) control framework that makes decision timing explicit: replenishment and allocation are optimized weekly, while fulfillment and lateral inventory rebalancing are controlled daily.
Methodological description in the paper: design of an HRL framework with two-level timing (weekly vs daily) for different control decisions.
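The two-level timing can be pictured as a loop in which the upper-level decision fires every seventh day and the lower-level decision fires daily. In this sketch the learned policies are replaced by fixed toy rules, and the state layout and function names are hypothetical. It shows only the scheduling structure, not the paper's controller.

```python
def run_schedule(horizon_days, weekly_policy, daily_policy, state):
    """Apply the weekly policy on days 0, 7, 14, ... and the daily policy every day."""
    log = []
    for day in range(horizon_days):
        if day % 7 == 0:
            state = weekly_policy(state)  # stands in for replenishment and allocation
            log.append((day, "weekly"))
        state = daily_policy(state)       # stands in for fulfillment and rebalancing
        log.append((day, "daily"))
    return state, log

def weekly_replenish(state):
    # Toy rule (assumption): receive a fixed 70 units at the start of each week.
    return {**state, "inventory": state["inventory"] + 70}

def daily_fulfill(state):
    # Toy rule (assumption): ship up to 10 units per day from stock on hand.
    shipped = min(state["inventory"], 10)
    return {"inventory": state["inventory"] - shipped,
            "shipped": state["shipped"] + shipped}

final_state, log = run_schedule(14, weekly_replenish, daily_fulfill,
                                {"inventory": 0, "shipped": 0})
print(final_state)  # {'inventory': 0, 'shipped': 140}
```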
In both popular and academic press, concerns are often expressed that AI threatens not only people’s livelihoods but also the meaning they derive from their work.
Observational/literature-commentary claim made in the paper's abstract; references to discourse in popular and academic press (no empirical study or sample reported).
Buildings account for approximately 40% of global energy consumption.
Statement in paper (background/contextual fact); likely based on cited external data, though no sample size is reported in the excerpt.
The analysis uses causal discovery methods and integrates scenario-based outcomes, communication analysis, and questionnaire measures.
Paper abstract states that causal discovery analysis was used and that it integrates scenario outcomes, communication analysis, and questionnaire measures.
The study examines user Extraversion and Agreeableness alongside AI design characteristics including Adaptability, Expertise, and chain-of-thought Transparency.
Variables listed in the abstract as the human personality traits and AI design characteristics analyzed.
The study compares two interaction scenario categories: (1) hiring negotiations between human job candidates and AI hiring agents; and (2) human-AI transactions in which AI agents may conceal information to maximize internal goals.
Explicit description of the two scenario categories in the paper abstract; method: experimental / simulation scenarios.
The study includes a parallel human subjects experiment involving 290 human participants.
Statement in paper abstract reporting a human-subjects experiment with 290 participants.
The study uses a purely simulated dataset comprising 2,000 simulations.
Statement in paper abstract describing a simulated dataset of 2,000 simulations; method: simulation experiments.
Algorithmic accuracy alone does not determine value; legitimacy and uptake hinge on the readiness of people and processes.
Thematic conclusion drawn from interviews, Likert surveys, and document analysis across cases indicating non-technical factors strongly influence uptake despite algorithmic performance metrics. (Sample size not reported.)
The study utilized 3.87 million consumer comments from 127,846 product listings to build and validate models.
Data description reported in paper: 3.87 million consumer comments and 127,846 product listings used.
The study's measurement model is supported by Composite Reliability (CR), Average Variance Extracted (AVE), and several model-fit indicators.
Paper explicitly states CR, AVE, and model-fit indices were used and supported the construct measurements and SEM.
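For reference, CR and AVE are conventionally computed from a construct's standardized factor loadings. The sketch below uses made-up loadings, because the paper's values are not given in this excerpt, and the function names are hypothetical. Widely used rules of thumb treat CR of at least 0.7 and AVE of at least 0.5 as adequate.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized loadings, so each error variance is 1 - loading^2.
    """
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Made-up loadings for a single three-item construct.
loadings = [0.70, 0.80, 0.75]
cr = composite_reliability(loadings)        # about 0.795
ave = average_variance_extracted(loadings)  # about 0.564
```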
Principal Component Analysis (PCA) identified the main constructs related to adoption of FinTech and perceived algorithmic trust.
Paper reports using PCA to identify constructs underlying adoption and perceived algorithmic trust prior to CFA/SEM.
Structured questionnaires were administered to 400 respondents in both city and rural areas of developing countries.
Method section statement specifying a quantitative research design and that structured questionnaires were sent to 400 respondents.
Molecular representations discussed include string-based methods, topological models, five key categories of Graph Neural Networks (GNNs), 3D-aware Geometric Deep Learning (GDL), emerging Quantum Machine Learning (QML), and Hybrid Quantum-Classical Neural Networks (HQNNs).
Taxonomy and descriptive enumeration of representation classes provided by the review (no empirical comparison or performance claims quantified in the provided text).
The study combines theoretical analysis with quantitative empirical research using survey data from Bosnia and Herzegovina analyzed by regression.
Paper summary states the methodological approach: theoretical analysis plus a quantitative empirical study based on survey data from Bosnia and Herzegovina, analyzed with regression methods. No further methodological details or sample size provided in the summary.
The long-term dynamic effects of AI on resilience remain unverified and require longer-term data.
Authors explicitly state the need for longer time-series data to validate long-term dynamics.
Enterprise-level indicators used in the study do not directly capture supply chain network structure and node dependencies.
Explicit limitation noted by the authors about measurement and scope.
The study's sample is limited to listed manufacturing companies, so conclusions should be applied cautiously to small and medium-sized enterprises (SMEs).
Explicit limitation stated by the authors in the paper.
Mediation and moderation models are leveraged to explore how AI enhances resilience via resource allocation optimization, productivity, and technological innovation, and how conditional factors (e.g., agility) affect these links.
Authors state they used mediation and moderation models on firm-level data to test mechanisms and conditional effects.
The study uses data on A-share listed manufacturing companies from 2011 to 2023 and applies a multi-period difference-in-differences (DID) model to assess AI's impact on SCR.
Methods description provided in the paper summary: sample timeframe and econometric approach explicitly stated.
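A multi-period DID of this kind is commonly written as a two-way fixed-effects regression. The specification below is the generic textbook form, given as an assumption. The summary does not report the paper's exact equation, and the control vector $X_{it}$ and all symbol names are illustrative.

```latex
SCR_{it} = \alpha + \beta \, DID_{it} + \gamma^{\top} X_{it} + \mu_i + \lambda_t + \varepsilon_{it},
\qquad DID_{it} = Treat_i \times Post_{it}
```

Here $DID_{it}$ equals 1 for firm $i$ in every year $t$ from its AI adoption onward and 0 otherwise, $\mu_i$ and $\lambda_t$ are firm and year fixed effects, and $\beta$ is the estimated effect of AI on SCR.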
A randomly sampled coalition of equal size remains largely ineffective at increasing platform spending / wages.
Theoretical comparison in the model between targeted coalitions and randomly sampled coalitions of the same size; analytical results showing limited impact for random coalitions.
We contribute junior–senior accounts on their usage of agentic AI through a three-phase mixed-methods study: ACTA combined with a Delphi process with 5 seniors, an AI-assisted debugging task with 10 juniors, and blind reviews of junior prompt histories by 5 more seniors.
Authors' methodological description of the study design and participant counts as reported in the paper.
The article examines the socioeconomic implications of AI-driven automation through the lens of political economy and labor sociology.
Methodological statement in the paper indicating theoretical framing and disciplinary approaches; no empirical sample reported in the abstract.
In the patent citation network, neither technological diversity nor technological proximity shows a significant impact on main path formation.
Layer-specific ERGM results for the patent citation network reporting non-significant coefficients for variables measuring technological diversity and technological proximity.
We introduce a new benchmark QuantSightBench to assess prediction-interval forecasting capability and evaluate frontier models under multiple settings, assessing both empirical coverage and interval sharpness.
Methodological contribution reported in the paper: creation of QuantSightBench and its use to evaluate models on empirical coverage and sharpness (paper describes benchmark and evaluation procedure; specific task/sample counts not given in excerpt).
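The two quantities named here can be computed as follows. Empirical coverage is the share of realized outcomes that fall inside their predicted intervals, and sharpness is taken here as the mean interval width. The benchmark's exact scoring rules are not given in the excerpt, so the function name and the data are illustrative assumptions.

```python
def coverage_and_sharpness(intervals, outcomes):
    """Return (empirical coverage, mean interval width).

    intervals is a list of (lower, upper) pairs; bounds count as inside.
    """
    hits = sum(1 for (lo, hi), y in zip(intervals, outcomes) if lo <= y <= hi)
    coverage = hits / len(outcomes)
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return coverage, mean_width

# Made-up forecasts and realized values.
intervals = [(0, 10), (5, 6), (2, 4), (1, 9)]
outcomes = [3, 7, 4, 0]
print(coverage_and_sharpness(intervals, outcomes))  # (0.5, 5.25)
```

The two measures have to be read together. A forecaster can always reach nominal coverage by widening its intervals, so a well-calibrated model matches its stated coverage level with the narrowest intervals it can.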
Technology-driven recruitment encompasses Applicant Tracking Systems (ATS), AI-powered screening, video-based interviews, gamified assessments, and data analytics.
Conceptual description in the paper's introduction/background defining the scope of 'technology-driven recruitment'.
The study employed a mixed-methods research design combining a quantitative survey of 150 HR professionals and recruiters across manufacturing, IT, banking, and education sectors with qualitative case study analysis of four organizations in Chhatrapati Sambhajinagar.
Explicit methodological statement in the paper: quantitative survey (N=150) across specified sectors + qualitative case studies of 4 organizations in Chhatrapati Sambhajinagar.
The review is a focused qualitative evidence synthesis and the proposed governance model is an evidence-informed conceptual framework that warrants future empirical validation.
Authors' explicit framing of the review approach and caveat calling for empirical validation of the proposed model.
Given the focused Title/Abstract/Keywords query and the small, heterogeneous corpus, the findings are interpreted as a scoped evidence map rather than an exhaustive census of all AI-and-work research.
Authors' explicit limitation statement referencing the search strategy (title/abstract/keywords focus), small number of included studies (n=19), and heterogeneity of studies.