Evidence (8625 claims)
Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Adoption
Differences in perceived stylistic or aesthetic quality do not translate into higher monetary valuation: stylistic preferences do not increase willingness to pay.
BDM bidding behavior of N = 117 participants combined with rating data showing stylistic differences but no corresponding increases in bids.
There is no statistically significant relationship between perceived aesthetic quality and willingness to pay for LLM outputs.
Online experiment with N = 117 participants who evaluated model outputs, rated aesthetic quality, and submitted monetary bids using a Becker-DeGroot-Marschak (BDM) mechanism; statistical tests reported as not significant.
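The Becker-DeGroot-Marschak (BDM) mechanism named above can be illustrated with a minimal sketch. This is the textbook version of the mechanism, not the study's actual implementation: the price range, the uniform price draw, and the function names `bdm_round` and `expected_surplus` are assumptions made for illustration.

```python
import random

def bdm_round(bid, price_low, price_high, rng):
    """One BDM round: a random price is drawn, and the participant
    buys at that price only if their bid is at least the price."""
    price = rng.uniform(price_low, price_high)  # assumed uniform draw
    if bid >= price:
        return {"bought": True, "price_paid": price}
    return {"bought": False, "price_paid": 0.0}

def expected_surplus(bid, true_value, price_low, price_high,
                     n_draws=100_000, seed=0):
    """Monte Carlo estimate of the average surplus (value minus price paid)
    for a participant with a given true value who submits `bid`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        outcome = bdm_round(bid, price_low, price_high, rng)
        if outcome["bought"]:
            total += true_value - outcome["price_paid"]
    return total / n_draws

# Hypothetical participant who values an output at 6 on a 0-10 price scale.
truthful = expected_surplus(bid=6, true_value=6, price_low=0, price_high=10)
underbid = expected_surplus(bid=3, true_value=6, price_low=0, price_high=10)
overbid = expected_surplus(bid=9, true_value=6, price_low=0, price_high=10)
```

Because the price paid does not depend on the bid, bidding one's true value is the best strategy in this sketch, which is why BDM bids are commonly read as willingness to pay.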
The analysis identifies three major thematic areas: integration of AI in global supply chains; challenges and opportunities associated with AI adoption; and the impact of AI on decision-making and operational efficiency.
Structured synthesis of themes across 31 scholarly sources included in the qualitative literature review.
The study uses panel data of A-share listed energy-intensive firms from 2009 to 2021; measures corporate digital technology integration by counting frequency of digital-technology-related words in annual reports (text analysis); and evaluates low-carbon transformation using the LTFP method.
Methods and data description provided in the paper's abstract/summary: panel of A-share listed firms in energy-intensive industries (2009–2021); text analysis of annual reports for digital technology integration; LTFP method for low-carbon transformation measurement.
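The word-frequency measure of digital technology integration described above can be sketched as follows. The keyword list, the English-language matching, and the function names are hypothetical; the paper's actual dictionary and text-processing steps are not given in this listing.

```python
import re
from collections import Counter

# Hypothetical keyword dictionary, for illustration only.
DIGITAL_TERMS = [
    "artificial intelligence",
    "big data",
    "cloud computing",
    "blockchain",
    "internet of things",
]

def digital_term_frequency(report_text, terms=DIGITAL_TERMS):
    """Count case-insensitive, whole-phrase occurrences of each term."""
    text = report_text.lower()
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text))
    return counts

def integration_score(report_text, terms=DIGITAL_TERMS):
    """Total keyword frequency, used here as a simple integration proxy."""
    return sum(digital_term_frequency(report_text, terms).values())

sample_report = (
    "The company expanded its big data platform and moved workloads to "
    "cloud computing. Big data analytics now supports energy scheduling."
)
score = integration_score(sample_report)
```

In practice such counts are often normalized (for example by report length) before entering a regression; whether the study does so is not stated here.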
This paper focuses on five research questions about the historical pathways, leverage points, trajectory differences, alternative projects, and socio-technical programmes related to current dominant generative AI tools and possible AGI-adjacent development.
Explicit listing of the five research questions in the paper's introduction/aims; statement of scope and focus.
The study tested Olava Extract against five frontier models.
Method statement in the paper/abstract specifying comparison with five frontier models.
Online-safety regulation under the UK Online Safety Act and the EU Digital Services Act increasingly treats scalar metrics as compliance evidence.
Statement in paper's introduction / motivation; cites policy trend (UK Online Safety Act and EU Digital Services Act) as motivating context (policy texts referenced in paper).
Prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions.
Conceptual critique presented by the authors; no quantitative validation presented for this claim within the excerpt.
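The critique that an F1 score over unordered routing decisions ignores sequence can be illustrated with a small sketch. The exact definition of HF1 is not given in this listing, so the set-based F1 below is an assumption, and the agent names are hypothetical.

```python
def unordered_f1(predicted, reference):
    """Set-based F1 over handoff targets; order is discarded by construction."""
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical routing traces: same agents, different order.
reference_route = ["support", "billing", "refund"]
shuffled_route = ["refund", "support", "billing"]

same_set_score = unordered_f1(shuffled_route, reference_route)
partial_score = unordered_f1(["support", "sales"], ["support", "billing"])
```

Here the shuffled route scores as perfect even though the handoffs happened in a different order, which is the kind of blind spot the authors describe.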
Perceived responsiveness (a functional cue) did not function as a general mediator of anchor type on trust.
Moderated mediation analyses in the randomized experiment (N = 439) found no overall mediation via perceived responsiveness across the full sample.
Data analysis combined quantitative analytics with qualitative sentiment analysis, while environmental impact data was collected through IoT sensors measuring energy consumption, waste generation, and carbon footprint metrics.
Methods description specifying mixed quantitative and qualitative analyses and IoT sensor measures.
The authors applied machine-learning models, natural language processing, sentiment scoring, predictive dashboards, and clustering techniques to map customer preferences, purchasing patterns, and green program participation.
Methods description listing analytical techniques used (ML, NLP, sentiment scoring, dashboards, clustering).
Data collection encompassed retail kiosks, shopping apps, home sensors, and wearables over twelve months.
Methods description in the chapter explicitly listing data sources and a twelve-month collection period.
The study employed stratified random sampling across urban shopping centers, suburban retail outlets, and online-to-offline hybrid stores in Nigeria to represent diverse consumer demographics and shopping behaviors.
Methods section description in the chapter stating use of stratified random sampling across specified retail contexts; no numeric sample counts given in the provided text.
Data analysis utilized regression modeling for performance correlations, time-series analysis for predictive maintenance patterns, and thematic analysis for qualitative interviews.
Paper methods: explicit listing of analytic techniques used (regression, time-series, thematic analysis).
Secondary data encompasses sustainability reports, carbon footprint assessments, and operational performance metrics.
Paper methods: explicit listing of secondary data sources (sustainability reports, carbon footprint assessments, operational metrics).
Blockchain transaction records spanning eighteen months across Nigeria were used as primary data.
Paper methods: explicit statement about 18 months of blockchain transaction records across Nigeria.
The study uses IoT sensor data from forty-five facilities.
Paper methods: explicit statement that IoT sensor data were collected from 45 facilities.
Primary data collection includes structured interviews with supply chain managers.
Paper methods section: primary data described as including structured interviews with supply chain managers (number of interviewees not specified).
The study uses mixed methods involving case studies from twelve multinational companies across the manufacturing, logistics, and retail sectors.
Paper statement of methods: explicit mention of mixed methods and case studies from 12 multinational companies across the three sectors.
The study constructs a tripartite evolutionary game framework composed of government regulators, leading computing power incumbents, and downstream AI innovators to analyze strategic interactions and derive evolutionarily stable strategies.
Methodological claim documented in the paper describing the model structure and analytic approach (method: formal model specification and ESS derivation).
The analysis uses over 23 million WIOA participation records from 2017–2023.
Statement in the paper about the data coverage: administrative records of WIOA participants totaling >23 million records across 2017–2023.
The paper introduces the 'Retrainability Index' to measure program outcomes using post-intervention wage recovery and shifts in Routine Task Intensity (RTI).
Methodological contribution described in the paper: formulation of a composite index (Retrainability Index) combining wage recovery and occupation RTI change to evaluate WIOA outcomes.
The study was a randomized trial of 356 clinicians generating 7,476 trust ratings.
Methods/results reported in paper specifying randomized design, N=356 clinicians, total of 7,476 trust ratings collected.
Technologically advanced firms operating in hypercompetitive markets gain little from AI adoption, reflecting diminishing returns from capability saturation.
Cluster-specific results from the multidimensional heterogeneity analysis indicating small or negligible TFP effects for clusters identified as technologically advanced and highly competitive.
The study employs a System GMM estimator to address potential endogeneity and uses Fixed Effects (FE) and Random Effects (RE) models for robustness checks.
Methodological statement in the paper describing the econometric approach; verifiable from the methods section (no sample size or instrumentation details provided in the supplied text).
Prompt-driven generation (even with detailed prompting) fails to address the central problem of architectural complexity management in AI-based software engineering.
Results showing prompting did not prevent code bloat/coupling; conceptual argument reframing the problem toward architecture management rather than prompt engineering.
Neither functional correctness nor detailed prompting mitigates this architectural decay in AI-generated code.
Experimental comparisons reported in the paper where functionally correct outputs and variants produced with more detailed prompting were evaluated for structural quality and showed persistent architectural degradation.
Existing literature has extensively examined general AI adoption but limited empirical evidence exists on how more autonomous, agent-like systems contribute to economic outcomes.
Literature review / positioning statement in the introduction of the paper.
The study uses panel data from the World Bank (World Development Indicators and Enterprise Surveys) and OECD AI indicators for the period 2015 to 2024.
Explicit statement of data sources and time period in the paper's methods section.
An AI Adoption Index was constructed using indicators of AI investment, business adoption, and innovation output as a proxy for diffusion of advanced AI capabilities (including agentic features).
Methodological description in the paper: index synthesis from OECD AI indicators and other measures of investment/adoption/innovation; exact index components and weighting described in methods (sample size not applicable).
AI learns from both explicit knowledge (papers, documentation, structured databases) and implicit knowledge (reasoning patterns, debugging processes, intermediate steps).
Stated as a conceptual premise in the position paper; no empirical methods, sample, or quantitative data reported.
Perceived usability and satisfaction among participants showed little difference across model sizes.
Participant-reported measures (usability and satisfaction) compared across model sizes 3B, 8B, and 70B for N=112 participants; paper states little difference across sizes (no numeric statistics provided in the excerpt).
We examine the performance of humans (N=112) assisted by RAG-assistants compared to LLM-only or LLM+RAG baselines.
Experimental comparison reported in the paper with N=112 human participants across conditions (human+RAG vs LLM-only vs LLM+RAG baseline conditions).
This work evaluates a chatbot-style assistant based on Retrieval-Augmented Generation (RAG) in a realistic multi-turn information-seeking scenario inspired by workplace settings where compliance with local legislation and secure handling of sensitive data are often key.
Reported experimental setup: a chatbot-style RAG assistant evaluated in a realistic multi-turn information-seeking scenario inspired by workplace settings (method description in the paper).
In the traditional economy, the medium of exchange is mainly the fiat currency of each country or region, and cross-border transactions must be settled at the prevailing exchange rate.
Author's descriptive statement based on general observation of monetary systems; no empirical sample or study data provided in the excerpt.
Public inference benchmarks compare AI systems at the model and provider level, but the unit at which deployment decisions are actually made is the endpoint: the (provider, model, stock-keeping-unit) tuple at which a specific quantization, decoding strategy, region, and serving stack is exposed.
Author assertion / methodological observation about how public benchmarks report results versus how deployments are decided; no empirical test reported in the excerpt.
Determining how much value individual data contributions bring to the network remains an open problem.
Literature gap claim in paper (review of existing approaches and statement of open problem; no empirical sample).
The review synthesizes both qualitative and quantitative studies.
Explicit methodological description in the abstract indicating mixed-methods literature synthesis.
A collection of qualitative and quantitative approaches reveals predictors of technological integration, including organisational preparedness, economic factors, policies, and human capital.
Statement about the review's synthesized findings from multiple qualitative and quantitative studies identifying these predictors; method = mixed-methods literature synthesis.
The primary technologies covered in this review are Electronic Health Records (EHR), telemedicine, artificial intelligence (AI), and the Internet of Things (IoT).
Explicit topical scope statement in the paper (description of review subjects); based on the paper's own selection of topics for review.
There is little empirical exploration of how professionals making high-stakes decisions perceive their agency and level of control when working with genAI systems.
Statement about a gap in the existing literature made by the authors (literature review / framing); no sample size (gap claim).
We introduce a public benchmark dataset of 11,500 user queries to support our study and future research of generative search.
Authors constructed and released a public benchmark dataset containing 11,500 real-user queries (dataset release described in the paper).
AI adoption has no detectable effects on overall employment.
Difference-in-differences estimates using administrative employment totals linked to survey-reported adoption show no statistically significant change in total employment.
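The logic of a difference-in-differences estimate can be sketched with the canonical two-group, two-period comparison. This is a simplified illustration, not the paper's specification (which is not given here); the employment figures and function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences:
    (change among adopters) minus (change among non-adopters)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical firm-level employment totals before and after adoption.
adopters_pre, adopters_post = [100, 120], [104, 124]
others_pre, others_post = [50, 70], [54, 74]

effect = did_estimate(adopters_pre, adopters_post, others_pre, others_post)
```

In this made-up example both groups grow by the same amount, so the estimated effect is zero. Real applications typically add fixed effects and standard errors, which this sketch omits.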
As of 2024, AI adoption remains limited: about 10 per cent of firms report current use.
Newly collected firm-level survey data linked to administrative balance sheet and employer–employee records; prevalence reported in 2024 survey.
The empirical analysis uses panel data from 3,515 Chinese A-share listed firms, totaling 20,076 firm-year observations covering 2014–2022.
Statement of data and sample in the paper (sample frame and time period explicitly given).
The literature review employs the PRISMA model to screen, identify, and synthesize available literature on AI, Machine Learning and Deep Learning in promoting managerial productivity and task efficiency.
Methodological statement in the paper's abstract (explicitly states use of PRISMA for screening and synthesis).
Portfolios were constructed from financial news headlines for S&P 500 equities and benchmarked against mean–variance optimization (MVO), the Black–Litterman model, AI-driven optimizers, and naive diversification strategies.
Methods description: portfolio construction used financial news headlines mapped to S&P 500 equities; benchmarks explicitly listed (MVO, Black–Litterman, AI-driven optimizers, naive diversification).
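Two of the benchmarks listed above, naive diversification and mean-variance optimization, can be contrasted in a minimal two-asset sketch. It uses the closed-form minimum-variance portfolio as a special case of MVO; the variances, covariance, and function names are hypothetical and do not come from the paper.

```python
def portfolio_variance(w1, var1, var2, cov):
    """Variance of a two-asset portfolio with weight w1 on asset 1
    and (1 - w1) on asset 2."""
    w2 = 1.0 - w1
    return w1 ** 2 * var1 + w2 ** 2 * var2 + 2.0 * w1 * w2 * cov

def min_variance_weight(var1, var2, cov):
    """Closed-form weight on asset 1 that minimizes portfolio variance
    (short positions allowed)."""
    denominator = var1 + var2 - 2.0 * cov
    if denominator == 0:
        raise ValueError("assets are perfectly collinear; weight is undefined")
    return (var2 - cov) / denominator

# Hypothetical annualized variances and covariance for two equities.
var_a, var_b, cov_ab = 0.04, 0.09, 0.01

w_opt = min_variance_weight(var_a, var_b, cov_ab)
optimized_risk = portfolio_variance(w_opt, var_a, var_b, cov_ab)
naive_risk = portfolio_variance(0.5, var_a, var_b, cov_ab)  # equal weights
```

The optimized portfolio cannot have higher variance than the equal-weight one under the same inputs, though in practice estimation error in the covariance inputs is one reason naive diversification remains a common benchmark.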
We evaluated seven medium-sized open-source LLMs—Gemma-7B, Mistral-7B, Jansen Adapt-Finance-Llama2-7B, DeepSeek-R1-8B, QuantFactory Llama-3-8B-Instruct-Finance, Qwen-7B, and Llama2-7B.
Direct statement in methods: explicit list of seven evaluated models. Empirical evaluation reported on these models.
The paper introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition.
Methodological contribution: introduction/definition of specific operational metrics as stated in the paper.
The paper formulates a geo-distributed inference placement model with feasibility masks and migration frictions.
Methodological/modeling contribution described in the paper; specifies modeling components (feasibility masks, migration frictions).