The Commonplace

Evidence (8625 claims)

Adoption
8625 claims
Productivity
7686 claims
Governance
6917 claims
Human-AI Collaboration
6574 claims
Org Design
4189 claims
Innovation
4131 claims
Labor Markets
3588 claims
Skills & Training
2985 claims
Inequality
2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 761 200 101 904 2020
Governance & Regulation 829 400 191 122 1566
Organizational Efficiency 784 193 125 84 1197
Technology Adoption Rate 637 236 124 97 1103
Research Productivity 431 131 58 340 972
Output Quality 481 183 59 47 770
Decision Quality 332 177 82 49 647
Firm Productivity 439 57 88 20 610
AI Safety & Ethics 218 279 66 33 602
Market Structure 181 170 123 24 503
Task Allocation 214 64 72 33 388
Skill Acquisition 174 62 62 17 315
Innovation Output 204 27 45 18 295
Employment Level 105 54 108 13 282
Fiscal & Macroeconomic 132 69 43 26 277
Consumer Welfare 117 63 42 11 233
Firm Revenue 154 48 26 3 231
Task Completion Time 173 31 8 12 225
Inequality Measures 44 123 50 6 223
Worker Satisfaction 89 65 22 12 188
Error Rate 71 92 10 2 175
Regulatory Compliance 77 69 14 5 165
Automation Exposure 58 56 26 13 156
Training Effectiveness 96 21 14 19 152
Wages & Compensation 77 37 25 6 145
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 81 21 1 115
Hiring & Recruitment 52 7 8 3 70
Creative Output 32 20 8 3 64
Skill Obsolescence 5 47 6 1 59
Social Protection 28 16 8 2 54
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
Differences in perceived stylistic/aesthetic qualities do not translate into higher monetary valuation (i.e., stylistic preference differences do not increase willingness to pay).
BDM bidding behavior of N = 117 participants combined with rating data showing stylistic differences but no corresponding increases in bids.
There is no statistically significant relationship between perceived aesthetic quality and willingness to pay for LLM outputs.
Online experiment with N = 117 participants who evaluated model outputs, rated aesthetic quality, and submitted monetary bids using a Becker-DeGroot-Marschak (BDM) mechanism; statistical tests reported as not significant.
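For readers unfamiliar with the Becker-DeGroot-Marschak (BDM) mechanism named in the entry above, the following is a minimal sketch of one elicitation round under the standard textbook rule: a price is drawn at random, and the participant buys at that price only if the stated bid is at least as high. The function names, the uniform price draw, and the `max_price` cap are illustrative assumptions, not details taken from the study.

```python
import random


def bdm_outcome(bid, price):
    """Resolve one BDM round for a stated bid and a randomly drawn price.

    Returns (purchased, amount_paid). The participant buys only when the bid
    is at least the drawn price, and then pays the drawn price, not the bid.
    """
    if bid >= price:
        return True, price
    return False, 0.0


def run_bdm_round(bid, max_price=10.0, rng=None):
    """Draw a price uniformly from [0, max_price] (an assumed distribution)
    and resolve the round with bdm_outcome."""
    rng = rng or random.Random()
    price = rng.uniform(0.0, max_price)
    return bdm_outcome(bid, price)


# Hypothetical example: a bid of 5.0 against a drawn price of 3.0.
print(bdm_outcome(5.0, 3.0))  # purchase happens at the drawn price
print(bdm_outcome(2.0, 3.0))  # bid below the price, so no purchase
```

Because the amount paid is the drawn price rather than the bid, stating one's true valuation is the best strategy, which is why BDM bids are commonly read as willingness to pay.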
The analysis identifies three major thematic areas: integration of AI in global supply chains; challenges and opportunities associated with AI adoption; and the impact of AI on decision-making and operational efficiency.
Structured synthesis of themes across 31 scholarly sources included in the qualitative literature review.
high null result Evaluating the Role of Artificial Intelligence in Optimizing... thematic areas identified in the literature
The study uses panel data of A-share listed energy-intensive firms from 2009 to 2021; measures corporate digital technology integration by counting frequency of digital-technology-related words in annual reports (text analysis); and evaluates low-carbon transformation using the LTFP method.
Methods and data description provided in the paper's abstract/summary: panel of A-share listed firms in energy-intensive industries (2009–2021); text analysis of annual reports for digital technology integration; LTFP method for low-carbon transformation measurement.
high null result The Impact of Digital Technology Integration on Low-Carbon T... study design and measurement details
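The word-frequency measure described in the entry above can be illustrated with a minimal sketch. The keyword list, the function name, and the English-language tokenization are all hypothetical; the excerpt does not give the paper's actual dictionary or preprocessing steps, and multi-word terms are not handled here.

```python
import re
from collections import Counter

# Hypothetical single-word keyword list; the paper's dictionary is not given.
DIGITAL_TERMS = {"cloud", "blockchain", "digitalization", "iot", "algorithm"}


def digital_term_frequency(report_text, terms=DIGITAL_TERMS):
    """Count occurrences of digital-technology terms in one annual report.

    Returns (total_count, per_term_counts).
    """
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", report_text.lower())
    counts = Counter(token for token in tokens if token in terms)
    return sum(counts.values()), counts


# Hypothetical report excerpt, for illustration only.
sample = "We expanded cloud services and piloted blockchain tracing; cloud costs fell."
total, per_term = digital_term_frequency(sample)
print(total, dict(per_term))
```

A firm-year score built this way is only a proxy: it measures how often a report mentions digital technology, not how deeply the firm has integrated it.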
This paper focuses on five research questions about the historical pathways, leverage points, trajectory differences, alternative projects, and socio-technical programmes related to current dominant generative AI tools and possible AGI-adjacent development.
Explicit listing of the five research questions in the paper's introduction/aims; statement of scope and focus.
high null result Pathways to AGI research_focus
The study tested Olava Extract against five frontier models.
Method statement in the paper/abstract specifying comparison with five frontier models.
high null result A Few Good Clauses: Comparing LLMs vs Domain-Trained Small L... number of comparator models
Online-safety regulation under the UK Online Safety Act and the EU Digital Services Act increasingly treats scalar metrics as compliance evidence.
Statement in paper's introduction / motivation; cites policy trend (UK Online Safety Act and EU Digital Services Act) as motivating context (policy texts referenced in paper).
Prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions.
Conceptual critique presented by the authors; no quantitative validation presented for this claim within the excerpt.
high null result Beyond Task Success: Measuring Workflow Fidelity in LLM-Base... coverage/expressiveness of evaluation metrics (TSR, HF1)
Perceived responsiveness (a functional cue) did not function as a general mediator of anchor type on trust.
Moderated mediation analyses in the randomized experiment (N = 439) found no overall mediation via perceived responsiveness across the full sample.
high null result Conditional trust pathways in live-streaming commerce: how c... trust (mediator: perceived responsiveness)
Data analysis combined quantitative analytics with qualitative sentiment analysis, while environmental impact data was collected through IoT sensors measuring energy consumption, waste generation, and carbon footprint metrics.
Methods description specifying mixed quantitative and qualitative analyses and IoT sensor measures.
high null result AI and Iot-Based Customer Behaviour Analysis for Business En... integrated analytics approach and environmental metrics collection
The authors applied machine-learning models, natural language processing, sentiment scoring, predictive dashboards, and clustering techniques to map customer preferences, purchasing patterns, and green program participation.
Methods description listing analytical techniques used (ML, NLP, sentiment scoring, dashboards, clustering).
high null result AI and Iot-Based Customer Behaviour Analysis for Business En... mapping of customer preferences, purchasing patterns, and program participation
Data collection encompassed retail kiosks, shopping apps, home sensors, and wearables over twelve months.
Methods description in the chapter explicitly listing data sources and a twelve-month collection period.
high null result AI and Iot-Based Customer Behaviour Analysis for Business En... data collection scope and duration
The study employed stratified random sampling across urban shopping centers, suburban retail outlets, and online-to-offline hybrid stores in Nigeria to represent diverse consumer demographics and shopping behaviors.
Methods section description in the chapter stating use of stratified random sampling across specified retail contexts; no numeric sample counts given in the provided text.
high null result AI and Iot-Based Customer Behaviour Analysis for Business En... sampling representativeness / coverage of consumer demographics and shopping beh...
Data analysis utilized regression modeling for performance correlations, time-series analysis for predictive maintenance patterns, and thematic analysis for qualitative interviews.
Paper methods: explicit listing of analytic techniques used (regression, time-series, thematic analysis).
high null result Green Supply Chain Optimization: AI and IoT for Ethical Reso... analytical methods applied
Secondary data encompasses sustainability reports, carbon footprint assessments, and operational performance metrics.
Paper methods: explicit listing of secondary data sources (sustainability reports, carbon footprint assessments, operational metrics).
high null result Green Supply Chain Optimization: AI and IoT for Ethical Reso... types of secondary data used
Blockchain transaction records spanning eighteen months across Nigeria were used as primary data.
Paper methods: explicit statement about 18 months of blockchain transaction records across Nigeria.
high null result Green Supply Chain Optimization: AI and IoT for Ethical Reso... blockchain transaction record timespan
The study uses IoT sensor data from forty-five facilities.
Paper methods: explicit statement that IoT sensor data were collected from 45 facilities.
high null result Green Supply Chain Optimization: AI and IoT for Ethical Reso... IoT sensor data coverage (facility count)
Primary data collection includes structured interviews with supply chain managers.
Paper methods section: primary data described as including structured interviews with supply chain managers (number of interviewees not specified).
high null result Green Supply Chain Optimization: AI and IoT for Ethical Reso... qualitative interview data from supply chain managers
The study uses mixed methods involving case studies from twelve multinational companies across the manufacturing, logistics, and retail sectors.
Paper statement of methods: explicit mention of mixed methods and case studies from 12 multinational companies across the three sectors.
high null result Green Supply Chain Optimization: AI and IoT for Ethical Reso... study sample composition (case study count and sectors)
The study constructs a tripartite evolutionary game framework composed of government regulators, leading computing power incumbents, and downstream AI innovators to analyze strategic interactions and derive evolutionarily stable strategies.
Methodological claim documented in the paper describing the model structure and analytic approach (method: formal model specification and ESS derivation).
high null result Evolutionary Dynamics of Openness, Dependence, and Regulatio... model structure (composition and methodological approach)
The analysis uses over 23 million WIOA participation records from 2017–2023.
Statement in the paper about the data coverage: administrative records of WIOA participants totaling >23 million records across 2017–2023.
high null result Did US Worker Retraining Reduce Participant Automation Expos... dataset size / coverage (WIOA participation records 2017–2023)
The paper introduces the 'Retrainability Index' to measure program outcomes using post-intervention wage recovery and shifts in Routine Task Intensity (RTI).
Methodological contribution described in the paper: formulation of a composite index (Retrainability Index) combining wage recovery and occupation RTI change to evaluate WIOA outcomes.
high null result Did US Worker Retraining Reduce Participant Automation Expos... Retrainability Index (composite of wage recovery and RTI shifts)
The study was a randomized trial of 356 clinicians generating 7,476 trust ratings.
Methods/results reported in paper specifying randomized design, N=356 clinicians, total of 7,476 trust ratings collected.
high null result Atomic Fact-Checking Increases Clinician Trust in Large Lang... number of trust ratings collected (trial metadata)
Technologically advanced firms operating in hypercompetitive markets gain little from AI adoption, reflecting diminishing returns from capability saturation.
Cluster-specific results from the multidimensional heterogeneity analysis indicating small or negligible TFP effects for clusters identified as technologically advanced and highly competitive.
high null result The Heterogeneous Effects of Artificial Intelligence on Ente... Total Factor Productivity (TFP) / productivity gains
The study employs a System GMM estimator to address potential endogeneity and uses Fixed Effects (FE) and Random Effects (RE) models for robustness checks.
Methodological statement in the paper describing the econometric approach; verifiable from the methods section (no sample size or instrumentation details provided in the supplied text).
high null result Research on the Transformation Acceleration of Financial Ins... Use of System GMM, FE, and RE estimators (methodological claim)
Prompt-driven generation (even with detailed prompting) fails to address the central problem of architectural complexity management in AI-based software engineering.
Results showing prompting did not prevent code bloat/coupling; conceptual argument reframing the problem toward architecture management rather than prompt engineering.
high null result AI-Generated Smells: An Analysis of Code and Architecture in... effectiveness of prompting on reducing architectural complexity
Neither functional correctness nor detailed prompting mitigates this architectural decay in AI-generated code.
Experimental comparisons reported in the paper where functionally correct outputs and variants produced with more detailed prompting were evaluated for structural quality and showed persistent architectural degradation.
high null result AI-Generated Smells: An Analysis of Code and Architecture in... degree of architectural decay (despite functional correctness and prompt enginee...
Existing literature has extensively examined general AI adoption, but there is limited empirical evidence on how more autonomous, agent-like systems contribute to economic outcomes.
Literature review / positioning statement in the introduction of the paper.
high null result The Economic Value of Agentic AI: A Comparative Analysis of ... state of empirical literature on agent-like AI systems
The study uses panel data from the World Bank (World Development Indicators and Enterprise Surveys) and OECD AI indicators for the period 2015 to 2024.
Explicit statement of data sources and time period in the paper's methods section.
high null result The Economic Value of Agentic AI: A Comparative Analysis of ... n/a (data coverage claim)
An AI Adoption Index was constructed using indicators of AI investment, business adoption, and innovation output as a proxy for diffusion of advanced AI capabilities (including agentic features).
Methodological description in the paper: index synthesis from OECD AI indicators and other measures of investment/adoption/innovation; exact index components and weighting described in methods (sample size not applicable).
high null result The Economic Value of Agentic AI: A Comparative Analysis of ... AI adoption/diffusion (index construction)
AI learns from both explicit knowledge (papers, documentation, structured databases) and implicit knowledge (reasoning patterns, debugging processes, intermediate steps).
Stated as a conceptual premise in the position paper; no empirical methods, sample, or quantitative data reported.
high null result Reliable AI Needs to Externalize Implicit Knowledge: A Human... use of explicit vs implicit knowledge by AI
Perceived usability and satisfaction among participants showed little difference across model sizes.
Reported participant-reported measures (usability and satisfaction) compared across model sizes 3B, 8B, and 70B for N=112 participants; paper states little difference across sizes (no numeric statistics provided in the excerpt).
high null result Seeking Information with RAG-Assistants: Does Model Size Mat... usability and satisfaction
We examine the performance of humans (N=112) assisted by RAG-assistants compared to LLM-only or LLM+RAG baselines.
Experimental comparison reported in the paper with N=112 human participants across conditions (human+RAG vs LLM-only vs LLM+RAG baseline conditions).
This work evaluates a chatbot-style assistant based on Retrieval-Augmented Generation (RAG) in a realistic multi-turn information-seeking scenario inspired by workplace settings where compliance with local legislation and secure handling of sensitive data are often key.
Reported experimental setup: a chatbot-style RAG assistant evaluated in a realistic multi-turn information-seeking scenario inspired by workplace settings (method description in the paper).
In the traditional economy, the medium of exchange is mainly the fiat currency of each country or region, and cross-border transactions must be settled at the prevailing exchange rate.
Author's descriptive statement based on general observation of monetary systems; no empirical sample or study data provided in the excerpt.
high null result RSDM: The Consensus Honest Money in the AI Era type_of_medium_of_exchange_used_for_transactions
Public inference benchmarks compare AI systems at the model and provider level, but the unit at which deployment decisions are actually made is the endpoint: the (provider, model, stock-keeping-unit) tuple at which a specific quantization, decoding strategy, region, and serving stack is exposed.
Author assertion / methodological observation about how public benchmarks report results versus how deployments are decided; no empirical test reported in the excerpt.
high null result Token Arena: A Continuous Benchmark Unifying Energy and Cogn... granularity of benchmarking vs. deployment decision unit (endpoint = provider, m...
Determining how much value individual data contributions bring to the network remains an open problem.
Literature gap claim in paper (review of existing approaches and statement of open problem; no empirical sample).
high null result Calibrating Attribution Proxies for Reward Allocation in Par... existence of methods to value individual data contributions
The review synthesizes both qualitative and quantitative studies.
Explicit methodological description in the abstract indicating mixed-methods literature synthesis.
high null result A Comprehensive Review of Technology Adoption and Its Impact... review methodology (use of qualitative and quantitative approaches)
The synthesis of qualitative and quantitative studies identifies predictors of technological integration, including organisational preparedness, economic factors, policies, and human capital.
Statement about the review's synthesized findings from multiple qualitative and quantitative studies identifying these predictors; method = mixed-methods literature synthesis.
high null result A Comprehensive Review of Technology Adoption and Its Impact... predictors of technological integration
The primary technologies covered in this review are Electronic Health Records (EHR), telemedicine, artificial intelligence (AI), and the Internet of Things (IoT).
Explicit topical scope statement in the paper (description of review subjects); based on the paper's own selection of topics for review.
high null result A Comprehensive Review of Technology Adoption and Its Impact... topics covered (EHR, telemedicine, AI, IoT)
There is little empirical exploration of how professionals making high-stakes decisions perceive their agency and level of control when working with genAI systems.
Statement about a gap in the existing literature made by the authors (literature review / framing); no sample size (gap claim).
high null result Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... availability of empirical research on professionals' perceptions of agency/contr...
We introduce a public benchmark dataset of 11,500 user queries to support our study and future research of generative search.
Authors constructed and released a public benchmark dataset containing 11,500 real-user queries (dataset release described in the paper).
high null result How Generative AI Disrupts Search: An Empirical Study of Goo... dataset size (number of queries)
AI adoption has no detectable effects on overall employment.
Difference-in-differences estimates using administrative employment totals linked to survey-reported adoption show no statistically significant change in total employment.
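The difference-in-differences logic behind the entry above can be sketched in its simplest two-group, two-period form: the change among adopters minus the change among non-adopters. This is the textbook estimator, not the paper's actual specification, and the headcounts below are hypothetical numbers chosen only to show a zero estimate.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on group means.

    (treated change over time) minus (control change over time).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical firm headcounts before and after the adoption date.
adopters_pre = [100.0, 120.0, 80.0]
adopters_post = [104.0, 125.0, 83.0]
non_adopters_pre = [90.0, 110.0, 70.0]
non_adopters_post = [94.0, 115.0, 73.0]

effect = did_estimate(adopters_pre, adopters_post,
                      non_adopters_pre, non_adopters_post)
print(effect)  # both groups grow by the same amount in this toy data
```

An estimate near zero, as in this toy data, means adopters and non-adopters followed the same employment trend. In practice the comparison also needs standard errors and a check that the two groups were on parallel trends before adoption.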
As of 2024, AI adoption remains limited: about 10 per cent of firms report current use.
Newly collected firm-level survey data linked to administrative balance sheet and employer–employee records; prevalence reported in 2024 survey.
high null result The economic impact of artificial intelligence: evidence fro... current AI adoption rate
The empirical analysis uses panel data from 3,515 Chinese A-share listed firms, totaling 20,076 firm-year observations covering 2014–2022.
Statement of data and sample in the paper (sample frame and time period explicitly given).
high null result A Data-Driven Evaluation Framework for Quantifying the Impac... sample coverage / dataset size
The literature review employs the PRISMA model to screen, identify, and synthesize available literature on AI, Machine Learning and Deep Learning in promoting managerial productivity and task efficiency.
Methodological statement in the paper's abstract (explicitly states use of PRISMA for screening and synthesis).
high null result Artificial intelligence, machine learning, and deep learning... literature search and synthesis method (PRISMA use)
Portfolios were constructed from financial news headlines for S&P 500 equities and benchmarked against mean–variance optimization (MVO), the Black–Litterman model, AI-driven optimizers, and naive diversification strategies.
Methods description: portfolio construction used financial news headlines mapped to S&P 500 equities; benchmarks explicitly listed (MVO, Black–Litterman, AI-driven optimizers, naive diversification).
high null result Few-Shot Portfolio Optimization: Can Large Language Models O... portfolio construction source and benchmarking set
We evaluated seven medium-sized open-source LLMs—Gemma-7B, Mistral-7B, Jansen Adapt-Finance-Llama2-7B, DeepSeek-R1-8B, QuantFactory Llama-3-8B-Instruct-Finance, Qwen-7B, and Llama2-7B.
Direct statement in methods: explicit list of seven evaluated models. Empirical evaluation reported on these models.
high null result Few-Shot Portfolio Optimization: Can Large Language Models O... models evaluated (model set)
The paper introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition.
Methodological contribution: introduction/definition of specific operational metrics as stated in the paper.
high null result AI Inference as Relocatable Electricity Demand: A Latency-Co... definitions/metrics for relocatability, energy and carbon return per latency rel...
The paper formulates a geo-distributed inference placement model with feasibility masks and migration frictions.
Methodological/modeling contribution described in the paper; specifies modeling components (feasibility masks, migration frictions).
high null result AI Inference as Relocatable Electricity Demand: A Latency-Co... modeling of placement feasibility including migration friction effects