The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 749 195 97 889 1979
Governance & Regulation 815 391 188 121 1539
Organizational Efficiency 771 189 124 83 1177
Technology Adoption Rate 624 233 123 96 1084
Research Productivity 410 121 56 331 929
Output Quality 466 177 59 47 749
Decision Quality 320 174 75 42 618
Firm Productivity 435 55 88 20 604
AI Safety & Ethics 214 276 65 33 593
Market Structure 178 166 122 24 495
Task Allocation 206 64 70 31 376
Skill Acquisition 165 57 60 17 299
Innovation Output 201 27 41 18 288
Employment Level 105 51 107 13 278
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 116 63 42 11 232
Firm Revenue 149 46 26 3 224
Inequality Measures 44 122 49 6 221
Task Completion Time 169 29 8 12 219
Worker Satisfaction 89 61 20 12 182
Error Rate 69 91 10 2 172
Regulatory Compliance 76 68 14 5 163
Training Effectiveness 92 19 13 19 145
Wages & Compensation 77 36 25 6 144
Automation Exposure 51 54 22 12 142
Team Performance 86 17 27 9 140
Developer Productivity 94 17 14 6 132
Job Displacement 12 80 20 1 113
Hiring & Recruitment 51 7 8 3 69
Skill Obsolescence 5 45 6 1 57
Creative Output 31 16 7 2 57
Social Protection 27 16 8 2 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
Privacy-by-design architectures, secure data interoperability, and compliance automation contribute to trust, institutional legitimacy, and long-term adoption of digital health solutions.
Synthesis of literature on privacy engineering, interoperability standards, and compliance technologies presented in the review (literature review; inferred causal linkages discussed).
high positive Conceptual framework for AI governance, data privacy complia... trust / institutional legitimacy / long-term adoption rate of digital health sol...
The framework gives particular attention to algorithmic transparency, risk management, regulatory alignment, and lifecycle oversight of AI-enabled health systems operating under evolving privacy regulations (e.g., data protection laws and cross-border data governance standards).
Descriptive emphasis within the proposed framework, based on cited literatures in regulatory alignment and algorithmic governance (literature synthesis / conceptual emphasis).
high positive Conceptual framework for AI governance, data privacy complia... regulatory alignment and lifecycle oversight
This review develops a comprehensive conceptual framework that integrates AI governance principles, data privacy compliance mechanisms, and financially sustainable operational models within digital health ecosystems.
The paper's primary contribution is a proposed conceptual framework derived from synthesizing interdisciplinary literatures (conceptual framework produced by authors based on literature review).
high positive Conceptual framework for AI governance, data privacy complia... existence of an integrated governance/privacy/finance framework
The rapid expansion of digital health technologies driven by artificial intelligence has transformed healthcare delivery, clinical decision-making, and health data management.
Narrative synthesis in the review paper drawing on interdisciplinary literature in health informatics, clinical AI studies, and health data management (literature review / conceptual synthesis).
high positive Conceptual framework for AI governance, data privacy complia... clinical decision-making quality / healthcare delivery and data management
We present a simulation study analyzing the social benefits of applying ARS to agentic transactions.
Simulation study reported in the paper (study exists; abstract does not report simulation parameters, sample size, or quantitative results).
high positive Quantifying Trust: Financial Risk Management for Trustworthy... social benefits of applying ARS as assessed by simulation
This shifts trust from an implicit expectation about model behavior to an explicit, measurable, and enforceable product guarantee.
Conceptual claim about the expected effect of adopting ARS (argument presented by authors; no empirical substantiation in the abstract).
high positive Quantifying Trust: Financial Risk Management for Trustworthy... nature of trust (implicit expectation vs explicit/enforceable guarantee) in agen...
Under ARS, users receive predefined and contractually enforceable compensation in cases of execution failure, misalignment, or unintended outcomes.
Functional guarantee described as part of ARS design (contractual/payment mechanism described; no empirical testing detailed in the abstract).
high positive Quantifying Trust: Financial Risk Management for Trustworthy... predefined, contractually enforceable compensation for users upon execution fail...
ARS integrates risk assessment, underwriting, and compensation into a single transaction framework that protects users when interacting with agents.
Design description of ARS in the paper (architectural/design claim; no empirical validation reported in the abstract).
high positive Quantifying Trust: Financial Risk Management for Trustworthy... user protection in agent interactions via integrated risk assessment, underwriti...
We propose a complementary framework based on risk management: the Agentic Risk Standard (ARS), a payment settlement standard for AI-mediated transactions.
Framework proposal described in the paper (design/proposal; implementation referenced).
high positive Quantifying Trust: Financial Risk Management for Trustworthy... existence of the ARS framework (payment settlement standard integrating risk man...
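To make the underwriting-plus-compensation idea concrete, here is a minimal Python sketch of how a premium and a settlement could be computed for one agent transaction. It is illustrative only: the class `AgentTransaction`, the functions `underwrite` and `settle`, the pricing rule (expected payout times a loading factor), and the `loading` default are all hypothetical and are not taken from the ARS specification, whose details the abstract does not give.

```python
from dataclasses import dataclass


@dataclass
class AgentTransaction:
    """Hypothetical agent-mediated transaction (illustrative fields only)."""
    amount: float        # what the user pays for the agent's task
    compensation: float  # predefined payout if the task fails or misfires
    p_failure: float     # assessed probability of failure or misalignment


def underwrite(tx: AgentTransaction, loading: float = 1.2) -> float:
    """Price a premium as expected payout times a loading factor (assumed rule)."""
    if not 0.0 <= tx.p_failure <= 1.0:
        raise ValueError("p_failure must be a probability")
    return tx.p_failure * tx.compensation * loading


def settle(tx: AgentTransaction, premium: float, succeeded: bool) -> dict:
    """Net flows for user and underwriter once the outcome is known."""
    payout = 0.0 if succeeded else tx.compensation
    return {
        "user_net": -tx.amount - premium + payout,
        "underwriter_net": premium - payout,
    }
```

Under these assumptions, a task with a 2% assessed failure probability and a 50-unit guaranteed payout carries a premium of 1.2 units; if the task fails, the user recovers the 50 units and the underwriter absorbs the loss.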
Security evaluation across 135 test cases demonstrates 87.5% accuracy on static code safety analysis with zero false positives.
Security evaluation reported in paper across 135 test cases with reported accuracy and false positive rate.
high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... static code safety analysis accuracy and false positive rate
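As an illustration of what static code safety analysis can mean for generated Python, the sketch below parses a program without running it and flags imports of sensitive modules and calls to dynamic-execution builtins. The deny-lists `FORBIDDEN_MODULES` and `FORBIDDEN_CALLS` and the function `static_safety_violations` are hypothetical; the paper's analyzer and its rules are not described in this excerpt, and a name-based check like this one is easy to evade (for example via `getattr`), so it is a sketch rather than a security control.

```python
import ast

# Hypothetical deny-lists for illustration; not the paper's rule set.
FORBIDDEN_MODULES = {"os", "subprocess", "socket", "shutil"}
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__", "open"}


def static_safety_violations(source: str) -> list[str]:
    """Parse `source` without executing it and list policy violations."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. `import os.path` -> top-level package "os"
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"from {node.module} import")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by bare name are caught (e.g. eval(x)).
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"call {node.func.id}")
    return violations
```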
Security evaluation across 135 test cases demonstrates 96.7% accuracy on prompt injection detection.
Security evaluation reported in paper across 135 test cases with reported accuracy metric.
high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... prompt injection detection accuracy
On document intelligence (DocILE), Code Factory achieves the highest line item recognition accuracy (LIR: 80.4%).
Empirical evaluation reported on DocILE dataset of 5,680 invoices; LIR metric reported at 80.4% and described as the highest among compared variants.
high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... line item recognition accuracy (LIR)
Compiled AI reduces token consumption by 57x at 1,000 transactions.
Empirical token-consumption comparison reported in paper (scaling example at 1,000 transactions).
Compiled AI breaks even with runtime inference at approximately 17 transactions.
Cost/efficiency comparison reported in evaluation (function-calling context); break-even point stated in paper.
high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... cost trade-off / break-even transaction count
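The break-even figure follows from simple amortization: compiled AI pays a one-off generation cost, after which each transaction costs few or no tokens, while runtime inference pays on every transaction. The sketch below encodes that idealized model; the function names and the token counts in the usage lines are hypothetical, chosen only so that break-even lands at 17 transactions as reported.

```python
def break_even_transactions(compile_tokens: float, runtime_tokens_per_tx: float,
                            compiled_tokens_per_tx: float = 0.0) -> float:
    """Solve compile_tokens + n * compiled_per_tx == n * runtime_per_tx for n."""
    saving_per_tx = runtime_tokens_per_tx - compiled_tokens_per_tx
    if saving_per_tx <= 0:
        raise ValueError("compiled path never breaks even")
    return compile_tokens / saving_per_tx


def token_ratio(n: int, compile_tokens: float, runtime_tokens_per_tx: float,
                compiled_tokens_per_tx: float = 0.0) -> float:
    """Runtime-inference tokens divided by compiled-AI tokens after n transactions."""
    return (n * runtime_tokens_per_tx) / (compile_tokens + n * compiled_tokens_per_tx)


# Hypothetical counts: 17,000 tokens to compile, 1,000 tokens per runtime call.
print(break_even_transactions(17_000, 1_000))
print(round(token_ratio(1_000, 17_000, 1_000), 1))
```

With these numbers break-even is 17 transactions and the ratio at 1,000 transactions is about 59x. That is in the same range as the reported 57x, but the paper's figure comes from measured token counts that this idealized model does not capture.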
On function-calling, compiled AI achieves 96% task completion with zero execution tokens.
Empirical evaluation on the BFCL function-calling tasks (reported n=400).
We introduce a system architecture for constrained LLM-based code generation, a four-stage generation-and-validation pipeline that converts probabilistic model output into production-ready code artifacts, and an evaluation framework measuring operational metrics including token amortization, determinism, reliability, security, and cost.
Paper states these three contributions as part of the authors' work (descriptive claim about methods and artifacts presented).
high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... availability of system architecture, pipeline, and evaluation framework (methodo...
By constraining generation to narrow business-logic functions embedded in validated templates, compiled AI trades runtime flexibility for predictability, auditability, cost efficiency, and reduced security exposure.
Conceptual/systems claim made in paper describing design trade-offs of the compiled AI paradigm (no single empirical test cited in the excerpt).
high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... predictability, auditability, cost efficiency, security exposure (design trade-o...
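The trade-off can be sketched in a few lines: fixed, reviewed scaffolding with one narrow slot for business logic, filled once and then executed without further model calls. In this Python sketch the template, the function name `apply_discount`, and the string standing in for LLM output are hypothetical; the paper's actual templates and generation stages are not described in this excerpt.

```python
from string import Template

# Hypothetical validated template: fixed, reviewed scaffolding with a single
# slot ($logic) for a narrow business-logic expression.
TEMPLATE = Template(
    "def apply_discount(order_total):\n"
    "    result = $logic\n"
    "    return round(result, 2)\n"
)


def compile_workflow(generated_logic: str):
    """Fill the template once and return the compiled function.

    `generated_logic` stands in for LLM output produced at build time; in a
    real pipeline it would pass validation before reaching this step.
    """
    source = TEMPLATE.substitute(logic=generated_logic)
    namespace: dict = {}
    exec(compile(source, "<compiled-workflow>", "exec"), namespace)
    return namespace["apply_discount"]


# Hypothetical "LLM output", generated once rather than on every request.
apply_discount = compile_workflow("order_total * 0.9 if order_total >= 100 else order_total")
print(apply_discount(120.0), apply_discount(50.0))
```

Because the generated expression is frozen at compile time, repeated calls with the same input return the same result; the cost is that any behaviour outside the slot requires regenerating and revalidating the code.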
Experimental evidence confirms that AI tools raise worker productivity.
Statement in paper referencing experimental studies (no specific study, method, or sample size reported in the excerpt).
A lightweight interception layer captures and blocks only the final submission request, ensuring safe evaluation without real-world side effects.
Paper describes an interception layer in the evaluation infrastructure that prevents actual final submissions on production sites.
high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? evaluation_safety (prevention of real-world side effects)
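The interception idea can be sketched as a filter that lets navigation and intermediate writes through but captures the single request that would finalize a task. The `Request` type, the `intercept` function, and the URL patterns in `FINAL_SUBMIT_PATTERNS` are hypothetical; ClawBench's actual layer and how it identifies the final request are not specified in this excerpt.

```python
import re
from dataclasses import dataclass


@dataclass
class Request:
    method: str
    url: str


# Hypothetical per-task patterns identifying the request that would finalize
# a task on the live site (e.g. placing an order or submitting an application).
FINAL_SUBMIT_PATTERNS = [
    re.compile(r"/checkout/place-order$"),
    re.compile(r"/apply/submit$"),
]


def intercept(req: Request, captured: list) -> bool:
    """Return True if the request may be forwarded to the live site.

    Write requests whose path matches a final-submission pattern are recorded
    in `captured` (for later grading) and blocked; everything else passes.
    """
    is_write = req.method.upper() in {"POST", "PUT", "PATCH"}
    path = req.url.split("?", 1)[0]  # ignore the query string when matching
    if is_write and any(p.search(path) for p in FINAL_SUBMIT_PATTERNS):
        captured.append(req)
        return False
    return True
```

Matching on method plus path keeps the block narrow, so earlier steps in the workflow still reach the site and only the finalizing request is held back.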
Unlike existing benchmarks that evaluate agents in offline sandboxes with static pages, ClawBench operates on production websites, preserving the full complexity, dynamic nature, and challenges of real-world web interaction.
Methodological description in the paper: evaluation occurs on live (production) websites rather than offline static sandboxes; supported by reported coverage of 144 live platforms.
high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? evaluation_realism / fidelity to real-world interactions
The tasks in ClawBench require demanding capabilities beyond existing benchmarks, such as extracting relevant information from user-provided documents, navigating multi-step workflows across diverse platforms, and completing write-heavy operations like filling many detailed forms correctly.
Paper description of task types and the capabilities they require; based on the design and composition of the 153 tasks.
high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? task_complexity / capability_requirements
ClawBench spans 144 live platforms across 15 categories.
Paper explicitly reports coverage across 144 production websites and 15 task categories (dataset description).
high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? benchmark_scope (platforms and categories)
ClawBench is an evaluation framework of 153 simple tasks that people need to accomplish regularly in their lives and work.
Paper states the benchmark comprises 153 tasks (dataset description).
high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? benchmark_scope (number of tasks)
When used appropriately, LLMs are powerful tools that can expand the frontier of empirical economics.
Normative conclusion in the abstract based on the paper's proposed framework and discussion; presented as an overall benefit but not supported by empirical outcomes or quantified gains in the excerpt.
high positive Large Language Models: An Applied Econometric Framework expansion of empirical economics research capabilities
For estimation problems—automating the measurement of economic concepts for downstream analysis—valid downstream inference requires combining LLM outputs with a small validation sample to deliver consistent and precise estimates.
Methodological claim in the abstract advocating use of a small validation sample together with LLM outputs to achieve consistent/precise estimates; no empirical demonstration or sample-size specification provided in the excerpt.
high positive Large Language Models: An Applied Econometric Framework consistency and precision of downstream estimates derived from LLM-measured vari...
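One standard way to combine LLM outputs with a small validation sample is to correct the LLM-based estimate by the average LLM error measured on hand-labelled data, as in prediction-powered inference. The Python simulation below illustrates this for a population share; the bias pattern, sample sizes, and the function name `debiased_mean` are assumptions for illustration, and the paper's own estimator may differ.

```python
import random


def debiased_mean(pred_all, pred_val, true_val):
    """LLM-label mean on the full sample, corrected by the mean LLM error
    (true minus predicted) on a randomly drawn, hand-labelled subsample."""
    correction = sum(t - p for t, p in zip(true_val, pred_val)) / len(true_val)
    return sum(pred_all) / len(pred_all) + correction


rng = random.Random(0)
N, N_VAL = 10_000, 500
truth = [1 if rng.random() < 0.3 else 0 for _ in range(N)]
# Simulated LLM labels: positives always detected, 20% of negatives mislabelled as 1.
pred = [1 if y == 1 or rng.random() < 0.2 else 0 for y in truth]
val_idx = rng.sample(range(N), N_VAL)  # random validation subsample

true_mean = sum(truth) / N
naive = sum(pred) / N
corrected = debiased_mean(pred, [pred[i] for i in val_idx], [truth[i] for i in val_idx])
print(f"true={true_mean:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

In this simulation the naive share of LLM labels is biased upward by roughly 0.14, while the corrected estimate recovers the true share up to sampling noise from the 500 validation labels. The correction is unbiased only if the validation subsample is drawn at random from the full sample.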
The paper provides an econometric framework for realizing the potential of LLMs in two empirical uses: prediction problems and estimation problems.
Claim of contribution in the abstract describing a methodological framework (the excerpt reports the existence of the framework but does not detail empirical validation or sample sizes).
high positive Large Language Models: An Applied Econometric Framework methodological framework for empirical use of LLMs
Researchers can now revisit old questions and tackle novel ones with rich data using LLMs.
Asserted in the paper's abstract as a consequence of LLM-enabled large-scale text analysis; no empirical demonstration or quantified case described in the excerpt.
high positive Large Language Models: An Applied Econometric Framework ability to (re)address research questions using textual data
Large language models (LLMs) enable researchers to analyze text at unprecedented scale and minimal cost.
Stated as an assertion in the paper's abstract/summary; based on the authors' framing of LLM capabilities (no empirical sample, experiment, or quantified result provided in the excerpt).
high positive Large Language Models: An Applied Econometric Framework ability to analyze text at scale and cost
There is an urgent need for targeted workforce planning, investment in human capital, and collaboration between industry, government, and educational institutions to manage AI-driven labour market transformations.
Policy conclusion drawn from the paper's theoretical framing (SBTC, Human Capital Theory) and the empirical patterns identified in secondary data and official reports (2020–2024).
high positive Artificial Intelligence and labour market polarisation in In... policy interventions for workforce planning and reskilling
Comparative insights from the United Kingdom show that more systematic AI adoption and structured training programs mitigate workforce displacement.
Cross-country comparison using secondary data and official reports (2020–2024) highlighting the UK's more systematic AI adoption and structured training, which the paper presents as reducing displacement risk.
high positive Artificial Intelligence and labour market polarisation in In... mitigation of workforce displacement via structured training/AI adoption strateg...
AI adoption is increasing demand for new competencies.
Secondary sources and official reports (2020–2024) cited in the paper document emerging skill requirements and employer demand for new competencies.
high positive Artificial Intelligence and labour market polarisation in In... demand for new skills/competencies
AI adoption is driving growth in high-wage occupations.
Analysis of secondary data and official reports (2020–2024) reporting expansion of high-wage occupational categories in India.
high positive Artificial Intelligence and labour market polarisation in In... occupational growth in high-wage jobs
AI adoption disproportionately benefits high-skilled workers.
The paper cites theoretical frameworks (Skill Biased Technological Change and Human Capital Theory) and analyses of secondary data and official reports from 2020–2024 showing relative gains for high-skill occupations.
high positive Artificial Intelligence and labour market polarisation in In... wages and employment of high-skilled workers
All data, code, and model responses are open-sourced.
Statement in the paper asserting that data, code, and model outputs are publicly released.
high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... availability of study materials (data, code, responses)
78.7% of observed AI interactions are augmentation, not automation.
Empirical classification of AI interactions (from cross-referenced Anthropic Economic Index interactions/tasks) reported as a percentage in the paper.
high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... share of AI interactions classified as augmentation vs automation
The study cross-references the SAFI benchmark with real-world AI adoption data from the Anthropic Economic Index covering 756 occupations and 17,998 tasks.
Data linkage described in the paper: use of Anthropic Economic Index as real-world AI adoption dataset (numbers reported in text).
high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... occupations and tasks coverage in cross-reference dataset
The benchmark covers 263 text-based tasks spanning all 35 skills in the U.S. Department of Labor's O*NET taxonomy.
Reported dataset construction in the paper: 263 tasks mapped to 35 O*NET skills.
high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... coverage of O*NET skills by benchmark tasks
We present the Skill Automation Feasibility Index (SAFI), benchmarking four frontier LLMs -- LLaMA 3.3 70B, Mistral Large, Qwen 2.5 72B, and Gemini 2.5 Flash -- across 263 text-based tasks spanning all 35 skills in the U.S. Department of Labor's O*NET taxonomy (1,052 total model calls, 0% failure rate).
Empirical benchmark executed by the authors: 263 text-based tasks mapped to 35 O*NET skills, 4 LLMs, 1,052 total model calls reported, and reported 0% failure rate.
high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... benchmark coverage and execution success (model calls and failure rate)
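The reported call count is consistent with each of the four models being queried once per task (an inference from the arithmetic, not a statement in the excerpt):

```latex
263 \text{ tasks} \times 4 \text{ models} = 1{,}052 \text{ model calls}
```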
The paper argues for a fundamental decoupling of semantic intent from human-readable representation.
Conceptual/design claim made by the authors as a recommended shift in representation strategy for agentic consumers; presented as argumentation rather than empirically tested in abstract.
high positive Beyond Human-Readable: Rethinking Software Engineering Conve... alignment between semantic intent encoding and human-readable formats
We extend the semantic density principle to propose rehabilitation of classical anti-patterns and introduce the program skeleton concept for agentic code navigation.
Design/position claims and proposed constructs presented in the paper (program skeleton concept and re-evaluation of anti-patterns) without empirical validation reported in abstract.
high positive Beyond Human-Readable: Rethinking Software Engineering Conve... suitability of classical anti-patterns and program skeletons for agentic navigat...
Aggressive compression reduced input tokens by 17%.
Reported numeric result from the controlled experiment comparing compressed logs to other conditions; sample size not specified in abstract.
We propose a key design principle: semantic density optimization, eliminating tokens that carry zero information while preserving tokens that carry high semantic value.
Proposal/design principle presented in the paper; theoretical justification provided and (per paper) subsequently validated by experiment.
high positive Beyond Human-Readable: Rethinking Software Engineering Conve... information/content efficiency of token representations for agentic consumers
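A minimal reading of semantic density optimization for logs is to remove fields that are identical on every line, since they carry no information once stated, and to state them once in a header. In the Python sketch below, whitespace-separated fields stand in for tokens and the constant-field rule is an assumption; it is not the paper's compression method and does not reproduce its 17% figure.

```python
def compress_logs(lines: list[str]) -> tuple[str, list[str]]:
    """Split fields on whitespace; fields identical on every line are emitted
    once in a header, and only the varying fields are kept per line."""
    if not lines:
        return "", []
    rows = [line.split() for line in lines]
    width = min(len(r) for r in rows)
    constant = [i for i in range(width) if all(r[i] == rows[0][i] for r in rows)]
    header = " ".join(rows[0][i] for i in constant)
    body = [" ".join(tok for i, tok in enumerate(r) if i not in constant) for r in rows]
    return header, body


logs = [
    "2024-05-01 INFO svc=api req=1 status=200",
    "2024-05-01 INFO svc=api req=2 status=500",
    "2024-05-01 INFO svc=api req=3 status=200",
]
header, body = compress_logs(logs)
```

On this three-line example the header keeps the shared date, level, and service name once, and each line shrinks from five fields to two.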
These empirical findings provide a reference for governments worldwide seeking to optimise artificial intelligence policies for low-carbon urban development.
Paper conclusion interpreting results as policy-relevant and generalisable lessons for governments; based on observed positive association between NAIDPZ and urban GEE.
high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)
The impact of the NAIDPZ policy on urban GEE is positively moderated by government attention and public environmental attention.
Reported moderation analysis showing interaction effects between the treatment indicator and measures of government attention and public environmental attention within the DiD framework.
high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)
The composite NAIDPZ policy effect increases GEE mainly by promoting green technological innovation and optimising the industrial structure.
Mechanism analysis reported in the paper (channel/mediation tests) showing that indicators of green technological innovation and industrial structure optimisation account for much of the policy effect on GEE.
high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)
The policy effect on GEE is stronger in inland cities, central-region cities, and non-resource-based cities.
Reported heterogeneity/subgroup analysis within the staggered DiD framework comparing effects across geographic regions (inland vs. others, central vs. others) and city types (non-resource-based vs. resource-based) in the 267-city sample.
high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)
The NAIDPZ policy significantly improves urban green economic efficiency (GEE).
Estimated treatment effect from staggered DiD on the 267-city panel (2007–2023) with reported statistical significance and multiple robustness checks mentioned.
high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)
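Staggered difference-in-differences designs build on the canonical 2×2 comparison of treated and control units before and after treatment. The Python sketch below computes that basic estimate from group means; the GEE values are hypothetical, and the paper's staggered estimator, controls, and fixed effects are not reproduced here.

```python
from statistics import mean


def did_2x2(records):
    """records: iterable of (treated, post, outcome).
    Returns (treated post - treated pre) - (control post - control pre)."""
    cells = {(t, p): [] for t in (True, False) for p in (True, False)}
    for treated, post, y in records:
        cells[(bool(treated), bool(post))].append(y)
    # Each of the four cells must be non-empty, or mean() raises StatisticsError.
    m = {key: mean(values) for key, values in cells.items()}
    return (m[True, True] - m[True, False]) - (m[False, True] - m[False, False])


# Hypothetical city-level GEE scores: (treated, post-policy, GEE).
records = [
    (True, False, 0.50), (True, False, 0.52), (True, True, 0.60), (True, True, 0.62),
    (False, False, 0.48), (False, False, 0.50), (False, True, 0.53), (False, True, 0.55),
]
print(round(did_2x2(records), 3))
```

Here treated cities improve by 0.10 and control cities by 0.05, so the estimated effect is 0.05. With staggered adoption, naive pooled regressions can use already-treated cities as controls, a known pitfall that dedicated staggered-DiD estimators are designed to address.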
ImplicitMemBench reframes evaluation from 'what agents recall' to 'what they automatically enact'.
Paper framing statement positioning the benchmark's conceptual contribution as shifting evaluation focus to implicit, automatic behavior rather than explicit recall.
high positive ImplicitMemBench: Measuring Unconscious Behavioral Adaptatio... evaluation framing / measurement focus
Top performers were DeepSeek-R1 (65.3%), Qwen3-32B (64.1%), and GPT-5 (63.0%).
Paper lists top model names with reported overall percentage scores from the benchmark evaluation.
high positive ImplicitMemBench: Measuring Unconscious Behavioral Adaptatio... overall accuracy on the implicit memory benchmark
The benchmark's 300-item suite employs a unified Learning/Priming-Interfere-Test protocol with first-attempt scoring.
Paper states the suite size (300 items) and describes a unified Learning/Priming-Interfere-Test protocol and that scoring is done on first attempts.