The Commonplace

Evidence (7448 claims)

Adoption: 5267 claims
Productivity: 4560 claims
Governance: 4137 claims
Human-AI Collaboration: 3103 claims
Labor Markets: 2506 claims
Innovation: 2354 claims
Org Design: 2340 claims
Skills & Training: 1945 claims
Inequality: 1322 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
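A matrix like the one above can be derived from a flat table of claims with a single pivot. A minimal sketch, assuming a hypothetical two-column schema (`outcome`, `direction`) rather than the dashboard's actual data model:

```python
import pandas as pd

# Hypothetical flat claims table; the dashboard's real schema is not shown here.
claims = pd.DataFrame({
    "outcome": ["Firm Productivity", "Firm Productivity", "Error Rate", "Error Rate"],
    "direction": ["Positive", "Mixed", "Negative", "Positive"],
})

# Count claims by outcome category and direction of finding,
# with row/column totals, mirroring the Evidence Matrix layout above.
matrix = pd.crosstab(claims["outcome"], claims["direction"],
                     margins=True, margins_name="Total")
print(matrix)
```

`margins=True` appends the row and column totals, matching the Total column in the matrix.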
Results are robust across the authors' reported robustness checks.
Author statement that multiple robustness checks were performed and the main findings persist (the summary does not enumerate the checks or report their outcomes).
[low · null result] Is digital trade affecting city house prices? An artificial ... · Outcome: city-level house prices
Observable firm-level and economy-wide moments—changes in spans of control, manager share of payroll, incidence of new tasks, employment growth, and shifts in the wage distribution—can be used to test the model's predictions.
Model-implied empirical identification strategy and suggested measurable moments in the paper's discussion/implications section (theoretical prediction, not an empirical test).
[low · null result] AI as Coordination-Compressing Capital: Task Reallocation, O... · Outcome: empirical testable moments (spans of control, manager payroll share, new-task in...
This study is the first systematic presentation of factual data describing employment outcomes of Russian university AI graduates.
Authors' stated novelty claim in the paper (asserted uniqueness of systematic institutional-level employment outcome data for Russian AI graduates).
[low · null result] Employment of Graduates of Educational Programs in the Field... · Outcome: Novelty / uniqueness of compiled institutional-level dataset on employment outco...
Signal legitimacy was validated through negative control experiments.
Experimentation claim: the paper asserts that negative control experiments were run to validate that signals are not due to memorized ticker associations. The excerpt does not specify the design, number, or results of these negative controls.
[low · positive] Can Blindfolded LLMs Still Trade? An Anonymization-First Fra... · Outcome: legitimacy of predictive signals (i.e., whether performance persists under negat...
The PIER architecture (physics-informed state construction, demonstration-augmented offline data, decoupled post‑hoc safety shield) transfers to wildfire evacuation, aircraft trajectory optimization, and autonomous navigation in unmapped terrain.
Claim of transferability stated in the paper; the excerpt does not include experimental details or quantitative results for these domains.
[low · positive] Physics-informed offline reinforcement learning eliminates c... · Outcome: transferability of the PIER architecture to other domains (qualitative claim)
Pidgin should not be treated as 'broken English' but as necessary linguistic infrastructure for repaired, sustainable development; failures often reflect language-sovereignty crises requiring political solutions.
Normative claim supported by mixed-methods findings on comprehension, adoption, and legitimacy, and Critical Discourse Analysis of institutional language hierarchies.
[low · positive] From Linguistic Hybridity to Development Sovereignty: Pidgin... · Outcome: normative assessment of language status and policy implication (not a quantitati...
The paper advances a new conceptual framework called 'Developmental Sociolinguistics' and formalizes Three Laws of Linguistic Justice (Epistemic Access, Discursive Parity, Sovereignty), operationalized via a proposed 'Pidgin Protocol' for decolonized development practice.
Conceptual/theoretical contribution based on synthesis of field results and literature; proposal of framework and laws as normative prescriptions rather than empirically tested policy interventions.
[low · positive] From Linguistic Hybridity to Development Sovereignty: Pidgin... · Outcome: theoretical/conceptual contribution (framework and protocol)
Expect rising demand and wage premia for managers with hybrid capabilities (systems thinking + computational literacy), with a risk of widening returns to managerial skill heterogeneity.
Theoretical implication from predicted complementarities and task reallocation; prescriptive economic inference without empirical labor-market evidence in the paper.
[low · positive] Comparative analysis of strategic vs. computational thinking... · Outcome: labor demand, wage premia, and distributional widening across managerial skill t...
Managers’ time will be reallocated toward hybrid tasks (interpretation, oversight, ethical deliberation), increasing returns to combined strategic and computational skills.
Predictive inference from the role reconfiguration analysis and task-complementarity argument; forward-looking theoretical forecast (no empirical time-use data).
[low · positive] Comparative analysis of strategic vs. computational thinking... · Outcome: managerial time allocation (share devoted to hybrid tasks) and returns/wage prem...
Standards for provenance, labeling of AI-generated content, and interoperable evidence formats would lower verification costs and create beneficial network effects.
Policy recommendation derived from identified verification frictions and the study's analysis of data/model governance needs.
[low · positive] Fact-Checking Platforms in the Middle East: A Comparative St... · Outcome: verification cost and interoperability/network effects
There is growing market demand for AI-assisted fact-checking tools, creating opportunities for software, monitoring services, and labeled datasets.
Analytic implication drawn from findings about increasing AI use and needs for automation/labeling; based on interviews and market inference in the study.
[low · positive] Fact-Checking Platforms in the Middle East: A Comparative St... · Outcome: market demand for AI tools and labeled datasets
Regulators should consider guidelines on AI monitoring, algorithmic fairness in performance evaluations, and protections to prevent hybrid‑induced career penalties.
Policy recommendation based on conceptual assessment of risks identified in literature synthesis; not an empirical claim—no policy evaluation data provided.
[low · positive] The Sociology of Remote Work and Organisational Culture: How... · Outcome: existence/applicability of regulatory guidelines; protections against career pen...
Hybrid agency implies complementarity between GenAI and managerial/knowledge‑worker skills (curation, evaluation, coordination), potentially increasing returns to those skills while automating routine cognitive tasks—consistent with skill‑biased technological change.
Synthesis of recurring themes linking GenAI capabilities with managerial skill topics in the thematic clusters; positioned as an implication for labour demand and skill composition rather than an empirically tested effect.
[low · positive] Generative AI and the algorithmic workplace: a bibliometric ... · Outcome: expected changes in returns to managerial/knowledge‑worker skills and automation...
Policy prescriptions for developing countries to mitigate these vulnerabilities include: diversify supply sources, invest in local human capital and mid-stream capabilities, create legal/regulatory flexibility to navigate competing standards, and pursue regional cooperation to build bargaining leverage.
Policy analysis and recommendations grounded in the mechanisms identified via process tracing and comparative cases; intended as prescriptive synthesis rather than empirically demonstrated interventions in the paper. (Based on inferred best-practice interventions; no empirical evaluation/sample size provided.)
[low · positive] China-US Trade War and the Challenges for Developing Countri... · Outcome: effectiveness of policy measures (e.g., diversification index, human-capital ind...
There is demand for tooling that bridges evaluation outputs to actionable fixes (e.g., failure-mode libraries, standardized remediation templates, evaluation-to-priority mapping), signaling economic opportunities for third-party tools and consulting services.
Authors' inference based on the documented results-actionability gap and participants' descriptions of pain points; presented as a market implication rather than direct market measurement.
[low · positive] Results-Actionability Gap: Understanding How Practitioners E... · Outcome: inferred market demand for evaluation-to-action tooling/services
Firms that invest in instrumentation, cross-functional processes, and remediation levers capture more value from LLMs; organizations with better evaluation-to-action pipelines will obtain higher productivity gains and market edge.
Authors' inference from observed heterogeneity among teams in the interviews and comparison of practices in teams that reported more success converting evaluations into changes.
[low · positive] Results-Actionability Gap: Understanding How Practitioners E... · Outcome: relative productivity/value capture tied to evaluation-to-action capability (inf...
Public investments in standards, verification infrastructure, and public-interest datasets can correct market failures and support trustworthy AI.
Policy recommendation informed by governance and public-good theory and examples from the literature; the claim is prescriptive and not validated by new empirical evidence within the paper.
[low · positive] The Evolution and Societal Impact of Artificial Intelligence... · Outcome: trustworthiness of AI systems and correction of market failures via public inves...
Policy instruments (law and markets) should be designed to remain institutionally and procedurally responsive to ethical claims that resist full codification (e.g., through participatory governance, oversight mechanisms, equitable redress, care-centered procurement standards).
Normative policy prescriptions derived from the Levinasian diagnosis and case illustrations; proposed measures are normative and not empirically evaluated within the paper.
[low · positive] Examining ethical challenges in human–robot interaction usin... · Outcome: responsiveness of policy and market instruments to non-codifiable ethical claims...
Integrating Object-Oriented Ontology (OOO) and the material turn enables attention to nonhuman actors and assemblages without collapsing them into human-centered instrumentalism.
Theoretical synthesis of OOO/material-turn literature and argument that this synthesis offers analytic resources for socio-technical assemblages; illustrated conceptually in domains.
[low · positive] Examining ethical challenges in human–robot interaction usin... · Outcome: conceptual adequacy of analytic lens for nonhuman actors and assemblages (qualit...
Structured errors (SERF) enable automated recovery, reducing human-in-the-loop remediation and the marginal cost of scaling agent fleets.
Reasoned implication from the design of SERF; proposed as an expected operational benefit rather than demonstrated quantitative result in the summary.
[low · positive] Bridging Protocol and Production: Design Patterns for Deploy... · Outcome: human remediation hours per incident; MTTR; automated recovery success rate
Adaptive budgeting (ATBA) can reduce wasted latency and cost by optimizing timeouts and retries across tool chains, improving throughput and reducing per-interaction resource spend.
Algorithmic claim supported by theoretical framing and proposed reproducible benchmarks; no concrete field-level cost/throughput numbers provided in the summary.
[low · positive] Bridging Protocol and Production: Design Patterns for Deploy... · Outcome: per-interaction latency/cost, throughput, retry rates under ATBA vs. baseline
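The timeout/retry trade-off that ATBA targets can be illustrated with a small Monte Carlo model. This is a sketch under an assumed heavy-tailed latency distribution, not the paper's ATBA algorithm: a short timeout wastes budget retrying slow-but-successful calls, while a long timeout pays the full wait on genuinely stuck ones.

```python
import random

def call_with_timeout(latency_sampler, timeout: float) -> tuple[bool, float]:
    """Simulate one tool call: returns (succeeded, elapsed time)."""
    latency = latency_sampler()
    if latency <= timeout:
        return True, latency
    return False, timeout  # timed out: we still paid the full timeout

def expected_latency(latency_sampler, timeout: float, max_retries: int,
                     trials: int = 20000) -> float:
    """Monte Carlo estimate of mean end-to-end latency with retries."""
    total = 0.0
    for _ in range(trials):
        elapsed = 0.0
        for _ in range(max_retries + 1):
            ok, t = call_with_timeout(latency_sampler, timeout)
            elapsed += t
            if ok:
                break
        total += elapsed
    return total / trials

random.seed(0)
sampler = lambda: random.lognormvariate(0.0, 1.0)  # heavy-tailed tool latency
for timeout in (1.0, 2.0, 5.0):
    est = expected_latency(sampler, timeout, max_retries=2)
    print(f"timeout={timeout}: E[latency] ~ {est:.2f}")
```

Sweeping the timeout this way makes the non-monotone cost curve visible, which is the kind of optimization surface an adaptive budgeter would search per tool chain.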
Improved identity propagation (via CABP) reduces risk and compliance costs by lowering misattributed actions and improving audit trails, thereby reducing expected liability and incident-resolution overhead.
Analytical / economic argument in the implications section; no reported quantitative field results in the summary to directly measure cost reduction.
[low · positive] Bridging Protocol and Production: Design Patterns for Deploy... · Outcome: incidence of misattributed actions; audit trail completeness; incident-resolutio...
Humans who configure and teach agents gain understanding and skills themselves — learning-by-teaching generates human capital accumulation endogenous to agent deployment (bidirectional scaffolding).
Qualitative, naturalistic observations and comparative documentation of users configuring/teaching agents during the one-month study; no randomized assignment or pre/post quantitative skill testing reported.
[low · positive] When Openclaw Agents Learn from Each Other: Insights from Em... · Outcome: human skill accumulation / understanding from configuring/teaching agents
By lowering single-GPU resource requirements and improving throughput, SlideFormer can democratize domain adaptation and fine-tuning of large models on commodity single-GPU hardware (reducing the need for multi-GPU clusters).
Argumentative implication based on reported throughput, memory, and capacity improvements (e.g., enabling 123B+ models on a single RTX 4090 and reducing memory usage). This is an extrapolation from experimental results rather than a directly measured socio-economic outcome.
[low · positive] An Efficient Heterogeneous Co-Design for Fine-Tuning on a Si... · Outcome: accessibility / feasibility of single-GPU fine-tuning (qualitative economic impl...
Models trained primarily on negative constraints will generalize constraint adherence more robustly under distribution shift than models trained primarily on preference rankings.
Presented as a central, experimentally falsifiable prediction derived from the paper's theoretical account; the paper does not present large-scale empirical confirmation and recommends controlled experiments to test this.
[low · positive] Via Negativa for AI Alignment: Why Negative Constraints Are ... · Outcome: robustness of constraint adherence under distribution shift (e.g., adherence rat...
Negative examples function as counterfactual eliminators that rule out regions of behavior space, allowing a model to settle on robust acceptable behavior, whereas positive preference signals require continual calibration in a high-dimensional, context-sensitive space.
Informal/structural theoretical argument and analogy to falsification presented in the paper; no direct empirical test reported there demonstrating this exact mechanism.
[low · positive] Via Negativa for AI Alignment: Why Negative Constraints Are ... · Outcome: conceptual measure of behavioral space reduction and subsequent robustness (oper...
Regulators may prefer systems that support contestability and audit trails and could mandate argumentation-style explainability in certain sectors.
Speculative policy prediction; no regulatory statements or empirical policy adoption evidence cited.
[low · positive] Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: regulatory adoption rate of contestability/audit-trail requirements
Better contestability may reduce litigation and regulatory frictions if decisions are transparently defensible.
Speculative legal-economic claim; no case studies or empirical legal analysis provided.
[low · positive] Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: frequency/cost of litigation and regulatory disputes post-adoption of contestabl...
New service layers may emerge (argumentation-as-a-service, audit firms, explanation certification, human-in-the-loop orchestration platforms).
Speculative market/industry evolution claim based on analogous tech-service creations; no empirical evidence.
[low · positive] Argumentative Human-AI Decision-Making: Toward AI Agents Tha... · Outcome: emergence and market size of new service verticals around argumentative AI
New metrics are needed to value resilience (robustness to out-of-distribution events, graceful degradation) in procurement and contracting; performance-based contracts and regulated minimums for oversight mode selection can help align incentives.
Prescriptive recommendation based on gaps identified in procurement and contracting practice; conceptual proposal without empirical testing.
[low · positive] Resilience Meets Autonomy: Governing Embodied AI in Critical... · Outcome: existence and use of resilience metrics in procurement/contracts and resulting a...
Demand will grow for tools and services that enable oversight (auditability, explainability, safe fallbacks), creating markets for verification, certification, safety middleware, and human-in-the-loop platforms.
Market-structure and demand-side reasoning based on the proposed governance needs; forecast-style projection without empirical market-data analysis.
[low · positive] Resilience Meets Autonomy: Governing Embodied AI in Critical... · Outcome: market growth for oversight-enabling products and services (demand, number of ve...
Allocation decisions should be explicit, auditable, and adaptive — with provisions for overriding, fallbacks, and graceful degradation during unanticipated conditions.
Normative recommendation based on safety and accountability principles combined with crisis-management practices; argued via conceptual analysis and illustrative design features.
[low · positive] Resilience Meets Autonomy: Governing Embodied AI in Critical... · Outcome: auditability, adaptability, and existence of override/fallback mechanisms in dep...
Collaborative VR features can change team workflows (remote, synchronous inspection sessions), potentially lowering coordination costs across geographically distributed teams.
Paper lists collaborative multi-user sessions as a planned capability and posits organizational effects; no user studies or measurements of coordination cost savings presented.
[low · positive] iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: coordination costs / team workflow efficiency in distributed teams
Public funding for shared VR-capable data-exploration infrastructure could yield high leverage by improving returns on large observational investments.
Policy recommendation deriving from the platform and ROI arguments in the paper; no cost-benefit analysis or quantified ROI provided.
[low · positive] iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: policy leverage (ROI) from funding shared VR infrastructure
Using iDaVIE increases the usable fraction of large observational datasets by improving QC and annotation throughput, thereby raising returns to telescope investments and downstream AI efforts.
This is an inferred implication in the paper (returns-to-scale/platform effects) based on improved QC/annotation throughput; no empirical measurement of usable-fraction increases provided.
[low · positive] iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: usable fraction of observational datasets and downstream value for AI/modeling
Higher-quality labels produced via immersive inspection can reduce label noise and lower required training-data sizes for a target ML performance level.
Paper presents this as an implication/expected outcome based on improved annotation quality from immersive inspection; no empirical ML training experiments or quantitative reductions reported.
[low · positive] iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: label noise level and required training-data size for target model performance
iDaVIE demonstrably reduces cognitive load for multidimensional-data tasks compared with 2D-slice inspection.
Paper asserts reduced cognitive load and faster, more intuitive exploration as an aim and reported outcome; no formal user-study metrics, sample size, or statistical analysis provided.
[low · positive] iDaVIE v1.0: A virtual reality tool for interactive analysis... · Outcome: cognitive load (mental effort) for multidimensional-data inspection
The inverse-specification reward offers a domain-agnostic, holistic metric for fidelity to user intent and is recommended for measurement of model value/service quality.
Method introduces inverse-specification reward and asserts domain-agnostic applicability; recommendation based on its conceptual ability to recover briefs as fidelity measure (not necessarily validated across many domains).
[low · positive] Learning to Present: Inverse Specification Rewards for Agent... · Outcome: Utility of inverse-specification recovery accuracy as a fidelity metric (concept...
High-quality automated slide generation has potential to reduce time spent on business presentation creation and produce productivity gains with partial substitution of routine creative/knowledge-worker tasks.
Empirical demonstration of near-SOTA automated slide generation capability on 48 briefs; domain-level economic implication extrapolated from performance improvements.
[low · positive] Learning to Present: Inverse Specification Rewards for Agent... · Outcome: Potential time savings/productivity gains (not directly measured in the study)
Economic agents and risk models that integrate LLM outputs should weight inferences more heavily in structured domains (capacity estimates, trade flows, sanctions impact) and downweight or cross-validate politically ambiguous predictions.
Implication drawn from domain heterogeneity in model performance observed in the study (better structured-domain performance, weaker political forecasting).
[low · positive] When AI Navigates the Fog of War · Outcome: recommended weighting/usage strategy for LLM-derived inputs in economic risk mod...
Deploying BATQuant with reliable 4-bit weight/activation quantization for MXFP-capable accelerators reduces memory footprint and memory-bandwidth pressure, enabling higher throughput and lower per-token inference costs.
Argumentative / economic analysis in the paper linking reduced precision and parameter storage to lower memory/bandwidth requirements and inferred throughput/cost improvements; not presented as a direct empirical measurement of cost per token in production environments in the summary.
[low · positive] BATQuant: Outlier-resilient MXFP4 Quantization via Learnable... · Outcome: Inferred system-level outcomes: memory footprint, memory-bandwidth usage, throug...
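The memory-footprint side of this claim follows from simple precision arithmetic. A sketch with illustrative parameter counts (not figures reported for BATQuant); the 4.25 bits/weight figure assumes MX-style blocks of 32 values sharing one 8-bit scale:

```python
# Back-of-envelope weight-storage footprint at different precisions.
# The 8B parameter count is illustrative, not a figure from the paper.
params = 8e9

def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights at a given effective precision."""
    return num_params * bits_per_weight / 8

fp16 = weight_bytes(params, 16)     # baseline dense FP16 weights
mxfp4 = weight_bytes(params, 4.25)  # 4-bit values + one 8-bit scale per 32-block

print(f"FP16:  {fp16 / 1e9:.2f} GB")
print(f"MXFP4: {mxfp4 / 1e9:.2f} GB ({mxfp4 / fp16:.0%} of FP16)")
```

Roughly a 3.8x reduction in weight bytes, which is what translates into lower memory-bandwidth pressure and, on bandwidth-bound decoding workloads, into the inferred throughput and per-token cost gains.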
Investment in multimodal continual learning, scalable and reliable knowledge-editing methods, and retrieval architectures that guarantee cross-modal consistency is economically justified.
Research/prioritization recommendations based on empirical benchmark findings showing current gaps; argumentation for R&D focus areas.
[low · positive] V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... · Outcome: recommended R&D investment priorities (qualitative)
The findings argue for policies requiring disclosure of training-data timeframes and robust monitoring for time-sensitive factual accuracy in deployed systems.
Policy recommendations in the paper drawing on benchmark results and identified failure modes; prescriptive argumentation rather than empirical policy evaluation.
[low · positive] V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... · Outcome: policy recommendation advocating disclosure and monitoring (qualitative)
Models and platforms that offer transparent update mechanisms (frequent data updates, reliable RAG pipelines, clear training snapshot metadata) will have competitive advantages in the market.
Economic and market analysis in implications section recommending transparency and update mechanisms as differentiators; speculative/business-analytical evidence rather than experimental.
[low · positive] V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... · Outcome: market differentiation potential (qualitative)
The methodological template (train an ML surrogate of a costly simulator and embed it in an optimizer) generalizes beyond Doherty power amplifiers to other analog/microwave components and broader engineering domains.
Paper proposes generality of approach in implications section; no experimental demonstrations beyond the Doherty PA case are provided in the summary.
[low · positive] Deep Learning-Driven Black-Box Doherty Power Amplifier with ... · Outcome: applicability/generalizability of the surrogate+optimizer methodology to other d...
Design choices and open-weight availability are intended to align with EU AI Act expectations for regional sovereignty and compliance.
Stated intent in the paper: the authors explicitly frame design and release strategy as aiming to align with EU AI Act regulatory expectations. The summary notes this intention but provides no technical compliance proof or audits.
[low · positive] EngGPT2: Sovereign, Efficient and Open Intelligence · Outcome: claimed regulatory alignment (qualitative, declared intent rather than audited c...
EngGPT2 requires substantially less inference compute than comparable dense models—reported as roughly 20%–50% of the inference compute used by dense 8B–16B models.
Paper reports relative inference compute reductions (1/5–1/2). The summary states these percentages but no supporting FLOP counts, latency measurements, hardware, batching conditions, or benchmark-query workloads are provided.
[low · positive] EngGPT2: Sovereign, Efficient and Open Intelligence · Outcome: relative inference compute (percentage of compute or latency compared to dense b...
Embedding culturally aligned moderation and multi-layer safety orchestration can reduce regulatory frictions and increase adoption in conservative or tightly regulated markets.
Paper claims regulatory and safety economics implications from their safety/moderation architecture; this is an asserted implication rather than an empirically validated outcome in the summary.
[low · positive] Fanar 2.0: Arabic Generative AI Stack · Outcome: regulatory friction and adoption (policy/economic impact, asserted)
The methods used (data quality focus, continual pre-training, model merging, modular product stacks) are potentially transferable to other underrepresented/low-resource languages, lowering barriers to regional AI competitiveness.
Paper posits this policy/transferability implication as an argument in the 'Implications for AI Economics' section; no cross-language experimental evidence provided in the summary.
[low · positive] Fanar 2.0: Arabic Generative AI Stack · Outcome: transferability potential to other languages (qualitative)
Fanar 2.0 demonstrates that targeted data curation, continual pre-training, and model-merging can be a viable alternative to the raw-scale pre-training arms race for language-specific competitiveness.
Paper argues this implication based on achieving benchmark gains on Arabic and English using curated data (120B tokens), continual pre-training, model-merging, and a 256 H100 GPU training budget rather than massively larger-scale pre-training.
[low · positive] Fanar 2.0: Arabic Generative AI Stack · Outcome: viability of alternative development strategy vs scale (conceptual/performance c...