The Commonplace

Evidence (13827 claims)

Adoption: 8454 claims
Productivity: 7544 claims
Governance: 6789 claims
Human-AI Collaboration: 6327 claims
Org Design: 4126 claims
Innovation: 4058 claims
Labor Markets: 3520 claims
Skills & Training: 2924 claims
Inequality: 2057 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome | Positive | Negative | Mixed | Null | Total
Other | 749 | 195 | 97 | 889 | 1979
Governance & Regulation | 815 | 391 | 188 | 121 | 1539
Organizational Efficiency | 771 | 189 | 124 | 83 | 1177
Technology Adoption Rate | 624 | 233 | 123 | 96 | 1084
Research Productivity | 410 | 121 | 56 | 331 | 929
Output Quality | 466 | 177 | 59 | 47 | 749
Decision Quality | 320 | 174 | 75 | 42 | 618
Firm Productivity | 435 | 55 | 88 | 20 | 604
AI Safety & Ethics | 214 | 276 | 65 | 33 | 593
Market Structure | 178 | 166 | 122 | 24 | 495
Task Allocation | 206 | 64 | 70 | 31 | 376
Skill Acquisition | 165 | 57 | 60 | 17 | 299
Innovation Output | 201 | 27 | 41 | 18 | 288
Employment Level | 105 | 51 | 107 | 13 | 278
Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276
Consumer Welfare | 116 | 63 | 42 | 11 | 232
Firm Revenue | 149 | 46 | 26 | 3 | 224
Inequality Measures | 44 | 122 | 49 | 6 | 221
Task Completion Time | 169 | 29 | 8 | 12 | 219
Worker Satisfaction | 89 | 61 | 20 | 12 | 182
Error Rate | 69 | 91 | 10 | 2 | 172
Regulatory Compliance | 76 | 68 | 14 | 5 | 163
Training Effectiveness | 92 | 19 | 13 | 19 | 145
Wages & Compensation | 77 | 36 | 25 | 6 | 144
Automation Exposure | 51 | 54 | 22 | 12 | 142
Team Performance | 86 | 17 | 27 | 9 | 140
Developer Productivity | 94 | 17 | 14 | 6 | 132
Job Displacement | 12 | 80 | 20 | 1 | 113
Hiring & Recruitment | 51 | 7 | 8 | 3 | 69
Skill Obsolescence | 5 | 45 | 6 | 1 | 57
Creative Output | 31 | 16 | 7 | 2 | 57
Social Protection | 27 | 16 | 8 | 2 | 53
Labor Share of Income 17 17 17 51
Worker Turnover 11 12 3 26
Industry 1 1
The paper integrates Nash Bargaining Solution into Multi-Agent Deep Deterministic Policy Gradient, creating Nash-MADDPG, where Nash bargaining determines efficient bilateral pricing.
Methodological claim describing the proposed algorithm and role of Nash bargaining (as stated in abstract).
high | positive | Incentive-Aligned Vehicle-to-Vehicle Energy Trading via Nash... | bilateral pricing efficiency (algorithmic pricing)
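As a rough illustration of what a Nash bargaining price means for a single bilateral trade, the sketch below uses the textbook symmetric case with linear utilities, where the price that maximizes the product of both parties' surpluses is the midpoint of their reservation prices. The function names and the example reservation prices are assumptions made for illustration; the paper's actual utility model and its coupling with MADDPG are not described in this excerpt.

```python
def nash_price(seller_reserve, buyer_reserve):
    """Symmetric Nash bargaining price under linear utilities (hypothetical helper).

    Maximizes (price - seller_reserve) * (buyer_reserve - price),
    whose maximizer is the midpoint of the two reservation prices.
    """
    if buyer_reserve < seller_reserve:
        return None  # no price leaves both sides better off than not trading
    return (seller_reserve + buyer_reserve) / 2


def nash_product(price, seller_reserve, buyer_reserve):
    """Product of seller surplus and buyer surplus at a given price."""
    return (price - seller_reserve) * (buyer_reserve - price)


# Illustrative reservation prices per kWh (assumed values, not from the paper).
seller, buyer = 0.10, 0.30

# Numerical check: a grid search over feasible prices agrees with the closed form.
grid = [seller + i * (buyer - seller) / 1000 for i in range(1001)]
best_on_grid = max(grid, key=lambda p: nash_product(p, seller, buyer))

print(nash_price(seller, buyer), best_on_grid)
```

With unequal bargaining power the maximizer shifts toward one party's reservation price; that generalization is left out here.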
Nash-MADDPG achieves superior fairness, showing a 40.1% improvement in Jain's index.
Reported fairness metric (Jain's index) improvement in the paper's evaluation over a 30-day horizon (abstract statement).
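For readers unfamiliar with the metric, Jain's index for n allocations x_1..x_n is (sum of x)^2 / (n * sum of x^2); it equals 1 when everyone receives the same amount and falls to 1/n when one participant receives everything. A minimal sketch follows; the function name and sample allocations are illustrative and not taken from the paper.

```python
def jains_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    values = list(allocations)
    if not values:
        raise ValueError("need at least one allocation")
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0:
        # Convention assumed here: an all-zero allocation counts as perfectly equal.
        return 1.0
    return total * total / (len(values) * sum_of_squares)


# Illustrative allocations (assumed values).
equal_split = jains_index([5.0, 5.0, 5.0, 5.0])    # every participant gets the same
winner_takes_all = jains_index([10.0, 0.0, 0.0, 0.0])  # one participant gets everything

print(equal_split, winner_takes_all)
```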
Nash-MADDPG yields a 62.9% improvement in trading volume over Double Auction.
Reported comparison versus Double Auction in the paper's 30-day continuous-operation evaluation (abstract statement).
Nash-MADDPG improves social welfare by 61.6% over Double Auction in evaluation over 30-day continuous operation.
Simulation evaluation reported in the paper: 30-day continuous operation comparison against Double Auction baseline (as stated in abstract).
The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone.
Authors' stated contributions/anticipated utility of their framework (conceptual claim about the expected usefulness of their mapping).
high | positive | Addressing the Synergy Gap: The Six Elements of the Design S... | utility for practitioners, researchers, and evaluators regarding human-AI combin...
Closing the synergy gap requires explicit engagement with a wider design space.
Prescriptive conclusion from the authors advocating broader design engagement (conceptual recommendation based on their framework).
high | positive | Addressing the Synergy Gap: The Six Elements of the Design S... | likelihood of closing the synergy gap given broader design engagement
Meta-analyses show that AI assistance tends to improve human performance compared to working alone.
Reference to existing meta-analyses in the literature reported by the authors (meta-analytic evidence aggregated across studies; no specific meta-analysis names, sample sizes, or quantitative pooled effects provided in the excerpt).
high | positive | Addressing the Synergy Gap: The Six Elements of the Design S... | human performance with AI assistance versus human performance alone
AI is now embedded in healthcare, finance, policy, and many other domains.
Statement in the paper's introduction/abstract summarizing the current deployment of AI across domains (literature observation, no specific empirical study or sample size cited).
high | positive | Addressing the Synergy Gap: The Six Elements of the Design S... | embedding/adoption of AI in multiple domains
The paper proposes a multi-layered governance framework combining core regulatory requirements with supporting ecosystem measures to ensure accountability, security, and transparency in the age of autonomous financial agency.
Policy proposal presented in the paper (concluding recommendation summarized in the abstract).
high | positive | AI Agents in Payments: Applications, Risks and Regulations | proposed governance framework for accountability, security, transparency
These results position PRISM-Coach as a practical blueprint for privacy-by-design adaptive learning systems in everyday wellness.
Authors' interpretation in paper based on the implemented system and evaluation results (telemetry + survey + matched comparison).
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | suitability of PRISM-Coach as a blueprint for privacy-by-design adaptive learnin...
92% report increased privacy confidence after transparency disclosures.
In-app needs assessment survey reported in paper; percentage stated (92%). Sample size for survey not given in abstract.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | self-reported privacy confidence after disclosures
Survey results show that 82% report positive perceived benefit.
In-app needs assessment survey reported in paper; percentage stated (82%). Sample size for survey not given in abstract.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | self-reported perceived benefit
In the matched comparison, AI-enabled workflow yields higher average weight loss: 5.2 kg versus 3.1 kg.
Matched 19-week comparison window reported in paper; average weight loss numbers provided (5.2 kg vs 3.1 kg); sample size not stated in abstract.
In a matched 19-week comparison window, the AI-enabled workflow achieves adherence of 0.74 versus 0.48 under static grouping.
Matched 19-week comparison window reported in paper; comparison of AI-enabled workflow vs static grouping; sample size for comparison not stated in abstract.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | adherence (AI-enabled workflow vs static grouping)
At the population level, daily check-in adherence increases from 0.35 to 0.68.
Three years of telemetry from ~2,800 users reported in paper (population-level metrics).
PRISM-Coach was instantiated in a commercially deployed lifestyle coaching platform and evaluated using three years of telemetry from approximately 2,800 users and an in-app needs assessment survey.
Reported deployment and evaluation details in paper; telemetry period = 3 years; approximate user count = 2,800; survey described.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | deployment and evaluation dataset (telemetry + survey)
A human-in-the-loop coaching assistant generates de-identified summaries and draft messages without sending raw PII or PHI to external AI services.
System design and implementation described; claimed as part of instantiated PRISM-Coach deployment.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | prevention of PII/PHI leakage to external AI services
The system uses a privacy-constrained contextual bandit to assign users to eligible peer groups under coach-capacity and stability constraints.
Algorithmic method described in paper; implemented in the deployed system.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | peer-group assignment (privacy-constrained contextual bandit performance)
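To make the idea of a constrained contextual bandit concrete, here is a minimal sketch under stated assumptions. It swaps in a simple epsilon-greedy policy over running mean rewards, uses a coarse context label in place of any identifying features, and enforces only eligibility and remaining group capacity. The class name, method names, context labels, and reward values are all hypothetical; the paper's actual algorithm, privacy mechanism, and stability constraints are not described in this excerpt and are not modeled here.

```python
import random
from collections import defaultdict


class ConstrainedBandit:
    """Epsilon-greedy assignment over eligible groups with free capacity (illustrative)."""

    def __init__(self, capacities, epsilon=0.1, seed=0):
        self.capacity = dict(capacities)   # remaining seats per group
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.count = defaultdict(int)      # observations per (context, group)
        self.mean = defaultdict(float)     # running mean reward per (context, group)

    def assign(self, context, eligible):
        """Pick a group for a coarse, de-identified context label, or None if all are full."""
        open_groups = [g for g in eligible if self.capacity.get(g, 0) > 0]
        if not open_groups:
            return None
        if self.rng.random() < self.epsilon:
            choice = self.rng.choice(open_groups)                              # explore
        else:
            choice = max(open_groups, key=lambda g: self.mean[(context, g)])   # exploit
        self.capacity[choice] -= 1
        return choice

    def update(self, context, group, reward):
        """Fold an observed reward (for example, an adherence score) into the running mean."""
        key = (context, group)
        self.count[key] += 1
        self.mean[key] += (reward - self.mean[key]) / self.count[key]


# Usage with assumed group names and capacities; epsilon=0.0 makes choices deterministic.
bandit = ConstrainedBandit({"A": 1, "B": 2}, epsilon=0.0)
bandit.update("low_activity", "B", 1.0)
picks = [bandit.assign("low_activity", ["A", "B"]) for _ in range(4)]
print(picks)
```

Once group B fills up, the policy falls back to the remaining eligible group and then returns None, which is how the capacity limit shows up in this simplified version.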
The system uses vault-based controlled identity restoration.
Method/architecture description in paper; implemented as part of instantiated platform.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | controlled identity restoration mechanism
PRISM-Coach separates each user into four bounded views: Identity, Operational, Learning, and Coaching, each with distinct access controls and risk profiles.
System architecture described in paper; implemented design (instantiated) reported.
high | positive | Privacy-by-Design Adaptive Group Assignment for Digital Life... | separation of user data into four bounded views
Agentic AI does not eliminate engineering discipline; it increases the value of requirements, constraints, traceability, independent verification, and human approval.
Conclusion drawn from synthesis of evidence across multiple domains and argumentation in the paper.
high | positive | Agentic Agile-V: From Vibe Coding to Verified Engineering in... | importance/value of engineering practices (requirements, traceability, verificat...
Agentic Agile-V and the task-level SCOPE-V loop (Specify, Constrain, Orchestrate, Prove, Evolve, Verify) convert conversational intent into structured engineering artifacts and acceptance evidence.
The paper proposes this process framework (the claim is the proposed function of the framework; no empirical evaluation given in the abstract).
high | positive | Agentic Agile-V: From Vibe Coding to Verified Engineering in... | ability to convert conversational intent into structured artifacts and acceptanc...
Controlled studies report productivity gains in some enterprise tasks.
Controlled experimental studies referenced by the paper (specific trials/stats not provided in abstract).
high | positive | Agentic Agile-V: From Vibe Coding to Verified Engineering in... | productivity on enterprise software tasks
These capabilities make software and hardware development faster in some settings.
Aggregated evidence cited in the paper including controlled studies and adoption studies (details not specified in abstract).
high | positive | Agentic Agile-V: From Vibe Coding to Verified Engineering in... | speed of software and hardware development
Agentic AI coding systems can inspect repositories, plan implementation steps, edit files, call tools, run tests, and submit pull requests.
Descriptive synthesis of existing agentic systems and demonstrations referenced in the paper (literature/examples); no single study or sample size given in the abstract.
high | positive | Agentic Agile-V: From Vibe Coding to Verified Engineering in... | agent capabilities (repository inspection, planning, editing, tool use, testing,...
ScienceClaw x Infinite provides the auditable artifact and provenance layer for this evaluation.
Paper statement that ScienceClaw x Infinite was used to supply auditable artifacts and provenance for the benchmark.
high | positive | Cross-domain benchmarks reveal when coordinated AI agents im... | availability of auditable artifact and provenance layer
When one signal dominates, as in paradigm-shift detection, coordination mainly improves interpretation and traceability.
Reported result for the historical paradigm-shift detection task indicating limited predictive gains but improved interpretability and provenance when using coordinated agents.
high | positive | Cross-domain benchmarks reveal when coordinated AI agents im... | interpretation and traceability of detection results for paradigm-shift detectio...
Cross-channel composites improve over single-channel baselines: exoplanet vetting reaches AUROC 0.955.
Reported performance metric (AUROC=0.955) for the exoplanet vetting task comparing cross-channel composite to single-channel baselines.
high | positive | Cross-domain benchmarks reveal when coordinated AI agents im... | classifier performance (AUROC) for vetting transiting-exoplanet candidates
When different disciplines each capture only part of the phenomenon, cross-channel composites improve over single-channel baselines: climate-vector emergence reaches AUROC 0.944.
Reported performance metric (AUROC=0.944) for the climate-vector emergence task comparing cross-channel composite to single-channel baselines.
high | positive | Cross-domain benchmarks reveal when coordinated AI agents im... | classifier performance (AUROC) for detecting vector-borne disease emergence
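The AUROC figures in the two claims above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUROC that way and shows, on toy data, how a composite of two partially informative channels can outrank either channel alone. The simple averaging rule, the channel names, and all scores are assumptions for illustration; the benchmark's actual composite construction is not described in this excerpt.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumed): each channel misranks one positive case, but not the same one.
labels = [1, 1, 1, 0, 0, 0]
channel_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
channel_b = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]

# Hypothetical composite: the unweighted mean of the two channel scores.
composite = [(a + b) / 2 for a, b in zip(channel_a, channel_b)]

print(auroc(labels, channel_a), auroc(labels, channel_b), auroc(labels, composite))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; rank-based implementations are preferable for large panels.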
Each case uses a frozen evaluation panel, predefined scoring protocols, explicit baselines, ablations or null controls, and stated limitations.
Methods claim describing evaluation protocol components reported in the paper.
high | positive | Cross-domain benchmarks reveal when coordinated AI agents im... | evaluation protocol completeness
We evaluate this question with a cross-domain benchmark spanning four scientific tasks: mapping molecular structure into musical representations, detecting historical paradigm shifts in science, identifying vector-borne disease emergence, and vetting transiting-exoplanet candidates.
Stated design of the study: description of benchmark tasks in the paper's methods/abstract.
high | positive | Cross-domain benchmarks reveal when coordinated AI agents im... | benchmark scope (four tasks)
We identify three perceived barriers and address each empirically across travel booking (14 nodes), Zoom support (14 nodes, product-specific knowledge), and insurance claims (55 nodes, 6 decision hubs).
Author statement describing the experimental evaluation conducted in this paper: three domains evaluated with specified node counts (travel booking: 14 nodes; Zoom support: 14 nodes with product-specific knowledge; insurance claims: 55 nodes with 6 decision hubs).
high | positive | Compiling Agentic Workflows into LLM Weights: Near-Frontier ... | empirical evaluation across three workflow domains (breadth/complexity of tests ...
Compiling the procedure into the weights of a small fine-tuned model -- creating a subterranean agent -- should resolve all of these concerns.
Author's proposed solution/argument (theoretical claim that fine-tuning a small model can avoid context-window usage, per-conversation frontier usage, and exposure to third parties).
high | positive | Compiling Agentic Workflows into LLM Weights: Near-Frontier ... | mitigation of the named concerns (context usage, frontier-model requirement, exp...
Agent orchestration frameworks have proliferated, collectively exceeding 290,000 GitHub stars across LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, Semantic Kernel, Strands, and LlamaIndex.
Paper statement reporting an aggregate GitHub star count across seven named frameworks (LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, Semantic Kernel, Strands, LlamaIndex).
high | positive | Compiling Agentic Workflows into LLM Weights: Near-Frontier ... | GitHub star count (popularity/adoption proxy) across listed agent orchestration ...
Policy implication: governments in emerging economies should support AI-based learning ecosystems, strengthen university-industry collaboration and expand digital literacy programs to accelerate digital competitiveness.
Authors' policy recommendations based on study findings and contextual discussion about Pakistan's IT sector and emerging economies.
high | positive | Enhancing innovation in Pakistan’s IT sector | policy actions to improve digital competitiveness
Organisational intelligence (OI) is a major driver of sustained innovation and helps firms translate learning into commercial outcomes.
Survey measures for OI and IP (N=348) and results from mediation/association analyses indicating OI positively relates to innovation performance and mediates effects of AIDLC/KO.
high | positive | Enhancing innovation in Pakistan’s IT sector | organisational intelligence impact on innovation performance
Knowledge orchestration functions as a critical bridge between AI-driven learning culture and innovation; success depends less on what information is stored and more on how quickly and intelligently it can be used.
Mediation analysis from the cross-sectional survey (N=348) showing KO mediates the relationship between AIDLC and innovation performance; conceptual interpretation in discussion contrasting KO with traditional knowledge management.
high | positive | Enhancing innovation in Pakistan’s IT sector | mediating effect of knowledge orchestration on innovation performance
AI-supported learning environments were linked to greater creativity, experimentation and technological improvement.
Survey responses (N=348) using established measurement scales; authors report associations between AIDLC measures and subcomponents of innovation (creativity, experimentation, technological improvement).
high | positive | Enhancing innovation in Pakistan’s IT sector | creativity / experimentation / technological improvement
Firms with a learning culture strongly driven by AI reported higher innovation performance, both directly and indirectly through two mediating factors (knowledge orchestration and organisational intelligence).
Cross-sectional quantitative survey (N=348) using established scales for AI-driven learning culture (AIDLC), knowledge orchestration (KO), organisational intelligence (OI) and innovation performance (IP); statistical analysis testing direct and serial mediation relationships.
high | positive | Enhancing innovation in Pakistan’s IT sector | innovation performance (IP)
Research on automation should be reoriented away from a primary focus on job loss toward understanding the organizational and technological transformations produced by digital work.
Normative and methodological recommendation derived from the paper's critical review of literature and the mappings of production/work networks; argued on conceptual and interpretive grounds rather than new empirical estimation.
high | positive | H ψηφιακή εργασία πίσω από την Τεχνητή Νοημοσύνη: | research agenda and focus (topics prioritized by scholars and policymakers)
The global HR technology market is expected to expand from USD 43.7 billion in 2025 to over USD 81 billion by 2032.
Forecast figure stated in paper (likely sourced from a market research / industry report, not specified in the excerpt).
high | positive | The Algorithmic Mirror: Can Artificial Intelligence Truly Mi... | HR technology market size / market growth
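The forecast above implies a compound annual growth rate that can be checked with a short calculation. This sketch assumes annual compounding over the seven years from 2025 to 2032 and treats USD 81 billion as a lower bound, since the claim says "over"; the function name is illustrative.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, an end value, and a horizon."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the claim: USD 43.7 billion (2025) to at least USD 81 billion (2032).
rate = implied_cagr(43.7, 81.0, 2032 - 2025)

print(f"implied CAGR of at least {rate:.1%}")
```

Under these assumptions the implied rate comes out at roughly 9% per year.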
Artificial Intelligence (AI) is increasingly marketed as a neutral arbiter capable of eliminating unconscious bias from human resource processes.
Statement in paper (assertion about industry marketing and positioning); no empirical data or citation provided in the excerpt.
high | positive | The Algorithmic Mirror: Can Artificial Intelligence Truly Mi... | perceived neutrality of AI in HR / bias elimination claims
We recommend that LLM forecasting evaluations use continuous (and unbounded) measures of accuracy alongside bounded binary threshold metrics.
Recommendation based on the paper's empirical findings that binary threshold metrics miss upper-tail costs while continuous/tail-inclusive metrics reveal inverse-scaling effects; rationale provided by experimental comparisons (empirical support described in paper).
high | positive | Is Capability a Liability? More Capable Language Models Make... | evaluation methodology for LLM forecasting (metric selection)
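A small example shows why a bounded threshold metric can hide upper-tail costs that a continuous metric exposes. Two hypothetical forecasters below land within tolerance on the same three of four targets, so their hit rates match, yet one of them misses the fourth target by a very large margin. The metric choices (hit rate within a tolerance, mean absolute error), the tolerance, and all numbers are assumptions for illustration rather than the paper's evaluation setup.

```python
def hit_rate(forecasts, actuals, tolerance):
    """Bounded metric: share of forecasts within `tolerance` of the actual value."""
    hits = sum(1 for f, a in zip(forecasts, actuals) if abs(f - a) <= tolerance)
    return hits / len(actuals)


def mean_abs_error(forecasts, actuals):
    """Continuous, unbounded metric: average absolute forecast error."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


# Assumed toy data: identical except for the size of the single miss.
actuals = [100.0, 100.0, 100.0, 100.0]
model_a = [102.0, 98.0, 101.0, 120.0]   # modest miss on the last target
model_b = [102.0, 98.0, 101.0, 900.0]   # extreme miss on the last target

for name, forecasts in (("model_a", model_a), ("model_b", model_b)):
    print(name, hit_rate(forecasts, actuals, tolerance=5.0), mean_abs_error(forecasts, actuals))
```

Reporting both numbers side by side keeps the easy-to-read hit rate while making the size of the misses visible.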
Community and Indigenous approaches offer alternative models of authority over AI infrastructure rooted in stewardship rather than extraction, although these approaches are constrained.
Normative argument and engagement with community/Indigenous scholarship and examples; presented as an alternative model in the paper (qualitative).
high | positive | Digital colonialism, techno-sovereignty, and infrastructural... | viability and character of stewardship-based authority models for AI governance
Scholarly and empirical research should prioritize multilevel analysis, algorithmic governance, and ethical considerations to study the AI-infused strategic landscape.
Paper's concluding research agenda based on gaps identified in the conceptual analysis; prescriptive recommendation rather than empirical finding.
high | positive | Infusing Artificial Intelligence into Strategy Theory: Synth... | recommended research priorities and topics
Although evaluated in the ads stack, this is a general framework that can be applied broadly to any large-scale recommendation and retrieval systems facing similar scaling and predictability challenges.
Author statement about generalizability and applicability beyond ads; no cross-domain experiments reported in the excerpt to substantiate broad applicability.
high | positive | LLM Retrieval for Stable and Predictable Ad Recommendations | generalizability/applicability to other large-scale recommendation and retrieval...
We tested this LLM ads retrieval framework in a large-scale industrial ads recommendation system, demonstrating significant improvements across offline and online A/B experiments, showcasing gains in both predictability and traditional performance metrics.
Reported large-scale industrial deployment and both offline and online A/B experiments; authors state 'significant improvements' but no numeric effect sizes, p-values, or sample sizes are provided in the excerpt.
high | positive | LLM Retrieval for Stable and Predictable Ad Recommendations | predictability; traditional performance metrics (e.g., recall, NDCG, click/conve...
The approach extracts hierarchical semantic attributes from ad creatives to obtain LLM representations, which serve as the foundation for graph-based expansion to retrieve semantic variants of an ad.
Method description in paper: hierarchical semantic attribute extraction, LLM representations, graph-based expansion; presented as the core technical approach (no detailed quantitative validation in excerpt).
high | positive | LLM Retrieval for Stable and Predictable Ad Recommendations | semantic coverage/representation of ad candidates (retrieval of semantic variant...
We present an online validated semantic candidate generation framework powered by fine-tuned Large Language Models (LLMs) that showed significant improvement along these metrics by fundamentally improving the semantic-awareness of the system.
Claim backed by reported online validation and use of fine-tuned LLMs; paper states results come from online validation in a large-scale industrial ads recommendation system and offline/online A/B experiments (no numeric details provided in excerpt).
high | positive | LLM Retrieval for Stable and Predictable Ad Recommendations | prediction stability and predictability; semantic-awareness of candidate generat...
We introduce a new evaluation framework for quantifying stability and predictability of an ads recommender system.
Paper presents a methodological contribution (new evaluation framework) described in the text; no numerical validation details provided in the excerpt.
high | positive | LLM Retrieval for Stable and Predictable Ad Recommendations | prediction stability and predictability (new evaluation metrics/framework)