The Commonplace

Evidence (6491 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                     Positive  Negative     Mixed      Null     Total
Other                            758       199       100       900      2007
Governance & Regulation          826       400       191       122      1563
Organizational Efficiency        777       193       124        84      1189
Technology Adoption Rate         635       233       124        97      1098
Research Productivity            422       128        57       336       954
Output Quality                   476       179        59        47       761
Decision Quality                 328       177        81        47       640
Firm Productivity                435        57        88        20       606
AI Safety & Ethics               218       277        65        33       599
Market Structure                 180       170       123        24       502
Task Allocation                  213        64        72        33       387
Skill Acquisition                170        61        61        17       309
Innovation Output                203        27        43        18       292
Employment Level                 105        54       107        13       281
Fiscal & Macroeconomic           131        69        43        26       276
Consumer Welfare                 117        63        42        11       233
Firm Revenue                     153        48        26         3       230
Task Completion Time             173        31         8        12       225
Inequality Measures               44       122        49         6       221
Worker Satisfaction               89        65        22        12       188
Error Rate                        69        92        10         2       173
Regulatory Compliance             77        69        14         5       165
Automation Exposure               56        56        26        13       154
Training Effectiveness            94        21        13        19       149
Wages & Compensation              77        36        25         6       144
Team Performance                  86        17        27        10       141
Developer Productivity            95        17        14         6       133
Job Displacement                  12        80        20         1       113
Hiring & Recruitment              52         7         8         3        70
Creative Output                   31        18         8         3        61
Skill Obsolescence                 5        46         6         1        58
Social Protection                 27        16         8         2        53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
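The matrix above can be read as direction shares per outcome. The following is a minimal sketch, not part of the dashboard itself: the `Row` type and both helper functions are hypothetical names, and the three rows are copied from the matrix. Shares are taken against the listed total, because for some rows the total exceeds the sum of the four direction columns (Firm Productivity lists 606 against a column sum of 600); the page does not say why.

```python
from typing import NamedTuple


class Row(NamedTuple):
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int


# Three rows copied from the matrix above.
ROWS = [
    Row("Firm Productivity", 435, 57, 88, 20, 606),
    Row("Error Rate", 69, 92, 10, 2, 173),
    Row("Job Displacement", 12, 80, 20, 1, 113),
]


def direction_shares(row: Row) -> dict[str, float]:
    """Return each direction's share of the row's listed total."""
    counts = {
        "positive": row.positive,
        "negative": row.negative,
        "mixed": row.mixed,
        "null": row.null,
    }
    return {name: count / row.total for name, count in counts.items()}


def unclassified(row: Row) -> int:
    """Claims in the listed total that fall outside the four direction columns."""
    return row.total - (row.positive + row.negative + row.mixed + row.null)


for row in ROWS:
    shares = direction_shares(row)
    print(
        f"{row.outcome}: {shares['positive']:.1%} positive, "
        f"{shares['negative']:.1%} negative, "
        f"{unclassified(row)} outside the four columns"
    )
```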
Filter: Human-AI Collaboration
We introduce unbounded cognitive fusion (UCF) as a new theoretical framework explaining coordination through cognitive synthesis rather than price signals or authority structures.
Theoretical proposal and framing within the paper; conceptual development rather than empirical validation.
high positive Beyond markets and hierarchies: How GenAI enables unbounded ... organizational coordination explained via cognitive synthesis
Generative artificial intelligence (GenAI) fundamentally alters [traditional organizational coordination] assumptions by augmenting human cognitive capabilities across organizational boundaries.
Position paper argumentation and conceptual reasoning presented in the abstract; no empirical data or sample reported.
high positive Beyond markets and hierarchies: How GenAI enables unbounded ... human cognitive capability augmentation
New tendencies in managerial AI research and practice include explainable AI, human–AI collaboration, knowledge management, enterprise analytics, and algorithmic management.
Descriptive finding from the paper's literature synthesis (topics emphasized in the review); no quantitative prevalence or counts provided in the abstract.
high positive Artificial intelligence, machine learning, and deep learning... emergent research and practice topics / adoption tendencies
Machine Learning and Deep Learning enhance employee productivity, business intelligence, process mining, and data-driven decision-making by enabling prediction, perception, and adaptive learning solutions.
Claim synthesized in the review from multiple studies identified via PRISMA screening; abstract does not list the number or identity of underlying empirical studies.
high positive Artificial intelligence, machine learning, and deep learning... employee productivity, effectiveness of business intelligence and process mining...
AI-based technologies can greatly enhance managerial efficiency by automating repetitive activities, improving resource allocation, enabling intelligent scheduling, and supporting predictive modelling and strategic planning.
Summary conclusion from the paper's literature review (PRISMA methodology referenced); no quantitative meta-analytic effect sizes provided in abstract.
high positive Artificial intelligence, machine learning, and deep learning... managerial efficiency, automation of repetitive tasks, resource allocation, sche...
Machine Learning, Artificial Intelligence, and Deep Learning are tools that can optimize managerial decisions, enable intelligent automation, streamline workflows, and improve organizational performance.
Synthesis claim from the paper's PRISMA-based literature review (no numeric sample size reported in the abstract).
high positive Artificial intelligence, machine learning, and deep learning... managerial decision quality, automation, workflow streamlining, organizational p...
Findings underscore the importance of robust evaluation frameworks for deploying VLMs in visually rich and safety-critical environments.
Synthesis/recommendation based on experimental results showing that visual inputs (images and colors) can influence VLM decisions and that mitigation effectiveness varies by model.
high positive The Effects of Visual Priming on Cooperative Behavior in Vis... need for/importance of robust evaluation frameworks for VLM safety and reliabili...
Policy frameworks, reskilling initiatives, and institutional adaptations are required to ensure inclusive technological progress.
Prescriptive conclusion presented in abstract based on the review and synthesis; no empirical validation or sample sizes provided in abstract.
high positive AI and the Transformation of Human Employment: Challenges, O... effectiveness of policy and reskilling to ensure inclusion
AI simultaneously generates demand for higher-order problem solving, emotional intelligence, and human-AI collaboration skills.
Explicit finding reported in abstract from the review of interdisciplinary literature; no quantified effect sizes or sample sizes provided in abstract.
high positive AI and the Transformation of Human Employment: Challenges, O... demand for higher-order skills / skill acquisition requirements
For memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.
Conclusion drawn by the authors based on comparative experimental results reported in the paper (xmemory vs retrieval/model-strength baselines); excerpt provides aggregate benchmark comparisons but not full experimental details.
high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... relative importance of system architecture versus retrieval/model strength for m...
On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses.
Empirical evaluation on an application-level task reported in the paper showing 95.2% accuracy for xmemory and claiming it outperforms several classes of alternative systems; excerpt lacks details on the task, dataset size, or baseline numeric results.
high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... accuracy on an application-level memory task
On the end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%-87.24% across the third-party baselines.
Empirical evaluation on the paper's end-to-end memory benchmark reporting F1 scores for xmemory and a range for third-party baselines; the excerpt does not provide dataset size or statistical significance details.
high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... F1 score on an end-to-end memory benchmark
On the structured extraction benchmark (judge-in-the-loop configuration) the system reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines.
Empirical evaluation on the paper's structured extraction benchmark in the judge-in-the-loop configuration; the excerpt reports the numeric accuracies and states they exceed tested frontier structured-output baselines. The excerpt does not specify dataset size or number of runs.
high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... object-level accuracy and output accuracy on a structured extraction benchmark
This iterative, schema-aware write-path design shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose.
Conceptual claim about how the proposed architecture affects system behavior; supported by the architectural description in the paper rather than explicit quantitative evidence in the excerpt.
high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... nature of read queries (constrained queries over verified records vs repeated in...
We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control.
Description of the proposed method/architecture in the paper (methodological contribution); no numeric evaluation attached to the description in the excerpt.
high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... design/components of memory ingestion pipeline
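As a rough illustration of the write path described in this claim, the sketch below gates each extracted value before storing it and retries only the field that failed. It is not the paper's implementation: `SCHEMA`, `write_record`, and `toy_extract` are hypothetical names, the extractor stands in for a model call, object and field detection are reduced to a schema lookup, and the "value must appear verbatim in the source" gate is an assumed example of a never-infer rule.

```python
from typing import Callable, Optional

# Hypothetical schema: for each object type, field name -> True if the value
# must appear verbatim in the source text (that is, must never be inferred).
SCHEMA = {"contact": {"name": True, "email": True, "nickname": False}}

# Stand-in for a model call: (text, object_type, field, attempt) -> value or None.
Extractor = Callable[[str, str, str, int], Optional[str]]


def write_record(
    text: str, object_type: str, extract: Extractor, max_retries: int = 2
) -> dict:
    """Extract one record field by field, gating each value before it is stored."""
    record = {}
    for field, must_be_verbatim in SCHEMA[object_type].items():
        for attempt in range(max_retries + 1):
            value = extract(text, object_type, field, attempt)
            if value is None:
                break  # field not present: store nothing rather than guess
            if not must_be_verbatim or value in text:
                record[field] = value  # validation gate passed
                break
            # Gate failed: loop again for this field only, not the whole record.
    return record


def toy_extract(text: str, object_type: str, field: str, attempt: int) -> Optional[str]:
    """Toy extractor that gets the email wrong on its first attempt."""
    if field == "name":
        return "Ada Lovelace"
    if field == "email":
        return "ada@example.com" if attempt == 0 else "ada@example.org"
    return None


text = "Ada Lovelace wrote in from ada@example.org yesterday."
record = write_record(text, "contact", toy_extract)
print(record)
```

In this toy run the first email attempt fails the gate because it does not occur in the text, the retry succeeds, and the absent nickname is left out of the record instead of being guessed.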
Reliable external AI memory must be schema-grounded (schemas define what must be remembered, what may be ignored, and which values must never be inferred).
Normative assertion supported by the paper's proposed design and subsequent experimental results (the paper introduces a schema-grounded approach and evaluates it against benchmarks), though the excerpt does not give full methodological details or sample sizes for this claim alone.
high positive From Unstructured Recall to Schema-Grounded Memory: Reliable... reliability/stability of external AI memory
To manage AI legibility, creators perform four recurring forms of invisible authenticity labor: epistemic verification, linguistic naturalization, narrative restructuring, and performative embodiment.
Authors identify and name four recurrent practices from coding and analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing specific downstream repair and performance work.
high positive AI passing and invisible authenticity labor: trust vulnerabi... types of labor performed to conceal/humanize AI outputs
Creators engage in 'AI passing': strategic efforts to conceal and humanize AI-assisted drafts so that outputs plausibly appear human-authored.
Concept introduced based on analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing tactics to hide AI involvement and present content as human-authored.
high positive AI passing and invisible authenticity labor: trust vulnerabi... use of concealment/humanization strategies for AI outputs
Effective governance requires coordinated action across technical, organizational, and regulatory domains (e.g., system-level audits, vendor guidelines, continuous monitoring, documentation across dependency chains) to establish meaningful accountability in distributed development environments.
Policy and technical recommendations derived from literature review, regulatory analysis, and the paper's conceptual findings (recommendation, not empirically validated).
high positive How Supply Chain Dependencies Complicate Bias Measurement an... effectiveness of governance measures in producing meaningful accountability for ...
Claw-Eval-Live suggests that workflow-agent evaluation should be grounded twice, in fresh external demand and in verifiable agent action.
Conclusion/recommendation drawn from the benchmark design and experimental findings; conceptual claim advocating evaluation grounded in external demand signals and verifiable actions.
high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... evaluation grounding (use of fresh external demand signals and verifiable agent ...
The release contains 105 tasks spanning controlled business services and local workspace repair, and evaluates 13 frontier models under a shared public pass rule.
Benchmark release statistics reported in the paper: explicit counts of tasks and evaluated models (105 tasks; 13 models).
high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... benchmark scope (number of tasks) and evaluation breadth (number of models)
For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions.
Grading methodology described in the paper: instrumentation and hybrid deterministic/LLM-judging approach documented by authors (procedural description).
high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... grading/verifiability pipeline (traces, logs, deterministic checks, structured L...
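The grading split described in this claim can be sketched as follows. This is an assumption-laden illustration, not the benchmark's code: the `Evidence` type, `grade`, and `toy_judge` are hypothetical names, the judge is a keyword test standing in for a structured LLM call, and "a task passes only if every dimension passes" is an assumed pass rule that the page does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evidence:
    """Recorded artifacts of one agent run (a hypothetical, reduced shape)."""
    service_state: dict = field(default_factory=dict)
    workspace_files: dict = field(default_factory=dict)  # path -> file content


def grade(
    evidence: Evidence,
    deterministic_checks: dict[str, Callable[[Evidence], bool]],
    semantic_dimensions: list[str],
    judge: Callable[[str, Evidence], bool],
) -> tuple[dict[str, bool], bool]:
    """Run deterministic checks first; call the judge only for semantic dimensions."""
    verdicts = {name: check(evidence) for name, check in deterministic_checks.items()}
    for dimension in semantic_dimensions:
        verdicts[dimension] = judge(dimension, evidence)
    # Assumed pass rule: every dimension must pass.
    return verdicts, all(verdicts.values())


def toy_judge(dimension: str, evidence: Evidence) -> bool:
    """Placeholder for a structured LLM judge; here just a keyword test."""
    return "refund" in evidence.workspace_files.get("report.md", "")


evidence = Evidence(
    service_state={"ticket_42": "closed"},
    workspace_files={"report.md": "Summary: ticket closed after refund."},
)
checks = {
    "ticket_closed": lambda e: e.service_state.get("ticket_42") == "closed",
    "report_written": lambda e: "report.md" in e.workspace_files,
}
verdicts, passed = grade(evidence, checks, ["report_explains_outcome"], toy_judge)
print(verdicts, passed)
```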
Each release is constructed from public workflow-demand signals, with ClawHub Top-500 skills used in the current release, and materialized as controlled tasks with fixed fixtures, services, workspaces, and graders.
Description of release construction in the methods: uses public workflow-demand data and ClawHub Top-500 skills; tasks are materialized with controlled fixtures and graders (procedural detail from the paper).
high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... composition of benchmark releases (source signals and materialization strategy)
We introduce Claw-Eval-Live, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases from public workflow-demand signals, from a reproducible, time-stamped release snapshot.
Methodological contribution described in the paper; design and architecture of the benchmark are presented by the authors (design description, no external sample needed).
high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... benchmark design (refreshable signal layer vs. time-stamped snapshot)
LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces.
Framing/background statement in the paper describing expected capabilities of workflow agents; no empirical sample size reported for this expectation.
high positive Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... ability to complete end-to-end units of work
The proposed framework balances AI-driven productivity with the epistemic sovereignty necessary to manage increasingly opaque software ecosystems.
Normative/architectural claim about the proposed framework; presented conceptually in the paper without reported empirical testing in the excerpt.
high positive Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... balance between productivity gains and maintenance of epistemic sovereignty (hum...
To preserve long-term resilience, engineering leaders must move beyond prompt-based development to implement rigorous human-in-the-loop pedagogical standards.
Prescriptive recommendation based on the paper's conceptual analysis; no randomized trials or empirical validation of this intervention reported in the excerpt.
high positive Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... long-term resilience of engineering organizations when using human-in-the-loop p...
The findings offer practical insights for construction firms to enhance innovation performance through effective AI integration and help engineers better leverage AI tools in design and project management workflows.
Authors' stated practical implications based on their empirical findings (survey results linking AI capability, decision-making quality, and innovation performance).
Algorithmic transparency positively moderates the relationship between AI capability and decision-making quality.
Moderation analysis reported on questionnaire data (Credamo, time-lagged) with n=435; authors state a positive moderating effect of algorithmic transparency.
Decision-making quality mediates the relationship between AI capability and innovation performance.
Mediation analysis reported on the same survey dataset (time-lagged Credamo survey) with n=435 using established measurement scales; stated in results.
AI capability is positively associated with innovation performance.
Authors report statistical analysis of questionnaire data collected via the Credamo platform (time-lagged design) using established scales; sample size n=435; result stated in findings.
Commonly reported gains include the automation of trivial and repetitive tasks.
Multiple studies in the review report that LLM-assistants automate mundane programming tasks.
high positive The Impact of LLM-Assistants on Software Developer Productiv... automation of low-complexity tasks / developer time freed
Commonly reported gains include minimized code search due to LLM assistance.
Synthesis of study findings noting reductions in developer time spent searching for code or answers.
high positive The Impact of LLM-Assistants on Software Developer Productiv... time/effort spent searching for code or information
Commonly reported gains from LLM-assistants include accelerated development (faster task completion).
Multiple included studies report faster development workflows and reduced time-to-complete tasks, as synthesized in the review.
high positive The Impact of LLM-Assistants on Software Developer Productiv... task completion time / development speed
The majority of reviewed studies report considerable benefits from LLM-assistants.
Synthesis of findings across the 39 included peer-reviewed studies as reported in the review.
high positive The Impact of LLM-Assistants on Software Developer Productiv... overall reported impact on developer productivity
Capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.
Recommendation based on the authors' empirical deployment and analysis of failure modes and mitigation effectiveness across the end-to-end pipeline.
high positive Operating-Layer Controls for Onchain Language-Model Agents U... evaluation scope for capital-managing agents
Targeted harness changes increased capital deployment from 42.9% to 78.0% in an affected test population.
A/B or pre/post testing in an affected test population measuring percentage of capital deployed before and after harness changes.
Targeted harness changes reduced fee-led observations from 32.5% to below 10% in an affected test population.
A/B or pre/post testing in an affected test population measuring incidence of fee-led observations before and after harness changes.
high positive Operating-Layer Controls for Onchain Language-Model Agents U... incidence of fee-led observations
Targeted harness changes reduced fabricated sell rules from 57% to 3% in an affected test population.
A/B or pre/post testing in an affected test population measuring incidence of fabricated sell-rule observations before and after harness changes (percentage rates reported).
high positive Operating-Layer Controls for Onchain Language-Model Agents U... incidence of fabricated sell rules
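The before and after rates in these claims can be compared as percentage-point and relative changes. The sketch below uses a hypothetical helper, `change`, with the rates quoted above; the fee-led figure is left out because only an upper bound ("below 10%") is reported.

```python
def change(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change) for two rates in percent."""
    points = after - before
    relative = points / before
    return points, relative


# Rates as reported in the claims above (percent, affected test population).
capital_deployment = change(42.9, 78.0)
fabricated_sell_rules = change(57.0, 3.0)

print(
    f"capital deployment: {capital_deployment[0]:+.1f} points, "
    f"{capital_deployment[1]:+.1%} relative"
)
print(
    f"fabricated sell rules: {fabricated_sell_rules[0]:+.1f} points, "
    f"{fabricated_sell_rules[1]:+.1%} relative"
)
```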
Policy-valid submitted transactions settled with 99.9% success.
Settlement logs comparing policy-valid submitted transactions to successful onchain settlements.
high positive Operating-Layer Controls for Onchain Language-Model Agents U... settlement success rate for policy-valid submissions
We release the full code base and a richly annotated dataset to support reproducible research on adaptive VCAs.
Paper statement announcing release of code and dataset.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... availability of codebase and annotated dataset
The recommender achieved high relevance (MRR@1=0.75).
Reported offline/online recommender evaluation in the paper using Mean Reciprocal Rank at 1 (MRR@1) metric; presumably computed over recommendations in the study (711 conversations).
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... recommendation relevance (MRR@1)
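For readers unfamiliar with the metric, the sketch below computes mean reciprocal rank cut off at k. With k = 1 it reduces to the fraction of queries whose top recommendation is relevant. The function name `mrr_at_k` and the four toy queries are made up for illustration; they are not the paper's data and were chosen only so that the result matches the reported 0.75.

```python
def mrr_at_k(ranked_lists: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant item within the top k results."""
    total = 0.0
    for ranked, relevant_items in zip(ranked_lists, relevant):
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant_items:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


# Made-up example: the top recommendation is relevant in 3 of 4 queries.
ranked_lists = [
    ["reset_router", "update_firmware"],
    ["scan_device", "reset_password"],
    ["reset_password", "scan_device"],
    ["update_firmware", "reset_router"],
]
relevant = [
    {"reset_router"},
    {"scan_device"},
    {"scan_device"},
    {"update_firmware"},
]

print(mrr_at_k(ranked_lists, relevant, k=1))  # 0.75
print(mrr_at_k(ranked_lists, relevant, k=2))  # 0.875
```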
Step-by-step guidance improved pleasantness and reduced user burden.
User-reported measures collected in the controlled study (likely subjective ratings across participants/conversations).
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... pleasantness (user satisfaction) and user burden
Device-level evidence increased correct resolutions from about 50% to over 90% relative to an LLM-only baseline.
Controlled study comparing SecMate with device-level diagnostic evidence to an LLM-only baseline; reported results across 144 participants / 711 conversations.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... correct resolutions (successful troubleshooting)
Service specificity is achieved through a proactive, context-aware recommender.
System description and recommender component evaluation in the paper.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... use of a proactive, context-aware recommender for service specificity
User specificity relies on implicit proficiency inference and profile-aware troubleshooting.
System design and algorithmic description in the paper explaining user-proficiency inference and profile-aware components.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... ability to infer user proficiency and use profiles for troubleshooting
Device specificity is provided by a lightweight local diagnostic utility.
System design and implementation details reported in the paper describing the diagnostic utility component.
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... presence and role of a local diagnostic utility for device specificity
We present SecMate, a multi-agent VCA for cybersecurity troubleshooting that integrates device, user, and service specificity from conversational and device-level signals.
System description and architecture presented in the paper (design and implementation of SecMate).
high positive SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting ... system capability to integrate device, user, and service specificity
The framework produces a list of testable empirical questions that we leave as open problems.
Statement in the paper that it derives testable empirical questions from the theoretical framework; no empirical tests are executed in the paper itself.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... set of testable empirical research questions derived from the framework
The framework operationalizes aspects of earlier qualitative work on supervisory control (Sheridan, 1992), common ground (Clark & Brennan, 1991), and mixed-initiative interaction (Horvitz, 1999) within a single normative ratio.
Conceptual synthesis and mapping of prior qualitative literature into the new per-task leverage formalism presented in the paper; this is a theoretical linkage rather than empirical validation.
high positive Leverage Laws: A Per-Task Framework for Human-Agent Collabor... conceptual operationalization of supervisory control/common ground/mixed-initiat...