The Commonplace

Evidence (6574 claims)

Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                    Positive Negative    Mixed     Null    Total
Other                           761      200      101      904     2020
Governance & Regulation         829      400      191      122     1566
Organizational Efficiency       784      193      125       84     1197
Technology Adoption Rate        637      236      124       97     1103
Research Productivity           431      131       58      340      972
Output Quality                  481      183       59       47      770
Decision Quality                332      177       82       49      647
Firm Productivity               439       57       88       20      610
AI Safety & Ethics              218      279       66       33      602
Market Structure                181      170      123       24      503
Task Allocation                 214       64       72       33      388
Skill Acquisition               174       62       62       17      315
Innovation Output               204       27       45       18      295
Employment Level                105       54      108       13      282
Fiscal & Macroeconomic          132       69       43       26      277
Consumer Welfare                117       63       42       11      233
Firm Revenue                    154       48       26        3      231
Task Completion Time            173       31        8       12      225
Inequality Measures              44      123       50        6      223
Worker Satisfaction              89       65       22       12      188
Error Rate                       71       92       10        2      175
Regulatory Compliance            77       69       14        5      165
Automation Exposure              58       56       26       13      156
Training Effectiveness           96       21       14       19      152
Wages & Compensation             77       37       25        6      145
Team Performance                 86       17       27       10      141
Developer Productivity           95       17       14        6      133
Job Displacement                 12       81       21        1      115
Hiring & Recruitment             52        7        8        3       70
Creative Output                  32       20        8        3       64
Skill Obsolescence                5       47        6        1       59
Social Protection                28       16        8        2       54
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
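
The matrix is easier to compare across outcomes when counts are converted to shares of each row's total. The sketch below does this for three rows copied from the table; the helper name `direction_shares` is illustrative and not part of the site. The Total column can exceed the sum of the four direction columns shown (for Other, 761 + 200 + 101 + 904 = 1966 against a stated total of 2020), so the shares here are computed against the stated total, which is an assumption about how the totals should be read.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Job Displacement": (12, 81, 21, 1, 115),
    "Other": (761, 200, 101, 904, 2020),
}


def direction_shares(row):
    """Return each direction's share of the row's stated total (hypothetical helper)."""
    positive, negative, mixed, null, total = row
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }


for outcome, row in ROWS.items():
    shares = direction_shares(row)
    print(f"{outcome}: {shares['positive']:.1%} positive, "
          f"{shares['negative']:.1%} negative")
```

Read this way, Firm Productivity claims are mostly positive while Job Displacement claims are mostly negative, which matches the raw counts.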
Startups integrate GenAI not as a peripheral tool but as a structural collaborator.
Interpretive finding from interviews and authors' theorization based on the dataset (17 interviews).
high positive From Prompt To Process: Qualitative Insights On How Genai Us... degree of integration of GenAI within organizational structure
Generative AI (GenAI) is influencing how startups form, operate, and create value.
Statement in paper's introduction/abstract; supported by the study's framing and qualitative interview data (17 expert interviews).
high positive From Prompt To Process: Qualitative Insights On How Genai Us... how startups form, operate, and create value (organizational formation and value...
Large-scale validation in a production code completion environment shows Echo increased the acceptance rate from 25.7% to 35.7%.
Reported result from 'large-scale validation' in a production code completion environment as stated in the paper's abstract; no sample size, statistical tests, or additional experimental details provided in the excerpt.
high positive Echo: Learning from Experience Data via User-Driven Refineme... acceptance rate of code completions
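
The acceptance-rate change above can be read either as an absolute or as a relative gain, and the two differ a lot. A minimal sketch of the arithmetic, using only the two percentages stated in the claim (the function name `uplift` is illustrative):

```python
def uplift(before, after):
    """Return (absolute change in percentage points, relative change as a fraction)."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# Acceptance rates as stated in the claim: 25.7% before, 35.7% after.
abs_pp, rel = uplift(25.7, 35.7)
print(f"{abs_pp:.1f} percentage points, {rel:.1%} relative increase")
```

The absolute gain is 10 percentage points; relative to the 25.7% baseline it is a much larger proportional increase, so it matters which of the two a summary quotes.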
User-driven refinement sequences distill agents' flawed proposals into high-quality training signals.
Conceptual/empirical claim in the paper that user refinements produce verified solutions which serve as high-quality signals; supported by the paper's later validation claim but no separate sample size or statistical detail provided in the excerpt.
high positive Echo: Learning from Experience Data via User-Driven Refineme... quality of training signals produced via user refinement
Echo is a generalized framework that operationalizes the transition from raw experience to learnable knowledge by echoing environmental feedback into the training loop for model optimization.
Methodological contribution described by the authors (framework description); no implementation details or quantitative validation given in the excerpt besides later mention of validation.
high positive Echo: Learning from Experience Data via User-Driven Refineme... process of converting raw experience data into training signals
Widespread deployment of AI agents provides low-cost access to massive streams of real-world experience data.
Stated observation in the paper; no quantitative deployment statistics or sample sizes provided in the excerpt.
high positive Echo: Learning from Experience Data via User-Driven Refineme... availability and cost of experience data from deployed agents
Continuous learning from 'experience data' (interactions between agents and their environments) promises to transcend the scalability and knowledge limitations of static human data.
Conceptual claim in the paper proposing continuous learning from experience data as a solution; no empirical details provided in the excerpt.
high positive Echo: Learning from Experience Data via User-Driven Refineme... ability to overcome limitations of static human data
There is a session-level carryover effect: a participant's prior AI use leads to further AI adoption and entrenches their miscalibration about time savings.
Observed analyses across sessions in the three pre-registered user studies (combined N = 2691) showing that prior within-session AI use predicts subsequent AI adoption and stronger miscalibration.
high positive The efficiency-gain illusion: People underestimate the rate ... effect_of_prior_AI_use_on_subsequent_AI_adoption_and_miscalibration
People display 'efficiency-gain illusions': they overestimate how much time and effort savings AI use provides.
Same three pre-registered user studies (combined N = 2691) that measured participants' perceived time/effort savings from AI versus actual measured time/effort.
high positive The efficiency-gain illusion: People underestimate the rate ... perceived_time_and_effort_savings_vs_actual_time_and_effort_savings
People frequently choose to use AI even when doing so is inefficient (i.e., provides no meaningful time or effort savings).
Three pre-registered user studies reported in the paper (combined N = 2691) measuring participants' choices to use AI on cognitively simple tasks and comparing those choices to measured time/effort savings.
high positive The efficiency-gain illusion: People underestimate the rate ... frequency_of_AI_use_when_AI_is_inefficient
The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone.
Authors' stated contributions/anticipated utility of their framework (conceptual claim about the expected usefulness of their mapping).
high positive Addressing the Synergy Gap: The Six Elements of the Design S... utility for practitioners, researchers, and evaluators regarding human-AI combin...
Closing the synergy gap requires explicit engagement with a wider design space.
Prescriptive conclusion from the authors advocating broader design engagement (conceptual recommendation based on their framework).
high positive Addressing the Synergy Gap: The Six Elements of the Design S... likelihood of closing the synergy gap given broader design engagement
Meta-analyses show that AI assistance tends to improve human performance compared to working alone.
Reference to existing meta-analyses in the literature reported by the authors (meta-analytic evidence aggregated across studies; no specific meta-analysis names, sample sizes, or quantitative pooled effects provided in the excerpt).
high positive Addressing the Synergy Gap: The Six Elements of the Design S... human performance with AI assistance versus human performance alone
AI is now embedded in healthcare, finance, policy, and many other domains.
Statement in the paper's introduction/abstract summarizing the current deployment of AI across domains (literature observation, no specific empirical study or sample size cited).
high positive Addressing the Synergy Gap: The Six Elements of the Design S... embedding/adoption of AI in multiple domains
These results position PRISM-Coach as a practical blueprint for privacy-by-design adaptive learning systems in everyday wellness.
Authors' interpretation in paper based on the implemented system and evaluation results (telemetry + survey + matched comparison).
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... suitability of PRISM-Coach as a blueprint for privacy-by-design adaptive learnin...
92% report increased privacy confidence after transparency disclosures.
In-app needs assessment survey reported in paper; percentage stated (92%). Sample size for survey not given in abstract.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... self-reported privacy confidence after disclosures
Survey results show that 82% report positive perceived benefit.
In-app needs assessment survey reported in paper; percentage stated (82%). Sample size for survey not given in abstract.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... self-reported perceived benefit
In the matched comparison, AI-enabled workflow yields higher average weight loss: 5.2 kg versus 3.1 kg.
Matched 19-week comparison window reported in paper; average weight loss numbers provided (5.2 kg vs 3.1 kg); sample size not stated in abstract.
In a matched 19-week comparison window, the AI-enabled workflow achieves adherence of 0.74 versus 0.48 under static grouping.
Matched 19-week comparison window reported in paper; comparison of AI-enabled workflow vs static grouping; sample size for comparison not stated in abstract.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... adherence (AI-enabled workflow vs static grouping)
At the population level, daily check-in adherence increases from 0.35 to 0.68.
Three years of telemetry from ~2,800 users reported in paper (population-level metrics).
PRISM-Coach was instantiated in a commercially deployed lifestyle coaching platform and evaluated using three years of telemetry from approximately 2,800 users and an in-app needs assessment survey.
Reported deployment and evaluation details in paper; telemetry period = 3 years; approximate user count = 2,800; survey described.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... deployment and evaluation dataset (telemetry + survey)
A human-in-the-loop coaching assistant generates de-identified summaries and draft messages without sending raw PII or PHI to external AI services.
System design and implementation described; claimed as part of instantiated PRISM-Coach deployment.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... prevention of PII/PHI leakage to external AI services
The system uses a privacy-constrained contextual bandit to assign users to eligible peer groups under coach-capacity and stability constraints.
Algorithmic method described in paper; implemented in the deployed system.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... peer-group assignment (privacy-constrained contextual bandit performance)
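
The excerpt does not describe how the bandit works, so the following is only a generic sketch of the idea: an epsilon-greedy contextual bandit that chooses among eligible groups with remaining capacity. Everything here is an assumption made for illustration, including the class name, the epsilon-greedy rule, the use of a coarse de-identified context bucket as a stand-in for the privacy constraint, and the reward signal. The stability constraint mentioned in the claim is not modeled.

```python
import random
from collections import defaultdict


class ConstrainedGroupBandit:
    """Epsilon-greedy bandit over peer groups with capacity limits (illustrative only)."""

    def __init__(self, capacities, epsilon=0.1, seed=0):
        self.capacity = dict(capacities)   # group -> remaining slots
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)     # (context, group) -> number of updates
        self.means = defaultdict(float)    # (context, group) -> mean observed reward

    def assign(self, context, eligible):
        """Pick a group for a user; `context` is a coarse, de-identified bucket."""
        open_groups = [g for g in eligible if self.capacity.get(g, 0) > 0]
        if not open_groups:
            return None                    # no eligible group has room
        if self.rng.random() < self.epsilon:
            choice = self.rng.choice(open_groups)                  # explore
        else:
            choice = max(open_groups,
                         key=lambda g: self.means[(context, g)])   # exploit
        self.capacity[choice] -= 1
        return choice

    def update(self, context, group, reward):
        """Fold an observed reward (for example an adherence score) into the mean."""
        key = (context, group)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


bandit = ConstrainedGroupBandit({"A": 1, "B": 2}, epsilon=0.0)
bandit.update("new_user", "B", 0.8)
print(bandit.assign("new_user", ["A", "B"]))
```

With `epsilon=0.0` the choice is purely greedy, so the example picks group B until its two slots are used and then falls back to A.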
The system uses vault-based controlled identity restoration.
Method/architecture description in paper; implemented as part of instantiated platform.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... controlled identity restoration mechanism
PRISM-Coach separates each user into four bounded views: Identity, Operational, Learning, and Coaching, each with distinct access controls and risk profiles.
System architecture described in paper; implemented design (instantiated) reported.
high positive Privacy-by-Design Adaptive Group Assignment for Digital Life... separation of user data into four bounded views
Agentic AI does not eliminate engineering discipline; it increases the value of requirements, constraints, traceability, independent verification, and human approval.
Conclusion drawn from synthesis of evidence across multiple domains and argumentation in the paper.
high positive Agentic Agile-V: From Vibe Coding to Verified Engineering in... importance/value of engineering practices (requirements, traceability, verificat...
Agentic Agile-V and the task-level SCOPE-V loop (Specify, Constrain, Orchestrate, Prove, Evolve, Verify) convert conversational intent into structured engineering artifacts and acceptance evidence.
The paper proposes this process framework (the claim is the proposed function of the framework; no empirical evaluation given in the abstract).
high positive Agentic Agile-V: From Vibe Coding to Verified Engineering in... ability to convert conversational intent into structured artifacts and acceptanc...
Controlled studies report productivity gains in some enterprise tasks.
Controlled experimental studies referenced by the paper (specific trials/stats not provided in abstract).
high positive Agentic Agile-V: From Vibe Coding to Verified Engineering in... productivity on enterprise software tasks
These capabilities make software and hardware development faster in some settings.
Aggregated evidence cited in the paper including controlled studies and adoption studies (details not specified in abstract).
high positive Agentic Agile-V: From Vibe Coding to Verified Engineering in... speed of software and hardware development
Agentic AI coding systems can inspect repositories, plan implementation steps, edit files, call tools, run tests, and submit pull requests.
Descriptive synthesis of existing agentic systems and demonstrations referenced in the paper (literature/examples); no single study or sample size given in the abstract.
high positive Agentic Agile-V: From Vibe Coding to Verified Engineering in... agent capabilities (repository inspection, planning, editing, tool use, testing,...
ScienceClaw x Infinite provides the auditable artifact and provenance layer for this evaluation.
Paper statement that ScienceClaw x Infinite was used to supply auditable artifacts and provenance for the benchmark.
high positive Cross-domain benchmarks reveal when coordinated AI agents im... availability of auditable artifact and provenance layer
When one signal dominates, as in paradigm-shift detection, coordination mainly improves interpretation and traceability.
Reported result for the historical paradigm-shift detection task indicating limited predictive gains but improved interpretability and provenance when using coordinated agents.
high positive Cross-domain benchmarks reveal when coordinated AI agents im... interpretation and traceability of detection results for paradigm-shift detectio...
Cross-channel composites improve over single-channel baselines: exoplanet vetting reaches AUROC 0.955.
Reported performance metric (AUROC=0.955) for the exoplanet vetting task comparing cross-channel composite to single-channel baselines.
high positive Cross-domain benchmarks reveal when coordinated AI agents im... classifier performance (AUROC) for vetting transiting-exoplanet candidates
When different disciplines each capture only part of the phenomenon, cross-channel composites improve over single-channel baselines: climate-vector emergence reaches AUROC 0.944.
Reported performance metric (AUROC=0.944) for the climate-vector emergence task comparing cross-channel composite to single-channel baselines.
high positive Cross-domain benchmarks reveal when coordinated AI agents im... classifier performance (AUROC) for detecting vector-borne disease emergence
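
For readers less familiar with the metric behind the two AUROC figures above: AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A small self-contained sketch of that definition follows. The data is a toy example, not from the benchmark, and the pairwise loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg) time.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores: three of the four positive/negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.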
Each case uses a frozen evaluation panel, predefined scoring protocols, explicit baselines, ablations or null controls, and stated limitations.
Methods claim describing evaluation protocol components reported in the paper.
high positive Cross-domain benchmarks reveal when coordinated AI agents im... evaluation protocol completeness
We evaluate this question with a cross-domain benchmark spanning four scientific tasks: mapping molecular structure into musical representations, detecting historical paradigm shifts in science, identifying vector-borne disease emergence, and vetting transiting-exoplanet candidates.
Stated design of the study: description of benchmark tasks in the paper's methods/abstract.
high positive Cross-domain benchmarks reveal when coordinated AI agents im... benchmark scope (four tasks)
Research on automation should be reoriented away from a primary focus on job loss toward understanding the organizational and technological transformations produced by digital work.
Normative and methodological recommendation derived from the paper's critical review of literature and the mappings of production/work networks; argued on conceptual and interpretive grounds rather than new empirical estimation.
high positive Η ψηφιακή εργασία πίσω από την Τεχνητή Νοημοσύνη [The Digital Labor Behind Artificial Intelligence]: research agenda and focus (topics prioritized by scholars and policymakers)
The global HR technology market is expected to expand from USD 43.7 billion in 2025 to over USD 81 billion by 2032.
Forecast figure stated in paper (likely sourced from a market research / industry report, not specified in the excerpt).
high positive The Algorithmic Mirror: Can Artificial Intelligence Truly Mi... HR technology market size / market growth
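
The forecast above implies a compound annual growth rate, which can be backed out from the two endpoints. This is a minimal sketch that treats USD 81 billion as the 2032 value; since the claim says "over" that figure, the result is a lower bound. The function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints as stated in the claim: USD 43.7 billion (2025) to USD 81 billion (2032).
rate = cagr(43.7, 81.0, 2032 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

Under these assumptions the implied growth is a little over 9% per year.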
Artificial Intelligence (AI) is increasingly marketed as a neutral arbiter capable of eliminating unconscious bias from human resource processes.
Statement in paper (assertion about industry marketing and positioning); no empirical data or citation provided in the excerpt.
high positive The Algorithmic Mirror: Can Artificial Intelligence Truly Mi... perceived neutrality of AI in HR / bias elimination claims
Scholarly and empirical research should prioritize multilevel analysis, algorithmic governance, and ethical considerations to study the AI-infused strategic landscape.
Paper's concluding research agenda based on gaps identified in the conceptual analysis; prescriptive recommendation rather than empirical finding.
high positive Infusing Artificial Intelligence into Strategy Theory: Synth... recommended research priorities and topics
The Claude family leads the benchmark and produces the most professional-looking outputs in our qualitative review.
Empirical result reported from the paper's benchmark and qualitative review of agent outputs (specific metrics, number of agents/tasks, and quantitative scores not provided in the excerpt).
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... output professionalism/quality
We develop an evaluation taxonomy comprising three dimensions: Accuracy, Formula, and Format, each comprising fine-grained criteria that reflect professional standards.
Methodological contribution stated in paper; described taxonomy elements (Accuracy, Formula, Format) as part of the evaluation design.
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... evaluation criteria/taxonomy
We provide one of the first evaluations of agents on end-to-end spreadsheet tasks, focusing on economically critical financial workflows such as modeling and scenario analysis.
Claim of contribution in the paper; refers to the authors' own evaluation study (details like number of tasks/agents not provided in the excerpt).
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... existence of evaluation on end-to-end spreadsheet tasks
Frontier AI labs have developed agents that can construct entire spreadsheets from scratch.
Asserted in paper as background/context; no specific models, numbers, or experimental details provided in the excerpt.
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... agent capability to construct spreadsheets
LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions.
Framing statement in paper; no empirical data or sample size reported to support the trend claim within the excerpt.
high positive WorkstreamBench: Evaluating LLM Agents on End-to-End Spreads... expectations of agent capabilities (trend)
Adoption under higher communicative standards and institutional norms can mitigate suboptimal collective equilibria by imposing social commitments on individual users.
Theoretical argument and model-based analysis proposing communicative and institutional interventions as mitigating mechanisms (conceptual and formal reasoning).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... reduction of suboptimal collective equilibria / improvement in collective outcom...
Individually stable strategies can be scaled to collective equilibria using three extrapolation principles: (a) non-communicative aggregation, (b) local social signaling, and (c) institutional norms setting.
Theoretical extrapolation/principled modeling presented in the paper (conceptual and formal extension from individual to collective level).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... mechanisms for aggregation from individual strategies to collective equilibria
Canonical decision-theoretic strategies that account for adaptive user trajectories can be mapped so that agents transition between strategies based on interaction feedback to reach stable equilibria.
Analytical results from the decision-theoretic modeling in the paper showing adaptive trajectories and stable equilibria (theoretical model derivation).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... stability of agent strategies / attainment of equilibria
The paper develops a decision- and game-theoretic approach to the human-AI delegation-verification dilemma.
Methodological contribution: construction of decision- and game-theoretic models described in the paper (modeling/theoretical development).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... availability of a formal modeling framework for the delegation-verification dile...
Emerging models of human-AI interaction predominantly advance the complementarity thesis, variously dubbed human-AI collaboration or human-AI hybrid intelligence.
Literature characterization / conceptual review reported in the paper (no empirical sample or quantitative analysis cited).
high positive The Human-AI Delegation Dilemma: Individual Strategies, Coll... prevalent theoretical framing in human-AI interaction literature (complementarit...