The Commonplace

Evidence (4333 claims)

Adoption: 5539 claims
Productivity: 4793 claims
Governance: 4333 claims
Human-AI Collaboration: 3326 claims
Labor Markets: 2657 claims
Innovation: 2510 claims
Org Design: 2469 claims
Skills & Training: 2017 claims
Inequality: 1378 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 402 112 67 480 1076
Governance & Regulation 402 192 122 62 790
Research Productivity 249 98 34 311 697
Organizational Efficiency 395 95 70 40 603
Technology Adoption Rate 321 126 73 39 564
Firm Productivity 306 39 70 12 432
Output Quality 256 66 25 28 375
AI Safety & Ethics 116 177 44 24 363
Market Structure 107 128 85 14 339
Decision Quality 177 76 38 20 315
Fiscal & Macroeconomic 89 58 33 22 209
Employment Level 77 34 80 9 202
Skill Acquisition 92 33 40 9 174
Innovation Output 120 12 23 12 168
Firm Revenue 98 34 22 154
Consumer Welfare 73 31 37 7 148
Task Allocation 84 16 33 7 140
Inequality Measures 25 77 32 5 139
Regulatory Compliance 54 63 13 3 133
Error Rate 44 51 6 101
Task Completion Time 88 5 4 3 100
Training Effectiveness 58 12 12 16 99
Worker Satisfaction 47 32 11 7 97
Wages & Compensation 53 15 20 5 93
Team Performance 47 12 15 7 82
Automation Exposure 24 22 9 6 62
Job Displacement 6 38 13 57
Hiring & Recruitment 41 4 6 3 54
Developer Productivity 34 4 3 1 42
Social Protection 22 10 6 2 40
Creative Output 16 7 5 1 29
Labor Share of Income 12 5 9 26
Skill Obsolescence 3 20 2 25
Worker Turnover 10 12 3 25
Active filter: Governance
Computational argumentation offers formal, verifiable reasoning representations (argumentation frameworks, attack/support relations).
Established literature on formal argumentation (e.g., Dung-style AFs) and the paper's conceptual description; no new empirical data reported.
high positive Argumentative Human-AI Decision-Making: Toward AI Agents Tha... existence and machine-checkability of formal inferential chains (inspectability/...
The development artifacts are fully transparent and reproducible: the repository includes an archive of 229 human prompts and a git history with 213 commits.
Paper reports counts of prompts (229) and git commits (213) and states these archives are public; these are concrete repository metrics (n=1 development repository).
high positive Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau E... number of human prompts archived (229); number of git commits (213); public avai...
The Lean kernel provided full machine verification of all formalized statements in the development.
Paper reports 'Full verification by the Lean kernel' for the Lean 4 development; supported by availability of the Lean 4 repository and verified theorem artifacts (n=1 project).
high positive Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau E... machine-checked verification status of formalized statements (verified/unverifie...
A specialized prover (Aristotle) automatically closed 111 lemmas during the development.
Quantitative verification metric reported in the paper: 111 lemmas automatically closed by Aristotle; claim tied to the Lean development and prover logs (single project count).
high positive Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau E... number of lemmas automatically discharged by Aristotle (111)
The AI-assisted pipeline combined an AI reasoning model (Gemini DeepThink) to generate the proof, an agentic coding tool (Claude Code) to translate the proof to Lean, a specialized automated prover (Aristotle) that closed 111 lemmas, and the Lean kernel to fully verify the result.
Project workflow description and verification metrics in the paper; reported counts and named components (Gemini DeepThink, Claude Code, Aristotle, Lean kernel); repository and logs purportedly document toolchain usage (n=1 project; 111 lemmas closed by Aristotle reported).
high positive Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau E... composition of toolchain and number of lemmas automatically discharged (111)
A complete formalization in Lean 4 of the equilibrium characterization for the Vlasov–Maxwell–Landau (VML) system was produced through an AI-assisted pipeline.
Single-project artifact: a Lean 4 development containing formal statements, proof scripts and verified theorems reported by the paper (n=1 project); authors report full machine verification by the Lean kernel and provide the repository as public evidence.
high positive Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau E... completeness of formalization / machine-checked verification of the VML equilibr...
The paper provides concrete, regulation-inspired policy examples (e.g., content prohibition, sensitive data exfiltration) showing how they map into the Policy function.
Worked, illustrative examples included in the paper mapping regulatory constraints to the Policy(agent_id, partial_path, proposed_action, org_state) formalism.
high positive Runtime Governance for AI Agents: Policies on Paths representability of regulation-inspired policies in the formalism (yes/no; examp...
Runtime policy evaluation can intercept, score, log, allow/modify/block actions, and update organizational state as part of an agent's execution loop (reference implementation architecture).
Reference implementation design described in the paper (runtime policy evaluator hooks, logging, enforcement actions); architectural reasoning and pseudo-workflows provided; no production deployment data.
high positive Runtime Governance for AI Agents: Policies on Paths feasibility of integrating runtime policy evaluator into agent loops (architectu...
Policies can be formalized as deterministic functions p_violation = Policy(agent_id, partial_path, proposed_action, org_state) that return a probability or score of violation for a proposed next action.
Formal definition and mapping in the paper; worked examples showing how regulatory-style constraints map into this function; no large-scale empirical validation.
high positive Runtime Governance for AI Agents: Policies on Paths expressiveness of policy formalism (ability to represent targeted constraints)
Effective governance for agentic LLM systems requires treating the execution path as the central object and performing runtime evaluation of proposed next actions given the partial path.
Theoretical argument and formal proposal of runtime policy evaluator that takes (agent_id, partial_path, proposed_action, org_state) and returns a violation probability; reference architecture described; illustrative examples.
high positive Runtime Governance for AI Agents: Policies on Paths governance effectiveness for path-dependent policies (qualitative/coverage)
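The Policy(agent_id, partial_path, proposed_action, org_state) formalism and runtime interception loop described in the entries above can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: every rule, field name, and threshold below is hypothetical.

```python
# Hypothetical sketch of a runtime policy evaluator over execution paths.
# Signature follows the paper's Policy(agent_id, partial_path, proposed_action,
# org_state); rule contents and the 0.5 threshold are invented for illustration.

def policy(agent_id, partial_path, proposed_action, org_state):
    """Return a violation score in [0, 1] for the proposed next action."""
    score = 0.0
    # Hypothetical static rule: touching a resource tagged sensitive is risky.
    target = proposed_action.get("target", "")
    if "sensitive" in org_state.get("tags", {}).get(target, ""):
        score = max(score, 0.9)
    # Hypothetical path-dependent rule: having read PII earlier on the path
    # makes a subsequent external send near-certainly a violation.
    touched_pii = any(step.get("resource") == "pii_store" for step in partial_path)
    if touched_pii and proposed_action.get("type") == "external_send":
        score = max(score, 0.95)
    return score

def run_step(agent_id, partial_path, proposed_action, org_state, threshold=0.5):
    """Intercept, score, log, then allow or block one proposed action."""
    p = policy(agent_id, partial_path, proposed_action, org_state)
    org_state.setdefault("audit_log", []).append((agent_id, proposed_action, p))
    if p >= threshold:
        return {"decision": "block", "score": p}
    partial_path.append(proposed_action)  # allowed action becomes part of the path
    return {"decision": "allow", "score": p}
```

Note how the second rule only fires given the partial path: the same external send is benign on one path and blocked on another, which is the path-centric point the paper argues.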
Multiple off-the-shelf vision-language models (closed-source and open-source) representative of current state-of-the-art architectures were benchmarked.
Paper reports experiments across a mix of closed-source and open-source VLMs; exact model names provided in the released materials.
high positive V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... models evaluated (variety and representativeness)
Evaluation targets include correctness, consistency, and update efficacy, operationalized via quantitative metrics (accuracy, consistency rates, update success rate).
Methods section describing evaluation metrics and how correctness, consistency, and update efficacy are measured across experiments.
high positive V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... metrics used: accuracy, consistency rate, update success rate
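One plausible operationalization of these three metrics is sketched below. The exact definitions are in the V-DyKnow paper and release; the function names and record schema here are illustrative assumptions, not the benchmark's code.

```python
# Hedged sketch: accuracy, consistency rate, and update success rate over
# hypothetical evaluation records. Field names are illustrative placeholders.

def accuracy(records):
    """Fraction of items answered with the currently-correct fact."""
    return sum(r["pred"] == r["gold_current"] for r in records) / len(records)

def consistency_rate(records):
    """Fraction of items answered identically across paraphrases/modalities."""
    return sum(len(set(r["preds_variants"])) == 1 for r in records) / len(records)

def update_success_rate(records):
    """Among items whose fact changed over time, fraction where the model
    gives the new fact rather than the outdated one."""
    changed = [r for r in records if r["gold_current"] != r["gold_outdated"]]
    hits = sum(r["pred"] == r["gold_current"] for r in changed)
    return hits / len(changed) if changed else 0.0

records = [
    {"pred": "B", "gold_current": "B", "gold_outdated": "A",
     "preds_variants": ["B", "B"]},
    {"pred": "A", "gold_current": "B", "gold_outdated": "A",
     "preds_variants": ["A", "B"]},
]
print(accuracy(records), consistency_rate(records), update_success_rate(records))
# → 0.5 0.5 0.5
```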
A curated set of time-sensitive factual items (e.g., officeholders, company statuses, recent awards/results) was used to construct the benchmark.
Benchmark composition description listing categories of time-sensitive facts and methodology for curation of items used in experiments.
high positive V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... composition of benchmark item set
The authors release the V-DyKnow benchmark, code, and evaluation data for community use.
Statement in paper and accompanying release materials indicating benchmark, code, and evaluation data are publicly available.
high positive V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... availability of benchmark, code, and data
V-DyKnow is a benchmark specifically designed to evaluate time-sensitive factual knowledge in vision-language models across both text and image modalities.
Release and description of the benchmark in the paper: curated set of time-sensitive factual items, paired multimodal stimuli (text + images), input perturbations, and evaluation scripts. Methodological description of benchmark composition and tasks.
high positive V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge i... benchmark existence / capability to evaluate time-sensitive multimodal factual k...
Ethical handling: the study involved sensitive material (self-harm, trauma), and the authors applied validation and careful data handling consistent with research ethics.
Ethics section and methods describing sensitivity of material and precautions taken in data handling and validation.
high positive Characterizing Delusional Spirals through Human-LLM Chat Log... ethical procedures applied to sensitive data
Selected coded items (for example, suicidal messages) were validated by the authors to increase reliability of certain critical annotations.
Methods section describing validation procedures applied to selected items such as suicidal ideation.
high positive Characterizing Delusional Spirals through Human-LLM Chat Log... validation status of coded items (e.g., number of validated suicidal messages)
The authors developed and applied a manual codebook of 28 behavioral/phenomenological codes (e.g., delusional thinking, suicidal ideation, chatbot sentience claims, romantic interest) across the full corpus.
Method section describing construction of a 28-code inventory and manual coding applied to entire dataset.
high positive Characterizing Delusional Spirals through Human-LLM Chat Log... existence and application of a 28-code annotation scheme
Explicit enforcement of signal constraints in DeePC provides a safety/operational advantage over many pure learning approaches that do not explicitly enforce hard constraints.
Algorithmic formulation includes constraints in the optimization; paper contrasts this with unconstrained learning-based controllers and demonstrates constrained, feasible actuation in simulation.
high positive Data-driven generalized perimeter control: Zürich case study explicit constraint satisfaction and operational safety of signal timings
DeePC can compute traffic-light actuation sequences that respect hard operational and safety constraints (e.g., phasing, minimum/maximum green times).
Formulation of DeePC as a constrained optimization problem in the paper with explicit constraint terms for signal phasing and safety; implemented in simulation experiments where constraints are enforced in the controller optimization.
high positive Data-driven generalized perimeter control: Zürich case study constraint satisfaction / feasibility of computed actuation sequences
Reframing urban traffic dynamics with behavioral systems theory allows system evolution to be learned and predicted directly from measured input–output data (no explicit model identification).
Theoretical exposition in the paper showing that traffic trajectories can be represented as linear combinations of past measured trajectories via Hankel/data matrices; used as the basis for predictive control (DeePC).
high positive Data-driven generalized perimeter control: Zürich case study predictive capability from measured I/O trajectories (ability to forecast future...
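The behavioral-systems idea behind DeePC can be made concrete with a toy sketch: for a linear system, any admissible trajectory is a linear combination of columns of Hankel matrices built from one measured input-output trajectory, so future outputs can be predicted without identifying a model. The first-order "system" below is a synthetic stand-in, not the paper's traffic network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in system: y[t+1] = 0.8*y[t] + 0.5*u[t]. The simulator is
# used only to generate data and ground truth; the predictor itself never
# sees the model, which is the core data-driven (DeePC) idea.
def simulate(u, y0=0.0):
    y = [y0]
    for ut in u[:-1]:
        y.append(0.8 * y[-1] + 0.5 * ut)
    return np.array(y)

def hankel(w, L):
    """Depth-L Hankel matrix of a scalar signal w (columns are segments)."""
    T = len(w)
    return np.column_stack([w[i:i + L] for i in range(T - L + 1)])

T, Tini, Tf = 200, 4, 6
L = Tini + Tf
ud = rng.normal(size=T)   # persistently exciting input data
yd = simulate(ud)         # measured output data

Hu, Hy = hankel(ud, L), hankel(yd, L)
Up, Uf = Hu[:Tini], Hu[Tini:]
Yp, Yf = Hy[:Tini], Hy[Tini:]

# Predict the response to a new input sequence directly from data:
u_ini, u_f = np.ones(Tini), 0.5 * np.ones(Tf)
y_new = simulate(np.concatenate([u_ini, u_f]))  # ground truth, for checking only
y_ini = y_new[:Tini]

A = np.vstack([Up, Yp, Uf])
b = np.concatenate([u_ini, y_ini, u_f])
g, *_ = np.linalg.lstsq(A, b, rcond=None)  # find a combination of past segments
y_pred = Yf @ g                            # data-driven output prediction
print(np.max(np.abs(y_pred - y_new[Tini:])))  # tiny: data reproduce this LTI system
```

Full DeePC additionally optimizes u_f subject to constraints (the hard phasing and green-time limits mentioned above) instead of fixing it, but the trajectory-combination mechanism is the same.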
Applying DeePC yields measurable improvements in system-level outcomes (reduced total travel time and CO2 emissions) in a very large, high-fidelity microscopic simulation of Zürich.
Simulation experiments in a city-scale, high-fidelity microscopic closed-loop simulator of Zürich comparing DeePC-controlled signals against baseline controllers (e.g., fixed-time or standard adaptive schemes); reported reductions in aggregated metrics (total travel time and CO2 emissions).
high positive Data-driven generalized perimeter control: Zürich case study total travel time; CO2 emissions
A model-free traffic control approach (DeePC) can steer urban traffic via dynamic traffic-light control without building explicit traffic models.
Algorithmic/theoretical development (behavioral systems theory + DeePC) and controller-in-loop experiments in a high-fidelity microscopic closed-loop simulator of Zürich demonstrating closed-loop control using only input–output trajectory data (Hankel matrices) rather than parametric model identification.
high positive Data-driven generalized perimeter control: Zürich case study ability to generate feasible control (traffic-light) actuation sequences and clo...
Calibration data must be representative of deployment data to preserve conformal statistical guarantees in practice.
Theoretical requirement of exchangeability for conformal guarantees combined with empirical results where mismatched calibration caused guarantee violations or degraded factuality.
high positive Is Conformal Factuality for RAG-based LLMs Robust? Novel Met... preservation of factuality guarantees and post-deployment factuality
The paper introduces informativeness-aware metrics to measure task utility under conformal filtering, going beyond pure factuality rates.
Methodological contribution described: new metrics that penalize vacuous outputs and quantify retained task utility after filtering.
high positive Is Conformal Factuality for RAG-based LLMs Robust? Novel Met... informativeness/usefulness metrics (as defined in the paper)
Decomposing generated outputs into atomic claims and calibrating a verifier score threshold on held-out data yields a statistically valid guarantee (under exchangeability) that claims passing the threshold meet a target factuality level.
Method description and theoretical use of conformal calibration applied to per-claim scores, with held-out calibration set used to set the threshold; conforms to standard conformal prediction methodology presented in the paper.
high positive Is Conformal Factuality for RAG-based LLMs Robust? Novel Met... coverage/factuality level of claims passing threshold
Conformal factuality provides distribution-free statistical guarantees for claim-level correctness in retrieval-augmented LLM outputs.
The paper applies conformal calibration to atomic claims: decompose outputs into atomic claims, score each claim with a verifier, and calibrate a score threshold on held-out (exchangeable) data to guarantee a target claim-level factuality rate. This is a theoretical property of conformal methods described and implemented in the paper.
high positive Is Conformal Factuality for RAG-based LLMs Robust? Novel Met... claim-level factuality guarantee (probability bound on correctness of claims pas...
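The split-conformal calibration step described in these entries can be sketched as follows, using synthetic verifier scores. The score generator and distributions are placeholders, not the paper's verifier; only the quantile logic is standard conformal methodology.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1  # target: with prob >= 90%, every retained claim is factual

# Synthetic calibration set (placeholder for real verifier output): each
# generated "output" is a list of (verifier_score, is_true) atomic claims,
# with true claims tending to score higher in this toy generator.
def make_output():
    n = rng.integers(3, 8)
    truth = rng.random(n) < 0.8
    scores = np.where(truth, rng.beta(5, 2, n), rng.beta(2, 5, n))
    return list(zip(scores, truth))

cal = [make_output() for _ in range(500)]

# Nonconformity per output: highest verifier score among its FALSE claims
# (a threshold at or below this value would admit a false claim).
def nonconformity(out):
    false_scores = [s for s, ok in out if not ok]
    return max(false_scores) if false_scores else 0.0

r = np.sort([nonconformity(o) for o in cal])
n = len(r)
k = int(np.ceil((n + 1) * (1 - alpha))) - 1  # conformal quantile index
tau = r[min(k, n - 1)]

# Filtering a new output: keep only claims scoring above tau. Under
# exchangeability of calibration and test data, all retained claims are
# factual with probability >= 1 - alpha.
test_out = make_output()
kept = [(s, ok) for s, ok in test_out if s > tau]
print(round(float(tau), 3), len(kept), len(test_out))
```

The exchangeability caveat is exactly the robustness issue the paper probes: if deployment data drift away from the calibration distribution, the quantile tau is miscalibrated and the guarantee no longer holds.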
Under pathological label heterogeneity (mutually exclusive local labels) FederatedFactory restores CIFAR-10 classification accuracy from a collapsed baseline of 11.36% to 90.57%.
Empirical experiment reported on CIFAR-10 configured as a pathological heterogeneity stress test; paper reports baseline collapsed accuracy (11.36%) and FederatedFactory result (90.57%). (Specific sample sizes / client counts not provided in the summary.)
high positive FederatedFactory: Generative One-Shot Learning for Extremely... CIFAR-10 classification accuracy (%)
A single communication round of generative-module exchange suffices for clients to synthesize class-balanced datasets locally and align their training data.
Paper reports a single exchange of generative modules across clients (one communication round) and uses that to synthesize a globally class-balanced training set at each client; experiments (CIFAR-10, MedMNIST, ISIC2019) are run under this one-round regime.
high positive FederatedFactory: Generative One-Shot Learning for Extremely... number of communication rounds required; class balance of synthesized datasets
Convergence of the three complementary methods (lexical, paraphrase, behavioral) strengthens confidence that contamination is real and systematically inflates scores.
Triangulation across Experiment 1 (lexical detection on public corpora), Experiment 2 (paraphrase robustness on 100-question subset), and Experiment 3 (TS‑Guessing on all items); consistent patterns observed across methods.
high positive Are Large Language Models Truly Smarter Than Humans? robustness/confidence in contamination detection (methodological convergence)
BenchPreS can be used as an evaluative tool for mechanism designers and regulators to measure and compare models' context‑sensitivity to guide incentives, penalties, or certification regimes.
Methodological claim about the benchmark's applicability: BenchPreS produces MR and AAR metrics that can be used for comparisons; paper suggests use in policy/design contexts.
high positive BenchPreS: A Benchmark for Context-Aware Personalized Prefer... Usability of BenchPreS metrics (MR, AAR) for model comparison and regulatory eva...
BenchPreS provides a benchmark and evaluation protocol that systematically varies stored user preference, interaction partner (self vs third party), and normative requirement to assess appropriate suppression or application of preferences.
Dataset construction and evaluation procedure described: scenario generation varying preference, partner, and normative appropriateness; MR and AAR computed across the scenario set.
high positive BenchPreS: A Benchmark for Context-Aware Personalized Prefer... Benchmark coverage and experimental protocol (design dimensions: preference, par...
The paper advances a replicable interdisciplinary synthesis method and provides a simulated dataset and transparent protocols enabling other researchers to adapt the approach.
Methods section detailing systematic literature search protocols (ACM/IEEE/Springer, 2020–2024), inclusion criteria, simulation parameterization for the cross-sectoral dataset (seven industries, 2020–2024), and stated reproducibility materials.
high positive AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... Availability and description of reproducible methods and a simulated dataset (re...
AI adoption is strongly associated with workforce skill transformation (reported correlation r = 0.71).
Correlational analysis reported in the paper using the simulated cross-sectoral dataset that mirrors employment trends across seven industries (Manufacturing, Healthcare, Finance, Education, Transportation, Retail, IT Services) over 2020–2024. This corresponds to sector-year observations (7 sectors × 5 years = 35 observations) and is triangulated with findings from a systematic literature synthesis (ACM, IEEE, Springer publications 2020–2024).
high positive AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... Skill shift index (measure of changes in required skills and task composition)
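For intuition about the reported design (7 sectors × 5 years = 35 sector-year observations), the correlational analysis can be mimicked with synthetic data. The numbers below are placeholders, not the paper's dataset, and the generating equation is invented for illustration.

```python
import numpy as np

# Toy stand-in for 35 sector-year observations (7 sectors x 5 years).
rng = np.random.default_rng(42)
adoption = rng.random(35)                            # AI adoption index
skill_shift = 0.7 * adoption + 0.3 * rng.random(35)  # correlated skill index

r = np.corrcoef(adoption, skill_shift)[0, 1]  # Pearson correlation
print(round(r, 2))  # strong positive correlation on this synthetic draw
```

With only 35 observations, a single Pearson r is sensitive to a few sector-years, which is why the paper triangulates it against the literature synthesis.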
Research priorities include rigorous real-world trials assessing patient outcomes, cost-effectiveness, and labor impacts; comparative studies of integration strategies; measurement of long-run workforce effects; and development of standard metrics and monitoring frameworks.
Explicit recommendations from the narrative review based on identified gaps: scarcity of RCTs, economic analyses, and long-term workforce studies.
high positive Human-AI interaction and collaboration in radiology: from co... number and quality of real-world trials, existence of standardized monitoring fr...
The paper advances augmentation debates by articulating the leader’s practical role when decision lead‑agency shifts between humans and AI and by detailing systemic HR changes needed to sustain performance, legitimacy and well‑being.
Stated contribution of the conceptual synthesis comparing existing augmentation and leadership literatures and providing an HR‑focused framework; descriptive of the paper's intellectual contribution.
high positive Symbiarchic leadership: leading integrated human and AI cybe... clarity of leader role; specification of HR system changes
Core practice 4 — Embed governance: make accountability, bias testing, privacy safeguards, audit trails, escalation thresholds and human oversight explicit and routine.
Prescriptive governance practice grounded in literature on algorithmic accountability and risk management and in practitioner examples; presented without original empirical validation.
high positive Symbiarchic leadership: leading integrated human and AI cybe... bias incidence; privacy breaches; auditability and compliance metrics
Core practice 3 — Manage the human–AI relationship: build adoption, psychological safety and calibrated trust; address automation anxiety and misuse.
Framework recommendation synthesizing organizational‑psychology and technology adoption literature plus practitioner observations; not tested empirically in the paper.
high positive Symbiarchic leadership: leading integrated human and AI cybe... adoption rates; psychological safety; calibrated trust; misuse incidents
Core practice 2 — Treat AI outputs as hypotheses: require human sensemaking and validation rather than blind adoption of model outputs.
Prescriptive practice derived from reviewed research and practitioner cases emphasizing human oversight; presented as framework guidance rather than empirically validated intervention.
high positive Symbiarchic leadership: leading integrated human and AI cybe... decision quality; error rates; incidence of blind automation
Core practice 1 — Allocate work by comparative advantage: assign tasks to humans or AI based on relative strengths (e.g., speed, pattern detection, contextual judgement).
Conceptual component of the framework drawn from synthesis of empirical findings in prior human–AI and task allocation literature and practitioner examples; no new empirical testing in the paper.
high positive Symbiarchic leadership: leading integrated human and AI cybe... task assignment efficiency; productivity from task allocation
Molecule operates a marketplace for decentralized clinical and preclinical assets, focusing on tokenizing drug assets and enabling investors to finance development.
Case-study description based on Molecule's public materials and marketplace listings; demonstrates platform design and transactions rather than long-term outcomes.
high positive Decentralized Autonomous Organizations in the Pharmaceutical... number of assets tokenized, capital deployed via the marketplace
VitaDAO is a community-driven organization funding and acquiring IP for longevity-related research, emphasizing open science and community governance.
Detailed case-study description drawing on VitaDAO's public documentation, governance records, and whitepaper materials.
high positive Decentralized Autonomous Organizations in the Pharmaceutical... IP acquisitions by VitaDAO, funding rounds executed, degree of open-science publ...
The work offers a blueprint for converting the ideological potential of AI into implementable, regulator-compatible utilities in pharmaceutical science by synthesizing quantitative and practical measures.
Claim about the paper's contribution (blueprint). It is an author claim about the synthesis and guidance provided; the excerpt does not include empirical validation that following the blueprint yields successful implementation.
high positive THE AI REVOLUTION IN PHARMACEUTICALS: INNOVATIONS, CHALLENGE... provision of a blueprint/guidance for implementable, regulator-compatible AI uti...
The paper proposes a systematized framework of integration that emphasizes creating high-impact pilot projects, in-the-wild testing, and ongoing monitoring of models in accordance with FDA, EMA, and EU AI Act guidance.
Described as the paper's proposed framework and recommendations for regulatory-aligned implementation. The excerpt indicates the proposal but does not present validation or empirical testing of the framework.
high positive THE AI REVOLUTION IN PHARMACEUTICALS: INNOVATIONS, CHALLENGE... existence of a proposed integration framework and recommended implementation ste...
The article discusses managerial and public-policy implications for reducing friction, accelerating responsible adoption, and guiding investments in productivity and inclusion.
Discussion section referenced in the abstract, addressing managerial burdens and public policy; the abstract reports no empirical evaluation of the policies.
high positive A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... recommendations and guidance for managerial action and public policy aimed at redu...
The article delivers replicable instruments for practical use: the SCF-30 scale, a minimal AI governance checklist, and a 30-60-90-day matrix.
Explicit statement in the abstract that replicable instruments are made available; their inclusion in the body of the article is presumed.
high positive A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... availability of operational instruments (scale, checklist, 30-60-90 matri...
High-quality chatbots (96–100% accurate) improved caseworker accuracy by 27 percentage points.
Experimental result reported in paper: treatment with chatbots at 96–100% aggregate accuracy produced a 27 percentage-point increase in caseworker accuracy compared to control; based on the randomized experiment on the 770-question benchmark.
high positive LLMs in social services: How does chatbot accuracy affect hu... change in caseworker accuracy (percentage-point increase) when assisted by 96–10...
Caseworker performance significantly improves as chatbot quality improves.
Aggregated results from the randomized experiment show monotonic improvement in caseworker accuracy as the chatbot suggestion accuracy increases; paper states the improvement is statistically significant (specific p-values/statistical tests not provided in the excerpt).
high positive LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy as a function of chatbot suggestion quality
AI-integrated fuel blending systems achieve very high precision, demonstrated by a coefficient of determination (R²) of 0.99 during validation.
Model validation results reported in the paper (fuel blending system validation, R² = 0.99), indicating very high explanatory/predictive fit compared to traditional models.
high positive AI-Based Technological Transformation as a Driver for Develo... fuel blending accuracy/precision (measured by R² on validation dataset) and impl...
DARE posits that responsible AI deployment requires the simultaneous and integrated development of Digital readiness, Administrative governance, Resilience & ethics, and Economic equity.
Descriptive claim about the framework's components as reported in the abstract (conceptual proposition).
high positive The DARE framework: a global model for responsible artificia... responsible AI deployment (dependent on development across four DARE dimensions)