The Commonplace

Evidence (2340 claims)

Adoption (5267 claims)
Productivity (4560 claims)
Governance (4137 claims)
Human-AI Collaboration (3103 claims)
Labor Markets (2506 claims)
Innovation (2354 claims)
Org Design (2340 claims)
Skills & Training (1945 claims)
Inequality (1322 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 378 106 59 455 1007
Governance & Regulation 379 176 116 58 739
Research Productivity 240 96 34 294 668
Organizational Efficiency 370 82 63 35 553
Technology Adoption Rate 296 118 66 29 513
Firm Productivity 277 34 68 10 394
AI Safety & Ethics 117 177 44 24 364
Output Quality 244 61 23 26 354
Market Structure 107 123 85 14 334
Decision Quality 168 74 37 19 301
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 89 32 39 9 169
Firm Revenue 96 34 22 152
Innovation Output 106 12 21 11 151
Consumer Welfare 70 30 37 7 144
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 75 11 29 6 121
Training Effectiveness 55 12 12 16 96
Error Rate 42 48 6 96
Worker Satisfaction 45 32 11 6 94
Task Completion Time 78 5 4 2 89
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 17 9 5 50
Job Displacement 5 31 12 48
Social Protection 21 10 6 2 39
Developer Productivity 29 3 3 1 36
Worker Turnover 10 12 3 25
Skill Obsolescence 3 19 2 24
Creative Output 15 5 3 1 24
Labor Share of Income 10 4 9 23
Filter: Org Design
Realizing those AI-driven gains in Vietnam requires legal and institutional redesigns.
Close reading of Vietnam's constitutional provisions, administrative statutes, procedural rules and judicial doctrine (doctrinal legal analysis) combined with comparative lessons from other jurisdictions; no quantitative data.
high positive ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... feasibility of AI deployment (legal/institutional compatibility enabling efficie...
CABP (Context-Aware Broker Protocol) extends JSON-RPC with identity-scoped request routing via a six-stage broker pipeline to ensure correct identity and policy propagation.
Design and protocol specification included in the paper; formal description and broker-pipeline semantics documented as a deliverable.
high positive Bridging Protocol and Production: Design Patterns for Deploy... correctness of identity and policy propagation across broker pipeline (as define...
The mechanism generalizes to another field: models trained on economics publication records reach ~70% accuracy on a similar benchmark.
Analogue of the management experiment performed in economics: models fine-tuned on economics journal publication records were evaluated on an economics benchmark and achieved approximately 70% accuracy. (Exact dataset sizes, benchmarks, and train/test splits not specified in the provided text.)
high positive Machines acquire scientific taste from institutional traces Accuracy on an economics research-pitch benchmark
Fine-tuned models trained on publication records each outperform every frontier model and the expert panel; the best single model achieves 59% accuracy on the benchmark.
Language models fine-tuned on historical journal accept/reject records were evaluated on the held-out four-tier benchmark; reported performance shows each fine-tuned model exceeds the frontier-model average and the human-panel baseline, with the best model at 59% accuracy. (Exact training set size and benchmark sample count not specified here.)
high positive Machines acquire scientific taste from institutional traces Accuracy on the four-tier management research-pitch benchmark
Panels of journal editors and editorial board members reach 42% accuracy by majority vote on the same four-tier benchmark.
Human baseline obtained by soliciting judgments from journal editors and editorial board members on the held-out benchmark and computing majority-vote accuracy (reported as 42%). (Number of human raters and benchmark size not given in supplied text.)
high positive Machines acquire scientific taste from institutional traces Majority-vote accuracy on the four-tier management research-pitch benchmark
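As a rough illustration of how a majority-vote accuracy figure like the one above can be computed, the sketch below scores a panel against known tiers. The vote data, the 1-4 tier coding, and the tie-breaking rule (lowest tier wins) are all hypothetical assumptions; the source does not give the number of raters, the benchmark size, or how ties were resolved.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label; ties go to the lowest tier (an assumption)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)

def majority_vote_accuracy(panel_votes, true_tiers):
    """Share of items where the panel's majority vote equals the true tier."""
    hits = sum(majority_vote(v) == t for v, t in zip(panel_votes, true_tiers))
    return hits / len(true_tiers)

# Hypothetical toy data: 4 pitches, 3 raters each, tiers coded 1-4.
panel_votes = [[1, 1, 2], [3, 4, 4], [2, 2, 2], [1, 3, 4]]
true_tiers = [1, 3, 2, 4]
accuracy = majority_vote_accuracy(panel_votes, true_tiers)
print(accuracy)
```

With four tiers, random guessing would score about 25%, which is the natural reference point for reading the reported human and model accuracies.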
Fine-tuning language models on historical journal publication decisions recovers an evaluative "scientific taste" that frontier (zero-shot) models and expert editor panels cannot reliably reproduce.
Fine-tuned models were trained on years of journal publication decisions (institutional accept/reject records) and evaluated on a held-out four-tier benchmark of management research pitches; performance compared to zero-shot evaluations of frontier models and to panels of journal editors (majority-vote). (Sample sizes for training records and held-out benchmark not specified in the provided text.)
high positive Machines acquire scientific taste from institutional traces Ability to predict publication-worthiness as measured by tier prediction accurac...
The A-ToM mechanism operates by estimating a partner's likely ToM order from interaction history and using that estimate to predict the partner's next action which then informs the agent's policy choices.
Method description and implementation details provided in the paper: estimator over ToM orders based on past interactions + conditional action prediction feeding into decision-making; validated in the reported experiments.
high positive Adaptive Theory of Mind for LLM-based Multi-Agent Coordinati... accuracy/usefulness of inferred ToM order for partner-action prediction and subs...
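The estimate-then-predict loop described above might be sketched as a Bayesian update over candidate ToM orders. Everything here is an illustrative assumption rather than the paper's estimator: the two partner models (`order0`, `order1`), the noise level `eps`, and the toy two-action history are hypothetical.

```python
def update_belief(belief, predictors, history, observed, eps=0.1):
    """Bayes update of P(order) after observing the partner's action."""
    posterior = {}
    for order, prior in belief.items():
        predicted = predictors[order](history)
        likelihood = 1 - eps if predicted == observed else eps
        posterior[order] = prior * likelihood
    total = sum(posterior.values())
    return {order: p / total for order, p in posterior.items()}

# History entries are (agent_action, partner_action) pairs.
# Hypothetical partner models for a two-action coordination game:
predictors = {
    0: lambda h: h[-1][1],  # order 0: partner repeats its own last action
    1: lambda h: h[-1][0],  # order 1: partner copies the agent's last action
}

rounds = [("A", "B"), ("B", "A"), ("B", "B"), ("A", "B")]
belief = {0: 0.5, 1: 0.5}
for i in range(1, len(rounds)):
    belief = update_belief(belief, predictors, rounds[:i], rounds[i][1])

# Use the most likely order to predict the partner's next action.
best_order = max(belief, key=belief.get)
predicted_next = predictors[best_order](rounds)
print(best_order, predicted_next)
```

The predicted action would then feed the agent's own policy choice, for example by best-responding to `predicted_next`.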
Empirical evaluation was performed across four coordination environments: a repeated matrix game, two grid navigation tasks, and an Overcooked task.
Methods section describes these four benchmark environments used for all reported comparisons between fixed-order agents and A-ToM agents; evaluation metrics were joint payoffs and task-specific success measures.
high positive Adaptive Theory of Mind for LLM-based Multi-Agent Coordinati... coordination performance (joint payoff, success rate) as used in experiments
In the human–human benchmark, repeated pre-play communication substantially increases cooperation.
Reference benchmark data from Dvorak & Fehrler (2024), human–human sample n = 108, showing higher cooperation under repeated communication relative to less frequent communication; comparison reported in the paper.
high positive Playing Against the Machine: Cooperation, Communication, and... change in cooperation rate associated with repeated communication in human–human...
Using the proportional veto core provides formal protection for minority blocs by giving them proportional blocking power, thus encoding a proportional fairness guarantee compared to simple majoritarian rules.
Definition and properties of the proportional veto core presented in the paper; conceptual discussion comparing veto/proportionality guarantees to majoritarian outcomes.
high positive Finding Common Ground in a Sea of Alternatives existence of proportional blocking power / protection for minority groups as for...
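To make the proportional blocking idea concrete, the sketch below brute-forces the proportional veto core for a small finite election. It assumes the standard finite definition (a coalition T blocks alternative a if all its members prefer every alternative in some set S to a, and |T|/n > 1 - |S|/m); the paper's infinite-alternative, sampling-based version is not reproduced here, and the function name and toy rankings are hypothetical.

```python
from itertools import combinations

def proportional_veto_core(rankings, alternatives):
    """Return alternatives that no coalition can block (exponential-time brute force)."""
    n, m = len(rankings), len(alternatives)
    # For each voter, the alternatives ranked strictly above each alternative.
    better = [{a: set(r[:r.index(a)]) for a in alternatives} for r in rankings]
    core = []
    for a in alternatives:
        blocked = False
        for size in range(1, n + 1):
            for coalition in combinations(range(n), size):
                # Largest set S that every coalition member prefers to a.
                common = set.intersection(*(better[i][a] for i in coalition))
                # |T|/n > 1 - |S|/m, written in integers to avoid rounding.
                if size * m > n * (m - len(common)):
                    blocked = True
                    break
            if blocked:
                break
        if not blocked:
            core.append(a)
    return core

# Two voters with opposite rankings over four alternatives.
alternatives = ["a", "b", "c", "d"]
rankings = [["a", "b", "c", "d"], ["d", "c", "b", "a"]]
core = proportional_veto_core(rankings, alternatives)
print(core)
```

In this toy case each voter is half the electorate and can veto its single worst option, so the two extreme alternatives are blocked and the compromise options remain.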
The paper characterizes the information cost of aggregating preferences when AI can generate essentially unlimited candidate alternatives by providing sample-complexity upper bounds together with matching lower bounds.
The combination of sampling-model formalization, sample-complexity upper bounds, and matching lower bounds constitutes a formal characterization of the information (sample) requirements.
high positive Finding Common Ground in a Sea of Alternatives sample/query complexity as the measure of information cost
The authors prove an upper bound on the number of samples/queries required by their algorithm as a function of accuracy, confidence, and problem parameters.
Theoretical analysis in the paper deriving explicit sample-complexity upper bounds (stated as functions of accuracy/confidence and relevant parameters).
high positive Finding Common Ground in a Sea of Alternatives sample/query complexity required for the algorithm to achieve specified accuracy...
Under only query (sampling) access to the unknown joint distribution of voters and alternatives, there is an efficient sampling-based algorithm that, with high probability, returns an alternative in the approximate proportional veto core.
Constructive algorithm and correctness proof in the paper showing the algorithm returns an approximate core alternative with high probability under the sampling access model.
high positive Finding Common Ground in a Sea of Alternatives probability that the algorithm's output lies in the approximate proportional vet...
The paper formalizes the proportional veto core for settings with an infinite alternative space and voters whose preferences are drawn from an unknown distribution.
Formal model and definitions presented in the paper: extension of the proportional veto core to an infinite alternative space and definitions for sampling-appropriate approximate proportional veto core.
high positive Finding Common Ground in a Sea of Alternatives formal definition / existence of an appropriate approximate proportional veto-co...
The paper provides concrete, regulation-inspired policy examples (e.g., content prohibition, sensitive data exfiltration) showing how they map into the Policy function.
Worked, illustrative examples included in the paper mapping regulatory constraints to the Policy(agent_id, partial_path, proposed_action, org_state) formalism.
high positive Runtime Governance for AI Agents: Policies on Paths representability of regulation-inspired policies in the formalism (yes/no; examp...
Runtime policy evaluation can intercept, score, log, allow/modify/block actions, and update organizational state as part of an agent's execution loop (reference implementation architecture).
Reference implementation design described in the paper (runtime policy evaluator hooks, logging, enforcement actions); architectural reasoning and pseudo-workflows provided; no production deployment data.
high positive Runtime Governance for AI Agents: Policies on Paths feasibility of integrating runtime policy evaluator into agent loops (architectu...
Policies can be formalized as deterministic functions p_violation = Policy(agent_id, partial_path, proposed_action, org_state) that return a probability or score of violation for a proposed next action.
Formal definition and mapping in the paper; worked examples showing how regulatory-style constraints map into this function; no large-scale empirical validation.
high positive Runtime Governance for AI Agents: Policies on Paths expressiveness of policy formalism (ability to represent targeted constraints)
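A minimal sketch of the `Policy(agent_id, partial_path, proposed_action, org_state)` signature, using the sensitive-data-exfiltration example mentioned in this section. The `OrgState` fields, the tool names, the `gate` helper, and the 0.5 threshold are hypothetical choices for illustration, not the paper's reference implementation.

```python
from dataclasses import dataclass, field

@dataclass
class OrgState:
    # Hypothetical organizational state: which tools are sensitive or external.
    sensitive_tools: set = field(default_factory=lambda: {"read_customer_db"})
    external_tools: set = field(default_factory=lambda: {"send_email"})

def exfiltration_policy(agent_id, partial_path, proposed_action, org_state):
    """Return a violation score in [0, 1] for the proposed next action.

    agent_id is unused in this toy rule but kept to match the signature.
    """
    touched_sensitive = any(step in org_state.sensitive_tools for step in partial_path)
    goes_external = proposed_action in org_state.external_tools
    return 1.0 if (touched_sensitive and goes_external) else 0.0

def gate(agent_id, partial_path, proposed_action, org_state, threshold=0.5):
    """Block the action when the violation score reaches the threshold."""
    p_violation = exfiltration_policy(agent_id, partial_path, proposed_action, org_state)
    return "block" if p_violation >= threshold else "allow"

state = OrgState()
after_db_read = gate("agent-1", ["read_customer_db"], "send_email", state)
after_web_search = gate("agent-1", ["search_web"], "send_email", state)
print(after_db_read, after_web_search)
```

The same proposed action is allowed or blocked depending on the partial path, which is the path dependence the formalism is meant to capture.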
Effective governance for agentic LLM systems requires treating the execution path as the central object and performing runtime evaluation of proposed next actions given the partial path.
Theoretical argument and formal proposal of runtime policy evaluator that takes (agent_id, partial_path, proposed_action, org_state) and returns a violation probability; reference architecture described; illustrative examples.
high positive Runtime Governance for AI Agents: Policies on Paths governance effectiveness for path-dependent policies (qualitative/coverage)
Explicit enforcement of signal constraints in DeePC provides a safety/operational advantage over many pure learning approaches that do not explicitly enforce hard constraints.
Algorithmic formulation includes constraints in the optimization; paper contrasts this with unconstrained learning-based controllers and demonstrates constrained, feasible actuation in simulation.
high positive Data-driven generalized perimeter control: Zürich case study explicit constraint satisfaction and operational safety of signal timings
DeePC can compute traffic-light actuation sequences that respect hard operational and safety constraints (e.g., phasing, minimum/maximum green times).
Formulation of DeePC as a constrained optimization problem in the paper with explicit constraint terms for signal phasing and safety; implemented in simulation experiments where constraints are enforced in the controller optimization.
high positive Data-driven generalized perimeter control: Zürich case study constraint satisfaction / feasibility of computed actuation sequences
Reframing urban traffic dynamics with behavioral systems theory allows system evolution to be learned and predicted directly from measured input–output data (no explicit model identification).
Theoretical exposition in the paper showing that traffic trajectories can be represented as linear combinations of past measured trajectories via Hankel/data matrices; used as the basis for predictive control (DeePC).
high positive Data-driven generalized perimeter control: Zürich case study predictive capability from measured I/O trajectories (ability to forecast future...
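The Hankel-matrix representation can be illustrated on a toy system. The sketch below assumes a hypothetical first-order linear system (parameters `a`, `b`) in place of traffic dynamics, records one input-output trajectory, and checks that a fresh trajectory is a linear combination of windows of the recorded one. It shows only the data representation, not the constrained DeePC optimization.

```python
import numpy as np

def hankel(w, L):
    """Stack every length-L window of signal w (shape T x d) as a column."""
    T = w.shape[0]
    return np.column_stack([w[i:i + L].reshape(-1) for i in range(T - L + 1)])

def simulate(u, x0, a=0.9, b=0.5):
    """Toy system x[k+1] = a*x[k] + b*u[k], with output y[k] = x[k]."""
    y = np.empty_like(u)
    x = x0
    for k, uk in enumerate(u):
        y[k] = x
        x = a * x + b * uk
    return y

rng = np.random.default_rng(0)

# Recorded data: random (persistently exciting) input and measured output.
u_data = rng.standard_normal(50)
y_data = simulate(u_data, x0=0.0)
H = hankel(np.column_stack([u_data, y_data]), L=5)

# A new trajectory with a different input and initial state.
u_new = rng.standard_normal(5)
y_new = simulate(u_new, x0=1.3)
w_new = np.column_stack([u_new, y_new]).reshape(-1)

# Find weights g with H @ g = w_new; a near-zero residual means w_new is in the span.
g, *_ = np.linalg.lstsq(H, w_new, rcond=None)
residual = np.linalg.norm(H @ g - w_new)
print(H.shape, residual)
```

A predictive controller built on this idea would fix the past part of the window to recent measurements and optimize over the future part, subject to its constraints.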
Applying DeePC yields measurable improvements in system-level outcomes (reduced total travel time and CO2 emissions) in a very large, high-fidelity microscopic simulation of Zürich.
Simulation experiments in a city-scale, high-fidelity microscopic closed-loop simulator of Zürich comparing DeePC-controlled signals against baseline controllers (e.g., fixed-time or standard adaptive schemes); reported reductions in aggregated metrics (total travel time and CO2 emissions).
high positive Data-driven generalized perimeter control: Zürich case study total travel time; CO2 emissions
A model-free traffic control approach (DeePC) can steer urban traffic via dynamic traffic-light control without building explicit traffic models.
Algorithmic/theoretical development (behavioral systems theory + DeePC) and controller-in-loop experiments in a high-fidelity microscopic closed-loop simulator of Zürich demonstrating closed-loop control using only input–output trajectory data (Hankel matrices) rather than parametric model identification.
high positive Data-driven generalized perimeter control: Zürich case study ability to generate feasible control (traffic-light) actuation sequences and clo...
BenchPreS can be used as an evaluative tool for mechanism designers and regulators to measure and compare models' context‑sensitivity to guide incentives, penalties, or certification regimes.
Methodological claim about the benchmark's applicability: BenchPreS produces MR and AAR metrics that can be used for comparisons; paper suggests use in policy/design contexts.
high positive BenchPreS: A Benchmark for Context-Aware Personalized Prefer... Usability of BenchPreS metrics (MR, AAR) for model comparison and regulatory eva...
BenchPreS provides a benchmark and evaluation protocol that systematically varies stored user preference, interaction partner (self vs third party), and normative requirement to assess appropriate suppression or application of preferences.
Dataset construction and evaluation procedure described: scenario generation varying preference, partner, and normative appropriateness; MR and AAR computed across the scenario set.
high positive BenchPreS: A Benchmark for Context-Aware Personalized Prefer... Benchmark coverage and experimental protocol (design dimensions: preference, par...
Historical transitions in standard work hours (e.g., six-day to five-day week) show that phased implementation, collective bargaining, and complementary policies can make work-time reductions feasible and economically beneficial.
Historical analyses and case studies of past industrialized-country workweek transitions cited in the synthesis; evidence drawn from historical institutional records and prior economic histories rather than a unified econometric analysis.
high positive A Shorter Workweek as a Policy Response to AI-Driven Labor D... feasibility and economic outcomes of phased work-time reductions (employment, pr...
Economists and researchers should measure organizational mediators (governance, mentoring practices, learning processes) alongside AI adoption and use empirical designs such as difference-in-differences with phased rollouts, randomized mentoring/training interventions, matched employer–employee panels, and IV exploiting exogenous shocks to innovation backing to identify causal effects.
Methodological recommendations and proposed empirical designs contained in the paper; no implementation or empirical results reported.
high positive Revolutionizing Human Resource Development: A Theoretical Fr... feasibility and validity of empirical identification strategies for causal effec...
The integrated framework links multi-level outcomes: micro (individual skills, task performance), meso (team coordination, workflows), and macro (organizational strategy, innovation, productivity) effects to adaptive structuration processes and affordance actualization.
Framework specification and theoretical mapping across levels in the conceptual paper; no empirical validation or sample.
high positive Revolutionizing Human Resource Development: A Theoretical Fr... individual skills and performance; team coordination and workflow quality; organ...
The paper develops a conceptual framework that integrates Adaptive Structuration Theory (AST) and Affordance Actualization Theory (AAT) to explain how effective human–AI collaboration can be structured within organizations.
Conceptual/theoretical synthesis and literature integration combining AST and AAT streams; no original empirical data or sample reported (theoretical development).
high positive Revolutionizing Human Resource Development: A Theoretical Fr... explanatory power / conceptual framework for human–AI collaboration
The paper advances augmentation debates by articulating the leader’s practical role when decision lead‑agency shifts between humans and AI and by detailing systemic HR changes needed to sustain performance, legitimacy and well‑being.
Stated contribution of the conceptual synthesis comparing existing augmentation and leadership literatures and providing an HR‑focused framework; descriptive of the paper's intellectual contribution.
high positive Symbiarchic leadership: leading integrated human and AI cybe... clarity of leader role; specification of HR system changes
Core practice 4 — Embed governance: make accountability, bias testing, privacy safeguards, audit trails, escalation thresholds and human oversight explicit and routine.
Prescriptive governance practice grounded in literature on algorithmic accountability and risk management and in practitioner examples; presented without original empirical validation.
high positive Symbiarchic leadership: leading integrated human and AI cybe... bias incidence; privacy breaches; auditability and compliance metrics
Core practice 3 — Manage the human–AI relationship: build adoption, psychological safety and calibrated trust; address automation anxiety and misuse.
Framework recommendation synthesizing organizational‑psychology and technology adoption literature plus practitioner observations; not tested empirically in the paper.
high positive Symbiarchic leadership: leading integrated human and AI cybe... adoption rates; psychological safety; calibrated trust; misuse incidents
Core practice 2 — Treat AI outputs as hypotheses: require human sensemaking and validation rather than blind adoption of model outputs.
Prescriptive practice derived from reviewed research and practitioner cases emphasizing human oversight; presented as framework guidance rather than empirically validated intervention.
high positive Symbiarchic leadership: leading integrated human and AI cybe... decision quality; error rates; incidence of blind automation
Core practice 1 — Allocate work by comparative advantage: assign tasks to humans or AI based on relative strengths (e.g., speed, pattern detection, contextual judgement).
Conceptual component of the framework drawn from synthesis of empirical findings in prior human–AI and task allocation literature and practitioner examples; no new empirical testing in the paper.
high positive Symbiarchic leadership: leading integrated human and AI cybe... task assignment efficiency; productivity from task allocation
AI methods have improved molecular property prediction, protein structure modelling, ADME/Tox prediction, NLP-based extraction from literature, virtual screening, and generative chemistry, accelerating early-stage tasks.
Compilation of benchmarking results, method-comparison studies, and applied case studies cited in the paper across these specific application areas.
high positive Has AI Reshaped Drug Discovery, or Is There Still a Long Way... accuracy/quality of property and structure predictions, throughput/speed of virt...
AI has materially improved efficiency, decision-making, and early-stage productivity in drug discovery, especially in hit discovery, property prediction, and protein modelling.
Synthesis of published benchmarking studies and industry case studies reported in the paper (e.g., improvements in virtual screening throughput, property-prediction benchmarks, and protein-structure prediction results such as those from folding competitions and tool evaluations).
high positive Has AI Reshaped Drug Discovery, or Is There Still a Long Way... efficiency and productivity in early-stage drug discovery (hit discovery rate, t...
Using distributed systems as a principled foundation is a useful approach for creating and evaluating LLM teams.
Primary methodological proposal of the paper; supported by conceptual argument and (per the paper) mappings between distributed-systems concepts and LLM team design (specific experimental validation not detailed in the excerpt).
high positive Language Model Teams as Distributed Systems suitability of distributed-systems framework for designing/evaluating LLM teams
Large language models (LLMs) are growing increasingly capable.
Statement in the paper's introduction/abstract summarizing the field; based on observed progress in LLM development cited by the authors (no experimental sample size provided in the excerpt).
high positive Language Model Teams as Distributed Systems capability of LLMs (general competence/capacity)
The article discusses managerial and public-policy implications for reducing friction, accelerating responsible adoption, and guiding investments in productivity and inclusion.
Discussion section mentioned in the abstract addressing managerial responsibilities and public policies; the abstract contains no empirical evaluation of policies.
high positive A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... recommendations and guidance for managerial action and public policy aimed at red...
The article provides replicable instruments for practical use: the SCF-30 scale, a minimum AI governance checklist, and a 30-60-90-day matrix.
Explicit statement in the abstract that replicable instruments are made available; the instruments are presumed to be included in the body of the article.
high positive A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... availability of operational instruments (scale, checklist, 30-60-90 matrix...
The authors curated a set of guidelines called the Incentive-Tuning Framework to aid researchers in designing effective incentive schemes for human–AI decision-making studies.
Authors' contribution described in the paper: development of a framework (framework content and evaluation details not provided in excerpt).
high positive Incentive-Tuning: Understanding and Designing Incentives for... guidance for incentive design (qualitative artifact intended to influence study ...
The intelligent scheduling model incorporates legal, contractual, skill-based, and preference-aware constraints to generate equitable and efficient rosters.
Methodological description of constraints encoded in the optimization model for scheduling; experimental validation of resulting rosters reported (conflict reduction and fairness metrics), but specific constraint formulations and datasets are not detailed in the excerpt.
high positive Enhancing hospital workforce planning, scheduling, and perfo... compliance with constraints and roster equity/efficiency
The performance evaluation framework combines structured metrics (task completion, attendance, punctuality) with unstructured feedback (patient surveys, peer reviews) analyzed using natural language processing.
Methodological description in the paper of the performance evaluation module and use of NLP for unstructured feedback analysis; implementation details and dataset sizes not specified in the excerpt.
high positive Enhancing hospital workforce planning, scheduling, and perfo... staff performance measurement (task completion, attendance, punctuality) and sen...
The proposed AI-driven HRM framework integrates forecasting, optimization, and performance evaluation to enhance workforce planning, staff scheduling, and continuous assessment.
Methodological contribution described in the paper: framework design with three core modules (demand forecasting, intelligent scheduling, performance evaluation); validated via experiments on synthetic and real hospital datasets (dataset sizes not specified in the text).
high positive Enhancing hospital workforce planning, scheduling, and perfo... overall workforce planning, scheduling efficiency, and assessment capability (ar...
Persistent environmental state induces history sensitivity (dependence of long-run behavior on past trajectories and initial conditions) unless the overall system is globally contracting.
Formal theorem and proof showing that persistence of environmental variables creates non-autonomous/memory-dependent closed-loop behavior, and that only the special case of global contraction removes this history dependence (mathematical analysis of sensitivity to initial conditions).
high positive How Intelligence Emerges: A Minimal Theory of Dynamic Adapti... history sensitivity of trajectories (dependence on initial conditions/past) vs. ...
Under dissipativity assumptions the induced closed-loop system admits a bounded forward-invariant region, guaranteeing viability of the dynamics without requiring global optimality.
A proven structural result (theorem) in the paper: mathematical proof using dissipativity hypotheses on components of the feedback architecture showing existence of a bounded forward-invariant set for the closed-loop dynamics. (The claim is theoretical; no empirical sample size.)
high positive How Intelligence Emerges: A Minimal Theory of Dynamic Adapti... existence of a bounded forward-invariant region (set invariance/boundedness of t...
Regional peer effects of DT improve firms' resource allocation (RA), which in turn bolsters enterprise resilience (ER).
Mediation/mechanism analysis on the 2013–2022 Chinese A-share manufacturing panel showing that RA mediates the relationship between regional peer DT and ER.
high positive Peer Effects of Digital Transformation and Enterprise Resili... enterprise resilience (ER) (mediator: resource allocation, RA)
Industrial peer effects of DT enhance firms' innovation capability (IC), which in turn strengthens enterprise resilience (ER).
Mediation/mechanism analysis on the same 2013–2022 Chinese A-share manufacturing panel showing that IC mediates the relationship between industrial peer DT and ER.
high positive Peer Effects of Digital Transformation and Enterprise Resili... enterprise resilience (ER) (mediator: innovation capability, IC)
Digital transformation (DT) exhibits significant industrial and regional peer effects.
Empirical analysis using panel data of Chinese manufacturing enterprises listed on the Shanghai and Shenzhen A-share markets from 2013 to 2022; peer-effect regressions conducted within interlocking directorate networks (IDNs).
high positive Peer Effects of Digital Transformation and Enterprise Resili... enterprise resilience (ER)
AI significantly enhances supplier stability in sports enterprises (SE).
Empirical estimation using a double machine learning (DML) model on panel data of 45 Chinese listed sports enterprises (2012–2023); authors report a statistically significant positive effect of AI on supplier stability.
high positive Can Artificial Intelligence Enhance the Stability of Supply ... supplier stability (component of supply chain stability)