Evidence (4189 claims)
Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
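
The matrix can be read row by row as simple arithmetic. The sketch below copies three rows from the table and computes the share of positive findings, plus the gap between the Total column and the four listed direction columns. That gap is nonzero for some rows; the page does not say what it contains, so the code only reports it. The names `ROWS` and `summarize` are illustrative, not part of the site.

```python
# Three rows copied from the Evidence Matrix:
# (positive, negative, mixed, null, total)
ROWS = {
    "Other": (761, 200, 101, 904, 2020),
    "Firm Productivity": (439, 57, 88, 20, 610),
    "Job Displacement": (12, 81, 21, 1, 115),
}


def summarize(row):
    """Return the positive share and the count not covered by the four columns."""
    positive, negative, mixed, null, total = row
    listed = positive + negative + mixed + null
    return {
        "positive_share": positive / total,
        "unlisted": total - listed,  # meaning not stated on the page
    }


for name, row in ROWS.items():
    s = summarize(row)
    print(f"{name}: positive share {s['positive_share']:.1%}, unlisted {s['unlisted']}")
```

For these rows the unlisted counts are 54, 6, and 0 respectively, so the Total column should not be assumed to equal the sum of the four direction columns.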
Org Design
Coherent decision-making requires Bayesian principles at the orchestration level of the agentic system, not necessarily the LLM agent parameters
Central prescriptive claim of the position paper; supported by conceptual argumentation and illustrative examples rather than empirical tests.
Bayesian decision theory provides a framework for agentic systems that can help to maintain beliefs over task-relevant latent quantities, to update these beliefs from observed agentic and human-AI interactions, and to choose actions
Argumentative/theoretical claim in the position paper; illustrated with conceptual examples and design patterns rather than empirical evaluation.
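The belief-update-then-act loop described in this claim can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes each tool's success rate is the latent quantity, uses a Beta-Bernoulli model for the belief, and picks the action with the highest expected utility. `BetaBelief`, `choose_action`, the tool names, and the value and cost figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over a tool's unknown success probability."""
    alpha: float = 1.0  # uniform prior (an assumption)
    beta: float = 1.0

    def update(self, success: bool) -> None:
        # Conjugate Bayesian update after observing one outcome.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def choose_action(beliefs, value, costs):
    """Pick the action maximizing expected utility: P(success) * value - cost."""
    return max(beliefs, key=lambda name: beliefs[name].mean * value - costs[name])


# Hypothetical orchestration choice between a cheap tool and a costly expert.
beliefs = {"search_tool": BetaBelief(), "human_expert": BetaBelief()}
costs = {"search_tool": 0.1, "human_expert": 0.3}

first_choice = choose_action(beliefs, value=1.0, costs=costs)

# Three observed failures shift the belief about the cheap tool downward.
for _ in range(3):
    beliefs["search_tool"].update(success=False)

later_choice = choose_action(beliefs, value=1.0, costs=costs)
print(first_choice, later_choice)
```

Under these assumed numbers the orchestrator starts with the cheap tool and switches to the expert once the observed failures lower the tool's expected utility below the expert's. The belief lives in the orchestration layer, not in any model's parameters.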
Many high-value deployments rely on decisions under uncertainty (for example, which tool to call, which expert to consult, or how many resources to invest)
Stated as a motivating observation in the paper; no quantitative data or sample provided.
LLMs excel at predictive tasks and complex reasoning tasks
Asserted in the paper's opening motivation; no empirical evaluation or sample reported in the paper itself.
We release the benchmark, harness, sweep configurations, and full run corpus.
Statement of artifact release in the paper; verifiable by checking the project's repository or supplementary materials.
These findings suggest a practical design principle for agentic systems: use smaller open-weight models for the broad base of routine actions, and reserve large frontier models for the narrower class of tasks that truly demand deeper planning and control.
Synthesis/recommendation drawn from the empirical results on AgentFloor showing where small/mid models suffice and where frontier models have advantage; prescriptive claim rather than a direct empirical measurement.
The gap appears most clearly on long-horizon planning tasks that require sustained coordination and reliable constraint tracking over many steps, where frontier models still hold an advantage, though neither side reaches strong reliability.
Performance breakdown by capability tier on AgentFloor showing frontier (GPT-5) advantage on long-horizon planning/constraint-tracking tasks; both model groups have low absolute reliability on these tasks according to reported results.
We evaluate 16 open-weight models, from 0.27B to 32B parameters, alongside GPT-5 across 16,542 scored runs.
Empirical evaluation reported in the paper: 16 open-weight models spanning specified parameter sizes, inclusion of GPT-5, and a total of 16,542 scored runs (reported counts).
We introduce AgentFloor, a deterministic 30-task benchmark organized as a six-tier capability ladder, spanning instruction following, tool use, multi-step coordination, and long-horizon planning under persistent constraints.
Paper describes the design of the benchmark: deterministic, 30 tasks, organized into six tiers covering specified capabilities. This is a descriptive claim about the artifact introduced in the work.
The practical aim is to help strategic leaders and system designers recognize the configuration at work, notice when it shifts, and judge whether it fits the decision before them.
Stated aim/objective of the paper (normative guidance; conceptual).
The framework introduces 'co-adaptability'—the capacity of a configuration to improve as human and non-human participants adjust together—and situates it within 'heterogeneous teaming' where participants may vary by number, substrate, model architecture, capability, speed, memory, and form of participation.
Conceptual/theoretical introduction of new constructs (co-adaptability and heterogeneous teaming) in the paper; definitional rather than empirical.
The five positions serve as landmarks that help leaders recognize configurations as they layer, drift, or change in a single decision.
Normative/conceptual claim supported by the framework; no empirical validation or sample provided in the excerpt.
The spectrum focuses attention on where leadership work occurs: who frames the problem, who redirects the work, and who can answer for what follows.
Conceptual argument in the paper describing the axes/criteria of the spectrum (theoretical/thematic analysis; no empirical data reported).
This paper offers a leadership-facing spectrum to see human–AI decision relationships with five positions: Pure Human, Centaur (human-dominant, with AI in the loop), Co-equal, Minotaur (AI-dominant, with humans in the loop), and Pure AI.
Conceptual presentation in the paper: a theorized five-position spectrum (no empirical sample or experiment reported).
Organizations should cultivate a culture of critical engagement with AI outputs, and e-leadership development must focus on building competencies in mediating, filtering and legitimizing AI contributions within digital workflows.
Recommendations based on thematic analysis of interview data from 34 project managers; presented as implications rather than empirically tested interventions.
To achieve balanced augmentation, leaders must proactively frame AI's role, embedding validation checkpoints and human authorship clauses to maintain accountability.
Prescriptive recommendation derived from thematic findings and cross-case patterns in the 34 interviews; no experimental or longitudinal testing reported.
Proactive engagement combined with creation-oriented use generated the highest effectiveness.
Qualitative coding and cross-case comparisons in the thematic analysis of 34 interviews identified combinations of proactive e-leadership and creation-oriented AI use associated with reported high team effectiveness.
The trajectory of the curvilinear relationship is governed by e-leadership practices.
Interview data analyzed thematically showing recurring references to leadership practices as moderators of AI-use effectiveness across the 34 interviews.
The proposed framework emerged from operational work to improve clinician capability in a live value-based care deployment.
Stated as originating from operational experience in a live deployment; no details on deployment scale, sample size, or outcomes provided in the excerpt.
Training environments that combine longitudinal outcome measurement with aligned financial incentives are a necessary condition for learning a reward model aligned with patient trajectory rather than with encounter economics.
Normative/theoretical argument presented in the paper; no empirical tests or sample sizes reported in the excerpt.
Chronic disease management under outcome-based payment contracts produces override data with uniquely favorable properties for learning: longitudinal density, concentrated decision space, outcome labels, and natural capability variation.
Argument/claim in the paper that outcome-based contracts and chronic disease management produce favorable data characteristics; asserted as part of the framework motivation. No quantitative empirical evidence or sample sizes provided in the excerpt.
We propose a dual learning architecture that jointly trains a reward model and a capability model via alternating optimization, which prevents a failure mode we term 'suppression bias'—the systematic suppression of correct-but-difficult recommendations when clinician capability falls below the execution threshold.
Proposed algorithmic contribution and theoretical claim; suppression bias defined and a mitigation approach described. No empirical evaluation or sample sizes given in the excerpt.
We formulate preferences conditioned on patient state s, organizational context c, and clinician capability κ, where κ decomposes into execution capability (κ-exec) and alignment capability (κ-align).
Presented as a formal model formulation in the paper; theoretical description without empirical sample sizes in the excerpt.
We introduce a five-category override taxonomy that maps override types to distinct model update targets.
Stated as a formal contribution of the framework; taxonomy proposed in the paper. No empirical validation or sample size reported in the excerpt.
Clinician overrides of clinical AI recommendations can be reframed as implicit preference data analogous to reinforcement learning from human feedback (RLHF), but richer because the annotator is a domain expert, the alternatives carry real consequences, and downstream outcomes are observable.
Conceptual argument presented in the paper drawing an analogy to RLHF; no empirical metrics or sample size reported in the excerpt.
This work offers a principled foundation for autonomous AI agents that govern themselves the way humans do: not because rules are imposed upon them, but because deliberation is embedded in how they think.
Concluding claim summarizing the proposed framework's conceptual contribution (theoretical/architectural claim; not an empirical measurement).
Implemented on a production-grade retail supply chain workflow, the framework produces zero false escalations to human oversight.
Empirical implementation on a production-grade retail supply chain workflow reported in the paper (claim stated without sample size or measurement protocol in the abstract).
Implemented on a production-grade retail supply chain workflow, the framework achieves 95% compliance accuracy.
Empirical implementation on a production-grade retail supply chain workflow reported in the paper (no sample size or evaluation details provided in the abstract).
We formalize a Pre-Action Governance Reasoning Loop (PAGRL) in which agents consult a four-layer governance rule set (global, workflow-specific, agent-specific, and situational) before every consequential action.
Methodological contribution described in the paper (formalization of a governance loop and four-layer rule hierarchy; no numerical sample given in the abstract).
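A pre-action check over four rule layers might look like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: the layer names follow the claim, the three verdicts mirror the permissible / requires modification / demands escalation outcomes mentioned in the same paper's framing, and the "most restrictive verdict wins" policy, the `Verdict` enum, `review_action`, and both example rules are hypothetical.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Verdict(IntEnum):
    ALLOW = 0     # action is permissible as proposed
    MODIFY = 1    # action must be changed before it runs
    ESCALATE = 2  # action must go to human oversight


Rule = Callable[[dict], Verdict]

# Layer order taken from the claim; consulted before every consequential action.
LAYERS = ("global", "workflow", "agent", "situational")


def review_action(action: dict, rules: Dict[str, List[Rule]]) -> Verdict:
    """Consult every layer and keep the most restrictive verdict (an assumption)."""
    verdict = Verdict.ALLOW
    for layer in LAYERS:
        for rule in rules.get(layer, []):
            verdict = max(verdict, rule(action))
    return verdict


# Hypothetical rules for a retail supply chain workflow.
def cap_order_value(action: dict) -> Verdict:
    return Verdict.ESCALATE if action.get("order_value", 0) > 10_000 else Verdict.ALLOW


def cap_quantity(action: dict) -> Verdict:
    over_stock = action.get("quantity", 0) > action.get("stock", 0)
    return Verdict.MODIFY if over_stock else Verdict.ALLOW


RULES = {"global": [cap_order_value], "workflow": [cap_quantity]}

print(review_action({"order_value": 500, "quantity": 5, "stock": 10}, RULES).name)
```

Keeping the rules as plain callables per layer means a layer can be empty or extended without touching the loop itself.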
We propose a neurocognitive governance framework that formally maps this human self-governance process to LLM-driven agent reasoning, establishing a structural parallel between the human brain and the large language model as the cognitive core of an agent.
Theoretical framework and formal mapping presented in the paper (design/proposal rather than empirical validation).
Before acting, humans engage deliberate cognitive processes grounded in executive function, inhibitory control, and internalized organizational rules to evaluate whether an intended action is permissible, requires modification, or demands escalation.
The paper's framing draws on cognitive/neurocognitive literature about human self-governance (presented as background/theoretical justification; no new empirical human-subject data reported in the abstract).
AI-mediated expert networks are an emerging phenomenon that existing coordination theories fail to account for.
Mentioned as an example in the abstract to motivate a theoretical gap; no empirical data or sample provided.
GitHub Copilot exhibits 'recursive value creation' as an example of an emerging organizational phenomenon enabled by GenAI.
Illustrative example named in the abstract; no empirical measurement or sample reported within the abstract.
UCF provides a theoretical foundation for understanding organizational coordination when GenAI transforms cognitive constraints from scarce to abundant resources.
Position paper asserts UCF as foundational theory for coordination under transformed cognitive constraints; conceptual argument only.
Three emergent organizational forms illustrate UCF principles: cognitive meshworks (coordinated through competence synthesis), algorithmic ecosystems (achieving emergent optimization), and hybrid intelligence collectives (operating through cognitive complementarity).
Conceptual typology and illustrative examples in the position paper; no reported empirical measurement or sample.
We introduce unbounded cognitive fusion (UCF) as a new theoretical framework explaining coordination through cognitive synthesis rather than price signals or authority structures.
Theoretical proposal and framing within the paper; conceptual development rather than empirical validation.
Generative artificial intelligence (GenAI) fundamentally alters [traditional organizational coordination] assumptions by augmenting human cognitive capabilities across organizational boundaries.
Position paper argumentation and conceptual reasoning presented in the abstract; no empirical data or sample reported.
New tendencies in managerial AI research and practice include explainable AI, human–AI collaboration, knowledge management, enterprise analytics, and algorithmic management.
Descriptive finding from the paper's literature synthesis (topics emphasized in the review); no quantitative prevalence or counts provided in the abstract.
Machine Learning and Deep Learning enhance employee productivity, business intelligence, process mining, and data-driven decision-making by enabling prediction, perception, and adaptive learning solutions.
Claim synthesized in the review from multiple studies identified via PRISMA screening; abstract does not list the number or identity of underlying empirical studies.
AI-based technologies can greatly enhance managerial efficiency by automating repetitive activities, improving resource allocation, enabling intelligent scheduling, and supporting predictive modelling and strategic planning.
Summary conclusion from the paper's literature review (PRISMA methodology referenced); no quantitative meta-analytic effect sizes provided in abstract.
Machine Learning, Artificial Intelligence, and Deep Learning are tools that can optimize managerial decisions, enable intelligent automation, streamline workflows, and improve organizational performance.
Synthesis claim from the paper's PRISMA-based literature review (no numeric sample size reported in the abstract).
Substantive technological competencies play an important role in shaping network resilience and complement structure-based perspectives in understanding innovation networks.
Synthesis of empirical findings from composite metric identification and disruption simulations on networks derived from the 282,778 patents, showing that capability-based removals have stronger impacts than structure-only removals.
A composite technological capability metric can be constructed (from textual and network information) to identify core innovators beyond simple topological measures.
Construction and application of a composite metric combining text-derived technological value and network features on 282,778 patents; used to identify core innovators.
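One way to combine a text-derived score with a network feature is a weighted sum of min-max-normalized values. The excerpt does not give the study's actual formula, so everything here is an assumption for illustration: the function names, the weight of 0.7 on the text score, the use of degree as the network feature, and the three made-up innovators.

```python
def min_max(values):
    """Rescale a dict of scores to the [0, 1] range."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def composite_scores(text_value, degree, weight_text=0.7):
    """Weighted sum of normalized text-derived value and network degree."""
    t, d = min_max(text_value), min_max(degree)
    return {k: weight_text * t[k] + (1 - weight_text) * d[k] for k in text_value}


# Made-up inputs: a topic-based technological value and a collaboration degree.
text_value = {"A": 0.9, "B": 0.2, "C": 0.6}
degree = {"A": 3, "B": 10, "C": 4}

scores = composite_scores(text_value, degree)
by_composite = sorted(scores, key=scores.get, reverse=True)
by_degree = sorted(degree, key=degree.get, reverse=True)
print(by_composite, by_degree)
```

In this toy data the composite ranking puts A first while the degree-only ranking puts B first, which is the kind of divergence from purely topological measures that the claim describes.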
Latent Dirichlet Allocation (LDA) on the patent texts delineates fine-grained technological domains within the Chinese AI patent corpus.
Text-mining method applied to a corpus of 282,778 Chinese AI patents, using LDA to extract topics/domains.
This study develops a multidimensional, knowledge-driven evaluation framework that integrates text mining with complex network analysis to identify core innovators.
Methodological description: framework built using Latent Dirichlet Allocation (LDA) on 282,778 Chinese AI patents, construction of a composite technological capability metric, and simulation of targeted disruptions across collaboration and knowledge networks.
Managing evolutionary dynamics in software is as urgent as AGI alignment for safeguarding society’s co-evolution with its machines.
Author's concluding normative claim in the abstract; argument based on scenario analysis rather than comparative empirical evidence.
Governance should shift focus from aligning goals to steering evolution; the paper proposes four guidance instruments: replication-rate thresholds (modeled on epidemiological R0), a public vulnerability registry for self-modifying code, tiered digital biosafety levels, and adaptive regulatory sandboxes.
Normative policy recommendation spelled out in the abstract; based on the paper's scenario analysis and argumentation rather than empirical validation.
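The first instrument, a replication-rate threshold modeled on R0, can be illustrated with simple arithmetic. In epidemiology a reproduction number above 1 means each case produces more than one new case on average, so the population grows. The sketch below applies that idea to software copies per generation. The function names, the averaging over generations, the threshold of 1.0, and the sample counts are assumptions, not details from the paper.

```python
def replication_rate(parents: int, offspring: int) -> float:
    """Average number of new copies produced per existing instance."""
    if parents <= 0:
        raise ValueError("parents must be positive")
    return offspring / parents


def exceeds_threshold(generations, threshold: float = 1.0) -> bool:
    """Flag a lineage whose mean replication rate is above the threshold.

    `generations` is a list of (parents, offspring) counts per generation.
    """
    rates = [replication_rate(p, o) for p, o in generations]
    return sum(rates) / len(rates) > threshold


# Made-up observation windows for two self-replicating code lineages.
growing = [(10, 8), (8, 12), (12, 21)]   # rates 0.8, 1.5, 1.75
shrinking = [(10, 5), (5, 4)]            # rates 0.5, 0.8

print(exceeds_threshold(growing), exceeds_threshold(shrinking))
```

A regulator using such a rule would still need to decide how instances are counted and over what window, which this sketch leaves open.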
Cloud platforms, open-source software supply chains, and crypto-economic incentives provide, at electronic speed, the three preconditions of evolution: replication, variation, and differential fitness.
Conceptual/mechanistic claim supported by theoretical argumentation and scenario-building in the paper (no empirical test or sample reported).
The proposed framework balances AI-driven productivity with the epistemic sovereignty necessary to manage increasingly opaque software ecosystems.
Normative/architectural claim about the proposed framework; presented conceptually in the paper without reported empirical testing in the excerpt.
To preserve long-term resilience, engineering leaders must move beyond prompt-based development to implement rigorous human-in-the-loop pedagogical standards.
Prescriptive recommendation based on the paper's conceptual analysis; no randomized trials or empirical validation of this intervention reported in the excerpt.