Evidence (4175 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
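To compare outcomes of very different sizes, the counts can be read as shares of each row's total. The sketch below is illustrative only: it copies three rows from the matrix, and the helper name `direction_shares` and the "unlisted" label are assumptions, not part of the source. In some rows the Total exceeds the sum of the four direction columns (Firm Productivity lists 600 directional claims against a total of 606); the page does not explain the difference, so the code only reports it as a remainder.

```python
# Three rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Inequality Measures": (44, 122, 49, 6, 221),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the row total.

    'unlisted' (a hypothetical label) is the part of the total not covered
    by the four direction columns; the source does not say what it contains.
    """
    *by_direction, total = counts
    shares = {name: n / total for name, n in zip(DIRECTIONS, by_direction)}
    shares["unlisted"] = (total - sum(by_direction)) / total
    return shares


for outcome, counts in ROWS.items():
    rounded = {k: round(v, 3) for k, v in direction_shares(counts).items()}
    print(outcome, rounded)
```

Shares describe how many claims point in each direction, not how strong or reliable the underlying evidence is.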
Org Design
After roughly a decade of adoption in large biopharma, AI has not yet changed late-stage (Phase II/III) clinical success rates.
Qualitative assessment of industrywide experience and reported outcomes; statement based on narrative review rather than systematic, long-run quantitative analysis or causal estimates.
Workers prefer systems that are straightforward, tolerant, and practical.
Survey responses from workers collected in the study on a representative sample of 171 tasks, possibly summarized/scaled via LMs.
Developers report emphasizing politeness, strictness, and imagination in system design.
Survey responses from developers collected in the study on a representative sample of 171 tasks, possibly summarized/scaled via LMs.
Prior work has mapped which workplace tasks are exposed to AI, but less is known about whether workers perceive these tasks as meaningful or as busywork.
Statement referencing prior literature (background motivation) in the paper; no new data provided for this claim within the excerpt.
The information wedge vanishes precisely when signals are exogenous to controls, thereby delineating when strategic belief manipulation matters.
Analytical condition in the paper: shows V^i_t = 0 if and only if the signal-generating process does not depend on agents' controls; uses this equivalence to identify boundary between endogenous and exogenous-signal regimes.
There is a gap in the existing literature regarding empirical evidence about the relationship between AI/Big Data use and market uncertainty during economic downturns.
Paper motivates the study by citing this gap based on its literature review (the summary does not list the reviewed works or systematic review method).
The studied construction supply chain network exhibits moderate density, reported as 0.591.
Network-level metric (density = 0.591) reported in the results; derived from the constructed network based on coded interview interactions (network size and sampling details not provided in abstract).
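For context, network density is conventionally the ratio of observed ties to the maximum possible number of ties. A minimal sketch, assuming a simple network with no self-loops; the summary does not state whether the study's network is directed, nor its size, so the function name `network_density` and the node and tie counts in the usage line are hypothetical and are not the study's data.

```python
def network_density(num_nodes, num_edges, directed=False):
    """Share of possible ties that are present in a simple network (no self-loops)."""
    if num_nodes < 2:
        return 0.0  # no ties are possible with fewer than two nodes
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2  # each unordered pair is counted once
    return num_edges / possible


# Hypothetical example: 10 actors with 27 undirected ties out of 45 possible.
print(network_density(10, 27))
```

On this definition, a density of 0.591 would mean that roughly 59% of the possible ties between actors were observed.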
Purposive and snowball sampling produced semi-structured interview data that span all major construction supply chain roles.
Sampling approach stated in the paper: purposive and snowball sampling for interviews; claim that interviews 'span all major supply chain roles' (number of interviews and role breakdown not reported in the abstract).
These efficiency and cost gains are achieved while maintaining accuracy parity with the matched hierarchical baseline.
Paper states accuracy parity was maintained in the empirical evaluation comparing the proposed framework to the matched hierarchical baseline on the 2,847-query testbed.
Logistics efficiency does not play its anticipated mediating role in transmitting AI's effects to supply chain stability.
Mechanism/mediation tests in the DML analysis on the 45 Chinese listed SEs (2012–2023) indicate no significant mediation via logistics efficiency.
Diverse decision-making AI from different developers will commonly compete for finite shared resources in everyday devices (examples: charging slots, relay bandwidth, traffic priority).
Motivating background statement in the paper (observational/argumentative; examples drawn from real-world deployment contexts rather than reported experiment data).
There is an arithmetic crossover point between these regimes: it occurs where opposing tribes that form spontaneously first fit inside the available capacity.
Mathematical analysis in the paper deriving a capacity-based threshold (crossover) marked by whether spontaneously formed opposing tribes can be accommodated by available capacity.
When resources are abundant, the same ingredients (model diversity, individual RL, tribe formation) drive system overload to near zero.
Empirical and mathematical results in the paper showing that abundance of resources reduces overload to near zero under the same agent-population conditions.
The study presents an advanced systematic ranking of I4.0 adoption barriers in the Thai automotive industry.
Paper outputs a ranked list of barriers produced by the integrated Fuzzy BWM-PROMETHEE II-DEMATEL framework; full ranked list and quantitative ranks not included in the supplied summary.
The study explores the influence of AI on HRM practice specifically within top IT companies.
Scope statement in the paper: empirical study involved HR professionals from various (described as top) IT firms. The summary does not supply the list of companies or sampling criteria.
The paper contributes to both theory and policy by reconceptualizing procurement value and offering an actionable roadmap for embedding ESG principles in public healthcare procurement.
Scholarly contribution claimed via literature synthesis and framework/roadmap creation; contribution is normative and conceptual rather than empirically validated.
In the sentiment-analysis task, those individual differences do not produce human–AI complementarity: the joint performance of humans and AI did not exceed that of either alone.
Empirical finding reported from the preregistered sentiment-analysis experiment showing no complementarity effect (joint human-AI performance ≤ best individual performance). (Statistical tests and sample size not included in the excerpt.)
Self-generated (model-authored) Skills provide no average benefit.
Comparison of three evaluation conditions (no Skills, curated Skills, self-authored Skills) across SkillsBench. Averaged pass-rate deltas show that model-authored Skills do not increase average pass rate relative to baseline; analysis used 7,308 trajectories over 86 tasks and 7 agent–model configurations.
Occupation-level analyses (e.g., BLS OEWS cross-occupation wage regressions) risk misleading conclusions about AI’s distributional effects because they aggregate over the task- and firm-level heterogeneity that drives the mechanism.
Theoretical argument and empirical illustration in the paper showing how aggregation masks within-task compression and firm-level rent capture; example regressions on OEWS used to demonstrate the limitation.
Testing the model requires within-occupation, within-task panel data on task-level performance and wages linked to firm-level AI adoption, ownership of complementary assets, and measures of rent-sharing; such data are not available at scale.
Author statement about data requirements and current data limitations; empirical illustration and discussion note absence of large-scale linked microdata meeting these criteria.
Occupation-level regressions using BLS OEWS (2019–2023) are insufficient for testing the model’s task-level predictions because aggregation across tasks and firms hides the mechanism.
Empirical illustration in the paper using occupation-level regressions on BLS OEWS 2019–2023 showing that such aggregates do not reveal within-occupation, within-task dispersion or firm-level rent concentration effects; paper argues this is a data-adequacy limitation.
A sensitivity decomposition shows five of the moments (the non‑ΔGini moments) identify internal mechanism rates (how AI changes task production, education responses, screening intensity) but do not determine the aggregate sign of inequality change.
Local identification / sensitivity decomposition performed on the calibrated model; decomposition results reported in the paper attribute mechanism-rate identification to five moments and show they leave the sign of ΔGini indeterminate.
There is no accepted integrative digital model that maps measured or perceived value to algorithmic pricing.
Absence of such a model in the SLR sample of 30 articles and thematic coding that identified this gap explicitly.
Realising DT value requires upfront investment in sensors, integration, standards, and skills; economic viability depends on contract structures and how gains are allocated between investors, owners, contractors, and operators.
Synthesis of cost/benefit discussions and case descriptions in the reviewed literature; policy and procurement examples referenced.
Under truthful bidding, the decentralised price-based market matches a centralised value-optimal benchmark (i.e., decentralised allocation equals centralised value-optimal allocation).
Paper presents both a theoretical argument (mechanism properties under quasilinear utilities and discrete slices) and empirical validation in simulation by comparing decentralised outcomes to a centralised value-optimal baseline across configurations in the ablation study.
On-Premise RAG matches commercial (cloud) RAG on standard quantitative retrieval and generation metrics.
Empirical comparative analysis using standard retrieval/generation benchmarks comparing three systems (zero-shot baseline, GPT RAG cloud, Open-source On-Prem RAG) under representative SME workloads; specific metric names and sample sizes not reported in the summary.
State-level advances in worker-protective AI measures exist but are uneven, and many proposed state bills aimed at strengthening workers’ rights related to AI have stalled.
Review of state legislative proposals and enacted laws as compiled in the commentary (state-level policy scan); no systematic quantitative legislative count or sample reported.
Research priorities include causal studies on productivity gains from AI, firm‑level adoption dynamics, sectoral labor reallocation, long‑run general equilibrium effects, and heterogeneous impacts across regions and demographic groups.
Set of empirical research recommendations drawn from gaps identified in the literature review and limitations section; not an empirical claim but a prioritized research agenda based on secondary evidence.
Growth‑accounting frameworks and measurement approaches must be updated to capture AI/robotics as intangible and embodied capital, including quality improvements and spillovers.
Methodological argument grounded in literature on measurement challenges and examples of intangible capital; no new measurement exercise or empirical re‑estimation is provided in the paper.
Some functional domains show varying maturity: for example, procurement has more applied work compared with other functions.
Reviewer observation from the systematic search and screening across 2020–2025 literature noting uneven distribution of empirical/applied studies across functions.
A centralized policy engine for access control, data handling rules, and change management is a necessary control point in the reference pattern.
Prescriptive recommendation in the paper supported by best-practice synthesis and case anecdotes; no direct empirical comparison of centralized vs federated policy engines provided.
Recommendation for research and modeling: economic models of AI markets should incorporate institutional regime types (centralized vs decentralized), enforcement uncertainty, and legitimacy effects as parameters affecting data access costs, R&D productivity, and market concentration.
Normative recommendation based on the comparative typology and inferred mechanisms from the document analysis; not empirically validated within the study.
Theoretical contribution: the paper extends modular coordination theory by treating openness–security trade‑offs as layered, adaptive institutional processes embedded in political regimes and 'legitimacy economies.'
Argumentative/theoretical development in the paper grounded in document analysis and literature on coordination and legitimacy.
Archi enables fully private management of sensitive data by using locally-hosted, open-weight models.
Paper statement tying local hosting of open-weight models to the ability to manage sensitive data privately; no technical privacy audit or measurements reported in the quoted text.
Locally-hosted, open-weight models perform competitively, enabling fully private management of sensitive data.
Paper's comparative claim about model performance based on the same evaluation (human and automated grading of production question set); asserts that locally-hosted open-weight models are competitive and support private data management.
The system proves effective at operational tasks, resolving real-world queries posed by CMS operators.
Results reported from the evaluation using operator feedback and the production question set graded by human and automated panels (no numerical success rates provided in the text quoted).
Embodied AI shapes collaboration in complex ways, and social cues critically guide teamwork dynamics.
Synthesis and interpretation of experimental findings (performance variability, completion rates, time, errors, conversational analyses) presented in the paper; this is a theoretical/concluding claim derived from reported results rather than a single empirical estimate.
The framework closes scheduling inefficiencies of up to 28%.
Paper claims the constructs close documented gaps including scheduling inefficiencies of up to 28%; the abstract does not specify the empirical study, dataset, or sample size supporting this percentage.
Prior work (SimpleTOD, FireAct, SynTOD, WorkflowLLM, Agent Lumos) has shown the technique [compiling procedures into model weights / subterranean agents] works.
Citation/listing of six prior systems (SimpleTOD, FireAct, SynTOD, WorkflowLLM, Agent Lumos) asserted to demonstrate the approach; empirical/experimental results in those prior works are invoked as support.
Recent work has shown this [orchestration] architecture is dominated for procedural tasks by simply providing the procedure in a frontier model's system prompt [Dennis et al., 2026a].
Citation to Dennis et al., 2026a; claim refers to experimental results in that prior work comparing orchestration vs. providing procedures in frontier model system prompts on procedural tasks.
Firms that successfully combine AI with learning and knowledge coordination can reduce inefficiencies, accelerate innovation cycles and improve overall performance.
Authors' conclusion and managerial implication derived from observed associations in the survey (AIDLC → KO → OI → IP).
AI can reduce knowledge gaps and help employees adapt to change; well-designed AI systems complement human creativity, improve judgment and reduce repetitive tasks rather than simply replacing workers.
Authors' discussion and normative claim drawing on study findings and literature; not presented as a directly tested causal result in the survey.
Only RL-based predictions yield product-repositioning decisions for impulse products that align with those derived from actual trajectory data, resulting in comparable estimated profit gains.
Comparison of recommended repositioning decisions derived from RL versus those derived from observed (actual) trajectories and from heuristic models; reported that RL recommendations match actual-derived recommendations and produce similar estimated profit gains. No numerical profit figures or sample sizes are provided in the excerpt.
RL-based trajectories provide more accurate estimates of impulse purchase rates and shelf traffic densities than TSP and PNN.
Model-based comparisons against real-world trajectory data showing that outputs from RL more closely match observed impulse purchase rates and shelf traffic densities; specific quantitative comparisons and sample sizes not provided in the excerpt.
Hierarchical decomposition without deliberation achieves the best absolute performance for most models.
Observed performance rankings across the evaluated configurations and models (six models across five model families) in the CybORG CAGE-2 evaluation (3,475 episodes), comparing monolithic ReAct vs. delegation to specialized sub-agents with and without deliberation tools.
Effective AI implementation, coupled with employee training and transparent communication, can reduce resistance and anxiety among employees.
Interpretation and conclusion drawn from the observed negative relationship between perceived opportunities and challenges and the pattern of survey responses; presented as a recommended approach in the study.
The shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures.
Category-level breakdowns within the Product Hunt dataset showing larger increases in solo-founder launches in categories with a historical bias toward team-based ventures.
Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine.
Contextual claim motivating the work; presented as an empirical generalization about production agent pipelines, but not quantified in the abstract.
Small and mid-sized open-weight models are already sufficient for much of the short-horizon, structured tool use work that dominates real agent pipelines.
Aggregate benchmark results across AgentFloor tiers showing high performance of smaller and mid-sized open-weight models on short-horizon structured tasks; supported by the 16,542 scored runs and model comparisons reported in the paper.
Embedding governance into agent reasoning produces more consistent, explainable, and auditable compliance than external enforcement.
Comparative claim asserted in the paper, apparently supported by the reported production deployment results (95% compliance, zero false escalations); explicit experimental comparison details are not provided in the abstract.