Evidence (4175 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Org Design

After roughly a decade of adoption in large biopharma, AI has not yet changed late-stage (Phase II/III) clinical success rates.

Qualitative assessment of industrywide experience and reported outcomes; statement based on narrative review rather than systematic, long-run quantitative analysis or causal estimates.

medium null result Learning from the successes and failures of early artificial... Phase II/III clinical success rates (late-stage trial success probability)

Workers prefer systems that are straightforward, tolerant, and practical.

Survey responses from workers collected in the study on the representative sample of tasks (171) and possibly summarized/scaled via LMs.

medium null result Are We Automating the Joy Out of Work? Designing AI to Augme... traits workers indicate preferring in AI systems (straightforwardness, tolerance...

Developers report emphasizing politeness, strictness, and imagination in system design.

Survey responses from developers collected as part of the study on the representative sample of tasks (171) and possibly summarized/scaled via LMs.

medium null result Are We Automating the Joy Out of Work? Designing AI to Augme... traits developers report prioritizing when designing AI systems (politeness, str...

Prior work has mapped which workplace tasks are exposed to AI, but less is known about whether workers perceive these tasks as meaningful or as busywork.

Statement referencing prior literature (background motivation) in the paper; no new data provided for this claim within the excerpt.

medium null result Are We Automating the Joy Out of Work? Designing AI to Augme... extent of existing research coverage on AI exposure vs. worker perceptions of me...

The information wedge vanishes precisely when signals are exogenous to controls, thereby delineating when strategic belief manipulation matters.

Analytical condition in the paper: shows V^i_t = 0 if and only if the signal-generating process does not depend on agents' controls; uses this equivalence to identify boundary between endogenous and exogenous-signal regimes.

medium null result Forecasting and Manipulating the Forecasts of Others value of the information wedge (V^i_t) and its relation to exogeneity of signals

There is a gap in the existing literature regarding empirical evidence about the relationship between AI/Big Data use and market uncertainty during economic downturns.

Paper motivates the study by citing this gap based on its literature review (the summary does not list the reviewed works or systematic review method).

medium null result An Empirical Study on the Impact of the Integration of AI an... Existence of an empirical evidence gap in the literature

The studied construction supply chain network exhibits moderate density, reported as 0.591.

Network-level metric (density = 0.591) reported in the results; derived from the constructed network based on coded interview interactions (network size and sampling details not provided in abstract).

medium null result Social-Network Analytics of Construction Supply Chain network density (0.591)
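
For reference, network density is conventionally the number of observed ties divided by the number of possible ties. A minimal Python sketch of that calculation follows, assuming an undirected network without self-loops. The summary does not say whether the study's network was directed, and the role names and ties below are hypothetical, not the study's data.

```python
def network_density(nodes, edges, directed=False):
    """Return observed ties divided by possible ties (self-loops ignored)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Possible ties: n(n-1) ordered pairs if directed, half that if undirected.
    possible = n * (n - 1) if directed else n * (n - 1) // 2
    # De-duplicate ties; for undirected networks (a, b) and (b, a) are the same tie.
    unique = {
        tuple(e) if directed else tuple(sorted(e))
        for e in edges
        if e[0] != e[1]
    }
    return len(unique) / possible


# Hypothetical supply chain roles and coded interactions (illustrative only).
roles = ["client", "contractor", "supplier", "designer"]
ties = [
    ("client", "contractor"),
    ("contractor", "supplier"),
    ("client", "designer"),
    ("contractor", "designer"),
]

density = network_density(roles, ties)
print(round(density, 3))
```

A value near 0.6, like the reported 0.591, would mean that a little under three fifths of all possible role-to-role ties were observed.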

Purposive and snowball sampling produced semi-structured interview data that span all major construction supply chain roles.

Sampling approach stated in the paper: purposive and snowball sampling for interviews; claim that interviews 'span all major supply chain roles' (number of interviews and role breakdown not reported in the abstract).

medium null result Social-Network Analytics of Construction Supply Chain representation of supply chain roles in interview sample

These efficiency and cost gains are achieved while maintaining accuracy parity with the matched hierarchical baseline.

Paper states accuracy parity was maintained in the empirical evaluation comparing the proposed framework to the matched hierarchical baseline on the 2,847-query testbed.

medium null result One Supervisor, Many Modalities: Adaptive Tool Orchestration... answer accuracy (no significant difference reported vs baseline)

Logistics efficiency fails to play its anticipated mediating role in transmitting AI's effects to supply chain stability.

Mechanism/mediation tests in the DML analysis on the 45 Chinese listed SEs (2012–2023) indicate no significant mediation via logistics efficiency.

medium null result Can Artificial Intelligence Enhance the Stability of Supply ... logistics efficiency as a mediator of AI's effect on supply chain stability

Diverse decision-making AI from different developers will commonly compete for finite shared resources in everyday devices (examples: charging slots, relay bandwidth, traffic priority).

Motivating background statement in the paper (observational/argumentative; examples drawn from real-world deployment contexts rather than reported experiment data).

medium null result Increasing intelligence in AI agents can worsen collective o... incidence of competition for finite shared resources among heterogeneous deploye...

There is an arithmetic crossover point between these regimes: it occurs where opposing tribes that form spontaneously first fit inside the available capacity.

Mathematical analysis in the paper deriving a capacity-based threshold (crossover) marked by whether spontaneously formed opposing tribes can be accommodated by available capacity.

medium null result Increasing intelligence in AI agents can worsen collective o... crossover threshold / capacity-to-population ratio at which system behaviour reg...

When resources are abundant, the same ingredients (model diversity, individual RL, tribe formation) drive system overload to near zero.

Empirical and mathematical results in the paper showing that abundance of resources reduces overload to near zero under the same agent-population conditions.

medium null result Increasing intelligence in AI agents can worsen collective o... system overload (incidence/severity approaching zero under resource abundance)

The study presents a systematic ranking of Industry 4.0 (I4.0) adoption barriers in the Thai automotive industry.

Paper outputs a ranked list of barriers produced by the integrated Fuzzy BWM-PROMETHEE II-DEMATEL framework; full ranked list and quantitative ranks not included in the supplied summary.

medium null result Evaluating Critical Barriers to Industry 4.0 Adoption in the... systematic ranking/prioritization of I4.0 adoption barriers

The study explores the influence of AI on HRM practice specifically within top IT companies.

Scope statement in the paper: empirical study involved HR professionals from various (described as top) IT firms. The summary does not supply the list of companies or sampling criteria.

medium null result AI-Driven Decision Making and Digital Recruitment: Transform... influence of AI on HRM practices within selected IT companies

The paper contributes to both theory and policy by reconceptualizing procurement value and offering an actionable roadmap for embedding ESG principles in public healthcare procurement.

Scholarly contribution claimed via literature synthesis and framework/roadmap creation; contribution is normative and conceptual rather than empirically validated.

medium null result Greening the Medicaid Supply Chain: An ESG-Integrated Framew... academic and policy contributions (theoretical reconceptualization and practical...

In the sentiment-analysis task, those individual differences do not produce human–AI complementarity: the joint performance of humans and AI did not exceed that of either alone.

Empirical finding reported from the preregistered sentiment-analysis experiment showing no complementarity effect (joint human-AI performance ≤ best individual performance). (Statistical tests and sample size not included in the excerpt.)

medium null result Who Needs What Explanation? How User Traits Affect Explanati... human–AI joint performance compared to human-alone and AI-alone performance (e.g...
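
The complementarity criterion referred to here can be written as a simple check: a human-AI team shows complementarity only if joint performance strictly exceeds the better of the two solo scores. The sketch below assumes a single higher-is-better metric. The function name and the accuracy values are hypothetical and are not results from the study.

```python
def shows_complementarity(human_alone, ai_alone, joint):
    """True only if the joint score beats the best individual score."""
    return joint > max(human_alone, ai_alone)


# Hypothetical accuracies (illustrative only, not the study's numbers).
human_acc, ai_acc, joint_acc = 0.78, 0.85, 0.83

# The joint score beats the human alone but not the AI alone,
# so this case does not count as complementarity.
result = shows_complementarity(human_acc, ai_acc, joint_acc)
print(result)
```

Under this definition, a joint score that merely matches the best individual score also counts as no complementarity, which is consistent with the "≤ best individual performance" wording above.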

Self-generated (model-authored) Skills provide no average benefit.

Comparison of three evaluation conditions (no Skills, curated Skills, self-authored Skills) across SkillsBench. Averaged pass-rate deltas show that model-authored Skills do not increase average pass rate relative to baseline; analysis used 7,308 trajectories over 86 tasks and 7 agent–model configurations.

medium null result SkillsBench: Benchmarking How Well Agent Skills Work Across ... task pass rate (average delta for self-authored Skills vs. baseline)

Occupation-level analyses (e.g., BLS OEWS cross-occupation wage regressions) risk misleading conclusions about AI’s distributional effects because they aggregate over the task- and firm-level heterogeneity that drives the mechanism.

Theoretical argument and empirical illustration in the paper showing how aggregation masks within-task compression and firm-level rent capture; example regressions on OEWS used to demonstrate the limitation.

medium null result When AI Levels the Playing Field: Skill Homogenization, Asse... accuracy of occupation-level analyses in capturing task-level mechanism (qualita...

Testing the model requires within-occupation, within-task panel data on task-level performance and wages linked to firm-level AI adoption, ownership of complementary assets, and measures of rent-sharing; such data are not available at scale.

Author statement about data requirements and current data limitations; empirical illustration and discussion note absence of large-scale linked microdata meeting these criteria.

medium null result When AI Levels the Playing Field: Skill Homogenization, Asse... availability of suitable microdata for empirical testing (data coverage / scale)

Occupation-level regressions using BLS OEWS (2019–2023) are insufficient for testing the model’s task-level predictions because aggregation across tasks and firms hides the mechanism.

Empirical illustration in the paper using occupation-level regressions on BLS OEWS 2019–2023 showing that such aggregates do not reveal within-occupation, within-task dispersion or firm-level rent concentration effects; paper argues this is a data-adequacy limitation.

medium null result When AI Levels the Playing Field: Skill Homogenization, Asse... ability of occupation-level regressions to detect task-level mechanism (qualitat...

A sensitivity decomposition shows five of the moments (the non‑ΔGini moments) identify internal mechanism rates (how AI changes task production, education responses, screening intensity) but do not determine the aggregate sign of inequality change.

Local identification / sensitivity decomposition performed on the calibrated model; decomposition results reported in the paper attribute mechanism-rate identification to five moments and show they leave the sign of ΔGini indeterminate.

medium null result When AI Levels the Playing Field: Skill Homogenization, Asse... identification of mechanism parameters versus determination of aggregate ΔGini s...

There is no accepted integrative digital model that maps measured or perceived value to algorithmic pricing.

Absence of such a model in the SLR sample of 30 articles and thematic coding that identified this gap explicitly.

medium null result Pricing Strategy in Digital Marketing: A Systematic Review o... Existence of integrative digital VBP model (mapping perceived value to algorithm...

Realising digital twin (DT) value requires upfront investment in sensors, integration, standards, and skills; economic viability depends on contract structures and how gains are allocated between investors, owners, contractors, and operators.

Synthesis of cost/benefit discussions and case descriptions in the reviewed literature; policy and procurement examples referenced.

medium null result Digital Twins Across the Asset Lifecycle: Technical, Organis... investment requirements and determinants of economic viability

Under truthful bidding, the decentralised price-based market matches a centralised value-optimal benchmark (i.e., decentralised allocation equals centralised value-optimal allocation).

Paper presents both a theoretical argument (mechanism properties under quasilinear utilities and discrete slices) and empirical validation in simulation by comparing decentralised outcomes to a centralised value-optimal baseline across configurations in the ablation study.

medium null result Real-Time AI Service Economy: A Framework for Agentic Comput... allocation value (total value/throughput) relative to a centralised value-optima...

On-Premise RAG matches commercial (cloud) RAG on standard quantitative retrieval and generation metrics.

Empirical comparative analysis using standard retrieval/generation benchmarks comparing three systems (zero-shot baseline, GPT RAG cloud, Open-source On-Prem RAG) under representative SME workloads; specific metric names and sample sizes not reported in the summary.

medium null result An Empirical Study on the Feasibility Analysis of On-Premise... standard retrieval and generation metrics (quantitative performance of retrieval...

State-level advances in worker-protective AI measures exist but are uneven, and many proposed state bills aimed at strengthening workers’ rights related to AI have stalled.

Review of state legislative proposals and enacted laws as compiled in the commentary (state-level policy scan); no systematic quantitative legislative count or sample reported.

medium null result AI governance under the second Trump administration: implica... status of state-level legislation regarding AI and worker protections (enacted v...

Research priorities include causal studies on productivity gains from AI, firm‑level adoption dynamics, sectoral labor reallocation, long‑run general equilibrium effects, and heterogeneous impacts across regions and demographic groups.

Set of empirical research recommendations drawn from gaps identified in the literature review and limitations section; not an empirical claim but a prioritized research agenda based on secondary evidence.

medium null result AI and Robotics Redefine Output and Growth: The New Producti... knowledge gaps to be addressed (research outcomes)

Growth‑accounting frameworks and measurement approaches must be updated to capture AI/robotics as intangible and embodied capital, including quality improvements and spillovers.

Methodological argument grounded in literature on measurement challenges and examples of intangible capital; no new measurement exercise or empirical re‑estimation is provided in the paper.

medium null result AI and Robotics Redefine Output and Growth: The New Producti... measurement accuracy of productivity accounts, capture of intangible capital and...

Some functional domains show varying maturity: for example, procurement has more applied work compared with other functions.

Reviewer observation from the systematic search and screening across 2020–2025 literature noting uneven distribution of empirical/applied studies across functions.

medium null result Integrating Artificial Intelligence and Enterprise Resource ... relative maturity (volume of applied studies or case evidence per functional dom...

A centralized policy engine for access control, data handling rules, and change management is a necessary control point in the reference pattern.

Prescriptive recommendation in the paper supported by best-practice synthesis and case anecdotes; no direct empirical comparison of centralized vs federated policy engines provided.

medium null result Governed Hyperautomation for CRM and ERP: A Reference Patter... effectiveness of access control and change management (e.g., policy violations, ...

Recommendation for research and modeling: economic models of AI markets should incorporate institutional regime types (centralized vs decentralized), enforcement uncertainty, and legitimacy effects as parameters affecting data access costs, R&D productivity, and market concentration.

Normative recommendation based on the comparative typology and inferred mechanisms from the document analysis; not empirically validated within the study.

medium null result Balancing openness and security in scientific data governanc... modeling parameters (regime type, enforcement uncertainty, legitimacy effects) a...

Theoretical contribution: the paper extends modular coordination theory by treating openness–security trade‑offs as layered, adaptive institutional processes embedded in political regimes and 'legitimacy economies.'

Argumentative/theoretical development in the paper grounded in document analysis and literature on coordination and legitimacy.

medium null result Balancing openness and security in scientific data governanc... theoretical framing / extension of modular coordination theory

Archi enables fully private management of sensitive data by using locally-hosted, open-weight models.

Paper statement tying local hosting of open-weight models to the ability to manage sensitive data privately; no technical privacy audit or measurements reported in the quoted text.

medium positive Archi: Agentic Operations at the CMS Experiment privacy / data management capability

Locally-hosted, open-weight models perform competitively, enabling fully private management of sensitive data.

Paper's comparative claim about model performance based on the same evaluation (human and automated grading of production question set); asserts that locally-hosted open-weight models are competitive and support private data management.

medium positive Archi: Agentic Operations at the CMS Experiment model performance (competitiveness) and capability for private data management

The system proves effective at operational tasks, resolving real-world queries posed by CMS operators.

Results reported from the evaluation using operator feedback and the production question set graded by human and automated panels (no numerical success rates provided in the text quoted).

medium positive Archi: Agentic Operations at the CMS Experiment resolution of real-world queries / task completion

Embodied AI shapes collaboration in complex ways, and social cues critically guide teamwork dynamics.

Synthesis and interpretation of experimental findings (performance variability, completion rates, time, errors, conversational analyses) presented in the paper; this is a theoretical/concluding claim derived from reported results rather than a single empirical estimate.

medium positive Teaming Up with Artificial Agents in Non-routine Analytical ... influence of social cues/embodiment on teamwork dynamics

The framework closes scheduling inefficiencies of up to 28%.

Paper claims the constructs close documented gaps including scheduling inefficiencies of up to 28%; the abstract does not specify the empirical study, dataset, or sample size supporting this percentage.

medium positive Workforce Unit Abstraction for Governing Hybrid Human and Ar... scheduling inefficiency (presumably measured as percent inefficiency in scheduli...

Prior work (SimpleTOD, FireAct, SynTOD, WorkflowLLM, Agent Lumos) has shown the technique [compiling procedures into model weights / subterranean agents] works.

Citation/listing of prior systems (SimpleTOD, FireAct, SynTOD, WorkflowLLM, Agent Lumos) asserted to demonstrate the approach; empirical/experimental results in those prior works are invoked as support.

medium positive Compiling Agentic Workflows into LLM Weights: Near-Frontier ... feasibility/effectiveness of compiling procedures into model weights (approach s...

Recent work has shown this [orchestration] architecture is dominated for procedural tasks by simply providing the procedure in a frontier model's system prompt [Dennis et al., 2026a].

Citation to Dennis et al., 2026a; claim refers to experimental results in that prior work comparing orchestration vs. providing procedures in frontier model system prompts on procedural tasks.

medium positive Compiling Agentic Workflows into LLM Weights: Near-Frontier ... performance on procedural tasks (dominance of system-prompted frontier model app...

Firms that successfully combine AI with learning and knowledge coordination can reduce inefficiencies, accelerate innovation cycles and improve overall performance.

Authors' conclusion and managerial implication derived from observed associations in the survey (AIDLC → KO → OI → IP).

medium positive Enhancing innovation in Pakistan’s IT sector efficiency, innovation cycle speed, overall performance

AI can reduce knowledge gaps and help employees adapt to change; well-designed AI systems complement human creativity, improve judgment and reduce repetitive tasks rather than simply replacing workers.

Authors' discussion and normative claim drawing on study findings and literature; not presented as a directly tested causal result in the survey.

medium positive Enhancing innovation in Pakistan’s IT sector job displacement / complementarity (adaptation, creativity, reduction of repetit...

Only RL-based predictions yield product-repositioning decisions for impulse products that align with those derived from actual trajectory data, resulting in comparable estimated profit gains.

Comparison of recommended repositioning decisions derived from RL versus those derived from observed (actual) trajectories and from heuristic models; reported that RL recommendations match actual-derived recommendations and produce similar estimated profit gains. No numerical profit figures or sample sizes are provided in the excerpt.

medium positive Modelling Customer Trajectories with Reinforcement Learning ... alignment of repositioning decisions and estimated profit gains from repositioni...

RL-based trajectories provide more accurate estimates of impulse purchase rates and shelf traffic densities than TSP and PNN.

Model-based comparisons against real-world trajectory data showing that outputs from RL more closely match observed impulse purchase rates and shelf traffic densities; specific quantitative comparisons and sample sizes not provided in the excerpt.

medium positive Modelling Customer Trajectories with Reinforcement Learning ... accuracy of estimated impulse purchase rates and shelf traffic densities

Hierarchical decomposition without deliberation achieves the best absolute performance for most models.

Observed performance rankings across the evaluated configurations and models (six models across five model families) in the CybORG CAGE-2 evaluation (3,475 episodes), comparing monolithic ReAct vs. delegation to specialized sub-agents with and without deliberation tools.

medium positive Context, Reasoning, and Hierarchy: A Cost-Performance Study ... absolute mean return

Effective AI implementation, coupled with employee training and transparent communication, can reduce resistance and anxiety among employees.

Interpretation and conclusion drawn from the observed negative relationship between perceived opportunities and challenges and the pattern of survey responses; presented as a recommended approach in the study.

medium positive Opportunities and Challenges of Human-AI Collaboration in W... reduction in resistance/anxiety (perceived)

The shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures.

Category-level breakdowns within the Product Hunt dataset showing larger increases in solo-founder launches in categories with a historical bias toward team-based ventures.

medium positive Generative AI Fuels Solo Entrepreneurship, but Teams Still L... change in solo-founder share by category (relative increase)

Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine.

Contextual claim motivating the work; presented as an empirical generalization about production agent pipelines, but not quantified in the abstract.

medium positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... distribution of model-call types in production agentic systems (short/structured...

Small and mid-sized open-weight models are already sufficient for much of the short-horizon, structured tool-use work that dominates real agent pipelines.

Aggregate benchmark results across AgentFloor tiers showing high performance of smaller and mid-sized open-weight models on short-horizon structured tasks; supported by the 16,542 scored runs and model comparisons reported in the paper.

medium positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... ability to complete short-horizon, structured tool-use tasks on the AgentFloor b...

Embedding governance into agent reasoning produces more consistent, explainable, and auditable compliance than external enforcement.

Comparative claim asserted in the paper, apparently supported by the reported production deployment results (95% compliance, zero false escalations); explicit experimental comparison details are not provided in the abstract.

medium positive Think Before You Act -- A Neurocognitive Governance Model fo... consistency, explainability, and auditability of compliance
