Evidence (4175 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
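The counts above can be turned into shares, and the Total column can be checked against the four listed directions. The following is a minimal sketch, not part of the source page: the three rows are copied from the matrix, and the names parse_count and summarize are hypothetical. It assumes a dash cell means zero, and that any gap between Total and the four listed columns reflects claims with a direction not shown in the matrix (for example the "neutral" entries that appear in the listing below). That reading is an assumption.

```python
# Three rows copied from the Evidence Matrix above.
# Columns: outcome, positive, negative, mixed, null, total (tab-separated).
# "\u2014" is the dash placeholder the matrix uses for an empty cell.
ROWS = """\
Governance & Regulation\t826\t400\t191\t122\t1563
Job Displacement\t12\t80\t20\t1\t113
Labor Share of Income\t17\t19\t17\t\u2014\t53
"""


def parse_count(cell: str) -> int:
    """Convert a matrix cell to an int, treating the dash placeholder as 0 (assumption)."""
    cell = cell.strip()
    return 0 if cell == "\u2014" else int(cell)


def summarize(line: str) -> dict:
    """Return the positive share and the count not covered by the four listed directions."""
    outcome, *cells = line.split("\t")
    positive, negative, mixed, null, total = map(parse_count, cells)
    listed = positive + negative + mixed + null
    return {
        "outcome": outcome,
        "positive_share": positive / total,
        "unlisted": total - listed,
    }


summaries = [summarize(line) for line in ROWS.splitlines()]

for s in summaries:
    print(
        f"{s['outcome']}: {s['positive_share']:.1%} positive, "
        f"{s['unlisted']} claims outside the four listed directions"
    )
```

For Governance & Regulation the four listed directions sum to 1539 against a Total of 1563, leaving 24 claims unaccounted for by those columns; for the other two rows the listed directions sum exactly to the Total.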

Filter: Org Design

Firms need complementary investments (data pipelines, monitoring tools, feedback loops, human oversight systems), which materially affect the economics of adoption.

Industry case studies and practitioner reports synthesized in the review describing necessary complementary investments; no quantified investment sample or ROI analysis provided here.

medium negative The Effectiveness of ChatGPT in Customer Service and Communi... required investment levels, effect on adoption economics and ROI

Regulatory attention is likely to focus on transparency, liability for factual errors, data privacy, and nondiscrimination; compliance and auditing will add to adoption costs.

Policy and regulatory analyses aggregated in the review and references to ongoing regulatory discussions; no primary regulatory impact study conducted in this paper.

medium negative The Effectiveness of ChatGPT in Customer Service and Communi... regulatory compliance requirements, related adoption costs, and scope of regulat...

Generative AI currently lacks genuine empathy and relational capabilities necessary for high-stakes or sensitive interactions.

Conceptual analyses and practitioner case examples aggregated in the review; limited direct quantitative measurement cited in this brief review.

medium negative The Effectiveness of ChatGPT in Customer Service and Communi... empathy/relational effectiveness in sensitive interactions, customer satisfactio...

Generative models exhibit contextual misunderstandings and cannot reliably infer nuanced customer intent in all cases.

Synthesis of empirical studies and practitioner observations documenting misinterpretation and intent-detection failures; no new testing reported in this review.

medium negative The Effectiveness of ChatGPT in Customer Service and Communi... accuracy of intent detection and rate of context-related misunderstandings

There is substitution risk: routine ideation and drafting tasks may be automated, altering task-level labor demand and wage structure.

Task-automation literature and empirical studies of LLMs performing routine drafting/ideation tasks summarized in the review; no long-run labor-market causality established in the paper.

medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... employment and wages for routine ideation/drafting tasks

Generative AI lacks reliable situational judgment on ambiguous problems and on ethical trade-offs, making it insufficient for autonomous decision-making in such contexts.

Case examples and experimental studies cited in the synthesis showing inconsistent or inappropriate responses to ambiguous/ethical scenarios; no large-scale causal evidence provided.

medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... quality/appropriateness of situational judgment and ethical decision-making in t...

LLMs are prone to bias, mediocrity, and factual or logical errors when domain-specific context or experiential knowledge is absent.

Review of empirical evaluations documenting biased outputs, superficial or mediocre suggestions, and factual errors in open-ended tasks and domain-specific prompts; evidence comes from multiple short-term studies and applied examples.

medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... accuracy/factuality, bias indicators, perceived quality of outputs in domain-spe...

LLMs are predominantly recombinative — they tend to rework and recombine existing material rather than produce deeply novel insights.

Analytical synthesis of output analyses and creativity assessments from multiple empirical studies demonstrating frequent recombination of existing concepts and lower rates of highly original novelty; studies and measures vary.

medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... novelty/creativity metrics (e.g., originality scores, novelty ratings)

Proliferation of low-quality or biased AI-generated ideas creates externalities: increased filtering and reputational costs for firms and risks of poor product designs, ethical lapses, or regulatory violations if evaluation is insufficient.

Case studies and qualitative reports documenting filtering burdens and instances of biased/misleading outputs; theoretical reasoning about reputational and regulatory risks; direct quantification of these externalities is limited.

medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... filtering effort/costs; incidence of reputational/regulatory incidents tied to A...

Standard productivity metrics (e.g., TFP) may undercount the value of ideation and creative augmentation provided by generative AI, making attribution between human and AI contributions difficult.

Methodological discussion in the review supported by heterogeneity in outcome measures across studies and challenges in measuring implemented idea quality and long-run impacts.

medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... coverage/accuracy of productivity metrics for ideation-related gains; attributio...

Generative models exhibit recombination bias: they tend to remix existing patterns rather than produce deeply original, paradigm-shifting insights.

Synthesis of output analyses across studies showing frequent recombination of known patterns and limited evidence of wholly novel, paradigm-changing ideas; claim based on qualitative and comparative analyses in reviewed literature.

medium negative ChatGPT as an Innovative Tool for Idea Generation and Proble... degree of novelty vs. recombination in generated outputs; incidence of paradigm-...

AI illiteracy (lack of understanding of AI capabilities/limits) impedes adoption and appropriate use of AI tools in finance.

Survey and interview data reporting lower adoption/intended use among respondents with limited self-reported AI understanding; supplemented by qualitative explanations; sample described as finance professionals across multinational institutions (size unspecified).

medium negative Human-AI Synergy in Financial Decision-Making: Exploring Tru... adoption rates; appropriate use of AI tools

Excessive reliance on algorithmic suggestions can erode human judgment and create systemic risks.

Interview reports and, where available, operational/risk metrics indicating overreliance patterns; authors note systemic-risk implications based on combined qualitative and quantitative observations (no causal identification reported).

medium negative Human-AI Synergy in Financial Decision-Making: Exploring Tru... quality of human judgment; systemic risk

Cognitive biases and inappropriate trust (both overtrust and distrust) distort decision outcomes and limit the benefits of AI-assisted decision-making.

Qualitative interview evidence describing instances of cognitive bias and misplaced trust; some quantitative indicators of decision distortion and risk where operational performance/risk metrics were available; sample: finance professionals across multinational institutions (detailed metrics not specified).

medium negative Human-AI Synergy in Financial Decision-Making: Exploring Tru... decision quality/distortion; systemic risk indicators

Increased monitoring and algorithmic management raise concerns about worker autonomy and privacy and will prompt regulatory responses (data protection, algorithmic transparency) that shape adoption costs and trajectories.

Recurring concerns reported across included studies and the review's policy implication section; grounded in qualitative and normative discussions within the literature.

medium negative Data-Driven Strategies in Human Resource Management: The Rol... worker autonomy/privacy incidents, regulatory actions, adoption costs

Model risk, bias, and privacy concerns impose negative externalities (e.g., systemic risk in supply chains, discrimination), motivating governance standards, auditing, and possibly regulation.

Documentation in standards, practitioner reports, and conceptual literature within the 2020–2025 review describing incidents, risks, and calls for governance/regulation.

medium negative Integrating Artificial Intelligence and Enterprise Resource ... externality indicators (e.g., cross-firm contagion incidents, measured discrimin...

Firms will need to invest in new control technologies, governance structures, and personnel (AI auditors, red teams), increasing the total cost of GenAI adoption.

Economic reasoning and implications section; no empirical cost estimates or survey data; projection based on anticipated control needs.

medium negative Prompt Engineering or Prompt Fraud? Governance Challenges fo... total cost of GenAI adoption including ongoing control and governance expenditur...

Malicious insiders, external actors (vendors, consultants, customers), shadow AI (unsanctioned consumer-grade GenAI use), and supply-chain/third-party prompt templates are plausible attack vectors for prompt fraud.

Threat taxonomy and scenario mapping with case-style examples; conceptual identification of actors rather than documented incident attribution.

medium negative Prompt Engineering or Prompt Fraud? Governance Challenges fo... range of plausible adversary vectors capable of injecting malicious prompts

Poor logging, weak prompt governance, and over-reliance on machine-generated artifacts increase organizational vulnerability to prompt fraud.

Control gap analysis and prescriptive argumentation; examples of weak controls used to illustrate exploitability; no empirical measurement of effect sizes.

medium negative Prompt Engineering or Prompt Fraud? Governance Challenges fo... organizational vulnerability/risk exposure to prompt fraud given control quality

Because prompt fraud operates at the linguistic/procedural surface rather than the network/technical surface, existing control frameworks are ill-prepared to address this new attack surface.

Control gap analysis comparing conventional internal controls to the linguistic attack surface; conceptual rather than empirical evaluation.

medium negative Prompt Engineering or Prompt Fraud? Governance Challenges fo... adequacy of existing internal control frameworks to mitigate prompt-driven risks

Upfront governance costs (policy, tooling, staff) become a key part of adoption cost and affect ROI calculations and payback periods for automation investments.

Economic reasoning and implications discussed in the paper; no empirical cost data provided. The recommendation is based on practitioner experience and theoretical cost accounting.

medium negative Governed Hyperautomation for CRM and ERP: A Reference Patter... adoption costs, ROI, payback periods (economic outcomes, not empirically measure...

Traditional automation governance is often ad hoc, underestimates security and compliance risks, and does not scale safely for mission-critical enterprise systems.

Synthesis of industry best practices and practitioner-sourced lessons (qualitative observations and case illustrations). No systematic survey or quantitative incidence rates provided.

medium negative Governed Hyperautomation for CRM and ERP: A Reference Patter... quality of governance practices; prevalence of security/compliance risk awarenes...

Divergent governance regimes increase the risk of data localization, interoperability frictions, and regulatory fragmentation — raising costs for multinational AI development and limiting global model generalizability.

Policy‑level comparative inference from contrasting national approaches identified in the document analysis and related literature on cross‑border data governance; no direct measurement of costs or model generalizability in the paper.

medium negative Balancing openness and security in scientific data governanc... data localization, interoperability frictions, regulatory fragmentation, costs t...

State‑led coordination can rapidly mobilize resources and scale national champions, altering competitive dynamics and potentially creating winner‑take‑most outcomes.

Theoretical inference from document evidence of state mobilization and developmentalist goals in Chinese texts, combined with literature on state coordination and industrial scaling (no empirical competition measures in the paper).

medium negative Balancing openness and security in scientific data governanc... market concentration / competitive dynamics (winner‑take‑most)

Resource-rich labs and firms are likely to adopt LLM orchestration faster, which could widen gaps in research capacity between institutions and countries unless mitigated by policy choices.

Equity and diffusion argument based on resource requirements (compute, data, validation); no adoption-rate data or cross-institution comparisons provided.

medium negative ChatMicroscopy: A Perspective Review of Large Language Model... adoption rates across institutions, disparities in research capacity

There is potential for 'winner-take-most' market outcomes if a few players combine superior models, instrument control software, and exclusive datasets.

Economic reasoning about network effects and data concentration; no empirical market concentration metrics specific to microscopy provided.

medium negative ChatMicroscopy: A Perspective Review of Large Language Model... market concentration and distribution of market share among firms

Upfront investments required for compute, data labeling, validation, and safety testing may raise entry costs and favor incumbents.

Economic logic about fixed costs and scale advantages; no measured entry-cost or firm-dynamics data provided.

medium negative ChatMicroscopy: A Perspective Review of Large Language Model... entry costs and competitive dynamics (incumbent advantage)

There is a risk of deskilling for some technical roles, creating implications for training and workforce development.

Theoretical reasoning about automation-induced deskilling; no empirical study or measured skill changes provided.

medium negative ChatMicroscopy: A Perspective Review of Large Language Model... level of technical skill required for routine roles and training needs

Human-in-the-loop controls formalize supervisory labor and create persistent oversight costs even after automation scales.

Pattern design and governance lifecycle recommendations highlighting human checkpoints; qualitative reasoning without measurement of oversight hours or costs.

medium negative Governed Hyperautomation for CRM and ERP: A Reference Patter... ongoing human oversight hours/costs per automated transaction

Distributed training introduces novel incentive issues (free-riding, poisoning incentives, misreporting of local metrics) that require contractual and cryptographic solutions and may create demand for trusted intermediaries or certification markets.

Mechanism/incentive analysis within the paper; threat modeling and proposed governance solutions. No experimental evaluation of incentive mechanisms or market responses.

medium negative Privacy-Aware AI Advertising Systems: A Federated Learning F... incidence of strategic behaviors (free-riding, misreporting, poisoning) and effe...

Federated infrastructures redistribute informational power — moving custody away from centralized platforms reduces their exclusive access to behavioral data and can lower their data-based market power.

Economic and institutional analysis (conceptual), discussion of informational rents and bargaining positions. This is a theoretical economic claim without empirical market measurement in the paper.

medium negative Privacy-Aware AI Advertising Systems: A Federated Learning F... distribution of informational rents/market power indicators (conceptual; no empi...

Fairness constraints (e.g., disparate ad delivery) and monitoring become more challenging to enforce and audit without centralized raw data, requiring new governance and measurement mechanisms.

Policy and governance analysis describing limitations of decentralized data for fairness monitoring; proposed policy-aware governance layer and attestation/audit mechanisms. No empirical validation of governance effectiveness provided.

medium negative Privacy-Aware AI Advertising Systems: A Federated Learning F... ability to detect and correct disparate outcomes (fairness metrics) under decent...

The authors identify concrete training gaps in current models: delegation, scoped execution, and mode switching are skills absent from current training data, which limits splitting models into manager/worker roles.

Authors' diagnosis based on experimental outcomes and qualitative reasoning about model training distributions; recommendation for future training focus.

medium neutral Can AI Models Direct Each Other? Organizational Structure as... presence/absence of specific training capabilities in model training data (deleg...

Interpretive, ad-hoc human-centered evaluation practices (e.g., “vibe checks”, team sense-making) are rational adaptations to LLM behavior rather than merely sloppy or inferior methodological choices.

Authors' interpretive argument based on interview evidence where practitioners explained why such practices persist and how they serve sense-making for unpredictable model behavior.

medium neutral Results-Actionability Gap: Understanding How Practitioners E... characterization of interpretive evaluation practices (rational adaptation vs. m...

AI changes the nature of capital (digital/algorithmic assets) and complicates productivity accounting; researchers should decompose firm-level productivity gains into AI technology, complementary organizational capital, and human capital effects.

Theoretical proposal grounded in productivity accounting literature and conceptual discussion; no empirical decomposition result is presented.

medium neutral Modern Management in the Age of Artificial Intelligence: Str... components of multifactor productivity attributable to AI assets versus organiza...

Policy and governance issues become salient: liability, IP, security, and certification of AI-generated code require new standards for provenance, testing, and accountability.

Argument based on practitioner-raised concerns about security, IP, and provenance in the Netlight study; authors recommend policy attention; no legal/regulatory analysis or empirical policy evaluation provided.

medium neutral Rethinking How IT Professionals Build IT Products with Artif... need for regulatory standards and governance mechanisms for AI-assisted developm...

Pakistan’s IT sector employs around 600,000 people and generates billions in exports, with several cities (Karachi, Lahore, Islamabad) acting as software/AI/digital services hubs.

Background statistics reported in the paper (sector-level descriptive figures).

medium null result Enhancing innovation in Pakistan’s IT sector employment and exports in Pakistan IT sector

Much of the growth in solo entrepreneurial entry reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes.

Analysis of engagement/commitment indicators and ranking outcomes on Product Hunt showing that many new solo launches post-release are low-commitment/experimental and that these entrants are underrepresented among top-ranked/high-quality launches.

medium null result Generative AI Fuels Solo Entrepreneurship, but Teams Still L... representation of new solo entrants among highest-quality (top-ranked) outcomes;...

In aggregate, the strongest open-weight model matches GPT-5 on our benchmark while being substantially cheaper and faster to run.

Aggregate score comparisons between the top-performing open-weight model and GPT-5 on AgentFloor, together with reported cost and latency measurements (as described in the evaluation section).

medium null result AgentFloor: How Far Up the tool use Ladder Can Small Open-We... aggregate benchmark score (performance) and operational cost/latency

AI is increasingly deployed in multi-agent systems; however, most research considers only the behavior of individual models.

Background/literature claim stated in the paper (asserted by authors as motivation).

medium null result AI Organizations are More Effective but Less Aligned than In... prevalence of multi-agent deployment vs. research focus on individual models (li...

Five interaction mechanisms were identified, with the majority propagating across the subsystem boundary.

Authors' thematic analysis and STS mapping identifying five cross- or within-subsystem interaction mechanisms; qualitative assessment that most propagate across subsystem boundary.

medium null result BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION interaction_mechanisms_and_propagation

Mobile penetration reaches 84% (in the context of low-income countries), a statistic used to motivate RSI's potential reach.

Single numeric statistic reported in the paper as background context; source or empirical basis for the statistic not provided within the supplied text.

medium null result Revenue-Sharing as Infrastructure: A Distributed Business Mo... mobile penetration rate (percent)

Output quality saturates at approximately seven governed memories per entity.

Empirical analysis reported in the controlled experiments showing output quality vs. number of governed memories per entity, with saturation near seven memories.

medium null result Governed Memory: A Production Architecture for Multi-Agent W... output quality as a function of number of governed memories per entity (saturati...

A Sankey diagram of thematic evolution shows lexical convergence over time and indicates that a small set of authors has disproportionate influence in structuring the discourse.

Thematic evolution analysis visualized with a Sankey diagram; author influence inferred from performance trends (citations/publication counts) in the bibliometric data.

medium null result Generative AI and the algorithmic workplace: a bibliometric ... lexical convergence across themes and concentration of author influence (disprop...

This paper is one of the first systematic reviews focused specifically on NLP in bank marketing, organizing findings along the customer journey and the marketing mix to provide a practical taxonomy.

Authors' stated novelty claim based on the scoped literature search (2014–2024) and topical focus; novelty inferred from the small number of prior papers identified at the intersection.

medium null result Natural language processing in bank marketing: a systematic ... existence of prior systematic reviews specifically on NLP in bank marketing

Productivity gains from AI may be under- or mis-measured if national accounts and tax systems do not adjust for AI-driven quality changes in services.

Analytic observation in the paper's measurement and externalities discussion; not empirically tested within the study.

medium null result Explore the Impact of Generative AI on Finance and Taxation accuracy of productivity measurement and GDP accounting for AI-enabled quality i...

The paper documents production failure vignettes and operational lessons drawn from a real enterprise deployment integrated with a major cloud provider's MCP servers (client redacted).

Paper states empirical context is field lessons from an enterprise agent platform; failure vignettes are enumerated as deliverables.

medium null result Bridging Protocol and Production: Design Patterns for Deploy... presence and content of documented failure vignettes and lessons

ToM alignment matters less (i.e., misalignment has smaller effect) in settings with explicit coordination protocols, strong signaling, or standardized conventions.

Analyses and experiments described in the paper showing smaller performance differences between matched and mismatched ToM orders when explicit conventions or reliable signals are available; reported as part of robustness/conditional analyses.

medium null result Adaptive Theory of Mind for LLM-based Multi-Agent Coordinati... difference in coordination performance between matched and mismatched ToM orders...

Measuring AI's contribution to productivity and coordination effects will be challenging; new metrics (e.g., coordination time per task, error/rework rates attributable to communication lapses) are required.

Conceptual argument and recommended measurement agenda in the paper; no empirical testing of proposed metrics provided.

medium null result AI as a universal collaboration layer: Eliminating language ... feasibility and precision of proposed coordination/productivity metrics

Many early-stage AI advances have not translated into higher Phase II/III success rates.

Synthesis of reported outcomes and failures from industry experience; no new systematic statistical analysis provided.

medium null result Learning from the successes and failures of early artificial... Phase II/III clinical success rates
