The Commonplace

Evidence (4175 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
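
As a rough illustration of how the matrix can be read, the sketch below converts a row's counts into shares of its Total column. The three rows are copied from the table above; the function name `direction_shares` and the "unlisted" bucket are assumptions made for this example, not part of the dashboard. Because the four listed directions do not always add up to Total (for Firm Productivity they sum to 600 against a Total of 606), the remainder is reported separately rather than guessed at.

```python
# Rows copied from the Evidence Matrix:
# (outcome, positive, negative, mixed, null, total)
ROWS = [
    ("Firm Productivity", 435, 57, 88, 20, 606),
    ("Job Displacement", 12, 80, 20, 1, 113),
    ("Error Rate", 69, 92, 10, 2, 173),
]


def direction_shares(row):
    """Return (outcome, shares), where each share is relative to the Total column."""
    outcome, positive, negative, mixed, null, total = row
    counts = {
        "positive": positive,
        "negative": negative,
        "mixed": mixed,
        "null": null,
    }
    shares = {name: count / total for name, count in counts.items()}
    # Hypothetical bucket: whatever the four listed columns do not account for.
    shares["unlisted"] = (total - sum(counts.values())) / total
    return outcome, shares


if __name__ == "__main__":
    for row in ROWS:
        outcome, shares = direction_shares(row)
        print(outcome, {name: round(share, 3) for name, share in shares.items()})
```

Dividing by Total rather than by the sum of the four columns keeps the shares comparable across rows even where the listed directions are incomplete.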
Filter: Org Design
Firms need complementary investments (data pipelines, monitoring tools, feedback loops, human oversight systems) which materially affect the economics of adoption.
Industry case studies and practitioner reports synthesized in the review describing necessary complementary investments; no quantified investment sample or ROI analysis provided here.
medium | negative | The Effectiveness of ChatGPT in Customer Service and Communi... | required investment levels, effect on adoption economics and ROI
Regulatory attention is likely to focus on transparency, liability for factual errors, data privacy, and nondiscrimination; compliance and auditing will add to adoption costs.
Policy and regulatory analyses aggregated in the review and references to ongoing regulatory discussions; no primary regulatory impact study conducted in this paper.
medium | negative | The Effectiveness of ChatGPT in Customer Service and Communi... | regulatory compliance requirements, related adoption costs, and scope of regulat...
Generative AI currently lacks genuine empathy and relational capabilities necessary for high-stakes or sensitive interactions.
Conceptual analyses and practitioner case examples aggregated in the review; limited direct quantitative measurement cited in this brief review.
medium | negative | The Effectiveness of ChatGPT in Customer Service and Communi... | empathy/relational effectiveness in sensitive interactions, customer satisfactio...
Generative models exhibit contextual misunderstandings and cannot reliably infer nuanced customer intent in all cases.
Synthesis of empirical studies and practitioner observations documenting misinterpretation and intent-detection failures; no new testing reported in this review.
medium | negative | The Effectiveness of ChatGPT in Customer Service and Communi... | accuracy of intent detection and rate of context-related misunderstandings
There is substitution risk: routine ideation and drafting tasks may be automated, altering task-level labor demand and wage structure.
Task-automation literature and empirical studies of LLMs performing routine drafting/ideation tasks summarized in the review; no long-run labor-market causality established in the paper.
medium | negative | ChatGPT as an Innovative Tool for Idea Generation and Proble... | employment and wages for routine ideation/drafting tasks
Generative AI lacks reliable situational judgment on ambiguous problems and on ethical trade-offs, making it insufficient for autonomous decision-making in such contexts.
Case examples and experimental studies cited in the synthesis showing inconsistent or inappropriate responses to ambiguous/ethical scenarios; no large-scale causal evidence provided.
medium | negative | ChatGPT as an Innovative Tool for Idea Generation and Proble... | quality/appropriateness of situational judgment and ethical decision-making in t...
LLMs are prone to bias, mediocrity, and factual or logical errors when domain-specific context or experiential knowledge is absent.
Review of empirical evaluations documenting biased outputs, superficial or mediocre suggestions, and factual errors in open-ended tasks and domain-specific prompts; evidence comes from multiple short-term studies and applied examples.
medium | negative | ChatGPT as an Innovative Tool for Idea Generation and Proble... | accuracy/factuality, bias indicators, perceived quality of outputs in domain-spe...
LLMs are predominantly recombinative — they tend to rework and recombine existing material rather than produce deeply novel insights.
Analytical synthesis of output analyses and creativity assessments from multiple empirical studies demonstrating frequent recombination of existing concepts and lower rates of highly original novelty; studies and measures vary.
medium | negative | ChatGPT as an Innovative Tool for Idea Generation and Proble... | novelty/creativity metrics (e.g., originality scores, novelty ratings)
Proliferation of low-quality or biased AI-generated ideas creates externalities: increased filtering and reputational costs for firms and risks of poor product designs, ethical lapses, or regulatory violations if evaluation is insufficient.
Case studies and qualitative reports documenting filtering burdens and instances of biased/misleading outputs; theoretical reasoning about reputational and regulatory risks; direct quantification of these externalities is limited.
medium | negative | ChatGPT as an Innovative Tool for Idea Generation and Proble... | filtering effort/costs; incidence of reputational/regulatory incidents tied to A...
Standard productivity metrics (e.g., TFP) may undercount the value of ideation and creative augmentation provided by generative AI, making attribution between human and AI contributions difficult.
Methodological discussion in the review supported by heterogeneity in outcome measures across studies and challenges in measuring implemented idea quality and long-run impacts.
medium | negative | ChatGPT as an Innovative Tool for Idea Generation and Proble... | coverage/accuracy of productivity metrics for ideation-related gains; attributio...
Generative models exhibit recombination bias: they tend to remix existing patterns rather than produce deeply original, paradigm-shifting insights.
Synthesis of output analyses across studies showing frequent recombination of known patterns and limited evidence of wholly novel, paradigm-changing ideas; claim based on qualitative and comparative analyses in reviewed literature.
medium | negative | ChatGPT as an Innovative Tool for Idea Generation and Proble... | degree of novelty vs. recombination in generated outputs; incidence of paradigm-...
AI illiteracy (lack of understanding of AI capabilities/limits) impedes adoption and appropriate use of AI tools in finance.
Survey and interview data reporting lower adoption/intended use among respondents with limited self-reported AI understanding; supplemented by qualitative explanations; sample described as finance professionals across multinational institutions (size unspecified).
medium | negative | Human-AI Synergy in Financial Decision-Making: Exploring Tru... | adoption rates; appropriate use of AI tools
Excessive reliance on algorithmic suggestions can erode human judgment and create systemic risks.
Interview reports and, where available, operational/risk metrics indicating overreliance patterns; authors note systemic-risk implications based on combined qualitative and quantitative observations (no causal identification reported).
medium | negative | Human-AI Synergy in Financial Decision-Making: Exploring Tru... | quality of human judgment; systemic risk
Cognitive biases and inappropriate trust (both overtrust and distrust) distort decision outcomes and limit the benefits of AI-assisted decision-making.
Qualitative interview evidence describing instances of cognitive bias and misplaced trust; some quantitative indicators of decision distortion and risk where operational performance/risk metrics were available; sample: finance professionals across multinational institutions (detailed metrics not specified).
medium | negative | Human-AI Synergy in Financial Decision-Making: Exploring Tru... | decision quality/distortion; systemic risk indicators
Increased monitoring and algorithmic management raise concerns about worker autonomy and privacy and will prompt regulatory responses (data protection, algorithmic transparency) that shape adoption costs and trajectories.
Recurring concerns reported across included studies and the review's policy implication section; grounded in qualitative and normative discussions within the literature.
medium | negative | Data-Driven Strategies in Human Resource Management: The Rol... | worker autonomy/privacy incidents, regulatory actions, adoption costs
Model risk, bias, and privacy concerns impose negative externalities (e.g., systemic risk in supply chains, discrimination), motivating governance standards, auditing, and possibly regulation.
Documentation in standards, practitioner reports, and conceptual literature within the 2020–2025 review describing incidents, risks, and calls for governance/regulation.
medium | negative | Integrating Artificial Intelligence and Enterprise Resource ... | externality indicators (e.g., cross-firm contagion incidents, measured discrimin...
Firms will need to invest in new control technologies, governance structures, and personnel (AI auditors, red teams), increasing the total cost of GenAI adoption.
Economic reasoning and implications section; no empirical cost estimates or survey data; projection based on anticipated control needs.
medium | negative | Prompt Engineering or Prompt Fraud? Governance Challenges fo... | total cost of GenAI adoption including ongoing control and governance expenditur...
Malicious insiders, external actors (vendors, consultants, customers), shadow AI (unsanctioned consumer-grade GenAI use), and supply-chain/third-party prompt templates are plausible attack vectors for prompt fraud.
Threat taxonomy and scenario mapping with case-style examples; conceptual identification of actors rather than documented incident attribution.
medium | negative | Prompt Engineering or Prompt Fraud? Governance Challenges fo... | range of plausible adversary vectors capable of injecting malicious prompts
Poor logging, weak prompt governance, and over-reliance on machine-generated artifacts increase organizational vulnerability to prompt fraud.
Control gap analysis and prescriptive argumentation; examples of weak controls used to illustrate exploitability; no empirical measurement of effect sizes.
medium | negative | Prompt Engineering or Prompt Fraud? Governance Challenges fo... | organizational vulnerability/risk exposure to prompt fraud given control quality
Because prompt fraud operates at the linguistic/procedural surface rather than the network/technical surface, existing control frameworks are ill-prepared to address this new attack surface.
Control gap analysis comparing conventional internal controls to the linguistic attack surface; conceptual rather than empirical evaluation.
medium | negative | Prompt Engineering or Prompt Fraud? Governance Challenges fo... | adequacy of existing internal control frameworks to mitigate prompt-driven risks
Upfront governance costs (policy, tooling, staff) become a key part of adoption cost and affect ROI calculations and payback periods for automation investments.
Economic reasoning and implications discussed in the paper; no empirical cost data provided—recommendation based on practitioner experience and theoretical cost accounting.
medium | negative | Governed Hyperautomation for CRM and ERP: A Reference Patter... | adoption costs, ROI, payback periods (economic outcomes, not empirically measure...
Traditional automation governance is often ad hoc, underestimates security and compliance risks, and does not scale safely for mission-critical enterprise systems.
Synthesis of industry best practices and practitioner-sourced lessons (qualitative observations and case illustrations). No systematic survey or quantitative incidence rates provided.
medium | negative | Governed Hyperautomation for CRM and ERP: A Reference Patter... | quality of governance practices; prevalence of security/compliance risk awarenes...
Divergent governance regimes increase the risk of data localization, interoperability frictions, and regulatory fragmentation — raising costs for multinational AI development and limiting global model generalizability.
Policy‑level comparative inference from contrasting national approaches identified in the document analysis and related literature on cross‑border data governance; no direct measurement of costs or model generalizability in the paper.
medium | negative | Balancing openness and security in scientific data governanc... | data localization, interoperability frictions, regulatory fragmentation, costs t...
State‑led coordination can rapidly mobilize resources and scale national champions, altering competitive dynamics and potentially creating winner‑take‑most outcomes.
Theoretical inference from document evidence of state mobilization and developmentalist goals in Chinese texts, combined with literature on state coordination and industrial scaling (no empirical competition measures in the paper).
medium | negative | Balancing openness and security in scientific data governanc... | market concentration / competitive dynamics (winner‑take‑most)
Resource-rich labs and firms are likely to adopt LLM orchestration faster, which could widen gaps in research capacity between institutions and countries unless mitigated by policy choices.
Equity and diffusion argument based on resource requirements (compute, data, validation); no adoption-rate data or cross-institution comparisons provided.
medium | negative | ChatMicroscopy: A Perspective Review of Large Language Model... | adoption rates across institutions, disparities in research capacity
There is potential for 'winner-take-most' market outcomes if a few players combine superior models, instrument control software, and exclusive datasets.
Economic reasoning about network effects and data concentration; no empirical market concentration metrics specific to microscopy provided.
medium | negative | ChatMicroscopy: A Perspective Review of Large Language Model... | market concentration and distribution of market share among firms
Upfront investments required for compute, data labeling, validation, and safety testing may raise entry costs and favor incumbents.
Economic logic about fixed costs and scale advantages; no measured entry-cost or firm-dynamics data provided.
medium | negative | ChatMicroscopy: A Perspective Review of Large Language Model... | entry costs and competitive dynamics (incumbent advantage)
There is a risk of deskilling for some technical roles, creating implications for training and workforce development.
Theoretical reasoning about automation-induced deskilling; no empirical study or measured skill changes provided.
medium | negative | ChatMicroscopy: A Perspective Review of Large Language Model... | level of technical skill required for routine roles and training needs
Human-in-the-loop controls formalize supervisory labor and create persistent oversight costs even after automation scales.
Pattern design and governance lifecycle recommendations highlighting human checkpoints; qualitative reasoning without measurement of oversight hours or costs.
medium | negative | Governed Hyperautomation for CRM and ERP: A Reference Patter... | ongoing human oversight hours/costs per automated transaction
Distributed training introduces novel incentive issues (free-riding, poisoning incentives, misreporting of local metrics) that require contractual and cryptographic solutions and may create demand for trusted intermediaries or certification markets.
Mechanism/incentive analysis within the paper; threat modeling and proposed governance solutions. No experimental evaluation of incentive mechanisms or market responses.
medium | negative | Privacy-Aware AI Advertising Systems: A Federated Learning F... | incidence of strategic behaviors (free-riding, misreporting, poisoning) and effe...
Federated infrastructures redistribute informational power — moving custody away from centralized platforms reduces their exclusive access to behavioral data and can lower their data-based market power.
Economic and institutional analysis (conceptual), discussion of informational rents and bargaining positions. This is a theoretical economic claim without empirical market measurement in the paper.
medium | negative | Privacy-Aware AI Advertising Systems: A Federated Learning F... | distribution of informational rents/market power indicators (conceptual; no empi...
Fairness constraints (e.g., disparate ad delivery) and monitoring become more challenging to enforce and audit without centralized raw data, requiring new governance and measurement mechanisms.
Policy and governance analysis describing limitations of decentralized data for fairness monitoring; proposed policy-aware governance layer and attestation/audit mechanisms. No empirical validation of governance effectiveness provided.
medium | negative | Privacy-Aware AI Advertising Systems: A Federated Learning F... | ability to detect and correct disparate outcomes (fairness metrics) under decent...
The authors identify concrete training gaps in current models: delegation, scoped execution, and mode switching are skills absent from current training data, which limits splitting models into manager/worker roles.
Authors' diagnosis based on experimental outcomes and qualitative reasoning about model training distributions; recommendation for future training focus.
medium | neutral | Can AI Models Direct Each Other? Organizational Structure as... | presence/absence of specific training capabilities in model training data (deleg...
Interpretive, ad-hoc human-centered evaluation practices (e.g., “vibe checks”, team sense-making) are rational adaptations to LLM behavior rather than merely sloppy or inferior methodological choices.
Authors' interpretive argument based on interview evidence where practitioners explained why such practices persist and how they serve sense-making for unpredictable model behavior.
medium | neutral | Results-Actionability Gap: Understanding How Practitioners E... | characterization of interpretive evaluation practices (rational adaptation vs. m...
AI changes the nature of capital (digital/algorithmic assets) and complicates productivity accounting; researchers should decompose firm-level productivity gains into AI technology, complementary organizational capital, and human capital effects.
Theoretical proposal grounded in productivity accounting literature and conceptual discussion; no empirical decomposition result is presented.
medium | neutral | Modern Management in the Age of Artificial Intelligence: Str... | components of multifactor productivity attributable to AI assets versus organiza...
Policy and governance issues become salient: liability, IP, security, and certification of AI-generated code require new standards for provenance, testing, and accountability.
Argument based on practitioner-raised concerns about security, IP, and provenance in the Netlight study; authors recommend policy attention; no legal/regulatory analysis or empirical policy evaluation provided.
medium | neutral | Rethinking How IT Professionals Build IT Products with Artif... | need for regulatory standards and governance mechanisms for AI-assisted developm...
Pakistan’s IT sector employs around 600,000 people and generates billions in exports, with several cities (Karachi, Lahore, Islamabad) acting as software/AI/digital services hubs.
Background statistics reported in the paper (sector-level descriptive figures).
medium | null result | Enhancing innovation in Pakistan’s IT sector | employment and exports in Pakistan IT sector
Much of the growth in solo entrepreneurial entry reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes.
Analysis of engagement/commitment indicators and ranking outcomes on Product Hunt showing that many new solo launches post-release are low-commitment/experimental and that these entrants are underrepresented among top-ranked/high-quality launches.
medium | null result | Generative AI Fuels Solo Entrepreneurship, but Teams Still L... | representation of new solo entrants among highest-quality (top-ranked) outcomes;...
In aggregate, the strongest open-weight model matches GPT-5 on our benchmark while being substantially cheaper and faster to run.
Aggregate score comparisons between the top-performing open-weight model and GPT-5 on AgentFloor, together with reported cost and latency measurements (as described in the evaluation section).
medium | null result | AgentFloor: How Far Up the tool use Ladder Can Small Open-We... | aggregate benchmark score (performance) and operational cost/latency
AI is increasingly deployed in multi-agent systems; however, most research considers only the behavior of individual models.
Background/literature claim stated in the paper (asserted by authors as motivation).
medium | null result | AI Organizations are More Effective but Less Aligned than In... | prevalence of multi-agent deployment vs. research focus on individual models (li...
Five interaction mechanisms were identified, with the majority propagating across the subsystem boundary.
Authors' thematic analysis and STS mapping identifying five cross- or within-subsystem interaction mechanisms; qualitative assessment that most propagate across subsystem boundary.
medium | null result | BARRIERS TO AGENTIC AI ENTERPRISE TRANSFORMATION | interaction_mechanisms_and_propagation
Mobile penetration reaches 84% (in the context of low-income countries), a statistic used to motivate RSI's potential reach.
Single numeric statistic reported in the paper as background context; source or empirical basis for the statistic not provided within the supplied text.
medium | null result | Revenue-Sharing as Infrastructure: A Distributed Business Mo... | mobile penetration rate (percent)
Output quality saturates at approximately seven governed memories per entity.
Empirical analysis reported in the controlled experiments showing output quality vs. number of governed memories per entity, with saturation near seven memories.
medium | null result | Governed Memory: A Production Architecture for Multi-Agent W... | output quality as a function of number of governed memories per entity (saturati...
A Sankey diagram of thematic evolution shows lexical convergence over time and indicates that a small set of authors has disproportionate influence in structuring the discourse.
Thematic evolution analysis visualized with a Sankey diagram; author influence inferred from performance trends (citations/publication counts) in the bibliometric data.
medium | null result | Generative AI and the algorithmic workplace: a bibliometric ... | lexical convergence across themes and concentration of author influence (disprop...
This paper is one of the first systematic reviews focused specifically on NLP in bank marketing, organizing findings along the customer journey and the marketing mix to provide a practical taxonomy.
Authors' stated novelty claim based on the scoped literature search (2014–2024) and topical focus; novelty inferred from the small number of prior papers identified at the intersection.
medium | null result | Natural language processing in bank marketing: a systematic ... | existence of prior systematic reviews specifically on NLP in bank marketing
Productivity gains from AI may be under- or mis-measured if national accounts and tax systems do not adjust for AI-driven quality changes in services.
Analytic observation in the paper's measurement and externalities discussion; not empirically tested within the study.
medium | null result | Explore the Impact of Generative AI on Finance and Taxation | accuracy of productivity measurement and GDP accounting for AI-enabled quality i...
The paper documents production failure vignettes and operational lessons drawn from a real enterprise deployment integrated with a major cloud provider's MCP servers (client redacted).
Paper states empirical context is field lessons from an enterprise agent platform; failure vignettes are enumerated as deliverables.
medium | null result | Bridging Protocol and Production: Design Patterns for Deploy... | presence and content of documented failure vignettes and lessons
ToM alignment matters less (i.e., misalignment has smaller effect) in settings with explicit coordination protocols, strong signaling, or standardized conventions.
Analyses and experiments described in the paper showing smaller performance differences between matched and mismatched ToM orders when explicit conventions or reliable signals are available; reported as part of robustness/conditional analyses.
medium | null result | Adaptive Theory of Mind for LLM-based Multi-Agent Coordinati... | difference in coordination performance between matched and mismatched ToM orders...
Measuring AI's contribution to productivity and coordination effects will be challenging; new metrics (e.g., coordination time per task, error/rework rates attributable to communication lapses) are required.
Conceptual argument and recommended measurement agenda in the paper; no empirical testing of proposed metrics provided.
medium | null result | AI as a universal collaboration layer: Eliminating language ... | feasibility and precision of proposed coordination/productivity metrics
Many early-stage AI advances have not translated into higher Phase II/III success rates.
Synthesis of reported outcomes and failures from industry experience; no new systematic statistical analysis provided.
medium | null result | Learning from the successes and failures of early artificial... | Phase II/III clinical success rates