The Commonplace

Evidence (8570 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
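
The rows above can be read programmatically to compare directions of finding within an outcome. What follows is a minimal sketch, not part of the original page: the names `MatrixRow`, `parse_row`, and `direction_shares` are hypothetical, and the sketch assumes each complete row ends in five integers (Positive, Negative, Mixed, Null, Total). Rows with a blank cell, such as the last three, are rejected rather than guessed at. Because a listed Total does not always equal the sum of the four direction columns, shares are computed against the sum of the four counts.

```python
from typing import Dict, NamedTuple


class MatrixRow(NamedTuple):
    """One row of the evidence matrix (hypothetical helper type)."""
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int


def parse_row(line: str) -> MatrixRow:
    """Parse 'Outcome label P N M Null Total' into a MatrixRow.

    The outcome label may contain spaces, so the five counts are
    split off from the right. Incomplete rows raise ValueError.
    """
    parts = line.rsplit(maxsplit=5)
    if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"row does not end in five integer counts: {line!r}")
    label, *counts = parts
    return MatrixRow(label, *(int(c) for c in counts))


def direction_shares(row: MatrixRow) -> Dict[str, float]:
    """Share of each direction among the four listed direction counts."""
    counts = {
        "positive": row.positive,
        "negative": row.negative,
        "mixed": row.mixed,
        "null": row.null,
    }
    denom = sum(counts.values())
    if denom == 0:
        raise ValueError("row has no directional counts")
    return {name: value / denom for name, value in counts.items()}


if __name__ == "__main__":
    row = parse_row("Job Displacement 12 80 20 1 113")
    for name, share in direction_shares(row).items():
        print(f"{row.outcome}: {name} = {share:.3f}")
```

Splitting from the right keeps multi-word labels such as "Labor Share of Income" intact while still failing loudly when a count is missing.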
We establish a Volume-Quality Inverse Law: code volume is a near perfect predictor of structural degradation.
Empirical finding from the paper's analysis correlating code volume with measures of structural degradation; described as 'near perfect predictor'.
high negative AI-Generated Smells: An Analysis of Code and Architecture in... structural degradation (predicted by code volume)
There exists a fundamental Reasoning-Complexity Trade-off: as models become more capable, they generate increasingly bloated and coupled code.
Multi-scale comparative analysis across models of differing capability showing higher-capability models produce larger (volume) and more highly-coupled code artifacts.
high negative AI-Generated Smells: An Analysis of Code and Architecture in... code volume and coupling (architectural complexity)
AI does not eliminate software flaws but rather introduces a distinct 'machine signature' of defects in generated code.
Systematic audit (multi-scale analysis) of AI-generated software across single-file algorithmic tasks and complex, agent-generated systems, reporting characteristic defect patterns attributed to machine generation.
high negative AI-Generated Smells: An Analysis of Code and Architecture in... presence and patterning of defects in AI-generated code (machine signature of de...
The promise of Large Language Models in automated software engineering is often measured by functional correctness, overlooking the critical issue of long-term maintainability.
Framing statement in the paper; argument based on literature/practice that current evaluations emphasize functional correctness rather than maintainability.
high negative AI-Generated Smells: An Analysis of Code and Architecture in... emphasis of evaluation metrics (functional correctness vs maintainability)
Standard metrics fail to detect four of the seven failure modes entirely and detect three others only after a lag of multiple evaluation cycles.
Quantitative analysis reported in the paper comparing detection of the seven failure modes by standard metrics over evaluation cycles.
high negative Evaluating Agentic AI in the Wild: Failure Modes, Drift Patt... proportion and timing of detection of failure modes by standard metrics
Standard metrics (ROUGE, BERTScore, accuracy/AUC, and agentic benchmarks such as HELM/MT-Bench/AgentBench/BIG-bench) fail to detect each of the seven production failure modes.
Empirical demonstration reported in the paper comparing standard metrics and agentic benchmarks against the seven failure modes.
high negative Evaluating Agentic AI in the Wild: Failure Modes, Drift Patt... detection capability of standard metrics/benchmarks for production failure modes
The seven failure modes include compounding decision errors, tool failure cascades, non-deterministic output drift, and the absence of ground truth for long-horizon tasks.
Author-provided list of example failure modes within the taxonomy; grounded in observations described in the paper.
high negative Evaluating Agentic AI in the Wild: Failure Modes, Drift Patt... types of failure modes affecting production agentic systems
Existing evaluation frameworks for large language models -- including HELM, MT-Bench, AgentBench, and BIG-bench -- are designed for controlled, single-session, lab-scale settings and do not address the evaluation challenges that emerge when agentic AI systems operate continuously in production.
Author statement based on literature/framework review (references to HELM, MT-Bench, AgentBench, BIG-bench) and contrast with production agentic evaluation needs.
high negative Evaluating Agentic AI in the Wild: Failure Modes, Drift Patt... ability of existing LLM evaluation frameworks to address continuous production a...
The most valuable AI capabilities (reasoning, judgment, intuition) are precisely those we cannot verify with current methods.
Argumentative claim in the position paper linking capability value to unverifiability; no empirical validation or measurement of 'value' or verifiability included.
high negative Reliable AI Needs to Externalize Implicit Knowledge: A Human... verifiability of high-level AI capabilities (reasoning, judgment, intuition)
Current reliability methods can only verify explicit knowledge against sources, creating a fundamental gap in verifying AI's implicit knowledge.
Conceptual critique in the paper of existing verification/validation approaches; no systematic review or empirical comparison provided.
high negative Reliable AI Needs to Externalize Implicit Knowledge: A Human... verifiability of AI knowledge (explicit vs implicit)
Implicit knowledge remains unexternalized because documentation cost exceeds perceived value.
Presented as an economic/theoretical explanation in the paper; no empirical study, sample, or cost estimates provided.
high negative Reliable AI Needs to Externalize Implicit Knowledge: A Human... degree of externalization of implicit knowledge (documentation vs tacit retentio...
Compound-system-specific operational challenges arise when serving agentic workloads, including multi-model fan-out overhead, cascading cold-start propagation, and heterogeneous scaling dynamics.
The paper presents a novel analysis and discussion of these challenges, supported by case studies and operational lessons from the production deployment; the available text provides no quantitative prevalence metrics or sample sizes.
high negative Scalable Inference Architectures for Compound AI Systems: A ... operational challenges: fan-out overhead, cold-start propagation, heterogeneous ...
Both the periodic compulsory recoinage of medieval Europe and Gesell's stamp scrip are essentially mechanisms for taxing money holdings.
Interpretive/historical claim presented by the authors; no empirical testing or sample reported in the excerpt.
high negative RSDM: The Consensus Honest Money in the AI Era degree to which historical monetary policies function as a tax on money holdings
Monetary devaluation runs through almost all of history, from the reduced weight and purity of metallic coins to the unanchored over-issuance of paper currency.
Historical summary/claim by the authors referencing long-run monetary history; no specific empirical study or sample size given in the excerpt.
high negative RSDM: The Consensus Honest Money in the AI Era occurrence of currency devaluation over history
Disparities may lead to AI bias and governance challenges that potentially leave the poorest communities excluded from the Fourth Industrial Revolution.
Paper lists AI bias and governance challenges as potential consequences of uneven AI development; presented as conceptual/ethical/political risks without empirical quantification in the excerpt.
high negative GLOBAL DISPROPORTIONS IN THE IMPLEMENTATION AND USE OF ARTIF... AI bias and governance failures leading to exclusion
These disparities risk causing economic isolation and social inequality.
Qualitative claim in the paper listing potential socio-economic risks of uneven AI adoption; no supporting empirical estimates in the excerpt.
high negative GLOBAL DISPROPORTIONS IN THE IMPLEMENTATION AND USE OF ARTIF... economic isolation and social inequality
These disparities carry the risk of a deepening digital divide.
Stated as a consequence/risk in the paper; presented qualitatively without empirical quantification in the excerpt.
high negative GLOBAL DISPROPORTIONS IN THE IMPLEMENTATION AND USE OF ARTIF... digital divide (differential access/use of digital technologies)
Projections indicate that without additional measures, these disparities are likely to increase.
Paper reports forward-looking projections or scenario analysis (methods, assumptions, and quantitative projection details not given in the excerpt).
high negative GLOBAL DISPROPORTIONS IN THE IMPLEMENTATION AND USE OF ARTIF... future global disparities / inequality in AI and digital access
Low-income regions (in particular parts of Africa and South Asia) lag significantly behind in both education and access to digital technologies.
Statement in the paper based on comparative assessment of education levels and digital access across regions; the excerpt provides no numeric data or described sample.
high negative GLOBAL DISPROPORTIONS IN THE IMPLEMENTATION AND USE OF ARTIF... education levels and access to digital technologies
Current AI agents implement only the first half of Complementary Learning Systems (CLS), the fast exemplar/hippocampal-style storage, and lack the slow weight-consolidation half.
Analytic claim in paper comparing current AI agent designs to CLS; no empirical evaluation reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory presence/absence of slow weight-consolidation mechanisms in AI agents
Agents that rely only on lookup are structurally vulnerable to persistent memory poisoning as injected content propagates across all future sessions.
Theoretical/security argument presented in paper; claims about propagation of injected content across sessions; no empirical attack experiments detailed in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory vulnerability to persistent memory poisoning
Conflating the two produces agents that face a provable generalization ceiling on compositionally novel tasks that no increase in context size or retrieval quality can overcome.
Formal claim asserted in paper (formalization of limitations and proofs claimed); no empirical sample detailed in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory generalization performance on compositionally novel tasks
Conflating retrieval and weight-based memory produces agents that accumulate notes indefinitely without developing expertise.
Theoretical argument/formalization presented in paper; claim based on analysis of how lookup-only systems fail to consolidate abstract knowledge; no empirical sample reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory expertise development / continued accumulation of notes
Treating lookup as memory is a category error with provable consequences for security.
Theoretical/formal argument and formalization in paper; security consequences (e.g., persistent poisoning) claimed; no empirical sample reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory security (vulnerability to persistent memory poisoning)
Treating lookup as memory is a category error with provable consequences for long-term learning.
Theoretical/formal argument asserted in the paper, drawing on formalization and Complementary Learning Systems theory; no empirical sample reported in abstract.
high negative Contextual Agentic Memory is a Memo, Not True Memory long-term learning
Treating lookup as memory is a category error with provable consequences for agent capability.
Theoretical/formal argument asserted in the paper (formalization and proofs claimed); no empirical sample reported in abstract.
Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup.
Conceptual/analytic claim stated in paper; supported by comparison of existing agent memory mechanisms (vector stores, RAG, scratchpads, context-window management) to the paper's definition of 'memory'. No empirical sample reported.
high negative Contextual Agentic Memory is a Memo, Not True Memory whether systems implement memory vs. lookup
Workers acquire skills through generative AI tools but lack credible ways to signal or validate these skills in competitive freelance markets (a structural challenge the paper terms 'invisible competencies').
Reported finding and conceptual contribution based on the paper's mixed-methods study (survey + semi-structured interviews).
high negative Upskilling with Generative AI: Practices and Challenges for ... ability to signal/validate skills acquired via generative AI in freelance market...
There is a shift from learning as growth to learning as survival, where upskilling is oriented toward immediate market viability rather than long-term development.
Reported thematic finding from the paper's interviews and survey of freelance knowledge workers.
high negative Upskilling with Generative AI: Practices and Challenges for ... orientation of upskilling (immediate market viability vs long-term development)
Freelancers do not treat generative AI as their primary learning resource due to inconsistency, lack of contextual relevance, and verification overhead.
Reported finding from the paper's mixed-methods study (survey + semi-structured interviews with freelance knowledge workers).
high negative Upskilling with Generative AI: Practices and Challenges for ... role of generative AI in freelancers' learning stacks / barriers to using it as ...
Freelance workers must continually acquire new skills to remain competitive in online labor markets, yet they lack the organizational training, mentorship, and infrastructure available to traditional employees.
Framing statement in the paper's introduction / literature review (not reported as an empirical result from this study).
high negative Upskilling with Generative AI: Practices and Challenges for ... need for continual upskilling and availability of organizational training/mentor...
Existing approaches address data quality but not data valuation.
Literature review / background discussion in paper contrasting prior work on data quality with lack of approaches for data valuation.
high negative Calibrating Attribution Proxies for Reward Allocation in Par... coverage of data valuation in existing approaches
Existing approaches (runtime guardrails, training-time alignment, and post-hoc auditing) treat governance as an external constraint rather than an internalized behavioral principle, leaving agents vulnerable to unsafe and irreversible actions.
Author's conceptual/literature critique presented in the paper (argumentative claim, no empirical sample or experiment reported for this statement).
high negative Think Before You Act -- A Neurocognitive Governance Model fo... vulnerability to unsafe and irreversible actions
Obstacles exist for healthcare workers in rural areas that limit the benefits of technology.
Review conclusion noting persistent obstacles for rural healthcare workers drawn from the literature; synthesis of qualitative/quantitative sources (no sample size in excerpt).
high negative A Comprehensive Review of Technology Adoption and Its Impact... barriers to technology benefits in rural healthcare
Indian healthcare faces barriers to technological integration such as financial issues, poor infrastructure, and regulatory problems.
Review-identified barriers drawn from the literature (qualitative and quantitative studies summarized by the authors); no aggregate sample size reported in the excerpt.
high negative A Comprehensive Review of Technology Adoption and Its Impact... barriers to technology adoption
The marginal gains from genAI came at the high cost of recruiter deskilling, a trend that jeopardizes meaningful oversight of decision-making.
Qualitative interview evidence (n=22) where participants described loss of skills/deskilling associated with genAI use and concerns about oversight.
high negative Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... deskilling / erosion of practitioner skills and oversight capacity
The decision of whether or not to adopt genAI was often outside recruiters' control, with many feeling compelled to adopt due to directives from higher-ups in their business.
Reports from interviewed recruiters (n=22) indicating organizational pressure and top-down calls to integrate AI.
high negative Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... decision-making autonomy over tool adoption
Recruiters believe they have final authority across the recruiting pipeline, but genAI has become an invisible architect shaping the foundational information used for evaluation (e.g., defining a job, determining what counts as a good interview performance).
Qualitative findings from interviews with 22 recruiting professionals describing perceived authority versus the influence of genAI on informational inputs.
high negative Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... perceived decision authority vs. shaping of evaluation criteria
GenAI subtly influences control over everyday recruiting workflows and individual hiring decisions.
Qualitative evidence from semi-structured interviews with 22 recruiting professionals.
high negative Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... perceived control/agency in workflows and hiring decisions
AI Overviews (AIOs) are less robust to minor query edits.
Experiments applying small edits to queries and measuring changes in AIO outputs; observed larger changes for AIOs compared to traditional search.
high negative How Generative AI Disrupts Search: An Empirical Study of Goo... robustness of results to minor query edits
AIOs are less consistent when processing two runs of the same query.
Repeated-query experiments (running the same query multiple times) comparing AIO outputs across runs and measuring variability; paper reports greater run-to-run inconsistency for AIOs.
high negative How Generative AI Disrupts Search: An Empirical Study of Goo... run-to-run consistency/variability of AIO outputs
Websites that block Google's AI crawler are significantly less likely to be retrieved by AIOs, even though the AIOs still have access to their content.
Comparison of retrieval frequency in AIOs for domains that block Google's AI crawler versus domains that do not, using the benchmark set of queries and observed crawl/access signals.
high negative How Generative AI Disrupts Search: An Empirical Study of Goo... likelihood/frequency of being retrieved in AIOs for crawler-blocking vs non-bloc...
AI-adopting firms anticipate smaller increases in their own prices and lower medium- to long-term inflation than non-adopters.
Survey questions on firms' price-change expectations and macro inflation expectations, comparing responses of adopting vs non-adopting firms.
high negative The economic impact of artificial intelligence: evidence fro... firms' expected own price increases and medium- to long-term inflation expectati...
AI adoption leads to a contraction of blue-collar employment.
Difference-in-differences analysis of administrative employer–employee records showing decreases in blue-collar employment associated with adoption.
high negative The economic impact of artificial intelligence: evidence fro... blue-collar employment (count or share)
Boundary conditions limit UCF applicability in contexts requiring human accountability or embodied knowledge.
Author-stated caveat in the abstract identifying contexts (accountability, embodied knowledge) where the framework may not apply; theoretical reasoning, no empirical tests.
high negative Beyond markets and hierarchies: How GenAI enables unbounded ... limits to applicability of UCF where human accountability or embodied knowledge ...
Existing frameworks (Transaction Cost Economics and Electronic Markets Hypothesis) cannot explain emerging organizational phenomena like GitHub Copilot’s recursive value creation or AI-mediated expert networks.
Conceptual critique in the position paper using illustrative examples (GitHub Copilot, AI-mediated expert networks); no empirical testing or sample provided.
high negative Beyond markets and hierarchies: How GenAI enables unbounded ... theoretical explanatory adequacy of extant organizational frameworks
AI governance, ethical concerns, openness, workforce adjustment, and integration complexity are crucial concerns that managers must consider when implementing AI.
Synthesis of risks and challenges reported across the reviewed literature (paper's discussion/conclusion); no specific counts of studies or empirical measures provided in the abstract.
high negative Artificial intelligence, machine learning, and deep learning... governance and ethical risks, workforce adjustment challenges, system integratio...
Conventional managerial practices often struggle with poor information flow, ineffective workflows, slow decision making, and redundant administrative processes.
Background statement in the paper's introduction / literature review (narrative claim based on surveyed literature); no specific empirical study or sample size reported in the abstract.
high negative Artificial intelligence, machine learning, and deep learning... information flow, workflow effectiveness, decision speed, administrative redunda...
The research also identifies policy loopholes and unequal AI preparedness across Sub-Saharan Africa.
Findings from the paper's systematic review highlighting gaps in policy frameworks and uneven preparedness across Sub‑Saharan African countries; no country‑level counts or indices provided in the summary.
high negative The Impact of AI-Driven Automation on Semi and Unskilled Wor... presence of policy gaps and heterogeneity in AI preparedness across countries
Results indicate rising job displacement, industrial change, and inequality.
Aggregate findings reported from the systematic review pointing to increases in job displacement, structural industrial change, and inequality across studies; no aggregated numerical magnitudes provided in the summary.
high negative The Impact of AI-Driven Automation on Semi and Unskilled Wor... incidence of job displacement; extent of industrial/structural change; levels of...