The Commonplace

Evidence (6507 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 736 1615
Governance & Regulation 664 329 160 99 1273
Organizational Efficiency 624 143 105 70 949
Technology Adoption Rate 502 176 98 78 861
Research Productivity 348 109 48 322 836
Output Quality 391 120 44 40 595
Firm Productivity 385 46 85 17 539
Decision Quality 275 143 62 34 521
AI Safety & Ethics 183 241 59 30 517
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 105 40 6 187
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 78 8 1 151
Regulatory Compliance 69 64 14 3 150
Training Effectiveness 81 15 13 18 129
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
Active filter: Productivity
Current AI agents implement only the first half of CLS (fast exemplar/hippocampal-style storage) and lack the slow weight-consolidation half.
Analytic claim in paper comparing current AI agent designs to CLS; no empirical evaluation reported in abstract.
high · negative · Contextual Agentic Memory is a Memo, Not True Memory · presence/absence of slow weight-consolidation mechanisms in AI agents
Agents that rely only on lookup are structurally vulnerable to persistent memory poisoning as injected content propagates across all future sessions.
Theoretical/security argument presented in paper; claims about propagation of injected content across sessions; no empirical attack experiments detailed in abstract.
high · negative · Contextual Agentic Memory is a Memo, Not True Memory · vulnerability to persistent memory poisoning
Conflating the two produces agents that face a provable generalization ceiling on compositionally novel tasks that no increase in context size or retrieval quality can overcome.
Formal claim asserted in paper (formalization of limitations and proofs claimed); no empirical sample detailed in abstract.
high · negative · Contextual Agentic Memory is a Memo, Not True Memory · generalization performance on compositionally novel tasks
Conflating retrieval and weight-based memory produces agents that accumulate notes indefinitely without developing expertise.
Theoretical argument/formalization presented in paper; claim based on analysis of how lookup-only systems fail to consolidate abstract knowledge; no empirical sample reported in abstract.
high · negative · Contextual Agentic Memory is a Memo, Not True Memory · expertise development / continued accumulation of notes
Treating lookup as memory is a category error with provable consequences for security.
Theoretical/formal argument and formalization in paper; security consequences (e.g., persistent poisoning) claimed; no empirical sample reported in abstract.
high · negative · Contextual Agentic Memory is a Memo, Not True Memory · security (vulnerability to persistent memory poisoning)
Treating lookup as memory is a category error with provable consequences for long-term learning.
Theoretical/formal argument asserted in the paper, drawing on formalization and Complementary Learning Systems theory; no empirical sample reported in abstract.
high · negative · Contextual Agentic Memory is a Memo, Not True Memory · long-term learning
Treating lookup as memory is a category error with provable consequences for agent capability.
Theoretical/formal argument asserted in the paper (formalization and proofs claimed); no empirical sample reported in abstract.
Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup.
Conceptual/analytic claim stated in paper; supported by comparison of existing agent memory mechanisms (vector stores, RAG, scratchpads, context-window management) to the paper's definition of 'memory'. No empirical sample reported.
high · negative · Contextual Agentic Memory is a Memo, Not True Memory · whether systems implement memory vs. lookup
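The distinction the claims above draw, between fast exemplar lookup and slow weight consolidation in Complementary Learning Systems (CLS), can be made concrete with a minimal sketch. All names here are hypothetical illustrations, not the paper's implementation: a fast store keeps verbatim notes and only looks them up, while a slow store folds exemplars into persistent aggregate parameters and discards the raw notes.

```python
# Hypothetical sketch of the CLS two-store contrast: fast exemplar
# storage (lookup only) versus slow consolidation into parameters.
from collections import defaultdict


class FastExemplarStore:
    """Hippocampal-style storage: verbatim exemplars, retrieval by key."""

    def __init__(self):
        self.exemplars = []

    def write(self, key, value):
        self.exemplars.append((key, value))

    def lookup(self, key):
        # Returns every stored note matching the key, including any
        # injected ("poisoned") content written in an earlier session.
        return [v for k, v in self.exemplars if k == key]


class SlowWeightStore:
    """Consolidation: exemplars are folded into aggregate parameters
    (here, simple per-key running means) and the raw notes are dropped."""

    def __init__(self):
        self.weights = defaultdict(float)
        self.counts = defaultdict(int)

    def consolidate(self, fast_store):
        for key, value in fast_store.exemplars:
            self.counts[key] += 1
            n = self.counts[key]
            # Incremental mean: the estimate moves toward each exemplar.
            self.weights[key] += (value - self.weights[key]) / n
        fast_store.exemplars.clear()

    def recall(self, key):
        return self.weights[key]


fast = FastExemplarStore()
for obs in (2.0, 4.0, 6.0):
    fast.write("task_difficulty", obs)

slow = SlowWeightStore()
slow.consolidate(fast)

print(slow.recall("task_difficulty"))  # consolidated estimate: 4.0
print(fast.lookup("task_difficulty"))  # raw notes cleared: []
```

A lookup-only agent keeps only `FastExemplarStore`, so every note (including injected ones) persists and resurfaces in later sessions; the consolidation step is what turns accumulated notes into a compact, revisable estimate.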
Rural healthcare workers face persistent obstacles that limit the benefits of technology.
Review conclusion noting persistent obstacles for rural healthcare workers drawn from the literature; synthesis of qualitative/quantitative sources (no sample size in excerpt).
high · negative · A Comprehensive Review of Technology Adoption and Its Impact... · barriers to technology benefits in rural healthcare
Indian healthcare faces barriers to technological integration such as financial issues, poor infrastructure, and regulatory problems.
Review-identified barriers drawn from the literature (qualitative and quantitative studies summarized by the authors); no aggregate sample size reported in the excerpt.
high · negative · A Comprehensive Review of Technology Adoption and Its Impact... · barriers to technology adoption
Algorithmic collusion is a new form of market failure arising from the agentic economy.
Theoretical claim and analysis of market failure mechanisms; no empirical antitrust cases or simulation evidence included in the provided text.
high · negative · DIGITAL AGENTS AS FUNCTIONAL EQUIVALENTS OF ECONOMIC ACTORS:... · existence/emergence of algorithmic collusion as market failure
The marginal gains from genAI came at the high cost of recruiter deskilling, a trend that jeopardizes meaningful oversight of decision-making.
Qualitative interview evidence (n=22) where participants described loss of skills/deskilling associated with genAI use and concerns about oversight.
high · negative · Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... · deskilling / erosion of practitioner skills and oversight capacity
The decision of whether or not to adopt genAI was often outside recruiters' control, with many feeling compelled to adopt due to directives from higher-ups in their business.
Reports from interviewed recruiters (n=22) indicating organizational pressure and top-down calls to integrate AI.
high · negative · Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... · decision-making autonomy over tool adoption
Recruiters believe they have final authority across the recruiting pipeline, but genAI has become an invisible architect shaping the foundational information used for evaluation (e.g., defining a job, determining what counts as a good interview performance).
Qualitative findings from interviews with 22 recruiting professionals describing perceived authority versus the influence of genAI on informational inputs.
high · negative · Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... · perceived decision authority vs. shaping of evaluation criteria
GenAI subtly influences control over everyday recruiting workflows and individual hiring decisions.
Qualitative evidence from semi-structured interviews with 22 recruiting professionals (n=22).
high · negative · Resume-ing Control: (Mis)Perceptions of Agency Around GenAI ... · perceived control/agency in workflows and hiring decisions
AI-adopting firms anticipate smaller increases in their own prices and lower medium- to long-term inflation than non-adopters.
Survey questions on firms' price-change expectations and macro inflation expectations, comparing responses of adopting vs non-adopting firms.
high · negative · The economic impact of artificial intelligence: evidence fro... · firms' expected own price increases and medium- to long-term inflation expectati...
AI adoption leads to a contraction of blue-collar employment.
Difference-in-differences analysis of administrative employer–employee records showing decreases in blue-collar employment associated with adoption.
high · negative · The economic impact of artificial intelligence: evidence fro... · blue-collar employment (count or share)
Boundary conditions limit UCF applicability in contexts requiring human accountability or embodied knowledge.
Author-stated caveat in the abstract identifying contexts (accountability, embodied knowledge) where the framework may not apply; theoretical reasoning, no empirical tests.
high · negative · Beyond markets and hierarchies: How GenAI enables unbounded ... · limits to applicability of UCF where human accountability or embodied knowledge ...
Existing frameworks (Transaction Cost Economics and Electronic Markets Hypothesis) cannot explain emerging organizational phenomena like GitHub Copilot’s recursive value creation or AI-mediated expert networks.
Conceptual critique in the position paper using illustrative examples (GitHub Copilot, AI-mediated expert networks); no empirical testing or sample provided.
high · negative · Beyond markets and hierarchies: How GenAI enables unbounded ... · theoretical explanatory adequacy of extant organizational frameworks
AI governance, ethical concerns, openness, workforce adjustment, and integration complexity are crucial concerns that managers must consider when implementing AI.
Synthesis of risks and challenges reported across the reviewed literature (paper's discussion/conclusion); no specific counts of studies or empirical measures provided in the abstract.
high · negative · Artificial intelligence, machine learning, and deep learning... · governance and ethical risks, workforce adjustment challenges, system integratio...
Conventional managerial practices often struggle with information flow, ineffective workflows, slow decision making, and redundant administrative processes.
Background statement in the paper's introduction / literature review (narrative claim based on surveyed literature); no specific empirical study or sample size reported in the abstract.
high · negative · Artificial intelligence, machine learning, and deep learning... · information flow, workflow effectiveness, decision speed, administrative redunda...
Vulnerable populations—including low-skill workers, aging labour forces, and developing economies—are especially affected by AI-driven changes.
Abstract highlights special attention to vulnerable populations in the review and asserts differential impacts; no specific empirical estimates or sample sizes provided in abstract.
high · negative · AI and the Transformation of Human Employment: Challenges, O... · distributional effects / disproportionate adverse impacts on vulnerable groups
AI displaces routine cognitive and manual tasks.
Explicit finding reported in abstract based on the paper's systematic review of empirical studies (no individual study sample sizes or quantitative estimates provided in abstract).
high · negative · AI and the Transformation of Human Employment: Challenges, O... · displacement of routine tasks / job_displacement for routine roles
Persistent AI memory reduced to a retrieval problem (store prior interactions as text, embed them, and ask the model to recover relevant context later) is mismatched to the kinds of memory that agents need in production: exact facts, current state, updates and deletions, aggregation, relations, negative queries, and explicit unknowns.
Argument and conceptual analysis presented in the paper describing types of operations (exact facts, updates/deletions, aggregation, relations, negative queries, explicit unknowns) that retrieval-style memory fails to satisfy; no sample size or quantitative evaluation provided for this specific claim in the excerpt.
high · negative · From Unstructured Recall to Schema-Grounded Memory: Reliable... · suitability of retrieval-only memory designs for production agent memory needs
This stratification produces trust-based inequality in who can leverage AI while sustaining credibility, voice, and liveness.
Analytical claim based on patterns in 16 interviews indicating differential capacities to conceal/humanize AI lead to unequal ability to both use AI and maintain audience trust and perceived authenticity.
high · negative · AI passing and invisible authenticity labor: trust vulnerabi... · inequality in access to benefits of AI conditioned on ability to sustain trust/c...
Passing capacity is stratified by educational and professional capital, economic resources and team support, and platform position.
Interview evidence (n=16) showing creators with higher education/professional capital, more economic resources, team support, or advantageous platform positions report greater ability to conceal and perform AI-assisted content.
high · negative · AI passing and invisible authenticity labor: trust vulnerabi... · variation in ability to perform 'AI passing' across creators
These invisible authenticity practices reallocate work from generation to downstream repair and performance, complicating claims that AI simply improves efficiency.
Derived from creators' accounts in 16 interviews describing extra downstream editing, verification, and performance labor required after AI generation.
high · negative · AI passing and invisible authenticity labor: trust vulnerabi... · shift in locus of work and implications for efficiency
Creators associate legible AI assistance with intertwined trust vulnerabilities, including epistemic unreliability, anticipated relational penalties, and platform authenticity regimes.
Thematic findings from 16 interviews in which creators express concerns about AI-generated content being epistemically unreliable, damaging relationships with audiences, and conflicting with platform authenticity norms.
high · negative · AI passing and invisible authenticity labor: trust vulnerabi... · perceived trust vulnerabilities tied to visible AI assistance
On authenticity-oriented platforms, visible use of AI can be discrediting for creators.
Reported by creators across 16 in-depth interviews on Xiaohongshu and Douyin; qualitative thematic analysis identifying platform-specific authenticity norms and reputational consequences.
high · negative · AI passing and invisible authenticity labor: trust vulnerabi... · perceived reputational/discrediting effects of visible AI use
Leaderboard rank alone is insufficient because models with similar pass rates can diverge in overall completion, and task-level discrimination concentrates in a middle band of tasks.
Analytical observations from benchmark results comparing pass rates, overall completion metrics, and per-task discrimination patterns across models; based on the 13-model leaderboard analysis.
high · negative · Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... · correspondence between leaderboard rank, pass rate, and overall completion; task...
Experiments reveal that reliable workflow automation remains far from solved: the leading model passes only 66.7% of tasks and no model reaches 70%.
Experimental evaluation of 13 frontier models on 105 tasks; reported pass rates from the benchmark runs (leading model pass rate 66.7%, no model >=70%).
high · negative · Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... · task pass rate (task completion success)
Many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether a task was executed.
Qualitative critique in the paper comparing existing benchmark design choices; based on authors' survey/analysis of prevailing benchmark practices (no explicit systematic review sample size reported).
high · negative · Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-Wor... · benchmark design adequacy for evolving workflow demand and execution verifiabili...
The 2026 Amazon outages illustrate how 'mechanized convergence' (homogenization of code/engineering practices via AI) leads to systemic fragility.
Case study analysis using the 2026 Amazon outages as a single illustrative example; implies qualitative examination of that event.
high · negative · Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... · systemic fragility as evidenced by outage events (2026 Amazon outages case study...
Recursive training on synthetic code threatens to homogenize the global software reservoir, diminishing the variance required for robust engineering.
Theoretical claim about dataset/model feedback loops; no empirical quantification provided in the text excerpt (argumentative risk assessment).
high · negative · Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... · variance/diversity in global software codebase
This epistemological debt erodes the mental models essential for root-cause analysis, widening the gap between system complexity and human comprehension.
Argumentative/theoretical claim supported by reasoning in the paper; no quantified measurement of mental-model erosion reported.
high · negative · Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... · quality/robustness of engineers' mental models and root-cause analysis capabilit...
Substituting logical derivation with passive AI verification creates an 'Epistemological Debt' — a hidden carrying cost incurred by engineers.
Theoretical/conceptual assertion within the paper; argued qualitatively rather than demonstrated with controlled empirical data.
high · negative · Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... · accumulation of epistemic/knowledge debt among engineers
The integration of Large Language Models (LLMs) into the software development lifecycle (SDLC) masks a critical socio-technical failure the authors term 'Cognitive-Systemic Collapse.'
Conceptual/theoretical claim presented in the paper's argumentation; no empirical sample or quantitative study reported for this specific naming claim.
high · negative · Cognitive Atrophy and Systemic Collapse in AI-Dependent Soft... · socio-technical system failure risk (Cognitive-Systemic Collapse)
Most studies are exploratory (59%) and methodologically diverse, but there is a lack of longitudinal and team-based evaluations.
Authors report study typology counts and note the absence of longitudinal and team-based designs across the reviewed literature.
high · negative · The Impact of LLM-Assistants on Software Developer Productiv... · study design types and presence/absence of longitudinal or team-based evaluation...
Studies highlight concerns around cognitive offloading and reduced team collaboration when using LLM-assistants.
Synthesis of reported negative effects in included studies (themes extracted by the authors).
high · negative · The Impact of LLM-Assistants on Software Developer Productiv... · cognitive processes and team collaboration
A notable subset of studies identifies critical risks associated with LLM-assistants.
Synthesis across included studies noting reported risks (e.g., cognitive offloading, collaboration issues).
high · negative · The Impact of LLM-Assistants on Software Developer Productiv... · reported risks and negative impacts
Answer completeness averages 0.40.
Reported average completeness metric for system answers on EnterpriseDocBench (method for computing completeness not given in excerpt).
high · negative · Benchmarking Complex Multimodal Document Processing Pipeline... · answer completeness (average completeness score)
Hallucination rate does not grow monotonically with document length: short documents and very long ones both hallucinate more than medium ones (28.1% and 23.8% vs. 9.2%).
Empirical measurement of hallucination rates by document-length buckets on EnterpriseDocBench; percentages reported in paper. Sample sizes per bucket not provided in excerpt.
high · negative · Benchmarking Complex Multimodal Document Processing Pipeline... · hallucination rate (fraction of generated outputs judged hallucinated)
Strong heuristic, single-agent RL, and multi-agent RL baselines (including Greedy, SAC, MAPPO, and MADDPG) achieved net profit in the range $0.58M–$0.70M in the same experiments.
Empirical comparison in the paper's experiments on the NYC-taxi-based EV fleet simulator listing baseline methods and their reported net profits ($0.58M–$0.70M).
high · negative · Semi-Markov Reinforcement Learning for City-Scale EV Ride-Ha... · net profit of baseline methods (Greedy, SAC, MAPPO, MADDPG)
Monthly operational cost of running the system is approximately USD 4,000.
Full-scale performance characterization reports monthly cost estimate of approximately USD 4,000.
Prior work has largely focused on developing novel cooperative architectures while overlooking the question of when joint training is necessary.
Literature-review style claim made in the paper asserting a gap in prior research emphasis (novel cooperative architectures) versus investigation of training modality necessity.
high · negative · An Analysis of the Coordination Gap between Joint and Modula... · research focus (coverage of training-modality necessity in prior literature)
The coordination gap advantage (between joint and modular training) diminishes in bottleneck environments, particularly under severe transport and processing constraints.
Results from a sensitivity analysis varying resource scarcity and temporal dominance showing the relative performance gap shrinks under bottleneck conditions with tight transport and processing constraints. Details on experimental scenarios not provided in the abstract.
high · negative · An Analysis of the Coordination Gap between Joint and Modula... · coordination gap (performance difference between training modalities)
The framework addresses emerging tensions captured in the Creativity Paradox, whereby GenAI may weaken intrinsic motivation, conceptual risk-taking, and evaluative depth.
Theoretical extension of paradox theory and conceptual discussion of potential negative effects; presented as conceptual risks rather than empirically demonstrated outcomes.
high · negative · Beyond the Creativity Paradox: A Theory-informed Framework f... · intrinsic motivation, conceptual risk-taking, evaluative depth
Manual tools like mind maps support structure creation but lack intelligent (AI) assistance.
Paper's comparison of manual tools versus AI-augmented tools (background/related-work discussion; no empirical evaluation reported for this claim).
high · negative · MindTrellis: Co-Creating Knowledge Structures with AI throug... · presence of intelligent assistance in manual structure-creation tools
Current LLM-based systems let users query information but do not let users shape how knowledge is organized.
Paper's analysis of existing tools and limitations (literature/feature comparison described in introduction; no new empirical test reported).
high · negative · MindTrellis: Co-Creating Knowledge Structures with AI throug... · capability to shape knowledge organization in LLM-based systems
Knowledge workers face increasing challenges in synthesizing information from multiple documents into structured conceptual understanding.
Statement in paper's introduction/motivation; conceptual observation (no empirical data reported here).
high · negative · MindTrellis: Co-Creating Knowledge Structures with AI throug... · ability to synthesize information from multiple documents into structured concep...