Evidence (4175 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
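The counts in the matrix can be turned into directional shares with a few lines of arithmetic. The sketch below is a minimal illustration using four rows copied from the table. One caveat it makes visible: in some rows the listed total is larger than the sum of the four direction columns (for Governance & Regulation the columns sum to 1539 against a listed total of 1563). The page does not explain this gap, so the `unassigned` label used here is an assumption for illustration, not the dashboard's own term.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, listed total).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Inequality Measures": (44, 122, 49, 6, 221),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Governance & Regulation": (826, 400, 191, 122, 1563),
}


def direction_shares(row):
    """Return each direction's share of the listed total, plus any remainder."""
    positive, negative, mixed, null, total = row
    counts = {"positive": positive, "negative": negative, "mixed": mixed, "null": null}
    shares = {name: count / total for name, count in counts.items()}
    # "unassigned" is a hypothetical label for claims that appear in the listed
    # total but in none of the four direction columns.
    shares["unassigned"] = (total - sum(counts.values())) / total
    return shares


for outcome, row in ROWS.items():
    rounded = {name: round(value, 3) for name, value in direction_shares(row).items()}
    print(outcome, rounded)
```

Shares are computed against the listed total rather than the column sum, so that a row with a remainder does not overstate any single direction.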
Org Design
Land-transfer effects on agricultural green total factor productivity (AGTFP) are positive but constrained: institutional frictions limit the contribution of land transfer to green transformation.
Mediation results indicating a positive but limited indirect effect via land transfer/scale expansion, supplemented by discussion of institutional barriers in the paper.
The Order should be read as policy that privileges state and cloud-provider access over broader democratic accountability and social considerations (labor, education, culture, the commons).
Synthesis of textual absence of social-domain terms in the EO, the EO's access/control provisions, and the paper's political-economic critique.
Structurally, the Order is not deregulation but re-regulation centered on state access and cloud rent: a policy instantiation of technofeudalism with a security face.
Political-economic analysis connecting EO provisions (access, testing, state capabilities) with literature on cloud capital and technofeudalism (e.g., Varoufakis) and the paper's archival operators.
The Order mandates testing for 'advanced cyber capabilities' but omits or fails to adopt benchmark frameworks (e.g., Reasoning Under Load (RUL), PER, DSL, IPF, Diversity Contraction, Constitutive Provenance) that the Crimson Hexagonal Archive has deposited.
Comparative policy analysis between the EO's testing mandate language and the list of evaluation frameworks deposited by the Crimson Hexagonal Archive; textual absence of those benchmarks in the EO.
The Order's call for a 'voluntary' corporate framework operates as a 'Mediation Ratchet' that strengthens corporate governance control rather than providing substantive public protections.
Critical/theoretical reading of the Order's voluntary mechanisms combined with the paper's Mediation Ratchet concept.
The Order formalizes an 'AI caste system' that stratifies access into public tiers (e.g., Opus 4.8) and frontier/privileged tiers (e.g., Mythos Preview / Glasswing).
Policy text read against observed product/access tiers in industry; theoretical framing of access stratification.
The paper presents the 'Anthropic arc' (Feb 27 supply-chain-risk designation → June 1 IPO filing → June 2 EO endorsement) as a worked example of 'Institutional-Prior Foreclosure' via state co-optation of a firm.
Chronological mapping of public events (designation, IPO filing, EO) and interpretive analysis linking them as an example of state-firm coordination/co-optation.
Governance ambiguity is responsible for 61% of hybrid workflow failures (and the framework aims to remediate this).
Paper reports 'governance ambiguity responsible for 61% of hybrid workflow failures' as a documented gap; no methodological details or sample size provided in the abstract.
Attribution failures occur in 68% of organizations (and the framework addresses these attribution failures).
Paper states 'attribution failures in 68% of organizations' as a documented gap the constructs address; abstract does not report study method or sample size behind the 68% figure.
Public discourse often portrays AI as a threat to employment.
Statement in the paper summarizing public/media discourse; no specific survey or corpus size reported in the excerpt.
The lack of a relationship between prior productivity and AI adoption points to organizational readiness as a key barrier to AI diffusion.
Interpretation/inference based on the null finding that prior productivity does not predict adoption and the observed associations with digital infrastructure and management practices in the survey data.
Developer adoption has overwhelmingly favored orchestration (despite the viability of subterranean agents).
Author observation/claim about adoption trends (contrasting the feasibility shown in prior work with observed developer choices, likely based on ecosystem signals such as project popularity and usage).
Without design corrections that better align AI development with workers' needs, workplace AI incidents are likely to persist, causing the invisible erosion of worker agency and organizational productivity.
Interpretation / implication drawn from empirical findings (high prevalence of misalignments and developer-driven misalignments); presented as a policy/design recommendation and projected outcome.
When deliberation tools are distributed across a hierarchy they can interact destructively (a 'deliberation cascade'), producing substantially worse returns and higher token costs than hierarchy alone.
Observed cross-configuration pattern labeled in the paper as a 'deliberation cascade', supported by empirical comparisons showing degraded mean returns and increased token usage for distributed deliberation across model families in the 3,475-episode evaluation.
AI adoption may reinforce, rather than mitigate, the challenges that internal divisions within top management teams (TMTs) pose for environmental strategic decision-making.
Interpretation/implication drawn from the empirical findings (negative moderation and moderated mediation involving AI) based on the panel analyses (35,347 firm-year observations).
Existing assessments that rely predominantly on patent statistics and structural network centralities dilute substantive technological strengths and thus can obscure hidden core innovators in knowledge-intensive domains such as AI.
Argument supported by comparative analysis in this study showing differences between capability-driven identification and traditional patent/centrality-based approaches using the 282,778 Chinese AI patents.
The proliferation into competing Shapley formulations has created a fragmented landscape with little consensus on practical deployment.
Motivating literature review and discussion in the paper noting multiple competing Shapley variants and lack of consensus on practical deployment decisions.
Because experienced workers are aging out of the workforce while platforms curtail the formative, entry-level occupational layers, a shortage of workers able to manage complex systems may emerge.
Argument combining demographic observation (aging workforce) with the paper's theoretical claim about erosion of entry-level apprenticeship layers; no empirical test or quantified projection provided.
Aggressive token compression can paradoxically increase overall cost because it shifts interpretive burden to the model's reasoning phase.
Interpretation/explanation of the experimental result (causal mechanism proposed by authors) linking compression to increased reasoning burden; supported by the reported experiment but mechanism is inferential rather than directly measured in abstract.
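The cost mechanism described in this claim can be made concrete with simple arithmetic. The sketch below is illustrative only: the token counts and per-token prices are hypothetical numbers chosen for the example, not figures from the paper, and the assumption that reasoning tokens are billed at a higher rate than prompt tokens is stated in the code.

```python
def total_cost(prompt_tokens, reasoning_tokens, input_price=1.0, output_price=5.0):
    """Cost in arbitrary units; prices per token are hypothetical.

    Assumes reasoning tokens are billed at the (higher) output rate.
    """
    return prompt_tokens * input_price + reasoning_tokens * output_price


def break_even_extra_reasoning(saved_prompt_tokens, input_price=1.0, output_price=5.0):
    """Extra reasoning tokens at which the savings from compression are cancelled."""
    return saved_prompt_tokens * input_price / output_price


# Hypothetical scenario: compression removes 3000 prompt tokens,
# but the model spends 1000 more tokens reconstructing the meaning.
baseline = total_cost(prompt_tokens=4000, reasoning_tokens=500)
compressed = total_cost(prompt_tokens=1000, reasoning_tokens=1500)

print("baseline:", baseline)
print("compressed:", compressed)
print("break-even extra reasoning tokens:", break_even_extra_reasoning(3000))
```

Under these assumed prices, compression stops paying off once it induces more than 600 additional reasoning tokens, which is the kind of threshold the claim's "paradoxical" cost increase implies.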
Scaling intelligence alone will not solve coordination problems in multi-agent systems; solving them will require deliberate cooperative design, even when helping others costs nothing.
Conclusion drawn from the paper's experimental findings (comparative performance across models and responses to targeted interventions); presented as a general implication in the abstract.
The AI-as-advisor approach has limitations: people frequently ignore accurate advice, rely too much on inaccurate advice, and their decision-making skills may deteriorate over time.
Paper asserts these limitations in motivation/background and/or derives them from observed behavior in experiments (stated in abstract as known problems with AI-as-advisor).
Our findings surface practical limits on the complexity people can manage in human-AI negotiation.
Synthesis claim based on the empirical study varying number of issues and observed decline in performance beyond three issues; presented as a conceptual/practical implication of the results.
Beyond an environment-specific optimum, scaling further degrades institutional fitness because trust erosion and cost penalties outweigh marginal capability gains.
Analytical argument from the Institutional Scaling Law together with illustrative examples and discussion of mechanisms (trust erosion, cost penalties) in the paper.
Traditional ex ante regulatory approaches struggle to keep pace with AI development, exacerbating the 'pacing problem' and the Collingridge dilemma.
Theoretical/legal literature review and conceptual argument presented in the paper (no empirical sample or quantitative data reported in the abstract).
Low internal conflict or unanimity can be diagnostic of variance depletion (i.e., exclusion) rather than healthy integration, so governance systems should treat low conflict as a potential red flag until heterogeneity integration is verified.
Interpretive policy implication derived from the model's demonstration that exclusionary processes can produce deceptively low observed disagreement while increasing fragility; this recommendation is based on theoretical reasoning without empirical validation in the paper.
The paper identifies five structural challenges arising from the memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops.
Qualitative analysis and problem framing presented in the paper (authors' identification of five specific challenges).
AI raises managerial cognitive complexity and creates recurring tensions between algorithmic optimisation and systemic, ethical reasoning.
Theoretical synthesis highlighting emergent tensions from integrating computational optimisation with systems thinking and ethical considerations; conceptual, no empirical tests.
AI can augment measurement (e.g., collaboration patterns, output tracking) but if poorly designed may reinforce visibility biases that disadvantage remote workers.
Theoretical reasoning and literature citations about algorithmic bias and monitoring; illustrated with secondary examples rather than primary empirical tests.
Hybrid arrangements can exacerbate inequities in access to informal networks and career advancement, often privileging co-located or better-networked employees.
Theoretical integration of sociological and management studies with comparative case illustrations; secondary data examples referenced but no new causal empirical tests reported.
Hybrid and remote work create risks of professional invisibility, fragmented social networks, and unequal access to workplace social capital.
Literature synthesis and illustrative case studies drawn from secondary sources; qualitative/comparative case evidence rather than primary quantitative data.
Traditional STP showed a 67% performance decline after six months in unstable market conditions.
Empirical observation reported in the study—likely derived from simulation scenarios and/or longitudinal analysis of behavioral data; precise data source (simulation vs. observed field data), statistical tests, and sample framing are not specified in the summary.
The persistence of interpretive, human-in-the-loop evaluation implies ongoing labor requirements (annotation, sense-making, governance roles), affecting forecasts of automation and labor substitution in sectors adopting LLMs.
Interview reports describing continued manual work for evaluation tasks across participants; authors draw implications for labor demand.
The under-use of external text sources in the reviewed literature may be due to privacy, legal/regulatory uncertainty, or integration costs.
Authors' interpretation linking observed low coverage of external text sources (social media, news, reviews) in the 109 articles to plausible barriers (privacy/regulation/integration); no direct empirical test in the review.
Widespread deployment of similar models could create correlated failures or fraud vectors, implying systemic risk that may warrant macroprudential attention.
Analytic caution based on model homogeneity and case/literature discussion; speculative systemic risk concern rather than empirically demonstrated.
There is regulatory uncertainty around AI-generated filings and responsibility/liability for automated outputs.
Analysis and literature review discuss unclear regulatory positions and legal risks noted in case organizations' deployment considerations.
Integration complexity with legacy ERP/financial systems and sharing-center processes is a significant implementation challenge.
Case study narratives describe integration work and friction points; analytic framing highlights ERP compatibility issues.
Model hallucinations, lack of explainability, and limited audit trails limit safe adoption.
Paper cites literature and case observations about model reliability and explainability issues; examples and discussion are qualitative.
Data privacy, confidentiality, and cross-border data transfer concerns are important barriers to deployment.
Challenges enumerated from case studies and literature; specific organizational concerns cited in cases (Xiaomi, Deloitte) and in regulatory discussion.
Explainability, auditability, or data-localization requirements could favor larger vendors with compliance capacity, increasing market concentration and affecting competition among AI suppliers.
Market-structure argument grounded in regulatory-compliance burden analysis and comparative examples; not supported by empirical market data in the study.
Legal uncertainty and strict procedural requirements increase compliance costs and regulatory risk, which can slow AI adoption by firms and public agencies.
Theoretical economic implications drawn from legal analysis and comparative observations; no empirical measurement of costs or adoption rates in the study.
AI can restrict or reshape human administrative discretion in legally sensitive ways.
Doctrinal analysis of statutory specificity and formal procedural requirements in civil-law contexts, illustrated with Vietnam as the exemplar case; comparative observations.
Physical constraints (power grid reliability, water consumption for cooling, and data-center capacity) together with diminishing marginal returns on scaling make continued monolithic scaling economically and environmentally risky.
Conceptual argumentation using known infrastructure constraints and economic reasoning about diminishing returns; no new empirical assessment or quantified risk analysis included.
Reasoning-augmented models (e.g., models using chain-of-thought, multi-step reasoning, or external retrieval/looping) can inflate per-query compute by orders of magnitude, exacerbating sustainability problems.
Argument based on architectural patterns (multi-step reasoning, retrieval augmentation, multiple model passes) and reported per-query compute multipliers in auxiliary literature (referenced anecdotally); the paper provides no new benchmarked per-query compute measurements.
The energetic burden of generative AI is shifting from one-time training to recurring, potentially unbounded inference costs as models become productized and high-traffic.
Synthesis of industry observations and early/anecdotal quantitative reports on operational workloads; no original empirical time-series or workload measurements provided in this paper.
Scaling monolithic LLMs toward artificial general intelligence (AGI) is colliding with hard physical and economic limits (energy, grid stress, water use, diminishing returns).
Conceptual synthesis and argumentation drawing on observed industry trends (training/inference cost growth), infrastructure constraints (grid reliability, data-center cooling/water use) and theoretical diminishing marginal returns on model/data scaling. No new empirical dataset or controlled experiments reported in the paper.
Field observations from an enterprise deployment demonstrate production failure modes traceable to missing identity propagation, timeout/budgeting policies, and machine-readable error semantics.
Empirical context described as field lessons from an enterprise agent platform integrated with a major cloud provider's MCP servers; production failure vignettes and operational log analysis (client redacted).
MCP lacks three protocol-level primitives needed for reliable, production-scale agent operation: identity propagation, adaptive tool budgeting, and structured error semantics.
Observational analysis and classification of production failures from an enterprise agent deployment; taxonomy of failure modes identifying gaps in these specific areas.
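To make the three missing primitives more tangible, the following is a minimal sketch of what a caller-side wrapper carrying them might look like. It is not the MCP API and not the paper's design: `CallContext`, `ToolError`, `ErrorKind`, and `call_tool` are hypothetical names, and a fixed deadline stands in for budgeting (it is not adaptive).

```python
import time
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical machine-readable error classes an agent can branch on."""
    RETRYABLE = "retryable"
    PERMANENT = "permanent"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass(frozen=True)
class CallContext:
    principal: str   # identity of the user on whose behalf the agent acts
    deadline: float  # monotonic-clock time after which no call should start


@dataclass(frozen=True)
class ToolError:
    kind: ErrorKind
    message: str


def call_tool(tool, args, ctx, now=time.monotonic):
    """Invoke `tool(args, ctx)`, returning its result or a structured ToolError."""
    if now() >= ctx.deadline:
        return ToolError(ErrorKind.BUDGET_EXHAUSTED, "deadline passed before call")
    try:
        # The context travels with the call, so the tool sees the principal.
        return tool(args, ctx)
    except TimeoutError as exc:
        return ToolError(ErrorKind.RETRYABLE, str(exc))
    except Exception as exc:
        return ToolError(ErrorKind.PERMANENT, str(exc))


def whoami(args, ctx):
    """Toy tool that echoes the propagated identity."""
    return ctx.principal


ctx = CallContext(principal="alice", deadline=time.monotonic() + 10.0)
print(call_tool(whoami, {}, ctx))
```

Returning a typed error instead of raising lets an orchestrating agent decide between retrying, aborting, or escalating without parsing free-text messages.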
Agents that attempt to infer others' reasoning depth may be vulnerable to strategic misrepresentation: partners could behave so as to induce incorrect Theory-of-Mind (ToM) estimates.
Conceptual analysis in the paper and discussion of strategic incentives; paper also identifies the risk and suggests potential mitigations (e.g., conservatism, verification, meta-reasoning).
Both too little and too much recursive reasoning (i.e., too shallow or too deep ToM) can produce poor joint behavior: miscalibrated anticipation harms coordination.
Observed non-monotonic effects in the reported experiments where fixed-order agents at either low or high ToM orders performed worse in mismatched pairings; evidence comes from the same multi-environment evaluation using joint-payoff / success-rate metrics.
Misalignment in Theory-of-Mind (ToM) order between agents (i.e., agents using different recursive reasoning depths) degrades coordination performance.
Empirical experiments using LLM-driven agents with configurable ToM depth across four coordination environments (a repeated matrix game, two grid navigation tasks, and an Overcooked task); comparisons of matched (same-order) vs mismatched (different-order) pairings using task-specific joint payoffs and success rates as metrics.