Evidence (4175 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
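
The arithmetic behind the matrix can be checked with a short script. The following is a minimal sketch, not part of the source page: it copies four rows from the table, computes each outcome's share of negative findings, and reports how many claims fall outside the four listed directions. The helper name `parse_row` and the reading of that remainder as claims without a listed direction are assumptions; the page itself does not explain why some row totals exceed the sum of the four direction columns.

```python
# Illustrative sketch: the rows are copied from the table; parse_row is a hypothetical helper.
EMPTY = "\u2014"  # the dash character the table uses for an empty cell

ROWS = (
    "Governance & Regulation\t826\t400\t191\t122\t1563\n"
    "AI Safety & Ethics\t218\t277\t65\t33\t599\n"
    "Job Displacement\t12\t80\t20\t1\t113\n"
    "Labor Share of Income\t17\t19\t17\t\u2014\t53\n"
)


def parse_row(line):
    """Split one tab-separated row into counts and derived figures."""
    name, *cells = line.split("\t")
    # Assumption: an empty cell means zero claims in that direction.
    pos, neg, mixed, null, total = (0 if c == EMPTY else int(c) for c in cells)
    listed = pos + neg + mixed + null
    return {
        "outcome": name,
        "total": total,
        # Claims counted in Total but in none of the four direction columns.
        "unlisted": total - listed,
        "negative_share": neg / total,
    }


summary = [parse_row(line) for line in ROWS.splitlines()]

for row in summary:
    print(
        f"{row['outcome']}: {row['negative_share']:.0%} negative, "
        f"{row['unlisted']} of {row['total']} without a listed direction"
    )
```

Dividing by the Total column rather than by the sum of the four directions keeps the shares comparable with the totals as printed, at the cost of slightly understating each share wherever the remainder is non-zero.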

Filter: Org Design

Land-transfer effects on AGTFP are positive but constrained: institutional frictions limit the contribution of land transfer to green transformation.

Mediation results indicating a positive but limited indirect effect via land transfer/scale expansion, supplemented by discussion of institutional barriers in the paper.

medium mixed Digital rural development and agricultural green total facto... Land transfer / scale expansion (mediator) and AGTFP

The Order should be read as policy that privileges state and cloud-provider access over broader democratic accountability and social considerations (labor, education, culture, the commons).

Synthesis of textual absence of social-domain terms in the EO, the EO's access/control provisions, and the paper's political-economic critique.

medium negative The Security Frame Is a Selection Kernel: Trump's AI Executi... privileging of state/cloud access relative to social domains

Structurally, the Order is not deregulation but re-regulation centered on state access and cloud rent—a policy instantiation of technofeudalism with a security face.

Political-economic analysis connecting EO provisions (access, testing, state capabilities) with literature on cloud capital and technofeudalism (e.g., Varoufakis) and the paper's archival operators.

medium negative The Security Frame Is a Selection Kernel: Trump's AI Executi... regulatory orientation (deregulation vs re-regulation) and concentration of rent...

The Order mandates testing for 'advanced cyber capabilities' but omits or fails to adopt benchmark frameworks (e.g., Reasoning Under Load (RUL), PER, DSL, IPF, Diversity Contraction, Constitutive Provenance) that the Crimson Hexagonal Archive has deposited.

Comparative policy analysis between the EO's testing mandate language and the list of evaluation frameworks deposited by the Crimson Hexagonal Archive; textual absence of those benchmarks in the EO.

medium negative The Security Frame Is a Selection Kernel: Trump's AI Executi... adequacy/coverage of testing benchmarks for AI evaluation

The Order's call for a 'voluntary' corporate framework operates as a 'Mediation Ratchet' that strengthens corporate governance control rather than providing substantive public protections.

Critical/theoretical reading of the Order's voluntary mechanisms combined with the paper's Mediation Ratchet concept.

medium negative The Security Frame Is a Selection Kernel: Trump's AI Executi... effect of voluntary frameworks on corporate governance and public accountability

The Order formalizes an 'AI caste system' that stratifies access into public tiers (e.g., Opus 4.8) and frontier/privileged tiers (e.g., Mythos Preview / Glasswing).

Policy text read against observed product/access tiers in industry; theoretical framing of access stratification.

medium negative The Security Frame Is a Selection Kernel: Trump's AI Executi... stratification of model access / tiered access policy

The paper presents the 'Anthropic arc' (Feb 27 supply-chain-risk designation → June 1 IPO filing → June 2 EO endorsement) as a worked example of 'Institutional-Prior Foreclosure' via state co-optation of a firm.

Chronological mapping of public events (designation, IPO filing, EO) and interpretive analysis linking them as an example of state-firm coordination/co-optation.

medium negative The Security Frame Is a Selection Kernel: Trump's AI Executi... state influence / preferential treatment of firms (institutional foreclosure)

Governance ambiguity is responsible for 61% of hybrid workflow failures (and the framework aims to remediate this).

Paper reports 'governance ambiguity responsible for 61% of hybrid workflow failures' as a documented gap; no methodological details or sample size provided in the abstract.

medium negative Workforce Unit Abstraction for Governing Hybrid Human and Ar... proportion of hybrid workflow failures attributed to governance ambiguity

Attribution failures occur in 68% of organizations (and the framework addresses these attribution failures).

Paper states 'attribution failures in 68% of organizations' as a documented gap the constructs address; abstract does not report study method or sample size behind the 68% figure.

medium negative Workforce Unit Abstraction for Governing Hybrid Human and Ar... prevalence of performance attribution failures across organizations

Public discourse often portrays AI as a threat to employment.

Statement in the paper summarizing public/media discourse; no specific survey or corpus size reported in the excerpt.

medium negative From Automation Panic to Workforce Resilience: A Governance ... public portrayal of AI's employment impact

The lack of a relationship between prior productivity and AI adoption points to organizational readiness as a key barrier to AI diffusion.

Interpretation/inference based on the null finding that prior productivity does not predict adoption and the observed associations with digital infrastructure and management practices in the survey data.

medium negative The Adoption of Industrial AI in America organizational readiness as a barrier to AI diffusion

Developer adoption has overwhelmingly favored orchestration (despite the viability of subterranean agents).

Author observation/claim about adoption trends (contrasting the feasibility shown in prior work with observed developer choices, likely based on ecosystem signals such as project popularity and usage).

medium negative Compiling Agentic Workflows into LLM Weights: Near-Frontier ... developer adoption preference (orchestration vs. subterranean agents)

Without design corrections that better align AI development with workers' needs, workplace AI incidents are likely to persist, causing the invisible erosion of worker agency and organizational productivity.

Interpretation / implication drawn from empirical findings (high prevalence of misalignments and developer-driven misalignments); presented as a policy/design recommendation and projected outcome.

medium negative The Quiet Path from Seemingly Minor Design Errors to Workpla... persistence of incidents and resulting erosion of worker agency and organization...

When deliberation tools are distributed across a hierarchy they can interact destructively (a 'deliberation cascade'), producing substantially worse returns and higher token costs than hierarchy alone.

Observed cross-configuration pattern labeled in the paper as a 'deliberation cascade', supported by empirical comparisons showing degraded mean returns and increased token usage for distributed deliberation across model families in the 3,475-episode evaluation.

medium negative Context, Reasoning, and Hierarchy: A Cost-Performance Study ... mean return and token consumption

AI adoption may reinforce, rather than mitigate, the challenges that internal divisions within top management teams (TMTs) pose for environmental strategic decision-making.

Interpretation/implication drawn from the empirical findings (negative moderation and moderated mediation involving AI) based on the panel analyses (35,347 firm-year observations).

medium negative When AI Amplifies Negative Echoes: CEO–TMT Faultlines, Eco-A... organizational attention and green innovation (strategic decision-making outcome...

Existing assessments that rely predominantly on patent statistics and structural network centralities dilute substantive technological strengths and thus can obscure hidden core innovators in knowledge-intensive domains such as AI.

Argument supported by comparative analysis in this study showing differences between capability-driven identification and traditional patent/centrality-based approaches using the 282,778 Chinese AI patents.

medium negative Technological capability and innovation network resilience: ... adequacy of patent-count and centrality-based assessments to capture technologic...

The proliferation of competing Shapley formulations has created a fragmented landscape with little consensus on practical deployment.

Motivating literature review and discussion in the paper noting multiple competing Shapley variants and lack of consensus on practical deployment decisions.

medium negative Rethinking XAI Evaluation: A Human-Centered Audit of Shapley... degree of consensus on practical deployment of Shapley formulations

Because experienced workers are aging out of the workforce, simultaneous curtailment of formative occupational layers by platforms may create a shortage of workers able to manage complex systems.

Argument combining demographic observation (aging workforce) with the paper's theoretical claim about erosion of entry-level apprenticeship layers; no empirical test or quantified projection provided.

medium negative When Platforms Replace the Pipeline: AI, Labor Erosion, and ... availability of skilled workers for supervisory/complex management roles

Aggressive token compression can paradoxically increase overall cost because it shifts interpretive burden to the model's reasoning phase.

Interpretation/explanation of the experimental result (causal mechanism proposed by authors) linking compression to increased reasoning burden; supported by the reported experiment, but the mechanism is inferred rather than directly measured in the abstract.

medium negative Beyond Human-Readable: Rethinking Software Engineering Conve... distribution of computational/interpretive workload between input processing and...

Scaling intelligence alone will not solve coordination problems in multi-agent systems; solving them will require deliberate cooperative design, even when helping others costs nothing.

Conclusion drawn from the paper's experimental findings (comparative performance across models and responses to targeted interventions); presented as a general implication in the abstract.

medium negative More Capable, Less Cooperative? When LLMs Fail At Zero-Cost ... ability of scaling model capability alone to resolve coordination failures

The AI-as-advisor approach has limitations: people frequently ignore accurate advice, rely too much on inaccurate advice, and their decision-making skills may deteriorate over time.

Paper asserts these limitations in motivation/background and/or derives them from observed behavior in experiments (stated in abstract as known problems with AI-as-advisor).

medium negative Beyond AI advice -- independent aggregation boosts human-AI ... skill deterioration / susceptibility to incorrect advice

Our findings surface practical limits on the complexity people can manage in human-AI negotiation.

Synthesis claim based on the empirical study varying number of issues and observed decline in performance beyond three issues; presented as a conceptual/practical implication of the results.

medium negative From Overload to Convergence: Supporting Multi-Issue Human-A... maximum manageable negotiation complexity (number of issues before performance d...

Beyond an environment-specific optimum, scaling further degrades institutional fitness because trust erosion and cost penalties outweigh marginal capability gains.

Analytical argument from the Institutional Scaling Law together with illustrative examples and discussion of mechanisms (trust erosion, cost penalties) in the paper.

medium negative Punctuated Equilibria in Artificial Intelligence: The Instit... institutional fitness (net effect of capability, trust, cost, compliance)

Traditional ex ante regulatory approaches struggle to keep pace with AI development, exacerbating the 'pacing problem' and the Collingridge dilemma.

Theoretical/legal literature review and conceptual argument presented in the paper (no empirical sample or quantitative data reported in the abstract).

medium negative Experimentalism beyond ex ante regulation: A law and economi... regulatory responsiveness/effectiveness in relation to AI technological change

Low internal conflict or unanimity can be diagnostic of variance depletion (i.e., exclusion) rather than healthy integration, so governance systems should treat low conflict as a potential red flag until heterogeneity integration is verified.

Interpretive policy implication derived from the model's demonstration that exclusionary processes can produce deceptively low observed disagreement while increasing fragility; this recommendation is based on theoretical reasoning without empirical validation in the paper.

medium negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... internal conflict levels (observed dissent/unanimity) as indicator of variance d...

The paper identifies five structural challenges arising from the memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops.

Qualitative analysis and problem framing presented in the paper (authors' identification of five specific challenges).

medium negative Governed Memory: A Production Architecture for Multi-Agent W... presence/identification of five structural governance challenges

AI raises managerial cognitive complexity and creates recurring tensions between algorithmic optimisation and systemic, ethical reasoning.

Theoretical synthesis highlighting emergent tensions from integrating computational optimisation with systems thinking and ethical considerations; conceptual, no empirical tests.

medium negative Comparative analysis of strategic vs. computational thinking... managerial cognitive complexity and frequency/severity of optimisation vs ethica...

AI can augment measurement (e.g., collaboration patterns, output tracking) but if poorly designed may reinforce visibility biases that disadvantage remote workers.

Theoretical reasoning and literature citations about algorithmic bias and monitoring; illustrated with secondary examples rather than primary empirical tests.

medium negative The Sociology of Remote Work and Organisational Culture: How... measurement bias; differential visibility; career impacts for remote workers

Hybrid arrangements can exacerbate inequities in access to informal networks and career advancement, often privileging co-located or better-networked employees.

Theoretical integration of sociological and management studies with comparative case illustrations; secondary data examples referenced but no new causal empirical tests reported.

medium negative The Sociology of Remote Work and Organisational Culture: How... access to informal networks; promotion/career advancement rates

Hybrid and remote work create risks of professional invisibility, fragmented social networks, and unequal access to workplace social capital.

Literature synthesis and illustrative case studies drawn from secondary sources; qualitative/comparative case evidence rather than primary quantitative data.

medium negative The Sociology of Remote Work and Organisational Culture: How... professional visibility; social network cohesion; access to workplace social cap...

Traditional STP showed a 67% performance decline after six months in unstable market conditions.

Empirical observation reported in the study, likely derived from simulation scenarios and/or longitudinal analysis of behavioral data; the precise data source (simulation vs. observed field data), statistical tests, and sample framing are not specified in the summary.

medium negative The Algorithmic Canvas: On the Autopoietic Redefinition of S... effectiveness/performance of traditional STP over time (decline over six months ...

The persistence of interpretive, human-in-the-loop evaluation implies ongoing labor requirements (annotation, sense-making, governance roles), affecting forecasts of automation and labor substitution in sectors adopting LLMs.

Interview reports describing continued manual work for evaluation tasks across participants; authors draw implications for labor demand.

medium negative Results-Actionability Gap: Understanding How Practitioners E... continued human labor requirements for evaluation

The under‑use of external text sources in the reviewed literature may be due to privacy, legal/regulatory uncertainty, or integration costs.

Authors' interpretation linking observed low coverage of external text sources (social media, news, reviews) in the 109 articles to plausible barriers (privacy/regulation/integration); no direct empirical test in the review.

medium negative Natural language processing in bank marketing: a systematic ... use of external text sources in marketing research and barriers to their use

Widespread deployment of similar models could create correlated failures or fraud vectors, implying systemic risk that may warrant macroprudential attention.

Analytic caution based on model homogeneity and case/literature discussion; speculative systemic risk concern rather than empirically demonstrated.

medium negative Explore the Impact of Generative AI on Finance and Taxation systemic correlated failure risk, incidence of correlated fraud events

There is regulatory uncertainty around AI-generated filings and responsibility/liability for automated outputs.

Analysis and literature review discuss unclear regulatory positions and legal risks noted in case organizations' deployment considerations.

medium negative Explore the Impact of Generative AI on Finance and Taxation regulatory/compliance risk exposure for AI-generated filings

Integration complexity with legacy ERP/financial systems and sharing-center processes is a significant implementation challenge.

Case study narratives describe integration work and friction points; analytic framing highlights ERP compatibility issues.

medium negative Explore the Impact of Generative AI on Finance and Taxation integration effort/time/cost, compatibility with ERP systems

Model hallucinations, lack of explainability, and limited audit trails limit safe adoption.

Paper cites literature and case observations about model reliability and explainability issues; examples and discussion are qualitative.

medium negative Explore the Impact of Generative AI on Finance and Taxation model reliability (hallucination incidence), explainability/auditability metrics

Data privacy, confidentiality, and cross-border data transfer concerns are important barriers to deployment.

Challenges enumerated from case studies and literature; specific organizational concerns cited in cases (Xiaomi, Deloitte) and in regulatory discussion.

medium negative Explore the Impact of Generative AI on Finance and Taxation deployment constraints related to data privacy (e.g., blocked data flows, need f...

Explainability, auditability, or data-localization requirements could favor larger vendors with compliance capacity, increasing market concentration and affecting competition among AI suppliers.

Market-structure argument grounded in regulatory-compliance burden analysis and comparative examples; not supported by empirical market data in the study.

medium negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... market concentration and competition among AI vendors (supplier market structure...

Legal uncertainty and strict procedural requirements increase compliance costs and regulatory risk, which can slow AI adoption by firms and public agencies.

Theoretical economic implications drawn from legal analysis and comparative observations; no empirical measurement of costs or adoption rates in the study.

medium negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... AI adoption rate and investment risk (speed and likelihood of procurement/invest...

AI can restrict or reshape human administrative discretion in legally sensitive ways.

Doctrinal analysis of statutory specificity and formal procedural requirements in civil-law contexts, illustrated with Vietnam as the exemplar case; comparative observations.

medium negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... scope of administrative discretion (degree of human decision-making latitude)

Physical constraints (power grid reliability, water consumption for cooling, and data-center capacity) together with diminishing marginal returns on scaling make continued monolithic scaling economically and environmentally risky.

Conceptual argumentation using known infrastructure constraints and economic reasoning about diminishing returns; no new empirical assessment or quantified risk analysis included.

medium negative An Alternative Trajectory for Generative AI economic and environmental risk metrics (probability/impact of grid stress, wate...

Reasoning-augmented models (e.g., models using chain-of-thought, multi-step reasoning, or external retrieval/looping) can inflate per-query compute by orders of magnitude, exacerbating sustainability problems.

Argument based on architectural patterns (multi-step reasoning, retrieval augmentation, multiple model passes) and reported per-query compute multipliers in auxiliary literature (referenced anecdotally); the paper provides no new benchmarked per-query compute measurements.

medium negative An Alternative Trajectory for Generative AI per-query compute cost and associated energy consumption (compute FLOPs or joule...

The energetic burden of generative AI is shifting from one-time training to recurring, potentially unbounded inference costs as models become productized and high-traffic.

Synthesis of industry observations and early/anecdotal quantitative reports on operational workloads; no original empirical time-series or workload measurements provided in this paper.

medium negative An Alternative Trajectory for Generative AI distribution of energy consumption between training and inference (energy per in...

Scaling monolithic LLMs toward artificial general intelligence (AGI) is colliding with hard physical and economic limits (energy, grid stress, water use, diminishing returns).

Conceptual synthesis and argumentation drawing on observed industry trends (training/inference cost growth), infrastructure constraints (grid reliability, data-center cooling/water use) and theoretical diminishing marginal returns on model/data scaling. No new empirical dataset or controlled experiments reported in the paper.

medium negative An Alternative Trajectory for Generative AI feasibility of continued monolithic scaling measured by physical (power, water, ...

Field observations from an enterprise deployment demonstrate production failure modes traceable to missing identity propagation, timeout/budgeting policies, and machine-readable error semantics.

Empirical context described as field lessons from an enterprise agent platform integrated with a major cloud provider's MCP servers; production failure vignettes and operational log analysis (client redacted).

medium negative Bridging Protocol and Production: Design Patterns for Deploy... frequency and types of production failures related to identity, timeouts/budgets...

MCP lacks three protocol-level primitives needed for reliable, production-scale agent operation: identity propagation, adaptive tool budgeting, and structured error semantics.

Observational analysis and classification of production failures from an enterprise agent deployment; taxonomy of failure modes identifying gaps in these specific areas.

medium negative Bridging Protocol and Production: Design Patterns for Deploy... presence/absence of protocol-level primitives for (1) identity propagation, (2) ...

Agents that attempt to infer others' reasoning depth may be vulnerable to strategic misrepresentation (partners could behave to induce incorrect ToM estimates).

Conceptual analysis in the paper and discussion of strategic incentives; paper also identifies the risk and suggests potential mitigations (e.g., conservatism, verification, meta-reasoning).

medium negative Adaptive Theory of Mind for LLM-based Multi-Agent Coordinati... vulnerability to strategic manipulation (qualitative risk and proposed mitigatio...

Both too little and too much recursive reasoning (i.e., too shallow or too deep ToM) can produce poor joint behavior — miscalibrated anticipation harms coordination.

Observed non-monotonic effects in the reported experiments where fixed-order agents at either low or high ToM orders performed worse in mismatched pairings; evidence comes from the same multi-environment evaluation using joint-payoff / success-rate metrics.

medium negative Adaptive Theory of Mind for LLM-based Multi-Agent Coordinati... coordination performance (joint payoff, success rate)

Misalignment in Theory-of-Mind (ToM) order between agents (i.e., agents using different recursive reasoning depths) degrades coordination performance.

Empirical experiments using LLM-driven agents with configurable ToM depth across four coordination environments (a repeated matrix game, two grid navigation tasks, and an Overcooked task); comparisons of matched (same-order) vs mismatched (different-order) pairings using task-specific joint payoffs and success rates as metrics.

medium negative Adaptive Theory of Mind for LLM-based Multi-Agent Coordinati... coordination performance (joint payoff, task success rate, task completion/time)
