Evidence (2469 claims)

Claims by category: Adoption (5539), Productivity (4793), Governance (4333), Human-AI Collaboration (3326), Labor Markets (2657), Innovation (2510), Org Design (2469), Skills & Training (2017), Inequality (1378).
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Org Design
AI raises managerial cognitive complexity and creates recurring tensions between algorithmic optimisation and systemic, ethical reasoning.
Theoretical synthesis highlighting emergent tensions from integrating computational optimisation with systems thinking and ethical considerations; conceptual, no empirical tests.
AI can augment measurement (e.g., collaboration patterns, output tracking), but, if poorly designed, it may reinforce visibility biases that disadvantage remote workers.
Theoretical reasoning and literature citations about algorithmic bias and monitoring; illustrated with secondary examples rather than primary empirical tests.
Hybrid arrangements can exacerbate inequities in access to informal networks and career advancement, often privileging co-located or better-networked employees.
Theoretical integration of sociological and management studies with comparative case illustrations; secondary data examples referenced but no new causal empirical tests reported.
Hybrid and remote work create risks of professional invisibility, fragmented social networks, and unequal access to workplace social capital.
Literature synthesis and illustrative case studies drawn from secondary sources; qualitative/comparative case evidence rather than primary quantitative data.
Traditional STP showed a 67% performance decline after six months in unstable market conditions.
Empirical observation reported in the study, likely derived from simulation scenarios and/or longitudinal analysis of behavioral data; the precise data source (simulation vs. observed field data), the statistical tests, and the sample framing are not specified in the summary.
The persistence of interpretive, human-in-the-loop evaluation implies ongoing labor requirements (annotation, sense-making, governance roles), affecting forecasts of automation and labor substitution in sectors adopting LLMs.
Interview reports describing continued manual work for evaluation tasks across participants; authors draw implications for labor demand.
The under‑use of external text sources in the reviewed literature may be due to privacy, legal/regulatory uncertainty, or integration costs.
Authors' interpretation linking observed low coverage of external text sources (social media, news, reviews) in the 109 articles to plausible barriers (privacy/regulation/integration); no direct empirical test in the review.
Widespread deployment of similar models could create correlated failures or fraud vectors, implying systemic risk that may warrant macroprudential attention.
Analytic caution based on model homogeneity and case/literature discussion; speculative systemic risk concern rather than empirically demonstrated.
There is regulatory uncertainty around AI-generated filings and responsibility/liability for automated outputs.
Analysis and literature review discuss unclear regulatory positions and legal risks noted in case organizations' deployment considerations.
Integration complexity with legacy ERP/financial systems and shared-services-center processes is a significant implementation challenge.
Case study narratives describe integration work and friction points; analytic framing highlights ERP compatibility issues.
Model hallucinations, lack of explainability, and limited audit trails limit safe adoption.
Paper cites literature and case observations about model reliability and explainability issues; examples and discussion are qualitative.
Data privacy, confidentiality, and cross-border data transfer concerns are important barriers to deployment.
Challenges enumerated from case studies and literature; specific organizational concerns cited in cases (Xiaomi, Deloitte) and in regulatory discussion.
Explainability, auditability, or data-localization requirements could favor larger vendors with compliance capacity, increasing market concentration and affecting competition among AI suppliers.
Market-structure argument grounded in regulatory-compliance burden analysis and comparative examples; not supported by empirical market data in the study.
Legal uncertainty and strict procedural requirements increase compliance costs and regulatory risk, which can slow AI adoption by firms and public agencies.
Theoretical economic implications drawn from legal analysis and comparative observations; no empirical measurement of costs or adoption rates in the study.
AI can restrict or reshape human administrative discretion in legally sensitive ways.
Doctrinal analysis of statutory specificity and formal procedural requirements in civil-law contexts, illustrated with Vietnam as the exemplar case; comparative observations.
Physical constraints (power grid reliability, water consumption for cooling, and data-center capacity) together with diminishing marginal returns on scaling make continued monolithic scaling economically and environmentally risky.
Conceptual argumentation using known infrastructure constraints and economic reasoning about diminishing returns; no new empirical assessment or quantified risk analysis included.
Reasoning-augmented models (e.g., models using chain-of-thought, multi-step reasoning, or external retrieval/looping) can inflate per-query compute by orders of magnitude, exacerbating sustainability problems.
Argument based on architectural patterns (multi-step reasoning, retrieval augmentation, multiple model passes) and reported per-query compute multipliers in auxiliary literature (referenced anecdotally); the paper provides no new benchmarked per-query compute measurements.
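As a rough illustration of that multiplier logic, the sketch below compares decode-side FLOPs for a single-pass answer against a multi-pass, long-context reasoning pipeline. The model size, token counts, and pass count are all assumed for illustration, not figures from the paper.

```python
# Back-of-envelope sketch of the per-query compute multiplier. All numbers
# here are illustrative assumptions, not measurements from the paper.

PARAMS = 70e9                    # assumed model size (parameters)
FLOPS_PER_TOKEN = 2 * PARAMS     # rough rule of thumb: ~2N FLOPs per token

def query_flops(prompt_tokens: int, output_tokens: int, passes: int = 1) -> float:
    """Approximate decode-side FLOPs for one query (ignores KV-cache details)."""
    return passes * (prompt_tokens + output_tokens) * FLOPS_PER_TOKEN

direct = query_flops(prompt_tokens=500, output_tokens=200)
# Chain-of-thought with retrieval loops: longer context, longer outputs,
# and several model passes (draft, retrieve, critique, revise).
reasoning = query_flops(prompt_tokens=4000, output_tokens=2000, passes=5)

print(f"direct:    {direct:.2e} FLOPs")
print(f"reasoning: {reasoning:.2e} FLOPs (~{reasoning / direct:.0f}x)")
```

Under these assumptions the reasoning pipeline costs roughly 40x the direct answer per query, consistent with the order-of-magnitude concern in the claim.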
The energetic burden of generative AI is shifting from one-time training to recurring, potentially unbounded inference costs as models become productized and high-traffic.
Synthesis of industry observations and early/anecdotal quantitative reports on operational workloads; no original empirical time-series or workload measurements provided in this paper.
Scaling monolithic LLMs toward artificial general intelligence (AGI) is colliding with hard physical and economic limits (energy, grid stress, water use, diminishing returns).
Conceptual synthesis and argumentation drawing on observed industry trends (training/inference cost growth), infrastructure constraints (grid reliability, data-center cooling/water use) and theoretical diminishing marginal returns on model/data scaling. No new empirical dataset or controlled experiments reported in the paper.
Field observations from an enterprise deployment demonstrate production failure modes traceable to missing identity propagation, timeout/budgeting policies, and machine-readable error semantics.
Empirical context described as field lessons from an enterprise agent platform integrated with a major cloud provider's MCP servers; production failure vignettes and operational log analysis (client redacted).
MCP lacks three protocol-level primitives needed for reliable, production-scale agent operation: identity propagation, adaptive tool budgeting, and structured error semantics.
Observational analysis and classification of production failures from an enterprise agent deployment; taxonomy of failure modes identifying gaps in these specific areas.
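The sketch below illustrates, in hypothetical form, what those three primitives might look like if carried at the protocol layer. All field names and error codes are invented for illustration and are not part of the actual MCP specification.

```python
# Hypothetical sketch (field names invented; NOT part of the actual MCP spec)
# of the three protocol-level primitives the claim identifies as missing.
from dataclasses import dataclass, field

@dataclass
class ToolCallEnvelope:
    tool: str
    args: dict
    # Identity propagation: who the agent acts for, carried end to end.
    on_behalf_of: str = "user:anonymous"
    delegation_chain: list = field(default_factory=list)
    # Adaptive budgeting: per-call limits the server can enforce and report on.
    timeout_ms: int = 5_000
    max_cost_units: int = 10

@dataclass
class ToolError:
    # Structured, machine-readable error semantics instead of free-text messages.
    code: str          # e.g. "BUDGET_EXCEEDED", "AUTHZ_DENIED"
    retryable: bool
    detail: str

def handle(result):
    """Agents can branch on error codes rather than parsing prose."""
    if isinstance(result, ToolError):
        return "retry with backoff" if result.retryable else f"escalate: {result.code}"
    return result

call = ToolCallEnvelope(tool="search_tickets", args={"q": "outage"},
                        on_behalf_of="user:alice@example.com")
print(handle(ToolError(code="BUDGET_EXCEEDED", retryable=False, detail="cost cap hit")))
```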
Agents that attempt to infer others' reasoning depth may be vulnerable to strategic misrepresentation (partners could behave to induce incorrect ToM estimates).
Conceptual analysis in the paper and discussion of strategic incentives; paper also identifies the risk and suggests potential mitigations (e.g., conservatism, verification, meta-reasoning).
Both too little and too much recursive reasoning (i.e., too shallow or too deep ToM) can produce poor joint behavior — miscalibrated anticipation harms coordination.
Observed non-monotonic effects in the reported experiments where fixed-order agents at either low or high ToM orders performed worse in mismatched pairings; evidence comes from the same multi-environment evaluation using joint-payoff / success-rate metrics.
Misalignment in Theory-of-Mind (ToM) order between agents (i.e., agents using different recursive reasoning depths) degrades coordination performance.
Empirical experiments using LLM-driven agents with configurable ToM depth across four coordination environments (a repeated matrix game, two grid navigation tasks, and an Overcooked task); comparisons of matched (same-order) vs mismatched (different-order) pairings using task-specific joint payoffs and success rates as metrics.
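A stylized level-k toy model (a beauty-contest-style recursion, assumed here for illustration and not one of the paper's four environments) makes the mismatch effect concrete: each agent best-responds to a partner it models one level below itself, so pairs at very different depths mispredict each other badly.

```python
# Stylized toy model of ToM-depth mismatch; illustrative only, not the
# paper's environments or metrics.

def action(depth: int, anchor: float = 50.0) -> float:
    """Level-0 plays the anchor; level-k best-responds to an assumed
    level-(k-1) partner by playing half of that partner's action."""
    a = anchor
    for _ in range(depth):
        a /= 2.0
    return a

def coordination_loss(depth_a: int, depth_b: int) -> float:
    """Each agent predicts its partner at one level below itself; loss is the
    total gap between predicted and actual partner actions (lower = better)."""
    pred_a = action(depth_a - 1) if depth_a > 0 else action(0)  # level-0: anchor belief
    pred_b = action(depth_b - 1) if depth_b > 0 else action(0)
    return abs(pred_a - action(depth_b)) + abs(pred_b - action(depth_a))

for pair in [(2, 2), (5, 5), (1, 5), (0, 4)]:
    print(pair, round(coordination_loss(*pair), 2))
# Matched pairs (2,2), (5,5) incur small losses; mismatched pairs (1,5), (0,4)
# mispredict badly, mirroring the reported matched-vs-mismatched gap.
```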
Human–AI chats contain fewer emotional and social messages compared with human–human chats.
Content coding of chat transcripts comparing frequencies of emotional/social message categories across human–AI (n = 126) and human–human (n = 108) conditions; reported lower counts/proportions of social/emotional content in human–AI dialogs.
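A frequency comparison of this kind is typically tested with a two-proportion z-test. The sketch below uses the reported condition sizes (n = 126 and n = 108) but assumed message counts, since the summary does not report the underlying frequencies.

```python
# Two-proportion z-test sketch; message counts below are assumed placeholders,
# not the study's data (only the condition sizes come from the summary).
from math import sqrt, erf

def two_prop_z(x1: int, n1: int, x2: int, n2: int):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    p_two_sided = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_two_sided

# Assumed illustrative counts of dialogs containing social/emotional content:
print(two_prop_z(x1=30, n1=126, x2=55, n2=108))
```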
Public‑interest concerns (bias, misuse, systemic risk) may be harder to mitigate via simple transparency rules; policies should emphasize outcome‑based regulations, mandatory behavioral testing, and marketplace disclosure obligations for stressed scenarios.
Policy implication derived from the non‑rule‑encodability thesis; no empirical policy evaluation included.
Standard contracts and regulatory audits that rely on inspection of rule sets or source code will be insufficient to assess model behavior or risk; regulators and buyers must rely more on behavior‑based testing, standards, and outcome measures.
Policy and regulatory argument derived from the main theorem about non‑rule‑encodability; no empirical regulatory studies presented.
Full interpretability via rule extraction may be impossible for the most valuable parts of LLM competence, limiting the utility of some transparency approaches for safety and auditing.
Argumentative consequence of the main theoretical claim and structural mismatch; supported by historical limitations of rule‑based systems; no empirical tests reported.
There is a structural mismatch between explicit human cognitive tools (rules, checklists) and the pattern‑rich, high‑dimensional competence encoded in LLMs.
Theoretical/structural argument about distributed statistical representations in LLMs versus discrete rules; no experimental quantification provided.
Historical expert systems failed to generalize or scale to complex, ambiguous tasks, contrasting with LLMs' broader empirical successes.
Historical case analysis and literature review-style discussion of expert systems versus contemporary LLM performance; no new quantitative historical dataset provided.
High governance costs in regulated/high-risk domains can slow adoption of agentic systems, concentrating deployment in less regulated uses or among large firms that can afford governance infrastructure.
Economic reasoning about fixed and marginal governance costs and firm-level adoption decisions; no empirical adoption data presented.
Path-dependent behavior increases the complexity of principal–agent contracting and moral hazard between platforms, enterprise customers, and downstream users, requiring richer contract terms (acceptable paths, logging, audit rights).
Economic theory reasoning and applied contract/design implications discussed; no empirical contract-study data.
Path-dependent policies complicate ex post auditing and simple rule-based regulation; regulators may prefer standards requiring runtime evaluation and logging to be enforceable in practice.
Conceptual argument about limits of auditing when important state is ephemeral and about how runtime logging enables ex post review; illustrative policy examples mapping to runtime requirements.
Current models appear to internalize preferences as persistent, high‑priority rules rather than conditional behavioral signals contingent on conversational norms and context.
Behavioral patterns observed across BenchPreS scenarios (preference application persisting in inappropriate contexts) and ablation results; interpretive claim based on empirical behavior rather than direct model internals inspection.
BenchPreS detects a pervasive context‑sensitivity failure: models often treat stored preferences as globally enforceable rules rather than conditional, context‑dependent signals.
Pattern of results across the benchmark showing high MR alongside cases where preference application should have been suppressed; qualitative interpretation of model behavior across varied interaction partners and normative contexts in the dataset.
Modern frontier LLMs frequently misapply stored user preferences in contexts where social or institutional norms require suppression (third‑party communication).
Empirical evaluation using the BenchPreS benchmark: models were provided stored preferences and asked to generate responses across contexts requiring either application or suppression; Misapplication Rate (MR) computed as fraction of instances where preferences were applied despite required suppression. Multiple state‑of‑the‑art models were tested (described generically as “frontier models”) across the scenario set.
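Given that definition, MR reduces to a simple fraction over the suppression-required instances. A minimal sketch, with record fields assumed for illustration:

```python
# Misapplication Rate (MR) as described: the fraction of suppression-required
# instances in which the stored preference was nonetheless applied.
# Record field names are assumed for illustration.

def misapplication_rate(records) -> float:
    suppress = [r for r in records if r["requires_suppression"]]
    if not suppress:
        return 0.0
    misapplied = sum(r["preference_applied"] for r in suppress)
    return misapplied / len(suppress)

demo = [
    {"requires_suppression": True,  "preference_applied": True},   # e.g. third-party email
    {"requires_suppression": True,  "preference_applied": False},
    {"requires_suppression": False, "preference_applied": True},   # application context: excluded from MR
]
print(misapplication_rate(demo))  # 0.5
```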
If left unchecked, managerial short-termism combined with AI adoption can create a feedback loop where firms cut labor to boost short-term profits, undermining aggregate demand and eroding the market that sustains those profits.
Conceptual macroeconomic and organizational synthesis drawing on theory and historical patterns; no new empirical time-series demonstrating this loop in current AI-driven layoffs.
Work-time reduction policies carry distributional and implementation risks (heterogeneous effects by occupation, firm size, capital intensity; risk of hidden wage cuts) that require careful compensation rules and monitoring.
Theoretical reasoning and references to heterogeneous outcomes in prior work-hour studies; no new empirical quantification of heterogeneity in AI-era implementations.
Lower household demand resulting from payroll cuts can precipitate further cost-cutting and automation, creating a self-reinforcing feedback loop that risks persistent demand shortfalls and higher structural unemployment.
Theoretical models of demand-driven adjustment and cited historical patterns; conceptual argument rather than empirical causal identification in contemporary AI contexts.
AI-justified layoffs are driven more by managerial short-termism and misaligned executive incentives than by immediate technological necessity.
Interdisciplinary conceptual synthesis drawing on labor-economics theory, organizational behavior literature linking executive compensation/short-termism to layoffs, and selected prior empirical studies; no new firm-level causal identification or large-scale dataset provided.
Passive monitoring and predictive models are insufficient for governing the complex dynamics of a tech-driven economy.
Conceptual critique based on economic cybernetics literature and the author's expert assessment; no empirical test comparing governance regimes is provided.
Digitalization is deepening digital inequality (unequal access to digital tools, skills, and benefits) across social groups and regions.
Qualitative analysis and expert assessment; the paper calls for new metrics but does not present systematic empirical measures of inequality.
Digital transformation can generate technological unemployment if not managed with appropriate retraining and social protection measures.
Expert assessment and literature-informed argumentation in the paper; no empirical longitudinal analysis isolating technology-driven job losses presented.
Forced or poorly regulated digitalization risks exacerbating social stratification.
Conceptual argument supported by qualitative analysis of policy documents and expert assessment; no empirical causal estimates provided.
Industry-level AI substitution risk moderates the AI–ECSR relationship: higher substitution risk sharpens the inverted U and shifts its peak left (firms in high-substitution-risk industries reach the turning point earlier and suffer stronger negative effects at high AI adoption).
Interaction terms between AI (and AI^2) and an industry AI substitution-risk measure in panel regressions show heterogeneity consistent with a leftward shift and steeper decline in high-risk industries; results reported across the 2,575-firm panel with controls and robustness checks.
Beyond a certain threshold of AI embedding, deeper AI adoption shifts managerial attention toward AI systems and away from employees, reducing ECSR (AI attention shift mechanism).
Negative AI^2 coefficient in quadratic panel regressions indicates declining ECSR at high AI adoption; supported by theoretical dual-agent model arguing attention shift; robustness checks reported. (Sample: same 2,575 firms, 2013–2023.)
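The inverted U and the leftward shift follow directly from the quadratic form: with ECSR modeled as b1*AI + b2*AI^2 plus controls, the peak sits at AI* = -b1 / (2*b2), and an interaction that makes b2 more negative in high-substitution-risk industries moves that peak left. A numeric sketch with assumed placeholder coefficients (not the paper's estimates):

```python
# Turning point of the inverted-U relation; coefficients are assumed
# placeholders, not the paper's estimates.

def turning_point(b1: float, b2: float) -> float:
    return -b1 / (2 * b2)

b1, b2 = 0.8, -0.5            # assumed baseline: ECSR rises, then falls
gamma = -0.3                  # assumed AI^2 x substitution-risk interaction
print(turning_point(b1, b2))           # baseline peak: 0.8
print(turning_point(b1, b2 + gamma))   # high-risk peak: 0.5 (leftward shift)
```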
Trust, verification costs, and legal/governance requirements remain consequential even with AI mediation and may limit or shape adoption.
Theoretical discussion of governance and verification costs; no empirical measurement of these costs in adopter firms provided.
AI-mediated interpretation and action carry risks related to quality, bias, and misalignment, which can produce miscommunication or incorrect automated actions.
Paper's discussion section raising caveats; conceptual risk analysis without empirical incident data; references to general concerns in AI safety literature (no new empirical evidence provided).
Organisations struggle to optimise human–AI collaboration in knowledge‑intensive decision‑making.
Statement based on a systematic synthesis of human–AI interaction and knowledge management literature presented in the paper; no primary empirical sample or dataset reported in the abstract.
Despite increased deployment, the field lacks a principled framework for determining when a team is helpful, how many agents to use, how team structure affects performance, and whether a team outperforms a single agent.
Authors' assessment of the literature and gaps; presented as a motivation for their work (no empirical count of missing frameworks given in excerpt).