The Commonplace

Evidence (3470 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5877 claims
Human-AI Collaboration: 5157 claims
Innovation: 3492 claims
Org Design: 3470 claims
Labor Markets: 3224 claims
Skills & Training: 2608 claims
Inequality: 1835 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

| Outcome | Positive | Negative | Mixed | Null | Total |
| --- | --- | --- | --- | --- | --- |
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | | 37 |
| Worker Turnover | 11 | 12 | 3 | | 26 |
| Industry | 1 | | | | 1 |
Active filter: Org Design
Current (pay-upfront) models impose a financial barrier to entry for developers, limiting innovation and excluding actors from emerging economies.
Analytical argument in the paper based on cost-structure reasoning and literature on barriers to entry; no empirical sample or causal estimate provided.
high negative Revenue-Sharing as Infrastructure: A Distributed Business Mo... developer entry barriers / access to platform
Developers and experts still lack a shared view, resulting in repeated rounds of coordination and clarification and error-prone handoffs.
Observational/qualitative claim in paper describing current MSD practice (no numeric sample reported).
high negative LLM-Powered Workflow Optimization for Multidisciplinary Soft... frequency of coordination rounds / error-prone handoffs
Even with AI coding assistants like GitHub Copilot, individual coding tasks are semi-automated, but the workflow connecting domain knowledge to implementation is not.
Qualitative observation/comparative statement in paper (no empirical sample reported).
high negative LLM-Powered Workflow Optimization for Multidisciplinary Soft... degree of automation of coding tasks vs. end-to-end workflow automation
Multidisciplinary Software Development (MSD) requires domain experts and developers to collaborate across incompatible formalisms and separate artifact sets.
Conceptual/argument in paper framing the problem (no empirical sample reported).
high negative LLM-Powered Workflow Optimization for Multidisciplinary Soft... collaboration/workflow efficiency between domain experts and developers
Only 12% of AI market value maps to physical activities.
Descriptive aggregate: authors categorize and report that 12% of estimated AI market value maps to physical activities.
high negative Where can AI be used? Insights from a deep ontology of work ... share of AI market value by activity type (physical)
Significant limitations emerged in case law citations, with most cited cases being non-existent or incorrectly referenced.
Authors' review of the case citations produced by the four AI engines for the single transcript, finding many citations were fabricated or misreferenced.
high negative Robot Wingman: Using AI to Assess an Employment Termination accuracy of case law citations (error rate / hallucination rate)
Traditional paradigms, specifically the resource-based view and the dynamic capabilities framework, operate under closed-system, first-order cybernetic assumptions that fail to capture the dissipative nature of algorithmic agents.
Conceptual critique presented in the paper's theoretical argumentation (literature critique and re-framing); no empirical sample reported.
high negative Governing Human–AI Co-Evolution: Intelligentization Capabili... explanatory_power_of_management_theory (ability to account for AI-driven organiz...
AI usage predicts work disengagement behavior via emotional exhaustion elicited by AI-associated technostressors.
Four-stage longitudinal study (survey) of finance professionals (N=285); mediation analysis testing AI usage -> technostressors -> emotional exhaustion -> work disengagement, based on SOR framework.
high negative Autonomous enhancement or emotional depletion? The dual-path... work disengagement behavior (mediated by emotional exhaustion from technostresso...
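The study's serial mediation chain (AI usage → technostressors → emotional exhaustion → disengagement) can be sketched concretely. The sketch below is a toy illustration on simulated data: the coefficients, variable names, and simple OLS-per-path approach are assumptions for exposition, not the paper's measures or estimation procedure (only the sample size N = 285 is taken from the study).

```python
import numpy as np
import statsmodels.api as sm

# Simulate the hypothesized chain with assumed effect sizes.
rng = np.random.default_rng(0)
n = 285
ai_usage = rng.normal(size=n)
technostress = 0.5 * ai_usage + rng.normal(size=n)
exhaustion = 0.6 * technostress + rng.normal(size=n)
disengagement = 0.4 * exhaustion + rng.normal(size=n)

def slope(y, *xs):
    """Coefficient of the first regressor in an OLS of y on xs."""
    X = sm.add_constant(np.column_stack(xs))
    return sm.OLS(y, X).fit().params[1]

a = slope(technostress, ai_usage)                             # X -> M1
b = slope(exhaustion, technostress, ai_usage)                 # M1 -> M2 | X
c = slope(disengagement, exhaustion, technostress, ai_usage)  # M2 -> Y | M1, X
print(f"serial indirect effect a*b*c = {a * b * c:.3f}")
```

The product a·b·c is the serial indirect effect; a full mediation analysis would typically bootstrap a confidence interval around it rather than report the point estimate alone.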
There is a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence.
Conceptual/theoretical claim derived from the framework and discussion in the paper (argument and mathematical framing), no empirical sample or longitudinal data presented in the excerpt.
high negative Cognitive Amplification vs Cognitive Delegation in Human-AI ... long-term human cognitive competence
This result directly contradicts classical scaling laws, which assume monotonic capability gains with model scale.
Comparative theoretical claim in the paper contrasting the Institutional Scaling Law with classical empirical/theoretical scaling laws in ML literature.
high negative Punctuated Equilibria in Artificial Intelligence: The Instit... relationship between model scale and deployment-relevant fitness/capability
The Institutional Scaling Law shows that institutional fitness is non-monotonic in model scale.
Formal mathematical derivation/proof presented in the paper (the 'Institutional Scaling Law').
high negative Punctuated Equilibria in Artificial Intelligence: The Instit... institutional fitness as a function of model scale
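For intuition, a stylized fitness function shows how non-monotonicity in scale can arise; the functional form below is an illustrative assumption, not the paper's derivation.

```latex
% Illustrative only: F = institutional fitness, s = model scale.
F(s) = \underbrace{\alpha \log s}_{\text{capability}}
     \;-\; \underbrace{\beta s^{\gamma}}_{\text{institutional adjustment cost}},
\qquad \alpha, \beta, \gamma > 0.
```

Here F'(s) = α/s − βγs^(γ−1) vanishes at s* = (α/(βγ))^(1/γ), so fitness rises up to s* and declines beyond it, even though raw capability keeps growing in s.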
AI development proceeds not through smooth advancement but through extended periods of stasis interrupted by rapid phase transitions that reorganize the competitive landscape (punctuated equilibrium pattern).
Argument based on punctuated equilibrium theory from evolutionary biology and historical analysis presented in the paper identifying discrete transitions in AI history; the paper cites and classifies eras/events as evidence.
high negative Punctuated Equilibria in Artificial Intelligence: The Instit... pattern of AI development (stasis vs. phase transitions)
Rather than broad job losses, evidence points to a reallocation at the entry level: AI automates tasks typically assigned to junior staff, shifting the nature of entry-level roles.
Synthesis of firm- and task-level empirical studies reported in the brief documenting automation of routine/junior tasks and changes in job-task composition; specific sample sizes vary by cited study and are not provided in the brief.
high negative AI, Productivity, and Labor Markets: A Review of the Empiric... automation of entry-level/junior tasks and changes to entry-level job content
Exclusion-based cohesion can simultaneously produce state-contingent illusory precision, effective input concentration, and dynamic lock-in; these phenomena co-occur under the model's parameter regimes.
Analytical model results showing co-occurrence of multiple adverse phenomena (bias that grows in tails, illusory precision, input concentration, lock-in) under the same exclusion mechanisms; derived within the paper's theoretical framework.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... co-occurrence of multiple adverse outcomes: tail bias, observed disagreement, ef...
When the anchor belief is updated from internally filtered aggregates, the system can exhibit dynamic lock-in: delayed recognition of regime shifts followed by abrupt correction.
Analytical dynamics studied in the model when anchor updates depend on filtered (excluded) aggregates; derivations demonstrate delayed detection and abrupt adjustments. This is a theoretical/dynamical model result, no empirical data.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... delay in regime recognition and magnitude/timing of corrective update
Exclusion leads to effective concentration of decision inputs: the effective number of independent inputs falls below the nominal participant count.
Model-derived analytic result showing that report shrinkage and discarding reduce effective information contributions, quantified relative to nominal participation in the theoretical framework. No empirical sample.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... effective number of independent decision inputs (information concentration)
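One standard way to quantify an "effective number of independent inputs" is Kish's effective sample size for weighted contributions; using it here is an assumption, since the paper's own measure may differ.

```latex
N_{\mathrm{eff}} \;=\; \frac{\left(\sum_{i=1}^{N} w_i\right)^{2}}{\sum_{i=1}^{N} w_i^{2}} \;\le\; N.
```

Equality requires all weights to be equal; shrinking some reports toward the anchor and discarding others (w_i = 0) skews the weights and pushes N_eff below the nominal participant count N.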
Exclusion-based cohesion induces 'illusory precision': observed disagreement can fall while actual estimation error in tail regimes rises (i.e., lower recorded variance despite higher true error).
Theoretical result derived from the signal-aggregation model showing a regime in which filtered reports reduce observed variance even as tail-regime estimation error increases. No empirical validation provided.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... observed disagreement (reported variance) versus true estimation error in tail r...
Relative to a full-inclusion benchmark, exclusion-based cohesion produces state-contingent bias that is small in normal regimes but grows sharply under regime displacement (tail events).
Analytical comparisons between the exclusion model and a full-inclusion benchmark within the theoretical model; derivations showing bias as a function of regime and exclusion parameters. The result is from model analysis, not empirical data.
high negative Cohesion as Concentration: Exclusion-Driven Fragility in Fin... estimation bias (especially under regime displacement/tail events)
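A minimal simulation reproduces the flavor of these model results. The Gaussian signal model, cutoff filter, and parameter values below are assumptions chosen for illustration, not the paper's specification: filtering reports far from the anchor lowers observed variance in every regime (illusory precision), while bias stays small near the anchor but blows up when the true state is displaced (tail events).

```python
import numpy as np

# Toy signal-aggregation sketch (all parameters are illustrative).
rng = np.random.default_rng(0)

def trial(theta, n=200, sigma=1.0, cutoff=None, anchor=0.0):
    """Mean bias and observed variance of (possibly filtered) reports."""
    reports = theta + sigma * rng.normal(size=n)
    if cutoff is not None:  # exclusion: drop reports far from the anchor
        reports = reports[np.abs(reports - anchor) < cutoff]
    return reports.mean() - theta, reports.var()

for regime, theta in [("normal regime", 0.2), ("tail regime", 3.0)]:
    full = np.array([trial(theta) for _ in range(500)])
    excl = np.array([trial(theta, cutoff=1.5) for _ in range(500)])
    print(f"{regime:>13}: bias {full[:, 0].mean():+.2f} -> {excl[:, 0].mean():+.2f}, "
          f"observed var {full[:, 1].mean():.2f} -> {excl[:, 1].mean():.2f}")
```

With these settings, exclusion cuts observed variance roughly in half in both regimes, while bias moves from near zero to about −0.1 in the normal regime and to nearly −2 in the tail regime.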
Limitations include possibly limited organizational generalizability due to the single Fortune 500 lab context; ABS results depend on model specification/calibration; and operational definitions of 'resilience' and 'planning cycle' require careful reading.
Authors' reported limitations based on study design: single lab context (n = 23), dependence of ABS on model choices, and nontrivial operational definitions.
high negative The Algorithmic Canvas: On the Autopoietic Redefinition of S... generalizability and robustness of study findings
Some declines (in self-efficacy and meaningfulness) from passive AI use persist after participants return to manual work.
Within-experiment assessment of outcomes after participants returned to manual (no-AI) tasks following the AI-use manipulation in the pre-registered experiment (N = 269); reported persistent reductions in self-efficacy and meaningfulness for the passive condition.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy; perceived meaningfulness (measured post-return to manual work)
Passive use of AI reduces perceived meaningfulness of work.
Pre-registered experiment (N = 269) with self-reported measure of work meaningfulness; passive-copy condition showed lower meaningfulness ratings than No-AI and Active-collaboration conditions.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... perceived meaningfulness of work
Passive use of AI reduces psychological ownership of the produced outputs.
Same pre-registered experiment (N = 269). Participants in the passive-copy AI condition reported lower psychological ownership of their outputs (self-report scales) relative to No-AI and Active-collaboration conditions.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... psychological ownership of outputs
Passive use of AI (copying AI-generated output) reduces workers' self-efficacy.
Pre-registered between-subjects experiment (N = 269) using occupation-specific writing tasks. Participants assigned to a passive-copy AI condition reported lower self-efficacy (self-reported confidence to complete tasks without AI) compared to the No-AI (manual) and Active-collaboration conditions.
high negative Relying on AI at work reduces self-efficacy, ownership, and ... self-efficacy (confidence to complete tasks without AI)
The current literature is skewed toward descriptive and engineering work; there is a lack of causal, field-experimental evidence on NLP interventions' effects on customer behavior and firm profits.
Review coding of study types in the sample (engineering/descriptive vs. experimental/causal) showing few field experiments or causal designs.
high negative Natural language processing in bank marketing: a systematic ... presence vs. absence of causal/experimental studies measuring effects on custome...
Important gaps include customer acquisition, personalization at scale, use of external text sources (social media, news, reviews), operational process improvement, and cross-channel integration.
Gap detection via low-density regions in the UMAP thematic map of sentence-transformer embeddings and manual review showing low article counts for these topics within the 109-article sample.
high negative Natural language processing in bank marketing: a systematic ... topical coverage by customer journey stage and source type (acquisition, persona...
Existing literature on NLP in marketing is concentrated around customer retention tasks (e.g., churn prediction, complaint handling, relationship management).
Thematic clustering from sentence-transformer embeddings of article text combined with UMAP visualization, and manual review of article topics and keywords identifying frequent retention-related themes.
high negative Natural language processing in bank marketing: a systematic ... topical frequency/coverage by customer journey stage (retention)
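For concreteness, here is a sketch of the review's thematic-mapping pipeline (sentence-transformer embeddings, UMAP projection, low-density gap detection). The toy article list, the model name all-MiniLM-L6-v2, the neighbour counts, and the density proxy are all assumptions for illustration; the review's exact settings are not stated in these entries.

```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors
import numpy as np
import umap

# Toy stand-in for the 109 reviewed articles (hypothetical titles).
texts = [
    "churn prediction from bank call-centre transcripts",
    "complaint classification for retail banking",
    "sentiment analysis of customer reviews of banks",
    "chatbot dialogue systems for bank customer service",
    "topic modelling of financial news for marketing",
    "personalised cross-sell messaging with language models",
]

emb = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
xy = umap.UMAP(n_neighbors=3, random_state=0).fit_transform(emb)

# Density proxy: mean distance to the nearest neighbours on the 2-D
# map; articles in sparse regions point at thinly covered themes.
dists, _ = NearestNeighbors(n_neighbors=3).fit(xy).kneighbors(xy)
sparsity = dists[:, 1:].mean(axis=1)  # drop the zero self-distance
print("sparsest (candidate gap):", texts[int(np.argmax(sparsity))])
```

On the real 109-article corpus, the sparse regions of the map would then be inspected manually, as the review describes.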
NLP applications in bank marketing are severely under-studied.
Descriptive result from the PRISMA review showing only 8/109 articles focused on NLP in bank marketing (≈7%), plus thematic mapping showing sparse coverage in bank-marketing/NLP intersection.
high negative Natural language processing in bank marketing: a systematic ... proportion and absolute count of studies at the intersection of NLP and bank mar...
Vietnam's civil-law features—statutory specificity, formal procedures, and constitutional principles like legal certainty and fairness—make straightforward AI deployment legally fraught.
Close textual analysis of Vietnam's statutes, constitutional provisions, and administrative procedures (doctrinal legal analysis); no quantitative sample.
high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... legal compatibility of AI deployment (degree of legal obstacles to deployment)
Automated decisions complicate assigning responsibility and hinder judicial and administrative reviewability.
Doctrinal examination of accountability and review mechanisms in administrative law plus comparative institutional analysis of automated decision-making governance.
high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... clarity of accountability (ability to assign responsibility) and effectiveness o...
Opaque AI models risk violating notice, reason-giving, and appeal rights protected under administrative due process.
Analysis of procedural due-process requirements (notice, reason-giving, appeal) in Vietnam's legal framework and assessment of opacity issues in algorithmic systems; qualitative reasoning, no empirical testing.
high negative ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... compliance with due-process requirements (notice, reasons, appealability)
Frontier language models and human editors do not reliably reproduce the evaluative signal contained in institutional publication records.
Comparison of zero-shot frontier-model average accuracy (31%) and human-panel majority-vote accuracy (42%) versus fine-tuned models (up to 59% and higher in economics), indicating that neither zero-shot frontier models nor the human panels matched fine-tuned performance on the held-out benchmarks.
high negative Machines acquire scientific taste from institutional traces Relative prediction accuracy on held-out benchmark(s) of research-pitch quality
Eleven frontier language models (proprietary and open) averaged 31% accuracy on a held-out four-tier benchmark of management research pitches (chance ≈25%); this is only marginally above chance.
Zero-shot (or as-provided) evaluation of eleven state-of-the-art language models on the held-out four-tier management pitches benchmark, yielding an average accuracy of 31% versus chance ≈25%. (Exact list of models and number of benchmark examples not provided in the supplied text.)
high negative Machines acquire scientific taste from institutional traces Accuracy on the four-tier management research-pitch benchmark
Cooperation with the AI plateaus and never reaches the near-complete cooperation levels observed in human–human interactions.
Time-series/trajectory analysis of cooperation rates in the lab human–AI experiment (n = 126) compared to the human–human benchmark (n = 108); reported convergence/end-state cooperation levels show AI condition asymptotes below the human–human condition.
high negative Playing Against the Machine: Cooperation, Communication, and... cooperation rate over time and asymptotic/end-state cooperation level
A single malicious or compromised LLM agent with high stubbornness and persuasive power can trigger a persuasion cascade that steers the collective opinion of a multi-agent LLM system (LLM-MAS).
Theoretical analysis using the Friedkin–Johnsen (FJ) opinion-formation model (analysis of fixed points and influence propagation) plus simulation experiments mapping LLM-MAS interactions to FJ dynamics across multiple network topologies and attacker profiles. (Paper reports simulation results but does not provide exact sample sizes in the provided summary.)
high negative Don't Trust Stubborn Neighbors: A Security Framework for Age... extent of adversarial sway / shift in collective opinion (final consensus and op...
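The Friedkin–Johnsen dynamics named above are straightforward to sketch. The network size, influence weights, and stubbornness values below are illustrative assumptions, not the paper's experimental configuration: a single fully stubborn, highly weighted agent drags the group's fixed-point opinions toward its own.

```python
import numpy as np

# Friedkin-Johnsen opinion dynamics: x(t+1) = S @ W @ x(t) + (I - S) @ x(0),
# where S = diag(susceptibility). A fully stubborn agent
# (susceptibility 0) never moves from its initial opinion.
n = 10
rng = np.random.default_rng(1)

W = np.ones((n, n))                 # everyone listens to everyone...
W[:, 0] = 5.0                       # ...but agent 0 is extra persuasive
W /= W.sum(axis=1, keepdims=True)   # row-stochastic influence matrix

susceptibility = np.full(n, 0.9)
susceptibility[0] = 0.0             # the adversary is fully stubborn
S = np.diag(susceptibility)

x0 = rng.uniform(0.4, 0.6, size=n)  # benign initial opinions
x0[0] = 1.0                         # adversary's malicious position
x = x0.copy()
for _ in range(200):                # iterate to the fixed point
    x = S @ W @ x + (np.eye(n) - S) @ x0

print(f"benign agents: initial mean {x0[1:].mean():.2f}, "
      f"final mean {x[1:].mean():.2f}")  # pulled toward 1.0
```

With these settings the benign agents' average fixed-point opinion moves from about 0.5 to roughly 0.9, illustrating the persuasion-cascade mechanism.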
The empirical validation is performed only on synthetic text-preference data rather than real-world user populations, so field deployment effects and richer preference models remain to be tested.
Experiments section states synthetic dataset for text preferences and notes absence of field experiments on real user populations.
high negative Finding Common Ground in a Sea of Alternatives scope of empirical validation (synthetic dataset vs. real-world data)
The theoretical results (algorithms and sample-complexity bounds) assume truthful, exogenous preferences and simple sampling access; strategic behavior or costly reporting could change the information requirements.
Modeling assumptions explicitly stated in the paper (sampling access to truthful preferences) and discussion in the implications/limitations section noting the need to consider strategic behavior and reporting costs.
high negative Finding Common Ground in a Sea of Alternatives applicability limitations given model assumptions (truthful sampling access vs. ...
Matching information-theoretic lower bounds are proved, establishing that no algorithm can guarantee finding an (approximate) proportional veto-core element with fewer queries than the stated bounds (i.e., the sample complexity is optimal).
Lower-bound proofs in the theoretical section of the paper showing impossibility results that match the upper-bound rates.
high negative Finding Common Ground in a Sea of Alternatives information-theoretic lower bound on sample/query complexity (optimality claim)
Static ACLs evaluate deterministic rules that ignore partial execution paths and therefore can only capture a subset of organizational constraints.
Formal argument and examples showing static ACLs map to Policy functions that do not depend on partial_path; illustrative limitations presented.
high negative Runtime Governance for AI Agents: Policies on Paths coverage of organizational constraints by static ACLs (proportion of constraints...
Runtime evaluation imposes additional compute, latency, logging, and engineering costs that increase the marginal cost of deploying agents.
Operational discussion in the paper outlining additional runtime compute and logging requirements; cost implications argued qualitatively; no empirical cost measurements provided.
high negative Runtime Governance for AI Agents: Policies on Paths marginal deployment cost (compute/latency/engineering overhead)
Prompt-level instructions and static access control lists (ACLs) are limited special cases of a more general runtime policy-evaluation framework and cannot, in general, enforce path-dependent rules.
Formalization showing prompt/system messages and static ACLs map to restricted forms of the Policy(agent_id, partial_path, proposed_action, org_state) function; logical proof/argument in the paper and illustrative counterexamples.
high negative Runtime Governance for AI Agents: Policies on Paths ability to detect/enforce path-dependent policy violations (yes/no / coverage of...
LLM-based agent behavior is non-deterministic and path-dependent: an agent's safety/compliance risk depends on the entire execution path, not just the current prompt or single action.
Formal/abstract execution model defined in the paper (states, actions, execution paths) and conceptual arguments/illustrative examples showing how earlier states/actions affect later behavior; no large-scale empirical dataset reported.
high negative Runtime Governance for AI Agents: Policies on Paths path-dependent compliance/safety risk (probability of policy violation condition...
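The Policy(agent_id, partial_path, proposed_action, org_state) signature named in the entries above can be sketched directly. The concrete rules below are hypothetical illustrations, not rules from the paper: the first is path-independent and expressible as a static ACL; the second depends on the execution path and therefore is not.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str    # e.g. "read", "send_external", "delete_record"
    target: str  # e.g. "untrusted_web", "internal_db"

def policy(agent_id: str, partial_path: list[Action],
           proposed_action: Action, org_state: dict) -> bool:
    """Return True iff the proposed action is allowed (hypothetical rules)."""
    # Path-independent rule -- a static ACL can express this:
    if (proposed_action.name == "delete_record"
            and agent_id not in org_state["admins"]):
        return False
    # Path-dependent rule -- a static ACL cannot: once this agent has
    # read untrusted external content anywhere on its execution path,
    # it may no longer send data outside the organization.
    tainted = any(a.name == "read" and a.target == "untrusted_web"
                  for a in partial_path)
    if tainted and proposed_action.name == "send_external":
        return False
    return True

# Same proposed action, different verdicts depending on the path:
state = {"admins": {"agent-7"}}
clean = [Action("read", "internal_db")]
dirty = [Action("read", "untrusted_web")]
print(policy("agent-3", clean, Action("send_external", "partner_api"), state))  # True
print(policy("agent-3", dirty, Action("send_external", "partner_api"), state))  # False
```

The second rule is exactly the kind of constraint the entries above argue static ACLs cannot capture, since its verdict flips with the execution history rather than with the action alone.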
Real-world deployment will require representative data coverage and online adaptation despite the method’s robustness mechanisms.
Authors' discussion/limitations section: theoretical requirements for persistently exciting/representative trajectories for DeePC and recommendation for online adaptation and continual data collection for deployment.
high negative Data-driven generalized perimeter control: Zürich case study data representativeness and need for online adaptation (deployment readiness/ris...
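The "persistently exciting/representative trajectories" requirement referenced here is, in the standard DeePC setting, a rank condition on the recorded data (Willems' fundamental lemma); the case study's exact formulation may differ. For an m-channel input u recorded over T samples, the depth-L Hankel matrix must have full row rank:

```latex
\mathcal{H}_L(u) =
\begin{pmatrix}
u_1 & u_2 & \cdots & u_{T-L+1}\\
u_2 & u_3 & \cdots & u_{T-L+2}\\
\vdots & \vdots & & \vdots\\
u_L & u_{L+1} & \cdots & u_T
\end{pmatrix},
\qquad
\operatorname{rank}\,\mathcal{H}_L(u) = mL.
```

Field data that never explores some operating regimes cannot satisfy this condition at the relevant depth, which is why the authors recommend continual data collection and online adaptation for deployment.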
Proactive AI at national scale amplifies concerns around transparency, accountability, privacy, and potential misuse, necessitating robust regulatory and ethical frameworks.
Normative and ethical analysis in the paper, supported by general literature on large-scale AI governance; no empirical assessment of regulatory effectiveness in Russia included.
high negative DIGITAL TRANSFORMATION OF THE RUSSIAN FEDERATION’S SOCIOECON... risks to transparency, accountability, privacy and potential for misuse
Regulators and payers remain central bottlenecks—AI can accelerate discovery but cannot bypass clinical evidence requirements.
Policy discussion and regulatory analysis in the paper noting that approvals require clinical evidence independent of discovery modality.
high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... regulatory and payer requirements as constraints on the impact of AI-driven disc...
Downstream clinical development costs and translational failure rates remain the major drivers of total R&D expenditure; early-stage AI savings may not translate into proportionate increases in approved drugs.
Economic analysis and discussion in the paper referencing known cost distributions in drug development and historical attrition rates in clinical phases.
high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... contribution of clinical development costs and failure rates to total R&D expend...
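A back-of-the-envelope calculation makes the dilution concrete; the numbers are illustrative assumptions, not figures from the paper. Suppose discovery and preclinical work carry a share w = 0.3 of total R&D cost and AI halves that stage's cost (d = 0.5). Then total cost falls by only

```latex
\Delta_{\text{total}} \;=\; w \times d \;=\; 0.3 \times 0.5 \;=\; 0.15,
```

i.e. about 15%, and if clinical-phase costs and attrition rates are unchanged, the cost per approved drug falls by the same modest fraction.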
Inherent biological complexity and translational gaps between in silico predictions, preclinical models, and human biology constrain downstream success rates.
Review of translational failures and literature cited in the paper demonstrating mismatch between preclinical signals and clinical outcomes; conceptual analysis of biological complexity.
high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... translational success rate from preclinical predictions to clinical efficacy
Gaps exist between computational designs and chemical/experimental feasibility (e.g., synthetic accessibility and assay readiness), limiting the usefulness of some generative outputs.
Case studies and critiques in the paper showing generated molecules that are synthetically infeasible or incompatible with experimental constraints; discussion of missing integration of practical constraints in many generative models.
high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... fraction of computationally designed molecules that are synthetically accessible...
Many models have limited interpretability and insufficient uncertainty quantification, hampering trust and decision-making.
Methodological analysis in the paper noting common deep-learning approaches lacking clear interpretability and uncertainty estimates; references to literature on model explainability and calibration gaps.
high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... degree of model interpretability and presence/quality of uncertainty quantificat...
Poor data quality, fragmentation, and limited accessibility reduce model reliability and generalizability.
Survey of data characteristics and limitations presented in the paper; examples of biased or sparse datasets and the paper's discussion of impacts on model performance and transferability.
high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... model reliability/generalizability as a function of data quality, coverage, and ...
AI remains an augmenting technology rather than a standalone solution: no entirely AI-originated drug has yet achieved regulatory approval.
Review of drug-approval records and company disclosures summarized in the paper; explicit statement that to date no entirely AI-originated molecule has received full regulatory approval.
high negative Has AI Reshaped Drug Discovery, or Is There Still a Long Way... regulatory approval status of AI-originated drug candidates (number of approvals...