Evidence (2340 claims)

Claims by category:

- Adoption: 5267
- Productivity: 4560
- Governance: 4137
- Human-AI Collaboration: 3103
- Labor Markets: 2506
- Innovation: 2354
- Org Design: 2340
- Skills & Training: 1945
- Inequality: 1322
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Org Design
Without AI support, negotiation performance stays stable up to three issues but declines as additional issues increase cognitive load.
Empirical study / human-AI negotiation case study in a property rental scenario that varied the number of negotiated issues; the paper reports observed performance across different numbers of issues (no sample size for this specific comparison stated in the abstract).
Reliance on automated content generation introduces risks of cognitive overreliance, algorithmic bias, and strategic misalignment.
The paper articulates these risks as conceptual/qualitative concerns in its discussion; no quantitative estimates or empirical tests of these specific risks are reported in the provided excerpt.
Wide disagreement among AIs created confusion and undermined appropriate reliance on advice.
Reported experimental finding from the paper: manipulating within-panel disagreement across tasks produced wide disagreement conditions that, according to the abstract, led to confusion and reduced appropriate reliance. No quantitative metrics reported in abstract.
High within-panel consensus fostered overreliance on AI advice.
Experimental manipulation of within-panel consensus across the three tasks; the abstract reports that high consensus increased participants' reliance on AI (interpreted as overreliance). Specific measures and sample size not provided in abstract.
Current (pay-upfront) models impose a financial barrier to entry for developers, limiting innovation and excluding actors from emerging economies.
Analytical argument in the paper based on cost-structure reasoning and literature on barriers to entry; no empirical sample or causal estimate provided.
Developers and experts still lack a shared view, resulting in repeated coordination, clarification rounds, and error-prone handoffs.
Observational/qualitative claim in the paper describing current MSD practice (no numeric sample reported).
Even with AI coding assistants like GitHub Copilot, individual coding tasks are semi-automated, but the workflow connecting domain knowledge to implementation is not.
Qualitative observation and comparative statement in the paper (no empirical sample reported).
Multidisciplinary Software Development (MSD) requires domain experts and developers to collaborate across incompatible formalisms and separate artifact sets.
Conceptual argument in the paper framing the problem (no empirical sample reported).
Only 12% of AI market value maps to physical activities.
Descriptive aggregate: authors categorize and report that 12% of estimated AI market value maps to physical activities.
Significant limitations emerged in case law citations, with most cited cases being non-existent or incorrectly referenced.
Authors' review of the case citations produced by the four AI engines for the single transcript, finding many citations were fabricated or misreferenced.
Traditional paradigms, specifically the resource-based view and the dynamic capabilities framework, operate under closed-system, first-order cybernetic assumptions that fail to capture the dissipative nature of algorithmic agents.
Conceptual critique presented in the paper's theoretical argumentation (literature critique and re-framing); no empirical sample reported.
AI usage predicts work disengagement behavior via emotional exhaustion elicited by AI-associated technostressors.
Four-stage longitudinal study (survey) of finance professionals (N=285); mediation analysis testing AI usage -> technostressors -> emotional exhaustion -> work disengagement, based on SOR framework.
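For readers unfamiliar with serial mediation, a minimal sketch of the analysis pattern follows, run on simulated data. The variable names, effect sizes, and the use of plain OLS path estimation are illustrative assumptions, not the study's actual modeling choices.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate the hypothesized chain with arbitrary effect sizes.
rng = np.random.default_rng(0)
n = 285  # matches the study's reported sample size
ai_use = rng.normal(size=n)
technostress = 0.5 * ai_use + rng.normal(size=n)
exhaustion = 0.6 * technostress + rng.normal(size=n)
disengagement = 0.4 * exhaustion + rng.normal(size=n)
df = pd.DataFrame({"ai_use": ai_use, "technostress": technostress,
                   "exhaustion": exhaustion, "disengagement": disengagement})

# Serial mediation as a product of path coefficients (a * b * c),
# each estimated controlling for the earlier variables in the chain.
a = smf.ols("technostress ~ ai_use", df).fit().params["ai_use"]
b = smf.ols("exhaustion ~ technostress + ai_use", df).fit().params["technostress"]
c = smf.ols("disengagement ~ exhaustion + technostress + ai_use",
            df).fit().params["exhaustion"]
print(f"indirect effect of AI use on disengagement: {a * b * c:.3f}")
```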
There is a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence.
Conceptual/theoretical claim derived from the framework and discussion in the paper (argument and mathematical framing), no empirical sample or longitudinal data presented in the excerpt.
This result directly contradicts classical scaling laws, which assume monotonic capability gains with model scale.
Comparative theoretical claim in the paper contrasting the Institutional Scaling Law with classical empirical/theoretical scaling laws in ML literature.
The Institutional Scaling Law establishes that institutional fitness is non-monotonic in model scale.
Formal mathematical derivation/proof presented in the paper (the 'Institutional Scaling Law').
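The paper's actual functional form is not given in the excerpt, but one stylized way a non-monotonic scaling law can arise is to let fitness be concave capability gains net of convex integration costs. All symbols below are illustrative assumptions, not the paper's notation or derivation.

```latex
% Stylized illustration (not the paper's actual derivation):
% fitness = concave capability gain minus convex integration cost.
F(s) = \underbrace{\alpha \log(1+s)}_{\text{capability gain}}
     - \underbrace{\beta s^{\gamma}}_{\text{integration cost}},
\qquad \alpha,\beta > 0,\ \gamma > 1.
% F'(s) = \alpha/(1+s) - \beta\gamma s^{\gamma-1} is positive for small s
% and negative for large s, so F peaks at an interior scale s^* and
% declines beyond it: fitness is non-monotonic in model scale.
```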
AI development proceeds not through smooth advancement but through extended periods of stasis interrupted by rapid phase transitions that reorganize the competitive landscape (punctuated equilibrium pattern).
Argument based on punctuated equilibrium theory from evolutionary biology and historical analysis presented in the paper identifying discrete transitions in AI history; the paper cites and classifies eras/events as evidence.
Rather than broad job losses, evidence points to a reallocation at the entry level: AI automates tasks typically assigned to junior staff, shifting the nature of entry-level roles.
Synthesis of firm- and task-level empirical studies reported in the brief documenting automation of routine/junior tasks and changes in job-task composition; specific sample sizes vary by cited study and are not provided in the brief.
Exclusion-based cohesion can simultaneously produce state-contingent illusory precision, effective input concentration, and dynamic lock-in; these phenomena co-occur under the model's parameter regimes.
Analytical model results showing co-occurrence of multiple adverse phenomena (bias that grows in tails, illusory precision, input concentration, lock-in) under the same exclusion mechanisms; derived within the paper's theoretical framework.
When the anchor belief is updated from internally filtered aggregates, the system can exhibit dynamic lock-in: delayed recognition of regime shifts followed by abrupt correction.
Analytical dynamics studied in the model when anchor updates depend on filtered (excluded) aggregates; derivations demonstrate delayed detection and abrupt adjustments. This is a theoretical/dynamical model result, no empirical data.
Exclusion leads to effective concentration of decision inputs: the effective number of independent inputs falls below the nominal participant count.
Model-derived analytic result showing that report shrinkage and discarding reduce effective information contributions, quantified relative to nominal participation in the theoretical framework. No empirical sample.
Exclusion-based cohesion induces 'illusory precision': observed disagreement can fall while actual estimation error in tail regimes rises (i.e., lower recorded variance despite higher true error).
Theoretical result derived from the signal-aggregation model showing a regime in which filtered reports reduce observed variance even as tail-regime estimation error increases. No empirical validation provided.
Relative to a full-inclusion benchmark, exclusion-based cohesion produces state-contingent bias that is small in normal regimes but grows sharply under regime displacement (tail events).
Analytical comparisons between the exclusion model and a full-inclusion benchmark within the theoretical model; derivations showing bias as a function of regime and exclusion parameters. The result is from model analysis, not empirical data.
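The model itself is analytic, but the mechanism described in the entries above can be reproduced with a few lines of simulation. The filtering rule, noise level, and tolerance below are assumptions for illustration, not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate(theta, anchor, n=50, noise=1.0, tol=1.5):
    """Agents observe theta with noise; reports farther than tol from the
    anchor are excluded before averaging (the exclusion mechanism)."""
    reports = theta + rng.normal(0.0, noise, n)
    kept = reports[np.abs(reports - anchor) < tol]
    if kept.size == 0:
        return anchor, 0.0, 0  # nothing survives filtering: fall back to anchor
    return kept.mean(), kept.var(), kept.size

anchor = 0.0
for regime, theta in [("normal regime", 0.2), ("tail regime  ", 3.0)]:
    sq_err, disp, n_eff = [], [], []
    for _ in range(2000):
        est, v, k = aggregate(theta, anchor)
        sq_err.append((est - theta) ** 2)
        disp.append(v)
        n_eff.append(k)
    print(f"{regime}: MSE={np.mean(sq_err):.2f}  "
          f"observed variance={np.mean(disp):.2f}  kept={np.mean(n_eff):.1f}/50")
```

In the tail regime the filter discards most reports (effective input concentration), the surviving reports cluster near the anchor (lower observed variance, i.e., illusory precision), and the aggregate estimate is pulled far from the true state (state-contingent bias).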
Limitations include possibly limited organizational generalizability due to the single Fortune 500 lab context, ABS results that depend on model specification and calibration, and operational definitions of 'resilience' and 'planning cycle' that require careful reading.
Authors' reported limitations based on study design: single lab context (n = 23), dependence of ABS on model choices, and nontrivial operational definitions.
Some declines (in self-efficacy and meaningfulness) from passive AI use persist after participants return to manual work.
Within-experiment assessment of outcomes after participants returned to manual (no-AI) tasks following the AI-use manipulation in the pre-registered experiment (N = 269); reported persistent reductions in self-efficacy and meaningfulness for the passive condition.
Passive use of AI reduces perceived meaningfulness of work.
Pre-registered experiment (N = 269) with self-reported measure of work meaningfulness; passive-copy condition showed lower meaningfulness ratings than No-AI and Active-collaboration conditions.
Passive use of AI reduces psychological ownership of the produced outputs.
Same pre-registered experiment (N = 269). Participants in the passive-copy AI condition reported lower psychological ownership of their outputs (self-report scales) relative to No-AI and Active-collaboration conditions.
Passive use of AI (copying AI-generated output) reduces workers' self-efficacy.
Pre-registered between-subjects experiment (N = 269) using occupation-specific writing tasks. Participants assigned to a passive-copy AI condition reported lower self-efficacy (self-reported confidence to complete tasks without AI) compared to the No-AI (manual) and Active-collaboration conditions.
The current literature is skewed toward descriptive and engineering work; there is a lack of causal, field-experimental evidence on NLP interventions' effects on customer behavior and firm profits.
Review coding of study types in the sample (engineering/descriptive vs. experimental/causal) showing few field experiments or causal designs.
Important gaps include customer acquisition, personalization at scale, use of external text sources (social media, news, reviews), operational process improvement, and cross-channel integration.
Gap detection via low-density regions in the UMAP thematic map of sentence-transformer embeddings and manual review showing low article counts for these topics within the 109-article sample.
Existing literature on NLP in marketing is concentrated around customer retention tasks (e.g., churn prediction, complaint handling, relationship management).
Thematic clustering from sentence-transformer embeddings of article text combined with UMAP visualization, and manual review of article topics and keywords identifying frequent retention-related themes.
NLP applications in bank marketing are severely under-studied.
Descriptive result from the PRISMA review showing only 8/109 articles focused on NLP in bank marketing (≈7%), plus thematic mapping showing sparse coverage at the bank-marketing/NLP intersection.
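As a sketch of the embedding-plus-UMAP thematic mapping the review describes, the snippet below embeds a toy corpus and projects it to 2-D. The model name, UMAP parameters, and toy abstracts are assumptions; the actual review worked over the 109 full articles.

```python
# pip install sentence-transformers umap-learn
from sentence_transformers import SentenceTransformer
import umap

# Toy stand-ins for the 109 article abstracts.
abstracts = [
    "churn prediction from call-center transcripts",
    "complaint classification for relationship management",
    "sentiment analysis of product reviews for retention",
    "topic models of support chats to reduce attrition",
    "chatbot dialogue policies for loyalty programs",
    "news-based signals for customer acquisition",   # sparse theme
    "cross-channel campaign text personalization",   # sparse theme
    "NLP for bank marketing risk disclosures",       # sparse theme
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = model.encode(abstracts, normalize_embeddings=True)

# Project to 2-D; dense clusters mark well-covered themes (retention),
# while low-density regions flag candidate gaps for manual review.
coords = umap.UMAP(n_neighbors=4, min_dist=0.1, metric="cosine",
                   random_state=42).fit_transform(embeddings)
for text, (x, y) in zip(abstracts, coords):
    print(f"({x:5.2f}, {y:5.2f})  {text}")
```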
Vietnam's civil-law features—statutory specificity, formal procedures, and constitutional principles like legal certainty and fairness—make straightforward AI deployment legally fraught.
Close textual analysis of Vietnam's statutes, constitutional provisions, and administrative procedures (doctrinal legal analysis); no quantitative sample.
Automated decisions complicate assigning responsibility and hinder judicial and administrative reviewability.
Doctrinal examination of accountability and review mechanisms in administrative law plus comparative institutional analysis of automated decision-making governance.
Opaque AI models risk violating notice, reason-giving, and appeal rights protected under administrative due process.
Analysis of procedural due-process requirements (notice, reason-giving, appeal) in Vietnam's legal framework and assessment of opacity issues in algorithmic systems; qualitative reasoning, no empirical testing.
Frontier language models and human editors do not reliably reproduce the evaluative signal contained in institutional publication records.
Comparison of zero-shot frontier-model average accuracy (31%) and human-panel majority-vote accuracy (42%) versus fine-tuned models (up to 59% and higher in economics), indicating that neither zero-shot frontier models nor the human panels matched fine-tuned performance on the held-out benchmarks.
Eleven frontier language models (proprietary and open) averaged 31% accuracy on a held-out four-tier benchmark of management research pitches (chance ≈25%); this is only marginally above chance.
Zero-shot (or as-provided) evaluation of eleven state-of-the-art language models on the held-out four-tier management pitches benchmark, yielding an average accuracy of 31% versus chance ≈25%. (Exact list of models and number of benchmark examples not provided in the supplied text.)
Cooperation with the AI plateaus and never reaches the near-complete cooperation levels observed in human–human interactions.
Time-series/trajectory analysis of cooperation rates in the lab human–AI experiment (n = 126) compared to the human–human benchmark (n = 108); reported convergence/end-state cooperation levels show AI condition asymptotes below the human–human condition.
A single malicious or compromised LLM agent with high stubbornness and persuasive power can trigger a persuasion cascade that steers the collective opinion of a multi-agent LLM system (MAS).
Theoretical analysis using the Friedkin–Johnsen (FJ) opinion-formation model (analysis of fixed points and influence propagation) plus simulation experiments mapping LLM-MAS interactions to FJ dynamics across multiple network topologies and attacker profiles. (Paper reports simulation results but does not provide exact sample sizes in the provided summary.)
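The Friedkin–Johnsen update rule is standard, so the cascade mechanism can be illustrated directly. Network size, influence weights, and stubbornness values below are illustrative assumptions, not the paper's experimental configuration.

```python
import numpy as np

# Friedkin-Johnsen dynamics: x(t+1) = A @ W @ x(t) + (I - A) @ u, where
# A = diag(susceptibility) with susceptibility = 1 - stubbornness,
# W is a row-stochastic influence matrix, and u holds innate opinions.
n = 10
rng = np.random.default_rng(1)
u = rng.uniform(0.4, 0.6, n)   # honest agents start near 0.5
u[0] = 1.0                     # attacker's fixed target opinion

stubbornness = np.full(n, 0.1)
stubbornness[0] = 1.0          # attacker never updates its opinion
A = np.diag(1.0 - stubbornness)

W = np.ones((n, n))
W[:, 0] = 4.0                  # attacker carries 4x the persuasive weight
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)  # row-normalize

x = u.copy()
for _ in range(200):           # iterate to (near) the fixed point
    x = A @ W @ x + (np.eye(n) - A) @ u

print(f"attacker: {x[0]:.2f}  honest mean: {x[1:].mean():.2f} (started ~0.50)")
```

A single fully stubborn, over-weighted node drags the honest agents' consensus most of the way toward the attacker's opinion, which is the qualitative cascade the paper analyzes.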
The empirical validation is performed only on synthetic text-preference data rather than real-world user populations, so field deployment effects and richer preference models remain to be tested.
The experiments section states that the text-preference dataset is synthetic and notes the absence of field experiments on real user populations.
The theoretical results (algorithms and sample-complexity bounds) assume truthful, exogenous preferences and simple sampling access; strategic behavior or costly reporting could change the information requirements.
Modeling assumptions explicitly stated in the paper (sampling access to truthful preferences) and discussion in the implications/limitations section noting the need to consider strategic behavior and reporting costs.
Matching information-theoretic lower bounds are proved, establishing that no algorithm can guarantee finding an (approximate) proportional veto-core element with fewer queries than the stated bounds (i.e., the sample complexity is optimal).
Lower-bound proofs in the theoretical section of the paper showing impossibility results that match the upper-bound rates.
Static ACLs evaluate deterministic rules that ignore partial execution paths and therefore can only capture a subset of organizational constraints.
Formal argument and examples showing static ACLs map to Policy functions that do not depend on partial_path; illustrative limitations presented.
Runtime evaluation imposes additional compute, latency, logging, and engineering costs that increase the marginal cost of deploying agents.
Operational discussion in the paper outlining additional runtime compute and logging requirements; cost implications argued qualitatively; no empirical cost measurements provided.
Prompt-level instructions and static access control lists (ACLs) are limited special cases of a more general runtime policy-evaluation framework and cannot, in general, enforce path-dependent rules.
Formalization showing prompt/system messages and static ACLs map to restricted forms of the Policy(agent_id, partial_path, proposed_action, org_state) function; logical proof/argument in the paper and illustrative counterexamples.
LLM-based agent behavior is non-deterministic and path-dependent: an agent's safety/compliance risk depends on the entire execution path, not just the current prompt or single action.
Formal/abstract execution model defined in the paper (states, actions, execution paths) and conceptual arguments/illustrative examples showing how earlier states/actions affect later behavior; no large-scale empirical dataset reported.
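Below is a minimal sketch of the Policy(agent_id, partial_path, proposed_action, org_state) interface named above, with one path-dependent rule that no static ACL can express. The types and the specific rule are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    resource: str

@dataclass
class OrgState:
    sensitive_resources: set = field(default_factory=set)

def policy(agent_id: str, partial_path: list, proposed_action: Action,
           org_state: OrgState) -> bool:
    """Deny external sends after any read of a sensitive resource.

    A static ACL can grant or deny 'send_external' outright, but it cannot
    condition the decision on what the agent read earlier in this
    execution path."""
    touched_sensitive = any(
        a.name == "read" and a.resource in org_state.sensitive_resources
        for a in partial_path
    )
    if proposed_action.name == "send_external" and touched_sensitive:
        return False
    return True

state = OrgState(sensitive_resources={"payroll_db"})
path = [Action("read", "payroll_db")]
print(policy("agent-7", path, Action("send_external", "email"), state))  # False
print(policy("agent-7", [], Action("send_external", "email"), state))    # True
```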
Real-world deployment will require representative data coverage and online adaptation despite the method’s robustness mechanisms.
Authors' discussion/limitations section: theoretical requirements for persistently exciting/representative trajectories for DeePC and recommendation for online adaptation and continual data collection for deployment.
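The data-coverage requirement the authors flag comes from the persistency-of-excitation condition that DeePC inherits from Willems' fundamental lemma. A minimal rank check on a scalar input signal follows; the signals and the order L are illustrative assumptions.

```python
import numpy as np

def hankel(u, L):
    """Stack windows of length L from a scalar signal into a Hankel matrix."""
    T = len(u)
    return np.column_stack([u[i:i + L] for i in range(T - L + 1)])

def persistently_exciting(u, L):
    """u is PE of order L iff the depth-L Hankel matrix has full row rank."""
    return np.linalg.matrix_rank(hankel(u, L)) == L

rng = np.random.default_rng(0)
rich = rng.normal(size=100)          # noise-like input: broad coverage
poor = np.sin(0.3 * np.arange(100))  # single sinusoid: narrow coverage

print(persistently_exciting(rich, 10))  # True
print(persistently_exciting(poor, 10))  # False: a pure sinusoid's windows
                                        # span only a rank-2 subspace
```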
Proactive AI at national scale amplifies concerns around transparency, accountability, privacy, and potential misuse, necessitating robust regulatory and ethical frameworks.
Normative and ethical analysis in the paper, supported by general literature on large-scale AI governance; no empirical assessment of regulatory effectiveness in Russia included.
Regulators and payers remain central bottlenecks—AI can accelerate discovery but cannot bypass clinical evidence requirements.
Policy discussion and regulatory analysis in the paper noting that approvals require clinical evidence independent of discovery modality.
Downstream clinical development costs and translational failure rates remain the major drivers of total R&D expenditure; early-stage AI savings may not translate into proportionate increases in approved drugs.
Economic analysis and discussion in the paper referencing known cost distributions in drug development and historical attrition rates in clinical phases.
Inherent biological complexity and translational gaps between in silico predictions, preclinical models, and human biology constrain downstream success rates.
Review of translational failures and literature cited in the paper demonstrating mismatch between preclinical signals and clinical outcomes; conceptual analysis of biological complexity.