Evidence (4175 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Org Design
Dispersed work alters identity construction, belonging, and social cohesion; digital interactions reshape workplace rituals and norms.
Sociological literature synthesis and qualitative case illustrations emphasizing identity and ritual processes; no longitudinal or quantitative measures provided in the paper.
The paper proposes an 'algorithmic workplace' framework emphasising hybrid agency (agents composed of humans plus GenAI), decentralised decision processes, and erosion of rigid managerial boundaries.
Conceptual synthesis derived from thematic mapping, co‑word analysis and interpretive discussion of the mapped literature; framework presented as the article's conceptual contribution.
Passive AI use produced an initial increase in enjoyment/satisfaction that reversed once participants returned to manual work.
Pre-registered experiment (N = 269) measured enjoyment/satisfaction before and after return to manual work; passive-copy condition showed short-term increases in enjoyment/satisfaction that declined after returning to manual tasks.
Realizing NLP value in banks requires organizational investments (data pipelines, model deployment, CRM integration) and complementarity between AI tools and managerial/IT capabilities; returns will depend on these complementarities.
Conceptual implication derived from review of applied/engineering papers and literature on technology complementarities; not directly estimated empirically in the review.
Automated tax-preparation and filing could increase compliance rates but also make tax bases more sensitive to automated tax-optimization strategies, requiring updated regulatory oversight and audit tools.
Paper's policy and economic implications section combining case-based observations and literature; presented as plausible outcomes rather than measured effects.
Regulatory design acts as an economic instrument that can balance social value from AI with protection of rights, affecting social welfare, public trust, and long-term adoption rates.
Normative synthesis combining legal and economic reasoning; suggested as a theoretical mechanism rather than empirically validated within the paper.
Automation of routine administrative tasks may reduce demand for certain clerical roles while increasing demand for oversight, auditing, and legal-technical expertise, altering public-sector labor composition and retraining needs.
Qualitative labor-market reasoning based on task-based automation literature and the administrative context; no field labor data or sample provided.
Current LLMs produce deep, reliable reasoning mainly in domains with rigorous, pre-existing abstractions (mathematics, programming) and underperform in domains that lack such formal abstractions.
Performance comparisons and observed patterns referenced qualitatively (e.g., better behavior on math and code tasks) drawn from existing literature and practitioner reports; the paper does not present new controlled benchmark experiments.
Cooperation with the AI is sustained mainly through conditional rule-based strategies rather than through trust-building, emotional, and social channels.
Synthesis of behavioral trajectories (cooperation plateauing below human–human levels), strategy-estimation results (prevalence of rule-based strategies such as Grim Trigger), and chat-content analysis (more explicit commitments, fewer social/emotional messages) from the laboratory experiment (human–AI n = 126) and comparison to human–human benchmark (n = 108).
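Grim Trigger, the rule-based strategy named in the evidence above, can be illustrated with a minimal sketch of a repeated prisoner's dilemma. The payoff values, the helper names (`play`, `defect_once_at`), and the short horizon are illustrative assumptions; none of them are taken from the experiment.

```python
C, D = "C", "D"

# Illustrative prisoner's dilemma payoffs (player A, player B); assumed values.
PAYOFFS = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def grim_trigger(own_history, partner_history):
    """Cooperate until the partner defects once, then defect in every later round."""
    return D if D in partner_history else C


def defect_once_at(round_index):
    """Hypothetical partner: cooperates except for one defection in the given round."""
    def strategy(own_history, partner_history):
        return D if len(own_history) == round_index else C
    return strategy


def play(strategy_a, strategy_b, rounds):
    """Run a repeated game and return both move histories and cumulative scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees its own history first, then the partner's history.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b, (score_a, score_b)


if __name__ == "__main__":
    moves_a, moves_b, scores = play(grim_trigger, defect_once_at(2), rounds=5)
    print(moves_a, moves_b, scores)
```

In this sketch a single defection permanently ends cooperation, which is the sense in which such strategies are conditional and rule-based rather than trust-based.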
When allowed repeated communication with the AI, human subjects remain behaviorally dispersed and do not converge to a single dominant strategy.
Strategy-estimation results for the human–AI repeated-chat treatment (from the experiment, n = 126) showing heterogeneous assignment across strategy classes and lack of convergence over time.
Increasing benign-agent count and agent stubbornness are practical levers for improving robustness, but both carry costs: added compute/operational cost for scaling agents, and degraded consensus/coordination when stubbornness is high.
Argumentation supported by simulation results showing improved robustness with more agents or higher stubbornness, combined with discussion of computational cost (scaling) and observed consensus degradation; computational cost is presented as conceptual/operational reasoning rather than quantified in the summary.
Naïvely lowering trust weights assigned to suspected adversaries can limit adversarial influence but may also hinder cooperation and reduce task performance.
Simulations manipulating fixed trust weights and observing tradeoffs between reduced adversarial sway and decreased cooperative task performance/convergence; conceptual analysis of the tradeoff is provided.
Raising agents' innate stubbornness (peer resistance) reduces susceptibility to adversarial manipulation but impairs the network's ability to reach consensus or coordinate effectively.
Combined theoretical reasoning from the Friedkin-Johnsen (FJ) model (stubbornness is the weight on an agent's innate opinion) and simulation experiments varying stubbornness parameters; measured outcomes include adversarial influence and measures of convergence/coordination or task performance.
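The stubbornness trade-off described above can be sketched with a small Friedkin-Johnsen style update, in which each agent mixes its innate opinion with a weighted average of current opinions. The four-agent network, the uniform trust weights, the innate opinions, and the fully stubborn adversary are illustrative assumptions, not the paper's simulation setup.

```python
def fj_step(opinions, innate, weights, stubbornness):
    """One update: stubbornness weights the innate opinion, the remainder weights peers."""
    n = len(opinions)
    updated = []
    for i in range(n):
        social = sum(weights[i][j] * opinions[j] for j in range(n))
        updated.append(stubbornness[i] * innate[i] + (1 - stubbornness[i]) * social)
    return updated


def fj_run(innate, weights, stubbornness, steps=200):
    """Iterate the update from the innate opinions for a fixed number of steps."""
    opinions = list(innate)
    for _ in range(steps):
        opinions = fj_step(opinions, innate, weights, stubbornness)
    return opinions


def benign_summary(stubbornness_level, steps=200):
    """Return (mean shift of benign opinions toward the adversary, benign opinion spread)."""
    # Assumed setup: three benign agents plus one adversary pinned at opinion 1.0.
    innate = [0.0, 0.2, 0.4, 1.0]
    weights = [[0.25] * 4 for _ in range(4)]  # uniform, row-stochastic trust weights
    stubbornness = [stubbornness_level] * 3 + [1.0]  # adversary never updates
    final = fj_run(innate, weights, stubbornness, steps)
    benign = final[:3]
    mean_shift = sum(benign) / 3 - sum(innate[:3]) / 3
    spread = max(benign) - min(benign)
    return mean_shift, spread


if __name__ == "__main__":
    for level in (0.1, 0.8):
        shift, spread = benign_summary(level)
        print(f"stubbornness={level}: shift={shift:.3f}, spread={spread:.3f}")
```

Under these assumptions, higher stubbornness shrinks the shift toward the adversary but leaves the benign agents further apart from one another, which mirrors the robustness versus consensus tension in the claim.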
Investments in interpretability that aim to fully 'rule‑ify' LLM competence may have diminishing returns; economic value may be better captured by research into robust behavioral evaluation, stress testing, and hybrid human‑AI workflows, while partial interpretability remains valuable.
R&D allocation and interpretability economics argument built on the central thesis; suggestion rather than empirical finding.
The paper challenges a purely rule‑based view of scientific explanation: some explanatory power will remain in implicit model structure rather than explicit rules.
Philosophical/epistemological argument based on the main thesis about tacit competence; no empirical validation.
Liability regimes and penalties should account for limits of enforced compliance and false positives/negatives from probabilistic policy evaluations.
Normative/economic discussion in the paper highlighting probabilistic outputs of the Policy function and calibration challenges; no empirical validation.
Firms will trade off compliance strictness against service quality (task completion rates), creating an economic tradeoff that shapes market offerings (e.g., safer-but-slower vs. faster-but-riskier agents).
Economic reasoning and conceptual models in the paper; suggested objective balancing task completion and legal/reputational costs; no empirical market data.
The economic value of deploying controllers based on Data-enabled Predictive Control (DeePC) depends critically on representativeness of training data and the costs of online adaptation and safety verification.
Authors' deployment-risk analysis and discussion of trade-offs (qualitative), grounded in methodological requirements of DeePC (need for representative, persistently exciting data and safeguards).
System-level improvements from the controller do not imply uniform spatial/temporal benefits—distributional effects may favor certain routes or neighborhoods.
Authors' discussion and caution about distributional effects and equity; possibly supported by spatial analyses in simulation (qualitative discussion in paper).
Quantitative comparisons across tested models show a systematic, nontrivial Misapplication Rate (MR) even in settings where the Appropriate Application Rate (AAR) is high.
Aggregated MR and AAR statistics reported for multiple frontier models across the benchmark showing co‑occurrence of high AAR and nontrivial MR.
Prompt‑based defensive instructions (explicitly instructing models to suppress preferences where inappropriate) reduce misapplication but fail to fully eliminate it.
Ablation experiments adding prompt‑based safety/defenses to model inputs and measuring MR and AAR; defenses produced reductions in MR but residual misapplication remained.
Attempts to mitigate misapplication with stronger reasoning prompts (e.g., chain‑of‑thought) reduce Misapplication Rate but do not eliminate it.
Ablation applying reasoning prompts and chain‑of‑thought style instructions to models, comparing MR before and after; reported reductions in MR but persistence of non‑zero MR across scenarios.
Models that more faithfully enforce stored preferences achieve higher Appropriate Application Rate (AAR) but also systematically have higher Misapplication Rate (MR), indicating a trade‑off between correct personalization and harmful over‑application.
Ablation experiments varying strength of preference encoding and measuring resulting AAR and MR per model; quantitative comparisons across models showing positive correlation between stronger preference adherence and both higher AAR and higher MR.
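The two metrics in the claims above can be sketched as simple rates over labeled trials. The definitions used here are assumptions, since the summary does not spell them out: AAR is taken as the share of trials where the stored preference was relevant and the model applied it, and MR as the share of trials where the preference was not relevant but the model applied it anyway. The `Trial` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Trial:
    preference_relevant: bool  # assumed label: should the stored preference apply here?
    preference_applied: bool   # assumed label: did the model's response apply it?


def application_rates(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Return (AAR, MR) under the assumed definitions; NaN when a group is empty."""
    trials = list(trials)
    relevant = [t for t in trials if t.preference_relevant]
    irrelevant = [t for t in trials if not t.preference_relevant]
    aar = (
        sum(t.preference_applied for t in relevant) / len(relevant)
        if relevant else float("nan")
    )
    mr = (
        sum(t.preference_applied for t in irrelevant) / len(irrelevant)
        if irrelevant else float("nan")
    )
    return aar, mr


if __name__ == "__main__":
    # Made-up trials for illustration only; not data from the benchmark.
    sample = (
        [Trial(True, True)] * 3 + [Trial(True, False)]
        + [Trial(False, True)] + [Trial(False, False)] * 3
    )
    print(application_rates(sample))
```

Because both rates count how often the preference is applied, a model that applies stored preferences more aggressively will tend to raise both, which is one way to read the reported trade-off.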
Reducing payrolls raises short-term firm profitability but reduces aggregate household income and consumption.
Macroeconomic accounting and labor-demand theory combined with historical examples of payroll reductions; argument is theoretical/conceptual rather than estimated with new aggregate time-series regression evidence.
Reviving model-based central planning tools (ISB+NDMS) risks political-economy problems and requires evaluation of efficiency and flexibility compared to market coordination.
Analytic discussion and normative argument in the paper; no empirical comparative study provided.
Russia's digitalization and adoption of AI/Big Data are reshaping the country's socio-economic infrastructure in multifaceted and systemic ways.
Qualitative analysis of national strategies and policy documents plus the author's expert assessments; no sample size or statistical testing reported.
Theoretical framing: an attention-based view (ABV) and a dual-agent model capture two opposing mechanisms—(1) human attention gain from initial AI–human collaboration and (2) AI attention shift under deep embedding—that jointly generate the inverted U-shaped AI–ECSR relationship.
The paper develops and presents ABV and a dual-agent theoretical model to explain observed empirical patterns; model predictions align qualitatively with regression results and heterogeneity tests.
Trust calibration influences project performance outcomes: organizations tend toward metric-driven evaluation of AI outputs and use AI to strategically augment human expertise, but miscalibration risks overreliance or inappropriate metric focus that can harm performance.
Based on participants' reported experiences in the 40 interviews and interpretive thematic analysis linking trust practices to observed/perceived performance consequences (shift to metric-based evaluation, strategic use, and noted risks).
Trust calibration shapes collaboration patterns, including delegation of oversight to systems or specialists, changes in communication networks (who talks to whom), and erosion of informal ad hoc communications used previously for tacit coordination.
Observed in interview narratives (40 interviews) and thematic coding showing repeated reports of shifted oversight roles, altered communication pathways, and reduced informal coordination after AI integration.
Trust calibration is produced and maintained through ongoing boundary work between humans and machines (i.e., teams continuously negotiate which inputs/responsibilities are treated as human versus machine).
Derived from participants' accounts in the 40 interviews and thematic analysis documenting repeated examples of role negotiation and boundary-setting between people and AI systems during project routines.
Trust in AI within project-based work is situational and socially distributed across team members, rather than a stable individual attitude.
The claim is based on thematic qualitative analysis of 40 semi-structured interviews with project professionals across multiple industries in the UK. Interview data showed variation in how different team members described their trust in systems depending on role, task, and context.
Explicit governance reduces negative externalities (bias, privacy breaches, loss of trust) but entails compliance costs that should be factored into adoption and diffusion models.
Conceptual claim synthesizing trade‑off arguments from governance and risk literatures and practitioner examples; not measured empirically in the paper.
Embedding AI into workflows may change firm boundaries (e.g., outsourcing models vs. in‑house systems) and make investments in internal auditability and explainability strategic assets.
Theoretical implication drawn from synthesis of organizational boundary theory and practitioner trends; suggested rather than empirically demonstrated within the paper.
AI is likely to continue shifting the frontier of early discovery and increase the throughput and quality of hypotheses, but persistent biological uncertainty and the cost of clinical validation mean AI will complement—not fully replace—traditional R&D for the foreseeable future.
Synthesis of technological trends, application successes and limitations, translational risk, and economic reasoning presented throughout the paper.
Proprietary data, precompetitive consortia, and platform consolidation can create barriers to entry; public-data initiatives could alter competitive dynamics.
Market-structure analysis and discussion of data-access models in the paper, with examples of consortia and proprietary platform effects.
Expect strong returns-to-scale and winner-take-most dynamics: large incumbents and well-funded startups with proprietary data/compute may dominate the field.
Economic reasoning and observations in the paper about data/compute concentration, platform effects, and market outcomes.
Early-stage unit costs and time-per-hit can fall with AI, but late-stage clinical trial costs driven by biology remain the primary bottleneck to overall R&D productivity gains.
Qualitative assessment of stage-specific effects based on industry observations and conceptual decomposition of R&D stages; no new cost accounting or econometric estimates provided.
AI can improve specific stages of drug discovery but cannot eliminate fundamental biological uncertainty.
Conceptual and thematic analysis across technological capability and R&D integration levels; supported by illustrative examples showing limits of prediction in complex biology.
Many of the fundamental advantages and challenges studied in distributed computing also arise in LLM teams.
Empirical and/or conceptual analysis reported by the authors mapping distributed computing phenomena to LLM-team behavior (the excerpt states this finding but does not include the experimental details or metrics).
There is a design gap: developers' emphasized traits (politeness, strictness, imagination) differ from workers' preferred traits (straightforwardness, tolerance, practicality).
Comparison of developer and worker survey responses reported in the study (171 tasks; LM scaling to 10,131 tasks).
Human capital is no longer defined solely by formal education or accumulated experience; it increasingly takes the form of a multidimensional system in which cognitive abilities, digital competencies, social and communicative skills, and ethical awareness interact and reinforce one another.
Result of the paper's synthesis combining systemic analysis and comparative assessment of international practices; conceptual/qualitative evidence rather than quantified measurement across populations.
Ongoing digital transformation and the widespread adoption of artificial intelligence are reshaping the formation, structure, and practical use of human capital in modern economies.
Paper's core analytical conclusion based on systemic analysis, comparative assessment of international practices, and analytical generalization of organizational learning models; no primary quantitative sample size or experimental data reported.
Organizations must reconceptualize AI implementation as a fundamental redesign of work systems requiring new competencies, governance structures, and attention to human cognitive limits.
Normative recommendation based on the paper's synthesis of organizational adaptation literature and reported negative outcomes of current AI deployments; no empirical test of this prescriptive claim provided in the excerpt.
As compute costs decline, pro-price-competitive policies may lose their effectiveness in improving consumer surplus, while compute subsidies may shift from ineffective to effective.
Comparative statics within the theoretical model tracking how policy effects on consumer surplus change as the model parameter for compute cost is decreased.
Pro-quality-competitive policies increase the provider's profits while reducing the downstream firms' profits.
Model equilibrium analysis indicating that enhancing downstream quality competition shifts surplus toward the provider (higher provider profit) while lowering downstream firms' profits in the modeled equilibria.
Compute subsidies are effective at improving consumer surplus only when compute or data preprocessing costs are low.
Model analysis and comparative statics in the paper: introducing compute subsidies raises consumer surplus in parameter regions where compute/preprocessing costs are low.
Policies that promote price competition in downstream markets boost consumer surplus only when compute or data preprocessing costs are high.
Comparative-static results from the game-theoretic model showing that pro-price-competitive policy interventions increase consumer surplus under parameter regimes where compute or data preprocessing costs are high.
The maturity of an organization's data governance framework influences the success of AI and Big Data in lowering market uncertainty.
Findings from the qualitative case studies and overall analysis highlighting organizational data-governance maturity as a moderating factor (no standardized maturity measure or sample breakdown provided in the summary).
The stringency of the regulatory environment moderates how effectively AI and Big Data reduce market uncertainty.
Moderation identified via the study's analysis and case studies (specific regulatory measures and empirical tests not detailed in the summary).
The effectiveness of AI and Big Data in reducing market uncertainty is contingent upon industry type.
Observed variation across industries in the paper's qualitative case studies and analysis (the summary does not specify which industries or comparative sample sizes).