Evidence (2340 claims)
Claims by topic:
- Adoption: 5267
- Productivity: 4560
- Governance: 4137
- Human-AI Collaboration: 3103
- Labor Markets: 2506
- Innovation: 2354
- Org Design: 2340
- Skills & Training: 1945
- Inequality: 1322
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Org Design
The observed behaviors stem from a root cause: current models are trained as monolithic agents, so splitting them into director/worker roles conflicts with their training distribution. Keeping each model close to its trained mode (text generation for the manager, tool use for the worker) and externalizing the organizational structure to code lets the pipeline succeed.
Qualitative analysis and interpretation of experimental results and pipeline design choices reported in the paper (comparison of different pipeline structures and model modes).
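As an illustration of "externalizing organizational structure to code," here is a minimal sketch (not the paper's implementation): the director/worker split lives in plain Python, and each model is only asked to do what it was trained for. `llm_generate` and `llm_tool_call` are hypothetical stand-ins for any text-completion and tool-use API.

```python
def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in: text-only completion from the 'manager' model."""
    raise NotImplementedError

def llm_tool_call(task: str) -> str:
    """Hypothetical stand-in: tool-using execution by the 'worker' model."""
    raise NotImplementedError

def run_pipeline(goal: str) -> list[str]:
    # The manager stays in its trained mode: it only generates text
    # (a plan, one subtask per line); it never "commands" another agent.
    plan = llm_generate(f"Break this goal into ordered subtasks:\n{goal}")
    results = []
    for subtask in filter(None, map(str.strip, plan.splitlines())):
        # The worker stays in its trained mode: tool use on a single task.
        results.append(llm_tool_call(subtask))
    # The director/worker relationship is enforced here, in ordinary code,
    # rather than asked of either model in-context.
    return results
```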
The paper provides supporting empirical evidence spanning frontier laboratory dynamics, post-training alignment evolution, and the rise of sovereign AI as a geopolitical selection pressure.
Empirical/observational sections in the paper that the authors state cover those three areas (specific datasets, experiments, or case studies are referenced in the text but not quantified in the abstract).
Macroeconomic effects remain hard to observe because of a 'productivity J-curve': firms often must invest in organizational changes first and only later realize measurable financial/productivity gains from AI.
Conceptual synthesis supported by firm-level case studies and empirical papers in the reviewed literature indicating implementation lags; the brief frames this as an interpretation of mixed short-run macro evidence rather than a single causal estimate.
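A toy numeric sketch of the J-curve mechanism (all numbers invented for illustration): up-front organizational investment makes measured net gains dip before lagged AI-driven gains dominate.

```python
# Year-by-year toy cash flows: reorganization costs come first,
# AI-related gains arrive with a lag, so net gains trace a "J".
org_investment = [100, 100, 0, 0, 0]     # up-front organizational change
ai_gains       = [0, 20, 60, 90, 110]    # gains realized with a lag

net = [g - c for g, c in zip(ai_gains, org_investment)]
print(net)   # [-100, -80, 60, 90, 110]: down first, up later
```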
The success of regulatory sandboxes ultimately depends on sound institutional safeguards, proportionality, and alignment with broader policy objectives.
Normative conclusion derived from the paper's analytical framework and comparative lessons (no empirical validation reported in the abstract).
Organisational rules, regulatory constraints, and transparency requirements materially shape micro-level human–AI interactions and can alter adoption incentives and accountability outcomes.
Conceptual governance argument linking institutional constraints to human–AI design choices; theoretical reasoning, no empirical policy evaluation provided.
Potential productivity gains from automating routine informational tasks are conditional: net gains depend on managerial capacity to integrate AI outputs into systemic decision-making and on governance structures.
Conceptual conditional claim derived from integration of systems thinking and algorithmic optimisation literatures; no empirical measurement of productivity effects.
Information-processing and optimisation tasks exhibit clear substitution pressure (they are the most automatable), whereas relational and normative tasks remain complementary to human labour.
Theory-driven claim combining managerial role analysis with general automation/complementarity logic from AI economics; conceptual prediction without empirical quantification.
Human–algorithm architectures can take three forms—augment (assist), displace (replace), or reconfigure (redistribute) cognitive tasks—and their design depends on organisational design, regulation, and decision-structure rules.
Taxonomic conceptualization derived from cross-framework analysis; prescriptive mapping rather than empirical classification; no sample.
Interpersonal coordination roles (disturbance handler, liaison, leader) retain strong human elements (influence, ethics, legitimacy) that are difficult to fully algorithmise.
Conceptual argument based on the nature of relational and legitimacy-based tasks within Mintzberg’s framework and limits of algorithmic substitution; theoretical only.
Entrepreneurial and disturbance-handling roles become hybrid decision zones requiring integrated strategic and computational reasoning (modelling, simulation, anomaly detection plus contextual interpretation and values-based trade-offs).
Analytical synthesis of role demands and computational affordances; cross-framework analysis producing a hybrid strategic–computational characterization; no primary data.
Roles that rely on relational intelligence, ethical judgement, and influence (leader, liaison, figurehead, negotiator) remain primarily strategic but are increasingly supported by predictive and diagnostic analytics.
Role-specific effects derived from cross-framework conceptual mapping (Mintzberg roles × computational thinking); theoretical argumentation rather than empirical measurement.
AI systematically reconfigures managerial work by augmenting, displacing, or reconfiguring cognitive tasks across Mintzberg’s ten managerial roles.
Conceptual synthesis and comparative role mapping integrating Mintzberg’s ten managerial roles with Senge’s Five Disciplines and computational thinking; theoretical analysis only (no primary empirical data; no sample).
Hybrid norms combined with AI platforms lower coordination costs and may encourage more decentralized or platform‑based organizational structures, changing the premium on co‑location.
Theoretical integration of organizational economics and digital platform literature; supported by conceptual examples but no firm‑level causal analysis in the paper.
Differential access to informal learning and sponsorship in hybrid settings can produce long‑term human‑capital inequalities; AI-based mentoring and visibility tools may partially mitigate these gaps but risk biased recommendations if trained on skewed data.
Synthesis of literature on mentorship, social capital, and algorithmic bias; illustrative case examples rather than empirical evaluation of AI mentoring systems.
Geographic dispersion plus AI-enabled remote hiring can widen the labor supply for firms, potentially compressing wages for some roles while raising returns to digital-collaboration skills.
Economic reasoning and literature review on remote hiring and labor supply effects; the paper offers conceptual arguments rather than presenting empirical wage-impact estimates.
Automation of routine tasks may shift task content toward relational and creative work, areas where hybrid arrangements influence social capital accumulation.
Theoretical argument combining automation literature with sociological perspectives on social capital; no direct empirical measurement or longitudinal data in the paper.
Hybrid work complicates traditional productivity metrics, making AI-driven analytics and monitoring tools more attractive but creating trade-offs between measurement accuracy, privacy, and employee trust.
Conceptual argument synthesizing literature on measurement, monitoring, and AI tools; no empirical evaluation of specific tools or datasets in the paper.
Sustaining productivity and organizational culture under hybrid arrangements depends crucially on leadership practices—trust, communication, and fairness—and on inclusive policies that explicitly manage equity, well‑being, and flexibility.
Comparative case illustrations and management literature integration; recommendations derived from secondary sources and theoretical argumentation rather than controlled empirical testing.
Dispersed work alters identity construction, belonging, and social cohesion; digital interactions reshape workplace rituals and norms.
Sociological literature synthesis and qualitative case illustrations emphasizing identity and ritual processes; no longitudinal or quantitative measures provided in the paper.
The paper proposes an 'algorithmic workplace' framework emphasising hybrid agency (agents composed of humans plus GenAI), decentralised decision processes, and erosion of rigid managerial boundaries.
Conceptual synthesis derived from thematic mapping, co‑word analysis and interpretive discussion of the mapped literature; framework presented as the article's conceptual contribution.
Passive AI use produced an initial increase in enjoyment/satisfaction that reversed once participants returned to manual work.
Pre-registered experiment (N = 269) measuring enjoyment/satisfaction before and after a return to manual work; the passive-copy condition showed short-term gains in enjoyment/satisfaction that declined once participants resumed manual tasks.
Realizing NLP value in banks requires organizational investments (data pipelines, model deployment, CRM integration) and complementarity between AI tools and managerial/IT capabilities; returns will depend on these complementarities.
Conceptual implication derived from review of applied/engineering papers and literature on technology complementarities; not directly estimated empirically in the review.
Automated tax-preparation and filing could increase compliance rates but also make tax bases more sensitive to automated tax-optimization strategies, requiring updated regulatory oversight and audit tools.
Paper's policy and economic implications section combining case-based observations and literature; presented as plausible outcomes rather than measured effects.
Regulatory design acts as an economic instrument that can balance social value from AI with protection of rights, affecting social welfare, public trust, and long-term adoption rates.
Normative synthesis combining legal and economic reasoning; suggested as a theoretical mechanism rather than empirically validated within the paper.
Automation of routine administrative tasks may reduce demand for certain clerical roles while increasing demand for oversight, auditing, and legal-technical expertise, altering public-sector labor composition and retraining needs.
Qualitative labor-market reasoning based on task-based automation literature and the administrative context; no field labor-data or sample provided.
Current LLMs produce deep, reliable reasoning mainly in domains with rigorous, pre-existing abstractions (mathematics, programming) and underperform in domains that lack such formal abstractions.
Performance comparisons and observed patterns referenced qualitatively (e.g., better behavior on math and code tasks) drawn from existing literature and practitioner reports; the paper does not present new controlled benchmark experiments.
Cooperation with the AI is sustained mainly through conditional rule-based strategies rather than through trust-building, emotional, and social channels.
Synthesis of behavioral trajectories (cooperation plateauing below human–human levels), strategy-estimation results (prevalence of rule-based strategies such as Grim Trigger), and chat-content analysis (more explicit commitments, fewer social/emotional messages) from the laboratory experiment (human–AI n = 126) and comparison to human–human benchmark (n = 108).
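For reference, the Grim Trigger strategy named in the strategy-estimation results is simple to state in code (a minimal sketch; the experiment's estimation procedure is not shown): cooperate until the partner defects once, then defect forever.

```python
def grim_trigger(partner_history: list[str]) -> str:
    """Return 'C' (cooperate) or 'D' (defect) given the partner's past moves."""
    return 'D' if 'D' in partner_history else 'C'

# Example: a single defection in round 2 triggers permanent defection.
history = []
for partner_move in ['C', 'D', 'C', 'C']:
    my_move = grim_trigger(history)
    history.append(partner_move)
# my moves across rounds: C, C, D, D
```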
When allowed repeated communication with the AI, human subjects remain behaviorally dispersed and do not converge to a single dominant strategy.
Strategy-estimation results for the human–AI repeated-chat treatment (from the experiment, n = 126) showing heterogeneous assignment across strategy classes and lack of convergence over time.
Increasing benign-agent count and agent stubbornness are practical levers for improving robustness, but both carry costs: added compute/operational cost for scaling agents, and degraded consensus/coordination when stubbornness is high.
Argument supported by simulation results showing improved robustness with more agents or higher stubbornness, combined with discussion of computational cost (scaling) and observed consensus degradation; computational cost is presented as conceptual/operational reasoning rather than quantified in the summary.
Naïvely lowering trust weights assigned to suspected adversaries can limit adversarial influence but may also hinder cooperation and reduce task performance.
Simulations manipulating fixed trust weights and observing tradeoffs between reduced adversarial sway and decreased cooperative task performance/convergence; conceptual analysis of the tradeoff is provided.
Raising agents' innate stubbornness (peer resistance) reduces susceptibility to adversarial manipulation but impairs the network's ability to reach consensus or coordinate effectively.
Combined theoretical reasoning from the Friedkin-Johnsen (FJ) model (stubbornness is the weight an agent places on its innate opinion) and simulation experiments varying stubbornness parameters; measured outcomes include adversarial influence and measures of convergence/coordination or task performance.
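A minimal simulation sketch of the FJ dynamic underlying these claims (network size, trust weights, and the adversary model are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def fj_step(x, x0, W, s):
    # FJ update: each agent mixes peer influence (via row-stochastic trust
    # matrix W) with its innate opinion x0, weighted by stubbornness s.
    return (1 - s) * (W @ x) + s * x0

rng = np.random.default_rng(0)
n, n_adv = 10, 2                        # 8 benign agents, 2 adversaries
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)       # row-stochastic trust weights

x0 = rng.uniform(-0.2, 0.2, n)          # benign innate opinions near 0
x0[:n_adv] = 1.0                        # adversaries push an extreme view

s = np.full(n, 0.3)                     # benign stubbornness (the lever)
s[:n_adv] = 1.0                         # adversaries never move

x = x0.copy()
for _ in range(200):
    x = fj_step(x, x0, W, s)

# Raising benign `s` pulls final opinions toward x0 (less adversarial sway)
# but also increases dispersion, i.e. weaker consensus.
print(x.round(3))
```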
Investments in interpretability that aim to fully 'rule‑ify' LLM competence may have diminishing returns; economic value may be better captured by research into robust behavioral evaluation, stress testing, and hybrid human‑AI workflows, while partial interpretability remains valuable.
R&D allocation and interpretability economics argument built on the central thesis; suggestion rather than empirical finding.
The paper challenges a purely rule‑based view of scientific explanation: some explanatory power will remain in implicit model structure rather than explicit rules.
Philosophical/epistemological argument based on the main thesis about tacit competence; no empirical validation.
Liability regimes and penalties should account for limits of enforced compliance and false positives/negatives from probabilistic policy evaluations.
Normative/economic discussion in the paper highlighting probabilistic outputs of the Policy function and calibration challenges; no empirical validation.
Firms will trade off compliance strictness against service quality (task completion rates), creating an economic tradeoff that shapes market offerings (e.g., safer-but-slower vs. faster-but-riskier agents).
Economic reasoning and conceptual models in the paper; suggested objective balancing task completion and legal/reputational costs; no empirical market data.
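One way to make the suggested objective concrete, as a hedged sketch (the functional forms and numbers below are illustrative assumptions, not the paper's calibration):

```python
# A firm chooses compliance strictness tau in [0, 1] to maximize the expected
# value of completed tasks minus expected legal/reputational penalties.

def expected_profit(tau, value=1.0, penalty=2.0):
    completion_rate = 1.0 - 0.6 * tau ** 2   # stricter policy blocks more tasks
    violation_rate = 0.3 * (1.0 - tau)       # stricter policy violates less
    return value * completion_rate - penalty * violation_rate

# Scanning strictness levels traces the safer-but-slower vs
# faster-but-riskier frontier; with these numbers the optimum is interior.
best = max((expected_profit(t / 10), t / 10) for t in range(11))
print(f"profit={best[0]:.2f} at strictness tau={best[1]:.1f}")
```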
The economic value of deploying DeePC-based controllers depends critically on representativeness of training data and the costs of online adaptation and safety verification.
Authors' deployment-risk analysis and discussion of trade-offs (qualitative), grounded in methodological requirements of DeePC (need for representative, persistently exciting data and safeguards).
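To make the data requirement concrete: DeePC builds its predictor from Hankel matrices of recorded trajectories, and the recorded input must be persistently exciting, i.e. its Hankel matrix must have full row rank. A minimal check, with an illustrative signal and order (not the paper's data):

```python
import numpy as np

def hankel(u, depth):
    """Hankel matrix of a scalar input sequence u with `depth` rows."""
    cols = len(u) - depth + 1
    return np.column_stack([u[i:i + depth] for i in range(cols)])

def persistently_exciting(u, order):
    H = hankel(np.asarray(u, float), order)
    return np.linalg.matrix_rank(H) == order   # full row rank required

rng = np.random.default_rng(1)
rich = rng.standard_normal(50)            # noisy input: rich excitation
poor = np.ones(50)                        # constant input: rank-1 Hankel

print(persistently_exciting(rich, 8))     # True  -> usable for DeePC
print(persistently_exciting(poor, 8))     # False -> predictor ill-posed
```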
System-level improvements from the controller do not imply uniform spatial/temporal benefits—distributional effects may favor certain routes or neighborhoods.
Authors' discussion and caution about distributional effects and equity; possibly supported by spatial analyses in simulation (qualitative discussion in paper).
Quantitative comparisons across tested models show a systematically nonzero Misapplication Rate (MR) even in settings where the Appropriate Application Rate (AAR) is high.
Aggregated MR and AAR statistics reported for multiple frontier models across the benchmark showing co‑occurrence of high AAR and nontrivial MR.
Prompt‑based defensive instructions (explicitly instructing models to suppress preferences where inappropriate) reduce misapplication but fail to fully eliminate it.
Ablation experiments adding prompt‑based safety/defenses to model inputs and measuring MR and AAR; defenses produced reductions in MR but residual misapplication remained.
Attempts to mitigate misapplication with stronger reasoning prompts (e.g., chain‑of‑thought) reduce Misapplication Rate but do not eliminate it.
Ablation applying reasoning prompts and chain‑of‑thought style instructions to models, comparing MR before and after; reported reductions in MR but persistence of non‑zero MR across scenarios.
Models that more faithfully enforce stored preferences achieve higher Appropriate Application Rate (AAR) but also systematically have higher Misapplication Rate (MR), indicating a trade‑off between correct personalization and harmful over‑application.
Ablation experiments varying strength of preference encoding and measuring resulting AAR and MR per model; quantitative comparisons across models showing positive correlation between stronger preference adherence and both higher AAR and higher MR.
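As a concrete reading of the two metrics, here is a minimal sketch of how AAR and MR could be computed from labeled episodes (field names are hypothetical; the benchmark's exact scoring is not shown):

```python
def aar_mr(episodes):
    appropriate = [e for e in episodes if e["should_apply"]]
    inappropriate = [e for e in episodes if not e["should_apply"]]
    # AAR: stored preference applied where it should be.
    aar = sum(e["applied"] for e in appropriate) / len(appropriate)
    # MR: stored preference applied where it should NOT be (over-application).
    mr = sum(e["applied"] for e in inappropriate) / len(inappropriate)
    return aar, mr

episodes = [
    {"should_apply": True,  "applied": True},
    {"should_apply": True,  "applied": True},
    {"should_apply": False, "applied": True},   # misapplication
    {"should_apply": False, "applied": False},
]
print(aar_mr(episodes))   # (1.0, 0.5): high AAR can coexist with high MR
```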
Reducing payrolls raises short-term firm profitability but reduces aggregate household income and consumption.
Macroeconomic accounting and labor-demand theory combined with historical examples of payroll reductions; argument is theoretical/conceptual rather than estimated with new aggregate time-series regression evidence.
Reviving model-based central planning tools (ISB+NDMS) risks political-economy problems and requires evaluation of efficiency and flexibility compared to market coordination.
Analytic discussion and normative argument in the paper; no empirical comparative study provided.
Russia's digitalization and adoption of AI/Big Data are reshaping the country's socio-economic infrastructure in multifaceted and systemic ways.
Qualitative analysis of national strategies and policy documents plus the author's expert assessments; no sample size or statistical testing reported.
Theoretical framing: an attention-based view (ABV) and a dual-agent model capture two opposing mechanisms—(1) human attention gain from initial AI–human collaboration and (2) AI attention shift under deep embedding—that jointly generate the inverted U-shaped AI–ECSR relationship.
The paper develops and presents ABV and a dual-agent theoretical model to explain observed empirical patterns; model predictions align qualitatively with regression results and heterogeneity tests.
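For concreteness, an inverted-U relationship of this kind is commonly tested with a quadratic specification of roughly this form (an illustrative textbook specification, not necessarily the paper's exact model):

$$\mathrm{ECSR}_{it} = \beta_0 + \beta_1\,\mathrm{AI}_{it} + \beta_2\,\mathrm{AI}_{it}^{2} + \gamma^{\top} X_{it} + \varepsilon_{it}$$

where an inverted U requires $\beta_1 > 0$ and $\beta_2 < 0$, with the turning point at $\mathrm{AI}^{*} = -\beta_1 / (2\beta_2)$; in the dual-agent framing, the attention-gain mechanism dominates below this point and the attention-shift mechanism above it.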
Trust calibration influences project performance outcomes: organizations tend toward metric-driven evaluation of AI outputs and use AI to strategically augment human expertise, but miscalibration risks overreliance or inappropriate metric focus that can harm performance.
Based on participants' reported experiences in the 40 interviews and interpretive thematic analysis linking trust practices to observed/perceived performance consequences (shift to metric-based evaluation, strategic use, and noted risks).
Trust calibration shapes collaboration patterns, including delegation of oversight to systems or specialists, changes in communication networks (who talks to whom), and erosion of informal ad hoc communications used previously for tacit coordination.
Observed in interview narratives (40 interviews) and thematic coding showing repeated reports of shifted oversight roles, altered communication pathways, and reduced informal coordination after AI integration.
Trust calibration is produced and maintained through ongoing boundary work between humans and machines (i.e., teams continuously negotiate which inputs/responsibilities are treated as human versus machine).
Derived from participants' accounts in the 40 interviews and thematic analysis documenting repeated examples of role negotiation and boundary-setting between people and AI systems during project routines.
Trust in AI within project-based work is situational and socially distributed across team members, rather than a stable individual attitude.
The claim is based on thematic qualitative analysis of 40 semi-structured interviews with project professionals across multiple industries in the UK. Interview data showed variation in how different team members described their trust in systems depending on role, task, and context.
Explicit governance reduces negative externalities (bias, privacy breaches, loss of trust) but entails compliance costs that should be factored into adoption and diffusion models.
Conceptual claim synthesizing trade‑off arguments from governance and risk literatures and practitioner examples; not measured empirically in the paper.