Evidence (3062 claims)
Claim counts by topic:

- Adoption: 5227
- Productivity: 4503
- Governance: 4100
- Human-AI Collaboration: 3062
- Labor Markets: 2480
- Innovation: 2320
- Org Design: 2305
- Skills & Training: 1920
- Inequality: 1311
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 439 | 984 |
| Governance & Regulation | 366 | 172 | 115 | 55 | 718 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 293 | 118 | 66 | 30 | 511 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 117 | 178 | 44 | 24 | 365 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 68 | 29 | 35 | 7 | 139 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 71 | 10 | 29 | 6 | 116 |
| Worker Satisfaction | 46 | 38 | 12 | 9 | 105 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Task Completion Time | 76 | 5 | 4 | 2 | 87 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 16 | 9 | 5 | 48 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Social Protection | 19 | 8 | 6 | 1 | 34 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
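To compare outcome rows at a glance, the matrix can be collapsed into a single direction score per outcome. A minimal sketch in Python; `net_direction` is an illustrative summary defined here, not a metric used in the matrix:

```python
import pandas as pd

# Two rows copied from the matrix above. net_direction is an assumed
# summary score, (Positive - Negative) / Total, not defined in the source.
matrix = pd.DataFrame(
    [("Firm Productivity", 274, 33, 68, 10, 390),
     ("Inequality Measures", 24, 68, 31, 4, 127)],
    columns=["Outcome", "Positive", "Negative", "Mixed", "Null", "Total"],
)
matrix["net_direction"] = (matrix["Positive"] - matrix["Negative"]) / matrix["Total"]
print(matrix[["Outcome", "net_direction"]])  # ~ +0.62 vs ~ -0.35: opposite leanings
```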
Human-AI Collaboration
Bounded-autonomy governance internalizes some externalities from automated interactions, reducing the probability of cascading failures and associated economic damages, but misaligned or heterogeneous governance across firms/sectors can still generate systemic vulnerabilities.
Theoretical argument combining externalities literature and governance design principles; illustrative scenarios and policy reasoning (no empirical validation).
Modern critical infrastructure increasingly uses embodied AI for monitoring, predictive maintenance, and decision support, but these systems are typically trained for statistically representable uncertainty rather than systemic, cascading crises.
Review and synthesis of policy texts, industry descriptions, and safety/AI standards cited in the paper (EU AI Act, ISO standards) and literature on embodied-AI applications; conceptual argument (no original empirical sample).
Cooperation with the AI is sustained mainly through conditional rule-based strategies rather than through trust-building, emotional, and social channels.
Synthesis of behavioral trajectories (cooperation plateauing below human–human levels), strategy-estimation results (prevalence of rule-based strategies such as Grim Trigger), and chat-content analysis (more explicit commitments, fewer social/emotional messages) from the laboratory experiment (human–AI n = 126) and comparison to human–human benchmark (n = 108).
When allowed repeated communication with the AI, human subjects remain behaviorally dispersed and do not converge to a single dominant strategy.
Strategy-estimation results for the human–AI repeated-chat treatment (from the experiment, n = 126) showing heterogeneous assignment across strategy classes and lack of convergence over time.
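The rule-based strategies referenced above (e.g., Grim Trigger) are simple enough to state in code. A minimal sketch, assuming moves are encoded as "C"/"D" in a repeated prisoner's dilemma; `strategy_fit` is a crude match-rate stand-in for the paper's strategy-estimation procedure, which the excerpt does not describe:

```python
def grim_trigger(own_history, partner_history):
    """Grim Trigger: cooperate until the partner defects once, then defect forever."""
    return "C" if all(m == "C" for m in partner_history) else "D"

def strategy_fit(own_moves, partner_moves, strategy):
    """Share of rounds where a candidate strategy predicts the subject's actual
    move -- a crude stand-in for likelihood-based strategy estimation."""
    hits = sum(
        strategy(own_moves[:t], partner_moves[:t]) == own_moves[t]
        for t in range(len(own_moves))
    )
    return hits / len(own_moves)

# Example: a subject who defects right after the AI's first defection
# matches Grim Trigger in every round.
print(strategy_fit(["C", "C", "D", "D"], ["C", "D", "D", "D"], grim_trigger))  # 1.0
```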
The study documents a 'silent empathy' effect: people often feel empathic concern but fail to express it in ways that align with normative empathic communication; targeted feedback helps close that expression gap.
Analysis showing mismatch between internal empathic concern (implied by context/self-report/ratings) and the presence of idiomatic empathic moves in participants' messages; targeted personalized feedback increased use of normative empathic expressions.
Investments in interpretability that aim to fully 'rule‑ify' LLM competence may have diminishing returns; economic value may be better captured by research into robust behavioral evaluation, stress testing, and hybrid human‑AI workflows, while partial interpretability remains valuable.
R&D allocation and interpretability economics argument built on the central thesis; suggestion rather than empirical finding.
The paper challenges a purely rule‑based view of scientific explanation: some explanatory power will remain in implicit model structure rather than explicit rules.
Philosophical/epistemological argument based on the main thesis about tacit competence; no empirical validation.
LLMs can provide useful inputs for near-term economic and logistical forecasting in crises (e.g., supply-chain disruptions, commodity market impacts, transport/logistics constraints), but their political/strategic forecasts should be used cautiously.
Observed stronger and more verifiable performance on economic/logistical question types in the 42-node evaluation; weaker reliability on politically ambiguous multi-actor issues reported in qualitative coding and verifiability checks.
Model narratives evolve over time: earlier node outputs emphasize rapid containment, while later node outputs increasingly describe regional entrenchment and attritional de-escalation scenarios.
Longitudinal analysis across 11 temporal nodes comparing thematic/narrative content of model responses; qualitative coding tracked shifts in dominant scenario framings from early to later nodes.
Model reliability is uneven across domains: performance is stronger on structured economic and logistical questions than on politically ambiguous, multi-actor strategic issues.
Domain-specific comparison of model outputs on node-specific verifiable questions and exploratory prompts, with higher verifiability/accuracy and more consistent inferences reported for economic/logistical items versus greater ambiguity and lower consistency on political/multi-actor items.
Quantitative comparisons across the tested models show a systematically nonzero Misapplication Rate (MR) even in settings where the Appropriate Application Rate (AAR) is high.
Aggregated MR and AAR statistics reported for multiple frontier models across the benchmark showing co‑occurrence of high AAR and nontrivial MR.
Prompt‑based defensive instructions (explicitly instructing models to suppress preferences where inappropriate) reduce misapplication but fail to fully eliminate it.
Ablation experiments adding prompt‑based safety/defenses to model inputs and measuring MR and AAR; defenses produced reductions in MR but residual misapplication remained.
Attempts to mitigate misapplication with stronger reasoning prompts (e.g., chain‑of‑thought) reduce Misapplication Rate but do not eliminate it.
Ablation applying reasoning prompts and chain‑of‑thought style instructions to models, comparing MR before and after; reported reductions in MR but persistence of non‑zero MR across scenarios.
Models that more faithfully enforce stored preferences achieve higher Appropriate Application Rate (AAR) but also systematically have higher Misapplication Rate (MR), indicating a trade‑off between correct personalization and harmful over‑application.
Ablation experiments varying strength of preference encoding and measuring resulting AAR and MR per model; quantitative comparisons across models showing positive correlation between stronger preference adherence and both higher AAR and higher MR.
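The two benchmark metrics can be pinned down with one plausible operationalization (the excerpt does not give exact definitions): AAR as the application rate on scenarios where the stored preference should be used, MR as the application rate where it should not. A sketch under that assumption:

```python
def aar_mr(scenarios):
    """scenarios: list of (should_apply: bool, model_applied: bool) pairs.
    AAR = share of should-apply scenarios where the preference was applied;
    MR  = share of should-NOT-apply scenarios where it was applied anyway.
    These definitions are assumed, not taken from the source."""
    should = [applied for ok, applied in scenarios if ok]
    should_not = [applied for ok, applied in scenarios if not ok]
    return sum(should) / len(should), sum(should_not) / len(should_not)

# A model that enforces the stored preference almost everywhere scores high
# on both metrics, illustrating the personalization/over-application trade-off:
aar, mr = aar_mr([(True, True)] * 9 + [(True, False)]
                 + [(False, True)] * 3 + [(False, False)] * 7)
print(aar, mr)  # 0.9, 0.3 -- high AAR co-occurring with nontrivial MR
```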
Finance, Education, and Transportation show mixed dynamics: both displacement of routine tasks and creation of new hybrid roles.
Descriptive sectoral analyses from the simulated dataset (hybrid share, task-displacement indicators, employment changes) covering Finance, Education, Transportation (2020–2024), plus mixed-evidence studies from the literature synthesis (ACM/IEEE/Springer 2020–2024).
Overall, economic benefits from AI in radiology are plausible but conditional on human-AI interaction design, governance, workforce effects, and payment structures; net value is not determined by algorithmic accuracy alone.
Synthesis of the heterogeneous literature (laboratory, reader, observational, qualitative) and conceptual economic analysis highlighting dependencies beyond algorithmic performance.
The net effect of AI on clinician burnout is ambiguous: tools can remove tedious tasks but may introduce new cognitive, administrative, and liability stresses.
Mixed qualitative and small-scale observational studies with variable findings on burnout-related measures after AI introduction.
Changes in workload composition can reduce routine burdens but may shift cognitive load to follow-up decisions and managing AI outputs.
Observational and qualitative studies of deployed systems reporting redistribution of tasks and clinician-reported changes in cognitive demands.
Economic outcomes depend on complementarity versus substitution: AI that augments radiologists can raise output per worker; AI that substitutes tasks may reduce demand for certain diagnostic activities.
Theoretical economic frameworks and case studies of task reallocation in early deployments; empirical workforce-impact studies limited.
Automation bias can increase undue reliance on AI, while algorithmic aversion can drive underuse of helpful tools.
Cognitive and behavioral studies and reader simulations demonstrating both increased acceptance/overtrust in automated outputs in some settings and rejection/discounting of AI advice in others.
Real clinical value depends critically on how AI tools interact with radiologists in practice (integration design and human-AI interaction).
Conceptual models and synthesis of reader studies, simulation/interaction studies, usability and qualitative deployment evaluations that compare standalone algorithm performance versus clinician+AI workflows.
Practical takeaway: the effectiveness of human–AI teaming in security tasks depends heavily on the human's ability to formulate context-rich prompts; autonomous workflows that self-manage prompting and tool selection can be more effective.
Synthesis of empirical observations from the live CTF (41 participants) and the autonomous agent benchmark (4 agents), showing human prompting failures limiting team performance and autonomous agents with self-directed prompting achieving higher performance.
Participants’ perceptions, trust, and expectations about the AI shifted after hands-on use (qualitative observation).
Pre- vs. post-AI qualitative measures and observational analysis collected during the live CTF (self-reports/observations of trust and expectations after using the instrumented AI).
Implication for substitution: Because there was no main effect of partner type on collaboration proficiency, AI teammates may substitute for humans on short, temporary tasks without clear productivity loss—conditional on emotional and empathetic factors.
Inference by authors based on the null main effect of partner type combined with the observed role of emotion and service empathy in moderating/mediating collaboration proficiency (experimental evidence, n = 861).
Theoretical framing: an attention-based view (ABV) and a dual-agent model capture two opposing mechanisms—(1) human attention gain from initial AI–human collaboration and (2) AI attention shift under deep embedding—that jointly generate the inverted U-shaped AI–ECSR relationship.
The paper develops and presents ABV and a dual-agent theoretical model to explain observed empirical patterns; model predictions align qualitatively with regression results and heterogeneity tests.
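The inverted U-shape is typically tested with a quadratic regression term. A minimal sketch, with placeholder variable names (`ai_embedding`, `ecsr`) and none of the paper's controls, fixed effects, or heterogeneity tests:

```python
import numpy as np
import statsmodels.api as sm

def inverted_u_test(ai_embedding, ecsr):
    """Fit ecsr = b0 + b1*x + b2*x^2 + e. An inverted U requires b1 > 0 and
    b2 < 0, with the turning point -b1/(2*b2) inside the observed x range.
    The bare quadratic spec is illustrative, not the paper's model."""
    x = np.asarray(ai_embedding, dtype=float)
    X = sm.add_constant(np.column_stack([x, x ** 2]))
    fit = sm.OLS(np.asarray(ecsr, dtype=float), X).fit()
    b1, b2 = fit.params[1], fit.params[2]
    return fit, -b1 / (2.0 * b2)  # fitted model and estimated turning point
```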
Trust calibration influences project performance outcomes: organizations tend toward metric-driven evaluation of AI outputs and use AI to strategically augment human expertise, but miscalibration risks overreliance or inappropriate metric focus that can harm performance.
Based on participants' reported experiences in the 40 interviews and interpretive thematic analysis linking trust practices to observed/perceived performance consequences (shift to metric-based evaluation, strategic use, and noted risks).
Trust calibration shapes collaboration patterns, including delegation of oversight to systems or specialists, changes in communication networks (who talks to whom), and erosion of informal ad hoc communications used previously for tacit coordination.
Observed in interview narratives (40 interviews) and thematic coding showing repeated reports of shifted oversight roles, altered communication pathways, and reduced informal coordination after AI integration.
Trust calibration is produced and maintained through ongoing boundary work between humans and machines (i.e., teams continuously negotiate which inputs/responsibilities are treated as human versus machine).
Derived from participants' accounts in the 40 interviews and thematic analysis documenting repeated examples of role negotiation and boundary-setting between people and AI systems during project routines.
Trust in AI within project-based work is situational and socially distributed across team members, rather than a stable individual attitude.
The claim is based on thematic qualitative analysis of 40 semi-structured interviews with project professionals across multiple industries in the UK. Interview data showed variation in how different team members described their trust in systems depending on role, task, and context.
Explicit governance reduces negative externalities (bias, privacy breaches, loss of trust) but entails compliance costs that should be factored into adoption and diffusion models.
Conceptual claim synthesizing trade‑off arguments from governance and risk literatures and practitioner examples; not measured empirically in the paper.
Embedding AI into workflows may change firm boundaries (e.g., outsourcing models vs. in‑house systems) and make investments in internal auditability and explainability strategic assets.
Theoretical implication drawn from synthesis of organizational boundary theory and practitioner trends; suggested rather than empirically demonstrated within the paper.
Emerging technologies (AI, digital twins, computational rheology) can compress high-dimensional sensory/rheological spaces into actionable models, enabling faster iteration in R&D and altering how firms value R&D inputs.
Theoretical projection and literature-based argument about technological capabilities; illustrative scenarios offered; no empirical trials or measured productivity changes reported.
Upfront costs are high (expert annotation, longitudinal monitoring), but automation of routine tasks can reduce operational costs for ecological monitoring and enforcement.
Cost-structure observation in the paper referencing the resource intensity of data collection and the cost-saving potential of task automation (derived from examples and economic reasoning).
Investments in cross‑disciplinary projects produce high social returns (methodological innovation plus environmental public goods), but private returns may be limited, suggesting a role for public funding and philanthropic support.
Economic-returns argument in the paper based on the public‑good nature of conservation outcomes and the dual-output character of interdisciplinary R&D (theoretical/evaluation-based claim across examples).
Occupational competence varies widely by sector, from 43.2% in high-tech down to 9.7% in the public sector.
Sectoral analysis derived from the study's dataset (LinkedIn job adverts and/or Indeed salary information, 2022–2024) where 'occupational competence' was operationalized and measured across sectors to produce the cited percentages.
AI adoption shifts inventor composition within firms.
Analyses of inventor-level or inventor-aggregate characteristics before and after AI adoption showing changes in composition, using the staggered diff-in-diff approach.
Overall, AI adoption facilitates both refinement of existing knowledge (exploitation) and exploration of new technological domains (exploration).
Combined evidence: increases in exploitative-patent share (exploitation) together with increases in originality, generality and technological distance (exploration) using the stacked diff-in-diff approach.
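A minimal sketch of a stacked difference-in-differences estimator of the kind referenced above, assuming a firm-year panel with hypothetical columns `firm_id`, `year`, `adopt_year` (NaN if never adopted), and outcome `y`; the paper's actual specification is not reproduced here:

```python
import pandas as pd
import statsmodels.formula.api as smf

def stacked_did(panel, horizon=3):
    """For each adoption cohort g, stack treated firms with not-yet/never-
    treated controls in a symmetric event window, then estimate treat x post.
    Main effects of treat and post are absorbed by the stack-saturated
    firm and year fixed effects."""
    stacks = []
    for g in sorted(panel["adopt_year"].dropna().unique()):
        win = panel[panel["year"].between(g - horizon, g + horizon)]
        treated = win["adopt_year"] == g
        clean = win["adopt_year"].isna() | (win["adopt_year"] > g + horizon)
        sub = win[treated | clean].copy()
        sub["stack"] = g
        sub["treat"] = (sub["adopt_year"] == g).astype(int)
        sub["post"] = (sub["year"] >= g).astype(int)
        stacks.append(sub)
    df = pd.concat(stacks, ignore_index=True)
    return smf.ols(
        "y ~ treat:post + C(stack):C(firm_id) + C(stack):C(year)", data=df
    ).fit(cov_type="cluster", cov_kwds={"groups": df["firm_id"]})
```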
Gemini cannot fully substitute for programming experience.
Comparative results from the experimental conditions: although participants could use Gemini (free or paid), the observed benefit of programming experience on code security remained significant, indicating Gemini did not replicate or replace the effect of experience in the sample of 159 developers.
Many of the fundamental advantages and challenges studied in distributed computing also arise in LLM teams.
Empirical and/or conceptual analysis reported by the authors mapping distributed computing phenomena to LLM-team behavior (the excerpt states this finding but does not include the experimental details or metrics).
There is a design gap: developers' emphasized traits (politeness, strictness, imagination) differ from workers' preferred traits (straightforwardness, tolerance, practicality).
Comparison of developer and worker survey responses reported in the study (171 tasks; LM scaling to 10,131 tasks).
These findings suggest that agent skills are a narrow intervention whose utility depends strongly on domain fit, abstraction level, and contextual compatibility.
Interpretation derived from the empirical pattern: majority of skills show no improvement, a few specialized skills help, and some harm — leading to the conclusion that utility depends on fit and context.
There is a fundamental tension between designing AI for complementarity (performance-boosting) and designing AI for alignment (trust-building) when training a single AI model to assist human decision making.
Conceptual and theoretical analysis presented in the paper identifying the trade-off; no dataset/sample-size given in the excerpt.
Human capital is no longer defined solely by formal education or accumulated experience; it increasingly takes the form of a multidimensional system in which cognitive abilities, digital competencies, social and communicative skills, and ethical awareness interact and reinforce one another.
Result of the paper's synthesis combining systemic analysis and comparative assessment of international practices; conceptual/qualitative evidence rather than quantified measurement across populations.
Ongoing digital transformation and the widespread adoption of artificial intelligence are reshaping the formation, structure, and practical use of human capital in modern economies.
Paper's core analytical conclusion based on systemic analysis, comparative assessment of international practices, and analytical generalization of organizational learning models; no primary quantitative sample size or experimental data reported.
Organizations must reconceptualize AI implementation as a fundamental redesign of work systems requiring new competencies, governance structures, and attention to human cognitive limits.
Normative recommendation based on the paper's synthesis of organizational adaptation literature and reported negative outcomes of current AI deployments; no empirical test of this prescriptive claim provided in the excerpt.
Evidence on apprenticeship reforms indicates a shift toward higher-level qualifications and younger participants, while overall apprenticeship participation has declined.
Synthesis of reform evaluations and comparative studies on apprenticeship systems presented in the paper (summary does not identify which reforms/countries or provide participation statistics).
Participation in adult education and training has increased overall but remains uneven across age groups and skill levels.
Secondary data and comparative evidence cited in the paper showing rising adult learning participation with heterogeneity by age and skill level (no numerical breakdown provided in the summary).
Facilitated access to AI reconfigures startup roles, organizational structures, and decision routines.
Analytic findings from semi-structured interviews pointing to changes in role definitions, reporting lines, and decision-making routines after AI adoption (qualitative evidence; sample size not specified).
Artificial intelligence (AI) has redefined what it means to perform, achieve and succeed.
Stated as a conceptual claim in the paper's purpose/introduction; supported by theoretical argument and literature synthesis (leadership theory, emotional intelligence research, AI ethics). No empirical sample, experiments, or quantitative data provided in the paper.
AI adoption generates heterogeneous effects across occupations.
Summary statement based on analysis of publicly available labor market data (occupational-level heterogeneity asserted but specific datasets, sample sizes, and methods not described).