Evidence (3062 claims)
Claim counts by topic:

- Adoption: 5227
- Productivity: 4503
- Governance: 4100
- Human-AI Collaboration: 3062
- Labor Markets: 2480
- Innovation: 2320
- Org Design: 2305
- Skills & Training: 1920
- Inequality: 1311
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 439 | 984 |
| Governance & Regulation | 366 | 172 | 115 | 55 | 718 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 293 | 118 | 66 | 30 | 511 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 117 | 178 | 44 | 24 | 365 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 68 | 29 | 35 | 7 | 139 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 71 | 10 | 29 | 6 | 116 |
| Worker Satisfaction | 46 | 38 | 12 | 9 | 105 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Task Completion Time | 76 | 5 | 4 | 2 | 87 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 16 | 9 | 5 | 48 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Social Protection | 19 | 8 | 6 | 1 | 34 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
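To compare outcome rows at a glance, the matrix can be collapsed into a single direction score per outcome. A minimal sketch in Python; `net_direction` is an illustrative summary defined here, not a metric used in the matrix:

```python
import pandas as pd

# Two rows copied from the matrix above. net_direction is an assumed
# summary score, (Positive - Negative) / Total, not defined in the source.
matrix = pd.DataFrame(
    [("Firm Productivity", 274, 33, 68, 10, 390),
     ("Inequality Measures", 24, 68, 31, 4, 127)],
    columns=["Outcome", "Positive", "Negative", "Mixed", "Null", "Total"],
)
matrix["net_direction"] = (matrix["Positive"] - matrix["Negative"]) / matrix["Total"]
print(matrix[["Outcome", "net_direction"]])  # ~ +0.62 vs ~ -0.35: opposite leanings
```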
Human-AI Collaboration
Bounded-autonomy governance internalizes some externalities from automated interactions, reducing the probability of cascading failures and associated economic damages, but misaligned or heterogeneous governance across firms/sectors can still generate systemic vulnerabilities.
Theoretical argument combining externalities literature and governance design principles; illustrative scenarios and policy reasoning (no empirical validation).
Modern critical infrastructure increasingly uses embodied AI for monitoring, predictive maintenance, and decision support, but these systems are typically trained for statistically representable uncertainty rather than systemic, cascading crises.
Review and synthesis of policy texts, industry descriptions, and safety/AI standards cited in the paper (EU AI Act, ISO standards) and literature on embodied-AI applications; conceptual argument (no original empirical sample).
Cooperation with the AI is sustained mainly through conditional rule-based strategies rather than through trust-building, emotional, and social channels.
Synthesis of behavioral trajectories (cooperation plateauing below human–human levels), strategy-estimation results (prevalence of rule-based strategies such as Grim Trigger), and chat-content analysis (more explicit commitments, fewer social/emotional messages) from the laboratory experiment (human–AI n = 126) and comparison to human–human benchmark (n = 108).
When allowed repeated communication with the AI, human subjects remain behaviorally dispersed and do not converge to a single dominant strategy.
Strategy-estimation results for the human–AI repeated-chat treatment (from the experiment, n = 126) showing heterogeneous assignment across strategy classes and lack of convergence over time.
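The rule-based strategies referenced above (e.g., Grim Trigger) are simple enough to state in code. A minimal sketch, assuming moves are encoded as "C"/"D" in a repeated prisoner's dilemma; `strategy_fit` is a crude match-rate stand-in for the paper's strategy-estimation procedure, which the excerpt does not describe:

```python
def grim_trigger(own_history, partner_history):
    """Grim Trigger: cooperate until the partner defects once, then defect forever."""
    return "C" if all(m == "C" for m in partner_history) else "D"

def strategy_fit(own_moves, partner_moves, strategy):
    """Share of rounds where a candidate strategy predicts the subject's actual
    move -- a crude stand-in for likelihood-based strategy estimation."""
    hits = sum(
        strategy(own_moves[:t], partner_moves[:t]) == own_moves[t]
        for t in range(len(own_moves))
    )
    return hits / len(own_moves)

# Example: a subject who defects right after the AI's first defection
# matches Grim Trigger in every round.
print(strategy_fit(["C", "C", "D", "D"], ["C", "D", "D", "D"], grim_trigger))  # 1.0
```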
The study documents a 'silent empathy' effect: people often feel empathic concern but fail to express it in ways that align with normative empathic communication; targeted feedback helps close that expression gap.
Analysis showing mismatch between internal empathic concern (implied by context/self-report/ratings) and the presence of idiomatic empathic moves in participants' messages; targeted personalized feedback increased use of normative empathic expressions.
Investments in interpretability that aim to fully 'rule‑ify' LLM competence may have diminishing returns; economic value may be better captured by research into robust behavioral evaluation, stress testing, and hybrid human‑AI workflows, while partial interpretability remains valuable.
R&D allocation and interpretability economics argument built on the central thesis; suggestion rather than empirical finding.
The paper challenges a purely rule‑based view of scientific explanation: some explanatory power will remain in implicit model structure rather than explicit rules.
Philosophical/epistemological argument based on the main thesis about tacit competence; no empirical validation.
LLMs can provide useful inputs for near-term economic and logistical forecasting in crises (e.g., supply-chain disruptions, commodity market impacts, transport/logistics constraints), but their political/strategic forecasts should be used cautiously.
Observed stronger and more verifiable performance on economic/logistical question types in the 42-node evaluation; weaker reliability on politically ambiguous multi-actor issues reported in qualitative coding and verifiability checks.
Model narratives evolve over time: earlier node outputs emphasize rapid containment, while later node outputs increasingly describe regional entrenchment and attritional de-escalation scenarios.
Longitudinal analysis across 11 temporal nodes comparing thematic/narrative content of model responses; qualitative coding tracked shifts in dominant scenario framings from early to later nodes.
Model reliability is uneven across domains: performance is stronger on structured economic and logistical questions than on politically ambiguous, multi-actor strategic issues.
Domain-specific comparison of model outputs on node-specific verifiable questions and exploratory prompts, with higher verifiability/accuracy and more consistent inferences reported for economic/logistical items versus greater ambiguity and lower consistency on political/multi-actor items.
Quantitative comparisons across the tested models show a systematically nonzero Misapplication Rate (MR) even in settings where the Appropriate Application Rate (AAR) is high.
Aggregated MR and AAR statistics reported for multiple frontier models across the benchmark showing co‑occurrence of high AAR and nontrivial MR.
Prompt‑based defensive instructions (explicitly instructing models to suppress preferences where inappropriate) reduce misapplication but fail to fully eliminate it.
Ablation experiments adding prompt‑based safety/defenses to model inputs and measuring MR and AAR; defenses produced reductions in MR but residual misapplication remained.
Attempts to mitigate misapplication with stronger reasoning prompts (e.g., chain‑of‑thought) reduce Misapplication Rate but do not eliminate it.
Ablation applying reasoning prompts and chain‑of‑thought style instructions to models, comparing MR before and after; reported reductions in MR but persistence of non‑zero MR across scenarios.
Models that more faithfully enforce stored preferences achieve higher Appropriate Application Rate (AAR) but also systematically have higher Misapplication Rate (MR), indicating a trade‑off between correct personalization and harmful over‑application.
Ablation experiments varying strength of preference encoding and measuring resulting AAR and MR per model; quantitative comparisons across models showing positive correlation between stronger preference adherence and both higher AAR and higher MR.
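The two benchmark metrics can be pinned down with one plausible operationalization (the excerpt does not give exact definitions): AAR as the application rate on scenarios where the stored preference should be used, MR as the application rate where it should not. A sketch under that assumption:

```python
def aar_mr(scenarios):
    """scenarios: list of (should_apply: bool, model_applied: bool) pairs.
    AAR = share of should-apply scenarios where the preference was applied;
    MR  = share of should-NOT-apply scenarios where it was applied anyway.
    These definitions are assumed, not taken from the source."""
    should = [applied for ok, applied in scenarios if ok]
    should_not = [applied for ok, applied in scenarios if not ok]
    return sum(should) / len(should), sum(should_not) / len(should_not)

# A model that enforces the stored preference almost everywhere scores high
# on both metrics, illustrating the personalization/over-application trade-off:
aar, mr = aar_mr([(True, True)] * 9 + [(True, False)]
                 + [(False, True)] * 3 + [(False, False)] * 7)
print(aar, mr)  # 0.9, 0.3 -- high AAR co-occurring with nontrivial MR
```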
Finance, Education, and Transportation show mixed dynamics: both displacement of routine tasks and creation of new hybrid roles.
Descriptive sectoral analyses from the simulated dataset (hybrid share, task-displacement indicators, employment changes) covering Finance, Education, Transportation (2020–2024), plus mixed-evidence studies from the literature synthesis (ACM/IEEE/Springer 2020–2024).
Overall, economic benefits from AI in radiology are plausible but conditional on human-AI interaction design, governance, workforce effects, and payment structures; net value is not determined by algorithmic accuracy alone.
Synthesis of the heterogeneous literature (laboratory, reader, observational, qualitative) and conceptual economic analysis highlighting dependencies beyond algorithmic performance.
The net effect of AI on clinician burnout is ambiguous: tools can remove tedious tasks but may introduce new cognitive, administrative, and liability stresses.
Mixed qualitative and small-scale observational studies with variable findings on burnout-related measures after AI introduction.
Changes in workload composition can reduce routine burdens but may shift cognitive load to follow-up decisions and managing AI outputs.
Observational and qualitative studies of deployed systems reporting redistribution of tasks and clinician-reported changes in cognitive demands.
Economic outcomes depend on complementarity versus substitution: AI that augments radiologists can raise output per worker; AI that substitutes tasks may reduce demand for certain diagnostic activities.
Theoretical economic frameworks and case studies of task reallocation in early deployments; empirical workforce-impact studies limited.
Automation bias can increase undue reliance on AI, while algorithmic aversion can drive underuse of helpful tools.
Cognitive and behavioral studies and reader simulations demonstrating both increased acceptance/overtrust in automated outputs in some settings and rejection/discounting of AI advice in others.
Real clinical value depends critically on how AI tools interact with radiologists in practice (integration design and human-AI interaction).
Conceptual models and synthesis of reader studies, simulation/interaction studies, usability and qualitative deployment evaluations that compare standalone algorithm performance versus clinician+AI workflows.
Practical takeaway: the effectiveness of human–AI teaming in security tasks depends heavily on the human's ability to formulate context-rich prompts; autonomous workflows that self-manage prompting and tool selection can be more effective.
Synthesis of empirical observations from the live CTF (41 participants) and the autonomous agent benchmark (4 agents), showing human prompting failures limiting team performance and autonomous agents with self-directed prompting achieving higher performance.
Participants’ perceptions, trust, and expectations about the AI shifted after hands-on use (qualitative observation).
Pre- vs. post-AI qualitative measures and observational analysis collected during the live CTF (self-reports/observations of trust and expectations after using the instrumented AI).
Implication for substitution: Because there was no main effect of partner type on collaboration proficiency, AI teammates may substitute for humans on short, temporary tasks without clear productivity loss—conditional on emotional and empathetic factors.
Inference by authors based on the null main effect of partner type combined with the observed role of emotion and service empathy in moderating/mediating collaboration proficiency (experimental evidence, n = 861).
Theoretical framing: an attention-based view (ABV) and a dual-agent model capture two opposing mechanisms—(1) human attention gain from initial AI–human collaboration and (2) AI attention shift under deep embedding—that jointly generate the inverted U-shaped AI–ECSR relationship.
The paper develops and presents ABV and a dual-agent theoretical model to explain observed empirical patterns; model predictions align qualitatively with regression results and heterogeneity tests.
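The inverted U-shape is typically tested with a quadratic regression term. A minimal sketch, with placeholder variable names (`ai_embedding`, `ecsr`) and none of the paper's controls, fixed effects, or heterogeneity tests:

```python
import numpy as np
import statsmodels.api as sm

def inverted_u_test(ai_embedding, ecsr):
    """Fit ecsr = b0 + b1*x + b2*x^2 + e. An inverted U requires b1 > 0 and
    b2 < 0, with the turning point -b1/(2*b2) inside the observed x range.
    The bare quadratic spec is illustrative, not the paper's model."""
    x = np.asarray(ai_embedding, dtype=float)
    X = sm.add_constant(np.column_stack([x, x ** 2]))
    fit = sm.OLS(np.asarray(ecsr, dtype=float), X).fit()
    b1, b2 = fit.params[1], fit.params[2]
    return fit, -b1 / (2.0 * b2)  # fitted model and estimated turning point
```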
Trust calibration influences project performance outcomes: organizations tend toward metric-driven evaluation of AI outputs and use AI to strategically augment human expertise, but miscalibration risks overreliance or inappropriate metric focus that can harm performance.
Based on participants' reported experiences in the 40 interviews and interpretive thematic analysis linking trust practices to observed/perceived performance consequences (shift to metric-based evaluation, strategic use, and noted risks).
Trust calibration shapes collaboration patterns, including delegation of oversight to systems or specialists, changes in communication networks (who talks to whom), and erosion of informal ad hoc communications used previously for tacit coordination.
Observed in interview narratives (40 interviews) and thematic coding showing repeated reports of shifted oversight roles, altered communication pathways, and reduced informal coordination after AI integration.
Trust calibration is produced and maintained through ongoing boundary work between humans and machines (i.e., teams continuously negotiate which inputs/responsibilities are treated as human versus machine).
Derived from participants' accounts in the 40 interviews and thematic analysis documenting repeated examples of role negotiation and boundary-setting between people and AI systems during project routines.
Trust in AI within project-based work is situational and socially distributed across team members, rather than a stable individual attitude.
The claim is based on thematic qualitative analysis of 40 semi-structured interviews with project professionals across multiple industries in the UK. Interview data showed variation in how different team members described their trust in systems depending on role, task, and context.
Explicit governance reduces negative externalities (bias, privacy breaches, loss of trust) but entails compliance costs that should be factored into adoption and diffusion models.
Conceptual claim synthesizing trade‑off arguments from governance and risk literatures and practitioner examples; not measured empirically in the paper.
Embedding AI into workflows may change firm boundaries (e.g., outsourcing models vs. in‑house systems) and make investments in internal auditability and explainability strategic assets.
Theoretical implication drawn from synthesis of organizational boundary theory and practitioner trends; suggested rather than empirically demonstrated within the paper.
Emerging technologies (AI, digital twins, computational rheology) can compress high-dimensional sensory/rheological spaces into actionable models, enabling faster iteration in R&D and altering how firms value R&D inputs.
Theoretical projection and literature-based argument about technological capabilities; illustrative scenarios offered; no empirical trials or measured productivity changes reported.
Upfront costs are high (expert annotation, longitudinal monitoring), but automation of routine tasks can reduce operational costs for ecological monitoring and enforcement.
Cost-structure observation in the paper referencing the resource intensity of data collection and the cost-saving potential of task automation (derived from examples and economic reasoning).
Investments in cross‑disciplinary projects produce high social returns (methodological innovation plus environmental public goods), but private returns may be limited, suggesting a role for public funding and philanthropic support.
Economic-returns argument in the paper based on the public‑good nature of conservation outcomes and the dual-output character of interdisciplinary R&D (theoretical/evaluation-based claim across examples).
Occupational competence varies widely by sector, from 43.2% in high-tech down to 9.7% in the public sector.
Sectoral analysis derived from the study's dataset (LinkedIn job adverts and/or Indeed salary information, 2022–2024) where 'occupational competence' was operationalized and measured across sectors to produce the cited percentages.
AI adoption shifts inventor composition within firms.
Analyses of inventor-level or inventor-aggregate characteristics before and after AI adoption showing changes in composition, using the staggered diff-in-diff approach.
Overall, AI adoption facilitates both refinement of existing knowledge (exploitation) and exploration of new technological domains (exploration).
Combined evidence: increases in exploitative-patent share (exploitation) together with increases in originality, generality and technological distance (exploration) using the stacked diff-in-diff approach.
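A minimal sketch of a stacked difference-in-differences estimator of the kind referenced above, assuming a firm-year panel with hypothetical columns `firm_id`, `year`, `adopt_year` (NaN if never adopted), and outcome `y`; the paper's actual specification is not reproduced here:

```python
import pandas as pd
import statsmodels.formula.api as smf

def stacked_did(panel, horizon=3):
    """For each adoption cohort g, stack treated firms with not-yet/never-
    treated controls in a symmetric event window, then estimate treat x post.
    Main effects of treat and post are absorbed by the stack-saturated
    firm and year fixed effects."""
    stacks = []
    for g in sorted(panel["adopt_year"].dropna().unique()):
        win = panel[panel["year"].between(g - horizon, g + horizon)]
        treated = win["adopt_year"] == g
        clean = win["adopt_year"].isna() | (win["adopt_year"] > g + horizon)
        sub = win[treated | clean].copy()
        sub["stack"] = g
        sub["treat"] = (sub["adopt_year"] == g).astype(int)
        sub["post"] = (sub["year"] >= g).astype(int)
        stacks.append(sub)
    df = pd.concat(stacks, ignore_index=True)
    return smf.ols(
        "y ~ treat:post + C(stack):C(firm_id) + C(stack):C(year)", data=df
    ).fit(cov_type="cluster", cov_kwds={"groups": df["firm_id"]})
```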
Gemini cannot fully substitute for programming experience.
Comparative results from the experimental conditions: although participants could use Gemini (free or paid), the observed benefit of programming experience on code security remained significant, indicating Gemini did not replicate or replace the effect of experience in the sample of 159 developers.
Many of the fundamental advantages and challenges studied in distributed computing also arise in LLM teams.
Empirical and/or conceptual analysis reported by the authors mapping distributed computing phenomena to LLM-team behavior (the excerpt states this finding but does not include the experimental details or metrics).
There is a design gap: developers' emphasized traits (politeness, strictness, imagination) differ from workers' preferred traits (straightforwardness, tolerance, practicality).
Comparison of developer and worker survey responses reported in the study (171 tasks; LM scaling to 10,131 tasks).
These findings suggest that agent skills are a narrow intervention whose utility depends strongly on domain fit, abstraction level, and contextual compatibility.
Interpretation derived from the empirical pattern: majority of skills show no improvement, a few specialized skills help, and some harm — leading to the conclusion that utility depends on fit and context.
There is a fundamental tension between designing AI for complementarity (performance-boosting) and designing AI for alignment (trust-building) when training a single AI model to assist human decision making.
Conceptual and theoretical analysis presented in the paper identifying the trade-off; no dataset/sample-size given in the excerpt.
Human capital is no longer defined solely by formal education or accumulated experience; it increasingly takes the form of a multidimensional system in which cognitive abilities, digital competencies, social and communicative skills, and ethical awareness interact and reinforce one another.
Result of the paper's synthesis combining systemic analysis and comparative assessment of international practices; conceptual/qualitative evidence rather than quantified measurement across populations.
Ongoing digital transformation and the widespread adoption of artificial intelligence are reshaping the formation, structure, and practical use of human capital in modern economies.
Paper's core analytical conclusion based on systemic analysis, comparative assessment of international practices, and analytical generalization of organizational learning models; no primary quantitative sample size or experimental data reported.
Organizations must reconceptualize AI implementation as a fundamental redesign of work systems requiring new competencies, governance structures, and attention to human cognitive limits.
Normative recommendation based on the paper's synthesis of organizational adaptation literature and reported negative outcomes of current AI deployments; no empirical test of this prescriptive claim provided in the excerpt.
Evidence on apprenticeship reforms indicates a shift toward higher-level qualifications and younger participants, while overall apprenticeship participation has declined.
Synthesis of reform evaluations and comparative studies on apprenticeship systems presented in the paper (summary does not identify which reforms/countries or provide participation statistics).
Participation in adult education and training has increased overall but remains uneven across age groups and skill levels.
Secondary data and comparative evidence cited in the paper showing rising adult learning participation with heterogeneity by age and skill level (no numerical breakdown provided in the summary).
Facilitated access to AI reconfigures startup roles, organizational structures, and decision routines.
Analytic findings from semi-structured interviews pointing to changes in role definitions, reporting lines, and decision-making routines after AI adoption (qualitative evidence; sample size not specified).
Artificial intelligence (AI) has redefined what it means to perform, achieve and succeed.
Stated as a conceptual claim in the paper's purpose/introduction; supported by theoretical argument and literature synthesis (leadership theory, emotional intelligence research, AI ethics). No empirical sample, experiments, or quantitative data provided in the paper.
AI adoption generates heterogeneous effects across occupations.
Summary statement based on analysis of publicly available labor market data (occupational-level heterogeneity asserted but specific datasets, sample sizes, and methods not described).