Evidence (6491 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
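Claim counts by outcome category and direction of finding. In some rows the four direction columns sum to less than the Total (for example, Other: 1957 of 2007).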
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
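
The matrix can be read as shares rather than raw counts. The following is a minimal sketch, assuming the three rows below are copied correctly from the table; the function names `direction_shares` and `unlisted` are hypothetical helpers introduced for illustration. Shares are computed against the Total column, and `unlisted` reports how many claims in a row fall outside the four direction columns.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Error Rate": (69, 92, 10, 2, 173),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def direction_shares(counts):
    """Return each direction's share of the row's Total column."""
    positive, negative, mixed, null, total = counts
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }


def unlisted(counts):
    """Claims counted in Total but not in any of the four direction columns."""
    *directions, total = counts
    return total - sum(directions)


if __name__ == "__main__":
    for outcome, counts in ROWS.items():
        shares = direction_shares(counts)
        print(
            f"{outcome}: {shares['positive']:.1%} positive, "
            f"{shares['negative']:.1%} negative, "
            f"{unlisted(counts)} claims outside the four columns"
        )
```

The matrix does not say whether "positive" means a favorable finding or an increase in the outcome, so the shares describe direction labels only.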
Human-AI Collaboration
The authors introduce the concept of 'functional equivalents': technical capabilities (internal cognition, contextual intelligence, adaptive learning, and collaborative intelligence) that achieve collaborative outcomes comparable to human SEI attributes.
Conceptual contribution proposed by the authors based on interview findings and theoretical argumentation (no quantitative validation reported).
Socio-emotional intelligence (SEI) enhances collaboration among human teammates.
Stated as background in the paper (no primary data from this study provided to support the claim).
The binding constraint on human–AI complementarity in the Global South is not technology access but labor market institutions (formality).
Interpretation of empirical findings (formality interactions, triple interaction result) from the augmented Mincer regressions on Colombian data (N = 105,517).
These results provide the first developing-country evidence of cognitive factor decomposition in AI-augmented labor markets.
Claim based on the empirical results from the study using Colombian data and comparison to literature (author statement).
The augmentation premium is strongest in the health and education sectors.
Heterogeneity analysis / sectoral estimates in the augmented Mincer regression using the merged dataset (N = 105,517); reported strongest effects in health and education.
The augmentation premium (return to H^A with AI) is strongest for experienced workers (ages 46-65).
Heterogeneity analysis / subgroup estimates by age in the augmented Mincer regression using the merged dataset (N = 105,517); reported finding that ages 46–65 show the largest augmentation premium.
A triple interaction confirms formality as the binding mechanism: beta_{AHC x D x Formal} = +0.272 (p < 0.001).
Coefficient on triple interaction term in augmented Mincer regression estimated on merged dataset (N = 105,517); reported estimate +0.272, p < 0.001.
In the estimated augmented Mincer equation, the wage return to augmentable-cognitive capital (H^A) increases with AI adoption in the formal sector (beta_2 = +0.051, p < 0.001).
Econometric estimate from augmented Mincer regression using merged data (household survey N = 105,517; LLM-based occupational augmentability measures); reported coefficient beta_2 = +0.051 with p < 0.001.
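To make the interaction terms concrete, here is one generic way to write a log-wage equation containing a triple interaction among augmentable-cognitive capital H^A, AI adoption D, and a formality indicator. This is a sketch, not the paper's specification: the coefficient labels theta_0 through theta_7, the indicator F_i, and the control vector X_i are placeholders assumed for illustration and do not follow the paper's own numbering.

```latex
\begin{aligned}
\ln w_i ={}& \theta_0 + \theta_1 H^A_i + \theta_2 D_i + \theta_3 F_i \\
           & + \theta_4 \,(H^A_i \times D_i) + \theta_5 \,(H^A_i \times F_i)
             + \theta_6 \,(D_i \times F_i) \\
           & + \theta_7 \,(H^A_i \times D_i \times F_i) + X_i'\gamma + \varepsilon_i ,
\\[4pt]
\frac{\partial \ln w_i}{\partial H^A_i} ={}& \theta_1 + \theta_4 D_i + \theta_5 F_i + \theta_7 D_i F_i .
\end{aligned}
```

Under this layout, the reported triple-interaction estimate (+0.272) would play the role of theta_7: it measures how much more the return to H^A rises with AI adoption for formal workers than for informal workers.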
The empirical analysis uses LLM-generated measures of occupational augmentability for 18,796 O*NET task statements mapped to 440 Colombian occupations, merged with household survey microdata (N = 105,517 workers).
Data construction described in the paper: LLM scoring of O*NET tasks (18,796 tasks), mapping to 440 occupations, merged with household survey microdata (sample N = 105,517).
I derive a corrected Mincerian wage equation and show that the standard specification is misspecified in AI-augmented economies.
Analytical derivation in the paper (theoretical correction to Mincerian wage equation).
AI capital interacts asymmetrically with those components: it substitutes for routine cognitive work (H^C) while complementing augmentable cognitive work (H^A) through an amplification function phi(D).
Theoretical production-function model and derivation in the paper (analytical result).
The paper proposes a decomposition of human capital into three orthogonal components: physical-manual (H^P), routine-cognitive (H^C), and augmentable-cognitive (H^A).
Theoretical proposal in the paper (modeling framework).
This research contributes to debates about the future of work, power asymmetries in platform economies, and the development of worker-protective regulatory frameworks, engaging perspectives from feminist economics, institutional theory, and surveillance capitalism studies.
Stated contribution in the abstract based on theoretical engagement and literature synthesis (conceptual claim; no empirical citation in abstract).
Theoretical frameworks developed in the paper require future empirical validation via case studies, quantitative analysis, and ethnographic research.
Methodological statement within the abstract describing the paper's limitations and next steps (self-report about the paper's status).
The study proposes institutional frameworks for realizing labor value and for worker-protective regulatory frameworks applicable to digital/platform economies.
Normative/theoretical proposals derived from conceptual analysis and engagement with feminist economics, institutional theory, and surveillance capitalism literature (no empirical testing reported).
The paper identifies key characteristics of value formation specific to platform economies.
Theoretical framework and literature synthesis presented in the study (conceptual; no empirical cases reported in abstract).
Living labor remains the sole source of new value; the core insights of the labor theory of value remain essential for critiquing contemporary digital capitalism.
Argumentative/theoretical development grounded in Marxist political economy and literature synthesis (conceptual paper, no empirical testing reported).
AI should be classified as constant capital rather than as labor.
Theoretical analysis and critical literature synthesis in a conceptual study (no empirical sample reported).
Secondary empirical evidence from Colombia's EDIT manufacturing survey (N=6,799 firms) shows that management practice quality amplifies the return to technology investment (interaction coefficient 0.304, p<0.01).
Secondary empirical analysis of EDIT manufacturing survey data; sample size reported as N = 6,799 firms; regression interaction term reported as coefficient 0.304 with p < 0.01.
We endogenize the augmentation function as phi(D, W), where W is a five-dimensional workplace design vector (AI interface design, decision authority allocation, task orchestration, learning loop architecture, psychosocial work environment), and prove that human-centric design is profit-maximizing when the workforce's augmentable cognitive capital exceeds a critical threshold.
Theoretical model and formal proof presented in the paper (analytical derivation of phi(D,W) and threshold condition).
To optimize agentic AI integration and ensure responsible innovation across financial services, interdisciplinary, longitudinal research and robust governance frameworks are needed.
Authors' conclusions and recommendations based on the identified findings and gaps in the reviewed literature.
Diverse architectural models such as multi-agent systems and cloud-based frameworks enable scalable, adaptive agentic AI deployments in financial services.
Synthesis of architecture-focused studies and framework descriptions within the reviewed literature (architectural benchmarking across papers).
Findings reveal substantial productivity gains and operational efficiencies predominantly in banking and investment.
Systematic review synthesizing multidisciplinary qualitative, quantitative, and bibliometric studies of agentic AI applications in financial services published up to mid-2024 (review-level synthesis).
Overall, the HCT is a robust, accurate, and transparent alternative to the AI-as-advisor approach, offering a simple mechanism to tap into the wisdom of hybrid crowds.
Overall conclusion drawn from the empirical comparisons across datasets and analyses described in the paper (summary statement in abstract).
Using signal detection theory, the paper finds that the HCT outperforms the AI-as-advisor approach because people cannot discriminate well enough between correct and incorrect AI advice.
Analysis in the paper applying signal detection theory to the empirical results (as stated in abstract).
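In signal detection theory, the ability to tell correct from incorrect advice is commonly summarized by the sensitivity index d' = z(hit rate) - z(false-alarm rate). The sketch below assumes a framing that the excerpt does not spell out: a "hit" is accepting AI advice that is correct, and a "false alarm" is accepting AI advice that is incorrect. The counts in the usage section are hypothetical, not results from the paper.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def acceptance_rates(accepted_correct, rejected_correct,
                     accepted_incorrect, rejected_incorrect):
    """Convert raw counts into (hit rate, false-alarm rate)."""
    hit_rate = accepted_correct / (accepted_correct + rejected_correct)
    false_alarm_rate = accepted_incorrect / (accepted_incorrect + rejected_incorrect)
    return hit_rate, false_alarm_rate


if __name__ == "__main__":
    # Hypothetical counts: advice is accepted almost as often when it is wrong.
    hits, false_alarms = acceptance_rates(80, 20, 70, 30)
    print(f"hit rate={hits:.2f}, false-alarm rate={false_alarms:.2f}, "
          f"d'={d_prime(hits, false_alarms):.2f}")
```

A d' near zero means acceptance is nearly unrelated to whether the advice is correct, which is the kind of weak discrimination the claim describes.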
The HCT also performed better in almost all cases in which the AI offered an explanation of its judgment.
Empirical results on the subset of four datasets with AI explanations (abstract reports HCT performed better in 'almost all' of these cases).
The HCT outperformed the AI-as-advisor approach in all datasets.
Empirical comparisons reported across the 10 datasets (statement in abstract that HCT 'outperformed' in all datasets). Specific performance metrics not provided in abstract.
An AI agent given revealed-preference data predicts subjects' choices more accurately than an AI agent given stated-preference prompts.
Online experiment in which subjects provided written instructions (prompts) and revealed preferences via choices in a series of binary lottery questions; AI agents were given either the revealed-preference data or the stated-preference prompts and their prediction accuracy on subjects' choices was compared.
Under economy-wide deployment, the share of computer-vision-exposed labor compensation that is cost-effectively automatable rises sharply (relative to the firm-level 11% estimate).
Model counterfactuals or calibration scenarios comparing firm-level deployment vs economy-wide deployment; qualitative statement that share increases substantially.
At the firm level, cost-effective automation captures approximately 11% of computer-vision-exposed labor compensation.
Calibration and implementation in computer vision; reported firm-level estimate from the framework.
Scale of deployment is a key determinant: AI-as-a-Service and AI agents spread fixed costs across users, sharply expanding economically viable tasks.
Modeling and calibration arguments showing fixed-cost spreading effects increase set of tasks for which automation is cost-effective; qualitative and quantitative comparisons in implementation.
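The fixed-cost argument can be illustrated with simple arithmetic: if a system's fixed cost is shared by n users, each user bears fixed_cost / n plus a marginal cost, so more tasks clear the break-even bar as n grows. The sketch below is illustrative only; the task names, savings, and cost figures are hypothetical and are not taken from the paper's calibration.

```python
def per_user_cost(fixed_cost, marginal_cost, n_users):
    """Cost borne by each user when the fixed cost is split across n_users."""
    return fixed_cost / n_users + marginal_cost


def viable_tasks(task_savings, fixed_cost, marginal_cost, n_users):
    """Tasks whose labor savings exceed the per-user cost of automating them."""
    cost = per_user_cost(fixed_cost, marginal_cost, n_users)
    return [task for task, saving in task_savings.items() if saving > cost]


if __name__ == "__main__":
    # Hypothetical per-user labor savings for three tasks.
    savings = {"inspection": 50, "sorting": 500, "labeling": 5000}
    for n in (1, 100, 10_000):
        tasks = viable_tasks(savings, fixed_cost=100_000, marginal_cost=10, n_users=n)
        print(f"{n} users: {tasks}")
```

With a single user no task is worth automating in this toy setup, while wide deployment makes all three viable, mirroring the firm-level versus economy-wide contrast.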
Because higher accuracy is disproportionately costly (convex cost), full automation is often not cost-minimizing; partial automation, where firms retain human workers for residual tasks, frequently emerges as the equilibrium.
Theoretical model combined with calibration (scaling laws + task mappings); equilibrium outcomes reported from the framework implementation.
We model automation intensity as a continuous choice in which firms minimize costs by selecting an AI accuracy level, from no automation through partial human-AI collaboration to full automation.
The paper develops a theoretical framework / model that treats automation intensity as a continuous decision variable; described as the central modeling approach.
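A toy version of this continuous choice shows why a convex accuracy cost tends to produce partial automation. The functional forms below are assumptions made for illustration, not the paper's model: AI cost is `ai_cost_scale * a / (1 - a)`, which grows without bound as accuracy `a` approaches 1, and human workers handle the residual share `1 - a` at cost `human_wage * (1 - a)`.

```python
def total_cost(accuracy, ai_cost_scale=1.0, human_wage=10.0):
    """Total cost at a given AI accuracy level in [0, 1).

    Assumed forms (illustrative only): a convex AI cost that explodes near
    full accuracy, plus a human cost proportional to the residual work.
    """
    ai_cost = ai_cost_scale * accuracy / (1.0 - accuracy)
    human_cost = human_wage * (1.0 - accuracy)
    return ai_cost + human_cost


def cheapest_accuracy(ai_cost_scale=1.0, human_wage=10.0, steps=100):
    """Grid-search the accuracy level that minimizes total cost.

    The grid covers 0.00 to 0.99; accuracy 1.0 is excluded because the
    assumed AI cost is infinite there.
    """
    grid = [i / steps for i in range(steps)]
    return min(grid, key=lambda a: total_cost(a, ai_cost_scale, human_wage))


if __name__ == "__main__":
    best = cheapest_accuracy()
    print(f"cost-minimizing accuracy: {best:.2f}, cost: {total_cost(best):.3f}")
```

Under these assumed forms the analytic optimum is `1 - sqrt(ai_cost_scale / human_wage)` when wages exceed the cost scale, an interior point where humans keep the residual tasks; when wages are low relative to AI cost the minimum sits at no automation.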
The conceptual model's results support corporate GenAI policies, leadership development programs, and HR assessment of leader readiness for GenAI-enabled delegation and communication.
Practical implications and recommendations section arguing policy and HR applications based on the conceptual model.
The article introduces an EI-driven trust-calibration framework as an explanatory mechanism showing when generative AI improves leadership effectiveness and when it amplifies managerial errors.
Novel theoretical framework developed in the paper synthesizing EI, trust calibration, and psychological safety to explain boundary conditions of AI in leadership.
The paper provides an operationalization toolkit including measures: GenAI use intensity; delegation quality indices (clarity, boundaries, success criteria); communication quality indices (empathy, tone, transparency); psychological safety markers; and behavioral trust-calibration measures.
Operationalization section in the paper listing suggested indices and markers for empirical measurement.
As a follow-up validation path, the paper proposes a two-wave time-lag design and 180° assessment (leader + subordinates) to reduce common-method bias.
Methodological proposal in the paper describing longitudinal and multi-rater validation approaches.
The paper proposes a 'Package B' rapid empirical design: a randomized online experiment manipulating access to generative AI in core managerial tasks (decision, delegation, team communication), combined with EI measurement and trust-calibration indicators.
Methodology section proposing the rapid randomized online experiment design as the primary empirical test.
Emotional intelligence strengthens the positive impact of generative AI on managerial outcomes when trust is properly calibrated and psychological safety is maintained.
Conceptual model and integrative argument combining EI, trust-calibration, and psychological safety; supported by proposed empirical test design.
The paper conceptualizes human–AI leadership as an integrated managerial competence.
Conceptual modeling presented in the paper integrating EI theory, psychological safety, and trust calibration (theoretical synthesis).
In the user study, AI-expanded 5W3H prompts increase user satisfaction from 3.16 to 4.04.
Baseline versus AI-expanded (or pre/post) satisfaction scores reported in the N = 50 user study: 3.16 and 4.04, respectively.
In the user study, AI-expanded 5W3H prompts reduce interaction rounds by 60 percent.
Reported comparison in the N=50 user study between baseline interaction rounds and rounds after AI-assisted 5W3H expansion; percentage reduction reported as 60%.
A weak-model compensation pattern was observed: the lowest-baseline model (Gemini) shows a much larger D-A gain (+1.006) than the strongest model (Claude, +0.217).
Model-level comparison of D-A gain (difference between structured and unstructured conditions) across three models (Claude, GPT-4o, Gemini) on the evaluated outputs; reported gains for Gemini and Claude.
The strongest structured conditions reduce cross-language sigma from 0.470 to about 0.020.
Reported numeric comparison of sigma (cross-language score dispersion) between the unstructured baseline and the strongest structured prompting conditions across evaluated outputs.
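One way such a cross-language sigma could be computed is as the standard deviation of per-language mean scores. This is an assumption: the excerpt does not define sigma precisely, and the language keys and scores below are hypothetical illustrations, not the paper's data.

```python
from statistics import mean, pstdev


def cross_language_sigma(scores_by_language):
    """Population standard deviation of the per-language mean scores.

    Assumed definition for illustration; the source does not state how
    its sigma is calculated.
    """
    language_means = [mean(scores) for scores in scores_by_language.values()]
    return pstdev(language_means)


if __name__ == "__main__":
    # Hypothetical judge scores for two prompting conditions.
    unstructured = {"zh": [4.5, 4.3], "en": [3.9, 3.7], "ja": [3.2, 3.4]}
    structured = {"zh": [4.6, 4.6], "en": [4.6, 4.5], "ja": [4.5, 4.6]}
    print(f"unstructured sigma: {cross_language_sigma(unstructured):.3f}")
    print(f"structured sigma:   {cross_language_sigma(structured):.3f}")
```

A smaller value means the languages receive more similar average scores, which is the sense in which structured prompting is described as reducing cross-language variation.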
Structured prompting substantially reduces cross-language score variance relative to unstructured baselines.
Empirical comparison across 3,240 outputs evaluated by DeepSeek-V3, comparing structured vs. unstructured prompting across three languages.
Prior work showed that PPS (Prompt Protocol Specification), a 5W3H-based structured intent framework, improves goal alignment in Chinese and generalizes to English and Japanese.
Statement referring to prior work (not new experiments in this paper); no sample size or methods provided in this text excerpt.
Large language model (LLM) use can improve observable output and short-term task performance.
Paper synthesizes empirical findings from human–AI interaction studies, learning-research experiments, and model-evaluation work indicating improved produced outputs and short-term task performance when humans use LLMs; no single pooled sample size or unified effect estimate is reported in the paper.
Frontier models (Claude Haiku 4.5, GPT-5-chat, GPT-5-mini) achieve statistically indistinguishable semantic closeness scores above 4.6 out of 5.0.
Reported semantic closeness scores from the LLM-as-Judge evaluation on the 15-proposal dataset; the paper states frontier models scored above 4.6/5.0 and were statistically indistinguishable from each other.
Autor et al. (2024) show that the majority of current employment is in job specialties that did not exist in 1940, with new task creation driven by augmentation-type innovations.
Citation reported in the paper summarizing Autor et al. (2024); no sample size provided in excerpt.
Firms may not sufficiently account for non-monetary aspects of technological progress (well-being, safety, quality of work); a planner would include such considerations in steering technological progress.
Normative conclusion based on theoretical analysis comparing firm objective functions (profits) vs social planner objectives (including non-monetary utility).