Evidence (6574 claims)
Adoption
8625 claims
Productivity
7686 claims
Governance
6917 claims
Human-AI Collaboration
6574 claims
Org Design
4189 claims
Innovation
4131 claims
Labor Markets
3588 claims
Skills & Training
2985 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Human-AI Collaboration
We introduce JobBench, which evaluates AI agents on the workflows that experts identify as high-priority for delegation, empowering humans based on their needs instead of replacing them with GDP value.
Description of a new benchmark (JobBench) presented by the authors; methodological design claim about target tasks and intent (expert-identified workflows prioritized for delegation).
This study proposes a Workforce Resilience Governance Framework (WRGF) that includes task-level exposure assessment, human augmentation design, reskilling, redeployment, transparent communication, psychological safety, workforce impact accountability, and policy alignment.
Conceptual framework proposed by the authors in the paper (design/proposal; no empirical test described in the excerpt).
The paper concludes with policy recommendations for accelerating human-centred AI integration in public-sector HRM.
Stated conclusion and policy recommendations section in the paper; recommendations derived from empirical findings.
Access to modern digital tools positively moderates AI uptake.
Reported moderation/interaction effects in regression/path analysis indicating that access to modern digital tools is associated with higher AI adoption/uptake; exact effect size not specified in summary.
Holding a managerial position is the strongest predictor of active AI adoption (OR = 1.609).
Reported odds ratio from the binary logistic regression for role/position predictor (managerial status) predicting active AI adoption; OR = 1.609.
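To make the reported odds ratio concrete, the following sketch converts a baseline adoption probability into the probability implied for managers. The 30% baseline is a hypothetical value chosen for illustration; the summary does not report baseline adoption rates, and the function name is an assumption.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability implied by scaling the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


MANAGER_OR = 1.609        # odds ratio reported in the claim
ASSUMED_BASELINE = 0.30   # hypothetical non-manager adoption probability

manager_prob = apply_odds_ratio(ASSUMED_BASELINE, MANAGER_OR)
print(f"Implied manager adoption probability: {manager_prob:.3f}")
```

An odds ratio multiplies odds, not probabilities, so the implied gain in probability depends on the baseline and is smaller than a naive 60.9% increase.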
Internal HR factors exert a stronger influence on perceived HR effectiveness (β = 0.463) than external factors (β = 0.227).
Reported path/regression coefficients (the summary does not state whether they are standardized) from OLS/path analysis linking internal and external HR quality indices to perceived HR effectiveness; coefficients β = 0.463 and β = 0.227 respectively.
We evaluate collaborative performance from consensus-based routing among self-interested heterogeneous agents in AgentSociety on real-world datasets.
Empirical evaluation / experiments using real-world datasets to measure collaborative performance under consensus-based routing among heterogeneous agents.
We characterize the Nash equilibrium showing that agent payoffs are reflective of their marginal contributions.
Analytical game-theoretic characterization/proof of Nash equilibrium in the paper.
The mechanism incentivizes agents to selectively disclose information to their neighbor agents when doing so aligns with their self-interest, in order to garner influence.
Theoretical analysis and mechanism design arguments (and possibly supporting simulations) within the paper.
Delegation to more competent neighbor agents is incentive compatible and naturally generates a multi-agent routing path by consensus.
Formal theoretical proof/analysis presented in the paper (analytical/theoretical result).
We propose AgentSociety, a mechanism that enables decentralized agentic collaboration grounded in liquid democracy and information diffusion from social choice theory.
Description and design of the AgentSociety mechanism in the paper (mechanism proposal / system design).
AI assistance can stabilize an overloaded workflow only when (i) the fraction of tasks handled by AI exceeds a critical threshold, and (ii) the human attention required for review and expected rework is lower than the attention required for manual completion.
Formal analytical conditions derived from the paper's queueing model (model-based theoretical result; no empirical sample reported).
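The two conditions can be illustrated with a simple load calculation. This is a minimal sketch under assumed definitions, not the paper's queueing model: a single human server, a task arrival rate, and fixed attention times for manual work, review, and rework. All parameter names and numbers are hypothetical.

```python
def human_load(arrival_rate, manual_time, ai_fraction,
               review_time, rework_prob, rework_time):
    """Expected human attention demanded per unit time (stable only if < 1)."""
    ai_path_time = review_time + rework_prob * rework_time
    per_task = (1 - ai_fraction) * manual_time + ai_fraction * ai_path_time
    return arrival_rate * per_task


def critical_ai_fraction(arrival_rate, manual_time,
                         review_time, rework_prob, rework_time):
    """AI fraction that must be exceeded for stability, or None if unreachable."""
    ai_path_time = review_time + rework_prob * rework_time
    if arrival_rate * manual_time < 1:
        return 0.0   # workflow is already stable without AI
    if ai_path_time >= manual_time:
        return None  # condition (ii) fails: AI never reduces the load
    threshold = (arrival_rate * manual_time - 1) / (
        arrival_rate * (manual_time - ai_path_time))
    return threshold if threshold < 1 else None


# Hypothetical overloaded workflow: 1.5 tasks/hour, 1 hour of attention each.
p_star = critical_ai_fraction(1.5, 1.0, 0.2, 0.25, 0.8)
print(f"AI must handle more than {p_star:.1%} of tasks")
```

In this toy setup the load at 60% AI handling is 0.96 (stable), while at 50% it is 1.05 (still overloaded), which mirrors the threshold behavior in condition (i).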
The paper calls for action by stakeholders to consider human and environmental moderators when adopting AI.
Policy/recommendation statement in the paper's conclusion/abstract; normative recommendation rather than empirical finding.
We revise the existing framework to redefine effective organizational determinants and shed light on practical implications including industry and education.
Authors' proposed theoretical revision of an existing framework and discussion of implications; presented as a conceptual contribution within the paper.
Most practitioners assume that AI brings productivity boosts owing to enhanced technical capabilities.
Statement of common practitioner belief reported by the authors in the paper's framing; no supporting survey or sample reported in the abstract.
Adoption of Claude Code increases cumulative lifetime languages used by +0.51.
Panel analysis of 5,838 developers over 28 months using the Callaway & Sant'Anna estimator; treatment = first Claude-co-authored commit.
Adoption of Claude Code increases the count of newly-used languages by +0.31.
Same dataset and staggered-rollout estimator (Callaway & Sant'Anna), treatment = first Claude-co-authored commit; not-yet-treated controls.
Adoption of Claude Code increases Shannon language entropy by +0.14.
Estimated with the doubly robust Callaway & Sant'Anna approach on the 5,838-developer panel over 28 months, using first Claude-co-authored commit as treatment.
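For readers unfamiliar with the metric, Shannon language entropy can be computed from a developer's per-commit language labels as sketched below. The use of the natural logarithm and of commits as the unit of measurement are assumptions; the summary does not state how the paper defines the shares. The example data are hypothetical.

```python
import math
from collections import Counter


def language_entropy(commit_languages):
    """Shannon entropy (in nats) of the language distribution over commits."""
    counts = Counter(commit_languages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical developer before and after picking up a third language.
before = ["Python"] * 8 + ["JavaScript"] * 2
after = ["Python"] * 6 + ["JavaScript"] * 2 + ["Rust"] * 2

print(f"before: {language_entropy(before):.3f}")
print(f"after:  {language_entropy(after):.3f}")
```

Entropy rises both when a developer adds languages and when activity is spread more evenly across them, so it captures diversification that a simple language count would miss.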
Adoption of Claude Code increases the number of distinct programming languages used by a developer by +0.83.
Same panel and staggered-rollout estimation as above (Callaway & Sant'Anna), treatment = first Claude-co-authored commit.
Adoption of Claude Code increases the number of repositories a developer contributes to by +1.5 (monthly).
Same panel (5,838 developers, 28 months) and estimator (Callaway & Sant'Anna). Treatment = first Claude-co-authored commit; not-yet-treated controls.
Adoption of Claude Code is associated with an increase of +41 monthly commits per developer.
Analysis of a panel of 5,838 GitHub developers observed monthly over 28 months, exploiting staggered rollout of Claude Code (May 2025–Jan 2026). Treatment defined by developer's first Claude-co-authored commit; not-yet-treated developers used as controls. Estimates from the doubly robust Callaway and Sant'Anna (2021) staggered-difference-in-differences estimator.
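The core comparison behind a staggered difference-in-differences estimate with not-yet-treated controls can be sketched as follows. This is a simplified, unconditional group-time effect without covariates or doubly robust adjustment, so it is not the Callaway and Sant'Anna estimator itself. The toy panel, unit names, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical panel: unit -> (first treated period or None, outcome per period).
# Outcomes follow a common trend of +2 per period plus a treatment effect of +5.
PANEL = {
    "dev_a": (1, [10, 17, 19, 21]),
    "dev_b": (1, [20, 27, 29, 31]),
    "dev_c": (3, [15, 17, 19, 26]),
    "dev_d": (None, [5, 7, 9, 11]),   # never treated in the window
}


def att_gt(panel, g, t):
    """Effect for cohort g at period t, relative to period g - 1.

    Controls are units not yet treated at period t (or never treated).
    """
    base = g - 1
    treated = [y[t] - y[base] for cohort, y in panel.values() if cohort == g]
    controls = [y[t] - y[base] for cohort, y in panel.values()
                if cohort is None or cohort > t]
    if not treated or not controls:
        raise ValueError(f"no treated or control units for g={g}, t={t}")
    return mean(treated) - mean(controls)


def average_post_treatment_att(panel):
    """Unweighted average of all post-treatment group-time effects."""
    cohorts = sorted({c for c, _ in panel.values() if c is not None})
    n_periods = len(next(iter(panel.values()))[1])
    return mean(att_gt(panel, g, t)
                for g in cohorts for t in range(g, n_periods))


print(att_gt(PANEL, g=1, t=2))
print(average_post_treatment_att(PANEL))
```

Using only not-yet-treated units as controls avoids comparing newly treated developers against already-treated ones, which is the main problem with a two-way fixed effects regression under staggered adoption.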
A profile-driven approach places humans and AI systems on shared scales, supporting comparisons that are predictive of novel-task performance, explanatory of why agents succeed or fail, and auditable.
Claim about anticipated benefits of the proposed profile-driven approach presented in the paper (theoretical argument; no empirical results reported).
Suitability evaluations for task-assignment should be profile-driven — based on assessments that infer latent constructs such as capabilities and propensities from observed performance.
Core proposal of the position paper (conceptual/methodological recommendation; no empirical pilot or validation reported).
As AI is integrated into the workplace, organisations increasingly face allocation decisions between human and machine workers, and these decisions are increasingly made or assisted by algorithms.
Position paper / conceptual argument in the paper's introduction (no empirical sample or quantitative data reported).
The study provides causal evidence that structured AI-based interventions can transform access to scientific feedback from a largely private advantage into a more widely distributed resource.
Causal inference based on randomized field experiment showing increased revision likelihood and broader uptake of LLM tools across diverse regions and author groups.
Effects were strongest among teams with lower h-indexes and earlier career stages.
Heterogeneous treatment effects by team-level metrics (h-index) and career stage reported in the randomized experiment.
Effects were strongest for manuscripts less embedded in the scholarly literature.
Heterogeneous treatment effects reported by manuscript-level embedding in literature (e.g., referencing/citation context) within the randomized experiment.
Effects of AI feedback were strongest among authors from non-English-dominant research regions.
Heterogeneous treatment effects reported in the randomized experiment stratified by authors' geographic / language-dominance region; sample includes authors from 133 geographic regions.
Exposure to AI feedback increased authors' subsequent use of LLM tools in their future papers, suggesting longer-run shifts in scientific practice.
Follow-up measurements in the randomized field experiment tracking authors' later behavior (use of LLM tools in subsequent papers); comparison between treatment and control authors.
Authors who received LLM-generated feedback had a significantly higher likelihood of revising their manuscripts, corresponding to a 12.55% relative increase over the baseline revision rate.
Randomized field experiment comparing treatment (LLM feedback) vs control; sample described as >31,000 arXiv preprints and >45,000 researchers; reported comparative revision rate and statistical significance.
A-insensitivity increases with financial literacy, suggesting financially literate decision-makers perceive greater ambiguity in prediction accuracy.
Association reported in the incentivized laboratory experiment between participants' measured financial literacy and their measured a-insensitivity (correlational evidence; sample size not reported in abstract).
Decision-makers hold more optimistic beliefs about the accuracy of ML analysts than about human analysts, and this greater optimism predicts higher trust in ML analysts relative to human analysts.
Incentivized laboratory experiment measuring participants' optimism about forecast accuracy for human vs. ML analysts and examining the relationship between those beliefs and expressed trust (correlational/regression evidence; sample size not reported in abstract).
A human-centred approach underpinned by ongoing reskilling and ethical governance is vital for sustainable workforce evolution in the Indian IT sector.
Authors' policy/recommendation derived from their literature synthesis and thematic analysis (qualitative conclusion).
The paper introduces a conceptual framework for hybrid intelligence within the Indian IT sector.
Authors present a new conceptual framework as part of this qualitative research article (conceptual contribution).
Collaboration between humans and AI enhances decision-making, efficiency, and innovation.
Reported result from thematic evaluation of literature and secondary data (qualitative synthesis). No sample size or quantified effect provided.
AI improves overall organisational productivity.
Authors' synthesis of peer-reviewed studies and secondary data indicating productivity impacts (qualitative literature review). No quantitative sample size reported.
AI increases human capacities.
Conclusion from comprehensive analysis of peer-reviewed literature and thematic evaluation of secondary data (literature review). No primary sample size reported.
Time and effort dissociate: participants reported lower subjective effort with AI despite equivalent completion times.
Empirical result reported in the abstract: subjective effort ratings were lower for AI-assisted conditions even though measured completion times were equivalent (preregistered study, N = 1237).
Participants predicted AI to be significantly faster.
Empirical result reported in the abstract: participants' predicted completion times indicated AI-assisted completion would be faster than independent completion (statistical significance claimed). Sample from preregistered study (N = 1237).
Large language models (LLMs) have the potential to boost human productivity by speeding up task completion -- provided users know when to offload cognitive work to them.
Framing/introductory claim in the paper (theoretical/argumentative), no direct empirical evidence reported in the abstract.
Together, these results bring individual-level LLM-based resident simulation within reach of resource-constrained local administrations, enabling community-governance decisions to be systematically pre-evaluated in silico before real-world deployment.
Aggregate of dataset creation, benchmark results, algorithm (curriculum-LoRA) efficiency gains, and system integration reported in the paper; the claim is a stated implication about practical feasibility for local administrations.
The system integrates curriculum-LoRA into a closed-loop policy-evaluation pipeline.
System-level description and implementation in the paper that embeds curriculum-LoRA within a closed-loop pipeline for policy evaluation and iteration.
Curriculum-LoRA Pareto-dominates every configuration tested.
Empirical comparisons across the tested configurations in the experiments reported in the paper; curriculum-LoRA outperforms or matches all other configurations on the fidelity-versus-cost Pareto frontier.
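Pareto dominance on a fidelity-versus-cost trade-off can be checked as sketched below. The configuration names and numbers are hypothetical placeholders, not the paper's results; they only mimic the reported pattern of equal fidelity at roughly 10x lower cost.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    fidelity: float  # higher is better
    cost: float      # per-call cost, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.fidelity >= b.fidelity and a.cost <= b.cost
    strictly_better = a.fidelity > b.fidelity or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(configs):
    """Return the configurations that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


# Hypothetical values for illustration only.
configs = [
    Config("curriculum_lora", fidelity=0.80, cost=1.0),
    Config("strong_baseline", fidelity=0.80, cost=10.0),
    Config("cheap_prompting", fidelity=0.60, cost=1.5),
]

front = pareto_front(configs)
print([c.name for c in front])
```

Under this strict definition a configuration that merely ties another on both axes does not dominate it, so a claim of dominating every tested configuration implies a strict advantage on at least one axis in each comparison.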
Curriculum-LoRA is a parameter-efficient personalization framework that, by closing the fidelity-cost gap, matches the strongest baseline's fidelity at roughly 10x lower per-call cost.
Experimental evaluation comparing curriculum-LoRA to baselines on fidelity and per-call cost metrics; reported result that curriculum-LoRA attains comparable fidelity while reducing per-call cost by about a factor of ten.
Adding rich life-history profiles meaningfully raises fidelity above the no-profile baseline.
Benchmark comparisons between prompting strategies that include rich life-history profiles versus a no-profile baseline across the evaluated LLMs, using the interview-derived dataset to assess fidelity.
The dataset comprises approximately 1.2 million characters of first-person narrative collected through two-hour semi-structured interviews with each of 92 residents in an urban community, organized around nine community-governance domains.
Reported dataset construction: two-hour semi-structured interviews with each of 92 residents (92 interviews), organized around nine governance domains; reported total text volume ~1.2 million characters.
The design isolates the platform algorithm's contribution to the outcome, separating it from the contribution of the creative content.
Methodological claim supported by the proposed three-arm design and its empirical demonstration in the live campaign.
Roughly three-quarters of the absolute reallocation is algorithmic.
Empirical decomposition from the live Meta campaign reported in the paper (proportion of total reallocation attributed to algorithmic channel).
In a live Meta campaign with a women-targeted text fragment, the algorithmic channel raises female impression share by +2.07 ppt.
Empirical result from a live Meta campaign reported in the paper; conveys a measured effect size (+2.07 percentage points).
We propose a three-arm design that adds an arm exposing the algorithm to the treatment metadata while holding the user-facing creative identical to control, point-identifying the natural indirect (algorithmic) and direct (creative) effects without sequential ignorability.
Methodological proposal in the paper (design description and identification claim); presumably supported by theoretical derivation/proof in the paper.
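One reading of how the three arms separate the two channels is sketched below: the metadata-only arm minus control gives the algorithmic effect, and the full treatment minus the metadata-only arm gives the creative effect. The arm-level values are hypothetical; only the +2.07 ppt algorithmic difference comes from the source, and the creative value is chosen so the share lands near the reported three-quarters. The share formula is an assumption about what "absolute reallocation" means.

```python
def decompose_three_arm(control, metadata_only, full_treatment):
    """Split the total effect into algorithmic and creative channels.

    control:        control creative, algorithm sees control metadata
    metadata_only:  control creative, algorithm sees treatment metadata
    full_treatment: treatment creative, algorithm sees treatment metadata
    """
    algorithmic = metadata_only - control
    creative = full_treatment - metadata_only
    total = full_treatment - control
    absolute_sum = abs(algorithmic) + abs(creative)
    share = abs(algorithmic) / absolute_sum if absolute_sum else float("nan")
    return {
        "algorithmic": algorithmic,
        "creative": creative,
        "total": total,
        "algorithmic_share": share,
    }


# Hypothetical female impression shares (percent) in each arm.
result = decompose_three_arm(control=50.00,
                             metadata_only=52.07,
                             full_treatment=52.76)

for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Because the middle arm changes only what the algorithm sees, the difference from control cannot be attributed to user reactions to the creative, which is what lets the design avoid a sequential ignorability assumption.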