Evidence (5157 claims)
| Topic | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5877 |
| Human-AI Collaboration | 5157 |
| Innovation | 3492 |
| Org Design | 3470 |
| Labor Markets | 3224 |
| Skills & Training | 2608 |
| Inequality | 1835 |
Evidence Matrix
Claim counts by outcome category and direction of finding. In several rows the total exceeds the sum of the four direction columns, which suggests some claims carry a direction outside these four classes.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Active filter: Human-AI Collaboration
There is a shift from learning as growth to learning as survival, where upskilling is oriented toward immediate market viability rather than long-term development.
Reported thematic finding from the paper's interviews and survey of freelance knowledge workers.
Freelancers do not treat generative AI as their primary learning resource due to inconsistency, lack of contextual relevance, and verification overhead.
Reported finding from the paper's mixed-methods study (survey + semi-structured interviews with freelance knowledge workers).
Freelance workers must continually acquire new skills to remain competitive in online labor markets, yet they lack the organizational training, mentorship, and infrastructure available to traditional employees.
Framing statement in the paper's introduction / literature review (not reported as an empirical result from this study).
Suppression bias is the systematic suppression of correct-but-difficult recommendations when clinician capability falls below the execution threshold.
Definition and characterization of a proposed failure mode provided in the paper (conceptual/theoretical).
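A minimal sketch of this failure mode under one reading of the definition (the threshold rule, names, and numbers are hypothetical, not the paper's model):

```python
# Hypothetical sketch of "suppression bias": recommendations whose execution
# difficulty exceeds the clinician's capability are filtered out, so
# correct-but-difficult options are systematically withheld.
def visible_recommendations(candidates, clinician_capability):
    """candidates: (label, is_correct, execution_difficulty) triples."""
    return [(label, is_correct)
            for label, is_correct, difficulty in candidates
            if difficulty <= clinician_capability]

candidates = [
    ("standard protocol", True, 0.3),
    ("optimal but demanding intervention", True, 0.9),  # correct yet difficult
]
print(visible_recommendations(candidates, clinician_capability=0.5))
# -> [('standard protocol', True)]: the optimal option never surfaces.
```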
Existing approaches (runtime guardrails, training-time alignment, and post-hoc auditing) treat governance as an external constraint rather than as an internalized behavioral principle, leaving agents vulnerable to unsafe and irreversible actions.
Author's conceptual/literature critique presented in the paper (argumentative claim, no empirical sample or experiment reported for this statement).
The marginal gains from genAI came at the high cost of recruiter deskilling, a trend that jeopardizes meaningful oversight of decision-making.
Qualitative interview evidence (n=22) where participants described loss of skills/deskilling associated with genAI use and concerns about oversight.
The decision of whether or not to adopt genAI was often outside recruiters' control, with many feeling compelled to adopt due to directives from higher-ups in their business.
Reports from interviewed recruiters (n=22) indicating organizational pressure and top-down calls to integrate AI.
Recruiters believe they have final authority across the recruiting pipeline, but genAI has become an invisible architect shaping the foundational information used for evaluation (e.g., defining a job, determining what counts as a good interview performance).
Qualitative findings from interviews with 22 recruiting professionals describing perceived authority versus the influence of genAI on informational inputs.
GenAI subtly influences control over everyday recruiting workflows and individual hiring decisions.
Qualitative evidence from semi-structured interviews with 22 recruiting professionals (n=22).
Boundary conditions limit UCF applicability in contexts requiring human accountability or embodied knowledge.
Author-stated caveat in the abstract identifying contexts (accountability, embodied knowledge) where the framework may not apply; theoretical reasoning, no empirical tests.
Existing frameworks (Transaction Cost Economics and Electronic Markets Hypothesis) cannot explain emerging organizational phenomena like GitHub Copilot’s recursive value creation or AI-mediated expert networks.
Conceptual critique in the position paper using illustrative examples (GitHub Copilot, AI-mediated expert networks); no empirical testing or sample provided.
AI governance, ethical concerns, openness, workforce adjustment, and integration complexity are crucial considerations for managers implementing AI.
Synthesis of risks and challenges reported across the reviewed literature (paper's discussion/conclusion); no specific counts of studies or empirical measures provided in the abstract.
Conventional managerial practices often struggle with information flow, inefficient workflows, slow decision-making, and redundant administrative processes.
Background statement in the paper's introduction / literature review (narrative claim based on surveyed literature); no specific empirical study or sample size reported in the abstract.
Vulnerable populations—including low-skill workers, aging labour forces, and developing economies—are especially affected by AI-driven changes.
Abstract highlights special attention to vulnerable populations in the review and asserts differential impacts; no specific empirical estimates or sample sizes provided in abstract.
AI displaces routine cognitive and manual tasks.
Explicit finding reported in abstract based on the paper's systematic review of empirical studies (no individual study sample sizes or quantitative estimates provided in abstract).
Persistent AI memory reduced to a retrieval problem (store prior interactions as text, embed them, and ask the model to recover relevant context later) is mismatched to the kinds of memory that agents need in production: exact facts, current state, updates and deletions, aggregation, relations, negative queries, and explicit unknowns.
Argument and conceptual analysis presented in the paper describing types of operations (exact facts, updates/deletions, aggregation, relations, negative queries, explicit unknowns) that retrieval-style memory fails to satisfy; no sample size or quantitative evaluation provided for this specific claim in the excerpt.
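A schematic contrast under that reading (interfaces and data are invented for illustration; the paper provides no code): a retrieval-style memory can only rank stored text, while the operation types listed above map onto a structured store.

```python
from collections import Counter

# Retrieval-style memory: store text, rank by similarity (token overlap
# stands in for embeddings). Updates and deletions have no native meaning.
class RetrievalMemory:
    def __init__(self):
        self.entries = []

    def add(self, text):
        self.entries.append(text)

    def query(self, text, k=1):
        q = Counter(text.lower().split())
        score = lambda e: sum((q & Counter(e.lower().split())).values())
        return sorted(self.entries, key=score, reverse=True)[:k]

# Structured memory: a subset of the operation types named in the claim.
class StructuredMemory:
    def __init__(self):
        self.facts = {}

    def set(self, key, value):   # exact facts, current state, updates
        self.facts[key] = value

    def delete(self, key):       # deletions
        self.facts.pop(key, None)

    def lookup(self, key):       # negative queries / explicit unknowns
        return self.facts.get(key, "unknown")

    def total(self, keys):       # aggregation
        return sum(self.facts[k] for k in keys if k in self.facts)

r = RetrievalMemory()
r.add("user balance is 40")
r.add("user balance is 70")                 # an update, but both versions persist
print(r.query("what is the user balance"))  # the stale fact can rank first

s = StructuredMemory()
s.set("balance", 40)
s.set("balance", 70)                        # overwrite keeps only current state
print(s.lookup("balance"), s.lookup("credit_limit"))  # 70 unknown
```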
This stratification produces trust-based inequality in who can leverage AI while sustaining credibility, voice, and liveness.
Analytical claim based on patterns in 16 interviews indicating differential capacities to conceal/humanize AI lead to unequal ability to both use AI and maintain audience trust and perceived authenticity.
Passing capacity is stratified by educational and professional capital, economic resources and team support, and platform position.
Interview evidence (n=16) showing creators with higher education/professional capital, more economic resources, team support, or advantageous platform positions report greater ability to conceal and perform AI-assisted content.
These invisible authenticity practices reallocate work from generation to downstream repair and performance, complicating claims that AI simply improves efficiency.
Derived from creators' accounts in 16 interviews describing extra downstream editing, verification, and performance labor required after AI generation.
Creators associate legible AI assistance with intertwined trust vulnerabilities, including epistemic unreliability, anticipated relational penalties, and platform authenticity regimes.
Thematic findings from 16 interviews in which creators express concerns about AI-generated content being epistemically unreliable, damaging relationships with audiences, and conflicting with platform authenticity norms.
On authenticity-oriented platforms, visible use of AI can be discrediting for creators.
Reported by creators across 16 in-depth interviews on Xiaohongshu and Douyin; qualitative thematic analysis identifying platform-specific authenticity norms and reputational consequences.
Each stakeholder in the supply chain may believe they are compliant; nevertheless, the integrated system may produce biased outcomes.
Conceptual argument based on literature synthesis and analysis of responsibility fragmentation (no empirical sample reported).
Information asymmetries mean deploying organizations bear legal responsibility without technical visibility into vendor-supplied algorithms, while vendors control implementations without meaningful disclosure requirements.
Regulatory analysis and literature review identifying mismatches in legal liability and technical visibility (no empirical sample reported).
A resume parser may function without bias independently but contribute to discrimination when integrated with specific ranking algorithms and filtering thresholds (illustrative example of interaction effects).
Illustrative example presented in conceptual analysis (no empirical test or sample reported).
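A toy composition illustrating the interaction effect (all weights, distributions, and the threshold are invented): each stage looks innocuous in isolation, yet the assembled pipeline selects the two groups at very different rates.

```python
# Toy pipeline: parser -> ranker -> filtering threshold. No single component
# encodes group membership, but the composition skews selection rates.
import random
random.seed(0)

def parse(resume):
    # Parser: extracts the same fields, with the same accuracy, for everyone.
    return {"skill": resume["skill"], "keywords": resume["keywords"]}

def rank(features):
    # Ranker: small weight on keyword count (a style proxy, not skill).
    return features["skill"] + 0.3 * features["keywords"]

THRESHOLD = 7.5  # Filtering threshold set by the deploying organization.

def make_pool(group, n=1000):
    # Equal skill in both groups; group B happens to use fewer buzzwords.
    kw_mean = 8 if group == "A" else 5
    return [{"skill": random.gauss(5, 1),
             "keywords": max(0, random.gauss(kw_mean, 2))}
            for _ in range(n)]

for group in ("A", "B"):
    scores = [rank(parse(r)) for r in make_pool(group)]
    rate = sum(s >= THRESHOLD for s in scores) / len(scores)
    print(f"group {group}: selected {rate:.0%}")
```

The parser extracts identical fields for everyone and the ranker's keyword weight looks small, but the hard threshold converts a modest score shift into a large selection-rate gap: the disparity lives in the configuration, not in any single vendor's component.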
Fragmented responsibilities create a critical problem: bias can emerge from interactions among components rather than from isolated elements, yet proprietary configurations prevent integrated evaluation of the full hiring system.
Argument and examples drawn from literature review and regulatory analysis; no empirical sample size reported.
Existing research examines bias through technical or regulatory lenses, but both perspectives overlook a fundamental challenge: modern AI hiring systems operate within complex supply chains where responsibility fragments across data vendors, model developers, platform providers, and deploying organizations.
Synthesis from literature review and conceptual analysis of AI hiring supply chains (no empirical sample reported).
The increasing adoption of AI systems in hiring has raised concerns about algorithmic bias and accountability, prompting regulatory responses including the EU AI Act, NYC Local Law 144, and Colorado's AI Act.
Literature review and regulatory analysis; cites existence of named laws/regulations as examples of regulatory responses (no sample size required).
Leaderboard rank alone is insufficient because models with similar pass rates can diverge in overall completion, and task-level discrimination concentrates in a middle band of tasks.
Analytical observations from benchmark results comparing pass rates, overall completion metrics, and per-task discrimination patterns across models; based on the 13-model leaderboard analysis.
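A small illustration of the middle-band point (the pass/fail grid is hypothetical, not the benchmark's data): tasks that nearly all models pass or nearly all fail cannot separate models, so discriminative power, proportional to p(1-p) for task pass rate p, peaks at intermediate difficulty.

```python
# Hypothetical pass/fail grid: 1 = task passed. Per-task pass rate p across
# models gives discrimination ~ p * (1 - p), maximal in the middle band.
results = {
    "model_a": [1, 1, 1, 1, 0, 0],
    "model_b": [1, 1, 1, 0, 1, 0],
    "model_c": [1, 1, 0, 0, 0, 0],
}
n_tasks, n_models = 6, len(results)
for t in range(n_tasks):
    p = sum(row[t] for row in results.values()) / n_models
    print(f"task {t}: pass rate {p:.2f}, discrimination {p * (1 - p):.2f}")
```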
Experiments reveal that reliable workflow automation remains far from solved: the leading model passes only 66.7% of tasks and no model reaches 70%.
Experimental evaluation of 13 frontier models on 105 tasks; reported pass rates from the benchmark runs (leading model pass rate 66.7%, no model >=70%).
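For scale, and assuming the pass rate is simply tasks passed over total tasks, 66.7% of the 105-task suite corresponds to about 70 tasks passed, i.e., roughly 35 tasks on which even the best-performing model fails.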
Many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether a task was executed.
Qualitative critique in the paper comparing existing benchmark design choices; based on authors' survey/analysis of prevailing benchmark practices (no explicit systematic review sample size reported).
The 2026 Amazon outages illustrate how 'mechanized convergence' (homogenization of code/engineering practices via AI) leads to systemic fragility.
Case study analysis using the 2026 Amazon outages as a single illustrative example; implies qualitative examination of that event.
Recursive training on synthetic code threatens to homogenize the global software reservoir, diminishing the variance required for robust engineering.
Theoretical claim about dataset/model feedback loops; no empirical quantification provided in the text excerpt (argumentative risk assessment).
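A standard toy simulation of that variance-loss mechanism (a generic model-collapse illustration, not the paper's analysis): repeatedly fitting a distribution to samples drawn from the previous fit shrinks the fitted variance toward zero.

```python
import random
import statistics

random.seed(0)

# Generation 0: a stand-in for the original, human-written reservoir.
data = [random.gauss(0, 1) for _ in range(20)]

for gen in range(1, 101):
    mu, sigma = statistics.fmean(data), statistics.stdev(data)
    # Each generation trains only on the previous generation's output.
    data = [random.gauss(mu, sigma) for _ in range(20)]
    if gen % 25 == 0:
        print(f"generation {gen:3d}: fitted stdev = {sigma:.4f}")
```

Sampling error compounds multiplicatively across rounds, so the retained variance decays even though no single step looks destructive; the homogenization worry in the claim is this dynamic applied to the global code corpus.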
This epistemological debt erodes the mental models essential for root-cause analysis, widening the gap between system complexity and human comprehension.
Argumentative/theoretical claim supported by reasoning in the paper; no quantified measurement of mental-model erosion reported.
Substituting logical derivation with passive AI verification creates an 'Epistemological Debt' — a hidden carrying cost incurred by engineers.
Theoretical/conceptual assertion within the paper; argued qualitatively rather than demonstrated with controlled empirical data.
The integration of Large Language Models (LLMs) into the software development lifecycle (SDLC) masks a critical socio-technical failure the authors term 'Cognitive-Systemic Collapse.'
Conceptual/theoretical claim presented in the paper's argumentation; no empirical sample or quantitative study reported for this specific naming claim.
Most studies are exploratory (59%) and methodologically diverse, but there is a lack of longitudinal and team-based evaluations.
Authors report study typology counts and note the absence of longitudinal and team-based designs across the reviewed literature.
Studies highlight concerns around cognitive offloading and reduced team collaboration when using LLM-assistants.
Synthesis of reported negative effects in included studies (themes extracted by the authors).
A notable subset of studies identifies critical risks associated with LLM-assistants.
Synthesis across included studies noting reported risks (e.g., cognitive offloading, collaboration issues).
Pre-launch testing exposed failures that text-only benchmarks rarely measure, including fabricated trading rules, fee paralysis, numeric anchoring, cadence trading, and misread tokenomics.
Outcome of pre-launch test cases and observed failure modes during testing.
There is limited but suggestive early evidence of labor market disruption from AI/LLMs.
Paper summarizes emerging empirical research indicating early signs of disruption; the abstract characterizes the evidence as limited and suggestive without presenting numeric estimates or sample sizes.
Certain occupations face the greatest risk from AI-driven automation; the article examines which occupations these are.
Paper claims to examine occupation-level risk using synthesized empirical studies; the abstract does not list which occupations or quantitative risk estimates.
There is a gap between theoretical automation potential and observed real-world implementation of AI/LLMs.
Synthesis of recent empirical studies that compare task-level exposure metrics with employment and usage data; no specific sample sizes or numeric estimates provided in the abstract.
Seed quality bounds what search can achieve: evolution can refine and extend an existing mechanism, but cannot compensate for a weak foundation.
Authors' experimental observations and analysis comparing outcomes starting from different seed designs (qualitative conclusion drawn from experimental runs).
Humans are more aggressive negotiators, accepting deals without a counteroffer only 56.3% of the time compared to 67.6% for LM-based agents.
Quantitative comparison reported in the user study (acceptance rates for humans vs LM-based agents).
Monthly operational cost of running the system is approximately USD 4,000.
Full-scale performance characterization reports monthly cost estimate of approximately USD 4,000.
The supply of AI-literate workers attenuates wage inequality effects.
Presented in the article as a distributional mechanism informed by synthesized theoretical and empirical findings; no concrete empirical methods or sample sizes are provided in the excerpt.
The framework addresses emerging tensions captured in the Creativity Paradox, whereby GenAI may weaken intrinsic motivation, conceptual risk-taking, and evaluative depth.
Theoretical extension of paradox theory and conceptual discussion of potential negative effects; presented as conceptual risks rather than empirically demonstrated outcomes.
Making AI usable can thus make procedures easier for future governments to learn and exploit.
Synthesis concluding claim based on the paper's formal model and argumentation (theoretical; no empirical testing reported).
The model shows why expansions in AI use may be difficult to unwind.
Analytical conclusion from the paper's formal model (theoretical argument without empirical sample).
The model explains why reforms that initially improve oversight can later increase that vulnerability.
Analytical/theoretical result from the paper's formal model (presented as an explanation; no empirical data).