The Commonplace

Evidence (3062 claims)

Adoption: 5227 claims
Productivity: 4503 claims
Governance: 4100 claims
Human-AI Collaboration: 3062 claims
Labor Markets: 2480 claims
Innovation: 2320 claims
Org Design: 2305 claims
Skills & Training: 1920 claims
Inequality: 1311 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                        Positive  Negative  Mixed  Null  Total
Other                               373       105     59   439    984
Governance & Regulation             366       172    115    55    718
Research Productivity               237        95     34   294    664
Organizational Efficiency           364        82     62    34    545
Technology Adoption Rate            293       118     66    30    511
Firm Productivity                   274        33     68    10    390
AI Safety & Ethics                  117       178     44    24    365
Output Quality                      231        61     23    25    340
Market Structure                    107       123     85    14    334
Decision Quality                    158        68     33    17    279
Fiscal & Macroeconomic               75        52     32    21    187
Employment Level                     70        32     74     8    186
Skill Acquisition                    88        31     38     9    166
Firm Revenue                         96        34     22     ·    152
Innovation Output                   105        12     21    11    150
Consumer Welfare                     68        29     35     7    139
Regulatory Compliance                52        61     13     3    129
Inequality Measures                  24        68     31     4    127
Task Allocation                      71        10     29     6    116
Worker Satisfaction                  46        38     12     9    105
Error Rate                           42        47      6     ·     95
Training Effectiveness               55        12     11    16     94
Task Completion Time                 76         5      4     2     87
Wages & Compensation                 46        13     19     5     83
Team Performance                     44         9     15     7     76
Hiring & Recruitment                 39         4      6     3     52
Automation Exposure                  18        16      9     5     48
Job Displacement                      5        29     12     ·     46
Social Protection                    19         8      6     1     34
Developer Productivity               27         2      3     1     33
Worker Turnover                      10        12      3     ·     25
Creative Output                      15         5      3     1     24
Skill Obsolescence                    3        18      2     ·     23
Labor Share of Income                 8         4      9     ·     21

Note: "·" marks a cell omitted in the source. In each such row the three listed counts already sum to the row total, so the omitted count appears to be zero; its column placement is inferred, not given in the source.
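A matrix like the one above is a cross-tabulation of claim records by outcome category and direction of finding. A minimal sketch in Python, assuming (hypothetically) that each claim is a record with `outcome` and `direction` fields; the field names and sample records are illustrative, not the site's actual schema:

```python
from collections import Counter

# Hypothetical claim records; in the real dashboard each claim carries
# an outcome category and a direction (positive/negative/mixed/null).
claims = [
    {"outcome": "Firm Productivity", "direction": "positive"},
    {"outcome": "Firm Productivity", "direction": "mixed"},
    {"outcome": "Error Rate", "direction": "negative"},
    {"outcome": "Firm Productivity", "direction": "positive"},
]

def evidence_matrix(claims):
    """Count claims per (outcome, direction) cell, plus per-outcome row totals."""
    cells = Counter((c["outcome"], c["direction"]) for c in claims)
    totals = Counter(c["outcome"] for c in claims)
    return cells, totals

cells, totals = evidence_matrix(claims)
print(cells[("Firm Productivity", "positive")])  # 2
print(totals["Firm Productivity"])               # 3
```

Because `Counter` returns 0 for absent keys, empty cells (e.g., a row with no null-direction claims) need no special handling when rendering the table.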
Active filter: Human-AI Collaboration

Claim: Generative AI measurably raises productivity (lower marginal cost per interaction) but introduces quality and trust externalities; optimal deployment balances these trade-offs.
Evidence: Pilot cost analyses and operational reports showing lower marginal costs per interaction alongside documented quality/trust issues; primarily observational and model-based reasoning.
Rating: medium · Direction: mixed · Source: The Effectiveness of ChatGPT in Customer Service and Communi... · Outcomes: marginal cost per interaction; quality/trust metrics (accuracy, escalation, chur...

Claim: Full automation entails trade-offs unfavorable to complex service quality and trust; hybrid models with human-in-the-loop control are preferable.
Evidence: Synthesis of case studies, pilot results, and conceptual reasoning comparing fully automated routing to hybrid/human-in-the-loop deployments; limited randomized comparisons.
Rating: medium · Direction: mixed · Source: The Effectiveness of ChatGPT in Customer Service and Communi... · Outcomes: service quality metrics; customer trust; escalation rates

Claim: Generative AI can materially improve customer service productivity through 24/7 automation, scalable personalization, and agent augmentation, but it is not a substitute for humans.
Evidence: Synthesis of deployments, pilot studies, vendor reports, and some experimental A/B tests described in the paper; no pooled sample size provided and much evidence is short-run or observational.
Rating: medium · Direction: mixed · Source: The Effectiveness of ChatGPT in Customer Service and Communi... · Outcomes: productivity metrics (handling time, agent productivity), uptime/availability, t...

Claim: Data-driven HRM reinforces skill-biased technological change: routine HR tasks are being substituted by automation while demand rises for analytical and interpersonal skills.
Evidence: Theoretical implication and synthesis across studies in the review noting automation of routine tasks and increased demand for analytic/interpersonal skills.
Rating: medium · Direction: mixed · Source: Data-Driven Strategies in Human Resource Management: The Rol... · Outcomes: employment composition by skill (routine vs analytical/interpersonal), substitut...

Claim: Adoption will be heterogeneous, and distributional effects will follow: organizational readiness, regulatory environments, and industry structure will drive uneven adoption and competitive impacts.
Evidence: Review finds varying adoption patterns in empirical and practitioner literature and synthesizes theoretical reasons for heterogeneity; empirical causal estimates are noted as scarce.
Rating: medium · Direction: mixed · Source: Integrating Artificial Intelligence and Enterprise Resource ... · Outcomes: adoption heterogeneity metrics (e.g., adoption rates across firm sizes/sectors, ...

Claim: One-off AI features typically produce limited returns unless organizations build complementary human and process capabilities and adapt governance and incentives.
Evidence: Interpretive synthesis of case studies and practitioner guidance showing short-lived or limited benefits from isolated feature deployments without complementary investments.
Rating: medium · Direction: mixed · Source: Integrating Artificial Intelligence and Enterprise Resource ... · Outcomes: return on AI investment and persistence of benefits (e.g., ROI, sustained proces...

Claim: Blockchain and decentralized fintech tools could increase transparency and access to alternative assets for women, but practical adoption barriers remain.
Evidence: Qualitative assessment of blockchain capabilities and uptake surveys / case studies cited in the article (product analyses and early adoption data; no large‑scale causal evidence).
Rating: medium · Direction: mixed · Source: Women's Investment Behaviour and Technology: Exploring the I... · Outcomes: access to alternative assets, transparency measures, adoption rates

Claim: AI-enabled macro and fiscal models can improve policy testing and contingency planning but require transparency, validation, and safeguards against overreliance.
Evidence: Conceptual argument and illustrative examples; no empirical trials or model performance metrics reported.
Rating: medium · Direction: mixed · Source: Governing The Future · Outcomes: quality of policy testing/contingency planning and levels of model transparency/...

Claim: AI shifts the locus of economic governance from static rules to living systems that anticipate shocks and adapt in real time.
Evidence: Policy-analytic framing and scenario-based reasoning within the book; supported by illustrative examples rather than empirical measurement.
Rating: medium · Direction: mixed · Source: Governing The Future · Outcomes: degree to which governance systems operate as adaptive, real-time 'living system...

Claim: Expected differential wage pressure: wages are likely to fall for routine/low‑skill occupations and rise or remain stable for high‑skill workers who possess complementary AI skills.
Evidence: Econometric studies summarized in the review (cross‑sectional and panel regressions) and theoretical consistency with SBTC; the review highlights heterogeneity in findings and limited long‑run causal certainty.
Rating: medium · Direction: mixed · Source: The Impact of AI Machine Learning on Human Labor in the Work... · Outcomes: wage trajectories by skill level (routine/low‑skill vs high‑skill complementary ...

Claim: AI contributes to skills polarization: demand rises for advanced cognitive, digital, and socio‑emotional skills while routine cognitive and manual task demand declines.
Evidence: Theoretical integration (SBTC), task decomposition studies showing shifts in task demand by skill content, and labour‑market analyses reporting changes in occupational skill mixes; evidence comes from cross‑sectional and panel studies summarized in the review.
Rating: medium · Direction: mixed · Source: The Impact of AI Machine Learning on Human Labor in the Work... · Outcomes: demand for different skill categories (advanced cognitive/digital/socio‑emotiona...

Claim: AI/ML has a dual, sector- and skill-dependent effect on labor: widespread displacement of routine and lower-skilled tasks coexists with augmentation of professional and cognitive work and the creation of new labor forms (gig, platform-mediated, and human–AI hybrid roles).
Evidence: Systematic synthesis of peer‑reviewed empirical studies, industry and policy reports, task‑based analyses, and firm/establishment case studies across cross‑country and sectoral analyses; empirical approaches include econometric (cross‑sectional and panel) studies linking automation/AI adoption to employment and wages, task decomposition analyses, and surveys of firm adoption and restructuring. The review notes heterogeneity across studies and limited long‑run causal evidence.
Rating: medium · Direction: mixed · Source: The Impact of AI Machine Learning on Human Labor in the Work... · Outcomes: employment composition and task allocation (displacement of routine/low‑skill ta...

Claim: Automation of routine drafting tasks by generative legal AI (GLAI) may reduce demand for junior drafting labor while increasing demand for skilled reviewers, auditors, and legal technologists.
Evidence: Labor-market reasoning based on task automation literature and illustrative vignettes; no labor-force survey or longitudinal employment data provided.
Rating: medium · Direction: mixed (negative for junior drafting roles, positive for reviewer/technologist roles) · Source: Why Avoid Generative Legal AI Systems? Hallucination, Overre... · Outcomes: employment demand by role (junior drafters vs. skilled reviewers/auditors/techno...

Claim: The dominant mechanism behind the performance drop is a collapse of Type2_Contextual issue detection at config_B, consistent with attention dilution in long contexts.
Evidence: Analysis of issue-type-specific detection rates shows Type2_Contextual detection collapses at config_B; interpretation ties this to attention dilution in longer contexts.
Rating: medium · Direction: negative · Source: SWE-PRBench: Benchmarking AI Code Review Quality Against Pul... · Outcomes: Type2_Contextual issue detection rate

Claim: Technological transformation in agentic finance is economically inevitable, and proactive intervention is critically urgent.
Evidence: Author claim synthesizing the paper's argument and modeling results (normative conclusion based on earlier analysis and assertions, not a validated empirical finding).
Rating: medium · Direction: negative · Source: STRENGTHENING FINANCIAL WORKFORCE COMPETITIVENESS: A CURRICU... · Outcomes: likelihood of technology-driven structural change in the finance workforce

Claim: Surveillance intensity is associated with hyper-vigilance (reported effect = -4.213).
Evidence: One of the six propositions from the paper's trilevel framework; the abstract reports an effect value of '-4.213' associated with surveillance intensity → hyper-vigilance.
Rating: medium · Direction: negative · Source: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcomes: hyper-vigilance (psychological arousal/state)

Claim: Platform workers receive 36.3% more third-party ratings than traditional workers.
Evidence: Quantitative synthesis/summary reported in the paper (no primary sample size in abstract); likely aggregated from included studies.
Rating: medium · Direction: negative · Source: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcomes: number of third-party ratings received

Claim: Platform workers experience 59.6% higher digital speed determination than traditional workers.
Evidence: Quantitative synthesis/summary reported in the paper (no primary sample size given in the abstract); presumably aggregated from included studies comparing platform and traditional workers.
Rating: medium · Direction: negative · Source: Algorithmic Control and Psychological Risk in Digitally Mana... · Outcomes: digital speed determination

Claim: Our findings surface practical limits on the complexity people can manage in human-AI negotiation.
Evidence: Synthesis claim based on the empirical study varying the number of issues and the observed decline in performance beyond three issues; presented as a conceptual/practical implication of the results.
Rating: medium · Direction: negative · Source: From Overload to Convergence: Supporting Multi-Issue Human-A... · Outcomes: maximum manageable negotiation complexity (number of issues before performance d...

Claim: Bias effects vary by vulnerability type, with injection flaws being more susceptible to framing bias than memory corruption bugs.
Evidence: Subgroup analysis in Study 1 comparing framing sensitivity across vulnerability classes (injection vs memory corruption) within the experiment dataset.
Rating: medium · Direction: negative · Source: Measuring and Exploiting Confirmation Bias in LLM-Assisted S... · Outcomes: change in vulnerability detection rate by vulnerability type

Claim: Low internal conflict or unanimity can be diagnostic of variance depletion (i.e., exclusion) rather than healthy integration, so governance systems should treat low conflict as a potential red flag until heterogeneity integration is verified.
Evidence: Interpretive policy implication derived from the model's demonstration that exclusionary processes can produce deceptively low observed disagreement while increasing fragility; this recommendation is based on theoretical reasoning without empirical validation in the paper.
Rating: medium · Direction: negative · Source: Cohesion as Concentration: Exclusion-Driven Fragility in Fin... · Outcomes: internal conflict levels (observed dissent/unanimity) as indicator of variance d...

Claim: Most existing candidate matching systems act as keyword filters, failing to handle skill synonyms and nonlinear careers, resulting in missed candidates and opaque match scores.
Evidence: Paper's introductory assertion about limitations of most current systems; the excerpt does not cite empirical studies, statistics, or systematic reviews to substantiate this claim.
Rating: medium · Direction: negative · Source: JobMatchAI An Intelligent Job Matching Platform Using Knowle... · Outcomes: limitations of extant systems: keyword-filter behavior, failure on skill synonym...

Claim: TDD (test-driven development) prompting alone increased regressions to 9.94%.
Evidence: Empirical result reported in the paper comparing a TDD prompting intervention against other workflows on the benchmark (values given in the excerpt).
Rating: medium · Direction: negative · Source: TDAD: Test-Driven Agentic Development - Reducing Code Regres... · Outcomes: regression rate (percentage of tests that regressed) under TDD prompting

Claim: Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied.
Evidence: Paper's critique of existing benchmark literature and practices (asserted by the authors in background; no specific benchmark survey details in the excerpt).
Rating: medium · Direction: negative · Source: TDAD: Test-Driven Agentic Development - Reducing Code Regres... · Outcomes: coverage of regression measurement in existing benchmarks

Claim: The paper identifies five structural challenges arising from the memory governance gap: memory silos across agent workflows; governance fragmentation across teams and tools; unstructured memories unusable by downstream systems; redundant context delivery in autonomous multi-step executions; and silent quality degradation without feedback loops.
Evidence: Qualitative analysis and problem framing presented in the paper (authors' identification of five specific challenges).
Rating: medium · Direction: negative · Source: Governed Memory: A Production Architecture for Multi-Agent W... · Outcomes: presence/identification of five structural governance challenges

Claim: AI raises managerial cognitive complexity and creates recurring tensions between algorithmic optimisation and systemic, ethical reasoning.
Evidence: Theoretical synthesis highlighting emergent tensions from integrating computational optimisation with systems thinking and ethical considerations; conceptual, no empirical tests.
Rating: medium · Direction: negative · Source: Comparative analysis of strategic vs. computational thinking... · Outcomes: managerial cognitive complexity and frequency/severity of optimisation vs ethica...

Claim: Underprovision of verification is likely if left to market forces because information quality has positive externalities and misinformation imposes negative externalities, justifying public funding, subsidies, or regulation.
Evidence: Economic reasoning and policy implications drawn from the study's findings and the literature on public goods/externalities.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: level of provision of verification services relative to social optimum

Claim: Censorship, restricted data flows, and government interference fragment markets, limit economies of scale, and favor well-resourced, internationally connected actors, widening capacity gaps.
Evidence: Interpretive economic analysis grounded in observed access constraints and comparative case material across the three platforms.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: market fragmentation and distribution of capacity among actors

Claim: Limited data access and censorship reduce the efficacy of AI tools by creating training and validation gaps; legal risks complicate use of proprietary platforms and cloud services.
Evidence: Interviews describing constraints on data availability and legal/operational barriers to using some platforms and cloud services; interpretive analysis of implications for AI training/validation.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: AI tool effectiveness (training/validation quality) and deployability

Claim: Generative AI increases the volume and sophistication of misinformation (deepfakes, fabricated documents), raises false-positive risks, and can be weaponized by state or nonstate actors.
Evidence: Interview accounts and qualitative analysis noting observed or anticipated misuse of generative models and associated verification challenges.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: misinformation volume/sophistication and verification error risk

Claim: Resource constraints (limited staff time, funding, and technical capacity) are recurring operational challenges for these platforms.
Evidence: Staff and stakeholder interviews plus analysis of organizational reports indicating staffing, funding, and technical limitations.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: staffing levels, funding availability, technical capacity

Claim: Platforms experience difficulty building and retaining audience trust and engagement, especially in contexts of high public skepticism or polarization.
Evidence: Interview data from platform staff describing audience engagement challenges, supported by analysis of audience-focused platform formats and community-reporting strategies.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: audience trust and engagement levels

Claim: Platforms face limited or asymmetric access to primary data sources such as platform APIs, state data, and archives.
Evidence: Interview accounts and document analysis noting restricted API access and barriers to state-held data and archives across the three cases.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: access to primary data sources

Claim: Censorship and legal risks constrain reporting and distribution for these fact-checking platforms.
Evidence: Consistent reports from interview subjects and corroborating document analysis indicating legal/censorship-related limitations on publishing and distribution.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: reporting frequency, distribution channels, and content choices

Claim: Political instability, legal pressure, and censorship strongly shape what platforms can investigate, publish, and access in the region.
Evidence: Thematic findings from semi-structured interviews with platform staff and document analysis of public reports and policy statements across the three country cases.
Rating: medium · Direction: negative · Source: Fact-Checking Platforms in the Middle East: A Comparative St... · Outcomes: ability to investigate, publish, and access information

Claim: AI can augment measurement (e.g., collaboration patterns, output tracking), but if poorly designed it may reinforce visibility biases that disadvantage remote workers.
Evidence: Theoretical reasoning and literature citations about algorithmic bias and monitoring; illustrated with secondary examples rather than primary empirical tests.
Rating: medium · Direction: negative · Source: The Sociology of Remote Work and Organisational Culture: How... · Outcomes: measurement bias; differential visibility; career impacts for remote workers

Claim: Hybrid arrangements can exacerbate inequities in access to informal networks and career advancement, often privileging co-located or better-networked employees.
Evidence: Theoretical integration of sociological and management studies with comparative case illustrations; secondary data examples referenced but no new causal empirical tests reported.
Rating: medium · Direction: negative · Source: The Sociology of Remote Work and Organisational Culture: How... · Outcomes: access to informal networks; promotion/career advancement rates

Claim: Hybrid and remote work create risks of professional invisibility, fragmented social networks, and unequal access to workplace social capital.
Evidence: Literature synthesis and illustrative case studies drawn from secondary sources; qualitative/comparative case evidence rather than primary quantitative data.
Rating: medium · Direction: negative · Source: The Sociology of Remote Work and Organisational Culture: How... · Outcomes: professional visibility; social network cohesion; access to workplace social cap...

Claim: Traditional STP showed a 67% performance decline after six months in unstable market conditions.
Evidence: Empirical observation reported in the study, likely derived from simulation scenarios and/or longitudinal analysis of behavioral data; the precise data source (simulation vs. observed field data), statistical tests, and sample framing are not specified in the summary.
Rating: medium · Direction: negative · Source: The Algorithmic Canvas: On the Autopoietic Redefinition of S... · Outcomes: effectiveness/performance of traditional STP over time (decline over six months ...

Claim: The persistence of interpretive, human-in-the-loop evaluation implies ongoing labor requirements (annotation, sense-making, governance roles), affecting forecasts of automation and labor substitution in sectors adopting LLMs.
Evidence: Interview reports describing continued manual work for evaluation tasks across participants; the authors draw implications for labor demand.
Rating: medium · Direction: negative · Source: Results-Actionability Gap: Understanding How Practitioners E... · Outcomes: continued human labor requirements for evaluation

Claim: Automation and human–robot assemblages can reproduce subjugation and vulnerability affecting care workers and marginalized users, requiring attention to distributional justice and labor-market impacts.
Evidence: Illustrative vignettes from healthcare robotics and literature synthesis on care ethics and labor impacts; no quantitative labor-market analysis presented.
Rating: medium · Direction: negative · Source: Examining ethical challenges in human–robot interaction usin... · Outcomes: distributional impacts on wages, bargaining power, welfare, and vulnerability of...

Claim: Legal liability regimes and insurance products may systematically under- or mis-assign costs of harm in socio-technical assemblages when primordial ethical demands are considered.
Evidence: Conceptual argument and suggested modeling directions; no empirical simulation or insurance-market data presented.
Rating: medium · Direction: negative · Source: Examining ethical challenges in human–robot interaction usin... · Outcomes: accuracy of cost assignment in liability/insurance regimes for socio-technical h...

Claim: Treating responsibility as a Levinasian, asymmetrical moral obligation implies it operates as a non-contractible externality that markets and contracts may fail to internalize, creating persistent externalities in AI deployment that standard economic models may miss.
Evidence: Theoretical implication derived from philosophical argument applied to economic concepts; suggested consequences but no formal models or empirical validation in the paper.
Rating: medium · Direction: negative · Source: Examining ethical challenges in human–robot interaction usin... · Outcomes: degree to which markets/contracts internalize asymmetrical moral obligations (th...

Claim: Simple pluralist or multi-principle balancing approaches risk reproducing structural subordination by failing to foreground the asymmetrical ethical demand toward vulnerable Others.
Evidence: Normative critique supported by cross-disciplinary literature (care ethics, mediation, STS) and illustrative examples; no empirical test of pluralist approaches' effects.
Rating: medium · Direction: negative · Source: Examining ethical challenges in human–robot interaction usin... · Outcomes: tendency of pluralist balancing approaches to reproduce structural subordination...

Claim: The Levinasian framework helps reveal how human–robot interactions can both expose and reproduce systemic vulnerabilities, subjugation, and unaddressed harms (termed 'Problem C': attribution of responsibility and distributed agency).
Evidence: Theoretical diagnosis supported by interdisciplinary literature synthesis and illustrative vignettes from healthcare robotics, autonomous vehicles, and algorithmic governance; no quantitative prevalence data.
Rating: medium · Direction: negative · Source: Examining ethical challenges in human–robot interaction usin... · Outcomes: presence/manifestation of systemic vulnerabilities, subjugation, and unaddressed...

Claim: Capabilities and data advantages for certain vendors could lead to market concentration and platform dominance in AI-driven educational feedback.
Evidence: Expert concern synthesized from the workshop of 50 scholars about market dynamics; a theoretical warning without empirical market-structure analysis in the report.
Rating: medium · Direction: negative · Source: The Future of Feedback: How Can AI Help Transform Feedback t... · Outcomes: market concentration measures (market share, Herfindahl index); entry barriers; ...

Claim: Differential access to high-quality AI feedback systems and bias in training data can exacerbate educational inequalities and harm marginalized groups.
Evidence: Expert consensus and thematic analysis from the 50-scholar workshop, raising equity and bias risks; no empirical subgroup effectiveness estimates included.
Rating: medium · Direction: negative · Source: The Future of Feedback: How Can AI Help Transform Feedback t... · Outcomes: access disparities; differential effectiveness by subgroup; measures of algorith...

Claim: Learners may over-rely on AI feedback or game systems to obtain desirable responses, reducing effortful learning.
Evidence: Workshop participant concerns synthesized qualitatively; cited as a risk and an open empirical question; no experimental data provided.
Rating: medium · Direction: negative · Source: The Future of Feedback: How Can AI Help Transform Feedback t... · Outcomes: learner reliance on AI (usage patterns); changes in effortful learning behaviors...

Claim: Reliance on single-agent outputs or non-diverse agent ensembles can understate substantive uncertainty and bias conclusions in automated policy evaluation or AI-assisted empirical research.
Evidence: Observed substantial agent-to-agent variability (NSEs) in the experiment (150 agents), demonstrating that single-agent results do not capture between-agent methodological uncertainty; imbalance between model families further implies potential bias if only one family is used.
Rating: medium · Direction: negative · Source: Nonstandard Errors in AI Agents · Outcomes: degree to which single-agent point estimates fail to capture between-agent dispe...

Claim: The post-exemplar convergence largely reflected imitation of exemplar choices rather than demonstrated understanding or principled correction by agents.
Evidence: Qualitative and behavioral analysis of agents' post-exposure outputs showing direct adoption of exemplar measures/procedures and a lack of substantive justification or mechanistic reasoning indicating comprehension; inference based on the content of agent code and writeups after exposure.
Rating: medium · Direction: negative · Source: Nonstandard Errors in AI Agents · Outcomes: qualitative indicators of reasoning/comprehension in agents' outputs (textual ju...