Evidence (7395 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	609	159	77	736	1615
Governance & Regulation	664	329	160	99	1273
Organizational Efficiency	624	143	105	70	949
Technology Adoption Rate	502	176	98	78	861
Research Productivity	348	109	48	322	836
Output Quality	391	120	44	40	595
Firm Productivity	385	46	85	17	539
Decision Quality	275	143	62	34	521
AI Safety & Ethics	183	241	59	30	517
Market Structure	152	154	109	20	440
Task Allocation	158	50	56	26	295
Innovation Output	178	23	38	17	257
Skill Acquisition	137	52	50	13	252
Fiscal & Macroeconomic	120	64	38	23	252
Employment Level	93	46	96	12	249
Firm Revenue	130	43	26	3	202
Consumer Welfare	99	51	40	11	201
Inequality Measures	36	105	40	6	187
Task Completion Time	134	18	6	5	163
Worker Satisfaction	79	54	16	11	160
Error Rate	64	78	8	1	151
Regulatory Compliance	69	64	14	3	150
Training Effectiveness	81	15	13	18	129
Wages & Compensation	70	25	22	6	123
Team Performance	74	16	21	9	121
Automation Exposure	41	48	19	9	120
Job Displacement	11	71	16	1	99
Developer Productivity	71	14	9	3	98
Hiring & Recruitment	49	7	8	3	67
Social Protection	26	14	8	2	50
Creative Output	26	14	6	2	49
Skill Obsolescence	5	37	5	1	48
Labor Share of Income	12	13	12	—	37
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Adoption

Operationalizing hardware-based governance must address transition realities including legacy hardware, attestation at scale, and protection of civil liberties.

Policy implementation analysis in the paper identifying practical challenges to deploying hardware-layer controls (conceptual/operational analysis; no empirical trial data provided).

high mixed The Open-Weight Paradox: Why Restricting Access to AI Models... practical hurdles to governance deployment (legacy hardware, attestation scalabi...

For LLM agents, memory management critically impacts efficiency, quality, and security.

Statement in paper framing and motivation; supported conceptually by literature linking memory design to system properties (no specific experimental details provided in abstract).

high mixed FSFM: A Biologically-Inspired Framework for Selective Forget... efficiency, content quality, and security of LLM agents

Coding patterns are bimodal: in 41% of sessions, agents author virtually all committed code ("vibe coding"), while in 23%, humans write all code themselves.

Empirical analysis of authorship attribution across the 6,000 sessions in the SWE-chat dataset; percentages derived from session-level classification.

high mixed SWE-chat: Coding Agent Interactions From Real Users in the W... distribution of code authorship across sessions (agent-dominant vs human-only se...

A determinism study of 10 replays per case at temperature zero shows both architectures inherit residual API-level nondeterminism, but DPM exposes one nondeterministic call while summarization exposes N compounding calls.

Determinism experiment with 10 replays per case at temperature zero; qualitative/quantitative observation about number of nondeterministic LLM calls exposed by each architecture.

high mixed Stateless Decision Memory for Enterprise AI Agents system nondeterminism / number of nondeterministic LLM calls exposed per decisio...
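
The contrast between one exposed call and N compounding calls can be illustrated with a small calculation. This is a minimal sketch, not the paper's method: it assumes each LLM call independently reproduces its earlier output with the same probability, and the function name `replay_match_probability` and the value 0.99 are hypothetical choices made for illustration.

```python
def replay_match_probability(p_call: float, n_calls: int) -> float:
    """Probability that every call in a chain reproduces its earlier output.

    Assumption (not from the source): calls are independent and share the
    same per-call reproduction probability p_call.
    """
    if not 0.0 <= p_call <= 1.0:
        raise ValueError("p_call must be in [0, 1]")
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return p_call ** n_calls


# Illustrative value only; not taken from the paper.
P_CALL = 0.99

single_call = replay_match_probability(P_CALL, 1)   # one exposed call
ten_calls = replay_match_probability(P_CALL, 10)    # N = 10 chained calls

print(f"1 call:   {single_call:.3f}")
print(f"10 calls: {ten_calls:.3f}")
```

Under these assumptions the chance of an identical replay shrinks geometrically with the number of nondeterministic calls, which is one way to read why exposing a single call matters.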

Advanced prompting methods improve accuracy on inconclusive cases but over-correct, withholding decisions even on clear cases.

Empirical comparison of prompting methods reported in paper: advanced prompts increased accuracy on inconclusive (insufficient-information) cases but led to excessive deferral/withholding on clear cases.

high mixed Learning When Not to Decide: A Framework for Overcoming Fact... accuracy on inconclusive cases and rate of withholding/deferral on clear cases

There is significant heterogeneity in methodological rigor across studies.

Authors' thematic observation from quality appraisal/extraction noting wide variation in methods, validation approaches, and reporting standards among the 64 studies.

high mixed AI-Driven Financial Risk Management and Decision Intelligenc... methodological rigor/quality of studies

AI is increasingly being integrated into both existing and newly emerging digital infrastructures, altering their architecture, functional role, and strategic significance as these systems begin to operate as embedded cognitive infrastructures shaping knowledge production, decision-making, and institutional processes.

Conceptual and descriptive claim presented by the paper (theoretical analysis/literature-informed observation). No empirical sample size or quantitative methods reported in the provided text.

high mixed Digital Sovereignty in the Global Cognitive-Informational Or... change in the architecture/role of digital infrastructures and their effect on k...

Hybrid ML+rules systems achieve partial DES-property fillability.

Result of the paper's analytic comparison across the four architectures identifying relative fillability levels for hybrid ML+rules systems.

high mixed Governed Auditable Decisioning Under Uncertainty: Synthesis ... DES-property fillability

Open-source versus closed-source trade-offs (including deployment architectures and competitive differentiation) are a central strategic consideration when selecting an enterprise LLM approach.

Paper's comparative analysis of open-source and closed-source alternatives and discussion of strategic implications; supported by the Bills Converter design rationale.

high mixed Buy Or Build? A Practitioner’s Framework for Large Language ... strategic positioning / competitive differentiation from LLM architecture choice

AI is becoming a geopolitical tool that defines trade, finance, supply chains, surveillance abilities, and diplomatic bargaining power.

Conceptual/qualitative synthesis in the paper's argument; no empirical methods or sample size reported in the abstract.

high mixed ARTIFICIAL INTELLIGENCE AND THE WEAPONIZATION OF ECONOMIC IN... influence over trade, finance, supply chains, surveillance capabilities, and dip...

The proposed safety-filter outperforms a standalone deep reinforcement learning-based controller in energy and cost metrics, with only a slight increase in comfort temperature violations.

Reported experimental comparison between the safety-filter-enhanced controller and a standalone DRL controller in the paper; specific metrics and sample size not provided in the excerpt.

high mixed Safe Deep Reinforcement Learning for Building Heating Contro... energy metrics, cost metrics, and comfort temperature violations

Confirmatory Factor Analysis (CFA) and Structural Equation Modeling (SEM) verified correlations among educational background, gender inclusiveness, digital literacy, and perceived algorithmic fairness.

Paper reports use of CFA and SEM to test relationships among those variables; reliability/fit supported by Composite Reliability (CR), Average Variance Extracted (AVE), and model-fit indicators.

high mixed A Machine Learning Perspective on FinTech-Driven Inclusion: ... correlations among educational background, gender inclusiveness, digital literac...

Benefits of technology and data analytics are context-dependent, with emerging markets facing unique regulatory and infrastructural barriers.

Narrative synthesis of included studies noting heterogeneity by context and reports of regulatory/infrastructural constraints in emerging markets.

high mixed The Use of Technology and Data Analytics in Modern Auditing:... realized benefits / adoption in varying contexts

Cybersecurity has a moderating effect on audit data analytics.

Synthesis statement in the review summarizing included studies that report cybersecurity influences the effectiveness/usability of audit data analytics.

high mixed The Use of Technology and Data Analytics in Modern Auditing:... effectiveness of audit data analytics

Digitization is reshaping the structures of Resource Dependence Theory (RDT) instead of eliminating it completely (Yordanova & Hristozov, 2025).

Conceptual/theoretical claim supported by citation to Yordanova & Hristozov (2025); presented as an interpretive conclusion about how digitization interacts with organizational dependence structures. No empirical details provided in the excerpt.

high mixed Re-Evaluation of Resource Dependence in AI Enabled SME Finan... structure of resource dependence / organizational dependence on external resourc...

Outcomes are shaped not only by benchmark quality but also by competitive pressure, including user switching, routing decisions, and operational constraints.

Argument/assertion in paper framing motivations for Marketplace Evaluation; conceptual reasoning listing mechanisms (user switching, routing, operational constraints); no empirical tests or sample size reported.

high mixed Evaluation of Agents under Simulated AI Marketplace Dynamics post-deployment system outcomes (e.g., success influenced by competition factors...

Alignment operates as a two-way translation, where models are made 'safe for worlds' while those worlds are reshaped to be 'safe for models.'

Conceptual claim supported by ethnographic examples illustrating reciprocal adaptations between models and social/institutional contexts in Nairobi's credit-scoring ecosystem.

high mixed Risk, Data, Alignment: Making Credit Scoring Work in Kenya reciprocal adjustments between predictive models and social/institutional enviro...

Algorithmic credit scoring is accomplished through the ongoing work of alignment that stabilizes risk under conditions of persistent uncertainty, taking epistemic, modeling, and contextual forms.

The paper's theoretical argument grounded in nine-month ethnographic observations and analysis of how practitioners and institutions engage in alignment work across epistemic, modeling, and contextual dimensions.

high mixed Risk, Data, Alignment: Making Credit Scoring Work in Kenya alignment practices that stabilize risk amid uncertainty (epistemic, modeling, c...

Practitioners negotiate model performance via technical and political means.

Observational data from the ethnography showing technical adjustments, benchmarks, and political negotiation (e.g., with regulators or management) to establish acceptable performance.

high mixed Risk, Data, Alignment: Making Credit Scoring Work in Kenya practices used to achieve and justify model performance (technical tuning and po...

Practitioners formulate risk through multiple interpretations.

Ethnographic evidence from interviews and observations indicating that risk is characterized differently across actors (technical, legal, business interpretations).

high mixed Risk, Data, Alignment: Making Credit Scoring Work in Kenya variation in definitions and framings of risk among practitioners

Practitioners construct alternative data using technical and legal workarounds.

Field observations and interviews showing practitioners employing technical methods and legal strategies to create or repurpose alternative data sources for credit scoring.

high mixed Risk, Data, Alignment: Making Credit Scoring Work in Kenya practices for generating and using alternative data in credit models

Algorithmic credit scoring is being transformed by new actors, techniques, and shifting regulations.

Ethnographic fieldwork documenting the entry of new actors, novel technical techniques, and regulatory changes affecting credit scoring in Nairobi's digital lending ecosystem.

high mixed Risk, Data, Alignment: Making Credit Scoring Work in Kenya structural transformation of algorithmic credit scoring (actor composition, tech...

Credit scoring is an increasingly central and contested domain of data and AI governance.

Nine-month ethnography of credit scoring practices in Nairobi, Kenya; participant observation and interviews across stakeholders in digital lending.

high mixed Risk, Data, Alignment: Making Credit Scoring Work in Kenya role of credit scoring in data and AI governance (centrality and contestedness)

The local labor market will follow a dual trajectory: low-skill, routine jobs face high automation risk while demand will rise for AI-collaborative, higher-skill roles.

Paper's analytical prediction based on distinguishing current job roles into routine/repetitive vs cognitive/non-routine and projecting likely impacts; no numeric forecasts or sample sizes provided in the excerpt.

high mixed PREDICTING THE FUTURE OF JOBS IN NAGPUR DISTRICT MIDC: THE R... combined job displacement for routine roles and increased demand for AI-collabor...

Professional and Technical Services, Information, and Finance and Insurance account for approximately 86 percent of the base-case direct contribution.

Sectoral decomposition of base-case direct contribution in the model; paper explicitly reports the three sectors' combined share as ~86%.

high mixed AI Capex Is Justified: A Bottom-Up Sectoral Estimate of Arti... share of base-case direct GDP contribution by sector (three-sector concentration...

The inverted U-shaped pattern between AI knowledge stickiness and technological concentration is more clearly detected in eastern cities and in small and medium-sized cities; in large cities the quadratic term is not statistically significant.

Heterogeneity/subsample regressions by region (east vs. other) and city size categories within the city-year panel (2014–2023); statistical significance of quadratic term differs across subsamples.

high mixed Knowledge stickiness and technological concentration in the ... technological concentration (presence and significance of nonlinear relationship...

Technological complexity moderates the nonlinear (inverted U) association between AI knowledge stickiness and technological concentration by altering its strength and curvature rather than producing a simple, uniform shift in the turning point.

Interaction/heterogeneity analyses in the two-way fixed-effects city-year panel (2014–2023), examining moderating role of a technological complexity measure on the quadratic association.

high mixed Knowledge stickiness and technological concentration in the ... technological concentration (degree and curvature of the stickiness–concentratio...

There is an inverted U-shaped association between AI knowledge stickiness and technological concentration: concentration rises with stickiness up to a turning point and declines thereafter.

City-year panel combining AI patent applications with urban statistics for 2014–2023; two-way fixed-effects regression showing a significant positive linear and negative quadratic term (nonlinear association).

high mixed Knowledge stickiness and technological concentration in the ... technological concentration (allocation of AI activity across sub-technology bra...
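
The inverted U-shape described above can be written out as a worked equation. This is a sketch of a generic two-way fixed-effects quadratic specification, not the paper's exact model: the symbols $C_{it}$ (technological concentration in city $i$, year $t$), $S_{it}$ (AI knowledge stickiness), $X_{it}$ (controls), $\mu_i$ (city effects), and $\lambda_t$ (year effects) are assumed notation.

```latex
\begin{align}
C_{it} &= \beta_1 S_{it} + \beta_2 S_{it}^{2} + \gamma^{\top} X_{it}
          + \mu_i + \lambda_t + \varepsilon_{it},
          \qquad \beta_1 > 0,\ \beta_2 < 0, \\
\frac{\partial C_{it}}{\partial S_{it}} &= \beta_1 + 2\beta_2 S_{it} = 0
          \;\Longrightarrow\; S^{*} = -\frac{\beta_1}{2\beta_2} > 0 .
\end{align}
```

With a positive linear term and a negative quadratic term, concentration increases in stickiness below $S^{*}$ and decreases above it. The shape is only meaningful empirically if $S^{*}$ falls inside the observed range of stickiness.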

Subjectivity persisted in AI-powered recruitment decisions; human judgment remained an important factor.

Theme 2 (subjectivity in AI-powered recruitment) from interviews indicating retained human subjectivity and judgement in recruitment processes (n = 22).

high mixed The augmented recruiter: examining AI integration and decisi... degree of subjectivity in decision-making

Big data analytics (BDA) adoption is a risky strategy with potentially high rewards for start-ups.

Stated as a summary conclusion based on empirical analysis of a large sample of start-ups in Germany comparing adopters and non-adopters across multiple performance measures (survival, costs, sales, employee growth, access to financing).

high mixed Big data-based management decisions and start-up performance overall performance/risk–reward tradeoff

Bounded agents act as an amplifying but not necessary extension to the foundation-model stack for changing work coordination.

Conceptual argument within the paper distinguishing bounded agents from the core stack; no empirical comparison or measurement reported.

high mixed Remote-Capable Knowledge Work Should Default to AI-Enabled F... role of bounded agents in amplifying coordination impacts

The spatial spillover effects are geographically constrained and vary significantly across regions.

Reported heterogeneity in spatial Durbin model results and discussion of geographic constraint and inter-regional variation (regional heterogeneity analysis).

high mixed Research on the Pathways and Spatial Effects of Digital–Inte... heterogeneity of spatial spillover effects on carbon intensity across regions

The effects of generative AI on work and organisations are heterogeneous and context-dependent, shaped by job roles, skill levels, and institutional environments.

Synthesis across the included studies noting variation in outcomes conditional on role, skill, and institutional context.

high mixed Generative AI in the Workplace: A Systematic Review of Produ... heterogeneity of AI effects across roles/skills/institutions

Overall, AI emerges as a transformative but context-dependent tool for business decision-making in Latin America.

The authors' overall interpretation and synthesis of the 27 reviewed studies highlighting variable outcomes depending on context and readiness.

high mixed Artificial Intelligence for Business Decision-Making in Lati... overall impact of AI on business decision-making (transformative effect conditio...

Although the concurrent paradigm performs worse than the sequential paradigm in terms of immediate task performance, it is more effective in promoting users' emotional trust.

Comparison between concurrent and sequential AI-assisted decision-making paradigms in the RCT (N=120); authors report concurrent < sequential for immediate task performance, but concurrent > sequential for emotional trust.

high mixed How AI-Assisted Decision-Making Paradigms and Explainability... immediate task performance (negative) and emotional trust (positive)

AI adoption outcomes depend on organizational routines, data arrangements, accountability structures, and public values.

Empirical and theoretical literature review and argument in the article drawing on scholarship in digital government and public-sector technology adoption.

high mixed Governing frontier general-purpose AI in the public sector: ... determinants of AI adoption in government (organizational, data, accountability,...

If employment losses are relatively small and productivity gains are realised, AI adoption could boost Exchequer revenues. But if job displacement is sizeable, tax receipts fall while welfare spending rises, resulting in potentially large pressures on the public finances.

Conditional fiscal scenarios simulated in the report combining employment, wage and benefit changes with the public finance implications (tax receipts and welfare spending); reported as scenario-based outcomes.

high mixed Artificial Intelligence and income inequality in Ireland Exchequer revenues / tax receipts and welfare spending

Ireland’s tax and welfare system absorbs most of the income loss for lower income households, and roughly half of the loss for households at the top of the income distribution.

Microsimulation using SWITCH to model taxes and transfers applied to simulated income changes across income groups; reported as a finding in the report.

high mixed Artificial Intelligence and income inequality in Ireland net income after taxes and transfers (absorption of income loss)

India exhibits a distinctive polarisation pattern: a shrinking middle-skill workforce alongside a persistently large low-skill labour segment.

Descriptive analysis of secondary data and official reports from 2020–2024 comparing occupational and skill distributions in India.

high mixed Artificial Intelligence and labour market polarisation in In... changes in the share of labour across skill bands (middle vs low skill)

Mathematics (SAFI: 73.2) and Programming (71.8) receive the highest automation feasibility scores; Active Listening (42.2) and Reading Comprehension (45.5) receive the lowest.

SAFI benchmark results reported for specific O*NET skills (numerical SAFI scores provided in the paper).

high mixed The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... SAFI score by skill (automation feasibility)

Only a small subset of LLM retailers can consistently achieve capital appreciation, while many hover around the break-even point.

Empirical results from the 20-agent benchmark experiments reported in the paper, contrasting capital appreciation for winners vs break-even for many agents.

high mixed Market-Bench: Benchmarking Large Language Models on Economic... capital appreciation / agent profitability

Benchmarking on 20 open- and closed-source LLM agents reveals significant performance disparities and a winner-take-most phenomenon.

Empirical evaluation described in the paper using 20 LLM agents (open- and closed-source); results reported show uneven performance distribution.

high mixed Market-Bench: Benchmarking Large Language Models on Economic... performance (financial/competitive outcomes of retailer agents)

Tool developers, users, and social scientists conceptualize 'context' differently, and these divergent conceptualizations reveal specific pitfalls inherent in computational approaches to context.

Analytic comparison across stakeholder perspectives derived from interviews and conceptual analysis in the paper (qualitative evidence; sample size unspecified).

high mixed Context Collapse: Barriers to Adoption for Generative AI in ... differences in conceptual definitions and the resulting pitfalls for computation...

AI adoption significantly reshaped task profiles for 73% of respondents, particularly affecting routine data processing, administrative tasks, and scheduling activities.

Survey data and secondary data analysis reported in this study (sample size not stated); self-reported change in task profiles with reported percentage (73%).

high mixed Artificial Intelligence Adoption and Career Reconfiguration ... task profile change (impact on routine data processing, administrative tasks, sc...

There is a robust inverted U-shaped relationship between robotics manufacturing development and urban carbon emissions.

Panel data analysis using 277 Chinese prefecture-level cities from 2008 to 2019; econometric analysis reported in the paper finds an inverted U-shaped association and robustness checks are claimed.

high mixed Exploring the nonlinear relationship between robotics manufa... urban carbon emissions

AI adoption across firms is heterogeneous, varying across sectors such as finance, technology, and manufacturing.

Survey of 150 leading Nigerian firms across finance, tech, and manufacturing showing variation in AI integration; supported by qualitative interviews and policy analysis.

high mixed Human Capital and the AI-Powered Future of Work: (Training, ... heterogeneity in AI adoption across firms/sectors

The rapid, heterogeneous integration of Artificial Intelligence (AI) technologies is profoundly reshaping the dynamics of work across the Nigerian business sector, generating both significant economic opportunities and acute labor market challenges.

Mixed-methods study combining a quantitative survey of 150 leading Nigerian firms across finance, tech, and manufacturing and qualitative analysis of government policy and workforce interviews.

high mixed Human Capital and the AI-Powered Future of Work: (Training, ... dynamics of work (economic opportunities and labor market challenges)

Both rapid model improvement and benchmark quality issues contributed to underestimating agent capabilities.

Synthesis of results: improved LLM performance plus audit findings showing benchmark errors together explain the prior underestimation; based on the re-evaluation and audit described in the paper.

high mixed ELT-Bench-Verified: Benchmark Quality Issues Underestimate A... factors contributing to underestimation of agent capabilities (model improvement...

Models performed well on commonly discussed topics but struggled with specialized health data.

Task-level performance comparison across topics in the elicited population statistics: better accuracy on commonly discussed topics, poorer performance on specialized health data tasks.

high mixed Bayesian Elicitation with LLMs: Model Size Helps, Extra "Rea... topic-specific estimation accuracy

In a preliminary experiment, giving models web search access degraded predictions for already-accurate models, while modestly improving predictions for weaker ones.

A preliminary comparative test where some models were given web search access and changes in predictive performance were observed: degradation for already-accurate models and modest improvement for weaker models.

high mixed Bayesian Elicitation with LLMs: Model Size Helps, Extra "Rea... change in predictive accuracy with web search access
