Evidence (4333 claims)

Claims by topic:

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Governance
Persistent Gini coefficients of 0.43 to 0.62 across all conditions indicate concentrated detection inequality.
Reported range of Gini coefficients from simulation experiments across conditions.
Experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15,714 in 2019.
Reported experimental result from simulations on Baltimore data giving mean annual DIR up to 15,714 for 2019.
We compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score.
Statement of metrics computed and the number of observations (264 city-year-mode observations) reported in the paper.
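Two of the four metrics can be sketched from their standard definitions; the paper's exact formulas are not reproduced here, and the sample numbers are illustrative assumptions:

```python
def disparate_impact_ratio(rate_a, rate_b):
    """Ratio of detection rates between two demographic groups."""
    return rate_a / rate_b

def gini(values):
    """Gini coefficient of detections across areas (0 = equal, 1 = fully concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * total) - (n + 1) / n

dir_ = disparate_impact_ratio(0.30, 0.02)  # detection rate 0.30 vs 0.02 gives DIR of about 15
g = gini([1, 1, 2, 10])                    # -> 0.5 (detections concentrated in one of four areas)
```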
The study uses 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data.
Reported dataset sizes and data sources in the paper (crime records from Baltimore and Chicago; ACS demographic augmentation).
We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline from crime occurrence to police contact.
Description of methods in paper: coupling a GAN (CTGAN) for synthetic crime generation with a Noisy OR detection/patrol model; method-level claim rather than a numerical result.
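A Noisy-OR model combines independent per-cause detection probabilities; a minimal sketch under that standard formulation (the paper's parameterization and the probabilities below are assumptions):

```python
from math import prod

def noisy_or_detection(exposure_probs):
    """Probability an incident is detected given independent patrol 'causes'.

    Each entry is the chance a single patrol pass detects the incident on
    its own; Noisy-OR assumes the causes fail independently, so overall
    detection is one minus the product of the failure probabilities.
    """
    return 1.0 - prod(1.0 - p for p in exposure_probs)

# Three patrol passes with per-pass detection chances 0.2, 0.1, and 0.05:
p = noisy_or_detection([0.2, 0.1, 0.05])  # 1 - 0.8*0.9*0.95, about 0.316
```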
Empirical simulations of five game scenarios (ranging from repeated prisoner's dilemma to stylized repeated marketing promotion games) validate the theoretical predictions: AI agents naturally exhibit the proposed reasoning patterns and attain stable equilibrium behaviors intrinsically.
Simulation experiments reported in the paper across five distinct game scenarios; these simulations are presented as empirical validation of the theoretical results.
Relaxing the common-knowledge payoff assumption—allowing stage payoffs to be unknown and each agent to observe only its own privately realized stochastic payoffs—still yields the same on-path Nash convergence guarantee.
Theoretical extension/proof in the paper showing convergence results hold under private, stochastic stage payoffs (no common-knowledge of payoffs).
We prove that 'reasonably reasoning' agents—agents capable of forming beliefs about others' strategies from previous observation and learning to best respond to these beliefs—eventually behave along almost every realized play path in a way that is weakly close to a Nash equilibrium of the continuation game.
Formal theoretical proof provided in the paper (mathematical analysis of agent belief-formation and best-response learning leading to on-path closeness to Nash equilibria).
Off-the-shelf reasoning AI agents can achieve Nash-like play zero-shot, without explicit post-training.
Stated claim in the paper supported by a combination of theoretical results (formal proofs about convergence properties of 'reasonably reasoning' agents) and empirical simulations across five game scenarios (including repeated prisoner's dilemma and stylized repeated marketing promotion games).
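As a minimal illustration of belief-based play in one of the listed scenarios, the sketch below runs fictitious play in a repeated prisoner's dilemma: each agent best-responds to the empirical frequency of its opponent's past actions. With these standard (assumed) payoffs, play settles on the stage-game Nash equilibrium (defect, defect); the paper's agents reason about continuation games, so this illustrates only the convergence mechanic, not the full result.

```python
# Row-player payoffs: C vs C -> 3, C vs D -> 0, D vs C -> 5, D vs D -> 1.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_response(opp_counts):
    """Best-respond to the empirical frequency of the opponent's actions."""
    total = sum(opp_counts.values())
    belief = {a: n / total for a, n in opp_counts.items()}
    expected = lambda my: sum(belief[o] * PAYOFF[(my, o)] for o in ("C", "D"))
    return max(("C", "D"), key=expected)

def play(rounds=50):
    counts = [{"C": 1, "D": 1}, {"C": 1, "D": 1}]  # uniform prior beliefs
    history = []
    for _ in range(rounds):
        a = best_response(counts[0])  # player 0 responds to its beliefs about player 1
        b = best_response(counts[1])
        counts[0][b] += 1             # each player observes the other's action
        counts[1][a] += 1
        history.append((a, b))
    return history
```

Because defection strictly dominates in the stage game, `play()` returns `("D", "D")` in every round regardless of the prior beliefs.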
This paper employs large language models to conduct semantic analysis on the text of annual reports from Chinese A-share listed companies from 2006 to 2024.
Methodological statement in the abstract describing use of LLM-based semantic analysis on annual report texts spanning 2006–2024.
The paper recommends that the government design targeted support tools to 'enhance market returns and alleviate financing constraints', adopt a differentiated regulatory strategy, and establish a disclosure mechanism combining 'professional identification and reputational sanctions' to curb peer AI washing behaviour.
Policy prescriptions derived from empirical findings and simulation results reported in the paper; presented as recommendations in the abstract.
Simulation results indicate that a combination of policy tools can effectively improve market equilibrium (mitigating the negative effects of AI washing).
Simulation exercises reported in the paper (model specification not provided in abstract) testing policy tool combinations and their effects on market equilibrium.
The paper proposes design principles for effective, accountable, and adaptive sandboxes to contribute to debates on experimentalism in AI governance.
Stated contribution of the paper (descriptive claim about content; abstract does not list the principles or empirical testing).
Regulatory sandboxes (RSs) have emerged as a potential solution to AI regulatory challenges.
Descriptive observation and normative framing within the paper; contextual reference to the EU AI Act's treatment of sandboxes (no empirical sample reported in the abstract).
External inputs that bypass internal filtering shorten recognition delays (i.e., speed up detection of regime shifts).
Model extensions/analysis showing that when some inputs are allowed to bypass internal exclusion mechanisms, the dynamics of anchor updating detect regime changes faster; result comes from theoretical model manipulations, not empirical testing.
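The mechanism can be sketched with a toy anchor-updating model (all parameter names and values are ours, not the paper's): an exponentially smoothed anchor ignores observations far from its current value unless they bypass the filter, so a higher bypass rate shortens the time to register a mean shift.

```python
import random

def rounds_to_detect(bypass_rate, shift=5.0, alpha=0.2, band=2.0, tol=1.0):
    """Rounds until the anchor moves within `tol` of a new regime mean.

    The anchor updates by exponential smoothing, but an internal filter
    excludes observations more than `band` away from the current anchor;
    with probability `bypass_rate` an input skips that filter.
    """
    random.seed(0)
    anchor, t = 0.0, 0
    while abs(anchor - shift) > tol and t < 10_000:
        t += 1
        x = shift  # after the regime change, inputs arrive from the new mean
        blocked = abs(x - anchor) > band and random.random() > bypass_rate
        if not blocked:
            anchor += alpha * (x - anchor)
    return t

# Full bypass registers the shift in a handful of rounds; heavy filtering is slower:
fast, slow = rounds_to_detect(1.0), rounds_to_detect(0.1)
```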
Immediate practical steps include improved documentation, stakeholder audits, and multi‑metric evaluation; medium‑term steps include standards for participatory evaluation and tooling for transparency and monitoring; long‑term steps include institutional governance, interoperable safety APIs, and public‑interest evaluation infrastructure.
Prescriptive roadmap in the paper based on conceptual analysis and prior literature; these are recommended policy/program milestones rather than empirically validated interventions.
Transparency (detailed documentation of data, objectives, evaluation processes, and deployment constraints; audit and contest mechanisms) is a necessary mechanism for accountable alignment.
Normative and practical argumentation supported by prior work on model cards, documentation standards, and auditing; no new audits are presented in the paper.
Pluralistic evaluation—using multiple, diverse evaluation criteria and stakeholder‑informed metrics rather than single aggregated alignment scores—will better capture the values and harms at stake.
Argumentative rationale and literature synthesis advocating multi‑metric evaluation approaches; examples from prior evaluation critiques are referenced rather than new empirical comparison.
The Flourishing–Justice–Autonomy (FJA) framework should guide alignment efforts, emphasizing (1) Flourishing (human well‑being and meaningful opportunities), (2) Justice (distributional fairness and protection of vulnerable groups), and (3) Autonomy (informed choice and user control).
Prescriptive proposal grounded in conceptual analysis and synthesis of ethical and technical literature; the paper defines and motivates the three principles as its core normative contribution.
The positive spillover effects of CAFTA on third‑country agricultural imports are concentrated in medium and large firms.
Heterogeneity analysis using firm‑size subgroup DID estimates derived from the China Industrial Enterprise Database (2000–2014) showing stronger effects for medium and large enterprises.
CAFTA induced spillovers that significantly increased China's agricultural imports from non‑ASEAN (third) countries.
Difference‑in‑differences (DID) estimation exploiting CAFTA as an exogenous shock; import outcomes drawn from China Customs Database 2000–2014; robustness checks reported (mediator tests and subgroup analyses).
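The identifying logic of the canonical two-group, two-period DID can be sketched as follows; the numbers are hypothetical, and the paper's regression version additionally includes controls and robustness checks:

```python
from statistics import mean

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """2x2 difference-in-differences: the change over time for treated
    units minus the change over time for control units."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# Hypothetical import values: treated firms rise 10 -> 18 while controls
# rise 10 -> 12, so the estimated spillover effect is 8 - 2 = 6.
effect = did_estimate([10, 10], [18, 18], [10, 10], [12, 12])  # -> 6.0
```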
The report issues seven policy recommendations grouped into three goals: (1) improve understanding of the emerging threat, (2) strengthen defenses, and (3) ensure responsible development and deployment.
Policy synthesis based on threat analysis and governance review (report-authored recommendations; descriptive).
The main results are robust to inclusion of firm, industry, and year fixed effects, DID identification using the 2018 SCD pilot, and multiple robustness checks addressing potential confounders and endogeneity.
Authors report baseline regressions with firm/industry/year fixed effects, DID specifications exploiting the 2018 Supply Chain Innovation and Application Pilot Program as a quasi-natural experiment, and a battery of robustness tests (alternative specifications, controls, and checks).
The positive effect of SCD on green innovation is stronger for substantive green innovation (actual environmentally beneficial R&D and technologies) than for strategic green innovation (symbolic/labeling or reputation‑oriented activities).
Heterogeneous outcome analysis splitting green innovation into 'substantive' (e.g., green patents, technological R&D outputs) versus 'strategic' (signaling/compliance indicators); regression and DID estimates show larger and statistically significant coefficients for substantive measures compared to smaller or weaker effects on strategic measures.
Supply chain digitalization (SCD) significantly increases corporate green innovation among Chinese A-share listed firms (2012–2022).
Panel analysis of Chinese A-share listed firms over 2012–2022 using regression models with firm, industry, and year fixed effects; difference-in-differences (DID) identification exploiting the 2018 Supply Chain Innovation and Application Pilot Program as an exogenous shock to SCD; firm-level controls included; multiple robustness checks reported.
Algorithmic transparency and interpretability are important so investors and regulators can understand how ESG inputs affect automated decision systems.
Normative recommendation grounded in literature on model risk, accountability, and regulatory needs; not an empirical finding but a consensus implication of reviewed work.
Research priorities include empirically quantifying AI's effects on productivity, wages, inequality, and environmental costs; developing standardized sustainability and governance metrics; and evaluating regulatory impacts on innovation and welfare.
Stated research agenda based on gaps identified in the narrative review; identifies directions for future empirical work rather than presenting new empirical findings.
AI has progressed from symbolic systems to data-driven, generative architectures and large-scale computational infrastructures, becoming a foundational technology across sectors.
Narrative synthesis of historical and technical literature across AI research and innovation studies; qualitative tracing of architectural shifts (symbolic → statistical → deep learning/generative models) and increased deployment across industries. No original empirical measurement or sample size reported in this paper.
MYRIAD-EU synthesizes progress and remaining challenges and proposes concrete directions for continued research and practice in multi-hazard, multi-risk DRR.
Overall project scope: synthesis and reflection on interdisciplinary research and practice conducted across MYRIAD-EU (2021–2025), as reported in the paper.
MYRIAD-EU conducted in-depth, place-based case studies co-produced with local stakeholders to test methods and tools for multi-risk assessment.
Reported methods include in-depth place-based case studies co-produced with local stakeholders as part of MYRIAD-EU activities (2021–2025).
The main results are robust to inclusion of controls and a range of heterogeneity and moderation checks, supporting that findings are not driven by simple time trends or obvious confounders.
Reported robustness checks in the staggered-DID framework (control variables, alternative specifications, subgroup tests) and discussion of parallel-trends assumption.
Implementation of urban green data center pilot policies leads to measurable improvements in firms' energy utilization efficiency.
Staggered-adoption difference-in-differences (DID) using an unbalanced firm–year panel of Chinese A-share listed firms linked to prefecture-level cities (2012–2024); treatment is timing/location of urban green data center pilot designation; results reported as statistically significant and robust to controls and alternative specifications.
Mechanisms linking digital services to export performance include reduced transaction and search costs, platform network and scale effects, data as an input improving service quality and customization, and task‑level specialization changing comparative advantage.
Conceptual/theoretical synthesis drawing on multiple strands of literature and illustrative case studies presented in the review (no new causal identification).
Digital services trade is shifting from traditional cross‑border delivery toward online, platform‑based models, with cross‑border data flows a core input and determinant of competitiveness.
Integrative literature and policy review synthesizing domestic and international studies; theoretical/conceptual synthesis and cited case examples (no new econometric analysis or primary microdata).
Policy recommendations include standards on explainability, audit trails, certification for finance/tax AI systems, stronger data governance, and public–private coordination to update regulatory guidance.
Paper's policy and governance recommendations drawn from case findings and literature synthesis; prescriptive content rather than evaluated interventions.
Deployments should build governance, explainability, and auditability into systems and start with pilots on high-volume, well-structured tasks before scaling.
Paper recommendations based on case experience and analytic framing; advocated strategy rather than empirically validated at scale within the paper.
To mitigate risks and realize benefits, AI systems in finance/tax should combine AI with human-in-the-loop controls and clear escalation paths.
Prescriptive recommendation grounded in case lessons and literature on safe AI deployment; presented as a best-practice guideline rather than tested intervention.
Technical building blocks leveraged in these deployments include large language models (LLMs), OCR plus structured information extraction, retrieval-augmented generation (RAG) and knowledge bases, and process automation/RPA.
Explicit technical characteristics section and case descriptions in the paper identify these components as core to implementations.
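Of those building blocks, retrieval-augmented generation is the least self-explanatory. Below is a minimal sketch of the retrieve-then-prompt pattern; the bag-of-words scoring and the knowledge-base snippets are illustrative stand-ins for the embedding search and document corpora a production system would use:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str]) -> str:
    """Return the corpus snippet most similar to the query."""
    q = Counter(query.lower().split())
    return max(corpus, key=lambda doc: cosine(q, Counter(doc.lower().split())))

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble an LLM prompt grounded in the retrieved snippet."""
    context = retrieve(query, corpus)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."

kb = [
    "VAT invoices must be archived for ten years.",
    "Travel expenses above the per-diem cap require manager approval.",
]
prompt = build_prompt("How long must VAT invoices be archived?", kb)
```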
Generative AI is used for risk control and audit functions, including real-time monitoring, fraud detection, KYC/AML screening, and automated exception reporting.
Reported use-cases in the two case organizations and corroborating industry reports discussed in the literature review portion of the paper.
For tax declaration, generative AI enables extraction of tax-relevant facts from invoices and contracts, drafting of tax returns, compliance checks, and scenario simulations.
Case examples and literature synthesis describing OCR + information extraction and LLM-assisted drafting workflows used in practice.
Generative AI is applied to fund management tasks such as cashflow forecasting, anomaly detection, and automated workflows for payments and collections.
Case descriptions and technical mapping in the paper showing implementations at the sharing center and professional services firm level.
Accounting automation use-cases include automated bookkeeping, reconciliations, journal entry suggestion, and error detection using LLMs and document understanding.
Detailed scope mapping and case examples in Xiaomi and Deloitte illustrating these accounting applications; supported by literature review of technical capabilities.
Realizing these AI-driven gains in Vietnam requires legal and institutional redesign.
Close reading of Vietnam's constitutional provisions, administrative statutes, procedural rules and judicial doctrine (doctrinal legal analysis) combined with comparative lessons from other jurisdictions; no quantitative data.
A supplemental theological differentiator probe achieved perfect rank-order agreement between the two ceiling judges (Spearman rs = 1.00), supporting judge reliability for the ceiling probe.
Reported Spearman rank correlation rs = 1.00 between Gemini Pro and Copilot Pro on the theological differentiator probe used as a reliability check.
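Spearman's rs is the Pearson correlation of the ranks, so identical orderings give rs = 1.00 regardless of the raw scores. A minimal implementation for tie-free data (the scores below are toy values, not the probe's ratings):

```python
def spearman(x, y):
    """Spearman rank correlation for tie-free data: Pearson on the ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / var

# Two judges scoring five items in the same rank order yield rs = 1.0:
rs = spearman([4.0, 2.5, 3.8, 1.2, 5.0], [9, 5, 8, 1, 10])
```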
Rigorous research priorities include randomized controlled trials with long-run follow-ups, cost-effectiveness studies, structural adoption models, and validated metrics for feedback quality and learning durability.
Actionable research recommendations produced by the 50-scholar interdisciplinary meeting; prescriptive synthesis rather than empirical results.
Different model families (Sonnet 4.6 vs. Opus 4.6) exhibit stable, systematic differences in methodological preferences and choice patterns—distinct empirical 'styles'.
Comparison of choice patterns and methodological decisions across agents instantiated with Sonnet 4.6 versus Opus 4.6 within the 150-agent experiment, showing consistent between-family differences in measure selection and estimation procedures.
Agents split on measure choice (e.g., autocorrelation vs. variance-ratio tests; dollar-volume vs. share-volume measures), producing different substantive estimates from the same raw data and hypotheses.
Observed categorical divergences in measure selection across the 150 agents during independent analyses of SPY TAQ (2015–2024); documented alternative test/measure families and corresponding divergent effect estimates for the six hypotheses.
AI-to-AI variation (nonstandard errors, NSEs) across autonomous coding agents produces substantial uncertainty in empirical results analogous to human researcher heterogeneity.
Experimental results from 150 autonomous Claude Code agents (two model families: Sonnet 4.6 and Opus 4.6) independently analyzing the same SPY TAQ data (NYSE TAQ, 2015–2024) on six pre-specified hypotheses; recorded agent-to-agent variation in methodological choices and resulting effect estimates (dispersion measured via IQR and related diagnostics).
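One way to reproduce the IQR dispersion diagnostic is to take the interquartile range over the per-agent effect estimates for a single hypothesis; the estimates below are hypothetical:

```python
import statistics

def estimate_spread(estimates):
    """IQR of effect estimates produced by different agents for one hypothesis."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)  # default 'exclusive' method
    return q3 - q1

# Hypothetical estimates from eight agents, two of which chose a different
# measure family and landed far from the rest:
spread = estimate_spread([0.12, 0.15, 0.11, 0.30, 0.14, 0.13, 0.16, 0.29])
```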
Observations span multiple agent platforms (Moltbook, The Colony, 4claw) with more than 167,000 agents interacting as peers.
Author-reported coverage from naturalistic observations across the named platforms during the one-month observation window; count reported as ≈167k agents.
Structured argumentation frameworks make chains of inference inspectable and machine-checkable, improving transparency and verifiability of AI outputs.
Argument from formal properties of AFs and representation; no empirical user studies but relies on known formal semantics.
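The machine-checkable character of such frameworks can be made concrete with Dung's grounded semantics: iterate the characteristic function (the set of arguments defended by the current set) from the empty set up to a least fixpoint. A minimal sketch; the three-argument framework is an illustrative example, not from the paper:

```python
def grounded_extension(args, attacks):
    """Grounded extension of an abstract argumentation framework.

    args: set of argument ids; attacks: set of (attacker, target) pairs.
    Computed as the least fixpoint of the characteristic function.
    """
    def defended_by(s):
        def is_defended(a):
            attackers = {atk for (atk, tgt) in attacks if tgt == a}
            # every attacker of `a` must itself be attacked by some member of s
            return all(any((d, atk) in attacks for d in s) for atk in attackers)
        return {a for a in args if is_defended(a)}

    current = set()
    while True:
        nxt = defended_by(current)
        if nxt == current:
            return current
        current = nxt

# b attacks a, and c attacks b: c is unattacked, so c reinstates a.
ext = grounded_extension({"a", "b", "c"}, {("b", "a"), ("c", "b")})  # -> {"a", "c"}
```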