Evidence (13870 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	196	98	892	1984
Governance & Regulation	817	394	188	121	1544
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	627	233	123	96	1088
Research Productivity	411	123	56	332	933
Output Quality	467	178	59	47	751
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	167	122	24	496
Task Allocation	207	64	71	32	379
Skill Acquisition	165	59	60	17	301
Innovation Output	203	27	43	18	292
Employment Level	105	52	107	13	279
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	150	48	26	3	227
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	63	20	12	184
Error Rate	69	92	10	2	173
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	93	21	13	19	148
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Creative Output	31	17	7	3	59
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

This work offers a cost-effective, scientifically grounded blueprint for ubiquitous AI education.

Authors' concluding statement based on the SOP, low labor/hardware claims, and the pilot exam results showing high accuracy with the Shadow Agent in newer 32B models.

high positive From 50% to Mastery in 3 Days: A Low-Resource SOP for Locali... scalability/adoption potential of AI tutors

This suggests that structured reasoning guidance (as implemented by the Shadow Agent) is the key to unlocking the latent power of modern small language models.

Interpretive claim based on the pilot study's observed large gains for newer 32B models when using Shadow Agent guidance versus smaller gains for older models and stagnation in baselines.

high positive From 50% to Mastery in 3 Days: A Low-Resource SOP for Locali... model capability unlocking (qualitative interpretation tied to accuracy gains)

In contrast, older models see only modest gains (~10%) from the Shadow Agent guidance.

Same pilot study, reporting that older (unspecified) model generations improved by only about 10% with the Shadow Agent relative to baseline. No exact accuracy numbers, sample size, or model names are provided.

high positive From 50% to Mastery in 3 Days: A Low-Resource SOP for Locali... change in exam accuracy (percentage point gain)

The Shadow Agent, which provides structured reasoning guidance, triggers a massive capability surge in newer 32B models, boosting performance from 74% (Naive RAG) to mastery level (90%).

Pilot study on a full graduate-level final exam comparing Naive RAG (74% accuracy) with the Shadow Agent (90% accuracy) for newer 32B models. The number of exam items and any statistical testing are not stated.

high positive From 50% to Mastery in 3 Days: A Low-Resource SOP for Locali... exam accuracy (percentage correct)

We used a Vision-Language Model data cleaning strategy and a novel Shadow-RAG architecture as core technical components of the localization pipeline.

Methodological description in the practitioner report; the paper explicitly names these two techniques as the data-cleaning and architectural contributions used to create the tutor.

high positive From 50% to Mastery in 3 Days: A Low-Resource SOP for Locali... methodological approach (data quality and retrieval-augmented architecture)

Using a Vision-Language Model data cleaning strategy and a novel Shadow-RAG architecture, we localized a graduate-level Applied Mathematics tutor using only 3 person-days of non-expert labor and open-weights 32B models deployable on a single consumer-grade GPU.

Practitioner report describing a replicable Standard Operating Procedure (SOP); method claims include Vision-Language Model data cleaning and Shadow-RAG; deployment described as using open-weight 32B models on a single consumer GPU; labor reported as '3 person-days of non-expert labor'. No sample size or independent replication reported in text.

high positive From 50% to Mastery in 3 Days: A Low-Resource SOP for Locali... deployment resource requirements (time/labor and hardware feasibility)

If you can prove the value and the effort behind API token spending (agent memory), you can resell it.

Normative/operational claim within the paper's proposal; presented as an implication of verifiable provenance and market layering, with no empirical proof or transactional data.

high positive Infrastructure for Valuable, Tradable, and Verifiable Agent ... resellability of artifacts derived from API token spending

Enabling timely memory transfer reduces repeated exploration.

Argument in the paper asserting that shared/tradable memory decreases redundant exploration; no experimental or observational data provided.

high positive Infrastructure for Valuable, Tradable, and Verifiable Agent ... frequency/amount of repeated exploration by agents

Together, clawgang and meowtrade transform one-shot API token spending into reusable and tradable assets.

High-level systems argument in the paper; no empirical measurements of reuse or tradability presented.

high positive Infrastructure for Valuable, Tradable, and Verifiable Agent ... conversion of one-shot API calls into reusable/tradable assets

Meowtrade is a market layer for listing, transferring, and governing certified memory artifacts.

Design proposal described in the paper; no pilot deployment, user adoption metrics, or experimental data provided.

high positive Infrastructure for Valuable, Tradable, and Verifiable Agent ... existence/functionality of a market layer for certified memory artifacts

Clawgang binds memory to verifiable computational provenance.

System/design claim describing the proposed mechanism (clawgang) in the paper; no implementation results or empirical validation reported.

high positive Infrastructure for Valuable, Tradable, and Verifiable Agent ... ability to cryptographically or procedurally link memories to provenance

Agent memory can serve as an economic commodity in the agent economy, if buyers can verify that it is authentic, effort-backed, and produced in a compatible execution context.

Conceptual argument in the paper's proposal; no empirical evaluation, sample size, or experiments reported.

high positive Infrastructure for Valuable, Tradable, and Verifiable Agent ... feasibility of agent memory becoming a tradable commodity

Economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data.

General conclusion drawn from the paper's experimental findings: improvement in model predictions after fine-tuning on theory-derived synthetic data.

high positive GARP-EFM: Improving Foundation Models with Revealed Preferen... improvement in foundation-model prediction accuracy when using theory-generated ...

Fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study.

Empirical results comparing fine-tuned Chronos-2 to zero-shot Chronos-2 across multiple forecast horizons on the authors' experimental panel (no numeric metrics or sample sizes given in the excerpt).

high positive GARP-EFM: Improving Foundation Models with Revealed Preferen... forecast prediction accuracy across forecast horizons

The fine-tuned model serves as a rationality-constrained forecasting prior: it learns price-quantity relations from GARP-consistent synthetic histories and then uses those relations to predict the choices of real consumers.

Empirical approach described in paper: model fine-tuned on synthetic GARP-consistent histories and then evaluated on real consumer choice data (supports claim that model transfers learned relations to predicting real choices).

high positive GARP-EFM: Improving Foundation Models with Revealed Preferen... model's ability to predict real consumer choices (use of learned price-quantity ...

GARP is a simple condition to check that allows us to generate time series from a large class of utilities efficiently.

Methodological argument in the paper: authors use GARP as a constructive condition to generate synthetic time series from many utility functions (no numeric efficiency metrics provided in the excerpt).

high positive GARP-EFM: Improving Foundation Models with Revealed Preferen... feasibility/efficiency of generating synthetic time series from utility classes
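GARP (the Generalized Axiom of Revealed Preference) can be checked on T observed price-bundle pairs. The procedure builds the direct revealed-preference relation, takes its transitive closure, and looks for a contradicting strict preference. The minimal Python/NumPy sketch below computes the closure with Warshall's algorithm. It uses a Cobb-Douglas demand function as one hypothetical way to produce GARP-consistent synthetic histories, since utility-maximizing choices always satisfy GARP. The function names, the tolerance, and the choice of utility are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np


def satisfies_garp(prices, bundles, tol=1e-9):
    """Return True if observed (price, bundle) pairs satisfy GARP.

    prices, bundles: arrays of shape (T, n), one row per observation.
    """
    p = np.asarray(prices, dtype=float)
    x = np.asarray(bundles, dtype=float)
    cost = p @ x.T          # cost[t, s] = p_t . x_s
    spend = np.diag(cost)   # spend[t] = p_t . x_t
    # Direct revealed preference: x_t R0 x_s when x_s was affordable at t.
    reveal = spend[:, None] >= cost - tol
    # Warshall's algorithm turns R0 into its transitive closure R.
    for k in range(len(spend)):
        reveal = reveal | (reveal[:, [k]] & reveal[[k], :])
    # strict[s, t]: x_s P0 x_t, i.e. x_t was strictly cheaper than x_s at prices s.
    strict = spend[:, None] > cost + tol
    # GARP fails if x_t R x_s while x_s is strictly revealed preferred to x_t.
    return not np.any(reveal & strict.T)


def cobb_douglas_history(T, n, seed=0):
    """Hypothetical generator: T demand choices of one Cobb-Douglas consumer.

    Utility-maximizing choices always satisfy GARP, so every history
    produced here should pass satisfies_garp.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.dirichlet(np.ones(n))            # expenditure shares, sum to 1
    prices = rng.uniform(0.5, 2.0, size=(T, n))
    income = rng.uniform(50.0, 150.0, size=(T, 1))
    bundles = alpha * income / prices            # x_i = alpha_i * m / p_i
    return prices, bundles


# Hand-built violation: x0 R0 x1, yet x1 is strictly revealed preferred to x0.
print(satisfies_garp([[1, 1], [1, 2]], [[2, 0], [0, 2]]))  # False
print(satisfies_garp(*cobb_douglas_history(T=50, n=3)))    # True
```

The closure step costs O(T^3). That is cheap for the short histories typical of demand panels, and it keeps the check a single vectorized loop.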

Teaching foundation models basic economic logic improves how they predict demand in an experimental panel.

Reported experimental results in the paper: fine-tuning models on synthetic, economics-consistent data and evaluating on an experimental panel of consumer demand (no numeric sample size or metrics provided in the excerpt).

high positive GARP-EFM: Improving Foundation Models with Revealed Preferen... prediction accuracy of consumer demand

AI adoption and the associated improved governance lead to higher total factor productivity (TFP).

Empirical analysis showing a positive association between firm-level AI application index and measures of total factor productivity in the 2010–2023 Chinese A-share panel.

high positive The risk-mitigation effects of artificial intelligence adopt... total factor productivity (TFP)

AI adoption and the associated improved governance lead to a lower cost of debt financing for firms.

Empirical tests linking firm-level AI application and governance improvements to measures of debt financing costs (e.g., interest rates on debt, financing spreads) in the Chinese A-share firm sample.

high positive The risk-mitigation effects of artificial intelligence adopt... cost of debt financing (interest rate/spread measures)

The governance risk-mitigation effects of AI operate through enhancing external monitoring.

Mechanism analyses showing that AI adoption is associated with measures of stronger external monitoring (e.g., analyst coverage, media scrutiny, regulator activity) in the firm-year panel, linking that channel to reduced misconduct.

high positive The risk-mitigation effects of artificial intelligence adopt... external monitoring intensity (analyst coverage, media/regulatory scrutiny proxi...

The governance risk-mitigation effects of AI operate through strengthening internal control capacity.

Mechanism analyses showing that higher AI application is associated with improved internal control measures (as reported by firms or regulatory/financial-control indicators) in the dataset of Chinese A-share firms.

high positive The risk-mitigation effects of artificial intelligence adopt... internal control capacity (corporate internal control metrics)

The governance risk-mitigation effects of AI operate through lowering agency costs.

Mechanism analyses reported by authors linking AI adoption to reductions in measures interpreted as agency costs (e.g., agency-cost proxies, corporate governance metrics) in the same firm-year panel.

high positive The risk-mitigation effects of artificial intelligence adopt... agency costs (proxied by governance/financial measures)

AI application significantly reduces the monetary amount of penalties associated with executive misconduct.

Regression analyses on monetary penalty data for Chinese A-share firms (2010–2023) showing a statistically significant negative relationship between firm AI application index and penalty amounts.

high positive The risk-mitigation effects of artificial intelligence adopt... monetary amount of penalties for executive misconduct

AI application significantly reduces the frequency (number) of violations by executives.

Empirical frequency/regression analyses on the firm-year panel of Chinese A-share firms using the AI application index; authors report robust reductions in the number/frequency of violations conditional on AI adoption.

high positive The risk-mitigation effects of artificial intelligence adopt... frequency (count) of executive violations

AI application significantly reduces the incidence of executive misconduct.

Empirical analysis on Chinese A-share listed firms (2010–2023) using the constructed firm-level AI application index; reported significant negative association between AI application and whether a firm experiences executive misconduct (incidence).

high positive The risk-mitigation effects of artificial intelligence adopt... incidence (occurrence) of executive misconduct

Using Chinese A-share firms listed in Shanghai and Shenzhen from 2010 to 2023, we construct a firm-level AI application index and examine whether and how AI adoption mitigates executive misconduct.

Authors report building a firm-level AI application index and applying it to Chinese A-share listed firms (Shanghai and Shenzhen) over 2010–2023 to study links between AI adoption and executive misconduct (method: panel analysis using firm-year observations).

high positive The risk-mitigation effects of artificial intelligence adopt... existence and measurement of firm-level AI application index; sample frame of Ch...

Applying our framework to product listings on Etsy, we find that following ChatGPT's release, listings have significantly more machine-usable information about product selection, consistent with systematic mecha-nudging.

Empirical analysis of Etsy product listings comparing measures of 'machine-usable information about product selection' before and after ChatGPT's release. The abstract reports a significant increase, but the excerpt gives no sample size, exact estimates, or details of the statistical tests.

high positive Mecha-nudges for Machines machine-usable information about product selection

Adoption of AI can reduce procurement costs by 15.7%.

Field survey data (n=326) and regression analysis; authors report a 15.7% reduction in procurement costs associated with AI adoption.

high positive Research on the Adoption of Artificial Intelligence and Proc... procurement costs

Adoption of AI can shorten the procurement decision-making cycle by 21.3%.

Field survey data (n=326) analyzed (authors report a 21.3% reduction in procurement decision-making cycle associated with AI adoption); method described as questionnaire surveys and multiple linear regression.

high positive Research on the Adoption of Artificial Intelligence and Proc... procurement decision-making cycle (time)

Supplier AI capability positively drives AI adoption in procurement (β = 0.28, p < 0.01).

Same questionnaire survey (n=326) and multiple linear regression analysis; reported coefficient β=0.28 with p<0.01.

high positive Research on the Adoption of Artificial Intelligence and Proc... AI adoption in procurement

Perceived usefulness positively drives AI adoption in procurement (β = 0.32, p < 0.01).

Questionnaire survey of 326 procurement managers/supply chain managers in SMEs (Yangtze River Delta and Pearl River Delta) analyzed using multiple linear regression; reported coefficient β=0.32 with p<0.01.

high positive Research on the Adoption of Artificial Intelligence and Proc... AI adoption in procurement

The paper provides recommendations for designing strategic indicators to drive adoption, foster innovation, and objectively assess whether digital tools are delivering top-line impact.

Descriptive claim about the content of the perspective article (the authors state they provide these recommendations); the excerpt itself summarizes this contribution.

high positive Strategic Key Performance Indicators for AI in Lead Optimiza... existence of recommended strategic KPIs intended to affect adoption, innovation,...

The shift from expert-driven computer-aided drug design (CADD) to semiautonomous AI necessitates a new framework of impact-oriented KPIs.

Stated by the EFMC2 community authors as a normative conclusion in the perspective piece; based on the characterisation of a technological shift rather than on presented empirical tests in the excerpt.

high positive Strategic Key Performance Indicators for AI in Lead Optimiza... need for new KPI frameworks to assess impact of semiautonomous AI in drug discov...

Harnessing AI's potential requires moving beyond measuring technical model performance (e.g., predictive accuracy) to measuring strategic impact.

Authors argue this as a conceptual requirement for realizing AI's benefits in R&D; presented as a recommendation rather than supported by quantified empirical evidence in the excerpt.

high positive Strategic Key Performance Indicators for AI in Lead Optimiza... usefulness of measurement approaches (technical model metrics versus strategic i...

Preliminary analyses suggest that 'AI-native' companies may be outpacing traditional peers.

Explicitly stated in the paper as based on preliminary analyses; the excerpt provides no details on the analyses, metrics, or sample sizes.

high positive Strategic Key Performance Indicators for AI in Lead Optimiza... relative performance of AI-native companies versus traditional peers (e.g., prod...

The broad introduction of AI into the R&D landscape in recent years holds the promise of lifting pharmaceutical R&D out of its productivity problem.

Framed as an expectation/promise in the paper; based on recent broad adoption trends of AI in R&D (no specific empirical evaluation or sample size reported in the excerpt).

high positive Strategic Key Performance Indicators for AI in Lead Optimiza... potential improvement in pharmaceutical R&D productivity due to AI adoption

The visualization preserved human control.

Reported result from the within-subjects experiment (N=32) indicating that using the visualization did not reduce human control/agency in the negotiation process.

high positive From Overload to Convergence: Supporting Multi-Issue Human-A... human control / agency (measure not specified in abstract)

In the same within-subjects experiment (N=32), the visualization improved efficiency.

Within-subjects experiment (N=32) reported in the paper; the authors state the visualization improved efficiency (likely measured as time, number of rounds, or steps to reach agreement).

high positive From Overload to Convergence: Supporting Multi-Issue Human-A... efficiency of negotiation (e.g., time to agreement or number of rounds)

In a within-subjects experiment (N=32), the uncertainty-based visualization improved human outcomes.

Within-subjects user experiment reported in the paper with N=32 participants comparing performance with and without the visualization.

high positive From Overload to Convergence: Supporting Multi-Issue Human-A... human outcomes in negotiation (e.g., participant utility / negotiation score)

We introduce a novel uncertainty-based visualization driven by Bayesian estimation of agreement probability that shows how the space of mutually acceptable agreements narrows as negotiation progresses, helping users identify promising options.

Design and implementation of a visualization technique described in the paper; the visualization is driven by Bayesian estimation of agreement probability and is presented as a tool to reveal the shrinking feasible agreement space during negotiation.

high positive From Overload to Convergence: Supporting Multi-Issue Human-A... ability to identify promising agreement options (user decision support)
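The entry does not say how the Bayesian agreement-probability estimate is computed. The Python sketch below is a generic illustration only, a swapped-in technique and not the authors' estimator. It keeps a discrete posterior over hypothetical opponent issue weights and updates it after each opponent offer under an assumed softmax choice model. It then scores every candidate agreement by the posterior probability that the opponent would accept it. The issue values, weight grid, `beta`, and reservation utility are all assumptions.

```python
import itertools

import numpy as np


def candidate_agreements(options_per_issue):
    """Every full agreement, as a tuple with one option index per issue."""
    return list(itertools.product(*(range(k) for k in options_per_issue)))


def opponent_utilities(weights, value_tables, agreements):
    """U[h, a]: opponent utility of agreement a under weight hypothesis h.

    value_tables[i][j] is the opponent's (assumed known) value of option j on issue i.
    """
    per_issue = np.array(
        [[value_tables[i][a[i]] for i in range(len(a))] for a in agreements]
    )  # shape (A, I)
    return np.asarray(weights, dtype=float) @ per_issue.T  # shape (H, A)


def update_posterior(prior, utilities, offer_index, beta=5.0):
    """Bayes update after the opponent proposes agreements[offer_index].

    Assumed likelihood: the opponent proposes agreement a with probability
    proportional to exp(beta * U[h, a]) (a softmax choice model).
    """
    logits = beta * utilities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    likelihood = np.exp(logits)
    likelihood /= likelihood.sum(axis=1, keepdims=True)  # each row sums to 1
    posterior = prior * likelihood[:, offer_index]
    return posterior / posterior.sum()


def acceptance_probability(posterior, utilities, reservation):
    """Posterior mass of hypotheses under which U[h, a] >= reservation."""
    return posterior @ (utilities >= reservation).astype(float)


# Hypothetical two-issue negotiation with three options per issue.
agreements = candidate_agreements([3, 3])
values = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
weights = [[w, 1.0 - w] for w in np.linspace(0.0, 1.0, 11)]
U = opponent_utilities(weights, values, agreements)

posterior = np.full(len(weights), 1.0 / len(weights))  # uniform prior
for offer in [(2, 0), (2, 1)]:  # observed opponent offers, both strong on issue 0
    posterior = update_posterior(posterior, U, agreements.index(offer))
    p_accept = acceptance_probability(posterior, U, reservation=0.6)
    # Number of agreements the opponent more likely than not accepts.
    print(offer, int((p_accept > 0.5).sum()))
```

Intersecting `p_accept` with the user's own acceptable agreements would give a shrinking "mutually acceptable" region of the kind the entry describes. A display could shade each option by its probability.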

In this verifiable domain, simple arbitrage strategies generate net profit margins of up to 40%.

Empirical result from the SWE-bench case study comparing arbitrage strategy returns using GPT-5 mini and DeepSeek v3.2 (reported maximum net profit margin = 40%).

high positive Computational Arbitrage in AI Model Markets net profit margin of arbitrage strategies

Generative AI can autonomously produce novel content, including text, images, models, and scenarios.

General technical/descriptive claim stated in the paper's background/introduction; not an empirically tested claim within the provided excerpt.

high positive The Strategic Impact of Generative Artificial Intelligence o... autonomous generation of novel content (text, images, models, scenarios)

Generative AI facilitates the synthesis of structured and unstructured information from diverse sources, enabling managers to explore multiple decision pathways, identify potential risks, and optimize strategic choices.

Descriptive/functional claim made in the paper's introduction and conceptual framing; the empirical component (survey + SEM) is described generally but no specific measures or effect sizes for information synthesis or these capabilities are provided in the excerpt.

high positive The Strategic Impact of Generative Artificial Intelligence o... ability to synthesize information and support exploration of decision pathways (...

Generative AI augments human creativity by producing innovative solutions and scenario-planning alternatives that may not emerge through conventional analytical approaches.

Stated in the conceptual/argumentative portion of the paper; may be supported by survey items but no explicit empirical measure or effect size for creativity is provided in the provided text.

high positive The Strategic Impact of Generative Artificial Intelligence o... augmentation of human creativity / production of innovative solutions and scenar...

Decision quality and strategic agility positively influence organizational performance.

Reported SEM results from the paper linking the constructs (decision quality and strategic agility) to organizational performance using survey data from senior managers and AI adoption specialists; method = SmartPLS.

high positive The Strategic Impact of Generative Artificial Intelligence o... organizational performance

Generative AI adoption significantly enhances strategic agility.

Same empirical source as above: survey of senior managers/decision-makers/AI adoption specialists; tested via Structural Equation Modeling (SmartPLS) as reported in the paper.

high positive The Strategic Impact of Generative Artificial Intelligence o... strategic agility

Generative AI adoption significantly enhances decision quality.

Empirical analysis reported in the paper: survey data collected from senior managers, decision-makers, and AI adoption specialists across multiple industries; relationships assessed using Structural Equation Modeling (SmartPLS). No numeric sample size or effect estimate reported in the provided text.

high positive The Strategic Impact of Generative Artificial Intelligence o... decision quality

Human-like presentations increased perceived usefulness and agency in certain tasks.

Experimental manipulation of the human-likeness of AI presentation in the study's three tasks; the abstract reports increased perceived usefulness and agency for human-like presentations in some tasks. No sample sizes, task specifics, or effect magnitudes reported in abstract.

high positive More Isn't Always Better: Balancing Decision Accuracy and Co... perceived usefulness and perceived agency

A single dissent within a panel reduced pressure to conform.

Experimental manipulation of within-panel consensus (introducing a single dissent) in the study's three tasks; abstract reports that a single dissent lowered conformity pressure. No numerical data provided in abstract.

high positive More Isn't Always Better: Balancing Decision Accuracy and Co... pressure to conform / reliance on AI advice

Accuracy improved for small panels relative to a single AI.

Reported experimental result from the paper's study: participants completed three tasks and received advice from AI panels; panel size was manipulated (small panels vs single AI). The abstract states this accuracy improvement for small panels. (Sample size and exact tasks not reported in abstract.)

high positive More Isn't Always Better: Balancing Decision Accuracy and Co... accuracy
