Evidence (5539 claims)
- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
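The matrix above lends itself to simple programmatic queries. A minimal sketch, using a few rows transcribed from the table (with "—" cells treated as zero counts), computes each outcome's share of positive findings:

```python
# Minimal sketch: share of positive findings per outcome, from a few
# rows transcribed out of the evidence matrix above. "—" cells -> 0.
rows = {
    # outcome: (positive, negative, mixed, null)
    "Firm Productivity":    (306, 39, 70, 12),
    "AI Safety & Ethics":   (116, 177, 44, 24),
    "Job Displacement":     (6, 38, 13, 0),
    "Task Completion Time": (88, 5, 4, 3),
}

def positive_share(counts):
    """Fraction of claims coded Positive among the four directions."""
    total = sum(counts)
    return counts[0] / total if total else 0.0

for outcome, counts in rows.items():
    print(f"{outcome}: {positive_share(counts):.0%} positive")
```

Note that shares are computed over the four direction columns rather than the table's Total column, since some row totals exceed the sum of the listed directions.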
Adoption
The paper challenges a purely rule‑based view of scientific explanation: some explanatory power will remain in implicit model structure rather than explicit rules.
Philosophical/epistemological argument based on the main thesis about tacit competence; no empirical validation.
LLMs can provide useful inputs for near-term economic and logistical forecasting in crises (e.g., supply-chain disruptions, commodity market impacts, transport/logistics constraints), but their political/strategic forecasts should be used cautiously.
Observed stronger and more verifiable performance on economic/logistical question types in the 42-node evaluation; weaker reliability on politically ambiguous multi-actor issues reported in qualitative coding and verifiability checks.
Model narratives evolve over time: earlier node outputs emphasize rapid containment, while later node outputs increasingly describe regional entrenchment and attritional de-escalation scenarios.
Longitudinal analysis across 11 temporal nodes comparing thematic/narrative content of model responses; qualitative coding tracked shifts in dominant scenario framings from early to later nodes.
Model reliability is uneven across domains: performance is stronger on structured economic and logistical questions than on politically ambiguous, multi-actor strategic issues.
Domain-specific comparison of model outputs on node-specific verifiable questions and exploratory prompts, with higher verifiability/accuracy and more consistent inferences reported for economic/logistical items versus greater ambiguity and lower consistency on political/multi-actor items.
Liability regimes and penalties should account for limits of enforced compliance and false positives/negatives from probabilistic policy evaluations.
Normative/economic discussion in the paper highlighting probabilistic outputs of the Policy function and calibration challenges; no empirical validation.
Firms will trade off compliance strictness against service quality (task completion rates), creating an economic tradeoff that shapes market offerings (e.g., safer-but-slower vs. faster-but-riskier agents).
Economic reasoning and conceptual models in the paper; suggested objective balancing task completion and legal/reputational costs; no empirical market data.
Alignment and instruction tuning approaches intended to encourage up-to-date answers improve some behaviors but do not reliably solve time-sensitivity and cross-modal consistency issues.
Experiments applying alignment/instruction-tuning methods with measurement of correctness and consistency; reported partial or inconsistent improvements rather than full resolution.
Diagnostic analysis links outdated predictions to (i) the static, time-stamped nature of training/evaluation datasets and (ii) mechanistic limits in how multimodal representations encode and retrieve temporal facts.
Error attribution analyses connecting incorrect answers to training snapshot timestamps and dataset provenance; representation-level analyses and qualitative case studies demonstrating multimodal encoding/retrieval limits.
For models/dynamics with negative LLE (contracting behavior), investment in parallel Newton tooling is likely to pay off; for expanding/chaotic dynamics (positive LLE), alternative architectural or modeling changes may be more cost-effective.
Application of the LLE convergence criterion derived in the thesis combined with empirical demonstrations on representative tasks indicating correlation between LLE sign and parallel solver performance; economic recommendation is interpretive.
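The sign test behind this recommendation can be illustrated on a toy system. This sketch estimates the largest Lyapunov exponent (LLE) of the 1-D logistic map via the average log-derivative along a trajectory; the map and parameter values are illustrative stand-ins, not the systems or the criterion's exact form from the thesis:

```python
import math

def logistic_lle(r, x0=0.4, burn_in=500, n=5000):
    """Estimate the LLE of x -> r*x*(1-x) as the mean log |f'(x)|
    along a trajectory, after discarding transients."""
    x = x0
    for _ in range(burn_in):  # discard transients
        x = r * x * (1 - x)
    acc = 0.0
    for _ in range(n):
        acc += math.log(abs(r * (1 - 2 * x)))  # log of the map's derivative
        x = r * x * (1 - x)
    return acc / n

# Negative LLE (contracting) -> parallel Newton tooling likely pays off;
# positive LLE (chaotic) -> alternative approaches may be more cost-effective.
print(logistic_lle(2.5))  # contracting regime: LLE < 0
print(logistic_lle(3.9))  # chaotic regime: LLE > 0
```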
The economic value of deploying DeePC-based controllers depends critically on representativeness of training data and the costs of online adaptation and safety verification.
Authors' deployment-risk analysis and discussion of trade-offs (qualitative), grounded in methodological requirements of DeePC (need for representative, persistently exciting data and safeguards).
System-level improvements from the controller do not imply uniform spatial/temporal benefits—distributional effects may favor certain routes or neighborhoods.
Authors' discussion and caution about distributional effects and equity; possibly supported by spatial analyses in simulation (qualitative discussion in paper).
Sparse MoE designs reduce active compute per query but can introduce serving complexity (routing, memory bandwidth, batching) that may require specialized infrastructure.
Architectural property of sparse MoE (sparse activation) and the paper's discussion of deployment trade-offs; the summary notes the need for specialized serving infrastructure and potential transitional costs. This is an argument supported by known MoE deployment literature rather than novel empirical measurements in the summary.
Deploying conformal factuality systems increases development cost (collecting representative calibration data) and inference cost (verifier compute), though efficient verifiers mitigate inference cost.
Discussion and empirical cost measurements: need for representative calibration datasets to maintain guarantees; measured verifier FLOPs; qualitative economic analysis in the paper.
Conformal filtering improves formal reliability (statistical factuality guarantees) but does not, by itself, deliver robustness and task utility without careful system design.
Aggregate empirical results: improved factuality guarantees after calibration/filtering, but concurrent reductions in informativeness and sensitivity to distribution shift/distractors unless calibration/data-processing are adapted.
Fine-tuning TSFMs on the high-frequency 5G data provides limited recovery; many configurations still perform poorly after fine-tuning.
The paper reports fine-tuning experiments in which TSFMs were fine-tuned on the new dataset; results indicate limited improvement in many configurations. Specific fine-tuning procedures, dataset sizes, and quantitative results are not provided in the summary.
DeepSeek-R1 exhibits a distributed memorization signature: 76.6% partial reconstruction rate but 0% verbatim recall on the TS‑Guessing probe.
Model-specific results from Experiment 3 (TS‑Guessing) reporting per-model rates of partial reconstruction and verbatim recall across the 513 MMLU items for DeepSeek-R1.
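The per-model rates can be reproduced with simple bookkeeping over per-item outcomes. A hedged sketch follows; the item flags are fabricated for illustration, since the actual probe outputs and scoring rules from Experiment 3 are not given in the summary:

```python
# Hedged sketch of how TS-Guessing rates could be tallied per model.
def tally_rates(items):
    """items: list of dicts with boolean 'partial' and 'verbatim' flags,
    one per benchmark item. Returns (partial_rate, verbatim_rate)."""
    n = len(items)
    partial = sum(1 for it in items if it["partial"])
    verbatim = sum(1 for it in items if it["verbatim"])
    return partial / n, verbatim / n

# Toy illustration with 4 fabricated items (not the 513 MMLU items):
toy = [
    {"partial": True,  "verbatim": False},
    {"partial": True,  "verbatim": False},
    {"partial": False, "verbatim": False},
    {"partial": True,  "verbatim": False},
]
p, v = tally_rates(toy)
print(f"partial reconstruction: {p:.1%}, verbatim recall: {v:.1%}")
```

A high partial rate with zero verbatim recall, as computed here, is the "distributed memorization" signature the claim describes.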
Quantitative comparisons across tested models show a systematic Misapplication Rate (MR) even in settings where the Appropriate Application Rate (AAR) is high.
Aggregated MR and AAR statistics reported for multiple frontier models across the benchmark showing co‑occurrence of high AAR and nontrivial MR.
Prompt‑based defensive instructions (explicitly instructing models to suppress preferences where inappropriate) reduce misapplication but fail to fully eliminate it.
Ablation experiments adding prompt‑based safety/defenses to model inputs and measuring MR and AAR; defenses produced reductions in MR but residual misapplication remained.
Attempts to mitigate misapplication with stronger reasoning prompts (e.g., chain‑of‑thought) reduce Misapplication Rate but do not eliminate it.
Ablation applying reasoning prompts and chain‑of‑thought style instructions to models, comparing MR before and after; reported reductions in MR but persistence of non‑zero MR across scenarios.
Models that more faithfully enforce stored preferences achieve higher Appropriate Application Rate (AAR) but also systematically have higher Misapplication Rate (MR), indicating a trade‑off between correct personalization and harmful over‑application.
Ablation experiments varying strength of preference encoding and measuring resulting AAR and MR per model; quantitative comparisons across models showing positive correlation between stronger preference adherence and both higher AAR and higher MR.
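The AAR/MR bookkeeping behind this trade-off can be sketched as follows. The scoring rules and toy data are illustrative assumptions; the benchmark's actual item format is not given in the summary:

```python
def aar_mr(items):
    """items: list of (applied, appropriate_context) booleans.
    AAR = share of appropriate contexts where the preference was applied;
    MR  = share of inappropriate contexts where it was applied anyway."""
    in_ok = [applied for applied, ctx in items if ctx]
    in_bad = [applied for applied, ctx in items if not ctx]
    return sum(in_ok) / len(in_ok), sum(in_bad) / len(in_bad)

# Toy models: stronger preference adherence raises both AAR and MR.
strong = [(True, True)] * 9 + [(False, True)] + \
         [(True, False)] * 3 + [(False, False)] * 7
cautious = [(True, True)] * 6 + [(False, True)] * 4 + \
           [(True, False)] * 1 + [(False, False)] * 9

print("strong:", aar_mr(strong))      # higher AAR, higher MR
print("cautious:", aar_mr(cautious))  # lower AAR, lower MR
```

The structure mirrors a hit-rate/false-alarm trade-off: pushing AAR up by enforcing preferences more aggressively also pushes MR up.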
Reducing payrolls raises short-term firm profitability but reduces aggregate household income and consumption.
Macroeconomic accounting and labor-demand theory combined with historical examples of payroll reductions; argument is theoretical/conceptual rather than estimated with new aggregate time-series regression evidence.
Finance, Education, and Transportation show mixed dynamics: both displacement of routine tasks and creation of new hybrid roles.
Descriptive sectoral analyses from the simulated dataset (hybrid share, task-displacement indicators, employment changes) covering Finance, Education, Transportation (2020–2024), plus mixed-evidence studies from the literature synthesis (ACM/IEEE/Springer 2020–2024).
Improved matches and clearer skill signals can raise short-term wages for matched youth, while longer-term wage dynamics will depend on supply responses and bargaining power shifts.
The pilot reports higher short-term wages for matched youth; longer-term effects are discussed as conditional and were not measured in the pilot.
Overall, economic benefits from AI in radiology are plausible but conditional on human-AI interaction design, governance, workforce effects, and payment structures; net value is not determined by algorithmic accuracy alone.
Synthesis of the heterogeneous literature (laboratory, reader, observational, qualitative) and conceptual economic analysis highlighting dependencies beyond algorithmic performance.
The net effect of AI on clinician burnout is ambiguous: tools can remove tedious tasks but may introduce new cognitive, administrative, and liability stresses.
Mixed qualitative and small-scale observational studies with variable findings on burnout-related measures after AI introduction.
Changes in workload composition can reduce routine burdens but may shift cognitive load to follow-up decisions and managing AI outputs.
Observational and qualitative studies of deployed systems reporting redistribution of tasks and clinician-reported changes in cognitive demands.
Economic outcomes depend on complementarity versus substitution: AI that augments radiologists can raise output per worker; AI that substitutes tasks may reduce demand for certain diagnostic activities.
Theoretical economic frameworks and case studies of task reallocation in early deployments; empirical workforce-impact studies limited.
Automation bias can increase undue reliance on AI, while algorithmic aversion can drive underuse of helpful tools.
Cognitive and behavioral studies and reader simulations demonstrating both increased acceptance/overtrust in automated outputs in some settings and rejection/discounting of AI advice in others.
Real clinical value depends critically on how AI tools interact with radiologists in practice (integration design and human-AI interaction).
Conceptual models and synthesis of reader studies, simulation/interaction studies, usability and qualitative deployment evaluations that compare standalone algorithm performance versus clinician+AI workflows.
Trust calibration influences project performance outcomes: organizations tend toward metric-driven evaluation of AI outputs and use AI to strategically augment human expertise, but miscalibrated trust risks overreliance, or a misplaced focus on metrics, either of which can harm performance.
Based on participants' reported experiences in the 40 interviews and interpretive thematic analysis linking trust practices to observed/perceived performance consequences (shift to metric-based evaluation, strategic use, and noted risks).
Trust calibration shapes collaboration patterns, including delegation of oversight to systems or specialists, changes in communication networks (who talks to whom), and erosion of informal ad hoc communications used previously for tacit coordination.
Observed in interview narratives (40 interviews) and thematic coding showing repeated reports of shifted oversight roles, altered communication pathways, and reduced informal coordination after AI integration.
Trust calibration is produced and maintained through ongoing boundary work between humans and machines (i.e., teams continuously negotiate which inputs/responsibilities are treated as human versus machine).
Derived from participants' accounts in the 40 interviews and thematic analysis documenting repeated examples of role negotiation and boundary-setting between people and AI systems during project routines.
Trust in AI within project-based work is situational and socially distributed across team members, rather than a stable individual attitude.
The claim is based on thematic qualitative analysis of 40 semi-structured interviews with project professionals across multiple industries in the UK. Interview data showed variation in how different team members described their trust in systems depending on role, task, and context.
Explicit governance reduces negative externalities (bias, privacy breaches, loss of trust) but entails compliance costs that should be factored into adoption and diffusion models.
Conceptual claim synthesizing trade‑off arguments from governance and risk literatures and practitioner examples; not measured empirically in the paper.
Embedding AI into workflows may change firm boundaries (e.g., outsourcing models vs. in‑house systems) and make investments in internal auditability and explainability strategic assets.
Theoretical implication drawn from synthesis of organizational boundary theory and practitioner trends; suggested rather than empirically demonstrated within the paper.
AI is likely to continue shifting the frontier of early discovery and increase the throughput and quality of hypotheses, but persistent biological uncertainty and the cost of clinical validation mean AI will complement—not fully replace—traditional R&D for the foreseeable future.
Synthesis of technological trends, application successes and limitations, translational risk, and economic reasoning presented throughout the paper.
Proprietary data, precompetitive consortia, and platform consolidation can create barriers to entry; public-data initiatives could alter competitive dynamics.
Market-structure analysis and discussion of data-access models in the paper, with examples of consortia and proprietary platform effects.
Expect strong returns-to-scale and winner-take-most dynamics: large incumbents and well-funded startups with proprietary data/compute may dominate the field.
Economic reasoning and observations in the paper about data/compute concentration, platform effects, and market outcomes.
Realizing economic gains at scale from AI in drug R&D is constrained by data quality and access, high implementation and integration costs, regulatory uncertainty, and ethical/legal concerns; these constraints will shape how gains are distributed across firms, countries, and patients.
Aggregate conclusion of the narrative review synthesizing documented benefits and recurring constraints from published studies, case reports, industry/regulatory analyses; qualitative synthesis without quantitative projection of distributional outcomes.
Adoption of AI in pharma will increase demand for computational biologists, ML engineers, and data scientists and may displace or redefine some traditional bench roles.
Labor-market trend reports and organizational case studies included in the review noting hiring patterns and role changes; qualitative synthesis rather than comprehensive labor-market study.
AI could lower discovery costs and permit more entrants in niche/specialty therapy discovery, but clinical development costs remain a major barrier to entry.
Synthesis of reported reductions in early-stage discovery costs and persistent high clinical trial costs from studies and industry reports; heterogeneous evidence across therapeutic areas.
Upfront capital and proprietary data requirements may advantage large incumbents or well-funded startups and could increase market concentration unless data-sharing or open platforms emerge.
Market-structure analysis and industry examples in the narrative review; inference based on observed data-asset advantages and investment needs across firms.
AI shifts the cost structure of drug R&D toward higher fixed costs (data infrastructure, compute, ML talent) and potentially lower marginal costs for candidate generation and some preclinical activities.
Economic synthesis and industry reports in the review describing capital-intensive investments and reduced per-unit costs in algorithmic candidate generation; largely conceptual and based on case examples.
Early-stage unit costs and time-per-hit can fall with AI, but late-stage clinical trial costs driven by biology remain the primary bottleneck to overall R&D productivity gains.
Qualitative assessment of stage-specific effects based on industry observations and conceptual decomposition of R&D stages; no new cost accounting or econometric estimates provided.
AI can improve specific stages of drug discovery but cannot eliminate fundamental biological uncertainty.
Conceptual and thematic analysis across technological capability and R&D integration levels; supported by illustrative examples showing limits of prediction in complex biology.
Two opposing market forces will act: (a) democratization lowering entry barriers for startups, and (b) concentration where firms with premium proprietary data and integrated AI capture outsized returns.
Conceptual economic analysis and illustrative industry observations; no empirical market-structure measurement presented.
AI (including machine learning, generative AI, and NLP) is reshaping biomedical research and pharmaceutical R&D by creating distinct adoption archetypes within large pharmaceutical companies.
Editorial / conceptual synthesis using qualitative analysis and archetype classification based on cross-industry observations and illustrative examples; no systematic measurement or sample size reported.
Emerging technologies (AI, digital twins, computational rheology) can compress high-dimensional sensory/rheological spaces into actionable models, enabling faster iteration in R&D and altering how firms value R&D inputs.
Theoretical projection and literature-based argument about technological capabilities; illustrative scenarios offered; no empirical trials or measured productivity changes reported.
There is potential for timely, personalized interventions (nudges/warnings) that could reduce harm, but causal evidence of long‑term effectiveness is limited.
Many studies propose or evaluate intervention prototypes and report feasibility/short‑term outcomes, while the review notes scarce randomized or longitudinal evaluations measuring welfare outcomes.
Techniques to mitigate data scarcity—transfer learning, data augmentation, physics-informed priors, active learning, and leveraging multimodal data—provide partial improvements but do not fully resolve generalization limits.
Review of methodological papers and empirical studies applying these techniques; synthesis indicates improvements in certain contexts but ongoing limitations documented across sources.