Evidence (2290 claims)

Claim counts by category:

- Adoption: 5187 claims
- Productivity: 4472 claims
- Governance: 4082 claims
- Human-AI Collaboration: 3016 claims
- Labor Markets: 2450 claims
- Org Design: 2305 claims
- Innovation: 2290 claims
- Skills & Training: 1920 claims
- Inequality: 1286 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 437 | 982 |
| Governance & Regulation | 366 | 172 | 114 | 55 | 717 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 290 | 115 | 66 | 27 | 502 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 121 | 85 | 14 | 332 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 66 | 29 | 35 | 7 | 137 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 66 | 31 | 4 | 125 |
| Task Allocation | 68 | 8 | 28 | 6 | 110 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 74 | 5 | 4 | 1 | 84 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 15 | 9 | 5 | 47 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
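A matrix like the one above can be generated mechanically from direction-tagged claim records. A minimal sketch, assuming a hypothetical list of `(outcome, direction)` tuples (the actual claim schema is not shown in this page):

```python
from collections import Counter

# Hypothetical claim records: (outcome category, direction of finding).
claims = [
    ("Developer Productivity", "Positive"),
    ("Developer Productivity", "Positive"),
    ("Developer Productivity", "Mixed"),
    ("Error Rate", "Positive"),
    ("Error Rate", "Negative"),
]

DIRECTIONS = ["Positive", "Negative", "Mixed", "Null"]

# Count claims per (outcome, direction) cell.
cells = Counter(claims)

# Emit one markdown table row per outcome, with a row total;
# empty cells are rendered as "—" to match the table's convention.
for outcome in sorted({o for o, _ in claims}):
    counts = [cells.get((outcome, d), 0) for d in DIRECTIONS]
    row = " | ".join(str(c) if c else "—" for c in counts)
    print(f"| {outcome} | {row} | {sum(counts)} |")
```

Rows whose listed directions do not sum to the printed total would indicate claims tagged with a direction outside the four shown columns.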
Innovation
Support systems for digital services exporters, especially SMEs, are inadequate in China.
Review of policy documents and literature highlighting gaps in finance, legal support, and standards compliance assistance for SME internationalization (qualitative).
China's platform firms show uneven internationalization and platform infrastructure is not consistently internationally competitive.
Case examples and synthesis of domestic/international studies on platform internationalization included in the review (qualitative evidence).
China has limited influence in high‑level trade rule formation.
Policy review and comparative institutional analysis within the literature review; descriptive assessment of China's participation in multilateral rule‑making (no formal measurement of influence).
Current institutional, technological, and market shortcomings limit China’s ability to close the gap with economies operating under high‑standard trade regimes.
Qualitative comparative analysis of policy and institutional frameworks against high‑standard trade members; literature and case examples (no new microdata).
Absence of governance and observability could increase social costs of accidents and induce conservative regulation that stifles beneficial adoption.
Policy reasoning and historical regulatory responses to systemic risks; conceptual projection without quantitative modeling of regulatory impact.
Strong proprietary stacks and incompatible protocols could create winner‑take‑all or oligopolistic market outcomes due to network effects and switching costs.
Market‑structure theory and historical platform examples (e.g., dominant tech platforms); argument is conceptual and not backed by new empirical market analysis in the paper.
Without these architectural commitments, the economic costs — stranded assets, safety incidents, reduced innovation, and high coordination costs — will be substantial.
Predictive economic argument built from historical IoT/Internet lessons and systems reasoning; no quantitative cost estimates or econometric analysis in the paper.
Poor governance and observability in agent networks would make accountability, certification, and regulation difficult.
Policy and governance reasoning with illustrative domain examples; conceptual argument without empirical governance case studies or metrics.
Weak or brittle security and trust mechanisms across distributed agent ecosystems will pose serious risks.
Lessons drawn from IoT security failures and conceptual threat analysis; no new penetration testing or security metrics presented.
Lifecycle mismatch — rapidly evolving AI software embedded in long‑lived physical assets — risks premature ossification or expensive retrofits.
Systems engineering reasoning and historical analogies to embedded systems/IoT lifecycles; no quantitative lifecycle modeling or case study data in the paper.
Top-performing community submissions (including baselines and competition entries) still leave a performance gap relative to elite human play on battling tasks.
Paper reports comparative evaluation results showing win-rate and other metrics for heuristic, RL, LLM baselines and community submissions versus human (elite) benchmarks; analysis highlights a remaining gap.
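A "remaining gap" of this kind is usually reported as a win rate with an uncertainty band. A minimal sketch of the standard Wilson score interval for a win rate, using hypothetical numbers (the paper's actual counts are not given in this summary):

```python
import math

def win_rate_ci(wins: int, games: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and 95% Wilson score interval for a win rate."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2))
    return p, center - half, center + half

# Hypothetical: a top submission winning 42 of 100 games against elite human play.
p, lo, hi = win_rate_ci(42, 100)
print(f"win rate {p:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An interval whose upper bound stays below 0.5 is what would substantiate a claim of a persistent gap relative to human play.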
Misalignment or poor meta-control could produce persistent unsafe behaviors in autonomous learners; governance and oversight mechanisms will be crucial.
Risk analysis based on conceptual failure modes for meta-control; no empirical incidents reported in the paper.
Current models transfer poorly across domains, are brittle in nonstationary environments, and are inefficient in physical/embodied tasks.
Synthesis of known challenges from prior literature and practical experience; paper cites these as motivating observations rather than reporting new data.
Current models have limited meta-control and do not autonomously decide when to explore, imitate, consult prior knowledge, or consolidate.
Conceptual critique based on typical ML training pipelines and limited on-line decision-making modules; no empirical tests in paper.
There is weak integration between passive observation (supervised/representation learning) and active experimentation (reinforcement/exploratory learning) in current systems.
Observation of methodological separation in current literature and systems; conceptual discussion in the paper.
Current AI models lack the architectures and control mechanisms required for sustained, autonomous learning in dynamic real-world settings.
Conceptual/theoretical analysis presented in the paper; synthesis of limitations observed in existing literature and practices (no new empirical data provided).
Public‑interest concerns (bias, misuse, systemic risk) may be harder to mitigate via simple transparency rules; policies should emphasize outcome‑based regulations, mandatory behavioral testing, and marketplace disclosure obligations for stressed scenarios.
Policy implication derived from the non‑rule‑encodability thesis; no empirical policy evaluation included.
Standard contracts and regulatory audits that rely on inspection of rule sets or source code will be insufficient to assess model behavior or risk; regulators and buyers must rely more on behavior‑based testing, standards, and outcome measures.
Policy and regulatory argument derived from the main theorem about non‑rule‑encodability; no empirical regulatory studies presented.
Full interpretability via rule extraction may be impossible for the most valuable parts of LLM competence, limiting the utility of some transparency approaches for safety and auditing.
Argumentative consequence of the main theoretical claim and structural mismatch; supported by historical limitations of rule‑based systems; no empirical tests reported.
There is a structural mismatch between explicit human cognitive tools (rules, checklists) and the pattern‑rich, high‑dimensional competence encoded in LLMs.
Theoretical/structural argument about distributed statistical representations in LLMs versus discrete rules; no experimental quantification provided.
Historical expert systems failed to generalize or scale to complex, ambiguous tasks, contrasting with LLMs' broader empirical successes.
Historical case analysis and literature review-style discussion of expert systems versus contemporary LLM performance; no new quantitative historical dataset provided.
Existing idea-evaluation approaches (LLM judges or human panels) are subjective and disconnected from real research outcomes.
Framing and motivation in the paper arguing current approaches rely on subjective judgments and do not directly tie to later publication/citation outcomes; supported implicitly by the empirical mismatch (LLM-judge vs HindSight).
High governance costs in regulated/high-risk domains can slow adoption of agentic systems, concentrating deployment in less regulated uses or among large firms that can afford governance infrastructure.
Economic reasoning about fixed and marginal governance costs and firm-level adoption decisions; no empirical adoption data presented.
Path-dependent behavior increases the complexity of principal–agent contracting and moral hazard between platforms, enterprise customers, and downstream users, requiring richer contract terms (acceptable paths, logging, audit rights).
Economic theory reasoning and applied contract/design implications discussed; no empirical contract-study data.
Path-dependent policies complicate ex post auditing and simple rule-based regulation; regulators may prefer standards requiring runtime evaluation and logging to be enforceable in practice.
Conceptual argument about limits of auditing when important state is ephemeral and about how runtime logging enables ex post review; illustrative policy examples mapping to runtime requirements.
The poor TSFM performance is attributed to pretraining corpora lacking high-frequency, domain-diverse examples (temporal-scale and domain mismatch).
Paper interprets benchmark failures as resulting from pretraining data mismatch (TSFMs usually pretrained on low-frequency domains like energy/finance) and argues lack of high-frequency examples reduces effectiveness. This is a causal interpretation based on observed transfer failures rather than a controlled causal experiment.
Most TSFM configurations evaluated failed to achieve adequate predictive performance on this high-frequency distribution.
Benchmarking compares multiple TSFM configurations (and includes traditional ML baselines) on the 5G millisecond dataset and reports that most TSFMs did not reach acceptable performance levels. The summary does not provide exact performance numbers or how adequacy was defined.
Current time-series foundation models (TSFMs), typically pretrained on low-frequency data, generalize poorly to high-frequency wireless and traffic data in zero-shot transfer.
Benchmarks reported in the paper include zero-shot evaluations of multiple TSFM configurations on the high-frequency 5G dataset and find poor zero-shot predictive performance. Exact models, metrics, and sample sizes are not specified in the summary.
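Zero-shot transfer evaluation of this kind follows a simple pattern: hold out the tail of the target-domain series, forecast it without any fine-tuning, and compare errors against cheap baselines. A minimal sketch under stated assumptions (the series here is synthetic, and the TSFM call is left as a comment because the paper's models and data are not specified in this summary):

```python
import numpy as np

def naive_forecast(history: np.ndarray, horizon: int) -> np.ndarray:
    """Persistence baseline: repeat the last observed value."""
    return np.full(horizon, history[-1])

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error between actuals and forecasts."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic stand-in for a millisecond-scale 5G traffic series
# (the benchmark's actual dataset is not reproduced here).
rng = np.random.default_rng(0)
series = np.abs(rng.normal(10.0, 2.0, size=1_000))

history, target = series[:-50], series[-50:]

# A real study would call each pretrained TSFM here in zero-shot mode
# (no fine-tuning on the target domain) and report its error alongside
# the baselines; a TSFM that cannot beat persistence on this split is
# the kind of failure the paper describes.
baseline_err = mae(target, naive_forecast(history, horizon=50))
print(f"persistence baseline MAE: {baseline_err:.3f}")
```

Comparing against a persistence (or seasonal-naive) baseline is what makes "failed to achieve adequate performance" operational: a pretrained model that loses to the last-value forecast has not transferred.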
Estimates of productivity gains from automating quantum-program generation should be discounted given the current lack of hardware-execution validation; adoption timelines and returns remain contingent on resolving the Layer 3b gap.
Forward-looking inference in the review: because Layer 3b is unreported across systems, projected productivity/adoption gains derived from Layers 1–2 results are uncertain and should be treated conservatively.
The absence of Layer 3b reporting raises investment risk and valuation uncertainty for startups and investors building on generative quantum-code technologies.
Economic reasoning derived from the documented empirical gap (no real-device evaluation) in the review; the claim links missing validation to higher uncertainty in productization and revenue potential.
Because end-to-end hardware evaluation is missing, claims of model performance based only on syntactic and semantic tests may be over-optimistic when translated into hardware-deployed value.
Analytical inference in the review: observed evaluations stop at Layers 1–2 for most systems, so mapping to hardware outcomes is unvalidated; this underpins the caution about over-optimistic extrapolation.
Datasets and provenance vary in coverage and quality, and benchmarking practices are heterogeneous across systems, complicating cross-system comparisons.
Review of the 5 identified datasets and reported benchmarking across the 13 systems found variation in dataset provenance, size, task coverage, and bespoke evaluation metrics.
The absence of Layer 3b evaluations creates uncertainty about latency, fidelity, noise resilience, calibration dependence, and practical deployability of generated artifacts.
Logical inference based on the documented lack of real-hardware execution (Layer 3b) across 13 systems; review highlights these specific practical metrics as untested in real devices.
Operational sustainability is a challenge: coordinating long R&D timelines and ensuring expert governance for drug development within DAOs is difficult.
Case-study observations and discussion of organizational challenges; acknowledged lack of longitudinal performance data in the studied projects.
Token economics can create speculative behavior misaligned with long-horizon drug development incentives.
Theoretical analysis of token market dynamics and incentive misalignment; supported by general observations of crypto market speculative behavior, but no DAO-specific empirical causation demonstrated.
Traditional hierarchical firms struggle to coordinate dispersed expertise and finance public‑good stages of drug development.
Theoretical/organizational analysis and literature synthesis on coordination problems and financing gaps for public-good preclinical stages; qualitative argumentation rather than empirical causal inference.
If AI models encode prevailing consensus or measurement conventions, they risk locking in suboptimal conventions and creating path-dependent coordination failures in R&D.
Argument based on path-dependence and model-mediated coordination theory; conceptual exploration with illustrative scenarios; no empirical demonstrations.
Platformization of sensory models and proprietary digital twins could create winner-take-most market dynamics, raise barriers to entry, and concentrate rents in firms controlling large sensory-performance datasets.
Economic reasoning drawing on platform economics and data-monopoly literature; applied conceptually to sensory-model platforms; no empirical market-concentration measurement in the food domain provided.
Failures of translation—both literal (across languages/markets) and metaphorical (between disciplines, scales, and practices)—impede global adoption and ideation of food products and innovations.
Argumentative synthesis citing cross-cultural examples and theoretical literature on translation costs; qualitative examples rather than empirical measurement of translation failures.
Industrial food R&D tends toward conservatism, privileging established measurement and classification schemes that can obscure sensory nuance and cultural variation.
Critical review and synthesis of literature on industrial R&D practices and measurement norms; illustrative industry examples cited; no systematic surveys or quantitative industry-wide data presented.
Language and conceptual frameworks (drawing on Wittgenstein) constrain what can be noticed, measured, and communicated about texture and taste, creating epistemic limits in scientific practice.
Philosophical analysis using Wittgensteinian language theory and examples from food science and sensory studies; literature synthesis and illustrative examples; no systematic empirical validation.
Empirical evidence shows that each 1 percentage point increase in industrial robot density leads to a 0.8 percentage point decrease in the manufacturing global value chain (GVC) participation rate.
Empirical claim reported in the paper; method described as empirical analysis but the provided excerpt does not specify dataset, country sample, time period, model specification, controls, or sample size.
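A finding of this form is typically estimated with a two-way fixed-effects panel regression. A minimal sketch of such a specification, assuming a country-year panel (the excerpt does not report the actual model, sample, or controls, so every term below is an assumption):

$$
\mathrm{GVC}_{it} = \alpha + \beta\,\mathrm{RobotDensity}_{it} + \gamma' X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
$$

where $\mathrm{GVC}_{it}$ is the manufacturing GVC participation rate of country $i$ in year $t$, $X_{it}$ is a vector of controls, $\mu_i$ and $\lambda_t$ are country and year fixed effects, and the reported finding would correspond to $\hat{\beta} \approx -0.8$.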
Developing countries face a triple barrier of technology embargoes, rule bundling, and capital concentration.
Theoretical and literature-based claim described by the authors; no empirical quantification of these barriers (e.g., number of embargoes, measures of rule bundling, capital concentration metrics) included in the excerpt.
Despite positive outcomes, challenges such as workforce displacement, ethical concerns, and limited access to AI technologies were identified as barriers to full adoption.
Study respondents reported barriers in the survey; descriptive statistics summarized the prevalence of workforce displacement concerns, ethical issues, and limited access to AI technologies as impediments to broader adoption.
The SCF is extended into a second-order layer (SCF-E) that incorporates a deficit of technocultural imagination and symbolic governance, explaining why AI remains stuck in pilots and does not convert into organizational capability.
Conceptual (second-order) extension reported in the article; methodologically supported by the QUAN→QUAL combination, including SCF-oriented ethnography (empirical details in the body of the article, not the abstract).
The technology-adoption literature (TAM, UTAUT, Diffusion of Innovations) tends to treat resistance as a generic behavioral variable or a 'training' deficiency, neglecting symbolic dimensions (rites, identities, and power), cognitive threat mechanisms (loss aversion, overload, and heuristics), and their economic effects.
Literature review and theoretical positioning stated in the article, comparing established models with the proposed perspective; no indication of meta-analysis or empirical counts in the abstract.
Psychoanthropological Friction (SCF) is proposed and detailed as a measurable coefficient of the cultural cost and cognitive resistance that reduces the capacity of small and medium-sized enterprises (SMEs) to turn Artificial Intelligence (AI) initiatives into value generation at scale.
Theoretical proposition and operationalization presented in the article; the methodological design is described as QUAN→QUAL, including construction of a psychometric scale and organizational ethnography. The abstract does not specify a validation sample size.
Over-reliance on data-driven insights without adequate human oversight can worsen market uncertainty.
Reported in the study's qualitative case studies and interpretive analysis as a potential negative consequence of improper AI/Big Data use (no quantified examples provided in the summary).
Algorithmic bias is a potential pitfall of using AI and Big Data that can exacerbate market uncertainty.
Identified as a risk in the paper's qualitative analysis and discussion of pitfalls (no incident counts or empirical quantification provided in the summary).
There are concerns that AI may undermine the right to privacy in India.
Legal and policy analysis in the paper discussing privacy risks associated with AI and data-driven governance (review of privacy frameworks and potential conflicts). No empirical sample size; based on normative/legal analysis.