Evidence (3470 claims)
- Adoption: 7395 claims
- Productivity: 6507 claims
- Governance: 5877 claims
- Human-AI Collaboration: 5157 claims
- Innovation: 3492 claims
- Org Design: 3470 claims
- Labor Markets: 3224 claims
- Skills & Training: 2608 claims
- Inequality: 1835 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 736 | 1615 |
| Governance & Regulation | 664 | 329 | 160 | 99 | 1273 |
| Organizational Efficiency | 624 | 143 | 105 | 70 | 949 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 348 | 109 | 48 | 322 | 836 |
| Output Quality | 391 | 120 | 44 | 40 | 595 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 275 | 143 | 62 | 34 | 521 |
| AI Safety & Ethics | 183 | 241 | 59 | 30 | 517 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 105 | 40 | 6 | 187 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 78 | 8 | 1 | 151 |
| Regulatory Compliance | 69 | 64 | 14 | 3 | 150 |
| Training Effectiveness | 81 | 15 | 13 | 18 | 129 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Org Design
The study uses U.S. Census Bureau Business Trends and Outlook Survey data tracking over 1.2 million businesses.
Statement in the paper that the analysis incorporates the Census Bureau Business Trends and Outlook Survey, covering >1,200,000 businesses.
The analysis integrates the Anthropic Economic Index capturing approximately one million AI usage interactions.
Paper statement that the Anthropic Economic Index was used and captures ~1,000,000 AI usage interactions.
We develop an analytical model in which a firm jointly chooses AI deployment and cybersecurity investment under this governance-capability gap.
Methodological claim: the paper presents an analytical (theoretical) model describing joint choice of deployment and cybersecurity investment.
Foundational research on AI identity is the central conclusion of this report.
Authors' stated conclusion of the paper.
We define AI Identity as the continuous relationship between what an AI agent is declared to be and what it is observed to do, bounded by the confidence that those two things correspond at any given moment.
Conceptual definition presented by the authors (conceptual/terminological contribution rather than empirical evidence).
The sign reversal is a structural consequence of the reviewer effort collapse under log-concave quality distributions; this is proved analytically.
Formal analytical proofs in the paper that use the assumption of log-concave quality distributions to show the mechanism producing the sign reversal.
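The proof sketch above leans on log-concavity. For readers unfamiliar with the assumption, the standard definition (a general fact, not specific to this paper) is:

```latex
% A density f is log-concave if log f is concave, i.e. for all x, y
% and all lambda in [0, 1]:
f\big(\lambda x + (1-\lambda)y\big) \;\ge\; f(x)^{\lambda}\, f(y)^{1-\lambda}.
```

Gaussian, exponential, uniform, and logistic densities all satisfy this, which is why the assumption is common in models of effort and screening.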
We develop a formal model in which institutions choose the scale of automation, the degree of codification, and safeguards on iterative use.
Methodological statement: the paper presents a formal/theoretical model specifying institutional choice variables (model description rather than empirical result).
We document the performance of a market-based scaffolding with these LLMs.
Empirical documentation reported in the paper describing how a market-based scaffolding performs when using the six LLMs on the 93 tasks.
We use a 93-task subset of SWE-bench Lite, a software engineering benchmark, with six recently released LLMs as a demonstration.
Empirical setup described in the paper: evaluation uses a 93-task subset of SWE-bench Lite and six recent LLMs.
We propose MarketBench, a benchmark for assessing whether AI agents have these capabilities.
Paper contribution claim: introduction of a benchmark named MarketBench described in the paper.
In order to effectively participate in markets, agents need to have informative signals of their own ability to successfully complete a task and the cost of doing so.
Conceptual claim / design requirement motivating the benchmark; stated as part of the paper's framing rather than an empirical result.
The main findings are robust to multiple robustness checks.
Paper reports multiple unspecified robustness checks applied to the fixed-effects regression analyses on the panel of publicly listed Chinese firms (2012–2023).
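The robustness-check entry references fixed-effects regressions on a firm panel. As a minimal sketch of the within (demeaning) estimator that such analyses rely on (synthetic data, not the paper's, with a hypothetical coefficient of 0.7):

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_years = 50, 12                      # e.g. a 2012-2023 panel
firm = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(0.0, 2.0, n_firms)[firm]    # unobserved firm effects
x = alpha + rng.normal(size=firm.size)         # regressor correlated with the effect
beta_true = 0.7
y = alpha + beta_true * x + rng.normal(scale=0.1, size=firm.size)

def within(v, groups):
    """Demean v within each group (the fixed-effects 'within' transform)."""
    group_means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - group_means[groups]

x_w, y_w = within(x, firm), within(y, firm)
beta_fe = (x_w @ y_w) / (x_w @ x_w)    # within (fixed-effects) estimate
beta_ols = (x @ y) / (x @ x)           # pooled OLS ignoring firm effects
```

Pooled OLS is biased here because the regressor is correlated with the firm effect; demeaning within firm removes that effect and recovers the coefficient.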
We use a unified amortized framework to isolate semantic differences between eight Shapley variants under the low-latency constraints of operational risk workflows.
Methodological contribution described in the paper: a unified amortized computational framework applied to eight Shapley variants, evaluated under latency constraints typical of operational workflows.
No formulation improved objective analyst performance.
Controlled/empirical experiment reported in the paper evaluating eight Shapley variants with professional analysts in the fraud-detection environment; performance measured over 3,735 case reviews.
Standard quantitative metrics, such as sparsity and faithfulness, are decoupled from human-perceived clarity and decision utility.
Empirical comparison in the paper between quantitative metrics (sparsity, faithfulness) and human-judged clarity/decision-utility across the datasets and analyst reviews; based on the authors' large-scale evaluation.
We conduct a large-scale empirical evaluation across four risk datasets and a realistic fraud-detection environment involving professional analysts and 3,735 case reviews.
Experimental methods reported in the paper: evaluation across four risk datasets and a fraud-detection environment with professional analysts; stated sample of 3,735 case reviews.
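The Shapley-variant entries above presuppose the basic Shapley attribution. A toy exact computation by coalition enumeration (illustrative only, with a hypothetical value function; the paper's amortized framework exists precisely to avoid this exponential enumeration at operational latencies):

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Exact Shapley values via enumeration of all coalitions."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):                       # coalition sizes excluding i
            for S in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += w * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# Hypothetical value function: additive weights plus an a-b interaction bonus
weights = {"a": 1.0, "b": 2.0, "c": 3.0}
def v(S):
    return sum(weights[p] for p in S) + (1.5 if {"a", "b"} <= S else 0.0)

phi = shapley(list(weights), v)
```

The attributions sum to the grand-coalition value (efficiency), and the 1.5 interaction bonus splits evenly between "a" and "b" by symmetry.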
A central issue is how humans interpret the algorithm's choice of features, which affects the design and evaluation of highlighting policies.
Framing and motivation in the paper: conceptual claim motivating the formal models and analysis (theoretical/argumentative).
We illustrate our framework in a calibrated empirical exercise based on the American Housing Survey.
An empirical/calibrated exercise using data from the American Housing Survey reported in the paper; the claim is that the framework is illustrated empirically (data-based demonstration).
Humans may interpret the algorithm's choice of features in different ways: a sophisticated agent correctly conditions on the selection rule, while a naive agent updates only on revealed feature values and treats the selection event as exogenous.
Conceptual/behavioral modeling in the paper that defines two agent-types (sophisticated vs naive) and analyzes their distinct inference processes (theoretical/modeling).
Highlighting can be modeled as a constrained information policy that selects a small number of features to reveal.
Modeling framework developed in the paper: formal definition of highlighting as an information policy with a feature-selection constraint (theoretical/modeling).
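One compact way to write the constrained-information-policy view of highlighting (our notation, a sketch rather than the paper's formalism): for each state $x$ with $n$ features, the designer's policy $\sigma$ reveals a subset of at most $k$ features,

```latex
\max_{\sigma}\; \mathbb{E}_{x}\Big[\, V\big(a^{*}(x_{\sigma(x)}),\, x\big) \Big]
\quad\text{s.t.}\quad \sigma(x)\subseteq\{1,\dots,n\},\;\; |\sigma(x)|\le k ,
```

where $x_S$ denotes the revealed feature values and $a^{*}$ is the viewer's response. As the sophisticated-vs-naive distinction above makes clear, $a^{*}$ depends on whether the viewer conditions on the selection rule $\sigma$ itself or treats the selection event as exogenous.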
The paper proposes a conceptual framework linking AI adoption to employability and role transformation, mediated by skill adaptation, continuous learning, and organizational readiness.
Author-proposed conceptual framework presented in the review paper (theoretical linkage based on literature synthesis).
The paper develops an interdisciplinary conceptual framework that integrates insights from economics, management theory, and digital governance to characterize algorithmic enterprises.
Methodological claim about the paper's approach; stated in abstract as the paper's contribution (conceptual framework built from interdisciplinary literature).
The study uses a mixed-methods design combining a quantitative survey of 312 senior managers/strategy professionals and 28 semi-structured interviews across four sectors in Zimbabwe.
Methods reported in the paper: quantitative survey n = 312; qualitative 28 interviews across manufacturing, financial services, telecommunications, and retail.
The paper integrates information processing theory, the resource-based view, and the dynamic capabilities perspective to develop an integrated framework linking digital technology adoption, visibility, and resilience.
Theoretical framing described in the paper (explicit mention of the three theories and their integration).
The study employs hierarchical regression, structural equation modeling (SEM), and rigorous endogeneity controls including instrumental variables and propensity score matching.
Methods section summary reported in the paper; explicit listing of regression, SEM, IV, and propensity score matching.
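Among the listed endogeneity controls, propensity score matching can be sketched in a few lines (synthetic data and a hand-rolled logistic fit, purely illustrative of the technique, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)                                    # confounder
t = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)    # treatment depends on x
y = 2.0 * t + x + rng.normal(scale=0.5, size=n)           # true effect = 2.0

def fit_logit(X, t, iters=2000, lr=0.5):
    """Logistic regression by plain gradient ascent (for illustration only)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p_hat = 1 / (1 + np.exp(-X @ w))
        w += lr * X.T @ (t - p_hat) / len(t)
    return w

X = np.column_stack([np.ones(n), x])
ps = 1 / (1 + np.exp(-X @ fit_logit(X, t)))               # estimated propensity scores

# 1-nearest-neighbour matching (with replacement) on the propensity score
treated = np.where(t == 1)[0]
control = np.where(t == 0)[0]
matches = control[np.abs(ps[treated][:, None] - ps[control][None, :]).argmin(axis=1)]
att = np.mean(y[treated] - y[matches])                    # effect on the treated
```

A naive treated-minus-control mean difference would be inflated by the confounder; matching on the estimated propensity score compares treated units to observably similar controls.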
The study draws on survey data from 742 manufacturing and logistics firms across 23 countries.
Reported sample description in the paper: survey of 742 firms across 23 countries (manufacturing and logistics).
The paper foregrounds industrial firms' own digital agency as a less understood aspect in the literature on digitalization and governance.
Authors' positioning of their contribution and literature review claim in the paper (qualitative/theoretical claim).
Hierarchical regression analysis and bootstrapping methods were employed for empirical testing.
Methods section explicitly states use of hierarchical regression and bootstrapping for empirical tests on the survey data.
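The methods entry names bootstrapping. A minimal percentile-bootstrap sketch (synthetic data standing in for the survey measures; the paper's statistic and resampling scheme are not specified here):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic sample, illustrative only
sample = rng.normal(loc=5.0, scale=2.0, size=300)

def percentile_bootstrap_ci(data, stat, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    boots = np.array([
        stat(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return lo, hi

lo, hi = percentile_bootstrap_ci(sample, np.mean)
```

Bootstrapped intervals of this kind are a standard way to test indirect (mediated) effects whose sampling distributions are non-normal, which is the usual reason they accompany hierarchical regression.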
The study used a three-wave longitudinal survey design collecting matched data from 497 employees.
Methods section states a three-wave longitudinal survey and reports matched data from 497 employees.
The paper contributes by sharpening the concept of management accounting decision quality, distinguishing GenAI from broader digital transformation, and offering a cautious process model grounded in documentary case evidence from leading Chinese manufacturers.
Author-stated contribution in the paper: conceptual refinement and process model based on the three-case documentary analysis.
Because the evidence is drawn primarily from external disclosures rather than direct internal observation, the claims should be read as interpretive analytical inferences rather than as definitive causal proof.
Author's own limitation statement about data sources (external corporate disclosures) and inferential scope.
The study adopts an interpretive multiple-case design and analyzes three major Chinese manufacturing firms (Midea Group, Haier Smart Home, and Dongfang Electric) using official annual and semi-annual reports, corporate disclosures, and recent AI-and-accounting literature.
Explicit methodological statement in the paper: interpretive multiple-case design; data sources listed as official annual and semi-annual reports, corporate disclosures, and literature; sample consists of three named firms.
This is an exploratory and qualitative state-of-practice study grounded in over 30 interviews across four stakeholder groups (large enterprises, small/medium firms, AI developers, and CAD/CAM/CAE vendors).
Methodological statement in the paper describing study design and sample composition.
Key breakthroughs needed include integration with traditional engineering tools and data types, robust verification frameworks, and improved spatial and physical reasoning.
Interviewee-identified requirements compiled from over 30 interviews; stakeholders repeatedly pinpoint integration, verification, and spatial/physical reasoning as priority technical advances.
As AI reduces the costs of ideation, synthesis, and search, the central bottlenecks of science increasingly shift toward coordination, adjudication, validation, and adaptive steering.
Argumentative/trend claim presented in the paper as motivation for PIM; no empirical time-series or quantitative analysis provided in the paper itself.
The paper formalises crowdsourced R&D and hackathon-type architectures as operational search forms and links these to Causal Problem Modelling (CPM) and the Causal Theoretical Twin Architecture (CTTA).
Conceptual mapping and theoretical linkage between existing crowdsourcing/hackathon models and CPM/CTTA within the PIM framework (theoretical exposition; no empirical mapping or measurement reported).
PIM proceeds through causal problem decomposition, distributed search, real-time evidential updating, contribution traceability, staged validation, and dynamic reprioritisation of candidate solution pathways.
Procedural description of the PIM methodology and its constituent stages in the paper (methodological/theoretical exposition; no experimental implementation reported).
PIM is designed for problem spaces characterised by causal heterogeneity, partial observability, nonlinear interaction, long feedback delays, and distributed expertise.
Methodological design specification within the paper describing the target problem-space features for which PIM is intended (conceptual specification; no empirical testing).
This paper formalises extensions of crowdsourced R&D and hackathon-based research into a general methodology called Probabilistic Innovation Methodology (PIM).
The paper presents a conceptual/theoretical formalisation and names the resulting methodology PIM (no empirical study or sample reported).
This study proposes a framework for evaluating platform ecosystems by their long-term effects on human capital formation and institutional resilience.
Methodological contribution claimed by the paper (development of an evaluative framework); presented as part of the paper's contributions rather than an empirical finding.
Under three scenarios (optimistic: 2028-2035; base: 2035-2045; pessimistic: 2045-2060), we specify disconfirmation criteria that would weaken the thesis if observed.
Scenario analysis and specification of disconfirmation criteria by the authors; methodological claim about forecasting structure rather than empirical result.
Converging evidence from history, philosophy, neuroscience, technology, organizational studies, and cultural analysis supports this thesis.
Authors' multidisciplinary literature review and synthesis across the named fields (method: qualitative review); no single empirical dataset or sample size given.
We introduce 'instrumental dissolution': loss of institutional-default status while persisting in specialist niches.
Conceptual/theoretical contribution defined by the authors and illustrated via cross-disciplinary examples; no empirical validation sample reported.
Typing's dominance was instrumental, not cognitively necessary.
Argumentative/historical analysis presented in the paper; synthesis of historical and philosophical literature (no empirical sample or experiment reported).
Algorithmic accuracy alone does not determine value; legitimacy and uptake hinge on people and process readiness.
Thematic conclusion drawn from interviews, Likert surveys, and document analysis across cases indicating non-technical factors strongly influence uptake despite algorithmic performance metrics. (Sample size not reported.)
The long-term dynamic effects of AI on resilience remain unverified and require longer-term data.
Authors explicitly state the need for longer time-series data to validate long-term dynamics.
Enterprise-level indicators used in the study do not directly capture supply chain network structure and node dependencies.
Explicit limitation noted by the authors about measurement and scope.
The study's sample is limited to listed manufacturing companies, so conclusions should be applied cautiously to small and medium-sized enterprises (SMEs).
Explicit limitation stated by the authors in the paper.
Mediation and moderation models are leveraged to explore how AI enhances resilience via resource allocation optimization, productivity, and technological innovation, and how conditional factors (e.g., agility) affect these links.
Authors state they used mediation and moderation models on firm-level data to test mechanisms and conditional effects.
The study uses data on A-share listed manufacturing companies from 2011 to 2023 and applies a multi-period difference-in-differences (DID) model to assess AI's impact on SCR.
Methods description provided in the paper summary: sample timeframe and econometric approach explicitly stated.
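A multi-period (staggered-adoption) DID of the kind described is commonly estimated as a two-way fixed-effects regression. A synthetic sketch (illustrative only; it assumes a homogeneous treatment effect, under which TWFE is unbiased, whereas with heterogeneous effects staggered TWFE can be biased):

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, years = 40, np.arange(2011, 2024)      # a 2011-2023 style panel
firm_ids = np.repeat(np.arange(n_firms), years.size)
year_ids = np.tile(np.arange(years.size), n_firms)

# Staggered adoption: half the firms adopt in a random year, the rest never do
adopt_year = np.full(n_firms, 99)
adopt_year[: n_firms // 2] = rng.integers(3, 10, n_firms // 2)
treated = (year_ids >= adopt_year[firm_ids]).astype(float)

tau_true = 1.2                                   # homogeneous treatment effect
y = (rng.normal(0.0, 1.0, n_firms)[firm_ids]     # firm fixed effects
     + 0.1 * year_ids                            # common time trend
     + tau_true * treated
     + rng.normal(scale=0.3, size=firm_ids.size))

# Two-way fixed-effects DID: regress y on treatment plus firm and year dummies
X = np.column_stack([
    treated,
    np.eye(n_firms)[firm_ids],                   # firm dummies
    np.eye(years.size)[year_ids][:, 1:],         # year dummies (one dropped)
])
tau_hat = np.linalg.lstsq(X, y, rcond=None)[0][0]
```

The firm dummies absorb time-invariant firm differences and the year dummies absorb common shocks, so the coefficient on the treatment indicator is the DID estimate.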