Evidence (6574 claims)
Adoption: 8625 claims
Productivity: 7686 claims
Governance: 6917 claims
Human-AI Collaboration: 6574 claims
Org Design: 4189 claims
Innovation: 4131 claims
Labor Markets: 3588 claims
Skills & Training: 2985 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
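The matrix is easier to compare across outcomes when the counts are converted to shares. The sketch below does that for three rows copied from the table. Note that the four direction columns do not always sum to the Total column (for example, the "Other" row lists 1966 directional claims against a total of 2020); the page does not explain the gap, so this sketch assumes the sum of the four listed directions is the right denominator. The function name `direction_shares` is hypothetical.

```python
# Rows copied from the evidence matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (439, 57, 88, 20, 610),
    "AI Safety & Ethics": (218, 279, 66, 33, 602),
    "Job Displacement": (12, 81, 21, 1, 115),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the claims that have a listed direction.

    Assumption: the denominator is the sum of the four direction columns,
    not the Total column, because the two do not always agree.
    """
    pos, neg, mixed, null, _total = counts
    classified = pos + neg + mixed + null
    return {
        name: value / classified
        for name, value in zip(DIRECTIONS, (pos, neg, mixed, null))
    }


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    print(f"{outcome}: {shares['positive']:.1%} positive, "
          f"{shares['negative']:.1%} negative")
```

Rows that contain "—" cells would need those cells mapped to zero (or skipped) before being passed in.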
Human-AI Collaboration
A modular four-script Python pipeline processes synthetic FHIR-based claims data and real claims documents, extracting 36 actuarial variables across reserving, ratemaking, and claims management categories.
Authors report implementation of a four-script Python pipeline applied to synthetic FHIR-based claims and real documents, with 36 target variables defined.
We present a proof-of-concept framework using large language models (LLMs) to extract structured actuarial variables from unstructured claims data.
The authors implemented a prototype framework; the paper describes the pipeline and its implementation details.
I have developed LLMbench, a research instrument for the comparative close reading of LLM outputs that visualises token probability distributions, entropy curves, and cross-model divergence.
Description of a tool/method developed by the author (LLMbench); claim about the tool's features as stated in the abstract; no implementation details or evaluation sample sizes provided in the abstract.
Journalists and editors exercise bounded and situational agency through local adaptation, self-training, and development of ethical guardrails that institutionalise responsible AI use.
Based on in-depth interviews with newsroom staff (journalists, editors, technical personnel) at Al-Masry Al-Youm; qualitative accounts of local practices such as self-training and the creation of internal ethical rules. Sample size not reported in the excerpt.
By bridging established knowledge with emerging governance challenges, this study advances a more comprehensive understanding of platform governance and outlines future research avenues related to technological change, dynamic capabilities, and ecosystem perception.
Authors' stated contribution based on their integrative framework and literature synthesis of 644 publications.
The paper proposes a research agenda that examines how emerging technologies, including algorithmic governance, generative AI, and agentic systems, are reshaping governance practices.
Paper's concluding/prospective section proposing future research directions; conceptual proposal rather than empirical test.
The identified governance mechanisms foster innovation in platform ecosystems.
Claim based on the paper's integrative synthesis of 644 publications indicating governance's role in fostering innovation.
The identified governance mechanisms ensure quality in platform ecosystems.
Argument and synthesis from the systematic literature review of 644 publications as presented in the paper's framework.
The identified governance mechanisms (incentives, control, boundary resources) enable platform owners to coordinate value creation.
Argument based on the integrative framework derived from the systematic literature review (644 publications).
There are three core types of governance mechanisms that enable platform owners to coordinate value creation, ensure quality, and foster innovation: incentives, control, and boundary resources.
Synthesis and classification resulting from the systematic literature review of 644 publications, producing an integrative framework that identifies the three mechanism types.
This study conducts a systematic literature review of 644 publications to synthesize the governance landscape and develop an integrative framework.
Methodological statement from the paper reporting the authors performed a systematic literature review analyzing 644 publications.
Platform owners orchestrate complementor participation through governance mechanisms.
Synthesis and conceptual argument based on the systematic literature review of 644 publications.
Digital platform ecosystems rely on loosely coupled complementors to jointly create value with platform owners.
Synthesis of prior literature via the paper's systematic literature review (644 publications); conceptual framing in the literature on platform ecosystems.
Work Flexibility is the strongest predictor of Employee Productivity (β = 0.562, p < 0.001), indicating flexible working conditions play an important role in improving employee performance and work efficiency.
Reported quantitative result from the study using PLS-SEM; β and p-value provided in the paper indicating the largest standardized effect among predictors. Sample size not reported in the excerpt.
Human-Centric AI Adoption has a positive and statistically significant effect on Employee Productivity (β = 0.263, p = 0.028).
Reported quantitative result from the study using Partial Least Squares Structural Equation Modeling (PLS-SEM); β and p-value provided in the paper. Sample size not reported in the excerpt.
Artificial intelligence (AI) increasingly participates in strategic decision-making, challenging leadership theories that assume human agency at the top of organizations.
Concept-centric literature review integrating management and information systems (IS) research; theoretical synthesis of prior empirical and conceptual studies (no primary empirical sample reported).
The findings suggest that twin-based market research is no longer gated by data design, but by item volume, model selection, and a small set of construction-level decisions.
Interpretive conclusion based on empirical results across the construction-method grid and performance patterns (discussion/implication in paper).
Best-cell Fisher-z rank-order correlation reaches r = 0.590 on the SOEP held-out evaluation set.
Reported best-performing cell Fisher-z (or Fisher-transformed correlation) from held-out evaluation on SOEP.
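The excerpt does not say how the Fisher-z correlation is aggregated. One common convention, assumed here, is to transform each correlation with z = atanh(r), average in z-space, and map the mean back with tanh. The function name and the input values below are hypothetical and are not taken from the paper.

```python
import math


def fisher_z_mean(correlations):
    """Average correlations in Fisher z-space and map the mean back to r."""
    # z = atanh(r) = 0.5 * ln((1 + r) / (1 - r)); requires -1 < r < 1.
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical per-item correlations, not values from the paper.
example = [0.45, 0.60, 0.70]
print(round(fisher_z_mean(example), 3))
```

Averaging in z-space rather than averaging the raw coefficients reduces the bias that comes from the bounded, skewed sampling distribution of r.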
Best-cell accuracy reaches 78.8% on the SOEP held-out evaluation set.
Reported best-performing cell accuracy from held-out evaluation on SOEP.
Switching the embedding from a narrative persona summary to a raw dialog history of past responses raises hold-out accuracy in every model-by-reasoning cell at the 100 percent depth.
Empirical comparison between two embedding methods at 100% information depth across all model-by-reasoning cells (reported in results).
Twin quality rises with information depth but with diminishing returns past the 75 percent entropy quartile, which acts as a cost-efficient Pareto point relative to the best-performing 100 percent cells.
Empirical evaluation across information-depth conditions, comparing hold-out performance by normalized Shannon entropy quartiles (reported in results).
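The evidence note refers to normalized Shannon entropy without defining it. A common definition, assumed here, divides the entropy of the observed response distribution by the log of the number of answer options, so the value lies between 0 and 1. How the paper maps this quantity onto information-depth quartiles is not stated in the excerpt; the function name and arguments are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(responses, n_options):
    """Shannon entropy of the observed responses divided by log(n_options).

    Returns 0.0 when every response is identical and 1.0 when responses
    are spread evenly over all n_options answer options.
    """
    if n_options < 2:
        return 0.0  # a single-option item carries no information
    counts = Counter(responses)
    total = len(responses)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_options)


# Hypothetical survey item with four answer options.
print(normalized_entropy(["a", "b", "c", "d"], 4))
print(normalized_entropy(["a", "a", "a", "b"], 4))
```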
The field can be organized around an integrated decision-system framework consisting of five connected constructs—delegation frontier, reliance wedge, decision-useful XAI, meaningful oversight, and reflexive AI loop—to support cumulative research on investment, trading, credit, asset management, risk, compliance, and financial regulation.
Proposal of a conceptual framework grounded in the paper’s integrative literature review (no empirical validation or sample size reported in the abstract).
The review integrates evidence on methods, data, scenarios, explainability, trust, governance, financial large language models (FinLLMs), and agentic finance.
Descriptive claim about the scope of this paper’s literature synthesis (the review itself; content-based rather than empirical).
The central question is moving from model performance to decision architecture: how authority, oversight, and accountability should be allocated across financial workflows.
Argument based on synthesis of prior literature across relevant fields (conceptual review; no single empirical study or sample size reported).
AI is moving from a predictive tool to a component of human–AI hybrid financial decision systems.
Integrative conceptual literature review synthesizing work across finance, management, human–computer interaction (HCI), and AI (no primary empirical sample reported).
Under linear local composition, every protocol tree defines a barycentric coordinate chart on the simplex of leaf weights; Tamari-cover reparameterizations of protocol trees preserve complementarity, and for N = 4 these reparameterizations satisfy the pentagon identity.
Mathematical construction and proofs in the paper linking protocol trees, barycentric coordinates, Tamari lattice reparameterizations, and the pentagon identity (theoretical work; no empirical sample).
For N = 2 in regression under squared loss, the optimal linear-pooling weight has a closed form and admits a residual-correction interpretation.
Closed-form derivation and interpretation provided in the paper (mathematical derivation; no empirical sample).
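The excerpt does not reproduce the closed form. For a two-forecast pool w·f1 + (1 − w)·f2 under squared loss, the standard least-squares solution is w* = Σ(y − f2)(f1 − f2) / Σ(f1 − f2)², which is the slope from regressing the residual of f2 on the difference f1 − f2. That reading matches the "residual-correction" description, but whether it is the paper's exact expression (for example, whether w is constrained to [0, 1]) is an assumption. All names and data below are hypothetical.

```python
def optimal_pool_weight(y, f1, f2):
    """Unconstrained least-squares weight on f1 in the pool w*f1 + (1-w)*f2."""
    # Numerator: covariance-like term between f2's residual and the forecast gap.
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, f1, f2))
    # Denominator: squared size of the forecast gap.
    den = sum((a - b) ** 2 for a, b in zip(f1, f2))
    if den == 0:
        raise ValueError("forecasts are identical; the weight is not identified")
    return num / den


def pooled_mse(w, y, f1, f2):
    """Mean squared error of the pooled forecast for a given weight w."""
    return sum((yi - (w * a + (1 - w) * b)) ** 2
               for yi, a, b in zip(y, f1, f2)) / len(y)


# Hypothetical targets and forecasts, not data from the paper.
y = [1.0, 2.0, 3.0, 4.0]
f1 = [1.2, 1.9, 3.3, 3.8]
f2 = [0.5, 2.5, 2.5, 4.5]

w_star = optimal_pool_weight(y, f1, f2)
print(w_star, pooled_mse(w_star, y, f1, f2))
```

Written as f2 + w·(f1 − f2), the pool starts from one forecast and corrects it by a fraction of the gap to the other.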
Across our large-scale empirical analysis, Parthenon substantially improves the performance of state-of-the-art models and harnesses on legal-matter tasks.
Reported evaluations in the paper comparing baseline state-of-the-art models/harnesses to the Parthenon framework across their empirical dataset (Harvey LAB), claiming substantial performance gains.
An anti-leakage learning loop converts scored failures into task-agnostic edits to skills, tools, and knowledge, letting the system improve with experience without touching model weights.
Paper describes a proposed/implemented learning loop (anti-leakage) that translates scored agent failures into edits to non-weight system components (skills, tools, knowledge) and claims this enables improvement without model weight updates.
We introduce Parthenon, a self-evolving legal-agent framework that factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure.
Paper describes the design and implementation of the Parthenon framework and its modular decomposition into Model, Harness, Agent roles, Knowledge, Tools, and Skills, claiming these enable auditable traces and grounding.
Per-criterion accuracy climbs with stronger models.
Empirical comparison across model strengths reported in the Harvey LAB study (12,510 trajectories) showing per-criterion accuracy trends correlated with model strength.
TAs remained fully in control and could use, edit, or ignore AI-generated drafts at their discretion.
Study design statement from the randomized field experiment: intervention provided AI-assisted feedback drafts to TAs after grading but kept TAs fully in control to accept, edit, or ignore drafts. 11 TAs in the course.
Qualitative findings indicate AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort.
Qualitative interviews conducted as part of the mixed-methods study (course included 11 TAs and 88 students); thematic/qualitative analysis reported that TAs described drafts as scaffolds that made starting feedback easier and did not simply replace TA effort.
AI-assisted feedback increases feedback length by 39.8 characters.
Randomized field experiment in the same course; comparison of feedback length between treatment and control. Reported estimate: +39.8 chars, SE=3.45, p<0.001. Student-level random assignment (n=88); 11 TAs.
AI-assisted feedback significantly increases feedback provision by 10.8 percentage points.
Randomized field experiment in a 300-level machine learning course. Student submissions (n=88) were randomly assigned to treatment (TAs received AI-assisted feedback drafts) or control. Reported estimate: +10.8 percentage points, SE=1.1, p<0.001. 11 TAs participated and could use, edit, or ignore drafts.
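The two estimates above are reported with standard errors, so approximate confidence intervals can be recovered. The sketch below assumes a normal (Wald) approximation; the excerpt does not describe the paper's actual inference procedure (for example, whether standard errors are clustered by TA), so treat the intervals as a rough reading aid. The function name `wald_summary` is hypothetical.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Return (z statistic, confidence interval, two-sided p-value).

    Assumes the estimator is approximately normal; this is a reading aid,
    not the paper's own inference procedure.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    z_stat = estimate / std_error
    p_value = 2 * (1 - normal.cdf(abs(z_stat)))
    interval = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z_stat, interval, p_value


# Estimates and standard errors as reported in the evidence notes.
provision = wald_summary(10.8, 1.1)   # percentage points
length = wald_summary(39.8, 3.45)     # characters
print(provision)
print(length)
```

Under this approximation the 95% intervals are roughly 8.6 to 13.0 percentage points for feedback provision and roughly 33.0 to 46.6 characters for feedback length, both consistent with the reported p < 0.001.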
A tool-augmented agentic AI method (equipped with analytical tools, structured DIKW reasoning agents, and transparent evidence chains) can automatically learn from experimental data to generate new interventions and produce superior interventions compared to Human + Chatbot co-design.
Two-stage field experiments in healthcare prescription messaging comparing Stage 1 (Human + Chatbot: 13 message variants, 444,691 patient visits) to Stage 2 (Tool-Augmented Agentic AI: 17 AI-generated variants, 248,448 patient visits).
The best AI-generated message achieved a 69.8% CTR (+6.5 percentage points over baseline).
Stage 2 field experiment in healthcare prescription messaging where AI-generated message variants were tested; result reported directly in paper.
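The reported figures imply a baseline CTR and a relative lift that the excerpt does not state directly. The arithmetic below assumes "+6.5 percentage points over baseline" is an absolute difference in CTR; the variable names are hypothetical.

```python
# Figures as reported in the claim above.
best_ctr_pct = 69.8   # best AI-generated message, percent
lift_pp = 6.5         # gain over baseline, percentage points

# Assumption: the lift is an absolute difference, so baseline = best - lift.
baseline_ctr_pct = best_ctr_pct - lift_pp
relative_lift = lift_pp / baseline_ctr_pct

print(f"implied baseline CTR: {baseline_ctr_pct:.1f}%")
print(f"relative lift: {relative_lift:.1%}")
```

On that reading the baseline is about 63.3% and the relative improvement is about 10.3%.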
We will open-source all evaluation codes, tasks, and data at https://github.com/mrwwk/DeskCraft.
Author statement promising release of code, tasks, and data (stated in abstract).
GPT-5.4 reaches 27.6% on interactive tasks.
Author-reported benchmark result for GPT-5.4 on interactive tasks from the evaluation (reported in abstract); presumably measured across the evaluation tasks.
GPT-5.4 reaches 31.6% on standard tasks.
Author-reported benchmark result for GPT-5.4 on standard tasks from the evaluation (reported in abstract); presumably measured across the evaluation tasks.
We evaluate 18 proprietary and open source agents on 538 tasks.
Author-reported evaluation methodology and scale (number of agents and tasks) as stated in abstract.
Mid-turn interaction captures both agent-initiated clarification under uncertainty and user-initiated interruption during execution, while post-turn interaction accommodates user-driven feedback after the agent signals completion.
Author description of interaction protocol structure (design specification in paper abstract).
DeskCraft formalizes human-agent collaboration into an interaction protocol covering mid-turn and post-turn exchanges.
Author statement in abstract describing the protocol (design/method contribution).
DeskCraft covers professional creative software across design, video, audio, and 3D creation.
Author statement in abstract listing covered software domains.
DeskCraft organizes tasks into a multilevel difficulty taxonomy, with long horizon tasks requiring over 50 execution steps.
Benchmark design described in abstract (explicit statement that long-horizon tasks require over 50 execution steps).
We introduce DeskCraft, a desktop GUI benchmark targeting long horizon creative and engineering workflows and proactive human-agent collaboration.
Author statement describing the new benchmark (benchmark design and scope described in paper).
The paper constructs firm-level indicators of artificial intelligence and new quality productive forces for new energy vehicle firms.
Authors state they constructed firm-level indicators as part of their empirical approach on the Yangtze River Delta panel dataset.
Artificial intelligence affects firms' new quality productive forces through improvement of innovation output.
Mechanism tests reported by the authors showing empirical evidence that AI improves innovation output (e.g., measured innovation outcomes) which is linked to higher new quality productive forces.
Artificial intelligence affects firms' new quality productive forces through optimization of R&D personnel structure.
Mechanism tests reported by the authors using the constructed indicators and panel data; empirical evidence cited that links AI to changes in R&D personnel structure which in turn link to new quality productive forces.
The promoting effect of artificial intelligence on new quality productive forces is more pronounced among small-sized enterprises.
Heterogeneity tests by firm size in the panel data; authors report stronger positive effects for small-sized firms.