The Commonplace

Evidence (2432 claims)

Adoption (5126 claims)
Productivity (4409 claims)
Governance (4049 claims)
Human-AI Collaboration (2954 claims)
Labor Markets (2432 claims)
Org Design (2273 claims)
Innovation (2215 claims)
Skills & Training (1902 claims)
Inequality (1286 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 369 105 58 432 972
Governance & Regulation 365 171 113 54 713
Research Productivity 229 95 33 294 655
Organizational Efficiency 354 82 58 34 531
Technology Adoption Rate 277 115 63 27 486
Firm Productivity 273 33 68 10 389
AI Safety & Ethics 112 177 43 24 358
Output Quality 228 61 23 25 337
Market Structure 105 118 81 14 323
Decision Quality 154 68 33 17 275
Employment Level 68 32 74 8 184
Fiscal & Macroeconomic 74 52 32 21 183
Skill Acquisition 85 31 38 9 163
Firm Revenue 96 30 22 148
Innovation Output 100 11 20 11 143
Consumer Welfare 66 29 35 7 137
Regulatory Compliance 51 61 13 3 128
Inequality Measures 24 66 31 4 125
Task Allocation 64 6 28 6 104
Error Rate 42 47 6 95
Training Effectiveness 55 12 10 16 93
Worker Satisfaction 42 32 11 6 91
Task Completion Time 71 5 3 1 80
Wages & Compensation 38 13 19 4 74
Team Performance 41 8 15 7 72
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 17 15 9 5 46
Job Displacement 5 28 12 45
Social Protection 18 8 6 1 33
Developer Productivity 25 1 2 1 29
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 7 4 9 20
Filter: Labor Markets
Research priorities include empirically quantifying AI's effects on productivity, wages, inequality, and environmental costs; developing standardized sustainability and governance metrics; and evaluating regulatory impacts on innovation and welfare.
Stated research agenda based on gaps identified in the narrative review; identifies directions for future empirical work rather than presenting new empirical findings.
high positive The Evolution and Societal Impact of Artificial Intelligence... empirical evidence and standardized metrics for AI impacts (productivity, labor-...
AI has progressed from symbolic systems to data-driven, generative architectures and large-scale computational infrastructures, becoming a foundational technology across sectors.
Narrative synthesis of historical and technical literature across AI research and innovation studies; qualitative tracing of architectural shifts (symbolic → statistical → deep learning/generative models) and increased deployment across industries. No original empirical measurement or sample size reported in this paper.
high positive The Evolution and Societal Impact of Artificial Intelligence... technological evolution and cross-sector adoption (foundational-technology statu...
Policy recommendations include standards on explainability, audit trails, certification for finance/tax AI systems, stronger data governance, and public–private coordination to update regulatory guidance.
Paper's policy and governance recommendations drawn from case findings and literature synthesis; prescriptive content rather than evaluated interventions.
high positive Explore the Impact of Generative AI on Finance and Taxation existence/adoption of standards, improvements in regulatory clarity and complian...
Deployments should build governance, explainability, and auditability into systems and start with pilots on high-volume, well-structured tasks before scaling.
Paper recommendations based on case experience and analytic framing; advocated strategy rather than empirically validated at scale within the paper.
high positive Explore the Impact of Generative AI on Finance and Taxation deployment success rate, governance completeness, pilot-to-scale learning outcom...
To mitigate risks and realize benefits, AI systems in finance/tax should combine AI with human-in-the-loop controls and clear escalation paths.
Prescriptive recommendation grounded in case lessons and literature on safe AI deployment; presented as a best-practice guideline rather than tested intervention.
high positive Explore the Impact of Generative AI on Finance and Taxation safety/accuracy of outputs, reduction in erroneous autonomous actions
Technical building blocks leveraged in these deployments include large language models (LLMs), OCR plus structured information extraction, retrieval-augmented generation (RAG) and knowledge bases, and process automation/RPA.
Explicit technical characteristics section and case descriptions in the paper identify these components as core to implementations.
high positive Explore the Impact of Generative AI on Finance and Taxation capability enabling: natural language understanding, document extraction accurac...
Generative AI is used for risk control and audit functions, including real-time monitoring, fraud detection, KYC/AML screening, and automated exception reporting.
Reported use-cases in the two case organizations and corroborating industry reports discussed in the literature review portion of the paper.
high positive Explore the Impact of Generative AI on Finance and Taxation timeliness of monitoring, fraud detection rate, KYC/AML screening coverage, exce...
For tax declaration, generative AI enables extraction of tax-relevant facts from invoices and contracts, drafting of tax returns, compliance checks, and scenario simulations.
Case examples and literature synthesis describing OCR + information extraction and LLM-assisted drafting workflows used in practice.
high positive Explore the Impact of Generative AI on Finance and Taxation accuracy and speed of tax fact extraction, draft return quality, compliance-chec...
Generative AI is applied to fund management tasks such as cashflow forecasting, anomaly detection, and automated workflows for payments and collections.
Case descriptions and technical mapping in the paper showing implementations at the sharing center and professional services firm level.
high positive Explore the Impact of Generative AI on Finance and Taxation cashflow forecast accuracy, anomaly detection precision/recall, automation rate ...
Accounting automation use-cases include automated bookkeeping, reconciliations, journal entry suggestion, and error detection using LLMs and document understanding.
Detailed scope mapping and case examples in Xiaomi and Deloitte illustrating these accounting applications; supported by literature review of technical capabilities.
high positive Explore the Impact of Generative AI on Finance and Taxation functionality/performance in accounting tasks: bookkeeping accuracy, reconciliat...
Realizing AI-driven gains in administrative governance in Vietnam requires legal and institutional redesign.
Close reading of Vietnam's constitutional provisions, administrative statutes, procedural rules and judicial doctrine (doctrinal legal analysis) combined with comparative lessons from other jurisdictions; no quantitative data.
high positive ARTIFICIAL INTELLIGENCE AND ADMINISTRATIVE GOVERNANCE: A CRI... feasibility of AI deployment (legal/institutional compatibility enabling efficie...
Rigorous research priorities include randomized controlled trials with long-run follow-ups, cost-effectiveness studies, structural adoption models, and validated metrics for feedback quality and learning durability.
Actionable research recommendations produced by the 50-scholar interdisciplinary meeting; prescriptive synthesis rather than empirical results.
high positive The Future of Feedback: How Can AI Help Transform Feedback t... existence and quality of RCTs and long-run studies; availability of validated me...
Observations span multiple agent platforms (Moltbook, The Colony, 4claw) with more than 167,000 agents interacting as peers.
Author-reported coverage from naturalistic observations across the named platforms during the one-month observation window; count reported as ≈167k agents.
high positive When Openclaw Agents Learn from Each Other: Insights from Em... number of agents observed interacting as peers
Modular outputs (question histories, security checks, rubric scores, summaries) enable post-hoc review and explainability.
Architectural design and output artifacts described in the paper (logs and structured outputs per agent); these artifacts provide material for explanation and audit.
high positive CoMAI: A Collaborative Multi-Agent Framework for Robust and ... interpretability and auditability (availability of logs and structured outputs)
Adaptive difficulty and multidimensional evaluation allow dynamic tailoring of questions to candidate performance.
Implementation of adaptive testing logic within the workflow described in the paper, with experiments involving dynamic difficulty adjustment; detailed metrics of adaptation effectiveness are not provided in the summary.
high positive CoMAI: A Collaborative Multi-Agent Framework for Robust and ... ability to adapt question difficulty and evaluate multiple skill dimensions
Operating as a pre-processor (rather than modifying the generator) enables modular integration with existing LLMs and provides an explicit decision point for clarification.
Novelty/architecture claim in the paper explaining that C.A.P. runs before generation and therefore can be plugged into existing LLM pipelines; described design rationale (no empirical integration study presented).
high positive A Context Alignment Pre-processor for Enhancing the Coherenc... ease of integration / ability to attach to existing generation pipelines
C.A.P. verifies semantic alignment between the current expanded prompt and the weighted history and triggers a structured clarification protocol when similarity is below a threshold.
Component-level description: alignment verification via semantic embeddings (cosine similarity) or learned classifiers and threshold-based decision branching to initiate clarification; described protocol templates (no empirical validation provided).
high positive A Context Alignment Pre-processor for Enhancing the Coherenc... alignment detection (similarity score) and number/rate of triggered clarificatio...
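The threshold-based alignment check described in this claim can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the plain-list embedding representation, and the 0.5 threshold are all assumptions, since the summary reports no concrete values.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unaligned
    return dot / (norm_a * norm_b)

def needs_clarification(prompt_vec, history_vec, threshold=0.5):
    """Return True when similarity falls below the (hypothetical) threshold,
    i.e. when a clarification protocol would be triggered."""
    return cosine_similarity(prompt_vec, history_vec) < threshold
```

With these assumptions, orthogonal vectors trigger clarification and identical vectors do not; a real deployment would have to tune the threshold empirically.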
C.A.P. retrieves dialogue history using a time-weighted decay so recent context is prioritized (approximating human conversational focus).
Design description of a 'time-weighted context retrieval' component; authors propose temporal decay functions (e.g., exponential decay, half-life parameter) applied to dialogue-turn embeddings or metadata (no empirical results reported).
high positive A Context Alignment Pre-processor for Enhancing the Coherenc... recency-weighted relevance of retrieved context / retrieval precision for recent...
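One way to read the exponential-decay, half-life weighting mentioned in this claim is sketched below. The function names, the default half-life of four turns, and the choice to aggregate turns as a weighted average are assumptions made for illustration; the summary does not specify them.

```python
def recency_weight(age_in_turns, half_life=4.0):
    """Exponential decay: the weight halves every `half_life` turns."""
    return 0.5 ** (age_in_turns / half_life)

def weighted_history(turn_vecs, half_life=4.0):
    """Recency-weighted average of per-turn embeddings.

    turn_vecs is ordered oldest to newest; the newest turn has age 0.
    """
    n = len(turn_vecs)
    if n == 0:
        return []
    weights = [recency_weight(n - 1 - i, half_life) for i in range(n)]
    total = sum(weights)
    dim = len(turn_vecs[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, turn_vecs)) / total
        for d in range(dim)
    ]
```

A shorter half-life concentrates weight on the latest turns; a longer one approaches a plain average over the dialogue.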
C.A.P. is a pre-generation module that expands user utterances to recover omitted premises and implications.
Architecture and methods description in the paper specifying a 'semantic expansion' component; suggested implementations via knowledge-bases or small LLM prompts to generate premises, paraphrases, and implications (no empirical evaluation reported).
high positive A Context Alignment Pre-processor for Enhancing the Coherenc... recovered implicit premises / coverage of implied goals in expanded prompt
Structured argumentation frameworks make chains of inference inspectable and machine-checkable, improving transparency and verifiability of AI outputs.
Argument from the formal properties of argumentation frameworks (AFs) and their representation; no empirical user studies, but relies on established formal semantics.
high positive Argumentative Human-AI Decision-Making: Toward AI Agents Tha... inspectability/traceability of inference chains (auditability)
Computational argumentation offers formal, verifiable reasoning representations (argumentation frameworks, attack/support relations).
Established literature on formal argumentation (e.g., Dung-style AFs) and the paper's conceptual description; no new empirical data reported.
high positive Argumentative Human-AI Decision-Making: Toward AI Agents Tha... existence and machine-checkability of formal inferential chains (inspectability/...
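To illustrate what "machine-checkable" means for a Dung-style framework, the sketch below computes the grounded extension of a small AF by iterating the characteristic function from the empty set. It covers attack relations only (no support relations), and the function name and data layout are assumptions; the paper itself reports no implementation.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of a finite abstract argumentation framework.

    arguments: iterable of argument labels.
    attacks: set of (attacker, target) pairs.
    """
    arguments = set(arguments)
    attackers_of = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable if every one of its attackers is itself
        # attacked by some member of the current extension.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension)
                   for b in attackers_of[a])
        }
        if defended == extension:
            return extension  # fixed point reached
        extension = defended
```

For a chain where a attacks b and b attacks c, the result is {a, c}: a is unattacked and defends c against b. For two arguments that attack each other, the grounded extension is empty.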
Evaluation metrics for the benchmark include task-specific metrics such as win-rate for battling and completion time for speedruns, as well as strategic robustness measures.
Paper's evaluation section lists metrics used: win-rate, completion time, strategic robustness; describes how they are computed and used to compare agents.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... evaluation metrics used (win-rate, completion time, strategic robustness)
Speedrunning Track includes an open-source multi-agent orchestration system and standardized evaluation scenarios for reproducible multi-agent comparisons.
Paper describes and releases an open-source harness for orchestrating LLMs/agents and provides standardized scenarios and evaluation tools meant for reproducibility.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... availability of open-source orchestration code and standardized evaluation scena...
Community interest in the benchmark was validated by a NeurIPS 2025 competition with 100+ teams and published analyses of winning submissions.
Paper reports organization/validation via a NeurIPS 2025 competition, states participation of 100+ teams, and includes documentation/analyses of top submissions.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... number of competing teams (100+), availability of competition analyses/winning s...
The project is a living benchmark: the Battling Track has a live leaderboard and the Speedrunning Track uses self-contained evaluation to ensure reproducibility.
Paper/documentation notes a live leaderboard for Battling and provides self-contained evaluation pipelines/orchestration for Speedrunning intended to support reproducible runs.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... presence of live leaderboard and self-contained evaluation pipelines
Baselines include heuristic rule-based agents, reinforcement-learning (RL) agents trained for specialist play, and LLM-based agents/harnesses for generalist approaches.
Paper presents baseline implementations and experiments spanning heuristic, RL, and LLM-based agents and describes training procedures and architectures used for each baseline category.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... presence and types of baseline agents (heuristic, RL, LLM)
The benchmark is split into two complementary tracks: a Battling Track (competitive, partial-observability battles) and a Speedrunning Track (long-horizon RPG tasks with a multi-agent orchestration harness).
Paper structure and dataset descriptions specify two tracks, their scopes, and the inclusion of a multi-agent orchestration system for the Speedrunning Track.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... benchmark partitioning (presence of Battling and Speedrunning tracks)
The Battling Track dataset contains more than 20 million recorded battle trajectories.
Paper reports a Battling Track dataset of >20M recorded battle trajectories collected from simulated/match play; size reported explicitly in dataset and methods section.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... number of recorded battle trajectories (>20,000,000)
PokeAgent Challenge is a large, realistic multi-agent benchmark built on Pokemon that stresses partial observability, game-theoretic reasoning, and long-horizon planning simultaneously.
Paper describes design and motivation of the benchmark, detailing two tracks (Battling and Speedrunning) intended to capture partial observability, adversarial/game-theoretic interactions, and long-horizon sequential planning; benchmark implementation built on Pokemon simulator and described task specifications.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... benchmark task characteristics (partial observability, game-theoretic complexity...
LEAFE achieves up to a 14-percentage-point absolute improvement in Pass@128 over the strongest baselines.
Empirical result explicitly reported in the paper: maximum observed improvement 'up to +14% Pass@128' in comparisons to baselines on the experimental tasks.
high positive Internalizing Agency from Reflective Experience Pass@128 (absolute percentage point improvement)
Compared with outcome-driven methods (e.g., GRPO) and experience-based baselines (e.g., Early Experience), LEAFE yields consistent gains in Pass@1 and Pass@k under fixed interaction budgets.
Head-to-head experimental comparisons reported between LEAFE and baselines GRPO and Early Experience on the task suite; fixed interaction-budget experimental regime; Pass@1 and Pass@k used as evaluation metrics.
high positive Internalizing Agency from Reflective Experience Pass@1 and Pass@k (fraction of problems solved by at least one of k candidate runs)
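For readers unfamiliar with Pass@k, the sketch below uses the combinatorial estimator that is common in code-generation evaluation: given n runs per problem of which c succeed, it estimates the chance that at least one of k sampled runs succeeds. Whether the LEAFE paper computes Pass@k this way is not stated in the summary, so treat this as an assumption; the function names are illustrative.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate P(at least one of k runs passes) from n runs with c passes."""
    if k > n:
        raise ValueError("k cannot exceed the number of runs n")
    if n - c < k:
        return 1.0  # too few failures to fill k samples, so one must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k):
    """Average Pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 4 runs and 1 success, `pass_at_k(4, 1, 2)` is 0.5, because 3 of the 6 possible pairs of runs contain the successful one.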
LEAFE substantially improves long-horizon agentic performance by internalizing recovery behavior learned from environment feedback.
Reported experiments on a suite of long-horizon interactive tasks (multi-step coding and agentic tasks) comparing LEAFE to baselines; evaluation using Pass@k metrics under fixed interaction budgets; qualitative description that LEAFE internalizes recovery behavior from environment feedback.
high positive Internalizing Agency from Reflective Experience Long-horizon agentic performance measured by Pass@k (Pass@1, Pass@k, Pass@128)
Historical transitions in standard work hours (e.g., six-day to five-day week) show that phased implementation, collective bargaining, and complementary policies can make work-time reductions feasible and economically beneficial.
Historical analyses and case studies of past industrialized-country workweek transitions cited in the synthesis; evidence drawn from historical institutional records and prior economic histories rather than a unified econometric analysis.
high positive A Shorter Workweek as a Policy Response to AI-Driven Labor D... feasibility and economic outcomes of phased work-time reductions (employment, pr...
The paper advances a replicable interdisciplinary synthesis method and provides a simulated dataset and transparent protocols enabling other researchers to adapt the approach.
Methods section detailing systematic literature search protocols (ACM/IEEE/Springer, 2020–2024), inclusion criteria, simulation parameterization for the cross-sectoral dataset (seven industries, 2020–2024), and stated reproducibility materials.
high positive AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... Availability and description of reproducible methods and a simulated dataset (re...
AI adoption is strongly associated with workforce skill transformation (reported correlation r = 0.71).
Correlational analysis reported in the paper using the simulated cross-sectoral dataset that mirrors employment trends across seven industries (Manufacturing, Healthcare, Finance, Education, Transportation, Retail, IT Services) over 2020–2024. This corresponds to sector-year observations (7 sectors × 5 years = 35 observations) and is triangulated with findings from a systematic literature synthesis (ACM, IEEE, Springer publications 2020–2024).
high positive AI-Driven Transformation of Labor Markets: Skill Shifts, Hyb... Skill shift index (measure of changes in required skills and task composition)
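As a rough check on what r = 0.71 implies with 35 observations, the sketch below computes the standard t statistic for a Pearson correlation. It assumes the 35 sector-year observations are independent, which panel data of this kind may well violate, and because the dataset is simulated the result says nothing by itself about real labor markets. The function name is illustrative.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

t = correlation_t_statistic(0.71, 35)
print(f"t = {t:.2f} on {35 - 2} degrees of freedom")  # t = 5.79 on 33 degrees of freedom
```

The same r corresponds to r² ≈ 0.50, so about half the variance in the skill shift index is shared with the adoption measure under a linear fit.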
Research priorities include rigorous real-world trials assessing patient outcomes, cost-effectiveness, and labor impacts; comparative studies of integration strategies; measurement of long-run workforce effects; and development of standard metrics and monitoring frameworks.
Explicit recommendations from the narrative review based on identified gaps: scarcity of RCTs, economic analyses, and long-term workforce studies.
high positive Human-AI interaction and collaboration in radiology: from co... number and quality of real-world trials, existence of standardized monitoring fr...
Reward shaping at the assignment layer enables an explicit trade-off between diagnostic accuracy and human labor by incorporating penalties for human involvement.
Methodology section describing reward shaping and experimental comparisons showing different accuracy/human-effort trade-offs (results reported in paper; exact experimental details not provided in the summary).
high positive Hierarchical Reinforcement Learning Based Human-AI Online Di... diagnostic accuracy vs human effort (as controlled by reward shaping)
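The accuracy-versus-labor trade-off can be illustrated with a shaped reward that subtracts a per-turn penalty for human involvement. The function name, the reward magnitudes, and the linear penalty are assumptions for illustration; the summary does not give the paper's actual reward function.

```python
def shaped_reward(correct, human_turns, human_penalty=0.1,
                  correct_reward=1.0, wrong_reward=-1.0):
    """Episode reward: outcome term minus a linear cost for human turns.

    Raising human_penalty pushes a learned policy toward machine-only
    handling; lowering it favors accuracy at the cost of more human effort.
    """
    base = correct_reward if correct else wrong_reward
    return base - human_penalty * human_turns
```

Under these hypothetical values, a correct diagnosis that used three human turns scores 0.7, while the same diagnosis with no human involvement scores 1.0.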
Masked reinforcement learning techniques constrain or mask action spaces, reducing exploration over huge symptom/action spaces.
Paper describes use of masked RL to limit action options during training and execution; used in both assignment and execution layers (methodological claim supported by algorithmic description and experiments).
high positive Hierarchical Reinforcement Learning Based Human-AI Online Di... action-space reduction / sample efficiency / learning stability (as applied to s...
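Action masking of this kind is often implemented by zeroing the probability of disallowed actions before sampling. The sketch below shows one common variant, a masked softmax over policy logits; it is a generic illustration, not the paper's code, and the function name and boolean-mask convention are assumptions.

```python
import math

def masked_softmax(logits, mask):
    """Softmax over allowed actions only; mask[i] is True if action i is allowed."""
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

Because masked actions receive exactly zero probability, the agent never spends exploration on them, which is the mechanism by which masking shrinks the effective action space.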
The upper layer ('master') learns turn-by-turn human–machine assignment using masked reinforcement learning with reward shaping to balance accuracy and human cost.
Methodological description in the paper and empirical results from experiments using masked RL and reward-shaped objectives at the assignment layer (implementation and experimental setup reported; dataset/sample size not specified in summary).
high positive Hierarchical Reinforcement Learning Based Human-AI Online Di... assignment policy performance; human effort allocation; diagnostic accuracy unde...
Returns to advanced digital skills vary by firm size/type: the wage return in large Chaebol conglomerates is approximately 18.7%, significantly higher than the ~9.5% return in Small and Medium-sized Enterprises (SMEs), indicating a 'skills–scale' complementarity effect.
Heterogeneity analysis within the extended Mincerian wage regression framework using KLIPS micro-data, comparing estimated returns across firm types (Chaebol vs SMEs). (Sample size and exact model specification not provided in the excerpt.)
high positive Measuring the Economic Returns of Vocational Digital Skills ... wage/worker compensation (percentage wage premiums by firm type: Chaebol ≈ 18.7%...
Workers with only general digital literacy receive a wage premium of approximately 5.8% (after controlling for education, experience, and demographics).
Same empirical framework: extended Mincerian wage equation on KLIPS micro-data with controls for education, experience, and demographic characteristics. (Sample size not specified in the provided excerpt.)
high positive Measuring the Economic Returns of Vocational Digital Skills ... wage/worker compensation (percentage wage premium ≈ 5.8%)
Workers possessing specialized digital skills (e.g., data analysis, programming, automation control) enjoy a significant wage premium of approximately 14.2% after controlling for years of education, work experience, and demographic characteristics.
Empirical estimation using an extended Mincerian wage equation on micro-data from the Korean Labor and Income Panel Study (KLIPS); models control for years of education, work experience, and demographic covariates. (Sample size not specified in the provided excerpt.)
high positive Measuring the Economic Returns of Vocational Digital Skills ... wage/worker compensation (percentage wage premium ≈ 14.2%)
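For orientation, a textbook Mincerian wage equation extended with a digital-skill indicator has the form below. The exact specification used in the paper is not given in the excerpt, so the symbols here are assumptions: S_i is years of education, X_i is work experience, D_i indicates specialized digital skills, and Z_i collects demographic controls.

```latex
\ln w_i = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2
          + \delta D_i + \gamma^{\top} Z_i + \varepsilon_i,
\qquad
\text{premium (\%)} = 100\,\bigl(e^{\delta} - 1\bigr)
```

In such a model the coefficient δ is measured in log points, and for small values 100δ is close to the exact percentage premium. The excerpt does not say which of the two conventions the reported 14.2%, 5.8%, 18.7%, and 9.5% figures follow.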
The model is disciplined using data from the Michigan Survey of Consumers and the Survey of Professional Forecasters, targeting key empirical moments.
Calibration/estimation strategy described in the paper: parameters are chosen to match moments from the Michigan Survey of Consumers and SPF (targeted empirical moments). Specific moments and calibration targets are reported in the paper.
high positive Inaccurate Beliefs and Cyclical Labor Market Dynamics fit to targeted empirical moments (e.g., expectation dispersion, persistence mea...
I develop a search-and-matching model with sticky wages and endogenous separations.
Theoretical/model contribution: construction and analysis of a calibrated search-and-matching framework that incorporates wage stickiness and endogenous separation decisions.
high positive Inaccurate Beliefs and Cyclical Labor Market Dynamics wage dynamics and separation rates as generated by the model
Workers and firms face information frictions about the aggregate state of the economy (modeled explicitly).
Assumption and mechanism built into the paper's theoretical framework: a search-and-matching model with information frictions for both sides of the market (model specification).
high positive Inaccurate Beliefs and Cyclical Labor Market Dynamics information precision / belief heterogeneity about aggregate state (model primit...
Households form dispersed, backward-looking expectations about macroeconomic conditions.
Survey evidence from the Michigan Survey of Consumers showing dispersion in individual expectations and patterns consistent with backward-looking (slow-updating) belief formation about macro variables; exact sample sizes and empirical specifications are provided in the paper (not in the summary).
high positive Inaccurate Beliefs and Cyclical Labor Market Dynamics dispersion and updating dynamics of households' macroeconomic expectations
DARE posits that responsible AI deployment requires the simultaneous and integrated development of Digital readiness, Administrative governance, Resilience & ethics, and Economic equity.
Descriptive claim about the framework's components as reported in the abstract (conceptual proposition).
high positive The DARE framework: a global model for responsible artificia... responsible AI deployment (dependent on development across four DARE dimensions)
This paper introduces the DARE Framework, a holistic, four-dimensional model for national AI strategy and international cooperation.
Factual description of paper content in abstract — the framework is introduced by the authors (conceptual/model contribution).
high positive The DARE framework: a global model for responsible artificia... existence/introduction of a conceptual framework (DARE) for AI strategy
AI tools—ranging from machine learning algorithms in inventory management to natural language processing in customer engagement—are applied in micro‑enterprise contexts.
Descriptive synthesis from included articles reporting specific AI applications (ML for inventory management; NLP for customer engagement) across the reviewed literature.
high positive Role of AI in Enhancing Work Efficiency and Opportunities fo... types of AI applications deployed in micro‑enterprise settings (e.g., ML, NLP)
Global efforts toward establishing shared norms and multilateral cooperation are underway through initiatives led by the United Nations, OECD, UNESCO, and G7.
Qualitative document review identifying initiatives and normative efforts by multilateral organizations (organizations named; specific initiatives referenced qualitatively but not enumerated as a dataset).
high positive The Geopolitics of Artificial Intelligence: Power, Regulatio... existence and activity of multilateral initiatives for AI norms (UN, OECD, UNESC...