Marrying formal argumentation with large language models could make AI decisions inspectable and contestable, creating demand for verifiable AI services in regulated sectors; at the same time it will shift experts toward adjudication and oversight and create new governance needs.
Computational argumentation offers formal frameworks for transparent, verifiable reasoning but has traditionally been limited by its reliance on domain-specific information and extensive feature engineering. In contrast, LLMs excel at processing unstructured text, yet their opaque nature makes their reasoning difficult to evaluate and trust. We argue that the convergence of these fields will lay the foundation for a new paradigm: Argumentative Human-AI Decision-Making. We analyze how the synergy of argumentation framework mining, argumentation framework synthesis, and argumentative reasoning enables agents that do not just justify decisions, but engage in dialectical processes where decisions are contestable and revisable -- reasoning with humans rather than for them. This convergence of computational argumentation and LLMs is essential for human-aware, trustworthy AI in high-stakes domains.
Summary
Main Finding
The paper argues that integrating computational argumentation with large language models (LLMs) creates a new paradigm, Argumentative Human-AI Decision-Making, in which AI agents do more than justify outputs: they participate in dialectical, contestable, and revisable decision processes with humans. This synergy promises human-aware, verifiable, and trustworthy AI for high-stakes domains by combining formal argument structures with LLMs' ability to mine and generate rich, contextual arguments from unstructured text.
Key Points
- Motivation
  - Computational argumentation: offers formal, verifiable reasoning representations (argumentation frameworks, attack/support relations), but has required heavy feature engineering and domain-specific knowledge.
  - LLMs: excel at extracting and generating arguments from unstructured text but are opaque and hard to evaluate or trust.
- Proposed convergence
  - Argumentation Framework Mining: use LLMs and NLP pipelines to extract claims, premises, relations (attack/support), and provenance from text corpora.
  - Argumentation Framework Synthesis: combine mined fragments into coherent formal argumentation frameworks (AFs) with explicit semantics, enabling verification and automated inference.
  - Argumentative Reasoning & Interaction: run formal dialectical/acceptability semantics and dialogue protocols (contest, rebuttal, revision) enabling agents that reason with humans through structured debates and revisions.
- Value-add
  - Transparency and verifiability: structured AFs make chains of inference inspectable and machine-checkable.
  - Contestability and revision: decisions are not final outputs but subject to dialectical challenge and update, increasing robustness and trust.
  - Human-AI collaboration: supports collaborative reasoning ("with" humans) rather than opaque automation "for" humans, improving uptake in high-stakes settings.
- Challenges
  - Faithful extraction: aligning LLM-extracted arguments with formal AF primitives and ensuring fidelity to source evidence.
  - Robustness & adversarial manipulation: AFs and LLMs may be gamed or misled; incentives may drive strategic argumentation.
  - Evaluation: developing metrics and benchmarks for argument quality, fidelity, contestability, and human trust.
  - Integration costs: domain modeling, human-in-the-loop protocols, and regulatory/liability frameworks required.
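To make the reasoning step concrete, here is a minimal sketch of a Dung-style abstract argumentation framework evaluated under grounded semantics. The paper does not prescribe an implementation; the function name `grounded_extension` and the loan example (arguments `approve`, `risk`, `error`) are hypothetical and chosen only for illustration. The sketch shows how a human contesting one argument changes which conclusions are accepted.

```python
from typing import Iterable

Attack = tuple[str, str]  # (attacker, target)


def grounded_extension(arguments: Iterable[str], attacks: Iterable[Attack]) -> set[str]:
    """Return the grounded extension: the least fixed point of the
    characteristic function of an abstract argumentation framework."""
    attacks = list(attacks)
    # Include attack endpoints so every argument has an attacker set.
    args = set(arguments) | {a for pair in attacks for a in pair}
    attackers: dict[str, set[str]] = {a: set() for a in args}
    for src, dst in attacks:
        attackers[dst].add(src)

    accepted: set[str] = set()
    while True:
        # Arguments attacked by an already accepted argument are defeated.
        defeated = {a for a in args if attackers[a] & accepted}
        # An argument is defended when all of its attackers are defeated.
        defended = {a for a in args if attackers[a] <= defeated}
        if defended == accepted:
            return accepted
        accepted = defended


# Hypothetical example: the system argues for approving a loan, and a
# risk argument (a recorded default) attacks that conclusion.
arguments = {"approve", "risk"}
attacks = {("risk", "approve")}
before = grounded_extension(arguments, attacks)

# A human contests the risk argument: the default record is a data error.
after = grounded_extension(arguments | {"error"}, attacks | {("error", "risk")})

print(sorted(before))  # ['risk']
print(sorted(after))   # ['approve', 'error']
```

Grounded semantics is used here because it is the most skeptical of the standard semantics and always yields a single accepted set; preferred or stable semantics, or bipolar and weighted variants, would need a different procedure.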
Data & Methods
(Conceptual / design-oriented; empirical work would instantiate these components)
- Data sources
  - Domain corpora with argumentative content: legal opinions, clinical notes and guidelines, policy reports, regulatory filings, parliamentary debates, curated debate corpora (e.g., UKP, Persuasive Essays, IBM Debater-type collections).
  - Expert annotations: labelled claims, premises, stances, attack/support relations and provenance for supervised or weakly supervised training.
- Extraction & representation methods
  - Fine-tuning or prompting LLMs for argument mining (claim detection, premise identification, relation classification).
  - Information-extraction pipelines to attach provenance, uncertainty, and source metadata.
  - Structured representations: convert extracted elements into formal AF primitives (nodes = arguments/claims; edges = attack/support; weights or probabilities for strength).
- Synthesis & verification
  - Algorithms for merging fragmentary arguments into coherent AFs (graph synthesis, resolution of conflicts, canonicalization).
  - Formal semantics: Dung-style acceptability, bipolar AFs, weighted/probabilistic AFs; model checking to verify logical consistency and provenance constraints.
  - Hybrid systems: symbolic reasoning layers over LLM-derived content for constraints, counterfactual checks, and rule compliance.
- Interaction & evaluation protocols
  - Dialectical dialogue protocols: structured challenge/response cycles, revision operators, and adjudication mechanisms (human adjudicators or automated semantics).
  - Evaluation metrics: task performance, fidelity (how accurately AF reflects source), argument quality (coherence, relevance, completeness), contestability (ability to find justified counterarguments), revision responsiveness, human trust and reliance, calibration of uncertainty.
  - Experimental designs: benchmark tasks, human-subject studies in domain-specific simulations, adversarial stress tests, longitudinal deployment case studies.
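The synthesis and fidelity ideas above can be sketched in a few lines. This is an illustrative assumption, not the paper's method: the `Relation` record, the toy `canonical` normalizer (case-folding and whitespace collapsing stand in for real claim canonicalization), and the clinical example data are all hypothetical. The sketch merges mined relations into one bipolar graph while keeping provenance per edge, flags claim pairs linked by both attack and support, and scores extracted edges against expert annotations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str  # attacking or supporting claim
    target: str  # claim being attacked or supported
    kind: str    # "attack" or "support"
    doc_id: str  # provenance: document the relation was mined from


def canonical(claim: str) -> str:
    """Toy canonicalization: case-fold and collapse whitespace."""
    return " ".join(claim.lower().split())


def merge_fragments(relations):
    """Merge mined relations into one graph; each edge maps to its source documents."""
    edges = defaultdict(set)
    for r in relations:
        edges[(canonical(r.source), canonical(r.target), r.kind)].add(r.doc_id)
    return dict(edges)


def conflicting_pairs(edges):
    """Claim pairs linked by both an attack edge and a support edge."""
    kinds = defaultdict(set)
    for src, dst, kind in edges:
        kinds[(src, dst)].add(kind)
    return {pair for pair, ks in kinds.items() if {"attack", "support"} <= ks}


def edge_fidelity(predicted, gold):
    """Precision, recall, and F1 of predicted edges against annotated edges."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# Hypothetical mined fragments from two documents.
mined = [
    Relation("Drug X lowers blood pressure", "Prescribe drug X", "support", "guideline-1"),
    Relation("drug x  lowers blood pressure", "prescribe drug X", "support", "note-7"),
    Relation("Patient has kidney disease", "Prescribe drug X", "attack", "note-7"),
]
af = merge_fragments(mined)

# Hypothetical expert annotations, already in canonical form.
gold = {
    ("drug x lowers blood pressure", "prescribe drug x", "support"),
    ("patient has kidney disease", "prescribe drug x", "attack"),
    ("drug x interacts with drug y", "prescribe drug x", "attack"),
}
precision, recall, f1 = edge_fidelity(af, gold)
```

In this example the two differently formatted support relations collapse into a single edge with two provenance entries, and the fidelity score reflects one annotated attack that the miner missed. Exact-match scoring is a deliberate simplification; a real evaluation would likely need semantic matching of claims.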
Implications for AI Economics
- Adoption & market structure
  - Demand shift toward AI systems that provide verifiable, contestable reasoning in regulated/high-stakes sectors (healthcare, law, finance, public policy).
  - Competitive advantage for firms offering argumentatively transparent AI, with premium pricing for verifiability and auditability.
  - Emergence of new service layers: argumentation-as-a-service, audit firms, explanation certification, and human-in-the-loop orchestration platforms.
- Labor, productivity & complementarities
  - Human experts shift from sole decision-makers to adjudicators, challengers, and validators of AI-generated arguments, changing skill demands toward critical evaluation and dialectical oversight.
  - Potential productivity gains from faster evidence synthesis and argument exploration, but complementary tasks (validation, stakeholder engagement) create new labor demand.
- Information asymmetry & transaction costs
  - Structured AFs reduce information asymmetry by making reasoning traceable, lowering search and verification costs in transactions and contracting.
  - Better contestability can reduce costly litigation and regulatory frictions if decisions are transparently defensible.
- Incentives, strategic behavior & regulation
  - New incentives for strategic argument construction (gaming, persuasion without fidelity) suggest a need for governance: standards for provenance, certification, and liability rules.
  - Regulators may prefer systems that support contestability and audit trails, potentially mandating argumentation-style explainability in certain sectors.
- Welfare and risk
  - Welfare gains from improved decision quality and trust in automation, particularly where human oversight is required.
  - Risk of manipulation and misinformation if argument mining/synthesis is unregulated or misaligned with social incentives; externalities may justify public intervention (standards, audits, liability frameworks).
- Research & policy agenda for economists
  - Quantify value of contestable explanations: willingness-to-pay for verifiable reasoning vs. opaque predictive performance.
  - Study labor market impacts: reallocation of tasks, wage effects for validators/adjudicators.
  - Design mechanisms and contracts that align incentives for truthful argument provision and penalize strategic misrepresentation.
  - Evaluate regulatory interventions: certification regimes, liability assignment, mandatory audit trails in high-stakes domains.
Overall, integrating computational argumentation with LLM capabilities creates economically significant opportunities (trustworthy, auditable AI services) and risks (strategic manipulation, new regulatory needs). For AI economics, the key questions are how these systems affect incentives, market structure, labor complementarity, and the design of policies that extract social value while containing potential harms.
Assessment
Claims (26)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Integrating computational argumentation with large language models (LLMs) creates a new paradigm, Argumentative Human-AI Decision-Making, where AI agents participate in dialectical, contestable, and revisable decision processes with humans. [Decision Quality] | positive | medium | degree of human-AI dialectical participation (ability to engage in contestable, revisable decision processes) | 0.01 |
| Combining formal argument structures with LLMs' ability to mine and generate rich, contextual arguments from unstructured text promises human-aware, verifiable, and trustable AI for high-stakes domains. [AI Safety and Ethics] | positive | medium | trustworthiness/verifiability of AI outputs in high-stakes decision contexts | 0.01 |
| Computational argumentation offers formal, verifiable reasoning representations (argumentation frameworks, attack/support relations). [AI Safety and Ethics] | positive | high | existence and machine-checkability of formal inferential chains (inspectability/verifiability) | 0.02 |
| Computational argumentation approaches have required heavy feature engineering and domain-specific knowledge to be effective. [Other] | negative | high | engineering cost / domain modeling effort required for AF-based systems | 0.02 |
| LLMs excel at extracting and generating arguments from unstructured text but are opaque and hard to evaluate or trust. [AI Safety and Ethics] | mixed | high | argument extraction/generation performance and model interpretability/trustworthiness | 0.02 |
| Argumentation Framework Mining: LLMs and NLP pipelines can be used to extract claims, premises, relations (attack/support), and provenance from text corpora. [Output Quality] | positive | medium | accuracy/fidelity of extracted argument elements (claims, premises, relations, provenance) | 0.01 |
| Argumentation Framework Synthesis: mined fragments can be combined into coherent formal argumentation frameworks (AFs) with explicit semantics enabling verification and automated inference. [Output Quality] | positive | medium | coherence and correctness of synthesized AFs and verifiability of derived inferences | 0.01 |
| Running formal dialectical/acceptability semantics and dialogue protocols over AFs enables agents that reason with humans through structured debates and revisions. [Decision Quality] | positive | medium | capacity for structured debate/revision (dialogue performance, acceptability outcomes) | 0.01 |
| Structured argumentation frameworks make chains of inference inspectable and machine-checkable, improving transparency and verifiability of AI outputs. [AI Safety and Ethics] | positive | high | inspectability/traceability of inference chains (auditability) | 0.02 |
| Framing decisions as contestable and revisable (via dialectical challenge and update) increases robustness and trust in AI-supported decision-making. [Decision Quality] | positive | medium | measures of robustness (resilience to error) and human trust in decisions | 0.01 |
| This approach supports collaborative reasoning ('with' humans) rather than opaque automation 'for' humans, improving uptake in high-stakes settings. [Adoption Rate] | positive | medium | human adoption/uplift in uptake for high-stakes decision systems | 0.01 |
| Faithful extraction (aligning LLM-extracted arguments with formal AF primitives and ensuring fidelity to source evidence) is a key technical challenge. [Error Rate] | negative | high | fidelity/alignment error rate between extracted elements and source evidence | 0.02 |
| AFs and LLMs may be gamed or misled; adversaries may exploit systems leading to strategic argumentation or manipulation. [AI Safety and Ethics] | negative | high | system vulnerability metrics / susceptibility to adversarial manipulation | 0.02 |
| Evaluation currently lacks metrics and benchmarks for argument quality, fidelity, contestability, and human trust; developing these is necessary. [Research Productivity] | null_result | high | availability and maturity of evaluation metrics and benchmarks | 0.02 |
| Integration costs (domain modeling, human-in-the-loop protocols, and regulatory/liability frameworks) are significant barriers to deployment. [Organizational Efficiency] | negative | high | implementation cost and organizational burden for deploying argumentative AI systems | 0.02 |
| Demand will shift toward AI systems that provide verifiable, contestable reasoning in regulated/high-stakes sectors (healthcare, law, finance, public policy). [Adoption Rate] | positive | medium | market demand share for verifiable/contestable AI systems in regulated sectors | 0.01 |
| Firms offering argumentatively transparent AI can obtain competitive advantage and charge premium prices for verifiability and auditability. [Firm Revenue] | positive | medium | price premium and competitive advantage metrics for transparent-AI providers | 0.01 |
| New service layers may emerge (argumentation-as-a-service, audit firms, explanation certification, human-in-the-loop orchestration platforms). [Market Structure] | positive | low | emergence and market size of new service verticals around argumentative AI | 0.01 |
| Human experts will likely shift roles from sole decision-makers to adjudicators, challengers, and validators of AI-generated arguments, changing required skills toward critical evaluation and dialectical oversight. [Skill Acquisition] | mixed | medium | changes in job tasks, skill demand, and employment shares for expert validators/adjudicators | 0.01 |
| Structured AFs can reduce information asymmetry by making reasoning traceable, thereby lowering search and verification costs in transactions and contracting. [Organizational Efficiency] | positive | medium | reduction in transaction/search/verification costs attributable to traceable AFs | 0.01 |
| Better contestability may reduce litigation and regulatory frictions if decisions are transparently defensible. [Regulatory Compliance] | positive | low | frequency/cost of litigation and regulatory disputes post-adoption of contestable AI systems | 0.01 |
| The possibility of strategic argument construction (gaming) motivates governance needs: standards for provenance, certification, and liability rules. [Governance and Regulation] | neutral | medium | existence and effectiveness of governance mechanisms (standards, certification, liability) addressing strategic manipulation | 0.01 |
| Regulators may prefer systems that support contestability and audit trails and could mandate argumentation-style explainability in certain sectors. [Governance and Regulation] | positive | low | regulatory adoption rate of contestability/audit-trail requirements | 0.01 |
| There are potential welfare gains from improved decision quality and trust in automation, particularly where human oversight remains required. [Consumer Welfare] | positive | medium | welfare indicators (decision quality gains, trust levels, social surplus) from argumentative AI | 0.01 |
| There is a risk of manipulation and misinformation if argument mining/synthesis is unregulated or misaligned with social incentives, creating externalities that may justify public intervention. [AI Safety and Ethics] | negative | medium | incidence of manipulation/misinformation attributable to argument-mining/synthesis systems | 0.01 |
| Research agenda items for economists include: quantifying willingness-to-pay for verifiable reasoning, studying labor-market impacts for validators, designing contracts/mechanisms to incentivize truthful argument provision, and evaluating regulatory interventions. [Research Productivity] | null_result | high | existence and prioritization of empirical research on WTP, labor impacts, mechanism design, and regulation evaluations | 0.02 |