Personalized Content Selection in Marketing Using BERT and GPT-Based AI Models

Improving consumer engagement and driving conversions depend on the use of customised content in digital marketing. Conventional marketing techniques often fail to react to real-time user behaviour, which shows the need for Artificial Intelligence (AI) and Natural Language Processing (NLP) to improve communication efficacy. This paper explores the use of Generative Pre-trained Transformer (GPT) models and Bidirectional Encoder Representations from Transformers (BERT) models within AI-enhanced marketing automation, enabling dynamic, real-time, context-sensitive content personalisation. While GPT-based models are adept at generating highly relevant and customised marketing material, BERT's strong contextual comprehension improves consumer sentiment analysis, intent identification, and behavioural segmentation. Moreover, we employ retrieval-augmented generation (RAG) and reinforcement learning (RL) to create an adaptable framework that continually improves content distribution based on real-time user interactions and engagement patterns. This paper also addresses major issues related to AI-driven marketing, including ethical consequences, data privacy problems, and biases in AI-generated content. To ensure safe and regulatory-compliant personalisation (e.g., GDPR, CCPA), we support the adoption of federated learning, differential privacy, and homomorphic encryption. We examine the efficacy of BERT-GPT-based content selection versus conventional marketing automation systems by means of empirical research and practical case studies. The results show clear improvements in click-through rates (CTR), engagement measures, and conversion rates, highlighting the effectiveness of AI in offering highly relevant, data-informed, and customised marketing experiences. This article presents a thorough framework allowing companies to apply scalable AI-driven marketing techniques while preserving ethical AI standards and data protection.

Summary

Main Finding

BERT (for contextual understanding) combined with GPT (for content generation), augmented by retrieval-augmented generation (RAG) and reinforcement learning (RL), produces substantially better marketing outcomes than traditional rule-based systems. Reported improvements in the paper: +35% click-through rate (CTR), +20% conversion rate, +40% engagement time, +25% customer satisfaction, 30% reduction in factual inconsistencies (via RAG), and a 15% reduction in algorithmic bias (via adversarial debiasing).

DOI: https://doi.org/10.64971/j.cph.eijtem.v13.i1.18.2026

Key Points

Combined architecture: BERT for sentiment/intent/segmentation → GPT for dynamic content generation; RAG used to improve factuality; RL (PPO-style policy gradients) used for real-time content optimization.
Quantitative impacts (paper-reported): CTR +35%, conversion +20%, session engagement +40%, customer satisfaction +25%.
Data privacy and ethics: advocates federated learning, differential privacy, homomorphic encryption; implements adversarial debiasing and fairness-aware training.
Evaluation: A/B testing and statistical significance testing (t-test/ANOVA) claimed; human validation and perplexity used to evaluate generated content.
Practical deployment: multimodal outputs for email, ads, chatbots; RAG ensures brand consistency and up-to-date factual content.
Limitations noted implicitly: bias risk from training data, privacy/regulatory constraints (GDPR/CCPA), and the need for transparency/explainability.
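
As a rough illustration of the retrieval step named in the key points above, the sketch below ranks entries of a small in-memory knowledge store by bag-of-words cosine similarity and then assembles a grounded prompt. The store contents, function names, and prompt wording are all hypothetical. The paper does not describe its retriever, and a real system would more likely use dense embeddings and a vector index. No generative model is called here.

```python
import math
from collections import Counter

# Hypothetical brand/product knowledge store (illustrative entries only).
KNOWLEDGE_STORE = [
    "The trail running shoe weighs 240 grams and ships in two days.",
    "The espresso machine has a 15 bar pump and a two year warranty.",
    "Gift cards never expire and can be used online.",
]

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(user_context, facts):
    """Assemble a prompt that restricts generation to the retrieved facts."""
    fact_lines = "\n".join(f"- {f}" for f in facts)
    return (
        f"Use only these facts:\n{fact_lines}\n\n"
        f"Write a short marketing message for: {user_context}"
    )

query = "lightweight running shoe for trail races"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_STORE))
```

The resulting `prompt` string would then be passed to whatever generator is in use. Constraining the generator to retrieved facts is the mechanism by which RAG is expected to reduce factual inconsistencies.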

Data & Methods

Data sources: large-scale customer interaction logs, social media engagements, email campaign responses, historical marketing data; drawn from public marketing repositories, e-commerce platforms, and industry case studies (paper does not list specific datasets or sample sizes).
Preprocessing: tokenization, stop-word removal, sentiment tagging, entity recognition; data augmentation (synonym replacement, back-translation, contextual embeddings).
Models and pipeline:
- BERT: fine-tuned for sentiment classification (positive/neutral/negative), intent detection, and behavioral segmentation; evaluated by precision/recall/F1.
- GPT: conditionally generates personalized marketing messages from BERT outputs; evaluated by perplexity, human validation, and A/B tests.
- RAG: retrieval from brand/product knowledge store before generation to reduce factual errors.
- RL: PPO-like policy gradient agent optimizing content selection actions using rewards based on CTR, dwell time, conversion; continuous learning from live engagement signals.
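
To make the RL bullet above more concrete, here is a deliberately simplified sketch. It swaps the PPO-style policy gradient agent for an epsilon-greedy contextual bandit, which is a much simpler technique but shows the same loop: pick a content variant for a user segment, observe a reward, update the estimate. The class name, variant names, segment label, and click probabilities are all hypothetical and are not taken from the paper.

```python
import random

class EpsilonGreedySelector:
    """Epsilon-greedy bandit keyed by (segment, variant); not PPO."""

    def __init__(self, variants, epsilon=0.1, seed=0):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}   # (segment, variant) -> number of updates
        self.values = {}   # (segment, variant) -> running mean reward

    def select(self, segment):
        """Explore with probability epsilon, otherwise pick the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=lambda v: self.values.get((segment, v), 0.0))

    def update(self, segment, variant, reward):
        """Incrementally update the mean reward for this (segment, variant)."""
        key = (segment, variant)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.values.get(key, 0.0)
        self.values[key] = old + (reward - old) / n

def simulate(selector, click_prob, segment, rounds, seed=1):
    """Feed simulated clicks (reward 1.0 or 0.0) back into the selector."""
    env = random.Random(seed)
    for _ in range(rounds):
        variant = selector.select(segment)
        reward = 1.0 if env.random() < click_prob[variant] else 0.0
        selector.update(segment, variant, reward)
    return selector

# Hypothetical click probabilities for one BERT-derived segment.
CLICK_PROB = {"discount_email": 0.02, "new_arrivals_email": 0.20}
selector = simulate(EpsilonGreedySelector(CLICK_PROB), CLICK_PROB, "bargain_hunters", 20000)
```

In this toy setup the reward is a click only. The summary describes rewards that also use dwell time and conversion, which would replace the binary reward with a weighted signal.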
Privacy & fairness techniques: federated learning to avoid centralizing raw data; differential privacy (noise addition); adversarial debiasing and fairness-aware training to reduce discrimination.
Evaluation metrics: CTR, conversion rate, engagement time, sentiment accuracy, user satisfaction surveys; statistical tests (t-test, ANOVA) used to assess significance (no detailed p-values or confidence intervals provided in the paper summary).
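
Because the summary gives no p-values or confidence intervals, it may help to see what a significance check on a CTR uplift looks like. The sketch below uses a two-proportion z-test, a large-sample alternative to the t-test named above that suits binary click data and needs only the standard library. The impression and click counts are hypothetical, chosen so that the relative lift equals the paper-reported +35%; they are not the paper's data.

```python
import math

def relative_lift(clicks_a, n_a, clicks_b, n_b):
    """Relative change in CTR of variant B over control A."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    return (p_b - p_a) / p_a

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants share one CTR."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical A/B test: control CTR 2.0%, treatment CTR 2.7%.
lift = relative_lift(200, 10_000, 270, 10_000)
z, p_value = two_proportion_z_test(200, 10_000, 270, 10_000)
print(f"lift={lift:.2%} z={z:.2f} p={p_value:.4f}")
```

The same relative lift can be significant or not depending on sample size, which is why the missing sample sizes matter when reading the reported improvements.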

Implications for AI Economics

Productivity and ROI
- Substantial short-run productivity gains in marketing: higher CTRs and conversion rates imply improved marketing ROI and lower customer acquisition cost per conversion.
- Firms that adopt these stacks can scale personalization with lower marginal content costs, improving returns to marketing spend.
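
The link between conversion rate and acquisition cost in the bullets above is simple arithmetic, sketched below. It assumes marketing spend and traffic stay fixed and ignores the cost of building and running the personalization stack. The spend, visitor, and baseline rate figures are hypothetical; only the +20% conversion uplift is the paper-reported number.

```python
def cac(spend, visitors, conversion_rate):
    """Customer acquisition cost: spend divided by number of conversions."""
    return spend / (visitors * conversion_rate)

# Hypothetical campaign figures.
SPEND = 10_000.0
VISITORS = 50_000
BASE_RATE = 0.02
UPLIFT = 0.20  # paper-reported relative conversion improvement

base_cac = cac(SPEND, VISITORS, BASE_RATE)
new_cac = cac(SPEND, VISITORS, BASE_RATE * (1 + UPLIFT))

# With spend and traffic fixed, the reduction is 1 - 1 / (1 + uplift).
reduction = 1 - new_cac / base_cac
print(f"CAC {base_cac:.2f} -> {new_cac:.2f} ({reduction:.1%} lower)")
```

Under these assumptions a 20% conversion uplift lowers CAC by about 16.7%, not 20%, because the uplift enters through the denominator.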
Capital & operational costs
- Implementing BERT/GPT/RAG/RL requires sizeable investments in data engineering, compute (especially for large generative models), and ongoing RL experimentation. Smaller firms face barriers unless served by third-party platforms.
- Privacy-preserving techniques (federated learning, differential privacy, homomorphic encryption) raise implementation costs and may degrade model performance—introducing trade-offs between accuracy and regulatory compliance.
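
The accuracy cost of differential privacy mentioned above can be seen in a minimal sketch of the Laplace mechanism applied to a click count. It assumes each user contributes at most one record, so the count has sensitivity 1 and the noise scale is 1/epsilon. The function name and data are hypothetical, the paper does not say which mechanism it uses, and floating-point Laplace sampling like this is illustrative rather than production-grade.

```python
import random

def private_click_count(clicks, epsilon, rng):
    """Return the click count plus Laplace noise of scale 1/epsilon.

    Assumes sensitivity 1 (one record per user). The difference of two
    independent exponential draws with rate epsilon is Laplace(0, 1/epsilon).
    """
    true_count = sum(clicks)
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

def mean_abs_error(clicks, epsilon, rng, trials=5000):
    """Average absolute gap between noisy and true counts over many draws."""
    true_count = sum(clicks)
    total = 0.0
    for _ in range(trials):
        total += abs(private_click_count(clicks, epsilon, rng) - true_count)
    return total / trials

# Hypothetical log: 300 clicks out of 1000 impressions.
CLICKS = [1] * 300 + [0] * 700
rng = random.Random(0)
loose_privacy_error = mean_abs_error(CLICKS, epsilon=1.0, rng=rng)
strict_privacy_error = mean_abs_error(CLICKS, epsilon=0.1, rng=rng)
```

A smaller epsilon gives a stronger privacy guarantee but a noisier statistic, which is the trade-off between accuracy and compliance described in the bullet.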
Market structure and competition
- First-mover advantages and economies of scale in data can reinforce market concentration: firms with larger customer bases (and thus richer interaction logs) obtain better personalization models, potentially increasing market power.
- SaaS/ML-platform vendors could capture value by offering turnkey personalization stacks, further centralizing expertise and data access.
Labor and skill effects
- Automation reduces routine copywriting and campaign-assembly tasks but increases demand for AI-literate marketing strategists, ML engineers, data governance experts, and fairness auditors.
- Reallocation of labor within marketing teams toward strategy, oversight, and model governance.
Consumer welfare and externalities
- Higher relevance can increase consumer surplus by reducing irrelevant ads and improving match quality, but can also intensify attention capture and behavioral targeting externalities (e.g., increased nudging toward consumption).
- Privacy risks and perceived invasiveness may reduce trust; compliance costs and opt-outs can blunt personalization benefits.
Regulatory and distributional effects
- GDPR/CCPA compliance (and potential future regulation on automated profiling) will impose ongoing compliance costs and may restrict data-based advantages, altering competitive dynamics.
- Potential for discriminatory targeting (despite mitigation steps) implies regulatory scrutiny and possible liability costs; transparency and auditability will become economically relevant.
Measurement and research needs
- The paper reports large effect sizes but lacks detailed disclosures on datasets, experimental design, and statistical robustness; for policy and firm investment decisions, more transparent, reproducible causal evidence (including costs, heterogeneous treatment effects by firm size/sector, and long-run effects) is needed.
Strategic implications for firms and policymakers
- Firms should weigh marginal returns from personalization against the upfront and operational costs of advanced models and privacy engineering.
- Policymakers should consider standards for auditing personalization algorithms, data portability, and remedies for bias; incentives for smaller firms (e.g., shared, privacy-preserving model infrastructures) could mitigate concentration risks.

Suggestions for future economic research (brief)
- Rigorous cost-benefit analyses across firm sizes and industries, including compute and compliance costs.
- Causal studies on long-term consumer welfare effects and price/competition outcomes from widespread personalization.
- Evaluation of trade-offs between privacy techniques and personalization performance, with welfare-weighted assessments.
- Market-structure modeling of data-driven first-mover advantages and potential interventions (data trusts, regulated model access).


Assessment

Paper Type: descriptive
Evidence Strength: medium. The paper reports randomized online experiments and consistent case-study improvements on short-term engagement and conversion metrics, which supports a causal interpretation in those contexts, but lacks transparent reporting of experimental design details, sample sizes, statistical significance, long-term outcomes, and pre-registered protocols; adaptive RL policies and off-policy evaluations further complicate clean causal attribution.
Methods Rigor: medium. The pipeline uses state-of-the-art models (BERT, GPT), RAG grounding, contextual-bandit/RL formulations, and privacy techniques (federated learning, DP), demonstrating technical competence; however, methodological reporting is incomplete on identification, robustness checks, multiple-testing correction, heterogeneity analysis, and long-horizon evaluation, and the operational complexity (e.g., live RL adaptation) introduces potential biases if not carefully controlled.
Sample: Proprietary digital marketing datasets including user interaction logs (impressions, clicks, session events, dwell time), email open/click logs, CRM attributes, product/catalog metadata, and conversational logs; labeled subsets for intent/sentiment and conversion events and large unlabeled streams for online adaptation; sample sizes and firm/sector coverage are not specified.
Themes: productivity, adoption, governance
Identification: Comparative A/B and multi-armed online experiments are reported as the primary source of causal claims, supplemented by case studies, offline model evaluations (classification/generation metrics), and simulated/off-policy policy evaluation for RL components; no detailed identification protocol (randomization scheme, pre-analysis plan, or long-run dynamic identification) is provided.
Generalizability:
- Findings likely depend on firms with rich, large-scale interaction data (may not hold for small firms with sparse data).
- Reported results focus on short-term engagement/conversion metrics; long-term LTV, churn, and welfare effects are not established.
- Evidence may be concentrated in particular industries (e.g., e-commerce, consumer services) and languages/cultures, limiting cross-sector and cross-country generality.
- Performance and costs depend on model/engineering resources and regulatory environments (GDPR/CCPA), restricting applicability to regulated sectors or low-resource firms.
- Adaptive RL-driven campaigns complicate external validity because deployment and treatment effects depend on platform dynamics and feedback loops unique to each environment.

Claims (16)

Claim	Category	Direction	Confidence	Outcome	Details
An integrated BERT–GPT pipeline augmented with retrieval-augmented generation (RAG) and reinforcement learning (RL) substantially outperforms conventional rule-based or template-driven marketing automation.	Firm Revenue	positive	medium	click-through rate (CTR), engagement metrics, conversion rate, retention, revenue per user	0.11
GPT-family decoders generate tailored marketing content (ad copy, email text, chat responses) that matches user context and tone more effectively than template-based generation.	Output Quality	positive	medium	generation relevance, tone match, human-rated content quality, automatic relevance/factuality scores	0.11
BERT-family encoders provide superior contextual understanding for sentiment analysis, intent detection, behavioural segmentation, and feature extraction from user signals compared to simpler feature pipelines.	Output Quality	positive	high	intent classification accuracy, sentiment scoring accuracy, quality of user embeddings for segmentation	0.18
RAG anchors generated content to up-to-date product/catalog/contextual knowledge and reduces hallucinations, increasing factuality of marketing messages.	Error Rate	positive	medium	factuality scores, rate of hallucinated assertions in generated content	0.11
An RL layer that formulates content selection as a contextual bandit / policy optimisation problem improves content selection and delivery using real-time reward signals (CTR, dwell time, conversions).	Firm Revenue	positive	medium	CTR, session length (dwell time), conversion events, lifetime value proxies	0.11
Continuous online adaptation of models and policies, updating from streaming user interactions, enables per-session and lifetime personalization that improves engagement and conversion outcomes.	Firm Revenue	positive	medium	per-session CTR, engagement metrics, conversion rate, retention	0.11
Comparative evaluations and case studies show consistent improvements over traditional marketing automation across engagement and conversion metrics, driven by better intent recognition, contextually appropriate messaging, and adaptive delivery policies.	Firm Revenue	positive	medium	engagement metrics, conversion metrics (CTR, conversions), attribution to intent recognition/messaging/policy adaptation	0.11
The system raises privacy, fairness, and safety risks including data leakage, demographic bias in generated content, manipulative targeting, and potential regulatory non-compliance.	AI Safety and Ethics	negative	high	incidence/risk of data leakage, demographic bias metrics, examples of manipulative targeting, regulatory compliance status	0.18
Privacy-preserving techniques such as federated learning, differential privacy (DP), and homomorphic encryption can mitigate privacy leakage while enabling model updates and secure aggregation.	AI Safety and Ethics	positive	medium	privacy leakage bounds (DP epsilon), model utility (accuracy/CTR) under DP/federated regimes, secure aggregation correctness	0.11
Offline evaluation metrics (intent/sentiment classification accuracy, human-rated generation quality and factuality, simulated policy evaluation) are useful for pipeline development but do not fully capture online performance.	Research Productivity	null_result	high	offline classification accuracy, human-rated generation quality vs online CTR/engagement/conversion	0.18
Online A/B or multi-armed tests comparing the BERT–GPT pipeline with RAG+RL against baseline marketing automation produce measurable uplifts in CTR, engagement, conversion rate, retention, and revenue per user.	Firm Revenue	positive	medium	CTR, engagement, conversion rate, retention, revenue per user	0.11
Improved targeting and dynamic personalization increase marketing ROI by raising conversion rates and lowering customer acquisition costs (CAC).	Firm Revenue	positive	medium	marketing ROI, conversion rate, customer acquisition cost (CAC)	0.11
Access to diverse interaction data and the ability to train and maintain adaptive models create scale economies and barriers to entry, potentially consolidating advantage for large incumbents.	Market Structure	mixed	low	market concentration indicators (e.g., HHI), firm-level advantage measures, entry/exit rates	0.05
Adaptive RL-driven campaigns complicate attribution and causal inference, so rigorous experimental designs (multi-armed trials, off-policy evaluation) are required for valid measurement.	Research Productivity	negative	high	bias in causal estimates, validity of attribution, off-policy evaluation error	0.18
Compliance with GDPR/CCPA and auditing for bias/harms imposes non-trivial technical and legal costs; implementing federated learning and DP increases engineering complexity and compute cost.	Regulatory Compliance	negative	medium	engineering complexity metrics, compute/resource costs, legal/compliance expenditure	0.11
Long-term effects of adaptive marketing (habit formation, churn, lifetime value) are important for welfare and valuation but are harder to measure and require longitudinal or structural economic models.	Consumer Welfare	null_result	high	long-term churn rates, habit formation indicators, lifetime value (LTV)	0.18