Paper deep dive
One Model, Two Markets: Bid-Aware Generative Recommendation
Yanchen Jiang, Zhe Feng, Christopher P. Mah, Aranyak Mehta, Di Wang
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/26/2026, 2:48:26 AM
Summary
GEM-Rec is a unified generative recommendation framework that integrates commercial monetization (ads) and organic semantic retrieval into a single sequence generation model. It uses control tokens to decouple slot allocation from item generation and introduces a Bid-Aware Decoding mechanism to dynamically steer recommendations toward high-value items based on real-time bids, ensuring allocation monotonicity and organic integrity without requiring model retraining.
Entities (5)
Relation Signals (3)
GEM-Rec → builds upon → TIGER
confidence 95% · We build upon the TIGER framework (Rajput et al., 2023) but introduce a novel control structure
GEM-Rec → uses → Semantic IDs
confidence 95% · We represent items using Semantic IDs. This approach moves beyond atomic integers
Bid-Aware Decoding → modulates → GEM-Rec
confidence 90% · we introduce GEM-Decoding, a mechanism that injects active bids (b) into the inference step.
Cypher Suggestions (2)
Identify mechanisms used by GEM-Rec to handle monetization. · confidence 95% · unvalidated
MATCH (m:Mechanism)-[:MODULATES]->(f:Framework {name: 'GEM-Rec'}) RETURN m.name
Find all frameworks that utilize Semantic IDs for recommendation. · confidence 90% · unvalidated
MATCH (f:Framework)-[:USES]->(s:Representation {name: 'Semantic IDs'}) RETURN f.name
Abstract
Abstract: Generative recommender systems using Semantic IDs, such as TIGER (Rajput et al., 2023), have emerged as a widely adopted, competitive paradigm in sequential recommendation. However, existing architectures are designed solely for semantic retrieval and do not address concerns such as monetization via ad revenue or the incorporation of bids for commercial retrieval. We propose GEM-Rec, a unified framework that integrates commercial relevance and monetization objectives directly into the generative sequence. We introduce control tokens to decouple the decision of whether to show an ad from which item to show. This allows the model to learn valid placement patterns directly from interaction logs, which inherently reflect past successful ad placements. Complementing this, we devise a Bid-Aware Decoding mechanism that handles real-time pricing, injecting bids directly into the inference process to steer the generation toward high-value items. We prove that this approach guarantees allocation monotonicity, ensuring that higher bids weakly increase an ad's likelihood of being shown, without requiring model retraining. Experiments demonstrate that GEM-Rec allows platforms to dynamically optimize for semantic relevance and platform revenue.
Tags
Links
- Source: https://arxiv.org/abs/2603.22231v1
- Canonical: https://arxiv.org/abs/2603.22231v1
Full Text
78,426 characters extracted from source content.
2026-03-01

One Model, Two Markets: Bid-Aware Generative Recommendation

Yanchen Jiang (2,3), Zhe Feng (1), Christopher P. Mah (1), Aranyak Mehta (1), and Di Wang (1)
(1) Google Research, (2) Harvard University, (3) Work done during an internship at Google Research

Generative recommender systems using Semantic IDs, such as TIGER (Rajput et al., 2023), have emerged as a widely adopted, competitive paradigm in sequential recommendation. However, existing architectures are designed solely for semantic retrieval and do not address concerns such as monetization via ad revenue or the incorporation of bids for commercial retrieval. We propose GEM-Rec, a unified framework that integrates commercial relevance and monetization objectives directly into the generative sequence. We introduce control tokens to decouple the decision of whether to show an ad from which item to show. This allows the model to learn valid placement patterns directly from interaction logs, which inherently reflect past successful ad placements. Complementing this, we devise a Bid-Aware Decoding mechanism that handles real-time pricing, injecting bids directly into the inference process to steer the generation toward high-value items. We prove that this approach guarantees allocation monotonicity, ensuring that higher bids weakly increase an ad's likelihood of being shown, without requiring model retraining. Experiments demonstrate that GEM-Rec allows platforms to dynamically optimize for semantic relevance and platform revenue.

Keywords: Generative Recommendation, Semantic IDs, Recommender Systems, Computational Advertising, Mechanism Design, Auctions, Deep Learning

1. Introduction

Recommender systems are undergoing a paradigm shift from discriminative ranking to generative information retrieval. By representing items as hierarchical Semantic IDs, models like TIGER (Rajput et al., 2023) treat recommendation as a sequence generation task. This approach has achieved state-of-the-art results by learning to predict the deep semantic structure of user trajectories (Ju et al., 2025; Singh et al., 2024; Yang et al., 2024).

However, applying generative models to industrial platforms presents a central challenge: monetization. Real-world systems must dynamically co-optimize for organic semantic relevance and sponsored commercial relevance, driving utility for both the user and the platform. Standard generative models fail to address this because they optimize purely for semantic likelihood. They treat every item as an organic prediction target, ignoring the economic constraints and auction dynamics that govern sponsored content.

A major difficulty is that organic items and sponsored items operate under different objectives. Organic items are selected based on estimated user preference over a corpus, without price signals. Sponsored items are selected based on a mixed objective: they must be generally relevant, but also particularly relevant to the user's commercial intent, incorporating bids and potential ad-auction revenue. Simply training a model on a mixed stream of data conflates these signals. Furthermore, auction bids fluctuate in real time. These dynamics are not merely economic constraints but informative signals: advertisers often adjust bids dynamically to reflect their confidence in an item's inventory, quality, and relevance to the user.
A model trained solely on historical logs ignores this live feedback, locking in past valuations and preventing the system from adapting to new price opportunities without retraining.

(Correspondence to: Yanchen Jiang <yanchen_jiang@g.harvard.edu>; Zhe Feng <zhef@google.com>.)

[Figure 1 | GEM-Rec Unified Architecture. Left (A), training representation: each interaction is a mode token (<ORG> or <AD>) followed by its Semantic ID codes. Right (B1), flag sampling (slot decision): the <AD> flag is boosted using the current maximum bid, with λ controlling aggressiveness; <ORG> slots use relevance-based decoding, while <AD> slots use bid-aware decoding. Right (B2), bid-aware ad decoding: each candidate token is boosted by the maximum bid under its prefix branch. Bid monotonicity: higher bids yield more sponsored exposure. Organic integrity: if <ORG> is chosen, the ranking stays relevance-based.]

We propose GEM-Rec, a framework that integrates monetization into generative recommendation. We augment the semantic vocabulary with explicit control tokens. This distinguishes the decision of whether to show an ad from which item to show. By training on successful interaction logs, the model learns a "feasibility" policy, identifying contexts where ads were historically both semantically relevant and economically viable.

However, learning from history is distinct from maximizing current utility. Historical logs reflect a baseline of user acceptance, but they do not encode the explicit economic value of current opportunities. To bridge this gap, we introduce a Bid-Aware Decoding mechanism that injects auction bids directly into the inference process. This allows the platform to dynamically steer the generation toward high-value items in real time and fine-tune the trade-off between revenue and relevance without retraining the model.

Our contributions are as follows:

1. Unified Generation Architecture: We propose a novel sequence formulation that integrates explicit control tokens into the Semantic ID vocabulary. This factorizes the decision of slot allocation (predicting the display mode) from content retrieval (generating the item ID), enabling a single unified model to serve both organic and sponsored requests.

2. End-to-End Learning of Marketplace Policy: We show that the generative model implicitly learns the latent context dependencies that govern valid ad exposures. By training on successful interaction logs, the model learns to treat the decision to display an ad as an integral part of the user's trajectory.

3. Bid-Aware Constrained Decoding: We derive an inference-time mechanism to inject bids directly into the decoding search space. We provide theoretical guarantees that this approach satisfies Allocative Monotonicity (higher bids weakly increase exposure) and Organic Integrity (the ranking order of organic recommendations remains invariant), providing practitioners with precise control over the system's operating point.

An overview of the architecture is provided in Figure 1.
1.1. Related Works

As noted earlier, the field is evolving beyond discriminative ranking over atomic IDs (Kang and McAuley, 2018; Sun et al., 2019). Early generative approaches (Geng et al., 2022) recast recommendation as conditional generation. TIGER (Rajput et al., 2023) advanced this by introducing Semantic IDs to enable autoregressive decoding directly in a structured identifier space. This Semantic-ID paradigm has since been further explored and extended in subsequent work, including Hou et al. (2025); Ju et al. (2025); Penha et al. (2025); Singh et al. (2024); Yang et al. (2024). While more recent systems such as OneRec (Deng et al., 2025) add scaling mechanisms and preference alignment (e.g., sparse MoE scaling and DPO-style alignment), the objectives remain fundamentally preference-centric. They lack the architectural mechanisms to process economic constraints and bid information at inference time.

In industrial feed systems, organic recommendation and sponsored delivery are often produced by separate stacks (e.g., a recommender ranking model and an ad ranking model) and then combined at serving time by a merging/blending layer (Chen et al., 2022; Liao et al., 2022; Yan et al., 2020). A line of research instead performs joint optimization of recommendation and ad insertion (often via reinforcement learning), e.g., Zhao et al. (2020, 2021). These RL-based insertion methods optimize whether, which, and where to insert ads into a list, but they are not generative retrieval models, and they do not provide a semantic-ID decoder whose outputs can be directly modulated by live bids.

A few recent papers bring generative modeling into advertising, but they study different tasks than ours. RARE (Liu et al., 2025) and EGRM (Lian et al., 2019) focus on sponsored search given a query: they generate an intermediate retrieval key for ads, whereas our setting is sequential recommendation, where the model must account for how earlier organic/ad exposures affect what should be shown next. GPR (Zhang et al., 2025) studies end-to-end advertising recommendation: it represents user journeys with tokens that include both organic content and ads, but the generation target is ads (and CTR) rather than a mixed organic/sponsored recommendation list. In contrast, we generate one sequence that can include both organic and sponsored items, and we cleanly separate (i) learning when sponsored exposure is appropriate from historical logs and (ii) using the current bids at decoding time to decide which sponsored item to show, so the system can react to real-time bid changes. Additional related work is provided in Appendix A.

2. Problem Setup

We consider sequential recommendation in a marketplace containing both organic content and ads. We represent user history as a sequence of tuples $(m, i)$, where $i$ is the item and $m \in \{\text{Organic}, \text{Sponsored}\}$ is the display mode. This distinction is necessary because the same item may appear as organic in one context and as a sponsored ad in another.

We do not have access to the oracle functions governing user satisfaction or advertiser value. Instead, we learn from interaction logs, which reflect only realized successful placements. An item appears in this record only if it satisfied two latent constraints: a Platform Filter (having a high enough bid to secure the display slot) and a User Filter (being relevant enough to prompt a click or interaction). A minimal sketch of this logged record format is given below.
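The following sketch illustrates the logged interaction record assumed in Section 2. It is not from the paper's code; the class, field names, and example values are hypothetical.

```python
# Illustrative sketch: the (mode, item) log record of Section 2.
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

class DisplayMode(Enum):
    ORGANIC = "ORG"
    SPONSORED = "AD"

@dataclass
class LoggedInteraction:
    mode: DisplayMode                 # m: how the item was displayed
    semantic_id: Tuple[int, ...]      # i: hierarchical codes (c_1, ..., c_D)
    bid: Optional[float] = None       # present only for sponsored impressions

# A user history is an ordered list of such tuples; the same item (same
# semantic_id) may appear organically in one step and sponsored in another.
history: List[LoggedInteraction] = [
    LoggedInteraction(DisplayMode.ORGANIC, (12, 47, 3)),
    LoggedInteraction(DisplayMode.SPONSORED, (12, 51, 9), bid=0.73),
]
```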
Our goal is to learn a generative policy that mimics this joint acceptance distribution. By training on these logs, the model learns a "feasibility" policy: predicting items that historically proved to be both monetizable and relevant.

3. The GEM-Rec Framework

To solve this problem, we require a generative architecture capable of jointly modeling the decision of whether or not to present an ad, and the semantic content of the item itself. We build upon the TIGER framework (Rajput et al., 2023) but introduce a novel control structure to explicitly support this dual mandate within a single autoregressive sequence.

3.1. Preliminaries: Semantic IDs

Following the TIGER paradigm (Ju et al., 2025; Rajput et al., 2023), we represent items using Semantic IDs. This approach moves beyond atomic integers by mapping items to hierarchical tuples of discrete codes $s_i = (c_1, \ldots, c_D)$ derived from a Residual Quantized VAE (RQ-VAE). Crucially, this quantization is coarse-to-fine: the first code $c_1$ captures broad semantic categories, while subsequent codes capture increasingly fine-grained details. Consequently, items with similar content embeddings map to IDs that share common prefixes. This structure allows the model to generalize relevance patterns across the inventory and enables the prefix-based constraints we utilize in decoding.

3.2. Unified Sequence Construction

Recall from Section 2 that the user history consists of context-aware tuples $(m, i)$. To ingest this structured history into a Transformer, we flatten it into a linear stream by treating the display mode as a prefix modifier. We augment the vocabulary with two special control tokens: $\mathcal{F} = \{\texttt{<ORG>}, \texttt{<AD>}\}$. For an item at step $t$ with semantic codes $(c_{t,1}, \ldots, c_{t,D})$, the generative sequence segment $x_t$ is constructed as:

$$x_t = [f_t] \oplus [c_{t,1}, c_{t,2}, \ldots, c_{t,D}] \qquad (1)$$

where $f_t = \texttt{<AD>}$ if $m_t = \text{Sponsored}$, and $\texttt{<ORG>}$ otherwise. The full autoregressive sequence is the concatenation of these segments: $x = x_1 \oplus x_2 \oplus \cdots \oplus x_T$.

Unified Representation and Strategic Flexibility. It is important to note that the Semantic ID generation process remains unchanged. We utilize standard item codes $s_i$ derived from a pre-trained RQ-VAE, ensuring the model's understanding of item content is consistent with standard generative baselines. Our core innovation lies in the introduction of the control token $f_t$, which acts as a learnable mode switch for the generative process. Since the interaction is represented as $[f_t] \oplus s_i$, the Transformer's attention mechanism processes the item content differently depending on the prefix:

1. Under <ORG>: The model operates in "Preference Mode," retrieving items that maximize semantic match to the user's organic history.

2. Under <AD>: The model shifts to "Monetization Mode." It learns to target the subset of inventory that historically succeeded as sponsored impressions. By training on successful historical clicks, the model implicitly captures the intersection of high semantic relevance and high economic value, favoring items that are likely to both win an auction and satisfy the user.

This design structurally factorizes the recommendation task. By generating the control token first, the model explicitly decides the intent of the slot (Organic vs. Sponsored) before committing to the specific content (Item ID). This modularity is crucial for the inference-time flexibility discussed in Section 4. A minimal sketch of the sequence construction is shown below.
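The following sketch shows how a (mode, item) history could be flattened into the unified token stream of Eq. (1). It is an assumption-laden illustration, not the paper's code: token IDs, the vocabulary offsets, and the per-level layout are hypothetical.

```python
# Illustrative sketch: flattening (mode, item) tuples into the token stream of Eq. (1).
from typing import List, Tuple

CODEBOOK_SIZE = 256   # matches the RQ-VAE codebook size reported in Appendix B.2
NUM_LEVELS = 3        # code depth D
ORG_TOKEN = 0         # control token <ORG>
AD_TOKEN = 1          # control token <AD>
CODE_OFFSET = 2       # semantic-ID codes start after the control tokens

def encode_interaction(mode: str, semantic_id: Tuple[int, ...]) -> List[int]:
    """Build the segment x_t = [f_t] ++ [c_{t,1}, ..., c_{t,D}]."""
    assert len(semantic_id) == NUM_LEVELS
    flag = AD_TOKEN if mode == "Sponsored" else ORG_TOKEN
    # Give each level its own slice of the vocabulary so that code k at
    # level d is unambiguous to the decoder.
    codes = [CODE_OFFSET + d * CODEBOOK_SIZE + c for d, c in enumerate(semantic_id)]
    return [flag] + codes

def encode_history(history: List[Tuple[str, Tuple[int, ...]]]) -> List[int]:
    """Concatenate segments: x = x_1 ++ x_2 ++ ... ++ x_T."""
    stream: List[int] = []
    for mode, sid in history:
        stream.extend(encode_interaction(mode, sid))
    return stream

# Example: one organic and one sponsored interaction.
print(encode_history([("Organic", (12, 47, 3)), ("Sponsored", (12, 51, 9))]))
```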
3.3. The Factorized Generative Objective

To operationalize this structure, we train a Transformer to minimize the negative log-likelihood of the sequence $x$. Letting $H_{<t}$ denote the sequence history (context) prior to step $t$, the probability of generating the next interaction at step $t$ factorizes as follows:

$$P_\theta(x_t \mid H_{<t}) = \underbrace{P_\theta(f_t \mid H_{<t})}_{\text{Ad Satisfaction Modeling}} \cdot \underbrace{\prod_{k=1}^{D} P_\theta(c_{t,k} \mid H_{<t}, f_t, c_{t,<k})}_{\text{Mode-Conditional Retrieval}} \qquad (2)$$

This factorization effectively disentangles the platform's two distinct responsibilities:

1. Ad Satisfaction Modeling ($P(f)$): The first term learns the contextual boundaries of monetization. By maximizing likelihood over realized trajectories, the model internalizes the latent constraints governing ad exposure (e.g., fatigue or narrative disruption). It aligns the probability of the <AD> flag with the historical density of successful ad interactions, implicitly learning to suppress sponsored slots in contexts where they would degrade user utility.

2. Mode-Conditional Retrieval ($P(c \mid f)$): The second term learns semantic relevance. Conditioned on the chosen slot type, the model retrieves the item best suited for that specific mode. If $f_t = \texttt{<ORG>}$, the model optimizes for pure organic preference. If $f_t = \texttt{<AD>}$, the model retrieves items that maximize likelihood within the distribution of historically clicked sponsored items. This effectively learns to rank items that possess both high semantic relevance and the commercial viability implicit in the training logs.

4. Inference-Time Bid Modulation

The framework established in Section 3 allows the model to learn a "safe baseline" from historical logs. However, this approach does not fully incorporate dynamic commercial relevance and economic utility. Because the training distribution reflects only a historical baseline of ad engagement, the model remains blind to real-time market opportunities, treating low-value and high-value ads identically if their semantic relevance is similar. To incorporate these dynamics, we introduce GEM-Decoding, a mechanism that injects active bids ($b$) into the inference step. This allows the platform to dynamically "boost" the probability of ads based on their real-time economic value, effectively steering the generative process without retraining.

4.1. Two-Level Logit Modulation

Consistent with our factorized architecture, we inject bid information at two levels: the Slot Decision (whether to show an ad) and the Item Decision (which ad to show). The strength of this modulation is controlled by a scalar $\lambda \geq 0$.

1. Slot-Level Modulation (Dynamic Ad Load). First, we modulate the decision to open a sponsored slot. We wish to encourage the model to open a sponsored slot if the available inventory is valuable. Let $z_{\texttt{<AD>}}$ be the raw score (logit) for the sponsored flag. We boost this score using the maximum bid among valid sponsored candidates ($b_{\max}$):

$$\tilde{z}_{\texttt{<AD>}} = z_{\texttt{<AD>}} + \lambda \cdot \log(1 + b_{\max}) \qquad (3)$$

If high-value inventory is available ($b_{\max}$ is high), this term encourages selection of the <AD> flag more frequently. This allows the ad rate to scale dynamically with market demand. A minimal sketch of this slot-level boost is given below.
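The sketch below implements the slot-level boost of Eq. (3) followed by the flag sampling of Section 4.4. It is illustrative rather than the paper's code; tensor shapes and token indices are assumptions.

```python
# Illustrative sketch: slot-level modulation (Eq. 3) and flag sampling.
import math

import torch
import torch.nn.functional as F

ORG_TOKEN, AD_TOKEN = 0, 1  # positions of the control tokens in the vocab

def sample_slot_flag(flag_logits: torch.Tensor, b_max: float, lam: float) -> int:
    """Boost the <AD> logit by lambda * log(1 + b_max), then sample the flag.

    flag_logits: tensor of shape (2,) with the raw scores for <ORG>, <AD>.
    b_max: maximum bid among currently eligible sponsored candidates.
    lam: modulation strength (lambda >= 0); lam == 0 recovers the base model.
    """
    boosted = flag_logits.clone()
    boosted[AD_TOKEN] = boosted[AD_TOKEN] + lam * math.log1p(b_max)
    probs = F.softmax(boosted, dim=-1)
    # Hard commitment to the slot type before content generation begins.
    return int(torch.multinomial(probs, num_samples=1).item())

# Example: a base preference for organic, overridden by a high max bid.
flag = sample_slot_flag(torch.tensor([2.0, 0.5]), b_max=0.9, lam=5.0)
print("sampled flag:", "<AD>" if flag == AD_TOKEN else "<ORG>")
```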
We pre-compute a lookup tableB(푐 푘 |푐 <푘 )that stores the maximum bid of any valid item within that semantic cluster. Conditioned on the flag 푓 푡 = <AD>, we then modulate the token logits: ̃푧 푐 = 푧 푐 + 휆· log(1+B(푐))(4) This ensures Bid-Aware Logit Shaping: among semantically plausible tokens, the decoder is biased toward branches containing high-bidding items. This steers the generation path toward valuable outcomes early in the sequence, effectively pruning low-value branches before they are fully generated. 4.2. Pricing Mechanism For our experimental evaluation, we adopt the First-Price payment rule, where the winning advertiser pays their bid. This aligns with the recent structural shift in the digital advertising ecosystem, where major platforms have transitioned to first-price auctions to reduce complexity and improve transparency (Amazon Publisher Services, 2023; Google Ad Manager, 2019; Google AdSense, 2021; Magnite, 2021; Microsoft Xandr, 2025; OpenX, 2017; PubMatic, 2018; Sivan et al., 2020). Accordingly, we report “Revenue" in our results as the cumulative sum of the winning bids for all generated ad slots. While First-Price auctions are increasingly common in practice and allow the system to operate directly on observable bids, ideally, we would also want to guarantee dominant-strategy incentive compatibility (DSIC) to ensure these bids truthfully reflect underlying advertiser values. In our setting, moving from first-price to truthful (e.g., critical-bid/VCG-style) payments is conceptually natural but technically nontrivial because it requires a carefully coupled interaction between decoding randomness, monotonicity of the induced allocation rule, and counterfactual evaluation. Appendix E discusses these tradeoffs and outlines possible directions that do not require architectural changes; we leave a full DSIC implementation to future work. 4.3. Theoretical Properties We highlight two key properties of this mechanism. The proof is given in the Appendix C. Definition 4.1 (GEM-Allocation Rule). The allocation rule푥 푖 (푏)denotes the total probability that an ad item푖is generated given context퐻and bid profile푏. Reflecting the hierarchical decoding process, this factorizes into: 푥 푖 (푏)= 푃 휆 ( 푓= <AD> | 퐻, 푏) | z Stochastic Slot · 핀[D 퐾 (퐻, 푏)= 푖] | z Deterministic Item (5) where푃 휆 is the modulated probability of sampling the ad flag, andD 퐾 is the deterministic output of beam search with width 퐾 conditioned on 푓= <AD>. Proposition 1 (Allocative Monotonicity). The GEM-Allocation rule is monotone. That is, for any context 퐻and fixed opposing bids푏 −푖 , the exposure probability푥 푖 (푏 푖 , 푏 −푖 )for a sponsored item푖is non-decreasing in 푏 푖 . 6 One Model, Two Markets: Bid-Aware Generative Recommendation This confirms Allocative Monotonicity: increasing a bid cannot reduce the likelihood that the corresponding sponsored item is shown, so the system is rationally responsive to economic signals. Proposition 2 (Structural Consistency). The framework satisfies three design guarantees: 1.Safe Fallback: When휆=0, all modulated logits equal the base model logits. Therefore the system reduces to the underlying generative recommender trained on historical logs, and the ad rate is governed solely by the learned slot model푃 휃 (<AD>|퐻)(i.e., the model’s learned historical baseline). 2.Organic Integrity: The modulation of logits is strictly gated by the sponsored flag (푓=<AD>). Consequently, for any fixed context퐻, the relative ranking of any two organic items푖, 푗is invariant with respect to 휆. 
4.2. Pricing Mechanism

For our experimental evaluation, we adopt the First-Price payment rule, where the winning advertiser pays their bid. This aligns with the recent structural shift in the digital advertising ecosystem, where major platforms have transitioned to first-price auctions to reduce complexity and improve transparency (Amazon Publisher Services, 2023; Google Ad Manager, 2019; Google AdSense, 2021; Magnite, 2021; Microsoft Xandr, 2025; OpenX, 2017; PubMatic, 2018; Sivan et al., 2020). Accordingly, we report "Revenue" in our results as the cumulative sum of the winning bids for all generated ad slots.

While first-price auctions are increasingly common in practice and allow the system to operate directly on observable bids, ideally we would also want to guarantee dominant-strategy incentive compatibility (DSIC) to ensure these bids truthfully reflect underlying advertiser values. In our setting, moving from first-price to truthful (e.g., critical-bid/VCG-style) payments is conceptually natural but technically nontrivial, because it requires a carefully coupled interaction between decoding randomness, monotonicity of the induced allocation rule, and counterfactual evaluation. Appendix E discusses these tradeoffs and outlines possible directions that do not require architectural changes; we leave a full DSIC implementation to future work.

4.3. Theoretical Properties

We highlight two key properties of this mechanism. The proofs are given in Appendix C.

Definition 4.1 (GEM-Allocation Rule). The allocation rule $x_i(b)$ denotes the total probability that an ad item $i$ is generated given context $H$ and bid profile $b$. Reflecting the hierarchical decoding process, this factorizes into:

$$x_i(b) = \underbrace{P_\lambda(f = \texttt{<AD>} \mid H, b)}_{\text{Stochastic Slot}} \cdot \underbrace{\mathbb{1}[\mathcal{D}_K(H, b) = i]}_{\text{Deterministic Item}} \qquad (5)$$

where $P_\lambda$ is the modulated probability of sampling the ad flag, and $\mathcal{D}_K$ is the deterministic output of beam search with width $K$ conditioned on $f = \texttt{<AD>}$.

Proposition 1 (Allocative Monotonicity). The GEM-Allocation rule is monotone. That is, for any context $H$ and fixed opposing bids $b_{-i}$, the exposure probability $x_i(b_i, b_{-i})$ for a sponsored item $i$ is non-decreasing in $b_i$.

This confirms Allocative Monotonicity: increasing a bid cannot reduce the likelihood that the corresponding sponsored item is shown, so the system is rationally responsive to economic signals.

Proposition 2 (Structural Consistency). The framework satisfies three design guarantees:

1. Safe Fallback: When $\lambda = 0$, all modulated logits equal the base model logits. Therefore the system reduces to the underlying generative recommender trained on historical logs, and the ad rate is governed solely by the learned slot model $P_\theta(\texttt{<AD>} \mid H)$ (i.e., the model's learned historical baseline).

2. Organic Integrity: The modulation of logits is strictly gated by the sponsored flag ($f = \texttt{<AD>}$). Consequently, for any fixed context $H$, the relative ranking of any two organic items $i, j$ is invariant with respect to $\lambda$. This guarantees that while $\lambda$ may alter the frequency of organic slots, it never distorts the model's assessment of which organic content is most relevant.

3. Generalization: In the limit where the training corpus contains only organic interactions, $P(f = \texttt{<AD>}) \to 0$, and the system collapses to the standard TIGER baseline.

4.4. Hierarchical Decoding Strategy

Because our objective explicitly factorizes the slot decision from the item generation (Section 3.3), we naturally employ a hierarchical decoding strategy:

1. Flag Sampling: At the first step, we compute modulated logits for <ORG>, <AD> and sample the flag immediately. This enforces a hard commitment to the slot type before content generation begins. This is necessary because standard beam search maximizes joint sequence likelihood; without this constraint, the decoder would likely prune sponsored hypotheses in favor of organic sequences, which inherently possess a much higher prior likelihood under the training data.

2. Content Beam Search: Conditioned on the sampled flag, we run beam search with width $K$ on the modulated logits $\tilde{z}$ to generate the Semantic ID tokens.

5. Experimental Setup

5.1. Constructing the Synthetic Marketplace

To validate GEM-Rec, we require an environment with concurrent organic and sponsored inventory, along with interaction logs generated under a joint system. Standard benchmarks are insufficient, as they lack the bid logs and unified interaction history necessary to capture the tension between user intent and platform monetization. In real marketplaces, training data represents a "survivor" distribution of items that passed both a platform filter (a high enough bid to secure the display slot) and a user filter (enough relevance to prompt a click). To approximate this, we construct a synthetic marketplace policy ($\pi_{data}$) that generates trajectories reflecting these joint constraints.

Rationale for Simulation. We employ this simulation as a controlled test harness to validate our core architectural contribution: the ability of GEM-Rec to integrate monetization constraints into a generative sequence. If the model can recover the implicit rules of $\pi_{data}$ (learning when users accept ads) from the synthetic logs, it demonstrates the capacity to learn analogous constraints from real-world data, where these signals are explicit.

Data Generation Process. We utilize four datasets, Steam (Kang and McAuley, 2018) and Amazon Beauty, Sports, and Toys (He and McAuley, 2016), to provide base organic inventory and user trajectories. In each dataset, we designate a random 20% subset as "Sponsored" candidates and assign them log-normal bids. See Appendices B and D.

To approximate the selection funnel typical of industrial ad platforms, we generate interaction trajectories using a Two-Stage Policy (a sketch follows the list):

1. Stage 1 (Retrieval): Semantic Relevance Filter. The policy first retrieves a candidate set of sponsored items that share a sufficiently deep semantic prefix with the user's organic intent. This enforces a "User-Centric" constraint, mirroring the reality that users rarely engage with contextually irrelevant ads.

2. Stage 2 (Ranking): Probabilistic Auction. Among the relevant candidates, the winning impression is selected via softmax-weighted sampling based on bid price. This mimics the aggregate outcomes of a competitive auction, where higher bidders are statistically more likely to win impressions.
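The sketch below is a plausible rendering of this two-stage policy, using the prefix depths and softmax temperature reported in Appendix D. It is an assumption-based illustration, not the paper's simulator; the inventory layout and example values are hypothetical.

```python
# Illustrative sketch: the two-stage synthetic marketplace policy.
import math
import random
from typing import Dict, Optional, Tuple

SemanticID = Tuple[int, ...]

def two_stage_policy(organic_target: SemanticID,
                     ad_inventory: Dict[SemanticID, float],
                     temperature: float = 0.1) -> Optional[SemanticID]:
    """Return the winning sponsored item for this slot, or None."""
    # Stage 1 (Retrieval): keep ads sharing a depth-2 semantic prefix with
    # the organic target; relax to depth-1 if none qualify (Appendix D.2).
    candidates: Dict[SemanticID, float] = {}
    for depth in (2, 1):
        candidates = {sid: bid for sid, bid in ad_inventory.items()
                      if sid[:depth] == organic_target[:depth]}
        if candidates:
            break
    if not candidates:
        return None
    # Stage 2 (Ranking): softmax-weighted sampling over bids, as in Eq. (7).
    sids = list(candidates)
    weights = [math.exp(candidates[s] / temperature) for s in sids]
    return random.choices(sids, weights=weights, k=1)[0]

inventory = {(12, 51, 9): 0.73, (12, 51, 2): 0.35, (40, 7, 1): 0.90}
print(two_stage_policy((12, 51, 4), inventory))
```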
This policy generates a training corpus consistent with a functioning marketplace: observed ad interactions are exclusively items that possessed both high semantic relevance and sufficient economic value. This provides the necessary signal for the generative model to learn a "safe" baseline for valid ad injection. While our primary simulated environments feature a conservative baseline ad density (roughly 3-6%), we also evaluate the model on ablations with a higher baseline ad density of roughly 10-20% (Appendix I) to test its capacity to learn and adapt to different historical ad frequencies.

5.2. Evaluation Protocols

We evaluate GEM-Rec on three axes:

1. Strict Policy Fit (Total NDCG): Measures how accurately the model reproduces the test-set trajectories generated by $\pi_{data}$. A hit is recorded only if the model predicts both the correct slot type (Ad vs. Organic) and the correct Item ID. This validates whether the model has recovered the latent marketplace constraints.

2. Organic Integrity (Conditional Organic NDCG): We measure the ranking quality of organic items only in instances where the model chooses to generate an organic slot. This explicitly tests the "Preference Preservation" claim of Proposition 2.

3. Economic Value & Steerability: We quantify the system's responsiveness to the control parameter $\lambda$ using three metrics:
• Ad Rate: Percentage of generated slots that are ads.
• Revenue: Cumulative winning bids for ads shown. (Consistent with Section 4.2, we utilize the First-Price mechanism, i.e., the sum of winning bids, for the Revenue metric to align with industrial standards.)
• Ad Relevance: NDCG of generated ads relative to the ground-truth user intent.

5.3. Baselines

We compare GEM-Rec against TIGER (Rajput et al., 2023), the canonical framework for generative retrieval using Semantic IDs. Since TIGER lacks a mechanism for sponsored slots, it serves as the "Pure Utility" reference point, representing an unconstrained model optimized solely for organic relevance.

[Figure 2 | Macro-Dynamics of GEM-Rec on the Steam dataset. (a) Pareto Frontier, Total Revenue (First Price) vs. NDCG@10: the trade-off between Platform Utility (Revenue) and Policy Fit (NDCG with respect to the training data). We observe a frontier where significant platform utility can be generated before the model deviates significantly from the marketplace policy. (b) Ad Load Steerability: the parameter λ provides monotonic control over Ad Rate.]

5.4. Implementation Details

We implement GEM-Rec in PyTorch. As the official source code for TIGER (Rajput et al., 2023) is not publicly available, we adopt the codebase from Yang et al. (2024). We strictly utilize their reproduction of the base TIGER architecture to ensure a fair and reproducible baseline, without employing the Yang et al. (2024)-specific training objectives. We extend this backbone to support the unified organic-sponsored vocabulary and the inference-time GEM-Logits mechanism. We provide the model details in Appendix F and will open-source the code after conference acceptance and publication.
5.5. Steerability and Ad Rate Control

A core requirement for industrial recommendation is the ability to throttle ad density dynamically. At $\lambda = 0$, the model roughly reproduces the baseline ad density (~3.5%) inherent in the training logs. An increase in Ad Rate is structurally guaranteed by the logit modulation. Figure 2b empirically validates this, showing that the scaling is smooth and predictable. As $\lambda$ increases, the logit boost smoothly shifts the distribution without causing an abrupt collapse or saturation of the generative policy. This confirms that the learned control tokens successfully disentangle slot allocation from content generation. This control is also robust: we verified that the model maintains a 100% validity rate for generated ad identifiers across all settings, ensuring that high bid pressure does not induce hallucinations (see Appendix G.1.4).

5.6. The Pareto Frontier: Revenue vs. Policy Fit

Figure 2a illustrates the trade-off between Total Revenue (Allocative Efficiency) and Total NDCG. Ideally, a system should allow the platform to increase economic utility without deviating significantly from the learned user policy.

Table 1 | Main Results: Marketplace Performance. Comparisons of GEM-Rec against the baseline (TIGER). Ad Rate and Revenue measure Economic Utility; NDCG@10 and Recall@10 are Total Metrics; O.NDCG@10 and O.Recall@10 are Metrics for Organic (Conditional) Generation, isolating the impact of bid modulation. Train Ad % refers to the realized frequency of ads in the training sequences. See the full table in Appendix G, Table 3.

| Dataset | Operating Mode | Ad Rate | Revenue | NDCG@10 | Recall@10 | O.NDCG@10 | O.Recall@10 |
|---|---|---|---|---|---|---|---|
| Steam (Train Ad %: 3.49) | TIGER (Baseline) | 0.0% | - | 0.1442† | 0.1818 | 0.1487 | 0.1875 |
| | GEM-Rec (λ = 0.0) | 2.5% | 535 | 0.1411 | 0.1782 | 0.1468 | 0.1857 |
| | GEM-Rec (λ = 1.0) | 4.7% | 1,173 | 0.1381 | 0.1742 | 0.1467 | 0.1853 |
| Sports (Train Ad %: 6.85) | TIGER (Baseline) | 0.0% | - | 0.0176† | 0.0308 | 0.0187 | 0.0327 |
| | GEM-Rec (λ = 0.0) | 4.7% | 804 | 0.0173 | 0.0335 | 0.0187 | 0.0364 |
| | GEM-Rec (λ = 1.0) | 9.0% | 1,804 | 0.0168 | 0.0323 | 0.0185 | 0.0359 |
| Toys (Train Ad %: 4.52) | TIGER (Baseline) | 0.0% | - | 0.0278† | 0.0536 | 0.0291 | 0.0561 |
| | GEM-Rec (λ = 0.0) | 3.5% | 319 | 0.0277 | 0.0536 | 0.0295 | 0.0570 |
| | GEM-Rec (λ = 1.0) | 7.1% | 717 | 0.0272 | 0.0522 | 0.0298 | 0.0575 |
| Beauty (Train Ad %: 4.23) | TIGER (Baseline) | 0.0% | - | 0.0282† | 0.0529 | 0.0293 | 0.0550 |
| | GEM-Rec (λ = 0.0) | 3.1% | 345 | 0.0301 | 0.0569 | 0.0318 | 0.0602 |
| | GEM-Rec (λ = 1.0) | 6.0% | 726 | 0.0295 | 0.0559 | 0.0320 | 0.0606 |

† Imputed score: the baseline strictly predicts Organic, effectively scoring 0 on all Ad slots.

As $\lambda$ increases, revenue rises while Total NDCG declines. This decline is expected: high $\lambda$ values force the model to systematically substitute organic items with ads, creating a proportional divergence from the organic-heavy test set. On the Steam dataset (Figure 2a), we observe a near-linear frontier, providing a highly predictable way to trade off policy fit for revenue.

We note that the shape of this frontier is partly governed by the underlying data distribution. When the environment contains a higher historical density of ads, for example in the ablations in Appendix I, the model learns that sponsored content is viable across a broader range of sequential contexts. Consequently, when asked to increase the ad rate via $\lambda$, the model can initially satisfy this demand by utilizing these naturally viable slots, creating a more convex frontier with a distinct high-efficiency region before the steeper trade-off begins (e.g., see the high-ad-rate ablations on the Amazon Sports and Outdoors dataset in Appendix I.0.3 and the Amazon Toys and Games dataset in Appendix I.0.4).
Regardless of the specific curvature, the core dynamic remains consistent: the mechanism allows the platform to smoothly increase revenue without suffering disproportionate, sudden drops in overall relevance.

5.7. Verifying Organic Integrity

A central claim of GEM-Rec is that the injection of ads should not degrade the model's understanding of organic user intent. Figure 3a provides empirical validation of the Preference Preservation property in Proposition 2. The red line shows Total NDCG declining as $\lambda$ increases; this is expected, as high $\lambda$ forces the model to insert ads in slots where the historical policy $\pi_{data}$ would have served organic items. However, the green dashed line, Conditional Organic NDCG, remains effectively flat. Similarly, Table 1 shows that Organic Recall@10 is preserved (e.g., on Steam: 0.1853 at $\lambda = 1.0$ vs. 0.1857 at $\lambda = 0$). This stability confirms that the Semantic ID representations are robust. Because GEM-Decoding injects the bid term $\lambda V(b)$ only into the sponsored branch, the logits for the organic branch remain unperturbed, as shown in Proposition 2. This ensures that GEM-Rec allows platforms to monetize further without introducing "noise" into the organic ranking logic.

[Figure 3 | Micro-Analysis of Recommendation Quality on the Steam dataset. (a) Organic Integrity Check: while Total NDCG@10 drops as we force ads (red line), the Conditional Organic NDCG (green line) remains relatively flat as bid pressure λ increases. This shows that increasing λ changes when ads are shown, but does not degrade the model's understanding of user intent for organic slots. (b) Ad Quality Trade-off: higher revenue requires displaying ads with lower semantic similarity (prefix match) to the user's organic target.]

5.8. Ad Quality and Signal Restoration

We utilize Prefix Match Depth to quantify the quality and relevance of generated ads. This metric measures the average number of hierarchical Semantic ID codes shared between the generated ad and the user's organic ground truth, where a depth of ≥2 typically indicates sub-category alignment.

In general, we observe that ad relevance decreases as $\lambda$ increases (Figure 3b). This confirms the expected tension in the marketplace: as the system prioritizes economic value (high bids), it is inevitably forced to select items that are semantically further from the user's pure organic intent. This is the dominant trend across our experiments (see complete visualizations in Appendix G). The generative model successfully internalizes the marketplace policy during training, effectively learning to identify high-relevance items that are also valid ad candidates. Consequently, the unmodulated model ($\lambda = 0$) already operates near the semantic optimum; adding further explicit bias ($\lambda > 0$) forces the model to deviate from this learned policy, degrading relevance.

We did observe minor localized exceptions (e.g., the slight non-monotonicity in Figure 3b or the initial relevance gain in Appendix Figure 5). We attribute these to the resolution of semantic ambiguity.
In sparse datasets, the model often correctly identifies the target category (e.g., "running shoes") but lacks sufficient samples to rank specific items within that cluster. However, because the training data is derived from realized auction logs, the "ground truth" item was historically likely to be a high bidder. Consequently, injecting bid information at inference acts as a context-aware tie-breaker: it guides the model toward the high bidder within the correct semantic cluster, effectively aligning the prediction with the historical selection bias.

Despite these localized recoveries, the broader system dynamic remains a convex trade-off between short-term revenue and semantic similarity. While our current framework successfully navigates this via a unified parameter $\lambda$, future work or practical deployments could explore decoupling this mechanism. Introducing independent scaling parameters for slot allocation ($\lambda_{slot}$) and item selection ($\lambda_{item}$) would provide platforms with even more fine-grained control, allowing them to precisely calibrate the balance between overall ad frequency and bid-aware ranking.

5.9. Dynamic Adaptation to Market Shifts

In real-world marketplaces, advertiser valuations are volatile; bids can spike suddenly due to seasonal events or inventory demand. A critical advantage of GEM-Rec is its inference-time plasticity: the ability to pivot toward these real-time opportunities without model retraining. To demonstrate this, we simulate a "Bid Shock": we randomly select 5% of the inventory and multiply their bids by 10x (a sketch of this perturbation follows Table 2). We evaluate the system's efficiency by measuring the High-Value Share: the percentage of displayed ads that belong to this high-paying subset.

Results. Table 2 illustrates the system's rapid adaptation on the Steam dataset. Qualitatively similar results for all other datasets are provided in Appendix H. When $\lambda = 0$, the model relies on historical priors. While it displays some ads (2.4%), only 21.8% of them come from the high-value group; the system effectively ignores the shift in demand. At $\lambda = 0.5$, with even modest modulation, the composition transforms. While the total Ad Rate rises moderately to 7.1%, the High-Value Share jumps to 81.5%. This demonstrates that GEM-Rec does not merely increase ad volume; it actively substitutes low-value impressions with high-value ones, achieving a 9x revenue uplift while keeping the overall ad load conservative and controllable.

Table 2 | Response to Market Volatility (Steam). We analyze the composition of generated ads before and after a price shock. At λ = 0.5, the system aggressively targets the high bidders. Note that while the total Ad Rate remains low (7.1%), the High-Value Share among ads dominates, showing that the model substitutes low-value inventory with high-value inventory.

| Setting | Ad Rate | High-Value Share (% of Ads) | Revenue |
|---|---|---|---|
| Baseline (λ = 0.0) | 2.4% | 21.8% | 1.0x |
| GEM-Rec (λ = 0.5) | 7.1% | 81.5% | 9.0x |
| GEM-Rec (λ = 1.0) | 18.0% | 97.4% | 28.2x |
| GEM-Rec (λ = 2.0) | 62.4% | 99.9% | 104.6x |
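The sketch below shows one way to implement the bid-shock perturbation and the High-Value Share metric. The 5% fraction and 10x multiplier follow the text; the function names and inventory layout are hypothetical, and this is not the paper's code.

```python
# Illustrative sketch: the "Bid Shock" perturbation of Section 5.9.
import random
from typing import Dict, Iterable, Set, Tuple

SemanticID = Tuple[int, ...]

def apply_bid_shock(bids: Dict[SemanticID, float],
                    frac: float = 0.05,
                    multiplier: float = 10.0,
                    seed: int = 0) -> Set[SemanticID]:
    """Multiply the bids of a random `frac` of items by `multiplier` in place.

    Returns the shocked subset so that High-Value Share (the fraction of
    displayed ads drawn from this subset) can be measured afterwards.
    """
    rng = random.Random(seed)
    shocked = set(rng.sample(sorted(bids), k=max(1, int(frac * len(bids)))))
    for sid in shocked:
        bids[sid] *= multiplier
    return shocked

def high_value_share(displayed_ads: Iterable[SemanticID],
                     shocked: Set[SemanticID]) -> float:
    """Share of displayed ads that belong to the shocked, high-paying subset."""
    ads = list(displayed_ads)
    if not ads:
        return 0.0
    return sum(sid in shocked for sid in ads) / len(ads)
```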
6. Conclusion

In this work, we introduced GEM-Rec to bridge the structural gap between generative recommendation and marketplace monetization. By unifying organic and sponsored items within a single vocabulary, our framework internalizes complex marketplace constraints, moving beyond disjointed architectures. To ensure controllability, we developed a bid-aware decoding mechanism that disentangles semantic retrieval from economic valuation. This preserves organic user preferences while enabling inference-time adaptation to volatile bids without retraining. GEM-Rec establishes a robust framework for a unified generative recommender capable of simultaneously optimizing semantic relevance and platform revenue.

References

Amazon Publisher Services. Amazon Publisher Services update. Amazon Ads Blog, Aug. 2023. URL https://advertising.amazon.com/blog/amazon-publisher-services-update.
D. Chen, Q. Yan, C. Chen, Z. Zheng, Y. Liu, Z. Ma, C. Yu, J. Xu, and B. Zheng. Hierarchically constrained adaptive ad exposure in feeds. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 3003–3012, 2022.
M. J. Curry, Z. Fan, Y. Jiang, S. S. Ravindranath, T. Wang, and D. C. Parkes. Automated mechanism design: A survey. ACM SIGecom Exchanges, 22(2):102–120, 2025.
C. Daskalakis, I. Gemp, Y. Jiang, R. P. Leme, C. Papadimitriou, and G. Piliouras. Charting the shapes of stories with game theory. arXiv preprint arXiv:2412.05747, 2024.
J. Deng, S. Wang, K. Cai, L. Ren, Q. Hu, W. Ding, Q. Luo, and G. Zhou. OneRec: Unifying retrieve and rank with generative recommender and iterative preference alignment. arXiv preprint arXiv:2502.18965, 2025.
S. Despotakis, R. Ravi, and A. Sayedi. First-price auctions in online display advertising. Journal of Marketing Research, 58(5):888–907, 2021.
P. Dütting, Z. Feng, H. Narasimhan, D. C. Parkes, and S. S. Ravindranath. Optimal auctions through deep learning: Advances in differentiable economics. Journal of the ACM, 71(1):1–53, 2024.
B. Edelman, M. Ostrovsky, and M. Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review, 97(1):242–259, 2007.
Z. Feng, H. Narasimhan, and D. C. Parkes. Deep learning for revenue-optimal auctions with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, pages 354–362, 2018.
J. Fu, X. Ge, A. Karatzoglou, I. Arapakis, S. Verberne, J. M. Jose, and Z. Ren. Differentiable semantic ID for generative recommendation. arXiv preprint arXiv:2601.19711, 2026.
M. Gao, C. Gao, H. Liu, Q. Cai, P. Jiang, J. Chen, S. Yuan, and X. He. MindRec: A diffusion-driven coarse-to-fine paradigm for generative recommendation. arXiv preprint arXiv:2511.12597, 2025.
S. Geng, S. Liu, Z. Fu, Y. Ge, and Y. Zhang. Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5). In Proceedings of the 16th ACM Conference on Recommender Systems (RecSys '22), pages 299–315, 2022. doi: 10.1145/3523227.3546767. URL https://doi.org/10.1145/3523227.3546767.
Google Ad Manager. An update on first price auctions for Google Ad Manager. Google Products Blog, May 2019. URL https://blog.google/products/admanager/update-first-price-auctions-google-ad-manager/.
Google AdSense. Moving AdSense to a first-price auction. Google Products Blog (The Keyword), Oct. 2021. URL https://blog.google/products/adsense/our-move-to-a-first-price-auction/.
R. He and J. McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, pages 507–517, 2016.
Y. Hou, J. Li, A. Shin, J. Jeon, A. Santhanam, W. Shao, K. Hassani, N. Yao, and J. McAuley. Generating long semantic IDs in parallel for recommendation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 956–966, 2025.
D. Huang, F. Marmolejo-Cossío, E. Lock, and D. Parkes. Accelerated preference elicitation with LLM-based proxies. arXiv preprint arXiv:2501.14625, 2025.
Y. Jiang, Z. Feng, and A. Mehta. Incentive-aligned multi-source LLM summaries. arXiv preprint arXiv:2509.25184, 2025.
C. M. Ju, L. Collins, L. Neves, B. Kumar, L. Y. Wang, T. Zhao, and N. Shah. Generative recommendation with semantic IDs: A practitioner's handbook. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pages 6420–6425, 2025.
W.-C. Kang and J. McAuley. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM), pages 197–206. IEEE, 2018.
Y. Lian, Z. Chen, J. Hu, K. Zhang, C. Yan, M. Tong, W. Han, H. Guan, Y. Li, Y. Cao, et al. An end-to-end generative retrieval method for sponsored search engine: Decoding efficiently into a closed target domain. arXiv preprint arXiv:1902.00592, 2019.
G. Liao, Z. Wang, X. Wu, X. Shi, C. Zhang, Y. Wang, X. Wang, and D. Wang. Cross DQN: Cross deep Q network for ads allocation in feed. In Proceedings of the ACM Web Conference 2022, pages 401–409, 2022.
T. Liu, Z. Wang, M. Qin, Z. Lu, X. Chen, Y. Yang, and P. Shu. Real-time ad retrieval via LLM-generative commercial intention for sponsored search advertising. arXiv preprint arXiv:2504.01304, 2025.
Magnite. Market insight: CTV is moving to first-price auctions. Magnite Blog, Feb. 2021. URL https://www.magnite.com/blog/market-insight-ctv-is-moving-to-first-price-auctions/.
Microsoft Xandr. Auction overview. Microsoft Learn documentation, Oct. 2025. URL https://learn.microsoft.com/en-us/xandr/bidders/auction-overview.
OpenX. How we are rolling out the first transparent first-price auction. OpenX Blog (TheXchange), Sept. 2017. URL https://blog.openx.com/how-we-are-rolling-out-the-first-transparent-first-price-auction/.
G. Penha, E. D'Amico, M. De Nadai, E. Palumbo, A. Tamborrino, A. Vardasbi, M. Lefarov, S. Lin, T. Heath, F. Fabbri, et al. Semantic IDs for joint generative search and recommendation. In Proceedings of the Nineteenth ACM Conference on Recommender Systems, pages 1296–1301, 2025.
PubMatic. First-price auctions: Reviving control in auction dynamics. PubMatic Blog, Feb. 2018. URL https://pubmatic.com/blog/first-price-auctions-auction-dynamics/.
H. Qu, W. Fan, Z. Zhao, and Q. Li. TokenRec: Learning to tokenize ID for LLM-based generative recommendations. IEEE Transactions on Knowledge and Data Engineering, 2025.
S. Rajput, N. Mehta, A. Singh, R. H. Keshavan, T. Vu, L. Heldt, L. Hong, Y. Tay, V. Q. Tran, J. Samost, M. Kula, E. H. Chi, and M. Sathiamoorthy. Recommender systems with generative retrieval. In Advances in Neural Information Processing Systems (NeurIPS) 36, 2023. URL https://proceedings.neurips.cc/paper/2023/hash/20dcab0f14046a5c6b02b61da9f13229-Abstract.html.
S. S. Ravindranath, Y. Jiang, and D. C. Parkes. Data market design through deep learning. Advances in Neural Information Processing Systems, 36:6662–6689, 2023.
A. Shah, K. Zhu, Y. Jiang, J. G. Wang, A. K. Dayi, J. J. Horton, and D. C. Parkes. Learning from synthetic labs: Language models as auction participants. arXiv preprint arXiv:2507.09083, 2025.
A. Singh, T. Vu, N. Mehta, R. Keshavan, M. Sathiamoorthy, Y. Zheng, L. Hong, L. Heldt, L. Wei, D. Tandon, et al. Better generalization with semantic IDs: A case study in ranking for recommendations. In Proceedings of the 18th ACM Conference on Recommender Systems, pages 1039–1044, 2024.
B. Sivan, R. P. Leme, and Y. Teng. Why competitive markets converge to first price auctions. 2020.
E. Soumalias, Y. Jiang, K. Zhu, M. Curry, S. Seuken, and D. C. Parkes. LLM-powered preference elicitation in combinatorial assignment. arXiv preprint arXiv:2502.10308, 2025.
F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, and P. Jiang. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1441–1450, 2019.
T. Wang, Y. Jiang, and D. C. Parkes. GemNet: Menu-based, strategy-proof multi-bidder auctions through deep learning. In Proceedings of the 25th ACM Conference on Economics and Computation, pages 1100–1100, 2024.
T. Wang, Y. Jiang, and D. C. Parkes. BundleFlow: Deep menus for combinatorial auctions by diffusion-based optimization. arXiv preprint arXiv:2502.15283, 2025.
J. Yan, Z. Xu, B. Tiwana, and S. Chatterjee. Ads allocation in feed via constrained optimization. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3386–3394, 2020.
L. Yang, F. Paischer, K. Hassani, J. Li, S. Shao, Z. G. Li, Y. He, X. Feng, N. Noorshams, S. Park, et al. Unifying generative and dense retrieval for sequential recommendation. arXiv preprint arXiv:2411.18814, 2024.
J. Zhang, Y. Li, Y. Liu, C. Wang, Y. Wang, Y. Xiong, X. Liu, H. Wu, Q. Li, E. Zhang, et al. GPR: Towards a generative pre-trained one-model paradigm for large-scale advertising recommendation. arXiv preprint arXiv:2511.10138, 2025.
X. Zhao, X. Zheng, X. Yang, X. Liu, and J. Tang. Jointly learning to recommend and advertise. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3319–3327, 2020.
X. Zhao, C. Gu, H. Zhang, X. Yang, X. Liu, J. Tang, and H. Liu. DEAR: Deep reinforcement learning for online advertising impression in recommender systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 750–758, 2021.

Appendix

A. Additional Related Works

A fast-growing literature studies how discrete item representations, tokenization, and decoding choices shape generative recommendation. Ju et al. (2025) systematize the design space of Semantic-ID recommenders, demonstrating how indexing strategies, model architectures, and decoding constraints materially impact performance. Moving beyond fixed, precomputed indices, DIGER (Fu et al., 2026) introduces differentiable semantic indexing, allowing recommendation gradients to directly update the semantic index while mitigating codebook collapse. Focusing on generation dynamics, MindRec (Gao et al., 2025) departs from strict left-to-right autoregressive decoding, leveraging a masked diffusion process to generate recommendations in a coarse-to-fine, non-sequential manner. Finally, bridging IDs and language models, TokenRec (Qu et al., 2025) proposes quantizing collaborative filtering representations into discrete tokens, optimizing how user and item semantics are tokenized and aligned for LLM-based recommendation.
While these works explore alternative tokenization and decoding paradigms, our work builds upon the canonical Semantic-ID architecture (Rajput et al., 2023) to clearly isolate the impact of integrating real-time economic constraints into generative retrieval.

Our setting inherits core ideas from computational advertising and auction design. Classical sponsored-search auctions such as the generalized second-price (GSP) mechanism are a standard reference point for bid-based ranking and pricing (Edelman et al., 2007), but there has been an increasing ecosystem-wide shift toward first-price auctions to reduce complexity and improve transparency (Amazon Publisher Services, 2023; Google Ad Manager, 2019; Google AdSense, 2021; Magnite, 2021; Microsoft Xandr, 2025; OpenX, 2017; PubMatic, 2018). This shift has also motivated work studying why the change happened and analyzing equilibrium bidding and bid shading under first-price dynamics in modern ad exchanges (Despotakis et al., 2021; Sivan et al., 2020). In our setting, we treat bids as exogenous inputs available at serving time (as in standard ad auction interfaces), rather than modeling advertisers' private values or strategic bid shading. We therefore use first-price payments as a transparent, low-overhead accounting rule consistent with common practice; incentive questions and DSIC-style payments are discussed separately in Appendix E.

On the machine learning side, a growing literature uses deep learning parameterizations to optimize auctions and other mechanisms subject to incentive constraints (e.g., learning allocation/payment networks) (Curry et al., 2025; Dütting et al., 2024; Feng et al., 2018; Ravindranath et al., 2023; Wang et al., 2024, 2025). Our setting is different. GEM-Rec does not learn a new auction mechanism or payment network. GEM-Rec is designed around inference-time bid modulation of a fixed generative policy: we explicitly separate (i) learning a feasibility baseline from logs from (ii) injecting current bids through constrained decoding to dynamically steer generation toward high-value items in real time, without retraining the underlying generative model or altering the standard first-price payment rule.

A parallel line of research explores the broader intersection of generative models, game theory, and mechanism design. Recent works have increasingly utilized generative AI to formalize strategic incentives, proxy human preferences in complex allocations, and align model outputs with economic constraints (Daskalakis et al., 2024; Huang et al., 2025; Jiang et al., 2025; Shah et al., 2025; Soumalias et al., 2025). While this growing literature primarily casts the generative model as an agentic participant, judge, or behavioral proxy within economic systems, GEM-Rec shifts the focus toward platform-side market design: embedding computational advertising and real-time bid mechanics directly into the token generation and decoding processes of a Semantic ID-based generative recommender.

B. Experiment Details

B.1. Ad Simulation Details

To validate GEM-Rec in a controlled setting involving concurrent organic and sponsored inventory, we construct a synthetic environment derived from standard datasets. This approach ensures that the training data reflects a policy where ads are only displayed when they are both semantically relevant and contextually appropriate. We apply standard 5-core filtering to each dataset, in line with Yang et al. (2024), to define the ground-truth semantic space. (A sketch of this filtering step follows.)
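The sketch below shows the standard k-core preprocessing step. It reflects the common definition of 5-core filtering (iteratively remove users and items with fewer than five interactions until stable); the function name and signature are assumptions, not the paper's code.

```python
# Illustrative sketch: k-core filtering of an interaction list.
from typing import Dict, List, Tuple

def k_core_filter(interactions: List[Tuple[str, str]], k: int = 5) -> List[Tuple[str, str]]:
    """Keep only (user, item) pairs where both sides have >= k interactions."""
    while True:
        user_counts: Dict[str, int] = {}
        item_counts: Dict[str, int] = {}
        for u, i in interactions:
            user_counts[u] = user_counts.get(u, 0) + 1
            item_counts[i] = item_counts.get(i, 0) + 1
        kept = [(u, i) for u, i in interactions
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(interactions):  # fixed point reached
            return kept
        interactions = kept  # removing pairs may drop other users/items below k
```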
B.2. Model Details

For the model architecture, we employ a T5-based encoder-decoder. Semantic IDs are generated via a Residual Quantized VAE (RQ-VAE) with a codebook size of 256 and a code depth of 3, trained on Sentence-T5 embeddings of the item metadata. All experiments are conducted on a single NVIDIA A100 GPU. Inference sweeps are performed with $\lambda \in [0.0, 10.0]$ to trace the full Pareto frontier. (A sketch of the residual quantization step follows.)
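The following sketch illustrates residual quantization, the mechanism by which an RQ-VAE turns an item embedding into a hierarchical Semantic ID. The codebooks here are random stand-ins; in the paper they are learned from Sentence-T5 embeddings, and this is not the paper's implementation.

```python
# Illustrative sketch: coarse-to-fine residual quantization.
import torch

NUM_LEVELS, CODEBOOK_SIZE, DIM = 3, 256, 768  # code depth and codebook size per B.2
codebooks = [torch.randn(CODEBOOK_SIZE, DIM) for _ in range(NUM_LEVELS)]

def quantize(embedding: torch.Tensor) -> tuple:
    """At each level, pick the nearest code and pass the residual to the next
    level, so items with similar embeddings share Semantic-ID prefixes."""
    codes = []
    residual = embedding
    for level in range(NUM_LEVELS):
        dists = torch.cdist(residual.unsqueeze(0), codebooks[level]).squeeze(0)
        c = int(torch.argmin(dists).item())
        codes.append(c)
        residual = residual - codebooks[level][c]  # quantize what remains
    return tuple(codes)

print(quantize(torch.randn(DIM)))  # e.g., (137, 22, 201)
```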
- Post-Divergence ($k > \tau$): The subtree of $h_j$ is disjoint from $i$. All logits and partition functions for these steps are independent of $b_i$.

3. Relative Rank at step $t$: We compare the score gap $\Delta_t(b) = S_t(h_i^{(t)}; b) - S_t(h_j^{(t)}; b)$.
- For $k < \tau$, changes cancel out (identical terms).
- At $k = \tau$, the target term is non-decreasing while the competitor term is non-increasing. The partition function cancels out in the difference, but the monotonic divergence remains, so the gap contribution increases or stays constant.
- For $k > \tau$, terms for $h_i$ are non-decreasing and terms for $h_j$ are constant.

Therefore, for any step $t$ and any competitor $h_j$, the gap $S_t(h_i^{(t)}) - S_t(h_j^{(t)})$ is monotonically non-decreasing in $b_i$. Since the score of $h_i^{(t)}$ cannot decrease relative to any competitor $h_j^{(t)}$, the rank of $h_i^{(t)}$ among all partial hypotheses at depth $t$ cannot worsen. Consequently, if $h_i$ survived the top-$K$ pruning at every step $t$ under bid $b_i$, it must also survive at every step under $b'_i > b_i$. Thus, item $i$ remains generated. □

Proof for Proposition 2:

Proof. These properties follow directly from the definition of the modulation mechanism in Section 4.

1. Safe Fallback: By construction, the modulated logit is $\tilde{z} = z + \lambda \cdot \Delta$. If $\lambda = 0$, then $\tilde{z} = z$ for all tokens, recovering the exact behavior of the base model trained on historical logs.

2. Organic Integrity: The modulation term in Eq. (4) is applied only conditional on the flag $f = \texttt{<AD>}$. If the sampled flag is $f = \texttt{<ORG>}$, the decoding proceeds using unmodulated logits $z$. Thus, the relative ranking of any two organic items $i, j$ is determined solely by the pre-trained weights $\theta$, invariant to $\lambda$. □
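As referenced in D.2 below, the following is a minimal sketch of the bid sampling and the three-stage injection policy under the stated parameters. The Semantic-ID layout (a tuple of codes per item) and the min-max reading of the clip-and-normalize step are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bids(ad_items, mu=0.0, sigma=0.2):
    """D.1: log-normal bids, clipped at the 99.9th percentile, then
    (one plausible reading) min-max normalized into [0.1, 1.0]."""
    raw = rng.lognormal(mu, sigma, size=len(ad_items))
    raw = np.minimum(raw, np.quantile(raw, 0.999))
    scaled = 0.1 + 0.9 * (raw - raw.min()) / (raw.max() - raw.min())
    return dict(zip(ad_items, scaled))

def maybe_inject_ad(org_sid, ad_sids, bids, steps_since_ad,
                    p=0.4, r=0.05, tau=0.1):
    """D.2: (1) prefix filter at depth 2, relaxed to 1; (2) softmax
    selection over bids; (3) linear-recovery frequency cap."""
    cands = []
    for d in (2, 1):  # stage 1: semantic relevance filter
        cands = [j for j, sid in ad_sids.items() if sid[:d] == org_sid[:d]]
        if cands:
            break
    if not cands:
        return None
    b = np.array([bids[j] for j in cands])   # stage 2: softmax over bids
    probs = np.exp(b / tau)
    probs /= probs.sum()
    winner = cands[rng.choice(len(cands), p=probs)]
    accept = rng.random() < p * min(1.0, steps_since_ad * r)  # stage 3
    return winner if accept else None
```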
E. Truthfulness and Decoding: Tradeoffs and Constraints

We adopt first-price payments in our experiments because they are simple to deploy, in line with practical trends, and aligned with the inference-time objective used by our decoder: once an ad is selected, the platform can charge the bid directly with no additional counterfactual computation. At the same time, it is natural to ask whether one could instead implement a DSIC mechanism (e.g., a critical-bid or VCG-style payment rule). In the mechanism design literature, DSIC is usually desired because it theoretically guarantees truthful bidding and prevents strategic deviations such as bid shading. Relatedly, incentive alignment has been studied in the context of generative pipelines (Jiang et al., 2025). In single-shot, single-parameter auctions, however, DSIC typically relies on two ingredients: (i) an allocation rule that is monotone in each bidder's bid, and (ii) payments defined by a threshold, namely the minimum bid at which the bidder would still receive the allocation.

In our setting, the "allocation rule" is induced by an autoregressive decoding procedure. Even when the scoring function used by the decoder is monotone in bids (as in Proposition 1), a full DSIC implementation must define payments with respect to the implemented decoding algorithm. Concretely, the relevant threshold is the smallest bid at which the algorithm would still produce the same allocation under the same context and competing bids. Determining this threshold generally requires reasoning about counterfactual decoding outcomes (what would have been generated if a bidder's bid were slightly lower), which can be computationally expensive in sequential generation.

Moreover, modern decoding pipelines often include approximations (e.g., limited-width beam search or other search heuristics). These approximations make it harder to treat the allocation as an explicit function with a tractable threshold characterization, because small bid changes can alter intermediate search states and pruning decisions. As a result, deriving and computing exact truthful payments on top of a practical decoding stack can require additional algorithmic structure beyond the model architecture itself.

Overall, first-price payments provide a robust and low-overhead baseline consistent with current practice. Designing an exact DSIC variant for autoregressive recommendation, with provable guarantees under a realistic decoding algorithm and acceptable compute, is an important direction for future work.

F. Implementation Details

Model Architecture. We implement GEM-Rec using a T5 encoder-decoder architecture configured to match the scale of the baseline implementation in Yang et al. (2024).

• Encoder/Decoder: 6 layers each.
• Hidden Dimensions: $d_{\mathrm{model}} = 128$, $d_{\mathrm{ff}} = 1024$.
• Attention: 6 heads with $d_{kv} = 64$.
• Dropout: 0.2.

Semantic IDs (RQ-VAE). Item codes represent a hierarchy of depth 3, learned via RQ-VAE with a codebook size of 256 per level. The VAE is trained for 8,000 epochs using the AdamW optimizer.

Training. The generative model is trained for 100,000 steps with a batch size of 256. We use the AdamW optimizer with a weight decay of 0.035. The learning rate is initialized at $3e{-4}$ with a linear warmup of 10,000 steps, followed by cosine decay. All experiments are conducted on a single NVIDIA A100 GPU.
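For convenience, the hyperparameters above can be collected into a single configuration object. This is a reproduction-oriented sketch whose field names are ours, not taken from the authors' codebase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GEMRecConfig:
    # T5 encoder-decoder (Appendix F, Model Architecture)
    num_layers: int = 6          # per encoder and decoder
    d_model: int = 128
    d_ff: int = 1024
    num_heads: int = 6
    d_kv: int = 64
    dropout: float = 0.2
    # Semantic IDs (RQ-VAE)
    codebook_size: int = 256     # per level
    code_depth: int = 3
    rqvae_epochs: int = 8_000
    # Training
    train_steps: int = 100_000
    batch_size: int = 256
    weight_decay: float = 0.035
    peak_lr: float = 3e-4        # 10k-step linear warmup, then cosine decay
    warmup_steps: int = 10_000
```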
G. Additional Experimental Results

G.1. Full results for the main experiment

We provide the full suite of analysis figures for the Beauty, Sports, and Toys datasets, complementing the Steam results presented in the main text. Table 3 at the end of this subsection collects the detailed numerical results.

G.1.1. Beauty Dataset

[Figure 4 | Full Dynamics on Beauty Dataset. Panels: (a) Pareto Frontier (Beauty): Revenue vs. Policy Fit (Total Revenue vs. NDCG@10); (b) Steerability (Beauty): Ad Rate Control (Ad Rate (Impressions) vs. Bid Pressure (λ)); (c) Organic Integrity (Beauty): Total NDCG@10 vs. NDCG@10 (Organic Items); (d) Quality Trade-off (Beauty): Welfare vs. Semantic Similarity (Ad Relevance (Prefix Match) and Total Revenue vs. λ). Macro-dynamics (top row) and micro-analysis (bottom row) show behavior consistent with the Steam dataset.]

G.1.2. Sports Dataset

[Figure 5 | Full Dynamics on Sports Dataset. Same four panels as Figure 4: (a) Pareto Frontier, (b) Steerability, (c) Organic Integrity, (d) Quality Trade-off.]

G.1.3. Toys Dataset

[Figure 6 | Full Dynamics on Toys Dataset. Same four panels as Figure 4: (a) Pareto Frontier, (b) Steerability, (c) Organic Integrity, (d) Quality Trade-off.]
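The "Ad Relevance (Prefix Match)" axis in these figures tracks semantic closeness between a generated ad and its organic context. The extracted text does not spell out the formula; the following is a minimal sketch under the assumption that it is the fraction of leading Semantic-ID codes shared between the two items (our reading, not a definition stated in the paper).

```python
def prefix_match_relevance(ad_sid, org_sid):
    """Assumed metric: fraction of leading Semantic-ID codes shared by the
    ad and the organic target, e.g. (1, 7, 3) vs (1, 7, 9) -> 2/3."""
    depth = min(len(ad_sid), len(org_sid))
    shared = 0
    for a, o in zip(ad_sid, org_sid):
        if a != o:
            break
        shared += 1
    return shared / depth
```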
Table 3 | Full Marketplace Performance Results. We compare GEM-Rec against the baseline (TIGER) across all datasets and λ settings. The Train Ad % column is provided for reference; Ad Rate and Revenue report economic utility, NDCG@10 and Recall@10 are total metrics, and the O.-prefixed columns report metrics for organic generation only.

| Dataset | Method | Train Ad % | Ad Rate | Revenue | NDCG@10 | Recall@10 | O.NDCG@10 | O.Recall@10 |
|---|---|---|---|---|---|---|---|---|
| Beauty | TIGER (Baseline) | 4.23% | 0.0% | 0.0 | 0.0282† | 0.0529 | 0.0293 | 0.0550 |
| Beauty | GEM-Rec (λ=0.0) | | 3.1% | 345.1 | 0.0301 | 0.0569 | 0.0318 | 0.0602 |
| Beauty | GEM-Rec (λ=0.5) | | 4.2% | 489.3 | 0.0296 | 0.0560 | 0.0317 | 0.0603 |
| Beauty | GEM-Rec (λ=1.0) | | 6.0% | 726.6 | 0.0295 | 0.0559 | 0.0320 | 0.0606 |
| Beauty | GEM-Rec (λ=2.0) | | 10.7% | 1,386.8 | 0.0282 | 0.0537 | 0.0319 | 0.0610 |
| Beauty | GEM-Rec (λ=5.0) | | 44.4% | 6,795.4 | 0.0204 | 0.0385 | 0.0329 | 0.0622 |
| Beauty | GEM-Rec (λ=7.5) | | 76.9% | 13,068.8 | 0.0103 | 0.0201 | 0.0321 | 0.0612 |
| Beauty | GEM-Rec (λ=10.0) | | 93.4% | 17,427.6 | 0.0048 | 0.0101 | 0.0298 | 0.0574 |
| Sports | TIGER (Baseline) | 6.85% | 0.0% | 0.0 | 0.0176† | 0.0308 | 0.0187 | 0.0327 |
| Sports | GEM-Rec (λ=0.0) | | 4.7% | 804.2 | 0.0173 | 0.0335 | 0.0187 | 0.0364 |
| Sports | GEM-Rec (λ=0.5) | | 6.3% | 1,177.7 | 0.0170 | 0.0330 | 0.0186 | 0.0362 |
| Sports | GEM-Rec (λ=1.0) | | 9.0% | 1,804.2 | 0.0168 | 0.0323 | 0.0185 | 0.0359 |
| Sports | GEM-Rec (λ=2.0) | | 15.7% | 3,731.3 | 0.0160 | 0.0310 | 0.0187 | 0.0363 |
| Sports | GEM-Rec (λ=5.0) | | 52.8% | 16,662.4 | 0.0114 | 0.0223 | 0.0198 | 0.0381 |
| Sports | GEM-Rec (λ=7.5) | | 82.3% | 27,712.1 | 0.0057 | 0.0117 | 0.0198 | 0.0383 |
| Sports | GEM-Rec (λ=10.0) | | 95.9% | 32,916.6 | 0.0025 | 0.0055 | 0.0157 | 0.0330 |
| Toys | TIGER (Baseline) | 4.52% | 0.0% | 0.0 | 0.0278† | 0.0536 | 0.0291 | 0.0561 |
| Toys | GEM-Rec (λ=0.0) | | 3.5% | 319.3 | 0.0277 | 0.0536 | 0.0295 | 0.0570 |
| Toys | GEM-Rec (λ=0.5) | | 5.4% | 509.7 | 0.0275 | 0.0530 | 0.0298 | 0.0576 |
| Toys | GEM-Rec (λ=1.0) | | 7.1% | 717.9 | 0.0272 | 0.0522 | 0.0298 | 0.0575 |
| Toys | GEM-Rec (λ=2.0) | | 12.9% | 1,462.4 | 0.0265 | 0.0502 | 0.0303 | 0.0576 |
| Toys | GEM-Rec (λ=5.0) | | 48.6% | 6,746.0 | 0.0177 | 0.0343 | 0.0294 | 0.0568 |
| Toys | GEM-Rec (λ=7.5) | | 79.1% | 12,137.0 | 0.0101 | 0.0201 | 0.0311 | 0.0600 |
| Toys | GEM-Rec (λ=10.0) | | 94.3% | 15,467.5 | 0.0053 | 0.0114 | 0.0334 | 0.0645 |
| Steam | TIGER (Baseline) | 3.49% | 0.0% | 0.0 | 0.1442† | 0.1818 | 0.1487 | 0.1875 |
| Steam | GEM-Rec (λ=0.0) | | 2.5% | 535.3 | 0.1411 | 0.1782 | 0.1468 | 0.1857 |
| Steam | GEM-Rec (λ=0.5) | | 3.6% | 815.4 | 0.1401 | 0.1770 | 0.1472 | 0.1862 |
| Steam | GEM-Rec (λ=1.0) | | 4.7% | 1,173.6 | 0.1381 | 0.1742 | 0.1467 | 0.1853 |
| Steam | GEM-Rec (λ=2.0) | | 8.9% | 2,390.2 | 0.1322 | 0.1672 | 0.1464 | 0.1854 |
| Steam | GEM-Rec (λ=5.0) | | 40.6% | 12,224.1 | 0.0920 | 0.1173 | 0.1512 | 0.1917 |
| Steam | GEM-Rec (λ=7.5) | | 76.3% | 24,086.6 | 0.0419 | 0.0542 | 0.1582 | 0.1978 |
| Steam | GEM-Rec (λ=10.0) | | 93.7% | 30,598.5 | 0.0150 | 0.0212 | 0.1660 | 0.2070 |

† Imputed Score: the baseline strictly predicts organic items, effectively scoring 0 on all ad slots.

G.1.4. Generative Validity

A critical robustness check for generative recommendation is ensuring that the model does not "hallucinate" invalid identifiers when forced to generate ads. We evaluated the validity of every generated ad sequence across all datasets and λ settings. GEM-Rec achieves a 100.0% validity rate for ad generation across all four datasets, indicating that the model has learned the hierarchical structure of the Semantic IDs and outputs valid IDs. This observation aligns with established findings in generative retrieval, where the structured codebook facilitates robust learning of the item space. Notably, we achieve this validity purely through the model's learned weights and logit boosting, without needing computationally expensive prefix-trie constraints during decoding.

Table 4 | Generative Validity. The percentage of generated ad sequences that decode to a valid, existing item in the inventory.

| Dataset | Valid Rate |
|---|---|
| Beauty | 100.0% |
| Sports | 100.0% |
| Toys | 100.0% |
| Steam | 100.0% |
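A minimal sketch of the validity check itself, assuming the inventory is available as a set of Semantic-ID code tuples (function and variable names are illustrative, not from the paper):

```python
def validity_rate(generated_ad_sids, valid_sids):
    """Share of generated ad Semantic-ID sequences that decode to a real
    item, i.e. appear in the inventory's set of code tuples."""
    if not generated_ad_sids:
        return 0.0
    hits = sum(tuple(sid) in valid_sids for sid in generated_ad_sids)
    return hits / len(generated_ad_sids)

# Example: three generated codes, one hallucinated.
inventory = {(1, 7, 3), (1, 7, 9), (2, 0, 5)}
print(validity_rate([(1, 7, 3), (2, 0, 5), (9, 9, 9)], inventory))  # ~0.667
```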
H. Volatility Experiments

We provide the detailed breakdown of the "Bid Shock" experiments across all four datasets. In each table, High-Value Share denotes the percentage of displayed ads that belong to the specific 5% subset of inventory that received the 10× bid multiplier. A sketch of the shock construction follows the tables.

Table 5 | Bid Shock Response (Steam). The baseline (λ=0) relies on historical priors and misses the new opportunity. GEM-Rec (λ ≥ 0.5) dynamically pivots to capture the high-value inventory.

| Setting | Ad Rate | High-Value Share | Revenue Uplift |
|---|---|---|---|
| Baseline (λ=0.0) | 2.38% | 21.8% | 1.0× |
| GEM-Rec (λ=0.5) | 7.01% | 81.5% | 9.0× |
| GEM-Rec (λ=1.0) | 18.03% | 97.4% | 28.2× |
| GEM-Rec (λ=2.0) | 62.44% | 99.9% | 104.6× |

Table 6 | Bid Shock Response (Toys). At λ=0, the High-Value Share is extremely low (3.2%), indicating the model is effectively blind to the new bids. GEM-Rec corrects this rapidly, reaching 97% share by λ=2.0.

| Setting | Ad Rate | High-Value Share | Revenue Uplift |
|---|---|---|---|
| Baseline (λ=0.0) | 3.49% | 3.2% | 1.0× |
| GEM-Rec (λ=0.5) | 10.31% | 49.3% | 14.3× |
| GEM-Rec (λ=1.0) | 24.45% | 81.7% | 55.8× |
| GEM-Rec (λ=2.0) | 68.66% | 97.2% | 202.4× |

Table 7 | Bid Shock Response (Beauty). The model demonstrates consistent adaptability: the High-Value Share jumps from a negligible 1.6% to 43% with only a minor increase in Ad Rate (from 3.1% to 8.8%).

| Setting | Ad Rate | High-Value Share | Revenue Uplift |
|---|---|---|---|
| Baseline (λ=0.0) | 3.12% | 1.6% | 1.0× |
| GEM-Rec (λ=0.5) | 8.76% | 43.0% | 14.7× |
| GEM-Rec (λ=1.0) | 20.51% | 77.6% | 57.1× |
| GEM-Rec (λ=2.0) | 63.65% | 98.5% | 240.6× |

Table 8 | Bid Shock Response (Sports). This dataset exhibits the most aggressive adaptation: at λ=1.0, nearly every ad shown (99.2%) is already from the high-value subset.

| Setting | Ad Rate | High-Value Share | Revenue Uplift |
|---|---|---|---|
| Baseline (λ=0.0) | 4.87% | 4.7% | 1.0× |
| GEM-Rec (λ=0.5) | 12.56% | 72.9% | 14.6× |
| GEM-Rec (λ=1.0) | 30.41% | 99.2% | 50.1× |
| GEM-Rec (λ=2.0) | 74.71% | 100.0% | 136.2× |
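As referenced above, a minimal sketch of how such a bid shock can be constructed on top of the simulator's static bids. The 5% subset size and 10× multiplier come from the setup described here; the data layout and helper names are assumptions.

```python
import numpy as np

def apply_bid_shock(bids, frac=0.05, multiplier=10.0, seed=0):
    """Scale the bids of a random `frac` subset of sponsored items by
    `multiplier`; returns the shocked bid map and the high-value subset."""
    rng = np.random.default_rng(seed)
    items = list(bids)
    shocked = set(rng.choice(items, size=max(1, int(frac * len(items))),
                             replace=False).tolist())
    new_bids = {j: b * multiplier if j in shocked else b
                for j, b in bids.items()}
    return new_bids, shocked

def high_value_share(displayed_ads, shocked):
    """H-metric: percentage of displayed ads from the shocked subset."""
    if not displayed_ads:
        return 0.0
    return 100.0 * sum(j in shocked for j in displayed_ads) / len(displayed_ads)
```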
I. Ablations: Higher Ad Rate in Training Set

In this section, we present the complete visual analysis for the high ad rate ablation; all numerical results are included in Table 9. This setting simulates a marketplace where the historical interaction logs already contain a significantly higher density of ad interactions (approximately 10–20%, compared to 3–6% in the main experiment). These results confirm that GEM-Rec successfully learns this higher baseline frequency directly from the data (evident at λ=0) and maintains the ability to perform precise trade-offs on top of this elevated base rate.

I.0.1. Steam Dataset (High Rate)

[Figure 7 | Dynamics on Steam (High Ad Rate). Panels: (a) Pareto Frontier, (b) Ad Load Steerability, (c) Organic Integrity Check, (d) Ad Quality Tradeoff. The model correctly learns the higher base ad rate (~10%) at λ=0 and allows strictly monotonic boosting beyond this point.]

I.0.2. Beauty Dataset (High Rate)

[Figure 8 | Dynamics on Beauty (High Ad Rate). Same four panels as Figure 7.]

I.0.3. Sports Dataset (High Rate)

[Figure 9 | Dynamics on Sports (High Ad Rate). Same four panels as Figure 7. Note that the drop of the green dotted curve (organic NDCG@10) at λ=10 in panel (c) coincides with a nearly 100% ad rate (see panel (b)), leaving very few organic generations; it therefore reflects high variance and does not contradict our main conclusions.]

I.0.4. Toys Dataset (High Rate)

[Figure 10 | Dynamics on Toys (High Ad Rate). Same four panels as Figure 7.]
Table 9 | Full Marketplace Performance Results (Strict Last-Item Protocol). We compare GEM-Rec against the baseline (TIGER) across all datasets and λ settings under the new evaluation setting. Columns are grouped as in Table 3: Train Ad %, economic utility (Ad Rate, Revenue), total metrics (NDCG@10, Recall@10), and metrics for organic generation (O.-prefixed).

| Dataset | Method | Train Ad % | Ad Rate | Revenue | NDCG@10 | Recall@10 | O.NDCG@10 | O.Recall@10 |
|---|---|---|---|---|---|---|---|---|
| Beauty | TIGER (Baseline) | 11.99% | 0.0% | 0.0 | 0.0245† | 0.0458 | 0.0277 | 0.0518 |
| Beauty | GEM-Rec (λ=0.0) | | 10.5% | 1,137.1 | 0.0314 | 0.0565 | 0.0322 | 0.0589 |
| Beauty | GEM-Rec (λ=0.5) | | 14.3% | 1,653.0 | 0.0302 | 0.0546 | 0.0317 | 0.0581 |
| Beauty | GEM-Rec (λ=1.0) | | 18.7% | 2,285.2 | 0.0301 | 0.0539 | 0.0315 | 0.0578 |
| Beauty | GEM-Rec (λ=2.0) | | 30.0% | 3,953.5 | 0.0286 | 0.0517 | 0.0317 | 0.0587 |
| Beauty | GEM-Rec (λ=5.0) | | 75.0% | 11,722.6 | 0.0200 | 0.0364 | 0.0361 | 0.0649 |
| Beauty | GEM-Rec (λ=7.5) | | 93.8% | 15,695.7 | 0.0139 | 0.0271 | 0.0366 | 0.0665 |
| Beauty | GEM-Rec (λ=10.0) | | 98.9% | 17,765.6 | 0.0107 | 0.0216 | 0.0432 | 0.0748 |
| Sports | TIGER (Baseline) | 20.73% | 0.0% | 0.0 | 0.0139† | 0.0245 | 0.0174 | 0.0307 |
| Sports | GEM-Rec (λ=0.0) | | 18.8% | 3,279.4 | 0.0176 | 0.0330 | 0.0180 | 0.0340 |
| Sports | GEM-Rec (λ=0.5) | | 24.8% | 4,736.2 | 0.0175 | 0.0326 | 0.0175 | 0.0332 |
| Sports | GEM-Rec (λ=1.0) | | 31.5% | 6,506.6 | 0.0175 | 0.0330 | 0.0176 | 0.0334 |
| Sports | GEM-Rec (λ=2.0) | | 47.4% | 11,221.5 | 0.0174 | 0.0331 | 0.0180 | 0.0342 |
| Sports | GEM-Rec (λ=5.0) | | 87.3% | 26,486.6 | 0.0131 | 0.0280 | 0.0210 | 0.0389 |
| Sports | GEM-Rec (λ=7.5) | | 97.4% | 32,143.2 | 0.0098 | 0.0223 | 0.0238 | 0.0439 |
| Sports | GEM-Rec (λ=10.0) | | 99.6% | 33,827.0 | 0.0073 | 0.0172 | 0.0122 | 0.0190 |
| Toys | TIGER (Baseline) | 12.79% | 0.0% | 0.0 | 0.0235† | 0.0451 | 0.0272 | 0.0520 |
| Toys | GEM-Rec (λ=0.0) | | 11.4% | 1,046.9 | 0.0264 | 0.0495 | 0.0272 | 0.0516 |
| Toys | GEM-Rec (λ=0.5) | | 15.0% | 1,446.0 | 0.0254 | 0.0468 | 0.0265 | 0.0500 |
| Toys | GEM-Rec (λ=1.0) | | 19.8% | 2,037.8 | 0.0254 | 0.0473 | 0.0268 | 0.0509 |
| Toys | GEM-Rec (λ=2.0) | | 32.9% | 3,751.0 | 0.0249 | 0.0447 | 0.0272 | 0.0503 |
| Toys | GEM-Rec (λ=5.0) | | 78.5% | 10,964.2 | 0.0187 | 0.0347 | 0.0272 | 0.0497 |
| Toys | GEM-Rec (λ=7.5) | | 95.6% | 14,630.0 | 0.0146 | 0.0294 | 0.0317 | 0.0626 |
| Toys | GEM-Rec (λ=10.0) | | 99.2% | 16,116.1 | 0.0125 | 0.0264 | 0.0392 | 0.0818 |
| Steam | TIGER (Baseline) | 10.13% | 0.0% | 0.0 | 0.1376† | 0.1722 | 0.1515 | 0.1896 |
| Steam | GEM-Rec (λ=0.0) | | 8.3% | 1,782.7 | 0.1320 | 0.1661 | 0.1453 | 0.1818 |
| Steam | GEM-Rec (λ=0.5) | | 11.0% | 2,573.5 | 0.1316 | 0.1657 | 0.1478 | 0.1848 |
| Steam | GEM-Rec (λ=1.0) | | 14.7% | 3,638.6 | 0.1268 | 0.1597 | 0.1472 | 0.1841 |
| Steam | GEM-Rec (λ=2.0) | | 25.4% | 6,887.0 | 0.1154 | 0.1458 | 0.1486 | 0.1856 |
| Steam | GEM-Rec (λ=5.0) | | 72.3% | 21,748.5 | 0.0573 | 0.0768 | 0.1555 | 0.1903 |
| Steam | GEM-Rec (λ=7.5) | | 93.7% | 29,593.4 | 0.0270 | 0.0422 | 0.1613 | 0.1987 |
| Steam | GEM-Rec (λ=10.0) | | 98.8% | 32,289.6 | 0.0175 | 0.0307 | 0.1544 | 0.1860 |

† Imputed Score: the baseline strictly predicts organic items, effectively scoring 0 on all ad slots.