
Paper deep dive

Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation

Donald Shenaj, Federico Errica, Antonio Carta

Year: 2026 · Venue: arXiv preprint · Area: cs.CV · Type: Preprint · Embeddings: 71

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 96%

Last extracted: 3/26/2026, 2:35:30 AM

Summary

The paper introduces LoRA^2, an adaptive rank fine-tuning method for personalized image generation using diffusion models. By employing a variational framework that learns layer-wise rank importance, LoRA^2 dynamically adjusts the rank of each LoRA component during training. This approach achieves a superior trade-off between subject fidelity, text alignment, and memory consumption compared to fixed-rank LoRA baselines across 29 subjects and multiple backbones (SDXL, KOALA).

Entities (5)

LoRA^2 · method · 100%
Low Rank Adaptation (LoRA) · technique · 100%
Donald Shenaj · person · 99%
KOALA-700m · model · 95%
SDXL · model · 95%

Relation Signals (3)

Donald Shenaj authored LoRA^2

confidence 95% · In this paper, we propose LoRA^2... D. Shenaj, F. Errica, A. Carta

LoRA^2 improves trade-off over LoRA

confidence 95% · LoRA^2 achieves a better trade-off between subject fidelity, text alignment, and memory consumption compared to fixed-rank LoRA baselines.

LoRA^2 uses backbone SDXL

confidence 95% · We use SDXL [25] and KOALA-700m [18] as backbones for our experiments.

Cypher Suggestions (2)

List all models used as backbones for LoRA^2 · confidence 95% · unvalidated

MATCH (m:Method {name: 'LoRA^2'})-[:USES_BACKBONE]->(b:Model) RETURN b.name

Find all methods that improve upon LoRA · confidence 90% · unvalidated

MATCH (m:Method)-[:IMPROVES_TRADE_OFF_OVER]->(t:Technique {name: 'LoRA'}) RETURN m.name

Abstract

Abstract: Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the personalized subject's complexity. The reason is evident: the cost of selecting a good rank for each LoRA component is combinatorial, so we opt for practical shortcuts such as fixing the same rank for all components. In this paper, we take a first step to overcome this challenge. Inspired by variational methods that learn an adaptive width of neural networks, we let the ranks of each layer freely adapt during fine-tuning on a subject. We achieve it by imposing an ordering of importance on the rank's positions, effectively encouraging the creation of higher ranks when strictly needed. Qualitatively and quantitatively, our approach, LoRA^2, achieves a competitive trade-off between DINO, CLIP-I, and CLIP-T across 29 subjects while requiring much less memory and lower rank than high-rank LoRA versions. Code: this https URL.

Tags

ai-safety (imported, 100%) · cscv (suggested, 92%) · preprint (suggested, 88%)

Links


Full Text

70,213 characters extracted from source content.


Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation

Donald Shenaj 1, Federico Errica 2, Antonio Carta 1
1 University of Pisa  2 NEC Laboratories Europe

Fig. 1: (Left) In LoRA^2, each LoRA component is rank-adaptive and task-dependent. (Right) LoRA^2 achieves better subject-prompt alignment and memory consumption.

Abstract. Low Rank Adaptation (LoRA) is the de facto fine-tuning strategy to generate personalized images from pre-trained diffusion models. Choosing a good rank is extremely critical, since it trades off performance and memory consumption, but today the decision is often left to the community's consensus, regardless of the personalized subject's complexity. The reason is evident: the cost of selecting a good rank for each LoRA component is combinatorial, so we opt for practical shortcuts such as fixing the same rank for all components. In this paper, we take a first step to overcome this challenge. Inspired by variational methods that learn an adaptive width of neural networks, we let the ranks of each layer freely adapt during fine-tuning on a subject. We achieve it by imposing an ordering of importance on the rank's positions, effectively encouraging the creation of higher ranks when strictly needed. Qualitatively and quantitatively, our approach, LoRA^2, achieves a competitive trade-off between DINO, CLIP-I, and CLIP-T across 29 subjects while requiring much less memory and lower rank than high-rank LoRA versions. Code: https://github.com/donaldssh/NotAllLayersAreCreatedEqual.

arXiv:2603.21884v1 [cs.CV] 23 Mar 2026

1 Introduction

Personalized diffusion models [9, 17, 28] are a popular application in which a pre-trained text-to-image generative model is fine-tuned to generate new subjects or styles from a few sample images. Online repositories such as Civitai [3] and HuggingFace [16] host thousands of personalized diffusion models trained to capture specific subjects or artistic styles. Most of these models are obtained via Low-Rank Adaptation (LoRA) [15], a parameter-efficient fine-tuning technique that injects low-rank updates into pretrained diffusion backbones.

A successful personalized model should satisfy three key objectives: (1) high-quality generation of the desired subject or style, (2) strong fidelity to the textual prompt, and (3) a low memory footprint (Fig. 1). In practice, these objectives are tightly coupled with the choice of the LoRA rank. Current practice adopts a simple heuristic: a fixed rank is selected and used uniformly across all LoRA components and all subjects. While this strategy provides reasonable average performance, it severely restricts flexibility for several reasons. First, the optimal rank depends on the subject: complex subjects may require higher ranks to capture fine-grained appearance variations, whereas simpler subjects can be modeled with substantially lower ranks. Second, the optimal ranks vary across layers and architectures: many layers may need only small ranks while others require higher capacity. A globally fixed rank prevents layer-wise specialization, resulting in a higher memory footprint without any performance benefit (Fig. 1).

The reason for choosing such a heuristic, regardless of the subject and layer, is the combinatorial explosion of a full layer-wise and subject-specific hyper-parameter search. In this paper, we propose LoRA^2, a novel approach that adapts LoRA ranks during fine-tuning.
Inspired by adaptive-width methods based on variational inference, LoRA^2 encourages an ordering over the rank indices of each LoRA component, effectively pushing it toward the minimal effective rank necessary for the task. This structured parameterization enables high image quality with reduced memory usage compared to a global LoRA rank.

Experimental results demonstrate that LoRA^2 achieves a better trade-off between subject fidelity, text alignment, and memory consumption than fixed-rank LoRA baselines. Across 29 personalized subjects and two diffusion backbones (SDXL and KOALA), our method improves this trade-off over fixed-rank configurations with similar or higher memory usage. For example, models with rank 512 achieve strong subject fidelity but require up to 2.8 GB of parameters, whereas LoRA^2 attains comparable scores with only 0.40 GB, illustrating the efficiency of adaptively learning the LoRA ranks.

Our analysis also reveals that optimal ranks vary significantly across subjects and layers, confirming that a globally fixed rank is inherently suboptimal. The adaptive behavior enables the model to allocate capacity where it is most beneficial while minimizing unnecessary parameters. Finally, ablation studies show that regularizing both the rank parameters and the LoRA weights allows LoRA^2 to produce compact models with minimal degradation in generation quality.

2 Related Work

2.1 Personalization in Diffusion Models

Diffusion models [14, 26, 34] have achieved remarkable success in image synthesis due to their strong representation capacity and compatibility with multi-modal conditioning, particularly text guidance. Their ability to generate high-fidelity and diverse images has made them the dominant paradigm for text-to-image generation.
Beyond generic generation, recent advances have improved the adaptability of diffusion models through personalization techniques that tailor a pretrained backbone to specific subjects or styles while preserving creative flexibility. Methods such as DreamBooth [28], Textual Inversion [9], and StyleDrop [33] adapt a base model using a small set of reference images, allowing it to generate new renditions of a particular object, person, or artistic style across diverse contexts.

More recently, Low-Rank Adaptation (LoRA) [15] has emerged as a parameter-efficient alternative for personalization. Instead of fully fine-tuning model weights, LoRA introduces low-rank update matrices that significantly reduce the number of trainable parameters while maintaining generation quality. This design enables efficient training, lightweight storage, and modular deployment, allowing users to maintain separate personalization modules for individual subjects. The compact size of LoRA adapters further facilitates sharing and reuse through public model repositories, making it a widely adopted approach for subject-driven conditioning in diffusion models.

2.2 Adaptive Architectures

The term adaptive architectures refers to methods that dynamically modify the computational graph of a machine learning model. Early works in this space are constructive approaches that progressively increase a model's capacity, for instance cascade correlation [7]. Firefly network descent [36] relies on an auxiliary objective function to expand both width and depth at fixed intervals. Other methods grow networks by either duplicating or splitting units in a continual learning setting [38], or by periodically creating identical offspring of neurons [37]. More recently, [24] proposed natural gradient-based heuristics to grow or shrink layers in MLPs and CNNs.
Contrary to growing methods, pruning [2] and distillation [13] aim to reduce network size, typically trading off performance for efficiency. Pruning methods remove connections [23] or entire neurons [4, 35], including dynamic approaches that apply hard or soft masks during training [11, 12]. Distillation instead transfers knowledge from a larger model to a smaller one [10].

Adaptive Width Neural Networks (AWNs) [5] take a different and simpler perspective by learning layer width directly through gradient descent within a single training loop. Instead of relying on explicit growth rules or splitting heuristics, AWNs introduce a continuous, monotonically decreasing importance distribution over neurons, allowing the model to smoothly expand or contract its effective width during optimization. This formulation enables structured truncation and dynamic capacity adaptation without separate architectural interventions.

2.3 Adaptive LoRA

The literature on learning adaptive LoRA ranks tends to be more developed in the NLP domain. AdaLoRA [39] computes an importance score based on the gradients and adds a soft orthogonality constraint. DoRA [21] improves the importance measure of AdaLoRA by making it more robust to noise and sparse gradients at convergence. ARD-LoRA [31] introduces a scaling factor that controls the rank and is learned by optimizing a meta-objective. To the best of our knowledge, the effectiveness of adaptive LoRA has not been validated for personalized diffusion models, possibly because these techniques do not trivially transfer to computer vision models.

Empirical findings in the literature show benefits in adapting the rank of specific components, often found via an extensive manual search. [1] shows that LoRA exhibits less adaptation and less forgetting in LLM post-training; MLPs drive most of the performance of LoRAs, while attention layers can be excluded.
[19] finds that during fine-tuning, the encoder features stay relatively constant, whereas the decoder features exhibit substantial variations across different timesteps. B-LoRA [8] showed that certain blocks in the SDXL UNet are more responsible for content, while others are more responsible for style. The same approach has been used by UnZipLoRA [20] to achieve subject-style separation. Overall, these results motivate our exploration of adaptive rank methods.

3 Method

The idea behind our approach is to impose, for each LoRA, an adaptive ordering of importance across the rank dimension of the LoRA weight matrices. Such orderings, learned via backpropagation like any other parameter, are used to determine the adaptive rank of each LoRA. Before introducing our method, however, we provide a refresher on LoRA and the variational framework for adaptive width neural networks of [5], which we frame to our needs.

3.1 LoRA Refresher

Low Rank Adaptation (LoRA) [15] is a Parameter-Efficient Fine-Tuning (PEFT) technique designed to adapt large pre-trained models, including diffusion models, without updating all model parameters. This is achieved by introducing low-rank weights alongside those of a frozen model component $\ell$. Specifically, given a frozen weight matrix $W^*_\ell \in \mathbb{R}^{m \times n}$, LoRA updates only a residual weight $\Delta W_\ell \in \mathbb{R}^{m \times n}$, which is factorized as the product of two learnable low-rank matrices $B_\ell \in \mathbb{R}^{m \times r}$ and $A_\ell \in \mathbb{R}^{r \times n}$, with rank $r \ll \min(m, n)$. The choice of the rank $r$ naturally induces a trade-off between flexibility and efficiency, and in the literature it is typically set to the same value for all of the model's components. For each component $\ell$, the final adapted weights can be represented as:

$$W'_\ell = W^*_\ell + \Delta W_\ell = W^*_\ell + B_\ell A_\ell. \quad (1)$$

3.2 Adaptive Rank Variational Framework

Given a dataset of $N$ i.i.d.
samples, with generic $i$-th input $x_i$ and output $y_i$, a typical learning objective is maximizing the log-likelihood of the data

$$\log p(Y|X) = \log \prod_{i=1}^{N} p(y_i|x_i) = \sum_{i=1}^{N} \log p(y_i|x_i), \quad (2)$$

where $p(y_i|x_i)$ is a probabilistic model, properly defined for each use case. To formalize learning of a possibly infinite rank for each LoRA component $\ell \in [1, L]$ of our image-generation model, we first consider a continuous random variable $\lambda_\ell$ that controls the finite choice of the rank for component $\ell$, in a way that we will describe later. In addition, we introduce an infinite set of random variables $\theta_{\ell r}, r \in [1, \infty]$, where $r$ can be thought of as a "rank index", meaning that, as the rank increases from $r$ to $r+1$, a new set of weights has to be introduced in LoRA (effectively expanding matrices $B$ and $A$), and these new weights will be associated with the multidimensional random variable $\theta_{\ell r+1}$. For notational convenience, we define $\theta_\ell = \{\theta_{\ell r}\}_{r=1}^{\infty}$, $\theta = \{\theta_\ell\}_{\ell=1}^{L}$ and $\lambda = \{\lambda_\ell\}_{\ell=1}^{L}$.

Under these assumptions, we can write $p(Y|X) = \int p(Y, \theta, \lambda|X)\, d\theta\, d\lambda$, which is unfortunately intractable. Therefore, we apply the same variational approach of [5], which we refer to for the full details, with the only conceptual distinction that $r$ here refers to a rank index instead of a neuron index. To maximize the intractable Eq. 2, we can instead work with the evidence lower bound (ELBO):

$$\log p(Y|X) \geq \mathbb{E}_{q(\lambda,\theta)}\left[\log \frac{p(Y,\lambda,\theta|X)}{q(\lambda,\theta)}\right], \quad (3)$$

where we make the following assumptions about the joint distribution $p(Y,\lambda,\theta|X)$ of the generative model and the associated variational distribution $q(\lambda,\theta)$:

$$p(Y,\lambda,\theta|X) = \prod_{i=1}^{N} p(y_i,\lambda,\theta|x_i), \qquad p(y_i,\lambda,\theta|x_i) = p(y_i|\lambda,\theta,x_i)\,p(\lambda)\,p(\theta) \quad (4)$$

$$p(\lambda) = \prod_{\ell=1}^{L} p(\lambda_\ell) = \prod_{\ell=1}^{L} \mathcal{N}(\lambda_\ell; \mu_{\lambda_\ell}, \sigma_{\lambda_\ell}), \qquad p(\theta) = \prod_{\ell=1}^{L} \prod_{r=1}^{\infty} p(\theta_{\ell r}) \quad (5)$$

$$p(\theta_{\ell r}) = \mathcal{N}(\theta_{\ell r}; 0, \mathrm{diag}(\sigma_{\theta_\ell})), \qquad p(y_i|\lambda,\theta,x_i) = \text{LoRA neural network} \quad (6)$$

$$q(\lambda,\theta) = q(\lambda)\,q(\theta|\lambda), \qquad q(\lambda) = \prod_{\ell=1}^{L} q(\lambda_\ell) = \prod_{\ell=1}^{L} \mathcal{N}(\lambda_\ell; \nu_\ell, 1) \quad (7)$$

$$q(\theta|\lambda) = \prod_{\ell=1}^{L} \left[\prod_{r=1}^{D_\ell} q(\theta_{\ell r}) \prod_{r'=D_\ell+1}^{\infty} p(\theta_{\ell r'})\right], \qquad q(\theta_{\ell r}) = \mathcal{N}(\theta_{\ell r}; \rho_{\ell r}, I). \quad (8)$$

Here, $\mu_{\lambda_\ell}, \sigma_{\lambda_\ell}, \sigma_{\theta_\ell}$ represent hyper-parameters controlling our prior assumptions about ideal ranks and ideal values of the LoRA weights, whereas $\nu_\ell, \rho_{\ell r}$ are learnable variational parameters that control the effective LoRA rank and LoRA weights at component $\ell$, respectively. In particular, $D_\ell$ represents the finite rank used for LoRA at component $\ell$, and it is computed as the quantile function of a discretized exponential $f_\ell(x; \nu_\ell) = (1 - e^{-\nu_\ell (x+1)}) - (1 - e^{-\nu_\ell x})$, evaluated at 0.9. In other words, the effective rank $D_\ell$ at component $\ell$ is determined via a continuous parameter $\nu_\ell$ that acts as a proxy for the ideal rank and can be easily learned. The final probabilistic objective reduces to

$$\sum_{\ell=1}^{L} \log \frac{p(\nu_\ell; \mu_{\lambda_\ell}, \sigma_{\lambda_\ell})}{q(\nu_\ell; \nu_\ell)} + \sum_{\ell=1}^{L} \sum_{r=1}^{D_\ell} \log \frac{p(\rho_{\ell r}; \sigma_{\theta_\ell})}{q(\rho_{\ell r}; \rho_{\ell r})} + \sum_{i=1}^{N} \log p(y_i|\nu,\rho,x_i), \quad (9)$$

which is essentially composed of an optional regularization term for the desired rank, an optional regularization over the LoRA weights, and a mandatory loss term associated with the fine-tuning task. This loss can be optimized via standard backpropagation: as $\nu_\ell$ changes, we dynamically recompute the rank of each LoRA component $\ell$, effectively introducing or removing parameters on the fly. This means that, in principle, the model's size can change during training.

3.3 Adaptive Rank LoRA

To learn an effective LoRA rank per LoRA component $\ell$, we must incorporate the discretized exponential $f_\ell(x; \nu_\ell)$ into $\Delta W_\ell$, in a way that reflects how the variational framework of the previous section determines the effective rank $D_\ell$.
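The rank-importance distribution and the quantile rule for the effective rank can be sketched in a few lines of Python. This is our illustration of the rule described above, not the authors' code: the helper names are ours and rank indices are 0-based here.

```python
import math

# Sketch of the discretized exponential f and the effective rank D
# (Sec. 3.2); an illustration under our naming assumptions.

def f(x: int, nu: float) -> float:
    """Mass the discretized exponential assigns to rank index x (0-based)."""
    return (1 - math.exp(-nu * (x + 1))) - (1 - math.exp(-nu * x))

def effective_rank(nu: float, q: float = 0.9) -> int:
    """Quantile of the discretized exponential at q: the smallest D whose
    cumulative mass reaches q, i.e. ceil(-ln(1 - q) / nu)."""
    return max(1, math.ceil(-math.log(1.0 - q) / nu))

# Masses decrease monotonically and impose the ordering of importance;
# a larger nu concentrates mass on the first indices, giving a smaller rank.
nu = 0.05
D = effective_rank(nu)               # 47 for nu = 0.05, q = 0.9
masses = [f(x, nu) for x in range(D)]
```

Note how a single continuous scalar `nu` per component controls a discrete rank, which is what makes the rank learnable by gradient descent on `nu`.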
For this reason, we recall that the role of the discretized exponential is to assign a decreasing ordering of importance to each rank index, meaning that we would like the last columns of $B_\ell$ to be less important than the former ones (or, equivalently, the last rows of $A_\ell$). This way, changes to the first rank indices have a greater effect on performance, while we can safely increase the rank index without impacting $\Delta W_\ell$ too much. For this reason, we formally consider $p(y_i|\nu,\rho,x_i)$ as a generic neural network and construct each LoRA component as follows:

$$\Delta W_\ell = B_\ell \Lambda_\ell A_\ell, \qquad \Lambda_\ell = \mathrm{diag}\left(f(1; \nu_\ell), \ldots, f(D_\ell; \nu_\ell)\right). \quad (10)$$

Fig. 2: LoRA^2 works by dynamically determining an adaptive rank $D_\ell$ for each LoRA component by truncating an exponential distribution $f_\ell(r; \nu_\ell)$, parametrized by a learnable $\nu_\ell$. This makes the rank dependent on the component and the task.

This approach is extremely easy to implement and can grow/shrink dynamically during training; in the case of a growing $D_\ell$, as new rank dimensions are added, we randomly initialize the new weights of $B_\ell$ and $A_\ell$. The approach is visually represented in Fig. 2.

Weight Initialization. The rescaling generated by $\Lambda_\ell$ has an effect on convergence speed, since it affects the gradients. To counteract this effect, we apply a "rescaled" Kaiming initialization; in particular, we initialize the weights of $A_\ell$ from a Gaussian distribution with standard deviation $\sqrt{2} \,/\, \sqrt{\sum_{j=1}^{D_\ell} f_\ell^2(j)}$. Instead, $B_\ell$ is initialized as a zero matrix, following [15].

Implicit Space Search. The main conceptual advantage of LoRA^2 is that it replaces the search over a very large number of different LoRA architectures.
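Eq. (10) and the rescaled initialization can be sketched in NumPy. This is a minimal sketch under assumed toy shapes: `m`, `n`, `nu`, and `D` are illustrative values (in the method, $D_\ell$ comes from the quantile rule), not the released implementation.

```python
import numpy as np

# Sketch of Eq. (10): dW = B @ diag(f(1; nu), ..., f(D; nu)) @ A,
# with the "rescaled" Kaiming-style initialization described above.
rng = np.random.default_rng(0)

def f(x, nu):
    # discretized exponential: importance mass at rank index x
    return np.exp(-nu * x) - np.exp(-nu * (x + 1))

m, n, nu, D = 64, 32, 0.3, 8               # toy shapes; D via the quantile rule
importance = f(np.arange(1, D + 1), nu)    # decreasing weight per rank index

# Rescaled init: the std of A compensates the shrinkage introduced by Lambda.
std_A = np.sqrt(2.0) / np.sqrt(np.sum(importance ** 2))
A = rng.normal(scale=std_A, size=(D, n))
B = np.zeros((m, D))                       # B starts at zero, as in LoRA

dW = B @ np.diag(importance) @ A           # Eq. (10); zero at initialization

# Growing the rank D -> D+1 appends freshly initialized weights to B and A:
B_grown = np.concatenate([B, rng.normal(size=(m, 1))], axis=1)
A_grown = np.concatenate([A, rng.normal(scale=std_A, size=(1, n))], axis=0)
```

Because the trailing entries of `importance` are small, truncating them perturbs `dW` only slightly, which is exactly what makes dynamic shrinking safe.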
In principle, fine-tuning $S$ subjects while trying $K$ different ranks for a network with $L$ components amounts to training $S K^L$ different architectural configurations, far beyond any practical application even for small values of $K$ and $L$. Instead, continuous optimization of $\nu$ allows us to softly introduce new ranks when needed and truncate those that are no longer necessary, all in a single training run. Therefore, despite the introduction of (optional) regularization hyper-parameters, we argue that our approach makes the search over a huge space of LoRA architectures much more feasible than before.

Training Loss. We fine-tune the LoRA modules using a combination of three losses, which are related in spirit to the ones of Equation 9 in the variational framework. The main reconstruction loss is

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \|\hat{\varepsilon}_i - \varepsilon_i\|^2, \quad (11)$$

where $\hat{\varepsilon}_i$ is the model prediction, $\varepsilon_i$ the target noise, and $N$ the batch size. We regularize the adaptive LoRA rank rates to remain close to a target:

$$\mathcal{L}_{\mathrm{reg}} = \sum_{\ell \in [1,\ldots,L]} |\nu_\ell - \nu_{\mathrm{target}}|, \qquad \nu_{\mathrm{target}} = \frac{-\log(1-q)}{r_{\mathrm{target}}}, \quad (12)$$

with $q$ being the quantile and $r_{\mathrm{target}}$ the rank we would like to push the LoRA components towards. To encourage more selective and confident cross-token alignments, we minimize the entropy of the cross-attention maps:

$$\mathcal{L}_{\mathrm{entropy}} = -\frac{1}{|C|} \sum_{\ell \in C} \mathbb{E}_{p_\ell}[\log p_\ell], \quad (13)$$

where $C$ denotes the set of components over which the cross-attention is computed, and $p_\ell$ represents the softmax-normalized attention map at component $\ell$. The overall loss, therefore, can be written as:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{MSE}} + \lambda_r \mathcal{L}_{\mathrm{reg}} + \lambda_e \mathcal{L}_{\mathrm{entropy}}, \quad (14)$$

with $\lambda_r$ and $\lambda_e$ weighting factors.

4 Experiments

We use SDXL [25] and KOALA-700m [18] as backbones for our experiments. On SDXL, we use 50 inference steps [29, 30]; on KOALA-700m, 25 [6]. To learn personalized subjects, we employ LoRA fine-tuning using the DreamBooth protocol [28]. Our experiments are conducted on a set of 30 subjects sourced from [28].
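Returning to the training objective of Section 3.3 (Eqs. 11-14), here is a toy NumPy sketch. The tensor shapes, the uniform attention map, and all numeric values are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sketch of the combined training objective of Eqs. (11)-(14).
rng = np.random.default_rng(0)

def total_loss(eps_hat, eps, nus, attn_maps, r_target, q=0.9,
               lam_r=1e-4, lam_e=1e-4):
    # Eq. (11): mean squared error between predicted and target noise
    l_mse = np.mean(np.sum((eps_hat - eps) ** 2, axis=1))
    # Eq. (12): pull each nu_l toward the value implying rank r_target
    nu_target = -np.log(1.0 - q) / r_target
    l_reg = np.sum(np.abs(nus - nu_target))
    # Eq. (13): mean entropy of the softmax-normalized attention maps
    l_ent = np.mean([-(p * np.log(p + 1e-12)).sum() for p in attn_maps])
    # Eq. (14): weighted combination
    return l_mse + lam_r * l_reg + lam_e * l_ent

eps = rng.normal(size=(4, 16))                   # target noise, batch of 4
eps_hat = eps + 0.1 * rng.normal(size=(4, 16))   # imperfect prediction
nus = np.full(10, 0.1)                           # one nu per LoRA component
p_uniform = np.full(8, 1.0 / 8)                  # toy cross-attention map
loss = total_loss(eps_hat, eps, nus, [p_uniform], r_target=64)
```

With small $\lambda_r$ and $\lambda_e$ (the paper uses $10^{-4}$ for both), the MSE term dominates and the other two terms gently steer the ranks and attention maps.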
We select one random subject (vase) for hyper-parameter tuning, and then test on the remaining 29 subjects. For each subject, we explore LoRA models of different capacities, with ranks in {8, 16, 32, 64, 128, 256, 512}. In LoRA^2 experiments, the hyper-parameter tuning process selected 500 training steps for SDXL and 800 steps for KOALA. We fixed the learning rate of the Adam optimizer to 5e-5 and fixed the weights $\lambda_r = \lambda_e = 10^{-4}$. For LoRA, we use 1000 training steps as in [29, 30]. For each subject, we collect 10 prompts (please refer to the supplementary material) and then generate 5 images per prompt. We then compute the DINO, CLIP-I, and CLIP-T scores, comparing the features of each generated image with the features of the original subject image or the features of the prompt. To aggregate the scores, we average the score of each subject across each generation in a prompt, and then across all prompts. In this way, we obtain a single score for each subject, and we average these across all subjects.

5 Results

Fig. 3: Images generated using the SDXL backbone for the "clock" subject across five prompts (e.g., "a k clock next to a cup of coffee on a kitchen counter", "a k clock in the snow under warm sunlight") at ranks 8, 64, 512, and with LoRA^2. The original subject is shown on the top left.

5.1 Qualitative Results

Figures 3 and 4 show images generated with fine-tuned SDXL and KOALA-700m backbones, respectively. The generated images confirm that low ranks are unable to faithfully reproduce the subject: both the yellow clock and the backpack are often generated with the wrong color at ranks 8 and 64. At rank 512, LoRA fine-tuning struggles to follow the finer details of the prompt, such as ignoring the requested background. For the clock, rank 512 remains suboptimal for faithful reconstruction, with LoRA^2 being the only approach to fully reproduce the content at high fidelity. Notably, the numeral "3" on the clock face is preserved exclusively in our result; rank 512 fails to render it in both the second and fifth prompts. The same observation applies to the backpack: the eye patch on the right side is missing in the first and fourth prompts (as is the tongue). This suggests that subject fidelity does not necessarily improve with higher rank, likely because the model tends to overfit the background instead. Per-class scores are provided in Fig. 7. Finally, in some cases, the subject is not properly integrated with the background, exhibiting incorrect shadows or appearing to float above the ground. In contrast, images generated by LoRA^2 remain consistent with both the subject and the prompt.

Fig. 4: Images generated using the KOALA-700m backbone for the subject "backpack dog" across five prompts (e.g., "a k backpack on a cobblestone street after rain", "a k backpack surrounded by neon lights") at ranks 8, 64, 512, and with LoRA^2. The original subject is shown on the top left.

5.2 Aggregated Results

To quantitatively evaluate subject and prompt alignment in generated images, we use DINO, CLIP-I, and CLIP-T scores [9, 28]. Figures 5 and 6 report the average scores as a function of memory occupation for each trained model.
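The aggregation protocol described in the experimental setup (mean over generations per prompt, then over prompts, then over subjects) can be sketched as follows; the nesting of the toy dictionary, its numbers, and the helper name are our assumptions.

```python
from statistics import mean

# Sketch of the score aggregation: scores[subject] is a list of prompts,
# each a list of per-image scores (in the paper: 10 prompts, 5 images each).

def aggregate(scores):
    per_subject = []
    for prompts in scores.values():
        # mean over generations within each prompt, then over prompts
        per_subject.append(mean(mean(imgs) for imgs in prompts))
    # finally, mean across subjects gives one number per metric
    return mean(per_subject)

toy = {
    "clock":    [[0.6, 0.7], [0.8, 0.8]],
    "backpack": [[0.5, 0.5], [0.9, 0.7]],
}
score = aggregate(toy)  # one aggregated score (e.g., for DINO)
```

Averaging per prompt before averaging per subject keeps a prompt with many generations from dominating a subject's score, and keeps any one subject from dominating the final average.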
Fig. 5: SDXL backbone. Aggregated results (average over all subjects): DINO, CLIP-I, and CLIP-T scores as a function of file size, for fixed ranks 8-512 and adaptive LoRA^2.

Fig. 6: KOALA-700m backbone. Aggregated results (average over all subjects), in the same format as Fig. 5.

Standard LoRA models exhibit a clear trend when trained with different ranks: increasing the rank improves subject fidelity (higher DINO and CLIP-I) and decreases text alignment (lower CLIP-T). Low-rank models fail to consistently reproduce the target subject, frequently omitting distinctive attributes (e.g., incorrect colors or textures). High-rank models generate a stable and recognizable subject, but the surrounding scene and attributes increasingly deviate from the textual description. This indicates a trade-off between subject consistency and text alignment as model capacity during fine-tuning grows, consistent with previous work [1]. LoRA^2 achieves a more favorable trade-off between these objectives.

5.3 Per-Subject Performance

To empirically support the need for adaptive ranks, we computed per-subject scores showing that there is no single rank that fits all.
Figure 7 shows per-subject scores for SDXL, while results on KOALA are in the supplementary material. We highlight with a grey band rank 64, the default value commonly used in previous works [8, 20, 27, 29, 30, 32]. We also highlight in red the best value for each subject. First, we notice that rank 64 is never optimal on any of the metrics for SDXL. However, it achieves a good trade-off considering subject alignment, text alignment, and model size. The best models on DINO and CLIP-I scores are either the high-rank models or our LoRA^2. Text alignment, instead, is consistently best at lower ranks. Our LoRA^2 has a model size comparable to fixed rank 64. However, compared to the rank 64 baseline, our method achieves much higher DINO and CLIP-I scores, at the price of slightly lower CLIP-T. Compared to the rank 512 model, LoRA^2 achieves similar scores with a much lower memory occupation (0.40 GB for LoRA^2 against 2.80 GB for rank 512). In conclusion, we observe that with fixed ranks it is not possible to find an optimal solution for all subjects, whereas LoRA^2 provides better control by tuning the regularization hyper-parameters, which is more efficient than testing a huge number of configurations (as discussed in Section 3.3).
[Fig. 7: per-subject DINO, CLIP-I, and CLIP-T heatmaps for LoRA ranks 8–512 and LoRA² on the SDXL backbone. Only the AVG column and model sizes are reproduced below.]

Averages across the 29 subjects (SDXL):

Rank    Size (GB)   DINO   CLIP-I   CLIP-T
8       0.04        0.60   0.72     0.34
16      0.09        0.62   0.74     0.34
32      0.17        0.63   0.74     0.34
64      0.35        0.66   0.76     0.33
128     0.69        0.68   0.77     0.31
256     1.40        0.69   0.78     0.31
512     2.80        0.70   0.78     0.31
LoRA²   0.40        0.69   0.77     0.31

Fig. 7: SDXL backbone, per-subject scores. We highlight with a grey band rank 64, the default value commonly used in previous work. We also highlight in red the best value for each subject. On the side, we also report the model size in GB.

[Fig. 8: per-block rank plots for Self-Attention (attn1) Q and V and Cross-Attention (attn2) Q and V, with attention blocks ordered down→mid→up, for the subjects Cat 2, Dog 8, Can, Robot Toy, and Teapot.]

Fig. 8: SDXL Self-Attention and Cross-Attention ranks, for five distinct subjects.

5.4 LoRA Rank Analysis

One of the goals of LoRA² is to let the fine-tuning procedure detect LoRA components that do not need adaptation, lowering their rank, while allocating higher capacity where necessary. To demonstrate that LoRA² learns an ad-hoc solution for each subject, Figure 8 shows the ranks of the self-attention and cross-attention layers (Query and Value matrices) for 5 randomly selected subjects: "Cat 2", "Dog 8", "Can", "Robot Toy", and "Teapot". The figure reports results for SDXL and is limited to the Query and Value matrices; we report the full plots in the supplementary material. First, we notice that self-attention and cross-attention behave differently: cross-attention has a higher prevalence of max-rank (512) LoRAs, while self-attention layers tend toward lower ranks. A large number of components collapse to rank 1, confirming the ability of LoRA² to save memory by reducing the rank of unnecessary components. We also notice that different subjects share most of the ranks but still show some differences, meaning LoRA² adapts to each subject even though subjects may share similarities. Overall, LoRA² exhibits a high degree of diversity across layers and moderate diversity across subjects and layer types, which is what we would expect from an adaptive rank method.
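The collapse behavior described above can be made concrete with a small sketch. Assuming each LoRA component exposes a vector of non-negative importance scores over its rank indices (the learned ordering of importance), the effective rank and the number of collapsed components can be read off directly. The names and the thresholding rule below are illustrative, not the paper's implementation.

```python
# Illustrative sketch (not the authors' code): each LoRA component is
# assumed to expose non-negative importance scores over its rank indices.
def effective_rank(importance, eps=1e-3):
    """Count rank indices whose importance exceeds eps (at least 1)."""
    return max(1, sum(1 for w in importance if w > eps))

def summarize_ranks(components, eps=1e-3):
    """Per-component effective rank, plus how many collapsed to rank 1."""
    ranks = {name: effective_rank(w, eps) for name, w in components.items()}
    collapsed = sum(1 for r in ranks.values() if r == 1)
    return ranks, collapsed

# Toy example: a cross-attention V projection keeps a higher rank,
# while a self-attention Q projection collapses to rank 1.
components = {
    "attn2.to_v": [0.9, 0.8, 0.5, 0.2, 0.0001],
    "attn1.to_q": [0.4, 0.0005, 0.0, 0.0, 0.0],
}
ranks, collapsed = summarize_ranks(components)
# → ({'attn2.to_v': 4, 'attn1.to_q': 1}, 1)
```

A summary of this kind is what Figure 8 visualizes per attention block.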
5.5 Ablation

The MSE loss is a good proxy for subject fidelity (DINO and CLIP-I scores). Therefore, LoRA² adds a regularization loss on the ranks and an additional entropy loss to better control the subject-text-memory trade-off. Figure 9 shows the file size of different configurations of LoRA² for each subject, while Table 1 shows the aggregated file size and image scores. Removing the rank regularization increases the average file size from 406 MB to 2.7 GB. This is a consequence of the MSE loss and its strong bias towards subject fidelity; as a result, the model obtains marginally better DINO and CLIP-I scores. Removing the entropy loss while keeping the rank regularization results in a file size similar to the full LoRA². However, the model trained with entropy regularization reaches a higher CLIP-T. The full LoRA², with both regularization losses, is needed to obtain a good trade-off between subject fidelity, textual alignment, and model size.

[Fig. 9: per-subject file size (MB) for LoRA² and its ablations (λe = 0; λr = 0; λe = λr = 0) on the SDXL backbone.]

Fig. 9: File size of LoRA² for SDXL.

Table 1: Ablation of regularization losses; the scores are averaged across all subjects.
Backbone     Method                DINO    CLIP-I   CLIP-T   File size
SDXL         LoRA²                 0.689   0.773    0.313    406 MB
SDXL         LoRA² (λe = 0)        0.696   0.780    0.303    410 MB
SDXL         LoRA² (λr = λe = 0)   0.699   0.782    0.299    2.7 GB
KOALA-700m   LoRA²                 0.680   0.760    0.308    158 MB
KOALA-700m   LoRA² (λe = 0)        0.681   0.762    0.304    160 MB
KOALA-700m   LoRA² (λr = λe = 0)   0.693   0.768    0.302    734 MB

6 Conclusions

We introduced LoRA², an easy-to-implement, fully differentiable, and model-agnostic modification of LoRA that learns a proper rank for each LoRA component in deep learning models for personalized image generation. LoRA² encourages an ordering of importance across rank indices, allowing us to dynamically introduce or reduce the rank of each LoRA component depending on the specific subject at hand. Thanks to this approach, we need neither to manually select the rank for each LoRA component, which would have a combinatorial cost, nor to fix the same rank for all components, which we empirically show is not the best strategy. Across 29 subjects, LoRA² achieves a very good trade-off between DINO, CLIP-I, and CLIP-T scores while requiring lower memory consumption. In the future, we will investigate the role of adaptive rank learning in the multi-subject and model-merging settings and its performance on larger diffusion models.

Acknowledgments

This paper has been partially supported by the CoEvolution project, funded by EU Horizon 2020 under GA n. 101168559. We acknowledge ISCRA for awarding this project access to the LEONARDO supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CINECA (Italy).

References

1. Biderman, D., Portes, J., Ortiz, J.J.G., Paul, M., Greengard, P., Jennings, C., King, D., Havens, S., Chiley, V., Frankle, J., Blakeney, C., Cunningham, J.P.: LoRA learns less and forgets less. Transactions on Machine Learning Research (TMLR) (2024), https://openreview.net/forum?id=aloEru2qCG
2.
Blalock, D., Gonzalez Ortiz, J.J., Frankle, J., Guttag, J.: What is the state of neural network pruning? Proceedings of Machine Learning and Systems (MLSys) 2, 129–146 (2020), https://proceedings.mlsys.org/paper_files/paper/2020/file/6c44dc73014d66ba49b28d483a8f8b0d-Paper.pdf
3. Civitai: The Home of Open-Source Generative AI. https://civitai.com/ (2025), accessed: November 2025
4. Dufort-Labbé, S., D'Oro, P., Nikishin, E., Rish, I., Bacon, P.L., Pascanu, R., Baratin, A.: Maxwell's demon at work: Efficient pruning by leveraging saturation of neurons. Transactions on Machine Learning Research (TMLR) (2025), https://openreview.net/forum?id=nmBleuFzaN
5. Errica, F., Christiansen, H., Zaverkin, V., Niepert, M., Alesiani, F.: Adaptive width neural networks. In: Proceedings of the 14th International Conference on Learning Representations (ICLR) (2026), https://openreview.net/forum?id=p6Ek7Qg577
6. ETRI VILAB: Koala-700m. https://huggingface.co/etri-vilab/koala-700m (2023), Hugging Face model repository, accessed: 2026-03-05
7. Fahlman, S., Lebiere, C.: The cascade-correlation learning architecture. In: Proceedings of the 3rd International Conference on Neural Information Processing Systems (NIPS) (1989), https://proceedings.neurips.cc/paper/1989/file/69adc1e107f7f7d035d7baf04342e1ca-Paper.pdf
8. Frenkel, Y., Vinker, Y., Shamir, A., Cohen-Or, D.: Implicit style-content separation using B-LoRA. In: European Conference on Computer Vision (ECCV). p. 181–198. Springer (2024), https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/1549_ECCV_2024_paper.php
9. Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A.H., Chechik, G., Cohen-Or, D.: An image is worth one word: Personalizing text-to-image generation using textual inversion. In: Proceedings of the 11th International Conference on Learning Representations (ICLR) (2023), https://openreview.net/forum?id=NAQvF08TcyG
10.
Gou, J., Yu, B., Maybank, S.J., Tao, D.: Knowledge distillation: A survey. International Journal of Computer Vision (IJCV) 129(6), 1789–1819 (2021), https://link.springer.com/article/10.1007/s11263-021-01453-z
11. Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient DNNs. Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS) 29 (2016), https://proceedings.neurips.cc/paper/2016/file/2823f4797102ce1a1aec05359c16d9-Paper.pdf
12. He, Y., Kang, G., Dong, X., Fu, Y., Yang, Y.: Soft filter pruning for accelerating deep convolutional neural networks. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI) (2018), https://dl.acm.org/doi/10.5555/3304889.3304970
13. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint (2015), https://arxiv.org/abs/1503.02531
14. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS) 33, 6840–6851 (2020), https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf
15. Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: Proceedings of the 10th International Conference on Learning Representations (ICLR) (2022), https://openreview.net/forum?id=nZeVKeeFYf9
16. Hugging Face – The AI community building the future. https://huggingface.co/ (2025), accessed: November 2025
17. Kumari, N., Zhang, B., Zhang, R., Shechtman, E., Zhu, J.Y.: Multi-concept customization of text-to-image diffusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). p.
1931–1941 (2023), https://openaccess.thecvf.com/content/CVPR2023/papers/Kumari_Multi-Concept_Customization_of_Text-to-Image_Diffusion_CVPR_2023_paper.pdf
18. Lee, Y., Park, K., Cho, Y., Lee, Y.J., Hwang, S.J.: KOALA: Empirical lessons toward memory-efficient and fast diffusion models for text-to-image synthesis. Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS) 37, 51597–51633 (2024), https://proceedings.neurips.cc/paper_files/paper/2024/file/5c4e0e38691e2a08bba4cefc4c6e852-Paper-Conference.pdf
19. Li, S., Hu, T., van de Weijer, J., Khan, F.S., Liu, T., Li, L., Yang, S., Wang, Y., Cheng, M.M., Yang, J.: Faster diffusion: Rethinking the role of the encoder for diffusion model inference. Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS) 37, 85203–85240 (2024), https://proceedings.neurips.cc/paper_files/paper/2024/file/9ad996b5c45130de2bc00b60d8607904-Paper-Conference.pdf
20. Liu, C., Shah, V., Cui, A., Lazebnik, S.: UnZipLoRA: Separating content and style from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). p. 16776–16785 (2025), https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_UnZipLoRA_Separating_Content_and_Style_from_a_Single_Image_ICCV_2025_paper.pdf
21. Liu, S.Y., Wang, C.Y., Yin, H., Molchanov, P., Wang, Y.C.F., Cheng, K.T., Chen, M.H.: DoRA: Weight-decomposed low-rank adaptation. In: Proceedings of the 41st International Conference on Machine Learning (ICML) (2024), https://proceedings.mlr.press/v235/liu24bn.html
22. Meral, T.H.S., Simsar, E., Tombari, F., Yanardag, P.: Contrastive test-time composition of multiple LoRA models for image generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). p.
18090–18100 (2025), https://openaccess.thecvf.com/content/ICCV2025/papers/Meral_Contrastive_Test-Time_Composition_of_Multiple_LoRA_Models_for_Image_Generation_ICCV_2025_paper.pdf
23. Mishra, A., Latorre, J.A., Pool, J., Stosic, D., Stosic, D., Venkatesh, G., Yu, C., Micikevicius, P.: Accelerating sparse deep neural networks. arXiv preprint (2021), https://arxiv.org/abs/2104.08378
24. Mitchell, R., Mundt, M., Kersting, K.: Self expanding neural networks. arXiv preprint (2023), https://arxiv.org/abs/2307.04526
25. Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: SDXL: Improving latent diffusion models for high-resolution image synthesis. In: Proceedings of the 12th International Conference on Learning Representations (ICLR) (2024), https://openreview.net/forum?id=di52zR8xgf
26. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). p. 10684–10695 (2022), https://ieeexplore.ieee.org/document/9878449
27. Roy, A., Borse, S., Kadambi, S., Das, D., Mahajan, S., Garrepalli, R., Park, H., Nayak, A., Chellappa, R., Hayat, M., Porikli, F.: DuoLoRA: Cycle-consistent and rank-disentangled content-style personalization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). p. 15395–15404 (October 2025), https://openaccess.thecvf.com/content/ICCV2025/papers/Roy_DuoLoRA__Cycle-consistent_and_Rank-disentangled_Content-Style_Personalization_ICCV_2025_paper.pdf
28. Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., Aberman, K.: DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). p.
22500–22510 (2023), https://ieeexplore.ieee.org/document/10204880
29. Shah, V., Ruiz, N., Cole, F., Lu, E., Lazebnik, S., Li, Y., Jampani, V.: ZipLoRA: Any subject in any style by effectively merging LoRAs. In: European Conference on Computer Vision (ECCV). p. 422–438. Springer (2024), https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/148_ECCV_2024_paper.php
30. Shenaj, D., Bohdal, O., Ozay, M., Zanuttigh, P., Michieli, U.: LoRA.rar: Learning to merge LoRAs via hypernetworks for subject-style conditioned image generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). p. 16132–16142 (2025), https://openaccess.thecvf.com/content/ICCV2025/papers/Shenaj_LoRA.rar_Learning_to_Merge_LoRAs_via_Hypernetworks_for_Subject-Style_Conditioned_ICCV_2025_paper.pdf
31. Shinwari, H.U.K., Usama, M.: ARD-LoRA: Dynamic rank allocation for parameter-efficient fine-tuning of foundation models with heterogeneous adaptation needs. arXiv preprint (2025), https://arxiv.org/abs/2506.18267
32. Soboleva, V., Alanov, A., Kuznetsov, A., Sobolev, K.: T-LoRA: Single image diffusion model customization without overfitting. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). vol. 40, p. 9051–9059 (2026), https://ojs.aaai.org/index.php/AAAI/article/view/37861
33. Sohn, K., Jiang, L., Barber, J., Lee, K., Ruiz, N., Krishnan, D., Chang, H., Li, Y., Essa, I., Rubinstein, M., Hao, Y., Entis, G., Blok, I., Castro Chin, D.: StyleDrop: Text-to-image synthesis of any style. In: Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS). vol. 36, p. 66860–66889 (2023), https://proceedings.neurips.cc/paper_files/paper/2023/file/d33b177b69425e7685b0b1c05bd2a5e4-Paper-Conference.pdf
34. Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models.
In: Proceedings of the 9th International Conference on Learning Representations (ICLR) (2021), https://openreview.net/forum?id=St1giarCHLP
35. Valerio, L., Nardini, F.M., Passarella, A., Perego, R.: Dynamic hard pruning of neural networks at the edge of the internet. Journal of Network and Computer Applications 200, 103330 (2022), https://www.sciencedirect.com/science/article/pii/S1084804521003155
36. Wu, L., Liu, B., Stone, P., Liu, Q.: Firefly neural architecture descent: a general approach for growing neural networks. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS). vol. 33 (2020), https://proceedings.neurips.cc/paper/2020/file/fdbe012e2e11314b96402b32c0df26b7-Paper.pdf
37. Wu, L., Wang, D., Liu, Q.: Splitting steepest descent for growing neural architectures. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS) (2019), https://proceedings.neurips.cc/paper_files/paper/2019/file/3a01fc0853ebeba94fde4d1c6fb842a-Paper.pdf
38. Yoon, J., Yang, E., Lee, J., Hwang, S.J.: Lifelong learning with dynamically expandable networks. In: Proceedings of the 6th International Conference on Learning Representations (ICLR) (2018), https://openreview.net/forum?id=Sk7KsfW0-
39. Zhang, Q., Chen, M., Bukharin, A., He, P., Cheng, Y., Chen, W., Zhao, T.: Adaptive budget allocation for parameter-efficient fine-tuning. In: Proceedings of the 11th International Conference on Learning Representations (ICLR) (2023), https://openreview.net/forum?id=lq62uWRJjiY

Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation
Supplementary Material

A1 Additional Implementation Details

All models were trained with a resolution of 1024 × 1024, a batch size of 1, and a learning rate of 5 × 10⁻⁵. We used mixed precision training (fp16), gradient checkpointing, and 8-bit Adam optimization. Experiments were conducted on NVIDIA Ampere A100 GPUs (64GB RAM).
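For quick reference, the hyperparameters above can be collected into a single configuration object. This is a convenience sketch, not the authors' training script: the field names are ours, and in practice the 8-bit Adam optimizer would come from a library such as bitsandbytes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Training settings reported in A1 (field names are ours)."""
    resolution: int = 1024          # images are 1024 x 1024
    batch_size: int = 1
    learning_rate: float = 5e-5
    mixed_precision: str = "fp16"   # fp16 mixed precision training
    gradient_checkpointing: bool = True
    optimizer: str = "adam_8bit"    # 8-bit Adam (e.g., via bitsandbytes)

cfg = TrainConfig()
```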
A2 Prompts

Table A1: Full prompts used for evaluation

backpack:
- a <c> backpack on a wooden shelf surrounded by books
- a modern minimalistic <c> backpack on a white surface
- a <c> backpack in the snow under warm sunlight
- a <c> backpack on a cobblestone street after rain
- a vintage <c> backpack on an antique table
- a <c> backpack placed on pink silk fabric
- a <c> backpack on a mossy rock in a forest
- a glowing <c> backpack in the dark
- a <c> backpack on a glass table with reflections
- a <c> backpack on a sandy beach at sunset

backpack_dog:
- a <c> backpack on a cobblestone street after rain
- a <c> backpack with a city skyline in the background
- a <c> backpack in the snow under warm sunlight
- a <c> backpack surrounded by neon lights
- a vintage <c> backpack on an antique table
- a <c> backpack on a glass table with reflections
- a <c> backpack on a wooden shelf surrounded by books
- a <c> backpack with mountains and mist in the background
- a <c> backpack floating in crystal clear water
- a <c> backpack placed on pink silk fabric

bear_plushie:
- a <c> stuffed animal in the jungle
- a wet <c> stuffed animal
- a <c> stuffed animal in the snow
- a <c> stuffed animal in a chef outfit
- a <c> stuffed animal in a police uniform
- a <c> stuffed animal wearing a rainbow scarf
- a <c> stuffed animal in a city park surrounded by flowers
- a <c> stuffed animal wearing a black top hat and a monocle
- a <c> stuffed animal in a forest clearing with sunlight rays
- a <c> stuffed animal with the Eiffel Tower in the background

berry_bowl:
- a <c> bowl in the snow under warm sunlight
- a <c> bowl on a cobblestone street after rain
- a vintage <c> bowl on an antique table
- a <c> bowl with a city skyline in the background
- a modern minimalistic <c> bowl on a white surface
- a <c> bowl on a glass table with reflections
- a <c> bowl in a minimalist art gallery
- a <c> bowl on a sandy beach at sunset
- a glowing <c> bowl in the dark
- a <c> bowl floating in crystal clear water

can:
- a glowing <c> can in the dark
- a <c> can on a mossy rock in a forest
- a <c> can with mountains and mist in the background
- a <c> can on a wooden shelf surrounded by books
- a <c> can placed on pink silk fabric
- a <c> can on a sandy beach at sunset
- a vintage <c> can on an antique table
- a <c> can in the snow under warm sunlight
- a modern minimalistic <c> can on a white surface
- a <c> can on a marble table, studio lighting

candle:
- a <c> candle on a cobblestone street after rain
- a <c> candle on a sandy beach at sunset
- a <c> candle on a reflective mirror surface
- a <c> candle placed on pink silk fabric
- a <c> candle with a city skyline in the background
- a <c> candle in a minimalist art gallery
- a <c> candle in the snow under warm sunlight
- a <c> candle next to a cup of coffee on a kitchen counter
- a glowing <c> candle in the dark
- a <c> candle on a wooden shelf surrounded by books

cat:
- a <c> cat in a forest clearing with sunlight rays
- a <c> cat in a police uniform
- a <c> cat in the jungle
- a <c> cat in a chef outfit
- a <c> cat on the beach during sunset
- a <c> cat in the snow
- a <c> cat wearing a rainbow scarf
- a <c> cat driving a tiny car
- a shiny <c> cat
- a <c> cat with the Eiffel Tower in the background

cat2:
- a <c> cat with the Eiffel Tower in the background
- a <c> cat in the jungle
- a shiny <c> cat
- a <c> cat on the beach during sunset
- a <c> cat in a chef outfit
- a <c> cat in a city park surrounded by flowers
- a <c> cat floating in outer space
- a <c> cat wearing a rainbow scarf
- a <c> cat in a forest clearing with sunlight rays
- a <c> cat sitting on a red couch indoors

clock:
- a <c> clock surrounded by neon lights
- a <c> clock next to a cup of coffee on a kitchen counter
- a <c> clock on a reflective mirror surface
- a <c> clock on a glass table with reflections
- a <c> clock on a cobblestone street after rain
- a <c> clock placed on pink silk fabric
- a <c> clock on a marble table, studio lighting
- a <c> clock on a mossy rock in a forest
- a <c> clock with a city skyline in the background
- a <c> clock in the snow under warm sunlight

colorful_sneaker:
- a <c> sneaker with a city skyline in the background
- a <c> sneaker placed on pink silk fabric
- a <c> sneaker on a glass table with reflections
- a <c> sneaker surrounded by neon lights
- a <c> sneaker in a minimalist art gallery
- a <c> sneaker on a marble table, studio lighting
- a modern minimalistic <c> sneaker on a white surface
- a <c> sneaker on a reflective mirror surface
- a <c> sneaker on a mossy rock in a forest
- a <c> sneaker with mountains and mist in the background

dog:
- a <c> dog with mountains in the background
- a cube-shaped <c> dog
- a <c> dog wearing a black top hat and a monocle
- a <c> dog in a chef outfit
- a <c> dog in the jungle
- a <c> dog in a city park surrounded by flowers
- a <c> dog floating in outer space
- a <c> dog with the Eiffel Tower in the background
- a <c> dog wearing sunglasses
- a wet <c> dog

dog2:
- a <c> dog in the snow
- a <c> dog wearing a black top hat and a monocle
- a <c> dog in a chef outfit
- a <c> dog sitting on a red couch indoors
- a <c> dog in a forest clearing with sunlight rays
- a <c> dog in a city park surrounded by flowers
- a <c> dog on the beach during sunset
- a <c> dog with the Eiffel Tower in the background
- a <c> dog floating in outer space
- a <c> dog driving a tiny car

dog3:
- a cube-shaped <c> dog
- a <c> dog in the jungle
- a <c> dog in a wizard robe holding a staff
- a <c> dog wearing a rainbow scarf
- a <c> dog wearing sunglasses
- a <c> dog in a police uniform
- a <c> dog in the snow
- a <c> dog sitting on a red couch indoors
- a <c> dog in a forest clearing with sunlight rays
- a <c> dog in a chef outfit

dog5:
- a <c> dog wearing a red hat
- a shiny <c> dog
- a <c> dog wearing a black top hat and a monocle
- a <c> dog in a chef outfit
- a <c> dog floating in outer space
- a <c> dog with mountains in the background
- a <c> dog in a forest clearing with sunlight rays
- a wet <c> dog
- a <c> dog in a wizard robe holding a staff
- a <c> dog in the snow

dog6:
- a wet <c> dog
- a shiny <c> dog
- a <c> dog driving a tiny car
- a <c> dog wearing a red hat
- a <c> dog with mountains in the background
- a <c> dog in a forest clearing with sunlight rays
- a <c> dog in the jungle
- a <c> dog in a police uniform
- a cube-shaped <c> dog
- a <c> dog floating in outer space

dog7:
- a <c> dog in the snow
- a <c> dog wearing a black top hat and a monocle
- a <c> dog in a chef outfit
- a <c> dog wearing a red hat
- a <c> dog on the beach during sunset
- a <c> dog wearing a rainbow scarf
- a <c> dog with the Eiffel Tower in the background
- a <c> dog in the jungle
- a <c> dog wearing sunglasses
- a <c> dog in a forest clearing with sunlight rays

dog8:
- a shiny <c> dog
- a <c> dog in a city park surrounded by flowers
- a <c> dog in a wizard robe holding a staff
- a <c> dog wearing sunglasses
- a <c> dog wearing a red hat
- a <c> dog in a forest clearing with sunlight rays
- a <c> dog wearing a black top hat and a monocle
- a wet <c> dog
- a <c> dog on the beach during sunset
- a <c> dog floating in outer space

duck_toy:
- a <c> toy sitting on a red couch indoors
- a <c> toy on the beach during sunset
- a <c> toy in a police uniform
- a <c> toy with mountains in the background
- a <c> toy floating in outer space
- a <c> toy wearing a red hat
- a shiny <c> toy
- a <c> toy in a forest clearing with sunlight rays
- a <c> toy wearing a black top hat and a monocle
- a wet <c> toy

fancy_boot:
- a <c> boot floating in crystal clear water
- a <c> boot with a city skyline in the background
- a <c> boot on a cobblestone street after rain
- a <c> boot placed on pink silk fabric
- a vintage <c> boot on an antique table
- a <c> boot on a sandy beach at sunset
- a <c> boot on a marble table, studio lighting
- a <c> boot on a mossy rock in a forest
- a glowing <c> boot in the dark
- a <c> boot on a wooden shelf surrounded by books

grey_sloth_plushie:
- a <c> stuffed animal in the snow
- a <c> stuffed animal floating in outer space
- a <c> stuffed animal sitting on a red couch indoors
- a <c> stuffed animal driving a tiny car
- a shiny <c> stuffed animal
- a wet <c> stuffed animal
- a <c> stuffed animal in a forest clearing with sunlight rays
- a <c> stuffed animal with mountains in the background
- a <c> stuffed animal on the beach during sunset
- a cube-shaped <c> stuffed animal

monster_toy:
- a <c> toy in a wizard robe holding a staff
- a <c> toy on the beach during sunset
- a shiny <c> toy
- a <c> toy wearing a black top hat and a monocle
- a cube-shaped <c> toy
- a <c> toy sitting on a red couch indoors
- a <c> toy in a city park surrounded by flowers
- a <c> toy driving a tiny car
- a <c> toy wearing a rainbow scarf
- a <c> toy wearing sunglasses

pink_sunglasses:
- a <c> glasses next to a cup of coffee on a kitchen counter
- a <c> glasses on a wooden shelf surrounded by books
- a vintage <c> glasses on an antique table
- a <c> glasses with a city skyline in the background
- a <c> glasses with mountains and mist in the background
- a glowing <c> glasses in the dark
- a <c> glasses on a cobblestone street after rain
- a modern minimalistic <c> glasses on a white surface
- a <c> glasses on a marble table, studio lighting
- a <c> glasses placed on pink silk fabric

poop_emoji:
- a <c> toy with the Eiffel Tower in the background
- a <c> toy in the snow
- a <c> toy driving a tiny car
- a <c> toy on the beach during sunset
- a <c> toy in a wizard robe holding a staff
- a <c> toy wearing a rainbow scarf
- a <c> toy floating in outer space
- a cube-shaped <c> toy
- a <c> toy in a police uniform
- a shiny <c> toy

rc_car:
- a <c> toy wearing sunglasses
- a <c> toy wearing a rainbow scarf
- a shiny <c> toy
- a <c> toy in the jungle
- a <c> toy driving a tiny car
- a <c> toy floating in outer space
- a <c> toy in a police uniform
- a <c> toy in a chef outfit
- a <c> toy wearing a black top hat and a monocle
- a <c> toy in the snow

red_cartoon:
- a shiny <c> cartoon
- a <c> cartoon wearing a black top hat and a monocle
- a wet <c> cartoon
- a <c> cartoon with the Eiffel Tower in the background
- a <c> cartoon sitting on a red couch indoors
- a <c> cartoon on the beach during sunset
- a <c> cartoon floating in outer space
- a <c> cartoon wearing a rainbow scarf
- a <c> cartoon in the jungle
- a <c> cartoon with mountains in the background

robot_toy:
- a <c> toy in a police uniform
- a <c> toy in a chef outfit
- a <c> toy in a forest clearing with sunlight rays
- a <c> toy driving a tiny car
- a <c> toy sitting on a red couch indoors
- a <c> toy on the beach during sunset
- a <c> toy with mountains in the background
- a shiny <c> toy
- a cube-shaped <c> toy
- a <c> toy in a city park surrounded by flowers

shiny_sneaker:
- a <c> sneaker on a glass table with reflections
- a <c> sneaker on a sandy beach at sunset
- a modern minimalistic <c> sneaker on a white surface
- a <c> sneaker on a cobblestone street after rain
- a <c> sneaker in the snow under warm sunlight
- a <c> sneaker on a marble table, studio lighting
- a <c> sneaker with a city skyline in the background
- a vintage <c> sneaker on an antique table
- a <c> sneaker placed on pink silk fabric
- a <c> sneaker in a minimalist art gallery

teapot:
- a modern minimalistic <c> teapot on a white surface
- a glowing <c> teapot in the dark
- a <c> teapot floating in crystal clear water
- a <c> teapot placed on pink silk fabric
- a <c> teapot on a sandy beach at sunset
- a <c> teapot on a mossy rock in a forest
- a <c> teapot with mountains and mist in the background
- a vintage <c> teapot on an antique table
- a <c> teapot on a glass table with reflections
- a <c> teapot next to a cup of coffee on a kitchen counter

vase:
- a <c> vase on a mossy rock in a forest
- a <c> vase next to a cup of coffee on a kitchen counter
- a <c> vase with a city skyline in the background
- a <c> vase on a sandy beach at sunset
- a glowing <c> vase in the dark
- a <c> vase floating in crystal clear water
- a <c> vase on a wooden shelf surrounded by books
- a <c> vase on a reflective mirror surface
- a <c> vase in a minimalist art gallery
- a <c> vase with mountains and mist in the background

wolf_plushie:
- a wet <c> stuffed animal
- a <c> stuffed animal driving a tiny car
- a <c> stuffed animal wearing a black top hat and a monocle
- a <c> stuffed animal wearing a red hat
- a <c> stuffed animal in a chef outfit
- a <c> stuffed animal wearing a rainbow scarf
- a <c> stuffed animal in a city park surrounded by flowers
- a <c> stuffed animal floating in outer space
- a <c> stuffed animal in a forest clearing with sunlight rays
- a <c> stuffed animal on the beach during sunset

A3 Full Self-Attention and Cross-Attention Ranks

[Per-layer rank plots omitted; each panel shows the Q, K, V, and O ranks across attention blocks (ordered down→mid→up) for five subjects: Cat 2, Dog 8, Can, Robot Toy, Teapot.]
(a) SDXL Self-attention ranks, for five distinct subjects.
(b) SDXL Cross-attention ranks, for five distinct subjects.
(a) KOALA-700m Self-attention ranks, for five distinct subjects.
(b) KOALA-700m Cross-attention ranks, for five distinct subjects.

A4 KOALA Per-Class Scores

In Figure A3 we report the per-subject scores for KOALA-700m. As with SDXL in the main paper, we note that the optimal rank changes depending on the subject; here we also observe more variability in which rank is best for each subject.
[Fig. A3 heatmaps: per-subject DINO, CLIP-I (image), and CLIP-T (text) scores for the 29 subjects (backpack, backpack_dog, bear_plushie, berry_bowl, can, candle, cat, cat2, clock, colorful_sneaker, dog, dog2, dog3, dog5, dog6, dog7, dog8, duck_toy, fancy_boot, grey_sloth_plushie, monster_toy, pink_sunglasses, poop_emoji, rc_car, red_cartoon, robot_toy, shiny_sneaker, teapot, wolf_plushie), for LoRA ranks 8–512 and LoRA². Average scores and adapter sizes:]

Rank (size, MB)   DINO   CLIP-I   CLIP-T
8     (13)        0.60   0.71     0.34
16    (25)        0.61   0.72     0.33
32    (50)        0.64   0.73     0.33
64    (100)       0.65   0.75     0.32
128   (200)       0.67   0.76     0.31
256   (400)       0.68   0.77     0.31
512   (799)       0.69   0.77     0.30
LoRA² (144)       0.68   0.76     0.31

Fig. A3: KOALA-700m backbone, per-subject scores. We highlight rank 64, the default value commonly used in previous work, with a grey band, and mark the best value for each subject in red. On the side, we also report the model size in MB.

A5 Additional Qualitative Results

Figures A4 and A5 present additional qualitative comparisons using SDXL for the teapot and can subjects. Notably, our approach is the only method that consistently reproduces the label on the can across all generated images, demonstrating superior fidelity to fine-grained subject details. Figures A6 and A8 showcase complex prompt generations, illustrating that LoRA² generalizes effectively to broader, more challenging generation scenarios beyond simple subject reconstruction, while fixed-rank LoRA (Figures A7 and A9) often fails to recontextualize properly.

A6 Limitations

Our current evaluation of LoRA² focuses on personalized subject learning; extending the approach to style learning remains an interesting direction for future work.

For model merging, a current limitation arises from the fact that LoRA² produces LoRA adapters of different ranks across subjects. To merge two such adapters, the lower-rank LoRA must be expanded to match the rank of the larger one prior to merging. Alternatively, composition-based approaches such as [22] sidestep this issue entirely by combining subjects without requiring explicit adapter merging.

Finally, when generating images with complex prompts, we observe that background colors can occasionally leak into the subject, subtly shifting its appearance. However, this artifact is not unique to LoRA² and manifests across all competing approaches. Despite this, LoRA² consistently produces superior subject fidelity compared to existing methods, even under challenging prompt conditions.
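The rank-expansion step mentioned for model merging can be sketched in a few lines: because a LoRA update is delta_W = B @ A, appending all-zero rank positions leaves the update unchanged, after which the two adapters' factors are shape-compatible. The helper name and the final averaging step are illustrative, not a procedure prescribed by the paper.

```python
import numpy as np

def pad_lora_to_rank(A, B, target_rank):
    """Zero-pad a LoRA adapter (delta_W = B @ A) up to target_rank.

    The extra all-zero rank positions contribute nothing, so B @ A is unchanged.
    """
    r, d_in = A.shape
    d_out, _ = B.shape
    assert target_rank >= r, "can only expand to a larger rank"
    A_pad = np.zeros((target_rank, d_in)); A_pad[:r] = A
    B_pad = np.zeros((d_out, target_rank)); B_pad[:, :r] = B
    return A_pad, B_pad

rng = np.random.default_rng(0)
# Two adapters of different ranks for the same weight matrix (shapes illustrative).
A_lo, B_lo = rng.normal(size=(4, 32)), rng.normal(size=(32, 4))    # rank-4 adapter
A_hi, B_hi = rng.normal(size=(16, 32)), rng.normal(size=(32, 16))  # rank-16 adapter

A_pad, B_pad = pad_lora_to_rank(A_lo, B_lo, 16)
# The factors can now be combined element-wise, e.g. averaged. Note that
# averaging factors is not equivalent to averaging the updates B @ A; it is
# just one simple merging choice among several used in practice.
A_merged = 0.5 * (A_pad + A_hi)
B_merged = 0.5 * (B_pad + B_hi)
```

Since padding preserves the low-rank product exactly, any downstream merging scheme that assumes equal ranks can be applied unchanged.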
[Fig. A4 image grid, columns Rank 8 / Rank 64 / Rank 512 / LoRA². Prompts: "a k can on a mossy rock in a forest"; "a k can with mountains and mist in the background"; "a k can placed on pink silk fabric"; "a k can on the snow under warm sunlight"; "a k can on a sandy beach at sunset".]
Fig. A4: Images generated using the SDXL backbone for the "can" subject. The original subject is shown at the top left.

[Fig. A5 image grid, columns Rank 8 / Rank 64 / Rank 512 / LoRA². Prompts: "a modern minimalistic k teapot on a white surface"; "a glowing k teapot in the dark"; "a k teapot floating in crystal clear water"; "a vintage k teapot on an antique table"; "a k teapot on a glass table with reflections".]
Fig. A5: Images generated using the SDXL backbone for the "teapot" subject. The original subject is shown at the top left.

[Fig. A6 image grid. Prompts: "a k dog racing through an exploding tunnel of colorful paint splashes, motion blur, frozen droplets mid-air, low angle high-speed shot"; "a k dog launching off a snowy mountain peak on a snowboard, massive powder explosion, crisp blue sky, low angle action shot"; "a k dog kayaking through a raging white-water rapid, water exploding around the boat, soaked fur, intense focus, action shot frozen mid-crash"; "a k dog leaping between two glaciers over an icy blue crevasse, paws mid-air, frozen mist, dramatic arctic light, ultra-wide low angle"; "a k dog sitting on the waterfront, Golden Gate Bridge emerging from thick morning fog in the background, soft diffused light filtering through the mist"; "a k dog standing in front of the Colosseum at golden hour, warm amber light on ancient stone, dramatic clouds above, cinematic wide angle".]
Fig. A6: LoRA² generated images of "dog8" across complex scenarios.

[Fig. A7 image grid: same six prompts as Fig. A6.]
Fig. A7: LoRA (rank 512) generated images of "dog8" across complex scenarios do not produce satisfactory results.

[Fig. A8 image grid. Prompts: "a k boot standing on the moon surface, Earth rising on the horizon, ultra-realistic cinematic lighting"; "a k boot on a giant block of ice in an arctic tundra, northern lights glowing green and purple above, cinematic blue tones, photorealistic"; "a k boot in the Sonoran desert, cactus and red rocks behind, blue sky, warm natural light"; "a k boot on a Grand Canyon overlook, vast red canyon stretching behind, golden hour"; "a k boot next to a rubik's cube"; "a cat inside a k boot, soft natural light, cozy home".]
Fig. A8: LoRA² generated images of "fancy boot" across complex scenarios.

[Fig. A9 image grid: same six prompts as Fig. A8.]
Fig. A9: LoRA (rank 512) generated images of "fancy boot" across complex scenarios do not produce satisfactory results.