Paper deep dive
$p^2$RAG: Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval
Yulong Ming, Mingyue Wang, Jijia Yang, Cong Wang, Xiaohua Jia
Abstract
Retrieval-Augmented Generation (RAG) enables large language models to use external knowledge, but outsourcing the RAG service raises privacy concerns for both data owners and users. Privacy-preserving RAG systems address these concerns by performing secure top-$k$ retrieval, which typically relies on secure sorting to identify relevant documents. However, existing systems face challenges supporting arbitrary $k$ due to their inability to change $k$, new security issues, or efficiency degradation with large $k$. This is a significant limitation because modern long-context models generally achieve higher accuracy with larger retrieval sets. We propose $p^2$RAG, a privacy-preserving RAG service that supports arbitrary top-$k$ retrieval. Unlike existing systems, $p^2$RAG avoids sorting candidate documents. Instead, it uses an interactive bisection method to determine the set of top-$k$ documents. For security, $p^2$RAG uses secret sharing on two semi-honest non-colluding servers to protect the data owner's database and the user's prompt. It enforces restrictions and verification to defend against malicious users and tightly bounds the information leakage of the database. The experiments show that $p^2$RAG is 3--300$\times$ faster than the state-of-the-art PRAG for $k = 16$--$1024$.
Tags
Links
- Source: https://arxiv.org/abs/2603.14778v1
- Canonical: https://arxiv.org/abs/2603.14778v1
Full Text
Machine Learning, ICML

1 Introduction

Large Language Models (LLMs) are powerful tools, but they suffer from limitations, such as hallucinations (Ji et al., 2023) and lack of real-time or domain-specific data. Retrieval-Augmented Generation (RAG) (Lewis et al., 2020) provides a powerful solution to these issues without the high cost and complexity of fine-tuning. By retrieving relevant information from an external knowledge base and combining it with the user's prompt, RAG enables the model to generate responses that are accurate, up-to-date, and factually grounded.
RAG as a Service (RaaS) has recently emerged as a popular paradigm. In this architecture, the data owner outsources its proprietary database to the RAG service. The user sends a prompt to this service to retrieve the top-$k$ relevant documents from the database. The user then submits the prompt, augmented by these documents, to the model to generate responses.

However, this workflow raises privacy issues. The privacy of both the data owner and the user depends on the RAG service's honesty, as it can access the data owner's database and the user's prompt. Moreover, malicious users can attempt to extract proprietary information from the data owner's database. Privacy-preserving RAG has been proposed to address these issues.

Existing systems treat privacy-preserving RAG as a secure top-$k$ retrieval problem. They use techniques from secure $k$-Approximate Nearest Neighbor ($k$-ANN) or secure sorting to compute similarity scores, sort the candidates, and select the top-$k$ documents. However, existing systems face challenges supporting arbitrary $k$. In particular, some do not support dynamic adjustments to $k$ for user queries, some raise security issues, and others suffer from efficiency degradation with a large $k$ (Servan-Schreiber et al., 2022b; Zyskind et al., 2024).

Recent studies show that within a particular yet large context threshold, such as 64K tokens, RAG with a large $k$ generally outperforms RAG with a smaller $k$ for modern long-context models (Leng et al., 2024; Li et al., 2024a, b). This finding implies that $k$ can be increased in RAG, which trades model inference efficiency for accuracy. A large $k$ is also required by applications such as cache-augmented retrieval (Gim et al., 2024; Xu et al., 2024).

System overview. We propose $p^2$RAG to address this challenge. Unlike existing systems, $p^2$RAG avoids sorting candidate documents. Instead, we use an interactive bisection method between the user and the servers to determine a threshold.
We identify documents within the threshold as the top-$k$ results. This design enables the user to select an arbitrary $k$ by selecting the threshold. Moreover, for RAG applications, the user only requires the set of the top-$k$ documents rather than their specific internal order. Therefore, with this design, the documents within or outside of the determined threshold remain unsorted. This method decreases the number of comparisons. Because comparisons are the primary computation and communication bottleneck in secure protocols, $p^2$RAG reduces these costs.

$p^2$RAG protects both the data owner's database and the user's prompt. We use secret sharing to ensure that the semi-honest RAG service cannot extract the database or the prompt during the workflow. $p^2$RAG's protocol runs on two semi-honest non-colluding servers where the database and the prompt are secret-shared. Neither server can learn any information about the database or the prompt. Moreover, to defend against malicious users, we design restrictions and verifications for the interactions between servers and users. We limit the number of result documents, limit the number of bisection iterations, and verify that the final retrieval of textual documents matches the bisection results. The amount of leakage that a malicious user can extract from the database is tightly bounded.

In summary, our contributions are as follows:

• We propose $p^2$RAG, a privacy-preserving RAG system that supports arbitrary top-$k$ retrieval. $p^2$RAG protects both the data owner's database and the user's prompt. It uses an interactive bisection method to determine the top-$k$ set without sorting all candidate documents, enabling efficient retrieval even for a large $k$.

• We design $p^2$RAG's security protocol to defend against both the semi-honest RAG service and malicious users. We use secret sharing to protect data privacy against semi-honest servers.
We enforce restrictions and verification mechanisms to defend against malicious users and tightly bound the information leakage of the database.

• We implement $p^2$RAG (https://github.com/myl7/p2rag) and evaluate its performance. The experiments show that $p^2$RAG achieves higher performance with larger $k$ and is 3--300$\times$ faster than the state-of-the-art system, PRAG, for $k=16$--$1024$.

2 Problem Formulation

2.1 System Model

$p^2$RAG aims to provide privacy-preserving RAG as a service to the data owner and users. During the offline (i.e., preprocessing) stage, the data owner outsources its data as an encrypted database on $p^2$RAG. During the online stage, users use encrypted prompts to query the top-$k$ relevant documents from the encrypted database.

The data owner's database contains $N$ documents. Each textual document corresponds to an $m$-dimensional document embedding, i.e., a vector. Each embedding is generated from the text using an embedding model. It semantically describes the corresponding text, so that the distance between two embeddings can quantify the similarity of the two texts. We use the cosine distance as the distance metric, which is widely used in existing RAG applications. We assume all embeddings are $\ell_2$-normalized.

$p^2$RAG consists of two servers. We choose the two-server model over models with more servers because we can use efficient cryptographic tools optimized for two servers. To outsource the database to the two servers, the data owner secret-shares it. That is, for each document embedding, the two servers hold the data owner's two shares, respectively. The sum of the two shares is the embedding, and a single share reveals no information about it. Both texts and embeddings are secret-shared in this way.

A user has a textual prompt and a prompt embedding. The prompt embedding is generated from the text using the same embedding model. We assume that the embedding model is public.
We also assume the user can get the prompt embedding using the model in advance, as in prior privacy-preserving RAG work (Zyskind et al., 2024; Cheng et al., 2025). To use $p^2$RAG, the user secret-shares its prompt embedding and sends the two shares to the two servers, respectively. By running $p^2$RAG's protocol, the user finally receives the indices of $k$ documents from the servers. These documents have the $k$ highest cosine similarities to the prompt text.

2.2 Threat Model

• The two servers are semi-honest and non-colluding. That is, the servers honestly follow $p^2$RAG's protocol but passively attempt to infer information about the database and prompts, i.e., they are honest-but-curious. The servers do not share any extra information beyond what is defined in $p^2$RAG's protocol. This assumption is also widely used in other secret-sharing-based security systems (Kamara et al., 2012; Mohassel and Zhang, 2017).

• Users are malicious. Users can deviate from $p^2$RAG's protocol arbitrarily to extract information from the database. Users do not trust the servers.

• Data owners are honest and do not trust any server or user.

2.3 Security Goals

We aim to protect the privacy of both the data owner and the user, i.e., to protect both the database and the prompt, resulting in the following goals.

• Privacy. No server can get any information about any prompt or document. In particular, because the returned $k$ documents are the $k$ most similar to the prompt and thus can reveal information about the prompt, we also need to protect them from the servers.

• Bounded database leakage. No user can get any extra information about any document beyond the "baseline" leakage and a small amount of leakage defined in $p^2$RAG's protocol. The "baseline" leakage refers to the $k$ documents returned to the user, which is necessary for $p^2$RAG's functionality. $p^2$RAG also leaks counts of documents that are in a particular range to accelerate its protocol.
This aggregate leakage does not identify any particular document, and the total number of leaked counts is limited, resulting in $O(\log^2 N)$ leakage.

3 Preliminary

We use secret sharing (Shamir, 1979) over a prime field $\mathbb{F}_p$ to protect a value. We denote the pair of shares of a value $x$ by $[x]$. The shares are held by the two servers, respectively. We denote each share by $[x]_i$ for $i\in\{0,1\}$. That is, we protect a value $x$ by secret-sharing it as $[x]_0+[x]_1=x$ and sending $[x]_i$ to server $i$, respectively. A single $[x]_i$ does not reveal any information about $x$. We have $[x\pm y]=[x]\pm[y]$. That is, for addition and subtraction, each server only needs to compute on its local shares, and no interaction is required.

Distributed Comparison Functions (DCFs) (Boyle et al., 2015, 2016, 2019, 2021) are schemes to secret-share a comparison function. Each function share can be individually executed on a server. The outputs of the function shares are the shares of the output of the original function.

Definition 3.1 (Comparison Functions). For the input domain $\mathbb{G}_{in}=\{0,\dots,p-1\}$, a group $(\mathbb{G}_{out},+)$ as the output domain, $a\in\mathbb{G}_{in}$, and $b\in\mathbb{G}_{out}$, a comparison function $f^<_{a,b}$ is a function whose output $y$ for any input $x$ satisfies $y=b$ only when $x<a$, and otherwise $y=0$.

Definition 3.2 (Distributed Comparison Functions). For the input domain $\mathbb{G}_{in}=\{0,\dots,p-1\}$, the output domain $(\mathbb{G}_{out},+)$, $a\in\mathbb{G}_{in}$, $b\in\mathbb{G}_{out}$, and a security parameter $\lambda$, a DCF is a scheme consisting of the methods:

• Key generation: $Gen(1^\lambda,a,b)\to(k_0,k_1)$.

• Evaluation: $Eval(k_i,j)\to y_{i,j}$ for any $i\in\{0,1\}$ and any $j\in\mathbb{G}_{in}$.

It satisfies:

• Correctness: $y_{0,j}+y_{1,j}=b$ only when $j<a$; otherwise $y_{0,j}+y_{1,j}=0$.

• Privacy: Neither $k_0$ nor $k_1$ reveals any information about $a$ and $b$.
Formally speaking, there exists a Probabilistic Polynomial Time (PPT) simulator $Sim_{dcf}$ that can generate output computationally indistinguishable from any strict subset of the keys output by $Gen$.

We use DCFs and Boyle et al.'s interval containment gate (2019) to check whether a secret-shared value $[x]$ is in an interval $[x_l,x_r)$, as in Algorithm 1. Its output is $[1]$ when $x\in[x_l,x_r)$, and otherwise $[0]$. Note that this algorithm handles the addition in $\mathbb{G}_{in}$, which is integer addition modulo $p$.

Algorithm 1 Comparison: $[0]/[1]\leftarrow Cmp([x],[x_l,x_r))$

The key generation method $Cmp.Gen([x_l,x_r))$:
Sample random $r\in\mathbb{F}_p$ and distribute $[r]$.
$x'_l=x_l+r$, $x'_r=x_r+r$.
$(k_0^l,k_1^l)\leftarrow Gen^<(1^\lambda,x'_l,p-1)$. Distribute $(k_0^l,k_1^l)$.
$(k_0^r,k_1^r)\leftarrow Gen^<(1^\lambda,x'_r,1)$. Distribute $(k_0^r,k_1^r)$.
Sample random $[w]\in\mathbb{F}_p$ s.t. $[w]_0+[w]_1=\mathbf{1}\{x'_l>x'_r\}$. Distribute $[w]$.

The evaluation method $Cmp.Eval([x])$:
Publish $x+r$. ($x$ is masked by $r$ and thus protected.)
$[y]_i^l=Eval(k_i^l,x+r)$, $[y]_i^r=Eval(k_i^r,x+r)$.
$[y]_i=[y]_i^l+[y]_i^r+[w]_i$. Output $[y]$.

We assume that there is a trusted dealer that generates the shared random values required by $p^2$RAG, which is common in security protocols (Beaver, 1992, 1997). The trusted dealer runs during the offline stage and does not participate in the online stage. Therefore, we do not benchmark its performance. For example, the data owner can be the trusted dealer, and other techniques can also achieve it (Damgård et al., 2013; Orsini et al., 2020).

4 System Design

A user willing to use $p^2$RAG first secret-shares its prompt embedding as $[p]$. It then sends the two shares to the two servers, respectively.
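The interval check in Algorithm 1 boils down to a small arithmetic identity over the masked value. Below is a minimal, insecure Python sketch of that identity (our illustration, not the paper's implementation): the two DCF evaluations, with payloads $p-1$ (i.e., $-1 \bmod p$) at $x'_l$ and $1$ at $x'_r$, are replaced by in-the-clear indicator functions.

```python
import secrets

P = 101  # small toy prime; the paper's field is 64-bit

def cmp_in_interval(x, xl, xr):
    """Return 1 if x is in [xl, xr), else 0, computed from the masked value x+r.

    Plaintext sketch of Algorithm 1: the DCF with payload p-1 (= -1 mod p)
    keyed at xl' and the DCF with payload 1 keyed at xr' become indicators.
    """
    r = secrets.randbelow(P)
    xl_, xr_ = (xl + r) % P, (xr + r) % P
    w = 1 if xl_ > xr_ else 0        # wraparound correction bit
    u = (x + r) % P                  # the value the servers publish
    y = (-(1 if u < xl_ else 0) + (1 if u < xr_ else 0) + w) % P
    return y

assert cmp_in_interval(5, 3, 7) == 1   # inside the interval
assert cmp_in_interval(2, 3, 7) == 0   # below
assert cmp_in_interval(7, 3, 7) == 0   # at the open right endpoint
```

The identity holds for any mask $r$: when the masked interval does not wrap, membership is the difference of the two indicators; when it wraps, the correction bit $w$ restores the result.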
The servers first run the distance calculation to compute the distances $[d_j]$ between the prompt embedding $[p]$ and each document embedding $[x_j]$. The servers then run the interactive distance bisection method with the user. In each iteration, the servers return the count $[c]$ of documents with distances less than a user-specified distance threshold $[d_k]$. The user then updates $d_k$ by comparing the returned number $c$ with the target $k$. When either $c$ is close enough, as defined by $p^2$RAG, or the iteration number reaches a server-specified limit, the servers return a secret-shared 0/1 array $[\vec{d}_c]$ where the elements corresponding to the in-range documents are set to 1. The number of these elements, i.e., $c$, is also limited by a server-specified threshold. Finally, during the text retrieval, the user retrieves all the 1s' indices without letting the servers know the retrieved indices. The servers also verify that the user only retrieves the indices that match the 0/1 array $[\vec{d}_c]$. The workflow is shown in Figure 1.

Figure 1: The workflow of $p^2$RAG. During the offline stage, the data owner sets up the secret-shared database. During the distance calculation, the servers compute the secret-shared distances between each document and the user's prompt. During the distance bisection, the user determines a distance threshold $d_k$ for the top-$k$ documents. The bisection iteration ends when $d_k$ is found, or the number of iterations exceeds an upper bound. During the text retrieval, the user retrieves textual documents using the indices of the top-$k$ documents.

4.1 Distance Calculation

The servers compute the cosine distance $[d_j]$ between the prompt embedding and each document embedding. The document embeddings must be $\ell_2$-normalized by the data owner. We assume the $\ell_2$-norms of the database, $l=\|x_j\|$, and the prompt, $l_u=\|p\|$, are public because they do not leak any information about the database or the prompt.
Note that the dot product $d_j=p\cdot x_j$ differs from the cosine distance only by the factor $l\,l_u$. The dot product yields the same ranking as the cosine distance because this factor is constant. Therefore, we use the dot product as $d_j$. We use Beaver triples (1992) for each dimension to compute $[d_j]=[p\cdot x_j]$ from $[p]$ and $[x_j]$, as in Algorithm 2. This algorithm has $O(N)$ communication cost, but the amount is $4N$ integers and thus small, e.g., 16MB for $N=2^{20}$ and 64-bit integers.

Algorithm 2 Dot Product: $[c]=[a\cdot b]$

Offline stage:
Sample $m$ random $r^{(a)}_n,r^{(b)}_n,r^{(ab)}_n$ for $n\in[0,m)$ s.t. $r^{(a)}_n\cdot r^{(b)}_n=r^{(ab)}_n$.
Distribute $[r^{(a)}_n],[r^{(b)}_n],[r^{(ab)}_n]$.

Online stage:
for $n=0$ to $m-1$ do
 $[d_n]=[a_n]-[r^{(a)}_n]$, $[e_n]=[b_n]-[r^{(b)}_n]$. Publish $d_n,e_n$.
 $[c_n]_i=[r^{(ab)}_n]_i+e_n\cdot[r^{(a)}_n]_i+d_n\cdot[r^{(b)}_n]_i+i\cdot d_n\cdot e_n$.
end for
$[c]=\sum_{n=0}^{m-1}[c_n]$. Output $[c]$.

Current embedding models output floating-point numbers that are $\ell_2$-normalized to 1. To represent the embedding elements in $\mathbb{F}_p$, the data owner scales them by a factor $2^f$ and truncates the fractional parts (Catrina and Saxena, 2010), resulting in integers with precision $f$. While the truncation causes the embeddings' norms to deviate slightly from $2^f$, this error is negligible, as we show in Section 7.1. This scaling requires that the cardinality of $\mathbb{F}_p$ be greater than $2^{f+1}$. Note that the product of two scaled integers has $2f$ precision. Therefore, we are required to truncate every product to avoid causing an overflow in the field $\mathbb{F}_p$ (Zyskind et al., 2024). That is, for integers $a,b$, we use the integer part of $ab/2^f$ as the product.
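The Beaver-triple dot product and the fixed-point scaling above can be sketched together in Python. This is a toy illustration with both servers simulated in one process and hypothetical parameters (a small prime and $f=16$; the paper uses a 64-bit prime and $f=32$), not the paper's implementation:

```python
import secrets

P = 2**61 - 1  # toy prime field
F = 2**16      # toy fixed-point scale 2^f

def share(x):
    """2-out-of-2 additive sharing over F_p."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def make_triple():
    ra, rb = secrets.randbelow(P), secrets.randbelow(P)
    return share(ra), share(rb), share(ra * rb % P)

def beaver_dot(a_shares, b_shares, triples):
    """Secret-shared dot product in the spirit of Algorithm 2.

    a_shares/b_shares: per-dimension share pairs.
    triples: per-dimension ([ra], [rb], [rab]) with ra*rb = rab.
    """
    c = [0, 0]
    for (a0, a1), (b0, b1), ((ra0, ra1), (rb0, rb1), (rab0, rab1)) in zip(
            a_shares, b_shares, triples):
        # The servers open the masked values; d and e are public.
        d = (a0 - ra0 + a1 - ra1) % P
        e = (b0 - rb0 + b1 - rb1) % P
        # [c_n] = [rab] + e*[ra] + d*[rb] + d*e (constant term to server 1)
        c[0] = (c[0] + rab0 + e * ra0 + d * rb0) % P
        c[1] = (c[1] + rab1 + e * ra1 + d * rb1 + d * e) % P
    return tuple(c)

# Two toy 2-dimensional embeddings scaled by 2^f: (1.5, 1.0) and (1.0, 2.0).
a = [3 * F // 2, F]
b = [F, 2 * F]
c0, c1 = beaver_dot([share(v) for v in a], [share(v) for v in b],
                    [make_triple() for _ in range(2)])
# The product carries 2f fractional bits; truncate once after reconstruction.
assert ((c0 + c1) % P) // (F * F) == 3  # 1.5*1.0 + 1.0*2.0 = 3.5, floor 3
```

As in the text, each dimension consumes one triple and opens two masked values, which is where the $4N$-integer communication figure comes from.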
Definition 3.2 requires the input $a\in\mathbb{G}_{in}$ to be an unsigned integer because the internal comparison is performed bit by bit, starting from the most significant bit (Boyle et al., 2021). We extend Algorithm 1 to signed integers by giving $\mathbb{F}_p$ a new bit representation. The elements $-\lfloor p/2\rfloor,\dots,\lfloor p/2\rfloor$ are represented as the bits of the unsigned integers $0,\dots,p-1$. This bit representation transformation does not change any algorithm step.

4.2 Distance Bisection

Given all $[d_j]$, we need to get the top-$k$ values among them. The naive method is to let the servers return all $d_j$ to the user, who then sorts them. However, though no embeddings are given to the user, letting the user know all $N$ distances leaks information about the database and can even lead to data recovery (Li et al., 2015; Kornaropoulos et al., 2019; Liu et al., 2025). Existing systems avoid this by sorting $[d_j]$ on the servers, which is inefficient in secure protocols.

Unlike existing systems, $p^2$RAG runs an interactive bisection method between the servers and the user to determine a distance threshold $d_k$. The user starts with the interval $[-l\,l_u,l\,l_u]$ and $d_k=0$. During each bisection iteration, the servers compare each distance $[d_j]$ with $[d_k]$ and return to the user the count $c$ of in-range distances. To compute $[c]$, the servers compare $[d_k]$ with each $[d_j]$ using Algorithm 1 with the interval $[d_k,p)$. That is, the user runs the key generation method $Cmp.Gen([d_k,p))$, distributing $k_i^l,k_i^r,[w]_i$ to server $i$, respectively. The servers run the evaluation method $Cmp.Eval([d_j])$ for each $[d_j]$, resulting in $[0]/[1]$. The $[0]/[1]$ outputs for all $[d_j]$ form a vector $[\vec{d}_c]$. The servers sum its elements to $[c]$ and return $c$ to the user. To avoid carrying, the cardinality of $\mathbb{F}_p$ must be greater than $N$.
The user compares $c$ with $k$ to update the interval and $d_k$. When either the user or the servers stop the iteration, as defined below, the servers return $[\vec{d}_c]$ instead of $c$ to the user.

The servers must limit the number of iterations. At each iteration, the user receives the number $c$ as leakage. This small leakage is aggregated from all documents and does not identify any particular document. However, this leakage can accumulate across users' queries. The servers limit the number of iterations with an upper bound $step_m$ to limit the total amount of leakage.

The user can stop the iteration early to speed up the query because: 1) in the bisection method, as the step size becomes very small in later iterations, further steps produce little change, so stopping early still yields a result close to the true solution; moreover, 2) recent studies show that within a particular yet large context threshold, such as 64K tokens, retrieving slightly more results in RAG generally outperforms RAG with a smaller $k$ for modern long-context models (Leng et al., 2024; Li et al., 2024a, b). We model this early stop as a slack value $\xi$. The user can stop when $0\le c-k\le\xi$. We require $\xi\le k$. Otherwise, the extended stopping condition $k\le c\le k+\xi$ can be satisfied by multiple iterations, leading to ambiguity in determining the correct termination point.

After termination, the servers must limit the number of $[1]$ elements in the final $[\vec{d}_c]$, i.e., limit the maximum $[c]$, because $[\vec{d}_c]$ controls which indices can be retrieved by the user during the text retrieval. A user could otherwise choose $d_k=l\,l_u+1$ to get a $[\vec{d}_c]$ whose elements are all 1, gaining access to any document. The servers perform this check by publicly setting up an upper bound $c_m$ and comparing the final $[c]$ with $c_m$ using Algorithm 1. Unlike the comparison during iterations, the key generation method of this comparison is executed offline by the trusted dealer.
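The user-side search described above can be sketched in plaintext Python. This is an insecure illustration under our own assumptions, not the protocol itself: the count $c$ is computed in the clear, floats replace field elements, and we count documents whose score is at least the threshold, matching the update rule of Algorithm 3.

```python
def bisect_threshold(distances, k, xi, step_m, lo, hi):
    """Find a threshold d_k whose in-range count c satisfies k <= c <= k+xi.

    Plaintext sketch of the interactive bisection: in the protocol, c is the
    only value revealed per iteration, and step_m bounds the total leakage.
    """
    dk = 0.0
    for _ in range(step_m):
        # In the protocol this count is computed on secret shares (Algorithm 1).
        c = sum(1 for d in distances if d >= dk)
        if 0 <= c - k <= xi:
            break
        if c > k:
            lo = dk          # too many in range: raise the threshold
        else:
            hi = dk          # too few in range: lower the threshold
        dk = (lo + hi) / 2
    return dk, [i for i, d in enumerate(distances) if d >= dk]

dists = [0.9, 0.1, 0.7, 0.4, 0.85, 0.2, 0.6, 0.3]
dk, idx = bisect_threshold(dists, k=3, xi=1, step_m=20, lo=-1.0, hi=1.0)
assert 3 <= len(idx) <= 4              # between k and k + xi results
assert set(idx) >= {0, 4, 2}           # the three highest-scoring documents
```

Note how the slack $\xi$ lets the loop terminate as soon as the count lands anywhere in $[k, k+\xi]$, and the `step_m` cap enforces the server-side iteration limit even if no such threshold is found.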
A malicious user can deviate from the protocol to manipulate $[\vec{d}_c]$. When the servers return $c$, this manipulation does not give the user advantages over the leakage of $c$. When the servers return $[\vec{d}_c]$, the servers must also verify that all elements of $[\vec{d}_c]$ are $[0]/[1]$. To achieve that, the servers compare each element with $2$ using Algorithm 1. The trusted dealer executes the key generation method for this comparison offline.

4.3 Text Retrieval

Given $[\vec{d}_c]$, the user retrieves all its 1s' indices to get documents using Private Information Retrieval (PIR). Because the database is stored with secret sharing, the user uses single-server PIR (Ali et al., 2021; Menon and Wu, 2022) to retrieve the documents from both servers. Following prior work (Servan-Schreiber et al., 2022a; Zyskind et al., 2024), we assume that an existing single-server PIR protocol can be used and do not benchmark its performance.

Another requirement is that the servers must verify that the user only retrieves the indices that match $[\vec{d}_c]$. We observe that in single-server PIR, e.g., SimplePIR (Henzinger et al., 2023), there are operations that multiply the database by the query, where each document text is converted to a number. We can multiply this result by $[\vec{d}_c]$ to enforce the requirement.

5 Construction of $p^2$RAG

In this section, we summarize Section 4 into $p^2$RAG's protocol. The protocol has offline and online stages. The offline stage includes setting up the data owner and servers. We do not benchmark its performance because it does not affect the online performance of a user's query. The online stage involves computation and communication between the servers and the user.

During the data owner's offline stage, it has a database filled with textual documents and aims to use $p^2$RAG. It pads all documents to the same length and publishes this length, which is typically required by the text retrieval implemented as single-server PIR.
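The retrieval check of Section 4.3 can be illustrated with a toy, plaintext stand-in for the PIR response (our hypothetical sketch; a real deployment uses an actual single-server PIR scheme such as SimplePIR over secret shares):

```python
def masked_pir_response(db, query, d_c):
    """Toy stand-in for the Section 4.3 check: each document's contribution
    to the database-times-query product is multiplied by its 0/1 in-range
    flag, so an index outside d_c contributes nothing to the response."""
    return sum(q * doc * flag for q, doc, flag in zip(query, db, d_c))

db = [111, 222, 333, 444]       # documents encoded as numbers
d_c = [1, 0, 1, 0]              # only indices 0 and 2 are retrievable
onehot = lambda i, n=4: [1 if j == i else 0 for j in range(n)]

assert masked_pir_response(db, onehot(2), d_c) == 333   # permitted index
assert masked_pir_response(db, onehot(1), d_c) == 0     # blocked index
```

In the protocol the flags and the multiplication are on secret shares, so the servers enforce the restriction without learning which permitted index the user actually queried.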
It uses a public embedding model to transform each document into an embedding $x$. The embedding has $m$ dimensions. Its elements are in a prime field $\mathbb{F}_p$ as described in Section 4.1. The document embeddings are $\ell_2$-normalized to $l$. The data owner publishes $l$. The data owner shares each document embedding as $[x]$ and sends the shares to the servers, respectively.

During the servers' offline stage, they use a trusted dealer to run the offline stage of the dot product method. They are configured with an iteration step upper bound $step_m$, an upper bound $c_m$ for $c$, and a slack value $\xi$ for the result number. They use a trusted dealer to run $Cmp.Gen$ of Algorithm 1 with the intervals $[0,2)$ and $[0,c_m+1)$, resulting in $cmpkey^{<2}$ and $cmpkey^{\le c_m}$, each of the form $(k_i^l,k_i^r,[w]_i)$, for each server $i$.

The user has a textual prompt and aims to retrieve the top-$k$ relevant documents. It uses the same public embedding model as the data owner to transform the prompt into an embedding $p$. It shares the embedding as $[p]$ and sends the shares to the servers, respectively, to start the online stage.

The online stage is shown as Algorithm 3. The user receives at least $k$ and at most $k+\xi$ indices from $[\vec{d}_c]$ as the results. The user can then perform the text retrieval as described in Section 4.3 to retrieve the corresponding textual documents from these indices.

Algorithm 3 Online stage of $p^2$RAG's protocol

The servers compute $[d_j]=[p\cdot x_j]$ for $j\in[0,N)$.
The user sets $d_k=0$ and $[d_l,d_r]=[-l\,l_u,l\,l_u]$. $step=0$.
while true do
 The user runs $Cmp.Gen([d_k,p))$ of Algorithm 1, resulting in $cmpkey^{\ge d_k}=(k_i^l,k_i^r,[w]_i)$. The user sends $cmpkey^{\ge d_k}$ to the servers, respectively.
 The servers run $Cmp.Eval([d_j])$ of Algorithm 1 with $cmpkey^{\ge d_k}$ for $j\in[0,N)$ to get $[\vec{d}_c]$.
 $[c]=\sum_{j=0}^{N-1}[d_{c,j}]$.
 $step\leftarrow step+1$.
 if $step\ge step_m$ then break end if
 The servers return $c$ to the user.
 if $0\le c-k\le\xi$ then break
 else if $c>k$ then $d_l\leftarrow d_k$, $d_k\leftarrow(d_l+d_r)/2$
 else $d_r\leftarrow d_k$, $d_k\leftarrow(d_l+d_r)/2$
 end if
end while
The servers run $Cmp.Eval([d_{c,j}])$ with $cmpkey^{<2}$ for each element of $[\vec{d}_c]$. The servers publish the results. if there is any 0 result then goto abort end if
The servers run $Cmp.Eval([c])$ with $cmpkey^{\le c_m}$. The servers publish the result. if there is any 0 result then goto abort end if
The servers return $[\vec{d}_c]$ to the user. return
abort: The servers flag the user as malicious and abort.

6 Analysis

In this section, we analyze $p^2$RAG's performance theoretically and prove its security attributes. Our performance analysis covers the online stage described in Section 5, which starts from the user's prompt embedding and ends with the indices of the top-$k$ documents. We quantify the analysis with computation and communication complexity. Our security analysis focuses on the security goals described in Section 2.3. We use the simulation-based method (Canetti, 2020) to prove that $p^2$RAG achieves the privacy goal. We quantify the information leakage of the database for the bounded database leakage goal.

6.1 Performance Analysis

Computation costs. For the primitives of Section 3, both DCF key generation and evaluation cost $O(\lambda\log N)$ (Boyle et al., 2021), and so do the two methods of the comparison of Algorithm 1. In $p^2$RAG's online stage, the dot product method costs $O(m)$ and is executed $N$ times by the servers. The bisection iteration is executed at most $O(\log N)$ times. We denote the number of iterations by $S$. $Cmp.Gen$ is executed $S$ times by the user. $Cmp.Eval$ is executed $SN+N+1$ times by each server. Therefore, the online stage costs $O(\lambda\log^2 N)$ for the user and $O(mN+\lambda N\log^2 N)$ for the servers.
Communication costs. For the primitives of Section 3, each key generated by DCFs costs $O(\lambda\log N)$ (Boyle et al., 2021), and so do the keys of the comparison. The evaluation method of the comparison of Algorithm 1 costs $O(1)$. In $p^2$RAG's online stage, the user sends the prompt embedding share and $2S$ comparison keys, and receives $S$ counts and a share of $[\vec{d}_c]$ from each server. While the dot product method costs $O(m)$, the two servers run it $N$ times and run $Cmp.Eval$ $N$ times per iteration, which dominates their communication. Therefore, the online stage costs $O(m+N+\lambda\log^2 N)$ between the user and the servers, and $O(mN+N\log N)$ between the two servers.

6.2 Security Analysis

Claim 6.1 ($p^2$RAG's Privacy). For a Probabilistic Polynomial Time (PPT) adversary $A$ corrupting server $i^*$ for $i^*\in\{0,1\}$ but following the protocol, Algorithm 3 guarantees that $A$ learns no information about the prompt and the database.

Proof. The view of $A$ includes: 1) the prompt embedding share $[p]_{i^*}$; 2) $p_n-r^{(a)}_n$ and $x_{j,n}-r^{(b)}_n$ for $n\in[0,m)$ during the distance calculation, where $r^{(a)}_n,r^{(b)}_n$ are random; 3) at most $\min\{step_m,\lceil\log_2(N/(k+\xi))\rceil\}$ keys $cmpkey^{\ge d_k}$ for different $d_k$, which are output by $Cmp.Gen$ and sent by the user; 4) $d_j+r^{(d_j)}$, $c+r^{(c)}$, and $d_{c,j}+r^{(d_{c,j})}$ during $Cmp.Eval$, where $r^{(d_j)},r^{(c)},r^{(d_{c,j})}$ are random.

We construct a PPT simulator $Sim$ that generates a view computationally indistinguishable from what $A$ learns by attacking the protocol, proving $p^2$RAG's privacy goal. For 1), 2), and 4), $Sim$ outputs random values, which are indistinguishable because these values are either secret-shared or masked by random values.
For 3), as each $cmpkey$ has the form $cmpkey=(k_{i^*}^l,k_{i^*}^r,[w]_{i^*})$, $Sim$ outputs a random value for $[w]_{i^*}$, which is indistinguishable because it is secret-shared. $Sim$ uses $Sim_{dcf}$'s output for $k_{i^*}^l,k_{i^*}^r$, which are indistinguishable due to DCFs' privacy as described in Section 3. ∎

Bounded database leakage. We follow Carlini et al.'s (2021) and Servan-Schreiber et al.'s (2022a) perspectives to divide the database leakage to the user into two incomparable and complementary parts: 1) physical leakage, which is the leakage of $p^2$RAG's protocol beyond the top-$k$ results; 2) functional leakage, which is the leakage from the top-$k$ results themselves.

In particular, $p^2$RAG's functional leakage is at most $k+\xi$ results, which are $O(k(m+\log N))$ bits of information, regardless of $p^2$RAG's protocol (Servan-Schreiber et al., 2022a). This leakage is the "baseline" leakage mentioned in Section 2.3. $p^2$RAG's physical leakage is at most $step_m$ numbers $c$ given to the user, where $0\le c\le N$. To distinguish between $N+1$ possible outcomes (from 0 to $N$), the maximum information contained in $c$ is $\log_2(N+1)$ bits, i.e., $O(\log N)$. We have $step_m\le\lceil\log_2(N/(k+\xi))\rceil$ because that many bisection iterations are enough to get the accurate top-$k$ results. That is, $p^2$RAG's physical leakage is $O(\log^2 N)$ bits. We identify $O(\log^2 N)$ as small, proving $p^2$RAG's bounded database leakage goal.

7 Experiments

In this section, we benchmark $p^2$RAG's accuracy and performance with experiments. We show that: 1) for accuracy, it has no computation errors, and its results are relevant to the prompt; 2) for performance, it supports arbitrary $k$ and outperforms the state-of-the-art system, PRAG (Zyskind et al., 2024). We run $p^2$RAG's experiments on a physical on-premises server. The server has two AMD EPYC 7352 CPUs.
Each CPU has 24 cores with hyper-threading, for a total of 96 logical cores. The server has 188 GiB of memory, which is sufficient for the experiments, and runs Ubuntu 24.04. We run $p^2$RAG's servers and users one by one on the physical server. We do not simulate the network but count the transmitted data volume, because network conditions vary widely and, in secret-sharing systems, do not affect throughput: CPUs can perform other computational work while waiting for transmission. We vary $k$ and the number of documents $N$, and write $k' = k + \xi$ for the number of retrieved documents to simplify notation. We set the embedding dimension $m = 1024$ and store embedding elements as 64-bit integers. We set $p = 2^{64} - 59$, the largest prime that fits within 64 bits, and $f = 32$, matching the precision of 32-bit floating-point numbers. We set $\xi = k$ because the $2k$ results in the experiments remain below the 64K-token threshold beyond which RAG performance decreases. We set $\lambda = 128$ for the DCFs. As the cryptographically secure PRGs of the DCFs, we use Matyas-Meyer-Oseas one-way compression functions with AES-128 for speed (Wang et al., 2017), using OpenSSL 3 for AES-128.

7.1 Accuracy

$p^2$RAG's retrieval is exact once the distances are computed; the only errors are small numerical ones introduced by converting embedding elements to integers and by the multiplication in the distance calculation. We use BEIR's trec-covid dataset (Kamalloo et al., 2024), which has 171,332 documents, together with BEIR's precomputed embeddings from Cohere/beir-embed-english-v3 (Cohere, 2024). We measure the recall between the results from $p^2$RAG's integer distances and those from the original distances, i.e., the dot products of the floating-point embeddings, and report it in Figure 2. $p^2$RAG's recall stays at 1 for $k' = 16$–$1024$, showing $p^2$RAG's high computation accuracy.
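The accuracy procedure above can be sketched in plain Python. This is a minimal, non-private sketch: it uses signed Python integers instead of secret shares modulo $p = 2^{64} - 59$, a small synthetic dataset instead of trec-covid, and the parameters $f = 32$ and $m = 1024$ from the setup; `encode`, `int_dot`, and `topk` are illustrative names, not the paper's API.

```python
import numpy as np

f = 32     # fractional bits of the fixed-point encoding
m = 1024   # embedding dimension

def encode(x):
    # Fixed-point encoding: scale each element by 2^f and round.
    # (The protocol additionally reduces modulo p and secret-shares.)
    return [round(v * 2**f) for v in x]

def int_dot(a, b):
    # Integer dot product; the result carries scale 2^(2f).
    return sum(u * v for u, v in zip(a, b))

def topk(scores, k):
    # Indices of the k largest scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

# Synthetic unit-norm embeddings, so dot product equals cosine similarity.
rng = np.random.default_rng(0)
docs = rng.standard_normal((500, m))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.standard_normal(m)
query /= np.linalg.norm(query)

float_scores = docs @ query                 # reference: float dot products
enc_query = encode(query)
int_scores = [int_dot(enc_query, encode(d)) for d in docs]

k_prime = 16
recall = len(topk(float_scores, k_prime) & topk(int_scores, k_prime)) / k_prime
```

At $f = 32$ the quantization error of a dot product is far below the typical gap between scores, so the integer and floating-point top-$k'$ sets coincide, consistent with the recall of 1 reported above.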
Because $p^2$RAG only considers cosine and dot-product distances, we also measure the results' average relevance score to show whether $p^2$RAG retrieves the relevant documents, reporting it in Figure 2. The relevance score is a human-annotated categorical level of relevance: 0, 1, and 2 correspond to irrelevant, relevant, and highly relevant, respectively. $p^2$RAG's relevance score exceeds 1 for $k' \leq 300$, meaning the retrieved documents are generally the relevant ones.

Figure 2: Recall and relevance score (higher is better) with varying numbers of retrieved documents $k'$.

7.2 Performance

$p^2$RAG's performance is independent of the particular document or prompt involved in its protocol. This is a necessary attribute for a security system to defend against timing attacks (Kocher, 1996). As a result, we use synthetic datasets for the performance experiments. We benchmark $p^2$RAG's end-to-end performance. For Algorithm 2 and the DCFs used by Algorithm 1, we use multi-threading with 96 threads. Because the baseline PRAG does not support multi-threading, we run a batch of 96 instances and report the amortized time per instance.

Computation costs. Figure 3 shows that $p^2$RAG is 3× faster than PRAG for $k' = 16$ with varying $N$, and 7–300× faster for $N = 2^{14}$ and $k = 32$–$1024$. Both $p^2$RAG's and PRAG's time is linear in $N$ and $k'$. However, contrary to the trend in PRAG, $p^2$RAG achieves higher performance with larger $k$.

Figure 3: Server time (lower is better) with varying numbers of documents $N$ and retrieved documents $k'$; $k' = 16$ and $N = 2^{17}$ for the two figures, respectively.

Communication costs. Table 1 shows $p^2$RAG's intra-server (IS) and user-server (US) communication volume, together with the number of Round-Trip Times (RTTs). We write $S = \lceil \log_2(N/k') \rceil$ for the number of iterations to simplify notation. While the intra-server communication can reach hundreds of MB, this is well tolerated by high-bandwidth data-center links.
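As a quick sanity check, the concrete entries of Table 1 follow directly from its two closed-form volume expressions. The sketch below (the helper `comm_costs` is illustrative, not part of the system; $S = \lceil \log_2(N/k') \rceil$ as defined above) reproduces the reported numbers:

```python
import math

def comm_costs(N, k_prime):
    # S: number of bisection iterations, as defined in the text.
    S = math.ceil(math.log2(N / k_prime))
    user_server = 16384 + 4224 * S + 16 * N   # bytes, user-server row of Table 1
    intra_server = 64 * N + 16 * N * S + 32   # bytes, intra-server row of Table 1
    return user_server, intra_server, S + 1   # volumes and number of RTTs

for N, kp in [(2**17, 16), (2**17, 128), (2**20, 16)]:
    us, intra, rtt = comm_costs(N, kp)
    print(f"N=2^{N.bit_length() - 1}, k'={kp}: "
          f"US={us / 1e6:.3f} MB, IS={intra / 1e6:.2f} MB, RTTs={rtt}")
```

For example, $N = 2^{17}$ and $k' = 16$ give $S = 13$, hence 2.168 MB user-server, 35.65 MB intra-server, and 14 RTTs, matching the first concrete row of Table 1.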
The user-server communication remains manageable, as even the largest entry is comparable to only a few standard images.

Table 1: Communication volume and number of RTTs.

Comm. type    Volume (B)              RTTs
User-server   $16384 + 4224S + 16N$   $S + 1$
Intra-server  $64N + 16NS + 32$       $S + 1$

$N$, $k'$        Volume of US, IS      RTTs
$2^{17}$, $16$   2.168 MB, 35.65 MB    14
$2^{17}$, $128$  2.156 MB, 29.36 MB    11
$2^{20}$, $16$   16.86 MB, 335.5 MB    17

8 Related Work

Privacy-preserving RAG. A privacy-preserving RAG system has two phases to protect: retrieval and inference. During the retrieval phase, both the data owner's database and the user's prompt require protection. $p^2$RAG belongs to the systems that protect the retrieval phase for both the data owner and the user (Zyskind et al., 2024; Bassit and Boddeti, 2025). Some other systems make a weaker security assumption, protecting only the user's prompt (Cheng et al., 2025; Wang et al., 2025; Hemmat et al., 2025) or only the data owner's database (Zhou et al., 2025; Koga et al., 2025; Wu et al., 2025; Mori et al., 2025; Yao and Li, 2025). The remaining systems focus on the inference phase (Dowlin et al., 2016; Mohassel and Zhang, 2017; Li et al., 2023; Zheng et al., 2024; Thomas et al., 2025).

Cryptographic tools. Secure $k$-approximate nearest neighbor search (Indyk and Motwani, 1998; Chen et al., 2020; Zuber and Sirdey, 2021; Servan-Schreiber et al., 2022a) and secure sorting (Patel et al., 2012; Ngai et al., 2024; Agarwal et al., 2024; Cong et al., 2025) provide cryptographic tools for privacy-preserving RAG systems. $p^2$RAG's protocol is adapted from secure sorting based on Function Secret Sharing (FSS) (Boyle et al., 2015, 2016, 2019, 2021) for its communication and computation efficiency.
Some other systems use cryptographic or hardware tools such as Differential Privacy (DP) (Dwork et al., 2006; Koga et al., 2025), Homomorphic Encryption (HE) (Gentry, 2009; Bassit and Boddeti, 2025), or Trusted Execution Environments (TEEs) (McKeen et al., 2013; Sulich et al., 2025).

9 Conclusion

In this paper, we have presented $p^2$RAG, a privacy-preserving RAG service supporting arbitrary top-$k$ retrieval while protecting both the data owner's database and the user's prompt. $p^2$RAG uses secret sharing on two semi-honest non-colluding servers to protect data privacy. It selects the top-$k$ relevant documents with an interactive bisection method, without sorting candidate documents. In particular, the bisection method determines a distance threshold by comparing each distance to the threshold using Distributed Comparison Functions (DCFs), while both the threshold and all distances are secret-shared. This design enables the user to choose an arbitrary $k$, even when $k$ is large. To defend against malicious users, $p^2$RAG uses restriction and verification mechanisms to tightly bound the information leakage of the database: both the number of bisection iterations and the result set size are limited. The experiments show that $p^2$RAG achieves higher performance with larger $k$ and is 3–300× faster than the state-of-the-art PRAG for $k = 16$–$1024$.

Impact Statement

This paper focuses on privacy-preserving Retrieval-Augmented Generation (RAG) systems, aiming to enhance their functionality and efficiency while maintaining the same level of data privacy protection. The potential broader impact of this work includes advancing the deployment of RAG in privacy-critical domains, such as healthcare and finance, where data privacy is a strict requirement.

References

A. Agarwal, E. Boyle, N. Chandran, N. Gilboa, D. Gupta, Y. Ishai, M. Kelkar, and Y. Ma (2024) Secure sorting and selection via function secret sharing.
In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, CCS ’24, New York, NY, USA, p. 3023–3037. External Links: ISBN 9798400706363, Link, Document Cited by: §8. A. Ali, T. Lepoint, S. Patel, M. Raykova, P. Schoppmann, K. Seth, and K. Yeo (2021) Communication–Computation trade-offs in PIR. In 30th USENIX Security Symposium (USENIX Security 21), p. 1811–1828. External Links: ISBN 978-1-939133-24-3, Link Cited by: §4.3. A. Bassit and V. Boddeti (2025) SecureRAG: end-to-end secure retrieval-augmented generation. In The Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance, External Links: Link Cited by: §8, §8. D. Beaver (1992) Efficient multiparty protocols using circuit randomization. In Advances in Cryptology — CRYPTO ’91, J. Feigenbaum (Ed.), Berlin, Heidelberg, p. 420–432. External Links: ISBN 978-3-540-46766-3 Cited by: §3, §4.1. D. Beaver (1997) Commodity-based cryptography (extended abstract). In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’97, New York, NY, USA, p. 446–455. External Links: ISBN 0897918886, Link, Document Cited by: §3. E. Boyle, N. Chandran, N. Gilboa, D. Gupta, Y. Ishai, N. Kumar, and M. Rathee (2021) Function secret sharing for mixed-mode and fixed-point secure computation. In Advances in Cryptology – EUROCRYPT 2021, A. Canteaut and F. Standaert (Eds.), Cham, p. 871–900. External Links: ISBN 978-3-030-77886-6 Cited by: §3, §4.1, §6.1, §6.1, §8. E. Boyle, N. Gilboa, and Y. Ishai (2015) Function secret sharing. In Advances in Cryptology - EUROCRYPT 2015, E. Oswald and M. Fischlin (Eds.), Berlin, Heidelberg, p. 337–367. External Links: ISBN 978-3-662-46803-6 Cited by: §3, §8. E. Boyle, N. Gilboa, and Y. Ishai (2016) Function secret sharing: improvements and extensions. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, New York, NY, USA, p. 1292–1303. 
External Links: ISBN 9781450341394, Link, Document Cited by: §3, §8. E. Boyle, N. Gilboa, and Y. Ishai (2019) Secure computation with preprocessing via function secret sharing. In Theory of Cryptography, D. Hofheinz and A. Rosen (Eds.), Cham, p. 341–371. External Links: ISBN 978-3-030-36030-6 Cited by: §3, §3, §8. R. Canetti (2020) Universally composable security. J. ACM 67 (5). External Links: ISSN 0004-5411, Link, Document Cited by: §6. N. Carlini, S. Deng, S. Garg, S. Jha, S. Mahloujifar, M. Mahmoody, A. Thakurta, and F. Tramèr (2021) Is private learning possible with instance encoding? In 2021 IEEE Symposium on Security and Privacy (SP), p. 410–427. External Links: Document Cited by: §6.2. O. Catrina and A. Saxena (2010) Secure computation with fixed-point numbers. In Financial Cryptography and Data Security, R. Sion (Ed.), Berlin, Heidelberg, p. 35–50. External Links: ISBN 978-3-642-14577-3 Cited by: §4.1. H. Chen, I. Chillotti, Y. Dong, O. Poburinnaya, I. Razenshteyn, and M. S. Riazi (2020) SANNS: scaling up secure approximate k-nearest neighbors search. In 29th USENIX Security Symposium (USENIX Security 20), p. 2111–2128. External Links: ISBN 978-1-939133-17-5, Link Cited by: §8. Y. Cheng, L. Zhang, J. Wang, M. Yuan, and Y. Yao (2025) RemoteRAG: a privacy-preserving LLM cloud RAG service. In Findings of the Association for Computational Linguistics: ACL 2025, W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar (Eds.), Vienna, Austria, p. 3820–3837. External Links: Link, Document, ISBN 979-8-89176-256-5 Cited by: §2.1, §8. Cohere (2024) Cohere/beir-embed-english-v3 · Datasets at Hugging Face. External Links: Link Cited by: §7.1. K. Cong, R. Geelen, J. Kang, and J. Park (2025) Revisiting oblivious top-k selection with applications to secure k-NN classification. In Selected Areas in Cryptography – SAC 2024: 31st International Conference, Montreal, QC, Canada, August 28–30, 2024, Revised Selected Papers, Part I, Berlin, Heidelberg, p. 3–25.
External Links: ISBN 978-3-031-82851-5, Link, Document Cited by: §8. I. Damgård, M. Keller, E. Larraia, V. Pastro, P. Scholl, and N. P. Smart (2013) Practical covertly secure mpc for dishonest majority – or: breaking the spdz limits. In Computer Security – ESORICS 2013, J. Crampton, S. Jajodia, and K. Mayes (Eds.), Berlin, Heidelberg, p. 1–18. External Links: ISBN 978-3-642-40203-6 Cited by: §3. N. Dowlin, R. Gilad-Bachrach, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing (2016) CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, p. 201–210. Cited by: §8. C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, S. Halevi and T. Rabin (Eds.), Berlin, Heidelberg, p. 265–284. External Links: ISBN 978-3-540-32732-5 Cited by: §8. C. Gentry (2009) Fully homomorphic encryption using ideal lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09, New York, NY, USA, p. 169–178. External Links: ISBN 9781605585062, Link, Document Cited by: §8. I. Gim, G. Chen, S. Lee, N. Sarda, A. Khandelwal, and L. Zhong (2024) Prompt cache: modular attention reuse for low-latency inference. In Proceedings of Machine Learning and Systems, P. Gibbons, G. Pekhimenko, and C. D. Sa (Eds.), Vol. 6, p. 325–338. Cited by: §1. A. Hemmat, M. Moqadas, A. Mamanpoosh, A. Rismanchian, and A. Fatemi (2025) VAGUE-Gate: Plug-and-Play Local-Privacy shield for Retrieval-Augmented generation. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, K. Inui, S. Sakti, H. Wang, D. F. Wong, P. Bhattacharyya, B. Banerjee, A. Ekbal, T. Chakraborty, and D. P. Singh (Eds.), Mumbai, India, p. 3715–3730. 
External Links: Link, ISBN 979-8-89176-298-5 Cited by: §8. A. Henzinger, M. M. Hong, H. Corrigan-Gibbs, S. Meiklejohn, and V. Vaikuntanathan (2023) One server for the price of two: simple and fast single-server private information retrieval. In Proceedings of the 32nd USENIX Conference on Security Symposium, SEC ’23, USA. External Links: ISBN 978-1-939133-37-3 Cited by: §4.3. P. Indyk and R. Motwani (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC ’98, New York, NY, USA, p. 604–613. External Links: ISBN 0897919629, Link, Document Cited by: §8. Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung (2023) Survey of hallucination in natural language generation. ACM Comput. Surv. 55 (12). External Links: ISSN 0360-0300, Link, Document Cited by: §1. E. Kamalloo, N. Thakur, C. Lassance, X. Ma, J. Yang, and J. Lin (2024) Resources for brewing beir: reproducible reference models and statistical analyses. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’24, New York, NY, USA, p. 1431–1440. External Links: ISBN 9798400704314, Link, Document Cited by: §7.1. S. Kamara, P. Mohassel, and B. Riva (2012) Salus: a system for server-aided secure function evaluation. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS ’12, New York, NY, USA, p. 797–808. External Links: ISBN 9781450316514, Link, Document Cited by: 1st item. P. C. Kocher (1996) Timing attacks on implementations of diffie-hellman, rsa, dss, and other systems. In Advances in Cryptology — CRYPTO ’96, N. Koblitz (Ed.), Berlin, Heidelberg, p. 104–113. External Links: ISBN 978-3-540-68697-2 Cited by: §7.2. T. Koga, R. Wu, Z. Zhang, and K. Chaudhuri (2025) Privacy-preserving retrieval-augmented generation with differential privacy. 
External Links: 2412.04697, Link Cited by: §8, §8. E. M. Kornaropoulos, C. Papamanthou, and R. Tamassia (2019) Data recovery on encrypted databases with k-nearest neighbor query leakage. In 2019 IEEE Symposium on Security and Privacy (SP), Vol. , p. 1033–1050. External Links: Document Cited by: §4.2. Q. Leng, J. Portes, S. Havens, M. Zaharia, and M. Carbin (2024) Long context RAG performance of large language models. In Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning, External Links: Link Cited by: §1, §4.2. P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela (2020) Retrieval-augmented generation for knowledge-intensive nlp tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY, USA. External Links: ISBN 9781713829546 Cited by: §1. D. Li, H. Wang, R. Shao, H. Guo, E. Xing, and H. Zhang (2023) MPCFORMER: FAST, PERFORMANT AND PRIVATE TRANSFORMER INFERENCE WITH MPC. In The Eleventh International Conference on Learning Representations, External Links: Link Cited by: §8. F. Li, R. Shin, and V. Paxson (2015) Exploring privacy preservation in outsourced k-nearest neighbors with multiple data owners. In Proceedings of the 2015 ACM Workshop on Cloud Computing Security Workshop, CCSW ’15, New York, NY, USA, p. 53–64. External Links: ISBN 9781450338257, Link, Document Cited by: §4.2. X. Li, Y. Cao, Y. Ma, and A. Sun (2024a) Long context vs. rag for llms: an evaluation and revisits. External Links: 2501.01880, Link Cited by: §1, §4.2. Z. Li, C. Li, M. Zhang, Q. Mei, and M. Bendersky (2024b) Retrieval augmented generation or long-context LLMs? a comprehensive study and hybrid approach. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, F. Dernoncourt, D. Preoţiuc-Pietro, and A. Shimorina (Eds.), Miami, Florida, US, p. 881–893. 
External Links: Link, Document Cited by: §1, §4.2. Y. Liu, Y. Zhang, J. Xie, H. Li, J. X. Yu, and J. Cui (2025) Privacy-preserving approximate nearest neighbor search on high-dimensional data. In 2025 IEEE 41st International Conference on Data Engineering (ICDE), Los Alamitos, CA, USA, p. 3017–3029. External Links: Document, Link Cited by: §4.2. F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar (2013) Innovative instructions and software model for isolated execution. In Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy, HASP ’13, New York, NY, USA. External Links: ISBN 9781450321181, Link, Document Cited by: §8. S. J. Menon and D. J. Wu (2022) SPIRAL: fast, high-rate single-server PIR via FHE composition. In 2022 IEEE Symposium on Security and Privacy (SP), p. 930–947. External Links: ISBN 978-1-6654-1316-9, Link, Document Cited by: §4.3. P. Mohassel and Y. Zhang (2017) SecureML: a system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), p. 19–38. External Links: Document Cited by: 1st item, §8. J. Mori, K. Kakizaki, T. Miyagawa, and J. Sakuma (2025) Differentially private synthetic text generation for retrieval-augmented generation (RAG). External Links: 2510.06719, Link Cited by: §8. N. Ngai, I. Demertzis, J. G. Chamani, and D. Papadopoulos (2024) Distributed & scalable oblivious sorting and shuffling. In 2024 IEEE Symposium on Security and Privacy (SP), p. 4277–4295. External Links: Document Cited by: §8. E. Orsini, N. P. Smart, and F. Vercauteren (2020) Overdrive2k: efficient secure MPC over $\mathbb{Z}_{2^k}$ from somewhat homomorphic encryption. In Topics in Cryptology – CT-RSA 2020: The Cryptographers’ Track at the RSA Conference 2020, San Francisco, CA, USA, February 24–28, 2020, Proceedings, Berlin, Heidelberg, p. 254–283.
External Links: ISBN 978-3-030-40185-6, Link, Document Cited by: §3. S. Patel, S. Garasia, and D. Jinwala (2012) An efficient approach for privacy preserving distributed k-means clustering based on shamir’s secret sharing scheme. In Trust Management VI, T. Dimitrakos, R. Moona, D. Patel, and D. H. McKnight (Eds.), Berlin, Heidelberg, p. 129–141. External Links: ISBN 978-3-642-29852-3 Cited by: §8. S. Servan-Schreiber, S. Langowski, and S. Devadas (2022a) Private approximate nearest neighbor search with sublinear communication. In 2022 IEEE Symposium on Security and Privacy (SP), Vol. , p. 911–929. External Links: Document Cited by: §4.3, §6.2, §8. S. Servan-Schreiber, S. Langowski, and S. Devadas (2022b) Private approximate nearest neighbor search with sublinear communication. In 2022 IEEE Symposium on Security and Privacy (SP), Vol. , p. 911–929. External Links: Document Cited by: §1. A. Shamir (1979) How to share a secret. Commun. ACM 22 (11), p. 612–613. External Links: ISSN 0001-0782, Link, Document Cited by: §3. B. Sulich, P. O’Neill, and J. Schrater (2025) Securing Enterprise RAG Deployments. External Links: Link Cited by: §8. R. K. Thomas, L. Zahran, E. Choi, A. Potti, M. Goldblum, and A. Pal (2025) Hidden no more: attacking and defending private third-party LLM inference. In Forty-second International Conference on Machine Learning, External Links: Link Cited by: §8. B. Wang, Q. Lou, M. Zheng, and D. Zhao (2025) PIR-rag: a system for private information retrieval in retrieval-augmented generation. External Links: 2509.21325, Link Cited by: §8. F. Wang, C. Yun, S. Goldwasser, V. Vaikuntanathan, and M. Zaharia (2017) Splinter: practical private queries on public data. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), Boston, MA, p. 299–313. External Links: ISBN 978-1-931971-37-9, Link Cited by: §7. R. Wu, E. Wang, Z. Zhang, and Y. Wang (2025) Private-rag: answering multiple queries with llms while keeping your data private. 
External Links: 2511.07637, Link Cited by: §8. P. Xu, W. Ping, X. Wu, L. McAfee, C. Zhu, Z. Liu, S. Subramanian, E. Bakhturina, M. Shoeybi, and B. Catanzaro (2024) Retrieval meets long context large language models. In The Twelfth International Conference on Learning Representations. External Links: Link Cited by: §1. D. Yao and T. Li (2025) Differentially private retrieval augmented generation with random projection. In ICLR 2025 Workshop on Building Trust in Language Models and Applications. External Links: Link Cited by: §8. F. Zheng, C. Chen, Z. Han, and X. Zheng (2024) PermLLM: private inference of large language models within 3 seconds under WAN. External Links: 2405.18744, Link Cited by: §8. P. Zhou, Y. Feng, and Z. Yang (2025) Provably secure retrieval-augmented generation. External Links: 2508.01084, Link Cited by: §8. M. Zuber and R. Sirdey (2021) Efficient homomorphic evaluation of k-NN classifiers. Proceedings on Privacy Enhancing Technologies. External Links: ISSN 2299-0984, Link Cited by: §8. G. Zyskind, T. South, and A. Pentland (2024) Don’t forget private retrieval: distributed private similarity search for large language models. In Proceedings of the Fifth Workshop on Privacy in Natural Language Processing, I. Habernal, S. Ghanavati, A. Ravichander, V. Jain, P. Thaine, T. Igamberdiev, N. Mireshghallah, and O. Feyisetan (Eds.), Bangkok, Thailand, p. 7–19. External Links: Link Cited by: §1, §2.1, §4.1, §4.3, §7, §8.