
Paper deep dive

The Landscape of Generative AI in Information Systems: A Synthesis of Secondary Reviews and Research Agendas

Aleksander Jarzębowicz, Adam Przybyłek, Jacinto Estima, Yen Ying Ng, Jakub Swacha, Beata Zielosko, Lech Madeyski, Noel Carroll, Kai-Kristian Kemell, Bartosz Marcinkowski, Alberto Rodrigues da Silva, Viktoria Stray, Netta Iivari, Anh Nguyen-Duc, Jorge Melegati, Boris Delibašić, Emilio Insfran

Year: 2026 | Venue: arXiv preprint | Area: cs.CY | Type: Preprint | Embeddings: 255

Abstract

As organizations grapple with the rapid adoption of Generative AI (GenAI), this study synthesizes the state of knowledge through a systematic literature review of secondary studies and research agendas. Analyzing 28 papers published since 2023, we find that while GenAI offers transformative potential for productivity and innovation, its adoption is constrained by multiple interrelated challenges, including technical unreliability (hallucinations, performance drift), societal-ethical risks (bias, misuse, skill erosion), and a systemic governance vacuum (privacy, accountability, intellectual property). Interpreted through a socio-technical lens, these findings reveal a persistent misalignment between GenAI's fast-evolving technical subsystem and the slower-adapting social subsystem, positioning IS research as critical for achieving joint optimization. To bridge this gap, we discuss a research agenda that reorients IS scholarship from analyzing impacts toward actively shaping the co-evolution of technical capabilities with organizational procedures, societal values, and regulatory institutions, emphasizing hybrid human-AI ensembles, situated validation, design principles for probabilistic systems, and adaptive governance.

Tags

ai-safety (imported, 100%) | cscy (suggested, 92%) | preprint (suggested, 88%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: failed | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 0%

Last extracted: 3/13/2026, 1:16:18 AM

OpenRouter request failed (402): {"error":{"message":"This request requires more credits, or fewer max_tokens. You requested up to 65536 tokens, but can only afford 52954. To increase, visit https://openrouter.ai/settings/keys and create a key with a higher monthly limit","code":402,"metadata":{"provider_name":null}},"user_id":"user_2shvuzpVFCCndDdGXIdfi40gIMy"}

Entities (0)

No extracted entities yet.

Relation Signals (0)

No relation signals yet.

Cypher Suggestions (0)

No Cypher suggestions yet.

Full Text

254,772 characters extracted from source content.


The Landscape of Generative AI in Information Systems: A Synthesis of Secondary Reviews and Research Agendas

Aleksander Jarzębowicz 1*, Adam Przybyłek 1,2,3, Jacinto Estima 4, Yen Ying Ng 5, Jakub Swacha 6, Beata Zielosko 7, Lech Madeyski 8, Noel Carroll 2, Kai-Kristian Kemell 9, Bartosz Marcinkowski 10, Alberto Rodrigues da Silva 11, Viktoria Stray 12, Netta Iivari 13, Anh Nguyen-Duc 14, Jorge Melegati 15, Boris Delibašić 16, Emilio Insfran 17

1* Department of Software Engineering, Gdańsk University of Technology, Gdańsk, Poland.
2 J.E. Cairnes School of Business and Economics, University of Galway, Galway, Ireland.
3 Lero, the Research Ireland Centre for Software, Limerick, Ireland.
4 Department of Informatics Engineering, CISUC/LASI, University of Coimbra, Coimbra, Portugal.
5 Center for Language Evolution Studies, Nicolaus Copernicus University in Toruń, Toruń, Poland.
6 Department of IT in Management, University of Szczecin, Szczecin, Poland.
7 Institute of Computer Science, University of Silesia in Katowice, Katowice, Poland.
8 Faculty of Information and Communication Technology, Wroclaw University of Science and Technology, Wroclaw, Poland.
9 Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland.
10 Department of Business Informatics, University of Gdańsk, Sopot, Poland.
11 INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal.
12 Department of Informatics, University of Oslo, Oslo, Norway.
13 INTERACT Research Group, University of Oulu, Oulu, Finland.
14 Department of Business and IT, University of South Eastern Norway, Bø i Telemark, Norway.
15 INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal.
16 Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia.
17 Department of Software Systems and Computation, Polytechnic University of Valencia, Valencia, Spain.

*Corresponding author(s).
E-mail(s): aleksander.jarzebowicz@pg.edu.pl; Contributing authors: adam.przybylek@gmail.com; estima@dei.uc.pt; nyysang@hotmail.com; jakub.swacha@usz.edu.pl; beata.zielosko@us.edu.pl; Lech.Madeyski@pwr.edu.pl; noel.carroll@universityofgalway.ie; kai-kristian.kemell@tuni.fi; bartosz.marcinkowski@ug.edu.pl; alberto.silva@tecnico.ulisboa.pt; stray@ifi.uio.no; Netta.Iivari@oulu.fi; anh.nguyen.duc@usn.no; melegati@fe.up.pt; boris.delibasic@fon.bg.ac.rs; einsfran@dsic.upv.es

arXiv:2603.11842v1 [cs.CY] 12 Mar 2026

Abstract

As organizations grapple with the rapid adoption of Generative AI (GenAI), this study synthesizes the state of knowledge through a systematic literature review of secondary studies and research agendas. Analyzing 28 papers published since 2023, we find that while GenAI offers transformative potential for productivity and innovation, its adoption is constrained by multiple interrelated challenges, including technical unreliability (hallucinations, performance drift), societal-ethical risks (bias, misuse, skill erosion), and a systemic governance vacuum (privacy, accountability, intellectual property). Interpreted through a socio-technical lens, these findings reveal a persistent misalignment between GenAI’s fast-evolving technical subsystem and the slower-adapting social subsystem, positioning IS research as critical for achieving joint optimization. To bridge this gap, we discuss a research agenda that reorients IS scholarship from analyzing impacts toward actively shaping the co-evolution of technical capabilities with organizational procedures, societal values, and regulatory institutions, emphasizing hybrid human-AI ensembles, situated validation, design principles for probabilistic systems, and adaptive governance.
Keywords: Generative AI (GenAI), Large Language Models (LLM), ChatGPT, Information Systems, Systematic Literature Review, Research Agenda, Roadmap, AI Ethics, AI Governance, Socio-Technical Systems

1 Introduction

The public release of ChatGPT in late 2022 was a decisive turning point in the evolution of artificial intelligence, triggering an unprecedented surge in the visibility and adoption of Generative Artificial Intelligence (GenAI) systems. In the following months, large language models (LLMs) and multimodal GenAI systems evolved from experimental technologies to widely deployed digital infrastructures, reshaping how information is created, accessed, interpreted and acted upon across organizations and society (Dwivedi et al., 2023; Madsen & Toston I, 2025; Wessel, Adam, Benlian, Majchrzak, & Thies, 2025). Unlike earlier waves of AI that were largely constrained by narrow and task-specific applications (Philipp, Mladenow, Strauss, & Völz, 2021), GenAI systems exhibit general-purpose capabilities that directly intersect with the core concerns of the information systems (IS) discipline, such as work practices, organizational processes, decision-making, governance, and socio-technical change (Bendig & Bräunche, 2024; Chau & Xu, 2025; Jackson et al., 2025; Lambiase, Catolino, Palomba, Ferrucci, & Russo, 2025; Russo, 2024; Triando, Simaremare, Wang, & Prasad, 2025; X. Wang, Attal, Rafiq, & Hubner-Benz, 2024). For IS scholars, GenAI is not simply a continuation of prior AI research. It introduces a qualitatively different class of digital artifacts, namely probabilistic, generative, and conversational systems, that shift how system behavior is produced and evaluated and, in turn, blur boundaries between users and systems, automation and user interaction, and the production and consumption of knowledge (French & Shim, 2025; Schöbel et al., 2024; Seymour, Ruster, Riemer, Peter, & Kautz, 2025).
These systems are increasingly integrated into activities traditionally regarded as human-centric, including sense-making, creative work (Jackson et al., 2025), software development (Neumann et al., 2026), and professional judgment. Early evidence from knowledge-intensive institutional settings illustrates the implications for governance and adoption. In higher education, for example, GenAI adoption has raised concerns about academic integrity, critical thinking skills, academic standards, and implications for institutional policy and governance (Hughes, Malik, Dettmer, Al-Busaidi, & Dwivedi, 2025). In healthcare, the extent to which patients adopt GenAI health assistants remains unclear (Goldberg et al., 2026). Survey data suggests that trust and perceived benefits predict intention to adopt, while privacy concerns and resistance to change are associated with higher perceived risk (Al-Lataifeh, Harris, Smith, & Chin, 2025; M.M. Li et al., 2026). Furthermore, IS scholarship has begun to frame GenAI as augmentation embedded in socio-technical systems rather than as an autonomous replacement (French & Shim, 2025; Jackson et al., 2025; Russo, 2024), which raises questions about system agency, control, accountability, and the division of labor between humans and machines (Dwivedi et al., 2021; Jackson et al., 2025; Lambiase et al., 2025). Such questions are addressed by recent legal and governance initiatives (Krancher, Nagbøl, & Müller, 2025; Zamani & Rousaki, 2026). The rapid dissemination of GenAI has been accompanied by an equally rapid expansion of scholarly work. For instance, in a remarkably short time span, the IS community and related disciplines have produced a substantial body of secondary studies, such as literature reviews, scoping reviews, mapping studies, as well as forward-looking research agendas and roadmaps (Bendig & Bräunche, 2024; Chau & Xu, 2025; Storey, Yue, Zhao, & Lukyanenko, 2025). 
Collectively, these works have aimed to synthesize early empirical evidence, propose conceptual frameworks, and reflect on directions for future research. However, the diversity of this literature has created a fragmented and difficult-to-navigate knowledge landscape. Individual reviews often focus on specific application domains (e.g., healthcare, education, software engineering), particular risk dimensions (e.g., bias, privacy, reliability), or narrow methodological perspectives, making it challenging to grasp an integrated understanding of the state of GenAI research within or related to the IS discipline (Dwivedi et al., 2021). GenAI adoption has shown a growing tension between technical capabilities and social readiness (Lambiase et al., 2025; Russo, 2024; X. Wang et al., 2024). While the literature consistently highlights substantial benefits, such as productivity and personalization gains (Poličar, Špendl, Curk, & Zupan, 2025), scalability, and innovation (Jackson et al., 2025; X. Wang et al., 2024), it also reports profound challenges, including hallucinations and technical unreliability, ethical and societal risks, and unsolved governance and regulatory issues (Goldberg et al., 2026; Huang et al., 2026; M.M. Li et al., 2026; Madsen & Toston I, 2025; Neumann et al., 2026; X. Wei, Kumar, & Zhang, 2025). These tensions are not merely implementation issues; they reflect deeper misalignment between rapidly evolving technical systems and more slowly adapting social, organizational, and institutional structures (Goldberg et al., 2026; M.M. Li et al., 2026; Russo, 2024). They place GenAI within the intellectual core of IS as a socio-technical discipline concerned with the joint optimization of this current wave of technologies and social systems. Therefore, conducting a systematic review in this domain is necessary but poses several non-trivial challenges.
First, the GenAI literature is young and exceptionally fast-evolving, with conceptual vocabularies, application boundaries, and methodological conventions still in flux. This instability increases the risk of conceptual fragmentation, speculative claims, and overlapping or redundant syntheses. Second, existing secondary studies and research agendas vary considerably in scope, rigour, and epistemological orientation, ranging from tightly focused domain reviews to broad, visionary position papers. Integrating insights across such heterogeneous sources requires careful methodological design and transparent synthesis procedures. A third challenge lies in the inherently socio-technical nature of GenAI adoption: many reviewed studies span technology-centric narratives and socially oriented critiques, often without explicitly integrating these perspectives. As a result, benefits and risks are frequently discussed in isolation rather than as interdependent aspects of complex socio-technical systems. Addressing this gap requires a synthesis approach that considers both technical and social dimensions. Thus, the primary goal of this research is to provide a coherent, field-level synthesis of how GenAI is currently being framed, adopted and evaluated within the Information Systems scholarship. Specifically, this study has the following objectives:

• (O1) Map the landscape of secondary studies and research agenda papers on GenAI in IS;
• (O2) Synthesize the benefits and challenges identified in these works; and
• (O3) Identify and discuss research gaps and future directions as an integrated agenda that can be theoretically grounded and socially relevant for the IS community.

This study addresses these objectives by conducting a systematic literature review (SLR) of secondary studies and research agenda papers on GenAI in the IS domain published since 2023.
By explicitly focusing on integrative and agenda-setting contributions, the review captures how the IS community is collectively interpreting early evidence, diagnosing risks, and envisioning future research in this scope. Through a combination of bibliometric analysis and thematic synthesis, the review maps the structure of this emerging knowledge base, synthesizes benefits and challenges, and presents the research gaps and directions proposed in the identified literature. The contribution of this study is threefold. First, it provides a comprehensive and methodologically consistent overview of the secondary and agenda-setting literature on GenAI in IS, offering clarity in a rapidly expanding, fragmented research space. Second, it synthesizes the dominant narratives surrounding GenAI’s transformative potential and its associated risks, revealing persistent forces that cross application domains and methodological traditions. Third, by aggregating and structuring the proposed research directions, the study articulates a consolidated research agenda that highlights critical opportunities for IS scholars to advance theory, inform practice, and even shape policy. By positioning GenAI as a socio-technical phenomenon rather than a purely technical innovation, this review underscores the IS discipline’s distinctive role in shaping the responsible evolution of generative technologies. The remainder of this paper is structured as follows. Section 2 provides essential background on GenAI and situates our study within the Information Systems perspective. Section 3 discusses related work. Section 4 details the method, including the research questions that guide our analysis, search strategy, selection, and analysis procedures. Section 5 presents the descriptive and thematic results of our analysis. Section 6 elaborates on the synthesized benefits, challenges, and future directions.
In Section 7, we interpret these findings through a socio-technical lens and discuss their broader implications. Section 8 translates the findings into the future research agenda. In Section 9, threats to validity and limitations are discussed. Finally, Section 10 concludes the paper by summarizing our contributions and outlining key implications for research and practice.

2 Background

2.1 Generative AI

Since its inception in 1943, the concept of Artificial Intelligence (AI) has encompassed different paradigms, with the most relevant being the symbolic and the connectionist (Mira, 2008; Smolensky, 1987). The symbolic paradigm is based on concepts and their relationships, i.e., inferential rules, that are used to perform reasoning (Mira, 2008). While effective in constrained domains, symbolic systems struggled with ambiguity, unstructured data, and open-ended tasks (Dwivedi et al., 2021). On the other hand, the connectionist paradigm is based on large networks of simple processors, i.e., artificial neural networks (ANN), in which knowledge is encoded in the numerical strength of the connections between these processors (Smolensky, 1987). These connections are optimized through a training process based on a set of inputs and expected outputs. In other words, an ANN approximates a (non-linear) mathematical function for which some inputs and outputs are given (the training dataset). Initially, these processors, i.e., neurons, were grouped into layers that were connected in a single direction, i.e., the outputs of a layer were the inputs of the following layer. A limitation of these networks (feed-forward networks) was their stateless nature, which limited their application to sequence analysis, such as natural language processing.
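The idea that an ANN approximates a non-linear function by stacking layers can be made concrete with a minimal forward pass. This is an illustrative sketch only: the layer sizes, random weights, and ReLU activation are arbitrary assumptions, and training (adjusting the weights against input/output pairs) is deliberately left out.

```python
import numpy as np

def relu(x):
    # Simple non-linearity applied between layers.
    return np.maximum(0.0, x)

def forward(x, params):
    """One pass through a small feed-forward network.

    Each hidden layer computes h = relu(W @ h + b); stacking such
    layers lets the network approximate a non-linear function of
    its input. Training would tune W and b against a dataset of
    inputs and expected outputs.
    """
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)
    W, b = params[-1]
    return W @ h + b  # linear output layer

rng = np.random.default_rng(1)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # 3 inputs -> 4 hidden
          (rng.normal(size=(2, 4)), np.zeros(2))]   # 4 hidden -> 2 outputs
y = forward(np.array([0.5, -1.0, 2.0]), layers)
print(y.shape)  # (2,)
```

The stateless nature mentioned above is visible here: the function's output depends only on the current input `x`, with no memory of earlier inputs in a sequence.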
The proposal of recurrent neural networks (RNNs), in which the outputs of some layers could be used as inputs for previous layers, showed the capacity to internally store a state, i.e., memory, obtaining better results for tasks demanding this capability (Elman, 1990). However, these networks still had limited memory and consequently performed poorly on long sentences, leading to the proposal of “Long Short-Term Memory” networks (LSTM) (Hochreiter & Schmidhuber, 1997). In the last decade, hardware improvements and reduced costs allowed the creation of larger, or deeper, networks that could be trained using the large amounts of data available on the Internet. These deep neural networks have been extensively explored to support software development (Y. Yang, Xia, Lo, & Grundy, 2022). Since neural networks represent mathematical functions, the analysis of text-based inputs, such as programming code, requires a conversion into numerical data, i.e., an embedding procedure. One way to perform the embedding is to represent words as vectors from a vector space. An example of this approach is Word2Vec (Mikolov, Chen, Corrado, & Dean, 2013), which tried to capture syntactic and semantic relationships between words based on training on large text corpora. However, since these approaches mapped words to vectors, they did not consider context and struggled with polysemous words. ELMo (Peters et al., 2018) tackled these issues by employing an LSTM architecture. A limitation of RNNs, including LSTMs, is their inherently sequential nature, which, for example, inhibits parallelization (Vaswani et al., 2017). To tackle this issue, Vaswani et al. (2017) proposed the transformer model, which became a key innovation for neural networks. Transformers rely on self-attention mechanisms that evaluate how each token in a sequence relates to others, enabling efficient parallelization and the capture of long-range dependencies.
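The self-attention computation that transformers rely on can be sketched in a few lines of NumPy. This is a toy, single-head version with random projection matrices, not a faithful reproduction of the full transformer architecture; the dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) learned projections (random here).
    Every token attends to every other token in one matrix product,
    which is why the computation parallelizes and can capture
    long-range dependencies, unlike a step-by-step RNN.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) token-pair affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                  # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

Each output row is a context-dependent blend of all tokens' value vectors, which is also why transformer embeddings, unlike static Word2Vec vectors, can disambiguate polysemous words.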
LLMs represent the most widely deployed class of transformer-based foundation models. The transformer is based on an encoder, which converts the complete input into an output of embeddings, and a decoder that, based on the previous tokens of the output, predicts the next output token. Stacking different layers of encoders and decoders led to different models, divided into encoder-only, decoder-only, and encoder-decoder models (J. Yang et al., 2024). Encoder-only models, such as BERT (Devlin, Chang, Lee, & Toutanova, 2019), were the earliest LLMs and performed well at natural language understanding (J. Yang et al., 2024). However, decoder-only or autoregressive models, such as the GPT series, obtained better performance in few-shot or even zero-shot generative tasks (Brown et al., 2020), especially after the inclusion of further training based on human feedback, as initially done for InstructGPT by OpenAI (Ouyang et al., 2022). This training approach allowed the launch of ChatGPT, in November 2022, which inaugurated the popularity of GenAI technologies. Foundational models, such as BERT, GPT-4, Gemini, or LLaMa, are trained on large datasets for generic tasks, such as predicting the next token or masked token modelling, but they can be adapted to different downstream tasks. Initially, this process consisted of further training the model with a specific dataset, i.e., fine-tuning. For example, based on a dataset of vulnerable software functions, Fu and Tantithamthavorn (2022) fine-tuned the pre-trained CodeBERT model for the task of vulnerability detection. However, the few-shot or even zero-shot use of decoder-only models has several advantages over fine-tuning: it requires no large dataset, no training expertise, and no dedicated hardware, while obtaining similar results in a fraction of the time.
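The autoregressive, decoder-only generation described above can be illustrated with a toy next-token loop: at each step the model scores candidate next tokens given the prefix, and (under greedy decoding) the highest-scoring one is appended. The hand-written bigram table below is a stand-in for a real language model, purely for illustration.

```python
# Toy stand-in for a next-token predictor: for each preceding token,
# a hand-written probability table over possible next tokens.
BIGRAMS = {
    "<s>":      {"the": 0.6, "a": 0.4},
    "the":      {"model": 0.7, "data": 0.3},
    "a":        {"model": 0.5, "token": 0.5},
    "model":    {"predicts": 0.9, "</s>": 0.1},
    "predicts": {"tokens": 0.8, "</s>": 0.2},
    "tokens":   {"</s>": 1.0},
    "data":     {"</s>": 1.0},
    "token":    {"</s>": 1.0},
}

def generate(max_steps=10):
    """Greedy autoregressive decoding from the start token."""
    seq = ["<s>"]
    for _ in range(max_steps):
        scores = BIGRAMS[seq[-1]]           # next-token distribution given prefix
        nxt = max(scores, key=scores.get)   # greedy: pick the most likely token
        if nxt == "</s>":                   # stop at the end-of-sequence token
            break
        seq.append(nxt)
    return " ".join(seq[1:])

print(generate())  # the model predicts tokens
```

A real LLM conditions on the entire prefix (not just the last token) and typically samples from the distribution instead of always taking the maximum, but the loop structure is the same: generation is prediction of one token at a time.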
Therefore, it became important to identify techniques to prompt these models, i.e., prompt engineering, in ways that improve the quality of the output (X. Wang et al., 2024; White et al., 2023). A particular approach that significantly improved performance, especially for tasks requiring complex reasoning, was the addition of a series of intermediate reasoning steps, i.e., chain-of-thought (CoT) prompting (J. Wei et al., 2022). CoT was incorporated in many LLMs, being executed by default in the so-called reasoning models (Sun et al., 2025). Another approach to adapt foundational models for more specific tasks has been the use of agents, i.e., different instances of LLMs responsible for specific tasks, that are orchestrated to reach a common goal. Agentic AI became a very active area of research (Roychoudhury, 2025). Generative capabilities now extend far beyond text. High-fidelity images, videos, and design prototypes can now be generated from textual descriptions (M.M. Li et al., 2026; Yazdani et al., 2025). GenAI systems now generate multi-modal content, which is particularly relevant for IS and for organizations that generate diverse data formats. Multi-modal systems generally have similar design principles: large transformer networks are trained on massive paired datasets (e.g., text–speech, text–image), so that the trained embedding spaces allow for mapping from one modality into another. Besides transformers, other key GenAI technologies are Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models. GANs consist of two neural networks, a generative and an adversarial one, which compete with each other to produce new content. VAEs produce encodings, i.e., a compressed latent space, and add variations to that space to generate new content. Diffusion models gradually add noise to training data and then learn to reverse that process, gradually recovering the original data.
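The forward (noising) half of a diffusion model can be sketched as follows: noise is added over many steps according to a variance schedule, and a real model then learns the reverse, denoising direction. The linear schedule, step count, and toy data below are illustrative assumptions, not any particular published model's settings.

```python
import numpy as np

def forward_diffusion(x0, num_steps=1000, beta_start=1e-4, beta_end=0.02, seed=0):
    """Closed-form forward noising of data x0.

    With a linear variance schedule beta_t, the noised sample at step t is
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise,
    where alpha_bar_t is the cumulative product of (1 - beta_t).
    By the final step, x_t is close to pure Gaussian noise; a diffusion
    model is trained to undo these steps one by one.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    t = num_steps - 1                        # inspect the final step
    noise = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, alpha_bar[t]

x0 = np.ones(16)                             # toy "data"
xt, signal = forward_diffusion(x0)
print(signal < 1e-3)  # True: by the last step almost no signal remains
```

The generative step then runs in reverse: starting from pure noise, the learned denoiser removes a little noise at each step until a new sample emerges.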
Although the learning idea resembles an encoding-decoding process, the gradual learning process gives diffusion models a new quality. GenAI is not only a major technological breakthrough, but also a socio-technical organizational infrastructure (Lambiase et al., 2025) that shapes the future of work (Jackson et al., 2025; Poličar et al., 2025; X. Wang et al., 2024), decision-making (Russo, 2024), and digital platforms. Storey et al. (2025) argue that GenAI’s conversational interfaces, open-ended generativity, and human-like communication capabilities create new forms of human–AI interaction and hybrid intelligence. Similarly, research on decision-support systems highlights the potential of GenAI to synthesize information, generate alternative scenarios, and serve as an interactive interface to organizational analytics and databases (Albashrawi, 2025; M.M. Li et al., 2026). At the ecosystem level, GenAI is reshaping digital platforms by enabling AI-augmented services, new forms of value creation, and changes in competitive dynamics (Wessel et al., 2025). GenAI is also a challenging technology. Its probabilistic nature, and the implications of this, are sometimes not easy to grasp (Bommasani et al., 2021; M.M. Li et al., 2026). Additionally, there are well-known challenges concerning fairness, transparency, provenance, and model explainability, which GenAI shares with other machine-learning algorithms. Therefore, the integration of GenAI models into organizations requires new frameworks for responsible AI and human oversight (Goldberg et al., 2026; Huang et al., 2026; M.M. Li et al., 2026). In general, GenAI represents a fundamental shift in information production and user-machine interaction.
The combination of a general-purpose decision-support tool with multi-modal content and easy integration into organizational processes already makes it an indispensable technology for IS research related to digital transformation, organizational decision-making, human–machine collaboration, and responsible technology management.

2.2 Information Systems Perspective

While Information Systems (IS) and Software Engineering (SE) share a common interest in the design and development of Information Technology (IT) for human use, IS research is distinguished by its socio-technical emphasis. In IS, information systems are viewed not only as technical systems but also as social and organizational ones, comprising technical, organizational, and semiotic components (e.g., (Alter, 2008; Lyytinen & Newman, 2008; Orlikowski & Iacono, 2001; Walsham, 2012)). The socio-technical approach was already widely discussed in IS during the 1970s and 1980s, with the ETHICS and SSM methodologies as examples advocating it. The socio-technical approach argues for the joint optimization of both social and technical components, recognizing their interaction (Mumford, 1983). Within such a framing, IS as a discipline is broadly interested in the application of IT artifacts to support particular task(s) embedded within particular context(s), with an aim to increase our understanding of IS design, use and impacts, i.e. of “(1) how IT artifacts are conceived, constructed, and implemented, (2) how IT artifacts are used, supported, and evolved, and (3) how IT artifacts impact (and are impacted by) the contexts in which they are embedded” (Benbasat & Zmud, 2003). As Benbasat and Zmud (2003) indicate, IS research is interested in a variety of aspects intertwined with the design, use and impacts of IT.
IS research has ranged from individual to organizational and society-level analyses, including studies, for example, on individual technology acceptance (Davis, 1989), organizational digital transformation (Cavalcante, Varajão, & Silva Rodrigues, 2025; Vial, 2021), and societal-level implications of IT (Butler, Gozman, & Lyytinen, 2023). The discipline, with its socio-technical emphasis, has since its early days had an interest in power, politics, and ethics as intermingled with IT (e.g., (Hirschheim & Klein, 1989; Markus, 1983; Mumford, 1983)). Recently, there has been increased interest in addressing such concerns (e.g., (Pang, Kankanhalli, Aanestad, Ram, & Maruping, 2024; Walsham, 2012)), especially in the context of emerging technologies, including AI, and their regulation (e.g., (Berente, Gu, Recker, & Santhanam, 2021; Butler et al., 2023; Marabelli, Newell, Ahuja, & Galliers, 2025)). Moreover, while the original focus of IS research has been on work, organizations, business and management (e.g., (Alter, 2008; Lyytinen & Newman, 2008; Orlikowski & Iacono, 2001; Walsham, 2012)), during the past decades, it has been acknowledged that IT has spread to different everyday contexts and life spheres beyond work and organizations (Yoo, 2010), with IS researchers nowadays addressing various user groups, usages, and impacts of IT in diverse everyday contexts, including, for example, studies with children, people with special needs, or marginalized communities (e.g., (Iivari, Kinnula, Molin-Juustila, & Kuure, 2018; Majchrzak, Markus, & Wareham, 2016; Pang et al., 2024; Wass, Thygesen, & Purao, 2023)). Overall, IS as a discipline addresses a broader range of concerns compared to SE, which is a more technology- and engineering-focused discipline, even if SE, too, has long acknowledged the significance of human and social factors for SE practice (e.g., (Boehm, 2006; Sharp & Robinson, 2005)), including recently in relation to GenAI (Russo et al., 2024).
3 Related Work

For this study, we consider SLRs, tertiary studies, other literature surveys, and also research agendas or roadmaps on GenAI from other computing disciplines as related work. As our SLR was focused specifically on IS literature, such papers from related computing disciplines were not included in our SLR, whereas related IS research agendas and secondary studies are already covered as a part of our SLR results. Thus, in this section, we discuss related work not covered by our SLR. Primarily, we discuss related work from the field of software engineering (SE) specifically. As there is a plethora of related work published in related disciplines, we aim to give a general overview of it, acknowledging some of the more directly related work published outside the IS literature. We therefore focus on related work discussing GenAI or LLMs more generally, as opposed to work with more specific scopes related to GenAI (e.g., GenAI specifically for software testing). Moreover, we limit this discussion to peer-reviewed works, thus excluding the numerous pre-prints on the topic. First, the related work includes SLRs and other literature surveys from related disciplines. In terms of more general SLRs, Zheng et al. (2025) and Hou et al. (2024) both review studies on using LLMs for SE overall, with the studies published in 2025, 2024, and 2023 respectively. While Karlovs-Karlovskis (2024) discuss GenAI for “optimising the software engineering process”, their SLR is similarly wide in scope, covering various SE use cases. Fan et al. (2023) also conduct a survey of literature on LLMs for SE overall. Bazzan et al. (2024) conduct an MLR on the role of GenAI in SE overall. Finally, more specific but still quite general, Cornide-Reyes, Monsalves, Durán, Silva-Aravena, and Morales (2025) conduct an SLR on GenAI in Agile SE.
Beyond such more general literature reviews and surveys, we can identify various reviews and surveys with more specific foci. These include the following areas, among others; this is by no means an exhaustive list or survey, as such studies are numerous.

Security and privacy: Yao et al. (2024), Xu et al. (2024), Hasanov, Virtanen, Hakkala, and Isoaho (2024), and Chen et al. (2025) all review literature on LLMs in relation to security, such as code security or cybersecurity overall.

Software testing: Qi, Hou, Lin, Bao, and Xu (2024) and J. Wang et al. (2024) survey literature related to software testing with LLMs, in addition to a number of literature reviews on using AI overall for software testing.

Requirements engineering: both Cheng et al. (2025) and Hemmat, Sharbaf, Kolahdouz-Rahimi, Lano, and Tehrani (2025) conduct SLRs on GenAI for requirements engineering.

These are but three examples of some of the more specific foci seen in literature studies from related disciplines, among many others, as providing a systematic review of such studies is out of the scope of this paper. Second, we consider research agendas or roadmaps from related disciplines as related work. Nguyen-Duc et al. (2025) present a research agenda for GenAI for SE overall. While the research roadmap of Ahmed et al. (2025) discusses AI overall for SE rather than GenAI specifically, much of their roadmap is nonetheless related to recent advances in GenAI and LLMs. While we were able to identify various research agenda papers discussing GenAI as a part of an agenda otherwise focused on some other topic, we have only included papers more focused on GenAI specifically here. In Table 1, we provide an overview of related work from other computing disciplines with a brief comparison to our work. The table includes only related work with a general viewpoint.
This means that it includes papers discussing "GenAI in SE", but not papers focused on more specific application contexts like "GenAI in requirements engineering (in SE)", of which there are currently plenty, as we have highlighted earlier in this section. In brief, we identified only a small number of research roadmaps and agendas with a viewpoint as general as ours: two such SE papers. However, neither of these was supplemented by a systematic review of literature like ours. While we have included in the table some SLRs focused on primary studies, we were unable to identify tertiary studies focusing on GenAI like ours for the time being.

4 Method

The rapidly expanding body of research on GenAI presents multiple avenues for evidence synthesis, ranging from aggregating individual primary studies to conducting higher-level reviews that integrate consolidated knowledge. While synthesizing primary studies provides granular insights into specific phenomena, the GenAI-in-IS literature has evolved in a way that disperses contributions across heterogeneous empirical designs, disciplinary contexts, and emerging conceptual framings. In contrast, the field has begun to produce secondary studies and research agenda papers that already distill key themes and articulate future directions.

Table 1 Comparison of our study to related work identified from other computing disciplines.

• Zheng et al., "Towards an understanding of large language models in software engineering tasks". RQ1: What are the current works focusing on combining LLMs and software engineering? RQ2: Can LLMs truly help better perform current software engineering tasks? Difference: focus on SE domain, performance metrics; not a research roadmap or agenda.

• Hou et al., "Large language models for software engineering: A systematic literature review". RQ1: What LLMs have been employed to date to solve SE tasks? RQ2: How are SE-related datasets collected, pre-processed, and used in LLMs? RQ3: What techniques are used to optimize and evaluate LLM4SE? RQ4: What SE tasks have been effectively addressed to date using LLM4SE? Difference: focus on SE domain, SE tasks, datasets, and LLM optimization and evaluation techniques for SE; not a research roadmap or agenda.

• Karlovs-Karlovskis, "Generative Artificial Intelligence Use in Optimising Software Engineering Process: A Systematic Literature Review". RQ1: Which Software Engineering sub-fields are actively experimented on with Generative AI and which ones are underrepresented? RQ2: Who are the active researchers in the field for future collaboration? RQ3: What are the most common research methods used in the field of Generative AI for Software Engineering research? Difference: focus on SE domain, focus on methodology; not a research roadmap or agenda.

• Fan et al., "Large language models for software engineering: Survey and open problems". No RQs/ROs explicitly defined; implicit RO statement: "This paper surveys the recent developments, advances and empirical results on LLM-based SE; the application of Large Language Models (LLMs) to Software Engineering (SE) applications." Difference: focus on SE domain, SE use cases; not a research roadmap or agenda.

• Bazzan et al., "Analysing the Role of Generative AI in Software Engineering: Results from an MLR". RQ1: What is Generative AI? RQ2: How is Generative AI used in Software Engineering? RQ3: What are the benefits associated with using Generative AI in Software Engineering? RQ4: What are the risks associated with using Generative AI in Software Engineering? Difference: focus on SE domain; RQ3 and RQ4 are very similar to our RQ2, but that review focuses on primary studies and grey literature; not a research roadmap or agenda.

• Cornide-Reyes et al., "Generative Artificial Intelligence in Agile Software Development Processes: A Literature Review Focused on User eXperience". RQ1: How can GenAI tools optimize user experience in agile software development projects? RQ2: What are agile teams' main challenges when integrating GenAI tools into software development projects? RQ3: What stages of the agile software development cycle benefit from implementing GenAI tools? Difference: focus specifically on agile software development (ISD or SE); not a research roadmap or agenda.

• Nguyen-Duc et al., "Generative artificial intelligence for software engineering—A research agenda". No RQs/ROs explicitly defined; implicit RO statement formulated as challenge/gap: "we do not have an overall picture of the current state of GenAI technology in practical software engineering usage scenarios." Difference: focus on SE domain, practical SE use cases; no SLR conducted as part of the research agenda.

• Ahmed et al., "Artificial Intelligence for Software Engineering: The Journey so far and the Road ahead". No RQs/ROs explicitly defined; implicit RO statement: [to] "highlight the recent deep impact of artificial intelligence on software engineering by discussing successful stories of applications of artificial intelligence to classic and new software development challenges". Difference: focus on SE domain; no SLR conducted as part of the research roadmap.

• Kotti et al., "Machine learning for software engineering: A tertiary study". No RQs/ROs explicitly defined; contributions formulated as follows: "we identify what SE tasks have been tackled with ML techniques, which SE knowledge areas could be better covered by ML techniques as well as the prominent ML techniques applied in SE. We also provide a classification scheme for categorizing ML techniques in SE along four axes." Difference: focus on SE domain; tertiary study, but focused on AI/ML overall, not GenAI specifically.
4.1 Methodological Framework

Reviewing this specific layer of literature—comprising both retrospective syntheses and prospective agendas—necessitates a methodological strategy that extends beyond standard protocols typically designed for synthesizing homogeneous sets of primary studies. To address this challenge, we adopted a tailored methodological approach informed by established guidelines from multiple complementary sources. Rather than rigidly adhering to a single SLR standard, we integrated the foundational SLR guidelines by Kitchenham, Charters, et al. (2007) for structuring the review protocol with the practical evidence synthesis procedures from Brereton, Kitchenham, Budgen, Turner, and Khalil (2007) to guide our search and screening strategy. To ensure disciplinary alignment with Information Systems, particularly regarding our qualitative synthesis, we incorporated the methodological guidance by Bandara, Miskon, and Fielt (2011), while simultaneously consulting recent recommendations for software engineering secondary studies by Kitchenham, Madeyski, and Budgen (2023b) to address the specific methodological requirements of tertiary-level synthesis. Additionally, we adopted the framework by Ampatzoglou, Bibi, Avgeriou, Verbeek, and Chatzigeorgiou (2019) to structure our Threats to Validity section. This multi-perspective approach allowed us to maintain methodological rigor while flexibly accommodating the distinctive requirements of synthesizing both secondary studies and research agenda papers within the Information Systems domain.

4.1.1 Rationale for Hybrid Synthesis Design

This study adopts a hybrid synthesis design that integrates two methodologically distinct literature types: (1) secondary studies that retrospectively synthesize empirical evidence from primary research, and (2) research agenda papers that prospectively articulate expert-driven propositions about future directions.
We explicitly justify this design choice and explain how it enhances rather than compromises methodological rigor. The rationale for combining these literature types stems from the nascent and rapidly evolving nature of the GenAI field. As noted by Kitchenham et al. (2023b), mixed-methods approaches are “particularly important for industry-based interventions, when outcomes are influenced by the complex nature of the relationship between the intervention and its environment.” GenAI in IS represents precisely such a complex intervention: its impacts span technical, organizational, and societal dimensions, and its rapid evolution means that empirical evidence inevitably lags behind practitioner experience and expert foresight. Secondary studies (such as SLRs and scoping reviews) provide retrospective synthesis of what empirical research has established about GenAI’s benefits, challenges, and applications. However, in a field where the foundational technology (transformer-based Large Language Models) and tools like ChatGPT have only been widely accessible since late 2022, such retrospective evidence necessarily captures only a narrow temporal window and may not reflect emerging concerns or opportunities. Research agenda papers, authored by domain experts, provide prospective analysis of where the field should direct its attention. These papers synthesize expert judgment, identify gaps in current knowledge, and propose directions that may not yet have accumulated sufficient empirical evidence for inclusion in traditional secondary studies. By synthesizing both types of literature, this study provides a more complete picture than either source alone could offer. The retrospective evidence from secondary studies grounds our findings in empirical reality, while the prospective propositions from research agendas extend our analysis to emerging concerns and future-oriented recommendations. 
This approach aligns with the fourth step of Evidence-Based Software Engineering (EBSE), which concerns "integrating the critical appraisal with software engineering expertise and stakeholders' values" (Kitchenham, Budgen, & Brereton, 2016). Importantly, we maintain methodological transparency by clearly distinguishing these two evidence streams throughout our analysis. Tables 4 and 5 separately catalogue secondary studies (coded S01–S18) and research agenda papers (coded R01–R10). In our synthesis, we attribute findings to their source type, enabling readers to assess the evidentiary basis for each synthesized theme. This approach is consistent with the recommendation by Kitchenham, Madeyski, and Budgen (2023a) that "it is critical that readers know the provenance of all recommendations, so they can properly judge their credibility."

4.1.2 Differentiated Treatment of Literature Types

While our hybrid synthesis integrates findings from both secondary studies and research agenda papers, we acknowledge that these literature types have fundamentally different epistemological characteristics, which necessitate differentiated treatment in quality assessment and synthesis.

Epistemological Distinctions. Secondary studies (S01–S18) represent systematic aggregations of primary empirical research. Their value derives from the rigor of their review methodology and the quality of the primary studies they synthesize. As such, they can be assessed using established criteria for evaluating systematic reviews, such as the DARE (Database of Abstracts of Reviews of Effects) criteria (Budgen, Brereton, Drummond, & Williams, 2018). Secondary studies report findings that are, in principle, traceable to underlying primary evidence. Research agenda papers (R01–R10) represent expert-driven conceptualizations of future research needs.
Their value derives from the expertise and insight of their authors, the comprehensiveness of their environmental scanning (which involves monitoring and analyzing the broader external context—technological, social, and industrial trends—to identify emerging threats and opportunities that should guide future research directions), and the actionability of their proposed directions. These papers do not claim to synthesize empirical evidence in the same manner as secondary studies; rather, they offer informed scholarly judgment about where research attention should be directed. As Kitchenham et al. (2023a) note, expert opinion can provide valuable insights but should be distinguished from empirically-grounded evidence.

Synthesis Strategy. Our synthesis employs a parallel-but-integrated approach. During thematic analysis, we coded both literature types using the same coding scheme to enable the identification of common themes. However, we maintain attribution to source type throughout, allowing readers to distinguish empirically-grounded findings (derived primarily from secondary studies) from expert-proposed directions (derived primarily from research agendas). In the Discussion section, we analyze findings across these two evidence streams. Where secondary studies and research agendas converge on the same themes, this convergence strengthens confidence in those findings. Where they diverge—for example, where research agendas propose directions not yet reflected in secondary study findings—this may indicate emerging areas that have not yet attracted sufficient empirical attention. Conversely, where secondary studies report challenges that research agendas do not identify, this may indicate blind spots in current expert discourse. This approach is consistent with mixed-methods review methodology as described by Harden et al.
(2018): qualitative and quantitative findings are synthesized separately, then compared and integrated to produce more nuanced overall findings. In our case, we adapt this approach to integrate retrospective evidence synthesis with prospective expert judgment.

Implications for Interpretation. Readers should interpret our synthesized findings with awareness of their evidentiary basis. Findings supported primarily by secondary studies (e.g., many of the challenges in categories C1–C4) have stronger empirical grounding but may reflect a narrower temporal window. Findings supported primarily by research agendas (e.g., some of the future directions in categories F1–F6) represent expert consensus about important directions but await empirical validation. Findings supported by both types of literature represent the strongest form of convergent evidence available in this hybrid synthesis.

4.1.3 Reproducibility and Tool Support

To support transparency, reproducibility, and methodological scrutiny, we provide a complete replication package—including search logs, screening decisions, extraction sheets, and coding scripts—on GitHub (https://github.com/przybylek/GenAI4IS). Several tools supported the process: Zotero for reference management and PDF annotation, Rayyan for the collaborative screening of bibliographic records, Google Sheets for structured data extraction, Python for documentation and audit trail analysis, and PowerPoint for thematic visualizations.

4.2 Research Questions

The review was guided by three research questions (RQs), each targeting a distinct dimension of the current knowledge landscape on GenAI in IS:

RQ1: What is the landscape of secondary studies and research agenda papers on GenAI in Information Systems?
  RQ1.1: What types of papers have been published (e.g., SLR, mapping study, research agenda)?
  RQ1.2: What are the publication trends over time and across venues?
  RQ1.3: What specific application sectors are covered?
RQ2: What benefits and challenges have been identified in the use of Generative AI within Information Systems?
RQ3: What research gaps and future research directions have been proposed?

RQ1 examines the structure of the evidence base—types of studies, publication trends, and application sectors. RQ2 synthesizes the reported benefits and challenges associated with the use of GenAI in IS. RQ3 aggregates reported research gaps and recommended research directions. Together, these questions enable a comprehensive view of the present state of knowledge and the projected research trajectory in this rapidly evolving domain.

4.3 Search Strategy

The search strategy was designed to identify relevant secondary studies and research agenda papers on GenAI in IS. We targeted three bibliographic repositories that provide broad and authoritative coverage of IS research: Scopus, the Web of Science Core Collection (WoS), and the AIS eLibrary (AISeL).

4.3.1 Search String Construction

Following established guidelines (Kitchenham, 2004), we constructed a search string composed of three distinct fragments, joined by the AND operator:

• Study Type: targets terms identifying secondary studies (e.g., systematic review or mapping study) or forward-looking papers (e.g., "research agenda" or "roadmap").
• Phenomenon: includes keywords for the core topic of interest—Generative AI—using both broad terms (e.g., "generative AI") and specific, highly prevalent examples (e.g., "ChatGPT") to maximize coverage.
• Domain: scopes the search using terms that characterize the IS discipline or closely related areas.
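The three-fragment construction can be sketched mechanically. The snippet below is an illustration only: the term lists are abbreviated subsets, not the full lists from the generic search string given in the paper.

```python
# Illustrative sketch: assembling a three-fragment Boolean search string.
# Term lists are abbreviated subsets of the study's full lists.
study_type = ['SLR', '"systematic review"', '"mapping study"',
              '"research agenda"', '"roadmap"']
phenomenon = ['"generative AI"', '"GenAI"', '"large language model"',
              '"ChatGPT"']
domain = ['"information systems"', '"MIS"', '"information technology"']

def fragment(terms):
    # Alternative terms within one fragment are joined with OR.
    return "(" + " OR ".join(terms) + ")"

# The fragments themselves are conjoined with AND, as in Section 4.3.1.
query = " AND ".join(fragment(f) for f in (study_type, phenomenon, domain))
print(query)
```

Each database then receives a syntactic adaptation of this string (field wrappers, quoting rules) while the Boolean structure stays the same.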
The generic search string is as follows:

(SLR OR "systematic review" OR "systematic mapping" OR "mapping study" OR "literature survey" OR "literature review" OR "scoping review" OR "meta-analysis" OR "tertiary study" OR "secondary study" OR "research agenda" OR "roadmap") AND ("large language model" OR "LLM" OR "generative AI" OR "GenAI" OR "Gen AI" OR "generative artificial intelligence" OR "ChatGPT" OR "GPT" OR "conversational AI" OR "artificial intelligence language model" OR "AI language model") AND ("information systems" OR "MIS" OR "information technology" OR "informatics" OR "project management")

4.3.2 Search Execution

The search string was adapted to the syntax and field constraints of each database while preserving its semantic integrity. We limited the search to publications from 2023 onward and executed it on April 12, 2025. This temporal scope was intentionally selected to capture the accelerated surge of GenAI-related research that followed the public release of ChatGPT in late 2022, ensuring that our review reflects the period in which substantive scholarly engagement with GenAI began to emerge. Table 2 presents the number of records returned from each database and the specific fields that were queried. The initial search yielded a total of 242 documents before duplicate removal.

Table 2 Database Search Strategy and Initial Results

Database              | Fields Searched                   | Results
Scopus                | TITLE-ABS-KEY                     | 105
AIS eLibrary (AISeL)  | All Metadata Fields               | 102
Web of Science (WoS)  | Title, Abstract, Author Keywords  | 35
Total Initial Records |                                   | 242

4.4 Inclusion and Exclusion Criteria

To systematically filter the studies identified during the search, we established a set of inclusion and exclusion criteria. These criteria were formulated based on our research questions to ensure that only the most relevant studies were retained for data extraction and synthesis.
The inclusion criteria (IC) define the necessary attributes for a paper to be included in our review, while the exclusion criteria (EC) specify conditions that lead to a paper's exclusion. A study was carried forward to the analysis phase only if it met all inclusion criteria and met none of the exclusion criteria. The criteria are detailed in Table 3.

Table 3 Inclusion and Exclusion Criteria

Inclusion Criteria:
IC1: The paper is classified as a secondary study or a research agenda/roadmap paper.
IC2: The paper's primary focus is on Generative AI; a superficial or passing mention is not sufficient.
IC3: The study is situated within the context of Information Systems or a closely related domain (such as Information Technology or Project Management).
IC4: The paper is a peer-reviewed publication (e.g., journal article, conference paper).
IC5: The paper provides sufficient methodological detail to allow for an assessment of its rigor (e.g., describes the search process or analysis method).
IC6: The paper is written in English.

Exclusion Criteria:
EC1: The paper is a purely bibliometric or scientometric analysis without a qualitative synthesis of the literature's contributions.
EC2: The paper is written in a language other than English.
EC3: The full text of the paper could not be retrieved through institutional subscriptions or publicly available archives.
EC4: The paper is a duplicate of another study already included in the review.

Database links: https://w.scopus.com/ ; https://w.webofknowledge.com ; https://aisel.aisnet.org/

4.5 Study Selection

The study selection process followed a multi-stage filtering approach to systematically reduce the initial set of 242 documents to the final set of relevant studies. The entire process is visually summarized in the PRISMA flow diagram in Figure 1.
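The selection rule stated above (all inclusion criteria satisfied, no exclusion criterion triggered) can be expressed as a simple predicate. The criteria below are hypothetical, machine-checkable stand-ins for illustration; the real IC1–IC6 and EC1–EC4 were judged by human reviewers.

```python
# Sketch of the Section 4.4 selection rule with hypothetical
# stand-ins for a few criteria (not the study's actual screening code).
def eligible(paper, inclusion_criteria, exclusion_criteria):
    # A paper advances only if it meets ALL inclusion criteria
    # and NONE of the exclusion criteria.
    return (all(ic(paper) for ic in inclusion_criteria)
            and not any(ec(paper) for ec in exclusion_criteria))

inclusion = [
    lambda p: p["peer_reviewed"],      # stand-in for IC4
    lambda p: p["language"] == "en",   # stand-in for IC6
]
exclusion = [
    lambda p: p["bibliometric_only"],  # stand-in for EC1
]

paper = {"peer_reviewed": True, "language": "en", "bibliometric_only": False}
print(eligible(paper, inclusion, exclusion))  # True
```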
Fig. 1 PRISMA flow diagram (Haddaway, Page, Pritchard, & McGuinness, 2022). [Figure summary: records identified from databases (n = 3): Scopus (n = 105), AIS (n = 102), WoS (n = 35); duplicate records removed before screening (n = 32); records screened (n = 210); records excluded (n = 161); reports sought for retrieval (n = 49); reports not retrieved (n = 1); reports assessed for eligibility (n = 48); reports excluded: not a secondary study or research agenda (n = 5), focus not on GenAI (n = 6), insufficient methodological rigor (n = 3), non-IS context (n = 2), non-English language (n = 3), duplicate (n = 1); studies included in review (n = 28): secondary studies (n = 18), roadmaps (n = 10).]

All bibliographic records were exported in RIS format from the three databases and imported into Rayyan.ai, which automatically detected and flagged duplicates. After removing duplicates, 210 unique records remained for screening.

4.5.1 Phase 1: Title and Abstract Screening

Three researchers independently screened the titles and abstracts of the 210 records using the predefined inclusion and exclusion criteria (Table 3). Prior to screening, a calibration meeting (17 April 2025) was held to ensure consistent interpretation of the criteria. Independent assessments produced an initial inter-rater agreement of 74%. All conflicts (54 cases) and papers marked "maybe" by one or more reviewers (33 cases) were resolved in a consensus meeting on 15 May 2025, occasionally supported by brief full-text inspection. This phase resulted in the exclusion of 161 papers and the advancement of 49 studies to the next phase. The list of these studies is available in the online appendix (https://github.com/przybylek/GenAI4IS).

4.5.2 Phase 2: Integrated Full-Text Screening and Data Extraction

To improve both rigor and efficiency, we combined full-text screening with data extraction into a single integrated phase.
This ensured that inclusion decisions were made only after a complete and detailed reading of each study, reducing the likelihood of erroneous exclusions and producing higher-quality extracted data.

Pilot Phase and Protocol Refinement. The phase began with a pilot in which the three core researchers independently applied a draft extraction protocol to three common papers. Insights from the pilot were used to refine the extraction form, clarify decision rules, and establish a shared approach to evaluating borderline cases. A consensus meeting was then used to finalize the protocol, after which three additional researchers were trained, bringing the total team to six reviewers.

Execution and Quality Assurance. Before screening, all full texts were verified in Zotero. Four papers were excluded at this stage: one previously undetected duplicate, two non-English papers, and one paper with an inaccessible full text (despite attempts to contact the authors). The remaining 42 studies were distributed across the six reviewers, who conducted full-text reading while simultaneously extracting data into the shared Google Sheet and annotating relevant PDF passages. Direct quotations were recorded verbatim to maintain traceability. To ensure consistent application of the protocol, the core researchers hosted eight weekly alignment meetings where reviewers could raise questions and discuss ambiguous cases. A quality audit revealed deficiencies in one reviewer's extraction work; the reviewer subsequently withdrew, and their assigned papers were re-evaluated by a new (trained) team member. Following the integrated screening and extraction, 17 papers were excluded for failing the inclusion criteria upon full-text inspection. This resulted in a final set of 28 studies: 18 secondary studies and 10 research agenda papers. Each included study was assigned a unique identifier—"S#" for secondary studies and "R#" for agenda papers—as listed in Table 4 and Table 5, respectively.
The entire selection phase spanned approximately seven weeks and concluded on 16 July 2025.

Table 4 List of included secondary studies (S studies). Columns: ID | Authors | Title | Venue type | Venue | WoS | Year.

S01 | Meng, X. et al. | The application of large language models in medicine: a scoping review | Journal | iScience | Q1 | 2023
S02 | Beheshti, M. et al. | Evaluating the reliability of ChatGPT for health-related questions: a systematic review | Journal | Informatics | Q3 | 2025
S03 | Bellanda, V.C.F. et al. | Applications of ChatGPT in the diagnosis, management, education, and research of retinal diseases: a scoping review | Journal | International Journal of Retina and Vitreous | Q2 | 2024
S04 | Bendig, D. & Bräunche, A. | The role of artificial intelligence algorithms in information systems research: a conceptual overview and avenues for research | Journal | Management Review Quarterly | Q1 | 2024
S05 | Bracken, A. et al. | Artificial Intelligence (AI)-powered documentation systems in healthcare: a systematic review | Journal | Journal of Medical Systems | Q1 | 2025
S06 | Clear, T. et al. | AI integration in the IT professional workplace: a scoping review and interview study with implications for education and professional competencies | Conference | Innovation and Technology in Computer Science Education | NA | 2025
S07 | Ghebrehiwet, I. et al. | Revolutionizing personalized medicine with generative AI: a systematic review | Journal | Artificial Intelligence Review | Q1 | 2024
S08 | Gumusel, E. | A literature review of user privacy concerns in conversational chatbots: a social informatics approach: an annual review of information science and technology (ARIST) paper | Journal | Journal of the Association for Information Science and Technology | Q1 | 2025
S09 | Laine, J. et al. | Understanding the ethics of Generative AI: established and new ethical principles | Journal | Communications of the Association for Information Systems | Q3 | 2025
S10 | Lareyre, F. et al. | Comprehensive review of natural language processing (NLP) in vascular surgery | Journal | EJVES Vascular Forum | Q2 | 2023
S11 | Li, M. & Guenier, A.W. | ChatGPT and health communication: a systematic literature review | Journal | International Journal of E-Health and Medical Communications | Q4 | 2024
S12 | Maita, I. et al. | Pros and cons of Artificial Intelligence-ChatGPT adoption in education settings: a literature review and future research agendas | Journal | IEEE Engineering Management Review | — | 2024
S13 | Mambile, C. & Ishengoma, F. | Exploring the non-linear trajectories of technology adoption in the digital age | Journal | Technological Sustainability | — | 2024
S14 | Mohammad, B. et al. | The pros and cons of using ChatGPT in medical education: a scoping review | Conference | Healthcare Transformation with Informatics and Artificial Intelligence | NA | 2023
S15 | Onatayo, D. et al. | Generative AI applications in architecture, engineering, and construction: trends, implications for practice, education & imperatives for upskilling—a review | Journal | Architecture | Q3 | 2024
S16 | Ouanes, K. | Generative artificial intelligence in healthcare: current status and future directions | Journal | Italian Journal of Medicine | Q4 | 2024
S17 | Pool, J. et al. | Large language models and generative AI in telehealth: a responsible use lens | Journal | Journal of the American Medical Informatics Association | Q1 | 2024
S18 | Schneider, J. | Explainable Generative AI (GENXAI): a survey, conceptualization, and research agenda | Journal | Artificial Intelligence Review | Q1 | 2024

Table 5 List of included roadmaps (R studies). Columns: ID | Authors | Title | Venue type | Venue | WoS | Year.

R01 | Wei, X. et al. | Addressing bias in Generative AI: challenges and research opportunities in information management | Journal | Information & Management | Q1 | 2024
R02 | Chau, M. & Xu, J. | An IS research agenda on Large Language Models: development, applications, and impacts on business and management | Journal | ACM Transactions on Management Information Systems | Q2 | 2025
R03 | Dwivedi, Y.K. et al. | GenAI's impact on global IT management: a multi-expert perspective and research agenda | Journal | Journal of Global Information Technology Management | Q2 | 2025
R04 | Feuerriegel, S. et al. | Generative AI | Journal | Business & Information Systems Engineering | Q1 | 2024
R05 | Haase, J. et al. | Interdisciplinary directions for researching the effects of robotic process automation and large language models on business processes | Journal | Communications of the Association for Information Systems | Q3 | 2024
R06 | Jarvenpaa, S. & Klein, S. | New frontiers in information systems theorizing: human-gAI collaboration | Journal | Journal of the Association for Information Systems | Q1 | 2024
R07 | Nah, F.F.H. et al. | An activity system-based perspective of generative AI: challenges and research directions | Journal | AIS Transactions on Human-Computer Interaction | — | 2023
R08 | Sigala, M. et al. | ChatGPT and service: opportunities, challenges, and research directions | Journal | Journal of Service Theory and Practice | Q2 | 2024
R09 | Srivastava, A. et al. | The present and future of AI: ethical issues and research opportunities | Journal | Communications of the Association for Information Systems | Q3 | 2025
R10 | Storey, V.C. et al. | Generative Artificial Intelligence: Evolving technology, growing societal impact, and opportunities for information systems research | Journal | Information Systems Frontiers | Q1 | 2025

4.6 Data Extraction

As described in Section 4.5.2, data extraction was conducted concurrently with full-text screening during Phase 2. Data was extracted using a structured form implemented in Google Sheets. This form was developed based on our RQs and iteratively refined during the pilot phase. Table 6 details the data fields, their descriptions, and the research questions they primarily address.

Table 6 Data Extraction Form (data field: description; mapped RQ(s))

Bibliographic & Contextual Data
• Type of publication: classification as either "Journal" or "Conference". (RQ1.2)
• IS Application sectors: the industry or organizational contexts in which GenAI applications are examined (e.g., healthcare, education, manufacturing). (RQ1.3)

Methodological Details, Secondary Studies
• Type of Secondary Study: the specific review method used (e.g., SLR, SMS). (RQ1.1)
• Databases searched: list of electronic databases used to find primary literature. (RQ1)
• Number of primary studies included: the total count of primary studies included in the review. (RQ1)
• Publication date range: the publication date range of the primary studies included. (RQ1)

Methodological Details, Research Agendas
• Research method used: the research method used to identify research gaps and/or directions. (RQ1.1)

Data for Qualitative Synthesis
• Reported benefits: quoted or summarized benefits, advantages, or opportunities. (RQ2)
• Reported challenges or limitations: quoted or summarized challenges, risks, ethical concerns, or limitations. (RQ2)
• Suggestions about gaps or future research: identified research gaps and specific suggestions for future work. (RQ3)

4.7 Quality Assessment

Following the SEGRESS guidelines (Kitchenham et al., 2023b), we conducted quality assessment of all included studies. Given the hybrid nature of our synthesis—combining secondary studies with research agenda papers—we applied differentiated assessment criteria appropriate to each literature type.

4.7.1 Assessment of Secondary Studies

For the 18 secondary studies (S01–S18), we applied the DARE criteria, which are specifically designed for assessing the methodological quality of systematic reviews (Budgen et al., 2018). The five DARE criteria are as follows:

1. Are the review's inclusion and exclusion criteria described and appropriate?
2. Is the literature search likely to have covered all relevant studies?
3. Did the reviewers assess the quality/validity of the included studies?
4. Were basic data/studies adequately described?
5. Were the included studies synthesised?
Each criterion was rated as Yes (1.0), Partly (0.5), or No (0.0). A composite quality score was calculated for each study as the arithmetic mean across the five criteria. The resulting scores and detailed ratings are reported in Section 4.7.3.

4.7.2 Assessment of Research Agenda Papers

The DARE criteria are designed for systematic reviews and are not directly applicable to research agenda papers, which represent expert-driven conceptualizations rather than systematic evidence syntheses. Following the principle of differentiated treatment outlined in Section 4.1.2, we developed custom quality criteria appropriate to this literature type:

1. Is the method for identifying research gaps or directions transparently described?
2. Are the proposed directions grounded in cited empirical evidence or prior reviews?
3. Are the proposed research directions actionable?
4. Does the paper consider more than one stakeholder perspective (e.g., technical, organizational, societal)?
5. Is the scope and context of applicability clearly delimited?

These criteria assess the transparency, empirical grounding, actionability, stakeholder comprehensiveness, and delimitation of research agenda papers. The same rating scale (0.0–1.0) was applied.

4.7.3 Quality Assessment Results

Each of the 28 studies was independently assessed by two reviewers using the defined criteria (140 rating items in total). Observed agreement was 69.3% (97/140 items), yielding Cohen's κ = 0.47 (z = 7.01, p < 0.001), indicating moderate agreement. All disagreements were subsequently resolved through discussion until consensus was reached.

Secondary Studies Results
Of the 18 secondary studies, three (S03, S08, S11) met all five DARE criteria, achieving a perfect score of 1.0. The most prevalent limitation concerned criterion 3, as many studies failed to formally assess the quality of the primary literature they synthesized. Table 7 presents the detailed ratings.
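For readers who wish to replicate the assessment, the composite scoring and inter-rater agreement described in Sections 4.7.1–4.7.3 reduce to short computations. The following Python sketch is illustrative only (the helper names are ours, not the authors'); S01's criterion ratings are taken from Table 7:

```python
from collections import Counter

# Rating scale from Section 4.7: Yes = 1.0, Partly = 0.5, No = 0.0.
RATING_VALUES = {"Yes": 1.0, "Partly": 0.5, "No": 0.0}

def composite_score(ratings):
    """Arithmetic mean of the criterion ratings for one study."""
    return sum(RATING_VALUES[r] for r in ratings) / len(ratings)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same set of items."""
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[lbl] / n) * (freq_b[lbl] / n)
                   for lbl in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# S01's ratings from Table 7 (Yes, Yes, Partly, Partly, Yes):
print(composite_score(["Yes", "Yes", "Partly", "Partly", "Yes"]))  # 0.8
```

Kappa corrects the raw agreement for the agreement two raters would reach by chance alone, which is why the reported 69.3% observed agreement corresponds to a more modest κ of 0.47.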
Table 7 Quality Assessment of Secondary Studies
Study | C1 | C2 | C3 | C4 | C5 | Score
S01 | Yes | Yes | Partly | Partly | Yes | 0.8
S02 | Yes | Partly | Partly | Yes | Yes | 0.8
S03 | Yes | Yes | Yes | Yes | Yes | 1.0
S04 | Partly | Yes | No | Yes | No | 0.5
S05 | Yes | Yes | Yes | Yes | Partly | 0.9
S06 | Yes | Partly | No | No | Partly | 0.4
S07 | Yes | Partly | Partly | Yes | Yes | 0.8
S08 | Yes | Yes | Yes | Yes | Yes | 1.0
S09 | Yes | Yes | No | No | Yes | 0.6
S10 | Partly | No | No | No | Partly | 0.2
S11 | Yes | Yes | Yes | Yes | Yes | 1.0
S12 | Yes | No | No | Partly | Partly | 0.4
S13 | Partly | Yes | No | Yes | Yes | 0.7
S14 | No | Yes | No | Yes | Partly | 0.5
S15 | No | Yes | No | Yes | Yes | 0.6
S16 | No | No | No | No | No | 0.0
S17 | Yes | Yes | No | Yes | Yes | 0.8
S18 | No | No | No | Yes | Yes | 0.4

Research Agenda Results
Among the 10 research agenda papers, one (R09) met all five custom criteria. The most common limitation was criterion 1 (methodological transparency), reflecting the expert-opinion nature of this document class. Table 8 presents the detailed ratings.

Table 8 Quality Assessment of Research Agenda Papers
Study | C1 | C2 | C3 | C4 | C5 | Score
R01 | No | Partly | Yes | Yes | Yes | 0.7
R02 | No | Yes | Yes | Partly | Yes | 0.7
R03 | Partly | Yes | Partly | Yes | Partly | 0.7
R04 | Partly | Partly | Partly | Yes | Partly | 0.6
R05 | Partly | Yes | Partly | Yes | Yes | 0.8
R06 | Partly | Partly | Yes | No | Partly | 0.5
R07 | Partly | Yes | Yes | Yes | Yes | 0.9
R08 | Partly | Yes | Yes | Yes | Yes | 0.9
R09 | Yes | Yes | Yes | Yes | Yes | 1.0
R10 | Partly | Yes | Partly | Yes | Partly | 0.7

Inclusion Strategy
Importantly, no study was excluded on the basis of its quality score. Because the IS field has produced only a limited corpus of secondary studies and research agendas on GenAI since late 2022, excluding lower-scoring contributions would narrow an already small evidence base and risk suppressing early yet potentially valuable insights.

4.7.4 Certainty of Evidence

To enable readers to assess confidence in each synthesized finding, we adopted a structured transparency approach. Rather than imposing a single assessor-driven confidence rating, which would introduce an additional layer of subjective interpretation, we make the evidential basis of each finding directly inspectable.
For every synthesized code (the most granular unit of our thematic structure), we report the complete set of contributing studies together with their individual quality scores. This design allows readers to immediately see how many studies support a given finding and the methodological rigor of those studies. This information is presented in Tables 9, 10, and 11 (Section 5.2).

4.8 Data Analysis and Synthesis

The analysis of the extracted data followed two distinct streams: (1) quantitative descriptive analysis for RQ1 and (2) a qualitative analysis informed by grounded theory coding techniques for RQ2 and RQ3.

For RQ1 (the landscape of research), we synthesized the structured data from the extraction form. This allowed us to characterize the literature and identify publication trends, dominant research areas, and key methodological parameters of the included studies.

For RQ2 (Benefits and Challenges) and RQ3 (Gaps and Future Research), we employed a hybrid thematic analysis inspired by grounded theory principles (Wolfswinkel, Furtmueller, & Wilderom, 2013). Our approach combined a deductive framework aligned with our research questions and an inductive, iterative coding process to ensure both theoretical coherence and openness to emerging insights. The analysis progressed through the following four phases:

1. Phase 1: Deductive Framework Application. The deductive component of our analysis was embedded directly into the data extraction form. The form included separate columns corresponding to our three high-level themes: Benefits, Challenges or limitations, and Gaps or future research. This design ensured that as data were extracted, they were immediately structured according to the top-down deductive framework derived from our RQs.
2. Phase 2: Open Coding. Within each main theme, two researchers independently conducted open coding to identify and label discrete concepts emerging from the text segments.
Following grounded theory conventions, this stage focused on breaking down the data into meaningful units that captured distinct ideas. Codes were continuously compared and refined in an iterative fashion (constant comparative analysis) to ensure conceptual clarity and consistency across studies. This phase was conducted iteratively until all the data within each thematic column were coded.
3. Phase 3: Axial Coding. After open coding, the researchers examined relationships and dependencies among codes to link related concepts and organize them into higher-order categories.
4. Phase 4: Refinement and Validation. The final step involved organizing the developed categories within each of the three main themes to ensure they were distinct, comprehensive, and accurately represented the underlying data, forming a coherent thematic structure.

Throughout all coding stages, three major consensus meetings served as critical quality assurance checkpoints, ensuring the consistent application of the coding protocol. In line with our commitment to transparency and replicability, the final codebooks and the fully annotated data spreadsheet are publicly available in our online repository [6]. The resulting coding schemes are detailed in Section 5.2. These schemes provide the structure for the synthesized findings addressing RQ2 and RQ3, which are presented in Section 6.

5 Results

5.1 Bibliometrics of Qualified Sources

Given how rapidly the topic is advancing, the vast majority of papers that qualified for data extraction and subsequent analysis were published in 2024 or 2025; only three papers dated back to 2023 (Figure 2). Roadmaps constituted 35.7% of the processed sources, and secondary studies 64.3%. Moreover, the quality of the sources can be regarded as high. All ten roadmaps were published in scientific journals.
Likewise, 16 of the 18 secondary studies were journal publications (bringing the total share of journal papers to 92.9%), with the remaining two published in conference proceedings. All papers had DOIs assigned.

Fig. 2 Papers Analyzed: Publication Years and Types of Venues

Of the journal papers, 23 came from scientific journals ranked by the Journal Citation Reports (2024 edition). In total, 42.3% of the journals in which the articles were published were classified in Q1, 19.2% in Q2, and 19.2% and 7.7% in Q3 and Q4, respectively. This is shown in Figure 3 (the X axis indicates citable items, the Y axis the impact factor, and the bubble size the number of sources included in our review published by a given journal). Springer Nature was the publisher most commonly responsible for papers found relevant to this study (6 occurrences), with the Association for Information Systems in second place (4 occurrences from the JCR plus one outside it). In fact, the Communications of the Association for Information Systems was the most common journal in our analysis (3 publications), followed by the Artificial Intelligence Review with two papers; the latter also had the highest Impact Factor in the group. With one exception (iScience), the journals were low- to medium-scale, with the number of citable items across two years ranging from 23 to 505.

[6] https://github.com/przybylek/GenAI4IS

Fig. 3 Journal Distribution in Terms of Journal Impact Factors and Citable Items

Fig. 4 Papers Analyzed: IS Application Sectors

The papers qualified for data extraction covered the bulk of universally recognized IS domains. As far as application sectors are concerned, the studies cover a substantial share of the distinct aggregates of industries and industry groups defined by the International Standard Industrial Classification of All Economic Activities (ISIC) (United Nations, 2008).
Human Health & Social Work Activities (listed within the ISIC classification as sector Q) was the sector most universally covered by the studies (Figure 4), with 15 different articles analyzing or providing implications for it. Among the other sectors that stood out, one should highlight (1) Information & Communication; (2) Professional, Scientific & Technical Activities; and (3) Education. The first covers, inter alia, a wide range of IS-centric activities related to software development, IT services, data processing, and communication via web portals and other IS-relevant media. Sector M (Professional, Scientific & Technical Activities), in turn, features a market research and public opinion polling division, which enables tracking GenAI enhancement of marketing-centric information systems.

As far as the secondary studies covered by our research are concerned, most were constructed around the SLR as the primary research method (Figure 5); half of the secondary studies under analysis were confirmed to employ this method with satisfactory rigor. The scoping review followed with 5 occurrences. It should be noted that one of the studies combined an SLR with a narrative review, which is reflected in Figure 5. In 8 cases, the PRISMA protocol was followed. Most secondary studies relied on more than one database. Scopus was the predominant choice, with Web of Science, Google Scholar, and PubMed also used more frequently than average (see Figure 5; only databases with multiple counts are displayed). A few studies queried individual publishers' databases directly. In one case, Google Scholar (like ResearchGate) was only searched manually, after scrutinizing reference lists, in an attempt to enrich the source list with gray literature. In another case, the database search was intentionally narrowed to a dozen leading IS journals.
The number of primary studies qualified for subsequent analysis ranged from 11 to 550, with a mean of 97.9 and a median of 41.5.

Fig. 5 Method-related Specificity of Secondary Studies

5.2 Overview of Themes and Categories

Our thematic analysis of the literature identified three overarching themes that structure the current discourse on GenAI in Information Systems: (1) the transformative Benefits that drive the technology's adoption (Figure 6); (2) the significant Challenges and Limitations that temper its deployment (Figure 7); and (3) the critical Research Gaps and Future Research Directions that chart a path forward for the academic community (Figure 8). Each theme comprises multiple categories with associated codes that emerged inductively from our systematic analysis. Below, we present each theme along with its constituent categories. A detailed exploration of the findings is provided in Section 6. To support the assessment of evidential strength, each theme is accompanied by a certainty-of-evidence table (Tables 9, 10, and 11) that lists every synthesized code alongside its contributing studies and their individual quality scores. These tables enable readers to directly inspect the methodological foundation of any finding of interest and to form their own judgment about the confidence warranted by the underlying evidence.

5.2.1 Benefits

This theme encompasses the wide-ranging benefits of GenAI across Information Systems domains. GenAI enhances healthcare and education, supports research and creative work, boosts software engineering productivity, improves accessibility and user experience, and advances data management and content generation, while underscoring the importance of ethical and responsible implementation.

Healthcare and Life Sciences [B1]
In healthcare and life sciences, GenAI improves information flow and clinical decision-making, and includes technologies that help patients manage and improve their health.
It supports more accurate and efficient clinical workflows, accelerates drug discovery, and facilitates precision medicine through data-driven insights that enable personalized treatment. Moreover, GenAI improves documentation practices and administrative efficiency within healthcare institutions while facilitating clearer communication of medical and statistical information. Beyond clinical contexts, GenAI contributes to emotional support and mental health management, promoting well-being and improving access to healthcare knowledge across populations.

Fig. 6 Theme: Benefits (diagram of categories B1–B6 and their constituent codes; see Table 9)

Fig. 7 Theme: Challenges and Limitations (diagram of categories C1–C4 and their constituent codes; see Table 10)

Education and Learning [B2]
GenAI transforms education by enabling adaptive, accessible, and efficient learning. It personalizes instruction by tailoring content to individual learning styles and automates the creation of educational materials. By reducing educators' administrative and assessment workloads, GenAI allows greater focus on teaching and student engagement. Its translation and accessibility features further ensure equitable learning opportunities for diverse learners, including those with disabilities, fostering more inclusive educational environments.

Research, Innovation, and Design [B3]
GenAI accelerates research and innovation by supporting knowledge synthesis, ideation, and cross-disciplinary collaboration. It automates information analysis and extraction, enabling the integration of insights from multiple domains. Within design science and innovation processes, GenAI assists in developing design requirements, principles, and prototypes, supporting design thinking methodologies. By combining human and computational creativity, it enables rapid exploration of novel ideas and solutions, driving innovation on a larger scale and at a faster pace.
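The certainty-of-evidence tables below (Tables 9–11) encode each finding's support as a semicolon-separated list of study IDs with quality scores, e.g. "S01(0.8); S16(0.0)". As an illustrative aside, not part of the paper's tooling (the regex and helper names are ours), such an entry can be parsed and summarized in a few lines:

```python
import re
from statistics import mean

# Matches one "ID(score)" entry, e.g. "R09(1.0)" or "S01(0.8)".
ENTRY = re.compile(r"([RS]\d+)\((\d\.\d)\)")

def parse_evidence(cell):
    """Turn a 'Contributing Studies' cell into (study_id, score) pairs."""
    return [(sid, float(score)) for sid, score in ENTRY.findall(cell)]

def summarize(cell):
    """Support count and mean quality score for one synthesized code."""
    pairs = parse_evidence(cell)
    return {"n_studies": len(pairs),
            "mean_quality": round(mean(s for _, s in pairs), 2)}

# "Drug Discovery Acceleration and Precision Medicine" (Table 9):
print(summarize("S01(0.8); S16(0.0)"))
# → {'n_studies': 2, 'mean_quality': 0.4}
```

Such a summary makes the paper's point concrete: a code may rest on few studies, on studies of uneven quality, or both, and the tables let readers weigh this directly.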
Fig. 8 Theme: Research Gaps and Future Research Directions (diagram of categories F1–F6 and their constituent codes; see Table 11)

Table 9 Certainty of Evidence: Benefits
Cat. | Code Name | Contributing Studies (Quality Score)
B1 | Diagnostics, Decision-Making, and Patient Care | R09(1.0); S01(0.8); S02(0.8); S03(1.0); S10(0.2); S11(1.0); S16(0.0); S17(0.8)
B1 | Drug Discovery Acceleration and Precision Medicine | S01(0.8); S16(0.0)
B1 | Clinical Documentation and Workflow | S01(0.8); S02(0.8); S03(1.0); S05(0.9); S10(0.2)
B1 | Strengthening Mental Health | R09(1.0); S01(0.8); S17(0.8)
B1 | Access to Healthcare Knowledge and Simplified Communication | R09(1.0); S01(0.8); S02(0.8); S03(1.0); S10(0.2); S11(1.0); S14(0.5); S17(0.8)
B2 | Personalized Learning | R04(0.6); S01(0.8); S10(0.2); S12(0.4); S14(0.5); S15(0.6); S18(0.4)
B2 | Digital Learning Resources, Reduced Educator Administrative Tasks | S12(0.4); S14(0.5)
B2 | Assistance in Language Translation and Accessibility | S12(0.4); S14(0.5)
B3 | Supports Design Knowledge and Research Across Disciplines | R01(0.7); R03(0.7); R04(0.6); R06(0.5); S02(0.8)
B3 | Tool-Supported Idea Generation | R04(0.6); R05(0.8); R06(0.5); R07(0.9); S15(0.6); S18(0.4)
B4 | Analysis, Coding, Testing and Translations (Automatically) | R05(0.8); S06(0.4)
B4 | Efficiency in Software Development | R02(0.7); R05(0.8); S06(0.4)
B5 | Communication and Accessibility | R08(0.9); R09(1.0); S09(0.6); S13(0.7); S18(0.4)
B5 | Enhanced User Interaction and Experience | R02(0.7); R04(0.6); R08(0.9); S10(0.2); S13(0.7)
B5 | Digital Government Services | R03(0.7); R04(0.6); R08(0.9)
B6 | Summaries and Notes | R04(0.6); S01(0.8)
B6 | Content Creation | R04(0.6); R07(0.9); S04(0.5); S18(0.4)
B6 | Synthetic Data, Reducing Bias, Increasing Responsibility | R08(0.9); S07(0.8); S13(0.7)
B6 | Task Automation and Service Scaling | R03(0.7); R04(0.6); R08(0.9); R09(1.0); S01(0.8); S14(0.5); S18(0.4)

Software Engineering and Technical Productivity [B4]
In software engineering, GenAI improves productivity and software quality by automating development and testing processes. It generates and translates code, produces test cases, and supports optimization throughout the development lifecycle, reducing manual effort and accelerating delivery.
GenAI-based tools enhance workflow efficiency and decision-making, allowing teams to focus on complex design and architecture tasks while improving maintainability and reducing development costs.

Communication, Accessibility, Services, and Social Impact [B5]
GenAI enhances communication and service delivery by enabling natural, responsive, and context-aware interactions between users and systems. Automated translation, summarization, and conversational capabilities improve accessibility and engagement across diverse audiences. In both the private and public sectors, GenAI personalizes and scales service delivery, enhancing inclusivity and responsiveness. Public administrations benefit from GenAI-driven translation and accessibility features that expand reach and equity in digital government services.

Data Management, Creative Content, Privacy, and Ethics [B6]
GenAI advances data management and content creation through automated summarization, organization, and multimodal generation. It enables efficient extraction of key information from complex documents and supports the production of text, image, audio, and video content, expanding creative potential across industries. Moreover, GenAI facilitates synthetic data generation to enhance privacy, mitigate bias, and support ethical AI development by providing realistic yet anonymized datasets for training and validation. Automation of routine tasks further enhances productivity and allows organizations to focus on higher-value strategic activities.

5.2.2 Challenges and Limitations

This theme consolidates the significant risks and obstacles that temper the adoption of GenAI. The challenges are multifaceted, spanning from the technology's inherent technical unreliability and "black-box" nature to its profound societal and ethical implications, including the amplification of bias and the potential for misuse.
These issues are further compounded by a lagging legal and governance landscape that struggles to address critical gaps in privacy, security, and accountability.

Table 10 Certainty of Evidence: Challenges and Limitations
Cat. | Code Name | Contributing Studies (Quality Score)
C1 | Biases and Discrimination | R01(0.7); R02(0.7); R04(0.6); R07(0.9); R08(0.9); R09(1.0); R10(0.7); S01(0.8); S09(0.6); S11(1.0); S14(0.5); S16(0.0)
C1 | Fairness Implementation Challenges | R01(0.7); R08(0.9); S09(0.6)
C1 | Potential for Misuse and Harmful Content | R03(0.7); R06(0.5); R07(0.9); R08(0.9); R10(0.7); S01(0.8); S09(0.6); S12(0.4); S14(0.5); S17(0.8)
C1 | Human Autonomy and Skill Devaluation | R08(0.9); R09(1.0); S01(0.8); S08(1.0); S09(0.6); S11(1.0); S12(0.4); S14(0.5); S17(0.8)
C1 | Environmental Sustainability | R01(0.7); R02(0.7); R04(0.6); R10(0.7); S07(0.8); S09(0.6); S10(0.2); S15(0.6)
C2 | Hallucinations and Factual Inaccuracy | R02(0.7); R03(0.7); R04(0.6); R05(0.8); R06(0.5); R07(0.9); R08(0.9); R10(0.7); S01(0.8); S02(0.8); S03(1.0); S05(0.9); S09(0.6); S10(0.2); S11(1.0); S14(0.5); S16(0.0); S17(0.8)
C2 | Outdated or Limited Knowledge | R04(0.6); R06(0.5); S07(0.8); S10(0.2); S11(1.0); S14(0.5)
C2 | Limited Contextual Awareness | R04(0.6); R05(0.8); S01(0.8); S07(0.8); S09(0.6); S11(1.0)
C2 | Performance Inconsistency and Drift | R04(0.6); S05(0.9); S07(0.8); S09(0.6); S11(1.0)
C2 | Lack of Evidence and Validation | R06(0.5); S01(0.8); S07(0.8); S08(1.0); S18(0.4)
C3 | Model Opacity and the Black-Box Problem | R02(0.7); R06(0.5); R08(0.9); R09(1.0); R10(0.7); S08(1.0); S09(0.6); S14(0.5); S16(0.0)
C3 | Explainability Shortcomings | R02(0.7); S06(0.4); S07(0.8); S09(0.6); S18(0.4)
C4 | Data Privacy Violations | R03(0.7); R08(0.9); R09(1.0); S01(0.8); S02(0.8); S08(1.0); S09(0.6); S11(1.0); S15(0.6); S16(0.0); S17(0.8)
C4 | Security Vulnerabilities | R03(0.7); S08(1.0); S09(0.6); S10(0.2); S16(0.0); S17(0.8)
C4 | Regulatory and Legal Gaps | R07(0.9); R09(1.0); S08(1.0); S13(0.7); S16(0.0)
C4 | Accountability and Liability Ambiguity | R03(0.7); S09(0.6); S17(0.8)
C4 | Copyright and Ownership Disputes | R04(0.6); R08(0.9); R10(0.7); S09(0.6); S10(0.2); S12(0.4); S14(0.5)

Societal, Ethical and Fairness Concerns [C1]
This category encompasses broad societal and ethical risks, beginning with GenAI models' propensity to inherit and amplify societal biases from training data, leading to discriminatory outcomes and representational harm. This issue is compounded by the profound challenge of implementing fairness, which is hindered by a lack of universal standards and by conflicts with organizational goals. The category also covers the significant potential for deliberate misuse, where GenAI is exploited to generate harmful content such as misinformation, propaganda, and deepfakes at scale. Furthermore, it examines the erosion of human autonomy and cognitive skills, such as critical thinking, due to over-reliance. Finally, it includes the significant environmental costs associated with the high energy and computational resources required for model training and operation.

Reliability and Accuracy [C2]
This category focuses on the technical shortcomings that undermine the dependability and precision of GenAI outputs, highlighting risks in real-world applications where accuracy is critical. Core challenges include hallucinations (plausible but incorrect or fabricated information), outdated knowledge from fixed training cut-offs, and limited contextual awareness resulting in superficial or inappropriate outputs. Reliability is further jeopardized by unpredictable performance, including inconsistent results and degradation, or "drift", over time. Underpinning all of these operational risks is a pervasive lack of domain-specific benchmarks and validation frameworks, which ultimately undermines confidence in GenAI's readiness for critical applications.

Transparency and Explainability [C3]
This category deals with the challenges of making GenAI systems understandable and auditable.
At its core is the "black-box" nature of transformer architectures, a fundamental opacity that hinders interpretation, auditing, and trust. This problem is exacerbated by the profound immaturity of current explainability (XAI) methods, which are insufficient for revealing the rationale behind model outputs.

Privacy, Security, Governance and Legal [C4]
This category addresses the constellation of vulnerabilities and regulatory hurdles surrounding GenAI deployment. It details significant risks to data privacy, where sensitive information is exposed through both model memorization and system flaws, as well as systems' susceptibility to deliberate security breaches such as jailbreaking and prompt injection. Underpinning these technical risks is a significant regulatory and legal gap, where governance frameworks lag far behind the pace of technological development. This gap, in turn, creates critical ambiguity in accountability and liability, leaving unanswered the question of who is responsible for AI-generated harm. Finally, the category examines profound challenges to intellectual property, from unresolved disputes over the use of copyrighted material in training data to the ambiguous legal ownership of AI-generated content.

5.2.3 Research Gaps and Future Research Directions

This theme encompasses the research gaps and prospective future research directions identified by scholars exploring the applications of GenAI in the IS domain. The analysis of GenAI's influence on individuals, organizations, and societies reveals numerous topics that require further exploration. The clash between the new technology and existing processes and structures raises numerous ethical questions for which there are no simple answers or solutions. The wider use of GenAI solutions requires ensuring an adequate level of quality, and to this end it is necessary to identify appropriate quality characteristics and related quality requirements.
It is also impossible to ignore the perspective of the engineers responsible for developing GenAI solutions and adapting them for use in their target context.

Table 11 Certainty of Evidence: Research Gaps and Future Directions
Cat. | Code Name | Contributing Studies (Quality Score)
F1 | User-GenAI Interaction Design | R01(0.7); R02(0.7); R04(0.6); R05(0.8); R07(0.9); R08(0.9); R10(0.7); S01(0.8); S08(1.0); S11(1.0); S18(0.4)
F1 | The Required Skills and Education | R03(0.7); R04(0.6); R05(0.8); R07(0.9); R08(0.9); R10(0.7); S06(0.4); S12(0.4)
F1 | Personalization Tailored to the Individual User | R01(0.7); R05(0.8); R07(0.9); R08(0.9); R10(0.7); S11(1.0); S18(0.4)
F1 | Psychological Effects of GenAI Use on Individuals | R05(0.8); R07(0.9); R08(0.9); R09(1.0); R10(0.7); S08(1.0)
F1 | Risks Affecting Individual Users | R03(0.7); R04(0.6); R07(0.9); S08(1.0); S11(1.0); S12(0.4)
F2 | GenAI Adoption and Acceptance Factors | R02(0.7); R05(0.8); R07(0.9); R08(0.9); R09(1.0); R10(0.7); S01(0.8); S07(0.8); S12(0.4); S13(0.7)
F2 | Formation of Hybrid Human-AI Teams | R03(0.7); R05(0.8); R07(0.9); R10(0.7); S06(0.4); S12(0.4)
F2 | Automation of Routine Tasks | R02(0.7); R04(0.6); R05(0.8); R08(0.9); R10(0.7); S15(0.6)
F2 | Organizational Transformations to Incorporate GenAI | R04(0.6); R07(0.9); R08(0.9); R10(0.7)
F2 | Benefits Provided by GenAI to Organizations | R02(0.7); R07(0.9); S01(0.8); S14(0.5); S17(0.8)
F2 | Risks Introduced by GenAI to Organizations | R05(0.8); R08(0.9); R09(1.0); S03(1.0)
F3 | Societal Impact of GenAI | R01(0.7); R03(0.7); R04(0.6); R05(0.8); R07(0.9); R09(1.0); R10(0.7); S01(0.8); S11(1.0); S17(0.8); S18(0.4)
F3 | Application to Specific Domains and Business Sectors | R01(0.7); R02(0.7); R04(0.6); R07(0.9); R08(0.9); R09(1.0); S01(0.8); S02(0.8); S03(1.0); S05(0.9); S11(1.0); S14(0.5); S16(0.0)
F3 | Regulations and Legal Issues Related to GenAI | R01(0.7); R07(0.9); R08(0.9); R10(0.7); S11(1.0); S16(0.0); S18(0.4)
F3 | Risks Introduced by GenAI to Societies | R02(0.7); R07(0.9); R08(0.9); R10(0.7); S01(0.8); S14(0.5); S15(0.6)
F3 | New Use Cases for GenAI Applications | R02(0.7); R04(0.6); R06(0.5); R07(0.9); S04(0.5); S12(0.4); S16(0.0)
F3 | Effect on Job Market | R04(0.6); R07(0.9); R08(0.9)
F4 | Ethical Use of GenAI | R01(0.7); R02(0.7); R03(0.7); R07(0.9); R08(0.9); R09(1.0); R10(0.7); S06(0.4); S09(0.6); S11(1.0); S12(0.4); S18(0.4)
F4 | GenAI Bias Mitigation | R01(0.7); R02(0.7); R07(0.9); R08(0.9); R09(1.0); S11(1.0); S17(0.8); S18(0.4)
F4 | Transparency and Explainability | R01(0.7); R02(0.7); R08(0.9); R09(1.0); S01(0.8); S07(0.8); S09(0.6); S12(0.4); S16(0.0); S17(0.8); S18(0.4)
F4 | Awareness of User's Specifics | R01(0.7); R03(0.7); R07(0.9); S11(1.0); S17(0.8); S18(0.4)
F4 | Proper Representation and Inclusivity | R01(0.7); R03(0.7); R09(1.0); S09(0.6); S11(1.0)
F5 | Definition of GenAI Metrics | R01(0.7); R02(0.7); R06(0.5); R07(0.9); R08(0.9); R10(0.7); S02(0.8); S07(0.8); S13(0.7); S14(0.5); S18(0.4)
F5 | Empirical Evaluation of GenAI in the Field | R01(0.7); R06(0.5); S01(0.8); S05(0.9); S08(1.0); S11(1.0); S17(0.8)
F5 | Design Principles for GenAI Solutions | R04(0.6); R05(0.8); R07(0.9); S07(0.8); S08(1.0)
F5 | GenAI Model Training | R01(0.7); R02(0.7); R03(0.7); R04(0.6); R07(0.9); R08(0.9); S02(0.8); S03(1.0); S07(0.8); S11(1.0); S16(0.0)
F5 | GenAI System Development and Maintenance | R04(0.6); R05(0.8); S11(1.0)
F6 | Data Privacy Protection | R03(0.7); R08(0.9); R09(1.0); R10(0.7); S08(1.0); S11(1.0); S12(0.4); S17(0.8)
F6 | Security and Protection | R01(0.7); R03(0.7); R06(0.5); R08(0.9); R10(0.7); S11(1.0); S12(0.4); S17(0.8)
F6 | Demonstrating Trustworthiness | R02(0.7); R04(0.6); R08(0.9); R09(1.0); R10(0.7); S01(0.8); S02(0.8); S05(0.9); S07(0.8); S16(0.0); S18(0.4)
F6 | Efficiency and Scalability | R01(0.7); R03(0.7); R08(0.9); R10(0.7); S16(0.0)
F6 | Accountability and Contestability | R01(0.7); R07(0.9); R08(0.9); S17(0.8)

User Perspective [F1]
This category encompasses several aspects, from designing effective means of human-AI interaction (e.g.
multimodal interfaces) and recognizing users' individual needs, through the risks affecting individual users and GenAI's psychological influence on humans, to the skills necessary to use GenAI for particular purposes and the ways of educating people to use GenAI more effectively and safely.

Organizational Perspective [F2]
This is the perspective of a company, enterprise, or other organization using GenAI in its business processes. It includes identifying the factors important to the acceptance and adoption of GenAI by organizations, their employees, and other stakeholders. A key decision is the composition of human-AI teams, including the assignment of tasks and responsibilities as well as the dynamics of team operations. This requires deciding which tasks should be automated and which should be left to humans. Moreover, adopting GenAI usually requires transforming the organization and changing its processes and structures. Finally, the possible benefits and threats that GenAI introduces to an organization are a viable topic for future research.

Societal Perspective [F3]
This category considers the ways in which GenAI can and should impact societies (professions, social groups, business sectors, nations, countries). A crucial aspect identified here is the specificity of different domains and business sectors, which creates a need to investigate the specific contextual factors and the adjustments necessary for effective use of GenAI in a given domain or sector. The identified needs for GenAI regulation, as well as the related issues of copyright/proprietary rights and legal challenges, also form a promising area for research. The new, previously unknown use cases of GenAI that can bring new value, as well as the societal risks to which GenAI contributes, are directions worth investigating. Finally, the impact of GenAI on the job market, i.e.,
the replacement of jobs by GenAI solutions but also the emergence of new jobs related to GenAI usage, is included in this perspective.

Ethical Perspective [F4]

The ethical issues related to GenAI are widely discussed as further research topics. Many sources indicate the need to research ways of ensuring that GenAI output is transparent with respect to its algorithms, data, and rationale, and that its meaning is well explained. Another widely recognized ethical concern is possible bias and effective ways of mitigating it. A further issue is the requirement that a GenAI solution be aware of the user’s specific characteristics (e.g., national, cultural, ethnic) in order to provide output suitable for that user, rather than, for example, output relevant only to the highest-income countries. GenAI solutions should also be trained on data that mirrors real-world diversity and can be used effectively by users of different backgrounds and abilities.

Engineering Perspective [F5]

This is the perspective of the engineers (e.g., software developers, data scientists) responsible for creating GenAI solutions. It includes the core issue of GenAI model creation and training. The need for metrics that capture key properties of GenAI is also clearly recognized (such metrics may differ from those known from non-AI systems or be entirely new). Evaluation and validation of existing GenAI systems is identified as a research gap; by evaluation/validation we mean applying the GenAI system in a real context (business sector, organization), using real data, and observing results in the long term, rather than running it on a test set and computing metrics such as F1. There is a perceived lack of such evaluation, which seems necessary for GenAI’s adoption in real-life processes, especially in domains that require reliable evidence (e.g., medicine/healthcare).
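To make concrete the kind of test-set metric that such field evaluation is contrasted with, a minimal sketch of the F1 computation follows; the function and data are illustrative, not taken from the reviewed studies.

```python
# Minimal F1 computation for a binary classifier, from raw label lists.
# This is the benchmark-style evaluation contrasted above with long-term
# field validation; names and data here are purely illustrative.
def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A good score on a small held-out set says nothing about performance
# drift or behavior in a real clinical or organizational context.
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

A high F1 on a static test set is cheap to obtain and report, which is exactly why the reviewed agendas call for situated, long-term validation instead.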
More research on the development and maintenance of systems with GenAI components, resulting in definitions of the corresponding processes, techniques, and good practices, is expected, with an emphasis on design principles for such systems.

Quality Requirements Perspective [F6]

This perspective encompasses several quality properties and related categories of quality requirements that are relevant for GenAI solutions and not explored in detail under the previous perspectives. One of them is the trustworthiness of GenAI to its users, i.e., the factors that make users willing to use GenAI and consider its responses reliable. Other issues include data privacy and the protection of users from security threats as well as various frauds and misuses of GenAI. Well-known properties such as efficiency, performance, and scalability are also considered important and in need of further research. Finally, the accountability of GenAI providers and the availability of channels for users’ complaints about inappropriate GenAI output are among the topics that require more investigation.

6 Findings

This section provides a detailed exploration of the three core themes that structure the current scholarly discourse on GenAI in Information Systems. Building upon the framework introduced in Section 5, we synthesize evidence across the selected literature to examine how GenAI is reshaping the field—from the transformative Benefits driving adoption, through the multifaceted Challenges and Limitations tempering deployment, to the emerging Research Gaps and Future Research Directions charting the path forward. For each theme, we elaborate on its constituent categories, illustrating key concepts with evidence drawn from diverse application domains.

6.1 Benefits

GenAI represents a significant step forward in the development of modern digital technologies. Its applications cover a wide range of fields, from business and the creative industries to education and medicine.
By leveraging the capabilities of automatic content generation and data analysis, GenAI contributes to process optimization and increased efficiency, and drives innovation in various areas of human activity.

6.1.1 Healthcare and Life Sciences [B1]

Diagnostics, Decision-Making, and Patient Care

GenAI enhances diagnostic accuracy and clinical decision-making through multiple mechanisms. LLM-based models such as GPT-4 integrate extensive medical literature to recommend relevant examinations and provide evidence-based diagnostic insights (Meng et al., 2024), while also improving medical imaging for disease detection and monitoring (Srivastava et al., 2025). NLP solutions automatically extract data from medical documents, identify patient conditions and disease severity, and strengthen interdepartmental communication and care coordination (S10). GenAI models further mitigate diagnostic bias through multimodal data integration and advanced machine learning techniques (Ouanes, 2024), with applications in triage and diagnosis support (Beheshti et al., 2025), retinal healthcare (Bellanda, Santos, Ferraz, Jorge, & Melo, 2024), dental telemedicine, and chest radiograph interpretation (Pool, Indulska, & Sadiq, 2024). The technology also delivers high-quality semantic health information (M. Li & Guenier, 2024) and supports the development of personalized treatment plans through the analysis of large patient datasets (Ouanes, 2024).

Drug Discovery Acceleration and Precision Medicine

GenAI accelerates drug discovery by predicting drug properties and supporting genomic research (Meng et al., 2024). It enhances the design and synthesis of novel compounds, expands and optimizes compound libraries, and enables the creation of molecules with targeted therapeutic properties (Ouanes, 2024).
By reducing the human, material, and financial resources typically required in traditional drug development, GenAI advances precision medicine through the rapid identification and optimization of candidate drugs tailored to specific patient populations and disease characteristics.

Clinical Documentation and Workflow

GenAI demonstrates enhanced efficiency in medical documentation through literature synthesis and data organization, handling diverse data types including textual patient records, diagnostic reports, research papers, medical imaging data such as MRIs and CT scans, voice recordings, and biomarkers (Meng et al., 2024). ChatGPT offers substantial potential for automating administrative tasks (Bellanda et al., 2024) and supporting clinical documentation processes (Beheshti et al., 2025), while NLP-based technologies improve interdepartmental communication, coordinate patient care, and ensure appropriate follow-up care (Lareyre et al., 2023). According to Bracken et al. (2025), research reports substantial efficiency gains, with AI systems significantly reducing documentation time, particularly in ambient intelligence systems and complex clinical cases. These time savings directly impact clinician workload and patient care availability, offering a promising means to reduce healthcare professional burnout. Six of nine studies reviewed found that AI-generated documentation met or exceeded traditional standards. Healthcare professionals also reported improved usability and reduced cognitive load, supporting broader adoption of AI-assisted documentation. Additionally, NLP-based systems function as virtual assistants for health professionals, streamlining both clinical and administrative workflows (Lareyre et al., 2023).

Strengthening Mental Health

GenAI demonstrates strong potential to enhance mental health by providing accessible emotional and psychological support.
Chatbots and conversational interfaces offer patients guidance and reassurance, delivering benefits across educational, healthcare, and broader social contexts (Srivastava et al., 2025). GenAI facilitates preliminary patient consultations and psychological assistance, helping patients manage the psychological stresses associated with illness (Meng et al., 2024). Research highlights its effectiveness in improving holistic understanding, reducing workload for mental health professionals, mitigating loneliness, and reducing the emotional burden on patients (Pool et al., 2024), collectively contributing to better mental health outcomes and quality of life through scalable, accessible support systems.

Access to Healthcare Knowledge and Simplified Communication

GenAI improves access to healthcare knowledge and facilitates the communication of complex medical information, enabling patients to obtain reliable insights more quickly and make informed decisions (Srivastava et al., 2025). It simplifies medical terminology and statistical data, providing patients with foundational knowledge before consultations and enhancing their understanding of medical results (Meng et al., 2024). GenAI also strengthens doctor–patient interactions through preliminary consultation tools and clearer explanations, while applications such as ChatGPT show strong potential for patient education (Beheshti et al., 2025; Bellanda et al., 2024; Meng et al., 2024). NLP-based technologies act as virtual assistants, offering patients information and support with tasks such as planning, follow-up, and scheduling (Lareyre et al., 2023). They provide high-quality, semantically rich health information by simplifying medical texts, conveying disease information effectively, and addressing low-risk health queries (M. Li & Guenier, 2024).
GenAI further enhances accessibility by translating medical reports into plain language, generating personalized health guidance, and supporting lifestyle interventions (Mohammad et al., 2023; Pool et al., 2024). In dental telemedicine, its multilingual capabilities improve scalability and facilitate effective consultations (Pool et al., 2024), helping democratize access to healthcare knowledge across diverse populations and literacy levels.

6.1.2 Education and Learning [B2]

Personalized Learning

GenAI supports personalized learning through digital teaching assistants and the creation of supplemental materials such as teaching cases and recap questions (Feuerriegel, Hartmann, Janiesch, & Zschech, 2024). In medical education, it enables advanced training with real-time simulations (Meng et al., 2024) and acts as a virtual assistant that generates educational content and personalized study plans (Lareyre et al., 2023). ChatGPT further adapts content delivery to individual needs, fostering active engagement, self-paced learning, and deeper understanding of the subject (Maita, Saide, Putri, & Muwardi, 2024). GenAI also promotes equitable education by providing flexible, efficient, and cost-effective learning opportunities. It offers instant feedback and explanations that improve self-directed learning and curiosity (Maita et al., 2024), while facilitating rapid information access and innovative teaching approaches (Mohammad et al., 2023). AI-driven tools, such as chatbots, enhance these experiences by supporting creativity, automation, personalization, collaboration, multimodal content creation, and accessibility (Onatayo et al., 2024; Schneider, 2024). Collectively, GenAI advances educational systems that adapt to individual learner profiles, preferences, and learning trajectories in diverse contexts.
Digital Learning Resources, Reduced Educator Administrative Tasks

GenAI streamlines the creation of digital learning resources while significantly reducing the administrative workload of educators. It enables the development of diverse and engaging materials that accommodate different learning styles and enhance instructional quality (Maita et al., 2024). GenAI reduces the time spent on routine work by automating tasks such as generating multiple-choice questions, planning lessons, and supporting technology-based teaching (Maita et al., 2024). It also assists in creating new educational content and exam questions, with automatic scoring and grading capabilities (Mohammad et al., 2023), allowing educators to focus on higher-value instructional activities and student engagement.

Assistance in Language Translation and Accessibility

GenAI enhances accessibility and inclusivity in education by supporting language translation and adapting content for diverse learner populations. It streamlines classroom tasks such as lesson planning and technology-based instruction while providing instant answers that promote self-directed learning and exploration (Maita et al., 2024). GenAI also assists in writing assignments, developing research papers, and generating educational materials and exam questions (Mohammad et al., 2023), ensuring that learning resources are accessible to students with different linguistic and accessibility needs, including those with disabilities.

6.1.3 Research, Innovation, and Design [B3]

Supports Design Knowledge and Research Across Disciplines

GenAI demonstrates transformative potential in improving productivity, decision-making, and economic value across business sectors and research disciplines (X. Wei et al., 2025). It improves efficiency (Dwivedi et al., 2025) and supports process discovery by generating process descriptions that help organizations identify and analyze different workflow stages (Feuerriegel et al., 2024).
The capacity of GenAI to model complex, non-linear business processes enables its use in implementation, simulation, and predictive monitoring, while fostering innovation through new business ideas, products, services, and models (Feuerriegel et al., 2024). The technology reshapes organizational knowledge management by automating knowledge discovery from large volumes of unstructured, distributed data. It enhances knowledge sharing through the automatic generation and dissemination of multilingual content, such as wikis and FAQs, and delivers personalized insights to employees (Feuerriegel et al., 2024). In design science research, GenAI supports the construction of novel IT artifacts by extracting design knowledge, in the form of requirements, principles, and features, from interdisciplinary sources, making it collectively available to researchers and practitioners (Feuerriegel et al., 2024). Integrated into design thinking and related methodologies, GenAI augments human creativity in idea generation, user needs elicitation, prototyping, evaluation, and automation (Feuerriegel et al., 2024). Furthermore, it enables the algorithmic identification of knowledge gaps and inconsistencies, promotes new dialogic and methodological approaches, and supports the formulation of innovative research questions across disciplines (Beheshti et al., 2025; Jarvenpaa & Klein, 2024).

Tool-Supported Idea Generation

According to Feuerriegel et al. (2024), GenAI enhances idea generation across organizational functions by combining human and computational creativity. In business process management, it supports innovative process redesign and automation, driving the development of next-generation process guidance systems. It enables automated knowledge discovery, improves knowledge sharing through content generation, and maintains enterprise models at multiple abstraction levels, while supporting digital twin applications for enterprise asset management.
GenAI also delivers high-quality natural language interfaces that enhance usability and accessibility, producing optimized content for social media, emails, and reports (Feuerriegel et al., 2024). It improves collaboration through intelligent agents, automates personalized marketing, and strengthens recommender systems through advanced personalization (Feuerriegel et al., 2024). In design thinking and innovation contexts, GenAI supports user needs elicitation, prototyping, evaluation, and design automation (Feuerriegel et al., 2024), showcasing strong potential for human–AI collaboration that amplifies creative problem-solving (Haase et al., 2024; Jarvenpaa & Klein, 2024). In architecture, engineering, and construction, it facilitates concept visualization and the generation of alternative design solutions, supports data-driven decision-making, and provides instant training and feedback (Onatayo et al., 2024). GenAI has demonstrated exceptional creative performance, including passing university-level exams, achieved through reinforcement learning from human feedback (Nah, Cai, Zheng, & Pang, 2023; Schneider, 2024).

6.1.4 Software Engineering and Technical Productivity [B4]

Analysis, Coding, Testing, and Translation (Automated)

GenAI provides enhanced support for content analysis and code generation (Haase et al., 2024), enabling developers to automate routine coding tasks and improve development efficiency. It generates and optimizes test cases to accelerate testing and meet coverage criteria (Clear et al., 2025). Additionally, GenAI and LLMs enable seamless code translation between programming languages, reducing manual effort and supporting more efficient, interoperable software development workflows (Clear et al., 2025).

Efficiency in Software Development

GenAI improves the efficiency of software development by raising productivity, optimizing resource utilization, and reducing operational costs (Chau & Xu, 2025).
Its integration supports strategic decision-making and fosters human–AI collaboration that augments creative problem-solving within development teams (Haase et al., 2024). Advanced tools such as Bard, ChatGPT, and Copilot contribute to the design of more accurate and robust software, enabling the production of higher-quality systems in shorter development cycles (Clear et al., 2025). Furthermore, by incorporating pair programming methodologies derived from Extreme Programming, AI agents can function as collaborative team members, assisting developers throughout the software lifecycle to accelerate time-to-market and enhance code quality and maintainability (Clear et al., 2025).

6.1.5 Communication, Accessibility, Services, and Social Impact [B5]

Communication and Accessibility

GenAI facilitates the bridging of communication gaps and the delivery of tailored services across diverse audiences, promoting societal inclusion through enhanced engagement and mutual understanding. Intelligent automation enables organizations to provide personalized and adaptive services at scale, resulting in favorable outcomes for more people in society (Sigala et al., 2024). Integrating personalization and automation into complex processes enables GenAI to produce customized content and interactions that support informed decision-making and address individual needs (Laine, Minkkinen, & Mäntymäki, 2025; Srivastava et al., 2025). Moreover, the technology fosters social cohesion by improving cross-cultural and interdisciplinary communication, thereby enhancing societal connectivity (Laine et al., 2025). As GenAI redefines traditional workflows and user interactions, its integration requires adaptive socio-technical frameworks that reflect evolving modes of human–AI collaboration (Mambile & Ishengoma, 2024).
It offers distinct advantages, including creativity, automation, personalization, multimodal content generation, and improved accessibility, while responsible adoption requires the use of explainable GenAI approaches (Schneider, 2024).

Enhanced User Interaction and Experience

GenAI facilitates more natural, efficient, and adaptive communication between users and systems, rendering products and services increasingly intuitive and personalized. When integrated with LLMs, smart devices, and the Internet of Things, GenAI functions as an intelligent assistant that enhances individual support, productivity, and overall user experience (Chau & Xu, 2025). It further enables the automated generation of personalized marketing content tailored to personality traits, such as introversion or extroversion, demonstrating superior effectiveness compared to uniform communication strategies (Feuerriegel et al., 2024). Within service marketing and customer relationship management, GenAI supports strategic planning and operational execution by streamlining service delivery and improving customer engagement (Sigala et al., 2024). It facilitates the design of personalized service offerings and the development of targeted marketing strategies for specific customer segments, enabling scalable and adaptive service personalization through intelligent automation (Sigala et al., 2024). In parallel, NLP capabilities enable automated data extraction and analysis, while GenAI-driven conversational platforms simulate human-like interaction, contributing to widespread adoption across diverse domains (Lareyre et al., 2023). As these technologies continue to reshape user interactions and organizational workflows, their integration requires flexible socio-technical frameworks that accommodate the evolving patterns of human–AI collaboration (Mambile & Ishengoma, 2024).
Digital Government Services

GenAI enhances digital government services through translation and accessibility features that increase service reach and improve citizen engagement. It improves efficiency across the public sector (Dwivedi et al., 2025), supporting the digital management of non-tangible organizational assets, such as procedures, legal texts, and service documentation, throughout their lifecycles. Comparable advantages extend to the management of physical assets in Industry 4.0 environments (Feuerriegel et al., 2024). In digital service delivery, GenAI improves the performance of existing services by producing human-like conversations with customers, providing personalized and cost-effective services (Sigala et al., 2024). It serves as a disruptive force across digital services including video streaming, recommendation agents on e-commerce platforms, online financial and banking services, and education, legal, and healthcare services (Sigala et al., 2024). Governments can leverage GenAI’s translation and text-to-speech technologies to broaden access to public services, while its content moderation and misinformation detection capabilities contribute to safer and more equitable digital ecosystems (Sigala et al., 2024).

6.1.6 Data Management, Creative Content, Privacy, and Ethics [B6]

Summaries and Notes

GenAI automates information summarization and note generation, thereby improving knowledge management and organizational efficiency. It enhances collaboration within teams by providing intelligent agents that suggest, summarize, and synthesize information based on team context, such as through automated meeting notes (Feuerriegel et al., 2024). The technology creates summaries and notes for various applications, including medical contexts such as surgeries (Meng et al., 2024), enabling knowledge workers to extract essential insights from complex information while reallocating time to higher-value analytical and decision-making tasks.
Content Creation

GenAI accelerates content creation across multiple media formats and enables novel creative applications. It automates various tasks in marketing and media, including news writing, summarization of web content for mobile platforms, thumbnail generation, and accessibility adaptations such as text-to-speech and Braille-supported content (Feuerriegel et al., 2024). Beyond text, GenAI facilitates multimodal content generation encompassing images, audio, and video (Schneider, 2024), while reducing labeling requirements and expanding content creation use cases (Bendig & Bräunche, 2024; Nah et al., 2023). However, the same capabilities also enable the production of realistic disinformation, including fake news and propaganda, which is increasingly difficult to detect. Advances in GenAI have lowered the cost of disinformation generation and introduced unprecedented personalization by adapting tone and narrative to specific audiences (Feuerriegel et al., 2024). Moreover, GenAI can replace traditional crowdsourcing through automated annotation and execution of knowledge tasks, underscoring the need for robust ethical and governance frameworks to balance creative innovation with responsible information dissemination.

Synthetic Data, Reducing Bias, Increasing Responsibility

GenAI facilitates the generation of synthetic data to enhance privacy, mitigate harmful biases, and promote ethical and responsible AI practices. Advances in generative models, particularly Generative Adversarial Networks (GANs), have improved the accuracy and realism of synthetic data while maintaining privacy protection (Ghebrehiwet, Zaki, Damseh, & Mohamad, 2024). In medical and scientific research, GenAI enables the creation of high-quality datasets that preserve patient confidentiality and support data-driven innovation (Mambile & Ishengoma, 2024).
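The core idea of releasing sampled rows instead of the original records can be illustrated with a deliberately naive stand-in for the GAN-based generators cited in the literature: fitting an independent Gaussian to each numeric column and sampling from the fits. All names and data below are hypothetical, and real privacy-preserving generators go far beyond this sketch.

```python
import random
import statistics

def synthesize(rows, n, seed=0):
    """Sample n synthetic rows by fitting an independent Gaussian to
    each numeric column of the real rows. A toy illustration only:
    GAN-based generators model joint, non-Gaussian structure and can
    add explicit privacy guarantees, which this sketch does not."""
    rng = random.Random(seed)
    columns = list(zip(*rows))  # transpose to per-column value lists
    fits = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return [[rng.gauss(mu, sd) for mu, sd in fits] for _ in range(n)]

# Hypothetical patient measurements: the synthetic rows share the
# columns' coarse statistics but are not copies of any real record.
real = [[63.0, 120.0], [54.0, 135.0], [71.0, 128.0]]
fake = synthesize(real, 5)
```

Even this toy version shows the appeal for downstream use: models can be trained and validated on `fake` without any real record leaving the data holder.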
Beyond research, synthetic data generation supports organizations in improving operational efficiency, reducing costs, and enhancing service delivery in a secure and ethical manner (Sigala et al., 2024). By enabling model training and validation without exposing sensitive information, GenAI contributes to fairer and more transparent AI systems through the creation of balanced datasets that better represent diverse populations and contexts.

Task Automation and Service Scaling

GenAI automates routine tasks and enables scalable business processes, thereby enhancing productivity and allowing human workers to focus on higher-value activities. It improves organizational efficiency (Dwivedi et al., 2025) by automating key business process management functions such as process extraction from text, event management, resource allocation, and social media operations (Feuerriegel et al., 2024). GenAI further increases productivity by automating content creation, customer service, and code generation, with the potential to transform entire industries through large-scale process optimization (Feuerriegel et al., 2024). In service contexts, GenAI enhances customer experience through authentic automation and cost-effective personalized interactions (Sigala et al., 2024). It enables scalable and efficient service delivery while reducing employee workload, improving both productivity and job satisfaction (Sigala et al., 2024). By automating complex organizational processes (Srivastava et al., 2025) and handling routine tasks such as generating budget proposals (Meng et al., 2024) and data analysis (Mohammad et al., 2023), GenAI facilitates greater operational agility and informed decision-making. While offering clear benefits in automation, personalization, collaboration, and accessibility, responsible adoption requires transparency and interpretability through explainable GenAI approaches (Schneider, 2024).
6.2 Challenges and Limitations

While generative AI offers transformative potential, its deployment introduces a complex landscape of challenges spanning technical, ethical, social, and governance dimensions. Our thematic analysis reveals four primary categories of concerns that demand careful attention to ensure responsible and effective implementation of these technologies.

6.2.1 Societal, Ethical and Fairness Concerns [C1]

Biases and Discrimination

A predominant ethical challenge, identified consistently across the literature, is the propensity of GenAI models to perpetuate and amplify societal biases. This concern transcends domain boundaries, manifesting across general management information systems (Chau & Xu, 2025; Feuerriegel et al., 2024; Laine et al., 2025; Nah et al., 2023; Sigala et al., 2024; Storey et al., 2025; X. Wei et al., 2025), healthcare applications (M. Li & Guenier, 2024; Meng et al., 2024; Ouanes, 2024; Srivastava et al., 2025), and educational contexts (Mohammad et al., 2023). The problem’s origin lies in the models’ training data, which often encapsulates historical and systemic prejudices. As Chau and Xu (2025) explain, “the training data of LLMs may contain biases from various sources reflecting racial, gender, and other discriminant judgments in humans and society. Trained on these data, LLMs may inherit and amplify such biases, causing the decisions to be unfair for some social groups, communities, or societies”. The ramifications of this inherited bias are severe, leading to discriminatory outputs and representational harms that disproportionately affect marginalized groups (Chau & Xu, 2025; Laine et al., 2025; M. Li & Guenier, 2024). This is compounded by what one study terms “exclusionary norms”, where models trained on data from affluent regions neglect global diversity, thereby reflecting the “practices of the wealthiest communities and countries” and fostering cultural insensitivity (Laine et al., 2025).
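Part of why such bias is tractable to detect yet hard to resolve is that the simplest candidate measures are easy to compute but mutually incompatible. As a hedged sketch, one such measure, the positive-outcome rate per group underlying demographic parity, can be checked as follows; the function and data are hypothetical illustrations, not drawn from the reviewed studies.

```python
from collections import defaultdict

def positive_rates(decisions):
    """Positive-outcome rate per group, from (group, decision) pairs
    with decision in {0, 1}. Demographic parity asks these rates to be
    (near) equal; it is only one of several competing fairness
    definitions, which in general cannot all hold at once."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for group, decision in decisions:
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical hiring decisions: group "a" is approved at twice the
# rate of group "b", a disparity this single metric makes visible.
rates = positive_rates([("a", 1), ("a", 1), ("a", 0), ("a", 0),
                        ("b", 1), ("b", 0), ("b", 0), ("b", 0)])
```

Detecting the disparity is the easy step; deciding whether equalizing these rates is the right remedy is exactly the normative question the literature leaves open.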
The tangible risks of such biases are particularly acute in high-stakes applications. In healthcare, they can manifest as clinically inappropriate recommendations stemming from a failure to grasp linguistic or cultural nuances (Meng et al., 2024). In organizational settings, biased algorithms can unfairly influence critical decisions such as hiring and firing (Srivastava et al., 2025), while in education, they risk reinforcing discriminatory worldviews among learners (Mohammad et al., 2023).

Fairness Implementation Challenges

Addressing bias is not merely a technical problem of detection but a profound normative challenge of implementation, fraught with both conceptual and practical barriers. A primary conceptual hurdle is the absence of a universal definition of fairness, as what is considered equitable is deeply embedded in cultural, legal, and social norms (X. Wei et al., 2025). This definitional ambiguity becomes particularly salient in content moderation, where, as one study highlights, “there is no universally accepted definition of what qualifies as hate speech or toxic speech” (Laine et al., 2025). Without clear, agreed-upon criteria for what constitutes harmful content, creating globally consistent and fair moderation policies becomes exceptionally difficult. Compounding this normative challenge are practical tensions, as the goals of fairness and equity often conflict with organizational objectives such as profitability and operational efficiency (Sigala et al., 2024; X. Wei et al., 2025). These intertwined conceptual and practical obstacles mean that even when biases are identified, rectifying them in a consistent and meaningful way remains a formidable task.

Potential for Misuse and Harmful Content

GenAI systems present unprecedented capabilities for generating harmful, manipulative, and malicious content at scale. The scope of potential exploitation is expansive. Laine et al.
provide a comprehensive catalog of deliberate misuse scenarios, ranging from the generation of “malevolent material, including spam, fraudulent reviews, or even cyberattacks on a large scale” to “creating deceptive phishing emails and malicious code” (2025). Deepfake technology represents a particularly insidious vector, enabling sophisticated identity fraud and deception, while GenAI’s capacity for emotional manipulation introduces novel forms of psychological harm (Storey et al., 2025). The propagation of misinformation and disinformation represents another pressing concern (Dwivedi et al., 2025; Sigala et al., 2024). GenAI models “risk blurring the line between fact and fiction, as they can rapidly disseminate false or misleading information, fake news, and malicious content, making it difficult for users to discern truth from fantasy” (Laine et al., 2025). The consequences of such information pollution vary dramatically by domain. In healthcare contexts, misinformation can directly endanger patient safety through inaccurate medical guidance (Meng et al., 2024), while in political spheres, GenAI “can be exploited for manipulative purposes, such as the generation of propaganda or misinformation, thereby influencing public opinion and potentially harming, for example, the electoral process or other fraud and scams” (Laine et al., 2025). Educational settings reveal different manifestations of misuse, including examination fraud and plagiarism that undermine academic integrity (Maita et al., 2024; Mohammad et al., 2023). Underlying these diverse threats is the observation that GenAI tools lack a moral compass, operating without the “intuition, plausibility, and temporal relevance” that guide human judgment (Jarvenpaa & Klein, 2024). This technological amorality, combined with the potential for widespread misuse, erodes societal trust in institutions and information ecosystems (Laine et al., 2025). 
Consequently, it raises urgent questions about accountability and the difficulty of moderating AI-generated content at scale (Laine et al., 2025; Meng et al., 2024; Nah et al., 2023; Pool et al., 2024).

Human Autonomy and Skill Devaluation

A recurring theme in the literature is the concern that over-reliance on GenAI may erode essential human skills and autonomy (Gumusel, 2025; Meng et al., 2024; Pool et al., 2024; Srivastava et al., 2025). The primary mechanism for this is the development of a “human automation bias,” where users accept AI-generated answers without critical assessment, leading to a dependency that “risks eroding skills such as creativity and critical thinking” (Laine et al., 2025). This risk is particularly pronounced in educational settings, where the ease of content generation is seen as a threat to the development of students’ independent reasoning, perseverance, and resilience (Maita et al., 2024; Mohammad et al., 2023). Beyond cognitive decline, the literature also points to the degradation of social bonds, as increased automation may reduce meaningful human interaction and lead to more profound harms, including the “loss of autonomy and dignity, dehumanization, social isolation, and addiction” (M. Li & Guenier, 2024; Sigala et al., 2024). This erosion of autonomy is compounded by the human tendency to anthropomorphize conversational AI, which introduces distinct “psychological vulnerabilities” (Laine et al., 2025). When users perceive AI systems as human-like, they become more susceptible to developing inappropriate dependencies and may be more likely to disclose sensitive personal information (Laine et al., 2025). By fostering a false sense of relationship, anthropomorphism can deepen the very risks of skill erosion and social isolation previously discussed, blurring the lines between tool and companion in ways that may undermine user agency.
Environmental Sustainability

The environmental costs of GenAI constitute an often-overlooked yet critical dimension of ethical deployment. Training and operating large-scale generative models impose substantial resource demands, resulting in significant energy consumption and carbon footprints (Feuerriegel et al., 2024; Ghebrehiwet et al., 2024; Lareyre et al., 2023; Onatayo et al., 2024; Storey et al., 2025; X. Wei et al., 2025). As Laine et al. emphasize, “the energy demands for training and operating these models contribute to resource depletion and pollution, leave a significant carbon footprint, and have high computation costs” (2025). This challenge is intrinsically linked to what Chau and Xu describe as “the blessing and curse of the scaling law”—the empirical observation that model performance improves with increases in size, training data, and computational power (2025). This dynamic creates perverse incentives that encourage ever-larger and more resource-intensive architectures, establishing a tension between performance optimization and environmental sustainability. Compounding these concerns is the research community’s limited visibility into the full scope of environmental impacts. Laine et al. observe that “many environmental factors related to the operation of LLMs that are in widespread use are currently unknown” (2025), implying that documented concerns may represent merely a fraction of the true ecological cost. This opacity raises fundamental questions about the sustainability of current GenAI development trajectories and highlights an ethical imperative to systematically account for environmental impacts alongside performance metrics (Onatayo et al., 2024).
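The scaling-law dynamic described above can be made concrete with a toy calculation. The sketch below is our own illustration, not drawn from the reviewed studies: it assumes a Chinchilla-style power law for the parameter term of the loss (the constants echo published estimates but serve purely as an example), showing how each tenfold increase in model size, with roughly proportional compute and energy, buys only a small and diminishing reduction in loss.

```python
# Illustrative sketch of a power-law scaling relationship:
# loss(N) = a * N**(-alpha) + floor.
# The constants resemble Chinchilla-style estimates for the parameter
# term but are used here only for illustration, not as fitted values.

def toy_loss(n_params, a=406.4, alpha=0.34, floor=1.69):
    """Hypothetical loss as a function of parameter count N."""
    return a * n_params ** (-alpha) + floor

for n in (1e8, 1e9, 1e10, 1e11):
    # Each 10x step in parameters (and, roughly, in compute and energy)
    # yields a smaller loss reduction than the previous one.
    print(f"{n:.0e} params -> loss {toy_loss(n):.3f}")
```

Under this assumed curve, the marginal benefit of scale shrinks while resource costs grow multiplicatively, which is exactly the tension between performance optimization and sustainability that the reviewed literature highlights.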
6.2.2 Reliability and Accuracy [C2]

Hallucinations and Factual Inaccuracy

The most pervasive technical challenge undermining GenAI reliability is the phenomenon of “hallucination”—the generation of outputs that appear plausible but are fundamentally incorrect (Chau & Xu, 2025; Dwivedi et al., 2025; Laine et al., 2025; Sigala et al., 2024; Storey et al., 2025). This issue manifests consistently across all application domains examined, from healthcare (Beheshti et al., 2025; Bellanda et al., 2024; Bracken et al., 2025; Lareyre et al., 2023; M. Li & Guenier, 2024; Meng et al., 2024; Ouanes, 2024; Pool et al., 2024) and education (Mohammad et al., 2023) to general information systems (Chau & Xu, 2025; Dwivedi et al., 2025; Feuerriegel et al., 2024; Haase et al., 2024; Jarvenpaa & Klein, 2024; Nah et al., 2023; Sigala et al., 2024). The consequences of such inaccuracies vary dramatically by context, with healthcare applications facing the greatest risks. Beheshti et al. warn that “inaccurate or misleading information in healthcare can have severe consequences, including misdiagnoses, improper treatments, and potential harm to patients’ well-being and safety” (2025). Empirical evidence substantiates this concern: Bracken et al. documented “the presence of hallucinations or fictitious information in three of nine studies utilizing ChatGPT” (2025), raising fundamental questions about safe clinical implementation. The fabrication problem extends beyond medical contexts to academic and professional settings, where models have been observed generating fictitious references (Mohammad et al., 2023). This phenomenon of hallucination renders human oversight and validation indispensable for any responsible application of GenAI (Chau & Xu, 2025; Haase et al., 2024).
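One simple validation heuristic consistent with this call for oversight is self-consistency checking: sampling the same query several times and treating low agreement as a hallucination warning sign. The sketch below is a minimal illustration under our own assumptions; the `generate` callable stands in for a real stochastic model API (stubbed here with canned answers), and the 0.6 agreement threshold is an arbitrary illustrative choice, not a value from the reviewed studies.

```python
# Minimal self-consistency check: sample an answer several times and
# flag low agreement as a possible hallucination signal. `generate`
# is a stand-in for a real model call; here it is stubbed for
# illustration with canned answers.
from collections import Counter

def consistency_score(generate, prompt, n_samples=5):
    """Return (most common answer, fraction of samples agreeing with it)."""
    answers = [generate(prompt) for _ in range(n_samples)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / n_samples

# Stubbed generator: stable on a factual prompt, unstable on a
# citation prompt (mimicking the fictitious-reference problem).
_canned = {
    "capital of France?": ["Paris"] * 5,
    "cite a source for X": ["Smith 2019", "Jones 2021", "Smith 2020",
                            "Lee 2018", "Smith 2019"],
}
def fake_generate(prompt, _state={}):
    i = _state.get(prompt, 0)
    _state[prompt] = i + 1
    return _canned[prompt][i % len(_canned[prompt])]

answer, score = consistency_score(fake_generate, "cite a source for X")
flagged = score < 0.6  # threshold is an arbitrary illustrative choice
print(answer, score, "flagged" if flagged else "ok")
```

Agreement across samples is of course no guarantee of truth (a model can hallucinate consistently), which is why the literature insists on human validation rather than purely automated checks.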
Outdated or Limited Knowledge

A structural limitation of GenAI reliability stems from the temporal constraints of training data, which has a fixed knowledge cut-off date (Feuerriegel et al., 2024; Jarvenpaa & Klein, 2024; Mohammad et al., 2023). This temporal freezing creates an expanding knowledge gap as models age, rendering them increasingly obsolete for queries requiring current information. This inherent limitation is compounded by the fact that models “may not remember everything that they saw during training” (Feuerriegel et al., 2024), leading to incomplete or sparse knowledge—a particular challenge when domain-specific expertise is required (Ghebrehiwet et al., 2024; Lareyre et al., 2023; M. Li & Guenier, 2024). These knowledge gaps manifest differently across domains: in healthcare as an inability to answer complex medical questions (M. Li & Guenier, 2024), in research as dependency on limited or unrepresentative datasets (Ghebrehiwet et al., 2024), and in general applications as an absence of temporal awareness that undermines contextual relevance (Jarvenpaa & Klein, 2024). Furthermore, the training process itself amplifies these limitations, as developers “often tend to rely on second-hand information from official organizations, which has a certain degree of authority but is often lagging behind” (M. Li & Guenier, 2024). This creates a cascade of temporal delays—from the original data collection to model training to user deployment—each stage introducing additional staleness into the system.

Limited Contextual Awareness

Beyond factual accuracy, a more subtle yet critical challenge is GenAI’s limited contextual awareness—an inability to interpret situational, cultural, or domain-specific nuances (Feuerriegel et al., 2024; Ghebrehiwet et al., 2024; Haase et al., 2024; Laine et al., 2025; M. Li & Guenier, 2024; Meng et al., 2024). This deficiency can render outputs technically correct yet practically useless, inappropriate, or even harmful.
The risk is particularly acute in healthcare, where this limitation manifests as a failure to navigate linguistic and cultural subtleties (Meng et al., 2024) or provide genuinely personalized health information (M. Li & Guenier, 2024).

Performance Inconsistency and Drift

The reliability of GenAI is further complicated by its dynamic and often unpredictable nature. The literature highlights concerns about performance “drift”, described as the unexpected deterioration of model performance over time (Feuerriegel et al., 2024), as well as inconsistent reliability across different clinical scenarios (Bracken et al., 2025). This instability also appears at a technical level, with some models exhibiting instability during training (Ghebrehiwet et al., 2024) or vulnerability to “semantic perturbations, whereby input with different syntax but similar meaning to the training data leads to errors” (Laine et al., 2025), and susceptibility to system crashes (M. Li & Guenier, 2024). The combination of performance drift and inherent instability creates a particularly challenging scenario for deployment, as initial testing may not reveal latent failure modes that emerge over time or under specific conditions. This unpredictability undermines trust and necessitates constant vigilance, transforming what are marketed as autonomous systems into tools requiring continuous human oversight and validation (Feuerriegel et al., 2024).

Lack of Evidence and Validation

Underpinning all other reliability concerns is a critical meta-challenge: the lack of rigorous, evidence-based validation of GenAI systems in real-world applications. This issue is particularly pronounced in specialized domains like healthcare, where there is a “scarcity of evidence-based medical research concerning the application of LLMs in healthcare settings” (Meng et al., 2024).
The validation gap extends across multiple dimensions: absence of external validation studies, a lack of comprehensive and domain-specific evaluation metrics, and the immaturity of assessment methods and theoretical frameworks (Ghebrehiwet et al., 2024; Gumusel, 2025; Meng et al., 2024; Schneider, 2024). Without established benchmarks and empirical evidence, practitioners find it difficult to delineate when these tools are productive and when they are not (Jarvenpaa & Klein, 2024). Compounding this challenge is a lack of reproducibility, as it may not be possible to reliably replicate tool responses through prompt engineering (Jarvenpaa & Klein, 2024). These validation deficiencies not only hinder safe and effective integration (Ghebrehiwet et al., 2024; Meng et al., 2024) but also underscore the urgent need for rigorous, domain-specific evaluation frameworks before widespread deployment (Ghebrehiwet et al., 2024; Schneider, 2024).

6.2.3 Transparency and Explainability [C3]

Model Opacity and the Black-Box Problem

The inherent opacity of GenAI models, frequently described as a “black box” problem, stems from the difficulty in interpreting how their complex transformer-based architectures arrive at specific outputs (Chau & Xu, 2025; Laine et al., 2025; Storey et al., 2025). This lack of transparency is a pervasive concern that obstructs the ability to audit decision-making processes (Chau & Xu, 2025; Jarvenpaa & Klein, 2024), assess model limitations (Mohammad et al., 2023; Sigala et al., 2024), and ensure user control (Srivastava et al., 2025), eroding trust across all reviewed domains (Gumusel, 2025; Mohammad et al., 2023; Ouanes, 2024; Sigala et al., 2024). Consequently, stakeholders are confronted with systems of immense size and “opaque behaviors” (Laine et al., 2025) whose operational logic remains inscrutable, hindering both accountability and safe adoption.
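The mechanics behind black-box probing can be illustrated with the simplest XAI idea, occlusion: remove one input feature at a time and measure how much the model's output changes. The sketch below uses a deliberately trivial bag-of-words scorer of our own invention as a stand-in for an opaque model; its weights and tokens are purely illustrative.

```python
# Toy occlusion-style attribution: drop each token and measure the
# change in the model's score. The "model" here is a trivial
# bag-of-words scorer standing in for an opaque system; the weights
# are illustrative assumptions, not learned values.

WEIGHTS = {"refund": 2.0, "angry": 1.5, "thanks": -1.0}

def score(tokens):
    """Stand-in model: sum of per-token weights."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)

def occlusion_attributions(tokens):
    """Attribution of token i = score drop when token i is removed."""
    base = score(tokens)
    return {
        i: base - score(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    }

tokens = ["angry", "customer", "wants", "refund"]
attrib = occlusion_attributions(tokens)
print(attrib)  # {0: 1.5, 1: 0.0, 2: 0.0, 3: 2.0}
```

Even this naive method needs one full model evaluation per removed feature, and occlusion ignores interactions between features; at transformer scale, with long generated outputs rather than a single score, such probing quickly becomes intractable, which is one reason the literature judges current XAI tooling immature.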
Explainability Shortcomings

Compounding the problem of model opacity are significant deficiencies in the tools and methodologies for explaining GenAI systems (Clear et al., 2025; Ghebrehiwet et al., 2024; Laine et al., 2025; Schneider, 2024). Existing Explainable AI (XAI) techniques are described as “still far from optimal,” with a general consensus that available tools are marked by their “immaturity” (Schneider, 2024). This deficiency makes it difficult to “interpret and explain the rationale behind the decision-making process of a model” (Chau & Xu, 2025) or its “non-interpretable learned representations” (Ghebrehiwet et al., 2024). The challenge is further amplified by the technology’s scale, as “XAI for GenAI faces significant challenges due to the increasing complexity and societal reach of these models” (Schneider, 2024). Ultimately, without robust explainability, it is nearly impossible to debug, validate, or ethically govern GenAI systems, leaving a critical gap between their advanced capabilities and the human capacity to responsibly manage them.

6.2.4 Privacy, Security, Governance and Legal [C4]

Data Privacy Violations

A pervasive concern across all reviewed domains is the risk of unintended leakage or non-consensual exposure of sensitive, personal, or proprietary information (Beheshti et al., 2025; Gumusel, 2025; M. Li & Guenier, 2024; Meng et al., 2024; Onatayo et al., 2024; Ouanes, 2024; Pool et al., 2024; Sigala et al., 2024; Srivastava et al., 2025). This risk originates from multiple sources, beginning at the model level where systems demonstrate “a tendency to memorize and reproduce personally identifiable information” from their training data (Laine et al., 2025). These inherent risks are then amplified by systemic vulnerabilities in deployment. Laine et al. report that “owing to system glitches in ChatGPT, the chat logs of certain users have become accessible to others” (2025), affecting both individuals and organizations.
Similarly, in enterprise contexts, organizations hesitate to use public AI tools because the prompts they submit can reveal sensitive information (Dwivedi et al., 2025). This risk of disclosure is further compounded by a behavioral dimension, as users are more likely to reveal private information when they anthropomorphize the technology and “treat models as if they are human” (Laine et al., 2025). Collectively, these “privacy hazards” (Laine et al., 2025) create significant challenges for compliance with regulations like GDPR and HIPAA (Laine et al., 2025; Ouanes, 2024).

Security Vulnerabilities

GenAI systems are also susceptible to security vulnerabilities that expose them to malicious exploitation (Dwivedi et al., 2025; Gumusel, 2025; Lareyre et al., 2023; Ouanes, 2024; Pool et al., 2024). Laine et al. document a range of adversarial attack vectors designed to compromise system integrity, including susceptibility to “jailbreaking”, where prompts are used to bypass safety measures; “prompt injection” to cause malfunctions; and “data poisoning attacks” to corrupt the model’s knowledge base (2025). Such security breaches create pathways for unauthorized access, data theft, and other intentional misuses that threaten both personal and organizational security (Gumusel, 2025; Laine et al., 2025).

Regulatory and Legal Gaps

The rapid proliferation of GenAI has created a significant governance vacuum, as existing legal frameworks are ill-equipped to manage the technology. The literature consistently notes that “current laws and regulations have become inadequate to account for new phenomena brought about by [GenAI]” (Nah et al., 2023), with the technology being adopted far faster than it can be theorized or governed (Mambile & Ishengoma, 2024). This creates a profound lack of globally agreed-upon standards for ethical and safe deployment (Srivastava et al., 2025).
This regulatory gap poses a direct challenge for organizations attempting to ensure legal compliance (Gumusel, 2025) and for regulatory bodies tasked with developing policies to protect the public, particularly in high-stakes domains like healthcare (Ouanes, 2024).

Accountability and Liability Ambiguity

The legal and regulatory vacuum directly contributes to a critical ambiguity regarding accountability and liability. When a GenAI system causes harm through an error or biased output, there is no clear framework for assigning responsibility, leaving a notable “ambiguity over liability” (Pool et al., 2024). This uncertainty extends to defining moral responsibility for model outputs (Laine et al., 2025) and is exemplified by the practical “warranty problem”, where model suppliers refuse to provide performance guarantees, forcing adopters to shoulder the operational and legal risks (Dwivedi et al., 2025).

Copyright and Ownership Disputes

Finally, GenAI systems raise novel challenges to established notions of intellectual property, creating unresolved disputes over both model inputs and outputs (Feuerriegel et al., 2024; Lareyre et al., 2023; Maita et al., 2024; Mohammad et al., 2023; Sigala et al., 2024; Storey et al., 2025). On the input side, models are trained on vast datasets that may include copyrighted material without permission, potentially violating existing rights (Feuerriegel et al., 2024; Laine et al., 2025). On the output side, the “distinction between original and AI-generated content is blurred” (Laine et al., 2025), leading to profound “doubts about who is a legal owner of GenAI generated contents” (Feuerriegel et al., 2024). This ambiguity fuels practical concerns about plagiarism and academic integrity (Lareyre et al., 2023; Maita et al., 2024) and complicates questions of ownership and control in commercial contexts (Laine et al., 2025).
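Pending the governance frameworks whose absence this subsection documents, one pragmatic organizational response to prompt-disclosure risks is to redact obvious identifiers before prompts leave the organization. The sketch below is a deliberately naive, regex-based illustration under our own assumptions (the patterns and placeholder labels are invented for the example); real PII detection must handle names, addresses, and free-text identifiers, and nothing this simple should be relied on for GDPR or HIPAA compliance.

```python
# Naive pre-submission PII redaction for prompts sent to external
# GenAI services. The regex patterns are illustrative only: real-world
# PII detection is much harder, and this sketch is not a compliance
# mechanism.
import re

PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),       # US SSN format
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),     # card-like digits
]

def redact(prompt):
    """Replace pattern matches with placeholders before submission."""
    for pattern, placeholder in PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

raw = "Patient jane.doe@example.com, SSN 123-45-6789, asks about dosage."
print(redact(raw))  # Patient [EMAIL], SSN [SSN], asks about dosage.
```

A design point worth noting: filtering at the organizational boundary addresses only the disclosure side of the risk; it does nothing about memorized PII resurfacing in model outputs, which is why the literature treats privacy as requiring both technical and governance responses.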
6.3 Future Research Directions

Although GenAI solutions have seen impressive development in recent years, progress in this field continues to raise new questions. This is especially true when considering a wider context beyond a technical focus. Successful application of GenAI systems and related digital transformations requires addressing many challenges and providing solutions that are much more mature than those currently available. This creates the need for future research in many areas and directions.

6.3.1 User Perspective [F1]

User-GenAI Interaction Design

Future research on user-GenAI interaction design in IS should prioritize the development of trustworthy, transparent, and explainable AI systems, as these themes recur across multiple sources (Chau & Xu, 2025; Schneider, 2024; Sigala et al., 2024). To foster users’ trust, it is necessary to address issues such as hallucinations, inherited biases, and the interpretability of LLM outputs. Another identified direction is the need for personalization and adaptive design, which includes tailoring GenAI systems to individual cognitive and emotional needs, supporting diverse user routines (both professional and private), and enabling proactive emotional management (Haase et al., 2024; M. Li & Guenier, 2024; Sigala et al., 2024). Cultural and linguistic sensitivity is another prominent direction, with calls to localize GenAI systems and mitigate stereotypes in interactions (M. Li & Guenier, 2024; X. Wei et al., 2025). The integration of multimodal capabilities and sensory engagement further expands the design space for user experience (Meng et al., 2024; Sigala et al., 2024). Finally, IS researchers are encouraged to explore the broader systemic implications of GenAI, including its impact on digital work, human-machine symbiosis, and the evolving boundaries of end-user computing (Feuerriegel et al., 2024; Gumusel, 2025; Nah et al., 2023; Storey et al., 2025).
The Required Skills and Education

The rise of GenAI is prompting a fundamental rethinking of the skills and educational frameworks required across industries. A recurring theme in the literature is the need to define and cultivate new competencies, including AI literacy and ethical usage, especially among non-technical users (Clear et al., 2025; Maita et al., 2024; Nah et al., 2023). Researchers are encouraged to explore how training programs and professional development initiatives can equip both workers and educators to effectively integrate GenAI into their workflows and pedagogical practices (Maita et al., 2024; Sigala et al., 2024). Closely linked is the transformation of job roles and labor markets, with GenAI expected to automate routine tasks, reshape existing roles, and create entirely new categories of work (Feuerriegel et al., 2024; Haase et al., 2024; Nah et al., 2023; Sigala et al., 2024). Understanding which tasks are most affected and how workers can adapt is a critical area for future inquiry. Additionally, the integration of GenAI into organizational contexts—such as IT outsourcing, service marketing, and mission-critical domains like healthcare and finance—will require a reassessment of workforce capabilities and strategic positioning (Dwivedi et al., 2025; Storey et al., 2025). Finally, the emergence of AI agents as team members introduces new questions about the competencies needed to work effectively alongside intelligent systems (Clear et al., 2025).

Personalization Tailored to the Individual User

Personalization in GenAI systems is an emerging topic in IS research, with multiple sources emphasizing the need to tailor interactions to individual users’ preferences, values, and backgrounds (Haase et al., 2024; Schneider, 2024; Sigala et al., 2024).
Future studies should explore how AI-augmented services can proactively support users through personalized prompts, assistance, and emotional alignment, while also adapting to diverse cognitive and behavioral patterns. Closely related is the challenge of balancing hyper-personalization with ethical accountability, particularly in marketing and customer engagement contexts where biases may be amplified (Sigala et al., 2024; X. Wei et al., 2025). Another important direction involves understanding how GenAI affects individual users—both workers and consumers—and how these technologies can enhance satisfaction, engagement, and perceived significance in digital interactions (Nah et al., 2023; Storey et al., 2025). Finally, personalization of explanations and communication, especially in domains like healthcare and education, is seen as a key factor for improving user comprehension and decision-making (M. Li & Guenier, 2024; Schneider, 2024).

Psychological Effects of GenAI Use on Individuals

The psychological effects of GenAI on individuals represent a multifaceted research area within the IS domain. A dominant theme across sources is the need to understand how continuous interaction with GenAI influences users’ cognitive processes, emotional states, and work habits over time (Haase et al., 2024; Nah et al., 2023; Sigala et al., 2024; Srivastava et al., 2025; Storey et al., 2025). Researchers are encouraged to investigate both the risks of over-reliance and the potential for loss of control due to over-automation, as well as strategies to foster healthy human-AI relationships (Nah et al., 2023; Sigala et al., 2024). Closely related is the impact of GenAI on decision-making and information-seeking behavior, which may alter users’ autonomy (Nah et al., 2023).
The evolving dynamics of social and personal interactions mediated by GenAI also call for interdisciplinary, user-centered approaches that account for ethical and emotional dimensions (Gumusel, 2025; Nah et al., 2023; Srivastava et al., 2025). Moreover, future studies should explore how GenAI systems can be designed to proactively manage emotional experiences, especially during periods of uncertainty and change (Sigala et al., 2024). Overall, this research agenda highlights the importance of capturing the nuanced and long-term psychological implications of GenAI use across diverse user populations.

Risks Affecting Individual Users

The increasing integration of GenAI into everyday digital experiences raises significant risks for individual users, prompting a need for focused IS research. A recurring concern is users’ privacy and data security, especially in real-world and real-time chatbot interactions where user data may be exposed without sufficient safeguards (Gumusel, 2025; M. Li & Guenier, 2024). Researchers are urged to develop ethical and regulatory frameworks to address these vulnerabilities. Additionally, the amplification of cyber threats through GenAI and the spread of AI-generated disinformation call for advanced countermeasures and real-time detection systems (Dwivedi et al., 2025; Feuerriegel et al., 2024). Another identified risk concerns over-reliance on GenAI and the psychological consequences of excessive automation, including loss of control and technological resistance (Maita et al., 2024; Nah et al., 2023). Understanding these risks is essential to ensure safe and responsible use of GenAI technologies.

6.3.2 Organizational Perspective [F2]

GenAI Adoption and Acceptance Factors

A prominent direction for future research on GenAI adoption and acceptance in IS involves understanding the trust-related factors that influence user interaction with LLMs.
Multiple sources emphasize the importance of explainability, interpretability, and transparency as key enablers of trust and acceptance, particularly in domains such as healthcare and education (Chau & Xu, 2025; Ghebrehiwet et al., 2024; Meng et al., 2024; Srivastava et al., 2025). These factors are closely tied to concerns about bias, hallucinations, and data quality, which design science researchers are encouraged to address through novel system architectures and validation methods (Chau & Xu, 2025; Ghebrehiwet et al., 2024). Another widely discussed area is the organizational and cultural context of GenAI adoption, including how organizational norms, structures, and processes shape individual and collective attitudes toward automation and AI integration (Haase et al., 2024; Nah et al., 2023; Storey et al., 2025). Additionally, scholars call for investigations into sector-specific adoption dynamics, such as in education (Maita et al., 2024; Nah et al., 2023), customer service (Sigala et al., 2024), and clinical decision-making (Srivastava et al., 2025), highlighting the need for context-aware frameworks. Finally, there is a recognized need to refine or develop new theoretical models that capture the non-linear and dynamic nature of GenAI adoption, moving beyond traditional technology acceptance paradigms (Mambile & Ishengoma, 2024).

Formation of Hybrid Human-AI Teams

An emerging and widely discussed research avenue in the IS field concerns the formation and functioning of hybrid human-AI teams, where humans and GenAI systems collaborate in shared tasks. Several sources highlight the need to explore collaborative dynamics, including how responsibilities, decision-making authority, and handover points are distributed between human and AI agents (Clear et al., 2025; Haase et al., 2024; Nah et al., 2023).
This includes investigating symbiotic relationships that augment human intelligence rather than replace it, and understanding how to design collaboration frameworks that prevent over-reliance on AI while leveraging its strengths (Maita et al., 2024; Nah et al., 2023). The organizational implications of such hybrid teams are also significant, as GenAI adoption is expected to reshape business processes, IT capabilities, and workforce structures (Dwivedi et al., 2025; Storey et al., 2025). Moreover, researchers are encouraged to examine how cultural, regulatory, and sector-specific contexts influence the integration of AI into teams (Dwivedi et al., 2025). Finally, the evolving composition of teams—including the competencies required to work effectively with AI agents—presents a rich area for inquiry, calling for new models of team design and skill development (Clear et al., 2025).

Automation of Routine Tasks

A key area of interest in GenAI research within IS is the automation of routine and repetitive tasks, which has implications across organizational and managerial levels. Several sources emphasize the potential of GenAI to support or fully automate tasks in domains such as education, enterprise management, and service operations, often through the use of engineered prompts and workflows tailored to specific functions (Chau & Xu, 2025; Feuerriegel et al., 2024; Sigala et al., 2024). This shift invites deeper investigation into how automation can augment human capabilities, redefine job roles, or even create new forms of work, rather than merely replacing existing ones (Haase et al., 2024). Strategic concerns such as risk containment, accountability, and transparency in automated processes are also critical, especially in customer-facing and regulated environments (Haase et al., 2024; Sigala et al., 2024). Furthermore, researchers are called to explore the implications of automation, including the extent of job replacement (Storey et al., 2025).
Finally, the relevance of automation extends to broader contexts such as smart cities and sustainable infrastructure, where GenAI can play a role in optimizing design, construction, and operations (Onatayo et al., 2024).

Organizational Transformations to Incorporate GenAI

The integration of GenAI into organizational contexts is expected to drive significant transformations in business processes, structures, and strategic models. Multiple sources emphasize the role of GenAI in revealing opportunities for process innovation and supporting process (re-)design initiatives, particularly through automation and augmentation of decision-making and resource management (Feuerriegel et al., 2024; Nah et al., 2023; Sigala et al., 2024). Researchers can also explore how GenAI can facilitate digital transformation across industries, including shifts from low- to high-value services and the emergence of new business models (Nah et al., 2023; Sigala et al., 2024). Moreover, GenAI’s impact on organizational capabilities, including IT infrastructure, human resource management, and knowledge systems, calls for a rethinking of traditional organizational boundaries and roles (Nah et al., 2023; Storey et al., 2025). Scholars are also urged to adopt a sociotechnical perspective, examining GenAI not merely as a technical tool but exploring the boundary conditions that define its presence and impact within the context of an organization (Storey et al., 2025).

Benefits Provided by GenAI to Organizations

Future research in IS should examine GenAI’s impact across diverse sectors such as medicine, education, tourism, and e-commerce (Meng et al., 2024; Mohammad et al., 2023; Nah et al., 2023; Pool et al., 2024), assessing its potential benefits, e.g., the ability to personalize and augment services (Pool et al., 2024), as well as enhanced productivity, creativity, and service quality (Chau & Xu, 2025).
Importantly, these benefits must be evaluated alongside ethical considerations and societal implications, ensuring responsible deployment that aligns with values such as social justice and sustainability (Chau & Xu, 2025; Pool et al., 2024).

Risks Introduced by GenAI to Organizations

Understanding the risks and challenges associated with GenAI adoption is essential for responsible organizational integration. Key concerns include the strategic containment of automation risks, particularly in service industries where GenAI may disrupt established marketing and operational practices (Haase et al., 2024; Sigala et al., 2024). In high-stakes domains like healthcare, researchers are urged to address issues of accuracy, guideline compliance, and implementation challenges, ensuring safe and effective use of AI-driven tools (Bellanda et al., 2024; Srivastava et al., 2025). These risks highlight the need for robust governance frameworks and continuous evaluation of GenAI’s organizational impact.

6.3.3 Societal Perspective [F3]

Societal Impact of GenAI

The societal impact of GenAI is a multifaceted area of inquiry in IS, with researchers increasingly called to examine its implications for equity, labor markets, global development, and ethical governance. A recurring theme across sources is the need to understand how GenAI may displace or transform jobs and crowdsourced initiatives, and what welfare consequences this may entail (Feuerriegel et al., 2024; Haase et al., 2024; Storey et al., 2025). These changes raise additional concerns about “societal stress” caused by job replacement (Storey et al., 2025). Another major research direction involves the global dimension of GenAI’s impact, including its role in widening or bridging the digital divide between countries at different stages of technological development, and its influence on resource allocation across the Global North–South divide (Nah et al., 2023).
The integration of GenAI into global IT management and outsourcing structures further highlights the importance of addressing regulatory diversity, cultural sensitivity, and language-specific capabilities to ensure inclusive deployment (Dwivedi et al., 2025). Ethical tensions also emerge in scenarios where bias mitigation efforts may reduce profitability or create competitive disadvantages, prompting calls for frameworks that balance efficiency, fairness, and social responsibility (Schneider, 2024; X. Wei et al., 2025). In sectors such as healthcare and education, GenAI’s societal role is particularly pronounced. Researchers emphasize the need for empirical, interdisciplinary, and user-centered studies to validate its impact on public health systems and education systems (M. Li & Guenier, 2024; Meng et al., 2024; Pool et al., 2024). These studies should account for cultural, linguistic, and socio-economic diversity, ensuring that GenAI technologies are designed and evaluated in ways that reflect real-world complexity. Finally, scholars are encouraged to adopt a sociotechnical systems perspective, recognizing GenAI as a deeply embedded actor within societal ecosystems, whose boundaries, interactions, and ethical implications must be critically examined (Srivastava et al., 2025; Storey et al., 2025).

Application to Specific Domains and Business Sectors

Overall, future IS research should focus on developing tailored GenAI solutions that are not only technically sound but also ethically responsible and contextually relevant. A recurring issue related to GenAI contextual adaptation and domain-specific performance is the need to fine-tune generative models to meet the unique requirements of sectors such as healthcare, finance, education, marketing, e-commerce, tourism, and entertainment (Chau & Xu, 2025; Feuerriegel et al., 2024; Nah et al., 2023; Sigala et al., 2024; X. Wei et al., 2025).
Researchers are encouraged to explore how GenAI can support enterprise management, as well as process design and re-design, contributing to operational efficiency and strategic innovation (Feuerriegel et al., 2024). In high-stakes domains, such as healthcare and finance, the development of ethical guidelines and regulatory frameworks is essential to balance the competing demands (Srivastava et al., 2025; X. Wei et al., 2025). Healthcare, in particular, emerges as a focal point for future research, with calls for interdisciplinary collaboration, clinical validation, and standardized evaluation tools to assess GenAI’s utility in diagnostics, documentation, and patient communication (Beheshti et al., 2025; Bellanda et al., 2024; Bracken et al., 2025; M. Li & Guenier, 2024; Meng et al., 2024; Ouanes, 2024). In education, GenAI’s potential to enhance learning experiences and develop student skills invites further investigation into pedagogical models and responsible use practices (Mohammad et al., 2023; Sigala et al., 2024). Additionally, it is worth examining how GenAI can transform public services, including government-led digital initiatives, while minimizing misuse in sensitive domains such as legal and healthcare services (Sigala et al., 2024).

Regulations and Legal Issues Related to GenAI

Legal and regulatory issues surrounding GenAI are becoming increasingly central to IS research, as organizations and governments grapple with the challenges of ethical deployment, data governance, and intellectual property protection. A key direction involves the development of ethical guidelines and legal frameworks tailored to sectors such as healthcare and finance, where the tension between rapid deployment, accuracy, and inclusivity creates unique regulatory demands (X. Wei et al., 2025).
Another prominent area of inquiry concerns the governance of bias and fairness, including how AI laws and data filtering techniques can be designed to reduce algorithmic discrimination while maintaining transparency and accountability (Nah et al., 2023; Schneider, 2024). The rise of GenAI also raises complex questions about intellectual property rights, prompting calls for new metrics and legal interpretations that reflect the generative nature of these technologies (Nah et al., 2023; Sigala et al., 2024). In the context of service industries and marketing, governments and organizations must address emerging privacy, security, and user data protection challenges, while establishing clear standards for responsible use (M. Li & Guenier, 2024; Sigala et al., 2024). The applications of GenAI in areas such as healthcare and legal cases further underscore the need for comprehensive regulatory oversight, which requires building interdisciplinary collaborations among technologists, professionals, ethicists, and policymakers to ensure that GenAI tools are aligned with ethical principles and legal requirements (M. Li & Guenier, 2024; Ouanes, 2024; Storey et al., 2025).

Risks Introduced by GenAI to Societies

GenAI introduces a range of societal risks that warrant close attention from IS researchers. Key concerns include its potential to disrupt industries such as tourism, e-commerce, and healthcare, and to alter human interactions and decision-making in ways that may affect creativity, productivity, and social justice (Chau & Xu, 2025; Nah et al., 2023; Sigala et al., 2024). In sensitive domains like medicine and education, scholars emphasize the need for rigorous evaluation and standardized tools to assess GenAI’s safety and effectiveness (Meng et al., 2024; Mohammad et al., 2023).
Broader issues such as privacy, security (Storey et al., 2025), and the impact of automation on urban environments (Onatayo et al., 2024) also require investigation to ensure GenAI supports sustainable and equitable societal development.

New Use Cases for GenAI Applications

Exploring new use cases for GenAI applications is a dynamic and promising research direction in IS. Scholars are increasingly interested in how GenAI can enhance existing research methods or even propose novel ones, potentially transforming the way knowledge is produced and disseminated (Chau & Xu, 2025; Jarvenpaa & Klein, 2024). In design science, GenAI is seen as a tool to foster creativity in the development of new IT artifacts (Feuerriegel et al., 2024). This opens the door to new genres of academic publication and alternative theorizing processes, which may challenge traditional norms and encourage methodological diversity (Jarvenpaa & Klein, 2024). Beyond academia, GenAI is being applied to address global grand challenges, such as environmental protection and the Sustainable Development Goals, by expanding modes of explicit knowledge production and improving efficiency in problem-solving (Nah et al., 2023). In management and cybersecurity research, generative algorithms are used to simulate cyber-attacks, generate marketing content, and explore social media dynamics, demonstrating their versatility in both analytical and creative tasks (Bendig & Bräunche, 2024). In education, GenAI supports digital pedagogical innovations, blending AI-driven assistance with traditional teaching to create modern, adaptive learning environments (Maita et al., 2024). In healthcare, emerging use cases include remote patient monitoring and predictive analytics, which promise to enhance care delivery and operational efficiency (Ouanes, 2024). All such new and innovative applications require further research to enable both efficient and safe use of GenAI.
Effect on Job Market

GenAI is expected to significantly reshape the job market, particularly by automating routine tasks and transforming roles in information-intensive sectors (Feuerriegel et al., 2024; Nah et al., 2023). While some jobs may be replaced, new roles will likely emerge that focus on collaborating with AI systems and leveraging human-specific skills (Nah et al., 2023). In consumer-facing services, GenAI may alter employee responsibilities and require reskilling to adapt to changing workplace dynamics (Sigala et al., 2024). As the impact of GenAI on the job market can result in tensions and resistance in societies, this topic is of particular interest to researchers.

6.3.4 Ethical Perspective [F4]

Ethical Use of GenAI

Future research on the ethical use of GenAI should aim to balance innovation with societal responsibility. Achieving this will require interdisciplinary research focusing on both systemic frameworks and user-centered practices. Key priorities include developing comprehensive ethical and regulatory guidelines to address privacy, data security, and fairness, alongside investigating user training to mitigate misuse, especially by non-technical audiences (Clear et al., 2025; M. Li & Guenier, 2024; Maita et al., 2024; Nah et al., 2023; X. Wei et al., 2025). Researchers should also empirically evaluate social, psychological, and economic impacts while exploring complementary ethical approaches, taking into account diverse stakeholder perspectives (Chau & Xu, 2025; Laine et al., 2025; Srivastava et al., 2025). The emergence of explainable GenAI (GenXAI) highlights a need for transparent, contextualized explanations and effective bias mitigation in high-stakes domains (Schneider, 2024).
Across sectors—from IT outsourcing to education—future work must ensure ethical integration of GenAI technologies, promoting social justice and operational accountability as commercial adoption accelerates (Dwivedi et al., 2025; Sigala et al., 2024; Storey et al., 2025).

GenAI Bias Mitigation

Future research on mitigating bias in GenAI, especially LLMs, calls for multidisciplinary approaches focusing on dynamic, scalable, and context-aware methods. These should exploit various approaches, from real-time user feedback systems to resource-efficient longitudinal studies for bias detection and mitigation (X. Wei et al., 2025). Understanding bias requires addressing how foundational model biases propagate into downstream applications where speed and scalability often challenge fairness objectives (Srivastava et al., 2025; X. Wei et al., 2025). Dealing with the problem of bias also requires prior research on bias sources and types (Chau & Xu, 2025; Nah et al., 2023; Sigala et al., 2024; X. Wei et al., 2025). Such research could be supported with explainable AI (Schneider, 2024). Understanding bias across cultural and national contexts requires balancing localized adaptations with global standards (X. Wei et al., 2025). Ethical governance is critical for effective bias mitigation, requiring multi-stakeholder frameworks emphasizing transparency, accountability, explainability, human oversight, privacy, and the right to contest AI outcomes, all embedded in informed regulatory standards and laws tailored for GenAI (M. Li & Guenier, 2024; Nah et al., 2023; Pool et al., 2024).

Transparency and Explainability

Personalization of explanations based on user expertise and context is a critical direction for enhancing accessibility and ensuring that diverse audiences can meaningfully interpret model outputs (Laine et al., 2025; Schneider, 2024; Sigala et al., 2024).
Design science perspectives can support the development of innovative techniques that strengthen user understanding, adoption, and trust in these systems (Chau & Xu, 2025; Pool et al., 2024). In order to improve interpretability while maintaining performance, explainable AI methods should be considered (Chau & Xu, 2025; X. Wei et al., 2025). Apart from technical solutions, developing comprehensive ethical frameworks and guidelines addressing transparency should be considered (Maita et al., 2024). Explainability is also highlighted as a crucial factor for GenAI’s healthcare applications, where future studies should examine how explainability affects clinician reliance on automated diagnostics (Ghebrehiwet et al., 2024; Meng et al., 2024; Ouanes, 2024; Srivastava et al., 2025).

Awareness of User’s Specifics

Research on user-specific awareness in GenAI should explore applications where GenAI systems interact with diverse users in various contexts, such as employee recruitment, credit scoring, customer service, sentiment analysis, and content recommendation (X. Wei et al., 2025). The lack of such awareness can undermine the fairness of such GenAI solutions. Integrating generative models within outsourcing and localization strategies demands sensitivity to local regulations, cultures, and language capabilities (Dwivedi et al., 2025). Localization extends to the adaptation of GenAI tools for new languages and new communication contexts, emphasizing appropriate translation and cultural relevance (M. Li & Guenier, 2024). This is particularly relevant in healthcare applications that require careful evaluation across linguistic, cultural, and socio-economic landscapes, with attention to environmental sustainability (Pool et al., 2024). The design of GenAI must consider whether models are built for static or dynamic contexts, further complicating the awareness issue and inducing likely fairness and transparency challenges (Nah et al., 2023).
Finally, future work should develop explanatory frameworks attuned to user needs and ethical-social factors, as generative models mature (Schneider, 2024).

Proper Representation and Inclusivity

Inclusivity-oriented research is needed to mitigate data biases that result in unequal service or treatment quality and to strengthen data governance practices (M. Li & Guenier, 2024; Srivastava et al., 2025; X. Wei et al., 2025). Similarly, the perspectives of underrepresented users on AI ethics — especially regarding explainability and perceived fairness — require systematic investigation (Laine et al., 2025). Scholars should examine how GenAI can enable culturally adaptive and inclusive digital localization processes that serve global audiences while preserving local authenticity (Dwivedi et al., 2025). Future research on proper representation and inclusivity in AI faces a key challenge in developing ethical and fairness-aware frameworks for LLMs across domains such as healthcare, finance, and marketing, where rapid deployment often conflicts with equity objectives (X. Wei et al., 2025). Research should further explore how to distinguish between purposeful and unintended differentiation in such systems to prevent inequitable outcomes for marginalized groups (X. Wei et al., 2025).

6.3.5 Engineering Perspective [F5]

Definition of GenAI Metrics

A strong message is voiced about the need for new metrics enabling the evaluation of GenAI systems. Such dedicated metrics could provide a base for specialized tools to assess the benefits and risks of GenAI models in various fields (Mohammad et al., 2023). Traditional measures, such as accuracy, precision, and recall, are insufficient for GenAI tasks. Therefore, new metrics incorporating helpfulness, harmlessness, honesty, security, and standardized validation on independent datasets are necessary (Beheshti et al., 2025; Chau & Xu, 2025; Ghebrehiwet et al., 2024).
These should address the impact of GenAI on individuals, workers, and organizations, as well as the inter-organizational impacts (Storey et al., 2025). Regarding individuals, measuring GenAI’s influence on cognitive aspects such as questioning, rigor, and clarity is underexplored and warrants specific evaluative frameworks (Jarvenpaa & Klein, 2024). Accuracy assessment, especially concerning AI hallucinations in generative models, remains a challenge and calls for dedicated reliability metrics (Sigala et al., 2024). New metrics are also needed to measure fairness and diversity in LLM-driven systems (X. Wei et al., 2025), and explainable AI could help in their application (Schneider, 2024). Intellectual property rights protection also demands comprehensive metrics to safeguard legal and ethical standards in GenAI applications (Nah et al., 2023). Lastly, refinement of theoretical frameworks on technology adoption is needed to better capture the nuances of GenAI integration and acceptance in diverse environments (Mambile & Ishengoma, 2024).

Empirical Evaluation of GenAI in the Field

The empirical evaluation of GenAI systems calls for developing resource-efficient longitudinal designs that could monitor how GenAI tools mitigate bias over time while accounting for expertise gaps and computational constraints (X. Wei et al., 2025). It should also include user-centered aspects of GenAI systems, particularly the ethical and privacy implications of conversational models. Future studies should move beyond simulated testing to investigate real-world user interactions, collecting authentic behavioral data while safeguarding user privacy and ensuring ecological validity (Gumusel, 2025). There is also a recognized need to empirically assess whether GenAI-based tools genuinely enhance academic rigor, questioning, and clarity in scholarly inquiry (Jarvenpaa & Klein, 2024).
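One of the fairness metrics called for above can be made concrete with a simple demographic parity check over AI-assisted decisions. This is a minimal, illustrative sketch: the function name and the synthetic data are ours, not drawn from the reviewed studies.

```python
from collections import defaultdict

def demographic_parity_gap(decisions):
    """Largest difference in favourable-outcome rate between any two groups.

    `decisions` is a list of (group, outcome) pairs; outcome is 1 for a
    favourable decision (e.g. an approved application), 0 otherwise.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        positives[group] += outcome
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

# Toy audit of an LLM-assisted screening step (synthetic data):
sample = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]
gap = demographic_parity_gap(sample)  # 2/3 - 1/3
```

A gap near zero indicates similar favourable-outcome rates across groups; the resource-efficient longitudinal designs discussed above could track such a value over successive deployments.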
In the healthcare domain, clinical trials, observational studies, and cross-institutional collaborations could help determine the real clinical utility of language models (Meng et al., 2024). Achieving this goal requires engaging healthcare professionals directly involved in documentation and decision-making processes, as well as expanding the scope of research to encompass diverse medical contexts (Bracken et al., 2025). A coherent approach to validating AI systems in health communication must also consider technical accuracy, patient satisfaction, and public health outcomes through real-world trials (M. Li & Guenier, 2024). GenAI-driven personalized healthcare services should be studied across cultural and socio-economic boundaries, evaluating their sustainability and impact on equitable care delivery (Pool et al., 2024).

Design Principles for GenAI Solutions

Research on design principles for GenAI should first address the question of effective design principles that guide GenAI development holistically, integrating human-centered and technical perspectives (Feuerriegel et al., 2024). An important research direction explores how to design and test GenAI-based systems with varying levels of automation to optimize human benefit considering individual diversity (Haase et al., 2024). This personalization-oriented approach complements efforts focused on defining the most effective design processes for creating GenAI that operates collaboratively with humans, balancing autonomy and human oversight (Nah et al., 2023). Another theme is the simplification of model architectures, improving data quality, and implementing standardized validation procedures in an effort to ensure that GenAI systems are reliable, transparent, and suitable for critical settings such as healthcare (Ghebrehiwet et al., 2024). Moreover, privacy-oriented design emerges as a necessary complement to usability and transparency research.
Addressing gaps in privacy-aware design demands systematic exploration of how user-privacy principles can be embedded throughout the development lifecycle, from early prototyping to deployment. This includes not only identifying potential risks but also developing practical techniques, frameworks, and design guidelines for privacy-sensitive interfaces like chatbots (Gumusel, 2025).

GenAI Model Training

Future research on GenAI model training converges around several key domains: model adaptation, bias mitigation, domain specialization, and ethical application. One major direction involves developing strategies for effective model fine-tuning to balance accuracy and generalization, without overfitting or sacrificing scalability (X. Wei et al., 2025). Closely related to these concerns is the effort to fine-tune or adapt models for specific domains and contexts (Chau & Xu, 2025; Feuerriegel et al., 2024). Research should examine whether GenAI is best suited for static or dynamic environments, and how models can evolve continuously without structural conflicts during extended use (Nah et al., 2023). The integration of GenAI in the outsourcing sector necessitates training models that account for global regulatory variations, cultural differences, and language-specific capabilities while addressing efficiency and innovation (Dwivedi et al., 2025). The challenge of preventing bias in algorithms and data used to train models persists as a critical focus area (Sigala et al., 2024; X. Wei et al., 2025). Healthcare is an exemplary target area for which domain-specific improvements in model training should be considered.
These include enhancing model accuracy, reliability, and robustness for safe clinical deployment (Ouanes, 2024), guidance by up-to-date medical standards (Bellanda et al., 2024), standardized validation and diverse training datasets representing multiple disease categories to support the reproducibility and generalization of research (Ghebrehiwet et al., 2024), as well as advanced multimodal learning integrating textual and visual medical data to improve diagnostic understanding (M. Li & Guenier, 2024). As an alternative (or a complement) to domain-specialized models, Retrieval-Augmented Generation could be considered (Beheshti et al., 2025). As regards ethical application, guidelines for the responsible integration of LLMs should be developed for high-stakes sectors such as healthcare and finance, emphasizing the balance between rapid deployment, accuracy, and inclusivity with regulatory considerations (X. Wei et al., 2025).

GenAI System Development and Maintenance

A promising future research direction is the exploration of GenAI’s role in design science research to enhance creativity in developing new IT artifacts, which could advance the theoretical and practical foundations of IT development (Feuerriegel et al., 2024). A related focus is the development and testing of partially automated tools aimed at maximizing human benefit. Specifically, advancing the technical integration of AI within robotic process automation (RPA) tools could lead to more sophisticated, adaptive automation solutions in organizational contexts (Haase et al., 2024). In terms of system integration, future work should address interoperability, usability, and compliance challenges. For instance, integration with existing healthcare information systems, including electronic health records (EHRs), would enable GenAI to augment clinical workflows and decision-making (M. Li & Guenier, 2024).
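Retrieval-Augmented Generation, mentioned above as an alternative to domain-specialized models, can be illustrated with a minimal sketch in which answers are grounded in retrieved documents rather than in parametric knowledge alone. The keyword-overlap retriever and the stubbed `llm` callable are placeholders for a vector-similarity search and a real model call.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query (a stand-in for
    vector-similarity search in a production RAG pipeline)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def answer(query, corpus, llm):
    """Build a prompt that grounds the model in retrieved context,
    reducing reliance on possibly stale training data."""
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)

corpus = [
    "Guideline 12: metformin is first-line therapy for type 2 diabetes.",
    "Hospital visiting hours are 9am to 5pm.",
]
# `llm` would be a call to an actual model; here a stub echoes the prompt.
reply = answer("first-line therapy for type 2 diabetes", corpus, llm=lambda p: p)
```

In the healthcare setting discussed above, the corpus would hold up-to-date clinical guidelines, so the model's answer can track standards that postdate its training cut-off.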
6.3.6 Quality Requirements Perspective [F6]

Data Privacy Protection

Data privacy protection should be considered of primary importance in future research on GenAI due to the associated risks and harms, especially those arising from real-time interactions with chatbots. Current studies often rely on simulated data, leaving a gap for research based on actual user data in real-world settings (Gumusel, 2025). Future work should focus on developing user-privacy-centric chatbot designs, incorporating robust privacy safeguards and intellectual property protections (Gumusel, 2025; Sigala et al., 2024). Personalization and user interaction improvements, particularly in health communication, should ensure that privacy and consent safeguards are in place so that the related GenAI solutions can be deemed trustworthy. There is a need to study GenAI telehealth applications across diverse cultural and socio-economic contexts to evaluate informed clinical decision-making and sustainability (M. Li & Guenier, 2024; Pool et al., 2024). The broader societal impact includes transformative changes in workforce and operational processes across sectors. This calls for renewed research into end-user computing and examination of societal issues like privacy and security in the pervasive use of GenAI (Dwivedi et al., 2025; Storey et al., 2025). The growing sophistication of GenAI-powered privacy-related cyber threats requires IT outsourcing firms leveraging GenAI to develop advanced real-time detection, response, and mitigation systems to protect sensitive data and maintain trust with clients (Dwivedi et al., 2025). There is a growing demand for comprehensive guidelines addressing data privacy protection that would define accountability structures involving all stakeholders to ensure safe and transparent deployment (M. Li & Guenier, 2024; Maita et al., 2024; Pool et al., 2024).
This could be undertaken as a part of a wider effort on data governance concerning AI, especially in healthcare, which remains an understudied area in IS research (Srivastava et al., 2025).

Security and Protection

Security challenges are amplified by GenAI’s capabilities, leading to sophisticated cyber threats. The development of advanced AI-driven real-time detection, response, and mitigation methods is critical, alongside protocols against attacks such as prompt injections and data poisoning. Human oversight and clear accountability frameworks are essential to secure GenAI systems effectively (Dwivedi et al., 2025; Pool et al., 2024). This context necessitates a deeper understanding of broader societal impacts, especially concerning the security of business applications (Storey et al., 2025). Furthermore, strengthening defenses against GenAI-fueled security threats is of paramount importance. As adversarial actors exploit generative tools to scale cyber attacks, new AI-driven mechanisms for real-time threat detection, adaptive response, and mitigation must be developed (Dwivedi et al., 2025). This effort should align with a sociotechnical assurance framework emphasizing human oversight, resilience against data poisoning and prompt injection, and clear accountability chains (Pool et al., 2024). A closely related yet distinct research direction concerns combating the misuse of GenAI. Governments must proactively prepare for the potential misuse of GenAI by establishing comprehensive regulations and guidelines that address its deployment in sensitive services such as healthcare and legal sectors. This involves creating frameworks to minimize generative tools’ misuse, ensuring robust oversight and accountability.
Additionally, preventative measures should be implemented to guard against malicious uses of AI systems, fostering a responsible environment that prioritizes ethical considerations and protects individuals and institutions from harm (Sigala et al., 2024). Security concerns also motivate the design and enforcement of comprehensive frameworks to guide the responsible development and deployment of GenAI systems. These frameworks should address data protection and transparency while ensuring accountability among developers and institutional actors (M. Li & Guenier, 2024; Maita et al., 2024). Future studies should clarify how such ethical principles can be operationalized in high-stakes fields like healthcare and academic publishing, ensuring that GenAI tools uphold the expected integrity and respect data sovereignty. The proposal of a modern Turing test to detect AI-generated research submissions further highlights the urgency of maintaining integrity in scholarly communication (Jarvenpaa & Klein, 2024). As GenAI is increasingly used in decision systems, including the area of security, investigators should also strive to address biases in GenAI-powered fraud detection systems, which can result in higher false positive rates for transactions from certain demographic groups, leading to discriminatory practices (X. Wei et al., 2025).

Demonstrating Trustworthiness

Transparency is considered fundamental for demonstrating GenAI’s trustworthiness, especially in healthcare, where the need for interpretable models that clearly communicate decision-making processes is crucial to foster clinician acceptance (Chau & Xu, 2025; Meng et al., 2024; Ouanes, 2024). Explainability mechanisms are equally necessary for building confidence among users, especially healthcare professionals, as design science researchers develop new methods to enhance these capabilities and study their effects on adoption (Chau & Xu, 2025; Feuerriegel et al., 2024).
Reliability and accuracy improvements cannot be ignored either. Studies should evaluate GenAI’s performance across broader domains, e.g. medical specialties that require complex diagnostic and treatment decisions, where accuracy is extremely important (Beheshti et al., 2025; Ouanes, 2024). This could be supported by simplifying model architectures, implementing standardized validation procedures, and addressing data quality challenges to ensure models meet clinical standards (Bracken et al., 2025; Ghebrehiwet et al., 2024). Mitigating hallucinations and inaccuracies in LLM outputs demands urgent attention, as these issues can undermine trust (Chau & Xu, 2025; Schneider, 2024). Researchers should strive to develop techniques beyond Chain-of-Thought reasoning to mitigate inaccuracies and improve verifiability (Schneider, 2024; Sigala et al., 2024). Human-AI interaction research should investigate user attitudes toward GenAI, examining why people appreciate or avoid these tools and exploring trust-related factors (Chau & Xu, 2025; Srivastava et al., 2025). This includes studying impacts on business workers and general users, requiring renewed focus on end-user computing topics in the GenAI era (Storey et al., 2025). Design approaches that foster trust through improved system reliability and transparency will be essential (Feuerriegel et al., 2024). Future work should also address bias issues affecting GenAI integration, e.g. in clinical workflows (Srivastava et al., 2025). Understanding the factors that influence user trust in LLM outputs—despite potential hallucinations and training data inaccuracies—is essential (Chau & Xu, 2025).

Efficiency and Scalability

Future research on GenAI efficiency and scalability encompasses several interconnected dimensions. Among them, operational integration emerges as a critical theme, focusing on how GenAI can transform digital services through enhanced efficiency and operational capabilities (Sigala et al., 2024).
This includes investigating government adoption strategies for building scalable digital services accessible to broader populations (Sigala et al., 2024). Parallel to service delivery, the localization domain demonstrates significant potential, where GenAI integration enhances efficiency, scalability, and cultural customization, enabling hyper-localized content creation (Dwivedi et al., 2025). Quality assurance and reliability represent another essential research direction, particularly in high-stakes applications. Prioritizing model accuracy, reliability, and robustness ensures safe and effective clinical applications (Ouanes, 2024). This intersects with bias mitigation challenges, prompting researchers to design dynamic methodologies that integrate real-time user feedback for continuous detection and mitigation of LLM bias while maintaining scalability and operational efficiency (X. Wei et al., 2025). Beyond individual organizational contexts, inter-organizational dynamics call for systematic investigation. As GenAI permeates organizations, interaction patterns between and among entities will evolve, necessitating research into efficient cooperation across organizational boundaries (Storey et al., 2025).

Accountability and Contestability

A valid research challenge lies in balancing competing priorities: high-speed decision-making versus ethical accountability, and hyper-personalization versus inclusivity (X. Wei et al., 2025). In this context, operationalizing bias mitigation is important yet requires transparent, explainable interfaces that preserve human agency while implementing robust safeguards (Pool et al., 2024). Research should clarify the obligations of GenAI providers to determine which GenAI applications should face restrictions or prohibition (Nah et al., 2023). This involves establishing clear accountability frameworks that delineate responsibilities among developers, providers, and regulators (Pool et al., 2024).
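An accountability framework of the kind described above could be prototyped, for instance, as a contestable decision log that records rationales for AI-assisted outcomes and routes contested items to a human reviewer. This is a hypothetical sketch; all class and method names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    output: str
    rationale: str
    contested: bool = False
    reviewer_verdict: Optional[str] = None

class ContestableLog:
    """Append-only record of AI-assisted decisions; a stakeholder may
    contest an entry, which queues it for human review."""

    def __init__(self):
        self._log = []

    def record(self, output, rationale):
        # Store the decision with its rationale for later audit; return its id.
        self._log.append(Decision(output, rationale))
        return len(self._log) - 1

    def contest(self, decision_id):
        self._log[decision_id].contested = True

    def pending_review(self):
        return [d for d in self._log if d.contested and d.reviewer_verdict is None]

    def resolve(self, decision_id, verdict):
        self._log[decision_id].reviewer_verdict = verdict

# Hypothetical flow: an AI screening decision is contested and resolved.
log = ContestableLog()
i = log.record("application declined", "model cited insufficient credit history")
log.contest(i)
log.resolve(i, "overturned by human reviewer")
```

The log keeps the rationale and the human verdict side by side, which is one way the transparency and human-agency requirements noted above could be operationalized.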
Another identified research direction is addressing the challenge of responsibly automating organizations’ business processes while ensuring transparency and accountability in GenAI deployment (Sigala et al., 2024). Future work should also advance reliability through information resilience protocols, including security measures against prompt injection and data poisoning (Pool et al., 2024). Essential components include human-in-the-loop oversight and contestability mechanisms that enable stakeholders to challenge AI-driven outcomes (Pool et al., 2024).

7 Discussion

GenAI is transitioning from a standalone technological novelty to a core constituent of modern socio-technical systems. Our findings indicate that its transformative potential is not an inherent property of the models themselves, but rather an emergent outcome of the deliberate alignment between artifacts, human expertise, organizational processes, and institutional frameworks. We argue that by reframing GenAI as a socio-technical entity, we can better understand the variance in model performance across different domains and recognize that governance, workflow design, and human oversight are as critical as algorithmic precision. This perspective shifts the focus from simple tool adoption toward the design of hybrid human-AI ensembles, i.e., systems where value is contingent upon the collaborative interaction between human intelligence and machine capability. The synthesis of findings across diverse sectors reveals that GenAI value materializes primarily through complementarity rather than the wholesale substitution of human labor. For example, within a healthcare context, ambient AI documentation systems have demonstrated significant reductions in administrative burden and after-hours work (provided that the workflow maintains rigorous human review and safety-case attestation).
Similarly, in software engineering, AI-augmented programming accelerates routine tasks and facilitates knowledge diffusion, particularly among less experienced developers. These gains are maximized when organizations implement clear role allocations, such as the assistant–reviewer–attestor triad, and maintain robust provenance of AI-generated contributions. These findings align with broader productivity studies suggesting that the benefits of GenAI accrue disproportionately to those who leverage it for hybrid learning and skill augmentation. Yet this very complementarity creates a distinctive socio-technical contradiction: the more successfully GenAI lowers cognitive load and entry barriers, the greater the risk of human automation bias and long-term erosion of critical judgment (the very capabilities required to supervise the system itself). Despite these benefits, our findings indicate that the transition from successful laboratory pilots to sustained field impact remains fraught with socio-technical challenges. Inconsistencies in outcomes often stem from a lack of “fit” rather than technical failure. For example, while GenAI enables hyper-personalization in education and marketing, its success depends on pre-existing AI literacy, redesigned assessment frameworks, and stringent accessibility standards. Furthermore, the persistent challenges of representational harm and cultural bias highlight how models encode latent societal tendencies, requiring participatory governance to prevent marginalization. Reliability also remains a significant hurdle. Fixed training cut-offs produce widening knowledge gaps. Hallucinations yield outputs that are plausible yet wrong. Our findings indicate that limited contextual awareness yields formally correct but practically dangerous outputs, and performance drift silently alters behavior after deployment.
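Performance drift of this kind is detectable only if behavior is measured continuously against a frozen baseline. A deliberately simple sketch follows (the metric, scores, and threshold are our assumptions; production monitors use richer statistics than a mean-shift test):

```python
import statistics

def drift_alert(baseline_scores, recent_scores, threshold=2.0):
    """Flag drift when the mean of recent quality scores deviates from a
    frozen pre-deployment baseline by more than `threshold` baseline
    standard deviations. Scores could be, e.g., rubric-graded output quality."""
    mu = statistics.mean(baseline_scores)
    sigma = statistics.stdev(baseline_scores)
    return abs(statistics.mean(recent_scores) - mu) > threshold * sigma

# Hypothetical monitoring data: a frozen baseline and two post-deployment windows.
baseline = [0.92, 0.90, 0.93, 0.91, 0.92, 0.90, 0.93, 0.91]
print(drift_alert(baseline, [0.91, 0.92, 0.90, 0.93]))  # stable window -> False
print(drift_alert(baseline, [0.78, 0.75, 0.80, 0.77]))  # degraded window -> True
```

The design point is that "silent" drift only stays silent when no one compares live behavior to an archived reference distribution.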
Together, these failure modes strike at the IS discipline’s foundational commitments to system dependability, traceability, and accountability, and transform apparent technical limitations into a full-blown crisis of socio-technical legitimacy. Addressing these risks requires a shift toward defense-in-depth security architectures and risk-based governance frameworks, such as the NIST AI Risk Management Framework (National Institute of Standards and Technology, 2023), to manage the emerging threats of prompt injection and shadow AI while preserving the human subsystem as the ultimate locus of responsibility and control.

7.1 The Sociotechnical GenAI Outcomes Matrix (SGOM)

This section introduces the Sociotechnical GenAI Outcomes Matrix (SGOM). SGOM (Table 12) provides a conceptual framework for synthesizing the evidence gathered in this study. It posits that GenAI outcomes are co-produced across multiple levels of analysis, ranging from individual users to technical engineering layers. To interpret this matrix, each row represents a distinct socio-technical dimension that must be synchronized for successful implementation. The columns represent a progression from empirical observation (“What works”/“Why it fails”) to normative intervention (“Design moves”) and finally to evaluative rigor (“Evidence/KPIs”). Effective GenAI deployment requires moving horizontally across a row to ensure that every benefit is protected by a corresponding control and measured by a domain-specific metric. Conversely, vertical alignment ensures that technical engineering controls (e.g., RAG hardening) support higher-level ethical and organizational goals. The SGOM serves as a diagnostic instrument that nudges IS research away from monolithic evaluations of model “intelligence” toward a nuanced understanding of situated performance.
By mapping technical failures (e.g., performance drift) directly to organizational controls (e.g., accountability maps), the matrix forces researchers to move beyond the black-box view of GenAI. It also provides a structured vocabulary to describe the interdependencies between subsystems and illustrates, for example, how societal biases are not merely data errors but governance failures that require participatory design. Consequently, we contend that the SGOM acts as a roadmap for future research, encouraging scholars to investigate not just whether a model works, but under what specific socio-technical configurations its benefits become durable and legitimate.

7.2 Implications for Research and Practice

From a Socio-Technical Systems (STS) theory perspective, the introduction of GenAI into IS represents a fundamental shift in the joint optimization of the social and technical subsystems. Our research implies that the primary unit of analysis in IS must evolve from the individual tool or the isolated user to the hybrid human-AI ensemble.

7.2.1 Implications for Research

Researchers should move beyond simple adoption models to explore how GenAI alters organizational structures and power dynamics. This necessitates the development of formal constructs for ensemble coordination, specifically focusing on how role allocation (assistant vs. attestor) influences the variance in work outcomes. Applying the lens of Structural Contradiction, future work should investigate the tension between the technical efficiency of GenAI and the social requirement for accountability. Furthermore, the IS field must lead in establishing ‘evidence ladders’, i.e., methodological frameworks that transition from simulated model testing to high-fidelity field trials. This will ensure that “social fit” is measured with the same rigor as technical accuracy.
7.2.2 Implications for Practice

For practitioners, the socio-technical perspective mandates that GenAI deployment is treated as an organizational redesign rather than a software upgrade. Managers must prioritize the “secondary design” of the social subsystem. This includes investing in AI literacy and new job descriptions to match the capabilities of the technical subsystem. This also involves operationalizing risk-based governance (e.g., NIST AI RMF (National Institute of Standards and Technology, 2023)) not as a compliance checklist, but as a dynamic mechanism for maintaining institutional legitimacy. Finally, practitioners must implement “contestability by design”, providing human users with the technical tools to adjudicate, override, and audit AI outputs. This will help to ensure that the human subsystem remains the ultimate locus of responsibility in high-stakes environments. Our research indicates that the frontier of GenAI is fundamentally socio-technical. The ultimate value of these systems is not derived from the raw power of the underlying models, but from the sophistication of the ensembles in which they are embedded. By aligning technical advances with institutional legitimacy, standardized field metrics, and risk-based governance, GenAI can transition from an experimental technology to a durable, trustworthy pillar of modern IS.

8 Future Research Agenda

Our study highlights transformative benefits across domains, but also a triad of constraints—technical unreliability, pervasive societal and ethical risks, and a systemic governance vacuum—that together signal a persistent misalignment between GenAI’s fast-evolving technical subsystem and the slower-adapting social and institutional arrangements in which it is embedded. GenAI capabilities are advancing faster than the norms, governance structures, and regulatory institutions required for responsible deployment.
Table 12: Sociotechnical GenAI Outcomes Matrix (SGOM)

User (e.g., Clinicians, Educators, Developers)
• What Works (Benefits): Hybrid human–AI teaming improves throughput and quality via role clarity and calibration; it also reduces professional burnout and cognitive load.
• Why It Fails (Challenges): Over-reliance; contextual blind spots; knowledge staleness; literacy gaps.
• High-Value Design Moves (Controls): Human-in-the-loop attestation; literacy programs; Retrieval-Augmented Generation (RAG) freshness with citations.
• Evidence / KPIs: Task time reduction; correction rates; user satisfaction (SPACE).

Organizational (e.g., Workflows, Processes)
• What Works (Benefits): Standardized artifact provenance and ensemble design enable knowledge diffusion.
• Why It Fails (Challenges): Lack of evaluation frameworks; pilot-to-production attrition; brittle governance.
• High-Value Design Moves (Controls): Assistant–reviewer–attestor patterns; evidence ladders; accountability maps.
• Evidence / KPIs: Time to insight; defect density; conversion from proof of concept to production.

Societal (e.g., Culture, Equity, Public Services)
• What Works (Benefits): Democratizes access to specialized knowledge; multilingual access and personalization at scale improve service participation.
• Why It Fails (Challenges): Representational harms; latent cultural tendencies; accessibility gaps; environmental costs of model training and operation.
• High-Value Design Moves (Controls): Participatory governance; cross-cultural evaluation; content watermarking; carbon-aware deployment policies.
• Evidence / KPIs: Inclusion indices; bias audits; provenance coverage; carbon footprint assessment.

Ethical (e.g., Fairness, Transparency)
• What Works (Benefits): Reason-giving transparency (explaining “why”) sustains trust and enables redress.
• Why It Fails (Challenges): Opaque behaviors; immature XAI; lack of clear liability for harms; unresolved IP ownership disputes.
• High-Value Design Moves (Controls): Model/data cards; explainability UIs; contestability and incident registers; IP clearance protocols.
• Evidence / KPIs: Audit pass rates; explainability adequacy; time to redress; IP dispute rates; attribution coverage.

Engineering (e.g., Security, Robustness)
• What Works (Benefits): Defense-in-depth and validated pipelines (repair/validation loops) improve safety.
• Why It Fails (Challenges): Prompt injection; performance drift; limited contextual awareness.
• High-Value Design Moves (Controls): OWASP LLM Top 10; red teaming; RAG hardening; canary tests; drift dashboards.
• Evidence / KPIs: Attack block rate; jailbreak detection; semantic robustness score.

Quality (e.g., Evaluation, Safety)
• What Works (Benefits): Efficiency gains materialized via validated frameworks (e.g., PDQI-9).
• Why It Fails (Challenges): Evidence gaps; weak metrics; reproducibility issues; lack of public datasets.
• High-Value Design Moves (Controls): Standardized domain metrics; multi-site benchmarking; error taxonomies.
• Evidence / KPIs: PDQI-9 scores; hallucination rate; inter-rater agreement; benchmark pass rates.

Consequently, the path forward is not merely a holistic investigation but demands proactive intervention and socio-technical design aimed at achieving joint optimization. The SGOM operationalizes this design challenge, providing a scaffold that links observed failure modes to the necessary technical, social, organizational, and institutional controls—and the KPIs required to evaluate whether performance becomes durable in practice. Building on this logic, we articulate a research agenda that reorients IS scholarship from analyzing impacts toward actively shaping the co-evolution of these interdependent subsystems, organized around three critical frontiers.

8.1 Frontier 1: Organizational Reconfiguration and Governance

The “governance vacuum” identified in our results indicates that traditional IT governance structures are ill-equipped for the decentralized, pervasive nature of GenAI.

Governing “Shadow AI”: Unlike traditional enterprise software, GenAI is easily accessible to individual employees, leading to unsanctioned use. Research is needed to develop governance frameworks that balance the innovation potential of bottom-up adoption with the risks of data leakage, privacy violations, and regulatory non-compliance.
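One concrete control for the data-leakage risk of unsanctioned GenAI use is an egress filter that scrubs obvious identifiers before prompts leave the organization. The sketch below is a toy illustration under our own assumptions; real data-loss-prevention tooling uses far more robust detection than two regular expressions.

```python
import re

# Hypothetical minimal egress filter: redact obvious identifiers before a
# prompt is sent to an external GenAI service. Patterns are illustrative only;
# production DLP systems combine many detectors, context, and human review.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace each matched identifier with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED-{label}]", prompt)
    return prompt

print(redact("Summarize the complaint from jane.doe@example.com, SSN 123-45-6789."))
```

A gateway like this does not forbid bottom-up experimentation; it bounds its downside, which is the balance the governance frameworks above are asked to strike.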
New Workflows and Role Definitions: The transition to hybrid human-AI collaboration requires revisiting organizational routines. Research should explore how accountability is distributed in the “assistant–reviewer–attestor” triad. Who is liable when an AI-generated, human-reviewed artifact fails? How must job descriptions evolve to prioritize “verification skills” over “creation skills”? How do organizations redesign business processes to integrate GenAI while maintaining transparency and human oversight? More broadly, this transition points toward the emergence of new occupational categories and labor market structures shaped by human–AI collaboration.

Regulatory Translation: How do emerging regulatory frameworks (e.g., the EU AI Act, NIST AI RMF) translate into organizational governance practices? Comparative studies are needed to understand the barriers to effective implementation of these external mandates within internal workflows.

8.2 Frontier 2: Societal Alignment, Ethics, and Law

Our findings on bias and representational harm confirm that GenAI is not culturally neutral. The agenda here should move from problem identification to solution engineering informed by socio-technical perspectives.

Human-AI Symbiosis: Future research should empirically investigate the boundary between helpful augmentation and harmful dependency. Longitudinal studies are needed to measure how relying on GenAI for coding, writing, or diagnostics affects human domain expertise, critical thinking, and professional identity over time. How do we design interactions that maintain “human-in-the-loop” vigilance without causing fatigue or the erosion of tacit knowledge? IS scholarship can extend this by theorizing “trust calibration” and “appropriate reliance” in routine work, knowledge work, and high-stakes decision support.
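One possible operationalization of "appropriate reliance" (our illustration; the construct itself is still being theorized) splits logged human–AI interactions by whether the AI output was correct, and measures over- and under-reliance separately:

```python
def reliance_profile(log):
    """log: (ai_correct, user_accepted) pairs from interaction records.
    Over-reliance:  share of incorrect AI outputs the user accepted.
    Under-reliance: share of correct AI outputs the user rejected.
    These definitions are one possible operationalization, not a standard."""
    wrong = [accepted for correct, accepted in log if not correct]
    right = [accepted for correct, accepted in log if correct]
    over = sum(wrong) / len(wrong) if wrong else 0.0
    under = sum(not a for a in right) / len(right) if right else 0.0
    return {"over_reliance": over, "under_reliance": under}

# Hypothetical interaction log: three correct AI outputs, two incorrect.
log = [(True, True), (True, True), (True, False), (False, True), (False, False)]
print(reliance_profile(log))  # over-reliance 0.5, under-reliance 1/3
```

Tracking both rates over time is what "trust calibration" would mean empirically: well-calibrated users drive both numbers down rather than trading one for the other.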
Operationalizing Fairness and Algorithmic Justice: While the literature identifies bias as a major risk, there is a scarcity of frameworks for operationalizing fairness in specific industries. Research should focus on developing domain-specific audit protocols (e.g., for healthcare triage) that align algorithmic outputs with local legal and ethical standards.

Participatory Design: To counter “exclusionary norms”, IS researchers should lead participatory design initiatives that involve marginalized communities in the fine-tuning and evaluation of models, ensuring that GenAI systems reflect diverse cultural and linguistic realities rather than just dominant training data.

Intellectual Property and Value Attribution: As GenAI disrupts the economics of knowledge production, research is needed into new legal and economic models for attributing value. How can we trace provenance in AI-generated content to ensure fair compensation for original creators? IS research should examine the implications for digital platforms and content ecosystems, investigating how provenance-tracking technologies can enable transparent attribution. Studies should analyze the economic sustainability of creative industries in the GenAI era and the emergence of new intermediaries and market mechanisms for AI-generated content. Research drawing on platform governance and digital rights management literature can inform the design of attribution systems that balance creator rights with innovation incentives.

Information Integrity and Authenticity in Digital Ecosystems: Research should explore socio-technical safeguards against AI-generated misinformation (verification routines, provenance signals that users can interpret, moderation policies, and institutional responses).

8.3 Frontier 3: Design and Validation of GenAI Artifacts

The probabilistic and generative nature of GenAI challenges the deterministic assumptions often held in IS design and evaluation.
To address the “reliability crisis” identified in our findings, IS scholars must spearhead a research program focused on the rigorous design and contextual validation of these artifacts within organizational settings.

Design Principles for Probabilistic Systems: Traditional IS design theory often assumes consistent system behavior. Future IS research must formulate new design principles and meta-requirements for systems that are inherently unstable or prone to hallucination. How do we design IT artifacts that remain useful and trustworthy even when the underlying model is imperfect? This includes design for uncertainty (calibration cues, confidence communication, and verification affordances), and explicit “safety cases” that connect model limitations to workflow controls.

Designing for Contestability: Reliability requires that users can challenge AI outputs. Design Science Research should focus on creating interfaces that support “contestability by design”—mechanisms that allow users to easily audit and query model outputs, shifting the user role from passive consumer to active auditor.

Secure-by-Design and Privacy-by-Design GenAI: Building on prompt-injection/jailbreak and privacy-leakage concerns, future work should design and evaluate defense-in-depth patterns for GenAI-enabled IS, and examine how these technical controls interact with organizational routines and user practices.

Green IS and Corporate Digital Responsibility: Aligning with the discipline’s growing focus on sustainability, researchers should investigate the trade-offs between model performance and environmental impact. IS scholars are positioned to develop decision frameworks for “Green GenAI”, helping organizations balance the computational costs (energy and financial) of LLMs against their actual business value, and promoting the adoption of “frugal AI” strategies where appropriate.

Socio-Technical Evaluation Frameworks: Traditional metrics are insufficient for evaluating GenAI.
Research should develop domain-specific evaluation frameworks that capture helpfulness, harmlessness, and contextual appropriateness. By pursuing this agenda, the IS community can fulfill its critical role as the bridge between the technical frontier of AI development and the social, organizational, and ethical contexts in which these systems must operate. The challenge before us is to leverage this distinctive positioning to ensure that GenAI evolves as a technology that augments human capability, respects human dignity, and serves the broad public interest.

9 Threats to Validity

As with any systematic review, this study is subject to limitations that must be considered when interpreting the findings. We discuss these threats to validity following the classification framework for secondary studies proposed by Ampatzoglou et al. (2019).

9.1 Study Selection Validity

This category concerns the risk of missing relevant studies or selecting inappropriate ones.

• Selection of Digital Libraries: While we queried three premier repositories for IS research (AIS eLibrary, Scopus, and Web of Science), we acknowledge that extending the search to additional databases (e.g., Springer, Wiley, Emerald, and Taylor & Francis) could have improved coverage. However, Scopus and Web of Science index papers from multiple publishers, which partially mitigates this limitation.

• Search Strategy Limitations: The search string was systematically constructed using three facets (study type, phenomenon, and domain) and incrementally revised by multiple co-authors. However, some studies using non-standard terminology may have been missed.

• Selection of an Arbitrary Starting Year: We restricted our search to publications from 2023 onwards to capture the post-ChatGPT surge in GenAI research. This improves topical focus but may under-represent earlier IS-relevant work on generative models and may overweight LLM-centric framings that became dominant after late 2022.
• Subjectivity in Screening: Despite employing a multi-stage screening process with independent reviewers and consensus meetings, the interpretation of inclusion and exclusion criteria inherently involves subjective judgment. Although all conflicts were resolved through consensus, different research teams might arrive at marginally different sets of included studies.

• Exclusion of Grey Literature: Consistent with our focus on peer-reviewed secondary studies, we excluded grey literature. Significant insights may be disseminated through non-traditional channels such as preprints before appearing in peer-reviewed venues, and while we believe technical reports, white papers, and industry publications are not appropriate vehicles for literature review studies, this choice may exclude influential practitioner roadmaps that shape GenAI governance and adoption in IS practice, potentially under-representing practice-led developments.

9.2 Data Validity

This category concerns the validity of the extracted dataset and its analysis.

• Data Extraction Reliability: Data extraction was conducted by six reviewers following a structured protocol refined during a pilot phase. While weekly alignment meetings and quality audits were employed to ensure consistency, the extraction of qualitative data—particularly the identification of benefits, challenges, and future research directions—required interpretive judgment. One reviewer’s work was identified as deficient during quality audits and was subsequently re-evaluated by another team member. Although this mitigation strategy improved data quality, it highlights the inherent challenges of maintaining consistency across multiple extractors. To support auditability, we captured verbatim quotations during extraction, enabling traceability from synthesized themes back to the source texts.
• Validity of Secondary Studies: As a review of secondary studies, our findings are contingent upon the rigor and accuracy of the included reviews and research agendas. Any errors, biases, or omissions present in these source documents are transitively reflected in our synthesis. We did not verify the primary studies underlying the included secondary studies. Consequently, the strength of our conclusions is bounded by the quality of the evidence base we inherited. We mitigated this by excluding non-peer-reviewed sources.

• Quality Assessment Limitations: While we applied formal quality assessment criteria (DARE for secondary studies, custom criteria for research agenda papers), quality assessment inherently involves subjective judgment. Different reviewers might assign marginally different ratings to the same study. We mitigated this through dual-reviewer assessment with consensus resolution and reported inter-rater agreement statistics. Additionally, we did not exclude studies based on quality ratings. This decision was made to ensure comprehensive coverage of the nascent GenAI literature, where even methodologically imperfect studies may offer valuable insights. However, this approach means that the quality of evidence synthesized in our review varies across included studies. To support informed interpretation, we report individual study quality scores alongside each synthesized finding in Tables 9, 10, and 11, enabling readers to gauge the evidential strength of specific findings and to interpret them with this heterogeneity in mind.

• Heterogeneity of Source Material: Our review includes secondary studies and research agendas. While this provides a holistic view of the field, it introduces heterogeneity in the granularity of evidence. SLRs typically provide retrospective empirical evidence, while agendas provide prospective theoretical propositions. To address this, we used a flexible thematic synthesis method capable of handling diverse qualitative inputs.
• Risk of Double Counting: When synthesizing findings across multiple secondary studies, there is a risk that overlapping primary studies may be counted multiple times, potentially inflating the apparent strength of certain findings. While we did not conduct an overlap analysis of primary studies across included reviews, researchers should be aware of this limitation when interpreting the prevalence of specific themes.

9.3 Research Validity

This category concerns the analysis procedures and the reproducibility of the study.

• Interpretive Bias in Thematic Analysis and Synthesis: The qualitative analysis and synthesis of benefits, challenges, and future directions utilized a Grounded Theory-inspired approach, which is inherently interpretive. To support transparency, we documented our research protocol in detail and made all intermediate artifacts—including screening decisions, extraction sheets, and coding outputs—publicly available in the online replication package. However, different researchers applying the same protocol may arrive at somewhat different thematic structures, category labels, or emphasis in synthesis. To mitigate this, we employed a multi-analyst approach with independent coding and consensus meetings.

• Generalizability: Our findings are bounded by the Information Systems discipline and the tertiary nature of our review. As we synthesize secondary studies, our results are constrained by what the included reviews chose to report, the application sectors and contexts they covered, and the nascent state of a rapidly evolving field. Consequently, these insights may not generalize to domains or industries not explicitly covered by the included secondary studies.

10 Conclusions

This study synthesizes a recent and fast-moving body of Information Systems research on Generative AI (GenAI), drawing on qualified sources that were predominantly published in 2024 and 2025.
The literature, mainly published in journal outlets, spans core IS application areas, with the strongest coverage of human health and social work activities, followed by information and communication, professional and technical activities, and education. Our analysis grouped the findings into three themes: benefits, challenges and limitations, and research gaps and future directions. On the benefits side, the reviewed literature highlights improvements to information work in practice, including stronger clinical information flow, documentation, and decision support in healthcare; more personalized and accessible learning with reduced educator workload in education; support for knowledge synthesis and for developing design requirements and prototypes in research and design; productivity gains in software engineering through automated coding and testing support; and more efficient data management and multimodal content generation. These benefits also include synthetic data generation, positioned to support privacy and ethical AI work. Nonetheless, the same body of evidence shows that GenAI deployment is constrained by several risks. First, societal and ethical risks are persistent, including amplified biases, fairness challenges, and misuse at scale, which shift GenAI from a purely technical issue to an issue of institutional legitimacy. Second, technical unreliability remains a core barrier, including hallucinations, limited contextual sensitivity, and performance drift and instability. To integrate these insights, we introduced the Sociotechnical GenAI Outcomes Matrix (SGOM) as a conceptual framework that links observed benefits and challenges to socio-technical dimensions. The SGOM reframes GenAI outcomes as co-produced across user, organizational, societal, ethical, engineering, and quality perspectives. This framing supports two implications that matter for IS scholarship and practice.
For research, it motivates a shift from studying isolated tool adoption or model performance toward studying the hybrid human-AI ensemble. For practice, it emphasizes that the implementation of GenAI constitutes an organizational redesign challenge, necessitating explicit controls, traceability, and contestability mechanisms to uphold human responsibility. Future research in Information Systems should prioritize understanding the socio-technical conditions that ensure a safe and accountable use of GenAI, rather than simply documenting its immediate effects. This includes investigating how organizations govern decentralized "shadow" AI use, how ethical and legal requirements are translated into everyday routines, and how IT artifacts developed with GenAI can be both useful and trustworthy, while also allowing for human contestability.

Supplementary information. The replication package is available at https://github.com/przybylek/GenAI4IS. It includes the detailed thematic codebooks—containing verbatim excerpts from each included study mapped to their respective codes—as well as screening decisions, pre-consensus reviewer ratings with justifying comments, data extraction sheets, and analysis scripts.

Acknowledgements. This paper is dedicated to the memory of Professor Stanisław Wrycza, the visionary founder of the International Conference on Information Systems Development (ISD). The collaboration underlying this study was initiated during the 32nd edition of the ISD conference in 2024, a testament to his enduring legacy in fostering a vibrant and innovative IS research community.

Author contribution. https://github.com/przybylek/GenAI4IS

Funding. This research was partially supported by the University of Belgrade – Faculty of Organizational Sciences and, in part, by the Ministry of Science, Technological Development and Innovation of the Republic of Serbia through institutional funding (grant number: 200151).
This work was supported, in part, by Taighde Éireann – Research Ireland under Grant number 13/RC/2094_2. Co-funded by the European Union under the Systems, Methods, Context (SyMeCo) Programme Grant Agreement Number 101081459. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. This work was partially supported by the Europium Short-Term Outgoing Visits program from Gdańsk University of Technology's Initiative of Excellence.

Data availability. https://github.com/przybylek/GenAI4IS

Code availability. https://github.com/przybylek/GenAI4IS

Declarations

Competing interests. The authors declare that they have no competing interests.

Ethics approval and consent to participate. Not Applicable.

Consent for publication. Not Applicable.

Declaration on the Use of Large Language Models. The authors employed Large Language Models (LLMs) to support the thematic analysis, specifically to brainstorm initial code names and propose refined labels and concise descriptions for the inductively developed codes and themes. In all cases, LLM suggestions served only as initial inspiration; many were discarded for being too narrow or conflating distinct concepts. All final code names, theme structures, and descriptions were critically evaluated and approved by the research team. Additionally, LLMs were used to improve the language, clarity, and readability of selected passages. All AI-generated suggestions were reviewed and edited by the authors, who take full responsibility for the final content of this paper.

References

Ahmed, I., Aleti, A., Cai, H., Chatzigeorgiou, A., He, P., Hu, X., . . . Xia, X. (2025). Artificial intelligence for software engineering: The journey so far and the road ahead. ACM Transactions on Software Engineering and Methodology, 34(5), 1–27.

Albashrawi, M.
(2025). Generative AI for Decision-Making: A multidisciplinary perspective. Journal of Innovation & Knowledge, 10(4), 100751. https://doi.org/10.1016/j.jik.2025.100751

Al-Lataifeh, Z., Harris, M.A., Smith, J., Chin, A.G. (2025). Generative AI Health Assistants in Modern Healthcare: Drivers and Barriers to Adoption. Information Systems Frontiers. https://doi.org/10.1007/s10796-025-10681-4

Alter, S. (2008). Defining information systems as work systems: implications for the IS field. European Journal of Information Systems, 17(5), 448–469. https://doi.org/10.1057/ejis.2008.37

Ampatzoglou, A., Bibi, S., Avgeriou, P., Verbeek, M., Chatzigeorgiou, A. (2019). Identifying, categorizing and mitigating threats to validity in software engineering secondary studies. Information and Software Technology, 106, 201–230. https://doi.org/10.1016/j.infsof.2018.10.006

Bandara, W., Miskon, S., Fielt, E. (2011). A systematic, tool-supported method for conducting literature reviews in information systems. In M. Rossi & J. Nandhakumar (Eds.), ECIS 2011 proceedings [19th European Conference on Information Systems] (pp. 1–13). AIS Electronic Library (AISeL) / Association for Information Systems. Retrieved from https://eprints.qut.edu.au/42184/

Bazzan, T., Olojo, B., Majda, P., Kelly, T., Yilmaz, M., Marks, G., Clarke, P.M. (2024). Analysing the role of Generative AI in Software Engineering – Results from an MLR. In European conference on software process improvement (pp. 163–180).

Beheshti, M., Toubal, I.E., Alaboud, K., Almalaysha, M., Ogundele, O.B., Turabieh, H., . . . Dahu, B.M. (2025, March). Evaluating the Reliability of ChatGPT for Health-Related Questions: A Systematic Review. Informatics, 12(1), 9. https://doi.org/10.3390/informatics12010009

Bellanda, V.C.F., Santos, M.L.d., Ferraz, D.A., Jorge, R., Melo, G.B. (2024, October). Applications of ChatGPT in the diagnosis, management, education, and research of retinal diseases: a scoping review.
International Journal of Retina and Vitreous, 10(1), 79, https://doi.org/10.1186/s40942-024-00595-9 Benbasat, I., & Zmud, R.W. (2003, 06). The identity crisis within the IS discipline: Defining and communicating the discipline’s core properties. Management Information Systems Quarterly, 27(2), 183-194, https://doi.org/10.2307/30036527 Bendig, D., & Bräunche, A. (2024). The role of artificial intelligence algorithms in information systems research: a conceptual overview and avenues for research. Management Review Quarterly, 2863-–2908, https://doi.org/10.1007/s11301-024-00451-y 50 Berente, N., Gu, B., Recker, J., Santhanam, R. (2021, 09). Managing Artificial Intelligence. Management Information Systems Quarterly, 45(3), 1433-1450, https://doi.org/10.25300/MISQ/2021/16274 Boehm, B. (2006). A view of 20th and 21st century software engineering. Proceedings of the 28th international conference on software engineering (p. 12–29). New York, NY, USA: Association for Computing Machinery. Bommasani, R., et al. (2021). On the opportunities and risks of foundation models. arXiv:2108.07258, , https://doi.org/10.48550/arXiv.2108.07258 Bracken, A., Reilly, C., Feeley, A., Sheehan, E., Merghani, K., Feeley, I. (2025, February). Artificial Intelligence (AI) – Powered Documentation Systems in Healthcare: A Systematic Review. Journal of Medical Systems, 49(1), 28, https://doi.org/10.1007/s10916-025-02157-4 Brereton, P., Kitchenham, B., Budgen, D., Turner, M., Khalil, M. (2007). Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80(4), 571-583, https://doi.org/https://doi.org/10.1016/j.jss.2006.07.009 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., . . . Amodei, D. (2020). Language models are few-shot learners – special version. H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin (Eds.), Advances in neural information processing systems (Vol. 33, p. 1877–1901). 
Budgen, D., Brereton, P., Drummond, S., Williams, N. (2018). Reporting systematic reviews: Some lessons from a tertiary study. Information and Software Technology, 95, 62–74, https://doi.org/10.1016/ j.infsof.2017.10.017 Butler, T., Gozman, D., Lyytinen, K. (2023). The regulation of and through information technology: Towards a conceptual ontology for is research. Journal of Information Technology, 38(2), 86-107, https://doi.org/10.1177/02683962231181147 Cavalcante, M., Varajão, J., Silva Rodrigues, L. (2025). Digital transformation initiatives: Motivations, objectives, and strategies. Telematics and Informatics Reports, 19, 100246, https://doi.org/https:// doi.org/10.1016/j.teler.2025.100246 Chau, M., & Xu, J. (2025). An IS Research Agenda on Large Language Models: Development, Applications, and Impacts on Business and Management. ACM Transactions on Management Information Systems, 16(1), 1:1–1:11, https://doi.org/10.1145/3713032 Chen, Y., Sun, W., Fang, C., Chen, Z., Ge, Y., Han, T., . . . Xu, B. (2025). Security of language models for code: A systematic literature review. ACM Transactions on Software Engineering and Methodology, , https://doi.org/10.1145/3735554 (Just Accepted) Cheng, H., Husen, J.H., Lu, Y., Racharak, T., Yoshioka, N., Ubayashi, N., Washizaki, H. (2025). Generative AI for requirements engineering: A systematic literature review. Software: Practice and Experience, 56(2), 141–170, https://doi.org/10.1002/spe.70029 Clear, T., Cajander, A., Clear, A., McDermott, R., Daniels, M., Divitini, M., . . . Zhu, T. (2025, January). AI Integration in the IT Professional Workplace: A Scoping Review and Interview Study with Implications for Education and Professional Competencies. 2024 Working Group Reports on Innovation and Technology in Computer Science Education (p. 34–67). New York, NY, USA: Association for Computing Machinery. 51 Cornide-Reyes, H., Monsalves, D., Durán, E., Silva-Aravena, F., Morales, J. (2025). 
Generative Artificial Intelligence in Agile Software Development Processes: A Literature Review Focused on User eXperience. A. Coman & S. Vasilache (Eds.), International conference on human-computer interaction (p. 228–246). Cham: Springer Nature Switzerland. Davis, F.D. (1989, 09). Perceived usefulness, perceived ease of use, and user acceptance of information technology. Management Information Systems Quarterly, 13(3), 319-340, https://doi.org/10.2307/ 249008 Devlin, J., Chang, M.W., Lee, K., Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol. 1 (Long and Short Papers), p. 4171–4186). Dwivedi, Y.K., Hughes, L., Al-Ahmadi, M.S., Dutot, V., Ahmed, S.Q., Akter, S., . . . Walton, P. (2025). GenAI’s Impact on Global IT Management: A Multi-Expert Perspective and Research Agenda. Journal of Global Information Technology Management, 28(1), 49–63, https://doi.org/10.1080/ 1097198X.2025.2454192 Dwivedi, Y.K., Hughes, L., Ismagilova, E., Aarts, G., Coombs, C., Crick, T., . . . Williams, M.D. (2021). Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. International Journal of Information Management, 57, 101994, https://doi.org/https://doi.org/10.1016/j.ijinfomgt.2019.08.002 Dwivedi, Y.K., Kshetri, N., Hughes, L., Slade, E.L., Jeyaraj, A., Kar, A.K., . . . Wright, R. (2023, August). Opinion Paper: “So what if ChatGPT wrote it?” multidisciplinary perspectives on opportunities, chal- lenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, 102642, https://doi.org/10.1016/j.ijinfomgt.2023.102642 Elman, J.L. (1990). Finding structure in time. 
Cognitive Science, 14(2), 179-211, https://doi.org/10.1207/ s15516709cog1402_1 Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S., Zhang, J.M. (2023). Large Language Models for Software Engineering: Survey and Open Problems. 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE) (p. 31–53). Feuerriegel, S., Hartmann, J., Janiesch, C., Zschech, P. (2024, January). Generative AI. Business & Informa- tion Systems Engineering, 66(1), 111–126, Retrieved from https://aisel.aisnet.org/bise/vol66/iss1/7 French, A.M., & Shim, J.P. (2025). From Artificial Intelligence to Augmented Intelligence: A Shift in Perspective, Application, and Conceptualization of AI. Information Systems Frontiers, 27(4), 1345–1366, https://doi.org/10.1007/s10796-024-10562-2 Fu, M., & Tantithamthavorn, C. (2022, may). LineVul: A Transformer-based Line-Level Vulnerability Prediction. Proceedings of the 19th International Conference on Mining Software Repositories (p. 608–620). New York, NY, USA: ACM. Ghebrehiwet, I., Zaki, N., Damseh, R., Mohamad, M.S. (2024, April). Revolutionizing personalized medicine with generative AI: a systematic review. Artificial Intelligence Review, 57(5), 128, https://doi.org/ 10.1007/s10462-024-10768-5 Goldberg, C., Balicer, R.D., Bhat, M., Blumenthal, D., Brendel, R.W., Brondolo, E., . . . Kohane, I. (2026). The missing dimension in clinical ai: Making hidden values visible. NEJM AI , 3(2), AIp2501266, https://doi.org/10.1056/AIp2501266 52 Gumusel, E. (2025). A literature review of user privacy concerns in conversational chatbots: A social informatics approach: An Annual Review of Information Science and Technology (ARIST) paper. Journal of the Association for Information Science and Technology, 76(1), 121–154, https://doi.org/ 10.1002/asi.24898 Haase, J., Kremser, W., Leopold, H., Mendling, J., Onnasch, L., Plattfaut, R. (2024, April). 
Interdisciplinary Directions for Researching the Effects of Robotic Process Automation and Large Language Models on Business Processes. Communications of the Association for Information Systems, 54(1), 579–604, https://doi.org/10.17705/1CAIS.05421 Haddaway, N.R., Page, M.J., Pritchard, C.C., McGuinness, L.A. (2022). PRISMA2020: An R package and Shiny app for producing PRISMA 2020-compliant flow diagrams, with interactivity for optimised digital transparency and Open Synthesis. Campbell Systematic Reviews, 18(2), e1230, https://doi.org/ 10.1002/cl2.1230 Harden, A., Thomas, J., Cargo, M., Harris, J., Pantoja, T., Flemming, K., . . . Noyes, J. (2018). Cochrane qualitative and implementation methods group guidance series—paper 5: Methods for integrating qualitative and implementation evidence within intervention effectiveness reviews. Journal of Clinical Epidemiology, 97, 70–78, https://doi.org/10.1016/j.jclinepi.2017.11.029 Hasanov, I., Virtanen, S., Hakkala, A., Isoaho, J. (2024). Application of large language models in cybersecurity: A systematic literature review. IEEE Access, 12, 176751-176778, https://doi.org/ 10.1109/ACCESS.2024.3505983 Hemmat, A., Sharbaf, M., Kolahdouz-Rahimi, S., Lano, K., Tehrani, S.Y. (2025). Research directions for using LLM in software requirement engineering: A systematic review. Frontiers in Computer Science, 7, 1519437, https://doi.org/10.3389/fcomp.2025.1519437 Hirschheim, R., & Klein, H.K. (1989). Four Paradigms of Information Systems Development. Communications of the ACM , 32(10), 1199–1216, https://doi.org/10.1145/67933.67937 Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735 Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., . . . Wang, H. (2024). Large language models for software engineering: A systematic literature review. 
ACM Transactions on Software Engineering and Methodology, 33(8), 1–79, https://doi.org/10.1145/3695988 Huang, Y., Arora, C., Huong, W.C., Kanij, T., Madugalla, A., Grundy, J. (2026). Ethical concerns of generative ai and mitigation strategies: A systematic mapping study. Applied Soft Computing, 193, 114789, https://doi.org/10.1016/j.asoc.2026.114789 Hughes, L., Malik, T., Dettmer, S., Al-Busaidi, A.S., Dwivedi, Y.K. (2025). Reimagining Higher Education: Navigating the Challenges of Generative AI Adoption. Information Systems Frontiers, , https:// doi.org/10.1007/s10796-025-10582-6 Iivari, N., Kinnula, M., Molin-Juustila, T., Kuure, L. (2018). Exclusions in social inclusion projects: Struggles in involving children in digital technology development. Information Systems Journal, 28(6), 1020-1048, https://doi.org/https://doi.org/10.1111/isj.12180 53 Jackson, V., Vasilescu, B., Russo, D., Ralph, P., Prikladnicki, R., Izadi, M., . . . van der Hoek, A. (2025, May). The impact of generative ai on creativity in software development: A research agenda. ACM Trans. Softw. Eng. Methodol., 34(5), , https://doi.org/10.1145/3708523 Jarvenpaa, S., & Klein, S. (2024, January). New Frontiers in Information Systems Theorizing: Human-gAI Collaboration. Journal of the Association for Information Systems, 25(1), 110–121, https://doi.org/ 10.17705/1jais.00868 Karlovs-Karlovskis, U. (2024). Generative artificial intelligence use in optimising software engineering process: A systematic literature review. Applied Computer Systems, 29(1), 68–77, https://doi.org/ 10.2478/acss-2024-0009 Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele University Technical Report TR/SE-040, , Kitchenham, B., Budgen, D., Brereton, P. (2016). Evidence-based software engineering and systematic reviews. CRC Press. Kitchenham, B., Charters, S., et al. (2007). Guidelines for performing systematic literature reviews in software engineering. 
Technical Report EBSE 2007-001, , Kitchenham, B., Madeyski, L., Budgen, D. (2023a). How Should Software Engineering Secondary Studies Include Grey Material? IEEE Transactions on Software Engineering, 49(2), 872-882, https://doi.org/ 10.1109/TSE.2022.3165938 Kitchenham, B., Madeyski, L., Budgen, D. (2023b). SEGRESS: Software Engineering Guidelines for REporting Secondary Studies. IEEE Transactions on Software Engineering, 49(3), 1273-1298, https:// doi.org/10.1109/TSE.2022.3174092 Krancher, O., Nagbøl, P., Müller, O. (2025). The danish business authority’s approach to the ongoing evaluation of ai systems. MIS Quarterly Executive, 24(2), 137–150, Laine, J., Minkkinen, M., Mäntymäki, M. (2025, March). Understanding the Ethics of Generative AI: Established and New Ethical Principles. Communications of the Association for Information Systems, 56(1), 1–25, https://doi.org/10.17705/1CAIS.05601 Lambiase, S., Catolino, G., Palomba, F., Ferrucci, F., Russo, D. (2025, December). Investigating the role of cultural values in adopting large language models for software engineering. ACM Trans. Softw. Eng. Methodol., 35(1), , https://doi.org/10.1145/3725529 Lareyre, F., Nasr, B., Chaudhuri, A., Di Lorenzo, G., Carlier, M., Raffort, J. (2023, January). Comprehensive Review of Natural Language Processing (NLP) in Vascular Surgery. EJVES Vascular Forum, 60, 57–63, https://doi.org/10.1016/j.ejvsvf.2023.09.002 Li, M., & Guenier, A.W. (2024). ChatGPT and Health Communication: A Systematic Literature Review. International Journal of E-Health and Medical Communications (IJEHMC), 15(1), 1–26, https:// doi.org/10.4018/IJEHMC.349980 54 Li, M.M., Reis, B.Y., Rodman, A., Cai, T., Dagan, N., Balicer, R.D., . . . Zitnik, M. (2026, Feb 01). Scaling medical ai across clinical contexts. Nature Medicine, 32(2), 439-448, https://doi.org/10.1038/ s41591-025-04184-7 Lyytinen, K., & Newman, M. (2008). Explaining information systems change: a punctuated socio-technical change model. 
European Journal of Information Systems, 17(6), 589-613, https://doi.org/10.1057/ ejis.2008.50 Madsen, D.O., & Toston I, D.M. (2025). ChatGPT and Digital Transformation: A Narrative Review of Its Role in Health, Education, and the Economy. Digital, 5(3), , https://doi.org/10.3390/digital5030024 Maita, I., Saide, S., Putri, A.M., Muwardi, D. (2024, June). Pros and Cons of Artificial Intelligence–ChatGPT Adoption in Education Settings: A Literature Review and Future Research Agendas. IEEE Engineering Management Review, 52(3), 27–42, https://doi.org/10.1109/EMR.2024.3394540 Majchrzak, A., Markus, M.L., Wareham, J. (2016, 06). Designing for Digital Transformation: Lessons for Information Systems Research from the Study of ICT and Societal Challenges. Management Information Systems Quarterly, 40(2), 267-277, https://doi.org/10.25300/MISQ/2016/40:2.03 Mambile, C., & Ishengoma, F. (2024, June). Exploring the non-linear trajectories of technology adoption in the digital age. Technological Sustainability, 3(4), 428–448, https://doi.org/10.1108/TECHS-11-2023-0050 (Publisher: Emerald Publishing Limited) Marabelli, M., Newell, S., Ahuja, M., Galliers, B. (2025). Are immersive platforms the future of work? the role of generative AI and DEI considerations. The Journal of Strategic Information Systems, 34(4), 101943, https://doi.org/10.1016/j.jsis.2025.101943 Markus, M.L. (1983, 6). Power, politics, and mis implementation. Communications of the ACM , 26(6), 430–444, https://doi.org/10.1145/358141.358148 Meng, X., Yan, X., Zhang, K., Liu, D., Cui, X., Yang, Y., . . . Tang, Y.-D. (2024, May). The application of large language models in medicine: A scoping review. iScience, 27(5), , https://doi.org/10.1016/ j.isci.2024.109713 (Publisher: Elsevier) Mikolov, T., Chen, K., Corrado, G., Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, , Mira, J.M. (2008). Symbols versus connections: 50 years of artificial intelligence. 
Neurocomputing, 71(4), 671- 680, https://doi.org/https://doi.org/10.1016/j.neucom.2007.06.009 (Neural Networks: Algorithms and Applications 50 Years of Artificial Intelligence: a Neuronal Approach) Mohammad, B., Supti, T., Alzubaidi, M., Shah, H., Alam, T., Shah, Z., Househ, M. (2023). The Pros and Cons of Using ChatGPT in Medical Education: A Scoping Review. ICIMTH (International Conference on Informatics, Management, and Technology in Healthcare) for the year 2023 (p. 644–647). IOS Press. Mumford, E. (1983). Designing human systems for new technology : the ethics method. Manchester: Manchester Business School. Nah, F., Cai, J., Zheng, R., Pang, N. (2023, September). An Activity System-based Perspective of Generative AI: Challenges and Research Directions. AIS Transactions on Human-Computer Interaction, 15(3), 55 247–267, https://doi.org/10.17705/1thci.00190 National Institute of Standards and Technology (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). Retrieved from https://doi.org/10.6028/NIST.AI.100-1 Neumann, M., Bischof, L., Hinz, N.E., Stockmann, L., Schrader, D., Ahaus, A.C., . . . Przybylek, A. (2026). Between policy and practice: Genai adoption in agile software development teams. Retrieved from https://arxiv.org/abs/2601.07051 Nguyen-Duc, A., Cabrero-Daniel, B., Przybylek, A., Arora, C., Khanna, D., Herda, T., . . . others (2025). Generative artificial intelligence for software engineering—a research agenda. Software: Practice and Experience, 55(11), 1806–1843, https://doi.org/10.1002/spe.70005 Onatayo, D., Onososen, A., Oyediran, A.O., Oyediran, H., Arowoiya, V., Onatayo, E. (2024). Generative AI Applications in Architecture, Engineering, and Construction: Trends, Implications for Practice, Education & Imperatives for Upskilling—A Review. Architecture, 4(4), 877–902, https://doi.org/ 10.3390/architecture4040046 Orlikowski, W.J., & Iacono, C.S. (2001). 
Research Commentary: Desperately Seeking the “IT” in IT Research—A Call to Theorizing the IT Artifact. Information Systems Research, 12(2), 121-134, https://doi.org/10.1287/isre.12.2.121.9700 Ouanes, K. (2024, August). Generative artificial intelligence in healthcare: current status and future directions. Italian Journal of Medicine, 18(3), , https://doi.org/10.4081/itjm.2024.1782 (Company: Italian Society of Internal Medicine (SIMI) Distributor: Italian Society of Internal Medicine (SIMI) Label: Italian Society of Internal Medicine (SIMI) Number: 3 Publisher: PAGEPress Publications) Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., . . . Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35(NeurIPS), 27730-27744, Pang, M.-S., Kankanhalli, A., Aanestad, M., Ram, S., Maruping, L.M. (2024, 12). Digital technologies and the advancement of social justice: A framework and agenda. Management Information Systems Quarterly, 48(4), 1591-1610, https://doi.org/10.25300/MISQ/2024/484E3 Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L. (2018). Deep contextualized word representations. Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: Human language technologies (p. 2227–2237). Philipp, R., Mladenow, A., Strauss, C., Völz, A. (2021). Machine Learning as a Service: Challenges in Research and Applications. Proceedings of the 22nd International Conference on Information Integration and Web-Based Applications & Services (p. 396–406). New York, NY, USA: Association for Computing Machinery. Poličar, P.G., Špendl, M., Curk, T., Zupan, B. (2025, 07). Automated assignment grading with large language models: insights from a bioinformatics course. Bioinformatics, 41(Supplement 1), i21-i29, https://doi.org/10.1093/bioinformatics/btaf196 Pool, J., Indulska, M., Sadiq, S. 
(2024, September). Large Language Models and Generative AI in telehealth: a responsible use lens. Journal of the American Medical Informatics Association, 31(9), 2125–2136, https://doi.org/10.1093/jamia/ocae035 56 Qi, F., Hou, Y., Lin, N., Bao, S., Xu, N. (2024). A survey of testing techniques based on large language models. Proceedings of the 2024 international conference on computer and multimedia technology (p. 280–284). New York, NY, USA: Association for Computing Machinery. Roychoudhury, A. (2025, December). A Year in TOSEM: New Research Award, Agentic AI Special Issue and Much More. ACM Transactions on Software Engineering and Methodology, 35(1), 1-3, https://doi.org/10.1145/3785329 Russo, D. (2024, June). Navigating the complexity of generative ai adoption in software engineering. ACM Trans. Softw. Eng. Methodol., 33(5), , https://doi.org/10.1145/3652154 Russo, D., Baltes, S., Berkel, N.v., Avgeriou, P., Calefato, F., Cabrero-Daniel, B., . . . Vasilescu, B. (2024). Generative AI in Software Engineering Must Be Human-Centered: The Copenhagen Manifesto. Journal of Systems and Software, 216, 112115, https://doi.org/https://doi.org/10.1016/j.jss.2024.112115 Schneider, J. (2024, September). Explainable Generative AI (GenXAI): a survey, conceptualization, and research agenda. Artificial Intelligence Review, 57(11), 289, https://doi.org/10.1007/s10462-024 -10916-x Schöbel, S., Schmitt, A., Benner, D., Saqr, M., Janson, A., Leimeister, J.M. (2024). Charting the evolution and future of conversational agents: A research agenda along five waves and new frontiers. Information Systems Frontiers, 26(2), 729–754, https://doi.org/10.1007/s10796-023-10375-9 Seymour, M.W., Ruster, L.P., Riemer, K., Peter, S., Kautz, K. (2025). The Challenges of Researching AI in IS: AI and the Boundaries for IS Research and AI in Information Systems: Ethical and Methodological Challenges-Reports and Contextualization of Two ACIS Panels. 
Communications of the Association for Information Systems, 57, 1375–1395, https://doi.org/10.17705/1CAIS.05757 Sharp, H., & Robinson, H. (2005). Some social factors of software engineering: the maverick, community and technical practices. Proceedings of the 2005 Workshop on Human and Social Factors of Software Engineering (p. 1–6). New York, NY, USA: Association for Computing Machinery. Sigala, M., Ooi, K.-B., Tan, G.W.-H., Aw, E.C.-X., Cham, T.-H., Dwivedi, Y.K., . . . Wirtz, J. (2024, July). ChatGPT and service: opportunities, challenges, and research directions. Journal of Service Theory and Practice, 34(5), 726–737, https://doi.org/10.1108/JSTP-11-2023-0292 Smolensky, P. (1987). Connectionist AI, symbolic AI, and the brain. Artificial Intelligence Review, 1(2), 95–109, https://doi.org/10.1007/BF00130011 Srivastava, A., Marabelli, M., Blanch-Hartigan, D., Moriarty, J., Carey, E., Persky, S., Torous, J. (2025, March). The Present and Future of AI: Ethical Issues and Research Opportunities. Communications of the Association for Information Systems, 56(1), 255–273, https://doi.org/10.17705/1CAIS.05611 Storey, V.C., Yue, W.T., Zhao, J.L., Lukyanenko, R. (2025, February). Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research. Information Systems Frontiers, 27, 2081-–2102, https://doi.org/10.1007/s10796-025-10581-7 Sun, J., Zheng, C., Xie, E., Liu, Z., Chu, R., Qiu, J., . . . Li, Z. (2025, June). A survey of reasoning with foundation models: Concepts, methodologies, and outlook. ACM Computing Surveys, 57(11), 1–43, https://doi.org/10.1145/3729218 57 Triando, Simaremare, M., Wang, X., Prasad, A.S.R. (2025). The use of generative ai tools in the inception stage of software startups. E. Papatheocharous, S. Farshidi, S. Jansen, & S. Hyrynsalmi (Eds.), Software business (p. 439–453). Cham: Springer Nature Switzerland. United Nations (2008). 
International standard industrial classification of all economic activities (ISIC) (Revision 4 ed.). New York, NY: United Nations. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., . . . Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems (Vol. 30). Vial, G. (2021). Understanding digital transformation: A review and a research agenda. A. Hinterhuber, T. Vescovi, & F. Checchinato (Eds.), Managing digital transformation: Understanding the strategic process (p. 13-66). London: Routledge. Walsham, G. (2012). Are We Making a Better World with ICTs? Reflections on a Future Agenda for the IS Field. Journal of Information Technology, 27(2), 87-93, https://doi.org/10.1057/jit.2012.4 Wang, J., Huang, Y., Chen, C., Liu, Z., Wang, S., Wang, Q. (2024). Software testing with large language models: Survey, landscape, and vision. IEEE Transactions on Software Engineering, 50(4), 911–936, https://doi.org/10.1109/TSE.2024.3368208 Wang, X., Attal, M.I., Rafiq, U., Hubner-Benz, S. (2024). Turning large language models into ai assistants for startups using prompt patterns. P. Kruchten & P. Gregory (Eds.), Agile processes in software engineering and extreme programming – workshops (p. 192–200). Cham: Springer Nature Switzerland. Wass, S., Thygesen, E., Purao, S. (2023). Principles to facilitate social inclusion for design-oriented research. Journal of the Association for Information Systems, 24(5), 1204-1247, https://doi.org/10.17705/ 1jais.00814 Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., . . . Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in neural information processing systems (Vol. 35, p. 24824–24837). Wei, X., Kumar, N., Zhang, H. (2025, March). Addressing bias in generative AI: Challenges and research opportunities in information management. 
Information & Management, 62(2), 104103, https:// doi.org/10.1016/j.im.2025.104103 Wessel, M., Adam, M., Benlian, A., Majchrzak, A., Thies, F. (2025). Generative AI and its Transformative Value for Digital Platforms. Journal of Management Information Systems, 42(2), 346–369, https:// doi.org/10.1080/07421222.2025.2487315 White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., . . . Schmidt, D.C. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. Proceedings of the 30th Conference on Pattern Languages of Programs. USA: The Hillside Group. Wolfswinkel, J.F., Furtmueller, E., Wilderom, C.P.M. (2013, 1 01). Using grounded theory as a method for rigorously reviewing literature. European Journal of Information Systems, 22(1), 45-55, https:// doi.org/10.1057/ejis.2011.51 Xu, H., Wang, S., Li, N., Wang, K., Zhao, Y., Chen, K., . . . Wang, H. (2024). Large language models for cyber security: A systematic literature review. ACM Transactions on Software Engineering and Methodology, , https://doi.org/10.1145/3769676 (Just Accepted) Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., . . . Hu, X. (2024, jul). Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. ACM Transactions on Knowledge Discovery from Data, 18(6), 1–32, https://doi.org/10.1145/3649506 2304.13712 58 Yang, Y., Xia, X., Lo, D., Grundy, J. (2022, September). A survey on deep learning for software engineering. ACM Computing Surveys, 54(10s), , https://doi.org/10.1145/3505243 Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., Zhang, Y. (2024). A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, 4(2), 100211, https://doi.org/10.1016/j.hcc.2024.100211 Yazdani, S., Singh, A., Saxena, N., Wang, Z., Palikhe, A., Pan, D., . . . Zhang, W. (2025, October). Generative AI in depth: A survey of recent advances, model variants, and real-world applications. 
Journal of Big Data, 12(1), 230, https://doi.org/10.1186/s40537-025-01247-x Yoo, Y. (2010, 06). Computing in everyday life: A call for research on experiential computing. Management Information Systems Quarterly, 34(2), 213-231, https://doi.org/10.2307/20721425 Zamani, E., & Rousaki, A. (2026). Risk, artificial intelligence, and the governance of migration: A critical discourse analysis of the eu ai act. Information Technology for Development, 32, , Retrieved from https://durham-repository.worktribe.com/output/4658503 Zheng, Z., Ning, K., Zhong, Q., Chen, J., Chen, W., Guo, L., . . . Wang, Y. (2025). Towards an understanding of large language models in software engineering tasks. Empirical Software Engineering, 30(2), 50, https://doi.org/10.1007/s10664-024-10602-0 59