Objective:
To assess the feasibility of using large language models (LLMs) to develop research questions about changes to the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) food packages.
Design:
We conducted a controlled experiment using ChatGPT-4 and its plugin, MixerBox Scholarly, to generate research questions based on a section of the USDA summary of the final public comments on the WIC revision. Five questions per week were generated for three weeks using the LLMs under two conditions: with or without relevant literature provided (the ‘fed’ and ‘non-fed’ conditions). The experiment generated 90 questions, which were evaluated using the FINER criteria (Feasible, Interesting, Novel, Ethical, and Relevant). T-tests and multivariate regression examined differences by feeding status, AI model, evaluator, and criterion; a sketch of this analysis follows the abstract.
Setting:
The United States.
Participants:
Six WIC expert evaluators from academia, government, industry, and non-profit sectors.
Results:
Five themes were identified: administrative barriers, nutrition outcomes, participant preferences, economics, and other topics. Feeding and non-feeding groups had no significant differences (Coeff. = 0.03, P = 0.52). MixerBox-generated questions received significantly lower scores than ChatGPT (Coeff. = –0.11, P = 0.02). Ethics scores were significantly higher than feasibility scores (Coeff. = 0.65, P < 0.001). Significant differences were found between the evaluators (P < 0.001).
Conclusions:
LLM applications can assist in developing research questions of acceptable quality related to the WIC food package revisions. Future research is needed to compare the development of research questions between LLMs and human researchers.
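The multivariate analysis described in the Design section above can be illustrated with a minimal sketch: an ordinary least squares regression of evaluator scores on feeding status, model, evaluator, and criterion. The data below are synthetic and the column names are assumptions for illustration, not the study's variables.

```python
# Minimal sketch (not the authors' code) of a multivariate regression of FINER
# scores on feeding status, AI model, evaluator, and criterion, assuming a
# long-format table with one row per rated item.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 90  # the study evaluated 90 generated questions
scores = pd.DataFrame({
    "score": rng.integers(1, 6, size=n),                 # illustrative 1-5 rating
    "fed": rng.choice(["fed", "non_fed"], size=n),        # literature provided or not
    "model": rng.choice(["ChatGPT", "MixerBox"], size=n),
    "evaluator": rng.choice([f"E{i}" for i in range(1, 7)], size=n),
    "criterion": rng.choice(["Feasible", "Interesting", "Novel", "Ethical", "Relevant"], size=n),
})

# OLS with categorical predictors mirrors the kind of coefficients reported
# (feeding status, model, evaluator, and criterion effects).
fit = smf.ols("score ~ C(fed) + C(model) + C(evaluator) + C(criterion)", data=scores).fit()
print(fit.summary())
```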
Housing affordability is one of the main requirements for sustainable development and a sustainable society. However, the timely delivery of new homes is often constrained by the need to upgrade and expand essential infrastructure such as water and electricity networks. For water utilities, responding to growth typically involves intensive hydraulic analysis to assess water distribution system (WDS) capacity, identify upgrade needs and evaluate options for system extensions. This process becomes significantly more complex and resource-intensive under high-growth conditions, where a higher volume of faster answers is required to address a wide range of uncertain future scenarios. This paper presents a concept for integrating generative artificial intelligence (Gen AI) with hydraulic models to form an AI agent that supports WDS design. Specific features of Gen AI used within the hydraulic agent are discussed. A real-life case study demonstrated that the AI agent can analyse land development requests, trigger hydraulic simulations and identify required augmentations, significantly reducing manual tasks. This offers a breakthrough strategy for water distribution system design and planning to enable sustainable water infrastructure development.
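As a rough illustration of the kind of step such a hydraulic agent automates, the sketch below injects a requested development demand into a network model and flags low-pressure nodes. The tooling (the wntr/EPANET Python package), the file name, node name and pressure threshold are assumptions for illustration; the paper does not specify them.

```python
# Illustrative sketch only (tools and thresholds are assumptions, not the paper's):
# take a land-development demand request, inject it into a hydraulic model, run a
# simulation, and flag nodes whose minimum pressure falls below a service level.
import wntr

MIN_PRESSURE_M = 20.0  # assumed service-level threshold (metres of head)

def assess_development(inp_file: str, node: str, added_demand_m3s: float) -> list[str]:
    """Return nodes whose minimum simulated pressure falls below the threshold."""
    wn = wntr.network.WaterNetworkModel(inp_file)
    junction = wn.get_node(node)
    # Increase the junction's base demand by the requested development demand.
    junction.demand_timeseries_list[0].base_value += added_demand_m3s
    results = wntr.sim.EpanetSimulator(wn).run_sim()
    pressure = results.node["pressure"]  # time x node DataFrame
    return [n for n in pressure.columns if pressure[n].min() < MIN_PRESSURE_M]

# A Gen-AI layer (not shown) would parse the free-text request into (node, demand)
# and summarise the flagged nodes as augmentation needs.
low_nodes = assess_development("network.inp", "J-102", 0.005)
print("Nodes needing augmentation:", low_nodes)
```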
Informed consent is a cornerstone of ethical research, but the lack of widely accepted standards for the key information (KI) section in informed consent documents (ICDs) creates challenges in institutional review board (IRB) reviews and participant comprehension. This study explored the use of GPT-4o, a large language model (hereafter, AI), to generate standardized KI sections.
Methods:
An AI tool was developed to interpret and generate KI content from ICDs. The evaluation involved a multi-phased process where IRB subject matter experts, principal investigators (PIs), and IRB reviewers assessed the AI output for accuracy, differentiation between standard care and research, appropriate information prioritization, and structural coherence.
Results:
Iterative refinements improved the AI’s accuracy and clarity, with initial assessments highlighting factual errors that decreased over time. Many PIs found the AI-generated sections comparable to their own and expressed a high likelihood of using the tool for future drafts. Blinded evaluations by IRB reviewers highlighted the AI tool’s strengths in describing study benefits and maintaining readability. However, the findings underscore the need for further improvements, particularly in ensuring accurate risk descriptions, to enhance regulatory compliance and IRB reviewer confidence.
Conclusions:
The AI tool shows promise in enhancing the consistency and efficiency of KI section drafting in ICDs. However, it requires ongoing refinement and human oversight to fully comply with regulatory and institutional standards. Collaboration between AI and human experts is essential to maximize benefits while maintaining high ethical and accuracy standards in informed consent processes.
Artificial intelligence (AI) has achieved human-level performance in specialised tasks such as Go, image recognition and protein folding, raising the prospect of an AI singularity – where machines not only match, but surpass human reasoning. Here, we demonstrate a step towards this vision in the context of turbulence modelling. By treating a large language model (LLM), DeepSeek-R1, as an equal partner, we establish a closed-loop, iterative workflow in which the LLM proposes, refines and reasons about near-wall turbulence models under adverse pressure gradients (APGs), system rotation and surface roughness. Through multiple rounds of interaction involving long-chain reasoning and a priori and a posteriori evaluations, the LLM generates models that not only rediscover established strategies, but also synthesise new ones that outperform baseline wall models. Specifically, it recommends incorporating a material derivative to capture history effects in APG flows, modifying the law of the wall to account for system rotation and developing rough-wall models informed by surface statistics. In contrast to conventional data-driven turbulence modelling – often characterised by human-designed, black-box architectures – the models developed here are physically interpretable and grounded in clear reasoning.
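For reference, the baseline that such near-wall models modify is the classical logarithmic law of the wall, reproduced below; the paper's specific modifications (material-derivative history terms for APGs, rotation and roughness corrections) are not restated here.

```latex
% Classical log law of the wall for a smooth wall, with kappa ~ 0.41 and B ~ 5.0-5.2.
\[
  u^{+} \;=\; \frac{1}{\kappa}\,\ln y^{+} \;+\; B ,
  \qquad
  u^{+} \equiv \frac{u}{u_{\tau}},
  \quad
  y^{+} \equiv \frac{y\,u_{\tau}}{\nu}.
\]
```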
Meta-research and evidence synthesis require considerable resources. Large language models (LLMs) have emerged as promising tools to assist in these processes, yet their performance varies across models, limiting their reliability. Taking advantage of the wide availability of small (<10 billion parameter) open-source LLMs, we implemented an agreement-based framework in which a decision is accepted only if at least a given number of LLMs produce the same response. The decision is otherwise withheld. This approach was tested on 1020 abstracts of randomized controlled trials in rheumatology, using 2 classic literature review tasks: (1) classifying each intervention as drug or nondrug based on text interpretation and (2) extracting the total number of randomized patients, a task that sometimes required calculations. Re-examining abstracts where at least 4 LLMs disagreed with the human gold standard (dual review with adjudication) allowed us to construct an improved gold standard. Compared to a human gold standard and single large LLMs (>70 billion parameters), our framework demonstrated robust performance: several model combinations achieved accuracies above 95%, exceeding the human gold standard on at least 85% of abstracts (e.g., 3 of 5 models, 4 of 6 models, or 5 of 7 models). Performance variability across individual models was not an issue, as low-performing models contributed fewer accepted decisions. This agreement-based framework offers a scalable solution that can replace human reviewers for most abstracts, reserving human expertise for more complex cases. Such frameworks could significantly reduce the manual burden in systematic reviews while maintaining high accuracy and reproducibility.
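A minimal sketch of the agreement rule described above follows; the answer normalization and the routing of withheld cases to human reviewers are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the agreement rule: accept a decision
# only if at least `threshold` of the LLMs return the same answer for an abstract;
# otherwise withhold the decision and route the abstract to a human reviewer.
from collections import Counter
from typing import Optional

def agreement_decision(llm_answers: list[str], threshold: int) -> Optional[str]:
    """Return the agreed answer, or None if agreement is below the threshold."""
    normalised = [a.strip().lower() for a in llm_answers]
    answer, count = Counter(normalised).most_common(1)[0]
    return answer if count >= threshold else None

# Example: 5 small LLMs classifying an intervention, accepting 3-of-5 agreement.
answers = ["drug", "drug", "nondrug", "drug", "drug"]
print(agreement_decision(answers, threshold=3))    # -> "drug"
print(agreement_decision(["drug", "nondrug"], 2))  # -> None (withheld)
```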
The emergence of large language models (LLMs) provides an opportunity for AI to operate as a co-ideation partner during creative processes. However, designers currently lack a comprehensive methodology for engaging in co-ideation with LLMs, and few frameworks describe the process of co-ideation between a designer and ChatGPT. This research therefore aimed to explore how LLMs can act as co-designers and influence the creative ideation processes of industrial designers, and whether a designer’s ideation performance can be improved by employing the proposed framework for co-ideation with a custom GPT. A survey was first conducted to determine how LLMs influence the creative ideation processes of industrial designers and to understand the problems designers face when using ChatGPT to ideate. Then, a framework based on mapping content was proposed to guide co-ideation between humans and a custom GPT (named Co-Ideator). Finally, a design case study followed by a survey and an interview was conducted to evaluate the ideation performance of the custom GPT and framework compared with traditional ideation methods. The effect of the custom GPT on co-ideation was also compared with a condition in which no artificial intelligence (AI) was used. The findings indicated that when users engaged in co-ideation with the custom GPT, the novelty and quality of their ideas exceeded those achieved with traditional ideation.
Word processing during reading is known to be influenced by lexical features, especially word length, frequency, and predictability. This study examined the relative importance of these features in word processing during second language (L2) English reading. We used data from an eye-tracking corpus and applied a machine-learning approach to model word-level eye-tracking measures and identify key predictors. Predictors comprised several lexical features, including length, frequency, and predictability (e.g., surprisal). Additionally, sentence, passage, and reader characteristics were considered for comparison. The analysis found that word length was the most important variable across several eye-tracking measures. However, for certain measures, word frequency and predictability were more important than length, and in some cases, reader characteristics such as proficiency were more significant than lexical features. These findings highlight the complexity of word processing during reading, the shared processes between first language (L1) and L2 reading, and their potential to refine models of eye-movement control.
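The abstract does not name the machine-learning method, so the sketch below assumes one common choice, a random forest with permutation importance, to show how word-level predictors of an eye-tracking measure might be ranked; the data are synthetic.

```python
# Illustrative sketch only: the algorithm is an assumption (random forest with
# permutation importance), used here to rank predictors such as word length,
# frequency, surprisal, and reader proficiency for one eye-tracking measure.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500  # synthetic word-level observations (the real data came from an eye-tracking corpus)
X = pd.DataFrame({
    "length":      rng.integers(1, 15, n),
    "log_freq":    rng.normal(3.0, 1.0, n),
    "surprisal":   rng.normal(8.0, 2.0, n),
    "proficiency": rng.normal(60.0, 10.0, n),
})
# Synthetic fixation-duration target loosely dominated by word length,
# echoing the pattern reported in the abstract.
y = 180 + 12 * X["length"] - 8 * X["log_freq"] + 3 * X["surprisal"] + rng.normal(0, 20, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:12s} {score:.3f}")  # higher = more important predictor
```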
Large Language Models (LLMs) have advanced the extraction and generation of engineering design (ED) knowledge from textual data. However, assessing their accuracy in ED tasks remains challenging due to the lack of benchmark datasets specifically designed for ED applications. To address this, the study examines how theoretical concepts from Axiomatic Design Theory—such as Functional Requirements, Design Parameters, and their relationship—are expressed in natural language and develops a systematic approach for annotating ED concepts in text. It introduces a novel dataset of 6,000 patent sentences, annotated by domain experts. Annotation performance is assessed using inter-annotator agreement metrics, providing insights into the challenges of identifying ED concepts in text. The findings aim to support designers in better integrating design theories within LLMs for extracting ED knowledge.
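The abstract refers to inter-annotator agreement metrics without naming them; the sketch below uses Cohen's kappa as one standard example, with illustrative labels for Functional Requirement (FR), Design Parameter (DP) and other (O) sentences.

```python
# Sketch of one common inter-annotator agreement metric (Cohen's kappa); the
# specific metrics used in the paper are not restated here, so this is only an
# illustrative choice with made-up labels.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["FR", "DP", "O", "FR", "DP", "O", "FR", "O"]
annotator_b = ["FR", "DP", "O", "DP", "DP", "O", "FR", "FR"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # chance-corrected agreement between the two annotators
```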
Recent advancements in machine learning (ML) offer substantial potential for enhancing product development. However, adoption in companies remains limited due to challenges in framing domain-specific problems as ML tasks and selecting suitable ML algorithms, which requires expertise that companies often lack. This study investigates the use of large language models (LLMs) as recommender systems for facilitating ML implementation. Using a dataset derived from peer-reviewed publications, the LLMs were evaluated on their ability to recommend ML algorithms for product development-related problems. The results indicate moderate success, with GPT-4o achieving the highest accuracy by recommending suitable ML algorithms in 61% of cases. Key limitations include inaccurate recommendations and challenges in identifying multiple sub-problems. Future research will explore prompt engineering to improve performance.
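The evaluation pattern described above can be sketched as follows; the prompt wording, example problem and ground-truth label are illustrative assumptions rather than the study's materials.

```python
# Minimal sketch of the evaluation pattern: ask an LLM to recommend an ML
# algorithm for a product-development problem, then compare the recommendation
# with the algorithm used in the source publication. Prompts and the example
# problem/label are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def recommend_algorithm(problem: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Recommend exactly one ML algorithm (name only) for the task."},
            {"role": "user", "content": problem},
        ],
    )
    return response.choices[0].message.content.strip()

problem = "Predict remaining useful life of a machine component from vibration sensor data."
ground_truth = "random forest"  # algorithm reported in the source publication (illustrative)
recommendation = recommend_algorithm(problem)
print(recommendation, "| match:", ground_truth.lower() in recommendation.lower())
```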
A design catalog is a repository of design problems and their solutions, enabling designers to explore and discover applicable solutions for their specific design challenges. Creating such catalogs has depended on human knowledge and implicit judgment, with no systematic approach established. This study aims to develop a systematic method to create a design catalog from patent documents. We utilize a large language model (LLM) to extract problem-solution pairs described in the documents, presenting them as general purpose-means pairs. Subsequently, we create a design catalog by classifying the problems using similarity-based clustering, enhanced by the LLM’s semantic text similarity capabilities. We demonstrate a case study of creating a design catalog for martial arts devices and generating new design concepts based on the catalog to verify the effectiveness of the proposed method.
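A minimal sketch of the similarity-based clustering step follows; the embedding model and clustering algorithm are assumptions chosen for illustration (the paper relies on an LLM's semantic text similarity rather than these specific tools).

```python
# Illustrative sketch only: cluster extracted problem statements so that similar
# patent problems form one catalog entry. The embedding model and clustering
# method below are assumptions, not the paper's implementation.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

problems = [
    "absorb impact force to protect the user's hand",
    "cushion strikes against a training target",
    "keep the device attached firmly during rapid movement",
    "prevent the strap from loosening while training",
]

embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(problems, normalize_embeddings=True)
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.6, metric="cosine", linkage="average"
).fit_predict(embeddings)

for label, problem in sorted(zip(labels, problems)):
    print(label, problem)  # each cluster becomes one generalized problem in the catalog
```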
The chapter examines the legal regulation and governance of ‘generative AI,’ ‘foundation AI,’ ‘large language models’ (LLMs), and the ‘general-purpose’ AI models of the AI Act. Attention is drawn to two potential sorcerer’s apprentices, namely, in the spirit of J. W. Goethe’s poem, actors unable to control a situation they themselves created. The focus is on developers and producers of such technologies, such as LLMs, which bring about risks of discrimination, information hazards, malicious uses and environmental harms; the analysis also dwells on EU legislators’ normative attempt to govern misuses and overuses of LLMs with the AI Act. Scholars, private companies, and organisations have stressed the limits of such normative attempts. In addition to issues of competitiveness and legal certainty, bureaucratic burdens and standards development, the threat is over-frequent revision of the law to keep pace with technological advances. The chapter illustrates how this threat has accompanied the AI Act since its inception and recommends ways in which the law can address the challenges of technological innovation without being continuously amended.
The utilization of creative design methodologies plays a pivotal role in nurturing innovation within the contemporary competitive market landscape. Although the Theory of Inventive Problem Solving (TRIZ) has been recognized as a potent methodology for generating innovative concepts, its intricate nature and time-consuming learning and application processes pose significant challenges. Furthermore, TRIZ has faced criticism for its limitations in processing design problems and supporting designers in knowledge acquisition. By contrast, Environment-Based Design (EBD), a question-driven design methodology, provides robust methods and approaches for formulating design problems and identifying design conflicts. Large Language Models (LLMs) have also demonstrated the ability to streamline the design process and enhance design productivity. This study proposes an iteration of TRIZ integrated with EBD and supported by an LLM. This LLM-based conceptual design model assists designers throughout the conceptual design process. It begins by using question-asking and answering methods from EBD to gather relevant information. It then follows the EBD methodology to formulate the information into an interaction-dependence network, leading to the identification of the functions and conflicts required by TRIZ. Lastly, TRIZ is used to generate inventive solutions. An evaluation is carried out to measure the effectiveness of the integrated approach. The results indicate that this approach successfully generates questions, processes designers’ responses, produces functional analysis elements, and generates ideas to resolve contradictions.
This article presents a novel conversational artificial intelligence (CAI)-enabled active ideation system as a creative idea generation tool to assist novice product designers in mitigating the initial latency and ideation bottlenecks that are commonly observed. It is a dynamic, interactive, and contextually responsive approach, actively involving a large language model (LLM) from the domain of natural language processing (NLP) in artificial intelligence (AI) to produce multiple statements of potential ideas for different design problems. Integrating such AI models with ideation creates what we refer to as an active ideation scenario, which helps foster continuous dialog-based interaction, context-sensitive conversation, and prolific idea generation. An empirical study was conducted with 30 novice product designers to generate multiple ideas for given problems using traditional methods and the new CAI-based interface. The ideas generated by both methods were qualitatively evaluated by a panel of experts. The findings demonstrated the relative superiority of the proposed tool for generating prolific, meaningful, novel, and diverse ideas. The interface was enhanced by incorporating a prompt-engineered structured dialog style for each ideation stage to make it uniform and more convenient for the product designers. A pilot study was conducted and the resulting responses of such a structured CAI interface were found to be more succinct and aligned toward the subsequent design stage. The article thus established the rich potential of using generative AI (Gen-AI) for the early ill-structured phase of the creative product design process.
Recent studies utilizing AI-driven speech-based Alzheimer’s disease (AD) detection have achieved remarkable success in detecting AD dementia through the analysis of audio and text data. However, detecting AD at the early stage of mild cognitive impairment (MCI) remains a challenging task due to the lack of sufficient training data and imbalanced diagnostic labels. Motivated by recent developments in Generative AI (GAI) and Large Language Models (LLMs), we propose an LLM-based data generation framework that leverages prior knowledge encoded in LLMs to generate new data samples. The framework introduces two data generation strategies, cross-lingual and counterfactual generation, facilitating out-of-distribution learning over new data samples to reduce biases in MCI label prediction caused by the systematic under-representation of MCI subjects in the AD speech dataset. The results demonstrate that the proposed framework improves MCI detection sensitivity and F1-score on average by up to 38% and 31%, respectively. Furthermore, key speech markers for predicting MCI before and after LLM-based data generation were identified, enhancing our understanding of how the data generation approach reduces MCI label prediction biases and shedding new light on speech-based MCI detection under low-data-resource constraints. The proposed methodology offers a generalized data generation framework for improving downstream prediction tasks in cases where limited and/or imbalanced data present significant challenges to AI-driven health decision-making. Future studies can focus on incorporating more datasets and exploiting additional acoustic features for speech-based MCI detection.
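The two generation strategies named above can be sketched as prompt templates sent to an LLM; the model name, prompt wording and placeholder transcript below are assumptions for illustration, not the study's protocol.

```python
# Illustrative sketch only: the cross-lingual and counterfactual augmentation
# strategies expressed as prompt templates. The model ("gpt-4o"), prompts, and
# placeholder transcript are assumptions; the study's actual pipeline differs.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def generate(prompt: str) -> str:
    out = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return out.choices[0].message.content

transcript = "...picture-description transcript from an existing participant..."

# 1) Cross-lingual generation: re-express an existing transcript in another
#    language to enlarge the training distribution.
cross_lingual = generate(
    "Translate this picture-description transcript into Spanish, preserving "
    f"disfluencies and speaking style:\n{transcript}"
)

# 2) Counterfactual generation: rewrite a transcript so it exhibits speech
#    markers associated with the underrepresented MCI class.
counterfactual = generate(
    "Rewrite this transcript to plausibly reflect mild cognitive impairment "
    f"(e.g., more pauses, word-finding difficulty), keeping the content:\n{transcript}"
)
```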
Biomedical entity normalization is critical to biomedical research because the richness of free-text clinical data, such as progress notes, can often be fully leveraged only after translating words and phrases into structured and coded representations suitable for analysis. Large Language Models (LLMs), in turn, have shown great potential and high performance in a variety of natural language processing (NLP) tasks, but their application for normalization remains understudied.
Methods
We applied both proprietary and open-source LLMs in combination with several rule-based normalization systems commonly used in biomedical research. We used a two-step LLM integration approach: (1) using an LLM to generate alternative phrasings of a source utterance, and (2) using an LLM to prune candidate UMLS concepts, with a variety of prompting methods. We measured results by $F_{\beta }$, favoring recall over precision, and by F1 (a sketch of the $F_{\beta }$ metric follows this abstract).
Results
We evaluated a total of 5,523 concept terms and text contexts from a publicly available dataset of human-annotated biomedical abstracts. Incorporating GPT-3.5-turbo increased overall $F_{\beta }$ and F1 in normalization systems +16.5 and +16.2 (OpenAI embeddings), +9.5 and +7.3 (MetaMapLite), +13.9 and +10.9 (QuickUMLS), and +10.5 and +10.3 (BM25), while the open-source Vicuna model achieved +20.2 and +21.7 (OpenAI embeddings), +10.8 and +12.2 (MetaMapLite), +14.7 and +15.0 (QuickUMLS), and +15.6 and +18.7 (BM25).
Conclusions
Existing general-purpose LLMs, both proprietary and open-source, can be leveraged to greatly improve normalization performance using existing tools, with no fine-tuning.
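A brief sketch of the $F_{\beta }$ metric used in the Methods above: with $\beta > 1$, recall is weighted more heavily than precision (the specific $\beta$ value is reported in the paper and not restated here).

```python
# Sketch of the F-beta score: beta > 1 weights recall over precision.
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# beta = 1 gives the familiar F1; beta = 2 counts recall twice as heavily as precision.
print(f_beta(0.6, 0.8, beta=1.0))  # F1
print(f_beta(0.6, 0.8, beta=2.0))  # recall-weighted F_beta
```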
This study compares the design practices and performance of ChatGPT 4.0, a large language model (LLM), against graduate engineering students in a 48-h prototyping hackathon, based on a dataset comprising more than 100 prototypes. The LLM participated through two human participants who executed its instructions and provided objective feedback; the LLM generated ideas autonomously and made all design decisions without human intervention. The LLM exhibited prototyping practices similar to those of the human participants and finished second among six teams, successfully designing and providing building instructions for functional prototypes. The LLM’s concept generation capabilities were particularly strong. However, the LLM prematurely abandoned promising concepts when facing minor difficulties, added unnecessary complexity to designs, and experienced design fixation. Communication between the LLM and participants was challenging due to vague or unclear descriptions, and the LLM had difficulty maintaining continuity and relevance in its answers. Based on these findings, six recommendations for implementing an LLM like ChatGPT in the design process are proposed, including leveraging it for ideation, ensuring human oversight for key decisions, implementing iterative feedback loops, prompting it to consider alternatives, and assigning specific and manageable tasks at a subsystem level.
The development of large language models (LLMs), such as GPT, has enabled the construction of several socialbots, like ChatGPT, that are receiving a lot of attention for their ability to simulate a human conversation. However, the conversation is not guided by a goal and is hard to control. In addition, because LLMs rely more on pattern recognition than deductive reasoning, they can give confusing answers and have difficulty integrating multiple topics into a cohesive response. These limitations often lead the LLM to deviate from the main topic to keep the conversation interesting. We propose AutoCompanion, a socialbot that uses an LLM to translate natural language into predicates (and vice versa) and employs commonsense reasoning based on answer set programming (ASP) to hold a social conversation with a human. In particular, we rely on s(CASP), a goal-directed implementation of ASP, as the backend. This paper presents the framework design and how an LLM is used to parse user messages and generate a response from the s(CASP) engine output. To validate our proposal, we describe (real) conversations in which the chatbot’s goal is to keep the user entertained by talking about movies and books, and s(CASP) ensures (i) correctness of answers, (ii) coherence (and precision) during the conversation, which it dynamically regulates to achieve its specific purpose, and (iii) no deviation from the main topic.
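A conceptual sketch of the pipeline described above: an LLM maps a user message to ASP facts, a goal-directed ASP engine answers a query, and the LLM verbalises the result. The ‘scasp’ command name, predicates and example knowledge base below are assumptions for illustration, not AutoCompanion’s actual code.

```python
# Conceptual sketch only (not the AutoCompanion implementation). The "scasp"
# command-line executable, the predicates, and the toy knowledge base are
# assumptions made for illustration.
import subprocess
import tempfile

KNOWLEDGE = """
recommend(Movie) :- likes_genre(user, G), genre(Movie, G), not seen(user, Movie).
genre(inception, scifi).
genre(arrival, scifi).
"""

def answer_query(user_facts: str, query: str) -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
        f.write(KNOWLEDGE + user_facts + f"\n?- {query}.\n")
        path = f.name
    # Goal-directed ASP solving; the raw answer would then be verbalised back
    # into natural language by the LLM layer (omitted here).
    return subprocess.run(["scasp", path], capture_output=True, text=True).stdout

# Facts the LLM would extract from: "I love sci-fi but I've already seen Inception."
facts = "likes_genre(user, scifi).\nseen(user, inception).\n"
print(answer_query(facts, "recommend(M)"))
```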
Recent advances in machine learning have enabled computers to converse with humans meaningfully. In this study, we propose using this technology to facilitate design conversations in large-scale urban development projects by creating chatbot systems that can automate and streamline information exchange between stakeholders and designers. To this end, we developed and evaluated a proof-of-concept chatbot system that can hold design conversations about a specific construction project and convert those conversations into a list of requirements. Next, in an experiment with 56 participants, we compared the chatbot system to a regular online survey, focusing on user satisfaction and the quality and quantity of collected information. The results revealed that, with regard to user satisfaction, the participants preferred the chatbot experience to a regular survey. We also found that chatbot conversations produced more data than the survey, with a similar rate of novel ideas but fewer themes. Our findings provide robust evidence that chatbots can be effectively used for design discussions in large-scale design projects and offer a user-friendly experience that can help engage people in the design process. Based on this evidence, the use of chatbot systems in interactive design systems can potentially improve design processes and their outcomes by providing a space for meaningful conversations between stakeholders and expanding the reach of design projects.
AI can assist the linguist in doing research on the structure of language. This Element illustrates this possibility by showing how a conversational AI based on a Large Language Model (AI LLM chatbot) can assist the Construction Grammarian, and especially the Frame Semanticist. An AI LLM chatbot is a text-generation system trained on vast amounts of text. To generate text, it must be able to find patterns in the data and mimic some linguistic capacity, at least in the eyes of a cooperative human user. The authors do not focus on whether AIs “understand” language. Rather, they investigate whether AI LLM chatbots are useful tools for linguists. They reframe the discussion from what AI LLM chatbots can do with language to what they can do for linguists. They find that a chatty LLM can labor usefully as an eliciting interlocutor, and present precise, scripted routines for prompting conversational LLMs.