In contrast, adapting an existing foundation model for a specific task or using it directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.[3][4] Beyond text, foundation models have been developed across a range of modalities, including DALL-E and Flamingo[5] for images, MusicGen[6] for music, and RT-2[7] for robotic control. Foundation models are also being developed for fields such as astronomy,[8] radiology,[9] genomics,[10] music,[11] coding,[12] time-series forecasting,[13] mathematics,[14] and chemistry.

Advances in computer parallelism (e.g., CUDA GPUs), new developments in neural network architecture (e.g., Transformers), and the increased use of training data with minimal supervision all contributed to the rise of foundation models.[23] Relative to most prior work on deep learning, these language models demonstrated the potential of training on much larger web-sourced datasets using self-supervised objectives (e.g., predicting the next word in a large corpus of text).[26]

The "dangerous capabilities" of foundation models stem from their accidental or intentional misuse, which, in conjunction with their powerful nature, can lead to severe harms. General-purpose AI systems are often characterized by large size, opacity, and potential for emergence, all of which can create unintended harms.[37] Training foundation models also often risks violating user privacy, as private data can be disclosed, collected, or used in ways beyond the stated scope.

Most foundation models are too large to run within a single accelerator's memory, and the initial training process requires costly amounts of computational resources. GPUs are the most common choice of compute hardware for machine learning because of their large memory capacity and high computational throughput. Acquiring enough GPUs of the requisite compute efficiency is a challenge for many foundation model developers, one that has grown more acute as demand for such hardware has increased. In particular, a model's scale is defined by its training compute, dataset size, and number of parameters, all of which exhibit a power-law relationship with end performance.

A variety of methods (e.g., prompting, in-context learning, fine-tuning, and LoRA) offer different tradeoffs between the cost of adaptation and the extent to which models are specialized.[51] Since a foundation model's utility depends both on its general capabilities and on the performance of fine-tuned applications, evaluation must cover both.[52]

Foundation models' general capabilities allow them to fulfill a unique role in the AI ecosystem,[53] fueled by many upstream and downstream technologies. Upstream, development relies on data providers (e.g., Scale AI,[55] Surge[56]) and compute providers (e.g., Amazon Web Services, Google Cloud, Microsoft Azure).

As the size and scope of foundation models grow, larger quantities of internet-scraped data become necessary, raising the likelihood of including biased or toxic data.[59] To address the low-quality data that arises with unsupervised training, some foundation model developers have turned to manual filtering.

Downstream, people can access applications built on top of foundation models to serve their various needs, allowing a single foundation model to power applications that reach a wide audience.
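The self-supervised next-word-prediction objective mentioned above can be illustrated with a minimal sketch. The snippet below uses PyTorch with a toy stand-in model; the model, names, and hyperparameters are illustrative assumptions rather than details drawn from this article. The core idea it shows is that the text itself supplies the training labels, since the target at each position is simply the next token.

    # Schematic next-token prediction objective (illustrative; the toy model is an assumption).
    # The model predicts token t+1 from tokens 1..t, so raw text provides the labels
    # and no manual annotation is required.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model = 1000, 64
    model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                          nn.Linear(d_model, vocab_size))    # stand-in for a Transformer

    tokens = torch.randint(0, vocab_size, (4, 128))           # a batch of token sequences
    inputs, targets = tokens[:, :-1], tokens[:, 1:]           # shift by one position
    logits = model(inputs)                                    # (batch, seq_len - 1, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                           # gradients for self-supervised training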
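The power-law relationship between scale and performance mentioned above is commonly summarized by empirical scaling laws. One widely cited form, a Chinchilla-style fit whose symbols are used here for illustration and are not taken from this article's sources, writes the expected loss L as a function of parameter count N and number of training tokens D:

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Here E is an irreducible loss term, and A, B, α, and β are constants fitted to experimental runs; loss falls predictably, but with diminishing returns, as either N or D grows.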
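Of the adaptation methods mentioned above, LoRA illustrates the cost/specialization tradeoff concretely: the pre-trained weights are frozen and only a small low-rank correction is trained. The sketch below is a minimal PyTorch illustration under assumed names and hyperparameters (LoRALinear, rank, alpha); it is not an implementation taken from this article's sources.

    # Minimal LoRA-style adapter sketch (illustrative; names and hyperparameters are assumptions).
    # Instead of updating the full weight matrix W, the base layer is frozen and a low-rank
    # correction B @ A is learned, so only a small number of parameters are trained.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():                   # freeze the pre-trained weights
                p.requires_grad = False
            in_f, out_f = base.in_features, base.out_features
            self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # low-rank factor A
            self.B = nn.Parameter(torch.zeros(out_f, rank))         # low-rank factor B (starts at zero)
            self.scale = alpha / rank

        def forward(self, x):
            # Output = frozen base projection + scaled low-rank correction
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    # Usage: wrap an existing projection layer; only A and B receive gradients.
    layer = LoRALinear(nn.Linear(768, 768))
    y = layer(torch.randn(2, 768))

Because only the two small matrices A and B are updated, adaptation touches a tiny fraction of the model's parameters, which is part of why adapting an existing foundation model is far cheaper than training one from scratch.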