In contrast, adapting an existing foundation model for a specific task or using it directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.[3][4] Beyond text, foundation models have been developed across a range of modalities, including DALL-E and Flamingo[5] for images, MusicGen[6] for music, and RT-2[7] for robotic control. Foundation models are also being developed for fields such as astronomy,[8] radiology,[9] genomics,[10] music,[11] coding,[12] time-series forecasting,[13] mathematics,[14] and chemistry.

Advances in computer parallelism (e.g., CUDA GPUs), new developments in neural network architecture (e.g., Transformers), and the increased use of training data with minimal supervision all contributed to the rise of foundation models.[23] Relative to most prior work on deep learning, these language models demonstrated the potential of training on much larger web-sourced datasets using self-supervised objectives (e.g., predicting the next word in a large corpus of text).[26]

The "dangerous capabilities" of foundation models stem from their accidental or intentional misuse, which, in conjunction with their powerful nature, can lead to severe harms. General-purpose AI systems are often characterized by large size, opacity, and potential for emergence, all of which can create unintended harms.[37] Training foundation models also often risks violating user privacy, as private data can be disclosed, collected, or used in ways beyond the stated scope.

Most foundation models are too large to run within a single accelerator's memory, and the initial training process requires costly amounts of computational resources. GPUs are the most common choice of compute hardware for machine learning because of their large memory capacity and high computational throughput. Acquiring enough GPUs of the requisite compute efficiency is a challenge for many foundation model developers, one that has grown more acute as demand for such hardware has increased. In particular, a model's scale is defined by its training compute, dataset size, and number of parameters, all of which exhibit a power-law relationship with end performance.

A variety of methods (e.g., prompting, in-context learning, fine-tuning, and LoRA) offer different tradeoffs between the cost of adaptation and the extent to which models are specialized.[51] Since a foundation model's utility depends both on its general capabilities and on the performance of fine-tuned applications, evaluation must cover both.[52]

Foundation models' general capabilities allow them to fulfill a unique role in the AI ecosystem,[53] fueled by many upstream and downstream technologies. Upstream, development relies on data providers (e.g., Scale AI,[55] Surge[56]) and compute providers (e.g., Amazon Web Services, Google Cloud, Microsoft Azure).

As the size and scope of foundation models grow, larger quantities of internet-scraped data become necessary, raising the likelihood of including biased or toxic data.[59] To address the low-quality data that arises with unsupervised training, some foundation model developers have turned to manual filtering.

Downstream, people can access applications built on top of foundation models to serve their various needs, allowing a single foundation model to power applications that reach a wide audience.
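The self-supervised next-word-prediction objective mentioned above can be illustrated with a minimal sketch. The snippet below uses PyTorch with a toy stand-in model; the model, names, and hyperparameters are illustrative assumptions rather than details drawn from this article. The core idea it shows is that the text itself supplies the training labels, since the target at each position is simply the next token.

    # Schematic next-token prediction objective (illustrative; the toy model is an assumption).
    # The model predicts token t+1 from tokens 1..t, so raw text provides the labels
    # and no manual annotation is required.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model = 1000, 64
    model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                          nn.Linear(d_model, vocab_size))    # stand-in for a Transformer

    tokens = torch.randint(0, vocab_size, (4, 128))           # a batch of token sequences
    inputs, targets = tokens[:, :-1], tokens[:, 1:]           # shift by one position
    logits = model(inputs)                                    # (batch, seq_len - 1, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                           # gradients for self-supervised training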
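The power-law relationship between scale and performance mentioned above is commonly summarized by empirical scaling laws. One widely cited form, a Chinchilla-style fit whose symbols are used here for illustration and are not taken from this article's sources, writes the expected loss L as a function of parameter count N and number of training tokens D:

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Here E is an irreducible loss term, and A, B, α, and β are constants fitted to experimental runs; loss falls predictably, but with diminishing returns, as either N or D grows.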
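Of the adaptation methods mentioned above, LoRA illustrates the cost/specialization tradeoff concretely: the pre-trained weights are frozen and only a small low-rank correction is trained. The sketch below is a minimal PyTorch illustration under assumed names and hyperparameters (LoRALinear, rank, alpha); it is not an implementation taken from this article's sources.

    # Minimal LoRA-style adapter sketch (illustrative; names and hyperparameters are assumptions).
    # Instead of updating the full weight matrix W, the base layer is frozen and a low-rank
    # correction B @ A is learned, so only a small number of parameters are trained.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():                   # freeze the pre-trained weights
                p.requires_grad = False
            in_f, out_f = base.in_features, base.out_features
            self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # low-rank factor A
            self.B = nn.Parameter(torch.zeros(out_f, rank))         # low-rank factor B (starts at zero)
            self.scale = alpha / rank

        def forward(self, x):
            # Output = frozen base projection + scaled low-rank correction
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    # Usage: wrap an existing projection layer; only A and B receive gradients.
    layer = LoRALinear(nn.Linear(768, 768))
    y = layer(torch.randn(2, 768))

Because only the two small matrices A and B are updated, adaptation touches a tiny fraction of the model's parameters, which is part of why adapting an existing foundation model is far cheaper than training one from scratch.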