A foundation model is a large-scale AI model (typically a deep neural network) pretrained on broad data with self-supervised or unsupervised objectives. Because it can be adapted cheaply to a wide range of downstream tasks, it serves as base infrastructure for many applications.
Core characteristics of foundation models include:
| Feature | Description |
|---|---|
| Scale | Typically have billions (or more) of parameters and are trained on vast corpora (text, code, images, etc.) |
| General-Purpose / Transferable | Unlike narrow AI models, they learn broad patterns and can be fine-tuned or prompted for many downstream tasks (chat, translation, image recognition, robotics, etc.) |
| Self-Supervised Training | Training is usually self-supervised or unsupervised, e.g. predicting masked/next tokens in text, or using contrastive losses for images (see the sketch below the table) |
| Emergent Capabilities | When scaled up, they often display surprising emergent abilities (e.g. in-context learning, few-shot reasoning) that were not explicitly programmed |
| Multimodality | Many recent models handle multiple input/output modalities (text, images, audio, etc.) |
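To make the self-supervised training row concrete, here is a minimal sketch of a next-token-prediction objective in PyTorch. The vocabulary size, dimensions, and random "tokens" are toy placeholders; real foundation models apply the same idea with a full Transformer stack and vastly larger data.

```python
import torch
import torch.nn as nn

# Toy next-token prediction: the targets are just the inputs shifted by one,
# so no human annotation is needed (self-supervision).
vocab_size, d_model, seq_len, batch = 1000, 64, 16, 4

embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # text as token ids
hidden = embed(tokens)                                    # stand-in for a Transformer stack
logits = lm_head(hidden)                                  # (batch, seq_len, vocab_size)

# Predict token t+1 from position t: shift inputs vs. targets by one.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())
```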
Most foundation models are deep neural networks built on the Transformer architecture. Self-attention has become the de facto backbone for language and vision models because it handles large, high-dimensional inputs and scales well with data and compute.
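As a rough illustration of the self-attention operation at the core of the Transformer backbone, here is a minimal single-head sketch (dimensions are arbitrary toy values, and real models add multiple heads, masking, and learned projections inside larger blocks):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # pairwise token similarities
    weights = F.softmax(scores, dim=-1)       # attention distribution per token
    return weights @ v                        # weighted mix of value vectors

seq_len, d_model = 8, 32
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([8, 32])
```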
The standard adaptation method is fine-tuning on smaller, task-specific datasets. More recently, lightweight adaptation (prompt tuning, adapters, LoRA) and in-context learning (prompt engineering) have enabled even lower-cost specialization.
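A minimal sketch of the LoRA idea mentioned above: instead of updating a large pretrained weight matrix, a small low-rank correction is trained while the base weights stay frozen. The rank, shapes, and scaling here are illustrative, not taken from any specific model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (LoRA-style)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Output = frozen W x + scaled low-rank correction (B A) x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

Because only A and B are trained, the number of trainable parameters drops by orders of magnitude compared with full fine-tuning of the base layer.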
Let’s look at some major foundation models:
GPT (OpenAI): The Generative Pre-trained Transformer series (GPT-2, GPT-3, GPT-4) are text-based LLMs trained on vast corpora. GPT-3 (2020) has 175B parameters and can perform many tasks via prompting alone. GPT-4 (2023) expands capabilities (multimodal text+image input) and underlies ChatGPT. These are prototypical FMs for NLP.
BERT (Google): Bidirectional Encoder Representations from Transformers (2018) pioneered deep bidirectional pretraining with a masked language modeling objective. Although comparatively small (~0.34B parameters), it popularized the pretrain/fine-tune transfer-learning recipe for NLP. Many derivatives (RoBERTa, DeBERTa) followed.
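As a hedged illustration of the pretrain/fine-tune recipe BERT popularized, the sketch below loads a pretrained checkpoint via Hugging Face transformers and attaches a fresh classification head; the toy batch stands in for a real task-specific dataset, and the exact API may differ across library versions.

```python
# pip install transformers torch   (sketch; requires downloading the checkpoint)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2   # new, randomly initialized classifier head
)

# One forward/backward step on a toy labeled pair; real fine-tuning loops
# over a task-specific dataset (e.g. sentiment-labeled sentences).
batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])
loss = model(**batch, labels=labels).loss
loss.backward()
print(loss.item())
```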
PaLM (Google): The Pathways Language Model is a Google LLM family. The original PaLM (2022) has 540B parameters; PaLM 2 (2023) is a reportedly ~340B-parameter model with improved multilingual and reasoning skills. Extensions include PaLM-E (a vision-language version for robotics) and AudioPaLM (speech), demonstrating multimodal expansion.
DALL·E (OpenAI): A text-to-image generator pretrained on text-image pairs (the original DALL·E was Transformer-based; later versions use diffusion). It generates novel images from text prompts. Stable Diffusion (Stability AI) is a similar open image model. Vision-language models like CLIP (OpenAI) are pretrained on captioned images with a contrastive loss, enabling zero-shot vision tasks, and DeepMind's Flamingo fuses language and vision in one model. (These are all foundation models for vision or cross-modal tasks.)
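To show what "zero-shot vision tasks" looks like in practice, here is a rough sketch of zero-shot image classification with CLIP through the Hugging Face transformers wrapper. The image path is a placeholder, and the API details may vary between library versions.

```python
# pip install transformers pillow torch   (sketch; downloads the CLIP checkpoint)
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")   # placeholder image path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity scores become probabilities over candidate labels,
# with no task-specific training ("zero-shot" classification).
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```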
LLaMA (Meta): Large Language Model Meta AI (2023) is Meta's LLM series (LLaMA 1 and 2), released with openly available weights. LLaMA 2 offers models from 7B to 70B parameters under a permissive community license. More recently, Llama 3.1 (2024) scales to 405B parameters with open weights. These open models have driven a large body of community research.
Examples span domains: GPT-NeoX and BLOOM (open LLMs from EleutherAI and the Hugging Face-led BigScience project), Gato (DeepMind's multi-task model for text, vision, and RL), GLaM, Megatron, Alpaca, and FLAN (instruction-tuned models), plus specialized ones like Med-PaLM (medical QA) or AlphaFold (a protein-structure predictor, not usually called an FM but a similarly massive pretrained model in biology).
Models like Meta's Llama 3.1 (405B), Mistral Large 2 (123B), and Google's Gemma 2 have been released with open or permissive licenses. Open weights let anyone inspect and modify the model, enable on-premise use (running models locally), and allow customization without reliance on a vendor API. Open models democratize AI: smaller companies and researchers can build products or fine-tune models without needing huge GPU clusters.
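One practical consequence of open weights is that inference can run entirely on local hardware. Below is a rough sketch using Hugging Face transformers; the Llama 2 checkpoint name is illustrative (its weights are gated behind Meta's license), and any openly accessible checkpoint can be substituted.

```python
# pip install transformers accelerate torch   (sketch; downloads weights on first run)
from transformers import pipeline

# Runs fully on-premise once the weights are downloaded; no API calls involved.
# "meta-llama/Llama-2-7b-chat-hf" is gated; swap in any open checkpoint you can access.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",   # spread across available GPUs/CPU
)

print(generator("Explain what a foundation model is in one sentence.",
                max_new_tokens=60)[0]["generated_text"])
```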
Many leading FMs remain proprietary. OpenAI's GPT-4, Anthropic's Claude, Google's Gemini (formerly Bard), and Microsoft's Azure-hosted models are not open-weight. The companies argue that this protects safety (by controlling misuse) and is needed for commercial viability.