Large language models (LLMs) are language models consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. LLMs emerged around 2018 and perform well at a wide variety of tasks.
- Text summarization: Generating concise summaries of long texts or documents.
- Text generation: Creating texts on various topics, genres, or styles.
- Sentiment analysis: Detecting the emotions or opinions expressed in texts.
- Content creation: Producing content such as articles, blogs, captions, headlines, etc..
- Chatbots, virtual assistants, and conversational AI: Engaging in natural and human-like conversations with users.
Following LLMs are publicly available for anyone to use, study, or modify. They are usually trained on large datasets of text from various sources and domains, and can perform various natural language tasks.
LLaMA stands for Language-agnostic, Large-scale Meaning Association. It can learn from large-scale multilingual aligned data and perform tasks such as machine translation, cross-lingual natural language inference, and zero-shot transfer.
- It is a large language model created by Meta (formerly Facebook) to help researchers advance their work in natural language processing.
- It is available at several sizes: 7B, 13B, 33B, and 65B parameters. The models were trained on text from 20 languages with Latin and Cyrillic alphabets.
- It is designed to be a foundation model, which means it can be fine-tuned for a variety of tasks, versus a fine-tuned model that is designed for a specific task.
- It is released under a noncommercial license focused on research use cases. The code and the models are available on GitHub.
Stanford Alpaca is an instruction-following LLaMA model created by Stanford researchers that can follow natural language instructions.
- It is based on the LLaMA 7B model, which is a large language model released by Meta.
- It was fine-tuned on 52K instruction-following demonstrations generated using OpenAI’s text-davinci-003 model. It shows many behaviors similar to text-davinci-003, but it is also smaller and cheaper to reproduce.
- It is intended and licensed for research use only. The code and the data are available on GitHub, but the model weights are not yet released.
FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
- It provides the weights, training code, and evaluation code for state-of-the-art models such as Vicuna and FastChat-T5.
- Vicuna is a chat assistant fine-tuned from LLaMA on user-shared conversations by LMSYS1. It can impress GPT-4 with 90% ChatGPT quality.
- FastChat-T5 is a chat assistant fine-tuned from FLAN-T5 by LMSYS3. It is compatible with commercial usage.
- It supports a distributed multi-model serving system with Web UI and OpenAI-compatible RESTful APIs.
- It introduces a Chatbot Arena where you can compare open large language models side-by-side or in battles.
OpenChatKit is an open-source project that provides a powerful base to create both specialized and general purpose chatbots for various applications. It consists of four key components:
- An instruction-tuned large language model, a 20B parameter model fine-tuned for chat from EleutherAI’s GPT-NeoX with over 43M instructions.
- Customization recipes to fine-tune the model for different domains and tasks.
- An extensible retrieval system to augment the model with live-updating information from custom repositories, such as Wikipedia or web search APIs.
- A moderation model to filter inappropriate or out-of-domain questions.
GPT4All is an open-source project that provides a powerful base to create both specialized and general purpose chatbots that run locally on your CPU. It uses the latest advancements in AI research, such as transformer models, to achieve state-of-the-art performance. GPT4All consists of several components:
- A large language model fine-tuned for chat from EleutherAI’s GPT-NeoX with over 43M instructions. The model is available in different sizes: 6B, 7B, 13B, and 20B parameters.
- A desktop chat client that allows you to run any GPT4All model natively on your home desktop with auto-updating features. The client is available for Windows, Mac OS X, and Ubuntu.
- A set of bindings for Python, TypeScript, and GoLang that enable you to integrate a locally running GPT4All model into any codebase.
- A datalake for donated GPT4All interaction data that can be used for research and improvement purposes.
RWKV (6.8k ⭐)
RWKV language model is a novel large language model architecture that combines the best features of RNNs and transformers. It has the following characteristics:
- It can be trained like a transformer with massive parallelization, but it uses a sort of attention that scales linearly with the number of tokens.
- It can be inferred like an RNN with a state, which allows it to handle long or infinite context lengths and save memory.
- It uses alternating time-mix and channel-mix layers to generate R (target), W (src, target), K (src), and V (src) vectors, which are then combined to produce logits.
- It has models with different sizes: 6B, 7B, 13B, and 20B parameters. The largest model, RWKV 20B, achieves transformer-level performance on several natural language tasks.