Flamingo

Flamingo (NeurIPS 2022) is DeepMind's visual-language model that processes images, videos, and text together. It bridges...
Version: 1.0
Released: 04/29/2022

Architecture

  • parameters: 80 billion (80B)
  • context_length: 2048
  • training_data: Large-scale multimodal web data (interleaved images/videos and text)
  • design: frozen pretrained vision encoder and language model bridged by cross-attention layers
  • inference: few-shot visual-language generation from interleaved prompts
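
The cross-attention bridging mentioned above can be sketched as follows. This is a minimal, illustrative single-head sketch in NumPy, not DeepMind's implementation: the real model uses multi-head attention, a Perceiver Resampler, and feed-forward sublayers, but the key idea shown here is real — visual features enter the frozen language model through gated cross-attention whose tanh gate starts at zero, so training begins from the unmodified language model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(text, visual, Wq, Wk, Wv, alpha):
    """One gated cross-attention step (illustrative sketch).

    text:   (T, d) language-model hidden states (queries)
    visual: (V, d) visual features from the vision encoder (keys/values)
    alpha:  learnable gate; initialised to 0 so tanh(alpha) == 0 and the
            frozen language model is unchanged at the start of training
    """
    q, k, v = text @ Wq, visual @ Wk, visual @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return text + np.tanh(alpha) * (attn @ v)  # gated residual connection

rng = np.random.default_rng(0)
d, T, V = 16, 4, 8
text = rng.standard_normal((T, d))     # stand-in LM hidden states
visual = rng.standard_normal((V, d))   # stand-in visual features
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = gated_cross_attention(text, visual, Wq, Wk, Wv, alpha=0.0)
```

With `alpha=0.0` the output equals the text input exactly, which is why the frozen language model's behaviour is preserved at initialisation.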

Capabilities

  • Few-shot image and video understanding
  • Accepts interleaved images or video frames and text to perform tasks such as visual question answering and captioning
  • State-of-the-art few-shot results
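
The few-shot interface works by interleaving support examples with a final query in a single prompt. The helper below is a hypothetical sketch of that format: the `<image>` marker stands in for visual features (the images themselves are fed through the vision encoder separately), and the `"Output:"` prefix and `build_few_shot_prompt` name are illustrative assumptions, not the model's actual API.

```python
def build_few_shot_prompt(shot_answers):
    """Build an interleaved few-shot prompt (illustrative sketch).

    shot_answers: target texts for the support images, in order;
    each "<image>" marker is paired with one image passed to the
    vision encoder separately.
    """
    body = "".join(f"<image>Output: {ans}\n" for ans in shot_answers)
    # Final <image> is the query; the model generates its continuation.
    return body + "<image>Output:"

prompt = build_few_shot_prompt(
    ["A cat sleeping on a sofa.", "A dog catching a frisbee."]
)
```

For two support examples the prompt contains three `<image>` markers, and generation continues from the trailing `Output:`.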

Benchmarks

  • VQA: SOTA (few-shot)
  • Image captioning: SOTA (few-shot)

Safety

  • Prone to visual biases
  • No public alignment details available

Deployment

  • regions: private
  • hosting: No public API
  • integrations: research use only

Tags

vision-language, multimodal, few-shot, research
