llava

Solid

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.

AI & Automation | 191,515 stars | 33,299 forks | Updated today | MIT


Quality Score: 93/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100
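
The page does not say how the sub-scores combine into the overall figure. As a rough check, the sketch below assumes a plain weighted average. That is an assumption, not the site's documented formula. Under it the result is 89 rather than the 93 shown, so the displayed score presumably uses a different rule.

```python
# Assumption: overall score = weighted average of the sub-scores.
# The listing does not document its actual formula.
# Each entry is (weight in percent, sub-score), copied from the breakdown.
breakdown = {
    "Stars": (20, 100),
    "Recency": (20, 100),
    "Frontmatter": (20, 70),
    "Documentation": (15, 100),
    "Issue Health": (10, 50),
    "License": (10, 100),
    "Description": (5, 100),
}

total_weight = sum(weight for weight, _ in breakdown.values())
weighted = sum(weight * score for weight, score in breakdown.values()) / total_weight

print(f"total weight: {total_weight}%")
print(f"weighted average: {weighted}")
```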

Skill Content

# LLaVA - Large Language and Vision Assistant

Open-source vision-language model for conversational image understanding.

## When to use LLaVA

**Use when:**

- Building vision-language chatbots
- Visual question answering (VQA)
- Image description and captioning
- Multi-turn image conversations
- Visual instruction following
- Document understanding with images

**Metrics**:

- **23,000+ GitHub stars**
- GPT-4V level capabilities (targeted)
- Apache 2.0 License
- Multiple model sizes (7B-34B params)

**Use alternatives instead**:

- **GPT-4V**: Highest quality, API-based
- **CLIP**: Simple zero-shot classification
- **BLIP-2**: Better for captioning only
- **Flamingo**: Research, not open-source

## Quick start

### Installation

```bash
# Clone repository
git clone https://github.com/haotian-liu/LLaVA
cd LLaVA

# Install
pip install -e .
```

### Basic usage

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from PIL import Image
import torch

# Load model
model_path = "liuhaotian/llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)

# Load image
image = Image.open("image.jpg")
image_tensor = process_images([image], image_process...
```
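
The usage snippet imports `tokenizer_image_token`, `IMAGE_TOKEN_INDEX`, and `DEFAULT_IMAGE_TOKEN`, but is cut off before it uses them. As a rough illustration of the idea those names suggest (an image placeholder in the prompt is replaced by a sentinel id that marks where image features go), here is a toy sketch. It does not call LLaVA and is not its implementation. The whitespace tokenizer, both helper functions, and the prompt format are hypothetical, and the two constant values are what I believe `llava.constants` defines, so treat them as assumptions.

```python
# Toy illustration only: NOT LLaVA's tokenizer_image_token implementation.
IMAGE_TOKEN_INDEX = -200          # assumed sentinel id (believed to match llava.constants)
DEFAULT_IMAGE_TOKEN = "<image>"   # assumed placeholder string (believed to match llava.constants)


def toy_tokenize(text, vocab):
    """Hypothetical whitespace tokenizer: assigns each new word the next integer id."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def toy_tokenize_with_image(prompt, vocab):
    """Tokenize the text around each image placeholder and splice in the sentinel id."""
    ids = []
    for i, chunk in enumerate(prompt.split(DEFAULT_IMAGE_TOKEN)):
        if i > 0:
            # A placeholder sat between the previous chunk and this one.
            ids.append(IMAGE_TOKEN_INDEX)
        ids.extend(toy_tokenize(chunk, vocab))
    return ids


vocab = {}
# Illustrative prompt layout, not necessarily the template LLaVA uses.
prompt = f"USER: {DEFAULT_IMAGE_TOKEN}\nWhat is in this image? ASSISTANT:"
input_ids = toy_tokenize_with_image(prompt, vocab)
print(input_ids)  # [0, -200, 1, 2, 3, 4, 5, 6]
```

The negative sentinel cannot collide with a real vocabulary id, so downstream code can locate the image position unambiguously.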

Details

Author: NousResearch
Repository: NousResearch/hermes-agent
Created: 10 months ago
Last Updated: today
Language: Python
License: MIT
