llava

Solid

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.

AI & Automation · 191,515 stars · 33,299 forks · Updated today · MIT

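Quality Score: 93/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): score not shown
Documentation (weight 15%): 100
Issue Health (weight 10%): score not shown
License (weight 10%): 100
Description (weight 5%): 100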

Skill Content

# LLaVA - Large Language and Vision Assistant

Open-source vision-language model for conversational image understanding.

## When to use LLaVA

**Use when:**

- Building vision-language chatbots
- Visual question answering (VQA)
- Image description and captioning
- Multi-turn image conversations
- Visual instruction following
- Document understanding with images

**Metrics**:

- **23,000+ GitHub stars**
- GPT-4V level capabilities (targeted)
- Apache 2.0 License
- Multiple model sizes (7B-34B params)

**Use alternatives instead**:

- **GPT-4V**: Highest quality, API-based
- **CLIP**: Simple zero-shot classification
- **BLIP-2**: Better for captioning only
- **Flamingo**: Research, not open-source

## Quick start

### Installation

```bash
# Clone repository
git clone https://github.com/haotian-liu/LLaVA
cd LLaVA

# Install
pip install -e .
```

### Basic usage

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from PIL import Image
import torch

# Load model
model_path = "liuhaotian/llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)

# Load image
image = Image.open("image.jpg")
image_tensor = process_images([image], image_process...
```
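The Basic usage snippet is cut off in this listing before the prompt is built. As a separate, library-free illustration of the idea behind the imported `tokenizer_image_token` and `IMAGE_TOKEN_INDEX` names, the sketch below tokenizes the text around an image placeholder and marks the image position with a sentinel id. It is not the LLaVA implementation: the placeholder string `<image>`, the sentinel value `-200`, the prompt wording, and the `toy_tokenize` and `tokenize_with_image` helpers are all assumptions made for illustration.

```python
# Toy illustration only: NOT the LLaVA implementation.
IMAGE_TOKEN = "<image>"    # assumed placeholder string
IMAGE_TOKEN_INDEX = -200   # assumed sentinel id; any id outside the vocabulary works here


def toy_tokenize(text, vocab):
    """Hypothetical whitespace tokenizer: map each word to a stable integer id."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def tokenize_with_image(prompt, vocab):
    """Tokenize the text around each image placeholder and mark its position with a sentinel."""
    ids = []
    for i, chunk in enumerate(prompt.split(IMAGE_TOKEN)):
        if i > 0:
            # One sentinel per placeholder, inserted between the surrounding text chunks.
            ids.append(IMAGE_TOKEN_INDEX)
        ids.extend(toy_tokenize(chunk, vocab))
    return ids


vocab = {}
# Assumed prompt layout, for illustration only.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
input_ids = tokenize_with_image(prompt, vocab)
print(input_ids)
```

The sentinel marks where image features would be spliced into the token sequence; how the real library performs that step is outside this sketch.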

Details

Author: NousResearch
Repository: NousResearch/hermes-agent
Created: 10 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

OpenAI (AI) · Anthropic (AI)

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

llava

9,609 Updated 1 month ago

Orchestra-Research

AI & Automation Featured

llava

27,984 Updated today

davila7

AI & Automation Featured

blip-2-vision-language

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.

27,984 Updated today

davila7