LLM Hub

LLM Hub is an open-source Android app that brings the power of Large Language Models directly to your mobile device. Experience AI conversations with Gemma, Llama, and Phi models, all running locally for maximum privacy and offline accessibility.

Key Features

Everything you need for private, powerful AI conversations on your Android device

Multiple LLM Models
Support for Gemma-3, Llama-3.2, and Phi-4 text models, plus the multimodal Gemma-3n series
Privacy First
Complete privacy - your conversations never leave your device
Vision Support
Multimodal models that understand text, images, and audio input
Writing Aid
AI-powered writing assistance: summarize, expand, rewrite, improve text, or generate code
Translator
Translate text, images (OCR), and audio across 50+ languages - works offline
Audio Transcription
Convert speech to text with on-device processing using Gemma-3n models
Text-to-Speech
TTS with auto-readout for AI responses during conversations
Image Generator
Create images from text prompts using Stable Diffusion 1.5 with swipeable gallery for variations
Scam Detector
AI-powered fraud detection for messages, emails, and images with risk assessment
GPU Acceleration
Optimized performance on supported devices (8GB+ RAM recommended)
Offline Usage
Chat without an internet connection after downloading a model - complete offline functionality
Direct Downloads
Download models directly from HuggingFace or import custom MediaPipe models

Supported Models

Choose from a variety of state-of-the-art LLM models optimized for mobile devices

Gemma-3 1B Series
Google
Optimized text models for mobile devices with GPU acceleration support
INT4 - 2k context • INT8 - 1.2k/2k/4k context
Text generation • GPU acceleration
Llama-3.2 Series
Meta
Meta's powerful models optimized for on-device inference
1B model - 1.2k context • 3B model - 1.2k context
Text generation • CPU only
Phi-4 Mini
Microsoft
Microsoft's efficient model for advanced reasoning with GPU support on 8GB+ devices
INT8 quantization - 1.2k context
Text generation • GPU acceleration (8GB+ RAM)
Gemma-3n Series
Google
Vision & Audio
Multimodal models with text, vision, and audio capabilities
E2B model • E4B model
Text generation • Image understanding • Audio transcription • GPU acceleration (8GB+ RAM)
Absolute Reality SD1.5
Stable Diffusion
Image Generation
Image generation model for creating images from text prompts
MNN (CPU) • QNN (NPU)
Image generation • CPU/NPU acceleration
Gecko-110M
Google
Embeddings (RAG)
Compact embedding model for RAG memory system
64D-1024D embeddings
Text embeddings • RAG support
EmbeddingGemma-300M
Google
Embeddings (RAG)
High-quality text embeddings for enhanced RAG retrieval
High-quality embeddings
Text embeddings • RAG support

Advanced Capabilities

Powerful features that enhance your AI experience while maintaining complete privacy

RAG Memory System

On-device Retrieval-Augmented Generation with local embeddings and semantic search

Global context memory • Document chunking • Persistent embeddings • No external endpoints
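
As a rough sketch of how such a memory can work (an illustration, not LLM Hub's actual implementation), retrieval amounts to embedding the query locally and ranking stored chunks by cosine similarity before adding the best matches to the prompt:

    import kotlin.math.sqrt

    // Hypothetical stored chunk: text plus its locally computed embedding vector.
    data class MemoryChunk(val text: String, val embedding: FloatArray)

    // Cosine similarity between two embedding vectors of equal length.
    fun cosine(a: FloatArray, b: FloatArray): Float {
        var dot = 0f; var na = 0f; var nb = 0f
        for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
        return dot / (sqrt(na) * sqrt(nb) + 1e-8f)
    }

    // Returns the top-k chunks most relevant to the query embedding;
    // these would be prepended to the prompt as retrieved context.
    fun retrieve(queryEmbedding: FloatArray, memory: List<MemoryChunk>, k: Int = 3): List<MemoryChunk> =
        memory.sortedByDescending { cosine(queryEmbedding, it.embedding) }.take(k)
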
Web Search Integration

Built-in DuckDuckGo search for fact-checking and real-time information

Content-aware searches • Instant Answer API • Optional augmentation • Privacy-focused
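
Below is a minimal sketch of querying DuckDuckGo's public Instant Answer API from Kotlin; LLM Hub's real request handling, parsing, and error logic may differ.

    import java.net.HttpURLConnection
    import java.net.URL
    import java.net.URLEncoder
    import org.json.JSONObject

    // Must be called off the main thread and requires the INTERNET permission.
    fun instantAnswer(query: String): String? {
        val url = URL(
            "https://api.duckduckgo.com/?q=" +
                URLEncoder.encode(query, "UTF-8") +
                "&format=json&no_html=1"
        )
        val conn = url.openConnection() as HttpURLConnection
        return try {
            val body = conn.inputStream.bufferedReader().readText()
            // "AbstractText" holds the short summary answer when one is available.
            JSONObject(body).optString("AbstractText").ifBlank { null }
        } finally {
            conn.disconnect()
        }
    }
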
Custom Model Import

Import your own MediaPipe-compatible models

.task format • .litertlm format • MediaPipe Model Maker • AI Edge Converter
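
As an illustration of the import step (function names and storage location are assumptions, not LLM Hub's actual code), a user-picked file could be validated by extension and copied into app-private storage:

    import android.content.Context
    import android.net.Uri
    import java.io.File

    // Sketch: copy a model file selected via the system file picker into the app's model folder.
    fun importModel(context: Context, uri: Uri, fileName: String): File? {
        if (!fileName.endsWith(".task") && !fileName.endsWith(".litertlm")) return null

        val target = File(File(context.filesDir, "models").apply { mkdirs() }, fileName)
        context.contentResolver.openInputStream(uri)?.use { input ->
            target.outputStream().use { output -> input.copyTo(output) }
        } ?: return null
        return target
    }
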
Smart AI Tools

Comprehensive suite of AI-powered productivity tools

Writing assistance • Multi-language translation • Audio transcription • Scam detection

Technology Stack

Built with modern Android development tools and cutting-edge AI technology

Kotlin
Modern Android development language
Jetpack Compose
Modern UI toolkit for Android
MediaPipe & LiteRT
On-device AI runtime (LiteRT was formerly TensorFlow Lite)
INT4/INT8 Quantization
Optimized model compression for mobile (see the size example after this list)
GPU Acceleration
LiteRT delegates for performance (GPU where supported, XNNPACK on CPU)
HuggingFace
Model source and hosting
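
To put the INT4/INT8 quantization entry above into perspective: a 1-billion-parameter model takes roughly 2 GB of weight storage at 16-bit precision, about 1 GB at INT8 (one byte per parameter), and about 0.5 GB at INT4 (half a byte per parameter), which is what makes models of this size practical on phones.
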

Requirements

Android 8.0+
API level 26 or higher
2GB+ RAM
6GB+ recommended, 8GB+ for Phi-4 GPU (see the sketch after this list)
1GB - 5GB Storage
Depending on selected models
Internet
Required only for model downloads
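
As a rough sketch of how an app might gate GPU inference on the RAM recommendation above (the helper name and threshold are illustrative, not LLM Hub's actual logic):

    import android.app.ActivityManager
    import android.content.Context

    // Returns true if the device has enough total RAM for GPU-backed inference.
    fun supportsGpuInference(context: Context): Boolean {
        val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
        val memInfo = ActivityManager.MemoryInfo()
        am.getMemoryInfo(memInfo)
        val totalGb = memInfo.totalMem / (1024.0 * 1024.0 * 1024.0)
        return totalGb >= 8.0  // e.g. Phi-4 GPU mode recommends roughly 8 GB+ RAM
    }
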

How It Works

LLM Hub uses Google's MediaPipe framework with LiteRT to run quantized AI models directly on your Android device

1. Download pre-optimized .task model files from HuggingFace
2. Load them into MediaPipe's LLM Inference API
3. Process your input locally using the CPU or GPU
4. Generate responses without sending data to external servers
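
For developers, the sketch below shows roughly what this pipeline looks like in Kotlin using MediaPipe's LLM Inference API; the model path and token budget are placeholders and may not match LLM Hub's actual code.

    import android.content.Context
    import com.google.mediapipe.tasks.genai.llminference.LlmInference

    // Minimal sketch of on-device generation with MediaPipe's LLM Inference API.
    fun generateLocally(context: Context, prompt: String): String {
        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/gemma3-1b-it-int4.task") // a downloaded .task file
            .setMaxTokens(1024)                                         // prompt + response budget
            .build()

        val llm = LlmInference.createFromOptions(context, options)     // loads the quantized model
        try {
            return llm.generateResponse(prompt)                        // runs entirely on-device
        } finally {
            llm.close()
        }
    }
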

Ready to Experience Private AI?

Download LLM Hub today and start having AI conversations that stay completely private on your device.