r/LocalLLaMA Daily Update (24h, 2026-03-19 JST)
Top concrete releases and practical updates from r/LocalLLaMA in the last 24 hours, prioritized over Q&A and memes.
Window: 2026-03-18 07:00 → 2026-03-19 07:00 JST
Models
- Nemotron 3 Nano 4B announced — a compact hybrid model targeting efficient local deployment.
- MiniMax M2.7 now available on OpenRouter — expands practical access for testing via an OpenAI-compatible API (see the sketch after this list).
- Qwen3.5-Claude-4.6-Opus-Reasoning-Distilled-v2 posted — a community-shared distilled reasoning variant.
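Because OpenRouter exposes an OpenAI-compatible API, trying a newly listed model such as MiniMax M2.7 takes only a few lines. A minimal sketch, assuming the slug minimax/minimax-m2.7 (an unverified guess; check OpenRouter's model list for the real ID):

```python
# Minimal sketch: querying a newly listed model through OpenRouter's
# OpenAI-compatible endpoint. The model slug below is an assumption;
# look up the real ID at https://openrouter.ai/models.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # OpenRouter endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],  # your OpenRouter key
)

resp = client.chat.completions.create(
    model="minimax/minimax-m2.7",  # assumed slug, verify first
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```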
Tools/Frameworks
- Arandu v0.6.0 released — versioned tool update.
- Llama Bro: an Android SDK for on-device inference built on llama.cpp — brings local-LLM integration to mobile apps.
- Vibepod added local LLM support via Ollama and vLLM for Claude Code/Codex workflows — connects local backends to coding-agent tooling (see the local-backend sketch after this list).
- afm mlx for macOS: new version released with feature additions.
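Context for the Vibepod item above: Ollama and vLLM both serve OpenAI-compatible endpoints, which is what makes swapping a local backend into coding-agent tooling straightforward. A minimal sketch of that pattern, assuming default ports and example model names (Vibepod's own config format is not shown in the post):

```python
# Generic local-backend pattern: point any OpenAI-compatible client at
# Ollama (port 11434) or a vLLM server (port 8000, via `vllm serve`).
# Model names below are examples; use whatever you have pulled/served.
from openai import OpenAI

backends = {
    "ollama": (OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"), "qwen3:4b"),
    "vllm":   (OpenAI(base_url="http://localhost:8000/v1",  api_key="EMPTY"),  "Qwen/Qwen3-4B"),
}

for name, (client, model) in backends.items():
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write a one-line docstring for binary search."}],
    )
    print(f"{name}: {resp.choices[0].message.content}")
```

The api_key values are placeholders: both servers accept any string unless you configure authentication.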
Resources
- Qwen3.5-122B-A10B GPTQ Int4 on 4× Radeon AI PRO R9700 (vLLM ROCm) with config + real numbers — a useful deployment benchmark for AMD multi-GPU setups (see the vLLM launch sketch after this list).
- Qianfan-OCR 4B write-up with OmniDocBench score + multilingual coverage + serving notes — practical model-card-style metrics for document AI.
- Qwen3.5-27B 8-bit vs 16-bit comparison across 10 runs — quantization tradeoff data for local inference decisions (see the comparison sketch after this list).
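For the R9700 post, a minimal sketch of what a 4-GPU GPTQ Int4 vLLM setup looks like in Python. The Hugging Face repo id is a placeholder, and the post's actual ROCm environment settings and batch configuration are not reproduced here:

```python
# Sketch: vLLM with Int4 GPTQ weights sharded across 4 GPUs via tensor
# parallelism. The repo id is a placeholder, not a confirmed upload.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-122B-A10B-GPTQ-Int4",  # placeholder repo id
    quantization="gptq",                       # Int4 GPTQ weights
    tensor_parallel_size=4,                    # one shard per R9700
    gpu_memory_utilization=0.90,
)

outputs = llm.generate(
    ["Explain tensor parallelism in one paragraph."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```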
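And for the 8-bit vs 16-bit comparison, a crude harness in the same spirit: identical prompts against two locally served variants, comparing throughput. Ports, the served model name, and the measurement itself are assumptions; the original post's methodology may differ:

```python
# Sketch: compare two served quantizations of the same model on identical
# prompts. Endpoints and model name are assumptions, not the post's setup.
import time
from openai import OpenAI

PROMPTS = ["Prove that sqrt(2) is irrational.", "Summarize attention in 3 bullets."]
ENDPOINTS = {
    "8bit":  OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY"),
    "16bit": OpenAI(base_url="http://localhost:8002/v1", api_key="EMPTY"),
}

for tag, client in ENDPOINTS.items():
    start, tokens = time.perf_counter(), 0
    for prompt in PROMPTS:
        r = client.chat.completions.create(
            model="qwen3.5-27b",  # assumed served name
            messages=[{"role": "user", "content": prompt}],
            max_tokens=256,
            temperature=0,        # deterministic-ish, easier to diff outputs
        )
        tokens += r.usage.completion_tokens
    elapsed = time.perf_counter() - start
    print(f"{tag}: {tokens / elapsed:.1f} tok/s over {len(PROMPTS)} prompts")
```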