r/LocalLLaMA Daily Update (24h, 2026-03-18 JST)
Top concrete releases and practical updates from r/LocalLLaMA in the last 24 hours, prioritized over Q&A and memes.
Window: 2026-03-17 07:00 → 2026-03-18 07:00 JST
Models
- Drummer model pack release: Skyfall 31B v4.1, Valkyrie 49B v2.1, Anubis 70B v1.2, Anubis Mini 8B v1 — a coordinated multi-model version bump posted as a single generation drop.
- Holotron-12B released (open-source multimodal model, NVIDIA collaboration) — concrete model launch with throughput-focused positioning for computer-use workloads.
Tools/Frameworks
- Introducing Unsloth Studio (open-source web UI to train/run LLMs) — major new tooling release, also heavily discussed in a follow-up thread.
- HF one-liner with llmfit + llama.cpp server + Pi agent bootstrap — notable setup automation update for local deployments.
- mlx-tune: fine-tune LLMs on Mac (SFT/DPO/GRPO/Vision) — new Mac-focused tuning framework with Unsloth-compatible API.
- Dynamic expert caching PR in vLLM — framework-level performance optimization in progress.
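The vLLM PR itself isn't reproduced here, but the underlying idea of dynamic expert caching for MoE models is simple to sketch: keep only the most recently routed experts resident, and evict the least-recently-used one on a miss. Below is a minimal, generic LRU sketch under assumed names (`ExpertCache`, `load_fn`); it is an illustration of the technique, not the vLLM implementation:

```python
from collections import OrderedDict

class ExpertCache:
    """LRU cache for MoE expert weights: keep only the hottest experts
    resident, evict the least-recently-used expert when full."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn        # hypothetical loader called on a miss
        self.cache = OrderedDict()    # expert_id -> weights (insertion = LRU order)
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)   # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least-recently-used expert
            self.cache[expert_id] = self.load_fn(expert_id)
        return self.cache[expert_id]

# Toy usage: 8 experts, room for 4; the router keeps picking a hot subset.
cache = ExpertCache(capacity=4, load_fn=lambda i: f"weights[{i}]")
for expert_id in [0, 1, 2, 3, 0, 1, 4, 0, 1, 2]:
    cache.get(expert_id)
print(cache.hits, cache.misses)  # → 4 6
```

A real implementation would overlap weight transfers with compute and track per-expert routing frequency rather than pure recency, but the cache structure is the same.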
Resources
- Qwen3.5-35B-3AB benchmark on 8GB VRAM laptop (26 t/s at 100k context) — practical local-inference benchmark data.
- Qwen3.5 MLX vs GGUF performance on Mac Studio M3 Ultra (512GB) — useful backend comparison resource for Apple Silicon users.
- Mistral-Small-4-119B-2603 NVFP4 inference numbers on RTX Pro 6000 — concrete throughput snapshot on workstation-class GPU.
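Two back-of-the-envelope helpers for reading benchmark figures like the ones above (a rough sketch: real memory use also includes KV cache, activations, and runtime overhead, which these estimates ignore):

```python
def weight_memory_gb(num_params_billion: float, bits_per_weight: float) -> float:
    """Approximate GB needed for model weights alone at a given quantization:
    1e9 params * (bits / 8) bytes, expressed in GB."""
    return num_params_billion * bits_per_weight / 8

def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock seconds to decode num_tokens at a steady rate."""
    return num_tokens / tokens_per_sec

# A 119B-parameter model at 4 bits per weight needs roughly 59.5 GB for weights.
print(f"{weight_memory_gb(119, 4):.1f} GB")   # → 59.5 GB
# At 26 t/s (the laptop benchmark above), a 1000-token answer takes ~38 s.
print(f"{generation_seconds(1000, 26):.0f} s")
```

These are order-of-magnitude checks only; actual NVFP4/GGUF footprints depend on block scales, embedding precision, and per-tensor quantization choices.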