r/LocalLLaMA Daily Update (24h, 2026-03-26 JST)
Top concrete r/LocalLLaMA updates: notable model/runtime releases, tooling/security updates, and practical resources for local AI builders.
Window: last 24 hours (reported on 2026-03-26 JST)
Models
- Liquid AI’s LFM2-24B-A2B running at ~50 tokens/second in a web browser on WebGPU. A notable browser-runtime demo for a 24B-class model running fully locally.
- Run Qwen3.5-4B on AMD NPU. A practical on-device deployment datapoint for low-power local inference.
- Qwen3.5-35B-A3B-Claude-Opus-4.6-HauhauCS-Uncensored-GGUF + merging workflow script. A community GGUF release with a reproducible merge workflow (a generic merge-config sketch follows this list).
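The release ships its own merging workflow script, which is not reproduced here. As a generic illustration of what such a workflow typically looks like, below is a minimal mergekit-style SLERP merge driven from Python; the model IDs, layer ranges, and interpolation factor are placeholders, not values from the post.

```python
# Illustrative sketch only: a minimal mergekit SLERP merge, NOT the
# workflow script shipped with the release. Model IDs, layer ranges,
# and the interpolation factor t are placeholders.
import subprocess
import textwrap

config = textwrap.dedent("""\
    merge_method: slerp
    base_model: org/base-model              # placeholder
    slices:
      - sources:
          - model: org/base-model           # placeholder
            layer_range: [0, 48]
          - model: org/donor-model          # placeholder
            layer_range: [0, 48]
    parameters:
      t: 0.5          # 0.0 = all base, 1.0 = all donor
    dtype: bfloat16
""")

with open("merge-config.yml", "w") as f:
    f.write(config)

# mergekit-yaml reads the config and writes merged weights to ./merged;
# GGUF conversion (e.g. via llama.cpp's convert script) is a separate step.
subprocess.run(["mergekit-yaml", "merge-config.yml", "./merged"], check=True)
```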
Tools/Frameworks
- PSA: the litellm PyPI package was compromised. If you use DSPy, Cursor, or any project that pulls in litellm, audit your installed versions now; this is a high-priority supply-chain security alert (a version-check sketch follows this list).
- Open-source load balancer for Ollama instances. An infra utility for scaling self-hosted Ollama backends (a minimal round-robin sketch follows this list).
- Practical comparison: Ollama vs vLLM vs LM Studio for production use (ops perspective). Operator-focused trade-off notes across common local serving stacks.
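For the litellm advisory above, the fastest first step is checking what each environment actually has installed. A minimal sketch, assuming you substitute the advisory’s real known-bad versions for the placeholder set below:

```python
# Check the installed litellm version against a known-bad list.
# BAD_VERSIONS is a placeholder; fill in the versions named in the advisory.
from importlib.metadata import version, PackageNotFoundError

BAD_VERSIONS = {"0.0.0"}  # placeholder, NOT the real compromised versions

try:
    installed = version("litellm")
except PackageNotFoundError:
    print("litellm is not installed in this environment")
else:
    if installed in BAD_VERSIONS:
        print(f"litellm {installed}: flagged; remove it and reinstall from a clean index")
    else:
        print(f"litellm {installed}: not in the known-bad list (still verify transitive deps)")
```

Since litellm often arrives transitively (e.g. via DSPy), run this inside every virtualenv you deploy from, not just the obvious ones.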
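On the Ollama load balancer: for readers who only want the core idea rather than the linked project, client-side round-robin over several instances is a few lines. The backend addresses and model name below are placeholders.

```python
# Minimal sketch of client-side round-robin over several Ollama instances.
# This is NOT the linked project; the backends and model are placeholders.
import itertools
import requests

OLLAMA_BACKENDS = itertools.cycle([
    "http://127.0.0.1:11434",   # Ollama's default port
    "http://10.0.0.2:11434",    # placeholder second host
])

def generate(prompt: str, model: str = "llama3") -> str:
    backend = next(OLLAMA_BACKENDS)
    # /api/generate is Ollama's standard completion endpoint
    resp = requests.post(
        f"{backend}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Why is the sky blue?"))
```

A production balancer would add health checks and retry-on-failure; this sketch only shows the routing idea.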
Resources
- Introducing ARC-AGI-3. A major benchmark/discussion thread relevant for reasoning-eval tracking.
- Has anyone implemented Google’s TurboQuant paper yet? An active thread surfacing implementation status and broader quantization interest (a baseline sketch for context follows this list).
- Level1Techs’ initial review of the Intel Arc B70 for Qwen and more (the reviewer has four B70 Pro cards). An early hardware review signal for local-inference builders evaluating the B70.
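For context on the TurboQuant thread: the baseline that any new quantization method gets compared against is plain round-to-nearest (RTN). The sketch below is that textbook baseline only, not TurboQuant itself, whose actual algorithm is what the thread is asking about.

```python
# Context only: textbook round-to-nearest (RTN) 4-bit quantization,
# NOT TurboQuant; the paper's method is what the thread asks about.
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4):
    """Symmetric per-tensor RTN: scale onto the int grid, round, clip."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for signed 4-bit
    scale = max(float(np.abs(w).max()) / qmax, 1e-12)  # guard all-zero input
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, scale = quantize_rtn(w)
print("max abs error:", float(np.abs(w - dequantize(q, scale)).max()))
```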