r/LocalLLaMA Daily Update (24h, 2026-03-25 JST)
Top concrete r/LocalLLaMA updates from the last 24 hours: notable model releases, tooling/security updates, and practical benchmark resources.
Window: last 24 hours (reported on 2026-03-25 JST)
Models
- New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B — major open-weights drop spanning both ultra-large and smaller deployable tiers.
- MolmoWeb 4B/8B — new model family release with lightweight and mid-size variants.
- Devstral-Small-2-24B fine-tuned on Claude 4.6 Opus reasoning traces [GGUF Q4+Q5] — concrete community fine-tune + quantized artifacts for local inference (loading sketch after this list).
- Mistral-Small-4-119B-2603-heretic — new derivative release for users tracking Mistral-based open checkpoints.
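For the GGUF artifacts above, a minimal loading sketch with llama-cpp-python; the filename and generation settings are placeholders, not the actual artifact names from the post:

```python
# Minimal sketch: loading a Q4 GGUF quant locally with llama-cpp-python.
# The model path below is a hypothetical placeholder; substitute whatever
# GGUF file you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./devstral-small-2-24b.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,       # context window; raise if your RAM/VRAM allows
    n_gpu_layers=-1,  # offload all layers to GPU; use 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain tail-call optimization."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```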
Tools/Frameworks
- text-generation-webui v4.2 released: drive Claude Code with local models via a new Anthropic-compatible API, smaller portable builds, UI theme improvements, and more — substantial version update with integration and packaging improvements (client sketch after this list).
- mcp-scan: security scanner that audits MCP server configs across 10 AI clients — new security utility focused on MCP deployment hygiene.
- ACP Router, a small bridge/proxy for connecting ACP-based agents to OpenAI-compatible tools — interoperability bridge release for agent-tool wiring.
- SparkRun & Spark Arena — finally an "easy button" for running vLLM on DGX Spark; tooling that streamlines Spark-focused local/server setups (baseline vLLM sketch after this list).
- LiteLLM 1.82.7 and 1.82.8 on PyPI are reported compromised — do not update! High-priority supply-chain alert affecting a widely used LLM gateway layer (version check after this list).
- LM Studio may be infected with sophisticated malware — high-visibility community security report; verify before applying updates.
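For the text-generation-webui item, a minimal sketch of talking to a local Anthropic-compatible endpoint with the official anthropic SDK; the base_url, port, and model name are assumptions, so check the v4.2 docs for the actual endpoint:

```python
# Minimal sketch: pointing the official anthropic SDK at a local
# Anthropic-compatible server. base_url and model name are assumptions,
# not confirmed text-generation-webui v4.2 values.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://127.0.0.1:5000",  # assumed local endpoint
    api_key="not-needed-locally",      # local servers typically ignore this
)

msg = client.messages.create(
    model="local-model",  # placeholder; use the name of the loaded model
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(msg.content[0].text)
```

Claude Code itself can typically be pointed at the same endpoint via its ANTHROPIC_BASE_URL environment variable.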
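For the SparkRun/Spark Arena item, the baseline offline vLLM API that such launchers wrap; the model id is a placeholder, and a working vLLM build for your hardware is assumed (the setup step the tooling reportedly automates):

```python
# Minimal sketch of vLLM's offline inference API; the model id below
# is a placeholder, not anything specific to SparkRun or DGX Spark.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # any HF model id you have locally
params = SamplingParams(temperature=0.7, max_tokens=128)

for out in llm.generate(["Summarize the CAP theorem in two sentences."], params):
    print(out.outputs[0].text)
```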
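For the LiteLLM alert, a quick self-contained check of the installed version; the bad-version list comes from the post title, so verify against official advisories before acting:

```python
# Check whether the locally installed litellm matches one of the
# reportedly compromised releases named in the post.
from importlib.metadata import PackageNotFoundError, version

COMPROMISED = {"1.82.7", "1.82.8"}  # versions from the post title

try:
    installed = version("litellm")
except PackageNotFoundError:
    print("litellm is not installed.")
else:
    if installed in COMPROMISED:
        print(f"WARNING: litellm {installed} is a reportedly compromised build.")
    else:
        print(f"litellm {installed} is not on the reported-compromised list.")
```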
Resources
- 150+ benchmarks run across a range of Macs — practical benchmark dataset for Apple hardware sizing decisions.
- SWE-bench results for different KV cache quantization levels — concrete eval resource on quality/perf trade-offs in KV quantization (configuration sketch after this list).
- PSA: Two env vars that stop your model server from eating all your RAM and getting OOM-killed — operational tuning reference with immediate production relevance.
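For the KV cache quantization item, a minimal sketch of one such setting via llama-cpp-python; q8_0 for both caches is just one common middle ground, and the model path is a placeholder:

```python
# Minimal sketch: quantizing the KV cache with llama-cpp-python, the
# kind of setting the SWE-bench comparison above evaluates.
# type_k/type_v take ggml type ids; the model path is hypothetical.
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./model.Q4_K_M.gguf",     # hypothetical local GGUF
    n_ctx=16384,
    type_k=llama_cpp.GGML_TYPE_Q8_0,      # quantize K cache to q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,      # quantize V cache to q8_0
    flash_attn=True,                      # V-cache quantization requires flash attention in llama.cpp
)
```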