r/LocalLLaMA Daily Update (24h, 2026-03-25 JST)
Top concrete r/LocalLLaMA updates from the last 24 hours: notable model releases, tooling/security updates, and practical benchmark resources.
Window: last 24 hours (reported on 2026-03-25 JST)
Models
- New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B — major open-weights drop spanning both ultra-large and smaller deployable tiers.
- MolmoWeb 4B/8B — new model family release with lightweight and mid-size variants.
- Devstral-Small-2-24B fine-tuned on Claude 4.6 Opus reasoning traces [GGUF Q4+Q5] — concrete community fine-tune + quantized artifacts for local inference (loading sketch after this list).
- Mistral-Small-4-119B-2603-heretic — new derivative release for users tracking Mistral-based open checkpoints.
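For the GGUF artifacts above, a minimal loading sketch with llama-cpp-python; the filename and generation settings are placeholders, not the actual artifact names from the post:

```python
# Minimal sketch: loading a Q4 GGUF quant locally with llama-cpp-python.
# The model path below is a hypothetical placeholder; substitute whatever
# GGUF file you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./devstral-small-2-24b.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,       # context window; raise if your RAM/VRAM allows
    n_gpu_layers=-1,  # offload all layers to GPU; use 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain tail-call optimization."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```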
Tools/Frameworks
- text-generation-webui v4.2 released: drive Claude Code with local models via a new Anthropic-compatible API, smaller portable builds, UI theme improvements, and more — substantial version update with integration and packaging improvements (client sketch after this list).
- mcp-scan: security scanner that audits MCP server configs across 10 AI clients — new security utility focused on MCP deployment hygiene.
- ACP Router, a small bridge/proxy for connecting ACP-based agents to OpenAI-compatible tools — interoperability bridge release for agent-tool wiring.
- SparkRun & Spark Arena — finally an "easy button" for running vLLM on DGX Spark; tooling that streamlines Spark-focused local/server setups (baseline vLLM sketch after this list).
- LiteLLM 1.82.7 and 1.82.8 on PyPI are reported compromised — do not update! High-priority supply-chain alert affecting a widely used LLM gateway layer (version check after this list).
- LM Studio may be infected with sophisticated malware — high-visibility community security report; verify before applying updates.
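For the text-generation-webui item, a minimal sketch of talking to a local Anthropic-compatible endpoint with the official anthropic SDK; the base_url, port, and model name are assumptions, so check the v4.2 docs for the actual endpoint:

```python
# Minimal sketch: pointing the official anthropic SDK at a local
# Anthropic-compatible server. base_url and model name are assumptions,
# not confirmed text-generation-webui v4.2 values.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://127.0.0.1:5000",  # assumed local endpoint
    api_key="not-needed-locally",      # local servers typically ignore this
)

msg = client.messages.create(
    model="local-model",  # placeholder; use the name of the loaded model
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(msg.content[0].text)
```

Claude Code itself can typically be pointed at the same endpoint via its ANTHROPIC_BASE_URL environment variable.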
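For the SparkRun/Spark Arena item, the baseline offline vLLM API that such launchers wrap; the model id is a placeholder, and a working vLLM build for your hardware is assumed (the setup step the tooling reportedly automates):

```python
# Minimal sketch of vLLM's offline inference API; the model id below
# is a placeholder, not anything specific to SparkRun or DGX Spark.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # any HF model id you have locally
params = SamplingParams(temperature=0.7, max_tokens=128)

for out in llm.generate(["Summarize the CAP theorem in two sentences."], params):
    print(out.outputs[0].text)
```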
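For the LiteLLM alert, a quick self-contained check of the installed version; the bad-version list comes from the post title, so verify against official advisories before acting:

```python
# Check whether the locally installed litellm matches one of the
# reportedly compromised releases named in the post.
from importlib.metadata import PackageNotFoundError, version

COMPROMISED = {"1.82.7", "1.82.8"}  # versions from the post title

try:
    installed = version("litellm")
except PackageNotFoundError:
    print("litellm is not installed.")
else:
    if installed in COMPROMISED:
        print(f"WARNING: litellm {installed} is a reportedly compromised build.")
    else:
        print(f"litellm {installed} is not on the reported-compromised list.")
```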
Resources
- 150+ benchmarks run across a range of Macs — practical benchmark dataset for Apple hardware sizing decisions.
- SWE-bench results for different KV cache quantization levels — concrete eval resource on quality/perf trade-offs in KV quantization (configuration sketch after this list).
- PSA: Two env vars that stop your model server from eating all your RAM and getting OOM-killed — operational tuning reference with immediate production relevance.
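For the KV cache quantization item, a minimal sketch of one such setting via llama-cpp-python; q8_0 for both caches is just one common middle ground, and the model path is a placeholder:

```python
# Minimal sketch: quantizing the KV cache with llama-cpp-python, the
# kind of setting the SWE-bench comparison above evaluates.
# type_k/type_v take ggml type ids; the model path is hypothetical.
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./model.Q4_K_M.gguf",     # hypothetical local GGUF
    n_ctx=16384,
    type_k=llama_cpp.GGML_TYPE_Q8_0,      # quantize K cache to q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,      # quantize V cache to q8_0
    flash_attn=True,                      # V-cache quantization requires flash attention in llama.cpp
)
```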