r/LocalLLaMA Daily Update (24h, 2026-04-01 JST)
Top concrete r/LocalLLaMA updates from the last 24 hours: notable new model drops (1-bit and compact agentic), open-source agent tooling momentum, and practical optimization resources.
Window: last 24 hours (reported on 2026-04-01 JST)
Models
- PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs — new-model announcement built around claims of 1-bit inference efficiency.
- IBM and Apache 2? Who Would Have Thought - Granite 4 3B Vision — Granite 4 3B Vision release, notable for its Apache 2.0 open license.
- Liquid AI releases LFM2.5-350M -> Agentic loops at 350M parameters — compact model release positioned for lightweight agentic workflows.
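The Bonsai thread centers on 1-bit weights. As a rough intuition for what that means, here is a minimal sketch of the common sign-plus-scale recipe (each weight stored as ±1 with one per-row scale); PrismML's actual quantization method is not described in the post summary, so the names and scheme below are illustrative only.

```python
import random

def quantize_1bit(row):
    # Common 1-bit recipe: keep only the sign of each weight (1 bit)
    # plus a single per-row "absmean" scale. Illustrative, not
    # PrismML's published method.
    scale = sum(abs(w) for w in row) / len(row)
    signs = [1.0 if w >= 0 else -1.0 for w in row]
    return signs, scale

def dequantize_1bit(signs, scale):
    # Reconstruct an approximation of the original row.
    return [s * scale for s in signs]

random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(8)]
signs, scale = quantize_1bit(row)
approx = dequantize_1bit(signs, scale)
mse = sum((w - a) ** 2 for w, a in zip(row, approx)) / len(row)
print(f"MSE after 1-bit round-trip: {mse:.4f}")
```

Storage drops from 32 bits per weight to roughly 1 bit plus one scale per row, which is where the efficiency claims in these announcements come from.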
Tools/Frameworks
- Claude Code’s source just leaked — I extracted its multi-agent orchestration system into an open-source framework that works with any LLM — high-engagement release thread for a reusable multi-agent orchestration framework.
- Built a human-in-the-loop approval API for local and hosted agents, stops agents from taking irreversible actions without greenlight — safety-focused agent-control tooling that blocks irreversible actions until a human approves.
- open source deterministic replay engine for AI agents, zero api cost replays — debugging/repro tooling aimed at deterministic agent reruns.
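The approval-API idea above can be sketched in a few lines: route every agent action through a gate that lets safe actions pass and holds irreversible ones for an explicit human decision. The linked project's real interface is not shown in the post, so all names here are hypothetical.

```python
# Hypothetical sketch of a human-in-the-loop approval gate; the
# posted API's actual design may differ.
def approval_gate(action_name, irreversible, approve):
    """Allow-or-block decision: irreversible actions need explicit approval."""
    if not irreversible:
        return True                  # safe actions pass through unprompted
    return approve(action_name)      # defer to a human (or policy) callback

def run_agent_step(action, irreversible, execute, approve):
    if approval_gate(action, irreversible, approve):
        return execute(action)
    return f"blocked: {action} awaiting human approval"

# Example: a destructive action is held because no human has green-lit it.
result = run_agent_step(
    "delete_prod_db", irreversible=True,
    execute=lambda a: f"ran {a}",
    approve=lambda a: False,         # stand-in for a real approval prompt
)
print(result)  # blocked: delete_prod_db awaiting human approval
```

In a real deployment the `approve` callback would be an async request to a reviewer (Slack message, web UI, etc.) rather than a lambda.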
Resources
- ByteShape Qwen 3.5 9B: A Guide to Picking the Best Quant for Your Hardware — practical quant-selection guide for deployment tradeoffs.
- Build script for llama.cpp for ROCm (including Mi50) using the Rock artifacts — actionable setup resource for AMD/ROCm users.
- Claude Code running locally with Ollama — local integration walkthrough thread for coding-agent workflows.
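Guides like the quant-selection one above usually start from the same back-of-envelope arithmetic: weight memory is parameters times bits per weight, plus headroom for the KV cache and runtime buffers. A minimal estimator, using an assumed 1.2x overhead factor that is my own rough heuristic, not the guide's exact method:

```python
def est_vram_gb(params_b, bits_per_weight, overhead=1.2):
    """Rough VRAM estimate for a quantized model.

    params_b: parameter count in billions; bits_per_weight: effective
    bits of the chosen quant. The overhead factor (assumed here) covers
    KV cache and runtime buffers; real usage varies with context length.
    """
    weight_gb = params_b * bits_per_weight / 8  # GB of weights per B params
    return weight_gb * overhead

# e.g. a 9B model at ~4.5 effective bits/weight (a Q4_K_M-style quant):
print(f"{est_vram_gb(9, 4.5):.1f} GB")  # ≈ 6.1 GB
```

Comparing that estimate against your card's free VRAM is the quickest first-pass filter before digging into per-quant quality tradeoffs.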