Hacker News

Latest

I Don't Vibe Code

2026-05-20 @ 16:56:42Points: 75Comments: 76

After Town Bans Flock, Councilmember Crashes Out, Proposes Internet, Phone Ban

2026-05-20 @ 16:55:17Points: 93Comments: 81

Ask HN: Shouldn't Google need to give a public statement about Railway incident?

2026-05-20 @ 16:50:54Points: 127Comments: 79

Everytime I read something like this , I get nervous about the cloud providers and Google. Since this is a relatively high profile customer standards, shouldn't they explain what caused them to suspend the account ?

Apparently Google hates us now

2026-05-20 @ 16:27:25Points: 300Comments: 146

OpenAI Is Preparing to File for an IPO Soon

2026-05-20 @ 16:24:42Points: 61Comments: 59

Show HN: Lance – image/video generation and understanding in one model

2026-05-20 @ 15:45:32Points: 32Comments: 8

- Code: https://github.com/bytedance/Lance

- Homepage: https://lance-project.github.io/

- Paper: https://arxiv.org/abs/2605.18678

- Model: https://huggingface.co/bytedance-research/Lance

p.s. Lance is a research project, not a polished product. The model was trained using fewer than 128 GPUs.

SBCL: the ultimate assembly code breadboard (2014)

2026-05-20 @ 15:39:48Points: 84Comments: 5

Formal Verification Gates for AI Coding Loops

2026-05-20 @ 15:25:45Points: 65Comments: 8

Stable Audio 3

2026-05-20 @ 15:10:05Points: 63Comments: 13

Testing distributed systems with AI agents

2026-05-20 @ 14:40:42Points: 58Comments: 8

Victory: Tennessee man jailed 37 days for Trump meme wins $835,000 settlement

2026-05-20 @ 14:30:47Points: 547Comments: 336

Goodbye Visa and Mastercard: 130M Europeans switching to sovereign payment

2026-05-20 @ 13:02:30Points: 811Comments: 647

Meta blocks human rights accounts from reaching audiences in Saudi Arabia, UAE

2026-05-20 @ 12:43:41Points: 801Comments: 342

Saying Goodbye to Asm.js

2026-05-20 @ 12:01:56Points: 237Comments: 108

Google's AI is being manipulated. The search giant is quietly fighting back

2026-05-20 @ 10:57:09Points: 194Comments: 143

Map of Metal

2026-05-20 @ 10:47:20Points: 338Comments: 119

Qwen3.7-Max: The Agent Frontier

2026-05-20 @ 10:35:02Points: 495Comments: 190

No way to parse integers in C (2022)

2026-05-20 @ 10:28:05Points: 59Comments: 86

Incident Report: May 19, 2026 – GCP Account Suspension

2026-05-20 @ 08:37:55Points: 301Comments: 178

Incident Report: Railway Blocked by Google Cloud [resolved] - https://news.ycombinator.com/item?id=48201484

Everything in C is undefined behavior

2026-05-20 @ 06:07:22Points: 439Comments: 589

Infomaniak transitions to a foundation model to protect user data privacy

2026-05-20 @ 05:43:51Points: 153Comments: 40

Japan is gripped by mass allergies. A 1950s project is to blame

2026-05-20 @ 01:43:06Points: 311Comments: 145

FiveThirtyEight articles on the Internet Archive

2026-05-20 @ 01:34:19Points: 352Comments: 78

Gemini 3.5 Flash

2026-05-19 @ 17:43:45Points: 928Comments: 634

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

2026-05-19 @ 12:23:07Points: 635Comments: 229

I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.

What it does:

- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware

- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it

- Ships with an eval harness and interactive dashboard so you can reproduce every number

I wanted to run a handful of always-on agentic systems for my portfolio, didn't want to pay cloud frontier costs, and immediately hit the compounding math problem on local models. 90% per-step accuracy sounds great, but with a 5-step workflow that's a 40% failure rate. No existing framework seemed to address this mechanical reliability issue - they all seemed tailor-made for cloud frontier.

Demo video: https://youtu.be/MzRgJoJAXGc (side-by-side: same model, same task, with and without Forge guardrails)

The paper (accepted to ACM CAIS '26, presenting May 26-29 in San Jose) covers the peer-reviewed findings across 97 model/backend configurations, 18 scenarios, 50 runs each. Key numbers:

- Ministral 8B with Forge: 99.3%. Claude Sonnet with Forge: 100%. The gap between a free local 8B model on a $600 GPU and a frontier API is less than 1 point.

- The same 8B local model with Forge (99.3%) outperforms Claude Sonnet without guardrails (87.2%) - an 8B model with framework support beats the best result you can get through frontier API alone.

- Error recovery scores 0% for every model tested - local and frontier - without the retry mechanism. Not a capability gap, an architectural absence.

I'm currently using this for my home assistant running on Ministral 14B-Reasoning, and for my locally hosted agentic coding harness (8B managed to contribute to the codebase!).

The guardrail stack has five layers, each independently toggleable. The two that carry the most weight (per ablation study with McNemar's test): retry nudges (24-49 point drops when disabled) and error recovery (~10 point drops, significant for every model tested). Step enforcement is situational - only fires for models with weaker sequencing discipline. Rescue parsing and context compaction showed no significance in the eval but are retained for production workloads where they activate once in a while.

One thing I really didn't expect: the serving backend matters. Same Mistral-Nemo 12B weights produce 7% accuracy on llama-server with native function calling and 83% on Llamafile in prompt mode. A 75-point swing from infrastructure alone. I don't think anyone's published this because standard benchmarks don't control for serving backend.

Another surprise: there's no distinction in current LLM tool-calling between "the tool ran successfully and returned data" and "the tool ran successfully but found nothing." Both return a value, the orchestrator marks the step complete, and bad data cascades downstream. It's the equivalent of HTTP having 200 but no 404. Forge adds this as a new exception class (ToolResolutionError) - the model sees the error and can retry instead of silently passing garbage forward.

Biggest technical challenge was context compaction for memory-constrained hardware. Both Ollama and Llamafile silently fall back to CPU when the model exceeds VRAM - no warning, no error, just 10-100x slower inference. Forge queries nvidia-smi at startup and derives a token budget to prevent this.

How to try it:

- Clone the repo, run the eval harness on a model I haven't tested. If you get interesting results I'll add them to the dashboard.

- Try the proxy server mode - point any OpenAI-compatible client at Forge and it handles guardrails transparently. It's the newest model and I'd love more eyes on it.

- Dogfooding led me to optimize model parameters in v0.6.0. The harder eval suite (26 scenarios) is designed to raise the ceiling so no one sits at 100%. Several that did on the original suite can't sweep it - including Opus 4.6. Curious if anyone finds scenarios that expose gaps I haven't thought of. Paper numbers based on pre v0.6.0 code.

Background: prior ML publication in unsupervised learning (83 citations). This paper accepted to ACM CAIS '26 - presenting May 26-29.

Repo: https://github.com/antoinezambelli/forge

Paper: https://www.caisconf.org/program/2026/demos/forge-agentic-re... https://github.com/antoinezambelli/forge/blob/main/docs/forg...

Dashboard: https://github.com/antoinezambelli/forge/docs/results/dashbo...

When Fast Fourier Transform Meets Transformer for Image Restoration (2024)

2026-05-18 @ 14:10:47Points: 63Comments: 7

How fast is N tokens per second really?

2026-05-18 @ 02:04:38Points: 157Comments: 41

Autoregressive next token prediction and KV Cache in transformers

2026-05-17 @ 20:07:14Points: 42Comments: 0

Smartmedia Card Spec Opened, available free (2000)

2026-05-17 @ 19:22:42Points: 22Comments: 11

Handling the great code forge fragmentation

2026-05-17 @ 15:31:03Points: 23Comments: 7

Archives

2026

2025

2024

2023

2022