Hacker News

Latest

“This is not the computer for you”

2026-03-13 @ 01:45:49 | Points: 170 | Comments: 92

Shall I implement it? No

2026-03-12 @ 21:01:10 | Points: 1057 | Comments: 394

Innocent woman jailed after being misidentified using AI facial recognition

2026-03-12 @ 20:55:51 | Points: 500 | Comments: 266

Launch HN: IonRouter (YC W26) – High-throughput, low-cost inference

2026-03-12 @ 18:52:36 | Points: 59 | Comments: 24

We're building IonRouter (https://ionrouter.io/), an inference API for open-source and fine-tuned models. You swap in our base URL, keep your existing OpenAI client code, and get access to any model (open-source or fine-tuned for you) running on our own inference engine.
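As a minimal sketch of the base-URL swap the post describes: the snippet below just constructs an OpenAI-style chat completion request against the IonRouter host. The `/v1/chat/completions` path and the `gpt-oss-120b` model slug are assumptions here, not confirmed values; check ionrouter.io for the real ones.

```python
import json

# Assumed OpenAI-compatible endpoint layout; the path and model slug
# below are guesses for illustration, not documented values.
BASE_URL = "https://ionrouter.io"   # swapped in for https://api.openai.com
API_KEY = "YOUR_IONROUTER_KEY"      # placeholder, not a real key

def build_chat_request(model: str, prompt: str):
    """Construct (url, headers, body) for a chat completion call
    without sending it; any HTTP client, or the OpenAI SDK with
    base_url=BASE_URL, can take it from here."""
    url = f"{BASE_URL}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body

url, headers, body = build_chat_request("gpt-oss-120b", "Hello!")
```

The point of the design is that nothing else in existing client code changes: only the base URL and key are swapped.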

The problem we kept running into: every inference provider is either fast-but-expensive (Together, Fireworks — you pay for always-on GPUs) or cheap-but-DIY (Modal, RunPod — you configure vLLM yourself and deal with slow cold starts). Neither felt right for teams that just want to ship.

Suryaa spent years building GPU orchestration infrastructure at TensorDock and production systems at Palantir. I led ML infrastructure and Linux kernel development for Space Force and NASA contracts where the stack had to actually work under pressure. When we started building AI products ourselves, we kept hitting the same wall: GPU infrastructure was either too expensive or too much work.

So we built IonAttention — a C++ inference runtime designed specifically around the GH200's memory architecture. Most inference stacks treat GH200 as a compatibility target (make sure vLLM runs, use CPU memory as overflow). We took a different approach and built around what makes the hardware actually interesting: a 900 GB/s coherent CPU-GPU link, 452GB of LPDDR5X sitting right next to the accelerator, and 72 ARM cores you can actually use.

Three things came out of that work that we think are novel: (1) using hardware cache coherence to make CUDA graphs behave as if they have dynamic parameters at zero per-step cost — something that only works on GH200-class hardware; (2) eager KV block writeback driven by immutability rather than memory pressure, which drops eviction stalls from 10ms+ to under 0.25ms; (3) phantom-tile attention scheduling at small batch sizes that cuts attention time by over 60% in the worst-affected regimes. We wrote up the details at cumulus.blog/ionattention.
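Point (2) can be illustrated with a toy model: once a KV block fills it becomes immutable, so it can be written back to host memory immediately rather than waiting for memory pressure, and a later eviction is a cheap pointer drop instead of a blocking copy. Everything below is a simplified sketch of that idea, not IonAttention's actual implementation; the stall figures are the ones quoted in the post.

```python
# Toy model of eager, immutability-driven KV writeback. In the real
# system the writeback happens off the critical path over the
# coherent CPU-GPU link; here it is just a synchronous list append.
class KVBlock:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tokens: list[int] = []
        self.on_host = False  # whether a host-memory copy exists

    def append(self, tok: int, host_store: list) -> None:
        self.tokens.append(tok)
        if len(self.tokens) == self.capacity and not self.on_host:
            # Block just became immutable: write back eagerly now,
            # not when an eviction later forces it.
            host_store.append(list(self.tokens))
            self.on_host = True

    def evict(self) -> float:
        """Simulated eviction stall in ms (figures from the post)."""
        return 0.25 if self.on_host else 10.0

host: list = []
blk = KVBlock(capacity=4)
for t in [1, 2, 3, 4]:
    blk.append(t, host)
stall = blk.evict()  # block was written back eagerly, so no 10ms copy
```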

On multimodal pipelines we get better performance than big players (588 tok/s vs. Together AI's 298 on the same VLM workload). We're honest that p50 latency is currently worse (~1.46s vs. 0.74s) — that's the tradeoff we're actively working on.

Pricing is per token, no idle costs: GPT-OSS-120B is $0.02 in / $0.095 out, Qwen3.5-122B is $0.20 in / $1.60 out. Full model list and pricing at https://ionrouter.io.

You can try the playground at https://ionrouter.io/playground right now, no signup required, or drop your API key in and swap the base URL — it's one line. We built this so teams can see what the engine can do and eventually bring their fine-tuned models to the same stack.

We're curious what you think, especially if you're running fine-tuned or custom models — that's the use case we've invested the most in. What's broken, and what would make this actually useful for you?

Bubble Sorted Amen Break

2026-03-12 @ 17:13:56 | Points: 290 | Comments: 89

Show HN: Understudy – Teach a desktop agent by demonstrating a task once

2026-03-12 @ 17:04:35 | Points: 96 | Comments: 38

Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.

Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0

In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps, route options, and GUI hints only as a fallback. In this example it can also prefer faster routes when they are available instead of repeating every GUI step.
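A toy sketch of what a published skill of that shape could look like: intent steps with ordered route preferences, falling back to GUI replay only when no faster route is available. All field and route names here are hypothetical, not Understudy's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative skill record: intent, route options, GUI as fallback.
@dataclass
class IntentStep:
    goal: str                # what to achieve, not which pixels to click
    routes: list[str]        # preferred routes, fastest first
    gui_hints: list[str] = field(default_factory=list)  # fallback only

@dataclass
class Skill:
    name: str
    params: list[str]        # slots filled at replay time, e.g. the subject
    steps: list[IntentStep]

    def plan(self, available: set[str]) -> list[str]:
        """Pick the fastest available route per step; else GUI replay."""
        return [
            next((r for r in step.routes if r in available), "gui")
            for step in self.steps
        ]

skill = Skill(
    name="fetch-and-send-photo",
    params=["subject"],
    steps=[
        IntentStep("find photo of {subject}", ["image_api", "browser"]),
        IntentStep("remove background", ["pixelmator_cli"]),
        IntentStep("send via Telegram", ["telegram_api"]),
    ],
)
plan = skill.plan({"browser", "telegram_api"})
```

Replaying against a different subject then just rebinds the `subject` slot and re-plans routes, rather than replaying recorded coordinates.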

Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.

    npm install -g @understudy-ai/understudy
    understudy wizard
GitHub: https://github.com/understudy-ai/understudy

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.

Converge (YC S23) Is Hiring a Founding Platform Engineer (NYC, Onsite)

2026-03-12 @ 17:01:46 | Points: 1

Show HN: OneCLI – Vault for AI Agents in Rust

2026-03-12 @ 16:41:06 | Points: 132 | Comments: 41

OneCLI is an open-source gateway that sits between your AI agents and the services they call. You store your real credentials once in OneCLI's encrypted vault, and give your agents placeholder keys. When an agent makes an HTTP call through the proxy, OneCLI matches the request by host/path, verifies the agent should have access, swaps the placeholder for the real credential, and forwards the request. The agent never touches the actual secret. It just uses CLI or MCP tools as normal.
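The match-verify-swap flow described above can be sketched in a few lines. This is a simplified illustration under assumed names (`billing-agent`, a Stripe route, a bearer-token header), not OneCLI's actual routing code.

```python
# Toy version of the proxy's credential swap: match the request by
# host/path, check the agent's access, replace the placeholder key.
VAULT = {"stripe-prod": "sk_live_REAL"}  # encrypted at rest in the real thing

ROUTES = [
    # (host, path_prefix, allowed_agents, vault_key)
    ("api.stripe.com", "/v1/", {"billing-agent"}, "stripe-prod"),
]

def rewrite(agent: str, host: str, path: str, headers: dict) -> dict:
    """Return the headers to forward upstream for this agent's request."""
    for r_host, r_prefix, allowed, key in ROUTES:
        if host == r_host and path.startswith(r_prefix):
            if agent not in allowed:
                raise PermissionError(f"{agent} may not call {host}")
            out = dict(headers)
            # Swap the placeholder the agent holds for the real secret.
            out["Authorization"] = f"Bearer {VAULT[key]}"
            return out
    return headers  # no route matched: forward untouched

hdrs = rewrite("billing-agent", "api.stripe.com", "/v1/charges",
               {"Authorization": "Bearer PLACEHOLDER"})
```

The agent only ever sees the placeholder; the real key exists solely inside the proxy's rewrite step.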

Try it in one line: docker run --pull always -p 10254:10254 -p 10255:10255 -v onecli-data:/app/data ghcr.io/onecli/onecli

The proxy is written in Rust, the dashboard is Next.js, and secrets are AES-256-GCM encrypted at rest. Everything runs in a single Docker container with an embedded Postgres (PGlite), no external dependencies. Works with any agent framework (OpenClaw, NanoClaw, IronClaw, or anything that can set an HTTPS_PROXY).

We started with what felt most urgent: agents shouldn't be holding raw credentials. The next layer is access policies and audit, defining what each agent can call, logging everything, and requiring human approval before sensitive actions go through.

It's Apache-2.0 licensed. We'd love feedback on the approach, and we're especially curious how people are handling agent auth today.

GitHub: https://github.com/onecli/onecli Site: https://onecli.sh

Reversing memory loss via gut-brain communication

2026-03-12 @ 16:38:51 | Points: 270 | Comments: 107

The Met releases high-def 3D scans of 140 famous art objects

2026-03-12 @ 15:43:39 | Points: 253 | Comments: 50

WolfIP: Lightweight TCP/IP stack with no dynamic memory allocations

2026-03-12 @ 15:39:50 | Points: 108 | Comments: 14

The Road Not Taken: A World Where IPv4 Evolved

2026-03-12 @ 15:31:21 | Points: 68 | Comments: 133

ATMs didn’t kill bank teller jobs, but the iPhone did

2026-03-12 @ 14:48:57 | Points: 374 | Comments: 401

Show HN: Axe – A 12MB binary that replaces your AI framework

2026-03-12 @ 13:49:12 | Points: 162 | Comments: 100

Most frameworks want a long-lived session with a massive context window doing everything at once. That's expensive, slow, and fragile. Good software is small, focused, and composable... AI agents should be too.

Axe treats LLM agents like Unix programs. Each agent is a TOML config with a focused job: code reviewer, log analyzer, commit message writer. You run them from the CLI, pipe data in, and get results out; chain them with pipes, or trigger them from cron, git hooks, or CI.
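A minimal sketch of what one of those TOML agent configs could look like; the key names here are illustrative assumptions, not Axe's documented schema.

```toml
# Hypothetical "reviewer" agent config -- field names are illustrative.
name = "reviewer"
model = "anthropic/claude-sonnet"
prompt = """
You review diffs piped in on stdin. Flag bugs, risky changes,
and missing tests. Be terse.
"""
tools = ["web_search"]
max_depth = 1   # sub-agent delegation limit
```

With a config like this, invocation is just a pipe: `git diff | axe run reviewer`.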

What Axe is:

- 12MB binary, two dependencies. No framework, no Python, no Docker (unless you want it)

- Stdin piping: `git diff | axe run reviewer` just works

- Sub-agent delegation: agents call other agents via tool use, depth-limited

- Persistent memory: if you want, agents remember across runs without you managing state

- MCP support: connect any MCP server to your agents

- Built-in tools: web_search and url_fetch out of the box

- Multi-provider: Anthropic, OpenAI, Ollama, or anything in models.dev format

- Path-sandboxed file ops: keeps agents locked to a working directory

Written in Go. No daemon, no GUI.

What would you automate first?

Malus – Clean Room as a Service

2026-03-12 @ 13:42:04 | Points: 1127 | Comments: 421

Document poisoning in RAG systems: How attackers corrupt AI's sources

2026-03-12 @ 13:40:36 | Points: 91 | Comments: 40

https://github.com/aminrj-labs/mcp-attack-labs/tree/main/lab...

The lab runs entirely on LM Studio + Qwen2.5-7B-Instruct (Q4_K_M) + ChromaDB — no cloud APIs, no GPU required, no API keys.

From zero to seeing the poisoning succeed: git clone, make setup, make attack1. About 10 minutes.

Two things worth flagging upfront:

- The 95% success rate is against a 5-document corpus (best case for the attacker). In a mature collection you need proportionally more poisoned docs to dominate retrieval — but the mechanism is the same.

- Embedding anomaly detection at ingestion was the biggest surprise: 95% → 20% as a standalone control, outperforming all three generation-phase defenses combined. It runs on embeddings your pipeline already produces — no additional model.

All five layers combined: 10% residual.
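The ingestion-time anomaly check can be illustrated with a toy version: flag a new document whose embedding sits too far from the collection centroid. The threshold and the two-dimensional vectors below are illustrative, not the lab's actual values, and real pipelines would use a more robust statistic than a single centroid.

```python
import math

# Toy ingestion-time embedding anomaly detection: reject documents
# whose cosine similarity to the corpus centroid is too low.
def centroid(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_anomalous(emb: list[float], corpus: list[list[float]],
                 threshold: float = 0.5) -> bool:
    """Flag an embedding that points away from the corpus centroid."""
    return cosine(emb, centroid(corpus)) < threshold

# Clean corpus clusters around one direction; the injected doc does not.
corpus = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]
clean = is_anomalous([1.0, 0.15], corpus)     # similar direction
poisoned = is_anomalous([-1.0, 0.9], corpus)  # points away from corpus
```

The appeal, as noted above, is that this runs on embeddings the pipeline already produces, so it adds no extra model.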

Happy to discuss methodology, the PoisonedRAG comparison, or anything that looks off.

Long overlooked as crucial to life, fungi start to get their due

2026-03-12 @ 13:16:11 | Points: 106 | Comments: 34

US private credit defaults hit record 9.2% in 2025, Fitch says

2026-03-12 @ 12:44:45 | Points: 301 | Comments: 390

Are LLM merge rates not getting better?

2026-03-12 @ 11:49:05 | Points: 129 | Comments: 119

Many SWE-bench-Passing PRs would not be merged - https://news.ycombinator.com/item?id=47341645 - March 2026 (149 comments)

Big data on the cheapest MacBook

2026-03-12 @ 11:41:14 | Points: 329 | Comments: 264

Returning to Rails in 2026

2026-03-12 @ 06:06:46 | Points: 354 | Comments: 222

DDR4 SDRAM – Initialization, Training and Calibration

2026-03-10 @ 06:02:27 | Points: 86 | Comments: 19

Language birth

2026-03-10 @ 04:48:50 | Points: 24 | Comments: 3

The Cost of Indirection in Rust

2026-03-09 @ 17:28:34 | Points: 93 | Comments: 44

Understanding the Go Runtime: The Scheduler

2026-03-09 @ 14:54:31 | Points: 59 | Comments: 3

IMG_0416 (2024)

2026-03-09 @ 13:07:45 | Points: 29 | Comments: 3

How people woke up before alarm clocks

2026-03-09 @ 00:09:54 | Points: 23 | Comments: 19

NASA's DART spacecraft changed an asteroid's orbit around the sun

2026-03-09 @ 00:00:05 | Points: 110 | Comments: 86

Full Spectrum and Infrared Photography

2026-03-08 @ 16:11:02 | Points: 53 | Comments: 27

Forcing Flash Attention onto a TPU and Learning the Hard Way

2026-03-08 @ 03:57:30 | Points: 53 | Comments: 13

Archives

2026

2025

2024

2023

2022