Hacker News
Latest
Palestinian boy, 12, describes how Israeli forces killed his family in car
2026-03-16 @ 18:38:14 · Points: 394 · Comments: 96
The return-to-the-office trend backfires
2026-03-16 @ 18:13:18 · Points: 82 · Comments: 53
Jemalloc un-abandoned by Meta
2026-03-16 @ 18:12:32 · Points: 197 · Comments: 76
Agent Skills – Open Security Database
2026-03-16 @ 17:35:58 · Points: 23 · Comments: 2
Language Model Teams as Distributed Systems
2026-03-16 @ 17:19:13 · Points: 33 · Comments: 5
The “small web” is bigger than you might think
2026-03-16 @ 17:17:57 · Points: 166 · Comments: 55
Launch HN: Chamber (YC W26) – An AI Teammate for GPU Infrastructure
2026-03-16 @ 17:09:34 · Points: 17 · Comments: 4
We all worked on GPU infrastructure at Amazon. Between us we've spent years on this problem — monitoring GPU fleets, debugging failures at scale, building the tooling around it. After leaving we talked to a bunch of AI teams and kept hearing the same stuff. Platform engineers spend half their time just keeping things running. Building dashboards, writing scheduling configs, answering "when will my job start?" all day. Researchers lose hours when a training run fails because figuring out why means digging through Kubernetes events, node logs, and GPU metrics in totally separate tools. Pretty much everyone had stitched together Prometheus, Grafana, Kubernetes scheduling policies, and a bunch of homegrown scripts, and they were spending as much time maintaining all of it as actually using it.
The thing we kept noticing is that most of this work follows patterns. Triage the failure, correlate a few signals, figure out what to do about it. If you had a platform with structured access to the full state of a GPU environment, you could have an agent do that work for you.
So that's what we built. Chamber is a control plane that keeps a live model of your GPU fleet: nodes, workloads, team structure, cluster health. Every operation it supports is exposed as a tool the agent can call. Inspecting node health, reading cluster topology, managing workload lifecycle, adjusting resource configs, provisioning infrastructure. These are structured operations with validation and rollback, not just raw shell commands. When we add new capabilities to the platform, they automatically become things the agent can do too.
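The pattern described above — platform operations exposed as validated, rollback-capable tools the agent can call — can be sketched roughly like this. This is a minimal illustration, not Chamber's actual API; all names (`Tool`, `ToolRegistry`, `cordon_node`) are hypothetical:

```python
# Hypothetical sketch: platform operations exposed as agent-callable tools
# with validation and rollback, rather than raw shell commands.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    validate: Callable[[dict], bool]   # reject malformed calls up front
    run: Callable[[dict], dict]        # perform the structured operation
    rollback: Callable[[dict], None]   # undo on failure

class ToolRegistry:
    def __init__(self) -> None:
        self.tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # New platform capabilities become agent-callable by registering here.
        self.tools[tool.name] = tool

    def call(self, name: str, args: dict) -> dict:
        tool = self.tools[name]
        if not tool.validate(args):
            return {"ok": False, "error": "validation failed"}
        try:
            return {"ok": True, "result": tool.run(args)}
        except Exception:
            tool.rollback(args)
            return {"ok": False, "error": "failed, rolled back"}

registry = ToolRegistry()
registry.register(Tool(
    name="cordon_node",
    validate=lambda a: isinstance(a.get("node"), str),
    run=lambda a: {"node": a["node"], "state": "cordoned"},
    rollback=lambda a: None,
))
result = registry.call("cordon_node", {"node": "gpu-17"})
```

The registry design is one plausible way new capabilities "automatically become things the agent can do": anything registered is immediately in the agent's tool set.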
We spent a lot of time on safety because we've seen what happens when infrastructure automation goes wrong. A wrong call can kill a multi-day training run or cascade across a cluster. So the agent has graduated autonomy. Routine stuff it handles on its own: diagnosing a failed job, resubmitting with corrected resources, cordoning a bad node. But anything that touches other teams' workloads or production jobs needs human approval first. Every action gets logged with what the agent saw, why it acted, and what it changed.
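The graduated-autonomy policy described above could look something like the sketch below — routine actions execute directly, while anything touching production or another team's workloads stops for approval, with every decision logged. Action names and scope fields are assumptions for illustration:

```python
# Illustrative graduated-autonomy gate; not Chamber's actual policy engine.
AUTONOMOUS = {"diagnose_job", "resubmit_job", "cordon_node"}  # routine actions
AUDIT_LOG: list[dict] = []

def execute(action: str, scope: dict, approved: bool = False) -> str:
    needs_approval = (
        action not in AUTONOMOUS
        or bool(scope.get("production"))
        or scope.get("owner") != scope.get("requesting_team")
    )
    if needs_approval and not approved:
        return "pending_approval"
    # Log what was done and in what scope for later review.
    AUDIT_LOG.append({"action": action, "scope": scope})
    return "executed"
```

A call like `execute("cordon_node", {"owner": "ml", "requesting_team": "ml"})` runs autonomously, while `execute("kill_job", {"production": True})` blocks until a human approves.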
The platform underneath is really what makes the diagnosis work. When the agent investigates a failure, it queries GPU state, workload history, node health timelines, and cluster topology. That's the difference between "your job OOMed" and "your job OOMed because the batch size exceeded available VRAM on this node, here's a corrected config." Different root causes get different fixes.
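The OOM example above — correlating workload config with node state to produce a corrected config rather than a bare error — could be sketched like this. Field names and the VRAM estimate are hypothetical simplifications:

```python
# Illustrative only: turning "your job OOMed" into a root cause plus a
# corrected config by joining job settings with the node's GPU specs.
def diagnose_oom(job: dict, node: dict) -> dict:
    est_vram_gb = job["batch_size"] * job["vram_per_sample_gb"]
    if est_vram_gb > node["vram_gb"]:
        # Largest batch size that fits in this node's VRAM.
        fixed_batch = int(node["vram_gb"] // job["vram_per_sample_gb"])
        return {
            "cause": f"batch size {job['batch_size']} needs ~{est_vram_gb} GB "
                     f"but node has {node['vram_gb']} GB",
            "suggested_config": {**job, "batch_size": fixed_batch},
        }
    return {"cause": "unknown", "suggested_config": None}
```

Different root causes (bad node vs. bad config vs. preemption) would branch to different fixes; this shows only the config-correction branch.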
One thing that surprised us, even coming from Amazon where we'd seen large GPU fleets: most teams we talk to can't even tell you how many GPUs are in use right now. The monitoring just doesn't exist. They're flying blind on their most expensive hardware.
We’ve launched with a few early customers and are onboarding new teams. We’re still refining pricing and are currently evaluating models like per-GPU-under-management and tiered plans. We plan to publish transparent pricing once we’ve validated what works best for customers. In the meantime, we know “contact us” isn’t ideal.
Would love to hear from anyone running GPU clusters. What's the most tedious part of your setup? What would you actually trust an agent to do? What's off limits? Looking forward to feedback!
Speed at the cost of quality: Study of use of Cursor AI in open source projects
2026-03-16 @ 17:07:37 · Points: 58 · Comments: 26
Kaizen (YC P25) Hiring Eng, GTM, Cos to Automate BPOs
2026-03-16 @ 17:00:17 · Points: 1
Launch HN: Voygr (YC W26) – A better maps API for agents and AI apps
2026-03-16 @ 16:21:07 · Points: 44 · Comments: 23
Google Maps can tell you a restaurant is "4.2 stars, open till 10." Their API can't tell you the chef left last month, wait times doubled, and locals moved on. Maps APIs today just give you a fixed snapshot. We're building an infinite, queryable place profile that combines accurate place data with fresh web context like news, articles, and events.
Vlad worked on the Google Maps APIs as well as in ridesharing and travel. Yarik led ML/Search infrastructure at Apple, Google, and Meta powering products used by hundreds of millions of users daily. We realized nobody was treating place data freshness as infrastructure, so we're building it.
We started with one of the hardest parts - knowing whether a place is even real. Our Business Validation API (https://github.com/voygr-tech/dev-tools) tells you whether a business is actually operating, closed, rebranded, or invalid. We aggregate multiple data sources, detect conflicting signals, and return a structured verdict. Think of it as continuous integration, but for the physical world.
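The "aggregate multiple sources, detect conflicts, return a structured verdict" flow described above might look roughly like the following sketch. This is not Voygr's actual implementation; the signal shape and field names are assumptions:

```python
# Hypothetical sketch of aggregating per-source signals about a business
# into a structured verdict with a conflict flag.
from collections import Counter

def validate_business(signals: list[dict]) -> dict:
    # Each signal: {"source": "...", "status": "operating" | "closed" |
    #               "rebranded" | "invalid"}
    counts = Counter(s["status"] for s in signals)
    verdict, votes = counts.most_common(1)[0]  # majority status wins
    return {
        "verdict": verdict,
        "confidence": votes / len(signals),
        "conflicting": len(counts) > 1,  # sources disagree
    }
```

A real system would weight sources by reliability and recency instead of simple majority voting, but the structured output — verdict, confidence, conflict flag — is the point.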
The problem: ~40% of Google searches and up to 20% of LLM prompts involve local context. 25-30% of places churn every year. The world doesn't emit structured "I closed" events - you have to actively detect it. As agents start searching, booking, and shopping in the real world, this problem gets 10x bigger - and nobody's building the infrastructure for it. We recently benchmarked how well LLMs handle local place queries (https://news.ycombinator.com/item?id=47366423) - the results were bad: even the best model gets 1 in 12 local queries wrong.
We're processing tens of thousands of places per day for enterprise customers, including leading mapping and tech companies. Today we're opening API access to the developer community. Please find details here: https://github.com/voygr-tech/dev-tools
We'd love honest feedback - whether it's about the problem, our approach, or where you think we're wrong. If you're dealing with stale place data in your own products, we'd especially love to hear what breaks. We're here all day, AMA.