Hacker News

Latest

Talos: Hardware accelerator for deep convolutional neural networks

2026-03-03 @ 23:11:11 · Points: 48 · Comments: 16

Helsinki just went a full year without a single traffic death

2026-03-03 @ 22:49:54 · Points: 150 · Comments: 81

Don't Make Me Talk to Your Chatbot

2026-03-03 @ 22:24:06 · Points: 218 · Comments: 180

Voxile: A ray-traced game made in its own engine and programming language

2026-03-03 @ 21:10:27 · Points: 113 · Comments: 23

We've freed Cookie's Bustle from copyright hell

2026-03-03 @ 20:14:18 · Points: 98 · Comments: 11

Possible US Government iPhone-Hacking Toolkit in foreign spy and criminal hands

2026-03-03 @ 19:34:37 · Points: 193 · Comments: 62

An Interactive Intro to CRDTs (2023)

2026-03-03 @ 19:22:32 · Points: 97 · Comments: 16

GitHub Is Having Issues

2026-03-03 @ 19:02:14 · Points: 207 · Comments: 140

Intel's make-or-break 18A process node debuts for data center with 288-core Xeon

2026-03-03 @ 18:54:06 · Points: 253 · Comments: 200

GPT‑5.3 Instant

2026-03-03 @ 17:57:33 · Points: 300 · Comments: 227

When AI writes the software, who verifies it?

2026-03-03 @ 16:34:53 · Points: 151 · Comments: 142

Physics Girl: Super-Kamiokande – Imaging the sun by detecting neutrinos [video]

2026-03-03 @ 14:42:30 · Points: 432 · Comments: 68

Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents

2026-03-03 @ 14:30:58 · Points: 72 · Comments: 19

Hi HN! We're the team behind Cekura (https://www.cekura.ai). We've been running voice agent simulation for 1.5 years, and recently extended the same infrastructure to chat. Teams use Cekura to simulate real user conversations, stress-test prompts and LLM behavior, and catch regressions before they hit production.

The core problem: you can't manually QA an AI agent. When you ship a new prompt, swap a model, or add a tool, how do you know the agent still behaves correctly across the thousands of ways users might interact with it? Most teams resort to manual spot-checking (doesn't scale), waiting for users to complain (too late), or brittle scripted tests.

Our answer is simulation: synthetic users interact with your agent the way real users do, and LLM-based judges evaluate whether it responded correctly - across the full conversational arc, not just single turns. Three things make this actually work:

Scenario generation + real conversation import - Our scenario generation agent bootstraps your test suite from a description of your agent. But real users find paths no generator anticipates, so we also ingest your production conversations and automatically extract test cases from them. Your coverage evolves as your users do.

Mock tool platform - Agents call tools. Running simulations against real APIs is slow and flaky. Our mock tool platform lets you define tool schemas, behavior, and return values so simulations exercise tool selection and decision-making without touching production systems.
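To make the mock-tool idea concrete, here is a minimal sketch of what a mock tool definition could look like. The names (`MockTool`, `lookup_order`, the field layout) are illustrative assumptions, not Cekura's actual API; the point is that the simulation exercises the agent's tool selection against canned behavior instead of a production service.

```typescript
// Hypothetical shape for a mock tool definition -- illustrative only,
// not Cekura's real schema.
type MockTool = {
  name: string;
  // Simplified schema for the arguments the agent may pass
  parameters: Record<string, "string" | "number" | "boolean">;
  // Canned behavior: return a fixed value instead of calling a real API
  handler: (args: Record<string, unknown>) => unknown;
};

const lookupOrder: MockTool = {
  name: "lookup_order",
  parameters: { orderId: "string" },
  handler: (args) => ({
    orderId: args.orderId,
    status: "shipped",
    eta: "2026-03-05",
  }),
};

// During a simulation, the harness routes the agent's tool call here
// instead of to the production order service.
const result = lookupOrder.handler({ orderId: "A-123" });
console.log(result);
```

Because the handler is pure and in-process, the simulation stays fast and deterministic while still testing whether the agent picks the right tool with the right arguments.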

Deterministic, structured test cases - LLMs are stochastic. A CI test that passes "most of the time" is useless. Rather than free-form prompts, our evaluators are defined as structured conditional action trees: explicit conditions that trigger specific responses, with support for fixed messages when word-for-word precision matters. This means the synthetic user behaves consistently across runs - same branching logic, same inputs - so a failure is a real regression, not noise.
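The "conditional action tree" idea can be sketched as follows. This is my reading of the description above, with hypothetical names: each branch pairs an explicit condition on the agent's reply with a fixed next message for the synthetic user, so the same reply always takes the same branch.

```typescript
// Minimal sketch of a conditional action tree (illustrative, not
// Cekura's real test-case format): explicit conditions trigger fixed
// user messages, so runs are deterministic.
type ActionNode = {
  condition: (agentReply: string) => boolean;
  userMessage: string; // fixed message, for word-for-word precision
};

const verificationTest: ActionNode[] = [
  {
    condition: (r) => /date of birth/i.test(r),
    userMessage: "My date of birth is 1990-01-01.",
  },
  {
    // Fallback branch: the agent skipped the DOB question
    condition: () => true,
    userMessage: "[FAIL] agent did not ask for date of birth",
  },
];

// Same agent reply -> same branch -> same synthetic-user input, every run.
function nextUserMessage(tree: ActionNode[], agentReply: string): string {
  const node = tree.find((n) => n.condition(agentReply));
  return node ? node.userMessage : "[FAIL] no branch matched";
}

console.log(nextUserMessage(verificationTest, "Could you confirm your date of birth?"));
```

With branching pinned down like this, a failing run points at a real behavioral regression in the agent rather than noise from a free-form synthetic user.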

Cekura also monitors your live agent traffic. The obvious alternative here is a tracing platform like Langfuse or LangSmith - and they're great tools for debugging individual LLM calls. But conversational agents have a different failure mode: the bug isn't in any single turn, it's in how turns relate to each other. Take a verification flow that requires name, date of birth, and phone number before proceeding - if the agent skips asking for DOB and moves on anyway, every individual turn looks fine in isolation. The failure only becomes visible when you evaluate the full session as a unit. Cekura is built around this from the ground up. Where tracing platforms evaluate turn by turn, Cekura evaluates the full session.

Imagine a banking agent where the user fails verification in step 1, but the agent hallucinates and proceeds anyway. A turn-based evaluator sees step 3 (address confirmation) and marks it green - the right question was asked. Cekura's judge sees the full transcript and flags the session as failed because verification never succeeded.
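The session-level vs. turn-level distinction can be shown with a toy check. This is a sketch under assumed event names (`verification_succeeded`, `address_confirmed`), not Cekura's evaluator: each turn looks fine alone, but a rule over the whole transcript catches the agent proceeding without successful verification.

```typescript
// Toy full-session evaluator: event names are hypothetical.
type Turn = { role: "user" | "agent"; event: string };

function sessionPassed(transcript: Turn[]): boolean {
  let verified = false;
  for (const turn of transcript) {
    if (turn.event === "verification_succeeded") verified = true;
    // A post-verification step taken before verification fails the session.
    if (turn.event === "address_confirmed" && !verified) return false;
  }
  return true;
}

const badSession: Turn[] = [
  { role: "user", event: "verification_failed" },
  { role: "agent", event: "address_confirmed" }, // looks green in isolation
];

console.log(sessionPassed(badSession)); // false: verification never succeeded
```

A turn-by-turn judge inspecting only the second turn would see a correctly phrased address confirmation; only the cross-turn rule exposes the regression.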

Try us out at https://www.cekura.ai - 7-day free trial, no credit card required. Paid plans from $30/month.

We also put together a product video if you'd like to see it in action: https://www.youtube.com/watch?v=n8FFKv1-nMw. The first minute dives into quick onboarding - and if you want to jump straight to the results, skip to 8:40.

Curious what the HN community is doing - how are you testing behavioral regressions in your agents? What failure modes have hurt you most? Happy to dig in below!

I'm reluctant to verify my identity or age for any online services

2026-03-03 @ 14:22:25 · Points: 888 · Comments: 548

Don't become an engineering manager

2026-03-03 @ 14:19:14 · Points: 313 · Comments: 227

MacBook Air with M5

2026-03-03 @ 14:04:56 · Points: 378 · Comments: 420

MacBook Pro with new M5 Pro and M5 Max

2026-03-03 @ 14:02:06 · Points: 686 · Comments: 671

Apple Studio Display and Studio Display XDR

2026-03-03 @ 14:00:11 · Points: 216 · Comments: 253

I'm losing the SEO battle for my own open source project

2026-03-03 @ 13:39:38 · Points: 453 · Comments: 228

Claude's Cycles [pdf]

2026-03-03 @ 10:57:42 · Points: 488 · Comments: 217

The Xkcd thing, now interactive

2026-03-03 @ 10:56:52 · Points: 1160 · Comments: 153

The beauty and terror of modding Windows

2026-03-03 @ 10:49:05 · Points: 115 · Comments: 89

Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act

2026-03-03 @ 10:11:44 · Points: 35 · Comments: 2

I've worked in a number of regulated industries off & on for years, and recently hit this gap.

We already had strong observability, but if someone asked me to prove exactly what happened for a specific AI decision X months ago (and demonstrate that the log trail had not been altered), I could not.

The EU AI Act has already entered into force, and its Article 12 kicks in this August, requiring automatic event recording and six-month retention for high-risk systems, which many legal commentators have suggested reads more like an append-only ledger requirement than standard application logging.

With this in mind, we built a small, free, open-source TypeScript library for Node apps using the Vercel AI SDK that captures inference as an append-only log.

It wraps the model in middleware, automatically logs every inference call to structured JSONL in your own S3 bucket, chains entries with SHA-256 hashes for tamper detection, enforces a 180-day retention floor, and provides a CLI to reconstruct a decision and verify integrity. There is also a coverage command that flags likely gaps (in practice omissions are a bigger risk than edits).
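The hash-chaining scheme described above can be sketched in a few lines. This is a minimal in-memory version, assuming the library chains entries linearly (each record embeds the previous record's SHA-256 hash); field names are illustrative, not the library's actual log format.

```typescript
import { createHash } from "node:crypto";

// Sketch of linear SHA-256 hash chaining over log entries: editing any
// record invalidates every hash after it, which is what makes tampering
// detectable. Field names are illustrative.
type Entry = { payload: unknown; prevHash: string; hash: string };

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function append(log: Entry[], payload: unknown): void {
  const prevHash = log.length ? log[log.length - 1].hash : "GENESIS";
  const hash = sha256(prevHash + JSON.stringify(payload));
  log.push({ payload, prevHash, hash });
}

function verify(log: Entry[]): boolean {
  let prev = "GENESIS";
  for (const e of log) {
    if (e.prevHash !== prev) return false;
    if (e.hash !== sha256(prev + JSON.stringify(e.payload))) return false;
    prev = e.hash;
  }
  return true;
}

const log: Entry[] = [];
append(log, { model: "example-model", prompt: "p1", output: "o1" });
append(log, { model: "example-model", prompt: "p2", output: "o2" });
console.log(verify(log)); // true
log[0].payload = { model: "example-model", prompt: "TAMPERED", output: "o1" };
console.log(verify(log)); // false: downstream hashes no longer match
```

Note that hash chaining only detects edits, not deletions of the most recent entries or missing records, which is presumably why the post emphasizes the coverage command for spotting omissions.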

The library is deliberately simple: TS, targeting Vercel AI SDK middleware, S3 or local fs, linear hash chaining. It also works with Mastra (agentic framework), and I am happy to expand its integrations via PRs.

Blog post with link to repo: https://systima.ai/blog/open-source-article-12-audit-logging

I'd value feedback, thoughts, and any critique.

Simplifying Application Architecture with Modular Design and MIM

2026-03-03 @ 09:23:42 · Points: 34 · Comments: 2

What's inside:

* A step-by-step tutorial refactoring a legacy big-ball-of-mud into self-contained modules.

* A bit of a challenge to Clean/Hexagonal Architectures with a pattern I've seen in the wild (which I named MIM in the text).

* A solid appendix on the fundamentals of Modular Design.

(Warning: It’s a long read. I’ve seen shorter ebooks on Leanpub).

Meta’s AI smart glasses and data privacy concerns

2026-03-02 @ 22:32:35 · Points: 1373 · Comments: 774

TV's TV (1987) & TV Games Encyclopedia (1988)

2026-03-01 @ 23:56:48 · Points: 12 · Comments: 0

The Two Kinds of Error

2026-03-01 @ 23:29:40 · Points: 34 · Comments: 19

Textadept

2026-03-01 @ 05:36:58 · Points: 76 · Comments: 16

TorchLean: Formalizing Neural Networks in Lean

2026-03-01 @ 03:03:46 · Points: 75 · Comments: 10

Disable Your SSH access accidentally with scp

2026-02-28 @ 05:18:22 · Points: 103 · Comments: 49

Archives

2026

2025

2024

2023

2022