Hacker News

Latest

RFC 9849. TLS Encrypted Client Hello

2026-03-04 @ 07:25:24Points: 35Comments: 4

Better JIT for Postgres

2026-03-04 @ 06:17:10Points: 38Comments: 8

Agentic Engineering Patterns

2026-03-04 @ 05:00:37Points: 98Comments: 16

A CPU that runs entirely on GPU

2026-03-04 @ 04:30:32Points: 73Comments: 23

Indefinite Book Club Hiatus

2026-03-04 @ 04:25:20Points: 11Comments: 4

Giving LLMs a personality is just good engineering

2026-03-04 @ 03:37:48Points: 21Comments: 9

California's Digital Age Assurance Act, and FOSS

2026-03-04 @ 03:36:18Points: 93Comments: 72

Speculative Speculative Decoding (SSD)

2026-03-04 @ 03:24:20Points: 40Comments: 6

Number Research Inc

2026-03-04 @ 02:34:06Points: 31Comments: 16

Graphics Programming Resources

2026-03-04 @ 02:23:01Points: 84Comments: 10

Weave – A language aware merge algorithm based on entities

2026-03-04 @ 01:52:21Points: 118Comments: 71

TikTok will not introduce end-to-end encryption, saying it makes users less safe

2026-03-04 @ 01:31:05Points: 219Comments: 146

The largest acidic geyser has been putting on quite a show

2026-03-04 @ 01:27:06Points: 48Comments: 1

Motorola GrapheneOS devices will be bootloader unlockable/relockable

2026-03-04 @ 00:58:31Points: 584Comments: 160

My spicy take on vibe coding for PMs

2026-03-03 @ 23:38:21Points: 87Comments: 82

Voxile: A ray-traced game made in its own engine and programming language

2026-03-03 @ 21:10:27Points: 186Comments: 50

An Interactive Intro to CRDTs (2023)

2026-03-03 @ 19:22:32Points: 138Comments: 23

Intel's make-or-break 18A process node debuts for data center with 288-core Xeon

2026-03-03 @ 18:54:06Points: 283Comments: 241

GPT‑5.3 Instant

2026-03-03 @ 17:57:33Points: 342Comments: 266

When AI writes the software, who verifies it?

2026-03-03 @ 16:34:53Points: 222Comments: 222

Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents

2026-03-03 @ 14:30:58Points: 81Comments: 20

https://www.cekura.ai). We've been running voice agent simulation for 1.5 years, and recently extended the same infrastructure to chat. Teams use Cekura to simulate real user conversations, stress-test prompts and LLM behavior, and catch regressions before they hit production.

The core problem: you can't manually QA an AI agent. When you ship a new prompt, swap a model, or add a tool, how do you know the agent still behaves correctly across the thousands of ways users might interact with it? Most teams resort to manual spot-checking (doesn't scale), waiting for users to complain (too late), or brittle scripted tests.

Our answer is simulation: synthetic users interact with your agent the way real users do, and LLM-based judges evaluate whether it responded correctly - across the full conversational arc, not just single turns. Three things make this actually work: Scenario generation + real conversation import - Our scenario generation agent bootstraps your test suite from a description of your agent. But real users find paths no generator anticipates, so we also ingest your production conversations and automatically extract test cases from them. Your coverage evolves as your users do.

Mock tool platform - Agents call tools. Running simulations against real APIs is slow and flaky. Our mock tool platform lets you define tool schemas, behavior, and return values so simulations exercise tool selection and decision-making without touching production systems.

Deterministic, structured test cases - LLMs are stochastic. A CI test that passes "most of the time" is useless. Rather than free-form prompts, our evaluators are defined as structured conditional action trees: explicit conditions that trigger specific responses, with support for fixed messages when word-for-word precision matters. This means the synthetic user behaves consistently across runs - same branching logic, same inputs - so a failure is a real regression, not noise.

Cekura also monitors your live agent traffic. The obvious alternative here is a tracing platform like Langfuse or LangSmith - and they're great tools for debugging individual LLM calls. But conversational agents have a different failure mode: the bug isn't in any single turn, it's in how turns relate to each other. Take a verification flow that requires name, date of birth, and phone number before proceeding - if the agent skips asking for DOB and moves on anyway, every individual turn looks fine in isolation. The failure only becomes visible when you evaluate the full session as a unit. Cekura is built around this from the ground up. Where tracing platforms evaluate turn by turn, Cekura evaluates the full session. Imagine a banking agent where the user fails verification in step 1, but the agent hallucinates and proceeds anyway. A turn-based evaluator sees step 3 (address confirmation) and marks it green - the right question was asked. Cekura's judge sees the full transcript and flags the session as failed because verification never succeeded.

Try us out at https://www.cekura.ai - 7-day free trial, no credit card required. Paid plans from $30/month.

We also put together a product video if you'd like to see it in action: https://www.youtube.com/watch?v=n8FFKv1-nMw. The first minute dives into quick onboarding - and if you want to jump straight to the results, skip to 8:40.

Curious what the HN community is doing - how are you testing behavioral regressions in your agents? What failure modes have hurt you most? Happy to dig in below!

MacBook Pro with M5 Pro and M5 Max

2026-03-03 @ 14:02:06Points: 774Comments: 800

Claude's Cycles [pdf]

2026-03-03 @ 10:57:42Points: 618Comments: 250

Circle Games (2019)

2026-03-01 @ 12:54:30Points: 6Comments: 0

Show HN: Rust compiler in PHP emitting x86-64 executables

2026-03-01 @ 11:02:55Points: 18Comments: 17

Welcoming Elizabeth Barron as the New Executive Director of the PHP Foundation

2026-03-01 @ 10:56:17Points: 27Comments: 13

On the Design of Programming Languages (1974) [pdf]

2026-03-01 @ 09:09:04Points: 34Comments: 1

Mount Mayhem at Netflix: Scaling Containers on Modern CPUs

2026-03-01 @ 06:25:10Points: 49Comments: 24

Textadept

2026-03-01 @ 05:36:58Points: 128Comments: 21

You can use newline characters in URLs

2026-02-28 @ 23:24:57Points: 68Comments: 32

Archives

2026

2025

2024

2023

2022