Hacker News

Latest

LibreOffice and the Art of Overreacting

2026-03-26 @ 10:13:38 | Points: 43 | Comments: 17

Swift 6.3

2026-03-26 @ 07:27:11 | Points: 109 | Comments: 46

Ashby (YC W19) Is Hiring Engineers Who Make Product Decisions

2026-03-26 @ 07:00:28 | Points: 1

Government agencies buy commercial data about Americans in bulk

2026-03-26 @ 06:11:01 | Points: 82 | Comments: 33

Show HN: Robust LLM Extractor for Websites in TypeScript

2026-03-26 @ 03:55:52 | Points: 48 | Comments: 34

LLMs seemed like the obvious fix — just throw the HTML at GPT and ask for JSON. Except in practice, it's more painful than that:

- Raw HTML is full of nav bars, footers, and tracking junk that eats your token budget. A typical product page is 80% noise.
- LLMs return malformed JSON more often than you'd expect, especially with nested arrays and complex schemas. One bad bracket and your pipeline crashes.
- Relative URLs, markdown-escaped links, tracking parameters: the "small" URL issues compound fast when you're processing thousands of pages.
- You end up writing the same boilerplate: HTML cleanup → markdown conversion → LLM call → JSON parsing → error recovery → schema validation. Over and over.
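To make the URL point concrete, here is a minimal sketch of the kind of cleanup involved. It is an illustration only, not Lightfeed's code: `cleanUrl` and the `TRACKING_PARAMS` list are hypothetical names, and the set of tracking parameters is an assumption. It relies only on the standard `URL` class.

```typescript
// Hypothetical helper for illustration; not part of any library's API.
// Illustrative (incomplete) list of tracking parameters to strip.
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "ref"]);

function cleanUrl(raw: string, baseUrl: string): string | null {
  // Undo markdown escaping such as "\_" or "\(" left over from conversion.
  const unescaped = raw.replace(/\\([_()\[\]*~])/g, "$1");

  let url: URL;
  try {
    // Resolves relative links against the page URL; throws on invalid input.
    url = new URL(unescaped, baseUrl);
  } catch {
    return null;
  }

  // Collect keys first so we do not mutate the params while iterating them.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => keys.push(key));
  for (const key of keys) {
    if (key.startsWith("utm_") || TRACKING_PARAMS.has(key)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}
```

Returning `null` for unparseable links lets the caller drop a bad URL without aborting the whole page.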

We got tired of rebuilding this stack for every project, so we extracted it into a library.

Lightfeed Extractor is a TypeScript library that handles the full pipeline from raw HTML to validated, structured data:

- Converts HTML to LLM-ready markdown with main content extraction (strips nav, headers, footers), optional image inclusion, and URL cleaning
- Works with any LangChain-compatible LLM (OpenAI, Gemini, Claude, Ollama, etc.)
- Uses Zod schemas for type-safe extraction with real validation
- Recovers partial data from malformed LLM output instead of failing entirely: if 19 out of 20 products parsed correctly, you get those 19
- Built-in browser automation via Playwright (local, serverless, or remote) with anti-bot patches
- Pairs with our browser agent (@lightfeed/browser-agent) for AI-driven page navigation before extraction
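The partial-recovery idea can be sketched roughly as follows. This is not the library's implementation or API: `recoverItems`, `isProduct`, and the `Product` shape are hypothetical, and a hand-written type guard stands in for the Zod schema so the example has no dependencies. It only covers output that is valid JSON containing some invalid items; repairing syntactically broken JSON is a separate problem not shown here.

```typescript
// Hypothetical item shape for illustration.
interface Product {
  name: string;
  price: number;
}

// Hand-written type guard standing in for a schema check.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.name === "string" && typeof v.price === "number";
}

interface PartialResult<T> {
  valid: T[];       // items that passed validation
  rejected: number; // count of items that were dropped
}

// Validate each array item on its own so one bad item does not sink the rest.
function recoverItems<T>(
  llmOutput: string,
  isValid: (value: unknown) => value is T
): PartialResult<T> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(llmOutput);
  } catch {
    return { valid: [], rejected: 0 }; // not parseable at all
  }
  if (!Array.isArray(parsed)) return { valid: [], rejected: 0 };

  const valid: T[] = [];
  let rejected = 0;
  for (const item of parsed) {
    if (isValid(item)) valid.push(item);
    else rejected++;
  }
  return { valid, rejected };
}
```

Validating per item rather than per response is the design choice that turns "19 out of 20 parsed" into 19 usable records instead of one failed call.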

We use this ourselves in production at Lightfeed, and it's been solid enough that we decided to open-source it.

GitHub: https://github.com/lightfeed/extractor

npm: npm install @lightfeed/extractor

Apache 2.0 licensed.

Happy to answer questions or hear feedback.

Obsolete Sounds

2026-03-26 @ 03:54:27 | Points: 53 | Comments: 7

The Last Contract: William T. Vollmann's Battle to Publish an Epic (2025)

2026-03-26 @ 03:54:13 | Points: 15 | Comments: 0

The Cassandra of 'The Machine'

2026-03-26 @ 03:53:40 | Points: 16 | Comments: 3

False claims in a widely-cited paper

2026-03-26 @ 00:46:31 | Points: 298 | Comments: 122

Shell Tricks That Make Life Easier (and Save Your Sanity)

2026-03-26 @ 00:28:38 | Points: 151 | Comments: 68

"Disregard That" Attacks

2026-03-25 @ 23:11:34 | Points: 92 | Comments: 64

Running Tesla Model 3's computer on my desk using parts from crashed cars

2026-03-25 @ 21:11:57 | Points: 664 | Comments: 217

The EU still wants to scan your private messages and photos

2026-03-25 @ 20:27:03 | Points: 1253 | Comments: 335

Personal Encyclopedias

2026-03-25 @ 19:41:51 | Points: 293 | Comments: 63

Apple randomly closes bug reports unless you "verify" the bug remains unfixed

2026-03-25 @ 19:14:42 | Points: 419 | Comments: 242

90% of Claude-linked output going to GitHub repos w <2 stars

2026-03-25 @ 18:16:40 | Points: 309 | Comments: 190

ARC-AGI-3

2026-03-25 @ 18:16:03 | Points: 419 | Comments: 264

Show HN: Optio – Orchestrate AI coding agents in K8s to go from ticket to PR

2026-03-25 @ 17:10:21 | Points: 59 | Comments: 33

Optio is an open-source orchestration system that turns tickets into merged pull requests using AI coding agents. You point it at your repos, and it handles the full lifecycle:

- Intake — pull tasks from GitHub Issues, Linear, or create them manually

- Execution — spin up isolated K8s pods per repo, run Claude Code or Codex in git worktrees

- PR monitoring — watch CI checks, review status, and merge readiness every 30s

- Self-healing — auto-resume the agent on CI failures, merge conflicts, or reviewer change requests

- Completion — squash-merge the PR and close the linked issue

The key idea is the feedback loop. Optio doesn't just run an agent and walk away — when CI breaks, it feeds the failure back to the agent. When a reviewer requests changes, the comments become the agent's next prompt. It keeps going until the PR merges or you tell it to stop.
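The feedback loop described above can be sketched as a mapping from PR events to the agent's next step. This is a rough illustration under stated assumptions, not Optio's code: the `PrEvent` and `Action` shapes, the prompt wording, and the `nextAction` and `replay` functions are all hypothetical.

```typescript
// Hypothetical event and action shapes; Optio's real types may differ.
type PrEvent =
  | { kind: "ci_failed"; log: string }
  | { kind: "changes_requested"; comments: string[] }
  | { kind: "merge_conflict" }
  | { kind: "ready_to_merge" };

type Action =
  | { kind: "resume_agent"; prompt: string }
  | { kind: "merge" };

// Turn what happened on the PR into the agent's next prompt, or a merge.
function nextAction(event: PrEvent): Action {
  switch (event.kind) {
    case "ci_failed":
      return {
        kind: "resume_agent",
        prompt: `CI failed. Fix the failure below.\n${event.log}`,
      };
    case "changes_requested":
      return {
        kind: "resume_agent",
        prompt: `Address these review comments:\n${event.comments.join("\n")}`,
      };
    case "merge_conflict":
      return {
        kind: "resume_agent",
        prompt: "Rebase onto the base branch and resolve the merge conflicts.",
      };
    case "ready_to_merge":
      return { kind: "merge" };
  }
}

// Replay a recorded sequence of events, stopping at the first merge.
function replay(events: PrEvent[]): Action[] {
  const actions: Action[] = [];
  for (const event of events) {
    const action = nextAction(event);
    actions.push(action);
    if (action.kind === "merge") break;
  }
  return actions;
}
```

In a real system the events would come from polling the PR rather than from a fixed array; keeping `nextAction` a pure function makes the loop easy to test without a cluster.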

Built with Fastify, Next.js, BullMQ, and Drizzle on Postgres. Ships with a Helm chart for production deployment.

Quantization from the Ground Up

2026-03-25 @ 16:06:34 | Points: 277 | Comments: 50

Supreme Court Sides with Cox in Copyright Fight over Pirated Music

2026-03-25 @ 15:02:56 | Points: 361 | Comments: 281

Earthquake scientists reveal how overplowing weakens soil at experimental farm

2026-03-25 @ 14:12:04 | Points: 172 | Comments: 88

Thoughts on slowing the fuck down

2026-03-25 @ 14:07:14 | Points: 898 | Comments: 398

From zero to a RAG system: successes and failures

2026-03-24 @ 06:53:30 | Points: 36 | Comments: 10

Maxell MXCP-P100 – wireless cassette player

2026-03-24 @ 01:56:51 | Points: 34 | Comments: 18

Niche Museums

2026-03-23 @ 20:57:19 | Points: 18 | Comments: 8

More precise elevation data for GraphHopper routing engine

2026-03-23 @ 16:39:53 | Points: 59 | Comments: 3

What came after the 486?

2026-03-23 @ 12:09:35 | Points: 53 | Comments: 46

My DIY FPGA board can run Quake II

2026-03-22 @ 23:06:57 | Points: 167 | Comments: 51

Two studies in compiler optimisations

2026-03-22 @ 15:34:44 | Points: 88 | Comments: 11

The truth that haunts the Ramones: 'They sold more T-shirts than records'

2026-03-22 @ 01:58:23 | Points: 154 | Comments: 93
