Hacker News
Latest
Two kinds of vibe coding
2025-12-18 @ 21:16:25 Points: 30 Comments: 12
Delty (YC X25) Is Hiring an ML Engineer
2025-12-18 @ 21:02:10 Points: 1
Oliver Sacks put himself into his case studies – what was the cost?
2025-12-18 @ 20:52:38 Points: 22 Comments: 61
Interactive Fluid Typography
2025-12-18 @ 19:56:04 Points: 26 Comments: 3
T5Gemma 2: The next generation of encoder-decoder models
2025-12-18 @ 19:48:15 Points: 69 Comments: 10
How to hack Discord, Vercel and more with one easy trick
2025-12-18 @ 19:41:24 Points: 74 Comments: 14
The Scottish Highlands, the Appalachians, Atlas are the same mountain range
2025-12-18 @ 19:15:17 Points: 59 Comments: 15
We pwned X, Vercel, Cursor, and Discord through a supply-chain attack
2025-12-18 @ 19:08:48 Points: 431 Comments: 169
How China built its ‘Manhattan Project’ to rival the West in AI chips
2025-12-18 @ 18:55:34 Points: 125 Comments: 110
FunctionGemma 270M Model
2025-12-18 @ 18:26:52 Points: 117 Comments: 33
Firefox will have an option to disable all AI features
2025-12-18 @ 18:18:30 Points: 186 Comments: 172
GPT-5.2-Codex
2025-12-18 @ 18:14:48 Points: 293 Comments: 170
Skills for organizations, partners, the ecosystem
2025-12-18 @ 17:04:32 Points: 211 Comments: 134
Beginning January 2026, all ACM publications will be made open access
2025-12-18 @ 15:39:09 Points: 1144 Comments: 128
Launch HN: Pulse (YC S24) – Production-grade unstructured document extraction
2025-12-18 @ 15:35:52 Points: 31 Comments: 34
Here’s a demo video: https://video.runpulse.com/video/pulse-platform-walkthrough-....
Later in this post, you’ll find links to before-and-after examples on particularly tricky cases. Check those out to see what Pulse can really do! Modern vision language models are great at producing plausible text, but that makes them risky for OCR and data ingestion. Plausibility isn’t good enough when you need accuracy.
When we started working on document extraction, we assumed the same thing many teams do: foundation models are improving quickly, multi-modal systems appear to read documents well, what’s not to like? And indeed, for small or clean inputs, those assumptions mostly give good results. However, limitations show up once you begin processing real documents in volume. Long PDFs, dense tables, mixed layouts, low-fidelity scans, and financial or operational data expose errors that are subtle, hard to detect, and expensive to correct. Outputs look reasonable even though they contain small but important mistakes, especially in tables and numeric fields.
Running into those challenges got us working. We ran controlled evaluations on complex documents, fine-tuned vision models, and built labeled datasets where ground truth actually matters. There have been many nights where our team stayed up hand-annotating pages, drawing bounding boxes around tables, labeling charts point by point, or debating whether a number was unreadable or simply poorly scanned. That process shaped our intuition far more than benchmarks.
One thing became clear quickly. The core challenge is not extraction itself, but confidence. Vision language models embed document images into high-dimensional representations optimized for semantic understanding, not precise transcription. That process is inherently lossy. When uncertainty appears, models tend to resolve it using learned priors instead of surfacing ambiguity. This behavior can be helpful in consumer settings. In production pipelines, it creates verification problems that do not scale well. Pulse grew out of our attempt to address this gap through system design rather than prompting alone.
Instead of treating document understanding as a single generative step, our system separates layout analysis from language modeling. Documents are normalized into structured representations that preserve hierarchy and tables before schema mapping occurs. Extraction is constrained by schemas defined ahead of time, and extracted values are tied back to source locations so uncertainty can be inspected rather than guessed away. In practice, this results in a hybrid approach that combines traditional computer vision techniques, layout models, and vision language models, because no single approach handles these cases reliably on its own.
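To make the idea of schema-constrained extraction with source provenance concrete, here is a minimal sketch in Python. It is not Pulse's API or implementation: every name in it (`SourceRegion`, `ExtractedField`, `validate_against_schema`, the `RENT_ROLL_SCHEMA` fields, the 0.9 threshold) is a hypothetical chosen for illustration. It only shows the general pattern of keeping a page and bounding box with each value, and routing anything uncertain to review instead of silently accepting it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SourceRegion:
    """Where a value was read from (hypothetical representation)."""
    page: int
    bbox: Tuple[float, float, float, float]  # x0, y0, x1, y1


@dataclass(frozen=True)
class ExtractedField:
    """One candidate value produced by an upstream extractor."""
    name: str
    value: Optional[str]
    confidence: float
    source: Optional[SourceRegion]


# Hypothetical schema defined ahead of time: field name -> parser.
RENT_ROLL_SCHEMA: Dict[str, Callable[[str], object]] = {
    "tenant_name": str,
    "monthly_rent": float,
}


def validate_against_schema(
    fields: List[ExtractedField],
    schema: Dict[str, Callable[[str], object]],
    min_confidence: float = 0.9,  # assumed threshold, tune per use case
) -> Tuple[Dict[str, object], List[Tuple[str, str]]]:
    """Split schema fields into accepted values and (name, reason) review items.

    Fields that are not in the schema are ignored, so the output shape is
    constrained by the schema rather than by whatever the extractor emitted.
    """
    by_name = {f.name: f for f in fields}
    accepted: Dict[str, object] = {}
    needs_review: List[Tuple[str, str]] = []
    for name, parse in schema.items():
        field = by_name.get(name)
        if field is None or field.value is None or field.source is None:
            needs_review.append((name, "missing value or source location"))
        elif field.confidence < min_confidence:
            needs_review.append((name, "low confidence"))
        else:
            try:
                accepted[name] = parse(field.value)
            except ValueError:
                needs_review.append((name, "value does not match schema type"))
    return accepted, needs_review
```

In this sketch a misread such as `12S0.00` for a rent amount fails the `float` parse and lands in the review list along with its field name, which is the kind of error that otherwise looks plausible in free-form output.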
We are intentionally sharing a few documents that reflect the types of inputs that motivated this work. These are representative of cases where we saw generic OCR or VLM-based pipelines struggle.
Here is a financial 10K: https://platform.runpulse.com/dashboard/examples/example1
Here is a newspaper: https://platform.runpulse.com/dashboard/examples/example2
Here is a rent roll: https://platform.runpulse.com/dashboard/examples/example3
Pulse is not perfect, particularly on highly degraded scans or uncommon handwriting, and we’re working on improvements. However, our goal is not to eliminate errors entirely, but to make them visible, auditable, and easier to reason about.
Pulse is available via usage-based access to the API and platform. You can sign up to try it at https://platform.runpulse.com/login. API docs are at https://docs.runpulse.com/introduction.
We’d love to hear how others here evaluate correctness for document extraction, which failure modes you have seen in practice, and what signals you rely on to decide whether an output can be trusted.
We will be around to answer questions and are happy to run additional documents if people want to share examples. Put links in the comments and we’ll plug them in and get back to you.
Looking forward to your comments!
The immortality of Microsoft Word
2025-12-18 @ 15:11:06 Points: 33 Comments: 48
Using TypeScript to obtain one of the rarest license plates
2025-12-18 @ 15:00:32 Points: 125 Comments: 133
Your job is to deliver code you have proven to work
2025-12-18 @ 14:52:11 Points: 563 Comments: 480
Please just try HTMX
2025-12-18 @ 14:18:52 Points: 392 Comments: 331
Classical statues were not painted horribly
2025-12-18 @ 12:28:45 Points: 508 Comments: 253
How did IRC ping timeouts end up in a lawsuit?
2025-12-17 @ 18:25:29 Points: 99 Comments: 11
TRELLIS.2: state-of-the-art large 3D generative model (4B)
2025-12-16 @ 22:09:53 Points: 50 Comments: 10
Texas is suing all of the big TV makers for spying on what you watch
2025-12-16 @ 21:04:54 Points: 318 Comments: 177
Show HN: Stop AI scrapers from hammering your self-hosted blog (using porn)
2025-12-16 @ 20:42:38 Points: 86 Comments: 53
There isn't much you can do about it without Cloudflare. These companies ignore robots.txt, and you're competing with teams with more resources than you. It's you vs the MJs of programming; you're not going to win.
But there is a solution. Now I'm not going to say it's a great solution...but a solution is a solution. If your website contains content that will trigger their scraper's safeguards, it will get dropped from their data pipelines.
So here's what fuzzycanary does: it injects hundreds of invisible links to porn websites in your HTML. The links are hidden from users but present in the DOM so that scrapers can ingest them and say "nope we won't scrape there again in the future".
The problem with that approach is that it will absolutely nuke your website's SEO. So fuzzycanary also checks user agents and won't show the links to legitimate search engines, so Google and Bing won't see them.
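The gating logic described above can be sketched roughly as follows. This is not fuzzycanary's code or API (the package itself is JavaScript); it is a Python illustration of the general idea, and the function names, the bot marker list, and the decoy URLs passed in are all assumptions. It assumes pages are rendered per request so the user agent is available.

```python
import html
from typing import Iterable

# Assumed substrings for search engine crawlers that should NOT see the links.
SEARCH_BOT_MARKERS = ("googlebot", "bingbot")


def is_search_engine(user_agent: str) -> bool:
    """Case-insensitive substring check on the User-Agent header."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SEARCH_BOT_MARKERS)


def decoy_links_html(urls: Iterable[str]) -> str:
    """Build a container that is in the DOM but not displayed to users."""
    anchors = "".join(
        f'<a href="{html.escape(url, quote=True)}" tabindex="-1" rel="nofollow">.</a>'
        for url in urls
    )
    return f'<div style="display:none" aria-hidden="true">{anchors}</div>'


def render_page(body_html: str, user_agent: str, decoy_urls: Iterable[str]) -> str:
    """Append decoy links for every client except recognized search engines."""
    if is_search_engine(user_agent):
        return body_html
    return body_html + decoy_links_html(decoy_urls)
```

A User-Agent string is trivial to spoof, so a substring check like this is only a first filter; Google, for example, documents reverse DNS lookups as the way to verify that a request really comes from Googlebot.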
One caveat: if you're using a static site generator, it will bake the links into your HTML for everyone, including Googlebot. Does anyone have a workaround for this that doesn't involve using a proxy?
Please try it out! Setup is one component or one import.
(And don't tell me it's a terrible idea because I already know it is)
package: https://www.npmjs.com/package/@fuzzycanary/core
gh: https://github.com/vivienhenz24/fuzzy-canary
I've been writing ring buffers wrong all these years (2016)
2025-12-16 @ 19:11:47 Points: 39 Comments: 18
Meta Segment Anything Model Audio
2025-12-16 @ 18:26:43 Points: 110 Comments: 14
Show HN: Picknplace.js, an alternative to drag-and-drop
2025-12-16 @ 16:12:06 Points: 72 Comments: 47
While it might take more time than a regular drag and drop, the benefit is for people who struggle with holding down the mouse button. With picknplace.js, you only need two clicks and some scrolling.
This solution is meant as an experiment, so I'm open to discussion.