Hacker News

Latest

Uber wants to turn its drivers into a sensor grid for AV companies

2026-05-02 @ 15:38:17 · Points: 42 · Comments: 53

Zugzwang

2026-05-02 @ 15:34:25 · Points: 59 · Comments: 27

LLMs consistently pick resumes they generate over ones by humans or other models

2026-05-02 @ 15:28:13 · Points: 288 · Comments: 134

America's Expanding Domestic Surveillance

2026-05-02 @ 15:02:13 · Points: 86 · Comments: 35

Refusal in Language Models Is Mediated by a Single Direction

2026-05-02 @ 13:15:23 · Points: 40 · Comments: 14

An unknown Sega Saturn project has come to light after 29 years

2026-05-02 @ 12:39:52 · Points: 82 · Comments: 2

Open Design: Use Your Coding Agent as a Design Engine

2026-05-02 @ 12:16:16 · Points: 117 · Comments: 71

Craig Venter of Human Genome Project Dies at 79

2026-05-02 @ 12:08:51 · Points: 46 · Comments: 10

Show HN: Mljar Studio – local AI data analyst that saves analysis as notebooks

2026-05-02 @ 10:21:31 · Points: 51 · Comments: 11

I’ve been working on mljar-supervised (open-source AutoML for tabular data) for a few years. Recently I built a desktop app around it called MLJAR Studio.

The idea is simple: you talk to your data in natural language, the AI generates Python code, executes it locally, and the whole conversation becomes a reproducible notebook (*.ipynb file). So instead of just chatting with data, you end up with something you can inspect, modify, and rerun.
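To make the "conversation becomes a notebook" idea concrete, here is a minimal sketch (not MLJAR Studio's actual implementation) of how generated code cells could be persisted as a reproducible .ipynb with the nbformat library; the cell contents and file name are placeholders.

```python
# Hypothetical sketch: persisting AI-generated code cells as a reproducible notebook.
# Not MLJAR Studio's actual code, just an illustration using nbformat.
import nbformat

generated_cells = [
    "import pandas as pd\ndf = pd.read_csv('sales.csv')",  # placeholder generated code
    "df.describe()",
]

nb = nbformat.v4.new_notebook()
nb.cells = [nbformat.v4.new_code_cell(src) for src in generated_cells]

with open("analysis.ipynb", "w", encoding="utf-8") as f:
    nbformat.write(nb, f)  # can be inspected, modified, and rerun later
```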

What MLJAR Studio does:

- Sets up a local Python environment automatically, runs on Mac, Windows, and Linux

- Installs missing packages during the conversation

- Built-in AutoML for tabular data (classification, regression, multiclass)

- Works with standard Python libraries (pandas, matplotlib, etc.)

- Works with any data file: CSV, Excel, Stata, Parquet ...

- Connects to PostgreSQL, MySQL, SQL Server, Snowflake, Databricks, and Supabase.

For AI: use Ollama locally (zero data egress), bring your own OpenAI key, or use the MLJAR AI add-on.

I built this because I wanted something between Jupyter Notebook (flexible but manual) and AI tools that generate code but don't preserve the workflow. Most tools I tried either hide too much or don't give reproducible results, and they're cloud-based.
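For context on the underlying library, this is roughly the kind of code the AutoML path generates and runs: a minimal sketch using the open-source mljar-supervised API, with a placeholder dataset and target column.

```python
# Minimal mljar-supervised example; the file and column names are placeholders.
import pandas as pd
from supervised.automl import AutoML

df = pd.read_csv("churn.csv")
X = df.drop(columns=["churned"])
y = df["churned"]

automl = AutoML(mode="Explain", total_time_limit=300)  # quick, explainability-focused run
automl.fit(X, y)
predictions = automl.predict(X)
```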

Demos:

- 60-second demo: https://youtu.be/BjxpZYRiY4c

- Full 3-minute analysis: https://youtu.be/1DHMMxaNJxI

Pricing is $199 one-time, with a 7-day trial.

Curious if this is useful for others doing real data work, or if I’m solving my own problem here.

Happy to answer questions.

How fast is a macOS VM, and how small could it be?

2026-05-02 @ 09:30:49 · Points: 174 · Comments: 65

Show HN: Browser-based light pollution simulator using real photometric data

2026-05-02 @ 09:08:18 · Points: 34 · Comments: 11

The atmospheric scattering model is currently single-scattering Rayleigh+Mie. Is that defensible for the use case, or should I move toward multi-scattering?

The Bistro test scene works well visually but isn't a controlled environment. Anyone know of a public urban geometry asset that's more typical of real road-lighting evaluation?

The CJJ 45 implementation (China's national road lighting standard) is the only one I've had to reverse-engineer from translated PDFs. If anyone has primary-source experience with it, I'd value a sanity check.
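For readers unfamiliar with the model mentioned above: single-scattering sky models typically combine a Rayleigh phase function for molecules with a Henyey-Greenstein approximation for aerosol (Mie) scattering. A rough sketch of those two standard forms (not the project's actual Rust code):

```python
# Illustrative phase functions behind a single-scattering Rayleigh+Mie model.
# Textbook forms only; not the simulator's implementation.
import numpy as np

def rayleigh_phase(cos_theta):
    # Rayleigh (molecular) scattering: 3/(16*pi) * (1 + cos^2(theta))
    return 3.0 / (16.0 * np.pi) * (1.0 + cos_theta ** 2)

def henyey_greenstein_phase(cos_theta, g=0.76):
    # Common aerosol (Mie) approximation; g ~ 0.76 is a typical asymmetry factor
    return (1.0 - g ** 2) / (4.0 * np.pi * (1.0 + g ** 2 - 2.0 * g * cos_theta) ** 1.5)

theta = np.linspace(0.0, np.pi, 5)
print(rayleigh_phase(np.cos(theta)))
print(henyey_greenstein_phase(np.cos(theta)))
```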

Open-source on GitHub (eulumdat-rs and the related crates). Crates.io: eulumdat

Show HN: Filling PDF forms with AI using client-side tool calling

2026-05-02 @ 08:54:27 · Points: 41 · Comments: 20

I built SimplePDF Copilot: an AI assistant that can interact with the PDF editor. It fills fields, answers questions, focuses on a specific field, adds fields, deletes pages, and so on.

It's built on top of SimplePDF, a project I started 7 years ago to pioneer privacy-respecting client-side PDF editing, now used monthly by 200k+ people.

As for the privacy model: the PDF itself never leaves the browser. Parsing, rendering, and field detection all run client-side.

The text the model needs (and your messages) goes to whatever LLM you point at. By default that's our demo proxy (DeepSeek V4 Flash, rate-capped), but you can BYOK and point it at any cloud provider, or go fully local (I've been testing with LM Studio).

Unlike the existing "Chat with PDF" tools that only retrieve the text/OCR layer, Copilot can act on the PDF: filling fields, adding fields (detected client-side using CommonForms by Joe Barrow [1], jbarrow on HN, with some post-processing heuristics I added on top), focusing on fields, deleting pages, and so on.

I built this because SimplePDF is mostly used by healthcare customers where document privacy is paramount, and I wanted an AI experience that didn't require shipping PII to a third party. Stack is pretty standard:

- Tanstack Start

- AI SDK from Vercel

- Tailwind (I personally prefer CSS modules, I'm old-school, but since I'm open-sourcing this, I figured Tailwind would be a better fit)

The more interesting part is the client-side tool calling: events are passed back and forth via iframe postMessage.

If you're not familiar with "tool calling" and "client-side tool calling", a quick primer:

Tool calling is what LLMs use to take actions. When Claude runs grep or ls, or hits an MCP server, those are tool calls.

Client-side tool calling means the intent to call a tool comes from the LLM, but the execution happens in the browser.

That matters for two things: speed (you can't go faster than client-to-client operations) and the ability to limit the data you expose to the LLM. For the demo I do feed the content of the document to the LLM, but that connection could be severed as simply as removing the tool that exposes the content data.
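To make the primer concrete, here is a minimal, language-agnostic sketch of that dispatch loop (shown in Python for brevity; the tool name and JSON shape are illustrative, not SimplePDF's actual API):

```python
# Sketch of client-side tool calling: the model only emits intent (a JSON tool
# call); execution happens locally, and only the result goes back to the model.
import json

def fill_field(name: str, value: str) -> str:
    # In the real product this would act on the PDF editor in the browser.
    return f"filled '{name}' with '{value}'"

TOOLS = {"fill_field": fill_field}

def run_tool_call(model_output: str) -> str:
    call = json.loads(model_output)
    result = TOOLS[call["tool"]](**call["arguments"])
    return json.dumps({"tool_result": result})  # fed back as the next model input

print(run_tool_call('{"tool": "fill_field", "arguments": {"name": "dob", "value": "1990-01-01"}}'))
```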

The demo is fully open source, available on GitHub [2], and it's the same demo linked from this post [3].

What's not open source is SimplePDF itself (loaded as the iframe).

I could go on and on about this, so let me know if you have any questions, anything goes!

[1] https://github.com/jbarrow/commonforms

[2] https://github.com/SimplePDF/simplepdf-embed/tree/main/copil...

[3] https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228...

Show HN: Large Scale Article Extract of Newspapers 1730s-1960s

2026-05-02 @ 08:42:45 · Points: 40 · Comments: 18

Problem: I wanted to search through newspaper archives, but every service I tried only lets you search for keywords and dates, and gives you back raw images of the papers, too many of them and with no context. A sea of noise.

Solution: I taught machines how to read the newspapers, and so far I've extracted the content from > 600k pages (about 5TB) from the Chronicling America collection.

Problems I had to deal with were an infinite variety of layouts, font sizes, image scan qualities, resolutions, aspect ratios, and navigating around the images on the page. I also had to figure out how to get OCR to be nearly perfect so people wouldn't hate reading the extracts.

I stitched together a multi-model pipeline (layout tech, ocr tech, llm, vllm) with heuristics to go from layout -> segmentation -> classification. I put it all in OpenSearch / Postgres, made it semantically searchable, and also put an agentic search tool on top that knows how to use the API really well and helps you write queries to find what you're looking for. Happy to discuss AWS architecture and scaling as well, that was tough!
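As a rough illustration of that layout -> segmentation -> classification flow, here is a hypothetical skeleton; every function is a stub standing in for a model or heuristic, not the author's actual pipeline code.

```python
# Hypothetical per-page extraction skeleton; all stages are stubs.
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    body: str
    category: str

def detect_layout(page_image) -> list:
    return []  # e.g. bounding boxes for columns, headlines, ads

def segment_articles(page_image, regions) -> list:
    return []  # group regions into (title, body) candidates and OCR them

def classify(article_text: str) -> str:
    return "news"  # e.g. news / ad / notice, via an LLM or classifier

def extract_page(page_image) -> list:
    regions = detect_layout(page_image)
    articles = []
    for title, body in segment_articles(page_image, regions):
        articles.append(Article(title, body, classify(body)))
    return articles  # then indexed into OpenSearch/Postgres for semantic search
```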

If you have five minutes and you just want to jump in and have your own personalized experience, what I would suggest is:

- Before searching for anything, go to the Sleuth page.

- Ask it about anything from 1736 to 1963, maybe 1 or 2 follow-up questions.

- Then go to the search page so you can see the queries it wrote for you (bottom left, "saved queries") and uncover more info on whatever it is you're interested in.

If you think it's cool and you want to learn more, there's about 10 minutes of video guides on the various capabilities under "Guide" in the nav bar.

Some other people have also taken a crack at this, notably:

- https://dell-research-harvard.github.io/resources/americanst... (very good attempt)

- https://labs.loc.gov/work/experiments/newspaper-navigator/ (focused on images)

Why are there both TMP and TEMP environment variables? (2015)

2026-05-02 @ 08:23:23 · Points: 153 · Comments: 74

CollectWise (YC F24) Is Hiring

2026-05-02 @ 04:43:20 · Points: 1

Why does it take so long to release black fan versions?

2026-05-02 @ 04:38:04 · Points: 575 · Comments: 246

Ask.com has closed

2026-05-02 @ 04:12:35 · Points: 400 · Comments: 205

TI-84 Evo

2026-05-01 @ 20:06:59 · Points: 537 · Comments: 439

New research suggests people can communicate and practice skills while dreaming

2026-05-01 @ 17:47:42 · Points: 416 · Comments: 244

DeepSeek V4: almost on the frontier, a fraction of the price

2026-05-01 @ 16:52:43 · Points: 370 · Comments: 232

I'm Peter Roberts, immigration attorney who does work for YC and startups. AMA

2026-05-01 @ 15:07:02 · Points: 186 · Comments: 233

SFO Gate Explorer

2026-04-30 @ 18:01:14 · Points: 28 · Comments: 31

Dotcl: Common Lisp Implementation on .NET

2026-04-30 @ 16:33:24 · Points: 118 · Comments: 19

Show HN: Pollen – distributed WASM runtime, no control plane, single binary

2026-04-30 @ 13:15:04 · Points: 71 · Comments: 35

Bitmap and tilemap generation from a single example

2026-04-30 @ 11:51:02 · Points: 62 · Comments: 13

Artemis II Photo Timeline

2026-04-29 @ 20:48:17 · Points: 305 · Comments: 25

Why IPv6 is so complicated

2026-04-29 @ 19:37:38 · Points: 33 · Comments: 65

To Restore an Island Paradise, Add Fungi

2026-04-29 @ 15:28:55 · Points: 116 · Comments: 31

Show HN: DAC – open-source dashboard as code tool for agents and humans

2026-04-29 @ 14:37:20 · Points: 75 · Comments: 22

When agents became a reality, one of the first things I wanted to do was automate building dashboards. The first, and most obvious, wall I ran into was that a lot of the tools are driven purely by UI. That meant that unless the agents could drive a browser UI, they couldn't build dashboards at all. In addition, it would be impossible to review any of the changes the agent would make.

The first instinct there is to get your agent to build a React app for the dashboard. This works beautifully for the happy path, but I quickly ran into other issues there:

- every dashboard turns out to be different

- have to implement a backend to centralize the query execution

- there is no centralized mechanism to control the rules and standards around visualizations

- there is no way to get a semantic layer working with the dashboards easily

In the end, agents ended up reinventing the wheel for every new dashboard, even under the same project. Building a standardized, local project for these turned out to be building a BI tool from scratch.

After trying these out, I asked myself: what if the dashboards were built for agents as the primary user?

A product like that would need to have a couple of features:

- First of all, everything needs to be driven by version-controllable text. YAML is fine.

- Changes to the dashboards should be easy to review and understand by humans.

- Agents are great at writing code, so it'd be great if this were driven by code to allow dynamic stuff: JSX would be great.

- Static analysis as a first-class citizen: validate dashboards before deploying. Agents can check their work too.

- A standardized way of deploying these based on a couple of files in a folder: operationally very simple.

- Built-in semantic layer to standardize metrics.

That's what I ended up building: dac (Dashboard-As-Code) is an open-source tool and a spec to define dashboards, well, as code. It contains an implementation in Go that can be deployed as a single binary anywhere. The dashboards are defined in YAML and JSX, YAML for static stuff, JSX for dynamic dashboards. You can run queries at load time to define conditional charts, generate tabs on the fly per customer, or list charts for each A/B test you are running.

I built it in Go because I do love Go, and I think it is the greatest language at the moment to work with AI agents.

dac runs as a single binary; you can get started with the `dac init` command, and it'll automatically create some sample dashboards for you based on DuckDB. It supports 10+ SQL backends, with more to come, along with validation, custom themes, and whatnot.
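As a toy illustration of the validate-before-deploy idea, here is a hedged sketch of a pre-deploy check; this is not dac's actual schema, CLI, or validator, and the required keys and file name are made up.

```python
# Toy pre-deploy validation; REQUIRED_KEYS is a hypothetical schema, not dac's spec.
import sys
import yaml

REQUIRED_KEYS = {"name", "datasource", "charts"}  # made-up keys for illustration

def validate(path: str) -> list:
    with open(path) as f:
        doc = yaml.safe_load(f)
    missing = REQUIRED_KEYS - set(doc or {})
    return [f"missing key: {k}" for k in sorted(missing)]

if __name__ == "__main__":
    errors = validate(sys.argv[1])  # e.g. a hypothetical dashboard.yaml
    for e in errors:
        print(e)
    sys.exit(1 if errors else 0)  # agents (and CI) can gate deploys on the exit code
```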

You can see it here: https://github.com/bruin-data/dac

I would love to hear what can be improved here, please let me know your thoughts.

Barman – Backup and Recovery Manager for PostgreSQL

2026-04-29 @ 13:54:48 · Points: 80 · Comments: 14
