Hacker News

Latest

Show HN: Mljar Studio – local AI data analyst that saves analysis as notebooks

2026-05-02 @ 10:21:31 | Points: 19 | Comments: 1

I’ve been working on mljar-supervised (open-source AutoML for tabular data) for a few years. Recently I built a desktop app around it called MLJAR Studio.

The idea is simple: you talk to your data in natural language, the AI generates Python code, executes it locally, and the whole conversation becomes a reproducible notebook (*.ipynb file). So instead of just chatting with data, you end up with something you can inspect, modify, and rerun.
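To make the "conversation becomes a notebook" idea concrete, here is a minimal sketch that turns a list of (prompt, generated code) pairs into an nbformat 4 notebook structure using only the standard library. This is not MLJAR Studio's actual implementation; the function name `conversation_to_notebook`, the sample prompt, and the file name `sales.csv` are assumptions made for illustration.

```python
import json

def conversation_to_notebook(turns):
    """Build an nbformat-4 notebook dict from (prompt, code) pairs.

    Each prompt becomes a markdown cell and each generated snippet
    becomes an unexecuted code cell, so the chat can be rerun later.
    """
    cells = []
    for prompt, code in turns:
        cells.append({"cell_type": "markdown", "metadata": {}, "source": prompt})
        cells.append({
            "cell_type": "code",
            "execution_count": None,  # not run yet
            "metadata": {},
            "outputs": [],
            "source": code,
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

# Hypothetical conversation turn; 'sales.csv' is a made-up file name.
turns = [
    (
        "Load the sales data and show the first rows",
        "import pandas as pd\ndf = pd.read_csv('sales.csv')\ndf.head()",
    ),
]
notebook = conversation_to_notebook(turns)
notebook_json = json.dumps(notebook, indent=1)
```

Writing `notebook_json` to a file with an `.ipynb` extension should give something Jupyter can open, inspect, and rerun.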

What MLJAR Studio does:

- Sets up a local Python environment automatically; runs on Mac, Windows, and Linux

- Installs missing packages during the conversation

- Built-in AutoML for tabular data (binary classification, multiclass classification, regression)

- Works with standard Python libraries (pandas, matplotlib, etc.)

- Works with any data file: CSV, Excel, Stata, Parquet ...

- Connects to PostgreSQL, MySQL, SQL Server, Snowflake, Databricks, and Supabase

For AI: use Ollama locally (zero data egress), bring your own OpenAI key, or use the MLJAR AI add-on.

I built this because I wanted something between Jupyter Notebook (flexible but manual) and AI tools that generate code but don’t preserve the workflow. Most tools I tried either hide too much or don’t give reproducible results, and most are cloud-based.

Demos:

- 60-second demo: https://youtu.be/BjxpZYRiY4c

- Full 3-minute analysis: https://youtu.be/1DHMMxaNJxI

Pricing is $199 one-time, with a 7-day trial.

Curious if this is useful for others doing real data work, or if I’m solving my own problem here.

Happy to answer questions.

How fast is a macOS VM, and how small could it be?

2026-05-02 @ 09:30:49 | Points: 50 | Comments: 11

Show HN: Browser-based light pollution simulator using real photometric data

2026-05-02 @ 09:08:18 | Points: 20 | Comments: 3

The atmospheric scattering model is currently single-scattering Rayleigh+Mie. Is that defensible for the use case, or should I move toward multi-scattering? The Bistro test scene works well visually but isn't a controlled environment. Anyone know of a public urban geometry asset that's more typical of real road-lighting evaluation? The CJJ 45 implementation (China's national road lighting standard) is the only one I've had to reverse-engineer from translated PDFs. If anyone has primary-source experience with it, I'd value a sanity check.
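For readers unfamiliar with single-scattering Rayleigh+Mie, here is a minimal sketch of the two phase functions such a model typically combines. It is not taken from the project: the Henyey-Greenstein function is a common approximation for Mie scattering and is swapped in here as an assumption, and the names `beta_r`, `beta_m`, and `g` (scattering coefficients and anisotropy) are illustrative parameters.

```python
import math

def rayleigh_phase(cos_theta: float) -> float:
    """Rayleigh phase function, normalized to integrate to 1 over the sphere."""
    return 3.0 / (16.0 * math.pi) * (1.0 + cos_theta * cos_theta)

def henyey_greenstein_phase(cos_theta: float, g: float) -> float:
    """Henyey-Greenstein phase function (assumed stand-in for Mie scattering)."""
    denom = (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * math.pi * denom)

def integrate_over_sphere(phase, steps: int = 20000) -> float:
    """Midpoint-rule integral of an azimuthally symmetric phase function."""
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        # Solid-angle element for a ring at polar angle theta.
        total += phase(math.cos(theta)) * 2.0 * math.pi * math.sin(theta) * d_theta
    return total

def single_scatter_weight(cos_theta: float, beta_r: float, beta_m: float, g: float) -> float:
    """Light scattered once toward the viewer per unit path length.

    Single scattering sums the two contributions and ignores light
    that has already been scattered before reaching this point.
    """
    return beta_r * rayleigh_phase(cos_theta) + beta_m * henyey_greenstein_phase(cos_theta, g)

rayleigh_total = integrate_over_sphere(rayleigh_phase)
mie_total = integrate_over_sphere(lambda c: henyey_greenstein_phase(c, 0.76))
```

Both totals should come out close to 1, which is a quick sanity check that the phase functions are normalized. A multi-scattering model would add contributions from light that has already been scattered at least once, which this sketch leaves out.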

Open-source on GitHub (eulumdat-rs and the related crates). Crates.io: eulumdat

Show HN: Filling PDF forms with AI using client-side tool calling

2026-05-02 @ 08:54:27 | Points: 18 | Comments: 9

I built SimplePDF Copilot: an AI assistant that can interact with the PDF editor. It fills fields, answers questions, focuses on a specific field, adds fields, deletes pages, and so on.

It's built on top of SimplePDF, which I started 7 years ago to pioneer privacy-respecting client-side PDF editing. It's now used by 200k+ people monthly.

As for the privacy model: the PDF itself never leaves the browser. Parsing, rendering, and field detection all run client-side.

The text the model needs (and your messages) goes to whatever LLM you point at. By default that's our demo proxy (DeepSeek V4 Flash, rate-capped), but you can BYOK and point it at any cloud provider, or go fully local (I've been testing with LM Studio).

Unlike the existing "Chat with PDF" tools that only retrieve the text/OCR layer, Copilot can act on the PDF: filling fields, adding fields (detected client-side using CommonForms [1] by Joe Barrow, jbarrow on HN, with some post-processing heuristics I added on top), focusing on fields, deleting pages, and so on.

I built this because SimplePDF is mostly used by healthcare customers where document privacy is paramount, and I wanted an AI experience that didn't require shipping PII to a third party. Stack is pretty standard:

- Tanstack Start

- AI SDK from Vercel

- Tailwind (I personally prefer CSS modules, I'm old-school, but since the goal was to open source it, I figured Tailwind would be a better fit)

The more interesting part is the client-side tool calling: events are passed back and forth via iframe postMessage.

If you're not familiar with "tool calling" and "client-side tool calling", a quick primer:

Tool calling is what LLMs use to take actions. When Claude runs grep or ls, or hits an MCP server, those are tool calls.

Client-side tool calling means the intent to call a tool comes from the LLM, but the execution happens in the browser.

That matters for two reasons: speed, since you can't go faster than client-to-client operations, and control, since it lets you limit the data you expose to the LLM. For the demo I do feed the content of the document to the LLM, but that connection could be severed simply by removing the tool that exposes the content data.
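The primer above can be illustrated with a small in-process sketch. The real demo runs in the browser and passes events over iframe postMessage; this Python version only simulates the pattern, and every name in it (`fill_field`, `get_content`, `handle_tool_call`, the `document` dict) is hypothetical rather than part of SimplePDF's API.

```python
from typing import Callable, Dict

# Local state that stays on the "client" (stands in for the PDF in the browser).
document = {
    "fields": {"name": "", "date": ""},
    "text": "Patient intake form ...",
}

def fill_field(name: str, value: str) -> dict:
    """Tool: write a value into a form field, entirely on the client."""
    if name not in document["fields"]:
        return {"ok": False, "error": f"unknown field: {name}"}
    document["fields"][name] = value
    return {"ok": True}

def get_content() -> dict:
    """Tool: expose the document text to the model."""
    return {"ok": True, "text": document["text"]}

TOOLS: Dict[str, Callable[..., dict]] = {
    "fill_field": fill_field,
    "get_content": get_content,
}

def handle_tool_call(call: dict, tools: Dict[str, Callable[..., dict]] = TOOLS) -> dict:
    """Execute a tool call requested by the model; only the result goes back."""
    handler = tools.get(call["tool"])
    if handler is None:
        return {"ok": False, "error": f"tool not available: {call['tool']}"}
    return handler(**call.get("args", {}))

# The model only states the intent; the client performs the action.
fill_result = handle_tool_call(
    {"tool": "fill_field", "args": {"name": "name", "value": "Ada"}}
)

# Dropping the content tool means the model can no longer request the text.
restricted_tools = {k: v for k, v in TOOLS.items() if k != "get_content"}
blocked_result = handle_tool_call({"tool": "get_content"}, tools=restricted_tools)
```

The design point is that the tool registry is the exposure surface: whatever is not registered cannot be requested, however the model phrases the call.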

The demo is fully open source and available on GitHub [2]; the live demo is the link of this post [3].

What's not open source is SimplePDF itself (loaded as the iframe).

I could talk on and on about this, let me know if you have any questions, anything goes!

[1] https://github.com/jbarrow/commonforms

[2] https://github.com/SimplePDF/simplepdf-embed/tree/main/copil...

[3] https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228...

Show HN: Large Scale Article Extract of Newspapers 1730s-1960s

2026-05-02 @ 08:42:45 | Points: 12 | Comments: 5

Problem: I wanted to search through newspaper archives, but every service I tried only lets you search by keyword and date and gives you back raw images of the papers, too many of them and with no context. A sea of noise.

Solution: I taught machines how to read the newspapers, and so far I've extracted the content from more than 600k pages (about 5TB) from the Chronicling America collection. The problems I had to deal with were an infinite variety of layouts, font sizes, scan qualities, resolutions, and aspect ratios, plus navigating around the images on the page. I also had to figure out how to get OCR to be nearly perfect so people wouldn't hate reading the extracts. I stitched together a multi-model pipeline (layout tech, OCR tech, LLM, vllm) with heuristics to go from layout -> segmentation -> classification. I put it all in OpenSearch / Postgres, made it semantically searchable, and put an agentic search tool on top that knows how to use the API really well and helps you write queries to find what you're looking for. Happy to discuss AWS architecture and scaling as well; that was tough!

If you have five minutes and you just want to jump in and have your own personalized experience, what I would suggest is:

- Before searching for anything, go to the Sleuth page

- Ask it about anything from 1736 to 1963, maybe 1 or 2 follow-up questions

- Then go to the search page so you can see the queries it wrote for you (bottom left, "saved queries") and uncover more info on whatever it is you're interested in

If you think it's cool and you want to learn more, there are about 10 minutes of video guides on the various capabilities under "Guide" on the nav bar.

Some other people have also taken a crack at this, notably:

- https://dell-research-harvard.github.io/resources/americanst... (very good attempt)

- https://labs.loc.gov/work/experiments/newspaper-navigator/ (focused on images)

Why are there both TMP and TEMP environment variables? (2015)

2026-05-02 @ 08:23:23 | Points: 53 | Comments: 23

SKILL.make: Makefile Styled Skill File

2026-05-02 @ 08:18:15 | Points: 28 | Comments: 16

Show HN: Stop playing my matchstick puzzles, start building your own in seconds

2026-05-02 @ 05:04:39 | Points: 18 | Comments: 17

CollectWise (YC F24) Is Hiring

2026-05-02 @ 04:43:20 | Points: 1

Why does it take so long to release black fan versions?

2026-05-02 @ 04:38:04 | Points: 303 | Comments: 141

Ask.com has closed

2026-05-02 @ 04:12:35 | Points: 297 | Comments: 152

K3k: Kubernetes in Kubernetes

2026-05-02 @ 04:00:42 | Points: 70 | Comments: 39

A report on burnout in open source software communities (2025) [pdf]

2026-05-01 @ 23:24:10 | Points: 82 | Comments: 30

Ti-84 Evo

2026-05-01 @ 20:06:59 | Points: 473 | Comments: 399

Lib0xc: A set of C standard library-adjacent APIs for safer systems programming

2026-05-01 @ 19:10:56 | Points: 152 | Comments: 57

New research suggests people can communicate and practice skills while dreaming

2026-05-01 @ 17:47:42 | Points: 360 | Comments: 207

DeepSeek V4–almost on the frontier, a fraction of the price

2026-05-01 @ 16:52:43 | Points: 162 | Comments: 78

Apocalypse Early Warning System

2026-05-01 @ 16:21:23 | Points: 205 | Comments: 97

I'm Peter Roberts, immigration attorney who does work for YC and startups. AMA

2026-05-01 @ 15:07:02 | Points: 169 | Comments: 221

Ask HN: Who is hiring? (May 2026)

2026-05-01 @ 15:00:07 | Points: 267 | Comments: 284

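Please state the location and include REMOTE for remote work, REMOTE (US) or similar if the country is restricted, and ONSITE when remote work is not an option.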

Please only post if you personally are part of the hiring company—no recruiting firms or job boards. One post per company. If it isn't a household name, explain what your company does.

Please only post if you are actively filling a position and are committed to replying to applicants.

Commenters: please don't reply to job posts to complain about something. It's off topic here.

Readers: please only email if you are personally interested in the job.

Searchers: try https://nthesis.ai/public/hn-who-is-hiring, https://dheerajck.github.io/hnwhoishiring/, http://nchelluri.github.io/hnjobs/, https://hnjobs.emilburzo.com, or this (unofficial) Chrome extension: https://chromewebstore.google.com/detail/hn-hiring-pro/mpfal....

Don't miss this other fine thread: Who wants to be hired? https://news.ycombinator.com/item?id=47975570

Bitmap and tilemap generation from a single example

2026-04-30 @ 11:51:02 | Points: 30 | Comments: 5

A Gopher Meets a Crab

2026-04-30 @ 06:30:53 | Points: 42 | Comments: 43

LFM2-24B-A2B: Scaling Up the LFM2 Architecture

2026-04-30 @ 03:15:42 | Points: 44 | Comments: 9

Eka’s robotic claw feels like we're approaching a ChatGPT moment

2026-04-29 @ 22:56:10 | Points: 149 | Comments: 209

Artemis II Photo Timeline

2026-04-29 @ 20:48:17 | Points: 231 | Comments: 20

Show HN: SimDrive – a browser racing game with your phone as the controller:D

2026-04-29 @ 20:24:49 | Points: 7 | Comments: 4

And it moved to online games as more of us moved to other places and party games became our go-to

Love Jackbox and Gaming Couch (which I discovered here on HN)

I saw the vibej.am for this year and I couldn't think of a good enough idea till about the 3rd week but here I am!

This is a game made for me and my friends, and it's also very inspired by what I thought the PS3 6-axis controller would be when I first heard about it

Thanks for reading this and I hope you and your friends enjoy playing

You can play either split screen or on separate screens with the room code or even have two split screens play too (max 8 players for now)

It started with F1 cars, but I'm working on go-karts, trucks, and tuk-tuks :D

I've tried to include a lot of "simulation" in the game. There's decent physics that includes downforce, grip, etc., and it even changes with or without rain

Ideally you have an Android phone with Chrome so you can "feel" your driving through vibrations/haptics; otherwise you'll have just sound as feedback

Oops long post, bye

Direct electrochemical black coffee quality appraisal using cyclic voltammetry

2026-04-29 @ 18:58:37 | Points: 55 | Comments: 28

The USB Situation

2026-04-29 @ 16:49:13 | Points: 36 | Comments: 44

To Restore an Island Paradise, Add Fungi

2026-04-29 @ 15:28:55 | Points: 72 | Comments: 14

Show HN: DAC – open-source dashboard as code tool for agents and humans

2026-04-29 @ 14:37:20 | Points: 29 | Comments: 4

When agents became a reality one of the first things I wanted to do was to automate building dashboards. The first, and the most obvious, wall that I ran into was that a lot of the tools were just driven by UI. This meant that without the agents handling browser UIs and whatnot, it wasn't possible to have the agents do that. In addition, it would be impossible to review any of the changes the agent would make.

The first instinct there is to get your agent to build a React app for the dashboard. This works beautifully for the happy path, but I quickly ran into other issues there:

- every dashboard turns out to be different

- you have to implement a backend to centralize the query execution

- there is no centralized mechanism to control the rules and standards around visualizations

- there is no way to get a semantic layer working with the dashboards easily

In the end, agents ended up reinventing the wheel for every new dashboard, even under the same project. Building a standardized, local project for these turned out to be building a BI tool from scratch.

After trying these out, I asked myself: what if the dashboards were built for agents as the primary user?

A product like that would need to have a couple of features:

- First of all, everything needs to be driven by version-controllable text. YAML is fine.

- Changes to the dashboards should be easy for humans to review and understand.

- Agents are great at writing code, so it'd be great if this were driven by code for the dynamic stuff: JSX would be great.

- Static analysis as a first-class citizen: validate dashboards before deploying. Agents can check their work too.

- A standardized way of deploying these based on a couple of files in a folder: operationally very simple.

- Built-in semantic layer to standardize metrics.

That's what I ended up building: dac (Dashboard-As-Code) is an open-source tool and a spec to define dashboards, well, as code. It contains an implementation in Go that can be deployed as a single binary anywhere. The dashboards are defined in YAML and JSX, YAML for static stuff, JSX for dynamic dashboards. You can run queries at load time to define conditional charts, generate tabs on the fly per customer, or list charts for each A/B test you are running.

I built it in Go because I do love Go, and I think it is the greatest language at the moment to work with AI agents.

dac runs as a single binary. You can get started with a `dac init` command, and it'll automatically create some sample dashboards for you based on DuckDB. It supports 10+ SQL backends, with more to come. It supports validation, custom themes and whatnot.
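To show what validating a dashboard before deploying can look like, here is a minimal sketch of a static check over a dashboard definition. It does not follow dac's real spec or its Go implementation: the dictionary layout, the key names (`title`, `charts`, `name`, `type`, `metric`), the allowed chart types, and the function `validate_dashboard` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical set of chart types; dac's real list may differ.
ALLOWED_CHART_TYPES = {"line", "bar", "table"}

def validate_dashboard(dashboard: dict, metrics: Dict[str, str]) -> List[str]:
    """Return a list of problems found without running any query.

    `metrics` stands in for a semantic layer: metric name -> SQL expression.
    """
    errors: List[str] = []
    if not dashboard.get("title"):
        errors.append("dashboard is missing a title")
    seen = set()
    for index, chart in enumerate(dashboard.get("charts", [])):
        name = chart.get("name") or f"chart #{index}"
        if name in seen:
            errors.append(f"{name}: duplicate chart name")
        seen.add(name)
        if chart.get("type") not in ALLOWED_CHART_TYPES:
            errors.append(f"{name}: unsupported chart type {chart.get('type')!r}")
        if chart.get("metric") not in metrics:
            errors.append(f"{name}: unknown metric {chart.get('metric')!r}")
    return errors

# Made-up semantic layer and dashboards for the example.
metrics = {"revenue": "SUM(amount)"}
good = {"title": "Sales", "charts": [{"name": "rev", "type": "line", "metric": "revenue"}]}
bad = {
    "title": "Sales",
    "charts": [
        {"name": "rev", "type": "line", "metric": "revenue"},
        {"name": "rev", "type": "pie3d", "metric": "profit"},
    ],
}
good_errors = validate_dashboard(good, metrics)
bad_errors = validate_dashboard(bad, metrics)
```

Here `good_errors` is empty, while `bad_errors` reports the duplicate name, the unsupported chart type, and the unknown metric. A check like this is something an agent can run on its own output before a human reviews the diff.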

You can see it here: https://github.com/bruin-data/dac

I would love to hear what can be improved here, please let me know your thoughts.
