Hacker News

Latest

Show HN: RunAnywhere – Faster AI Inference on Apple Silicon

2026-03-10 @ 17:14:52 · Points: 64 · Comments: 13

We've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.

To get started:

  brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
  brew install rcli
  rcli setup   # downloads ~1 GB of models
  rcli         # interactive mode with push-to-talk
Or:

  curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):

LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):

  - Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
  - Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87)
  - LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372)
  - Time-to-first-token: 6.6 ms

STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time. 4.6x faster than mlx-whisper.

TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.

We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.

The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.
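The compounding arithmetic is simple but unforgiving. A toy illustration in Python (the per-stage numbers are just the ones quoted in this post, not a measurement):

```python
# In a serial voice turn (STT -> LLM -> TTS), the user hears nothing until
# every stage has finished, so per-stage latencies simply add up.
def turn_latency_ms(stt_ms: float, llm_ttft_ms: float, tts_ms: float) -> float:
    """Total perceived latency of one serial voice-pipeline turn."""
    return stt_ms + llm_ttft_ms + tts_ms

# Three stages that each look "fine" at 200 ms feel broken together:
slow = turn_latency_ms(200, 200, 200)      # 600 ms before the first word
# whereas the figures above (101 ms STT, 6.6 ms TTFT, 178 ms TTS)
# keep the whole turn under 300 ms:
fast = turn_latency_ms(101, 6.6, 178)      # 285.6 ms
```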

We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.

MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:

LLM benchmarks: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

Speech benchmarks: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...

How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. Custom Metal compute shaders for quantized matmul, attention, and activation - compiled ahead of time, dispatched directly.

Voice pipeline optimization details: https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai...

RAG optimizations: https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...

RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.
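The key property of a single-producer/single-consumer ring buffer is that each index has exactly one writer, so the threads never need a lock. A minimal Python sketch of that idea (RCLI's actual implementation isn't shown here, and a real lock-free version relies on atomic index updates rather than Python semantics):

```python
# Single-producer/single-consumer ring buffer: the producer only writes
# `tail`, the consumer only writes `head`, so neither index is contended.
# One slot is left empty to distinguish "full" from "empty".
class RingBuffer:
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0  # advanced only by the consumer
        self.tail = 0  # advanced only by the producer

    def push(self, item) -> bool:
        """Producer side: returns False when the buffer is full."""
        nxt = (self.tail + 1) % self.capacity
        if nxt == self.head:
            return False
        self.buf[self.tail] = item
        self.tail = nxt
        return True

    def pop(self):
        """Consumer side: returns None when the buffer is empty."""
        if self.head == self.tail:
            return None
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        return item
```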

Source: https://github.com/RunanywhereAI/RCLI (MIT)

Demo: https://www.youtube.com/watch?v=eTYwkgNoaKg

What would you build if on-device AI were genuinely as fast as cloud?

I built a programming language using Claude Code

2026-03-10 @ 16:37:29 · Points: 33 · Comments: 41

$3 ChromeOS Flex stick will revive old and outdated computers

2026-03-10 @ 16:24:26 · Points: 26 · Comments: 16

Launch HN: Didit (YC W26) – Stripe for Identity Verification

2026-03-10 @ 15:08:05 · Points: 32 · Comments: 37

I founded Didit (https://didit.me) with my identical twin brother Alejandro. We are building a unified identity layer—a single integration that handles KYC, AML, biometrics, authentication, and fraud prevention globally. Here’s a demo: https://www.youtube.com/watch?v=eTdcg7JCc4M&t=7s.

Being identical twins, we’ve spent our whole lives dealing with identity confusion, so it’s a bit ironic that we ended up building a company to solve it for the internet.

Growing up in Barcelona, we spent years working on products where identity issues were a massive pain. We eventually realized that for most engineering teams, "global identity" is a fiction—in reality it is a fragmented mess. You end up stitching together one provider for US driver's licenses, another for NFC chip extraction in Europe, a third for AML screening, a fourth for government database validation in Brazil, a fifth for liveness detection on low-end Android devices, and yet another for biometric authentication and age estimation. Orchestrating these into a cohesive flow while adapting to localized regulations like GDPR or CCPA is a nightmare that makes no sense for most teams to be working on.

When we looked at the existing "enterprise" solutions, we were baffled. Most require a three-week sales cycle just to see a single page of documentation. Pricing is hidden behind "Contact Us" buttons, and the products themselves are often bloated legacy systems with high latency and abysmal accuracy.

We also noticed a recurring pattern: these tools are frequently optimized only for the latest iOS hardware, performing poorly on the mid-range or older Android devices that make up a huge percentage of the market. This results in a "leaky" funnel where legitimate users drop off due to technical friction and fraud goes undetected because data points are spread across disparate systems. Also, these systems are expensive, often requiring massive annual commits that price out early-stage startups.

We wanted to build a system that is accessible to everyone—a tool that works like Stripe for identity, where you can get a sandbox key in thirty seconds and start running real verifications with world-class UX and transparent pricing.

To solve this, we took the "delusional" path of full vertical integration. Rather than just wrapping existing APIs, we built our own ID verification and biometric AI models—from classification and fraud detection to OCR models for almost every language. This vertical integration is fundamental to how we handle user data. Because we own the entire stack, we control the flow of sensitive information from end-to-end. Your users' data doesn't get bounced around through a chain of third-party black boxes or regional middle-men. This allows us to provide a level of security and privacy that is impossible when you are just an orchestration layer for other people's APIs.

We believe that identity verification is one of the most critical problems on the internet, and must be solved correctly and ethically. Many people are rightfully skeptical, especially given recent news about projects that have turned identity into a tool for mass data collection or surveillance. We don’t do anything of the sort, but we also don’t want to be coerced in the future, so we facilitate data minimization on the customer side. Instead of a business asking for a full ID scan, we allow them to simply verify a specific attribute—like "is this person over 18?"—without ever seeing the document itself. Our goal is to move the industry away from data hoarding and toward zero knowledge, or at least minimal knowledge, verification.
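Attribute verification amounts to answering a predicate about the document rather than disclosing the document. A rough Python sketch of the shape of that idea (all names and structures here are illustrative, not Didit's actual API):

```python
from datetime import date

# Attribute-based verification sketch: the business receives only the
# boolean answer to a predicate such as "over 18?", never the underlying
# document fields. Illustrative only — not Didit's API.
def is_over(birth_date: date, years: int, today: date) -> bool:
    """Age in completed years, compared against the threshold."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    return age >= years

def verify_attribute(document: dict, today: date) -> dict:
    # Only this minimal answer leaves the verifier; the ID scan does not.
    return {"over_18": is_over(document["birth_date"], 18, today)}
```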

The result of our all-in-one approach is a platform that increases onboarding rates while lowering identity costs. We’ve focused on building a high-confidence automated loop that reduces the need for manual review by up to 90%, catching sophisticated deepfakes and spoofing attempts that standard vision models miss. Our SDK is optimized for low bandwidth connections, ensuring it works on spotty 3G networks where legacy providers usually fail.

We are fully live, and you can jump into the dashboard at https://business.didit.me to see the workflow orchestration immediately. Our pricing is transparent and success-based; we don’t believe in hiding costs behind a sales call.

We’re here all day to answer any question—whether it’s about how we handle NFC verification, our approach to deepfake detection, the general ethics behind biometric data retention, or how we think about the future of identity. We’d love your brutal HN feedback on our APIs, platform, and integration flow!

Amazon is holding a mandatory meeting about AI breaking its systems

2026-03-10 @ 15:01:35 · Points: 250 · Comments: 162

Debian decides not to decide on AI-generated contributions

2026-03-10 @ 14:53:13 · Points: 149 · Comments: 116

We are building data breach machines and nobody cares

2026-03-10 @ 14:50:43 · Points: 18 · Comments: 6

Tony Hoare has died

2026-03-10 @ 14:50:16 · Points: 768 · Comments: 82

Meta acquires Moltbook

2026-03-10 @ 14:38:06 · Points: 211 · Comments: 133

Rebasing in Magit

2026-03-10 @ 13:38:39 · Points: 130 · Comments: 92

Sending Jabber/XMPP Messages via HTTP

2026-03-10 @ 13:29:21 · Points: 42 · Comments: 5

Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs

2026-03-10 @ 13:18:55 · Points: 137 · Comments: 47

Show HN: DD Photos – open-source photo album site generator (Go and SvelteKit)

2026-03-10 @ 13:13:48 · Points: 44 · Comments: 11

So I built DD Photos. You export photos from whatever you already use (Lightroom, Apple Photos, etc.) into folders, run `photogen` (a Go CLI) to resize them to WebP and generate JSON indexes, then deploy the SvelteKit static site anywhere that serves files. Apache, S3, whatever. No server-side code, no database.
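The no-database approach works because the frontend only ever reads a static JSON index. A minimal Python sketch of the idea behind that index step (photogen itself is written in Go, and its real field names may differ; see the repo for the actual format):

```python
# Build a static JSON index for one album from a directory listing.
# The SvelteKit frontend can fetch this file like any other static asset —
# no server-side code, no database. Field names here are illustrative.
import json

def build_index(album: str, filenames: list[str]) -> str:
    photos = sorted(n for n in filenames if n.lower().endswith(".webp"))
    return json.dumps(
        {"album": album, "count": len(photos), "photos": photos},
        indent=2,
    )

print(build_index("barcelona-2025", ["b.webp", "a.webp", "notes.txt"]))
```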

Built over several weeks with heavy use of Claude Code, which I found genuinely useful for this kind of full-stack project spanning Go, SvelteKit/TypeScript, Apache config, Docker, and Playwright tests. Happy to discuss that experience too.

Live example: https://photos.donohoe.info

Repo: https://github.com/dougdonohoe/ddphotos

Intel Demos Chip to Compute with Encrypted Data

2026-03-10 @ 13:10:48 · Points: 157 · Comments: 50

Online age-verification tools for child safety are surveilling adults

2026-03-10 @ 12:55:42 · Points: 337 · Comments: 196

Traffic from Russia to Cloudflare is 60% down from last year

2026-03-10 @ 12:55:14 · Points: 98 · Comments: 52

PgAdmin 4 9.13 with AI Assistant Panel

2026-03-10 @ 11:58:36 · Points: 68 · Comments: 20

Yann LeCun's AI startup raises $1B in Europe's largest ever seed round

2026-03-10 @ 10:50:30 · Points: 386 · Comments: 207

I put my whole life into a single database

2026-03-10 @ 10:07:48 · Points: 343 · Comments: 162

LoGeR – 3D reconstruction from extremely long videos (DeepMind, UC Berkeley)

2026-03-10 @ 06:16:06 · Points: 122 · Comments: 26

Two Years of Emacs Solo

2026-03-10 @ 00:16:44 · Points: 330 · Comments: 123

No, it doesn't cost Anthropic $5k per Claude Code user

2026-03-09 @ 23:22:06 · Points: 406 · Comments: 296

I used pulsar detection techniques to turn a phone into a watch timegrapher

2026-03-07 @ 19:54:26 · Points: 22 · Comments: 4

A New Version of Our Oracle Solaris Environment for Developers

2026-03-07 @ 18:56:56 · Points: 36 · Comments: 22

The Gervais Principle, or the Office According to "The Office" (2009)

2026-03-07 @ 11:28:37 · Points: 226 · Comments: 96

How many options fit into a boolean?

2026-03-07 @ 05:31:13 · Points: 33 · Comments: 17

TCXO Failure Analysis

2026-03-07 @ 04:49:38 · Points: 92 · Comments: 39

Practical Guide to Bare Metal C++

2026-03-07 @ 04:22:20 · Points: 96 · Comments: 34

Lotus 1-2-3 on the PC with DOS

2026-03-06 @ 19:11:24 · Points: 164 · Comments: 64

Caxlsx: Ruby gem for xlsx generation with charts, images, schema validation

2026-03-06 @ 15:59:29 · Points: 60 · Comments: 4
