Hacker News

Latest

Faster Asin() Was Hiding in Plain Sight

2026-03-11 @ 14:35:53 · Points: 81 · Comments: 24

UK MPs give ministers powers to restrict Internet for under 18s

2026-03-11 @ 14:08:48 · Points: 65 · Comments: 53

Whistleblower: DOGE member took Social Security data to new job

2026-03-11 @ 13:52:01 · Points: 367 · Comments: 137

Where did you think the training data was coming from?

2026-03-11 @ 13:33:33 · Points: 34 · Comments: 6

The entities enabling scientific fraud at scale are large, resilient and growing

2026-03-11 @ 13:32:12 · Points: 107 · Comments: 37

Lego's 0.002 mm Specification and Its Implications for Manufacturing (2025)

2026-03-11 @ 13:22:39 · Points: 203 · Comments: 139

Microsoft BitNet: 100B Param 1-Bit model for local CPUs

2026-03-11 @ 12:27:15 · Points: 161 · Comments: 87

AI Agent Hacks McKinsey

2026-03-11 @ 09:59:03 · Points: 138 · Comments: 44

Create value for others and don’t worry about the returns

2026-03-11 @ 05:45:49 · Points: 572 · Comments: 392

TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization

2026-03-11 @ 05:42:55 · Points: 81 · Comments: 22

Standardizing source maps

2026-03-11 @ 04:42:23 · Points: 70 · Comments: 8

Writing my own text editor, and daily-driving it

2026-03-11 @ 02:04:22 · Points: 163 · Comments: 79

Zig – Type Resolution Redesign and Language Changes

2026-03-11 @ 01:24:47 · Points: 347 · Comments: 182

Universal vaccine against respiratory infections and allergens

2026-03-10 @ 22:33:48 · Points: 329 · Comments: 121

U+237C ⍼ Is Azimuth

2026-03-10 @ 22:33:45 · Points: 367 · Comments: 70

Cloudflare crawl endpoint

2026-03-10 @ 22:27:15 · Points: 408 · Comments: 154

RISC-V Is Sloooow

2026-03-10 @ 20:11:54 · Points: 289 · Comments: 309

Agents that run while I sleep

2026-03-10 @ 19:09:46 · Points: 386 · Comments: 437

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

2026-03-10 @ 17:14:52 · Points: 231 · Comments: 143

Also, we've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.

To get started:

  brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
  brew install rcli
  rcli setup   # downloads ~1 GB of models
  rcli         # interactive mode with push-to-talk
Or:

  curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):

LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):

- Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87)
- LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372)
- Time-to-first-token: 6.6 ms

STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time. 4.6x faster than mlx-whisper.

TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.

We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.

The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.
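The compounding described above can be sketched with a toy latency model (Python for readability; the 200 ms per-stage figure is the post's illustrative number, not a benchmark, and the stage names are just labels):

```python
# Toy model of latency compounding in a sequential STT -> LLM -> TTS pipeline.
# Per-stage latencies are illustrative round numbers, not measured values.

def time_to_first_audio(stages_ms):
    """When stages run strictly back-to-back, their delays simply add up."""
    return sum(stages_ms.values())

stages_ms = {"stt": 200, "llm": 200, "tts": 200}
print(time_to_first_audio(stages_ms))  # 600 ms before the user hears anything

# Optimizing a single stage only removes that stage's share of the total:
stages_ms["llm"] = 50
print(time_to_first_audio(stages_ms))  # 450 ms: still dominated by the others
```

This is why the post argues every stage has to be fast: with addition rather than a maximum, no single-stage win can fix the end-to-end number.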

We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.
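The "all memory pre-allocated at init" pattern can be sketched in a few lines (Python stands in for the native code; the buffer names and sizes here are hypothetical, not MetalRT's actual layout):

```python
# Sketch of the pre-allocation pattern: allocate every buffer once at engine
# init, then only hand out existing buffers on the hot path, so inference
# itself performs zero allocations. Names and sizes are hypothetical.

class BufferPool:
    def __init__(self, specs):
        # One-time allocation at init; nothing is allocated after this point.
        self._buffers = {name: bytearray(size) for name, size in specs.items()}

    def get(self, name):
        # Hot path: return the pre-allocated buffer, never create a new one.
        return self._buffers[name]

pool = BufferPool({"kv_cache": 1 << 20, "logits": 4 * 32000, "audio": 16000 * 2})
assert pool.get("logits") is pool.get("logits")  # same object every call
```

The payoff is predictable latency: with no allocator (or garbage collector) on the inference path, per-token timing stays flat.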

MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:

LLM benchmarks: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

Speech benchmarks: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...

How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it: custom Metal compute shaders for quantized matmul, attention, and activations, compiled ahead of time and dispatched directly.

Voice pipeline optimization details: https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai...

RAG optimizations: https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...

RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.
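The shape of the lock-free ring buffers mentioned above can be illustrated with a minimal single-producer/single-consumer queue (Python is only illustrative here; real lock-free code depends on native atomics, and this is not RCLI's actual implementation):

```python
# Minimal single-producer/single-consumer ring buffer: the kind of structure
# a voice pipeline can use to pass frames between stage threads without locks.
# Each index is written by exactly one side, which is what makes SPSC work.

class SPSCRing:
    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._cap = capacity
        self._head = 0  # advanced only by the consumer
        self._tail = 0  # advanced only by the producer

    def push(self, item):
        nxt = (self._tail + 1) % self._cap
        if nxt == self._head:
            return False  # full: producer drops or retries, never blocks
        self._buf[self._tail] = item
        self._tail = nxt
        return True

    def pop(self):
        if self._head == self._tail:
            return None  # empty: consumer spins or sleeps, never blocks
        item = self._buf[self._head]
        self._head = (self._head + 1) % self._cap
        return item

ring = SPSCRing(4)  # one slot stays empty, so 3 items fit
for frame in ("f0", "f1", "f2"):
    ring.push(frame)
print(ring.pop())  # f0 — frames come out in FIFO order
```

Because neither side ever waits on the other, a slow consumer shows up as dropped or delayed frames rather than as a stalled producer thread, which is what you want when audio capture must keep running.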

Source: https://github.com/RunanywhereAI/RCLI (MIT)

Demo: https://www.youtube.com/watch?v=eTYwkgNoaKg

What would you build if on-device AI were genuinely as fast as cloud?

Debian decides not to decide on AI-generated contributions

2026-03-10 @ 14:53:13 · Points: 360 · Comments: 271

Tony Hoare has died

2026-03-10 @ 14:50:16 · Points: 1927 · Comments: 252

Levels of Agentic Engineering

2026-03-10 @ 08:48:40 · Points: 248 · Comments: 122

Yann LeCun raises $1B to build AI that understands the physical world

2026-03-10 @ 08:46:53 · Points: 556 · Comments: 452

SSH Secret Menu

2026-03-10 @ 03:28:38 · Points: 293 · Comments: 137

PeppyOS: A simpler alternative to ROS 2 (now with containers support)

2026-03-08 @ 10:46:56 · Points: 48 · Comments: 15

Show HN: I wrote down every expensive hardware development mistake I've seen

2026-03-08 @ 10:29:50 · Points: 11 · Comments: 2

Julia Snail – An Emacs Development Environment for Julia Like Clojure's Cider

2026-03-08 @ 09:27:34 · Points: 128 · Comments: 16

When the chain becomes the product: Seven years inside a token-funded venture

2026-03-08 @ 08:37:02 · Points: 42 · Comments: 18

Roblox is minting teen millionaires

2026-03-08 @ 01:20:05 · Points: 202 · Comments: 242

Building a TB-303 from Scratch

2026-03-07 @ 21:18:11 · Points: 161 · Comments: 62
