Hopefully some of you find it interesting! Blog post here: https://andreasthinks.me/posts/octosphere/octosphere.html
Hacker News
Latest
China Moon Mission: Aiming for 2030 Lunar Landing
2026-02-03 @ 19:32:11 | Points: 45 | Comments: 25
AliSQL: Alibaba's open-source MySQL with vector and DuckDB engines
2026-02-03 @ 18:40:18 | Points: 73 | Comments: 6
Y Combinator will let founders receive funds in stablecoins
2026-02-03 @ 18:28:48 | Points: 39 | Comments: 47
Xcode 26.3 unlocks the power of agentic coding
2026-02-03 @ 18:04:08 | Points: 169 | Comments: 114
Sandboxing AI Agents in Linux
2026-02-03 @ 17:35:37 | Points: 32 | Comments: 20
Deno Sandbox
2026-02-03 @ 17:33:20 | Points: 193 | Comments: 69
Migrate Wizard – IMAP Based Email Migration Tool
2026-02-03 @ 17:19:47 | Points: 17 | Comments: 17
Defining Safe Hardware Design [pdf]
2026-02-03 @ 17:12:04 | Points: 28 | Comments: 4
Show HN: Octosphere, a tool to decentralise scientific publishing
2026-02-03 @ 17:11:42 | Points: 27 | Comments: 11
Show HN: I built "AI Wattpad" to eval LLMs on fiction
2026-02-03 @ 17:08:43 | Points: 15 | Comments: 20
Turns out this is surprisingly hard to answer. Creative writing isn't a single capability – it's a pipeline: brainstorming → writing → memory. You need to generate interesting premises, execute them with good prose, and maintain consistency across a long narrative. Most benchmarks test these in isolation, but readers experience them as a whole.
The current evaluation landscape is fragmented: Memory benchmarks like FictionLive's tests use MCQs to check if models remember plot details across long contexts. Useful, but memory is necessary for good fiction, not sufficient. A model can ace recall and still write boring stories.
Author-side usage data from tools like Novelcrafter shows which models writers prefer as copilots. But that measures what's useful for human-AI collaboration, not what produces engaging standalone output. Authors and readers have different needs.
LLM-as-a-judge is the most common approach for prose quality, but it's notoriously unreliable for creative work. Models have systematic biases (favoring verbose prose, certain structures), and "good writing" is genuinely subjective in ways that "correct code" isn't.
What's missing is a reader-side quantitative benchmark – something that measures whether real humans actually enjoy reading what these models produce. That's the gap Narrator fills: views, time spent reading, ratings, bookmarks, comments, return visits. Think of it as an "AI Wattpad" where the models are the authors.
I shared an early DSPy-based version here 5 months ago (https://news.ycombinator.com/item?id=44903265). The big lesson: one-shot generation doesn't work for long-form fiction. Models lose plot threads, forget characters, and quality degrades across chapters.
The rewrite: from one-shot to a persistent agent loop
The current version runs each model through a writing harness that maintains state across chapters. Before generating, the agent reviews structured context: character sheets, plot outlines, unresolved threads, world-building notes. After generating, it updates these artifacts for the next chapter. Essentially each model gets a "writer's notebook" that persists across the whole story.
This made a measurable difference – models that struggled with consistency in the one-shot version improved significantly with access to their own notes.
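For anyone curious what such a harness looks like in code, here is a minimal Python sketch of the persistent-notebook idea. The Notebook fields and the generate() callable are my own stand-ins for illustration, not Narrator's actual implementation.

# Minimal sketch of a persistent "writer's notebook" loop.
# Assumption: generate() wraps whatever model API you use.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Notebook:
    characters: dict = field(default_factory=dict)     # name -> character sheet
    outline: list = field(default_factory=list)        # planned plot beats
    open_threads: list = field(default_factory=list)   # unresolved plot threads
    world_notes: list = field(default_factory=list)    # world-building facts

def write_story(premise: str, chapters: int, generate: Callable[[str], str]) -> list:
    notebook, story = Notebook(), []
    for n in range(1, chapters + 1):
        # Before generating: the model reviews its own persistent notes.
        context = (
            f"Premise: {premise}\n"
            f"Characters: {notebook.characters}\n"
            f"Outline: {notebook.outline}\n"
            f"Unresolved threads: {notebook.open_threads}\n"
            f"World notes: {notebook.world_notes}\n"
            f"Write chapter {n}."
        )
        story.append(generate(context))
        # After generating: a second call would ask the model to update the
        # notebook (new characters, resolved/opened threads); parsing omitted.
    return story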
Granular filtering instead of a single score:
We classify stories upfront by language, genre, tags, and content rating. Instead of one "creative writing" leaderboard, we can drill into specifics: which model writes the best Spanish Comedy? Which handles LitRPG stories with Male Leads the best? Which does well with romance versus horror?
The answers aren't always what you'd expect from general benchmarks. Some models that rank mid-tier overall dominate specific niches.
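To make the drill-down concrete, here is a rough Python sketch of how a niche leaderboard could be computed from tagged engagement data; the field names and scoring are made up for illustration, not Narrator's schema.

# Hypothetical niche-leaderboard sketch (field names and scoring are made up).
from collections import defaultdict

stories = [
    {"model": "model-a", "language": "es", "genre": "comedy", "rating": 4.4},
    {"model": "model-b", "language": "es", "genre": "comedy", "rating": 3.9},
    {"model": "model-a", "language": "en", "genre": "horror", "rating": 4.1},
]

def leaderboard(stories, **filters):
    ratings = defaultdict(list)
    for s in stories:
        if all(s.get(k) == v for k, v in filters.items()):
            ratings[s["model"]].append(s["rating"])
    # Rank models by mean reader rating within the filtered niche.
    return sorted(((sum(v) / len(v), m) for m, v in ratings.items()), reverse=True)

print(leaderboard(stories, language="es", genre="comedy"))
# [(4.4, 'model-a'), (3.9, 'model-b')]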
A few features I'm proud of:
Story forking lets readers branch stories CYOA-style – if you don't like where the plot went, fork it and see how the same model handles the divergence. Creates natural A/B comparisons.
Visual LitRPG was a personal itch to scratch. Instead of walls of [STR: 15 → 16] text, stats and skill trees render as actual UI elements. Example: https://narrator.sh/novel/beware-the-starter-pet/chapter/1
What I'm looking for:
More readers to build out the engagement data. Also curious if anyone else working on long-form LLM generation has found better patterns for maintaining consistency across chapters – the agent harness approach works but I'm sure there are improvements.
Show HN: PII-Shield – Log Sanitization Sidecar with JSON Integrity (Go, Entropy)
2026-02-03 @ 16:40:12 | Points: 12 | Comments: 7
Why deterministic? So that "pass123" always hashes to the same "[HIDDEN:a1b2c]", allowing QA/Devs to correlate errors without seeing the raw data.
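For illustration, a deterministic token in this spirit can be sketched in a few lines of Python; PII-Shield itself is written in Go, and the hash, truncation length, and keying below are my assumptions rather than its actual scheme.

import hashlib, hmac

SECRET_KEY = b"rotate-me"  # hypothetical per-deployment key

def redact(value: str) -> str:
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"[HIDDEN:{digest[:5]}]"

# The same input always maps to the same token, so errors stay correlatable:
assert redact("pass123") == redact("pass123")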
Key features:
1. JSON Integrity: It parses JSON, sanitizes values, and rebuilds it. It guarantees valid JSON output for your SIEM (ELK/Datadog).
2. Entropy Detection: Uses context-aware entropy analysis to catch high-randomness strings.
3. Fail-Open: Designed as a transparent pipe wrapper to preserve app uptime.
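As a rough Python sketch of how features 1 and 2 could fit together (the threshold, length cutoff, and flat traversal below are assumptions, not PII-Shield's actual heuristics):

import json, math
from collections import Counter

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def sanitize_line(line: str, threshold: float = 4.0) -> str:
    obj = json.loads(line)  # parse the JSON log line
    for key, value in obj.items():
        # Redact long, high-randomness strings (likely tokens, keys, hashes).
        if isinstance(value, str) and len(value) > 16 and shannon_entropy(value) > threshold:
            obj[key] = "[HIDDEN]"
    return json.dumps(obj)  # rebuild, so the SIEM always receives valid JSON

print(sanitize_line('{"user": "alice", "token": "9fK2xQ7mZp4Lw8Rt1Vb6"}'))
# {"user": "alice", "token": "[HIDDEN]"}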
The project is open-source (Apache 2.0).
Repo: https://github.com/aragossa/pii-shield
Docs: https://pii-shield.gitbook.io/docs/
I'd love your feedback on the entropy/threshold logic!
France dumps Zoom and Teams as Europe seeks digital autonomy from the US
2026-02-03 @ 16:39:18 | Points: 483 | Comments: 277
Prek: A better, faster, drop-in pre-commit replacement, engineered in Rust
2026-02-03 @ 16:29:34 | Points: 133 | Comments: 62
Tadpole – A modular and extensible DSL built for web scraping
2026-02-03 @ 16:29:13 | Points: 26 | Comments: 5
X offices raided in France
2026-02-03 @ 16:14:17 | Points: 164 | Comments: 127
Show HN: C discrete event SIM w stackful coroutines runs 45x faster than SimPy
2026-02-03 @ 16:09:07 | Points: 36 | Comments: 14
I have built Cimba, a multithreaded discrete event simulation library in C.
Cimba uses POSIX pthread multithreading for parallel execution of multiple simulation trials, while coroutines provide concurrency inside each simulated trial universe. The simulated processes are based on asymmetric stackful coroutines with the context switching hand-coded in assembly.
The stackful coroutines make it natural to express agentic behavior by conceptually placing oneself "inside" that process and describing what it does. A process can run in an infinite loop or just act as a one-shot customer passing through the system, yielding and resuming execution from any level of its call stack, acting both as an active agent and a passive object as needed. This is inspired by my own experience programming in Simula67, many moons ago, where I found the coroutines more important than the deservedly famous object-orientation.
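Cimba does this in C with the context switches hand-coded in assembly, so the following is only a conceptual analogue using Python generators to show the "process as coroutine" style; the toy scheduler and names are mine, not Cimba's API.

import heapq, random

def customer(name):
    # One-shot process: yield a service delay, then finish when resumed.
    yield random.expovariate(1.0)
    print(f"{name} served")

def run(processes, until=100.0):
    queue = [(0.0, i, p) for i, p in enumerate(processes)]  # (time, tiebreak, coroutine)
    heapq.heapify(queue)
    while queue:
        now, i, proc = heapq.heappop(queue)
        if now > until:
            break
        try:
            delay = next(proc)                        # resume the process coroutine
            heapq.heappush(queue, (now + delay, i, proc))
        except StopIteration:
            pass                                      # process finished

run([customer(f"customer-{k}") for k in range(3)])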
Cimba turned out to run really fast. In a simple benchmark, 100 trials of an M/M/1 queue run for one million time units each, it ran 45 times faster than an equivalent model built in SimPy + Python multiprocessing. The running time was reduced by 97.8 % vs the SimPy model. Cimba even processed more simulated events per second on a single CPU core than SimPy could do on all 64 cores.
The speed is not only due to the efficient coroutines. Other parts are also designed for speed, such as a hash-heap event queue (binary heap plus Fibonacci hash map), fast random number generators and distributions, memory pools for frequently used object types, and so on.
The initial implementation supports the AMD64/x86-64 architecture for Linux and Windows. I plan to target Apple Silicon next, then probably ARM.
I believe this may interest the HN community. I would appreciate your views on both the API and the code. Any thoughts on future target architectures to consider?
Launch HN: Modelence (YC S25) – App Builder with TypeScript / MongoDB Framework
2026-02-03 @ 16:03:21 | Points: 45 | Comments: 23
(Here’s our prior Show HN post for reference: https://news.ycombinator.com/item?id=44902227)
At the same time, we were excited by the whole AI app builder boom and realized that the real challenge there is the platform rather than the tool itself. Now we’re making Modelence the first full-stack framework that’s built for coding agents and humans alike:
- TypeScript is already great for AI coding because it provides guardrails and catches many errors at build time, so agents can auto-correct
- MongoDB eliminates the schema management problem for agents, which is otherwise where they fail most often (+ it works great with TS/Node.js)
- Built-in auth, database, cron jobs, and more that just work together out of the box mean agents can focus solely on your product logic and don't fail while trying to set these things up (+ fewer tokens spent on boilerplate).
You can now try the Modelence app builder (based on Claude Agent SDK) by just typing a prompt on our landing page ( https://modelence.com ) - watch a demo video here: https://youtu.be/BPsYvj_nGuE
Then you can check it out locally and keep working in your own IDE while still using Modelence Cloud as your backend, with a dev cloud environment; later you can deploy and run on Modelence Cloud with built-in observability around every operation running in your app.
We’re also going to add a built-in DevOps agent that lives in the same cloud, knows the framework end-to-end, and will use all this observability data to act on errors, alerts, and incidents - closing the loop, because running in production is much harder than just building.
We launched the app builder as a quick start for developers, to demonstrate the framework and Modelence Cloud without having to manually read docs and follow the steps to set up a new app. Our main focus is still the platform itself, since we believe the real challenge in AI coding is the framework and the platform rather than the builder tool itself.
Qwen3-Coder-Next
2026-02-03 @ 16:01:50 | Points: 447 | Comments: 252
The next steps for Airbus' big bet on open rotor engines
2026-02-03 @ 15:31:40 | Points: 52 | Comments: 44
Show HN: Sandboxing untrusted code using WebAssembly
2026-02-03 @ 14:28:01 | Points: 53 | Comments: 18
I built a runtime to isolate untrusted code using wasm sandboxes.
Basically, it protects your host system from the damage untrusted code can cause. There was a great discussion about sandboxing in Python here recently that elaborates on the problem [1]. In TypeScript, wasm integration is even more natural because the two ecosystems sit so close together.
The core is built in Rust. On top of that, I use WASI 0.2 via wasmtime and the component model, along with custom SDKs that keep things as idiomatic as possible.
For example, in Python we have a simple decorator:
from capsule import task

@task(
    name="analyze_data",
    compute="MEDIUM",
    ram="512mb",
    allowed_files=["./authorized-folder/"],
    timeout="30s",
    max_retries=1
)
def analyze_data(dataset: list) -> dict:
    """Process data in an isolated, resource-controlled environment."""
    # Your code runs safely in a Wasm sandbox
    return {"processed": len(dataset), "status": "complete"}
And in TypeScript we have a wrapper:

import { task } from "@capsule-run/sdk"

export const analyze = task({
  name: "analyzeData",
  compute: "MEDIUM",
  ram: "512mb",
  allowedFiles: ["./authorized-folder/"],
  timeout: 30000,
  maxRetries: 1
}, (dataset: number[]) => {
  return { processed: dataset.length, status: "complete" }
});
You can set CPU (with compute), memory, filesystem access, and retries to keep precise control over your tasks. It's still quite early, but I'd love feedback. I’ll be around to answer questions.