Yeah. Lots of busywork where if you had to assign it to a human you would need t...

mattmanser · 2026-02-21T21:46:33 1771710393

You've missed the subtlety here.

LLMs don't have attention to detail.

This project had extremely comprehensive, easily verifiable, tests.

So the LLM could be as sloppy as they usually arez they just had to keep redoing their work until the code actually worked.

selridge · 2026-02-22T05:49:17 1771739357

I missed the subtlety?

I linked the paper! I read the paper. Yeah. they wrote the tests, which is how this worked! how the heck do you think it was supposed to work?

the fact that they needed to write the tests was just the means to implementation. It didn't change the non-LLM labor economics of the problem.

mattmanser · 2026-02-22T08:55:10 1771750510

No, I meant subtlety of definition, you've attributed the diligence to the LLM when in fact it's the tests that provide that.

You've unfortunately committed the big sin of anthropomorphizing the LLM and calling it diligent.

An LLM cannot be diligant, it's stochastic so it's literally impossible for it to be diligant.

Writing all those tests was diligant.

selridge · 2026-02-22T15:39:44 1771774784

I didn’t attribute diligence to anything.

I’m not worried about the personal character of diligence. I’m interested in what the technology unlocked and how things made with it are materially different in terms of labor configurations.

selridge · 2026-02-22T15:40:50 1771774850

Lmfao I’ve committed the sin of anthropomorphism.

Fadda forgive me!

C’mon pal.

salawat · 2026-02-21T23:05:38 1771715138

Who wrote the tests?

bigbuppo · 2026-02-22T00:23:35 1771719815

The meat wrote the tests. As I've been telling you, they're made out of meat.

selridge · 2026-02-22T15:41:27 1771774887

And how does the answer to your question bear on the claim I’m making?

salawat · 2026-02-22T16:24:17 1771777457

If you're trying to automate all coding activity, writing tests is coding activity. Arguably the greater fraction of effort between implementation, and verifying said implementation. If the only thing making your problem space tractable for the automation to be able to replace the lesser half of coding activity is an authored test suite you couldn't generate via your automation, then you really need to admit that.

"Did you check?" is the most expensive question, and one of the most feared in my experience in tech circles. Spent quite a few years as a dedicated tester once I developed the knack for it. Everybody gangsta til it's time to prove the damn thing works.

selridge · 2026-02-22T17:08:11 1771780091

> "Did you check?" is the most expensive question, and one of the most feared in my experience in tech circles.

Here it’s a climb-down. Writing tests to validate translation is orders of magnitude less work (and less likely to fail or be too dull to do properly) than the alternative available prior to 2025.

The fact that they wrote tests and clearly established operational control is not some kinda gotcha! It’s how they managed to get this piece of technology to allow them to do the IMPOSSIBLE.

I am just really struggling to understand someone who reads that paper and thinks “yup, everything is still the same and we don’t need to re-evaluate any ideas” Like, if you want to say that they still need engineering discipline in order to do ISA transitions at scale then…ok? That’s true? But Gemini meant that this formerly impossible thing was now not only within reach but done.