I wonder what the thought process of the Go designers was when coming up with that approach. Function scope is rarely what a user needs, has major pitfalls, and is more complex to implement in the compiler (need to append to an unbounded list).
> I wonder what the thought process of the Go designers was when coming up with that approach.
Sometimes we need block-scoped cleanup, other times we need function-scoped cleanup.
You can turn a function-scoped defer into a block-scoped defer with a function literal.
AFAICT, you cannot turn a block-scoped defer into a function-scoped one.
So I think the choice was obvious - go with the more general(izable) variant. Picking the alternative, which can do only half of the job, would be IMO a mistake.
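The function-literal conversion mentioned above might look like this minimal sketch (the `process` helper and its loop body are hypothetical, not from the thread):

```go
package main

import (
	"fmt"
	"os"
)

func process(paths []string) error {
	for _, path := range paths {
		// Wrapping the iteration body in a function literal gives
		// defer block-like scope: the file is closed at the end of
		// each iteration, not at the end of process().
		err := func() error {
			f, err := os.Open(path)
			if err != nil {
				return err
			}
			defer f.Close()
			// ... work with f ...
			return nil
		}()
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(process(nil))
}
```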
I hate even more that you can call defer in a loop, and it will appear to work, as long as the loop has relatively few iterations, and is just silently massively wasteful.
I know. Or in some cases, you can put the loop body in a dedicated function. There are workarounds. It's just bad that the wrong way a) is the most obvious way, and b) is silently wrong in such a way that it appears to work during testing, often becoming a problem only when confronted with real-world data, and often surfacing only as being a hard-to-debug performance or resource usage issue.
In a tight loop you'd want your cleanup to happen after the fact. And in, say, an IO loop, you're going to want concurrency anyway, which necessarily introduces new function scope.
> In a tight loop you'd want your cleanup to happen after the fact.
Why? Doing 10 000 iterations where each iteration allocates and operates on a resource, then later going through and freeing those 10 000 resources, is not better than doing 10 000 iterations where each iteration allocates a resource, operates on it, and frees it. You just waste more resources.
> And in, say, an IO loop, you're going to want concurrency anyway
This is not necessarily true; not everything is so performance sensitive that you want to add the significant complexity of doing it async. Often, a simple loop where each iteration opens a file, reads stuff from it, and closes it, is more than good enough.
Say you have a folder with a bunch of data files you need to work on. Maybe the work you do per file is significant and easily parallelizable; you would probably want to iterate through the files one by one and process each file with all your cores. There are even situations where the output of working on one file becomes part of the input for work on the next file.
Anyway, I will concede that all of this is sort of an edge case which doesn't come up that often. But why should the obvious way be the wrong way? Block-scoped defer is the most obvious solution since variable lifetimes are naturally block-scoped; what's the argument for why it ought to be different?
It doesn't just have to be files, FWIW. I once worked in a Go project which used SDL through CGO for drawing. "Widgets" were basically functions which would allocate an SDL surface, draw to it using Cairo, and return it to Go code. That SDL surface would be wrapped in a Go wrapper with a Destroy method which would call SDL_DestroySurface.
And to draw a surface to the screen, you need to create an SDL texture from it. If that's all you want to do, you can then destroy the SDL surface.
So you could imagine code like this:
strings := []string{"Lorem", "ipsum", "dolor", "sit", "amet"}
stringTextures := []SDLTexture{}
for _, s := range strings {
    surface := RenderTextToSurface(s)
    defer surface.Destroy()
    stringTextures = append(stringTextures, surface.CreateTexture())
}
Oops, you're now using way more memory than you need!
Why would you allocate and free memory on each iteration when you can reuse it to much greater effect? Perhaps because of bad API design, but a language isn't there to paper over bad design decisions. A good language makes bad design decisions painful.
The surfaces are all of different size, so the code would have to be more complex, resizing some underlying buffer on demand. You'd have to split up the text rendering into an API to measure the text and an API to render the text, so that you could resize the buffer. So you'd introduce quite a lot of extra complexity.
And what would be the benefit? You save up to one malloc and free per string you want to render, but text rendering is so demanding it completely drowns out the cost of one allocation.
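For concreteness, the measure-then-render split being described might be sketched like this (all type and function names here are hypothetical, and the measurement logic is a stand-in for real text metrics):

```go
package main

import "fmt"

// Extent is a hypothetical text-measurement result.
type Extent struct{ W, H int }

// TextRenderer reuses one backing buffer across calls instead of
// allocating a fresh surface per string.
type TextRenderer struct {
	buf []byte // shared pixel buffer, grown on demand
}

// Measure returns the pixel size the rendered text would occupy.
// (Stand-in for a real text-measurement call.)
func (r *TextRenderer) Measure(s string) Extent {
	return Extent{W: 8 * len(s), H: 16}
}

// Render grows the shared buffer if needed, then draws into it.
func (r *TextRenderer) Render(s string) []byte {
	ext := r.Measure(s)
	need := ext.W * ext.H * 4 // RGBA
	if cap(r.buf) < need {
		r.buf = make([]byte, need)
	}
	r.buf = r.buf[:need]
	// ... actual drawing would go here ...
	return r.buf
}

func main() {
	r := &TextRenderer{}
	a := r.Render("Lorem")
	b := r.Render("ipsum dolor") // reuses/grows the same buffer
	fmt.Println(len(a), len(b))
}
```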
Why does the buffer need to be resized? Your malloc version allocates a fixed amount of memory on each iteration. You can allocate the same amount of memory ahead of time.
If you were dynamically changing the malloc allocation size on each iteration then you have a case for a growable buffer to do the same, but in that case you would already have all the complexity of which you speak as required to support a dynamically-sized malloc.
> The example allocates an SDL_Surface large enough to fit the text string each iteration.
Impossible without knowing how much to allocate, which you indicate would require adding a bunch of complexity. However, I am willing to chalk that up to being a typo. Given that we are now calculating how much to allocate on each iteration, where is the meaningful complexity? I see almost no difference between:
>> The example allocates an SDL_Surface large enough to fit the text string each iteration.
> Impossible without knowing how much to allocate
But we do know how much to allocate? The implementation of this example's RenderTextToSurface function would use SDL functions to measure the text, then allocate an SDL_Surface large enough, then draw to that surface.
> I see almost no difference between: (code example) and (code example)
What? Those two code examples aren't even in the same language as the code I showed.
The difference would be between the example I gave earlier:
stringTextures := []SDLTexture{}
for _, str := range strings {
    surface := RenderTextToSurface(str)
    defer surface.Destroy()
    stringTextures = append(stringTextures, surface.CreateTexture())
}
> Remember, I'm talking about the API to a Go wrapper around SDL.
We were talking about using malloc/free vs. a resizable buffer. Happy to progress the discussion towards a Go API, however. That, obviously, is going to look something more like this:
renderer := SDLRenderer()
defer renderer.Destroy()
for _, str := range strings {
    surface := renderer.RenderTextToSurface(str)
    textures = append(textures, renderer.CreateTextureFromSurface(surface))
}
I have no idea why you think it would look like that monstrosity you came up with.
> No. We were talking about using malloc/free vs. a resizable buffer.
No. This is a conversation about Go. My example[1], which you responded to, was an example taken from a real-world project I've worked on which uses Go wrappers around SDL functions to render text. Nowhere did I mention malloc or free; you brought those up.
The code you gave this time is literally my first example (again, [1]), which allocates a new surface every time, except that you forgot to destroy the surface. Good job.
I invite you to read the code again. You missed a few things. Notably it uses a shared memory buffer, as discussed, and does free it upon defer being executed. It is essentially equivalent to the second C snippet above, while your original example is essentially equivalent to the first C snippet.
Wait, so your wrapper around SDL_Renderer now also inexplicably contains a scratch buffer? I guess that explains why you put RenderTextToSurface on your SDL_Renderer wrapper, but ... that's some really weird API design. Why does the SDL_Renderer wrapper know how to use SDL_TTF or PangoCairo to draw text to a surface? Why does SDL_Renderer then own the resulting surface?
To anyone used to SDL, your proposed API is extremely surprising.
It would've made your point clearer if you'd explained this coupling between SDL_Renderer and text rendering in your original post.
But yes, I concede that if there was any reason to do so, putting a scratch surface into your SDL_Renderer that you can auto-resize and render text to would be a solution that makes for slightly nicer API design. Your SDL_Renderer now needs to be passed around as a parameter to stuff which only ought to need to concern itself with CPU rendering, and you now need to deal with mutexes if you have multiple goroutines rendering text, but those would've been alright trade-offs -- again, if there was a reason to do so. But there's not; the allocation is fast and the text rendering is slow.
You're right to call out that the SDLRenderer name was a poor choice. SDL is an implementation detail that should be completely hidden from the user of the API. That it may or may not use SDL under the hood is irrelevant to the user of the API. If the user wanted to use SDL, they would do so directly. The whole point of this kind of abstraction, of course, is to decouple from the dependence on something like SDL. Point taken.
Aside from my failure in dealing with the hardest problem in computer science, how would you improve the intent of the API? It is clearly improved over the original version, but we would do well to iterate towards something even better.
Some hypothetical example numbers: if software-rendering text takes 0.1 milliseconds, and I have a handful of text strings to render, I may not care that rendering the strings takes a millisecond or two.
But that 0.1 millisecond to render a string is an eternity compared to the time it takes to allocate some memory, which might be on the order of single digit microseconds. Saving a microsecond from a process which takes 0.1 milliseconds isn't noticeable.
You might not care today, but the next guy tasked to render many millions of strings tomorrow does care. If he has to build yet another API that ultimately does the same thing and is almost exactly the same, something has gone wrong. A good API is accommodating to users of all kinds.
It might be preferable to create a font atlas and just allocate printable ASCII characters as a spritesheet (a single SDL_Texture* reference and an array of rects). Rather than allocating a texture for each string, you just iterate the string and blit the characters; no new allocations necessary.
If you need something more complex, with kerning and the like, the current version of SDL_TTF can create font atlases for various backends.
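A rough sketch of the atlas approach, using hypothetical wrapper types rather than real SDL bindings (Rect and Texture stand in for SDL_Rect and SDL_Texture, and blit stands in for a real copy call):

```go
package main

import "fmt"

// Rect and Texture are hypothetical stand-ins for SDL_Rect and SDL_Texture.
type Rect struct{ X, Y, W, H int }
type Texture struct{} // a GPU texture handle would live here

// Atlas holds one texture containing all printable ASCII glyphs,
// plus the source rectangle of each glyph within it.
type Atlas struct {
	tex    *Texture
	glyphs [128]Rect
}

// DrawString blits one glyph per character from the shared atlas
// texture; no per-string allocations are needed.
func (a *Atlas) DrawString(s string, x, y int, blit func(src Rect, dstX, dstY int)) {
	for _, c := range s {
		if c < 32 || c > 126 {
			continue // skip characters outside printable ASCII
		}
		src := a.glyphs[c]
		blit(src, x, y)
		x += src.W
	}
}

func main() {
	a := &Atlas{}
	for i := range a.glyphs {
		a.glyphs[i] = Rect{W: 8, H: 16} // monospace glyphs for the sketch
	}
	n := 0
	a.DrawString("Lorem ipsum", 0, 0, func(Rect, int, int) { n++ })
	fmt.Println(n) // one blit per printable character
}
```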
Completely depends on context. If you're rendering dynamically changing text, you should do as you say. If you have some completely static text, there's really nothing wrong with doing the text rendering once using PangoCairo and then re-using that texture. Doing it with PangoCairo also lets you do other fancy things like drop shadows easier.
Opening a file is fairly fast (at least if you're on Linux; Windows not so much). Synchronous code is simpler than concurrent code. If processing files sequentially is fast enough, for what reason would you want to open them concurrently?
For concurrent processing you'd probably do something like splitting the file names into several batches and process those batches sequentially in each goroutine, so it's very much possible that you'd have an exact same loop for the concurrent scenario.
P.S. If you have enough files, you don't want to try to open them all at once — Go will start creating more and more threads to handle the "blocked" syscalls (open(2) in this case), and you can hit the 10,000-thread limit too.
You'd probably have to be doing something pretty unusual to not use a worker queue. Your "P.S." point being a perfect case in point as to why.
If you have a legitimate reason for doing something unusual, it is fine to have to use the tools unusually. It serves as a useful reminder that you are purposefully doing something unusual rather than simply making a bad design choice. A good language makes bad design decisions painful.
You have now transformed the easy problem of "iterate through some files" into the much more complex problem of either finding a work queue library or writing your own work queue library; and you're baking in the assumption that the only reasonable way to use that work queue is to make each work item exactly one file.
What you propose is not a bad solution, but don't come here and pretend it's the only reasonable solution for almost all situations. It's not. Sometimes, you want each work item to be a list of files, if processing one file is fast enough for synchronisation overhead to be significant. Often, you don't have to care so much about the wall clock time your loop takes and it's fast enough to just do sequentially. Sometimes, you're implementing a non-important background task where you intentionally want to only bother one core. None of these are super unusual situations.
It is telling that you keep insisting that any solution that's not a one-file-per-work-item work queue is super strange and should be punished by the language's design, when you haven't even responded to my core argument that: sometimes sequential is fast enough.
Your comment was in reply to nasretdinov, but its fundamental logic ignores what I've been telling you this whole time. You're pretending that the only solution to iterating through files is a work queue and that any solution that does a synchronous open/close for each iteration is fundamentally bad. I have told you why it isn't: you don't always need the performance.
for _, filename := range files {
    queue <- func() {
        f, _ := os.Open(filename)
        defer f.Close()
    }
}
or more realistically,
var group errgroup.Group
group.SetLimit(10)
for _, filename := range files {
    group.Go(func() error {
        f, err := os.Open(filename)
        if err != nil {
            return fmt.Errorf("failed to open file %s: %w", filename, err)
        }
        defer f.Close()
        // ...
        return nil
    })
}
if err := group.Wait(); err != nil {
    return fmt.Errorf("failed to process files: %w", err)
}
Perhaps you can elaborate?
I did read your code, but it is not clear where the worker queue is. It looks like it ranges over (presumably) a channel of filenames, which is not meaningfully different than ranging over a slice of filenames. That is the original, non-concurrent solution, more or less.
// Spawn workers
for range 10 {
    go func() {
        for path := range workQueue {
            fp, err := os.Open(path)
            if err != nil { ... }
            defer fp.Close()
            // do work
        }
    }()
}

// Iterate files and give work to workers
for _, path := range paths {
    workQueue <- path
}
Maybe, but why would one introduce coupling between the worker queue and the work being done? That is a poor design.
Now we know why it was painful. What is interesting here is that the pain wasn't noticed as a signal that the design was off. I wonder why?
We should dive into that topic. I suspect at the heart of it lies why there is so much general dislike for Go as a language, it being far less forgiving of poor choices than a lot of other popular languages.
I think your issue is that you're an architecture astronaut. This is not a compliment. It's okay for things to just do the thing they're meant to do and not be super duper generic and extensible.
It is perfectly okay inside of a package. Once you introduce exports, as seen in another thread, then there is good reason to think more carefully about how users are going to use it. Pulling the rug out from underneath them later when you discover your original API was ill-conceived is not good citizenry.
But one does still have to be mindful if they want to write software productively. Using a "super duper generic and extensible" solution means that things like error propagation are already solved for you. Your code, on the other hand, is going to quickly become a mess once you start adding all that extra machinery. It didn't go unnoticed that you conveniently left that out.
Maybe that no longer matters with LLMs, when you don't even have to look at the code and producing it is effectively free, but LLMs these days also understand how defer works, so this whole thing becomes moot.