It started, as most bad ideas do, at around 1am on a Tuesday.
I was trying to get a sprite on screen in one of the existing Rust game frameworks and hitting friction at every turn — abstractions I didn't understand, magic happening three layers down, error messages that pointed to the wrong thing entirely. The frustration built up until it flipped into something else: fine. I'll just write the renderer myself.
That was four months ago. jEngine is now archived. And I'd do it again.
Why from scratch
I want to be honest about the motivation because it wasn't purely noble. Part of it was ego. Part of it was the specific kind of stubbornness that makes you spend a week debugging a render pipeline instead of just using Bevy. But a real part of it was that I'd been doing game dev for years without actually understanding what was happening below the framework layer. I knew that a draw call happened. I didn't know how.
The plan was simple: winit for windowing and event handling, wgpu for GPU access, nothing else. Build up from there. Write an ECS. Make it draw sprites. See how far it goes.
Simple.
Week one: getting a triangle
Getting a triangle on screen with wgpu takes approximately 300 lines of boilerplate before you see a single pixel. I am not exaggerating.
You need an Instance. From that you get a Surface and an Adapter. From the adapter you get a Device and a Queue. You configure the surface with a SurfaceConfiguration. You write a vertex shader and a fragment shader in WGSL. You create a ShaderModule. You define a VertexBufferLayout describing your data. You create a RenderPipeline from a RenderPipelineDescriptor that references your shaders, your buffer layout, your surface format, your depth stencil state, your multisample state, your primitive topology.
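For scale, here is roughly what the WGSL half of that setup looks like for the hardcoded-triangle case — no vertex buffer at all, positions baked into the shader. The entry-point names are whatever your RenderPipelineDescriptor points at; these are just the conventional ones:

```wgsl
// Vertex stage: triangle positions hardcoded, selected by vertex_index.
@vertex
fn vs_main(@builtin(vertex_index) i: u32) -> @builtin(position) vec4<f32> {
    var pos = array<vec2<f32>, 3>(
        vec2<f32>( 0.0,  0.5),
        vec2<f32>(-0.5, -0.5),
        vec2<f32>( 0.5, -0.5)
    );
    return vec4<f32>(pos[i], 0.0, 1.0);
}

// Fragment stage: solid color.
@fragment
fn fs_main() -> @location(0) vec4<f32> {
    return vec4<f32>(1.0, 0.0, 0.0, 1.0);
}
```

The shader is fifteen lines. The Rust on the other side of it is the other ~285.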
Then you write the render loop. An encoder creates a RenderPass. You set the pipeline. You set the vertex buffer. You draw.
None of this is bad, actually. It's just a lot. And if you get any of it wrong, the feedback is often a validation error in the GPU driver that points at the wrong line, or worse: a black screen and silence.
The triangle appeared on day three. I stared at it for probably ten minutes.
The texture problem
Getting an image on screen is where things get interesting. wgpu uses bind groups — bundles of resources (textures, samplers, uniform buffers) that you bind to a shader. The bind group layout has to match exactly what your shader expects. Declare a binding in the wrong stage, forget to include the sampler, mismatch the visibility flags — black screen.
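To make "match exactly" concrete: a fragment shader that samples a texture declares its bindings like this (names illustrative), and the Rust-side BindGroupLayout entries have to line up with these @group/@binding pairs, stage visibility included:

```wgsl
@group(0) @binding(0) var t_sprite: texture_2d<f32>;
@group(0) @binding(1) var s_sprite: sampler;

@fragment
fn fs_main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    // Both bindings must be declared FRAGMENT-visible in the layout on the
    // Rust side, or validation fails — or nothing renders, silently.
    return textureSample(t_sprite, s_sprite, uv);
}
```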
WGSL is stricter than GLSL. Everything is typed, everything is explicit. The compiler catches more mistakes, but the error messages can be cryptic until you learn to read them. After a week of it I started to love the shader language. It felt like writing Rust.
The first textured quad appeared on day five. This time I stared at it for fifteen minutes.
Batch rendering, or: performance murder and resurrection
My first sprite renderer did one draw call per sprite. This is the naive approach. It works fine for two sprites, fine for twenty, starts stuttering at two hundred, and falls completely apart at a thousand.
The fix is batching: collect all the sprite data into a single vertex buffer, submit one draw call, done. Simple concept, messy implementation. You need to sort by texture (to minimize texture switches), handle the case where your batch is full and you need to flush mid-frame, and update a dynamic vertex buffer every frame without stalling the GPU.
I got it working after about a week and a half of iteration. The jump from one-draw-call-per-sprite to batched rendering was something like a 40x improvement in the pathological case. This felt like magic the first time.
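A minimal sketch of that batching logic, with the actual GPU calls stubbed out so only the sort-and-flush bookkeeping shows — `Sprite`, `MAX_BATCH`, and the draw-call counter are all illustrative names, not jEngine's real API:

```rust
const MAX_BATCH: usize = 1024; // sprites per draw call; illustrative

#[derive(Clone, Copy)]
struct Sprite {
    texture_id: u32,
    x: f32,
    y: f32,
}

struct Batcher {
    vertices: Vec<[f32; 4]>, // x, y, u, v per vertex; 4 verts per sprite
    current_texture: Option<u32>,
    draw_calls: u32,
}

impl Batcher {
    fn new() -> Self {
        Batcher { vertices: Vec::new(), current_texture: None, draw_calls: 0 }
    }

    /// One flush = one draw call. The real version writes the dynamic
    /// vertex buffer and encodes a draw; here we just count.
    fn flush(&mut self) {
        if !self.vertices.is_empty() {
            self.draw_calls += 1;
            self.vertices.clear();
        }
    }

    fn frame(&mut self, sprites: &mut [Sprite]) {
        // Sort by texture so each texture switch forces at most one flush.
        sprites.sort_by_key(|s| s.texture_id);
        for s in sprites.iter() {
            let switch = self.current_texture != Some(s.texture_id);
            let full = self.vertices.len() / 4 >= MAX_BATCH;
            if switch || full {
                self.flush();
                self.current_texture = Some(s.texture_id);
            }
            // Unit quad at the sprite position (UVs kept trivial for brevity).
            for (dx, dy) in [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)] {
                self.vertices.push([s.x + dx, s.y + dy, dx, dy]);
            }
        }
        self.flush(); // end-of-frame flush
    }
}
```

With a thousand sprites spread across two textures, this does two draw calls instead of a thousand — which is roughly where a 40x-style improvement comes from.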
The texture atlas piece followed naturally — pack multiple sprites into one texture, use UV offsets to reference individual frames, avoid any texture switches at all. I got a basic packer working but never made it production-quality. That was fine. It did the job.
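The UV math behind an atlas is just a rectangle remap — a pixel rect inside the atlas mapped into [0, 1] texture-coordinate space. The types here are my own naming for illustration, not anything from wgpu:

```rust
/// Pixel-space rectangle of one frame inside the atlas.
#[derive(Clone, Copy)]
struct AtlasRect {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// UV rectangle in [0, 1] space: (u_min, v_min, u_max, v_max).
fn atlas_uv(rect: AtlasRect, atlas_w: u32, atlas_h: u32) -> (f32, f32, f32, f32) {
    let (aw, ah) = (atlas_w as f32, atlas_h as f32);
    (
        rect.x as f32 / aw,
        rect.y as f32 / ah,
        (rect.x + rect.w) as f32 / aw,
        (rect.y + rect.h) as f32 / ah,
    )
}
```

A production packer also has to worry about bleeding between neighboring frames under linear filtering — the usual fix is a half-texel inset on these edges — which is part of why mine stayed basic.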
Building the ECS
I rolled my own entity-component system. This was probably the decision that cost me the most time.
My first attempt used HashMap<TypeId, Vec<Box<dyn Any>>> — a map from component type to a vector of type-erased values. It worked. It was horrifying to use. The ergonomics of downcasting every component access made me want to delete the whole thing.
The second attempt used a proper generational arena for entity IDs and typed component storages using Rust's type system more carefully. This was better. It was also where I learned that the borrow checker and game state have a complicated relationship.
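The core of a generational arena is tiny: an entity handle is an index plus a generation counter, so a stale handle to a recycled slot can be detected instead of silently pointing at the wrong entity. A from-scratch sketch of the idea, not the exact jEngine code:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Entity {
    index: usize,
    generation: u32,
}

struct Entities {
    generations: Vec<u32>, // generation per slot; bumped on despawn
    alive: Vec<bool>,
    free: Vec<usize>,      // recycled slot indices
}

impl Entities {
    fn new() -> Self {
        Entities { generations: Vec::new(), alive: Vec::new(), free: Vec::new() }
    }

    fn spawn(&mut self) -> Entity {
        if let Some(index) = self.free.pop() {
            self.alive[index] = true;
            Entity { index, generation: self.generations[index] }
        } else {
            self.generations.push(0);
            self.alive.push(true);
            Entity { index: self.generations.len() - 1, generation: 0 }
        }
    }

    fn despawn(&mut self, e: Entity) {
        if self.is_alive(e) {
            self.alive[e.index] = false;
            self.generations[e.index] += 1; // invalidates outstanding handles
            self.free.push(e.index);
        }
    }

    /// A handle is live only if the slot is live AND the generation matches.
    fn is_alive(&self, e: Entity) -> bool {
        self.alive[e.index] && self.generations[e.index] == e.generation
    }
}
```

Despawn then spawn reuses the slot under a new generation, so the old handle reliably reads as dead.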
The problem: to run a system, you typically need to iterate over multiple component types simultaneously. In Rust, you can't hold a mutable reference to one component storage while reading from another if they're both behind the same RefCell or Mutex. You have to design around this explicitly — either split borrows, use separate locks per storage, or accept interior mutability everywhere.
I went with typed component storages that could be borrowed independently:
```rust
fn update(&mut self, world: &mut World) {
    let (mut positions, velocities) = world.borrow_two::<Position, Velocity>();
    for (pos, vel) in positions.iter_mut().zip(velocities.iter()) {
        pos.x += vel.x;
        pos.y += vel.y;
    }
}
```
borrow_two was a custom method that handled the double-borrow case. It worked, and it was ugly, and it made me understand exactly why Bevy's ECS is as complex as it is.
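A sketch of how a borrow_two-style split can work when every storage sits behind its own RefCell — hard-coded here for two component types, rather than the TypeId-driven generic version I actually wrote:

```rust
use std::cell::{Ref, RefCell, RefMut};

struct Position { x: f32, y: f32 }
struct Velocity { x: f32, y: f32 }

struct World {
    // One RefCell per storage, so borrows of DIFFERENT storages are independent.
    positions: RefCell<Vec<Position>>,
    velocities: RefCell<Vec<Velocity>>,
}

impl World {
    /// Mutable borrow of one storage alongside a shared borrow of another:
    /// legal because each storage has its own cell. Borrowing the SAME
    /// storage twice would panic at runtime instead of failing to compile —
    /// which is exactly the trade-off interior mutability buys you.
    fn borrow_pos_vel(&self) -> (RefMut<'_, Vec<Position>>, Ref<'_, Vec<Velocity>>) {
        (self.positions.borrow_mut(), self.velocities.borrow())
    }
}

fn update(world: &World) {
    let (mut positions, velocities) = world.borrow_pos_vel();
    for (pos, vel) in positions.iter_mut().zip(velocities.iter()) {
        pos.x += vel.x;
        pos.y += vel.y;
    }
}
```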
What the borrow checker actually taught me
The borrow checker in a game engine context is not just an obstacle. It forced me to make better architectural decisions than I would have made otherwise.
The biggest one: separate update from draw, completely. In a language that lets you mix mutations freely, it's easy to slip into a design where your render pass writes back to game state or your update loop reaches into render resources. The borrow checker makes this physically painful. So I didn't do it. Update phase touches game state. Render phase reads game state and touches only GPU resources. The boundary is clean.
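In signatures, the boundary looks roughly like this — the types are illustrative stand-ins, but the &mut/& split is the whole point:

```rust
struct GameState { /* entities, components, timers */ score: u32 }
struct GpuResources { /* pipelines, buffers, bind groups */ frames_drawn: u32 }

/// Update phase: may mutate game state, cannot see GPU resources at all.
fn update(state: &mut GameState) {
    state.score += 1;
}

/// Render phase: reads game state, mutates only GPU-side resources.
/// The &GameState makes writing back to game state a compile error,
/// not something you have to catch in code review.
fn render(state: &GameState, gpu: &mut GpuResources) {
    let _visible_score = state.score; // read-only access
    gpu.frames_drawn += 1;
}

fn frame(state: &mut GameState, gpu: &mut GpuResources) {
    update(state);
    render(state, gpu); // the &mut reborrows as shared for the render phase
}
```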
wgpu's RenderPass has a lifetime tied to the CommandEncoder it's created from, which is tied to the Device. Passing it around between functions requires threading those lifetimes through every function signature, which blows up fast. The practical solution is: keep the render pass local, encode everything in one place, and submit. Don't try to share a live render pass across subsystems.
I fought the borrow checker for weeks in this area before understanding what it was telling me. Once I understood it, the architecture improved.
The wall
Month three is when it started feeling like quicksand.
Every new feature — audio, a scene graph, an asset pipeline with hot reloading, a proper tilemap renderer — was its own multi-week rabbit hole. Each one individually is manageable. All of them together are a second full-time job.
I also started to notice that jEngine was becoming the project rather than a tool for making projects. The original goal was to understand the renderer layer. I understood it. Mission complete. But I kept going, adding things, trying to make it "real."
This is the trap of building your own engine. It's also exactly why game studios don't build engines unless they have to.
The moment I caught myself planning a shader hot-reload system for the third time instead of working on the actual game I wanted to make, I stopped.
Why it's archived
jEngine did what it was supposed to do. I understand how a renderer works now. I understand wgpu's resource model, bind groups, render pipelines, dynamic buffers. I understand why ECS designs are complex. I understand what the borrow checker is protecting you from in a real concurrent system.
None of that knowledge required shipping jEngine as a production engine. It required building one far enough to hit the real problems.
The game I was building it for is now in Unity. The next game is in Unreal. Both of those decisions feel more correct now than they would have before I spent four months building the thing they abstract over.
What I'd do differently
Design the ECS API before implementing storage. Write the system code you want to write first, in terms of how you'd like to access components, then build the storage layer to support it. I did it backwards and refactored twice.
Accept that batching needs a layering system on day one. The sprite sort-by-depth problem is trivial to design in at the start and painful to bolt on later. I bolted it on later.
Don't try to write a general asset pipeline. For a learning project, hardcode your asset loading. You'll learn more about the rendering and less about MIME types.
Write tests for the ECS. The renderer is hard to unit test. The ECS is not. I had almost no tests and paid for it every time I refactored the storage layer.
Use wgpu but also look at how render graphs work sooner. The wgpu API pushes you toward a flat list of render passes. Real engines use a render graph to express dependencies between passes. Understanding this earlier would have saved me a mess I made around pass ordering.
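Stripped to its essence, a render graph is just passes with declared dependencies, scheduled by topological sort. A toy version of that scheduling step — the Pass struct and the pass names are made up for illustration:

```rust
struct Pass {
    name: &'static str,
    deps: Vec<usize>, // indices of passes that must run first
}

/// Kahn-style scheduling: repeatedly emit a pass whose dependencies
/// have all run. Returns None if the declared dependencies form a cycle.
fn schedule(passes: &[Pass]) -> Option<Vec<&'static str>> {
    let mut done = vec![false; passes.len()];
    let mut order = Vec::new();
    while order.len() < passes.len() {
        let next = passes
            .iter()
            .enumerate()
            .position(|(i, p)| !done[i] && p.deps.iter().all(|&d| done[d]))?;
        done[next] = true;
        order.push(passes[next].name);
    }
    Some(order)
}
```

The payoff is that pass ordering becomes data the engine derives, instead of an implicit invariant you maintain by hand — which is precisely the mess I made with a flat list.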
No regrets
Four months, a lot of late nights, somewhere around 8,000 lines of Rust, and one archived repository.
I can look at any renderer now and understand what it's doing. I can read wgpu changelogs and know exactly what they mean. I can look at Bevy's source and follow the logic without getting lost. I have opinions about ECS design that are based on having built one and having it break in specific ways.
That's worth a lot more than a clean git history.
Archive it. Take the lessons. Move on.