How to create Red Alert-inspired game assets by fine-tuning Stable Diffusion
Tools for AI-generated assets are constantly evolving. Scenario co-founder Emmanuel de Maistre has shown what game developers can achieve by fine-tuning Stable Diffusion.
Emmanuel de Maistre recently posted a thread on Twitter detailing how he created an isometric bunker inspired by Command & Conquer: Red Alert, with the help of AI and a good deal of fine-tuning.
Thread time 🧵
Here’s how to precisely design a small building in a game (such as an isometric bunker) by fine-tuning #StableDiffusion
This example was inspired by #RedAlert, which I spent countless hours on (in 96-97 – pls don’t call me old 😅)
CC https://t.co/dlRJIWtD3Y pic.twitter.com/IqEaPRyZAu
— Emm (@emmanuel_2m) November 30, 2022
First, de Maistre generated more than 200 buildings using Midjourney. He didn’t reveal the exact prompts, but the output covered various types of constructions and vehicles.
After that, he picked 16 images, all different from each other but still consistent with the overall style, to form a smaller dataset.
These pictures were used to fine-tune Stable Diffusion using Scenario, an upcoming tool for creating AI-generated assets (expected to launch later this month).
Style-consistency is paramount when it comes to designing #game assets.
I trained a fine-tune using @Scenario_gg (alpha), using 16 images (below), inspired by the Red Alert/Command & Conquer buildings.
Fun fact: I generated them all in… @midjourney. pic.twitter.com/oj3WSUXGBh
— Emm (@emmanuel_2m) November 30, 2022
Following the model’s training, de Maistre tried several prompts to generate different types of buildings. Here are some results, including a nuclear plant, a radar dome, and a refinery.
And what about a bunker? De Maistre generated multiple versions from a single reference picture in the original dataset using image-to-image (img2img) translation.
He also tried a few other options before deciding to retrain the model. He reduced the dataset to 12 images (instead of 16) and set the text encoder at 50% (instead of 100%).
So I re-trained the model with two differences > reducing the dataset to 12 images (to increase the consistency, at the risk of lowering the variability). I also set the text encoder at 50% (vs. 100%)
And it worked much better. Here’s a first bunker (tower-shaped) pic.twitter.com/XRhPnEKl0c
— Emm (@emmanuel_2m) November 30, 2022
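To make the difference between the two training runs concrete, here is a minimal Python sketch that records them side by side. The thread doesn’t show Scenario’s actual settings, so the class and field names are hypothetical. It also assumes that “text encoder at 50%” means the text encoder is trained for half of the total training steps, which is one plausible reading the thread doesn’t confirm. The step count is a placeholder too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneRun:
    """One fine-tuning run. Field names are hypothetical, not Scenario's."""
    name: str
    num_images: int            # size of the curated training set
    text_encoder_ratio: float  # 1.0 = 100%, 0.5 = 50%

    def text_encoder_steps(self, total_steps: int) -> int:
        # ASSUMPTION: the percentage is the share of training steps
        # during which the text encoder is also updated.
        return int(total_steps * self.text_encoder_ratio)


# The two runs described in the thread.
first_run = FineTuneRun("first attempt", num_images=16, text_encoder_ratio=1.0)
retrain = FineTuneRun("retrain", num_images=12, text_encoder_ratio=0.5)

TOTAL_STEPS = 1000  # placeholder: the thread gives no step count

for run in (first_run, retrain):
    print(
        f"{run.name}: {run.num_images} images, "
        f"text encoder trained for {run.text_encoder_steps(TOTAL_STEPS)} "
        f"of {TOTAL_STEPS} steps"
    )
```

Keeping each run’s settings in one place makes it easier to tell which change (fewer images or less text encoder training) is responsible for a better result.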
Eventually, de Maistre got back to the pillbox shape and customized it using the following prompt: “Isometric bunker, realistic, soviet flag, red, video game.” He also changed the description a little to generate a US version.
He then showed AI-generated bunkers with different landscapes and structures around them. “3 words, a good fine-tune, a curated image (for img2img) and the possibilities are just infinite,” de Maistre noted.
I changed the original image to generate bunker with a wider angle (and some structures around).
“isometric bunker, realistic, video game”.
That’s it. 3 words, a good fine-tune, a curated image (for img2img) and the possibilities are just infinite. pic.twitter.com/UrUJd3jR3h
— Emm (@emmanuel_2m) November 30, 2022
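For readers who want to try a similar img2img step themselves, here is a rough Python sketch. The thread used Scenario, whose interface isn’t shown, so Hugging Face’s diffusers library stands in for it here. The model path, the image path, and the strength, guidance, and step values are placeholder assumptions, not de Maistre’s settings. Only the prompt comes from the thread.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps img2img actually runs: in diffusers, strength
    scales the step count (lower strength stays closer to the input)."""
    return min(int(num_inference_steps * strength), num_inference_steps)


def generate_variant(model_path, init_image_path, prompt,
                     strength=0.6, steps=50, seed=0):
    """Generate one image from a reference picture and a short prompt.

    model_path is a placeholder for a fine-tuned Stable Diffusion
    checkpoint; strength, steps and seed are illustrative defaults.
    """
    # Heavy imports are kept inside the function so the helper above
    # can be used without a GPU or the model weights.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_path, torch_dtype=torch.float16
    ).to("cuda")

    # The curated reference picture that guides the composition.
    init_image = Image.open(init_image_path).convert("RGB").resize((512, 512))

    # A fixed seed makes the result reproducible.
    generator = torch.Generator("cuda").manual_seed(seed)

    result = pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,
        guidance_scale=7.5,
        num_inference_steps=steps,
        generator=generator,
    )
    return result.images[0]


# Prompt quoted from the thread.
PROMPT = "isometric bunker, realistic, video game"

# Example call (paths are placeholders):
# image = generate_variant("path/to/fine-tuned-model", "bunker_ref.png", PROMPT)
# image.save("bunker_variant.png")
```

Swapping the reference picture while keeping the prompt fixed, as de Maistre did for the wider-angle bunker, only requires changing `init_image_path`.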
More images and generated assets can be found in the full thread. As de Maistre pointed out, these tools will be especially useful for artists with high levels of creativity, knowledge, and culture.
“I predict game studios will end up managing hundreds (if not thousands) of fine-tuned models, which will undergo some validation process before being used in production by various teams (artists, developers, designers, marketers…),” de Maistre concluded.
If you want to see more examples of what AI models are capable of in good hands, check out this collection of Fallout 2 assets reimagined with Stable Diffusion.