Output Clean-up - Breakdown

Benchmarking 2 of the most "multi-purpose" models ( each of them, however, biased towards its own "art style" )

SeekArtMega produces pretty outputs; however, it is strongly biased towards upper-body close-ups and preserves little of the artist's style ( Alphonse Mucha in this case )

Also, notice that the benchmark tries to get the best outputs at a reasonable resolution, with a fairly "low" CFG and few steps, to evaluate how fast each model is while "sketching" our prompts and how much time we spend while "prompt engineering"!

Protogen outputs something closer to the original artist's style; however, it couldn't handle faces and hands ( understandable, since its outputs nearly match our prompt, "a full body portrait", and the head and hands become smaller when framed that way )

First of all, I'm not going to use Stable Diffusion's standard model; I'm going to use a "style blend" of the "SeekArtMega" and "ProtogenInfinity" models available on Civitai

That type of "style transfer" merge tends to output overexposed images with high contrast

Of course, I could keep merging these models until these issues are "fixed"; however, I will just accept them and try to fix them through some negative prompts

Not bad... rich in details, however, "too anime" for my taste...
I'm in love with the sharp contours and edges ... I will take it !

Being "fair to Protogen": the same prompt with the new negatives

Being "fair to SeekArtMega": the same prompt with the new negatives ( of course, something colorful with feathered edges / contours and smooth shading )

Most samplers converge as the denoising steps progress, so beyond a certain point adding more steps becomes "useless" ( I used Euler for the sample above ); thus, the number of steps isn't the "key" to good quality!
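To see why extra steps stop paying off, here is a toy model, not Stable Diffusion itself: a deterministic Euler sampler (in the style of k-diffusion's Euler sampler) walking down a Karras noise schedule. The hypothetical `toy_denoiser` stands in for the U-Net; it is the ideal denoiser for one-dimensional data drawn from N(0, 1), and the sigma range is only illustrative. Because nothing random is added between steps, the result settles toward a fixed value, and going from 64 to 128 steps changes it far less than going from 8 to 16.

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Karras et al. (2022) schedule: n decreasing sigmas followed by a final 0."""
    max_inv, min_inv = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    sigmas = [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]
    return sigmas + [0.0]


def toy_denoiser(x, sigma):
    # Hypothetical stand-in for the model: ideal denoiser for data ~ N(0, 1).
    return x / (1.0 + sigma ** 2)


def euler(x, sigmas):
    """Deterministic Euler sampler: no fresh noise is injected between steps."""
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - toy_denoiser(x, s)) / s  # slope dx/dsigma
        x = x + d * (s_next - s)          # step from s down to s_next
    return x


x_T = 14.6  # hypothetical starting noise value
for steps in (4, 8, 16, 32, 64, 128):
    print(steps, round(euler(x_T, karras_sigmas(steps)), 4))
```

The printed values creep toward a limit; past a few dozen steps the extra work barely moves the result, which is the "useless steps" effect described above.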

When using any ancestral sampler, every change in the step count generates a new image, because ancestral samplers inject fresh random noise at each step and never fully converge. The result is not necessarily more detailed or better shaded, but "PROBABLY" has more detailed features and better shadows and highlights! ( I used Euler Ancestral for the sample above )
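A toy sketch of that noise injection, modeled on the way k-diffusion's `sample_euler_ancestral` splits each step (its `get_ancestral_step` helper): the sampler steps deterministically down to `sigma_down`, then adds brand-new noise of size `sigma_up`. The `toy_denoiser`, the linear schedule and the seeds are hypothetical stand-ins. Since every step draws new random numbers, changing the seed or the step count changes the result instead of converging to one answer.

```python
import math
import random


def toy_denoiser(x, sigma):
    # Hypothetical stand-in for the U-Net: ideal denoiser for data ~ N(0, 1).
    return x / (1.0 + sigma ** 2)


def ancestral_split(sigma_from, sigma_to, eta=1.0):
    """Split a step into a deterministic target (sigma_down) and fresh noise (sigma_up)."""
    sigma_up = min(sigma_to, eta * math.sqrt(
        sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2) / sigma_from ** 2))
    sigma_down = math.sqrt(sigma_to ** 2 - sigma_up ** 2)
    return sigma_down, sigma_up


def euler_ancestral(x, sigmas, seed):
    rng = random.Random(seed)
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - toy_denoiser(x, s)) / s      # slope dx/dsigma
        s_down, s_up = ancestral_split(s, s_next)
        x = x + d * (s_down - s)              # deterministic step down to sigma_down
        x = x + rng.gauss(0.0, 1.0) * s_up    # re-inject fresh noise up to s_next
    return x


def linear_sigmas(n, sigma_max=14.6, sigma_min=0.03):
    step = (sigma_max - sigma_min) / (n - 1)
    return [sigma_max - i * step for i in range(n)] + [0.0]


for steps in (20, 21, 40):
    print(steps, euler_ancestral(14.6, linear_sigmas(steps), seed=42))
```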

DPM samplers output a lot of detail, with more depth through highly detailed shadows and highlights; however, they produce clumsy, noisy compositions that tend to converge only at a huge number of steps. ( I used DPM++ SDE Karras for the sample above )

The new UniPC sampler converges much faster, and the more steps, the smoother the shading; however, with a huge number of steps it's going to entirely "swap" your composition!
Noticeably unique and detailed outputs!

UniPC generation is blazing fast!
HOWEVER, it can't generate simultaneous images ( probably not yet ), which means a loss of time compared to other samplers!
Generating simultaneous images tends to speed things up by 20%, and by 200% when using the high resolution fix!

You may play around with these UniPC sampler parameters; however, you will not achieve anything better than the other samplers with such a low number of steps and such a low resolution scale!

The UniPC order only increases or decreases the range of colors used and the amount of detail; higher values tend to produce glitchy, incoherent outputs, while lower values give less detail but more coherence!

Since I'm avoiding increasing the render scale to get outputs that match my prompt ( "a full body portrait" ), I'm first going to reduce the width to increase the probability! ( and leave the prompt for later )

Much better...

then triples the attention to "full body"...

Yeah... only 2 of 12 didn't frame the subject's feet inside the composition...

I could push the aspect ratio to something like that without touching the prompt; however, I would start struggling with cropped arms and hands instead of heads and feet...

Pretty good for tarot cards and vertical banners; however, the prompt is also starting to output stretched legs and repeated motifs ( double features / tiling )...
It's also rendering 30,720 more pixels per image than before! ( increasing rendering time )

Then, to fix these issues, could I just turn on the "magic" high resolution fix and output something I could call "good enough"? ( and fast? )

A small increase in resolution with "acceptable" quality, if we are trying to output as many possibilities as possible before going deeper into prompt engineering...
We could stop the tutorial right now and just accept the outputs; however, I'm going deeper...

Not a huge fan of "aggressively" increasing attention weights above 2 ( I reserve that for artists or for something very important or hard to achieve through prompting ).
I like the idea of prompting with some drawing principles ( industry jargon ).

Take a big-picture look at many outputs to figure out the "patterns" you should get rid of...

First, trying to fix that "yellow" look by mixing and matching negatives...

I did not use the obvious "sepia" negative, to avoid bluish images and increase the range of colors I could achieve!

Yeah, more colorful than before, and the top-left image is showing some of the "glitches" of a fully merged model without any fix...
However, it's just a single sample out of 12... "it's doable"

Then, remove the "bias" towards the anime art style inherited from those models!

Yeah...

"Turn On" the lights... ( negatives to reduce the shadows )

The contrast of this model is too high, and it's outputting overexposed images even with the negative...

AI image generators are not really meant to generate images; they are more like "image enhancing" algorithms highly trained on telling the difference between damage and detail!
Someone said one day: "what if I could input random noise?"

( roughly speaking )

A high CFG scale, over the course of the steps, decreases "the noise / detail" by increasing the contrast, then adds "space" and "spreads the silhouettes" over the rendered area! It also attempts to create a sense of "line continuity"!

Also, small "details" that look like "noise" will blend with the nearest pixels until they completely disappear, opening "space" for continuous lines and contours!

On the other hand, with a very low CFG scale, the denoising steps become nearly useless,
creating a "noisy and extremely detailed" output!
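The guidance scale enters through the standard classifier-free guidance formula: the model predicts the noise twice, with and without the prompt, and the final prediction is `uncond + scale * (cond - uncond)`. The sketch below uses made-up three-number "predictions" only to show the arithmetic: scale 1 returns the prompted prediction, and larger scales extrapolate past it, amplifying whatever the prompt changes. That amplification is one plausible way to read the contrast and "line continuity" effects described above, not a proof of them.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical noise predictions for three latent values.
uncond = [0.10, -0.20, 0.05]   # without the prompt
cond = [0.12, -0.10, 0.00]     # with the prompt

for scale in (1, 2, 4, 12):
    print(scale, [round(v, 3) for v in cfg_combine(uncond, cond, scale)])
```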

And before someone says: "Nahhh! It's because your prompt is asking for clear tangents and negative space"...
OK... here's another prompt just asking for "A female character by Alphonse Mucha!" ( AND without a single negative prompt! )

My prompt just "boosted" its "ancestral image enhancing" nature!

RED is increasing negative space and line continuity!
BLUE is "converging / mixing" the nearest pixels and increasing their contrast.
GREEN is getting rid of the clumsy, noisy sleeves!

"High resolution fix" is just "image to image" ( except when using the latent upscalers, which tend to add more details and "unlock" certain features you couldn't achieve without rendering an image at such a resolution ).
I'm not going to use it!

I previously stated that "quality" could derive from many factors; however, a high step count, a high render scale or different samplers will not achieve the same "magic" behind the high resolution fix...

The truth is that your image just needs another opportunity to "fix" the mistakes and choices made while generating it!
The LEFT image is the original, while the RIGHT image is an image-to-image output!

And there's no need for many steps, because we are using the smallest possible denoising strength to output "fixed images", not "new variants".
I'm also avoiding upscaling, because it would slow down the fixing process.
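Why a low strength also means few steps: in AUTOMATIC1111-style img2img, the init image is only noised part of the way, so roughly `strength × steps` denoising steps actually run; the setting "With img2img, do exactly the amount of steps the slider specifies" instead stretches the schedule so the requested count runs. The function below is a simplified approximation under that assumption (the web UI's exact rounding may differ by a step), not its actual code.

```python
def img2img_steps(steps: int, strength: float, exact_steps: bool = False) -> tuple[int, int]:
    """Approximate (schedule_length, steps_actually_run) for one img2img pass."""
    strength = min(strength, 0.999)  # assumption: strength is clamped just below 1.0
    if exact_steps:
        # Stretch the schedule so that exactly `steps` denoising steps are executed.
        return int(steps / strength), steps
    # Default: start partway down the schedule, so only its tail is executed.
    return steps, int(strength * steps)


for strength in (0.3, 0.4, 0.5):
    print(strength, img2img_steps(20, strength), img2img_steps(20, strength, exact_steps=True))
```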

The "loopback strategy" ( yeah, the same way the "loopback" script works, but with more control over every succeeding step ) isn't going to work for too many loops, because the accumulating contrast creates overexposed outputs!

HOWEVER, if you slightly increase the output resolution on every loop ( not much, just 2 or 4 pixels ) you're going to trigger the upscaling algorithm, and most of the upscalers are highly trained on fixing burnt images =D
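A minimal sketch of that loop, assuming a hypothetical `img2img(image, width, height, strength)` callable that wraps whatever front end or API you use (it is not a real library function). Each pass feeds the previous output back in at a low denoising strength and grows the canvas by a few pixels, so the resize step runs on every loop. The defaults (6 loops, 0.4 strength, 4 px per loop) are just the numbers used in this post.

```python
from typing import Any, Callable

# Hypothetical signature: (image, width, height, denoising_strength) -> new image
Img2Img = Callable[[Any, int, int, float], Any]


def loopback(image: Any, img2img: Img2Img, width: int, height: int,
             loops: int = 6, strength: float = 0.4, grow_px: int = 4) -> Any:
    """Feed each output back as the next input, growing the canvas slightly each loop."""
    for _ in range(loops):
        width += grow_px
        height += grow_px
        image = img2img(image, width, height, strength)
    return image


# Usage with a stand-in that only records the requested sizes.
calls = []


def fake_img2img(image, width, height, strength):
    calls.append((width, height, strength))
    return image


loopback("sketch.png", fake_img2img, 512, 832, loops=3)
print(calls)  # [(516, 836, 0.4), (520, 840, 0.4), (524, 844, 0.4)]
```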

Each of the ESRGAN upscalers is highly trained on creating sharp lines with strong noise suppression ( these upscalers ruin traditional painting outputs because they treat "brush strokes" and "watercolor splotches" as "graphical artifacts" )

Let's say you're going to ask: "Isn't that the same as just cranking up the resolution and the number of steps while using the high resolution fix?"

OK... let's check...

Also, there's no need for that many steps when using the high resolution fix...

Loopbacks at 40% end up with something "cleaner", with precisely stabilized lines and nearly no noise or wobbly lines!
LDSR gives one of the cleanest outputs, achieving something "minimalistic" that looks like vector art with flat shading!

Also, the reason LDSR outputs something "cleaner" is that, before upscaling, LDSR downscales to ridiculously low resolutions and then uses a DDIM denoiser to clean up the resampling artifacts while stretching the texture back to its final upscaled size!

Just in case you're asking yourself: "WHY IS LOSING DETAIL A GOOD THING?"...
It's not losing detail; it's just "cleaning up your mess", and it's mainly used to clean and enhance noisy images like the one above!

Nearly like "giving life" to a painting on the wall...
The image above went through 6 loops of image to image at 40% denoising strength

The final output, with less wobbly lines, higher saturation and more contrasting shadows and highlights!

Also, another upside of this approach is that your "second pass" or "manual high resolution fix" can use prompts specialized in "post-processing your image", dividing your image creation into "sketching, fixing, then refinement"

On the left, the original image; on the right, another output that went through 2 loops using the "post-processing prompt" at 50% denoising strength!

Alternatively, you may use the depth-aware generation script to mask the background or the foreground, then "shrink" it to a feasible resolution, fix it, upscale it back, and finally remove "ghosting artifacts" with image to image at around 30% denoising strength!

Also, you might have noticed that I'm always using the seed upscaling feature!
I've never achieved the effect it was designed for; HOWEVER, it's a good way to do the inverse, bringing the quality of high resolutions to low resolutions!!!

Higher resolution renders come with better shading and better readability; however, they also scatter an absurd amount of tiled or stretched features!
Thus, with only a small canvas to draw on, the resized seed lets the denoising output something "pretty"!

However, the inverse doesn't work, since any seed size below the rendered resolution just gives "a random output" instead of an output with less detail, less noise and less tiling!

A high number of steps? A higher CFG scale ( 12 in this case )?
Nope!

Lower CFG scale ( 4 in this case ) ?
Nope!

Maybe increasing clip skip to reduce the amount of detail? ( 2 in this case )
Nope!

Maybe a higher clip skip? ( 4 in this case )
Nope!

Maybe a clip skip of 8?
Nope!

Maybe with embeddings highly trained on composition, structure and anatomy, like this one? ( https://civitai.com/models/19375/negativeembed )
Better... however... what just happened to the art style? ( also using a clip skip of 8 )

Maybe this one ? ( https://civitai.com/models/4499/unspeakable-horrors-negative-prompt )
Nope ...

Maybe this one? ( https://civitai.com/models/16191/danger-horse-and-danger-donkey-anatomystructure-and-face-quality-helper )
Nope ...

Maybe this one? ( https://civitai.com/models/12770/djz-johnsons-image-helper-2-danger-moose-and-danger-mouse )
Nope ...

Maybe this one? ( https://civitai.com/models/4629/deep-negative-v1x )

Maybe manually adding a "crafted negative prompt" with higher attention weights?
Mehhh... better... however, again a loss in the art style!

All together, with a huge load of negatives and increased attention weights!!!
Nope ...

Good outputs, with clean lines and less grainy noise spread all over the canvas...
HOWEVER... SD 1.5 models will ALWAYS "tile" when rendering at 150% of their trained resolution ( on either axis ).
You need image to image to solve the issue!

(Generated on 384 x 768 with 8 steps)

Going a bit further with clip skip is good for speeding up your "sketching" phase, because it can achieve "cleaner and readable" outputs with little artifacting; however, many detailed features are lost this way!
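For reference, "clip skip" picks which layer of the CLIP text encoder is used as the prompt conditioning: 1 means the last layer, 2 the one before it, and so on. As far as I know, AUTOMATIC1111 takes `hidden_states[-clip_skip]` from the encoder and re-applies its final layer norm. The toy below uses labels instead of tensors; the 13 entries assume the token embeddings plus the 12 transformer layers of SD 1.5's CLIP ViT-L/14 text encoder. Earlier layers carry less refined prompt information, which fits the loss of detailed features described above.

```python
def select_clip_layer(hidden_states, clip_skip=1):
    """Return the encoder output used for conditioning (clip_skip=1 -> last layer)."""
    if not 1 <= clip_skip < len(hidden_states):
        raise ValueError(f"clip_skip must be between 1 and {len(hidden_states) - 1}")
    return hidden_states[-clip_skip]


# Stand-in for hidden states: index 0 = token embeddings, 1..12 = transformer layers.
layers = [f"layer_{i}" for i in range(13)]
for skip in (1, 2, 3, 8):
    print(skip, select_clip_layer(layers, skip))
```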

(Generated on 512 x 1024 with 8 steps)

To "restore" some details, you should try increasing the resolution ( thus increasing rendering time, but "doable" if we consider the second image a "quality of life improvement" while "sketching" )

(Generated on 512 x 1024 with 64 steps)

Worth noting: even with the same seed, just with a higher number of steps, the second image "unlocked" the same detailed features ( the golden frames ) as the images generated with a clip skip of 1.

(Generated on 768 x 1536 with 8 steps)

With this final sample, we can say that any clip skip value higher than 2 will almost never create a full body portrait and will tile more often! ( clip skip values between 3 and 7 )

A higher clip skip ( like 3 ) with a very low CFG scale ( like 2 ) and a high number of steps ( like 32 ), even at low resolutions ( like 384 x 768 ), tends to output AMAZING IMAGES that are worth some effort "outpainting"

And don't you dare increase the resolution, it's going to tile!
However, what a beautiful look!!!
Looks like... "Midjourney"?

Even at pretty low resolutions with stretched aspect ratios ( 256 x 768 ), you will not create a full body portrait!

Maybe a problem with the prompt? Maybe the keyword "portrait" is "probably too strong"? Maybe bad artists?
OK, then let me modify the prompt and check again!

Much better, just a single full body output !!!

Maybe fix and enforce the seed of the single good output that achieved a full body portrait, and let the variation seed "randomize" something with nearly the same composition?
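In AUTOMATIC1111, as far as I know, the starting noise for a "variation seed" is a spherical interpolation (slerp) between the noise of the main seed and the noise of the variation seed, with the variation strength as the interpolation factor. The sketch below shows that idea on small vectors of Gaussian noise; the seeds, the vector length and the `noise` helper are illustrative, not the web UI's code. A strength of 0 keeps the fixed seed's composition, and low strengths stay close to it.

```python
import math
import random


def noise(seed, n=16):
    """Hypothetical stand-in for a seed's latent noise: n Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def slerp(t, a, b):
    """Spherical interpolation between two noise vectors (t=0 -> a, t=1 -> b)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b)) / (na * nb)))
    omega = math.acos(dot)
    if omega < 1e-6:  # nearly parallel vectors: plain linear interpolation is fine
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    so = math.sin(omega)
    return [math.sin((1 - t) * omega) / so * x + math.sin(t * omega) / so * y
            for x, y in zip(a, b)]


main = noise(1234)                   # hypothetical "good" seed you fixed
variation = noise(5678)              # hypothetical variation seed
mixed = slerp(0.2, main, variation)  # low variation strength
```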

FINALLY!!! CLIP SKIP 3 OUTPUTTING FULL BODY PORTRAITS!!!
It took 6 minutes to generate the first batch and find a good seed, then 6 additional minutes per "sketching batch"! ( on my GTX 1070, of course )

We could also "enforce" that structure using image to image!
Maybe a workflow that generates a guide image with clip skip 1 and then "renders" it in the art style of clip skip 3?
Also, "CFG 2" generates random outputs far from the target prompt!!

"Sketching" is still a problem, because using a high denoising strength to "avoid looking like the initial image" will also lose the "structure" of the composition we are aiming for!

Decreasing the denoising strength to maintain the composition's structure isn't going to generate different outputs!
Of course, you could try to "walk in circles" fine-tuning the denoising strength...

And the CFG scale will not solve the issue, nor will any combination of CFG, steps, samplers and denoising strength...

How I usually work:
I slightly reduce the resolution to something that reduces the tiling.
Then, I reinforce my positive prompt with negative prompts that act as their "inverse"!
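The 150% rule of thumb stated earlier can be turned into a quick check. The helpers below are a hypothetical sketch that encodes that rule (SD 1.5 trained at 512 px; 150% or more on either axis flagged as likely to tile) and, for a given aspect ratio, returns the largest size that stays below the threshold, snapped down to multiples of 8 because Stable Diffusion's latent space is 8 times smaller than the image.

```python
def likely_to_tile(width: int, height: int, trained: int = 512, limit: float = 1.5) -> bool:
    """Rule of thumb from this post: SD 1.5 tiles at >= 150% of 512 px on either axis."""
    return max(width, height) >= trained * limit


def largest_safe_size(aspect_w: int, aspect_h: int, trained: int = 512,
                      limit: float = 1.5, multiple: int = 8) -> tuple[int, int]:
    """Largest (width, height) for an aspect ratio whose long side stays under the limit."""
    longest = int(trained * limit) - 1  # strictly below the threshold
    longest -= longest % multiple
    if aspect_w >= aspect_h:
        w, h = longest, longest * aspect_h // aspect_w
    else:
        w, h = longest * aspect_w // aspect_h, longest
    return w - w % multiple, h - h % multiple


print(likely_to_tile(384, 768), largest_safe_size(1, 2))  # True (376, 760)
```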

Tweaked settings for fast sketching with clip skip 3!!!
Image to image is "unavoidable", so I'm not going to waste my time using the high resolution fix!

2 good seeds that I can use for another sketching batch! ( each of these batches takes about 1 minute )

Fixed the seed of one of the samples with the best composition for another "sketching batch"!

Awesome!

Narrowing down to something closer, using the newly chosen seed and a lower variation seed strength, just to explore different floral background frames and different character poses!

better...

I could outpaint using the Automatic 1111 scripts; however, they suck, and they insist on adding more pixels than I asked for when expanding the image!

Using InvokeAI to outpaint with our model and its symmetry feature could help; however, it does not support clip skip, and that results in a diverging art style...

However, on Leonardo.AI ( a website with "fairly good" AI image generation tools ) you will probably end up with something AWESOME, without wasting time configuring the outpainting tool or going through many failed attempts!

HOWEVER, just to avoid someone saying "that's a useless tutorial just because you used a paid service"...
I will outpaint using the outpainting script available in Automatic 1111!!!

When outpainting, it's always a good idea to expand one side at a time and not both at once!
Also, avoid HUGE pixel expansions; they won't output anything good and will lead to "stretched" image borders!!!

Right side fixed...

Expanded just a small 32-pixel area on top of the image to add more negative space and some "gravity" to the composition, ending up with a 512 x 832 image

Before proceeding with image to image, I usually configure Stable Diffusion with a lower mask strength and a lower noise multiplier, because in image to image I'm "fixing and tweaking instead of exploring"!
I also set it to do exactly the number of steps specified!

Then, with inpainting, I'm going to fix a few issues with the image, part by part, starting with the hands, using a huge area around each hand!

Especially, and ONLY WHEN FIXING HANDS AND FEET ( not faces ), I usually "swap" the model to any photorealistic model, like Realistic Vision: https://civitai.com/models/4201/realistic-vision-v20

Then, ONLY WHILE FIXING HANDS AND FEET, I usually tweak the negative prompt with all the embeddings trained on fixing bad anatomy traits!

To increase coherence, I'm going to max out the masked area padding pixels and increase the mask blur, to avoid noticeable seams and samples that do not "match the context" of the original image!
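A rough sketch of what that padding does, assuming "Only masked" inpainting as in AUTOMATIC1111: the crop is the mask's bounding box grown by the padding (clamped to the image), and that crop is then rendered at the working resolution. More padding gives the model more context but spends fewer pixels on the hand itself. The box coordinates are hypothetical, and the real implementation also reshapes the crop to the target aspect ratio, which is skipped here.

```python
def crop_region(mask_box, padding, img_w, img_h):
    """Grow the mask's bounding box by `padding` px on every side, clamped to the image."""
    x1, y1, x2, y2 = mask_box
    return (max(x1 - padding, 0), max(y1 - padding, 0),
            min(x2 + padding, img_w), min(y2 + padding, img_h))


def enlargement(region, work_w=512, work_h=512):
    """Scale factor applied to the crop when it is fitted into the working resolution."""
    x1, y1, x2, y2 = region
    return min(work_w / (x2 - x1), work_h / (y2 - y1))


hand = (300, 500, 380, 600)  # hypothetical hand mask in a 512 x 832 image
for pad in (32, 128, 256):
    region = crop_region(hand, pad, 512, 832)
    print(pad, region, round(enlargement(region), 2))
```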

One of the issues with image to image generation, especially while inpainting, is that it takes two to three times longer to generate images!
The samples above took me 3 minutes instead of 1 minute when compared to text to image generation!

Some samples generated while trying to fix the last inpainted hand!
Now, "partially loop back" some areas of the image to explore different clothes and backgrounds!

Swapping back from the Realistic Vision model to our original "mixed model", I've masked the pendant and her face, because generating faces smaller than 400 x 400 pixels in image to image isn't a good idea ( the face in this image is roughly 140 x 140 pixels ).

Awesome outputs with cleaner lines and a less blurry / foggy image than before!

While sketching these samples I did not use a high denoising strength, to avoid huge changes to the original image and only produce small changes before upscaling! ( of course, the newly generated area is sharper than her original art style )

Also, I used a CFG scale of 4, because 2 ( the original image's parameter ) would end up foggier and blurrier than the original image, while a higher CFG scale would turn the image into a stylized comic art style!

When upscaling, I usually aim for sharp details without losing too much texture; for that reason I use R-ESRGAN 4x+ with a high tile size, which also speeds up the upscaling!
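Why a bigger tile size is faster: tiled upscalers split the image into overlapping tiles and run the network once per tile, so fewer, larger tiles mean fewer passes (at the cost of more VRAM per pass). The counting below is a common way to lay out overlapping tiles; the tile size and overlap values are illustrative, not R-ESRGAN defaults.

```python
import math


def tiles_per_axis(length: int, tile: int, overlap: int) -> int:
    """Number of overlapping tiles needed to cover one axis."""
    if length <= tile:
        return 1
    return math.ceil((length - overlap) / (tile - overlap))


def tile_count(width: int, height: int, tile: int, overlap: int = 8) -> int:
    """Total tiles (upscaler passes) for one image."""
    return tiles_per_axis(width, tile, overlap) * tiles_per_axis(height, tile, overlap)


for tile in (192, 256, 512):
    print(tile, tile_count(512, 832, tile))
```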

Also, I'm going to upscale from 512x832 to 1024x1664 in 4 steps, adding 128x208 pixels on each loop!
A higher CFG scale with a REALLY LOW denoising strength, plus face restoration enabled, to avoid ugly deformations caused by low resolution rendering!
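The arithmetic of that schedule, as a small helper: evenly spaced targets between the start and end sizes, snapped down to multiples of 8. Each target would be one low-strength img2img pass; the function only plans the sizes.

```python
def upscale_schedule(start, end, loops, multiple=8):
    """Evenly spaced (width, height) targets from start (exclusive) to end (inclusive)."""
    (w0, h0), (w1, h1) = start, end
    plan = []
    for i in range(1, loops + 1):
        w = w0 + (w1 - w0) * i // loops
        h = h0 + (h1 - h0) * i // loops
        plan.append((w - w % multiple, h - h % multiple))
    return plan


print(upscale_schedule((512, 832), (1024, 1664), 4))
# [(640, 1040), (768, 1248), (896, 1456), (1024, 1664)]
```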

512 x 832

640 x 1040 ( 22 seconds to render )

768 x 1248 ( 54 seconds to render )

896 x 1456 ( 1 minute and 45 seconds to render )

1024 x 1664 ( 3 minutes and 15 seconds to render ) ...
That's one of the reasons why sketching, fixing and tweaking with high resolution images isn't a good idea!
The time to render every single image grows much faster than the image size!!!
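Putting the timings above ( GTX 1070 ) next to the pixel counts shows what that growth looks like: the last pass has about 2.6x the pixels of the first but takes almost 9x as long, so the time per megapixel itself keeps rising. Part of that is likely the self-attention layers, whose cost grows quadratically with the number of latent tokens; VRAM pressure may add to it. The script only reuses the four measurements from this post.

```python
# (width, height) -> seconds, as measured in this post on a GTX 1070.
RENDERS = [((640, 1040), 22), ((768, 1248), 54), ((896, 1456), 105), ((1024, 1664), 195)]


def seconds_per_megapixel(renders):
    return [seconds / (w * h / 1e6) for (w, h), seconds in renders]


for ((w, h), seconds), rate in zip(RENDERS, seconds_per_megapixel(RENDERS)):
    print(f"{w}x{h}: {w * h / 1e6:.2f} MP, {seconds} s, {rate:.1f} s/MP")
```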

Inpainting the hand on the right side of the image took me about 5 attempts of 1 minute and 20 seconds each!
Even though working at higher resolutions may give you better quality faces and hands, that's not the same as fixing them through sketching samples!

I had to invest nearly 16 "inpainting loopbacks" to fix and tweak each of these: the outer border of the floral background, the flower on the chest, the embroidered trimmings on top of the dress and skirt, then the sleeves and, lastly, her hands!
