In the world of AI-generated imagery, Stable Diffusion has emerged as a powerful tool for artists, developers, and content creators. But when it comes to enhancing those generated images to larger resolutions, not all upscaling methods are created equal. One specific method—PULSE upscaling—once promised high-quality enhancements but instead left users puzzled with strange artifacts and blocky pixels. This paved the way for the rise of alternative pipelines, most notably those based on ESRGAN, which delivered significantly cleaner and more aesthetically pleasing high-resolution results.
TL;DR
PULSE upscaling, once touted as a promising enhancement method in the AI art community, often created undesirable blocky and surreal artifacts in output images. Users quickly became frustrated with its limitations and turned to ESRGAN-based pipelines. ESRGAN proved to be more effective and reliable in preserving image details and generating clean upscaled images. This shift marked a significant improvement in the workflow of upscaling AI-generated visuals.
The Rise and Fall of PULSE Upscaling
PULSE (Photo Upsampling via Latent Space Exploration) was initially introduced with much enthusiasm. It was designed to reconstruct high-resolution facial images from low-resolution ones using generative adversarial networks (GANs). The core idea was to search the latent space of a GAN to find photo-realistic images that, when downscaled, would resemble the low-res input. In theory, this meant AI models like Stable Diffusion could generate low-res renders quickly, which PULSE could then upscale to make them appear photo-realistic.
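The latent-space search at the heart of PULSE can be sketched in miniature. The snippet below is a toy illustration, not PULSE's actual implementation: a fixed random matrix stands in for the StyleGAN2 generator, and simple random hill-climbing stands in for PULSE's spherical latent optimization. The goal is the same, though: find a latent vector whose generated image, once downscaled, matches the low-res input.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16))  # fixed toy "generator" weights (a real PULSE setup uses StyleGAN2)

def generate(z):
    """Toy generator: 16-dim latent vector -> 8x8 image with pixels in [0, 1]."""
    return (1.0 / (1.0 + np.exp(-(W @ z)))).reshape(8, 8)

def downscale(img):
    """2x average pooling, standing in for PULSE's downsampling operator."""
    return img.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# Pretend this 4x4 image is the low-res input we want to "hallucinate" upward.
target_lr = downscale(generate(rng.standard_normal(16)))

# Latent-space search: random hill-climbing toward a z whose generated
# image, once downscaled, matches the low-res target.
z = rng.standard_normal(16)
best_loss = np.mean((downscale(generate(z)) - target_lr) ** 2)
for _ in range(2000):
    candidate = z + 0.1 * rng.standard_normal(16)
    loss = np.mean((downscale(generate(candidate)) - target_lr) ** 2)
    if loss < best_loss:
        z, best_loss = candidate, loss

high_res = generate(z)  # the "upscaled" output a PULSE-style method would return
```

Note how the output is whatever the generator can produce near the best latent point; when the input lies outside the generator's training distribution, as with much Stable Diffusion art, that nearest point can look nothing like the original, which is exactly the failure mode described above.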
However, in practice, PULSE didn’t always deliver as expected. While it produced striking results in select cases, the average outcome often involved bizarre distortions and blocky textures. Faces looked like collages rather than smooth representations, and fine details resembled pixelated artifacts instead of high-res imagery. It became clear that PULSE couldn’t reliably interpret all variations of synthetic images produced by models like Stable Diffusion, especially when images deviated from photo-realism.
PULSE’s Limitations
- Overdependence on Latent Space: PULSE generated variations by exploring the latent space of StyleGAN2, which is trained on real human faces. When applied to AI-generated art that was not based on real photos, it often failed to match the expected structure.
- Block Artifacts: Instead of enhancing textures, it magnified inconsistencies, often creating large square-like patterns and mismatched facial features.
- Uncontrollable Outputs: Users had minimal control over the specifics of the enhanced results, making it difficult for artists to achieve their desired aesthetic.
The Emergence of ESRGAN
Enter the Enhanced Super-Resolution Generative Adversarial Network, better known as ESRGAN. The AI art community quickly pivoted to this far more flexible and adaptable option. ESRGAN introduced substantial improvements over older super-resolution techniques, chiefly through its Residual-in-Residual Dense Block (RRDB) architecture and a perceptual loss that rewards visually convincing texture rather than raw pixel accuracy.
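The residual-in-residual idea can be sketched without any deep learning framework. The toy code below is only a structural illustration (random weights, a matrix multiply standing in for convolution), but it shows the two nested skip connections that give the RRDB its name: dense blocks that add a scaled residual locally, wrapped in an outer skip over the whole trunk.

```python
import numpy as np

def dense_block(x, num_layers=3, beta=0.2, seed=0):
    """Toy dense block: every layer sees the concatenation of all earlier
    feature maps (dense connectivity); the block's output is added back to
    the input, scaled by beta (local residual learning)."""
    rng = np.random.default_rng(seed)
    feats = [x]
    for _ in range(num_layers):
        stacked = np.concatenate(feats, axis=-1)
        w = 0.1 * rng.standard_normal((stacked.shape[-1], x.shape[-1]))
        feats.append(np.maximum(stacked @ w, 0.0))  # ReLU "conv" stand-in
    return x + beta * feats[-1]

def rrdb(x, num_blocks=3, beta=0.2):
    """Residual-in-Residual: chain dense blocks, then add an outer skip
    connection around the whole trunk, again scaled by beta."""
    y = x
    for i in range(num_blocks):
        y = dense_block(y, seed=i)
    return x + beta * (y - x)  # outer residual over the trunk's change

features = np.random.rand(8, 8, 4)  # (H, W, C) feature map
out = rrdb(features)
```

Because every block only nudges its input rather than replacing it, the network naturally refines detail that is already present, which is why this design suits enhancement better than PULSE's from-scratch reconstruction.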
Rather than reconstructing entirely new representations of an image as PULSE attempted, ESRGAN focused on refining the details already present. This made it particularly well-suited for scaling AI-generated art, which often included fantasy elements, unusual patterns, and imaginative structures that defied real-world logic.
Benefits of ESRGAN in Image Upscaling
- Retention of Original Aesthetics: ESRGAN smartly preserves the stylistic intent of the original image while enhancing its resolution.
- Customizable Models: The ESRGAN ecosystem allows for model swaps—users can load models specialized for anime, realistic textures, or painted styles.
- Smooth and Sharp Results: With proper pre- and post-processing, ESRGAN outputs show reduced noise, smooth gradients, and crisp edges.
ESRGAN’s Integration with Stable Diffusion Pipelines
Given that Stable Diffusion's base models were trained at 512×512 and typically generate images at that resolution, upscaling has become a vital part of the workflow. ESRGAN provides a non-invasive way to double or even quadruple image dimensions while maintaining artistic integrity.
For most workflows, ESRGAN is used as a final step after generating images. Some advanced pipelines combine ESRGAN with interpolation (for gradual scaling) and custom models trained on similar datasets to maximize compatibility. Tools like Real-ESRGAN became especially popular due to their GUI support and simplified APIs, making it easier for non-technical users to access high-quality upscaling.
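One practical detail in these pipelines is tiling: large images are upscaled tile by tile so memory use stays bounded (Real-ESRGAN exposes this via a tile-size option in its inference script). Here is a minimal sketch of the idea, with nearest-neighbor repetition standing in for the actual ESRGAN model call, which would replace `upscale_tile` in a real pipeline.

```python
import numpy as np

def upscale_tile(tile, scale=2):
    """Stand-in upscaler (nearest-neighbor); a real pipeline would run an
    ESRGAN model on each tile here."""
    return tile.repeat(scale, axis=0).repeat(scale, axis=1)

def tiled_upscale(img, tile=64, scale=2):
    """Upscale an image tile by tile so memory stays bounded, then stitch
    the upscaled tiles back into one output array."""
    h, w = img.shape[:2]
    out = np.zeros((h * scale, w * scale) + img.shape[2:], img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            up = upscale_tile(img[y:y + tile, x:x + tile], scale)
            out[y * scale:y * scale + up.shape[0],
                x * scale:x * scale + up.shape[1]] = up
    return out

img = np.random.rand(100, 130, 3).astype(np.float32)
big = tiled_upscale(img)  # twice the original height and width
```

With a purely local upscaler like this stand-in, the tiled result is identical to upscaling the whole image at once; real ESRGAN models use overlapping tiles or padding to hide seams at tile borders.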
Real-World Use Cases
- Poster Printing: Artists aiming to convert digital art into large-format posters use ESRGAN to maintain crispness.
- Game Design: Developers enhance AI-generated textures for 3D environments using specialized ESRGAN models.
- Virtual Avatars: In metaverse applications, clean high-res portraits created via ESRGAN significantly boost visual fidelity.
Transitioning from PULSE to ESRGAN
Once ESRGAN’s benefits became widely known, the community rapidly moved away from using PULSE. Forums, GitHub repositories, and online tutorials began recommending Real-ESRGAN for all serious enhancement tasks. Many Stable Diffusion GUI packages and automation scripts even removed PULSE support entirely, replacing it with ESRGAN options and presets.
This shift also led to the development of hybrid workflows. In these, base images were enhanced through multiple carefully tuned ESRGAN passes, sometimes even combined with denoising and latent upscaling techniques. The results were consistent, cleaner, and more aligned with the original creative vision of the artist.
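A hybrid multi-pass workflow of the kind described above can be sketched as a simple chain of operations. The functions below are toy stand-ins (nearest-neighbor for the tuned ESRGAN pass, a 3×3 mean filter for the denoiser), meant only to show the shape of the pipeline, not any particular tool's implementation.

```python
import numpy as np

def upscale2x(img):
    """Nearest-neighbor 2x blow-up, standing in for one tuned ESRGAN pass."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def denoise(img):
    """3x3 mean filter, standing in for an inter-pass denoising step."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)) / 9.0

def hybrid_pipeline(img, passes=2):
    """Chain several upscale+denoise passes, as hybrid workflows do,
    instead of one large jump in resolution."""
    for _ in range(passes):
        img = denoise(upscale2x(img))
    return img

small = np.random.rand(32, 32)
result = hybrid_pipeline(small)  # 4x total after two 2x passes
```

Running two gentler 2x passes with cleanup between them, rather than a single 4x pass, is what gives these workflows their consistency: each stage corrects the artifacts of the previous one before they get magnified further.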
Which Models Work Best With ESRGAN?
Various ESRGAN models are trained for targeted use cases, and the best choice often depends on the image type.
- Real-ESRGAN: Best for general-purpose realistic image enhancement.
- Anime-focused options (such as Anime4K, or ESRGAN variants trained on datasets like Manga109): specifically tuned for cartoon, manga, and anime illustrations.
- 4x-UltraSharp: Ideal for texture-rich AI renders that need heightened crispness without introducing oversharpening artifacts.
Conclusion
What began as an ambitious push for realism in AI-generated faces through PULSE upscaling eventually revealed the limitations of that approach. The AI art community quickly found a more robust and flexible solution in ESRGAN, ushering in a new era of high-resolution refinement. Today, ESRGAN stands as the undisputed standard for upscaling in the AI imagery space, allowing creators to turn small-scale renders into stunning, high-detail pieces fit for large screens and printed media.
FAQ
- What is the main difference between PULSE and ESRGAN?
  PULSE tries to reconstruct images from low-res inputs by searching a GAN's latent space, while ESRGAN enhances existing images without significantly altering their content.
- Why did PULSE produce blocky artifacts in Stable Diffusion images?
  Because its underlying generator was trained on real-world facial data, it struggled to interpret diverse AI-generated content, leading to mismatched features and pixel blocks.
- Can ESRGAN be used on non-facial images?
  Absolutely. ESRGAN works exceptionally well on everything from landscapes to abstract art, given an appropriately trained model.
- Is Real-ESRGAN better than base ESRGAN?
  Real-ESRGAN builds on ESRGAN's core with additional training for real-world degradation scenarios and improved perceptual quality, making it more user-friendly and adaptable.
- Is there a risk of losing detail with ESRGAN upscaling?
  Minimal, especially if the right model is used. Some overly aggressive models might blur fine textures, but tuning the settings usually mitigates this.


