NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

Yu Zeng1, Charles Ochoa1, Mingyuan Zhou2, Vishal M. Patel3, Vitor Guizilini1, Rowan McAllister1

1Toyota Research Institute    2UT Austin    3Johns Hopkins University

We introduce Phase-Preserving Diffusion (ϕ-PD), a drop-in change to the diffusion process that preserves image phase while diffusing magnitude, enabling geometry-consistent re-rendering for games, videos, and simulators.

TL;DR

  • Key insight: Replace Gaussian noise with phase-preserving noise: keep the input phase, randomize the magnitude.
  • Controllable alignment: A single cutoff parameter r adjusts how strictly structure is preserved.
  • Plug-and-play: No architectural changes or extra parameters; works with any diffusion model for images or videos.
  • Applications: We show results on photoreal game remasters, stylized re-rendering, and autonomous driving. ϕ-PD improves CARLA-to-Waymo planner performance by ~50%, significantly reducing the sim-to-real gap. The method also applies to other structure-aligned generation tasks.

Introducing NeuralRemaster with Phase-Preserving Diffusion

Re-imagine Retro Games

And More

Gaussian Diffusion in the Frequency Domain

Standard diffusion models corrupt images with Gaussian noise and generate images by learning to invert this process. In the frequency domain, Gaussian noise destroys both the magnitude and the phase.

Gaussian Diffusion on a 1D signal (left) and an image (right) in the frequency domain. Even early diffusion steps destroy the phase.
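To make this concrete, here is a minimal numpy sketch (not the paper's code; a random array stands in for an image) that measures how much Fourier phase survives the standard forward process q(x_t | x_0):

import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))       # stand-in for a grayscale image
phase0 = np.angle(np.fft.fft2(x0))       # Fourier phase of the clean signal

for alpha_bar in (0.99, 0.9, 0.5, 0.1):  # noise grows as alpha_bar shrinks
    eps = rng.standard_normal(x0.shape)
    # standard forward process: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps
    phase_t = np.angle(np.fft.fft2(xt))
    # mean cosine of the phase difference: 1 = identical, 0 = unrelated
    agreement = np.mean(np.cos(phase_t - phase0))
    print(f"alpha_bar={alpha_bar:.2f}  phase agreement={agreement:.3f}")

Even at moderate noise levels the phase agreement falls measurably, and it collapses toward zero as alpha_bar shrinks.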

This works well for generating images from scratch (e.g. text-to-image); however, it can lead to structural misalignment in image-to-image or video-to-video tasks.

Results from ChatGPT and Qwen-Edit: "Make this look like a real picture". Overlay shows misalignment.

Phase-Preserving Diffusion (ϕ-PD)

Classical signal processing tells us that structural information is encoded in the phase. If you mix the phase of one image with the magnitude of another, the result keeps the structure of the image that supplied the phase.

Mixing the phase of one image with the magnitude of another preserves the structure of the image that supplied the phase.
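As a quick demo, here is a minimal numpy sketch of this classic swap; swap_phase is a hypothetical helper, and the random arrays stand in for any two same-sized grayscale images:

import numpy as np

def swap_phase(mag_src, phase_src):
    """Combine the Fourier magnitude of one image with the phase of another."""
    mag = np.abs(np.fft.fft2(mag_src))
    phase = np.angle(np.fft.fft2(phase_src))
    return np.real(np.fft.ifft2(mag * np.exp(1j * phase)))

# The output keeps the structure of img_b (the phase source), not img_a.
rng = np.random.default_rng(0)
img_a, img_b = rng.random((2, 128, 128))
hybrid = swap_phase(mag_src=img_a, phase_src=img_b)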

Inspired by this observation, we introduce phase-preserving diffusion, which diffuses the magnitude while keeping most of the phase.

Phase-Preserving Diffusion on a 1D signal (left) and an image (right) in the frequency domain.

Instead of Gaussian noise, ϕ-PD uses structured noise that shares the image's phase. This lets the model learn to denoise without ever losing structural alignment. Unlike previous methods, ϕ-PD needs no additional module to encode the structural information of the input. It is model agnostic, works with any base model for images or videos, and makes no architectural changes.

Because ϕ-PD alters neither the model architecture nor the training objective, it remains lightweight and model agnostic: a simple, efficient way to achieve structure-aligned generation with diverse appearance and the original structure intact.
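In code, this structured noise can be built by giving an ordinary Gaussian sample the phase of the input. Below is a minimal sketch of our reading of the idea; phase_preserving_noise is a hypothetical name, not the released API:

import numpy as np

def phase_preserving_noise(x0, rng):
    """Structured noise: random Gaussian magnitude, phase taken from x0."""
    eps = rng.standard_normal(x0.shape)   # ordinary Gaussian noise
    mag = np.abs(np.fft.fft2(eps))        # keep its random magnitude spectrum
    phase = np.angle(np.fft.fft2(x0))     # but reuse the input's phase
    return np.real(np.fft.ifft2(mag * np.exp(1j * phase)))

# Used in place of eps in the usual forward process
# x_t = sqrt(ab) * x0 + sqrt(1 - ab) * noise, so even heavily
# noised samples keep the structure of x0.
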
Qwen-Edit (left) and ϕ-PD (right) on the same input. 🎉 No misalignment!

Controlling Alignment Strength with Frequency-Selective Structured Noise

One perk of ControlNet over simple channel-wise concatenation is that it allows us to control the alignment strength.

ϕ-PD can provide the same flexibility without the need for a heavy encoder module. This is achieved by introducing Frequency-Selective Structured (FSS) noise.

We define a smooth mask in the frequency domain with cutoff radius r:

  • Low frequencies (inside r) keep the image phase → preserve coarse geometry and layout.
  • High frequencies use the noise phase → allow appearance variation and detailed edits (a code sketch follows the figure below).

Same noise, different r: large r keeps geometry almost perfectly aligned; small r allows creative edits.
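Here is a minimal sketch of FSS noise under our assumptions; the sigmoid mask shape and the softness parameter are illustrative choices, not taken from the paper:

import numpy as np

def fss_noise(x0, r, rng, softness=4.0):
    """Blend image phase (low frequencies) with noise phase (high frequencies)."""
    h, w = x0.shape
    eps = rng.standard_normal(x0.shape)
    mag = np.abs(np.fft.fft2(eps))
    phase_img = np.angle(np.fft.fft2(x0))
    phase_eps = np.angle(np.fft.fft2(eps))

    # radial frequency distance from the DC component
    fy = np.fft.fftfreq(h)[:, None] * h
    fx = np.fft.fftfreq(w)[None, :] * w
    radius = np.sqrt(fx**2 + fy**2)
    # smooth mask: ~1 inside the cutoff r, ~0 outside
    mask = 1.0 / (1.0 + np.exp((radius - r) / softness))

    # interpolate phases on the unit circle to avoid wrap-around artifacts
    z = mask * np.exp(1j * phase_img) + (1.0 - mask) * np.exp(1j * phase_eps)
    return np.real(np.fft.ifft2(mag * np.exp(1j * np.angle(z))))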

Applications in Embodied AI and Sim-to-Real

Autonomous driving · Simulation enhancement · Zero-shot transfer

In autonomous driving and robotics, planners depend on consistent geometry: lane positions, obstacles, and ego motion. ϕ-PD can enhance simulators like CARLA by re-rendering them to look more like real-world data without altering the underlying scene.

Comparison to Cosmos-Transfer2.5 (vis input, control weight 0.5).

In our experiments, ϕ-PD achieves up to a 50% reduction in ADE/FDE on Waymo's WOD-E2E validation set compared to the CARLA-only baseline.

Error of a simple ResNet-34 planner trained on CARLA/transferred images. Lower is better. ¹ Zero-shot transfer. ² With AV finetuning.

Applications in Content Creation

For content creation, given real images or videos, ϕ-PD can generate creative visual effects while keeping the structure intact.

Stylized re-rendering of a dog video. Upper left: original video.


Additional Results

[Additional before/after comparisons]

Citation

If you find this work useful, please cite:

@article{zeng2025neuralremaster,
  title   = {{NeuralRemaster}: Phase-Preserving Diffusion for Structure-Aligned Generation},
  author  = {Zeng, Yu and Ochoa, Charles and Zhou, Mingyuan and Patel, Vishal M and
             Guizilini, Vitor and McAllister, Rowan},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2025}
}