Applications in Content Creation
For content creation, given real images or videos, ϕ-PD can generate creative visual effects while keeping the structure intact.
¹Toyota Research Institute ²UT Austin ³Johns Hopkins University
We introduce Phase-Preserving Diffusion (ϕ-PD), a drop-in change to the diffusion process that preserves image phase while diffusing magnitude, enabling geometry-consistent re-rendering for games, videos, and simulators.
The cutoff radius r adjusts how strictly structure is preserved.
Standard diffusion models corrupt images with Gaussian noise and learn to generate images by inverting this process. In the frequency domain, Gaussian noise destroys both the magnitude and the phase.
This works well for generating images from scratch (e.g., text-to-image), but it can lead to structural misalignment in image-to-image or video-to-video tasks.
Classical signal processing tells us that structural information is encoded in the phase. If you mix the phase of one image with the magnitude of another, the result keeps the structure of the image that supplied the phase.
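The phase-swap experiment described above can be tried in a few lines of NumPy. This is a minimal sketch on synthetic data: the rectangle and the random texture are stand-ins for real images, the function names are illustrative, and Pearson correlation is only a crude proxy for "keeps the structure".

```python
import numpy as np

def mix_phase_and_magnitude(phase_src, magnitude_src):
    """Return an image with the Fourier phase of `phase_src` and the
    Fourier magnitude of `magnitude_src` (2-D float arrays, same shape)."""
    phase = np.angle(np.fft.fft2(phase_src))
    magnitude = np.abs(np.fft.fft2(magnitude_src))
    mixed_spectrum = magnitude * np.exp(1j * phase)
    # Both inputs are real, so the mixed spectrum is conjugate-symmetric up to
    # rounding; the tiny imaginary part of the inverse transform is dropped.
    return np.real(np.fft.ifft2(mixed_spectrum))

def correlation(a, b):
    """Pearson correlation between two images (crude similarity score)."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

rng = np.random.default_rng(0)
structure = np.zeros((64, 64))
structure[10:37, 20:45] = 1.0        # a bright rectangle: clear structure
texture = rng.random((64, 64))       # unstructured texture

mixed = mix_phase_and_magnitude(structure, texture)
print("similarity to phase source:    ", correlation(mixed, structure))
print("similarity to magnitude source:", correlation(mixed, texture))
```

The mixed image should correlate noticeably more with the phase source than with the magnitude source, even though all of its spectral energy comes from the texture.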
Inspired by this observation, we introduce phase-preserving diffusion, which diffuses the magnitude while keeping most of the phase.
Instead of Gaussian noise, ϕ-PD uses structured noise that shares the image phase. This allows the model to learn to denoise without ever losing structural alignment. Unlike previous methods, ϕ-PD does not need an additional module to encode the structural information from the input. It is model-agnostic, works with any base model for images or videos, and makes no architectural changes.
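One plausible way to build noise that shares the image phase is to keep the Fourier magnitude of a Gaussian sample and swap in the phase of the image. The sketch below makes that assumption; the exact construction and noise schedule are not given in this text, so `phase_preserving_noise`, `alpha`, and `sigma` are illustrative names and values, not the authors' implementation.

```python
import numpy as np

def phase_preserving_noise(image, rng):
    """Noise with the Fourier magnitude of a Gaussian sample and the
    Fourier phase of `image` (assumed construction, for illustration)."""
    eps = rng.standard_normal(image.shape)
    noise_magnitude = np.abs(np.fft.fft2(eps))
    image_phase = np.angle(np.fft.fft2(image))
    return np.real(np.fft.ifft2(noise_magnitude * np.exp(1j * image_phase)))

def phase_agreement(a, b):
    """Mean cosine of the per-frequency phase difference (1.0 = same phase)."""
    diff = np.angle(np.fft.fft2(a)) - np.angle(np.fft.fft2(b))
    return float(np.mean(np.cos(diff)))

rng = np.random.default_rng(0)
image = rng.random((32, 32))      # stand-in for a real image
alpha, sigma = 0.3, 0.95          # hypothetical signal/noise scales at one timestep

noisy_gaussian = alpha * image + sigma * rng.standard_normal(image.shape)
noisy_structured = alpha * image + sigma * phase_preserving_noise(image, rng)

print("phase agreement, Gaussian noise:  ", phase_agreement(image, noisy_gaussian))
print("phase agreement, structured noise:", phase_agreement(image, noisy_structured))
```

Because the image and the structured noise have the same phase at every frequency, their non-negative combination keeps that phase, so the second score should be essentially 1 while the Gaussian score is far lower.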
One perk of ControlNet over simple channel-wise concatenation is that it allows us to control the alignment strength.
ϕ-PD can provide the same flexibility without the need for a heavy encoder module. This is achieved by introducing Frequency-Selective Structured (FSS) noise.
We define a smooth mask in the frequency domain with cutoff radius r:
Low frequencies (within radius r) keep the image phase, preserving coarse geometry and layout.
The cutoff r sets the trade-off: a large r keeps geometry almost perfectly aligned, while a small r allows creative edits.
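The effect of the cutoff can be sketched numerically. The mask formula is not given in this text, so the code below assumes a sigmoid-shaped radial mask with a hypothetical `softness` parameter, measures r in cycles per pixel, and blends the phase-preserving spectrum with the plain Gaussian spectrum by that mask. All function names are illustrative.

```python
import numpy as np

def smooth_lowpass_mask(shape, r, softness=0.02):
    """Hypothetical smooth radial mask: about 1 below cutoff r, about 0 above.
    Frequencies are in cycles per pixel, so the radius spans 0 to about 0.71."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    return 1.0 / (1.0 + np.exp((radius - r) / softness))

def fss_noise(image, eps, r, softness=0.02):
    """Blend image-phase noise (inside the mask) with Gaussian noise (outside)."""
    eps_spectrum = np.fft.fft2(eps)
    image_phase = np.angle(np.fft.fft2(image))
    structured = np.abs(eps_spectrum) * np.exp(1j * image_phase)
    mask = smooth_lowpass_mask(image.shape, r, softness)
    blended = mask * structured + (1.0 - mask) * eps_spectrum
    return np.real(np.fft.ifft2(blended))

def phase_agreement(a, b):
    """Mean cosine of the per-frequency phase difference (1.0 = same phase)."""
    diff = np.angle(np.fft.fft2(a)) - np.angle(np.fft.fft2(b))
    return float(np.mean(np.cos(diff)))

rng = np.random.default_rng(0)
image = rng.random((64, 64))                 # stand-in for a real image
eps = rng.standard_normal(image.shape)       # Gaussian sample to be restructured

for r in (0.05, 0.2, 0.5):
    noise = fss_noise(image, eps, r)
    print(f"r={r}: phase agreement with image = {phase_agreement(image, noise):.3f}")
```

Under these assumptions the phase agreement rises with r, matching the description: a larger cutoff ties more of the spectrum to the image phase, and a smaller one leaves more of it free.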
In autonomous driving and robotics, planners depend on consistent geometry: lane positions, obstacles, and ego motion. ϕ-PD can enhance simulators like CARLA by re-rendering their output to look more like real-world data without altering the underlying scene.
In our experiments, ϕ-PD achieves up to a 50% reduction in ADE/FDE on Waymo's WOD-E2E validation set compared to the CARLA-only baseline.
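For readers unfamiliar with the metrics, ADE (average displacement error) is the mean Euclidean distance between predicted and ground-truth positions over a trajectory, and FDE (final displacement error) is that distance at the last timestep. The sketch below computes both for a single made-up trajectory; the benchmark's exact protocol (horizons, averaging across scenes) is not described here and is not reproduced.

```python
import numpy as np

def ade_fde(predicted, ground_truth):
    """Average and final displacement error for one trajectory.
    Both inputs have shape (T, 2): T timesteps of (x, y) positions."""
    displacement = np.linalg.norm(predicted - ground_truth, axis=-1)
    return float(displacement.mean()), float(displacement[-1])

# Hypothetical three-step trajectory, for illustration only.
ground_truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
predicted = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

ade, fde = ade_fde(predicted, ground_truth)
print(ade, fde)  # per-step errors are 0, 1, 2, so ADE = 1.0 and FDE = 2.0
```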
If you find this work useful, please cite:
@article{zeng2025neuralremaster,
title = {{NeuralRemaster}: Phase-Preserving Diffusion for Structure-Aligned Generation},
author = {Zeng, Yu and Ochoa, Charles and Zhou, Mingyuan and Patel, Vishal M and
Guizilini, Vitor and McAllister, Rowan},
journal = {arXiv preprint arXiv:XXXX.XXXXX},
year = {2025}
}