## Introduction
Recent advancements in image-to-image AI models have opened new possibilities in digital asset management and content generation. Having worked with Sitecore Content Hub across numerous enterprise implementations, I've discovered an interesting application: using our extensive library of professional photography to train specialized AI models for generating new property images.
Content Hub's robust asset management capabilities make it an ideal platform for both sourcing training data and managing AI-generated content. Through years of implementations, we've accumulated a substantial collection of high-quality property photographs, complete with detailed metadata. This presented an opportunity to explore how modern AI could help solve a common challenge in real estate marketing.
## The Challenge
In real estate development, presenting yet-to-be-built properties requires compelling visual content that accurately represents the final product. Traditional approaches often involve basic architectural renders or reference photos, which may not fully capture the intended aesthetic. Our task was to generate photorealistic images of future properties using a combination of architectural renders, sketches, and text descriptions, while maintaining the professional quality and distinctive style present in our existing photography.
The complexity lies in translating architectural specifications into images that match the caliber of professional photography. Each new property visualization needs to reflect specific design elements, materials, and finishes while maintaining visual consistency with our existing portfolio. This requires an AI model that understands not just general architectural features, but also our unique brand aesthetic captured in thousands of previous property photos.
The core concept is performing style transfer from our curated Content Hub assets to newly generated images using the Flux image-to-image model.
## Technical Architecture
The solution leverages several cutting-edge technologies:
- Base Model: After evaluating various options, including Stable Diffusion 3.5, I selected Flux 1 from Black Forest Labs as our foundation. Their image-to-image model demonstrated superior photorealism and high-resolution output capabilities, crucial for architectural visualization.
- Custom LoRA Models: Through experimentation, I developed separate LoRA models for different contexts (interior and exterior shots). The training process required careful balancing of dataset size and training parameters. Finding the optimal learning rate, batch size, and training steps proved to be one of the more challenging aspects of the implementation.
- Infrastructure Stack (see the inference sketch after this list):
  - fal.ai for high-performance inference
  - Hugging Face to host LoRA models
  - RunPod.io for fine-tuning LoRA models
  - Sitecore Content Hub as both the source of training data and the destination for generated images
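To make the stack concrete, here's roughly what inference looks like on fal.ai once a trained LoRA is hosted on Hugging Face. Treat this as a sketch: the endpoint ID, parameter values, and repo URL below are illustrative placeholders rather than the exact values from my setup, so check fal.ai's model catalog for the current Flux LoRA image-to-image endpoint.

```python
# Rough sketch of fal.ai inference with a custom LoRA. The endpoint ID,
# parameter values, and URLs are illustrative placeholders; consult
# fal.ai's documentation for the exact image-to-image Flux LoRA endpoint.
import fal_client

result = fal_client.subscribe(
    "fal-ai/flux-lora/image-to-image",  # hypothetical endpoint ID
    arguments={
        "image_url": "https://example.com/architectural-render.jpg",  # source render
        "prompt": "empty bedroom, oak hardwood floors, matte black fixtures, "
                  "recessed lighting, photorealistic interior photography",
        "strength": 0.85,  # how far to move away from the source image
        "loras": [
            {
                # LoRA weights hosted on Hugging Face (hypothetical repo)
                "path": "https://huggingface.co/your-username/mh-flux-lora-v7",
                "scale": 1.0,
            }
        ],
    },
)

# Flux endpoints typically return a list of generated images
print(result["images"][0]["url"])
```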
## Technical Implementation
The core of our solution involves fine-tuning LoRA models for Flux 1. I chose the AI Toolkit framework (linked below) as a foundation, adapting it to our specific requirements.
Here's a glimpse into our training pipeline:
- Data Preparation: Quality metadata is crucial for training. I leveraged OpenAI's vision API to generate detailed descriptions, focusing on prompts that help the image model render new photorealistic images closely resembling our existing portfolio. I'm using the same approach I described in an earlier blog post, AI-Enhanced Interior Photography: Leveraging Content Hub for Image Generation.
- Model Training: Here's my configuration file (refer to the README in the AI Toolkit documentation for more details on the configuration options):
```yaml
---
job: extension
config:
  # this name will be the folder and filename name
  name: "mh_flux_lora_v7"
  process:
    - type: "sd_trainer"
      # root folder to save training sessions/samples/weights
      training_folder: "output"
      # uncomment to see performance stats in the terminal every N steps
      # performance_log_every: 1000
      device: cuda:0
      # if a trigger word is specified, it will be added to captions of training data if it does not already exist
      # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
      # trigger_word: "meritage"
      network:
        type: "lora"
        linear: 16
        linear_alpha: 16
      save:
        dtype: float16 # precision to save
        save_every: 250 # save every this many steps
        max_step_saves_to_keep: 4 # how many intermittent saves to keep
        push_to_hub: false # change this to true to push your trained model to Hugging Face.
        # You can either set up a HF_TOKEN env variable or you'll be prompted to log in
        # hf_repo_id: your-username/your-model-slug
        # hf_private: true # whether the repo is private or public
      datasets:
        # datasets are a folder of images. captions need to be txt files with the same name as the image
        # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
        # images will automatically be resized and bucketed into the resolution specified
        # on windows, escape back slashes with another backslash, so
        # "C:\\path\\to\\images\\folder"
        - folder_path: "/workspace/ai-toolkit/bedrooms_empty"
          caption_ext: "txt"
          caption_dropout_rate: 0.05 # will drop out the caption 5% of the time
          shuffle_tokens: false # shuffle caption order, split by commas
          cache_latents_to_disk: true # leave this true unless you know what you're doing
          resolution: [512, 768, 1024] # flux enjoys multiple resolutions
      train:
        batch_size: 16
        steps: 2000 # total number of steps to train; 500 - 4000 is a good range
        gradient_accumulation_steps: 1
        train_unet: true
        train_text_encoder: false # probably won't work with flux
        gradient_checkpointing: true # need this on unless you have a ton of vram
        noise_scheduler: "flowmatch" # for training only
        optimizer: "adamw8bit"
        lr: 1e-4
        # uncomment this to skip the pre-training sample
        # skip_first_sample: true
        # uncomment to completely disable sampling
        # disable_sampling: true
        # uncomment to use new bell curved weighting. Experimental but may produce better results
        # linear_timesteps: true
        # ema will smooth out learning, but could slow it down. Recommended to leave on.
        ema_config:
          use_ema: true
          ema_decay: 0.99
        # will probably need this if gpu supports it for flux, other dtypes may not work correctly
        dtype: bf16
      model:
        # huggingface model name or path
        name_or_path: "black-forest-labs/FLUX.1-dev"
        is_flux: true
        quantize: true # run 8bit mixed precision
        # low_vram: true # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
      sample:
        sampler: "flowmatch" # must match train.noise_scheduler
        sample_every: 250 # sample every this many steps
        width: 1024
        height: 1024
        prompts:
          # you can add [trigger] to the prompts here and it will be replaced with the trigger word
          # - "[trigger] holding a sign that says 'I LOVE PROMPTS!'"
          - "a primary bedroom with a large bed, nightstands, and a dresser, soft lighting, and a window with sheer curtains"
          - "a cozy living room with a fireplace, a large sofa, and a coffee table, warm lighting, and a rug on the floor"
          - "a modern kitchen with stainless steel appliances, white cabinets, and a marble countertop, pendant lighting, and a tiled backsplash"
          - "a spacious bathroom with a freestanding tub, a double vanity, and a walk-in shower, natural light, and a potted plant"
          - "a minimalist dining room with a wooden table, chairs, and a chandelier, neutral colors, and artwork on the walls"
          - "a home office with a desk, a chair, and bookshelves, a window with a view, and a laptop on the desk"
          - "a walk-in closet with shelves, drawers, and hanging space, a mirror, and a chandelier"
          - "a sunroom with wicker furniture, plants, and a ceiling fan, large windows, and a view of the garden"
          - "a basement with a home theater, a bar, and a pool table, dim lighting, and a popcorn machine"
          - "an attic with exposed beams, a skylight, and a reading nook, cozy lighting, and a bookshelf"
        neg: "" # not used on flux
        seed: 42
        walk_seed: true
        guidance_scale: 4
        sample_steps: 20
# you can add any additional meta info here. [name] is replaced with config name at top
meta:
  name: "[name]"
  version: "1.0"
```
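One practical note before launching a run: as the config comments mention, AI Toolkit expects every image to have a same-named `.txt` caption file (image2.jpg pairs with image2.txt). A quick sanity check like the sketch below, using the dataset folder from the config above, can save a wasted GPU session:

```python
# Sanity check: every training image should have a matching .txt caption file
# (AI Toolkit pairs image2.jpg with image2.txt, per the config comments above).
import os

folder = "/workspace/ai-toolkit/bedrooms_empty"
images = [f for f in os.listdir(folder) if f.lower().endswith((".jpg", ".jpeg", ".png"))]

missing = [
    f for f in images
    if not os.path.exists(os.path.join(folder, os.path.splitext(f)[0] + ".txt"))
]

if missing:
    print(f"{len(missing)} images are missing captions:")
    for name in missing:
        print(f"  {name}")
else:
    print(f"All {len(images)} images have captions - ready to train.")
```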
- Training Script: And here's what my training script looks like so far (this is still a work in progress):
```python
#!/usr/bin/env python3
"""
LoRA Training Script for AI Image Generation

This script processes images and generates captions using OpenAI's GPT-4 Vision API,
then runs LoRA training for AI image generation models.

Key features:
- Parallel image processing using Ray
- Caption generation with GPT-4 Vision
- Integration with AI-toolkit for LoRA training
"""

import os
import base64
from typing import List, Tuple

from openai import OpenAI
import ray

# Initialize Ray for parallel processing
ray.shutdown()
ray.init(num_cpus=6)  # Adjust based on available CPU cores


@ray.remote
class ImageProcessor:
    """Handles image processing and caption generation using OpenAI's GPT-4 Vision."""

    def __init__(self, api_key):
        self.client = OpenAI(api_key=api_key)

    def encode_image(self, image_path: str) -> str:
        """Convert image to base64 encoding for API submission."""
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")

    def generate_caption(self, image_path: str) -> Tuple[str, str]:
        """
        Generate a detailed caption for an image using GPT-4 Vision.

        Returns tuple of (image_path, caption).
        """
        prompt = """
        Create a caption of this image (the materials and finishes) using a set of
        5-15 one-to-two-word descriptors that can be fed into a diffusion image
        generation model, starting with 'empty bedroom'.
        """
        try:
            base64_image = self.encode_image(image_path)
            response = self.client.chat.completions.create(
                model="gpt-4-vision-preview",
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": prompt},
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                },
                            },
                        ],
                    }
                ],
            )
            return image_path, response.choices[0].message.content
        except Exception as e:
            print(f"Error generating caption for {image_path}: {e}")
            return image_path, None


def get_image_files(folder_path: str) -> List[str]:
    """Return list of supported image file paths in the given folder."""
    supported_formats = (".jpg", ".jpeg", ".png", ".bmp", ".tiff")
    return [
        os.path.join(folder_path, f)
        for f in os.listdir(folder_path)
        if f.lower().endswith(supported_formats)
    ]


def save_caption(result: Tuple[str, str]) -> None:
    """Save generated caption to a text file alongside the image."""
    image_path, caption = result
    if caption:
        txt_path = os.path.splitext(image_path)[0] + ".txt"
        with open(txt_path, "w") as txt_file:
            txt_file.write(caption)
        print(f"Caption saved for {os.path.basename(image_path)}.")
    else:
        print(f"Failed to generate caption for {os.path.basename(image_path)}.")


def process_images_parallel(folder_path: str, api_key: str, num_workers: int = 8) -> None:
    """
    Process images in parallel using Ray.

    Args:
        folder_path: Path to the folder containing images
        api_key: OpenAI API key
        num_workers: Number of parallel workers to use
    """
    # Get list of image files, skipping those that already have captions
    image_files = [
        f for f in get_image_files(folder_path)
        if not os.path.exists(os.path.splitext(f)[0] + ".txt")
    ]

    if not image_files:
        print("No new images to process.")
        return

    # Create processor actors
    processors = [ImageProcessor.remote(api_key) for _ in range(num_workers)]

    # Distribute work round-robin among processors
    futures = [
        processors[i % num_workers].generate_caption.remote(image_path)
        for i, image_path in enumerate(image_files)
    ]

    # Collect results and write caption files
    for result in ray.get(futures):
        save_caption(result)

    print(f"Processed {len(image_files)} images using {num_workers} workers.")


def main():
    """Main execution function."""
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("Please set OPENAI_API_KEY environment variable")

    folder_path = "/workspace/ai-toolkit/bedrooms_empty"

    # Generate captions for the training images
    process_images_parallel(folder_path, api_key)

    # Run LoRA training using AI-toolkit
    os.system("python3 run.py config/train_lora*")


if __name__ == "__main__":
    main()
```
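Since `push_to_hub` is disabled in my config, I push the trained weights to Hugging Face separately so fal.ai can reference them at inference time. Here's a minimal sketch using the `huggingface_hub` client; the repo ID is a placeholder, and the checkpoint path assumes AI Toolkit writes the final weights under `training_folder/name`:

```python
# Minimal sketch: upload trained LoRA weights to Hugging Face.
# The repo ID is a placeholder, and the file path is an assumption based on
# the config above (training_folder: "output", name: "mh_flux_lora_v7").
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])

# Create the repo if it doesn't exist yet (private, since these are brand assets)
api.create_repo(repo_id="your-username/mh-flux-lora-v7", private=True, exist_ok=True)

# Upload the final safetensors checkpoint produced by the training run
api.upload_file(
    path_or_fileobj="output/mh_flux_lora_v7/mh_flux_lora_v7.safetensors",
    path_in_repo="mh_flux_lora_v7.safetensors",
    repo_id="your-username/mh-flux-lora-v7",
)
```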
## Next Steps: Integration with Sitecore Content Hub

The integration of generated images back into Content Hub deserves its own detailed discussion. The process involves automated asset ingestion, metadata enhancement, and maintaining proper versioning - topics I plan to cover in a future post.

## Looking Forward: Generative AI and Virtual Staging

This project represents an early exploration of combining enterprise DAM capabilities with generative AI. One particularly promising direction is virtual staging, where we can transform empty interior photos into fully staged spaces ready for marketing materials. The possibilities for streamlining real estate visualization while maintaining brand consistency are quite exciting.

## Conclusion

The intersection of Content Hub's enterprise-grade asset management and modern AI capabilities offers intriguing possibilities for real estate visualization. While the technology continues to evolve, our experiments with Flux and LoRA models demonstrate the potential for bridging the gap between architectural concepts and photorealistic presentations. The key to success lies in leveraging high-quality training data and careful model fine-tuning - areas where Content Hub's structured approach to digital asset management proves invaluable.

As we continue to explore these technologies, the focus remains on practical applications that deliver real value to our marketing and sales processes. The ability to generate consistent, high-quality property visualizations represents just the beginning of what is possible with this combination of enterprise DAM and generative AI.

## Useful Links

- [Flux 1](https://blackforestlabs.ai)
- [Stable Diffusion](https://stability.ai/stable-diffusion)
- [AI Toolkit](https://github.com/ostris/ai-toolkit)
- [Hugging Face](https://huggingface.co/)
- [RunPod.io](https://www.runpod.io/)
- [Sitecore Content Hub](https://www.sitecore.com/products/content-hub)
- [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685)
- [fal.ai](https://fal.ai/)