HorleyTech

Meet CM3leon, Meta’s New AI Image Generation Model For Greater Efficiency

AI image generation tools are no longer news at this point, with popular options like Stable Diffusion, DALL·E and Midjourney widely available. However, Meta is pushing forward and seeking to break boundaries with its research into new forms of generative AI models.

And, just recently, it revealed its latest effort known as CM3leon (pronounced “chameleon”).

CM3leon is a multimodal foundation model that handles both text-to-image generation and image-to-text tasks, such as automatically generating captions for images. What sets it apart from existing AI image generation tools are the techniques Meta used to build CM3leon and the performance that Meta claims the foundation model can achieve.

Where other text-to-image generation technologies rely largely on the use of diffusion models to create an image (hence the origin of the name Stable Diffusion), CM3leon uses something different: a token-based autoregressive model.

“Diffusion models have recently dominated image generation work due to their strong performance and relatively modest computational cost,” Meta wrote in a research paper titled Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning. “In contrast, token-based autoregressive models are known to also produce strong results, with even better global image coherence in particular, but are much more expensive to train and use for inference.”

In essence, Meta researchers used CM3leon to demonstrate that a token-based autoregressive model can, in fact, be more efficient than a diffusion-based approach.
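To make the contrast concrete, here is a toy illustration of token-based autoregressive generation, not Meta's actual model: the image is represented as a sequence of discrete tokens, and each token is predicted one step at a time, conditioned on everything generated so far. The "model" below is a hypothetical hand-written score table standing in for a trained transformer.

```python
# Toy autoregressive decoding sketch (illustrative only; the token
# vocabulary and scores are invented, not from CM3leon).

NEXT_TOKEN_SCORES = {
    # current token -> scores for each candidate next token (made up)
    "<start>": {"sky": 0.7, "grass": 0.3},
    "sky": {"sun": 0.6, "cloud": 0.4},
    "sun": {"grass": 0.8, "cloud": 0.2},
    "cloud": {"grass": 0.9, "sun": 0.1},
    "grass": {"<end>": 1.0},
}

def generate(max_tokens=8):
    """Greedily decode a token sequence, one token at a time."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        scores = NEXT_TOKEN_SCORES[tokens[-1]]
        # Autoregressive step: the next token depends on the prefix
        # (here, just the previous token, for simplicity).
        best = max(scores, key=scores.get)
        tokens.append(best)
        if best == "<end>":
            break
    return tokens[1:-1]  # strip the start/end markers

print(generate())  # ['sky', 'sun', 'grass']
```

A diffusion model, by contrast, would refine the whole image at once over many denoising steps rather than emitting one token at a time; the sequential, whole-prefix conditioning above is what the paper credits for stronger global image coherence.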

“CM3leon achieves state-of-the-art performance for text-to-image generation, despite being trained with five times less compute than previous transformer-based methods,” Meta wrote in a blog post.

CM3leon and Its Ethical Approach To Image Generation

While the fundamentals of how CM3leon works are still somewhat similar to existing text generation models, Meta researchers started with a retrieval-augmented pre-training stage. So, instead of just scraping publicly available images off the internet – a method that has led to some legal challenges for diffusion-based models – Meta has taken a different path.

“The ethical implications of image data sourcing in the domain of text-to-image generation have been a topic of considerable debate,” the Meta research paper states. “In this study, we use only licensed images from Shutterstock. As a result, we can avoid concerns related to image ownership and attribution, without sacrificing performance.”
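The retrieval idea can be sketched in a few lines. This is a hypothetical, minimal version of the concept, not Meta's pipeline: for each training example, similar licensed caption/image pairs are retrieved from a fixed database and supplied to the model as additional context. The database entries and the word-overlap scoring here are invented for illustration.

```python
# Minimal retrieval-augmented pre-training sketch (hypothetical data
# and scoring; the real system uses learned dense retrievers).

LICENSED_DB = [
    {"caption": "a red fox in the snow", "image_id": "img_001"},
    {"caption": "a dog running on a beach", "image_id": "img_002"},
    {"caption": "a fox sleeping in a forest", "image_id": "img_003"},
]

def word_overlap(a, b):
    """Crude similarity: count of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve(query, k=2):
    """Return the k entries whose captions overlap the query most."""
    ranked = sorted(
        LICENSED_DB,
        key=lambda e: word_overlap(query, e["caption"]),
        reverse=True,
    )
    return ranked[:k]

# Retrieved pairs would be prepended to the model's training context.
context = retrieve("a fox in a forest")
print([e["image_id"] for e in context])  # ['img_003', 'img_001']
```

Because retrieval draws only from a curated, licensed pool, the model's training context stays within data whose ownership and attribution are settled, which is the point the paper makes.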

After the pre-training, the CM3leon model undergoes a supervised fine-tuning (SFT) stage. Meta researchers claim this produces highly optimized results regarding both resource utilization and image quality. Meta notes in its research paper that it implements SFT, the same approach that OpenAI uses to help train ChatGPT, to train the model to understand complex prompts. This is useful for generative tasks.

“We have found that instruction tuning notably amplifies multi-modal model performance across various tasks such as image caption generation, visual question answering, text-based editing, and conditional image generation,” the paper states.
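One way to picture instruction tuning across these tasks is as flattening them all into a single prompt-to-target format before fine-tuning. The sketch below is an assumption about the general shape of such data, with invented examples; it is not Meta's actual schema.

```python
# Hypothetical instruction-tuning data sketch: several multimodal
# tasks share one (prompt, target) form for supervised fine-tuning.

RAW_EXAMPLES = [
    {"task": "captioning", "instruction": "Describe the image.",
     "input": "<image tokens>", "target": "a cat on a sofa"},
    {"task": "vqa", "instruction": "How many animals are shown?",
     "input": "<image tokens>", "target": "one"},
    {"task": "generation", "instruction": "Generate an image of a sunset.",
     "input": "", "target": "<image tokens>"},
]

def to_training_pair(ex):
    """Flatten an example into the single prompt -> target SFT form."""
    prompt = f"[{ex['task']}] {ex['instruction']} {ex['input']}".strip()
    return (prompt, ex["target"])

pairs = [to_training_pair(ex) for ex in RAW_EXAMPLES]
for prompt, target in pairs:
    print(prompt, "->", target)
```

Training on one mixed pool like this is what lets a single model follow captioning, question-answering, editing, and generation instructions without task-specific heads.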

The sample sets of generated images that Meta has shared in its blog post about CM3leon are impressive. They clearly show the model’s ability to understand complex, multi-stage prompts, generating extremely high-resolution images as a result.

[Image: sample CM3leon generations. Credit: Meta AI]

Currently, CM3leon is a research effort. Hence, it’s not yet clear when or even if Meta will make this technology publicly available in a service on one of its platforms. However, given how powerful the tool seems to be, as well as the higher efficiency of generation, it does seem very likely that CM3leon and its approach to generative AI will eventually move beyond research.
