How to generate images with ChatGPT
robort - 2023-06-06 09:30:28
ChatGPT is a tool that sometimes seems miraculous, capable of producing great output from very simple prompts. We have already talked about the risks of ChatGPT with the help of the Panda Security experts, who offered some advice on how to recognize the main dangers of the AI built by OpenAI.
However, today the time has come to answer another question that many people have asked themselves when experimenting with this tool. Here's how to generate images with ChatGPT.
Is it possible to generate images with ChatGPT?
First of all, we need to ask a more basic question: can ChatGPT create images by itself? The answer is no, and we'll explain why.
ChatGPT is a language model: given an input from a user, it produces text similar to what a human being would write. It therefore cannot provide visual output in the way that tools such as DALL-E, the image generator introduced by OpenAI in 2021, can.
However, it is possible to take advantage of ChatGPT's text output to create an image with an image generator such as DALL-E or, alternatively, Midjourney. Let's see how it's done.
How to create an image with Midjourney
Since there is no direct integration between ChatGPT and DALL-E, which is made by the same company, or between ChatGPT and Midjourney, you have to proceed manually. First of all, we need an idea: a simple request written in our own words is enough for ChatGPT to give us a base to work from.
For example, if you want to make a logo for a local soccer club, you could use a prompt like this: “Create a text prompt for Midjourney to create a soccer club logo”. It is better to write the request in English so that the output can be passed from one AI to the other without translation.
Once you have the output, in the specific case of Midjourney you need to join the Discord server used for generating the images, which is completely free to access.
At this point, just go to the appropriate channel and complete the generation of the desired image by giving the ChatGPT prompt to the AI. In a few seconds at best, or a few minutes when traffic is heavier, Midjourney will provide you with four outputs.
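The text-handling part of these steps can be sketched in a few lines of Python. This is only an illustration: the helper names `build_meta_prompt` and `clean_for_pasting` are hypothetical, and the sketch does not call ChatGPT or Midjourney. It only prepares the request you type into ChatGPT and tidies the answer before you paste it into Discord.

```python
def build_meta_prompt(subject: str, generator: str = "Midjourney") -> str:
    """Build the request to type into ChatGPT (hypothetical helper)."""
    return f"Create a text prompt for {generator} to create {subject}"


def clean_for_pasting(chatgpt_output: str) -> str:
    """Collapse line breaks and repeated spaces so the prompt pastes as one line."""
    return " ".join(chatgpt_output.split())


# Step 1: the request for ChatGPT, matching the example in the article.
request = build_meta_prompt("a soccer club logo")
print(request)

# Step 2: ChatGPT's answer is copied by hand; the string here is a placeholder,
# not real model output.
answer = "  A bold crest for a local soccer club,\n  flat vector style  "
print(clean_for_pasting(answer))
```

The cleaned single-line text is what you would then paste into the Midjourney channel by hand.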
Does the procedure also apply to DALL-E?
Naturally, the same procedure is also valid for DALL-E and other image generators: first, you ask ChatGPT for a more detailed text based on your needs, and then you copy and paste the output of the language model into another AI designed to produce drawings, photographs, and other visual work. Each generator will respond in its own way, offering distinctive outputs that are not necessarily correct or perfect.
Don't expect miracles from the AIs, mind you: their limits will force you to weigh your words carefully. You can either keep the input simple and essential to give the AI more freedom, or go into finer detail, without exaggerating, following the guidelines that Midjourney and similar services provide so that they can work at their best.
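The two-step procedure described above can be written as a small pipeline that does not depend on any particular generator. This is a minimal sketch under stated assumptions: `describe_then_draw` is a hypothetical name, and `fake_text_model` and `fake_image_model` are stand-ins written for this example, not real ChatGPT, DALL-E, or Midjourney calls. In practice each would be replaced by a manual copy and paste or by whatever client you actually use.

```python
from typing import Callable, List


def describe_then_draw(
    need: str,
    ask_text_model: Callable[[str], str],
    ask_image_model: Callable[[str], List[str]],
    generator: str = "DALL-E",
) -> List[str]:
    """Ask a text model for a detailed prompt, then pass it to an image model."""
    request = f"Create a text prompt for {generator} to create {need}"
    detailed_prompt = ask_text_model(request).strip()
    return ask_image_model(detailed_prompt)


# Stand-ins for illustration only; they do not contact any service.
def fake_text_model(request: str) -> str:
    return f"[detailed prompt written for: {request}]"


def fake_image_model(prompt: str) -> List[str]:
    # Returns four labelled placeholders, mirroring the four outputs
    # the article says Midjourney provides.
    return [f"image_{i} <- {prompt}" for i in range(1, 5)]


results = describe_then_draw("a soccer club logo", fake_text_model, fake_image_model)
for line in results:
    print(line)
```

Passing the two models in as functions keeps the sketch honest about what it knows: only the order of the steps comes from the article, while how each AI is reached is left open.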
With GPT-4 everything will change
While the current version of ChatGPT is limited to text generation, the next version, built on the new GPT-4 language model, is expected to be multimodal. What does that mean? GPT-4 could be able to handle audiovisual inputs and outputs, which would make it capable of generating images. In this way, even the most demanding users inspired by the latest generation of AI tools will finally be able to have fun, experiment, and be satisfied with the final results.
When will GPT-4 arrive? At the moment we still don't have a launch date, let alone a timeframe for a stable release.
We do know, however, that Microsoft will show GPT-4 next week at the "Focus on AI - Digital Kickoff" event organized by the German division of the US company.
Perhaps on this occasion we will be able to see GPT-4 working with a modified version of ChatGPT for Bing.
In short, the Redmond company may have beaten all its rivals by securing GPT-4 first, thanks to its partnership agreement with OpenAI.
What will this tool be capable of? It's hard to predict. The incredible potential of ChatGPT, which we recall is based on GPT-3.5, has shown what awaits us in the future: AI tools ready to offer almost immediate results even in complex contexts, producing code and other finely crafted text, albeit often imperfect.
Midjourney, on the other hand, showed what the future of art could be.
The ethics of these tools are still uncertain and remain a subject of discussion among lawyers, artists, programmers, and technology giants, but the direction the developers are taking already seems quite clear: to produce a language model that is as complete and capable as possible, preparing for a future that allows anyone to express themselves through technology.
So we just have to be patient, waiting for the evolution of ChatGPT, or rather of the “Generative Pre-trained Transformer” on which it is based, and finally try the new artificial intelligence first-hand, which has all the credentials to overcome any limit we imagined in the past. Will it really be able to do it? We'll see.