DALL·E 2

PULSE

By V La Harivardhan, 2nd Year

DALL·E 2 is a deep learning model (AI system) developed by OpenAI. In essence, deep learning is a type of machine learning in which a computer system learns to perform tasks directly from input data (text, images, sound, etc.). This article gives a brief overview of DALL·E 2.


DALL·E 2 is an AI model that generates high-resolution images and art from natural-language text descriptions. It is the successor of DALL·E, which generated unique images from text. DALL·E was launched in January 2021 and used a version of GPT-3 (the predecessor of ChatGPT) to generate images. DALL·E 2 came into general use in September 2022, and it is capable of generating more realistic images with higher resolution and greater detail. The name DALL·E is a blend of the Spanish artist Salvador Dalí and Pixar's animated robot character WALL-E.

DALL·E 2 offers significant improvements over the older DALL·E, which took much longer to generate images and often produced grainy outputs. In November 2022, OpenAI released DALL·E 2 as an API (Application Programming Interface). An API is a software interface that provides a way for two or more computer programs to communicate with each other.

Microsoft integrated DALL·E 2 into its Designer app and into the image-creation tool in Bing and Microsoft Edge. The API is priced per image, and volume discounts are available to companies, given the large number of images involved in commercial deployments.

The model now accepts text prompts of up to 400 characters and can create images in multiple styles: for example, simply adding "in pencil drawing" to a description yields pencil art. It can also rearrange and manipulate objects in an image, and it can produce images for a wide range of arbitrary descriptions. For each input, DALL·E 2 returns four images, which can be downloaded as .png files or shared as URLs. Instead of text, it can also take an image as input and generate variations or edits of it.

Variations and edits return a maximum of three images. Another option, "outpainting", allows users to extend an image beyond its current boundaries.
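The request parameters described above can be sketched in code. The parameter names below (prompt, n, size, response_format) follow OpenAI's public Images API, but the helper function and its validation logic are illustrative assumptions, not a full client:

```python
import json

MAX_PROMPT_CHARS = 400  # DALL·E 2's prompt limit, as described above


def build_image_request(prompt: str, n: int = 4, size: str = "1024x1024",
                        response_format: str = "url") -> dict:
    """Build a JSON payload for an image-generation request.

    Parameter names mirror OpenAI's public Images API; this is a
    hypothetical sketch of the request body, not a complete client.
    """
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return {
        "prompt": prompt,
        "n": n,                              # DALL·E 2 returns four images per prompt
        "size": size,
        "response_format": response_format,  # "url" (shareable link) or "b64_json" (image data)
    }


payload = build_image_request("A raccoon astronaut, in pencil drawing")
print(json.dumps(payload, indent=2))
```

Requesting `response_format="url"` corresponds to sharing the result as a URL, while base64-encoded image data can be saved locally as a .png file.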

DALL·E 2 combines two techniques to generate an image from text: diffusion models and CLIP (Contrastive Language-Image Pretraining), also developed by OpenAI. CLIP is a multimodal model that links English-language concepts with knowledge of images. For all its capabilities, DALL·E 2 has certain limitations: it cannot reliably distinguish "a red flower in a yellow vase" from "a yellow flower in a red vase", and prompts that specify exact numbers of objects also tend to produce errors.
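The contrastive idea behind CLIP can be illustrated with a toy sketch: text and image embeddings live in a shared vector space, and the best caption for an image is the one whose embedding points in the most similar direction (highest cosine similarity). The hand-written three-dimensional vectors below merely stand in for real CLIP embeddings, which have hundreds of dimensions:

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)


# Toy stand-ins for CLIP embeddings (purely illustrative values).
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a red flower in a yellow vase": [0.8, 0.2, 0.1],
    "a yellow flower in a red vase": [0.7, 0.3, 0.3],
    "a raccoon astronaut":           [0.1, 0.9, 0.4],
}

# CLIP-style matching: pick the caption whose embedding is closest in angle.
best = max(caption_embeddings,
           key=lambda c: cosine(image_embedding, caption_embeddings[c]))
print(best)  # "a red flower in a yellow vase"
```

Note how the two flower captions score very close to each other while the raccoon caption scores far lower; this closeness is one way to picture why swapped attributes ("red flower, yellow vase" vs. "yellow flower, red vase") are hard for the model to tell apart.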

We live in an era where progress is increasingly driven by AI systems. In the near future, much human activity will be automated, and AI systems will play a huge role in that shift. Small AI systems like these are the roots from which fully functional AI systems will emerge.

Prompts given to DALL·E 2 (From top left, clockwise)

1) A raccoon astronaut with the cosmos reflecting on the glass of his helmet dreaming of the stars.

2) A bowl of soup that is a portal to another dimension as digital art

3) Space suit of an astronaut in a retro colour mixed theme

4) Geometric rain, like debris and digital combined, raining down on a dystopian world, with a crow in the center.
