Using Python and OpenAI to generate and resize images (#dev #python #openai #ai #aiart)


Overview

Since everyone and their dog is using OpenAI (either the API or ChatGPT) to create content, I decided to join the party and write a tutorial showing a simple way to use the OpenAI API (not ChatGPT) to automate the process of generating images from a prompt and then creating variants of that image in different sizes.

Whether you’re looking to experiment with AI-generated art or automate image resizing tasks, this post will help you get started.

Quick note before we start

I’m not responsible for anything you do. If you create something that is offensive, illegal, or harmful, that’s on you. If you accidentally post your API key to GitHub or somewhere and then receive a bill for thousands of dollars, that’s also on you. Be careful with your API key and use it responsibly.

Getting started with OpenAI

As you may imagine, OpenAI is not free, so you’ll need to create an account there and get an API key. From their site:

First, create an OpenAI account or sign in. Next, navigate to the API key page and “Create new secret key”, optionally naming the key. Make sure to save this somewhere safe and do not share it with anyone.

I also highly recommend that you access the Usage: Cost page and set a budget limit for your account. This way you won’t get a surprise bill at the end of the month.

For my tinkering needs, a budget limit of $20 (USD) is more than enough. The code in this script ate $0.08 of my monthly budget.

What we’ll be doing

The script we’ll be creating will call the OpenAI API to generate an image based on a prompt and a model configuration, and then create versions of that image in different sizes. The resizing is done locally, so you won’t be charged for it.

Installing the required libraries

For this, we’ll need the following libraries:

  • OpenAI’s Python library
  • Pillow (PIL)
  • requests
  • io
  • types
  • pathlib
  • pprint

Luckily, io, types, pathlib, and pprint are part of Python’s standard library, so you’ll only need to install the following:

pip install --upgrade openai
pip install --upgrade Pillow
pip install --upgrade requests

Basic parameters

This might change in the future, but for now there are two models we can use for this: dall-e-2 and dall-e-3. So, if we want more flexibility, we should create a way to define the allowed values and parameters for each of those models.

In this case we’ll use a dictionary to store the parameters for each model.

Explaining what each parameter means:

  • max_prompt_size: The maximum number of characters allowed for the prompt.
  • n: The number of images to generate.
  • quality: The quality of the generated image.
  • response_format: The format of the response.
  • size: The size of the generated image.
  • style: The style of the generated image.

After reading the documentation, for model dall-e-2 we have:

{
    "max_prompt_size": 1000,
    "n": {
        "min": 1,
        "max": 10
    },
    "quality": None,  # Not supported.
    "response_format": [
        "url",  # Only available for 60 minutes after the request
        "b64_json"
    ],
    "size": [
        "256x256",   # Square
        "512x512",   # Bigger square
        "1024x1024"  # Even Bigger square
    ],
    "style": None  # Not supported.
}

And for model dall-e-3 we have:

{
    "max_prompt_size": 4000,
    "n": {
        "min": 1,
        "max": 1
    },
    "quality": ["standard", "hd"],
    "response_format": [
        "url",  # Only available for 60 minutes after the request
        "b64_json"
    ],
    "size": [
        "1024x1792",  # Portrait
        "1792x1024",  # Landscape
        "1024x1024"   # Square
    ],
    "style": ["vivid", "natural"]
}
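
The later snippets refer to a params dictionary and a default_model variable, so the two definitions above need to be combined into something like this (a minimal sketch; the names match what the rest of the post uses):

params = {
    "dall-e-2": {
        # ... the dall-e-2 dictionary shown above ...
    },
    "dall-e-3": {
        # ... the dall-e-3 dictionary shown above ...
    }
}

default_model = "dall-e-3"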

Feel free to change those later, but I’ll use dall-e-3 as the default model for this script and the following config:

  • n: 1
  • quality: standard
  • response_format: url
  • size: 1024x1792
  • style: vivid

Since we already have the allowed configuration for each model, I’ll create the default config like this:

from types import MappingProxyType

default_config = MappingProxyType({  # Immutable dictionary
    "n": params[default_model]["n"]["min"],
    "quality": params[default_model]["quality"][0],
    "response_format": params[default_model]["response_format"][0],
    "size": params[default_model]["size"][0],
    "style": params[default_model]["style"][0]
})

The reason I’m using MappingProxyType is to make the dictionary immutable, so we can’t change the values later and we can use it as the default value of an argument without the linter yelling at us.
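
If you want to see that immutability in action, here’s a throwaway example (not part of the script):

from types import MappingProxyType

frozen = MappingProxyType({"quality": "standard"})
print(frozen["quality"])   # Reading works as usual: "standard"
frozen["quality"] = "hd"   # Raises TypeError: 'mappingproxy' object does not support item assignment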

Generating the image

Now that we have the default configuration, we can create a function that will call the OpenAI API to generate the image. This function will return the URL of the generated image, and will receive the following arguments:

  • prompt - str;
  • model - str (default: dall-e-3);
  • config - dict (default: default_config).
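
To put the snippets below in context, this is the signature I’m assuming (a sketch; default_model and default_config are the values defined above):

def generate_image(prompt: str, model: str = default_model, config=default_config) -> str:
    ...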

Since we’re receiving arguments, we need to make sure that they are valid.

Validating model name

To validate the model, we’re checking if we have a model name and if it’s valid, and we can do this with this function:

def _ensure_model_name_is_valid(model: str):
    if model is None:
        raise ValueError("Model name cannot be None.")

    if model in params:
        return

    raise ValueError(f"Model name must be one of the following: {', '.join(params.keys())}")

Validating prompt

This is even simpler to check. We just need to make sure we have a prompt and that it’s within the allowed size. If this were production-level code, we could also check whether the prompt is malicious, etc., but for now life is simple.

def _ensure_prompt_is_valid(prompt: str, model: str):
    max_prompt_size = params[model]["max_prompt_size"]

    if prompt is None:
        raise ValueError("Prompt cannot be None.")

    if len(prompt) <= max_prompt_size:
        return

    raise ValueError(f"Prompt must be less than or equal to {max_prompt_size} characters.")

Validating config

This is a bit more complex, since we need to check if the values are valid for the model we’re using. I’m not going to worry too much about optimization here, so I’ll just check every config value against the model definitions.

Note: I could create a function to sanitize the config values, but this post is already big enough.

def _ensure_config_is_valid(config: dict, model: str):
    for key, value in config.items():
        # Ignore keys that aren't part of the model definition
        if key not in params[model]:
            continue

        if value is None:
            raise ValueError(f"Value for key '{key}' cannot be None.")

        if key == "n":
            if value < params[model][key]["min"] or value > params[model][key]["max"]:
                raise ValueError(f"Value for key '{key}' must be between {params[model][key]['min']} and {params[model][key]['max']}.")

        if key in ("quality", "response_format", "size", "style"):
            if value not in params[model][key]:
                raise ValueError(f"Value for key '{key}' must be one of the following: {', '.join(params[model][key])}")

Calling the API

So now that we have the validation functions, let’s move on to the function that will actually call the OpenAI API.

The first thing we do is call the validation functions we just created:

_ensure_model_name_is_valid(model)
_ensure_prompt_is_valid(prompt, model)
_ensure_config_is_valid(config, model)

Then we create the OpenAI client:

import openai

client = openai.OpenAI(api_key="ab-proj-1234567890")
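
That key is obviously a placeholder. Following the warning at the start of the post, a safer option is to read the key from an environment variable instead of hardcoding it, for example:

import os
import openai

# Expects the key in the OPENAI_API_KEY environment variable
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])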

And then we make the request to generate the image:

response = client.images.generate(
    prompt=prompt,
    model=model,
    **config
)

In this code, we explicitly pass the prompt and the model. For the config, I’m using dictionary unpacking to pass its values as keyword arguments (similar to the spread operator in TypeScript…)

This is a blocking call and will raise an exception if something goes wrong, so we can just return the URL of the generated image:

return response.data[0].url

And that’s it! So far, we have a function that will generate an image based on a prompt and return the URL of the generated image in a flexible enough way that we can easily change the parameters of the request. If that’s all you wanted, you can stop here and use this function in your projects.

Downloading the image

Now that we have the URL of the generated image, we can download it and save it to a file. Since we’ll be generating variations of this image, we’ll add the word “original” to its filename.

This function will receive two arguments:

  • url - str;
  • filename without the extension - str.

I’m going to trust the process and won’t validate these inputs here, but you could.
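
For reference, this is the signature I’m assuming (the names match the snippets below):

def download_image(image_url: str, filename_without_ext: str) -> dict:
    ...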

First thing we do is to make a request to the URL and get the image data:

import requests
response = requests.get(image_url)

To make sure nothing went wrong, and since we’re already raising errors in case of failure, we can do this:

response.raise_for_status()

If no exception was raised, we can safely load the image to a Pillow object:

from io import BytesIO
from pathlib import Path  # Used for the "path" entry in the return value below
from PIL import Image

original_image = Image.open(BytesIO(response.content))

Now let’s adjust the filename:

original_filename = f"{filename_without_ext}_original.jpg"

And save the image:

original_image.save(original_filename)

Since we’re generating variations of this image, let’s gather some details and return them:

# Get the image width and height
original_width, original_height = original_image.size

# Calculate the ratio
ratio = original_height / original_width

# Return the image details
return {
    "filename_without_ext": filename_without_ext,
    "original_filename": original_filename,
    "image_object": original_image,
    "path": Path(original_filename),
    "width": original_width,
    "height": original_height,
    "ratio": ratio
}

Last but not least: Creating variations of the image

Now that we have the original image, we can create variations of it with different sizes. This last function will receive the following arguments:

  • image details - dict;
  • target_widths - list (default: [256, 512, 1024]);
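
For reference, here’s the signature I’m assuming for this function (I’m using a tuple as the default so the linter doesn’t complain about a mutable default argument):

def create_variants(image_details: dict, target_widths=(256, 512, 1024)) -> dict:
    ...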

Since we’re creating variations of the image, let’s save some details of each image, starting with the original one:

images = {
    "original": {
        "filename": image_details["original_filename"],
        "path": image_details["path"],
        "width": image_details["width"],
        "height": image_details["height"]
    }
}

We’re going to loop through the target widths and create the variations of the image (the assembled loop is shown after these steps). On each iteration, we will:

Define the height of the image, based on the width:

height = int(width * image_details["ratio"])

For convenience sake, we’ll create a variable with the size of the image:

size = f"{width}x{height}"

Since we have the image object, we can simply resize it multiple times:

resized_image = image_details["image_object"].resize((width, height))

Now we define a filename for this image:

filename = f"{image_details['filename_without_ext']}_{size}.jpg"

Then we save the file:

resized_image.save(filename)

And lastly, we save the details of this image:

from pathlib import Path

images[size] = {
    "filename": filename,
    "path": Path(filename),
    "width": width,
    "height": height
}

After the for loop, we return the images:

return images
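
For reference, here’s the whole loop assembled from the steps above (a sketch, assuming the images dictionary and the imports already shown):

for width in target_widths:
    # Height that keeps the original aspect ratio
    height = int(width * image_details["ratio"])
    size = f"{width}x{height}"

    # Resize the original Pillow image and save it under a size-specific name
    resized_image = image_details["image_object"].resize((width, height))
    filename = f"{image_details['filename_without_ext']}_{size}.jpg"
    resized_image.save(filename)

    # Store the details of this variant
    images[size] = {
        "filename": filename,
        "path": Path(filename),
        "width": width,
        "height": height
    }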

Orchestrating everything

Now that we have all the functions we need, we can write the part of the script that calls them in the correct order and generates the images.

from pprint import pprint

if __name__ == '__main__':
    # Prompt for the image
    img_prompt = "A cute corgi dog in a space suit, floating in space, and trying to reach a tasty treat."

    print("Generating image...")
    generated_image_url = generate_image(img_prompt)

    print("Downloading generated image...")
    original_image_details = download_image(generated_image_url, "doggo_in_space")

    # Define the desired target widths
    desired_target_widths = [780, 500, 342, 185, 154, 92]

    print("Creating image variants...")
    generated_images_details = create_variants(
        image_details=original_image_details,
        target_widths=desired_target_widths
    )

    print("Image processing completed.")
    print("Generated image details:")
    pprint(generated_images_details)

You could also ask the user for the prompt, or change some configurations, etc.
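
For example, a quick interactive version could read the prompt from the user instead of hardcoding it:

img_prompt = input("Describe the image you want to generate: ")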

Example usage

In the script, I used the following prompt: A cute corgi dog in a space suit, floating in space, and trying to reach a tasty treat.

And that was the result (which cost me US$0.08) 🐶:

[Generated image: “A cute corgi dog in a space suit, floating in space, and trying to reach a tasty treat.”]

If you want the full script, you can get a complete working version (including the resulting images, sans the API key) here.

Hope that helps. 🙂
