By Mattias Lundell

Extracting recipes from images using gpt-4o in Opper

Multimodal models are a class of models that can process and generate data across several modalities, such as text, images, and audio. They are particularly useful when the data you work with is not purely text. In this blog post, we will explore how to use multimodal models in Opper, a tool for working with structured data. Specifically, we will use OpenAI's newly released model gpt-4o to generate structured data from images.

Problem statement

I often take pictures of recipes from books, magazines, or even handwritten notes. However, manually transcribing the ingredients and steps from a picture afterwards is a bit of a pain. It would be much easier to have something else do it for me.

Let's see if we can use Opper and gpt-4o to turn this picture of a recipe into something structured that we can work with.

The image we will use is the following:

A picture of a recipe for dumplings, covering two pages, with ingredients and steps.

First attempt

Let's start by defining a function that takes an image as input and returns the description of the image.

from opperai import fn
from opperai.types import ImageContent

@fn(model="openai/gpt-4o")
def describe(image: ImageContent) -> str:
    """given an image return the description of that image"""

res = describe(ImageContent.from_path("dumplings.png"))

This code uses the gpt-4o model to generate a description of the image and returns it as a string:

The image features a recipe for 'Canice's Pork and Chive Dumplings' located on 
pages 60 and 61. It includes a list of ingredients and directions for preparing 
the dish. The ingredients list includes medium ground pork, chives, garlic, 
ginger, light soy sauce, Shaoxing rice cooking wine, Chinese chili oil, sesame 
oil, broth, cornstarch, sugar, white pepper, and salt. The directions are 
detailed in steps from finely mincing the ingredients to folding, freezing, and 
cooking the dumplings. Additionally, there are cooking tips for frying and 
boiling the dumplings, as well as serving suggestions. Page 61 features three 
images showing the process of assembling the dumplings.

This is pretty useful. Let's see if we can extract the ingredients and steps.

Extracting ingredients and steps

Let's start with defining the data model for the recipe. We use Pydantic to define the data model. It will have a title, a list of ingredients, and a list of instructions.

from pydantic import BaseModel
from typing import List

class Recipe(BaseModel):
    title: str
    ingredients: List[str]
    instructions: List[str]

Now we can define a function that takes an image as input and returns the structured data. It is important to use the special type ImageContent to tell Opper that the input is an image.

@fn(model="openai/gpt-4o")
def extract(image: ImageContent) -> Recipe:
    """given an image, extract the recipe"""

res = extract(ImageContent.from_path("dumplings.png"))

This gives us an object with the following structure:

Recipe(
    title='Canice’s Pork and Chive Dumplings',
    ingredients=[
        '1.5 lb medium ground pork (optional: sub in 0.5 lb of finely chopped shrimp)',
        '1 package wrappers (1 lb)',
        '1 bunch flowering chives or Chinese chives',
        '2-4 garlic cloves, to taste',
        '2 tbsp fresh ginger',
        '2 tbsp light soy sauce',
        '2 tbsp Shaoxing rice cooking wine',
        '1-2 tbsp Chinese chili oil',
        '2 tbsp sesame oil',
        'A few tablespoons of broth (optional)',
        '1 tbsp cornstarch',
        '1 tsp sugar',
        '1 tsp white pepper',
        '1 tsp salt'
    ],
    instructions=[
        '1. Finely mince or food process the chives, garlic and ginger.',
        "2. Add to bowl with pork, dump in all the seasoning (and broth if you have it). Stir, in only one direction, until smooth, even a little sticky. 'Beating' in the liquid incorporates it into the meat and makes it springy, instead of shrinking while cooking and leaving you with a saggy, empty bag of skin.",
        '3. Start folding: put about 1 tbsp filling in the centre of the wrapper, dip your finger in a bowl of warm water, wet the entire edge, fold in half and pleat from one edge to the other, pinching shut as you go. Pinch the entire edge again for good measure.',
        '4. If you’re freezing: set on a baking sheet with space around each dumpling. Freeze for an hour, bang the whole sheet on the counter until they come loose and put in a freezer bag. Keeps in freezer for a month or two.',
        'If cooking immediately:',
        "5. For pan-fried potstickers: swirl some oil into a hot pan, set the dumplings in evenly and shake the pan so they don't stick. Fry on medium-high heat till they have brown crispy bottoms. Add in a 1/4 cup water and cover. Steam until water evaporates, remove lid and fry till crispy again, adding a little more oil if needed. Always shake the pan to prevent sticking.",
        '6. For boiled dumplings: bring pot of water to rolling boil. Add dumplings.',
        '7. Stir frequently til it comes back to a boil. Keep cooking for another 3-4 minutes, add some Chinese greens in the last minute if you want some veggies with it. Drain the whole thing.',
        '8. Serve with a dipping sauce made of equal parts Chinkiang black vinegar, light soy sauce and Lao Gan Ma chili oil, and a few drops of sesame oil. Minced garlic and sesame seeds are also good additions.',
        '9. You might end up with leftover filling. If you do, it is excellent stir-fried with Shanghai noodles and sad fridge vegetables.'
    ]
)
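Note that in the output above the model kept the printed step numbers inside the strings, while one entry ("If cooking immediately:") has none. Since the list order already encodes the sequence, it can be handy to normalize this after extraction. The following is a minimal sketch using only the standard library; `strip_step_number` is a hypothetical helper name, not part of Opper.

```python
import re

# Matches a leading step number such as "1. " or "2) " at the start of a string.
_STEP_PREFIX = re.compile(r"^\s*\d+[.)]\s+")


def strip_step_number(step: str) -> str:
    """Remove a leading "N. " or "N) " prefix and leave all other text untouched."""
    return _STEP_PREFIX.sub("", step, count=1)


# A few instruction strings copied from the extracted recipe above.
steps = [
    "1. Finely mince or food process the chives, garlic and ginger.",
    "If cooking immediately:",
    "6. For boiled dumplings: bring pot of water to rolling boil. Add dumplings.",
]
cleaned = [strip_step_number(s) for s in steps]
```

Strings that merely start with a quantity, such as "1/4 cup water" or "3-4 minutes", are left alone because the digit is not followed by a period or parenthesis and a space.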

This is great, but maybe we can do better. Let's see if we can make the ingredients list a bit more detailed. Here we separate the amount, unit, and item of each ingredient. We also add an optional field for notes.

from typing import Optional

class Ingredient(BaseModel):
    item: str
    amount: float
    unit: str
    notes: Optional[str] = None

class Recipe(BaseModel):
    title: str
    ingredients: List[Ingredient]
    instructions: List[str]
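
To make the target shape concrete, here is a rough rule-based sketch of the same split for the simplest ingredient lines. It is only an illustration of the amount/unit/item structure, not how Opper or gpt-4o does it. The `UNITS` tuple and the `split_simple_ingredient` helper are assumptions made for this example.

```python
import re
from typing import Optional, Tuple

# Units that appear in the recipe above; an assumption for this sketch, not exhaustive.
UNITS = ("lb", "tbsp", "tsp", "bunch", "package")

# "<number> <unit> <rest of line>", e.g. "1 tsp sugar".
_LINE = re.compile(
    r"^(?P<amount>\d+(?:\.\d+)?)\s+(?P<unit>" + "|".join(UNITS) + r")\s+(?P<item>.+)$"
)


def split_simple_ingredient(line: str) -> Optional[Tuple[float, str, str]]:
    """Return (amount, unit, item) for lines like '1 tsp sugar', else None."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return float(match["amount"]), match["unit"], match["item"]


simple = split_simple_ingredient("1 tsp sugar")
ranged = split_simple_ingredient("2-4 garlic cloves, to taste")
vague = split_simple_ingredient("A few tablespoons of broth (optional)")
```

Lines with a range ("2-4 garlic cloves") or a vague quantity ("A few tablespoons") fall through and return `None`, which is exactly where letting the model fill the structured fields becomes useful.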

Calling the extract function again will now give us a more detailed ingredients list:

Recipe(
    title="Canice's Pork and Chive Dumplings",
    ingredients=[
        Ingredient(item='Medium ground pork', amount=1.5, unit='lb', notes='Optional: sub in 0.5 lb of finely chopped shrimp'),
        Ingredient(item='Wonton wrappers', amount=1.0, unit='package', notes='1 lb'),
        Ingredient(item='Flowering chives or Chinese chives', amount=1.0, unit='bunch', notes=None),
        Ingredient(item='Garlic cloves', amount=2.0, unit='item', notes='To taste'),
        Ingredient(item='Fresh ginger', amount=2.0, unit='tbsp', notes=None),
        Ingredient(item='Light soy sauce', amount=2.0, unit='tbsp', notes=None),
        Ingredient(item='Shaoxing rice cooking wine', amount=2.0, unit='tbsp', notes='The brown varieties have more flavor, avoid the clear wines'),
        Ingredient(item='Chinese chili oil', amount=1.0, unit='tbsp', notes=None),
        Ingredient(item='Sesame oil', amount=2.0, unit='tbsp', notes=None),
        Ingredient(item='Broth', amount=0.0, unit='n/a', notes='A few tablespoons, optional'),
        Ingredient(item='Cornstarch', amount=1.0, unit='tbsp', notes=None),
        Ingredient(item='Sugar', amount=1.0, unit='tsp', notes=None),
        Ingredient(item='White pepper', amount=1.0, unit='tsp', notes=None),
        Ingredient(item='Salt', amount=1.0, unit='tsp', notes=None)
    ],
    instructions=[
        'Finely mince or food process the chives, garlic and ginger.',
        "Add to bowl with pork, dump in all the seasoning (and broth if you have it). Stir, in only one direction, until smooth, even a little sticky. 'Beating' in the liquid incorporates it into the meat and makes it springy, instead of shrinking while cooking, and leaving you with a saggy, empty bag of skin.",
        'Start folding: put about 1 tbsp filling in the center of the wrapper, dip your finger in a bowl of warm water, wet the entire edge, fold in half and pleat from one edge to the other, pinching shut as you go. Pinch the entire edge again for good measure.',
        "If you're freezing: set on a baking sheet with space around each dumpling. Freeze for an hour, bang the whole sheet on the counter until they come loose and put in a freezer bag. Keeps in freezer for a month or two.",
        'For cooking immediately:',
        "For pan-fried potstickers: swirl some oil into a hot pan, set the dumplings in evenly and shake the pan so they don't stick. Fry on medium-high heat till they have brown crispy bottoms. Add in a 1/4 cup water and cover. Steam until water evaporates, remove lid and fry till crispy again, adding a little more oil if needed. Always shake the pan to prevent sticking.",
        'For boiled dumplings: bring a pot of water to a rolling boil. Add dumplings.',
        'Stir frequently till it comes back to a boil. Keep cooking for another 3-4 minutes, add some Chinese greens in the last minute if you want some veggies with it. Drain the whole thing.',
        'Serve with a dipping sauce made of equal parts Chinkiang black vinegar, light soy sauce and Lao Gan Ma chili oil, and a few drops of sesame oil. Minced garlic and sesame seeds are also good additions.',
        'You might end up with leftover filling. If you do, it is excellent stir-fried with Shanghai noodles and sad fridge vegetables.'
    ]
)
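
Because `amount` is a required float, the model has to put something there even when the recipe gives no number, as with the broth entry above (`amount=0.0`, `unit='n/a'`). A small post-processing check can surface such entries for manual review. This is a sketch under the assumption that a zero amount or an "n/a" unit means "no usable quantity". The dataclass is a stand-in that mirrors the Pydantic model so the example runs on its own, and `needs_review` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ingredient:
    # Stand-in mirroring the fields of the Pydantic Ingredient model above.
    item: str
    amount: float
    unit: str
    notes: Optional[str] = None


def needs_review(ingredient: Ingredient) -> bool:
    """Flag entries whose amount or unit looks like a placeholder."""
    placeholder_units = {"", "n/a"}
    return ingredient.amount <= 0 or ingredient.unit.strip().lower() in placeholder_units


# Two entries copied from the extracted recipe above.
ingredients = [
    Ingredient(item="Broth", amount=0.0, unit="n/a", notes="A few tablespoons, optional"),
    Ingredient(item="Sugar", amount=1.0, unit="tsp"),
]
flagged = [i.item for i in ingredients if needs_review(i)]
```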

Translate units

Maybe we also want to convert the units to the metric system. For this, we will define a function that takes the structured data as input and returns the same data with the weight units converted.

@fn(model="openai/gpt-3.5-turbo")
def translate_to_metric(recipe: Recipe) -> Recipe:
    """Given a recipe, translate the weight units to metric system, keep tbsp and tsp as is"""


translated = translate_to_metric(res)  # res is the Recipe returned by extract

This will give us the following:

Recipe(
    title="Canice's Pork and Chive Dumplings",
    ingredients=[
        Ingredient(item='medium ground pork', amount=0.68, unit='kg', notes='optional: sub with 0.23 kg of finely chopped shrimp'),
        Ingredient(item='wonton wrappers', amount=0.45, unit='kg', notes=None),
        Ingredient(item='flowering chives or Chinese chives', amount=1.0, unit='bunch', notes=None),
        Ingredient(item='garlic cloves', amount=2.0, unit='pieces', notes='to taste'),
        Ingredient(item='fresh ginger', amount=2.0, unit='tbsp', notes=None),
        Ingredient(item='light soy sauce', amount=2.0, unit='tbsp', notes=None),
        Ingredient(item='Shaoxing rice cooking wine', amount=2.0, unit='tbsp', notes='the brown varieties have more flavor, avoid the clear wines'),
        Ingredient(item='Chinese chili oil', amount=1.0, unit='tbsp', notes=None),
        Ingredient(item='sesame oil', amount=2.0, unit='tbsp', notes=None),
        Ingredient(item='broth', amount=0.0, unit='n/a', notes='a few tablespoons, optional'),
        Ingredient(item='cornstarch', amount=1.0, unit='tbsp', notes=None),
        Ingredient(item='sugar', amount=1.0, unit='tsp', notes=None),
        Ingredient(item='white pepper', amount=1.0, unit='tsp', notes=None),
        Ingredient(item='salt', amount=1.0, unit='tsp', notes=None)
    ],
    instructions=[
        'finely mince or food process the chives, garlic and ginger.',
        "add to bowl with pork, dump in all the seasonings (and broth if you have it). stir, in only one direction, until smooth, even a little sticky. 'beating in' the liquid incorporates it into the meat and makes it springy, instead of shrinking while cooking and leaving you with a saggy, empty bag of skin.",
        'start folding: put about 1 tbsp filling in the center of the wrapper, dip your finger in a bowl of warm water, wet the entire edge, fold in half and pleat from one edge to the other, pinching shut as you go. pinch the entire edge again for good measure.',
        "if you're freezing: set on a baking sheet with space around each dumpling. freeze for an hour, bang the whole sheet on the counter until they come loose and put in a freezer bag. keeps in the freezer for a month or two.",
        'if cooking immediately:',
        "for pan-fried potstickers: swirl some oil into a hot pan, set the dumplings in evenly and shake the pan so they don't stick. fry on medium-high heat til they have brown crispy bottoms. add in a 1/4 cup water and cover. steam until water evaporates, remove lid and fry til crispy again, adding a little more oil if needed. always shake the pan to prevent sticking.",
        'for boiled dumplings: bring pot of water to rolling boil. add dumplings.',
        'stir frequently til it comes back to a boil. keep cooking for another 3-4 minutes, add some Chinese greens in the last minute if you want some veggies with it. drain the whole thing.',
        'serve with a dipping sauce made of equal parts Chinkiang black vinegar, light soy sauce and Lao Gan Ma chili oil, and a few drops of sesame oil. minced garlic and sesame seeds are also good additions.',
        'you might end up with leftover filling. if you do, it is excellent stir-fried with Shanghai noodles and sad fridge vegetables.'
    ]
)
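
Since unit conversion is plain arithmetic, the converted amounts above can be cross-checked deterministically. The sketch below uses the exact definitions of the pound and ounce and, like the prompt, leaves tbsp and tsp untouched. The `to_metric` helper and its rounding choices are assumptions for this example.

```python
from typing import Tuple

# Exact definitions of the international avoirdupois pound and ounce.
KG_PER_LB = 0.45359237
G_PER_OZ = 28.349523125


def to_metric(amount: float, unit: str) -> Tuple[float, str]:
    """Convert lb to kg and oz to g; return every other unit unchanged."""
    if unit == "lb":
        return round(amount * KG_PER_LB, 2), "kg"
    if unit == "oz":
        return round(amount * G_PER_OZ, 1), "g"
    return amount, unit


pork = to_metric(1.5, "lb")     # the pork amount from the recipe
shrimp = to_metric(0.5, "lb")   # the optional shrimp substitution
ginger = to_metric(2.0, "tbsp") # volume units are kept as they are
```

For this recipe the results agree with the model's output: 0.68 kg of pork and 0.23 kg of shrimp.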

Scaling the recipe

Finally, let's define a function that takes the structured data as input and returns the same data but scaled to the desired number of servings.

from pydantic import Field

class RecipeWithNotes(Recipe):
    notes: Optional[str] = Field(None, description="Additional notes for the recipe for example if not all ingredients scale well")

@fn()
def scale(recipe: Recipe, people: int) -> RecipeWithNotes:
    """Given a recipe, scale the ingredients to the number of people"""

scaled = scale(translated, 10)

Scaling to 10 people gives us the following:

RecipeWithNotes(
    title="Canice's Pork and Chive Dumplings",
    ingredients=[
        Ingredient(item='medium ground pork', amount=1.36, unit='kg', notes='optional: sub with 0.45 kg of finely chopped shrimp'),
        Ingredient(item='wonton wrappers', amount=0.9, unit='kg', notes=None),
        Ingredient(item='flowering chives or Chinese chives', amount=2.0, unit='bunch', notes=None),
        Ingredient(item='garlic cloves', amount=4.0, unit='pieces', notes='to taste'),
        Ingredient(item='fresh ginger', amount=4.0, unit='tbsp', notes=None),
        Ingredient(item='light soy sauce', amount=4.0, unit='tbsp', notes=None),
        Ingredient(item='Shaoxing rice cooking wine', amount=4.0, unit='tbsp', notes='the brown varieties have more flavor, avoid the clear wines'),
        Ingredient(item='Chinese chili oil', amount=4.0, unit='tbsp', notes=None),
        Ingredient(item='sesame oil', amount=4.0, unit='tbsp', notes=None),
        Ingredient(item='broth', amount=2.0, unit='tbsp', notes='a few tablespoons, optional'),
        Ingredient(item='cornstarch', amount=2.0, unit='tbsp', notes=None),
        Ingredient(item='sugar', amount=2.0, unit='tsp', notes=None),
        Ingredient(item='white pepper', amount=2.0, unit='tsp', notes=None),
        Ingredient(item='salt', amount=2.0, unit='tsp', notes=None)
    ],
    instructions=[
        'finely mince or food process the chives, garlic and ginger.',
        "add to bowl with pork, dump in all the seasonings (and broth if you have it). stir, in only one direction, until smooth, even a little sticky. 'beating in' the liquid incorporates it into the meat and makes it springy, instead of shrinking while cooking and leaving you with a saggy, empty bag of skin.",
        'start folding: put about 1 tbsp filling in the center of the wrapper, dip your finger in a bowl of warm water, wet the entire edge, fold in half and pleat from one edge to the other, pinching shut as you go. pinch the entire edge again for good measure.',
        "if you're freezing: set on a baking sheet with space around each dumpling. freeze for an hour, bang the whole sheet on the counter until they come loose and put in a freezer bag. keeps in the freezer for a month or two.",
        'if cooking immediately:',
        "for pan-fried potstickers: swirl some oil into a hot pan, set the dumplings in evenly and shake the pan so they don't stick. fry on medium-high heat til they have brown crispy bottoms. add in a 1/4 cup water and cover. steam until water evaporates, remove lid and fry til crispy again, adding a little more oil if needed. always shake the pan to prevent sticking.",
        'for boiled dumplings: bring pot of water to rolling boil. add dumplings.',
        'stir frequently til it comes back to a boil. keep cooking for another 3-4 minutes, add some Chinese greens in the last minute if you want some veggies with it. drain the whole thing.',
        'serve with a dipping sauce made of equal parts Chinkiang black vinegar, light soy sauce and Lao Gan Ma chili oil, and a few drops of sesame oil. minced garlic and sesame seeds are also good additions.',
        'you might end up with leftover filling. if you do, it is excellent stir-fried with Shanghai noodles and sad fridge vegetables.'
    ],
    notes='scaled up the ingredients to serve 10 people instead of the original 5.'
)
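
Linear scaling is also easy to verify by hand. The sketch below assumes the original recipe serves 5, as the model's note above states, and simply multiplies each amount by the ratio of servings. `scale_amount` is a hypothetical helper, and rounding to two decimals is an arbitrary choice for this example.

```python
def scale_amount(amount: float, original_servings: int, target_servings: int) -> float:
    """Scale an amount linearly from one serving count to another."""
    if original_servings <= 0 or target_servings <= 0:
        raise ValueError("serving counts must be positive")
    factor = target_servings / original_servings
    return round(amount * factor, 2)


# Amounts taken from the metric recipe above, scaled from 5 to 10 servings.
pork_kg = scale_amount(0.68, 5, 10)
salt_tsp = scale_amount(1.0, 5, 10)
```

Comparing these values with the model's output is a cheap way to spot ingredients where the model did something other than plain multiplication, which may or may not be what you want.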

Conclusion

In this blog post, we have explored how to use multimodal models in Opper to generate structured data from images. We used OpenAI's newly released model gpt-4o to generate a description of an image of a dumpling recipe, and then to extract the title, ingredients, and steps from the image into a structured object. We then converted the weight units to the metric system and scaled the recipe to the desired number of servings. All of this was done with just a few lines of code, demonstrating the power and flexibility of Opper.