How Mixbook used generative AI to offer personalized photo book experiences

July 16, 2024

This post was co-written with Vlad Lebedev and DJ Charles from Mixbook.

Mixbook is an award-winning design platform that gives users unprecedented creative freedom to create and share unique stories, changing the lives of more than six million people. Today, Mixbook is the highest-rated photo book service in the U.S., with 26,000 five-star reviews.

Mixbook enables users to share their stories with creativity and confidence. Their mission is to help users celebrate the beautiful moments of their lives. Mixbook aims to foster deep connections between users and their loved ones by sharing their stories in both physical and digital media.

Years ago, Mixbook launched a strategic initiative to move its operational workloads to Amazon Web Services (AWS), a move that has consistently delivered significant benefits. This fundamental decision has been critical to achieving their mission and ensuring that their system operations are characterized by reliability, superior performance, and operational efficiency.

In this post, we show how Mixbook used generative artificial intelligence (AI) on AWS to personalize their photo book experiences, a step toward their mission.

Business challenge

In today’s digital world, we have a lot of pictures that we take and share with our friends and family. Let’s imagine a scenario where we have hundreds of photos from a recent family vacation and we want to create a coffee table photo book to make it unforgettable. However, selecting the best pictures from the crowd and describing them with captions can take a lot of time and effort. As we all know, a picture is worth a thousand words, which is why trying to sum up a moment with a caption of just six to ten words can be so difficult. Mixbook really understands the problem and is here to fix it.

Solution

Mixbook Smart Captions is the magical solution to the caption dilemma. It not only interprets user photos but also adds a dash of creativity that brings the stories to life.

Most importantly, Smart Captions doesn’t completely automate the creative process. Instead, it provides a creative partner that allows users to tell their own story, adding personal flourishes to a book. Whether it’s a selfie or a landscape shot, the goal is to ensure users’ photos effortlessly speak volumes.

Architecture overview

The implementation of the system includes three main components:

Data acquisition
Information derivation
Creative synthesis

Caption generation is highly dependent on the inference process, because the quality and meaningfulness of the inference results directly affect how specific and personalized the generated captions are. Below you can see the data flow diagram of the caption generation process, which is described in the text below.

Data acquisition

A user uploads photos to Mixbook. The raw photos are stored in Amazon Simple Storage Service (Amazon S3).

The data acquisition process involves three macro components: Amazon Aurora MySQL-Compatible Edition, Amazon S3, and AWS Fargate for Amazon ECS. Aurora MySQL serves as the primary relational data store for tracking and recording media file upload sessions and their associated metadata. It offers flexible capacity options, ranging from serverless to reserved provisioned instances for predictable long-term use. Amazon S3, in turn, provides efficient, scalable, and secure storage for the media file objects themselves. Its storage classes allow recent uploads to be kept in a warm state for low-latency access, while older objects can be transitioned to Amazon S3 Glacier tiers, minimizing storage costs over time. Amazon Elastic Container Service (Amazon ECS), coupled with AWS Fargate's low-maintenance compute environment, forms a convenient orchestrator for containerized workloads that brings all components together.
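The storage-class transition described above can be expressed as an S3 lifecycle rule. The following is a minimal sketch, not Mixbook's actual configuration: the `uploads/` prefix, the 90-day threshold, and the bucket name in the comment are assumptions chosen for illustration.

```python
import json


def build_lifecycle_config(prefix: str, days_until_glacier: int) -> dict:
    """Build an S3 lifecycle configuration that transitions objects under
    `prefix` to the GLACIER storage class after `days_until_glacier` days."""
    if days_until_glacier < 1:
        raise ValueError("days_until_glacier must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical values: the prefix and threshold are illustrative only.
config = build_lifecycle_config("uploads/", 90)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review and version before it is applied to a bucket.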

Inference

The inference phase extracts essential contextual and semantic elements from the input, including image descriptions, temporal and spatial data, detected faces, emotional sentiment, and captions. Among these, the image descriptions generated by a computer vision model provide the most basic understanding of the captured moments. Amazon Rekognition delivers accurate detection of face bounding boxes and emotional expressions. The face bounding boxes are mainly used for optimal automatic photo placement and cropping. The detected emotions are used to select a better tone for the story, for example, to make a caption funnier or more nostalgic. In addition, Amazon Rekognition increases safety by identifying potentially offensive content.
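To illustrate how face bounding boxes and emotions might feed into cropping and tone selection, here is a small sketch that works on data shaped like the `FaceDetails` list in an Amazon Rekognition `DetectFaces` response (bounding boxes as ratios of image size, emotions with confidence scores). The helper names, the margin, and the sample values are assumptions for illustration, not Mixbook's implementation.

```python
def dominant_emotion(face_detail: dict) -> str:
    """Return the emotion type with the highest confidence for one face."""
    emotions = face_detail.get("Emotions", [])
    if not emotions:
        return "UNKNOWN"
    return max(emotions, key=lambda e: e["Confidence"])["Type"]


def crop_box_around_faces(face_details: list, margin: float = 0.05) -> tuple:
    """Return (left, top, right, bottom) ratios of the smallest box that
    covers every face plus a margin, clamped to the image bounds."""
    if not face_details:
        return (0.0, 0.0, 1.0, 1.0)  # no faces: keep the whole image
    boxes = [f["BoundingBox"] for f in face_details]
    left = min(b["Left"] for b in boxes) - margin
    top = min(b["Top"] for b in boxes) - margin
    right = max(b["Left"] + b["Width"] for b in boxes) + margin
    bottom = max(b["Top"] + b["Height"] for b in boxes) + margin
    return (max(left, 0.0), max(top, 0.0), min(right, 1.0), min(bottom, 1.0))


# Hand-written sample shaped like Rekognition output (values are made up).
sample_faces = [
    {
        "BoundingBox": {"Left": 0.2, "Top": 0.1, "Width": 0.2, "Height": 0.3},
        "Emotions": [
            {"Type": "HAPPY", "Confidence": 97.1},
            {"Type": "CALM", "Confidence": 2.0},
        ],
    },
    {
        "BoundingBox": {"Left": 0.5, "Top": 0.2, "Width": 0.25, "Height": 0.3},
        "Emotions": [
            {"Type": "SURPRISED", "Confidence": 60.0},
            {"Type": "HAPPY", "Confidence": 35.0},
        ],
    },
]

print([dominant_emotion(f) for f in sample_faces])
print(crop_box_around_faces(sample_faces))
```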

The inference pipeline is based on an AWS Lambda-based multi-stage architecture that maximizes cost efficiency and elasticity by executing independent image analysis steps in parallel. AWS Step Functions enables the synchronization and ordering of interdependent steps.
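A pipeline of this shape could be described in Amazon States Language with a `Parallel` state that fans out to independent Lambda tasks, followed by a step that depends on all of them. The sketch below builds such a definition as a Python dict. The step names and Lambda ARNs are hypothetical placeholders, and the real workflow may be structured differently.

```python
import json


def build_state_machine(analysis_lambdas: dict, merge_lambda_arn: str) -> dict:
    """Build a state machine definition that runs each analysis Lambda in
    its own parallel branch, then passes all results to a merge step."""
    branches = [
        {
            "StartAt": name,
            "States": {name: {"Type": "Task", "Resource": arn, "End": True}},
        }
        for name, arn in analysis_lambdas.items()
    ]
    return {
        "Comment": "Illustrative image analysis workflow (hypothetical)",
        "StartAt": "AnalyzeInParallel",
        "States": {
            "AnalyzeInParallel": {
                "Type": "Parallel",
                "Branches": branches,
                "Next": "MergeResults",
            },
            "MergeResults": {
                "Type": "Task",
                "Resource": merge_lambda_arn,
                "End": True,
            },
        },
    }


# Placeholder ARNs: the account ID and function names are invented.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"
definition = build_state_machine(
    {
        "DescribeImage": ARN_PREFIX + "describe-image",
        "DetectFaces": ARN_PREFIX + "detect-faces",
        "ModerateContent": ARN_PREFIX + "moderate-content",
    },
    ARN_PREFIX + "merge-results",
)
print(json.dumps(definition, indent=2))
```

The `Parallel` state only moves on to `MergeResults` once every branch has finished, which is how the ordering of interdependent steps is enforced.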

The captions are generated by an Amazon SageMaker inference endpoint augmented by an Amazon ElastiCache for Redis-based buffer. The buffer was implemented after benchmarking the performance of the caption model. The benchmarking showed that the model performed optimally when processing batches of images, but underperformed when analyzing individual images.
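The buffering idea can be sketched as a micro-batcher: requests accumulate until a batch is full, and the model is then invoked once for the whole batch. In this sketch an in-memory list stands in for the Redis buffer and a stub function stands in for the SageMaker endpoint. The class name, batch size, and stub are assumptions for illustration.

```python
from typing import Callable, Dict, List


class CaptionBatchBuffer:
    """Collect image IDs and caption them in batches rather than one by one."""

    def __init__(self, batch_size: int,
                 caption_batch: Callable[[List[str]], List[str]]):
        self.batch_size = batch_size
        self._caption_batch = caption_batch  # stand-in for the model endpoint
        self._pending: List[str] = []        # stand-in for the Redis buffer
        self.results: Dict[str, str] = {}

    def add(self, image_id: str) -> None:
        """Queue an image and run the model once a full batch is available."""
        self._pending.append(image_id)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Caption whatever is pending, even if the batch is not full."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        for image_id, caption in zip(batch, self._caption_batch(batch)):
            self.results[image_id] = caption


batch_sizes_seen: List[int] = []


def fake_caption_model(image_ids: List[str]) -> List[str]:
    """Stub model that records batch sizes and returns placeholder captions."""
    batch_sizes_seen.append(len(image_ids))
    return [f"caption for {image_id}" for image_id in image_ids]


buffer = CaptionBatchBuffer(batch_size=2, caption_batch=fake_caption_model)
for i in range(5):
    buffer.add(f"img-{i}")
buffer.flush()  # drain the final partial batch
print(batch_sizes_seen)
```

A production buffer would likely also flush on a timer so that a lone image is not left waiting indefinitely for a full batch.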

Generation

The caption generation mechanism behind the Writing Assistant feature makes Mixbook Studio a tool for creating natural language stories. The assistant was based on a Llama language model and initially used carefully engineered prompts created by AI experts. However, the Mixbook Storyarts team sought more granular control over the style and tone of the captions, so a diverse team, which included an Emmy-nominated screenwriter, was brought in to review, adjust, and add unique, hand-crafted examples. This led to a process of fine-tuning the model, moderating the modified responses, and deploying approved models for experimental and public releases. After inference, three captions are created and stored in Amazon Relational Database Service (Amazon RDS).
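As a rough illustration of the prompt-based starting point, the sketch below assembles the signals derived during inference into a single text prompt that asks for three captions. The function name, field labels, and wording are hypothetical. Mixbook's actual prompts and fine-tuning data are not described in this post.

```python
from typing import Optional, Sequence


def build_caption_prompt(description: str,
                         emotion: str,
                         location: Optional[str] = None,
                         date: Optional[str] = None,
                         tone: str = "heartfelt",
                         n_captions: int = 3,
                         examples: Sequence[str] = ()) -> str:
    """Combine derived photo context into one instruction for a language model."""
    lines = [
        f"Write {n_captions} {tone} photo book captions "
        "of six to ten words each."
    ]
    # Hand-crafted examples steer style and tone (few-shot prompting).
    for example in examples:
        lines.append(f"Example caption: {example}")
    lines.append(f"Image description: {description}")
    lines.append(f"Dominant emotion: {emotion}")
    if location:
        lines.append(f"Location: {location}")
    if date:
        lines.append(f"Date: {date}")
    return "\n".join(lines)


# All values below are invented for the example.
prompt = build_caption_prompt(
    description="two children building a sandcastle on a beach",
    emotion="HAPPY",
    location="Santa Cruz, California",
    tone="playful",
    examples=["Sandy toes and salty kisses all summer long"],
)
print(prompt)
```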

The following image shows the Mixbook Smart Captions feature in Mixbook Studio.

Advantages

Mixbook implemented this solution to provide new functionality to its customers, providing an improved user experience while maintaining operational efficiency.

User experience

Improved storytelling: Captures users’ emotions and experiences, now expressed through beautiful, heartfelt captions.
User joy: Adds an element of surprise with captions that are not only accurate, but also fun and imaginative. Enthusiastic user Hanie U. says, “I hope to see more subtitle experiences released in the future.” Another user, Megan P., says, “It worked great!” Users can also edit the generated captions.
Time efficiency: Nobody has time to mess around with captions. This feature saves valuable time and lets user stories shine.
Safety and correctness: The captions are generated responsibly, using guardrails to ensure the content is moderated and relevant.

System

Elasticity and scalability of Lambda
Understandable workflow orchestration with Step Functions
Variety of foundation models and tuning options in SageMaker for maximum control

Due to its improved usability, Mixbook has been named an official winner of the 2024 Webby Awards for Apps and software – making the most of AI and machine learning.

“AWS enables us to scale the innovations our customers love most. And now, with AWS’s new generative AI capabilities, we can blow our customers away with a creative power they never thought possible. Innovations like these are why we’ve been working with AWS since beta in 2006.”

– Andrew Laffoon, CEO, Mixbook

Conclusion

Mixbook began experimenting with AWS generative AI solutions in early 2023 to augment their existing application. They started with a quick proof of concept to produce results that demonstrated the art of the possible. Continuous development, testing, and integration using AWS services in compute, storage, analytics, and machine learning allowed them to iterate quickly. After releasing the Smart Captions feature in beta, they were able to quickly adapt to real-world usage patterns and protect the value of the product.

Try Mixbook Studio to experience storytelling. To learn more about AWS solutions for generative AI, start with Transform your business with generative AI. To hear more from Mixbook leaders, listen to the AWS re:Think Podcast available on Art19, Apple Podcasts and Spotify.

About the authors

Vlad Lebedev is a Senior Technology Leader at Mixbook. He leads a product development team responsible for transforming Mixbook into a place for soulful storytelling. He draws on over a decade of hands-on experience in web development, systems design, and data engineering to find elegant solutions to complex problems. Vlad enjoys learning about contemporary and ancient cultures, their history, and languages.

DJ Charles is CTO at Mixbook. He has a 30-year career architecting interactive and e-commerce designs for top brands. Innovating broadband technology for the cable industry in the 90s, revolutionizing supply chain processes in the 2000s, and advancing environmental technology at Perillon led to global real-time bidding platforms for brands like Sotheby’s and eBay. Aside from tech, DJ loves learning new musical instruments, mastering the art of songwriting, and delving deep into music production and engineering in his free time.

Malini Chatterjee is a Senior Solutions Architect at AWS. She advises AWS customers on their workloads using a variety of AWS technologies. She brings extensive expertise in data analytics and machine learning. Prior to joining AWS, she worked as a data solutions architect in the financial industry. She is passionate about semi-classical dancing and performing at community events. She loves traveling and spending time with her family.

Jessica Oliveira is an Account Manager at AWS, providing consulting and support for commercial sales in Northern California. She is passionate about building strategic partnerships to ensure her customers’ success. Outside of work, she enjoys traveling, learning different languages and cultures, and spending time with her family.
