Moonsift MSc Project: Multimodal Machine Learning

Moonsift is building an AI-powered product research tool (co-pilot) for taste-driven purchases.

What was the project?

The main goal was to establish the current state of the art in Composed Image Retrieval (CIR), with a particular focus on retrieving fashion images. CIR is the task of finding the most relevant images in a database given a query image together with a textual modification. In modern systems this usually means using Machine Learning (ML) models to store each image as a vector representation. Images are then retrieved by converting the query image and its textual modification into a query vector and finding the stored vectors that are mathematically most similar to it.
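The retrieval step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method evaluated in the project: the toy 3-dimensional vectors stand in for embeddings that trained image and text encoders would produce, the function names (`compose_query`, `retrieve`) are hypothetical, and summing the two normalised embeddings is only a simple baseline way to combine image and text.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def compose_query(image_vec, text_vec):
    """Hypothetical fusion: add the normalised image and text embeddings.

    Assumption: real CIR systems typically learn this combination;
    a plain sum is used here only to keep the sketch short.
    """
    return l2_normalize(l2_normalize(image_vec) + l2_normalize(text_vec))

def retrieve(query_vec, gallery, k=3):
    """Return indices and cosine similarities of the k closest gallery vectors."""
    sims = l2_normalize(gallery) @ query_vec
    top = np.argsort(-sims, kind="stable")[:k]
    return top, sims[top]

# Toy stand-ins for stored image embeddings (made-up values).
gallery = np.array([
    [1.0, 0.0, 0.0],  # 0: same shirt as the reference image
    [1.0, 1.0, 0.0],  # 1: similar shirt with the requested change
    [0.0, 0.0, 1.0],  # 2: unrelated product
    [0.0, 1.0, 0.5],  # 3: has the requested attribute, different garment
])
reference_image = np.array([1.0, 0.0, 0.0])    # embedding of the query image
modification_text = np.array([0.0, 1.0, 0.0])  # embedding of the text change

query = compose_query(reference_image, modification_text)
top_idx, top_sims = retrieve(query, gallery, k=2)
print(top_idx, top_sims)
```

In this toy setup the best match is item 1, which is close to both the reference image and the textual modification, ahead of the unmodified reference item 0.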

The student undertook a thorough literature review of recent advances in this field, then implemented and assessed four techniques that had been applied in two leading papers but never all together. He evaluated these techniques on fashion datasets as part of an ablation study, filling vital gaps left by the previous papers, to determine the best techniques to apply to this problem. He also investigated potential causes of the differences in performance using Grad-CAM and t-SNE plots.

What was the business need?

Moonsift is building an AI-powered shopping co-pilot to assist its tens of thousands of online shoppers, with a focus on their taste-driven purchases in Fashion and Homeware. This involves a chat-enhanced search in which users can query Moonsift’s cross-retailer dataset of tens of millions of products using vector-powered search. We are now able to apply the results of the student’s research to improve our product similarity search, so that users can apply modifications as they search, for example “I’d like a shirt similar to this image but with shorter sleeves”.

What did the student bring to the team?

The student brought the Computer Vision and ML research skills he had developed during his Masters course, as well as an interest in fashion. He read the literature thoroughly and led the drafting of the specific research question, which we consolidated during weekly meetings. He was also able to train the ML models and run experiments independently, requiring only limited technical support. We supported him by providing compute time and by discussing his results and his evaluations of the different techniques.

This article was published on 2024-12-18