Title: Blueprint for a Production-Ready Information Retrieval System based on Multi-Modal Embeddings
Authors: Ebert, André; Apel, Anika; Chodyko, Piotr; Hiroyasu, Kyle; Ismali, Festina; Koo, Hyein; Kronburger, Julia; Pesch, Robert
Date Issued: 2021 (record available: 2021-12-14)
ISBN: 978-3-88579-708-1
ISSN: 1617-5468
DOI: 10.18420/informatik2021-095
URI: https://dl.gi.de/handle/20.500.12116/37765
Language: English
Keywords: Information Retrieval; Image-to-Text; Multi-Modal Embeddings; Deep Learning; Artificial Intelligence; Data-Science to Production

Abstract: Deep Learning models that map documents from different domains, e.g., text, images, and audio, into a common vector space enable seamless information retrieval across those domains and thus significantly improve the user experience of many expert tools. Despite the various models for multi-modal mappings presented in the scientific literature, their implementation and integration remain a challenge in industry, especially for small and medium-sized companies. The reason is that developing such retrieval systems for production use cases is a non-trivial task, requiring scalable, reliable, and cost-efficient infrastructure and services as well as adequate Deep Learning models. We present a generic and flexible blueprint architecture for developing a production-ready image-text retrieval system using Kubernetes, MLflow, and Elasticsearch, integrating state-of-the-art Deep Learning models.
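
The cross-modal retrieval idea the abstract describes can be sketched in a few lines: images and text queries are encoded into the same vector space and matched with a dense-vector kNN search in Elasticsearch. The sketch below is a minimal illustration only, not the system from the paper; it assumes a CLIP-style joint encoder from sentence-transformers and an Elasticsearch 8.x instance, and the model name, index name, field names, and image paths are placeholder assumptions.

    # Minimal image-text retrieval sketch: shared embedding space + ES kNN.
    from PIL import Image
    from sentence_transformers import SentenceTransformer
    from elasticsearch import Elasticsearch

    model = SentenceTransformer("clip-ViT-B-32")  # CLIP: encodes images and text
    es = Elasticsearch("http://localhost:9200")   # assumed local ES 8.x instance

    # Index each image under a dense_vector field (dims match the model: 512).
    es.indices.create(index="images", mappings={
        "properties": {
            "path": {"type": "keyword"},
            "embedding": {"type": "dense_vector", "dims": 512,
                          "index": True, "similarity": "cosine"},
        }
    })
    for path in ["cat.jpg", "dog.jpg"]:  # placeholder image files
        vec = model.encode(Image.open(path))
        es.index(index="images", document={"path": path,
                                           "embedding": vec.tolist()})
    es.indices.refresh(index="images")

    # Text-to-image retrieval: encode the query with the same model, run kNN.
    query_vec = model.encode("a photo of a cat")
    hits = es.search(index="images", knn={
        "field": "embedding", "query_vector": query_vec.tolist(),
        "k": 5, "num_candidates": 50,
    })["hits"]["hits"]
    for h in hits:
        print(h["_source"]["path"], h["_score"])

Because both modalities share one encoder, the same index also serves image-to-image queries; the production concerns the paper addresses (serving the model behind Kubernetes, tracking it with MLflow) sit around this core retrieval loop.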