MongoDB adds vector search to Atlas database to help build AI apps
After trying to broaden its user base to include traditional database professionals last year, MongoDB is switching gears, adding features to turn its NoSQL Atlas database-as-a-service (DBaaS) into a more complete data platform for developers, including capabilities that support building generative AI applications.
In addition to introducing vector search for Atlas and integrating Google Cloud’s Vertex AI foundation models, the company announced a variety of new capabilities for the DBaaS at its MongoDB.local conference in New York on Thursday, including new Atlas Search, data streaming, and querying features.
“Everything that MongoDB has announced can be seen as a move to make Atlas a more comprehensive and complete data platform for developers,” said Doug Henschen, principal analyst at Constellation Research. “The more that MongoDB can provide to enable developers with all the tools that they need, the stickier the platform becomes for those developers and the enterprises they work for.”
Henschen’s perspective seems reasonable, given that the company has been competing with cloud data platform suppliers such as Snowflake, which offers a Native Application Framework, and Databricks, which recently launched Lakehouse Apps.
Vector search helps build generative AI apps
To help enterprises build generative AI applications on data stored in MongoDB, the company has introduced a vector search capability inside Atlas, dubbed Atlas Vector Search.
This new search capability, according to the company, will help support a new range of workloads, including semantic search with text, image search, and highly personalized product recommendations.
The search runs on vectors: multidimensional mathematical representations of the features or attributes of raw data, which could include text, images, audio or video, said Matt Aslett, research director at Ventana Research.
“Vector search utilizes vectors to perform similarity searches by enabling rapid identification and retrieval of similar or related data,” Aslett said. He added that vector search can also complement large language models (LLMs), reducing concerns about accuracy and trust by incorporating approved enterprise content and data.
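To make the similarity-search idea concrete, here is a minimal Python sketch of exact (brute-force) nearest-neighbor search using cosine similarity. It is an illustration under stated assumptions, not how Atlas Vector Search is implemented: the three-dimensional vectors and product names are hypothetical, real embedding models produce vectors with hundreds or thousands of dimensions, and production vector search engines typically rely on approximate nearest-neighbor indexes rather than scoring every item.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query: list[float], corpus: dict[str, list[float]], k: int = 2):
    """Exact search: score every item against the query and keep the k highest scores."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, kept to three dimensions for readability.
corpus = {
    "red running shoes": [0.9, 0.1, 0.0],
    "blue trail sneakers": [0.8, 0.2, 0.1],
    "cast-iron skillet": [0.0, 0.1, 0.95],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding of a query such as "sports footwear"
print([doc_id for doc_id, _ in top_k_similar(query, corpus)])
# prints ['red running shoes', 'blue trail sneakers']
```

Cosine similarity is only one possible measure; Euclidean distance and dot product are common alternatives, and the choice usually follows the embedding model being used.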
Atlas Vector Search will also allow enterprises to augment pretrained models such as GPT-4 with their own data using open source frameworks such as LangChain and LlamaIndex, the company said.
These frameworks can be used to access LLMs from MongoDB partners and model providers, such as AWS, Databricks, Google Cloud, Microsoft Azure, MindsDB, Anthropic, Hugging Face and OpenAI, to generate vector embeddings and build AI-powered applications on Atlas, it added.
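For a sense of what a query looks like from application code, the sketch below only builds an aggregation pipeline as plain Python data. It assumes the `$vectorSearch` stage and the `vectorSearchScore` metadata field described in MongoDB's current Atlas Vector Search documentation; the syntax of the preview announced here may have differed. The function name, the index name `vector_index`, the `embedding` and `text` fields, and the candidate multiplier are all hypothetical choices, and the query vector would come from an embedding model such as one reached through LangChain, LlamaIndex or Vertex AI.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    path: str = "embedding",           # hypothetical field holding each document's vector
    limit: int = 5,
    candidate_multiplier: int = 20,    # assumption: consider 20x more candidates than results
) -> list[dict[str, Any]]:
    """Return a pipeline asking for the `limit` documents nearest to query_vector."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": limit * candidate_multiplier,
                "limit": limit,
            }
        },
        # Keep the hypothetical `text` field plus the similarity score attached to each match.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline([0.12, -0.03, 0.88], limit=3)
print(pipeline[0]["$vectorSearch"]["numCandidates"])  # prints 60
```

With the PyMongo driver, such a pipeline would be passed to `collection.aggregate(pipeline)` against an Atlas cluster that has a matching vector index; that step needs a live cluster and is not shown.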
MongoDB partners with Google Cloud
MongoDB’s partnership with Google Cloud to integrate Vertex AI capabilities is meant to accelerate the development of generative AI-based applications. Vertex AI, according to the company, will provide the text embedding API required to generate embeddings from enterprise data stored in MongoDB Atlas.
These embeddings can later be combined with the PaLM text models to create advanced functionality such as semantic search, classification, outlier detection, AI-powered chatbots, and text summarization.
The partnership will also allow enterprises to get hands-on assistance from MongoDB and Google Cloud service teams on data schema and indexing design, query structuring, and fine-tuning AI models.
Databases from Dremio, DataStax and Kinetica are also adding generative AI capabilities.
MongoDB’s move to add vector search to Atlas is not unique, but it will enhance the company’s competitiveness, Aslett said. “There is a growing list of specialist vector database providers, while multiple vendors of existing databases are working to add support to bring vector search to data already stored in their data platforms,” Aslett said.
Managing real-time streaming data in a single interface
In order to help enterprises manage real-time streaming data from multiple sources in a single interface, MongoDB has added a stream processing interface to Atlas.
Dubbed Atlas Stream Processing, the new interface, which can process any kind of data and has a flexible data model, will allow enterprises to analyze data in real time and adjust application behavior to suit end customer needs, the company said.
Atlas Stream Processing removes the need for developers to juggle multiple specialized programming languages, libraries, application programming interfaces (APIs), and drivers, along with the complexity of combining those tools, MongoDB claimed.
According to Aslett, the new interface helps developers work with both streaming and historical data using the document model.
“Processing data as it is ingested enables data to be queried continuously as new data is added, providing a constantly updated, real-time view that is triggered by the ingestion of new data,” Aslett said.
A report from Ventana Research claims that more than seven in 10 enterprises’ standard information architectures will include streaming data and event processing by 2025, so that they can provide better customer experiences.
According to Sanjeev Mohan, principal analyst at SanjMo, developers can also use Atlas Stream Processing to run aggregations, filtering, and anomaly detection on data in Kafka topics, Amazon Kinesis, or even MongoDB change data capture.
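The pattern Aslett and Mohan describe (filter events as they arrive, keep a continuously updated aggregate, and flag anomalies against it) can be sketched in plain Python. This is not Atlas Stream Processing's own interface: the in-memory list stands in for a Kafka topic, Kinesis stream or change stream, the function and field names are hypothetical, and the z-score rule with a threshold of 3 and a five-event warm-up is an arbitrary assumption chosen for illustration. The running mean and variance use Welford's online algorithm, so each event updates the aggregate without storing the history.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance updated one value at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; undefined for fewer than two values, so report 0.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def detect_anomalies(
    events: Iterable[dict], field: str, threshold: float = 3.0, warmup: int = 5
) -> Iterator[dict]:
    """Drop events missing `field`; yield those more than `threshold` std devs from the running mean."""
    stats = RunningStats()
    for event in events:
        value = event.get(field)
        if value is None:  # filter step: skip malformed events
            continue
        if stats.count >= warmup and stats.stddev > 0:
            z = abs(value - stats.mean) / stats.stddev
            if z > threshold:
                yield {**event, "z_score": round(z, 2)}
        stats.update(value)  # the aggregate is refreshed by every new event


# Hypothetical sensor readings with one spike and one event lacking a reading.
readings = [{"sensor": "s1", "temp": t} for t in [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0]]
readings.insert(3, {"sensor": "s1"})
for alert in detect_anomalies(readings, "temp"):
    print(alert["temp"])  # prints 35.0
```

Scoring each value before adding it to the statistics keeps a spike from inflating the baseline it is judged against.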
The flexible data model inside Atlas Stream Processing can also be modified over time to suit needs, the company said.
The addition of the new interface to Atlas can be seen as a move to play catch-up with rival data cloud providers such as Snowflake and Databricks, which have already introduced features for processing real-time data, noted Constellation’s Henschen.
New Atlas search features
To help enterprises maintain database and search performance on Atlas, the company has introduced Atlas Search Nodes, a feature that isolates search workloads from database workloads.
Targeted at enterprises that have already scaled their search workloads on MongoDB, Atlas Search Nodes provides dedicated resources and optimizes resource utilization to support performance of these specific workloads, including vector search, the company said.
“Enterprises may find that dedicating nodes in a cluster, specifically to search, can support operational efficiency by avoiding performance degradation on other workloads,” Aslett said, adding that multiple providers of distributed databases are adopting this capability.
MongoDB’s updates to Atlas also include the ability to edit time-series data, something the company claims most time-series databases do not allow.
With the change, Time Series Collections will let enterprises modify time-series data, resulting in better storage efficiency, more accurate results, and better query performance, the company said.
The feature to modify time-series data will help most enterprises, according to Mohan.
Other updates to MongoDB Atlas include the ability to tier and query databases on Microsoft Azure using the Atlas Online Archive and Atlas Data Federation features, the company said, adding that Atlas already supported tiering and querying on AWS.
MongoDB Atlas for financial services and other industries
As part of the updates announced at its MongoDB.local conference, the company said that it will be launching a new industry-specific Atlas database program for financial services, followed by other industry sectors such as retail, healthcare, insurance, manufacturing and automotive.
Through these industry-specific programs, the company will offer expert-led architectural design reviews, technology partnerships delivered through workshops, and other resources to help enterprises build vertical-specific solutions. It will also offer tailored MongoDB University courses and learning materials to prepare developers for their enterprise projects.
While the company did not immediately provide availability and pricing details for the new features, it said it was making its Relational Migrator tool generally available.
The tool is designed to help enterprises move their legacy databases to modern document-based databases.
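Relational Migrator's own mapping rules are not described in the announcement. As a rough illustration of what moving from tables to documents involves, the sketch below shows one common pattern: embedding child rows (orders) inside their parent row (a customer) so that a one-to-many join becomes a single document. The function, table and column names are hypothetical.

```python
from collections import defaultdict


def embed_children(parents, children, parent_key, foreign_key, field):
    """Nest each child row inside its parent, producing one document per parent row."""
    grouped = defaultdict(list)
    for child in children:
        # The foreign key becomes redundant once the child lives inside its parent.
        row = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(row)
    # Parents with no children get an empty array rather than a missing field.
    return [{**parent, field: grouped.get(parent[parent_key], [])} for parent in parents]


# Hypothetical rows as they might come out of two relational tables.
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.5},
]
docs = embed_children(customers, orders, "customer_id", "customer_id", "orders")
print(docs[0])
# prints {'customer_id': 1, 'name': 'Ada', 'orders': [{'order_id': 10, 'total': 25.0}, {'order_id': 11, 'total': 40.0}]}
```

Embedding suits data that is usually read together and grows by a bounded amount; large or widely shared child records are often kept in separate collections and referenced by ID instead.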