Microsoft is throwing more transactional database systems into its Fabric analytics and data lake environment in the expectation that the proximity will help users who are adding AI to their systems.
During its Build conference this week, the Redmond software and cloud biz said it was adding transactional document database Cosmos DB and its relational workhorse SQL Server to Fabric, the analytics and data lake platform first announced in June 2023.
Adding Cosmos DB’s global secondary index to Fabric, for example, would remove the need to scan all the operational data in an Azure Cosmos DB database, Microsoft said. This is intended to enable faster queries and minimize latency while also helping to make sure that queries do not negatively impact transactional performance, the vendor argued.
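To make the claim concrete, here is a minimal, purely illustrative sketch of why a secondary index avoids a full scan. It uses plain Python dictionaries and is not the Cosmos DB or Fabric API. The record layout and the names `ITEMS`, `full_scan`, `build_secondary_index`, and `indexed_lookup` are hypothetical.

```python
from collections import defaultdict

# Hypothetical operational records keyed by id (not real Cosmos DB data).
ITEMS = [
    {"id": "1", "category": "books", "name": "Atlas"},
    {"id": "2", "category": "games", "name": "Chess set"},
    {"id": "3", "category": "books", "name": "Cookbook"},
]


def full_scan(items, category):
    """Check every record: cost grows with the size of the whole store."""
    return [item["id"] for item in items if item["category"] == category]


def build_secondary_index(items, field):
    """Map each value of a non-key field to the ids of records holding it."""
    index = defaultdict(list)
    for item in items:
        index[item[field]].append(item["id"])
    return index


# Built once, and in a real system maintained separately from the
# transactional store (an assumption made for this illustration).
CATEGORY_INDEX = build_secondary_index(ITEMS, "category")


def indexed_lookup(index, value):
    """Answer the same query from the index without touching other records."""
    return list(index.get(value, []))
```

Both functions return the same ids for a given category, but the indexed lookup reads only the index entry rather than every record, which is the general reason such queries can stay off the transactional path.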
Arun Ulag, corporate vice president for Azure data, told The Register the idea is to let customers bring AI and analytics workloads closer to their transactional data. The two would share the same underlying file format, Apache Parquet, in the data lake environment, which Microsoft calls OneLake and which uses the open source Delta Lake format.
“Cosmos DB is a great place to store your entire product catalog, for example, and with customers browsing around your website, you want to make recommendations. Anything built on Fabric, by default, all of the data, whether it’s SQL Server or Cosmos DB or a data warehouse or data lake, is sitting on OneLake,” he said.
“Everything is in the open source, Apache Parquet, Delta Lake format, which means if you’re building a machine learning model, the data is just there and always current. You don’t need to build copies. You don’t need to shuttle data around. You can build your machine learning models directly on top of OneLake.”
Aaron Rosenbaum, Gartner senior director, data management and analytics, said the move was part of a continuing trend of making integration between different parts of the data management infrastructure simple and automated.