
Databricks launches data lakehouse for retail and consumer goods customers

Estimated reading time: 5 minutes

Andrew Martin, Head of Databricks South Asia

EITN: What is a data lakehouse? Why is there a need for data lakehouse as opposed to just a data warehouse and a separate data lake?

Andrew: A data lakehouse is a new, open data management architecture – pioneered by Databricks – that combines the flexibility, cost-efficiency and scale of data lakes with the granular data management capabilities of data warehouses. By adding a structured transactional layer on top of a data lake, the lakehouse delivers data warehouse-like performance, reliability, quality and scale, enabling efficient and secure Artificial Intelligence (AI) and Business Intelligence capabilities directly on vast amounts of stored enterprise data.

Despite their flexibility and popularity, data lakes struggle to provide adequate data reliability, governance and consistent quality for detailed Business Intelligence and analytics.

Enterprises have therefore historically been forced into data silos, running legacy data warehouses and data lakes concurrently and using each framework separately for Business Intelligence and AI use cases. This approach has proven overly complex, resulting in information inequality, high costs and slowed operations.

The lakehouse is an ideal data architecture for data-driven organizations, building on the best qualities of legacy data warehouses and data lakes to provide a single unified data platform that supports all major use cases from streaming analytics to Business Intelligence, data science, and AI.

EITN: What are the benefits of data lakehouse technology?

Andrew: As companies migrated to the cloud, on-premise data architecture was generally replicated, with proprietary data warehouses configured for business intelligence workloads and data lakes used for data science and machine learning workloads. The approach was expensive and complex, giving rise to data silos and slowing down project delivery. These complexities were replicated further across multiple clouds as companies diversified their cloud service providers. Such slow, complex and insular frameworks hamper enterprises from leveraging data to drive ongoing business adaptations in response to market and customer volatility.


The lakehouse architecture addresses these challenges by unifying all data on a simplified platform, reducing data movement between systems, duplicate copies of data, and errors in security and governance. It is also based on open standards and open formats, reducing lock-in at every level and maximizing an organization’s ability to future-proof decision-making – enabling data to be used as fuel for real-time competitive advantage.

EITN: What are the components required to make a data lakehouse work?

Andrew: Lakehouses have overcome the fundamental issues that have turned data lakes into undesirable data swamps by adding key data warehousing capabilities such as robust support for transactions, enforcement of data quality and governance, and rapid, optimized analytics. Generally, a lakehouse will feature five key layers:

  1. A data lake foundation that stores all organizational data – structured, semi-structured and unstructured
  2. A metadata layer that adds granular data quality and governance
  3. Optimizations for rapid, high-performance analytics
  4. Optimized access for machine learning and data science tools
  5. Open formats and open standards that prevent data lock-in
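To make the first two layers concrete, here is a minimal, hypothetical sketch of how a transactional metadata layer can sit on top of plain files in a data lake. It is not Databricks' actual Delta Lake implementation – the class and file names are invented for illustration – but it shows the core idea: data files only become visible once an atomic commit to an append-only log references them, which is what gives a lake warehouse-like reliability.

```python
import json
import os
import tempfile
from pathlib import Path

class TinyLakehouseTable:
    """Illustrative only: a table whose state is defined by a transaction log."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.log = self.root / "_txn_log.json"
        self.root.mkdir(parents=True, exist_ok=True)
        if not self.log.exists():
            self._write_log([])

    def _write_log(self, entries):
        # Write to a temp file, then atomically rename: readers always see
        # either the old log or the new one, never a partially written log.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "w") as f:
            json.dump(entries, f)
        os.replace(tmp, self.log)

    def commit(self, filename: str, rows):
        # Stage the data file first; it only becomes part of the table
        # once the log entry referencing it is committed.
        (self.root / filename).write_text(json.dumps(rows))
        entries = json.loads(self.log.read_text())
        entries.append({"version": len(entries), "add": filename})
        self._write_log(entries)

    def snapshot(self):
        # Reconstruct the current table state from the log alone; orphaned
        # files from failed writes are simply never read.
        rows = []
        for entry in json.loads(self.log.read_text()):
            rows.extend(json.loads((self.root / entry["add"]).read_text()))
        return rows
```

A quick usage sketch: committing two batches and reading back a consistent snapshot.

```python
table = TinyLakehouseTable("/tmp/sales")
table.commit("part-0.json", [{"sku": "A1", "qty": 3}])
table.snapshot()  # → [{"sku": "A1", "qty": 3}]
```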

EITN: What are the advantages of having industry-specific lakehouse solutions?

Andrew: The Lakehouse for Retail is Databricks’ first industry-specific Lakehouse, built specifically for retail and consumer goods customers. It addresses challenges the retail industry has long tried to crack but struggled with due to the limits of available technology.

Speed is the antidote to business volatility, and the lakehouse architecture, catered to specific industry nuances, is designed to give retailers the flexibility to adopt the capabilities they need to address their most pressing business needs, from driving real-time decisions to powering better experiences with shoppers to improving collaboration across the value chain and more.

EITN: What unique characteristics has Databricks observed about the retail and consumer goods industry that its industry-specific solution can address?

Andrew: The retail and consumer goods industry is unique in that it combines the demands of high volume, highly detailed and frequently changing information that must be available in near real-time, with the need to incorporate alternative data sets such as image, video and audio from digital and mobile channels, and the need to collaborate with a broad ecosystem of suppliers, customers and partners.

At the core of this industry-specific solution is a new, inexpensive and open method of data sharing and collaboration that powers real-time decisions with data, improves the accuracy of decisions, has native support for all types of data and opens interaction and innovation to all partners in the value chain.

EITN: What are solution accelerators? Are they offered as modules that can be added?

Andrew: Solution Accelerators are fully functional, proven capabilities from Databricks that help companies quickly prove the feasibility of solving a problem with data and AI. Companies can use a Solution Accelerator to quickly complete a pilot on a business problem, then use that as the foundation for a functional solution that can be tested in the market. Solution Accelerators sit at the core of critical use cases in retail, ranging from Demand Forecasting to Personalized Recommendations to On-shelf Availability, and can help customers save anywhere from 25 to 50 percent on development effort.


To help companies quickly realize value from their investment in data and AI, Databricks has invested in the creation of more than 20 Retail Solution Accelerators, which are made freely available to customers.

EITN: How does Databricks go to market with partners like Deloitte and Tredence?

Andrew: Going to market with key industry partners like Deloitte and Tredence first requires a commitment from Databricks to educate thousands of partner employees on the Lakehouse platform. We are also increasing investment in partners to help bring native Lakehouse solutions to their customers. Our partners, in turn, have developed pre-built solutions that give retailers a faster, proven path to value.