Lakehouse Concept | Data Warehouse | Data Lake

Lakehouse concept:

In the context of data management and analytics, a "lakehouse" refers to a modern data architecture that combines the capabilities of data lakes and data warehouses. It aims to address the limitations and challenges associated with these traditional data storage and processing approaches.

A data lakehouse provides a unified and scalable platform for storing, managing, and analyzing large volumes of structured and unstructured data. It incorporates the following key features:

Data Storage: Similar to a data lake, a lakehouse stores raw, unprocessed data in its native format. This includes structured data (e.g., relational tables, CSV files), semi-structured data (e.g., logs, sensor readings), and unstructured data (e.g., images). Because everything lands in a common storage layer, such as a distributed file system or a cloud object store, data can be ingested from various sources without requiring upfront schema design or transformation.


Ref.: https://www.databricks.com/wp-content/uploads/2020/01/data-lakehouse-new-1024x538.png
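To make the raw-storage idea above concrete, here is a minimal Python sketch that is not tied to any particular lakehouse product. The directory layout (raw/<source>/<date>/), the function names land_raw, read_orders and read_events, and the sample files are all assumptions made for illustration. Files are written byte-for-byte on ingestion, and column types are applied only when the data is read.

```python
import csv
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes, day: date) -> Path:
    """Store the payload unchanged under raw/<source>/<YYYY-MM-DD>/ (hypothetical layout)."""
    target_dir = lake_root / "raw" / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # no parsing and no schema: bytes are kept as received
    return target


def read_orders(path: Path) -> list:
    """Schema-on-read: column types are applied only when the file is consumed."""
    with path.open(newline="", encoding="utf-8") as handle:
        return [
            {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
            for row in csv.DictReader(handle)
        ]


def read_events(path: Path) -> list:
    """Semi-structured JSON lines are parsed one object per line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Small demonstration in a temporary directory standing in for the storage layer.
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    day = date(2024, 1, 15)
    csv_path = land_raw(lake, "shop", "orders.csv", b"order_id,amount\n1,9.50\n2,20.00\n", day)
    log_path = land_raw(lake, "web", "events.jsonl", b'{"event": "click", "user": 7}\n', day)
    print(read_orders(csv_path))
    print(read_events(log_path))
```

Because nothing is parsed at ingestion time, a malformed file is only discovered when somebody reads it, which is the trade-off that schema enforcement on curated tables is meant to address.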

ACID Transactions: Unlike traditional data lakes, a lakehouse provides ACID (Atomicity, Consistency, Isolation, Durability) transactional guarantees. This means that data can be updated, deleted, and queried reliably, even while several jobs read and write the same tables, which protects consistency and data integrity. These guarantees let batch and real-time workloads share tables without readers seeing partially written results.
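One way to get atomic commits on top of plain files is to keep an ordered commit log next to the data files and to treat a write as visible only once its log entry exists. The toy sketch below illustrates that idea only. The ToyTable class, the _log directory, and the file naming are hypothetical, and this is not the implementation of any real table format; production systems also need an atomic "create if absent" operation from the storage layer, which is approximated here with exclusive file creation on a local disk.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Optional


class ToyTable:
    """Toy table: data files plus an ordered commit log (hypothetical layout)."""

    def __init__(self, root: Path) -> None:
        self.data_dir = root / "data"
        self.log_dir = root / "_log"
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def commit(self, rows: list, version: Optional[int] = None) -> int:
        """Write a data file, then publish it by creating log entry <version>.json."""
        if version is None:
            version = len(list(self.log_dir.glob("*.json")))
        data_file = self.data_dir / f"part-{uuid.uuid4().hex}.json"
        data_file.write_text(json.dumps(rows), encoding="utf-8")  # not visible to readers yet
        log_entry = self.log_dir / f"{version:05d}.json"
        try:
            # Mode "x" fails if the entry already exists, so one writer wins each version.
            with log_entry.open("x", encoding="utf-8") as handle:
                json.dump({"add": data_file.name}, handle)
        except FileExistsError:
            data_file.unlink()  # roll back: this data file was never referenced by the log
            raise
        return version

    def read(self) -> list:
        """Readers see only data files referenced by the log, in commit order."""
        rows = []
        for entry in sorted(self.log_dir.glob("*.json")):
            added = json.loads(entry.read_text(encoding="utf-8"))["add"]
            rows.extend(json.loads((self.data_dir / added).read_text(encoding="utf-8")))
        return rows


with tempfile.TemporaryDirectory() as tmp:
    table = ToyTable(Path(tmp))
    table.commit([{"id": 1}])
    table.commit([{"id": 2}])
    try:
        table.commit([{"id": 99}], version=1)  # simulates a writer that lost the race
    except FileExistsError:
        print("conflict detected, commit rejected")
    print(table.read())
```

The losing writer leaves no trace: its data file is deleted and the log never pointed at it, so readers observe either the whole commit or none of it.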

Schema Enforcement: A lakehouse allows for schema enforcement and schema evolution. A schema can be defined for a table and checked whenever data is written to it, so that records which do not match the expected structure are rejected rather than silently stored. Schema evolution lets that structure change over time, for example by adding a column. This feature makes it easier to maintain data quality, enforce governance policies, and enable self-service analytics.
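As a rough illustration of schema enforcement and schema evolution, the following Python sketch validates rows against a declared schema and allows the schema to grow by adding columns. The Schema alias and the enforce and evolve functions are hypothetical names, and the rules (unknown columns rejected, absent columns treated as null, existing column types frozen) are simplifying assumptions rather than the behaviour of any specific product.

```python
from typing import Any, Dict

Schema = Dict[str, type]


def enforce(schema: Schema, row: Dict[str, Any]) -> Dict[str, Any]:
    """Return the row in schema order, or raise if it does not fit the schema."""
    unknown = set(row) - set(schema)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    checked = {}
    for column, expected in schema.items():
        value = row.get(column)  # absent columns are treated as null (None)
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"{column}: expected {expected.__name__}, got {type(value).__name__}"
            )
        checked[column] = value
    return checked


def evolve(schema: Schema, new_columns: Schema) -> Schema:
    """Additive evolution: new columns may be added, existing types may not change."""
    for column, new_type in new_columns.items():
        if column in schema and schema[column] is not new_type:
            raise ValueError(f"cannot change type of existing column {column}")
    return {**schema, **new_columns}


orders_v1 = {"order_id": int, "amount": float}
print(enforce(orders_v1, {"order_id": 1, "amount": 9.5}))

orders_v2 = evolve(orders_v1, {"currency": str})
# A row written before the evolution still passes; the new column is simply null.
print(enforce(orders_v2, {"order_id": 2, "amount": 20.0}))
```

Treating absent columns as null is what lets rows written under the old schema remain valid after a column is added.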

Data Processing: A lakehouse incorporates data processing capabilities, typically using distributed processing frameworks like Apache Spark. This enables data transformation, cleansing, aggregation, and other data preparation tasks. The processing capabilities are integrated within the same platform, eliminating the need to move data between different systems.
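The shape of a typical preparation step, cleansing followed by aggregation, can be sketched in a few lines. Plain Python stands in here for a distributed engine such as Apache Spark so that the example runs anywhere; the function names cleanse and total_by_country and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def cleanse(rows: list) -> list:
    """Drop records without a usable amount and normalise the country code."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric amount: skip the record
        country = str(row.get("country", "")).strip().upper()
        cleaned.append({"country": country, "amount": amount})
    return cleaned


def total_by_country(rows: list) -> dict:
    """Aggregate the cleaned records into one total per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


raw_orders = [
    {"country": " de", "amount": "10.5"},
    {"country": "DE", "amount": 4.5},
    {"country": "fr", "amount": None},  # dropped by cleanse
    {"country": "fr", "amount": "2"},
]
print(total_by_country(cleanse(raw_orders)))
```

In a lakehouse the same two steps would run inside the platform against the stored tables, so the raw and the prepared data never leave the common storage layer.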

Querying and Analytics: A lakehouse provides SQL-based querying capabilities, allowing users to perform ad-hoc and complex analytics directly on the stored data. It supports both batch processing and real-time streaming analytics, enabling organizations to derive insights and make data-driven decisions in a timely manner.
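An ad-hoc analytical query of the kind described above might look like the following. This sketch uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a lakehouse SQL engine; the orders table and its columns are hypothetical, and a real lakehouse would run the same style of SQL directly against tables in the storage layer.

```python
import sqlite3

# In-memory database standing in for a lakehouse SQL endpoint (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 10.5), (2, "DE", 4.5), (3, "FR", 2.0)],
)

# Ad-hoc aggregation: order count and revenue per country, highest revenue first.
query = """
    SELECT country, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY country
    ORDER BY revenue DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [('DE', 2, 15.0), ('FR', 1, 2.0)]
conn.close()
```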

By combining the strengths of data lakes and data warehouses, a lakehouse architecture aims to provide a more flexible, scalable, and efficient approach to managing and analyzing data. It enables organizations to store and process data in a cost-effective manner while supporting a wide range of data analytics use cases.
