Relational databases are known for their ACID properties: atomicity, consistency, isolation, and durability. Apache Spark 1.0 was released in 2014. However, tables in the Hive catalog did not have any ACID properties at that time, because the Hive catalog only stored the schema that was applied to the source files at read time. Databricks released the Delta file format in 2019, which changed how big data engineers insert, update, and delete data within a data lake.
The image below was taken from Mathew Powers’ article on Delta Lake time travel. It shows how Spark SQL statements (insert,