Dynamic Partitioning and a Simple Incremental Load


 

Consider a simple statement that partitions a table and saves it in a lakehouse:

df.write.mode("overwrite").format("delta").partitionBy("Year", "Month", "Day").save("Tables/" + table_name)

Suppose we load the data daily, and each load contains all the transactions from that day. The table stores each day's transactions in a separate partition. We can expect the table to keep the partitions from previous days, months and years, achieving a kind of incremental load, right?

Wrong.
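
To make the difference concrete, here is a minimal sketch, not necessarily the approach this article settles on. By default, mode("overwrite") replaces the whole contents of the table, not only the partitions present in the DataFrame. Delta Lake also offers a dynamic partition overwrite mode, set here through the partitionOverwriteMode writer option, which replaces only the partitions the incoming DataFrame has data for. The helper name write_daily_load and its dynamic flag are made up for illustration, and the sketch assumes a Delta Lake version that supports this option (2.0 or later).

```python
def write_daily_load(df, table_name, dynamic=True):
    """Write one day's DataFrame to a partitioned Delta table.

    Hypothetical helper for illustration. df is expected to be a PySpark
    DataFrame that has Year, Month and Day columns.
    """
    writer = (
        df.write
        .mode("overwrite")
        .format("delta")
        .partitionBy("Year", "Month", "Day")
    )
    if dynamic:
        # Assumption: the runtime's Delta Lake supports dynamic partition
        # overwrite. Only partitions present in df are replaced.
        writer = writer.option("partitionOverwriteMode", "dynamic")
    # Same relative path convention as the one-line statement.
    writer.save("Tables/" + table_name)
```

With dynamic=True, a call such as write_daily_load(df, table_name) with one day's data should leave the earlier Year/Month/Day partitions in place. With dynamic=False, the helper behaves like the one-line statement and replaces the whole table.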

The files of the earlier partitions are neither overwritten nor deleted, but they are removed from the Delta log. The records are not directly accessible in the