Unzipping Word Documents in T-SQL


In the databases I am currently working with, some tables contain Word documents stored as binary data, and I was wondering whether it would be possible to process these documents on the server side.

Modern Word documents (.docx) are nothing more than ZIP archives, and SQL Server 2016 introduced the COMPRESS and DECOMPRESS functions, which work with the GZIP format. It turns out that the various XML files that make up a Word document are compressed with DEFLATE, the same algorithm that GZIP uses (while other files, like images, are stored uncompressed).
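
As a quick sanity check of these two building blocks, here is a minimal sketch that round-trips a value through COMPRESS and DECOMPRESS and tests a binary value for the ZIP local file header signature. The variable names and the sample bytes in @doc are made up for illustration; in practice @doc would be read from whichever column holds the document.

```sql
-- Requires SQL Server 2016 or later.

-- 1. Round trip through COMPRESS / DECOMPRESS.
DECLARE @original VARBINARY(MAX) = CAST(N'Hello, Word!' AS VARBINARY(MAX));
DECLARE @gzipped  VARBINARY(MAX) = COMPRESS(@original);

SELECT
    -- A GZIP stream starts with the magic bytes 0x1F8B, followed by 0x08 (DEFLATE).
    SUBSTRING(@gzipped, 1, 3)                   AS GzipHeaderStart,
    -- DECOMPRESS returns VARBINARY(MAX), so cast back to the original type.
    CAST(DECOMPRESS(@gzipped) AS NVARCHAR(MAX)) AS RoundTrip;

-- 2. A ZIP archive (and therefore a .docx) begins with a local file header
--    whose signature is the bytes 'P', 'K', 0x03, 0x04.
--    @doc is a hypothetical stand-in for the first bytes of a stored document.
DECLARE @doc VARBINARY(MAX) = 0x504B030414000600;

SELECT CASE WHEN SUBSTRING(@doc, 1, 4) = 0x504B0304
            THEN 1 ELSE 0
       END AS LooksLikeZip;
```

The signature test only shows that the value starts like a ZIP archive; it says nothing about whether the rest of the file is well-formed.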

We thus need to understand the binary format of a ZIP file, write some T-SQL