Exploring Azure SQL Ledger Tables

“Trust, but verify.”

– Ronald Reagan

Can I trust you?

Many organizations have databases holding data and managing processes that others rely on. Customers, clients, and various stakeholders trust organizations with their data. They believe those organizations will neither defraud them nor compromise the accuracy of the data. There are many good reasons to trust some organizations, ranging from reputation to laws. However, sometimes even the best go astray or fall victim to hackers or others with malicious intent.

The Bitcoin cryptocurrency deals with trust issues in the realm of currency. Underlying Bitcoin is the blockchain: a public, verifiable record of transactions that everyone can trust without needing to trust or believe any individual participant in the system.

To bring this level of trust to databases, Azure SQL is introducing Ledger tables. These are tables where a record of transactions can be verified using an external ledger similar to a blockchain.

In this blog post, I will go through an example of these new tables and explain how the system works.

What Is a Ledger Table?

Ledger tables are SQL tables backed by external ledgers that ensure data integrity. Ledger tables come in two varieties: append-only and updateable.

Append-only tables allow only INSERT operations, which protects the data from later changes. Updateable tables allow inserts, updates, and deletes.
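To illustrate the difference, here is a minimal sketch of an append-only table; the table name and data are my own invention, but APPEND_ONLY = ON is the option that selects this variety:

CREATE TABLE [dbo].[appendOnlyDemo]
(
    [Letter] CHAR(1),
    [Number] INT
)
WITH (LEDGER = ON (APPEND_ONLY = ON));

INSERT INTO [dbo].[appendOnlyDemo] VALUES ('A', 1); --allowed

--Both of these statements fail with an error on an append-only table
UPDATE [dbo].[appendOnlyDemo] SET [Number] = 2 WHERE [Letter] = 'A';
DELETE FROM [dbo].[appendOnlyDemo] WHERE [Letter] = 'A';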

The database stores the entire history of all the ledger tables. Append-only tables, by their nature, store their own history. Updateable tables automatically create a history table that stores all the changes to the table.

Table history is also summarized in the form of ‘digests’. These are JSON objects containing a hash of the state of the database at a given time. A sequence of these digests, also called blocks, forms the ledger blockchain.

A hash value is produced by a hash algorithm. These algorithms can take any amount of input and produce a fixed-length value, the hash value. A given algorithm will always produce the same hash value for the same input data. However, it is computationally infeasible to re-create the input data from the hash value. So the hash is a way of summarizing data without revealing the data.
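You can see these properties with T-SQL’s built-in HASHBYTES function (the input strings here are arbitrary examples of my own):

--The same input always produces the same fixed-length SHA-256 hash
SELECT HASHBYTES('SHA2_256', 'E5F6O15') AS hash_original;
--Changing a single character produces a completely different hash
SELECT HASHBYTES('SHA2_256', 'E5F6O16') AS hash_changed;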

Because the history of the database is available, any block (or digest) can be re-created from the database and compared to those in the ledger. If the data is valid and tamper-free, the re-created digests will match the ledger, confirming the data is trustworthy.

Demo

I’m going to walk through the process of creating and working with an updateable ledger table to provide a more detailed explanation of how these tables work. The process for an append-only table is similar and I will mention differences as they come up.

In order to create ledger tables, you first have to create a new database with the ledger enabled (see this guide). In my case, I chose the option to make all tables ledger tables.
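The guide uses the Azure portal, but the same database can be created in T-SQL; a minimal sketch, assuming you are connected to an Azure SQL logical server with permission to create databases:

--Every table created in this database will be a ledger table by default
CREATE DATABASE ledgerBlog WITH LEDGER = ON;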

While configuring a new database, you will be able to choose where to store your ledger. The generally available option is Azure storage, such as Data Lake Storage Gen2. For extra security, an append-only (immutable) Blob prevents anyone from tampering with the ledger.

Currently in preview is Azure Confidential Ledger, a highly secure storage location that is safe even from Microsoft employees. According to the documentation: “…no one is “above” the Ledger, not even Microsoft.”

After creating the database, the first step is to create a table and insert some data.

CREATE TABLE [dbo].[updateableLedger]
(
    [Letter] CHAR(1),
    [Number] INT
)
WITH (
    SYSTEM_VERSIONING = ON,
    LEDGER = ON         --Technically redundant in my case, since the whole database is ledger-enabled
    (APPEND_ONLY = OFF) --Setting this to ON creates an append-only table instead
);

INSERT INTO [dbo].[updateableLedger]
VALUES ('E', 5), ('F', 6), ('O', 15);

At this point, the ledger is starting to form and transactions are being assigned to blocks. To get a look at it, I can generate a digest showing the latest block. If there is an open block, running the stored procedure closes it to produce the digest.

EXECUTE sp_generate_database_ledger_digest;

{
    "database_name": "ledgerBlog",
    "block_id": 1,
    "hash": "0xB4412B2F4D6269EC19B3999E15FAF1CC97D275E671CA5A179267D0786575FE8D",
    "last_transaction_commit_time": "2022-03-22T14:37:45.9600000",
    "digest_time": "2022-03-22T14:40:57.3581867"
}

This digest shows block #1, which is the second block because the ledger starts at block #0. The digest shows the time of the last transaction and the digest’s creation time. The hash is the key piece, as it encapsulates the current state of the database’s ledger tables. Digests do not store actual data, just hashed versions of it, so these digests could be sent to various stakeholders as an additional layer of security.

Next, I do a few more transactions and create a new digest.

UPDATE dbo.[updateableLedger]
SET [Number] = 0
WHERE [Letter] = 'O';

DELETE FROM dbo.[updateableLedger]
WHERE [Letter] = 'F';

EXECUTE sp_generate_database_ledger_digest;

{
    "database_name": "ledgerBlog",
    "block_id": 2,
    "hash": "0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8E",
    "last_transaction_commit_time": "2022-03-22T14:44:36.0333333",
    "digest_time": "2022-03-22T14:46:24.2223514"
}

The new transactions have created a new block with a new hash value.

Here is the current state of the table, along with several system-generated columns that must be explicitly selected. These extra columns give metadata relating each row to the ledger.

SELECT *,
    ledger_start_transaction_id,
    ledger_end_transaction_id,
    ledger_start_sequence_number,
    ledger_end_sequence_number
FROM [dbo].[updateableLedger];

Letter  Number  ledger_start_transaction_id  ledger_end_transaction_id  ledger_start_sequence_number  ledger_end_sequence_number
E       5       1170                         NULL                       0                             NULL
O       0       1187                         NULL                       0                             NULL

The ledger_start_transaction_id column gives the ID of the transaction that created that version of the row (like the created date in a Type 2 Slowly Changing Dimension, except it uses transaction IDs instead of dates). ledger_end_transaction_id gives the transaction that ended that row version; it is NULL here because both rows are current. ledger_start_sequence_number and ledger_end_sequence_number give the sequence number of the operation within each of those transactions.
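If you want to see which block a transaction landed in, when it committed, and who ran it, these transaction IDs can be joined to the sys.database_ledger_transactions catalog view. A quick sketch (I am selecting only a handful of the view’s columns):

--Tie each current row version back to its ledger transaction metadata
SELECT t.[Letter],
       t.[Number],
       lt.transaction_id,
       lt.block_id,        --the block (digest) that contains the transaction
       lt.commit_time,     --when the transaction committed
       lt.principal_name   --who executed the transaction
FROM [dbo].[updateableLedger] AS t
JOIN sys.database_ledger_transactions AS lt
    ON lt.transaction_id = t.ledger_start_transaction_id;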

Along with the main table, a history table and a view are also created. Below is the history table. It stores the previous values of the rows, so there is a record of every earlier version and where it shows up in the ledger. History tables are only created for updateable tables, and they contain the same extra columns as the main table.

SELECT *
FROM [dbo].[MSSQL_LedgerHistoryFor_1525580473]
ORDER BY ledger_start_sequence_number, ledger_start_transaction_id;

Letter  Number  ledger_start_transaction_id  ledger_end_transaction_id  ledger_start_sequence_number  ledger_end_sequence_number
F       6       1170                         1190                       1                             0
O       15      1170                         1187                       2                             1

There is also a ledger view of the table, which displays every transaction and its operation type along with the ledger metadata.

SELECT *
FROM [dbo].[updateableLedger_Ledger]
ORDER BY ledger_transaction_id, ledger_sequence_number;

Letter  Number  ledger_transaction_id  ledger_sequence_number  ledger_operation_type  ledger_operation_type_desc
E       5       1170                   0                       1                      INSERT
F       6       1170                   1                       1                      INSERT
O       15      1170                   2                       1                      INSERT
O       0       1187                   0                       1                      INSERT
O       15      1187                   1                       2                      DELETE
F       6       1190                   0                       2                      DELETE

The Ledger

For this demo, I stored my ledger in Data Lake Storage Gen2. The database created a file there to store the ledger. Below are the contents of that file.

{
    "database_name":"ledgerBlog",
    "block_id":0,
    "hash":"0xD10F345A6FDC137BB49E9F32DF507A58CC4EAC2D16D805B062C5F960289ED52B",
    "last_transaction_commit_time":"2022-03-22T14:35:33.9100000",
    "digest_time":"2022-03-22T14:35:53.3518898"
}{
    "database_name":"ledgerBlog",
    "block_id":1,
    "hash":"0xB4412B2F4D6269EC19B3999E15FAF1CC97D275E671CA5A179267D0786575FE8D",
    "last_transaction_commit_time":"2022-03-22T14:37:45.9600000",
    "digest_time":"2022-03-22T14:38:03.7306141"
}{
    "database_name":"ledgerBlog",
    "block_id":2,
    "hash":"0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8E",
    "last_transaction_commit_time":"2022-03-22T14:44:36.0333333",
    "digest_time":"2022-03-22T14:44:49.9397197"
}

As you can see, the content of the ledger is a series of database digests.

Verification

To verify whether there has been any tampering with the data, two stored procedures are available. The first requires you to provide one or more digests for the verification, as shown below.

DECLARE @digest NVARCHAR(MAX) = N'{"database_name":"ledgerBlog","block_id":2,"hash":"0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8E","last_transaction_commit_time":"2022-03-22T14:44:36.0333333","digest_time":"2022-03-22T14:46:24.2223514"}';

EXECUTE sp_verify_database_ledger @digest;

Complete verification requires reading through the entire ledger. The Microsoft-provided code below collects the entire ledger from its storage location(s), re-creates the digests from the database history, and compares the two. If the comparison passes, the entire database is verified.

--Getting the ledger locations and putting them into JSON format
DECLARE @digest_locations NVARCHAR(MAX) = (SELECT * FROM sys.database_ledger_digest_locations FOR JSON AUTO, INCLUDE_NULL_VALUES);
SELECT @digest_locations AS digest_locations;

--Verifying the database against the ledger
BEGIN TRY
    EXEC sys.sp_verify_database_ledger_from_digest_storage @digest_locations;
    SELECT 'Ledger verification succeeded.' AS Result;
END TRY
BEGIN CATCH
    THROW;
END CATCH

These stored procedures can detect any tampering with the data and history of the database.
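To see what detection looks like, you can hand sp_verify_database_ledger a digest whose hash does not match the real one; the procedure raises an error rather than reporting success. A sketch, where I have deliberately corrupted the block 2 hash:

--Digest for block 2 with the final hex digit of the hash altered (…8E became …8F)
DECLARE @tampered NVARCHAR(MAX) = N'{"database_name":"ledgerBlog","block_id":2,"hash":"0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8F","last_transaction_commit_time":"2022-03-22T14:44:36.0333333","digest_time":"2022-03-22T14:46:24.2223514"}';

BEGIN TRY
    EXECUTE sp_verify_database_ledger @tampered;
    SELECT 'Ledger verification succeeded.' AS Result;
END TRY
BEGIN CATCH
    --The hash mismatch surfaces here as an error
    SELECT ERROR_MESSAGE() AS Result;
END CATCH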

Conclusion

Trusting the validity of data is becoming more important all the time. Ledger tables provide a way to ensure the integrity of data and to detect any tampering that may occur. They allow you, and anyone else who depends on the database, to be confident the data is trustworthy. In cases where multiple businesses work together on projects, blockchains are already used to establish trust between parties. Bringing ledgers to SQL Server lets many businesses get that trust from technology they are already familiar with.
