Exploring Azure SQL Ledger Tables

“Trust, but verify.”

– Ronald Reagan

Can I trust you?

Many organizations have databases holding data and managing processes that others rely on. Customers, clients, and various stakeholders trust organizations with their data. They believe those organizations will neither defraud them nor compromise the accuracy of the data. There are many good reasons to trust some organizations, ranging from reputation to laws. However, sometimes even the best go astray or fall victim to hackers or others with malicious intent.

The Bitcoin cryptocurrency deals with trust issues in the realm of currency. Underlying Bitcoin is the blockchain: a public, verifiable record of transactions that everyone can trust without needing to trust or believe any individual participant in the system.

To bring this level of trust to databases, Azure SQL is introducing Ledger tables. These are tables where a record of transactions can be verified using an external ledger similar to a blockchain.

In this blog post, I will go through an example of these new tables and explain how the system works.

What Is a Ledger Table?

Ledger tables are SQL tables backed by external ledgers that ensure data integrity. Ledger tables come in two varieties: append-only and updateable.

Append-only tables allow only INSERT operations, which protects the data from later changes. Updateable tables allow inserts, updates, and deletes.
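To illustrate the difference, here is a minimal sketch of an append-only table; the table name and data are my own invention, but APPEND_ONLY = ON is the option that selects this variety:

CREATE TABLE [dbo].[appendOnlyDemo]
(
    [Letter] CHAR(1),
    [Number] INT
)
WITH (LEDGER = ON (APPEND_ONLY = ON));

INSERT INTO [dbo].[appendOnlyDemo] VALUES ('A', 1); --allowed

--Both of these statements fail with an error on an append-only table
UPDATE [dbo].[appendOnlyDemo] SET [Number] = 2 WHERE [Letter] = 'A';
DELETE FROM [dbo].[appendOnlyDemo] WHERE [Letter] = 'A';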

The database stores the entire history of all the ledger tables. Append-only tables, by their nature, store their own history. Updateable tables automatically create a history table that stores all the changes to the table.

Table history is also summarized in the form of ‘digests’. These are JSON objects containing a hash of the state of the database at a given time. A sequence of these digests, also called blocks, forms the ledger blockchain.

A hash value is produced by a hash algorithm. These algorithms can take any amount of input and produce a fixed-length value, the hash value. A given algorithm will always produce the same hash value for the same input data. However, it is computationally infeasible to re-create the input data from the hash value. So the hash is a way of summarizing data without revealing the data.
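You can see these properties with T-SQL’s built-in HASHBYTES function (the input strings here are arbitrary examples of my own):

--The same input always produces the same fixed-length SHA-256 hash
SELECT HASHBYTES('SHA2_256', 'E5F6O15') AS hash_original;
--Changing a single character produces a completely different hash
SELECT HASHBYTES('SHA2_256', 'E5F6O16') AS hash_changed;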

Because the history of the database is available, any block (or digest) can be re-created from the database and compared to those in the ledger. If the data is valid and tamper-free, the re-created digests will match the ledger, confirming the data is trustworthy.

Demo

I’m going to walk through the process of creating and working with an updateable ledger table to provide a more detailed explanation of how these tables work. The process for an append-only table is similar and I will mention differences as they come up.

In order to create ledger tables, you first have to create a new database with the ledger enabled (see this guide). In my case, I chose the option to make all tables ledger tables.
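The guide uses the Azure portal, but the same database can be created in T-SQL; a minimal sketch, assuming you are connected to an Azure SQL logical server with permission to create databases:

--Every table created in this database will be a ledger table by default
CREATE DATABASE ledgerBlog WITH LEDGER = ON;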

While configuring a new database, you will be able to choose where to store your ledger. The generally available option is Azure storage, such as Data Lake Storage Gen2. For extra security, an append-only (immutable) Blob prevents anyone from tampering with the ledger.

Currently in preview is Azure Confidential Ledger, a highly secure storage location that is safe even from Microsoft employees. According to the documentation: “…no one is “above” the Ledger, not even Microsoft.”

After creating the database, the first step is to create a table and insert some data.

CREATE TABLE [dbo].[updateableLedger]
(
    [Letter] CHAR(1),
    [Number] INT
)
WITH (
    SYSTEM_VERSIONING = ON,
    LEDGER = ON         --Technically redundant in my case, since the whole database is ledger-enabled
    (APPEND_ONLY = OFF) --Setting this to ON creates an append-only table instead
);

INSERT INTO [dbo].[updateableLedger]
VALUES ('E', 5), ('F', 6), ('O', 15);

At this point, the ledger is starting to form and transactions are being assigned to blocks. To get a look at it, I can generate a digest showing the latest block. If there is an open block, running the stored procedure closes it to produce the digest.

EXECUTE sp_generate_database_ledger_digest;

{
    "database_name": "ledgerBlog",
    "block_id": 1,
    "hash": "0xB4412B2F4D6269EC19B3999E15FAF1CC97D275E671CA5A179267D0786575FE8D",
    "last_transaction_commit_time": "2022-03-22T14:37:45.9600000",
    "digest_time": "2022-03-22T14:40:57.3581867"
}

This digest shows block #1, which is the second block because the ledger starts at block #0. The digest shows the time of the last transaction and the digest’s creation time. The hash is the key piece, as it encapsulates the current state of the database’s ledger tables. Digests do not store actual data, just hashed versions of it, so these digests could be sent to various stakeholders as an additional layer of security.

Next, I do a few more transactions and create a new digest.

UPDATE dbo.[updateableLedger]
SET [Number] = 0
WHERE [Letter] = 'O';

DELETE FROM dbo.[updateableLedger]
WHERE [Letter] = 'F';

EXECUTE sp_generate_database_ledger_digest;

{
    "database_name": "ledgerBlog",
    "block_id": 2,
    "hash": "0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8E",
    "last_transaction_commit_time": "2022-03-22T14:44:36.0333333",
    "digest_time": "2022-03-22T14:46:24.2223514"
}

The new transactions have created a new block with a new hash value.

Here is the current state of the table, along with several system-generated columns that must be explicitly selected. These extra columns give metadata relating each row to the ledger.

SELECT *,
    ledger_start_transaction_id,
    ledger_end_transaction_id,
    ledger_start_sequence_number,
    ledger_end_sequence_number
FROM [dbo].[updateableLedger];

Letter  Number  ledger_start_transaction_id  ledger_end_transaction_id  ledger_start_sequence_number  ledger_end_sequence_number
E       5       1170                         NULL                       0                             NULL
O       0       1187                         NULL                       0                             NULL

The ledger_start_transaction_id column gives the ID of the transaction that created that version of the row (like the created date in a Type 2 Slowly Changing Dimension, except it uses transaction IDs instead of dates). ledger_end_transaction_id gives the transaction that ended that row version; it is NULL here because both rows are current. ledger_start_sequence_number and ledger_end_sequence_number give the sequence number of the operation within each of those transactions.
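If you want to see which block a transaction landed in, when it committed, and who ran it, these transaction IDs can be joined to the sys.database_ledger_transactions catalog view. A quick sketch (I am selecting only a handful of the view’s columns):

--Tie each current row version back to its ledger transaction metadata
SELECT t.[Letter],
       t.[Number],
       lt.transaction_id,
       lt.block_id,        --the block (digest) that contains the transaction
       lt.commit_time,     --when the transaction committed
       lt.principal_name   --who executed the transaction
FROM [dbo].[updateableLedger] AS t
JOIN sys.database_ledger_transactions AS lt
    ON lt.transaction_id = t.ledger_start_transaction_id;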

Along with the main table, a history table and a view are also created. Below is the history table. It stores the previous values of the rows, so there is a record of every earlier version and where it shows up in the ledger. History tables are only created for updateable tables, and they contain the same extra columns as the main table.

SELECT *
FROM [dbo].[MSSQL_LedgerHistoryFor_1525580473]
ORDER BY ledger_start_sequence_number, ledger_start_transaction_id;

Letter  Number  ledger_start_transaction_id  ledger_end_transaction_id  ledger_start_sequence_number  ledger_end_sequence_number
F       6       1170                         1190                       1                             0
O       15      1170                         1187                       2                             1

There is also a ledger view of the table, which displays every transaction and its operation type along with the ledger metadata.

SELECT *
FROM [dbo].[updateableLedger_Ledger]
ORDER BY ledger_transaction_id, ledger_sequence_number;

Letter  Number  ledger_transaction_id  ledger_sequence_number  ledger_operation_type  ledger_operation_type_desc
E       5       1170                   0                       1                      INSERT
F       6       1170                   1                       1                      INSERT
O       15      1170                   2                       1                      INSERT
O       0       1187                   0                       1                      INSERT
O       15      1187                   1                       2                      DELETE
F       6       1190                   0                       2                      DELETE

The Ledger

For this demo, I stored my ledger in Data Lake Storage Gen2. The database created a file there to store the ledger. Below are the contents of that file.

{
    "database_name":"ledgerBlog",
    "block_id":0,
    "hash":"0xD10F345A6FDC137BB49E9F32DF507A58CC4EAC2D16D805B062C5F960289ED52B",
    "last_transaction_commit_time":"2022-03-22T14:35:33.9100000",
    "digest_time":"2022-03-22T14:35:53.3518898"
}{
    "database_name":"ledgerBlog",
    "block_id":1,
    "hash":"0xB4412B2F4D6269EC19B3999E15FAF1CC97D275E671CA5A179267D0786575FE8D",
    "last_transaction_commit_time":"2022-03-22T14:37:45.9600000",
    "digest_time":"2022-03-22T14:38:03.7306141"
}{
    "database_name":"ledgerBlog",
    "block_id":2,
    "hash":"0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8E",
    "last_transaction_commit_time":"2022-03-22T14:44:36.0333333",
    "digest_time":"2022-03-22T14:44:49.9397197"
}

As you can see, the content of the ledger is a series of database digests.

Verification

To verify whether there has been any tampering with the data, two stored procedures are available. The first requires you to provide one or more digests for the verification, as shown below.

DECLARE @digest NVARCHAR(MAX) = N'{"database_name":"ledgerBlog","block_id":2,"hash":"0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8E","last_transaction_commit_time":"2022-03-22T14:44:36.0333333","digest_time":"2022-03-22T14:46:24.2223514"}';

EXECUTE sp_verify_database_ledger @digest;

Complete verification requires reading through the entire ledger. The Microsoft-provided code below collects the entire ledger from its storage location(s), re-creates the digests from the database history, and compares the two. If the comparison passes, the entire database is verified.

--Getting the ledger locations and putting them into JSON format
DECLARE @digest_locations NVARCHAR(MAX) = (SELECT * FROM sys.database_ledger_digest_locations FOR JSON AUTO, INCLUDE_NULL_VALUES);
SELECT @digest_locations AS digest_locations;

--Verifying the database against the ledger
BEGIN TRY
    EXEC sys.sp_verify_database_ledger_from_digest_storage @digest_locations;
    SELECT 'Ledger verification succeeded.' AS Result;
END TRY
BEGIN CATCH
    THROW;
END CATCH

These stored procedures can detect any tampering with the data and history of the database.
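To see what detection looks like, you can hand sp_verify_database_ledger a digest whose hash does not match the real one; the procedure raises an error rather than reporting success. A sketch, where I have deliberately corrupted the block 2 hash:

--Digest for block 2 with the final hex digit of the hash altered (…8E became …8F)
DECLARE @tampered NVARCHAR(MAX) = N'{"database_name":"ledgerBlog","block_id":2,"hash":"0x540199FD6F3C72E8A5F6923CA20B5F580607EEA1CD0FBDAA9B0E3C3953323E8F","last_transaction_commit_time":"2022-03-22T14:44:36.0333333","digest_time":"2022-03-22T14:46:24.2223514"}';

BEGIN TRY
    EXECUTE sp_verify_database_ledger @tampered;
    SELECT 'Ledger verification succeeded.' AS Result;
END TRY
BEGIN CATCH
    --The hash mismatch surfaces here as an error
    SELECT ERROR_MESSAGE() AS Result;
END CATCH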

Conclusion

Trusting the validity of data is becoming more important all the time. Ledger tables provide a way to ensure the integrity of data and to detect any tampering that may occur. They allow you, and anyone else who depends on the database, to be confident the data is trustworthy. In cases where multiple businesses work together on projects, blockchains are already used to establish trust between parties. Bringing ledgers to SQL Server lets many businesses get that trust from technology they are already familiar with.
