The advent of Bitcoin paved the way for a plethora of blockchain systems supporting diverse applications beyond cryptocurrencies. Although in-depth studies of the consensus protocols as well as the privacy of blockchain transactions are available, there is no formal model of the transaction semantics that a blockchain is supposed to guarantee.
In this work, we fill this gap, motivated by the observation that the semantics of transactions in blockchain systems can be captured by a directed acyclic graph. Such a transaction graph, or TDAG, generally consists of the states and the transactions as transitions between the states, together with conditions for the consistency and validity of transactions. We instantiate the TDAG model for three prominent blockchain systems: Bitcoin, Ethereum, and Hyperledger Fabric. We specify the states and transactions as well as the validity conditions of the TDAG for each one. This demonstrates the applicability of the model and formalizes the transaction-level semantics that these systems aim for.
The success of Bitcoin has sparked the development of many other blockchain systems. Whereas the first blockchains after Bitcoin (called alt-coins) resembled the cryptocurrency functionality offered by Bitcoin and mostly differed in the choice of certain parameters, Ethereum was the pioneer of so-called smart contract systems that support arbitrary (deterministic) computation on the blockchain. Platforms for running smart contracts are seen to be of widespread interest for replacing trusted parties, whether in public blockchains where participation is open to anyone or in private blockchains inside a consortium.
Many recent blockchain platforms run generic computations, model specific asset classes, or add cryptographic privacy guarantees; prominent systems today include Hyperledger Fabric, R3 Corda, Tendermint/Cosmos, and Chain Core.
Blockchain systems have attracted attention not only from industry but also from academia. Many works have analyzed blockchains from different perspectives, for example, focusing on the underlying consensus protocols, their privacy guarantees, and many more aspects. This collection is necessarily partial; excellent surveys exist in the literature.
What is, surprisingly, missing to date is a formal model of the semantics of a blockchain, addressing the transaction-level consistency guarantees that they aim to achieve. These guarantees are intuitive and easy to grasp in the context of Bitcoin: given a proper modeling of the mining of new coins, the overall amount of bitcoins must remain invariant. For the newer, generic, and more complex blockchains, such as Ethereum or Hyperledger Fabric, a proper model of the guarantees they provide appears necessary. For instance, such a model should allow for reasoning whether the intuitively expected guarantees are indeed achieved. It should also model the operation of a blockchain at an appropriate level, such that the properties of a system appear concisely and differences across platforms become visible. In particular, it has to describe the criteria that determine whether a transaction that manipulates state is considered valid and consequently executed by the nodes.
We introduce a formal model, called the transaction graph or TDAG for short, a directed acyclic graph that models the transactions occurring on a blockchain and how they interact through states. In a nutshell, a TDAG is a graph consisting of transactions that link states to each other. Each transaction may consume, observe, or produce states, and occurs only with respect to an external input that triggers the transaction. The model abstracts the transaction validation into a predicate that can be evaluated locally in the graph, in the sense that validation only considers the relevant states. This corresponds to how many blockchains work, during the process of transaction validation and consensus, which must be efficient and based on local state. The TDAG is a generic model to encode properties expected from every blockchain system, such as notions of validity and consistency, and for characterizing the invariants that must be enforced in a blockchain.
We instantiate the TDAG model for three different prominent blockchains: Bitcoin, Ethereum, and Hyperledger Fabric. For each system, we describe the states and transactions of the TDAG, specify the notion of consistency, and define the validity of transactions. This shows the broad applicability of our model, and results in an abstract description of these real-world systems.
Atzei et al. provided a formal transaction model for Bitcoin. While their model covers certain aspects of Bitcoin, like scripts and multi-signature, in more detail than ours, it does not allow modeling other blockchain systems or comparing with them. BitML develops a formal language for smart contracts based on Bitcoin. Like our work, it is more abstract than the model of Atzei et al., but the scope is different: While BitML targets modeling and describing concrete contracts, the TDAG model aims to model the consistency of the platform as a whole. The TDAG can be seen as a refinement of the precedence graph (or serialization graph) from database concurrency theory, which relates transactions with conflicting data access. The TDAG in addition contains states as vertices, as one goal of the TDAG (besides formalizing conflicts) is to make statements about the consistency of the states.
This section introduces the transaction directed acyclic graph, abbreviated transaction graph or TDAG, for representing the semantics of a blockchain. It models the context held by the blockchain and its evolution through transactions that obey validation rules.
We start by introducing some notation. Let be a relation between sets and . For the predicate , we also write . Furthermore, we denote the set by and its size by .
A transaction graph or TDAG is a directed acyclic graph $G = (V, E)$. The vertices $V$ can be partitioned into states $S$ and witnesses $W$, that is, $V = S \cup W$. At a high level, the edges represent transitions between states. More precisely, an edge represents the relation between a state and a witness in the context of a transaction, and an edge may connect a state to a witness or vice versa. The edges $E$ can be partitioned into consuming, observing, and producing edges, denoted $E_C$, $E_O$, and $E_P$, respectively, such that $E = E_C \cup E_O \cup E_P$. We now introduce the elements of $G$ informally.
States. The first type of vertex denotes an atomic state represented by the blockchain and is depicted by a circle. It models an individual asset, a digital coin, some coins controlled by a particular cryptographic key, a variable of a smart contract at a moment in time, and so on. The complete context of the blockchain consists of all states that exist at a particular time. A state results from a transaction on the blockchain and can transition to other states through a transaction.
There is a special genesis state, which represents the initial state of the blockchain. There is a single genesis state by design because the blockchain system is initialized exactly once.
Witnesses. The second kind of vertex denotes a witness in the context of a transaction and is depicted by a rectangle. It represents any data included in a transaction that is required for the transaction to be valid according to the validation rules of the blockchain system. Every transaction of the blockchain system contains exactly one witness.
Consuming edges. A consuming edge connects a state to a witness and models that the state is consumed by the transaction that involves the witness, i.e., the unique transaction that corresponds to that witness. A state can be consumed at most once, i.e., it is not available for being consumed by another transaction once it has been consumed. Consuming a state means that the state is “updated” or “overwritten” by the transaction.
Observing edges. An observing edge also connects a state to a witness; it models that the state enters into the transaction represented by the witness, but that it remains available for consumption by another transaction. A state can be observed by many transactions, even if it is also consumed. Intuitively, a transaction that observes a state “reads” it.
Producing edges. A producing edge connects a witness to a state, and denotes that the state is created or produced by the transaction corresponding to the witness. Every state apart from the genesis state is produced exactly once.
With these notions, a transaction represents a transition from one state, or from some set of states, in a TDAG to another set of states according to the blockchain system. The transaction is linked to a unique witness, which makes it “valid” as described later. We say that a transaction has input states that are consumed or observed by the transaction and output states that are produced by the transaction. More formally, a transaction is also a weakly connected DAG, i.e., a DAG that is connected as a graph.
A weakly connected DAG with a set of input states, a set of output states, and a witness is called a transaction whenever
Every input state is a source (has indegree zero);
Every output state is a sink (has outdegree zero);
Every edge is either a consuming edge or an observing edge and links some input state to the witness, or it is a producing edge and links the witness to some output state.
The transaction as defined above does not directly correspond to what is referred to as transaction in many blockchain platforms, and referred to as witness here: Definition (2.1) references the input states used in the execution of the transaction. In many practical systems such as Ethereum, however, the input states used in the execution depend on where the transaction is included in the blockchain, as the state may be modified by preceding transactions. Definition (2.1) makes it possible to model the local validity conditions for transactions, which necessarily involve the input and output states.
As the name suggests, a transaction graph contains many transactions.
A transaction graph (TDAG) is a directed unweighted graph $G = (V, E)$, where $V = S \cup W$ are the vertices and $E$ are the edges. The set $S$ denotes the states and contains a special state called genesis. The set $W$ denotes the witnesses. Edges are partitioned into three subsets, $E = E_C \cup E_O \cup E_P$, where $E_C$ denotes consuming edges, $E_O$ denotes observing edges, and $E_P$ denotes the producing edges.
It satisfies the following conditions:
The genesis state does not have any producing or observing edges and it has a single consuming edge.
Every state except for the genesis state has exactly one producing edge.
Every state except for the genesis state may have multiple successors, but at most one among them is connected with a consuming edge.
The graph is weakly connected.
The graph has no cycles.
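As an illustration, the five conditions above can be checked mechanically. The following Python sketch is not part of the formal model; the class `TDAG`, the function `is_tdag`, and the label `GENESIS` are hypothetical names chosen here. It assumes that states and witnesses are hashable labels drawn from disjoint name spaces and that every edge endpoint belongs to the corresponding vertex set.

```python
from dataclasses import dataclass, field

GENESIS = "genesis"  # hypothetical label for the genesis state


@dataclass
class TDAG:
    states: set = field(default_factory=set)
    witnesses: set = field(default_factory=set)
    consuming: set = field(default_factory=set)  # pairs (state, witness)
    observing: set = field(default_factory=set)  # pairs (state, witness)
    producing: set = field(default_factory=set)  # pairs (witness, state)


def is_tdag(g: TDAG) -> bool:
    """Check the five structural conditions of a transaction graph."""
    consumed = [s for (s, _w) in g.consuming]
    observed = {s for (s, _w) in g.observing}
    produced = [s for (_w, s) in g.producing]

    # 1. Genesis: no producing or observing edge, exactly one consuming edge.
    if GENESIS in produced or GENESIS in observed or consumed.count(GENESIS) != 1:
        return False
    # 2. Every other state has exactly one producing edge.
    if any(produced.count(s) != 1 for s in g.states - {GENESIS}):
        return False
    # 3. Every state has at most one consuming edge.
    if any(consumed.count(s) > 1 for s in g.states):
        return False

    vertices = g.states | g.witnesses
    succ = {v: set() for v in vertices}
    pred = {v: set() for v in vertices}
    for a, b in g.consuming | g.observing | g.producing:
        succ[a].add(b)
        pred[b].add(a)

    # 4. Weak connectivity: undirected search starting at the genesis state.
    seen, stack = {GENESIS}, [GENESIS]
    while stack:
        v = stack.pop()
        for u in succ[v] | pred[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    if seen != vertices:
        return False

    # 5. Acyclicity: Kahn's algorithm removes every vertex iff there is no cycle.
    indegree = {v: len(pred[v]) for v in vertices}
    ready = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for u in succ[v]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return removed == len(vertices)
```

The counting checks are quadratic in the number of edges, which keeps the sketch close to the wording of the conditions; an implementation meant for large graphs would index edges by endpoint instead.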
The consuming and observing edges incident to a state are also called the outgoing edges of that state. Similarly, the consuming and observing edges incident to a witness are called incoming edges of that witness. The producing edges of a witness are outgoing edges of the witness. There is no order among the edges incident to a vertex in a TDAG. The unconsumed states in a TDAG are the states without an incident consuming edge.
In a TDAG every witness corresponds to a unique transaction. The next definition follows naturally and is easily seen to be equivalent to Definition (2.1).
Given a TDAG and a witness $w$, the transaction with witness $w$ is the unique subgraph defined as follows, where
$w$ is the witness of the transaction;
the states of the transaction are the states connected to $w$; and
the edges of the transaction are the edges with both endpoints among these states and $w$.
The input states of are the states being observed or consumed by , and the output states of are the states being produced by . With this terminology a transaction can have one of the following five types, which depends mostly on the number of input and output states:
INIT: A unique initialization transaction exists in every non-empty TDAG, consisting of a consuming edge that links the genesis state to a witness and a set of producing edges that link the witness to a set of states.
SISO: A single-input, single-output transaction consists of one consuming edge that links one input state to a witness and one producing edge that links the witness to an output state.
SIMO: A single-input, multi-output transaction consists of one consuming edge that links an input state to a witness, and a set of producing edges that link the witness to a set of output states.
MISO: A multi-input, single-output transaction contains a set of multiple consuming and observing edges that link distinct input states to a witness and one producing edge that links the witness to an output state.
MIMO: A multi-input, multi-output transaction contains a set of multiple consuming and observing edges that link distinct input states to a witness, and a set of producing edges that link the witness to a set of output states.
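The five types can be told apart from the edge sets of a single transaction. The sketch below is only illustrative: the function `transaction_type` and the default label `"genesis"` are hypothetical. It takes the sets of consumed, observed, and produced states of one witness, and it treats a transaction without a consumed or produced state as malformed, which is our reading of the type list rather than a rule stated explicitly.

```python
def transaction_type(consumed, observed, produced, genesis="genesis"):
    """Classify a transaction as INIT, SISO, SIMO, MISO, or MIMO."""
    consumed, observed, produced = set(consumed), set(observed), set(produced)
    if not consumed:
        # Assumption: every transaction updates or overwrites at least one state.
        raise ValueError("a transaction must consume at least one state")
    if not produced:
        # Assumption: every listed type has at least one producing edge.
        raise ValueError("a transaction must produce at least one state")
    if genesis in consumed:
        return "INIT"
    inputs = consumed | observed
    kind_in = "SI" if len(inputs) == 1 else "MI"
    kind_out = "SO" if len(produced) == 1 else "MO"
    return kind_in + kind_out
```

For example, a transaction that consumes one state, observes another, and produces a single state is classified as MISO.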
Figure (1) shows the possible transaction types in a TDAG. The initialization transaction plays a special role; it represents the creation of the blockchain, which typically creates all assets represented by the states. Modeling initialization through a specific transaction is a deliberate design choice that will become clear later, in the context of transaction validation. The other types represent “ordinary” transactions that consume (and possibly observe) one or more states and produce one or more states. We note that SISO and SIMO transactions have a single input state and have no observing edges. This models that a transaction must update or overwrite at least one state for its inclusion in the blockchain to make sense, as simple read queries can be handled by inspecting the blockchain.
For the moment, it suffices to say that the initialization transaction typically creates all “assets” modeled by the blockchain or the “states” that it holds, setting them to a predefined value. This allows a subsequent transaction to be linked only with the states to which it refers and that it consumes. Otherwise, all transactions that modify any state would be linked from the genesis state (with a consuming edge), contrary to the condition that every state has at most one consuming edge. We consider this an important property of the TDAG model. A further argument for modeling only one initialization transaction goes as follows. If there were multiple INIT transactions, then it would not be easy to assess whether one INIT transaction is “valid” without also looking at the other ones. For instance, an INIT transaction that creates a new asset is only valid if no other INIT transaction has created the same asset beforehand.
Therefore, we purposely restrict the model so that it has a single initialization transaction for simplicity, but without loss of generality as this unique initialization transaction can create as many states as required throughout the lifetime of the blockchain.
Figure (2) shows an illustrative example of a TDAG modeling a Bitcoin execution with four transactions. The first transaction represents the creation of the Bitcoin blockchain by minting all available bitcoins into a Bitcoin address containing unmined bitcoins. Here, the witness represents the Bitcoin creation rules. The second transaction transfers some unmined bitcoins to the Bitcoin address of a user that successfully mined the first Bitcoin block, and it saves the remaining unmined bitcoins for subsequent block creations. Here, the witness represents the proof-of-work in the block mined by that user. The third transaction is one where a user transfers some of her bitcoins to another Bitcoin address. The associated transaction fee is modeled as another address. Here, the witness represents the authorization of the transaction in the form of a digital signature by the sending user. Finally, the fourth transaction rewards a user for creating a Bitcoin block containing the third transaction. In that sense, the fourth transaction is similar to the second, with the difference that it also captures the fact that the user receives the fees associated with the third transaction.
We note that this example does not contain any observing edge. This results from the fact that read-only operations are not supported in Bitcoin.
A central goal of blockchain systems is to prevent conflicts among transactions and to ensure validity for all transactions, as a result of a consensus process executed among the participating entities. The TDAG model permits a closer look at the semantics of conflicts and validity; modeling consensus is outside the scope of this work.
Intuitively, a conflict in a blockchain underlying a cryptocurrency such as Bitcoin occurs in an attempt to “double-spend” money. According to the example describing Bitcoin from before (and expanded in Section 3), assume that a state in a TDAG corresponds to bitcoins held by a particular Bitcoin address. Two transactions that double-spend such bitcoins map to two transactions that both consume this state. But every state in a TDAG can be consumed at most once; hence, the TDAG model already prevents this form of conflict.
In blockchains for arbitrary smart contracts, a conflict corresponds to a situation where generic validation rules for transactions are violated. Such rules may refer to coins (such as an amount of Ether in Ethereum) or to other assets modeled in the blockchain. The TDAG model for these blockchains also imposes that every state can be consumed at most once.
When one considers an arbitrary set of transactions (not arising from the same transaction graph), such as transactions that have merely been proposed and are not executed on the blockchain yet, then conflicts among them could exist. This is the case in a cryptocurrency like Bitcoin when a miner searches for the next block, for example, and two transactions might be floating around in the network that both attempt to consume the same state. Similarly, conflicting transactions exist in smart-contract platforms during the process of reaching consensus on a valid blockchain execution.
We now consider a set of transactions (in the form of a graph) and define what it means for them to be conflict-free.
Consider a DAG with states, witnesses, producing edges, and consuming edges that contains a transaction for every witness. We say that the DAG has no conflicts if every state has at most one producing edge and at most one consuming edge.
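The condition in this definition amounts to counting edges per state. A minimal Python sketch follows, assuming that producing edges are given as pairs (witness, state) and consuming edges as pairs (state, witness); the function name `has_no_conflicts` is hypothetical.

```python
from collections import Counter


def has_no_conflicts(producing, consuming):
    """True if every state has at most one producing and one consuming edge."""
    produced = Counter(state for (_witness, state) in producing)
    consumed = Counter(state for (state, _witness) in consuming)
    return (all(n <= 1 for n in produced.values())
            and all(n <= 1 for n in consumed.values()))
```

Two proposed transactions that both consume the same state, as in a double-spend attempt, make the check fail.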
A conflict-free set of transactions can be added to a TDAG. To ensure that its addition does not cause any conflicts with the TDAG, only simple and local conditions have to be verified.
Consider a TDAG $G$ and a DAG $T$ containing a conflict-free set of transactions such that
No witness of $T$ is in $G$;
Every input state of $T$ is an unconsumed output state of $G$;
The output states of $T$ do not exist in $G$.
Then the result of adding $T$ to $G$ is the DAG $G'$ whose states, witnesses, and edges are the unions of the states, witnesses, and edges of $G$ and $T$, respectively.
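The three conditions and the union can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: both graphs are plain dictionaries with the hypothetical keys `states`, `witnesses`, `consuming`, `observing`, and `producing`; the set of transactions to be added is assumed to be conflict-free already; and its input states are taken to be the states it consumes or observes but does not itself produce.

```python
def add_transactions(g, t):
    """Return the graph obtained by adding the transactions in t to the TDAG g.

    Consuming and observing edges are pairs (state, witness); producing
    edges are pairs (witness, state).
    """
    t_outputs = {s for (_w, s) in t["producing"]}
    t_inputs = {s for (s, _w) in t["consuming"] | t["observing"]} - t_outputs
    g_unconsumed = g["states"] - {s for (s, _w) in g["consuming"]}

    if t["witnesses"] & g["witnesses"]:
        raise ValueError("a witness of the new transactions already exists")
    if not t_inputs <= g_unconsumed:
        raise ValueError("an input state is missing or already consumed")
    if t_outputs & g["states"]:
        raise ValueError("an output state already exists")

    # The result is the component-wise union of both graphs.
    return {key: g[key] | t[key] for key in g}
```

All checks refer only to the states and witnesses that the new transactions touch, which mirrors the locality that the definition aims for.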
When a conflict-free set of transactions is added to a TDAG, then the resulting graph is also a TDAG.
PROOF: Let $G$ denote the TDAG, $T$ the conflict-free set of transactions, and $G'$ the result of adding $T$ to $G$. Here we show that $G'$ satisfies the conditions to be a TDAG.
The genesis state must not have producing or observing edges and it must have a single consuming edge. This condition is fulfilled since $G$ is a TDAG and $T$ does not contain the genesis state if it is already consumed in $G$.
Every state, other than genesis, must have a single producing edge. This condition is fulfilled in $G$ and in $T$ by definition. Now, the addition of $T$ to $G$ does not create new edges. Therefore, this condition holds also in $G'$.
Every state, other than the genesis, can have multiple successors, but at most one among them is connected with a consuming edge. It is easy to see that $G'$ fulfills this condition following an argument similar to the one before.
The graph must be weakly connected. Note that by the definition of TDAG, each vertex in $G$ is weakly connected to every unconsumed state in $G$. Moreover, every vertex in $T$ is weakly connected to at least one input state of $T$. Now, as the set of input states in $T$ is a subset of the unconsumed states in $G$, it follows that $G'$ is weakly connected.
The graph must not have cycles. According to the assumptions on $T$ and because $G$ is a DAG, and through the way in which $G'$ is constructed, it is easy to see that $G'$ has no cycles.
We now introduce the notion of validity for transactions in a TDAG, which models the fact that on a blockchain only “valid” transactions are executed. As an important design choice of the model, the validity of a transaction in a TDAG must be decidable locally, that is, from the transaction alone, considering only its input states, the witness, and the output states. To capture this notion, we assume that the blockchain context defines a boolean validation predicate on the space of all transactions included in the blockchain.
Let $t$ be a transaction in a TDAG $G$. Then $t$ is valid whenever the validation predicate evaluates to true on $t$. Furthermore, $G$ is a valid transaction graph if all transactions in $G$ are valid.
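To make the local nature of validity concrete, the following Python sketch evaluates a predicate on each transaction separately. The helper names `transactions`, `is_valid_graph`, and `conserves_value` are hypothetical, and the toy predicate (value conservation over states encoded as pairs of a name and a value) is only an assumption for illustration; it is not the validation rule of any particular blockchain.

```python
def transactions(witnesses, consuming, observing, producing):
    """Yield, for every witness, the states its transaction touches."""
    for w in witnesses:
        yield {
            "witness": w,
            "consumed": {s for (s, x) in consuming if x == w},
            "observed": {s for (s, x) in observing if x == w},
            "produced": {s for (x, s) in producing if x == w},
        }


def is_valid_graph(witnesses, consuming, observing, producing, predicate):
    """A graph is valid if the predicate holds for each transaction alone."""
    return all(predicate(t)
               for t in transactions(witnesses, consuming, observing, producing))


def conserves_value(t):
    """Toy predicate: consumed and produced values must be equal."""
    return (sum(v for (_name, v) in t["consumed"])
            == sum(v for (_name, v) in t["produced"]))
```

The predicate receives only the input states, the witness, and the output states of one transaction, so no other part of the graph needs to be inspected.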
Combined with the locally checkable conditions for adding transactions to a TDAG, the fact that the validity of a transaction is locally decidable reflects how many blockchain systems work during consensus, validation, and execution of new transactions. The only steps needed for validation are to evaluate the validity predicate on a candidate transaction and to perform the checks from the definition of adding transactions to a TDAG, which involve the states to which the transaction refers.
Transaction validation also relies on the property that all states in the TDAG are distinct. In a typical blockchain, the validation function relies on a cryptographic hash of the states to which it refers; this directly ensures uniqueness. For example, consider an execution of a smart contract that holds state on the blockchain in the form of a local variable $x$. The contract may update $x$ multiple times, and it may write the same value to $x$ more than once. To make the resulting states in the TDAG different, the model will usually include a version number in the state that makes each assignment unique.
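The role of the version number can be illustrated with a short Python sketch. The function `state_id` and the JSON encoding are hypothetical choices made here; only `hashlib.sha256` and `json.dumps` are standard library calls. The sketch shows that writing the same value twice yields two distinct state identifiers once the version is part of the hashed data.

```python
import hashlib
import json


def state_id(variable: str, version: int, value: str) -> str:
    """Hash a (variable, version, value) triple into a state identifier."""
    # JSON encoding keeps the three fields unambiguously separated.
    data = json.dumps([variable, version, value]).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# The contract writes the value "5" to the variable x twice.
first = state_id("x", 1, "5")
second = state_id("x", 2, "5")
```

Here `first` and `second` differ although the stored value is the same, so the two assignments map to two different states in the graph.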
At this point, let us review our design choice of a single INIT transaction. Using a single transaction to create all assets represented by the states makes it possible to check the validity of the initialization of the blockchain locally and preserves the locally checkable conditions for further transactions consuming those states.
In Bitcoin (and many other cryptocurrencies), all the miners participate in the consensus protocol to decide about the validity of every single transaction. The permissionless nature of this consensus mechanism heavily limits the transaction throughput. One alternative to overcome this scalability issue is called sharding and consists in organizing disjoint sets of miners, letting each of these sets reach consensus about a subset of the transactions. The composition of those subsets of transactions is required then to shape the blockchain.
In the following, we describe the composition of transaction graphs, which states the conditions under which two TDAGs can be merged into a single one. One may then reason about their consistency and validity in a unified manner. Composition of transaction graphs can be used to model the goal of protocols for cross-chain transactions, namely that the combined state of both chains achieves the expected consistency properties.
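Consider two TDAGs $G_1$ and $G_2$. Assume that $t_1$ denotes the INIT transaction in $G_1$ and $t_2$ denotes the INIT transaction in $G_2$. Further assume that $t$ denotes an INIT transaction whose output states are the union of the output states of $t_1$ and $t_2$. Then, the composition of $G_1$ and $G_2$ is the TDAG obtained from the union of $G_1$ and $G_2$ by replacing $t_1$ and $t_2$ with $t$.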
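One possible reading of this definition is sketched below in Python. It is an assumption-laden illustration: graphs are dictionaries with the hypothetical keys `states`, `witnesses`, `consuming`, `observing`, and `producing`; both graphs share one genesis label; their remaining states and witnesses are assumed to be disjoint; and the witness of the combined INIT transaction is simply a tuple built from the two original INIT witnesses.

```python
GENESIS = "genesis"  # hypothetical label shared by both graphs


def init_witness(g):
    """Return the witness of the INIT transaction (it consumes genesis)."""
    (w,) = {w for (s, w) in g["consuming"] if s == GENESIS}
    return w


def compose(g1, g2):
    """Merge two TDAGs by fusing their INIT transactions into one."""
    w1, w2 = init_witness(g1), init_witness(g2)
    merged = ("init", w1, w2)  # hypothetical witness of the combined INIT

    def rename(v):
        return merged if v in (w1, w2) else v

    def union(kind):
        return {(rename(a), rename(b))
                for g in (g1, g2) for (a, b) in g[kind]}

    return {
        "states": g1["states"] | g2["states"],
        "witnesses": {rename(w) for w in g1["witnesses"] | g2["witnesses"]},
        "consuming": union("consuming"),
        "observing": union("observing"),
        "producing": union("producing"),
    }
```

After composition the genesis state has a single consuming edge again, and the combined INIT transaction produces the initial states of both graphs.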
The composition of two TDAGs results in a graph that is also a TDAG.
PROOF: Let $G$ denote the composition of two TDAGs $G_1$ and $G_2$, and let $t$ denote its INIT transaction. Here we show that $G$ satisfies the conditions to be a TDAG.
The genesis state must not have producing or observing edges and it must have a single consuming edge. This condition is fulfilled by our definition of the INIT transaction $t$.
Every state, other than genesis, must have a single producing edge. As $G_1$ and $G_2$ are two TDAGs, it is easy to see that each state in $G_1$ and $G_2$ has a single producing edge. Moreover, by definition of INIT transaction, each output state in $t$ has a single producing edge.
Every state, other than the genesis, can have multiple successors, but at most one among them is connected with a consuming edge. It is easy to see that $G$ fulfills this condition along the lines of the previous argument.
The graph must be weakly connected. $G_1$ and $G_2$ are connected by definition, as they are two TDAGs. Moreover, the definition of the INIT transaction $t$ ensures that any vertex in $G_1$ is connected to any vertex in $G_2$ through $t$.
The graph must not have cycles. $G_1$ and $G_2$ are acyclic by definition, as they are two TDAGs. Moreover, the addition of $t$ clearly does not introduce any cycle.
In this section, we describe how executions of different blockchain systems are modeled by transaction graphs. We cover three prominent blockchains: Bitcoin, Ethereum, and Hyperledger Fabric. They differ in how they store assets in their state. Bitcoin, for example, does not have state “variables” but maintains an asset only in the context of the transaction that created it. Ethereum, on the other hand, uses variables and accounts for its state. The data model in Fabric is a key-value store (KVS), which can be mapped to a local database on each node. Due to lack of space, this section only gives a short overview; more details appear in the full version.
Throughout this section, we denote by $H$ a cryptographic, collision-free hash function that takes as input a bit-string of arbitrary length and returns a fixed-length string.
Since Bitcoin (bitcoin.org) is the prototype of all blockchain systems, there are many publicly available descriptions and we keep the background short. Likewise, the discussion here applies to all alt-coins patterned after Bitcoin.
Bitcoin combines transaction validation, coin mining, and agreement on the ledger with the “Nakamoto protocol” that uses proof-of-work and ensures consensus. A block in Bitcoin can hold two types of transactions:
A coinbase transaction that transfers yet unmined bitcoins to a Bitcoin address as chosen by the miner of the corresponding block, as a reward for creating the block. This transaction is valid if (i) it transfers a number of bitcoins according to the height of the block to a Bitcoin address, and (ii) it contains the solution to the proof-of-work puzzle for successful mining of the block.
A regular transaction transfers bitcoins from a set of Bitcoin (input) addresses to another set of Bitcoin (output) addresses. It also incurs a fee, defined as the difference between the bitcoin amounts in the input and output, which is assigned to the miner of the block in which the transaction appears. A regular transaction is valid if it includes a confirmation for each input for the amount and output and if it does not create new bitcoins.
Bitcoin value exists in the blockchain in the form of unspent transaction output, often abbreviated UTXO, which has been assigned to an address, representing a digital-signature public key. This value is controlled by the holder of the corresponding private key. It can be spent and transferred to another address by signing a transaction with the private key.
In the TDAG modeling Bitcoin, we let every state be a tuple of the form
$$(\text{addr}, \text{val}, \text{txhash}, \text{index}),$$
where addr denotes an address, val denotes the amount of bitcoins held in this state, txhash is the cryptographic hash of the Bitcoin transaction generating this state, and index is the sequential index of the output among all outputs generated in that transaction.
In contrast to the Bitcoin code, we model transaction fees and unmined bitcoins as held by or associated to an (imaginary) address. This allows a coherent model for the TDAG. Thus, the state resulting from the special INIT transaction is fixed to a single state at such an address, holding all bitcoins that will ever exist.
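A state of this form and the effect of modeling fees as an address can be sketched in Python. The class `BitcoinState`, the label `FEE_ADDR`, and both helper functions are hypothetical; amounts are assumed to be integers in the smallest currency unit. Because the fee is itself an output state, a regular transaction neither creates nor destroys value in the graph.

```python
from typing import NamedTuple


class BitcoinState(NamedTuple):
    addr: str    # address holding the value
    val: int     # amount held, in integer units (assumption)
    txhash: str  # hash of the transaction that generated the state
    index: int   # position among the outputs of that transaction


FEE_ADDR = "fees"  # hypothetical label for the imaginary fee address


def conserves_bitcoins(inputs, outputs):
    """With the fee modeled as an output, total value is unchanged."""
    return sum(s.val for s in inputs) == sum(s.val for s in outputs)


def fee_of(inputs, outputs):
    """Fee as the difference between inputs and non-fee outputs."""
    paid = sum(s.val for s in outputs if s.addr != FEE_ADDR)
    return sum(s.val for s in inputs) - paid


inputs = [BitcoinState("alice", 50, "tx1", 0)]
outputs = [
    BitcoinState("bob", 30, "tx2", 0),
    BitcoinState("alice", 19, "tx2", 1),
    BitcoinState(FEE_ADDR, 1, "tx2", 2),
]
```

In this example the fee computed from the non-fee outputs equals the value of the fee state, so the conservation check holds.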
The form of a witness depends on the transaction type: The witness for a coinbase transaction is the solution for the proof-of-work to assign the bitcoins to the address designated by the miner. For a regular transaction, the witness consists of a set of confirmations for the transfer of bitcoin, in the form of a digital signature for each UTXO, over the input and output addresses of the transfer. Finally, the INIT transaction does not require any witness.
The TDAG for Bitcoin contains producing and consuming edges but no observing edges. For a coinbase transaction, the input states are the unconsumed state of unmined bitcoins and the fee states for the transactions included in the mined block. One producing edge leads to a state for collecting the fees and the mining reward, another one to a state containing the remaining unmined bitcoins. Its witness is the mining proof. For a regular transaction, the input states are the unconsumed states representing the transaction inputs and the produced output states correspond to the transaction's output addresses. The witness holds a set of confirmations (digital signatures), confirming for each input state the transfer of some bitcoins to the corresponding output addresses.
The transaction predicate incorporates the validation rules of Bitcoin, as expressed in the states, witnesses, and transactions of the TDAG.
With these definitions, one can then show the intuitive result that, except with negligible probability, every (legal) execution of Bitcoin, considering only transactions that are “deep enough” in the blockchain (e.g., six blocks deep), gives rise to a TDAG constructed like this. The formal analysis of this result exploits that the DAG formed by the hash-function applications among states has no cycles, and therefore satisfies the properties of a TDAG.
Ethereum is the most prominent public blockchain and cryptocurrency supporting generic smart contracts today. In Ethereum, there exist two types of accounts, called externally owned accounts and contract accounts. Externally owned accounts by and large resemble the accounts of other cryptocurrencies such as Bitcoin. In these accounts, users maintain their currency balance in Ether, owned by them. But the main innovation of Ethereum lies in contract accounts, which represent a smart contract (an arbitrary piece of code in the platform-specific language) that executes a set of instructions upon receiving suitable input. A contract account also holds and controls its own Ether balance.
Ethereum supports several types of transactions. First, a transaction in Ethereum can be used to transfer Ether between two externally owned accounts. This type of transaction is like the exchange of coins in other cryptocurrencies. Second, a transaction can be used to create a contract with the code of the contract and an externally owned account as inputs. It outputs a contract account with the information required to initialize the implemented code (e.g., the inputs for the init function). Finally, a transaction can be used to invoke an existing contract on the blockchain.
An Ethereum transaction includes as input the sender’s address (an externally owned account), a recipient address (another account), a transaction value to be transferred from the sender’s address to the recipient, and some arguments with parameters for the contract. As the transaction sender has to pay gas for the execution of the contract, the transaction also specifies a gas price that determines the price the sender is willing to pay for each executed instruction, and a gas limit, specifying a maximum overall price for the execution. A contract may also call functions of other contracts; however, this will not give rise to new transactions, as these calls take place in the context of the original transaction.
To model an Ethereum execution as a TDAG, we let each state consist of a tuple
$$(\text{addr}, \text{account-type}, \text{code}, \text{local-state}, \text{val}).$$
Here, addr denotes the account address that produced the state, account-type determines whether this is a state of an external account or a contract account, code is a hash of the smart contract’s code, local-state denotes collectively all variables held by the contract, and val is the Ether balance held by the account after the execution that produced the state. If account-type specifies an externally owned account, then the smart contract is the fixed logic to validate payments from such accounts.
There is also a genesis state that models the creation of an Ethereum blockchain. In contrast to Bitcoin, there is currently no bound on the amount of Ether that will exist in the public Ethereum blockchain; the creation of new Ether is therefore subsumed into the mining operation and its validation.
A transaction in the TDAG is determined by the witness. It corresponds to an invocation of a smart contract and contains a gas limit and regular input arguments that validate the transaction. For instance, these arguments must contain a digital signature valid under the public key associated to the invoking external account that runs the transaction.
The transaction contains the state of the invoking account and the state of the contract as input states, with consuming edges to the witness. It also produces two states, an updated state of the invoking account and an updated state of the contract, as resulting from running the contract with the given gas limit and input arguments. If the contract calls functions of other contracts and they modify their state, then the states representing these contracts are also part of the transaction in the TDAG (as input states and output states). The validation predicate simply executes the code.
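The shape of such a transaction can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions: gas accounting and signature checks are omitted, the contract code is stood in for by a Python function `run` over the contract variables, and the names `State` and `invoke` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class State:
    addr: str
    account_type: str                         # "external" or "contract"
    code: str                                 # stands in for the hash of the contract code
    local_state: Tuple[Tuple[str, int], ...]  # contract variables as sorted pairs
    val: int                                  # Ether balance


def invoke(sender: State, contract: State, value: int,
           run: Callable[[Dict[str, int]], Dict[str, int]]) -> Tuple[State, State]:
    """Consume the two input states and return the two output states."""
    if sender.account_type != "external" or value > sender.val:
        raise ValueError("invalid transaction")
    new_vars = run(dict(contract.local_state))  # "executing the code"
    return (
        replace(sender, val=sender.val - value),
        replace(contract, val=contract.val + value,
                local_state=tuple(sorted(new_vars.items()))),
    )


alice = State("alice", "external", "", (), 10)
counter = State("ctr", "contract", "h(code)", (("n", 0),), 0)
alice2, counter2 = invoke(alice, counter, 3, lambda v: {**v, "n": v["n"] + 1})
```

The input states `alice` and `counter` are left unchanged; the transaction yields the fresh states `alice2` and `counter2`, mirroring the consuming and producing edges around one witness.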
For mining new Ether, running transactions, and collecting the corresponding fees, similar states and validation logic as in the TDAG model of Bitcoin are added. Given these notions one can show that every (legal) execution of Ethereum, considering as in Bitcoin only those transactions that are deep enough in the blockchain, produces a valid TDAG.
Hyperledger Fabric (www.hyperledger.org/projects/fabric), or Fabric for short, is a permissioned blockchain framework, designed to support modular implementations of different components, including its consensus protocol, membership provider, and cryptography library . The nodes executing the transactions in Fabric are called peers.
An instance of Fabric may contain multiple channels that may run on different sets of peers, where each channel operates like a blockchain system independent of the others, apart from using some of the same code infrastructure, ordering protocol, and other components. We therefore consider only one channel here, modeling one blockchain.
On a channel, a configuration transaction (configtx) sets the initial values used for transaction processing, such as the credentials of the peers or organizations controlling the channel, the implementation of its ordering service, and so on. Once a channel has been prepared like this, it is ready to execute operations on its peers. Transactions in Fabric are executed by smart contracts called chaincode.
Chaincode is first installed on the peer and may later be upgraded; it must be instantiated for a specific channel before it can process transactions. Once instantiated on the channel, a chaincode supports two types of transactions: init and invoke. An init transaction is executed once after the chaincode has been installed or upgraded; it specifies an endorsement policy that determines how any subsequent transaction of this chaincode should be authorized. A chaincode determines through the endorsement policy on which peers it executes: whether all peers in the channel execute it, or only some, and which peers or which set of peers are sufficient to authorize the execution of the transaction.
An invoke transaction is used to execute a computation that may read and modify the state of the chaincode, which is a set of key-value pairs. The operations to access the state are GetState(key) (given a key key, return the last value written to it) and PutState(key, val) (write the value val to storage under the key key).
The processing of a transaction on Fabric proceeds as follows:
A client creates and signs a transaction for a particular chaincode and sends it to the respective endorsing peers.
The endorsing peers simulate the transaction on their current copy of the key-value store (KVS), verifying that the client is authorized to execute it. If successful, each endorsing peer returns the result of the execution to the client. This is also called an endorsement. It comes in the form of a signed readset and writeset (with the key-value pairs accessed during simulation, including a version for every value in the readset, determined by the logical time when this value was written). The endorsement serves as a static representation of the chaincode execution.
When the client has assembled enough endorsements that produce the same KVS changes and that satisfy the endorsement policy, it combines them into a transaction proposal. Then the client broadcasts this transaction proposal to the ordering service, which simply orders transactions without considering their semantics. Currently, an ordering service based on Apache Kafka (kafka.apache.org) running in a cluster is supported, and an ordering service using BFT consensus is under development.
The ordering service disseminates an ordered stream of transactions (grouped into blocks) to the peers on the channel. Each peer on its own then validates each transaction, by verifying that the endorsement policy is satisfied and that there were no changes to the key-value pairs contained in the readset (since transaction simulation).
If successful, the peer appends the block to the blockchain (of the channel) and performs the updates from the writeset to its local copy of the KVS. This assigns a version to the modified key-value pairs. Since the validation is deterministic, the states and versions are the same for all correct peers.
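The simulation and validation steps above can be sketched in Python as follows. This is a minimal sketch, not Fabric's implementation: signatures and the endorsement policy check are omitted, the KVS is a plain dictionary from keys to (value, version) pairs, and the function names `simulate` and `validate_and_commit` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

KVS = Dict[str, Tuple[str, int]]  # key -> (value, version)


def simulate(kvs: KVS, reads: Iterable[str],
             writes: Dict[str, str]) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Endorser side: record the versions read and the intended writes."""
    readset = {k: kvs[k][1] for k in reads}
    return readset, dict(writes)


def validate_and_commit(kvs: KVS, readset: Dict[str, int],
                        writeset: Dict[str, str], new_version: int) -> bool:
    """Peer side: reject the transaction if a read key changed since simulation."""
    if any(kvs[k][1] != ver for k, ver in readset.items()):
        return False
    for k, v in writeset.items():
        kvs[k] = (v, new_version)
    return True


kvs: KVS = {"a": ("1", 0), "b": ("2", 0)}
rs1, ws1 = simulate(kvs, ["a"], {"a": "10"})
rs2, ws2 = simulate(kvs, ["a"], {"b": "20"})  # simulated on the same snapshot
ok1 = validate_and_commit(kvs, rs1, ws1, new_version=1)
ok2 = validate_and_commit(kvs, rs2, ws2, new_version=2)
```

Both transactions are simulated before either is committed. The first commits and bumps the version of `a`, so the second, which read the old version of `a`, fails validation and leaves `b` untouched.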
In the TDAG for Fabric, the states correspond to the entries in the KVS. Every state is a tuple containing at least the key of the entry and its version.
It is assumed that an init transaction implicitly initializes every key used by the chaincode later with a default value (). The init transaction is always valid.
Furthermore, every invoke transaction that reads or writes a set of keys contains an observing edge for every key that is read but not written, and a consuming edge for every key that is written. In other words, every key is implicitly read before it is written and, thus, a transaction in the TDAG modeling a Fabric execution has the same number of consuming edges as the number of producing edges.
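The rule for deriving edges from the keys a transaction accesses can be written down directly. The following Python sketch uses a hypothetical helper `edges_for_invoke` and represents each edge only by the key it touches.

```python
from typing import Iterable, Set, Tuple


def edges_for_invoke(read_keys: Iterable[str],
                     written_keys: Iterable[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return the keys with observing, consuming, and producing edges."""
    written = set(written_keys)
    observing = set(read_keys) - written  # read but not written
    consuming = set(written)              # every written key is implicitly read first
    producing = set(written)              # one new state per written key
    return observing, consuming, producing


obs, cons, prod = edges_for_invoke({"a", "b"}, {"b", "c"})
```

Here `a` is only read and gets an observing edge, while `b` and `c` are written and each get one consuming and one producing edge, so the two counts coincide.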
A witness in the TDAG corresponds to a valid endorsement, in the form of signatures from the endorsers issued on the same readset/writeset pair from the transaction proposal. The validation predicate contains the steps that each peer takes to validate a transaction coming from the ordering service, with respect to its local KVS. Notice that this validation only accesses the versions in the readset, but no other state entry in the KVS. Since these states are also contained in the transaction in the TDAG, the evaluation of in the graph is local.
Given that the ordering service of Fabric outputs the same stream of blocks with transactions to every connected peer, it is easy to verify that the graph resulting from any execution of Fabric is a TDAG.
Blockchains and distributed ledger platforms are of great interest for the financial industry today, due to their role as trustless intermediaries gained from their resilience to attacks and subversion. For gaining confidence in a new technology, it is paramount to study its security with formal models.
This work has proposed transaction graphs or TDAGs as a discrete model for the semantics of the interactions in a blockchain system. In contrast to existing event-based models for generic distributed and concurrent systems, it explicitly takes into account the validation of transactions, which is an important aspect of blockchains. For instance, the TDAG model makes it possible to model assets and their transfer among different entities. It also facilitates comparisons among different technologies available today.
We envision that richer semantics can be expressed by refining the TDAG model. For instance, one may argue about further invariants of the blockchain system as properties of the TDAG, similar to modeling Bitcoin’s fixed coin supply. One might also use a TDAG to formally model the provenance for generic assets that are handled by smart contracts, building on the paths through which the asset was transferred in the TDAG. One could also leverage a TDAG to formally describe the guarantees provided by a blockchain equipped with a pruning mechanism, reasoning about the remaining states in the TDAG after pruning. Finally, we foresee that the TDAG can be extended to model invariants required for payment channels, for instance, that payment channel transactions are free of conflicts with those included in the TDAG.
We start with the description of an execution of the Bitcoin system as represented by the corresponding blockchain. A Bitcoin blockchain is composed of blocks, where each block is created as a result of successfully executing the Bitcoin mining process. The miner of such a block (i.e., the user showing a valid proof of successful mining) chooses a set of regular transactions to be added to the block along with a single coinbase transaction. There exists a special block, denoted as genesis block, that represents the initialization of the blockchain.
A coinbase transaction transfers unmined bitcoins to a (set of) Bitcoin address(es), chosen by the corresponding miner, as a reward for creating the block. A coinbase transaction is valid if it transfers only the number of bitcoins set as reward according to the height of the mined block. A regular transaction transfers bitcoins from a set of Bitcoin addresses (i.e., input addresses) to another set of Bitcoin addresses (i.e., output addresses). A regular transaction is valid if: (i) it includes a confirmation for each input address; (ii) it does not create new bitcoins. Finally, a regular transaction has an associated fee (i.e., the difference between the bitcoins held at input and output addresses).
A Bitcoin execution is a set of blocks , where denotes the genesis block and contains a single initialization transaction. Each other block is a tuple composed of a proof of successful mining , and a set of transactions containing a coinbase transaction and regular transactions . A contains a Bitcoin address . A is a tuple (, , ), where and are two sets of Bitcoin addresses and is a set of confirmations .
We now describe our modeling of a given execution of Bitcoin as a TDAG. A state represents a Bitcoin address that holds a group of bitcoins, a transaction fee or the yet unmined bitcoins. We note that fees and unmined bitcoins are not associated with an address in the real Bitcoin, but we model them as held by an address to have a coherent transaction graph model. The genesis state represents a Bitcoin address holding the M bitcoins ever existing in the Bitcoin system. Each witness represents either a proof of successful mining for a block or the (set of) confirmations required in a regular transaction. Finally, we consider two types of edges: consuming and producing edges. A consuming edge links unconsumed addresses for unmined bitcoins and transaction fees to the mining proof for the corresponding coinbase transaction, or an input address to the corresponding confirmation in a regular transaction. A producing edge links a mining proof to the Bitcoin addresses getting the reward, or a set of confirmations to the corresponding output addresses receiving (part of) the transferred bitcoins.
We model an execution of the Bitcoin system as a graph defined as follows:
State: Each state is defined as a tuple (, , txhash, index), where denotes a Bitcoin address, denotes the amount of Bitcoins held at , txhash denotes the hash of the transaction generating , and index the index of the output described by . The genesis state is defined as the fixed tuple .
Witness: Each witness is defined by a tuple (, ), where denotes the type of the transaction and determines the content of . In particular, (, ) is the witness for the initialization transaction; (, ) denotes a witness for a coinbase transaction and (, ) denotes the witness for a regular transaction.
Edge: Each edge is defined either as consuming edge or producing edge.
The transaction graph presented here determines the modeling of the possible transactions in a Bitcoin execution. The next definition maps the transactions in a Bitcoin execution to the transaction types supported in a TDAG.
A coinbase transaction is modeled as a SIMO transaction. A regular transaction is modeled as a SISO, SIMO, MISO or MIMO transaction depending on and . For instance, SISO models a regular transaction where . The rest are derived accordingly. Finally, we define the initialization transaction included in the genesis block as an INIT transaction of the form , where , and , where denotes a Bitcoin address that contains unmined bitcoins. and are as defined in Definition (A.2).
Finally, we complete our description of the Bitcoin context with the corresponding transaction predicate . For that, we use (, ) as a function that on input a Bitcoin address and a confirmation , returns if encodes a valid confirmation to spend the bitcoins held at . Otherwise, it returns . Additionally, we use () as a function that on input a mining proof , returns if is a valid proof-of-work for the corresponding block, or otherwise. We thereby abstract away the implementation details for validation of Bitcoin scripts and mining proofs.
Consider a transaction . Then, returns if the following conditions hold and otherwise.
If is a regular transaction (.= ), the witness holds a valid confirmation for each input state i.e., .
If is a coinbase transaction (.= ), the witness contains a valid mining proof, i.e., .
Each output state represents a positive number of bitcoins, i.e., .
The sum of bitcoins held at the input states must be equal to the sum of bitcoins held at the output states, i.e.,
Observe that it is unnecessary to verify the txhash and index components of the states. Their sole purpose is to separate different states: as long as no collision occurs between different transaction hashes on the Bitcoin blockchain, all states in our definition will be distinct. This property is obviously not achieved by and alone.
We start this section by analyzing the transaction graph defined in the previous section, and first show that it is a TDAG. Here, we consider as legal a Bitcoin execution that contains only transactions that are “deep enough” in the blockchain (e.g., six blocks deep). We thereby enable the study of any Bitcoin execution in terms of the properties of a TDAG such as conflict-freedom or validity.
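The conditions of the transaction predicate can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the two verification functions are passed in as parameters `verify_conf` and `verify_pow` (hypothetical names) because script and proof-of-work validation are abstracted away, the witness of a regular transaction is assumed to be a list with one confirmation per input state, and the witness of a coinbase transaction is assumed to be the mining proof itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BtcState:
    addr: str
    val: int      # amount held at the address
    txhash: str   # hash of the transaction that generated the state
    index: int    # index of the output described by the state


def btc_predicate(kind: str, inputs: List[BtcState], outputs: List[BtcState],
                  witness, verify_conf: Callable[[str, object], bool],
                  verify_pow: Callable[[object], bool]) -> bool:
    if kind == "regular":
        # One valid confirmation is required for each input state.
        if len(witness) != len(inputs):
            return False
        if not all(verify_conf(s.addr, c) for s, c in zip(inputs, witness)):
            return False
    elif kind == "coinbase":
        if not verify_pow(witness):
            return False
    # Every output state holds a positive amount.
    if any(o.val <= 0 for o in outputs):
        return False
    # No bitcoins are created or destroyed.
    return sum(i.val for i in inputs) == sum(o.val for o in outputs)


def toy_conf(addr, c):
    return c == "sig:" + addr  # toy stand-in for script validation


def toy_pow(proof):
    return proof == "pow-ok"   # toy stand-in for proof-of-work validation


alice = BtcState("alice", 50, "h1", 0)
outs = [BtcState("bob", 20, "h2", 0), BtcState("alice", 29, "h2", 1),
        BtcState("fee", 1, "h2", 2)]
print(btc_predicate("regular", [alice], outs, ["sig:alice"], toy_conf, toy_pow))
```

Note that the fee appears as an ordinary output state, which is what lets the balance condition be an equality rather than an inequality.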
Assume is a collision-resistant hash function  and assume that is a legal Bitcoin execution. Then, the graph resulting from modeling is a TDAG.
PROOF: Here, we show that fulfills the conditions to be a TDAG.
The genesis state must not have producing or observing edges and it must have a single consuming edge. Our designed INIT transaction ensures this.
Every state, other than the genesis, must have a single producing edge. Assume by contradiction that this condition is not fulfilled. Then, there is a state with at least two producing edges, which implies that there exist two different sets and such that . However, and contradict the assumption that is collision resistant.
Every state other than the genesis can have multiple successors, but at most one among them is connected with a consuming edge. Each Bitcoin address is consumed only once in a legal Bitcoin execution. Therefore, this condition is fulfilled.
The graph must be weakly connected. Each new transaction consumes a previously unconsumed state in the graph , i.e., it either spends an unspent Bitcoin address or mines yet unmined bitcoins and consumes unclaimed fees. Therefore, the overall graph is weakly connected.
The graph must not have cycles. Assume by contradiction that there is a cycle in . This, however, implies that there are two different transactions and that produce the same state. However, as we have seen before, this only occurs in case two different transactions have the same hash, which contradicts the fact that the underlying hash function is collision resistant.
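The structural conditions checked in this proof can also be tested mechanically on a concrete graph. The following Python sketch is an illustration under assumptions, not the formal definition: each transaction is given as a dictionary listing the state identifiers it consumes, observes, and produces, and the function name `is_tdag` is hypothetical. It checks that the genesis state is never produced, that every other state is produced exactly once, that no state is consumed more than once, and that the graph is weakly connected and acyclic.

```python
from collections import defaultdict, deque


def is_tdag(genesis, txs):
    produced, consumed = defaultdict(int), defaultdict(int)
    states = {genesis}
    succ = defaultdict(set)  # directed edges between states and witnesses
    for i, tx in enumerate(txs):
        w = ("witness", i)
        for s in tx["consumes"]:
            consumed[s] += 1
            succ[s].add(w)
        for s in tx["observes"]:
            succ[s].add(w)
        for s in tx["produces"]:
            produced[s] += 1
            succ[w].add(s)
        states.update(tx["consumes"], tx["observes"], tx["produces"])
    nodes = states | {("witness", i) for i in range(len(txs))}

    if produced[genesis] != 0:
        return False
    if any(produced[s] != 1 for s in states - {genesis}):
        return False
    if any(consumed[s] > 1 for s in states):
        return False

    # Weak connectivity: breadth-first search that ignores edge direction.
    nbrs = defaultdict(set)
    for u, vs in succ.items():
        for v in vs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    seen, queue = {genesis}, deque([genesis])
    while queue:
        for v in nbrs[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    if seen != nodes:
        return False

    # Acyclicity: Kahn's algorithm removes nodes of in-degree zero.
    indeg = {n: 0 for n in nodes}
    for vs in list(succ.values()):
        for v in vs:
            indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(nodes)


chain = [
    {"consumes": ["g"], "observes": [], "produces": ["unmined0"]},
    {"consumes": ["unmined0"], "observes": [], "produces": ["alice", "unmined1"]},
]
print(is_tdag("g", chain))
```

A double spend shows up in this check as a state with two consuming edges, and a cycle as a set of nodes that Kahn's algorithm can never remove.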
Remember from Definition (2.7) that a TDAG is valid if each transaction individually is valid according to a transaction predicate . Next, we show that validating Bitcoin transactions individually in our model suffices to safely consider that unconsumed states represent all bitcoins in the system.
Consider a TDAG modeling a Bitcoin execution. Then, the unspent bitcoins in are the sum of bitcoins held at unconsumed states of .
Consider a valid TDAG that models a Bitcoin execution. Then, the amount of unspent bitcoins in is equal to all bitcoins ever existing in the system. More formally, let be the set of unconsumed states in , then .
PROOF: Assume by contradiction that Theorem (A.7) does not hold. Then, there must exist a transaction in such that . This, however, clearly implies that returns , which contradicts the assumption that is a valid TDAG.
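The invariant can be observed on a small run. The following Python sketch applies transactions that each satisfy the balance condition and then sums the unconsumed states. The total supply of 21 million bitcoins and the initial block reward of 50 bitcoins are properties of Bitcoin; the fee of 1 bitcoin, the state names, and the helper `apply_all` are assumptions made for this illustration.

```python
def apply_all(genesis_val, txs):
    """txs: list of (consumed state ids, {produced state id: amount})."""
    unspent = {"genesis": genesis_val}
    for inputs, outputs in txs:
        spent = sum(unspent[s] for s in inputs)
        # Local check: positive outputs and equal input and output sums.
        if min(outputs.values()) <= 0 or spent != sum(outputs.values()):
            raise ValueError("transaction violates the predicate")
        for s in inputs:
            del unspent[s]       # the state is consumed
        unspent.update(outputs)  # fresh unconsumed states
    return unspent


M = 21_000_000
txs = [
    (["genesis"], {"unmined0": M}),                        # INIT
    (["unmined0"], {"alice0": 50, "unmined1": M - 50}),    # coinbase for Alice
    (["alice0"], {"bob0": 20, "alice1": 29, "fee0": 1}),   # Alice pays Bob
]
unspent = apply_all(M, txs)
print(sum(unspent.values()) == M)
```

Because every transaction preserves the sum locally, the total over the unconsumed states stays equal to `M` after any number of valid transactions.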
Here, we describe our modeling for an illustrative example of Bitcoin execution. We assume for simplicity that the block reward is fixed to a value of bitcoins as it was the first reward set in the Bitcoin system. Additionally, we assume that the transaction fee is fixed to bitcoin. We stress, however, that the TDAG model is expressive enough to relax these assumptions.
We focus on the illustrative example depicted in Figure (3). In particular, Figure (3a) shows a possible Bitcoin execution , where and . We note that this example is similar to that in Figure (2) and due to lack of space we do not describe it here again. However, we remark that it is expanded here with an extra MIMO transaction (i.e., ) to show how we model transactions that involve multiple payers and multiple payees. Instead, we focus on the description of , a transaction graph modeling the aforementioned Bitcoin execution as depicted in Figure (3b).
, where and . This represents the initialization transaction where , and .
, where and . A SIMO transaction that issues bitcoins to Alice after she has successfully mined a block. In a bit more detail, , and , where denotes a Bitcoin address owned by Alice. We follow this notion in the rest of the example for the addresses owned by the example users. is defined as in .
, where and . A SIMO transaction that pays bitcoins to Bob and the remaining bitcoins are sent back to Alice. In a bit more detail, , , and . The rest of states are defined as in previous transactions.
where and . A MIMO transaction that pays bitcoins to Charles, jointly by Alice and Bob. In a bit more detail, , , , , and . The rest of states are defined as in previous transactions.
where and . A MIMO transaction that issues bitcoins to Diana after she has successfully mined a block. Additionally, Diana claims the transaction fees for transactions and . In a bit more detail, , , and .
In this section, we study the Hyperledger Fabric blockchain-based system. We start with a description of an execution of Fabric. An execution of Fabric is represented as a set of blockchains, one per channel. However, as each blockchain evolves independently of the others, we restrict our description here to a single blockchain. This description, however, can be easily extended to model a Fabric execution with multiple channels.
A blockchain is composed of blocks. We denote the first block as genesis block and each subsequent block is created by the ordering service. The ordering service chooses the sorted set of transactions to be included in each block. Fabric supports two types of transactions: init and invoke. An init transaction is included in the genesis block; it is used to initialize every key used in the blockchain to a default value and includes an endorsement policy that determines how any subsequent transaction should be authorized. We consider that an initialization transaction is always valid.
An invoke transaction carries out updates to a set of key-value pairs in the current local key-value store (KVS) through two operations: (i) , which given a key returns the most recent value associated with it; and (ii) , which updates the value associated with a given key to the newly provided value . An invoke transaction is valid if it contains enough endorsements from the set of endorsers specified in the endorsement policy.
A Fabric execution is a set of blocks , where denotes the genesis block that contains a single init transaction, denoted by . Each block is a set of invoke transactions . An contains a single endorsement policy . Each transaction is defined as a tuple (, ), where denotes the set of endorsements , and denotes the set of operations to update key-value pairs.
We continue by describing the modeling of a Fabric execution. Informally, each state in our model represents a key-value pair. Each witness represents the set of endorsements required for a transaction to be valid. Finally, we consider three types of edges: observing, consuming, and producing edges. An observing edge links a key to the endorsements specified in a transaction that reads but does not modify it (e.g., an invoke transaction that contains only a read operation on the key). If the key is modified (e.g., by an invoke transaction that contains a write operation on the key), a consuming edge links the key to the endorsements for that transaction. Finally, a producing edge links the endorsements to a key that the transaction has modified (e.g., by means of a write operation).
We model a Fabric execution as a graph defined as follows:
States: Each state is defined as a tuple (, ), where denotes the key part of a key-value pair and denotes the current version number of the key-value pair. The genesis state is defined as and denotes a special key-value pair that holds the configuration parameters for a channel as indicated in channel initialization.
Witness: Each witness is defined as a tuple (, ), where set to indicates an init transaction and set to indicates an invoke transaction. denotes an endorsement policy if or a set of endorsements if . For simplicity, we assume that an endorsement also contains the corresponding set of operations and .
Edges: Each edge is defined as either observing, consuming or producing edge.
An invoke transaction is modeled as a SISO, MISO or MIMO transaction depending on the set of operations and that it uses. For instance, a SISO transaction models a transaction that uses a single operation for a key . A MISO transaction models a transaction that updates a single key and reads at least one additional key (e.g., ). Finally, a MIMO transaction models a transaction that updates several keys and possibly reads other additional keys (e.g., ). An init transaction is of type INIT and is defined as , where , , , and each . The genesis state is defined in Theorem (B.2).
We make two observations about the definition of the transaction types. First, MISO and MIMO types are restricted in the sense that they must have the same number of consuming and producing edges. This is due to the fact that we model each operation as a consuming edge from the state of the key being updated and a producing edge to the state corresponding to the updated key-value pair. We note, however, that this is a characteristic inherent to all systems based on key-value stores and not a particular limitation of Fabric.
Second, as in any system based on a key-value store, each key must exist only once. For that, we model our initialization transaction such that all the keys used in the given Fabric execution are created and initialized to a fixed initial value ().
Now, we finalize the description of our model by defining the transaction predicate for Fabric. Here, we denote by a boolean function that takes a set of endorsements and returns if represents a valid set of endorsements according to the endorsement policy , and otherwise. Here, we assume that is obtained from the initialization transaction included in the corresponding Fabric execution.
Consider a transaction . Then, returns if the following conditions hold and otherwise.
If is an invoke transaction, the witness must contain a set of valid endorsements, i.e., .
If is an invoke transaction, each output state must represent an update of a key included in an input state. Moreover, the version number for the output state must be greater than the version number for the input state representing the same key, i.e.,
In this section we analyze our model for the execution of the Fabric system. We start by showing that any legal Fabric execution modeled as described above results in a TDAG. Here, we consider as legal a Fabric execution that contains only blocks included in the blockchain that have been produced by the ordering service.
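The two conditions of the Fabric predicate can be sketched in Python as follows. This is a minimal sketch under assumptions: input and output states are given as dictionaries from keys to version numbers, the endorsement policy is passed in as a boolean function `policy_ok`, and all names are hypothetical.

```python
from typing import Callable, Dict, Set


def fabric_predicate(kind: str, inputs: Dict[str, int], outputs: Dict[str, int],
                     endorsements: Set[str],
                     policy_ok: Callable[[Set[str]], bool]) -> bool:
    if kind != "invoke":
        return True  # an init transaction is considered always valid
    if not policy_ok(endorsements):
        return False
    # Each output updates a key of an input state and increases its version.
    return all(k in inputs and ver > inputs[k] for k, ver in outputs.items())


def one_endorser(endorsements):
    return len(endorsements) >= 1  # toy policy: a single endorsement suffices


print(fabric_predicate("invoke", {"a": 0}, {"a": 1}, {"peer1"}, one_endorser))
```

The check only looks at the states attached to the transaction and at the policy, which is what makes the evaluation of the predicate local.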
Assume that is a legal Fabric execution. Then, the instance modeling is a TDAG.
PROOF: Here, we show that fulfills all the conditions required in Theorem 2.2.
The genesis state must not have any producing or observing edges and it must have a single consuming edge. This condition is ensured by our definition of initialization transaction.
Every state, other than the genesis, must have a single producing edge. Assume by contradiction that this condition is not fulfilled. This implies that there are at least two transactions and in that update the same key-value pair simultaneously. This, however, contradicts the assumption that a valid execution contains only transactions sorted by an ordering service.
Every state other than the genesis can have multiple successors, but at most one among them is connected with a consuming edge. The proof for this condition holds along the same lines as for the previous condition.
The graph must be weakly connected. Each new transaction reads or updates a key represented by an unconsumed state in the graph. Therefore, the overall graph is weakly connected.
The graph must not have cycles. Assume by contradiction that there is a cycle in . This necessarily implies that there are two transactions that produce the same state. However, as we argued before, this contradicts the fact that the ordering service establishes a total order among the transactions.
As we did with the Bitcoin model, here we show that validating Fabric transactions individually suffices to reason about properties of the complete Fabric execution. In particular, we show that if is a valid TDAG, then the highest version number (i.e., most recent) for any given key is represented in an unconsumed state of .
Consider that is a TDAG modeling a Fabric execution. Then, we define the states representing the most recent key-value pairs as the set of states with the highest version number for each key, i.e., .
Assume that is a valid TDAG and models a legal Fabric execution . Then, the unconsumed states of represent the most recent key-value pairs.
PROOF: Assume by contradiction that Theorem (B.7) does not hold. Then, there must exist at least one transaction where the field in the output state for a key is smaller than the field in the input state for the same key, i.e., . However, would return False, which contradicts the fact that is a valid TDAG.
Here we describe how we model an illustrative example of Fabric execution. We assume for simplicity that the endorsement policy requires a single endorsement for each transaction.
Throughout our description, we focus on the illustrative example depicted in Figure (4). In particular, Figure (4b) shows the Fabric execution , where , and . In a bit more detail, represents the initialization transaction that initializes the key-value pairs used later in the execution and it is included in the genesis block. Moreover, represents an invoke transaction that calls the function and represents another invoke transaction that calls in this case.
Now, we describe how we model such Fabric execution as an instance of as shown in Figure (4c):
, where and