NEAR Whitepaper

Nightshade: Near Protocol Sharding Design

Alex Skidanov and Illia Polosukhin · 2019


Sharding Basics


Let's start with the simplest approach to sharding.¹ In this approach, instead of running one blockchain, we run multiple, and call each such blockchain a "shard". Each shard has its own set of validators. Here and below we use the generic term "validator" to refer to participants that verify transactions and produce blocks, either by mining, as in Proof of Work, or via a voting-based mechanism. For now let's assume that the shards never communicate with each other. This design, though simple, is sufficient to outline some initial major challenges in sharding.

¹ This section was previously published at https://near.ai/shard1. If you read it before, skip to the next section.

1.1 Validator partitioning and Beacon chains

Say that the system comprises 10 shards. The first challenge is that with each shard having its own validators, each shard is now 10 times less secure than the entire chain. So if a non-sharded chain with X validators decides to hard-fork into a sharded chain and splits the X validators across 10 shards, each shard now only has X/10 validators, and corrupting one shard only requires corrupting 5.1% (51% / 10) of the total number of validators (see figure 1).

Figure 1: Splitting the validators across shards

This brings us to the second point: who chooses the validators for each shard? Controlling 5.1% of the validators is only damaging if all those validators are in the same shard. If validators can't choose which shard they get to validate in, a participant controlling 5.1% of the validators is highly unlikely to get all of them into the same shard, heavily reducing their ability to compromise the system.

Almost all sharding designs today rely on some source of randomness to assign validators to shards. Randomness on a blockchain is in itself a very challenging topic and is out of scope for this document; for now let's assume there's some source of randomness we can use. We will cover validator assignment in more detail in section 2.1.

Both randomness and validator assignment require computation that is not specific to any particular shard. For that computation, practically all existing designs have a separate blockchain that is tasked with performing operations necessary for the maintenance of the entire network. Besides generating random

numbers and assigning validators to the shards, these operations often also include receiving updates from shards and taking snapshots of them, processing stakes and slashing in Proof-of-Stake systems, and rebalancing shards when that feature is supported. Such a chain is called a Beacon chain in Ethereum, a Relay chain in PolkaDot, and the Cosmos Hub in Cosmos. Throughout this document we will refer to such a chain as a Beacon chain.

The existence of the Beacon chain brings us to the next interesting topic, quadratic sharding.

1.2 Quadratic sharding

Sharding is often advertised as a solution that scales infinitely with the number of nodes participating in the network operation. While it is in theory possible to design such a sharding solution, any solution that has the concept of a Beacon chain doesn't have infinite scalability. To understand why, note that the Beacon chain has to do some bookkeeping computation, such as assigning validators to shards or snapshotting shard chain blocks, that is proportional to the number of shards in the system. Since the Beacon chain is itself a single blockchain, with computation bounded by the computational capabilities of the nodes operating it, the number of shards is naturally limited.

However, the structure of a sharded network does bestow a multiplicative effect on any improvements to its nodes. Consider the case in which an arbitrary improvement is made to the efficiency of nodes in the network, allowing them faster transaction processing times. If the nodes operating the network, including the nodes in the Beacon chain, become four times faster, then each shard will be able to process four times more transactions, and the Beacon chain will be able to maintain four times more shards. The throughput across the system will increase by a factor of 4 × 4 = 16, hence the name quadratic sharding.
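The multiplicative effect above can be captured in a few lines. This is an illustrative sketch, not part of any protocol; the baseline numbers (10 shards at 20 tps each) are hypothetical.

```python
def total_throughput(tps_per_shard: float, num_shards: int, node_speedup: float) -> float:
    """Quadratic sharding: a k-fold node speedup lets every shard process
    k times more transactions, and lets the Beacon chain (itself a single
    chain) keep up with k times more shards, so throughput grows as k^2."""
    return (tps_per_shard * node_speedup) * (num_shards * node_speedup)

# Hypothetical baseline: 10 shards at 20 tps each.
baseline = total_throughput(20, 10, 1)   # 200 tps
improved = total_throughput(20, 10, 4)   # 16 times the baseline
```

A 4x node speedup thus yields a 16x system-wide throughput increase, exactly the 4 × 4 figure above.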
It is hard to provide an accurate measurement of how many shards are viable today, but it is unlikely that in any foreseeable future the throughput needs of blockchain users will outgrow the limitations of quadratic sharding. The sheer number of nodes necessary to operate such a volume of shards securely is likely orders of magnitude higher than the number of nodes operating all the blockchains combined today.

1.3 State sharding

Up until now we haven't defined very well what exactly is and is not separated when a network is divided into shards. Specifically, nodes in the blockchain perform three important tasks: not only do they 1) process transactions, they also 2) relay validated transactions and completed blocks to other nodes, and 3) store the state and the history of the entire network ledger. Each of these three tasks imposes a growing requirement on the nodes operating the network:

  1. The necessity to process transactions requires more compute power with the increased number of transactions being processed;
  2. The necessity to relay transactions and blocks requires more network bandwidth with the increased number of transactions being relayed;
  3. The necessity to store data requires more storage as the state grows. Importantly, unlike the processing power and network bandwidth, the storage requirement grows even if the transaction rate (the number of transactions processed per second) remains constant.

From the above list it might appear that the storage requirement would be the most pressing, since it is the only one that increases over time even if the number of transactions per second doesn't change, but in practice the most pressing requirement today is compute power. The entire state of Ethereum as of this writing is 100GB, easily manageable by most nodes. But the number of transactions Ethereum can process is around 20 per second, orders of magnitude less than what is needed for many practical use cases.

Zilliqa is the most well-known project that shards processing but not storage. Sharding of processing is an easier problem because each node has the entire state, meaning that contracts can freely invoke other contracts and read any data from the blockchain. Some careful engineering is needed to make sure updates from multiple shards updating the same parts of the state do not conflict. In those regards Zilliqa is taking a relatively simplistic approach².

While sharding of storage without sharding of processing has been proposed, it is extremely uncommon. Thus in practice sharding of storage, or State Sharding, almost always implies sharding of processing and sharding of network. Practically, under State Sharding the nodes in each shard build their own blockchain that contains transactions that affect only the local part of the global state that is assigned to that shard. Therefore, the validators in the shard only need to store their local part of the global state and only execute, and as such only relay, transactions that affect their part of the state.
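State sharding needs a deterministic account-to-shard mapping so that every node agrees, without coordination, which shard owns which part of the state. A minimal sketch, hashing account IDs modulo a fixed shard count; this scheme is a common illustration, not NEAR's actual assignment:

```python
import hashlib

NUM_SHARDS = 10  # hypothetical shard count, for illustration

def shard_of(account_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map an account to the shard that holds its part
    of the global state; every node computes the same answer."""
    digest = hashlib.sha256(account_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def is_cross_shard(sender: str, receiver: str) -> bool:
    """A payment touches two shards when the accounts map to different ones."""
    return shard_of(sender) != shard_of(receiver)
```

A validator in shard s then stores only the accounts with `shard_of(a) == s` and relays only the transactions touching them.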
This partition linearly reduces the requirements on compute power, storage, and network bandwidth alike, but introduces new problems, such as data availability and cross-shard transactions, both of which we will cover below.

1.4 Cross-shard transactions

The sharding model we have described so far is not very useful, because if individual shards cannot communicate with each other, they are no better than multiple independent blockchains. Even today, when sharding is not available, there's a huge demand for interoperability between various blockchains.

Let's for now only consider simple payment transactions, where each participant has an account on exactly one shard. If one wishes to transfer money from one account to another within the same shard, the transaction can be processed entirely by the validators in that shard. If, however, Alice, who resides on shard #1, wants to send money to Bob, who resides on shard #2, neither the validators on shard #1 (they won't be able to credit Bob's account) nor the validators on shard #2 (they won't be able to debit Alice's account) can process the entire transaction.

There are two families of approaches to cross-shard transactions:

• Synchronous: whenever a cross-shard transaction needs to be executed, the blocks in the multiple shards that contain state transitions related to the transaction are all produced at the same time, and the validators of those shards collaborate on executing such transactions.³

• Asynchronous: a cross-shard transaction that affects multiple shards is executed in those shards asynchronously, the "Credit" shard executing its half once it has sufficient evidence that the "Debit" shard has executed its portion. This approach tends to be more prevalent due to its simplicity and ease of coordination. This system is today proposed in Cosmos, Ethereum Serenity, Near, Kadena, and others.

A problem with the asynchronous approach is that if blocks are produced independently, there's a nonzero chance that one of the multiple blocks will be orphaned, making the transaction only partially applied. Consider figure 2, which depicts two shards, both of which encountered a fork, and a cross-shard transaction that was recorded in blocks A and X' correspondingly. If the chains A-B and V'-X'-Y'-Z' end up being canonical in the corresponding shards, the transaction is fully finalized. If A'-B'-C'-D' and V-X become canonical, then the transaction is fully abandoned, which is acceptable. But if, for example, A-B and V-X become canonical, then one part of the transaction is finalized and one is abandoned, creating an atomicity failure. We will cover how this problem is addressed in proposed protocols in the second part, when covering the changes to fork-choice rules and consensus algorithms proposed for sharded protocols.

Note that communication between chains is useful outside of sharded blockchains too.

² Our analysis of their approach can be found here: https://medium.com/nearprotocol/8f9efae0ce3b
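The asynchronous Debit/Credit split for a simple payment can be sketched as follows. This is an illustrative model, not NEAR's actual receipt format: the debit half runs on the sender's shard and emits a receipt, which stands in for the "sufficient evidence" the credit half requires.

```python
from dataclasses import dataclass

@dataclass
class Receipt:
    """Evidence, produced by the debit shard, that funds were deducted."""
    tx_id: str
    receiver: str
    amount: int

class Shard:
    def __init__(self):
        self.balances = {}
        self.applied = set()   # receipt ids already credited

    def debit(self, tx_id, sender, receiver, amount):
        """Debit half: runs on the sender's shard, emits a receipt,
        or None if the sender cannot cover the transfer."""
        if self.balances.get(sender, 0) < amount:
            return None
        self.balances[sender] -= amount
        return Receipt(tx_id, receiver, amount)

    def credit(self, receipt):
        """Credit half: runs on the receiver's shard once the receipt
        arrives; idempotent, so redelivery cannot double-credit."""
        if receipt.tx_id in self.applied:
            return
        self.applied.add(receipt.tx_id)
        self.balances[receipt.receiver] = (
            self.balances.get(receipt.receiver, 0) + receipt.amount)
```

The atomicity failure described below corresponds to the debit block finalizing while the block carrying the credit is orphaned (or vice versa); the code above deliberately has no protection against that.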
Interoperability between chains is a complex problem that many projects are trying to solve. In sharded blockchains the problem is somewhat easier, since the block structure and consensus are the same across shards, and there's a beacon chain that can be used for coordination. In a sharded blockchain, however, all the shard chains are the same, while in the global blockchains ecosystem there are lots of different blockchains, with different target use cases, decentralization and privacy guarantees.

Building a system in which a set of chains have different properties but use sufficiently similar consensus and block structure, and have a common beacon chain, could enable an ecosystem of heterogeneous blockchains that have a working interoperability subsystem. Such a system is unlikely to feature validator rotation, so some extra measures need to be taken to ensure security. Both Cosmos and PolkaDot are effectively such systems.⁴

³ The most detailed proposal known to the authors of this document is Merge Blocks, described here: https://ethresear.ch/t/merge-blocks-and-synchronous-cross-shard-state-execution/1240

⁴ Refer to this writeup by Zaki Manian from Cosmos: https://forum.cosmos.network/t/polkadot-vs-cosmos/1397/2 and this tweet-storm by the first author of this document: https://twitter.com/AlexSkidanov/status/1129511266660126720 for a detailed comparison of the two.

Figure 2: Asynchronous cross-shard transactions

1.5 Malicious behavior

In this section we review what adversarial behavior malicious validators can exercise if they manage to corrupt a shard. We will review classic approaches to avoiding shard corruption in section 2.1.

1.5.1 Malicious forks

A set of malicious validators might attempt to create a fork. Note that it doesn't matter whether the underlying consensus is BFT or not: corrupting a sufficient number of validators will always make it possible to create a fork. It is significantly more likely for more than 50% of a single shard to be corrupted than for more than 50% of the entire network to be corrupted (we will dive deeper into these probabilities in section 2.1).

As discussed in section 1.4, cross-shard transactions involve certain state changes in multiple shards, and the corresponding blocks in such shards that apply those state changes must either all be finalized (i.e. appear in the selected chains on their corresponding shards) or all be orphaned (i.e. not appear in the selected chains on their corresponding shards). Since generally the probability of shards being corrupted

is not negligible, we can't assume that forks won't happen, even if a byzantine consensus was reached among the shard validators or many blocks were produced on top of the block with the state change.

This problem has multiple solutions, the most common one being occasional cross-linking of the latest shard chain block to the beacon chain. The fork-choice rule in the shard chains is then changed to always prefer the chain that is cross-linked, and to only apply the shard-specific fork-choice rule for blocks that were published since the last cross-link.

1.5.2 Approving invalid blocks

A set of validators might attempt to create a block that applies the state transition function incorrectly. For example, starting with a state in which Alice has 10 tokens and Bob has 0 tokens, the block might contain a transaction that sends 10 tokens from Alice to Bob, but end up with a state in which Alice has 0 tokens and Bob has 1000 tokens, as shown on figure 3.

Figure 3: An example of an invalid block

In a classic non-sharded blockchain such an attack is not possible, since all the participants in the network validate all the blocks, and a block with such an invalid state transition will be rejected by both the other block producers and the participants of the network that do not create blocks. Even if the malicious validators continue creating blocks on top of such an invalid block faster than the honest validators build the correct chain, thus making the chain with the invalid block longer, it doesn't matter: every participant that is using the blockchain for any purpose validates all the blocks, and discards all the blocks built on top of the invalid one.

In figure 4 there are five validators, three of whom are malicious. They created an invalid block A', and then continued building new blocks on top of it. Two honest validators discarded A' as invalid and were building on top of the last valid block known to them, creating a fork.

Figure 4: Attempt to create an invalid block in a non-sharded blockchain

Since there are fewer validators in the honest fork, their chain is shorter. However, in a classic non-sharded blockchain every participant that uses the blockchain for any purpose is responsible for validating all the blocks they receive and recomputing the state. Thus any person who has any interest in the blockchain would observe that A' is invalid, would immediately discard B', C' and D' as well, and would take the chain A-B as the current longest valid chain.

In a sharded blockchain, however, no participant can validate all the transactions on all the shards, so they need some way to confirm that at no point in the history of any shard of the blockchain was an invalid block included. Note that unlike with forks, cross-linking to the Beacon chain is not a sufficient solution here, since the Beacon chain doesn't have the capacity to validate the blocks. It can only validate that a sufficient number of validators in that shard signed the block (and as such attested to its correctness). We will discuss solutions to this problem in section 2.2 below.
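The full-node check that makes this attack impossible in a non-sharded chain is simply re-execution. A minimal sketch, assuming a balance-map state and payment-only transactions (both simplifications for illustration):

```python
def apply_transfer(state, sender, receiver, amount):
    """Reference state transition function for a simple payment."""
    new_state = dict(state)
    if new_state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    new_state[sender] -= amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state

def block_is_valid(pre_state, tx, claimed_post_state):
    """A full node re-executes the block's transaction and rejects the
    block if the producer's claimed post-state does not match."""
    try:
        return apply_transfer(pre_state, *tx) == claimed_post_state
    except ValueError:
        return False
```

Run against the figure 3 example (Alice: 10, Bob: 0, a 10-token transfer), the honest post-state passes and the post-state crediting Bob 1000 tokens is rejected. The problem in a sharded chain is precisely that most participants never run this check for most shards.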


State Validity and Data Availability


The core idea in sharded blockchains is that most participants operating or using the network cannot validate blocks in all the shards.⁵ As such, whenever any participant needs to interact with a particular shard, they generally cannot download and validate the entire history of that shard.

⁵ This section, except for subsection 2.5.3, was previously published at https://near.ai/shard2. If you read it before, skip to the next section.

The partitioning aspect of sharding, however, raises a significant potential problem: without downloading and validating the entire history of a particular shard, the participant cannot necessarily be certain that the state with which

they interact is the result of some valid sequence of blocks, and that such a sequence of blocks is indeed the canonical chain in the shard. This problem doesn't exist in a non-sharded blockchain.

We will first present a simple solution to this problem that has been proposed by many protocols, and then analyze how this solution can break and what attempts have been made to address it.

2.1 Validator rotation

The naive solution to state validity is shown on figure 5: let's say we assume that the entire system has on the order of thousands of validators, out of which no more than 20% are malicious or will otherwise fail (such as by failing to be online to produce a block). Then if we sample 200 validators, the probability of more than 1/3 of them failing can for practical purposes be assumed to be zero.

Figure 5: Sampling validators

1/3 is an important threshold. There's a family of consensus protocols, called BFT consensus protocols, that guarantees that for as long as fewer than 1/3 of the participants fail, either by crashing or by acting in some way that violates the protocol, consensus will be reached.

With this assumption about the honest validator percentage, if the current set of validators in a shard provides us with some block, the naive solution assumes that the block is valid and that it is built on what the validators believed to be the canonical chain for that shard when they started validating. The validators learned the canonical chain from the previous set of validators, who by the same assumption built on top of the block which was the head of the canonical chain before that. By induction the entire chain is valid, and since no set of validators at any point produced forks, the naive solution is also certain that the current chain is the only chain in the shard. See figure 6 for a visualization.
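The "for practical purposes zero" claim can be checked directly: a committee drawn uniformly at random is a hypergeometric sample of the validator set. A sketch, with hypothetical totals (4000 validators, 20% bad, committees of 200):

```python
from math import comb

def p_sample_corrupted(total, bad, sample, threshold_num=1, threshold_den=3):
    """Probability that a uniformly random committee of `sample` validators,
    drawn from `total` of which `bad` are malicious, contains strictly more
    than threshold_num/threshold_den malicious members (hypergeometric tail)."""
    k_min = sample * threshold_num // threshold_den + 1
    tail = sum(comb(bad, k) * comb(total - bad, sample - k)
               for k in range(k_min, min(bad, sample) + 1))
    return tail / comb(total, sample)

# Hypothetical numbers: 4000 validators, 800 malicious, committees of 200.
p = p_sample_corrupted(4000, 800, 200)
```

With these numbers the committee needs 67+ malicious members while the expectation is only 40, so the tail probability is vanishingly small, which is exactly what the naive solution relies on.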

Figure 6: A blockchain with each block finalized via BFT consensus

This simple solution doesn't work if we assume that the validators can be corrupted adaptively, which is not an unreasonable assumption.⁶ Adaptively corrupting a single shard in a system with 1000 shards is significantly cheaper than corrupting the entire system. Therefore, the security of the protocol decreases linearly with the number of shards. To have certainty in the validity of a block, we must know that at no point in history did any shard in the system have a majority of validators colluding; with adaptive adversaries, we no longer have such certainty.

As we discussed in section 1.5, colluding validators can exercise two basic malicious behaviors: create forks and produce invalid blocks. Malicious forks can be addressed by blocks being cross-linked to the Beacon chain, which is generally designed to have significantly higher security than the shard chains. Producing invalid blocks, however, is a significantly more challenging problem to tackle.

2.2 State Validity

Consider figure 7, on which Shard #1 is corrupted and a malicious actor produces invalid block B. Suppose in this block B 1000 tokens were minted out of thin air on Alice's account. The malicious actor then produces valid block C (in the sense that the transactions in C are applied correctly) on top of B, obfuscating the invalid block B, and initiates a cross-shard transaction to Shard #2 that transfers those 1000 tokens to Bob's account. From this moment the improperly created tokens reside on an otherwise completely valid blockchain in Shard #2.

⁶ Read this article for details on how adaptive corruption can be carried out: https://medium.com/nearprotocol/d859adb464c8. For more details on adaptive corruption, read https://github.com/ethereum/wiki/wiki/Sharding-FAQ#what-are-the-security-models-that-we-are-operating-under

Some simple approaches to tackle this problem are:

Figure 7: A cross-shard transaction from a chain that has an invalid block

1. For the validators of Shard #2 to validate the block from which the transaction is initiated. This won't work even in the example above, since block C appears to be completely valid.

2. For the validators in Shard #2 to validate some large number of blocks preceding the block from which the transaction is initiated. Naturally, for any number of blocks N validated by the receiving shard, the malicious validators can create N+1 valid blocks on top of the invalid block they produced.

A promising idea to resolve this issue is to arrange the shards into an undirected graph in which each shard is connected to several other shards, and to only allow cross-shard transactions between neighboring shards (e.g. this is essentially how Vlad Zamfir's sharding works⁷, and a similar idea is used in Kadena's Chainweb [1]). If a cross-shard transaction is needed between shards that are not neighbors, such a transaction is routed through multiple shards. In this design a validator in each shard is expected to validate both all the blocks in their own shard and all the blocks in all the neighboring shards.

Consider figure 8, with 10 shards, each having four neighbors, and no two shards requiring more than two hops for cross-shard communication. Shard #2 is not only validating its own blockchain, but also the blockchains of all its neighbors, including Shard #1. So if a malicious actor on Shard #1 attempts to create an invalid block B, then builds block C on top of it and initiates a cross-shard transaction, such a cross-shard transaction will not go through, since Shard #2 will have validated the entire history of Shard #1, which will cause it to identify the invalid block B.

⁷ Read more about the design here: https://medium.com/nearprotocol/37e538177ed9
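The neighborhood idea can be sketched with a hypothetical 10-shard ring in which each shard neighbors the two shards on either side (the concrete graph here is our illustrative choice, not Chainweb's actual topology):

```python
# Hypothetical 10-shard ring: each shard neighbors the two shards on each side.
NUM = 10
NEIGHBORS = {s: {(s - 2) % NUM, (s - 1) % NUM, (s + 1) % NUM, (s + 2) % NUM}
             for s in range(NUM)}

def shards_validated_by(shard: int) -> set:
    """A validator in `shard` validates its own chain plus every neighbor's."""
    return {shard} | NEIGHBORS[shard]

def detects_invalid_block(receiving_shard: int, source_shard: int) -> bool:
    """The receiving shard catches an invalid block in the source chain
    only if its validators actually validate that chain."""
    return source_shard in shards_validated_by(receiving_shard)
```

A transaction from a direct neighbor is fully checked, but a transaction routed from two or more hops away lands on a shard that never validated the originating chain, which is the multi-shard-corruption gap discussed next.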

Figure 8: An invalid cross-shard transaction in a chainweb-like system that will get detected

While corrupting a single shard is no longer a viable attack, corrupting a few shards remains a problem. On figure 9 an adversary corrupting both Shard #1 and Shard #2 successfully executes a cross-shard transaction to Shard #3 with funds from an invalid block B.

Figure 9: An invalid cross-shard transaction in a chainweb-like system that will not get detected

Shard #3 validates all the blocks in Shard #2, but not in Shard #1, and has no way to detect the malicious block.

There are two major directions for properly solving state validity: fishermen

and cryptographic proofs of computation. 2.3 Fisherman The idea behind the first approach is the following: whenever a block header is communicated between chains for any purpose (such as cross-linking to the beacon chain, or a cross-shard transaction), there’s a period of time during which any honest validator can provide a proof that the block is invalid. There are various constructions that enable very succinct proofs that the blocks are invalid, so the communication overhead for the receiving nodes is way smaller than that of receiving a full block. With this approach for as long as there’s at least one honest validator in the shard, the system is secure. Figure 10: Fisherman This is the dominant approach (besides pretending the problem doesn’t exist) among the proposed protocols today. This approach, however, has two major disadvantages: 1. The challenge period needs to be sufficiently long for the honest validator to recognize a block was produced, download it, fully verify it, and prepare the challenge if the block is invalid. Introducing such a period would significantly slow down the cross-shard transactions. 2. The existence of the challenge protocol creates a new vector of attacks when malicious nodes spam with invalid challenges. An obvious solution to this problem is to make challengers deposit some amount of tokens that are returned if the challenge is valid. This is only a partial solution, as it might still be beneficial for the adversary to spam the system (and burn the deposits) with invalid challenges, for example to prevent the valid

challenge from a honest validator from going through. These attacks are called Grieving Attacks. See section 3.7.2 for a way to get around the latter point. 2.4 Succinct Non-interactive Arguments of Knowledge The second solution to multiple-shard corruption is to use some sort of cryptographic constructions that allow one to prove that a certain computation (such as computing a block from a set of transactions) was carried out correctly. Such constructions do exist, e.g. zk-SNARKs, zk-STARKs and a few others, and some are actively used in blockchain protocols today for private payments, most notably ZCash. The primary problem with such primitives is that they are notoriously slow to compute. E.g. Coda Protocol, that uses zk-SNARKs specifically to prove that all the blocks in the blockchain are valid, said in one of the interviews that it can take 30 seconds per transaction to create a proof (this number is probably smaller by now). Interestingly, a proof doesn’t need to be computed by a trusted party, since the proof not only attests to the validity of the computation it is built for, but to the validity of the proof itself. Thus, the computation of such proofs can be split among a set of participants with significantly less redundancy than would be necessary to perform some trustless computation. It also allows for participants who compute zk-SNARKs to run on special hardware without reducing the decentralization of the system. The challenges of zk-SNARKs, besides performance, are: 1. Dependency on less-researched and less-time-tested cryptographic primitives; 2. ”Toxic waste” — zk-SNARKs depend on a trusted setup in which a group of people performs some computation and then discards the intermediate values of that computation. If all the participants of the procedure collude and keep the intermediate values, fake proofs can be created; 3. Extra complexity introduced into the system design; 4. 
zk-SNARKs only work for a subset of possible computations, so a protocol with a Turing-complete smart contract language wouldn't be able to use SNARKs to prove the validity of the chain.

2.5 Data Availability

The second problem we will touch upon is data availability. Generally, nodes operating a particular blockchain are separated into two groups: full nodes, those that download every full block and validate every transaction, and light nodes, those that only download block headers and use Merkle proofs for the parts of the state and transactions they are interested in, as shown in figure 11.
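As a concrete illustration of how a light node uses Merkle proofs, here is a minimal, self-contained sketch (a toy SHA-256 tree over a power-of-two number of leaves; not NEAR's or Ethereum's actual tree layout): a node that stores only the root can verify that a transaction is included in a block.

```python
# Toy Merkle tree: a light node keeps only the root and verifies membership
# of a transaction using a logarithmic-size proof of sibling hashes.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Build the root over a power-of-two list of leaves."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect sibling hashes from the leaf up to the root."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1  # the other child of our parent
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    """Recompute the root from the leaf and the proof path."""
    acc = h(leaf)
    for sibling, is_left in proof:
        acc = h(sibling + acc) if is_left else h(acc + sibling)
    return acc == root

txs = [b"tx0", b"tx1", b"tx2", b"tx3"]
root = merkle_root(txs)  # this root is all a light node needs to store
assert verify(b"tx2", merkle_proof(txs, 2), root)
assert not verify(b"txX", merkle_proof(txs, 2), root)
```

The proof is logarithmic in the number of transactions, which is why light nodes can check individual transactions without downloading full blocks.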

Figure 11: Merkle Tree

Now if a majority of full nodes collude, they can produce a block, valid or invalid, and send its hash to the light nodes, but never disclose the full content of the block. There are various ways they can benefit from this. For example, consider figure 12:

Figure 12: Data Availability problem

There are three blocks: the previous, A, is produced by honest validators; the current, B, has colluding validators; and the next, C, will also be produced by honest validators (the blockchain is depicted in the bottom right corner). You are a merchant. The validators of the current block (B) received block A from the previous validators, computed a block in which you receive money,

and sent you a header of that block with a Merkle proof of the state in which you have the money (or a Merkle proof of a valid transaction that sends the money to you). Confident the transaction is finalized, you provide the service. However, the validators never distribute the full content of block B to anyone. As such, the honest validators of block C can't retrieve the block, and are either forced to stall the system or to build on top of A, depriving you as a merchant of the money.

When we apply the same scenario to sharding, the definitions of full and light nodes generally apply per shard: validators in each shard download every block in that shard and validate every transaction in that shard, but other nodes in the system, including those that snapshot shard chain state into the beacon chain, only download the headers. Thus the validators in the shard are effectively full nodes for that shard, while other participants in the system, including the beacon chain, operate as light nodes.

For the fisherman approach we discussed above to work, honest validators need to be able to download blocks that are cross-linked to the beacon chain. If malicious validators cross-linked the header of an invalid block (or used it to initiate a cross-shard transaction), but never distributed the block, the honest validators have no way to craft a challenge.

We will cover three approaches to address this problem that complement each other.

2.5.1 Proofs of Custody

The most immediate problem to be solved is whether a block is available once it is published. One proposed idea is to have so-called Notaries that rotate between shards more often than validators, and whose only job is to download a block and attest to the fact that they were able to download it.
They can be rotated more frequently because, unlike the validators, they don't need to download the entire state of the shard; the validators cannot be rotated frequently since they must download the state of the shard each time they rotate, as shown in figure 13.

The problem with this naive approach is that it is impossible to prove later whether the Notary was or was not able to download the block, so a Notary can choose to always attest that they were able to download the block without even attempting to retrieve it. One solution is for Notaries to provide some evidence, or to stake some amount of tokens, attesting that the block was downloaded. One such solution is discussed here: https://ethresear.ch/t/1-bit-aggregation-friendly-custody-bonds/2236.

2.5.2 Erasure Codes

When a particular light node receives the hash of a block, to increase its confidence that the block is available it can attempt to download a few random pieces of the block. This is not a complete solution: unless the light nodes collectively download the entire block, the malicious block producers can choose

Figure 13: Validators need to download state and thus cannot be rotated frequently

to withhold the parts of the block that were not downloaded by any light node, thus still making the block unavailable. One solution is to use a construction called erasure codes, which makes it possible to recover the full block even if only some part of the block is available, as shown in figure 14.

Figure 14: Merkle tree built on top of erasure coded data

Both Polkadot and Ethereum Serenity have designs around this idea that provide a way for light nodes to be reasonably confident the blocks are available. The Ethereum Serenity approach has a detailed description in [2].
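To make the erasure-coding idea concrete, here is a toy Reed-Solomon-style sketch over a prime field: k data symbols are encoded as n parts such that any k of the parts reconstruct the data. Real systems code over GF(2^8) and operate on byte arrays; the field, symbol sizes, and function names here are purely illustrative.

```python
# Toy erasure code: data symbols are evaluations of a polynomial, so that
# ANY k of the n parts recover the data via Lagrange interpolation.
P = 2**31 - 1  # prime modulus; real systems use GF(2^8) on bytes

def lagrange_eval(points, x):
    """Evaluate, at x, the unique degree-(k-1) polynomial through k points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Systematic code: parts at x = 1..k carry the data, x = k+1..n parity."""
    k = len(data)
    pts = list(zip(range(1, k + 1), data))
    return [(x, data[x - 1] if x <= k else lagrange_eval(pts, x))
            for x in range(1, n + 1)]

def decode(any_k_parts, k):
    """Recover the k data symbols from any k (x, y) parts."""
    return [lagrange_eval(any_k_parts, x) for x in range(1, k + 1)]

block = [42, 7, 19]            # k = 3 data symbols
parts = encode(block, 7)       # n = 7 parts, e.g. one per validator
# any 3 parts reconstruct the block, e.g. parts 2, 5 and 7:
assert decode([parts[1], parts[4], parts[6]], 3) == block
```

The same threshold property is what both Polkadot's and Serenity's schemes rely on: withholding fewer than n - k + 1 parts cannot make the block unavailable.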

2.5.3 Polkadot's approach to data availability

In Polkadot, like in most sharded solutions, each shard (called a parachain) snapshots its blocks to the beacon chain (called the relay chain). Say there are 2f + 1 validators on the relay chain. The block producers of the parachain blocks, called collators, once a parachain block is produced, compute an erasure coded version of the block that consists of 2f + 1 parts such that any f parts are sufficient to reconstruct the block. They then distribute one part to each validator on the relay chain. A particular relay chain validator only signs a relay chain block if they have their part for each parachain block that is snapshotted to that relay chain block. Thus, if a relay chain block has signatures from 2f + 1 validators, then as long as no more than f of them violated the protocol, each parachain block can be reconstructed by fetching the parts from the validators that follow the protocol. See figure 15.

Figure 15: Polkadot's data availability

2.5.4 Long term data availability

Note that all the approaches discussed above only attest to the fact that a block was published at all, and is available now. Blocks can later become unavailable for a variety of reasons: nodes going offline, nodes intentionally erasing historical data, and others. A whitepaper worth mentioning that addresses this issue is Polyshard [3], which uses erasure codes to make blocks available across shards even if several shards completely lose their data. Unfortunately, their specific approach requires all the shards to download blocks from all other shards, which is prohibitively expensive.

Long term availability is not as pressing an issue: since no participant in the system is expected to be capable of validating all the chains in all the

shards, the security of the sharded protocol needs to be designed in such a way that the system is secure even if some old blocks in some shards become completely unavailable.
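As a quick sanity check of the counting argument in section 2.5.3 (a toy calculation with illustrative parameters, not Polkadot's actual configuration):

```python
# Numeric check: with 2f+1 relay-chain validators each holding one
# erasure-coded part, a fully signed parachain block stays reconstructible
# as long as at most f validators violate the protocol.
f = 10
relay_validators = 2 * f + 1   # each holds one of the 2f+1 parts
parts_needed = f               # the code's reconstruction threshold
violators = f                  # worst case tolerated by the argument
honest_part_holders = relay_validators - violators
assert honest_part_holders >= parts_needed  # f+1 >= f: block recoverable
```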



Nightshade


3.1 From shard chains to shard chunks

The sharding model with shard chains and a beacon chain is very powerful but has certain complexities. In particular, the fork choice rule needs to be executed in each chain separately, and the fork choice rules of the shard chains and the beacon chain must be built differently and tested separately.

In Nightshade we model the system as a single blockchain, in which each block logically contains all the transactions for all the shards, and changes the whole state of all the shards. Physically, however, no participant downloads the full state or the full logical block. Instead, each participant of the network only maintains the state that corresponds to the shards for which they validate transactions, and the list of all the transactions in the block is split into physical chunks, one chunk per shard.

Under ideal conditions each block contains exactly one chunk per shard, which roughly corresponds to the model with shard chains in which the shard chains produce blocks at the same speed as the beacon chain. However, due to network delays some chunks might be missing, so in practice each block contains either one or zero chunks per shard. See section 3.3 for details on how blocks are produced.

Figure 16: A model with shard chains on the left and with one chain having blocks split into chunks on the right
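The "one chain, many chunks" layout can be sketched with hypothetical data structures (field names are illustrative, not NEAR's actual ones):

```python
# Illustrative sketch of a main-chain block holding one chunk slot per shard.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ChunkHeader:
    shard_id: int
    prev_state_root: bytes    # shard state root before this chunk's txs
    tx_root: bytes            # merkle root of the chunk's transactions
    encoded_parts_root: bytes # merkle root of the erasure-coded parts

@dataclass
class Block:
    height: int
    prev_hash: bytes
    # exactly one slot per shard; None when the chunk for that shard
    # missed this block due to network delays
    chunks: List[Optional[ChunkHeader]]

NUM_SHARDS = 4
genesis = Block(height=0, prev_hash=b"\x00" * 32,
                chunks=[None] * NUM_SHARDS)
assert len(genesis.chunks) == NUM_SHARDS
```

The block itself stays small: it carries only the chunk headers, while the chunk bodies are distributed separately (see sections 3.3 and 3.4).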

3.2 Consensus

The two dominant approaches to consensus in blockchains today are the longest (or heaviest) chain, in which the chain that has the most work or stake used to build it is considered canonical, and BFT, in which for each block some set of validators reach a BFT consensus. In the protocols proposed recently the latter is the more dominant approach, since it provides immediate finality, while in the longest chain more blocks need to be built on top of a block to ensure its finality. Often, for meaningful security, building a sufficient number of blocks takes on the order of hours.

Using BFT consensus on each block also has disadvantages, such as:

1. BFT consensus involves a considerable amount of communication. While recent advances allow the consensus to be reached in time linear in the number of participants (see e.g. [4]), it is still a noticeable overhead per block;

2. It is infeasible for all the network participants to participate in the BFT consensus for each block, so usually only a randomly sampled subset of participants reaches the consensus. A randomly sampled set can, in principle, be adaptively corrupted, and a fork can in theory be created. The system either needs to be modelled to be ready for such an event, and thus still have a fork-choice rule besides the BFT consensus, or be designed to shut down in such an event. It is worth mentioning that some designs, such as Algorand [5], significantly reduce the probability of adaptive corruption;

3. Most importantly, the system stalls if 1/3 or more of all the participants are offline. Thus, any temporary network glitch or a network split can completely stall the system. Ideally the system must be able to continue to operate for as long as at least half of the participants are online (heaviest chain-based protocols continue operating even if less than half of the participants are online, but the desirability of this property is more debatable within the community).
A hybrid model, in which the consensus used is some sort of heaviest chain but some blocks are periodically finalized using a BFT finality gadget, maintains the advantages of both models. Such BFT finality gadgets are Casper FFG [6] used in Ethereum 2.0 8, Casper CBC (see https://vitalik.ca/general/2018/12/05/cbc_casper.html) and GRANDPA (see https://medium.com/polkadot-network/d08a24a021b5) used in Polkadot.

8Also see the whiteboard session with Justin Drake for an in-depth overview of Casper FFG, and how it is integrated with the GHOST heaviest chain consensus, here: https://www.youtube.com/watch?v=S262StTwkmo

Nightshade uses the heaviest chain consensus. Specifically, when a block producer produces a block (see section 3.3), they can collect signatures from other block producers and validators attesting to the previous block. See section 3.8 for details on how such a large number of signatures is aggregated. The weight

of a block is then the cumulative stake of all the signers whose signatures are included in the block. The weight of a chain is the sum of the block weights.

On top of the heaviest chain consensus we use a finality gadget that uses the attestations to finalize the blocks. To reduce the complexity of the system, we use a finality gadget that doesn't influence the fork choice rule in any way, and instead only introduces extra slashing conditions, such that once a block is finalized by the finality gadget, a fork is impossible unless a very large percentage of the total stake is slashed. Casper CBC is such a finality gadget, and we presently model with Casper CBC in mind. We also work on a separate BFT protocol called TxFlow. At the time of writing this document it is unclear if TxFlow will be used instead of Casper CBC. We note, however, that the choice of the finality gadget is largely orthogonal to the rest of the design.

3.3 Block production

In Nightshade there are two roles: block producers and validators. At any point the system contains w block producers, w = 100 in our model, and wv validators, in our model v = 100, wv = 10,000. The system is Proof-of-Stake, meaning that both block producers and validators have some amount of internal currency (referred to as "tokens") locked for a duration of time far exceeding the time they spend performing their duties of building and validating the chain. As with all Proof of Stake systems, not all the w block producers and not all the wv validators are different entities, since that cannot be enforced. Each of the w block producers and the wv validators, however, does have a separate stake.

The system contains n shards, n = 1000 in our model. As mentioned in section 3.1, in Nightshade there are no shard chains; instead all the block producers and validators are building a single blockchain, which we refer to as the main chain.
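The weight rule from section 3.2 can be sketched as follows (stakes and attestation sets are illustrative):

```python
# Heaviest-chain sketch: a block's weight is the total stake of the
# participants whose attestations it includes; a chain's weight is the
# sum of its blocks' weights, and the heaviest chain is canonical.
stake = {"bp1": 50, "bp2": 30, "bp3": 20, "v1": 10}  # illustrative stakes

def block_weight(attestations):
    return sum(stake[s] for s in attestations)

def chain_weight(chain):
    """chain is a list of blocks, each a list of attesting participants."""
    return sum(block_weight(b) for b in chain)

fork_a = [["bp1", "bp2"], ["bp1", "bp3", "v1"]]  # weight 80 + 80 = 160
fork_b = [["bp2", "bp3"], ["bp2"]]               # weight 50 + 30 = 80
canonical = max([fork_a, fork_b], key=chain_weight)
assert canonical is fork_a
```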
The state of the main chain is split into n shards, and each block producer and validator at any moment has downloaded locally only a subset of the state that corresponds to some subset of the shards, and only processes and validates transactions that affect those parts of the state.

To become a block producer, a participant of the network locks some large amount of tokens (a stake). The maintenance of the network is done in epochs, where an epoch is a period of time on the order of days. The participants with the w largest stakes at the beginning of a particular epoch are the block producers for that epoch. Each block producer is assigned to sw shards (say sw = 40, which would make sww/n = 4 block producers per shard). The block producer downloads the state of the shards they are assigned to before the epoch starts, and throughout the epoch collects transactions that affect those shards and applies them to the state.

For each block b on the main chain, and for every shard s, one of the block producers assigned to s is responsible for producing the part of b related to the shard. The part of b related to shard s is called a chunk, and contains the list of the transactions for the shard to be included in b, as well as the merkle

root of the resulting state. b will ultimately only contain a very small header of the chunk, namely the merkle root of all the applied transactions (see section 3.7.1 for exact details) and the merkle root of the final state. Throughout the rest of the document we often refer to the block producer that is responsible for producing a chunk at a particular time for a particular shard as a chunk producer. A chunk producer is always one of the block producers.

The block producers and the chunk producers rotate each block according to a fixed schedule. The block producers have an order and repeatedly produce blocks in that order. E.g. if there are 100 block producers, the first block producer is responsible for producing blocks 1, 101, 201, etc., and the second is responsible for producing blocks 2, 102, 202, etc. Since chunk production, unlike block production, requires maintaining the state, and only sww/n block producers maintain the state of each shard, correspondingly only those sww/n block producers rotate to create its chunks. E.g. with the constants above, with four block producers assigned to each shard, each block producer will be creating chunks once every four blocks.

3.4 Ensuring data availability

To ensure data availability we use an approach similar to that of Polkadot described in section 2.5.3. Once a block producer produces a chunk, they create an erasure coded version of it with an optimal (w, ⌊w/6 + 1⌋) block code of the chunk. They then send one piece of the erasure coded chunk (we call such pieces chunk parts, or just parts) to each block producer. We compute a merkle tree that contains all the parts as the leaves, and the header of each chunk contains the merkle root of such a tree.

The parts are sent to the validators via onepart messages. Each such message contains the chunk header, the ordinal of the part, and the part contents.
The message also contains the signature of the block producer who produced the chunk, and the merkle path to prove that the part corresponds to the header and is produced by the proper block producer.

Once a block producer receives a main chain block, they first check whether they have onepart messages for each chunk included in the block. If not, the block is not processed until the missing onepart messages are retrieved. Once all the onepart messages are received, the block producer fetches the remaining parts from their peers and reconstructs the chunks for which they hold the state. The block producer doesn't process a main chain block if for at least one chunk included in the block they don't have the corresponding onepart message, or if for at least one shard for which they maintain the state they cannot reconstruct the entire chunk.

For a particular chunk to be available it is enough that ⌊w/6⌋ + 1 of the block producers have their parts and serve them. Thus, for as long as the number of malicious actors doesn't exceed ⌊w/3⌋, no chain that has more than half of the block producers building it can have unavailable chunks.
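The availability argument above can be checked numerically for w = 100: if more than half of the block producers built on a block and at most ⌊w/3⌋ are malicious, the honest signers alone hold enough parts to reconstruct every chunk.

```python
# Numeric check of the availability claim, with w = 100 block producers.
w = 100
parts_needed = w // 6 + 1  # threshold of the (w, floor(w/6)+1) block code
signers = w // 2 + 1       # "more than half" of the producers signed
malicious = w // 3         # worst case allowed by the assumption
honest_signers = signers - malicious
# each honest signer processed the block only after holding its part:
assert honest_signers >= parts_needed  # 18 >= 17: the chunk is recoverable
```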

Figure 17: Each block contains one or zero chunks per shard, and each chunk is erasure coded. Each part of the erasure coded chunk is sent to a designated block producer via a special onepart message

3.4.1 Dealing with lazy block producers

If a block producer has a block for which a onepart message is missing, they might choose to still sign on it, because if the block ends up being on chain it will maximize the reward for the block producer. There's no risk for the block producer, since it is impossible to prove later that the block producer didn't have the onepart message.

To address this, we make each chunk producer, when creating the chunk, choose a color (red or blue) for each part of the future encoded chunk, and store the bitmask of assigned colors in the chunk before it is encoded. Each onepart message then contains the color assigned to the part, and the color is used when computing the merkle root of the encoded parts. If the chunk producer deviates from the protocol, it can be easily proven, since either the merkle root will not correspond to the onepart messages, or the colors in the onepart messages that correspond to the merkle root will not match the mask in the chunk.

When a block producer signs on a block, they include a bitmask of all the red parts they received for the chunks included in the block. Publishing an incorrect bitmask is slashable behavior. If a block producer hasn't received a onepart message, they have no way of knowing the color of the part, and thus have a 50% chance of being slashed if they attempt to blindly sign the block.

3.5 State transition application

The chunk producers only choose which transactions to include in the chunk, but do not apply the state transition when they produce a chunk. Correspondingly,

the chunk header contains the merkle root of the merkelized state as of before the transactions in the chunk are applied. The transactions are only applied when a full block that includes the chunk is processed. A participant only processes a block if:

1. The previous block was received and processed;

2. For each chunk for which the participant doesn't maintain the state, they have seen the onepart message;

3. For each chunk for which the participant does maintain the state, they have the full chunk.

Once the block is being processed, for each shard for which the participant maintains the state, they apply the transactions and compute the new state as of after the transactions are applied, after which they are ready to produce the chunks for the next block, if they are assigned to any shard, since they have the merkle root of the new merkelized state.

3.6 Cross-shard transactions and receipts

If a transaction needs to affect more than one shard, it needs to be consecutively executed in each shard separately. The full transaction is sent to the first shard affected, and once the transaction is included in the chunk for that shard, and is applied after the chunk is included in a block, it generates a so-called receipt transaction that is routed to the next shard in which the transaction needs to be executed. If more steps are required, the execution of the receipt transaction generates a new receipt transaction, and so on.

3.6.1 Receipt transaction lifetime

It is desirable that the receipt transaction is applied in the block that immediately follows the block in which it was generated. The receipt transaction is only generated after the previous block was received and applied by the block producers that maintain the originating shard, and needs to be known by the time the chunk for the next block is produced by the block producers of the destination shard.
Thus, the receipt must be communicated from the source shard to the destination shard in the short time frame between those two events. Let A be the last produced block which contains a transaction t that generates a receipt r. Let B be the next produced block (i.e. a block that has A as its previous block) that we want to contain r. Let t be in shard a and r be in shard b. The lifetime of the receipt, also depicted on figure 18, is the following:

Producing and storing the receipts. The chunk producer cpa for shard a receives the block A, applies the transaction t and generates the receipt r. cpa then stores all such produced receipts in its internal persistent storage, indexed by the source shard id.

Distributing the receipts. Once cpa is ready to produce the chunk for shard a for block B, they fetch all the receipts generated by applying the transactions from block A for shard a, and include them in the chunk for shard a in block B. Once such a chunk is generated, cpa produces its erasure coded version and all the corresponding onepart messages. cpa knows which block producers maintain the full state for which shards. For a particular block producer bp, cpa includes, in the onepart message sent when distributing the chunk for shard a in block B, the receipts that resulted from applying transactions in block A for shard a and that have any of the shards that bp cares about as their destination (see figure 17, which shows receipts included in the onepart message).

Receiving the receipts. Remember that the participants (both block producers and validators) do not process blocks until they have onepart messages for each chunk included in the block. Thus, by the time any particular participant applies the block B, they have all the onepart messages that correspond to chunks in B, and thus they have all the incoming receipts whose destination is a shard for which they maintain the state. When applying the state transition for a particular shard, the participant applies both the receipts that they have collected for the shard in the onepart messages, and all the transactions included in the chunk itself.

Figure 18: The lifetime of a receipt transaction

3.6.2 Handling too many receipts

It is possible that the number of receipts that target a particular shard in a particular block is too large to be processed. For example, consider figure 19, in which each transaction in each shard generates a receipt that targets shard 1. By the next block the number of receipts that shard 1 needs to process is comparable to the load that all the shards combined processed while handling the previous block.
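The distribution step of section 3.6.1 above can be sketched as follows (structures and names are illustrative): cpa places each receipt into the onepart messages of exactly those block producers that maintain the receipt's destination shard.

```python
# Illustrative routing of receipts into per-producer onepart messages.
def route_receipts(receipts, producer_shards):
    """receipts: list of (dest_shard, receipt); producer_shards: dict
    mapping block producer -> set of shards whose state they maintain."""
    out = {bp: [] for bp in producer_shards}
    for dest, r in receipts:
        for bp, shards in producer_shards.items():
            if dest in shards:
                out[bp].append(r)  # goes into bp's onepart message
    return out

receipts = [(2, "r1"), (3, "r2"), (2, "r3")]
producer_shards = {"bp1": {1, 2}, "bp2": {3, 4}, "bp3": {5}}
msgs = route_receipts(receipts, producer_shards)
assert msgs == {"bp1": ["r1", "r3"], "bp2": ["r2"], "bp3": []}
```

Because a producer refuses to process a block before seeing all its onepart messages, this routing guarantees every producer holds its incoming receipts by the time it applies block B.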

Figure 19: If all the receipts target the same shard, the shard might not have the capacity to process them

To address this we use a technique similar to that used in QuarkChain 9.

9See the whiteboard episode with QuarkChain here: https://www.youtube.com/watch?v=opEtG6NM4x4, in which the approach to cross-shard transactions is discussed, among other things

Specifically, for each shard, the last block B and the last shard s within that block from which the receipts were applied are recorded. When the new chunk is produced, the receipts are applied in order, first from the remaining shards in B, and then from the blocks that follow B, until the new chunk is full. Under normal circumstances with a balanced load this will generally result in all the receipts being applied (and thus the last shard of the last block will be recorded for each chunk), but during times when the load is not balanced, and a particular shard receives disproportionately many receipts, this technique allows them to be processed while respecting the limits on the number of transactions included.

Note that if such an unbalanced load remains for a long time, the delay from receipt creation until application can continue growing indefinitely. One way to address this is to drop any transaction that creates a receipt targeting a shard whose processing delay exceeds some constant (e.g. one epoch).

Consider figure 20. By block B shard 4 cannot process all the receipts, so it only processes receipts originating from up to shard 3 in block A, and records that. In block C the receipts up to shard 5 in block B are included, and then by block D the shard catches up, processing all the remaining receipts in block B and all the receipts from block C.

3.7 Chunks validation

A chunk produced for a particular shard (or a shard block produced for a particular shard chain in the model with shard chains) can only be validated by the

Figure 20: Delayed receipts processing

participants that maintain the state. They can be block producers, validators, or just external witnesses that downloaded the state and validate the shard in which they store assets. In this document we assume that the majority of the participants cannot store the state for a large fraction of the shards. It is worth mentioning, however, that there are sharded blockchains that are designed with the assumption that most participants do have the capacity to store the state for, and validate, most of the shards, such as QuarkChain.

Since only a fraction of the participants have the state needed to validate the shard chunks, it is possible to adaptively corrupt just the participants that have the state, and apply an invalid state transition. Multiple sharding designs were proposed that sample validators every few days, and within a day any block in the shard chain that has more than 2/3 of the signatures of the validators assigned to such shard is immediately considered final. With such an approach an adaptive adversary only needs to corrupt 2n/3 + 1 of the validators in a shard chain to apply an invalid state transition, which, while likely hard to pull off, is not a level of security sufficient for a public blockchain.

As discussed in section 2.3, the common approach is to allow a certain window of time after a block is created for any participant that has the state (whether it's a block producer, a validator, or an external observer) to challenge its validity. Such participants are called fishermen. For a fisherman to be able to challenge an invalid block, it must be ensured that such a block is available to them. Data availability in Nightshade is discussed in section 3.4.

In Nightshade, once a block is produced, the chunks have not yet been validated by anyone but their actual chunk producers. In particular, the block producer that suggested the block naturally didn't have the state for most of the shards, and

was not able to validate the chunks. When the next block is produced, it contains attestations (see section 3.2) of multiple block producers and validators, but since the majority of block producers and validators do not maintain state for most shards as well, a block with just one invalid chunk will collect significantly more than half of the attestations and will continue being on the heaviest chain. To address this issue, we allow any participant that maintains the state of a shard to submit a challenge on-chain for any invalid chunk produced in that shard. 3.7.1 State validity challenge Once a participant detects that a particular chunk is invalid, they need to provide a proof that the chunk is invalid. Since the majority of the network participants do not maintain the state for the shard in which the invalid chunk is produced, the proof needs to have sufficient information to confirm the block is invalid without having the state. We set a limit Ls of the amount of state (in bytes) that a single transaction can cumulatively read or write. Any transaction that touches more than Ls state is considered to be invalid. Remember from the section 3.5 that the chunk in a particular block B only contains the transactions to be applied, but not the new state root. The state root included in the chunk in block B is the state root before applying such transactions, but after applying the transactions from the last chunk in the same shard before the block B. A malicious actor that wishes to apply an invalid state transition would include an incorrect state root in block B that doesn’t correspond to the state root that results from applying the transactions in the preceding chunk. We extend the information that a chunk producer includes in the chunk. Instead of just including the state after applying all the transactions, it instead includes a state root after applying each contiguous set of transactions that collectively read and write Ls bytes of state. 
With this information, for the fisherman to create a challenge that a state transition was applied incorrectly, it is sufficient to find the first such invalid state root, and include just the Ls bytes of state that are affected by the transactions between the last state root (which was valid) and the current state root, together with the merkle proofs. Any participant can then validate the transactions in the segment and confirm that the chunk is invalid. Similarly, if the chunk producer attempted to include transactions that read and write more than Ls bytes of state, for the challenge it is enough to include the first Ls bytes they touch with the merkle proofs, which is enough to apply the transactions and confirm that there's a moment when an attempt to read or write content beyond Ls bytes is made.
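The intermediate state roots and the fisherman's check above can be sketched as follows. This is a toy model: a flat dict stands in for the merkelized state, each transaction self-reports the bytes it touches, and `LS` plays the role of the Ls limit; a real implementation would meter reads/writes against a merkle trie.

```python
import hashlib

LS = 64  # hypothetical per-segment limit on state bytes read/written

def state_root(state):
    """Stand-in for a merkle root: a hash over the sorted state items."""
    h = hashlib.sha256()
    for key, value in sorted(state.items()):
        h.update(f"{key}={value};".encode())
    return h.hexdigest()

def segment_roots(state, txs):
    """Chunk-producer side: apply the transactions, emitting an
    intermediate state root each time the running segment has touched
    LS bytes of state; the final root closes the last (partial) segment."""
    roots, touched = [], 0
    for key, value, size in txs:
        state[key] = value
        touched += size
        if touched >= LS:
            roots.append(state_root(state))
            touched = 0
    roots.append(state_root(state))
    return roots

def first_invalid_segment(claimed_roots, state, txs):
    """Fisherman side: recompute the roots and return the index of the
    first segment whose claimed root doesn't match, or None if all do.
    A challenge then only needs the <= LS bytes that segment touches,
    plus merkle proofs, for anyone to re-check it statelessly."""
    for i, (claimed, actual) in enumerate(zip(claimed_roots, segment_roots(state, txs))):
        if claimed != actual:
            return i
    return None

txs = [("a", 1, 40), ("b", 2, 40), ("c", 3, 40)]
honest = segment_roots({}, txs)
forged = list(honest)
forged[1] = "00" * 32  # a chunk producer lying about a state root
```

A challenger who finds `first_invalid_segment(...) == 1` includes only the state touched between roots 0 and 1 in the on-chain challenge.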

3.7.2 Fishermen and fast cross-shard transactions

As discussed in section 2.3, once we assume that the shard chunks (or shard blocks in the model with shard chains) can be invalid, and introduce a challenge period, finality is negatively affected, and with it cross-shard communication. In particular, the destination shard of any cross-shard transaction cannot be certain that the originating shard chunk or block is final until the challenge period is over (see figure 21).

Figure 21: Waiting for the challenge period before applying a receipt

The way to make the cross-shard transactions near-instantaneous is for the destination shard to not wait for the challenge period after the source shard transaction is published and to apply the receipt transaction immediately, but then roll back the destination shard together with the source shard if the originating chunk or block is later found to be invalid (see figure 22). This applies very naturally to the Nightshade design, in which the shard chains are not independent; instead the shard chunks are all published together in the same main chain block. If any chunk is found to be invalid, the entire block with that chunk is considered invalid, together with all the blocks built on top of it. See figure 23.

Both of the above approaches provide atomicity assuming that the challenge period is sufficiently long. We use the latter approach, since providing fast cross-shard transactions under normal circumstances outweighs the inconvenience of the destination shard rolling back due to an invalid state transition in one of the source shards, which is an extremely rare event.

3.7.3 Hiding validators

The existence of the challenges already significantly reduces the probability of adaptive corruption, since to finalize a chunk with an invalid state transition past

the challenge period, the adaptive adversary needs to corrupt all the participants that maintain the state of the shard, including all the validators.

Figure 22: Applying receipts immediately and rolling back the destination chain if the source chain had an invalid block

Figure 23: Fisherman challenge in Nightshade

Estimating the likelihood of such an event is extremely complex, since no sharded blockchain has been live sufficiently long for any such attack to be attempted. We argue that the probability, while extremely low, is still too high for a system that is expected to execute millions of transactions and run world-wide financial operations. There are two main reasons for this belief:

1. Most of the validators of Proof-of-Stake chains and miners of the

Proof-of-Work chains are primarily incentivized by the financial upside. If an adaptive adversary offers them more money than the expected return from operating honestly, it is reasonable to expect that many validators will accept the offer.

2. Many entities validate Proof-of-Stake chains professionally, and it is expected that a large percentage of the stake in any chain will come from such entities. The number of such entities is sufficiently small for an adaptive adversary to get to know most of them personally and have a good understanding of their inclination to be corrupted.

We take one step further in reducing the probability of adaptive corruption by hiding which validators are assigned to which shard. The idea is remotely similar to the way Algorand [5] conceals validators.

It is critical to note that even if the validators are concealed, as in Algorand or as described below, adaptive corruption is still in theory possible. While the adaptive adversary doesn't know which participants will create or validate a block or a chunk, the participants themselves do know that they will perform such a task, and have a cryptographic proof of it. Thus, the adversary can broadcast their intent to corrupt, and pay any participant that provides such a cryptographic proof. We note, however, that since the adversary doesn't know which validators are assigned to the shard they want to corrupt, they have no choice but to broadcast their intent to corrupt a particular shard to the entire community. At that point it is economically beneficial for any honest participant to spin up a full node that validates that shard, since there's a high chance of an invalid block appearing in that shard, which is an opportunity to create a challenge and collect the associated reward.

To not reveal the validators that are assigned to a particular shard, we do the following (see figure 24):

Using VRF to get the assignment.
At the beginning of each epoch each validator uses a VRF to get a bitmask of the shards the validator is assigned to. The bitmask of each validator will have sw bits set (see section 3.3 for the definition of sw). The validator then fetches the state of the corresponding shards, and during the epoch, for each block received, validates the chunks that correspond to the shards the validator is assigned to.

Sign on blocks instead of chunks. Since the shard assignment is concealed, the validator cannot sign on chunks. Instead it always signs on the entire block, thus not revealing what shards it validates. Specifically, when the validator receives a block and validates all the chunks, it either creates a message that attests that all the chunks in all the shards the validator is assigned to are valid (without indicating in any way what those shards are), or a message that contains a proof of an invalid state transition if any chunk is invalid. See section 3.8 for the details on how such messages are aggregated, section 3.7.4 for the details on how to prevent validators from piggy-backing on messages from other validators, and section 3.7.5 for the details on how to reward and punish validators should a successful invalid state transition challenge actually happen.
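The assignment derivation can be sketched as follows. HMAC-SHA256 stands in for the VRF output here, so unlike a real VRF there is no publicly verifiable proof of the assignment, and the constants are toy values, not the paper's.

```python
import hashlib
import hmac

NUM_SHARDS = 16  # toy value; the paper's model uses n = 1000
SW = 4           # shards per validator (sw in the paper)

def shard_bitmask(secret_key: bytes, epoch: int) -> int:
    """Derive a concealed per-epoch shard assignment: expand a keyed
    pseudorandom seed into SW distinct shard indices and set those bits."""
    seed = hmac.new(secret_key, f"epoch:{epoch}".encode(), hashlib.sha256).digest()
    shards, counter = set(), 0
    while len(shards) < SW:
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        shards.add(int.from_bytes(digest[:4], "big") % NUM_SHARDS)
        counter += 1
    mask = 0
    for s in shards:
        mask |= 1 << s
    return mask

mask = shard_bitmask(b"validator-secret", epoch=7)
```

Only the key holder can compute the mask during the epoch; with a real VRF, revealing it later (as required in section 3.7.5) also proves it was computed honestly.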

Figure 24: Concealing the validators in Nightshade

3.7.4 Commit-Reveal

One of the common problems with validators is that a validator can skip downloading the state and actually validating the chunks and blocks, and instead observe the network, see what the other validators submit, and repeat their messages. A validator that follows such a strategy doesn't provide any extra security for the network, but still collects rewards. A common solution to this problem is for each validator to provide a proof that they actually validated the block, for example by providing a unique trace of applying the state transition, but such proofs significantly increase the cost of validation.

Figure 25: Commit-reveal
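The commit-reveal scheme this subsection introduces can be sketched with a plain hash commitment. This is illustrative only: real validators commit to signed validation results, and the random salt prevents brute-forcing a small message space.

```python
import hashlib
import secrets

def commit(message: bytes):
    """Commit phase: publish only H(salt || message); keep salt private."""
    salt = secrets.token_bytes(16)
    return hashlib.sha256(salt + message).hexdigest(), salt

def reveal_ok(commitment: str, salt: bytes, message: bytes) -> bool:
    """Reveal phase: anyone can check the opened message matches the commit."""
    return hashlib.sha256(salt + message).hexdigest() == commitment

c, salt = commit(b"all assigned chunks valid")
```

A lazy validator who only sees `c` during the commit period learns nothing about the message, so they cannot copy it; and the committer cannot later open `c` to a different message.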

Instead, we make the validators first commit to the validation result (either the message that attests to the validity of the chunks, or the proof of an invalid state transition), wait for a certain period, and only then reveal the actual validation result, as shown on figure 25. The commit period doesn't intersect with the reveal period, and thus a lazy validator cannot copycat honest validators. Moreover, if a dishonest validator committed to a message that attests to the validity of the assigned chunks, and at least one chunk was invalid, once it is shown that the chunk is invalid the validator cannot avoid the slashing, since, as we show in section 3.7.5, the only way to not get slashed in such a situation is to present a message that contains a proof of the invalid state transition that matches the commit.

3.7.5 Handling challenges

As discussed above, once a validator receives a block with an invalid chunk, they first prepare a proof of the invalid state transition (see section 3.7.1), then commit to such a proof (see 3.7.4), and after some period reveal the challenge. Once the revealed challenge is included in a block, the following happens:

1. All the state transitions that happened from the block containing the invalid chunk until the block in which the revealed challenge is included get nullified. The state before the block that includes the revealed challenge is considered to be the same as the state before the block that contained the invalid chunk.

2. Within a certain period of time each validator must reveal their bitmask of the shards they validate. Since the bitmask is created via a VRF, if they were assigned to the shard that had the invalid state transition, they cannot avoid revealing it. Any validator that fails to reveal the bitmask is assumed to be assigned to the shard.

3.
Each validator that is found after such a period to be assigned to the shard, that committed to some validation result for the block containing the invalid chunk, and that didn't reveal the proof of invalid state transition corresponding to their commit, is slashed.

4. Each validator gets a new shard assignment, and a new epoch is scheduled to start after some time sufficient for all the validators to download the state, as shown on figure 26.

Note that from the moment the validators reveal the shards they are assigned to until the new epoch starts, the security of the system is reduced, since the shard assignment is revealed. The participants of the network need to keep this in mind while using the network during such a period.

3.8 Signature Aggregation

For a system with hundreds of shards to operate securely, we want to have on the order of 10,000 or more validators. As discussed in section 3.7, we want each

validator to publish a commit to a certain message and a signature on average once per block. Even if the commit messages were all the same, aggregating such BLS signatures and validating them would be prohibitively expensive. But naturally the commit and reveal messages are not the same across validators, and thus we need some way to aggregate such messages and the signatures in a way that allows for fast validation later.

Figure 26: Handling the challenge

The specific approach we use is the following:

Validators joining block producers. The block producers are known some time before the epoch starts, since they need some time to download the state before the epoch starts, and unlike the validators the block producers are not concealed. Each block producer has v validator slots. Validators submit off-chain proposals to the block producers to get included as one of their v validators. If a block producer wishes to include a validator, they submit a transaction that contains the initial off-chain request from the validator, and the block producer's signature, which makes the validator join the block producer. Note that the validators assigned to a block producer do not necessarily validate the same shards that the block producer produces chunks for. If a validator applied to join multiple block producers, only the transaction from the first block producer will succeed.

Block producers collect commits. The block producer constantly collects the commit and reveal messages from the validators. Once a certain number of such messages are accumulated, the block producer computes a merkle tree of these messages, and sends to each validator the merkle root and the merkle path to their message. The validator validates the path and signs on the merkle root. The block producer then accumulates a BLS signature on the merkle root from the validators, and publishes only the merkle root and the accumulated signature.
The block producer also signs on the validity of the multisignature using a cheap ECDSA signature. If the multisignature doesn't match the merkle root submitted or the bitmask of the validators participating, it is a slashable behavior. When synchronizing the chain, a participant can choose to validate all the BLS signatures from the validators (which is extremely expensive since it involves aggregating the validators' public keys), or only

the ECDSA signatures from the block producers, and rely on the fact that the block producer was not challenged and slashed.

Using on-chain transactions and merkle proofs for challenges. It can be noted that there's no value in revealing messages from validators if no invalid state transition was detected. Only the messages that contain the actual proofs of invalid state transitions need to be revealed, and only for such messages does it need to be shown that they match the prior commit. A message needs to be revealed for two purposes:

1. To actually initiate the rollback of the chain to the moment before the invalid state transition (see section 3.7.5).

2. To prove that the validator didn't attempt to attest to the validity of the invalid chunk.

In either case we need to address two issues:

1. The actual commit was not included on chain, only the merkle root of the commit aggregated with other messages. The validator needs to use the merkle path provided by the block producer and their original commit to prove that they committed to the challenge.

2. It is possible that all the validators assigned to the shard with the invalid state transition happen to be assigned to corrupted block producers that are censoring them. To get around this we allow them to submit their reveals as regular transactions on-chain, bypassing the aggregation. The latter is only allowed for proofs of invalid state transitions, which are extremely rare, and thus should not result in spamming the blocks.

The final issue that needs to be addressed is that the block producers can choose not to participate in message aggregation, or to intentionally censor particular validators. We make this economically disadvantageous by making the block producer reward proportional to the number of validators assigned to them.
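The aggregation step above, in which the block producer merkelizes the collected commit messages, hands each validator a path, and the validators sign only the root, can be sketched as follows (BLS signing omitted; function names are illustrative).

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_paths(messages):
    """Build a merkle tree over the messages; return the root plus, for
    each message, its authentication path as [(sibling_hash, sibling_is_right)].
    Odd levels are padded by duplicating the last node."""
    level = [h(m) for m in messages]
    paths = [[] for _ in messages]
    pos = list(range(len(messages)))
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        for i, p in enumerate(pos):
            sibling = p ^ 1
            paths[i].append((level[sibling], sibling > p))
            pos[i] = p // 2
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0], paths

def verify_path(message, path, root):
    """What each validator checks before signing the aggregated root --
    and what a reveal transaction uses to prove a commit was included."""
    node = h(message)
    for sibling, sibling_is_right in path:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

commits = [b"commit-A", b"commit-B", b"commit-C"]
root, paths = merkle_root_and_paths(commits)
```

The same path later lets a censored validator show on-chain that their challenge commit was part of the published root.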
We also note that since the block producer sets of consecutive epochs largely intersect (since it's always the top w participants with the highest stake), the validators can largely stick to working with the same block producers, and thus reduce the risk of getting assigned to a block producer that censored them in the past.

3.9 Snapshot Chain

Since the blocks on the main chain are produced very frequently, downloading the full history might become expensive very quickly. Moreover, since every block contains a BLS signature of a large number of participants, just the aggregation of the public keys to check the signature might become prohibitively expensive as well. Finally, since for any foreseeable future Ethereum 1.0 will likely remain one of the most used blockchains, having a meaningful way to transfer assets from

Near to Ethereum is a requirement, and today verifying BLS signatures to ensure the validity of Near blocks on Ethereum's side is not possible.

Each block in the Nightshade main chain can optionally contain a Schnorr multisignature on the header of the last block that included such a Schnorr multisignature. We call such blocks snapshot blocks. The very first block of every epoch must be a snapshot block. While working on such a multisignature, the block producers must also accumulate the BLS signatures of the validators on the last snapshot block, and aggregate them the same way as described in section 3.8.

Since the block producer set is constant throughout the epoch, validating only the first snapshot block of each epoch is sufficient, assuming that at no point a large percentage of block producers and validators colluded and created a fork. The first block of the epoch must contain information sufficient to compute the block producers and validators for the epoch. We call the subchain of the main chain that only contains the snapshot blocks a snapshot chain.

Creating a Schnorr multisignature is an interactive process, but since we only need to perform it infrequently, any process, no matter how inefficient, will suffice. Schnorr multisignatures can be easily validated on Ethereum, thus providing a crucial primitive for a secure way of performing cross-blockchain communication.

To sync with the Near chain one only needs to download all the snapshot blocks and confirm that the Schnorr signatures are correct (optionally also verifying the individual BLS signatures of the validators), and then sync only the main chain blocks produced after the last snapshot block.
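The light-client sync along the snapshot chain can be sketched as follows. All field names are hypothetical, and the Schnorr multisignature verification is stubbed as a boolean (`sig_ok`); a real client would verify the multisignature against the block producer set committed to in the epoch's first block.

```python
import hashlib

def header_hash(block) -> str:
    """Toy header hash committing to the height and the previous snapshot."""
    return hashlib.sha256(
        f"{block['height']}:{block['prev_snapshot']}".encode()
    ).hexdigest()

def sync_snapshots(snapshots, genesis_hash: str) -> bool:
    """Walk only the snapshot blocks, checking that each one commits to
    the previous snapshot's header and carries a valid multisignature."""
    prev = genesis_hash
    for block in snapshots:
        if block["prev_snapshot"] != prev or not block["sig_ok"]:
            return False
        prev = header_hash(block)
    return True

genesis = "genesis"
s1 = {"height": 100, "prev_snapshot": genesis, "sig_ok": True}
s2 = {"height": 200, "prev_snapshot": header_hash(s1), "sig_ok": True}
```

After the last verified snapshot, the client downloads and validates only the regular main chain blocks produced since then.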

Nightshade

3.1 From shard chains to chunks

The model with shard chains and a beacon chain is very powerful, but comes with certain complexities. In particular, the fork choice rule needs to be executed in each chain separately, and the fork choice rules for the shard chains and for the beacon chain must be built differently and tested separately.

In Nightshade we model the system as a single blockchain, in which each block logically contains all the transactions for all the shards and changes the whole state of all the shards. Physically, however, no participant downloads the full state or the full logical block. Instead, each participant of the network only maintains the state that corresponds to the shards they validate transactions for, and the list of all the transactions in the block is split into physical chunks, one chunk per shard.

Under ideal conditions each block contains exactly one chunk per shard, which roughly corresponds to the model with shard chains in which the shard chains produce blocks at the same rate as the beacon chain. However, due to network delays some chunks might be missing, so in practice each block contains either one or zero chunks per shard. See section 3.3 for details on how blocks are produced.

Figure 16: Shard chains on the left; the blocks of one of the chains split into chunks on the right
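The logical-to-physical split described above can be sketched in a few lines; the shard-assignment function passed in is a hypothetical stand-in for however transactions are mapped to the shard whose state they touch.

```python
from collections import defaultdict

def split_into_chunks(transactions, shard_of):
    """Split one logical block's transaction list into physical chunks,
    one per shard; `shard_of` maps a transaction to its shard."""
    chunks = defaultdict(list)
    for tx in transactions:
        chunks[shard_of(tx)].append(tx)
    return dict(chunks)

txs = ["t0", "t1", "t2", "t3", "t4"]
chunks = split_into_chunks(txs, lambda tx: int(tx[1]) % 2)
# chunks == {0: ["t0", "t2", "t4"], 1: ["t1", "t3"]}
```

Each participant then downloads only the chunks of the shards they maintain, never the whole logical block.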

3.2 Consensus

The two dominant approaches to consensus in blockchains today are the longest (or heaviest) chain, in which the chain that has the most work or stake used to build it is considered canonical, and BFT, in which for each block some set of validators reaches a BFT consensus.

In recently proposed protocols the latter is the more dominant approach, since it provides immediate finality, while in the longest chain more blocks need to be built on top of a block to ensure its finality. Often, for meaningful security, the time needed to build a sufficient number of blocks is on the order of hours.

Using BFT consensus on every block also has disadvantages, such as:

1. BFT consensus involves a large amount of communication. While recent advances allow consensus to be reached in time linear in the number of participants (e.g. [4]), the per-block overhead is still significant;

2. It is not feasible for all the network participants to participate in the BFT consensus for every block, so usually only a randomly sampled subset of the participants reaches the consensus. In principle the sampled set can be adaptively corrupted, and in theory a fork can then be created. The system either needs to be modeled to be prepared for such an event, and thus still have a fork choice rule besides the BFT consensus, or be designed to shut down in such an event. It is worth mentioning that some designs, such as Algorand [5], significantly reduce the probability of adaptive corruption;

3. Most importantly, the system stalls if 1/3 or more of all the participants are offline. Thus, any temporary network glitch or network split can completely paralyze the system. Ideally, the system must be able to keep operating as long as at least half of the participants are online (heaviest-chain-based protocols keep operating even if fewer than half of the participants are online, though the desirability of this property is more debated in the community).

A hybrid model, in which the consensus used is some sort of heaviest chain, but some blocks are periodically finalized using a BFT finality gadget, maintains the benefits of both models. Such BFT finality gadgets are Casper FFG [6] used in Ethereum 2.0 (see also a whiteboard episode with Justin Drake that dives deep into Casper FFG, and how it integrates with the GHOST heaviest chain consensus: https://www.youtube.com/watch?v=S262StTwkmo), Casper CBC (see https://vitalik.ca/general/2018/12/05/cbc_casper.html) and GRANDPA (see https://medium.com/polkadot-network/d08a24a021b5) used in Polkadot.

Nightshade uses the heaviest chain consensus. In particular, when a block producer produces a block (see section 3.3), they can collect signatures from other block producers and validators attesting to the previous block. See section 3.8 for details on how such a large number of signatures is aggregated. The weight of a block is then the cumulative stake of all the signers whose signatures are included in the block, and the weight of a chain is the sum of the weights of its blocks.

On top of the heaviest chain consensus we use a finality gadget that uses the attestations to finalize blocks. To reduce the complexity of the system, we use a finality gadget that doesn't influence the fork choice rule in any way, and instead only introduces additional slashing conditions, such that once a block is finalized by the finality gadget, a fork is impossible unless a very large percentage of the total stake is slashed. Casper CBC is one such finality gadget, and our current model is based on Casper CBC.

We are also working on an in-house BFT protocol called TxFlow. At the time of writing this document it is not clear whether TxFlow will be used instead of Casper CBC. We note, however, that the choice of the finality gadget is largely orthogonal to the rest of the design.

3.3 Block production

There are two roles in Nightshade: block producers and validators. At any point the system contains w block producers, w = 100 in our model, and wv validators, with v = 100 and wv = 10,000 in our model. The system is Proof of Stake, meaning that both block producers and validators have some number of internal currency (referred to as "tokens") locked up for a duration of time far exceeding the time they spend performing their duties of building and validating the chain.

As with all Proof of Stake systems, not all of the w block producers and not all of the wv validators are necessarily distinct entities, since that cannot be enforced. Each of the w block producers and each of the wv validators, however, does have a separate stake.

The system contains n shards, n = 1000 in our model. As mentioned in section 3.1, in Nightshade there are no shard chains; instead, all the block producers and validators are building a single blockchain, which we refer to as the main chain. The state of the main chain is split into n shards, and each block producer and validator at any moment only has a subset of the state downloaded locally, corresponding to some subset of the shards, and only processes and validates transactions that affect those parts of the state.

To become a block producer, a network participant locks up a large amount of tokens (a stake). The maintenance of the network is done in epochs, where an epoch is a period of time on the order of days. The participants with the w largest stakes at the beginning of a particular epoch are the block producers for that epoch. Each block producer is assigned to sw shards (say sw = 40, which with the constants above results in sww/n = 4 block producers per shard). The block producers download the state of the shards they are assigned to before the epoch starts, and throughout the epoch collect transactions affecting those shards and apply them to the state.

For each block b on the main chain and for each shard s, one of the block producers assigned to s is responsible for producing the part of b related to shard s. The part of b related to shard s is called a chunk, and contains the list of the transactions for the shard to be included in b, as well as the merkle root of the resulting state. b will ultimately only contain a very small header of the chunk, namely the merkle root of all the applied transactions (see section 3.7.1 for details), and the merkle root of the final state.

Throughout the rest of the document we refer to the block producer responsible for producing a chunk for a particular shard at a particular time as a chunk producer. The chunk producers are always a subset of the block producers.

The block producers and the chunk producers rotate each block according to a fixed schedule. The block producers have an order and repeatedly produce blocks in that order: e.g. if there are 100 block producers, the first block producer is responsible for producing blocks 1, 101, 201 and so on, and the second for blocks 2, 102, 202, etc. Since chunk production, unlike block production, requires maintaining the state, for each shard only the sww/n block producers that maintain the state of that shard rotate to create its chunks. E.g. with the constants above, each shard is assigned to four block producers, and each of them will create a chunk once every four blocks.

3.4 Ensuring data availability

To ensure data availability we use an approach similar to that of Polkadot, described in section 2.5.3. Once a chunk producer produces a chunk, they create an erasure coded version of it with an optimal (w, ⌊w/6⌋ + 1) block code. They then send one piece of the erasure coded chunk (we call such pieces chunk parts, or just parts) to each block producer. We compute a merkle tree that contains all the parts as its leaves, and the header of each chunk contains the merkle root of that tree.

The parts are sent to the validators via onepart messages. Each such message contains the chunk header, the ordinal of the part, and the part contents. The message also contains the signature of the chunk producer who produced the chunk, and the merkle path to prove that the part corresponds to the header and was produced by the proper chunk producer.

Once a block producer receives a main chain block, they first check whether they have the onepart messages for each chunk included in the block. If not, the block is not processed until the missing onepart messages are retrieved. Once all the onepart messages are received, the block producer fetches the remaining parts from their peers and reconstructs the chunks for which they hold the state. A block producer will not process a main chain block if at least one chunk included in the block has no corresponding onepart message, or if for at least one shard for which they maintain the state they cannot reconstruct the entire chunk.

For a particular chunk to be available, it is enough that ⌊w/6⌋ + 1 of the block producers have their parts and serve them. Thus, as long as the number of malicious actors doesn't exceed ⌊w/3⌋, no chain that has more than half of the block producers building on it can have unavailable chunks.

Figure 17: Each block contains one or zero chunks per shard, and each chunk is erasure coded. Each part of the erasure coded chunk is sent to a designated block producer via a special onepart message

3.4.1 Dealing with lazy block producers

If a block producer is missing the onepart message for some chunk of a block, they might choose to still sign the block, because if the block ends up on chain this maximizes the block producer's reward. There is no risk for the block producer, since it is impossible to prove later that they didn't have the onepart message.

To address this, when creating a chunk we make the chunk producer choose a color (red or blue) for each part of the future encoded chunk, and store the bitmask of the assigned colors in the chunk before it is encoded. Each onepart message then contains the color assigned to the part, and the color is used when computing the merkle root of the encoded parts. If the chunk producer deviates from the protocol, it can easily be proven, since either the merkle root will not correspond to the onepart messages, or the colors in the onepart messages that correspond to the merkle root will not match the mask in the chunk.

When block producers sign a block, they include the bitmask of all the red parts they received for the chunks included in the block. Publishing an incorrect bitmask is a slashable behavior. If a block producer hasn't received an onepart message, they have no way of knowing the color of that message, and thus have a 50% chance of being slashed if they attempt to blindly sign the block.

3.5 State transition application

The chunk producers only choose the transactions to be included in the chunk, but do not apply the state transition when they produce the chunk.

Accordingly, the chunk header contains the merkle root of the merkelized state as of before the transactions in the chunk are applied.

The transactions are only applied when a full block that includes the chunk is processed. A participant only processes a block if:

1. The previous block was received and processed;

2. For each chunk for which the participant doesn't maintain the state, they have seen the onepart message;

3. For each chunk for which the participant does maintain the state, they have the full chunk.

Once the block is processed, for each shard for which the participant maintains the state, they apply the transactions and compute the new state as of after their application. They are then ready to produce the chunks for the next block, if they are assigned to any shard, since they have the merkle root of the new merkelized state.

3.6 Cross-shard transactions and receipts

If a transaction needs to affect more than one shard, it needs to be executed consecutively in each shard separately. The full transaction is sent to the first shard affected, and once the transaction is included in a chunk for that shard, and is applied after the chunk is included in a block, it generates a so-called receipt transaction, which is routed to the next shard in which the transaction needs to be executed. If more steps are required, the execution of the receipt transaction generates a new receipt transaction, and so on.

3.6.1 Receipt transaction lifetime

Ideally, the receipt transaction should be applied in the block that immediately follows the block in which it was generated. The receipt transaction is only generated after the previous block was received and applied by the block producers that maintain the originating shard, and it needs to be known by the time the chunk for the next block is produced by the chunk producer of the destination shard. Thus, the receipts must be communicated from the source shard to the destination shard in the short window between these two events.

Let A be the last produced block, containing a transaction t that generates a receipt r. Let B be the next produced block (i.e. the block that has A as its previous block) in which we want r to be included. Let t be in shard a and r target shard b. The lifetime of the receipt (also depicted in figure 18) is the following:

Producing and storing the receipts. The chunk producer cpa for shard a receives block A, applies the transaction t, and generates the receipt r. cpa then stores all such generated receipts in its internal persistent storage, indexed by the source shard id.

Distributing the receipts. Once cpa is ready to produce the chunk for shard a in block B, they fetch all the receipts generated by applying the transactions of shard a in block A, and include them in the chunk for shard a in block B. Once the chunk is produced, cpa creates its erasure coded version and all the corresponding onepart messages. cpa knows which block producers maintain the full state of which shards. For a particular block producer bp, when distributing the chunk of shard a in block B, cpa includes in the onepart message those receipts generated by applying the transactions of shard a in block A whose destination is any of the shards bp cares about (see figure 17, which shows the receipts included in the onepart messages).

Receiving the receipts. Remember that participants (both block producers and validators) do not process a block until they have the onepart messages for each chunk included in that block. Thus, by the time any particular participant applies block B, they have all the onepart messages corresponding to the chunks in B, and therefore they have all the incoming receipts whose destination is a shard for which they maintain the state. When applying the state transition for a particular shard, the participant applies both the receipts they collected for that shard from the onepart messages, and all the transactions included in the chunk itself.

Figure 18: The lifetime of the receipt transaction

3.6.2 Handling too many receipts

It is possible that the number of receipts targeting a particular shard in a particular block is too large to be processed. For example, consider figure 19, in which every transaction in every shard generates a receipt targeting shard 1. By the next block, shard 1 needs to process a number of receipts comparable to the load that all the shards combined processed while applying the previous block.
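The cursor-based draining used to handle this overflow (record the last (block, shard) position from which receipts were applied; resume from there, up to the chunk's capacity) can be sketched as follows. The queue shape and the capacity constant are illustrative.

```python
from collections import deque

def drain_receipts(queue, capacity):
    """Apply queued receipts in (block, shard) order until the chunk is
    full. `queue` holds (block_height, source_shard, receipt) tuples in
    arrival order; `capacity` caps how many receipts one chunk processes.
    Returns the applied receipts plus the cursor -- the (block_height,
    source_shard) of the last receipt applied -- which the chunk records
    so that the next chunk resumes from that point."""
    applied, cursor = [], None
    while queue and len(applied) < capacity:
        height, shard, receipt = queue.popleft()
        applied.append(receipt)
        cursor = (height, shard)
    return applied, cursor

# The destination shard received more receipts (from 4 shards over 2
# blocks) than one chunk can process; the backlog drains over two chunks.
q = deque((h, s, f"r{h}.{s}") for h in (10, 11) for s in range(4))
first, cursor1 = drain_receipts(q, capacity=5)   # cursor1 == (11, 0)
second, cursor2 = drain_receipts(q, capacity=5)  # the remaining 3 receipts
```

Under a balanced load the queue empties every block; under a spike, the cursor lags behind the chain head, which is exactly the delay that motivates dropping transactions targeting badly backlogged shards.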

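Returning to the erasure coding of section 3.4: the key property of an optimal (w, k) block code is that any k of the w parts reconstruct the chunk. A toy systematic Reed-Solomon sketch over a prime field demonstrates this (illustrative only; production codes use optimized implementations over small binary fields).

```python
P = 2**61 - 1  # a Mersenne prime; the field for the toy code

def lagrange_eval(points, x0):
    """Evaluate at x0 the unique polynomial through the (x, y) points,
    working over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n_parts):
    """Systematic encode: part x (x = 1..n_parts) is the value at x of
    the degree-(k-1) polynomial through the k data points (1, d1)..(k, dk),
    so the first k parts are the data itself."""
    pts = list(enumerate(data, start=1))
    return [(x, lagrange_eval(pts, x)) for x in range(1, n_parts + 1)]

def decode(any_k_parts, k):
    """Reconstruct the k data words from any k of the parts."""
    return [lagrange_eval(any_k_parts, x) for x in range(1, k + 1)]

data = [7, 13, 42]       # k = 3 data words
parts = encode(data, 9)  # expanded into 9 parts; any 3 reconstruct them
```

With w parts and k = ⌊w/6⌋ + 1, the chunk stays recoverable as long as ⌊w/6⌋ + 1 honest block producers keep serving their parts.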

Conclusion


In this document we discussed approaches to building sharded blockchains and covered two major challenges with existing approaches, namely state validity and data availability. We then presented Nightshade, the sharding design that powers NEAR Protocol. The design is a work in progress; if you have comments, questions or feedback on this document, please go to https://near.chat.
