How the blockchain has been used over the years to store arbitrary data unrelated to Bitcoin transactions.
Soon after the birth of Bitcoin, it became evident that, as it is structured, the Bitcoin blockchain is the most resilient store of digital data available to humanity. Not only is the blockchain data distributed over tens of thousands of computers around the world, but Bitcoin users also have an incentive to keep storing and maintaining a copy of the blockchain over time.
The incentive is not directly economic: no one is paid to run a node on their computer. However, having a local copy of the blockchain allows one to verify independently, without relying on third parties, whether a specific Bitcoin transaction is valid. This is fundamental to ensuring that receiving Bitcoin payments can be truly trustless, independent of any external service.
As long as Bitcoin continues to be used, a large number of easily accessible copies of the blockchain will continue to exist somewhere in the world, making the information written in it much more persistent than any alternative for storing digital data. This characteristic has made the blockchain very attractive to anyone intent on writing a message or uploading an image that would remain accessible to humanity for a very long time, ideally for eternity.
Inscribing arbitrary messages on the blockchain, which do not represent a movement of bitcoin, is considered by many to be spam or even an attack on the network. Those who spend resources to keep a Bitcoin node active on their computer do so primarily to stay updated on the state of the blockchain and keep track of the legitimate owners of every bitcoin in circulation. These users are not interested in consuming bandwidth, memory, computing power, and disk space just to store the messages of strangers, making them accessible in the future to others.
Therefore, the improper use of Bitcoin harms those who are trying to use it for its original purpose: to make payments through a neutral, censorship-resistant, and unconfiscatable monetary system.
How to write on the blockchain
But if Bitcoin was created to serve only financial use cases, how is it possible to use it as a virtual wall on which to engrave messages and images? There are several methodologies available, the first of which was used by Satoshi Nakamoto himself in the Bitcoin genesis block.
In block 0, the famous phrase “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks” was engraved, corresponding to the front-page headline of The Times on January 3, 2009. It serves as proof that the genesis block was mined no earlier than January 3, and not months (or years) before. Without such proof, Satoshi could have prepared a very long blockchain in secret and later performed double spends by publishing it after conducting transactions on the shorter public chain. Perhaps by chance, perhaps deliberately, the message also has a political component, as it highlights the failures of a financial system that regularly requires bailouts.
From a technical standpoint, the message in the Bitcoin genesis block was inserted into the input of the transaction that assigns the block reward, known as the ‘coinbase’ (not to be confused with the exchange of the same name). Since the reward transaction does not spend existing bitcoins but generates new ones, its input has no previous output to unlock, so the input script is technically superfluous and the miner can fill it with data at their discretion.
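The mechanism can be illustrated with a minimal Python sketch. It assumes the commonly documented layout of the genesis coinbase input script (two short numeric pushes followed by the headline as one push) and rebuilds that script locally rather than fetching it from a node. The helper name `read_pushes` is hypothetical, and the parser only handles direct pushes, not the full Script language.

```python
def read_pushes(script: bytes) -> list[bytes]:
    """Split a script consisting only of direct pushes (opcodes 0x01..0x4b)."""
    pushes, i = [], 0
    while i < len(script):
        size = script[i]
        if not 1 <= size <= 0x4B:
            raise ValueError(f"unsupported opcode 0x{size:02x}")
        if i + 1 + size > len(script):
            raise ValueError("push runs past the end of the script")
        pushes.append(script[i + 1 : i + 1 + size])
        i += 1 + size
    return pushes


HEADLINE = b"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"

# Assumed layout: push of 4 bytes, push of 1 byte, then the headline
# as a single push whose length byte is computed from the message itself.
genesis_script_sig = bytes.fromhex("04ffff001d" "0104") + bytes([len(HEADLINE)]) + HEADLINE

# The message is simply the last pushed item, read back as ASCII text.
message = read_pushes(genesis_script_sig)[-1].decode("ascii")
print(message)
```

Nothing in the script is executed to unlock funds; the bytes are just carried along with the block, which is why a miner can put arbitrary text there.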
Satoshi was not the only one to use the coinbase to insert messages on the blockchain. As early as 2011, the mining pool Eligius (later relaunched as Ocean) began inserting Christian prayers in the coinbase, as a testament to the strong Catholic faith of its maintainer, the well-known Bitcoin developer Luke-jr. This caused considerable controversy, not so much because the messages were seen as spam on the blockchain, but because of their content, which many people did not share.
Writing on the blockchain without being a miner
The practice of writing messages within the coinbase of a block is accessible only to miners. Therefore, a regular user must rely on other techniques if they wish to achieve the same result. One of the first methodologies developed for this purpose is to create ‘fake’ Bitcoin addresses. These addresses are not generated from a private key but are solely intended to contain a string with arbitrary data inside.
Among the first examples of this system is the inscription of a photograph of Nelson Mandela, created in his memory shortly after his death in 2013. To do this, the image was divided into short hexadecimal strings, which were then inserted into the outputs of Bitcoin transactions. These strings, visible as addresses in a block explorer, can be assembled to reconstruct the original image.
For example, the address 15gHNr4TCKmhHDEG31L2XFNvpnEcnPSQvd (which appears for the first time on the blockchain in this transaction) corresponds to the hexadecimal 334E656C736F6E2D4D616E64656C612E6A70673F, which read as ASCII text produces the string “3Nelson-Mandela.jpg?”, i.e., the name of the image file.
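The relationship between a ‘fake’ address and its data can be sketched in Python. This is an illustrative implementation of Base58Check for legacy addresses (version byte 0x00), written from the general description of the format; the function names are hypothetical. It encodes the 20 bytes quoted above into an address and decodes them back. If the figures above are accurate, the printed address should match the one quoted, but the sketch does not hard-code that expectation.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def checksum(data: bytes) -> bytes:
    """First 4 bytes of a double SHA-256, as used by Base58Check."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def payload_to_address(payload: bytes, version: int = 0x00) -> str:
    """Encode version byte + payload + checksum in Base58."""
    raw = bytes([version]) + payload
    raw += checksum(raw)
    n = int.from_bytes(raw, "big")
    digits = ""
    while n:
        n, rem = divmod(n, 58)
        digits = ALPHABET[rem] + digits
    # Each leading zero byte is written as a leading '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def address_to_payload(address: str) -> bytes:
    """Decode a Base58Check address and return the 20-byte payload."""
    n = 0
    for ch in address:
        n = n * 58 + ALPHABET.index(ch)
    leading_ones = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * leading_ones + n.to_bytes((n.bit_length() + 7) // 8, "big")
    if checksum(raw[:-4]) != raw[-4:]:
        raise ValueError("bad checksum")
    return raw[1:-4]  # drop the version byte and the checksum


# The 20 bytes quoted in the text; no private key or public key hash is involved.
chunk = bytes.fromhex("334E656C736F6E2D4D616E64656C612E6A70673F")
address = payload_to_address(chunk)
recovered = address_to_payload(address)
print(address, recovered.decode("ascii"))
```

The point of the example is that any 20 bytes produce a syntactically valid address, so the network has no way to tell a data chunk from the hash of a real public key.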
This method of writing arbitrary data on the blockchain is particularly problematic. Besides taking up disk space on every full Bitcoin node, such inscriptions occupy space that cannot be pruned. Normally, Bitcoin nodes allow users to discard old transactions and old blocks once they have been validated, saving disk space without compromising security.
What a node must always keep is the UTXO set (the set of Unspent Transaction Outputs), i.e., the record of which outputs still hold bitcoins that can be moved at any moment. Creating outputs that are never meant to be spent, but serve only as data storage, imposes a greater cost on nodes. These outputs cannot be pruned, because the node cannot know whether they are spendable or not, and it is consequently obliged to dedicate permanent disk space to them.
Note that this is not only a problem for current nodes but is a cost that will affect every future Bitcoin node, at least until some new technological development allows nodes to compress the state of the blockchain (e.g., zero-knowledge proofs).
The introduction of OP_RETURN
To mitigate the problem of non-prunable inscriptions, in 2014, version 0.9 of Bitcoin Core introduced relay support for transactions with OP_RETURN outputs. According to Bitcoin’s validation rules, this type of output can never be spent, even if it contains bitcoins. Nodes can therefore be sure that the information contained in the output is not relevant for reconstructing the state of the network: they do not need to keep it in the UTXO set and can delete it from disk. OP_RETURN outputs are suitable for inscribing arbitrary data, as they minimize negative externalities on the network.
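The difference between the two kinds of output can be shown with a toy model in Python. It is a simplified sketch, not node software: the UTXO set is a plain dictionary, the transaction id is a placeholder string, and the only rule modelled is that a script starting with the OP_RETURN opcode (0x6a) is provably unspendable. The function names are hypothetical.

```python
OP_RETURN = 0x6A


def is_provably_unspendable(script_pubkey: bytes) -> bool:
    """A script that begins with OP_RETURN can never be satisfied."""
    return len(script_pubkey) > 0 and script_pubkey[0] == OP_RETURN


def add_outputs(utxo_set: dict, txid: str, outputs: list[bytes]) -> None:
    """Track every output that might be spent later; skip the rest."""
    for index, script in enumerate(outputs):
        if not is_provably_unspendable(script):
            utxo_set[(txid, index)] = script


# A pay-to-address script whose 20-byte "hash" is really a data chunk:
# OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
fake_address_output = bytes.fromhex("76a914") + b"3Nelson-Mandela.jpg?" + bytes.fromhex("88ac")

# An OP_RETURN output carrying five bytes of data.
op_return_output = bytes([OP_RETURN, 5]) + b"hello"

utxo_set = {}
add_outputs(utxo_set, "hypothetical-txid", [fake_address_output, op_return_output])
print(len(utxo_set))
```

The fake-address output is indistinguishable from a real payment and stays in the set indefinitely, while the OP_RETURN output never enters it.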
However, before being pruned, the content of an OP_RETURN output must be downloaded and validated by every node. To avoid abuse, the maximum size of such an output accepted for relay was limited to 83 bytes, that is, an 80-byte payload plus three bytes of script overhead (the initial limit was a 40-byte payload). This amount was deemed sufficient to cover the most ‘legitimate’ use cases, such as the publication of hashes and metadata.
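How the 83-byte figure breaks down can be sketched in Python. The example assumes that the limit applies to the whole output script: one byte for OP_RETURN, one or two bytes for the push opcode, and the payload itself. The function name `build_op_return` and the constant `MAX_SCRIPT_BYTES` are illustrative, not taken from any library.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_SCRIPT_BYTES = 83  # relay limit discussed in the text


def build_op_return(data: bytes) -> bytes:
    """Build an OP_RETURN output script carrying `data`."""
    if not data:
        raise ValueError("this sketch expects at least one byte of data")
    if len(data) <= 75:
        # Payloads of 1..75 bytes use a single length byte as the push opcode.
        push = bytes([len(data)])
    elif len(data) <= 255:
        # Longer payloads need OP_PUSHDATA1 followed by a one-byte length.
        push = bytes([OP_PUSHDATA1, len(data)])
    else:
        raise ValueError("payload too large for this sketch")
    script = bytes([OP_RETURN]) + push + data
    if len(script) > MAX_SCRIPT_BYTES:
        raise ValueError("script exceeds the relay limit")
    return script


# A 32-byte hash fits comfortably: 1 + 1 + 32 = 34 bytes.
print(len(build_op_return(bytes(32))))
# The largest payload that fits is 80 bytes: 1 + 2 + 80 = 83 bytes.
print(len(build_op_return(bytes(80))))
```

A SHA-256 hash is 32 bytes, which is why even the original, smaller limit was enough for timestamping and commitment schemes.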
Over the years, various protocols have leveraged OP_RETURN outputs to support new use cases not related to the mere movement of bitcoins. These include the first tokenization protocols on Bitcoin, such as Counterparty, which was itself heavily criticized for how it used blockchain space. They also include non-financial applications like Eternity Wall, a service for writing indelible messages on the blockchain, and OpenTimestamps, an efficient document notarization protocol on Bitcoin.
Ordinals and Taproot
More recently, with the arrival of Ordinal Inscriptions in 2023, a new methodology for writing large amounts of data on the blockchain has spread. Ordinal Inscriptions exploit some characteristics of Taproot transactions to insert data in the witness field of the transaction (the part of an input that carries signatures and scripts), where there is no specific byte limit. This allows for the insertion not only of short strings but also, if desired, of entire files, with the only size limit being the roughly 4MB ceiling of the Bitcoin block. This methodology also takes advantage of the so-called ‘witness discount’, under which each byte in the witness field of a Segwit transaction counts for a quarter of a byte elsewhere in the transaction, such as the outputs, when fees are calculated.
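The effect of the witness discount can be worked through with a little arithmetic in Python. The weight formula (four units per non-witness byte, one unit per witness byte) follows the Segwit rules; the 150-byte overhead and the 1,000-byte payload are made-up figures chosen only for illustration.

```python
MAX_BLOCK_WEIGHT = 4_000_000  # weight units per block


def tx_weight(non_witness_bytes: int, witness_bytes: int) -> int:
    """Non-witness bytes cost 4 weight units each, witness bytes cost 1."""
    return 4 * non_witness_bytes + witness_bytes


def virtual_size(non_witness_bytes: int, witness_bytes: int) -> int:
    """Virtual bytes (the unit fees are quoted in): weight / 4, rounded up."""
    return (tx_weight(non_witness_bytes, witness_bytes) + 3) // 4


OVERHEAD = 150   # hypothetical non-witness bytes every transaction needs
PAYLOAD = 1_000  # hypothetical amount of arbitrary data to inscribe

in_outputs = virtual_size(OVERHEAD + PAYLOAD, 0)  # data placed in outputs
in_witness = virtual_size(OVERHEAD, PAYLOAD)      # data placed in the witness

print(in_outputs, in_witness)
```

At the same fee rate, the same payload is billed for 1,150 virtual bytes when stored in outputs and 400 when stored in the witness. The same formula explains the 4MB ceiling: a block made almost entirely of witness data approaches 4,000,000 bytes before it reaches the weight limit.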
As with OP_RETURN, the data written inside the witness field can be pruned by nodes. However, without tight space limits, the inscription of entire files is incentivized over more concise hashes. The popularity of Ordinals has fluctuated, and spikes in the use of the protocol have at times filled Bitcoin blocks, causing a significant increase in fees for the entire network and earning the dislike of many users.
The Illusion of Eternity
The only reason to pay for the costly space on the blockchain to insert images and messages is the idea that these inscriptions may remain forever.
To maximize this hope, the STAMP protocol has even taken a technological step back, foregoing the convenience of OP_RETURN and the space and fee advantages offered by Ordinal Inscriptions. It chose to use bare multisig as a technique for inscribing images. Bare multisig was the first attempt to implement multi-signature transactions on Bitcoin, but it was never widely adopted because of its complexity of use (the first multisig wallets emerged only with the introduction of P2SH). However, bare multisig can create very large outputs that still comply with the network’s relay rules. Such outputs can be used to insert data into the blockchain, but they look like spendable outputs and therefore become part of the UTXO set, which means nodes cannot prune them. This non-prunability is precisely the characteristic sought by the developers of STAMP, as it would ensure that all Bitcoin nodes are obliged to store the data of those who made the inscription.
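Why a bare multisig output is ‘very large’ can be seen from its script layout, sketched here in Python. The example builds a generic 1-of-3 multisig script in which each public key slot is 33 bytes wide. Filling the slots with arbitrary bytes is a hypothetical illustration of the general idea only; it is not the actual encoding used by STAMP, and the function name is made up.

```python
OP_1 = 0x51
OP_3 = 0x53
OP_CHECKMULTISIG = 0xAE
KEY_SIZE = 33  # size of a compressed public key


def bare_multisig_1_of_3(slots: list[bytes]) -> bytes:
    """OP_1 <key> <key> <key> OP_3 OP_CHECKMULTISIG, written directly in the output."""
    if len(slots) != 3 or any(len(slot) != KEY_SIZE for slot in slots):
        raise ValueError("expected three 33-byte slots")
    script = bytes([OP_1])
    for slot in slots:
        script += bytes([KEY_SIZE]) + slot  # push opcode, then the 33 bytes
    return script + bytes([OP_3, OP_CHECKMULTISIG])


# Hypothetical example: 99 bytes of arbitrary data spread over the three slots.
data = b"x" * 99
slots = [data[i : i + KEY_SIZE] for i in range(0, len(data), KEY_SIZE)]
script = bare_multisig_1_of_3(slots)
print(len(script))
```

The resulting script is 105 bytes long, compared with 25 bytes for an ordinary pay-to-address output, and because it has the form of a normal spending condition a node has to keep it in the UTXO set.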
This technique is not only extremely parasitic, but it could also prove to be futile in the long term, as new pruning methodologies could emerge. For example, specialized software could be developed that maps all known inscriptions and allows nodes to delete the data related to them. Furthermore, future systems of blockchain and UTXO set compression through the use of zero-knowledge proofs could be developed, making the idea of eternal storage on third-party computers once again illusory.
What categorizes a transaction as spam?
Since the earliest examples of data inscription on the blockchain, there has been a debate about whether such transactions should be considered spam. On one hand, one could argue that the only requirements for a Bitcoin transaction are to respect the protocol rules and pay an appropriate fee. On the other hand, it’s clear that those who decide to dedicate resources to maintain an active Bitcoin node do not do so with the intention of providing free storage for strangers’ images. The mere fact of being compatible with the protocol and paying a sufficient fee is not a sufficient condition to be considered non-spam. Similarly, even the insistent phone calls of a call center, which respect the telephone protocol and pay the telephone bill, are still considered spam.
In the end, the perception of spam is undoubtedly subjective. A lonely old person might even appreciate a call from a call center, just as a bored millennial might be happy to have a pixelated JPEG on their node. From a certain point of view, someone who owns a node might consider all transactions not related to their wallet as useless spam that would be better off not existing. However, the moment one decides to install a Bitcoin node, one implicitly accepts the possibility of receiving up to 4MB per block of potentially irrelevant data. Whether this data encodes useless JPEGs or transactions between strangers on the other side of the world doesn’t, in the end, make a big difference.
Bitcoin relies on a fundamental assumption to ensure that the resources demanded by the protocol are used for productive purposes: in a free market for fees, productive users can outbid those with low-value use cases. If we continue to observe activity that we consider of low value, this may be because there are not enough users employing Bitcoin for high-value purposes to fill the blocks, or because the blocks themselves are too large.
Reducing the size of the Bitcoin block would require a soft fork, which is difficult to deploy. Therefore, the best way to reduce spam on the blockchain is to encourage more high-value activity, such as large settlements, so that use cases of dubious utility are simply priced out of the blockchain.