
RMTM Review of the Filecoin White Paper

By Jeff Stollman, Principal Consultant

©Rocky Mountain Technical Marketing, Inc. 2018

 

This article is for information purposes only.  It does not represent investment advice.


Scope

Filecoin claims that its forthcoming network “achieves staggering economies of scale by allowing anyone worldwide to participate as storage providers. It also makes storage resemble a commodity or utility by decoupling hard-drive space from additional services.”  This notion allowed it to become the biggest fund raiser when it went to ICO and raised $257 million.  It has since been surpassed, but this amount raised eyebrows at the time.  A lot of people gave the project a very strong vote of confidence.

Technical Case

Solution

Filecoin leverages the open-source InterPlanetary File System (IPFS) to store files across a number of hard drives owned by members of the Filecoin network.  Network members offer up free space on their hard drives in exchange for payments in Filecoin tokens.  Those seeking to store files pay for the storage using Filecoin tokens.

As stated in the white paper:  “It liberates data from silos, survives network partitions, works offline, routes around censorship, and gives permanence to digital information.”

Under IPFS, files are broken into “shards” and the shards are distributed across a number of drives.  IPFS extends the self-healing RAID 5 approach.  In RAID 5 storage arrays, files are distributed across multiple drives with sufficient information overlap to allow recovery of all the data, even if one drive fails.  But while RAID is a centralized solution where all the storage media are controlled by a single provider, IPFS supports a decentralized approach that distributes portions of a file across multiple drives owned by its community.
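The “information overlap” that lets a RAID 5 array survive the loss of one drive can be illustrated with a simple XOR parity chunk.  The following is a minimal sketch of that general idea only; it is not how IPFS or Filecoin actually encode data, and the function names (make_stripes, rebuild) are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripes(data: bytes, n_data: int) -> List[bytes]:
    """Split data into n_data equal chunks and append one XOR parity chunk."""
    chunk_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(chunk_len * n_data, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_data)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def rebuild(stripes: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one missing chunk (None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost chunk")
    repaired = list(stripes)
    if missing:
        survivors = [s for s in stripes if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


original = b"Filecoin white paper review"
stripes = make_stripes(original, n_data=3)   # three data "drives" plus parity

damaged = list(stripes)
damaged[1] = None                            # simulate one failed drive
recovered = rebuild(damaged)
restored = b"".join(recovered[:3]).rstrip(b"\0")
```

With a single parity chunk, only one lost chunk can be rebuilt.  Tolerating several offline drives, as a community network must, requires more redundancy, which is part of the overhead discussed later in this review.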

The cost to those seeking to store files and the payment to those offering their hard drive space fluctuate depending on the availability of the storage space (latency period to recover files) and bandwidth (speed at which the data can be transferred when requested).  The full Filecoin solution supplements IPFS with several innovative approaches (Proof of Spacetime and Proof of Replication) to ensure that the files are actually stored by the community members who are being paid for storing them and for validated bandwidth.

A further capability of the sharding approach is that confidential files can be stored on public hard drives without jeopardizing confidentiality.

[“Sharding is the process of breaking a file into pieces and then storing each piece in a different place.  This allows large files to be spread across multiple storage devices.  The benefits of sharding include

  • no single point of failure. Even if one storage drive is not accessible, others should be. And shards can be overlapping in their content so that even when one shard is missing the entire file can be rebuilt.
  • faster data transfer.  Because the data are moving along multiple parallel paths from the different storage devices, rather than through a single pipe with centralized storage, transfer rates can be faster.  (Of course, this presumes that the collection point can accept data from all of these input sources at once.)
  • enhanced security. Because breaking into a storage site only provides the adversary with a single shard, it is necessary to break into many separate storage sites to rebuild the entire file.

 A master directory keeps track of all the pieces in order to transfer the entire file and rebuild it on request from an authorized user.]

Depending on how the sharding is accomplished (there are multiple ways of breaking up the data), individual shards may not include enough information to allow determined adversaries to make sense of them.  Adversaries would have to learn the locations of a large number of the shards in order to put enough of them together to make sense of the information.
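To make the sharding description above concrete, here is a minimal sketch of splitting a file into shards, spreading them across storage nodes, and rebuilding the file from a directory.  It is an illustration under stated assumptions, not the IPFS data model: the node names, the round-robin placement, and the functions store and retrieve are all hypothetical, and the nodes are simulated with in-memory dictionaries.

```python
import hashlib
from typing import Dict, List, Tuple

Directory = List[Tuple[str, str]]  # ordered (node name, shard digest) pairs


def shard(data: bytes, shard_size: int) -> List[bytes]:
    """Break data into consecutive pieces of at most shard_size bytes."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def store(data: bytes, shard_size: int,
          nodes: Dict[str, Dict[str, bytes]]) -> Directory:
    """Place shards round-robin on the nodes and return the directory."""
    names = sorted(nodes)
    directory: Directory = []
    for index, piece in enumerate(shard(data, shard_size)):
        node_name = names[index % len(names)]
        digest = hashlib.sha256(piece).hexdigest()
        nodes[node_name][digest] = piece      # the node sees only this shard
        directory.append((node_name, digest))
    return directory


def retrieve(directory: Directory,
             nodes: Dict[str, Dict[str, bytes]]) -> bytes:
    """Fetch each shard in order, verify its hash, and reassemble the file."""
    pieces = []
    for node_name, digest in directory:
        piece = nodes[node_name][digest]
        if hashlib.sha256(piece).hexdigest() != digest:
            raise ValueError("shard failed integrity check")
        pieces.append(piece)
    return b"".join(pieces)


# Hypothetical three-node network, simulated in memory.
nodes = {"node-a": {}, "node-b": {}, "node-c": {}}
secret = b"customer records: alice, bob, carol"
directory = store(secret, shard_size=8, nodes=nodes)
rebuilt = retrieve(directory, nodes)
```

In this sketch no single node holds the whole file, but whoever holds the directory can find every shard.  That is why the directory itself becomes the asset to protect.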

Conspicuously absent from the white paper is the fact that the IPFS solution will require significant overhead to manage the large number of storage devices volunteered by its community members.  Not only does Filecoin have to track the location of all the shards of a file, it will have to store multiple copies of the data in order to recover the file if one (or several) of the drives are offline or busy when a request for the file arises.  And because Filecoin has limited control over the community’s storage, it may require a significant amount of additional storage compared with a centralized solution to meet its service levels for latency and bandwidth.  This overhead will be automated and may not require significant labor, but it will require additional computing power.

Credibility

The Filecoin team includes the developers of IPFS.  IPFS is proven technology.  It works and represents a significant technical accomplishment.   It allows large files to be spread across many drives and the file owner does not have to own this capacity.  It is sufficiently trusted that several other projects (e.g., Storj) have held successful ICOs to raise money to offer nearly identical solutions.

Given the popularity of IPFS, there is little doubt that it can do what is claimed.  The additional innovations used to monitor the health of the network require some work, but there is nothing to suggest that the hurdles of completing these efforts are insurmountable.

Business Case

Business Model

Filecoin claims that its decentralized storage model will achieve “staggering economies of scale” versus dedicated storage pools such as those offered by Amazon Web Services for on-demand file storage capacity.  This is a bold claim for the storage of ordinary files.  It presumes that users will offer vast amounts of spare storage and will likely be content to earn very little value for this storage which would otherwise be idle.  It is similar to the model used by SETI@home, a scientific experiment based at UC Berkeley that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). Computer owners can participate by running a free program that downloads and analyzes some of the terabytes of radio telescope data. SETI@home pays users nothing; it only leverages unused calculation and storage capacity.

One area in which the decentralized model may find a strong niche is in large confidential databases.  Through sharding, confidential information can potentially be stored much more securely than using centralized storage.  Enterprises with large databases of personally identifying information (PII) may be able to store such information in a sharded form across the Filecoin network more securely than leaving it in their own “secure” data center.  Storing confidential product technology in this way may help prevent its exposure to industrial espionage.  Other confidential files used by the intelligence communities around the world may leverage this option.  Of course, the files will be brought together when they are used by their owners.  So this is not a foolproof solution.  It also suggests that Filecoin will need to closely guard its IPFS directory which tells where all of the shards for a particular file are stored.

Business Credibility

In essence, Filecoin is the opportunity for the open-source development team to cash in on the development of IPFS, which was done more-or-less on a volunteer basis.  The team has leveraged its credentials as the developers of IPFS to attract what at the time of its ICO was a record token raise:  $257 million.  Its vision of a decentralized storage network that pays members of its community for the use of spare storage on their various computing devices resonates with the large and vocal contingent of decentralization zealots within the blockchain community.

Looking at the business model without the lens of decentralization, however, it is unclear that the Filecoin business model will live up to the expectations of the investors who plunked down $257 million for Filecoin tokens.  The business model is predicated on some critical claims which are not substantiated in the white paper.  In fact, the white paper includes no discussion of the business model at all.  The entire white paper describes in detail the technical solution and offers no substantiation of the credibility of the assumed business model.

Incentive to offer storage

In the white-paper abstract, Filecoin claims that offering rewards to community members for their unused storage “creates a powerful incentive for miners to amass as much storage as they can, and rent it out to clients.”  The validity of this claim is suspect.  Users continuously add applications and data to their computing devices.  When first purchased, such devices may have ample excess storage.  But over time they fill up.  And it is impractical to completely fill a disk drive.  This can slow performance markedly, especially if storage and retrieval of third-party files occurs at a critical time, occupying the storage drive as well as the internet bandwidth that the user needs.

More importantly, the white paper is silent on how much users can make by offering their storage. The white paper dismisses the pricing issue by stating that the market will determine the price of storage.  Given their business model of allowing participants to offer their own prices for storage, this is accurate.  But the white paper could have offered the cost of AWS storage as an upper bound on the price that could be obtained.  Calculating this upper bound using US East Coast pricing, AWS will charge $99.40 per month to store 1 TB of data in a single volume that is refreshed once per month.  (That is roughly 10 cents per GB per month.)  Filecoin will have to get a share of this to pay for their operations, maintenance, and profit.  And if they do not operate as efficiently as AWS, either because they lack the economies of scale or because they have higher overhead costs to manage their decentralized storage network, these operational inefficiencies will have to be deducted even from this upper bound.  If we assume that Filecoin will take 50% of this upper bound to cover their costs and profit, this leaves less than five cents per GB per month for users. If they make only a few pennies a month, is it worth it to them to participate and risk degrading the performance of their own computing devices?  Alternatively, is it worth buying new storage capacity to exploit this small payout?
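The upper-bound arithmetic above can be reproduced in a few lines.  This is a back-of-the-envelope sketch using only the figures quoted in this review; the 50% share is my assumption, not a number published by Filecoin, and I also assume a decimal terabyte (1,000 GB).

```python
AWS_PRICE_PER_TB_MONTH = 99.40  # USD per TB per month, the figure quoted above
GB_PER_TB = 1000                # assumption: decimal terabyte
FILECOIN_SHARE = 0.50           # assumption from this review, not a Filecoin figure

# Upper bound on what a storage buyer would pay per GB per month.
upper_bound_per_gb = AWS_PRICE_PER_TB_MONTH / GB_PER_TB

# What is left for the community member after the assumed Filecoin share.
provider_per_gb = upper_bound_per_gb * (1 - FILECOIN_SHARE)


def monthly_payout(spare_gb: float) -> float:
    """Best-case monthly payout in USD for offering spare_gb of storage."""
    return round(spare_gb * provider_per_gb, 2)


for spare_gb in (1, 100, 1000):
    print(f"{spare_gb:>5} GB offered: at most ${monthly_payout(spare_gb):.2f} per month")
```

Because the AWS price is an upper bound, these payouts are best-case figures.  Any discount needed to win customers away from AWS comes out of them.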

Low-cost storage

Filecoin’s website claims that it will “reliably store files at hyper-competitive prices.”  But neither the white paper nor the website provides any substantiation.

A first red flag surfaces here because the cost of storage is not high to begin with.  And its cost continues to drop.

A second caution comes from an analysis of how these cost savings could be achieved.  Large centralized storage businesses such as AWS achieve significant economies of scale.  They achieve these economies through (1) leveraging their purchasing power to obtain storage capacity at a lower cost than most community members can achieve, (2) refreshing their equipment on a regular schedule to continuously offer faster and more secure storage capability, (3) optimizing their operating costs, and (4) spreading their maintenance and support costs over a huge infrastructure.  Furthermore, operating as centralized entities, large, global providers frequently demonstrate the agility to rapidly modify their practices to address new threats, including adopting some of the practices of new competitors offering different approaches.  For example, if decentralized storage networks begin to attract significant business providing enhanced confidentiality, centralized providers can distribute storage among the many storage units in their globally distributed data centers.

But Filecoin’s white paper and website fail to address the purchase cost of storage for customers seeking to store their data.  So how can it be determined that they can achieve hyper-competitive prices?   Assuming that organizations seeking to save money with Filecoin are large data users who have more incentive to seek lower prices, we can use a similar estimate for storage buyers as we did for storage sellers.  Using the same AWS price estimate we can gauge the upper-bound cost that such users would be willing to pay for storage.

The question then becomes, “How much of a savings does Filecoin have to provide to induce a current centralized-storage user to switch?”  There is a real cost of switching.  The new client’s procurement department will need to approve a new vendor, migrate the data from the current provider to Filecoin, and cancel their current contract.  These are not big costs and they are one-time costs.  But they add enough friction that most businesses seeking storage won’t migrate for only a small savings.  They need to save enough to overcome the friction.

A third red flag arises from the ability of the centralized provider to react to competitive pricing.  If competitors such as Filecoin do offer sufficient savings to induce many customers to switch, the centralized players will likely lower their prices (by reducing margins) to remain competitive.

Confidential storage

Even the niche market for storage that uses decentralization to achieve improved data confidentiality is open to competition from centralized providers.  If the market for this niche grows large enough, centralized storage vendors will offer their own version of “distributed storage.”  They will use sharding (and possibly IPFS) to distribute files among their own storage equipment which is already distributed in multiple data centers around the world.  In the example of AWS, they have six data centers in North America, one in South America, six in Asia/Pacific, and four in Europe.  This might not be as widely distributed as Filecoin’s network, but even within a single data center, files can be broken up across multiple storage devices on different storage networks.

There is an argument that security may differ between the two scenarios.  A centralized storage supplier may not offer the “defense-in-depth” for security.  A hacker able to break into the storage at one AWS facility may have the skill to break into the others to collect enough of the data shards to replicate the original file.  Filecoin’s community, on the other hand, will likely use a diverse set of security solutions to protect their individual storage devices.  But Filecoin’s IPFS directory will still provide a single point of attack.  And the many users offering up storage may have diverse but easy-to-hack security practices compared to the large, centralized data centers.

Legal Issues

Filecoin deserves credit for electing to take the “high road” regarding the legal/regulatory issues of its ICO.  They issued their ICO through a Simple Agreement for Future Tokens (SAFT) and they incorporated Know Your Customer/Anti-Money-Laundering (KYC/AML) screening of potential investors.  But even these safeguards do not remove all risks.

SAFT

To their credit, Filecoin issued their tokens as security tokens limited to accredited investors under the SAFT approach.  This allowed them to raise ICO funds from US citizens and residents.  And, in theory, once the network is in place and tokens are issued to the ICO investors, the tokens should be utility tokens and no longer subject to the requirement that their resale be limited only to accredited investors.

The SAFT approach has not been tested by the courts.  It, therefore, still represents a risk that if tokens are resold to investors who are not accredited, the SEC may step in to stop it.  It is my hope that the SEC will not take such action.

KYC/AML

The resale of tokens may still be restricted by Know Your Customer/Anti-Money-Laundering regulations that apply to multiple countries.  The issue for owners of Filecoin tokens is whether this screening (which has a cost) will dissuade the general public from participating in the secondary market.  Even “mining” tokens by offering storage resources may require this screening.

Governance Issues

Filecoin is a private enterprise.  It controls its destiny.  ICO investors have no official power to impact decisions made by the company.

Under the SAFT, the money invested during the ICO is not immediately liquid.  The nature of the SAFT is that tokens are not issued until the infrastructure is in place.  As a result, investors must wait for Filecoin to solve the various technical issues (most of which were identified in their white paper) and deploy the network before tokens are issued that can be resold on a secondary market – should one arise.  There are several risks in this approach.  These include the following:

Filecoin never deploys their network

If the network is never deployed, tokens will never be issued and money invested in the ICO will be lost.  This could occur because

  1. technical hurdles in developing this innovative technology cannot be solved
  2. the company runs out of money before they are able to deploy the network
  3. insufficient storage “miners” offer to participate, preventing the network from growing large enough to allow for the issuance of utility tokens under the SAFT.

In the first two scenarios, investors will likely lose all their investment.  In the third scenario, any tokens issued would be securities and their resale would be restricted to accredited investors.  This would dramatically limit the number of people eligible to purchase Filecoin tokens on the secondary market.

Filecoin storage is not competitive

As noted above, Filecoin predicates its business model on its ability to provide cloud-based storage at lower costs than existing storage offerings from other suppliers.  But this is a difficult challenge.  And Filecoin has not provided an explanation of how they will achieve lower costs.  If the pricing is not competitive, a secondary market for Filecoin tokens will not develop; no one will want to purchase the tokens because they will not be seeking to purchase storage through Filecoin.  In this scenario, it is likely that, even if tokens are issued, they will soon become worthless.

The Many Misconceptions about Blockchain Decentralization

Decentralization has become one of the most misunderstood concepts in the blockchain world. As one of the key elements in the first blockchain killer app – Bitcoin – the term has taken on a life of its own. For many, the concept of decentralization has become a holy mantra that must be applied to all aspects of a “good” blockchain solution.

Part of the problem with the term “decentralization” is that in the blockchain world, it can apply to multiple aspects of a solution. In each, its meaning and consequences are different. And, depending on the application, the appropriateness of decentralization in each of its aspects can vary. In contrast to widespread beliefs of many blockchain enthusiasts, decentralization is not good in its own right. It is a valuable tool to mitigate particular risks that may apply to blockchain solutions.

I will describe the various ways that decentralization can be applied to mitigate risk in blockchain solutions and discuss when and where it is appropriate below. But first, it is important to recognize that:

The essence of a blockchain solution is centralization.

The role of a blockchain solution is to create a single source of truth that can be trusted by all of its participants. While this single source of truth may be replicated in hundreds of places, it remains a single (and, thereby, centralized) source.

With this basic understanding, let’s look at the various ways that decentralization can be applied to mitigating risk in blockchain solutions. I posit that there are at least three separate problems for which decentralization may be a viable risk mitigation technique in a blockchain solution:

1. Hegemony
2. Single Point of Failure
3. Single Source of Vulnerability

We discuss each of these separately below.

Hegemony

The most important problem that decentralization can solve is hegemony. In the Libertarian tradition that seems to be at the heart of the blockchain movement, there is a resistance to having any single entity control the application. This is likely an element of human nature. We are loath to cede control of things that are vital to us, if it can be avoided.

Bitcoin is an excellent example. Bitcoin enthusiasts typically herald the independence of Bitcoin from the political agenda that can play a strong part in the valuation of fiat currencies controlled by a sovereign nation. When the US Federal Reserve wants to foster employment in the US, it can devalue the dollar (e.g., by reducing the interest rate paid on dollar-denominated funds held overnight). Investors in the dollar are thus vulnerable to this loss in value based solely on actions of the sovereign owner of the currency.

Such hegemony is not limited to cryptocurrencies. As we have been witnessing in the banking industry, there has been much jockeying for control of interbank payment applications under development. Because of the concerns that control of such an application would confer too much power on the owner of the application, banks are allying with partners to have some say in a collaborative solution in which control of the application is decentralized among the alliance members. But because each member is ultimately seeking advantage, alliances such as R3 and the Enterprise Ethereum Alliance have experienced ongoing membership churn and varying levels of commitment by their members.

In the pharmaceutical industry, where a blockchain application may prove to be a powerful solution for compliance with the US Drug Supply Chain Security Act (DSCSA), industry members’ first requirement is that any solution not be controlled by either a single company or even a group of companies (e.g., an alliance of distributors).

The concern that potential participants have about hegemony manifests itself in three ways:

1. Data ownership
2. Data validation
3. Market manipulation

Data ownership

If a single entity (e.g., a sovereign nation, a business enterprise, or an alliance of businesses that doesn’t comprise the full membership of the user community) controls the blockchain, those members not in control fear abuse by the application owner. This abuse can come from the owner exploiting competitive intelligence gained from omniscient access to the data (because the owner may be privy to the identifiers assigned to the other users).

In a stock or bond trading application, the owner may use the data to front-run other trading partners. That is, the owner may place himself into the middle of trades between other trading partners, exacting a markup on each trade.

Data validation

If a single entity controls the validation of blockchain transactions, it gains the ability to create false transactions that serve its own interests. Or it could change the time ordering of transactions, allowing it to “past post” certain transactions to cover for the exploitation of its information advantage.

Sticking with a stock market example, if a large buy order comes in with a limit of 53 when an asset is currently trading at 51, they may buy up the asset at 51 and only after moving the market through such purchases, resell it to the true buyer at 53.

Market manipulation

Exercising control over the application, the owner may place its own interests above those of the other users. For example, the owner of an application for matching and clearing stock or bond transactions may limit the matching of bids only to their own trading desk. They may even go so far as to arbitrarily raise or lower the price of an asset to better exploit their proprietary trading positions in certain assets.

The owner may choose to unilaterally alter the rules that govern the blockchain application. Raising transaction fees once other partners have committed substantial investments in using the system would be one example. Or they may place limits on the ability of some users to freely transact. They might also change the prioritization of certain transactions.

Another manipulation technique would be to delay reporting of current prices in order to exploit the changes in prices that may occur during the gap. This is similar to “past posting” in horse racing. If the owner delays information on the price of an asset, he can transact without risk knowing that the market is moving in a particular direction while other members are unaware of this movement.

Decentralizing ownership of the application can resolve each of these problems. But decentralization need not imply that the blockchain is public. A permissioned blockchain in which all members have a say in the governance of the blockchain can provide sufficient decentralization to address the hegemony issues.

Single Point of Failure

A centralized solution (one in which there is only a single copy of “the truth”) is always vulnerable to the failure of the system. Failures such as power shortages or natural disasters can cause a centralized solution to fail. Even if the solution is governed by a large number of users, failure to decentralize the servers and the data themselves subjects the solution to significant risk. A solution whose infrastructure is centralized within a particular country or region may be subject to being shut down by political action or acts of war.

By decentralizing the infrastructure and the data and distributing it both geographically and internationally, much of this failure risk can be reduced.  But this can be done with a single-owner data file or by a file shared among a small group of “permissioned” members.  Many enterprises today maintain multiple copies of critical data that are geographically dispersed to ensure that they can continue to run their business even when one or more sites suffers from a disaster, be it man-made or an act of nature.  While the argument can be made that “more is better” and that distributing data to every possible user is safer than maintaining only a handful of copies, the risk is diminished only negligibly after a handful of copies are deployed, and the support costs rise arithmetically with each new instance.  Even though no central owner has to pay these costs, there is a cost to the owners of each copy for storing the data, the network link to keep it up to date, the computing power to read and write the data, and the electricity to run the node.
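The diminishing return on additional copies can be shown with a simple model.  This is a sketch under assumptions of my own choosing: each site fails independently with the same probability (a hypothetical 1% here), and each copy costs the same hypothetical amount to maintain.  Real failures are often correlated, so the numbers are illustrative only.

```python
def loss_probability(p_site_failure: float, copies: int) -> float:
    """Probability that every copy is unavailable at once (independent sites)."""
    return p_site_failure ** copies


def total_cost(cost_per_copy: float, copies: int) -> float:
    """Support cost grows linearly (arithmetically) with the number of copies."""
    return cost_per_copy * copies


P_SITE_FAILURE = 0.01   # hypothetical chance that any one site is down
COST_PER_COPY = 100.0   # hypothetical monthly cost of keeping one copy

for copies in (1, 2, 3, 5, 10, 100):
    risk = loss_probability(P_SITE_FAILURE, copies)
    cost = total_cost(COST_PER_COPY, copies)
    print(f"{copies:>3} copies: loss probability {risk:.0e}, monthly cost ${cost:,.0f}")
```

Under these assumptions, the second copy removes almost all of the risk and each later copy removes a vanishing sliver of what remains, while every copy adds the same cost.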

Single Source of Vulnerability

Data security is a critical element of most blockchain applications. The value of having a single source of truth is negated if the truth can be manipulated by adversaries out for their own gain. Even organizations with some of the strongest data security processes have been compromised by such adversaries. If the blockchain ledger is held in one location, a determined adversary is likely to be able to gain access to it and could alter it, compromising its value as a source of truth.

By distributing copies of the continuously updated ledger to many locations, adversaries have to penetrate each location and alter multiple copies of the data structure in real time in order to falsify records. The more copies of the current data that exist, the more difficult is the task of an adversary seeking to falsify the records.

The Single Source of Vulnerability problem is similar to the Single Point of Failure problem.  Both benefit from maintaining multiple copies of the data in different locations.  The difference is that the security employed to keep adversaries out of different nodes is different, further reducing the vulnerability of the data to tampering.   Because most nodes are independent of one another, they often use different computing systems, with different operating systems, and employ different security techniques.  This increases the difficulty of manipulating data on multiple blockchain nodes.  The adversary has to be able to penetrate each node with their different defenses and still be able to alter the data on at least 51% of them at the same point in time.  This is very difficult when the nodes are independent of one another and use different hardware and software.

The task becomes a bit easier, however, if a large percentage of the validating nodes are using similar equipment.  Because adversaries need only alter 51% of the nodes to alter the blockchain, there may be enough of one type of security infrastructure in use that it could be done.  But more important is the fact that this “defense-in-depth” (the use of different security tools and procedures at different points in a system) could be deployed by a private blockchain as well.  If a blockchain owner (individual or group) elected to distribute the hosting of their validating nodes on a variety of clouds (e.g., Amazon, Microsoft, IBM, Oracle, etc.) they could create a similar level of protection.  Arguably, such a system may be more secure because the major cloud service providers each employ strong security to protect their clients and their reputation.  Validating nodes on permissionless blockchains may be much easier to penetrate, even if there are more of them.
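The effect of equipment diversity on the majority threshold can be sketched as a counting exercise.  The simplifying assumption here, which is mine, is that an adversary who learns to breach one security stack can compromise every node that runs it.  The stack labels and the function stacks_needed_for_majority are hypothetical.

```python
from collections import Counter
from typing import List


def majority_threshold(total_nodes: int) -> int:
    """Smallest node count that is more than half (the '51%' in the text)."""
    return total_nodes // 2 + 1


def stacks_needed_for_majority(node_stacks: List[str]) -> int:
    """Fewest distinct security stacks an adversary must breach to control
    a majority, assuming one breach compromises every node on that stack."""
    threshold = majority_threshold(len(node_stacks))
    compromised = 0
    # Attack the most widely deployed stacks first.
    ranked = Counter(node_stacks).most_common()
    for breached, (_, count) in enumerate(ranked, start=1):
        compromised += count
        if compromised >= threshold:
            return breached
    return 0  # reached only when there are no nodes at all


# Ten nodes, each with its own defenses.
diverse = [f"stack-{i}" for i in range(10)]

# Ten nodes, six of which share one hypothetical stack.
monoculture = ["stack-x"] * 6 + ["stack-a", "stack-b", "stack-c", "stack-d"]

diverse_effort = stacks_needed_for_majority(diverse)
monoculture_effort = stacks_needed_for_majority(monoculture)
```

In the diverse network the adversary must defeat six different defenses at the same time.  In the monoculture a single successful technique is enough, no matter how many nodes there are.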

Jeff Stollman is Principal Consultant at Rocky Mountain Technical Marketing, Inc.  He has been working in the blockchain space for over three years, assessing the technology, designing enterprise blockchain solutions, and developing go-to-market strategy and white papers for other blockchain projects.  He has four patents pending in the blockchain arena.