Efficient, Usable, And Cheap Storage Of IPFS Hashes In Solidity Smart Contracts

in #ethereum • 6 years ago

<h1>Efficient, Usable, And Cheap Storage Of IPFS Hashes In Solidity Smart Contracts</h1>
<p dir="auto">Recently, while working on a new project, I needed to store IPFS hashes, in particular CIDv1 ones, on the Ethereum blockchain in a way that is both user-friendly and gas efficient. Why CIDv1? CIDv0 is tied to sha2-256, and once its 32-byte digest is wrapped in a multihash it no longer fits into a single <code>bytes32</code> storage slot. CIDv1 lets us pick a different hash function, so it does not suffer from this :) From what I've seen in the wild, whenever people need to store IPFS hashes in smart contracts, it's almost always either a hash of the IPFS CID stored in a <code>bytes32</code> storage variable, or the IPFS hash itself stored in a <code>string</code> storage variable.</p>
<p dir="auto">While there is nothing wrong with either of those two approaches, they are suboptimal. Hashing a hash simply to fit it into a single storage slot makes it significantly harder to consume, and is only worthwhile if you need to do so for security reasons. Storing the IPFS hash in a <code>string</code> storage variable is very expensive in general.</p>
<p dir="auto">The trick here is to fit into as few storage slots as possible while fully occupying each of them (efficient), while being easy to consume (usable) and using minimal gas (cheap). This sounds simple in theory, as all you need to do is find a hash function whose output takes up exactly one <code>bytes32</code> storage variable, or exactly two <code>bytes32</code> storage variables. But because IPFS uses multiformats (or more appropriately, multihash), this isn't as easy as it sounds. To figure this out, we're going to need to talk a bit about multihash.</p>
<h2>Multihash</h2>
<p dir="auto"><a href="https://multiformats.io/multihash/" target="_blank" rel="nofollow noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">Multihash</a> is a subset of the <a href="https://multiformats.io/" target="_blank" rel="nofollow noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">multiformat specification</a> and allows us to create <a href="https://github.com/multiformats/multihash" target="_blank" rel="nofollow noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">self describing hashes</a>. The format for multihash looks like this (image is from github.com/multiformats/multiformats):</p>
<p dir="auto"><img src="https://images.hive.blog/768x0/https://raw.githubusercontent.com/multiformats/multiformats/eb22cd807db692877a9094b5bfb4d2997fd0278a/img/multihash.006.jpg" srcset="https://images.hive.blog/768x0/https://raw.githubusercontent.com/multiformats/multiformats/eb22cd807db692877a9094b5bfb4d2997fd0278a/img/multihash.006.jpg 1x, https://images.hive.blog/1536x0/https://raw.githubusercontent.com/multiformats/multiformats/eb22cd807db692877a9094b5bfb4d2997fd0278a/img/multihash.006.jpg 2x" /></p>
<p dir="auto">So given the above, even if you were able to find a hash function whose output is 32 bytes, multihash would then prepend its function code and digest length (4 bytes for a blake2b variant), for a total of 36 bytes. This means we then need to store it in a <code>bytes</code> storage variable, which has a single word of overhead.</p>
<h2>Solution</h2>
<p dir="auto">This then led me to the question: what if we could find a hash function which, when in multihash format, takes up a single <code>bytes32</code> storage slot? Then we would be good to go! Well, not quite. The only multihash that did this was <code>blake2b-136</code>, which <code>go-ipfs</code> nodes don't accept by default due to security risks... yikes!</p>
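<p dir="auto">To make the prefix overhead mentioned above concrete, here is a minimal Go sketch that builds the multihash prefix for a 32-byte blake2b digest. It assumes the two prefix fields are unsigned varints, as in the multihash format, and takes the code <code>0xb220</code> for <code>blake2b-256</code> from the multicodec table. The helper name <code>multihashPrefix</code> is made up for illustration and is not part of any IPFS library.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// multihashPrefix (hypothetical helper) returns the bytes that multihash puts
// in front of a digest: the hash function code followed by the digest length,
// each encoded as an unsigned varint.
func multihashPrefix(code uint64, digestLen int) []byte {
	buf := make([]byte, 2*binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, code)
	n += binary.PutUvarint(buf[n:], uint64(digestLen))
	return buf[:n]
}

func main() {
	// Assumption: 0xb220 is the multicodec entry for blake2b-256 (32-byte digest).
	prefix := multihashPrefix(0xb220, 32)
	fmt.Printf("prefix: % x (%d bytes)\n", prefix, len(prefix))
	fmt.Printf("multihash total: %d bytes\n", len(prefix)+32)
}
```

<p dir="auto">Under these assumptions the prefix comes out at 4 bytes, so a 32-byte digest becomes a 36-byte multihash, which is what pushes it out of a single <code>bytes32</code> slot.</p>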
<p dir="auto">After several hours of experimentation (aka blindly trying multihashes), I was finally able to find a hash function which, when used as the multihash, yields an IPFS hash that is exactly 64 bytes long. This means we can store it in exactly two <code>bytes32</code> storage slots, completely filling both and not wasting any space. The multihash is <code>blake2b-328</code>. So to use this, all we need to do is take the 64-byte output, split it in two, and we're good to go!</p>
<p dir="auto"><img src="https://images.hive.blog/768x0/https://gateway.temporal.cloud/ipfs/QmR5uZMmt6rzsyW4iyHjFSb8ieUqqrWJWz7ea2iA6pBV1M" srcset="https://images.hive.blog/768x0/https://gateway.temporal.cloud/ipfs/QmR5uZMmt6rzsyW4iyHjFSb8ieUqqrWJWz7ea2iA6pBV1M 1x, https://images.hive.blog/1536x0/https://gateway.temporal.cloud/ipfs/QmR5uZMmt6rzsyW4iyHjFSb8ieUqqrWJWz7ea2iA6pBV1M 2x" /></p>
<h2>Example</h2>
<p dir="auto">To keep things short, I'll demonstrate the best solution I found after trying a few different combinations and ways to store two <code>bytes32</code> storage variables.</p>
<p dir="auto">The first step is to define two parts of the hash, <code>hashPart1</code> and <code>hashPart2</code>. To store our IPFS hashes here, we take the 64-byte output of the <code>blake2b-328</code> multihash, split it in half, store each half in a 2-element array of <code>bytes32</code> type, and pass that array into the function <code>updateHash</code>.</p>
<p dir="auto">Now, whenever we want to consume this data, all we have to do is call <code>getHash</code>, which returns the complete hash as <code>bytes</code>. If you're consuming this in a mobile phone DApp, then all you need to do is convert it to a string, which in golang would be <code>string(returnedBytes)</code>, and you have your IPFS hash in plaintext!</p>
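<p dir="auto">On the client side, the split and re-join described above might look roughly like this in Go. This is only a sketch: <code>splitCID</code> and <code>joinCID</code> are hypothetical helper names, the CID in <code>main</code> is a 64-character placeholder rather than a real hash, and the actual calls to <code>updateHash</code> and <code>getHash</code> (for example through generated contract bindings) are left out.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitCID (hypothetical helper) splits a 64-byte plaintext hash into the two
// 32-byte halves that updateHash(bytes32[2]) expects.
func splitCID(cid string) ([2][32]byte, error) {
	var parts [2][32]byte
	if len(cid) != 64 {
		return parts, errors.New("hash must be exactly 64 bytes long")
	}
	copy(parts[0][:], cid[:32])
	copy(parts[1][:], cid[32:])
	return parts, nil
}

// joinCID (hypothetical helper) turns the 64 bytes returned by getHash()
// back into the plaintext hash.
func joinCID(returnedBytes []byte) string {
	return string(returnedBytes)
}

func main() {
	// Placeholder only: 64 ASCII characters standing in for a real IPFS hash.
	cid := "z" + strings.Repeat("x", 63)

	parts, err := splitCID(cid)
	if err != nil {
		panic(err)
	}

	// parts would be sent to updateHash; here we simulate what getHash returns.
	returnedBytes := append(parts[0][:], parts[1][:]...)
	fmt.Println(joinCID(returnedBytes) == cid)
}
```

<p dir="auto">The length check is there because anything other than exactly 64 bytes would either leave a slot partly empty or silently truncate the hash.</p>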
<pre><code>pragma solidity 0.5.7;

contract Hash {
    bytes32 public hashPart1;
    bytes32 public hashPart2;

    // Store the two 32-byte halves of the 64-byte hash.
    function updateHash(bytes32[2] memory _hash) public returns (bool) {
        hashPart1 = _hash[0];
        hashPart2 = _hash[1];
        return true;
    }

    // Rejoin both halves into a single 64-byte value.
    function getHash() public view returns (bytes memory) {
        bytes memory joined = new bytes(64);
        bytes32 h1 = hashPart1;
        bytes32 h2 = hashPart2;
        assembly {
            // The first word of `joined` holds its length; data starts at +32.
            mstore(add(joined, 32), h1)
            mstore(add(joined, 64), h2)
        }
        return joined;
    }
}
</code></pre>
<p dir="auto">So does this actually save gas, or did I just waste your time?</p>
<h2>Gas Consumption</h2>
<p dir="auto">To test gas consumption, I wrote the following fairly ugly test contract. To get gas costs, I checked the <code>CumulativeGasUsed</code> field of the transaction receipt. Tests were run on a local private PoA chain on my laptop with a 1 second block time, using <code>geth 1.8.27-stable</code>.</p>
<p dir="auto">The functions <code>updateLink</code>, <code>updateLinkHash</code> and <code>updateLinkParts</code> test gas costs for different ways of storing data in two <code>bytes32</code> storage slots. The function <code>setCID</code> tests gas costs for storing a hashed IPFS hash. The function <code>setCIDString</code> tests gas costs for storing the plaintext (aka string) version of the IPFS hash.</p>
<pre><code>pragma solidity 0.5.7;

contract GasTest {
    string public hashString;
    bytes32 public hash;
    bytes32[2] public linkHash;
    bytes32 public linkPart1;
    bytes32 public linkPart2;
    LinkObject private link;

    struct LinkObject {
        bytes32[2] currentHash;
    }

    function updateLink(bytes32[2] memory _newHash) public returns (bool) {
        link.currentHash = _newHash;
        return true;
    }

    function updateLinkHash(bytes32[2] memory _newHash) public returns (bool) {
        linkHash = _newHash;
        return true;
    }

    function updateLinkParts(bytes32[2] memory _newHash) public returns (bool) {
        linkPart1 = _newHash[0];
        linkPart2 = _newHash[1];
        return true;
    }

    function setCID(bytes32 _cid) public returns (bool) {
        hash = _cid;
        return true;
    }

    function setCIDString(string memory _cid) public returns (bool) {
        hashString = _cid;
        return true;
    }
}
</code></pre>
<p dir="auto">The cumulative gas usage results from the above contract are:</p>
<div class="table-responsive"><table>
<thead>
<tr><th>Function</th><th>Cumulative Gas Used</th></tr>
</thead>
<tbody>
<tr><td>updateLink</td><td>66360</td></tr>
<tr><td>updateLinkHash</td><td>66426</td></tr>
<tr><td>updateLinkParts</td><td>66071</td></tr>
<tr><td>setCID</td><td>43810</td></tr>
<tr><td>setCIDString</td><td>86213</td></tr>
</tbody>
</table></div>
<p dir="auto">Initially you might look at the gas cost for <code>setCID</code> and start thinking that I just wasted your precious time. However, we need to consider the fact that this isn't actually the IPFS hash. It is a hash <em>of</em> the IPFS hash. So while this may be gas efficient, it is not easy to consume outside of smart contracts, and is abysmal at best to consume within other smart contracts because:</p>
<ol>
<li>We need to store a plaintext copy of the hash somewhere accessible by the smart contract (storage).</li>
<li>We need to read the plaintext data from storage, hash it, then compare the two hashed hashes.</li>
</ol>
<p dir="auto">Now, after considering that, the gas costs for the hash storage methods discussed here (66071 to 66426, depending on the method used), combined with the fact that you can store and consume the hashes as is, seem pretty useful in my eyes.</p>
<h2>Takeaways</h2>
<ul>
<li>Use <code>blake2b-328</code></li>
<li>Cast to bytes</li>
<li>Store in 2 <code>bytes32</code> storage slots</li>
<li>???</li>
<li>profit!</li>
</ul>

!cheetah unban

Account verified.

Okay, I have unbanned @rtrade.

Warning! This user is on my black list, likely as a known plagiarist, spammer or ID thief. Please be cautious with this post!
If you believe this is an error, please chat with us in the #cheetah-appeals channel in our discord.

This is such bullshit, all because I refused to verify my account with some unknown third-party service?

I was about to rant about plagiarism but holy shit, you're the author indeed. (unless someone compromised this name on Steem)

Maybe try contacting whoever makes this anti-plagiarism measure (not sure if cheetah is the creator or the bot)


I don't know if you have heard about Utopian, but Utopian supports open source projects like yours. Sorry about cheetah flagging your post. You can always join the Utopian Discord channel here. Good luck and have fun.