In my previous posts (part 1 here, and part 2 here) I talked about what makes a dApp a dApp, and about the decentralization of the Steem ecosystem. When I realized how fundamentally critical it is for a decentralized ecosystem to also have decentralized interfaces into it, I knew I would have to put my money where my mouth is and start the push towards such a model.
Today I am announcing my API infrastructure to the public, reachable at https://anyx.io. After a few weeks of planning, and another for building, testing, and setting up, it's now live and ready to serve Steem users, Apps, and dApps.
At a high level, the infrastructure is quite simple. The goal was to have high-performance API node(s) as the back-end, with a lightweight front-end proxy to redirect traffic appropriately.
To do this, I deployed nginx on the front end (to handle SSL), pointing towards Jussi on the same machine. Jussi is a verification and redirection interface, allowing one to split requests based on type to a specific API backend. This is a form of load balancing primarily, but also enables specialization among the API nodes in use: not all need to contain the entirety of the data, each node can contain specific API data and Jussi can intelligently redirect requests to the specialized nodes.
In theory at least. Currently, the Jussi setup I have going is blocked by the inability to specify ports, as noted in this GitHub issue here. So, while in theory I could load balance among Steem instances, there is currently only one back-end servicing API requests.
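The routing idea can be pictured as a prefix lookup on the JSON-RPC method name. The following is a minimal sketch of that idea only, not Jussi's actual code or configuration format; the `UPSTREAMS` table, the `pick_upstream` helper, and the backend hostnames are all hypothetical.

```python
# Hypothetical routing table: API namespace -> backend URL.
# None of these hosts come from the real anyx.io setup.
UPSTREAMS = {
    "": "http://fullnode.example",                 # default backend
    "follow_api": "http://follow.example",
    "account_history_api": "http://history.example",
}


def pick_upstream(method: str) -> str:
    """Return the backend for a JSON-RPC method; the longest matching prefix wins."""
    best = ""
    for prefix in UPSTREAMS:
        matches = method == prefix or method.startswith(prefix + ".")
        if prefix and matches and len(prefix) > len(best):
            best = prefix
    return UPSTREAMS[best]
```

With this table, a call such as `follow_api.get_followers` would go to the specialized follow node, while anything without a dedicated entry falls through to the default backend.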
On the back-end, I have a heavyweight, fully customized, hand-built, enterprise grade server. The idea behind hand building the server was to use my knowledge of how Steem behaves on different hardware (especially for storage), picking out parts that would give the highest performance. Again, sadly, this was kind of in theory, as some sacrifices were made due to part availability in Canada. The back-end responds to the API requests that Jussi forwards, replying with the data.
Finally, I deployed Lineman on the server. Lineman translates legacy websocket requests into http requests. While not required for an API node to function (the usage of websockets has been deprecated), several services still use them. The two biggest ones are cli_wallet and the standalone Steem wallet Vessel. Supporting alternate wallet methods was critical to my design philosophy of decentralization of interfaces, so I spent the time to make sure that websocket requests work. With some configuration on the front-end, websocket requests go through Lineman first before going to Jussi, and are serviced from the same port (with SSL encryption!) after setting up some magic with nginx.
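One way to picture the single-port setup: the front end inspects each incoming request and hands websocket handshakes to Lineman, and everything else to Jussi. This sketch shows only that decision, assuming it is made on the standard `Upgrade` header; it is not my actual nginx configuration, and `choose_service` is a hypothetical name.

```python
def choose_service(headers: dict) -> str:
    """Pick the local service for a request based on its HTTP headers."""
    # Header names are case-insensitive, so normalize before comparing.
    normalized = {k.lower(): v.lower() for k, v in headers.items()}
    if normalized.get("upgrade") == "websocket":
        return "lineman"  # legacy websocket clients (cli_wallet, Vessel)
    return "jussi"        # plain JSON-RPC over HTTPS
```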
At this time, non-SSL requests are not supported. This functionality is unfortunately also blocked by Jussi being unable to specify ports. (Dear Steemit Inc, please fix Jussi.)
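Since only HTTPS is supported for now, a request from a script would look roughly like the sketch below. It uses only the Python standard library; the helper names are hypothetical, and the method in the usage note assumes the node exposes `condenser_api`.

```python
import json
import urllib.request

API_URL = "https://anyx.io"  # endpoint announced in this post


def build_request(method: str, params: list, request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 call as a request body."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }
    return json.dumps(payload).encode("utf-8")


def call(method: str, params: list) -> dict:
    """POST the call over HTTPS and decode the JSON reply (needs network access)."""
    req = urllib.request.Request(
        API_URL,
        data=build_request(method, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call("condenser_api.get_dynamic_global_properties", [])` should return the current chain properties if everything is working.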
When building any computer, there are a bunch of different hardware choices that need to be made. Enumerating them, they are:
- CPU choice (variant and count)
- RAM choice (mostly size)
- Storage (classes of storage, size of storage, configuration)
- Peripherals (power supply, cooling fans/solution)
Certain hardware aspects are not utilized for Steem, so I have left them out (e.g. accelerators and graphics cards).
When building a server for Steem, the CPU is one of the biggest considerations to be made. There are two things to consider for Steem: 1. How much RAM can the CPU support? 2. How fast is single core performance?
The consideration of CPU count is actually not very important if we are just considering Steem's needs. As Steem is single threaded, it quite literally does not matter how many cores are in the socket, as only one will ever be used. However, there are a few other considerations to take into account. If you plan on running multiple instances of Steem on the same physical machine (e.g. via virtualization) for load balancing, you will need high clock speed cores for each process. Furthermore, if you plan on running other services on the same machine, more cores will always help.
Notably, for consensus machines, while CPU single core performance matters a lot during replays, after the replay is complete the performance is less critical. However, for API nodes, the incoming API requests can be mostly parallelized (even using multiple threads). The way this is done involves read and write locks on the database file; this means that while parallelism is available, it still can be throttled by poor single core performance, so striking a good balance is key.
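To illustrate the locking pattern in general terms, here is a minimal readers-writer lock. It is a textbook sketch using Python threads, not the locking code Steem itself uses: many readers (API requests) may hold the lock at once, while a writer (applying a new block) needs exclusive access and makes everyone else wait.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently inside
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers
```

While the write lock is held every reader is stalled, which is why a slow core on the write path can throttle otherwise parallel API reads.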
Some other things I will only briefly discuss: PCIe lanes don't really matter (as we do not use PCIe peripherals), CPU cache is not well utilized by Steem and thus does not make a significant impact, and multi-socket configurations aren't particularly well used (aside from supporting more RAM) as Steem itself is single threaded.
In terms of RAM considerations, we have 3 classes of hardware we can use for Steem:
- Commodity machines: These can support 64GB for regular chips, and 128GB for HEDT. Non-ECC.
- Workstation Xeons: These are single socket only, and can support 512GB of ECC RAM.
- Regular Xeons: Each socket can support 768GB of ECC RAM. Depending on the chip, they can be configured with multiple sockets on the same motherboard, allowing access to a large memory pool (e.g. a 4-socket system can have 3TB of memory).
For Steem, the current usage at the time of writing is 45 GB for a consensus node. Each API you add will require more memory (for example, the follow plugin requires about 20 GB), adding up to 247 GB for a "full node" with all plugins on the same process (note, history consists of an additional 138 GB on disk).
Notably, if you split up the full node into different parts, each will need a duplicate copy of the 45 GB of consensus data, but you can split the remaining requirements horizontally.
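The arithmetic of splitting can be sketched as follows. The 45 GB and 247 GB figures come from above; the assumption that plugin data divides evenly between nodes is mine, purely for illustration (in practice each plugin has its own size).

```python
CONSENSUS_GB = 45    # consensus state, duplicated on every node
FULL_NODE_GB = 247   # one process with all plugins
PLUGIN_GB = FULL_NODE_GB - CONSENSUS_GB  # 202 GB of plugin data


def split_memory(n_nodes: int) -> tuple:
    """Return (GB per node, total GB), assuming plugin data splits evenly."""
    per_node = CONSENSUS_GB + PLUGIN_GB / n_nodes
    return per_node, per_node * n_nodes


for n in (1, 2, 4):
    per_node, total = split_memory(n)
    print(f"{n} node(s): {per_node:.1f} GB each, {total:.1f} GB total")
```

Under that assumption, two nodes need 146 GB each (292 GB total) and four nodes need 95.5 GB each (382 GB total): splitting lowers the per-machine requirement at the cost of more RAM overall.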
With these constraints in mind, commodity machines (which have the highest single core speeds) are excellent candidates for consensus nodes, with Workstation class Xeons coming in a close second. However, the scalability of regular Xeons offers an unparalleled ability to condense hardware into a single machine.
As a final note on RAM, there are different properties of the RAM to consider: the RAM timings and speed, as well as the functionality of ECC. In my testing, using a higher speed of RAM can indeed help, but is limited in its effect as performance remains dominated by the single core clock speed of the CPU. Memory channels are not utilized well by Steem due to its single threaded nature, so they are fairly irrelevant at this time.
Using ECC RAM isn't particularly something that can be tested, but its benefit is that long-running processes are far, far less likely to be affected by faults. While most applications for commodity machines are short-running and fault tolerant in other ways, long-running applications really do benefit from ECC RAM, to avoid an application crash that would corrupt your state.
For the consensus state, having a significant amount of RAM matters. However, with the introduction of RocksDB, history data can now be placed on disk. At the time of writing, this consumes about 138 GB of data. Finally, there is the blockchain itself, which is 156 GB.
For storage, there are three important factors: Capacity (amount of GB it can store), Bandwidth (transfer rate for data), and Latency (time between requesting data and beginning to receive it). There are several storage classes that trade off certain aspects of these for others.
- Traditional Hard Disk Drives (HDDs). These drives offer a large capacity, but at the cost of poor bandwidth and extremely long latency timings. These drives have a very low cost.
- Solid State Drives (SSDs). These offer decent capacity and decent bandwidth, but often lack in latency. These drives are fairly low cost, and are about 6x "faster" than HDDs.
- Non Volatile Memory Express (NVMEs). NVME can reach capacities similar to SSDs, while offering excellent bandwidth and lower latency. These drives are moderately expensive, and are about 2x to 3x "faster" than SSDs.
- As a final class, we have the new technology '3D XPoint', or Optane drives. While bandwidth numbers are similar to NVME drives, the key advantage that Optane brings is extremely low latency. They can be up to 4x "faster" than NVME, though specifically for random-access workloads. Unfortunately, these drives are very expensive.
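Chaining the rough multipliers from the list gives a feel for the gap between classes. This is back-of-the-envelope arithmetic only: "faster" is used loosely above, the Optane figure is a best case that applies to random access, and the `cumulative` helper is just for illustration.

```python
# Rough multipliers quoted above, each relative to the previous class.
# Ranges are (low, high); the Optane 4x is a best case for random access.
STEPS = [
    ("HDD", (1, 1)),
    ("SSD", (6, 6)),
    ("NVME", (2, 3)),
    ("Optane", (4, 4)),
]


def cumulative(steps):
    """Chain the per-step multipliers into ranges relative to an HDD."""
    low = high = 1
    result = {}
    for name, (step_low, step_high) in steps:
        low, high = low * step_low, high * step_high
        result[name] = (low, high)
    return result


for name, (low, high) in cumulative(STEPS).items():
    print(f"{name}: {low}x to {high}x an HDD")
```

By these numbers an NVME lands at roughly 12x to 18x an HDD, and an Optane at up to 48x to 72x on a favorable random-access workload.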
We have two sets of data to consider for Steem: The blockchain, and the history data. While obviously getting the best-in-class for all data will always see improvements, the benefits diminish depending on workload.
In my experience, any SSD offers perfectly reasonable performance for the blockchain. Storing the blockchain on NVME or Optane offers negligible performance improvements. SSDs also seem to work alright for history data, but an advantage can definitely be seen when moving to NVMEs or Optane. Further, though I have not tested it yet, I anticipate that when random-access requests for history data arrive at the API node, the quicker response time of Optane can be a boon to performance.
When building any kind of computer, there are a few considerations that are fairly static, that I won't dive too deeply into. Ensure your PSU can adequately feed your computer, and ensure you have sufficient cooling and airflow. Steem does not require a graphics card, and will not use it. Don't bother getting one, a headless server is fine.
My Hardware Choices
For this section, I'll talk about the "guts" of the infrastructure: building the big fat back-end server.
For the CPU, I actually intended to get a Workstation class Xeon, like the Xeon W-2145. While it has a low core count, it can turbo up to 4.5GHz on a single core, which is extremely beneficial for replays. However, it offers little to no advantage once the replay is complete. Unfortunately, I was unable to find a provider that sold them in Canada when I was purchasing, so I went with a Xeon Gold 6130, which was available. This CPU can still go to 3.7 GHz on a single core, which is still quite fast. It does offer a few advantages: it has many more cores, making it more general purpose for running other programs; it allows a future upgrade to a dual-socket configuration by buying only a sister CPU, rather than 2 new CPUs; and it has a greater L3 cache size (it helps a little bit). However, if you are building a Steem server, I do recommend looking into getting a Workstation Xeon instead.
For RAM, I got 512GB of generic DDR4 2666 ECC RAM by Micron. The large volume (double a full node's requirement) was to allow scalability into the future of Steem, as well as the ability to run multiple Steem nodes on the same machine.
For storage, I actually got a bit of everything -- it gave me the ability to test storage classes.
- Optane 905P, 960 GB
- NVME: Samsung 960 pro, 1 TB
- SSD: Crucial MX 500, 1TB
- some crappy HDD I had lying around, 2 TB
With lots of room to play, I currently have the history on the Optane and the blockchain on the NVME.
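To show how much room that leaves, here is a quick utilization check using the data sizes and drive capacities quoted in this post. Capacities are the marketed sizes, so real usable space will be somewhat lower; the `PLACEMENT` table and `utilization` helper are just for illustration.

```python
# Data sizes (GB) from this post and the drive each set currently lives on.
PLACEMENT = {
    "history": {"size_gb": 138, "drive": "Optane 905P", "capacity_gb": 960},
    "blockchain": {"size_gb": 156, "drive": "Samsung 960 Pro", "capacity_gb": 1000},
}


def utilization(entry: dict) -> float:
    """Fraction of the drive taken up by the data set."""
    return entry["size_gb"] / entry["capacity_gb"]


for name, entry in PLACEMENT.items():
    print(f"{name}: {utilization(entry):.0%} of {entry['drive']}")
```

Both drives sit at roughly 15% full, so there is plenty of headroom for the chain and the history to grow.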
For peripherals, I chose a 1U Short depth server with a 350 Watt PSU built in. Though the build only uses about 100 Watts running Steem, the extra wattage allows room for growth in the future (and hey, it was on sale). The CPU fan is a blower by Dynatron, and I installed some extra cheap fans to help with airflow.
Some fun bits about the build:
- 1U Short depth is tiny! It was cramped to get everything in there. There is basically no more room for new drives -- though, it won't really be needed.
- Temperatures can get pretty high in a 1U server: under full load the CPU can get up to 80C! This is really due to the limitation of having to use a blower cooler on your CPU. I added some fans to help cope and get the hot air out of the cramped case, which brought the maximum CPU temp down to 70C. When running just Steem however, it only gets up to about 50C, as most cores are just idle. If I were to do it again, I would probably aim for a 2U or greater.
- A 90 degree PCIe riser is needed for the Optane card to fit it in 1U. Funny enough, my riser was a bit too short, so the Optane is a bit crooked in the server. Eh, it works!
- The RAM was the most expensive part, at about $1,000 CAD before taxes per 64GB DIMM.
- The whole build came out to about $16,000 CAD after taxes.
Part links for the geeks among us:
I still have a bunch of work to do getting Jussi configured correctly (especially after the port issue gets fixed). But I would absolutely love for users and dApps to test out my new infrastructure and tell me what they think! I'd love to get feedback on the performance, consistency, and latency that they see. Optimization will take a while, so the more feedback the better.