You are viewing a single comment's thread from:

RE: Hive Node Setup for the Smart, the Dumb, and the Lazy.

in Blockchain Wizardry • 3 months ago

> `sudo mount -o remount,size=30G /run`

<p dir="auto">Why 30G though? Isn't it enough to be of the size of <code>shared_memory.bin? In that case setting both the size of shm and ram-disk to 22G should still have decent margin (4-5G). <blockquote> <p dir="auto">downloading it could take a significant amount of time (6-12 hours even with a decent network connection) <p dir="auto">12 hours is only a bit less than syncing from scratch through p2p, so downloading in that case is not a viable solution 😁 <blockquote> <p dir="auto">/home/hive/datadir/blockchain/block_log and /home/hive/datadir/blockchain/block_log.artifacts will be created <p dir="auto">So, I guess the version supporting split block log is the next one, right?

> Why 30G though? Isn't it enough for it to be the size of `shared_memory.bin`? In that case, setting both the size of shm and of the ram-disk to 22G should still leave a decent margin (4-5G).

<p dir="auto"><code>/run that I use in my way of setting things up is a system-wide place to store various run-time data, so I can't use all of it. I use higher values because I keep same setup scripts for other nodes, and for my fully featured account history node it's already:<br /> <code>du -csh /run/hive/shared_memory.bin: <pre><code>22G /run/hive/shared_memory.bin <p dir="auto">But that doesn't matter much, the configured size limit doesn't pre-allocate RAM. It simply sets an upper boundary on how much space can be used. <blockquote> <p dir="auto">12 hours is only a bit less than syncing from scratch through p2p, so downloading in that case is not a viable solution 😁 <p dir="auto">I'm not that sure if it's just a bit less, one of my recent sync tests (6 weeks ago) took me 42 hours. I'm afraid that you might be too optimistic about sync speed in real life conditions. <blockquote> <p dir="auto">So, I guess the version supporting split block log is the next one, right? <p dir="auto">Yes! :-) I can't wait for that. Unfortunately being a most used block_log provider I have to wait for global adoption. Or do I? :-) Once it's officially released I will switch :-D

Damn you 😡 It is still going. You were right and I remembered it wrong. I've dug out 15-month-old results of a full sync, and it was running over 37 hours up to 72M+ blocks. Compared to that, the current version appears to be slightly faster, but still a couple of times slower than I thought it would be.

To be honest it smells like a bug (or, more optimistically, like an optimization opportunity). There are a couple of hiccups when the node is not receiving blocks fast enough, but for the most part block processing is reported at close to 100% of the time. On the other hand the computer seems to be asleep, using only around a single core, which is weird: decomposing signatures, which used to make sync 7 times slower than replay, is since HF26 supposedly done on multiple threads and preemptively, as soon as a block arrives, so I'd expect at least some bursts of higher CPU activity. Maybe I should use some config option for that?
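If anyone wants to reproduce the observation, this is more or less how I'd watch it (assuming the process is named `hived` and `sysstat`/`pidstat` is installed):

```bash
# per-thread CPU usage of hived, sampled every 5 seconds;
# a signature-bound sync should show several busy worker threads,
# a state-update-bound sync shows one thread near 100% and the rest idle
pidstat -t -p "$(pgrep -x hived)" 5

# or watch it interactively, one line per thread
top -H -p "$(pgrep -x hived)"
```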

It would be nice to have a comparison on the same machine: pure replay vs replay with full validation vs sync.

Signatures are checked ahead of time in separate threads, and a sufficient number of threads is allocated by default.

Whenever you see block processing at 100%, the bottleneck is the single-core speed of your system (it's processing operations and updating state).
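If you want to compare single-core speed between two machines, any single-threaded benchmark run on both will do as a rough proxy, e.g. (assuming OpenSSL is installed; the numbers only mean something relative to each other):

```bash
# single-threaded hashing throughput as a rough proxy for single-core speed;
# run it on both machines and compare the results against each other
openssl speed -evp sha256
```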

The results are in:

- revision: `4921cb8c4abe093fa173ebfb9340a94ddf5ace7a`
- same config in both runs (no AH or other plugins that add a lot of data, just witness and APIs, including wallet bridge)
- in both runs 87310000 blocks were processed (actually slightly more, with replay covering around 10 extra blocks that the previous sync run added to the block log while in live sync)
- replay with validation (from start up to `Performance report (total).`) - `124225649 ms`, which is `34.5 hours`; avg. block processing time (from `Performance report at block`) is `1.423 ms/block`
- sync (from start up to `entering live mode`) - `143988777 ms`, which is `40 hours`; avg. block processing time (from `Syncing Blockchain`) is `1.649 ms/block`

I'm curious how [@gtg](/@gtg)'s measurements will look in comparison.

The sync-to-replay ratio shoots up the most in areas of low blockchain activity, which is understandable, since small blocks are processed faster than they can be acquired from the network, but in other areas sync is still 10-20% slower.

And the likely reason I remembered sync as faster is the difference in computer speed - my home computer appears to be over 60% faster than the one I ran the above experiments on, which would mean it should almost fit the sync inside 24 hours.
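Just as a sanity check of the averages, dividing the totals by the block count gives the same numbers (a tiny awk sketch, using only the figures listed above):

```bash
# recompute the averages from the raw totals reported above
awk 'BEGIN {
  blocks    = 87310000
  replay_ms = 124225649   # replay with validation, total wall time in ms
  sync_ms   = 143988777   # sync, total wall time in ms
  printf "replay+validate: %.1f h, %.3f ms/block\n", replay_ms / 3600000, replay_ms / blocks
  printf "sync:            %.1f h, %.3f ms/block\n", sync_ms   / 3600000, sync_ms   / blocks
}'
```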

For now I have results for the first 50M blocks:

<div class="table-responsive"><table> <thead> <tr><th>50M blocks<th style="text-align:right">Real time<th style="text-align:right">last 100k real time<th style="text-align:right">last 100k cpu time<th style="text-align:center">parallel speedup <tbody> <tr><td>Replay<td style="text-align:right"><code>6:32:45<td style="text-align:right"><code> <code>43.466s<td style="text-align:right"><code> <code>61.132s<td style="text-align:center"><code>x1.4064 <tr><td>Replay + Validate<td style="text-align:right"><code>11:03:00<td style="text-align:right"><code> <code>84.337s<td style="text-align:right"><code>395.575s<td style="text-align:center"><code>x4.6904 <tr><td>Resync<td style="text-align:right"><code>14:31:33<td style="text-align:right"><code>103.266s<td style="text-align:right"><code>182.288s<td style="text-align:center"><code>x1.7652 <p dir="auto">I just counted last 100k block times (cpu / real) so it's not a great measurement. I can have better numbers once I complete those runs. But it seems that replay with validation can somehow make a better use of multiple threads than validation during resync.

It might be the state undo logic slowing down blockchain processing in a sequential manner (this computation is probably skipped for replay+validate). But I doubt there is a way to disable it to check that, short of modifying the code for the test.

Probably we should modify the code dealing with checkpoints to skip undo logic up to the checkpoint. That would let us confirm whether it is the bottleneck, and if it is, it would also give us a speedup whenever checkpoints are set.
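For reference, checkpoints are configured as `[BLOCK_NUM,"BLOCK_ID"]` pairs in `config.ini`, assuming the `checkpoint` option still works the way it did before; the block id below is a made-up placeholder, and the path is the one used earlier in this thread:

```bash
# add a checkpoint pair to config.ini; the block id here is a placeholder,
# use the real id of the block you want to enforce as a checkpoint
cat >> /home/hive/datadir/config.ini <<'EOF'
checkpoint = [87310000,"0000000000000000000000000000000000000000"]
EOF
```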

It should be easy to test - just cut out the two lines with `session` in `database::apply_block_extended` (I'm actually assuming that out-of-order blocks won't reach that routine during sync, but if they do, it would be a source of slowdown).
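Assuming the usual source layout, something like this should point at the right spot:

```bash
# locate apply_block_extended and eyeball it for the two undo-session lines
# (the path is an assumption based on the usual hive source layout)
grep -rn "apply_block_extended" libraries/chain/
```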

<p dir="auto">I'd be surprised if undo sessions were the problem. They are relatively slow and <a href="https://gitlab.syncad.com/hive/hive/-/issues/675#note_159293" target="_blank" rel="noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">worthy of optimization, but in relation to simple transactions, mostly custom_jsons, so their performance is significant when there is many of them, like during block production, reapplication of pending or in extreme stress tests with <code>colony+<code>queen. During sync we only have one session per block.

> I'm afraid that you might be too optimistic about sync speed in real-life conditions.

I just started syncing on latest develop, so I guess we will know soon enough 😄
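I'll be watching progress with the same log markers mentioned above (assuming output goes to a `hived.log` file; the file name is just an example):

```bash
# follow sync progress using the log markers mentioned earlier in this thread;
# "Syncing Blockchain" lines report progress, "entering live mode" marks the end of sync
tail -f hived.log | grep --line-buffered -E "Syncing Blockchain|entering live mode"
```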