A Hacky Guide to Hive (part 2.2.2: custom_YOson)

in #dev3 months ago (edited)

Context

<p dir="auto">Please read <a href="/@felixxx/a-hacky-guide-to-hive-part-221-blocks">last post for context. <h2>SQLAlchemy <p dir="auto">This is no Python guide, this is no SQL guide.<br /> I just want to get the job done, so I use <a href="https://www.sqlalchemy.org/" target="_blank" rel="noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">SQLAlchemy.<br /> You could build the following better or faster, in any language you want, without SQLAlchemy, for a different storage than SQLite... the main thing is: You'll probably need other data.<br /> This is just a way to get to a working demo quickly. <h3>db.py <pre><code>from typing import List import datetime from sqlalchemy import String, ForeignKey, DateTime, create_engine from sqlalchemy.sql import func from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, Session class Base(DeclarativeBase): pass class Block(Base): __tablename__ = "blocks" num: Mapped[int] = mapped_column(primary_key=True) timestamp: Mapped[str] = mapped_column(String(19)) sql_timestamp: Mapped[datetime.datetime] = mapped_column(DateTime(timezone=True), server_default=func.now()) yos: Mapped[List["Yo"]] = relationship(back_populates="block", cascade="all, delete-orphan") class Yo(Base): __tablename__ = "yos" id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True) author: Mapped[str] = mapped_column(String(16)) tx_id: Mapped[str] = mapped_column(String(40)) block_num: Mapped[int] = mapped_column(ForeignKey("blocks.num")) block: Mapped[Block] = relationship(back_populates="yos") <p dir="auto">This is no SQLAlchemy quide.<br /> It is for the most part just the example from the <a href="https://docs.sqlalchemy.org/en/20/orm/quickstart.html" target="_blank" rel="noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">SQLalchemy tutorial, adapted to my data.<br /> It allows me to map SQL tables <em>blocks and <em>yos as classes <em>Block and <em>Yo.<br /> I am not even sure, if I am doing it right, but for the demo it doesn't matter. <pre><code>def make_db(db): Base.metadata.create_all(create_engine(db)) def set_blocks(blocks, session: Session): session.add_all(blocks) def get_last_block(session: Session): return session.query(Block.num).order_by(Block.num.desc()).limit(1).first()[0] <p dir="auto">The main goal of db.py is to be able to: <ul> <li>make a DB <li>add blocks to DB table (set) <li>return blocks from DB (get) <p dir="auto">For now, I only need to get to the last block, so I built a specialized function, to do just that. <pre><code>def make_yo_blocks(start, blocks): num = start yo_blocks = [] for block in blocks: yos = [] for tx_i, transaction in enumerate(block['transactions']): for operation in transaction['operations']: if operation['type'] == 'custom_json_operation': if operation['value']['id'] == 'YO': yo = Yo(author= operation['value']['required_posting_auths'][0], tx_id= block['transaction_ids'][tx_i]) yos.append(yo) yo_block = Block(num= num, timestamp= block['timestamp'], yos= yos) yo_blocks.append(yo_block) num += 1 return yo_blocks <p dir="auto">The above is the last piece.<br /> It does a lot of things for a single function.<br /> It takes the <em>blocks data from the next block and converts it to <em>Blocks (and <em>Yos).<br /> It can break in a lot of places, but that's exactly what it should do for now. <h4>Example <pre><code>db_file = "sqlite:///yo.db" init_block = Block(num = 89040473, timestamp = '2024-09-18T12:55:36', yos = [Yo(author = 'felixxx', tx_id = 'eb025cf797ee5bc81d7399282268079cc29cc66d')]) def init(): make_db(db_file) with Session(create_engine(db_file, echo=True)) as session: set_blocks([init_block], session) session.commit() <p dir="auto">If you run this example init twice, you will get an error:<br /> Demonstrating that SQL is already doing stuff; it prevents inserting a block of the same <em>num twice. <p dir="auto"><img src="https://images.hive.blog/768x0/https://files.peakd.com/file/peakd-hive/felixxx/23wMQ8bUMubWa5kZfGGCiibyJtDVy9QXdN6ze5v4HL3nRaVrmio9jjipahrVJf8sArgis.png" alt="custom_YOson.png" srcset="https://images.hive.blog/768x0/https://files.peakd.com/file/peakd-hive/felixxx/23wMQ8bUMubWa5kZfGGCiibyJtDVy9QXdN6ze5v4HL3nRaVrmio9jjipahrVJf8sArgis.png 1x, https://images.hive.blog/1536x0/https://files.peakd.com/file/peakd-hive/felixxx/23wMQ8bUMubWa5kZfGGCiibyJtDVy9QXdN6ze5v4HL3nRaVrmio9jjipahrVJf8sArgis.png 2x" /> <h2>custom_YOson <p dir="auto">In the final process the api and db handling meet. <h3>api.py <pre><code>import requests def get_block_range(start, count, url): data = '{"jsonrpc":"2.0", "method":"block_api.get_block_range","params":{"starting_block_num":'+str(start)+',"count": '+str(count)+'},"id":1}' return requests.post(url, data) <p dir="auto">That's the Hive api from last post. It's really that short.<br /> I modified it just a little, to move data conversion further upstream; Instead of a list of blocks, it now returns raw data. <h3>custom_YOson.py <pre><code>import db, api url = 'https://api.hive.blog' db_file = "sqlite:///yo.db" count = 10 innit_block = db.Block(num = 89040473, timestamp = '2024-09-18T12:55:36', yos = [db.Yo(author = 'felixxx', tx_id = 'eb025cf797ee5bc81d7399282268079cc29cc66d')]) def innit(): db.make_db(db_file) with db.Session(db.create_engine(db_file)) as session: db.set_blocks([innit_block], session) session.commit() def tick(): with db.Session(db.create_engine(db_file)) as session: last = db.get_last_block(session) response = api.get_block_range(last + 1, count, url) blocks = response.json()['result']['blocks'] with db.Session(db.create_engine(db_file)) as session: if last == db.get_last_block(session): # cheap way to make this thread-safe db.set_blocks(db.make_yo_blocks(last + 1, blocks), session) session.commit() <p dir="auto">The main procedure, that should happen during every tick: <ul> <li>with open db session: <ul> <li>get last block's num from db <li>query Hive node for next (count = 10) blocks <li>convert Hive node response to dictionary, select key 'blocks' <li>with open db session: <ul> <li>get last block's num again, if same: <ul> <li>convert blocks to Blocks <li>store Blocks <h3>Process <p dir="auto">custom_YOson is only 75 lines long: <ul> <li>api.py (5) <li>db.py (48) <li>custom_YOson.py (22) <p dir="auto">You'd still have to wrap this in a loop, but the core procedure works for stream and resync.<br /> My main goal here was to use functions in tick() that each represent one step of the procedure. <h3>Performance <div class="table-responsive"><table> <thead> <tr><th>1 block @ 26 KB<th>10 blocks @ 168 KB<th>100 blocks @ 1919 KB <tbody> <tr><td>api.hive.blog: 0.693 s<br />YOson: 0.020 s<br />total: 0.714 s<td>api.hive.blog: 0.976 s<br />YOson: 0.017 s<br />total: 0.993 s<td>api.hive.blog: 7.910 s<br />YOson: 0.028 s<br />total: 7.939 s <p dir="auto">I did 3 test shots and what's clear: The node performance is the bottleneck.<br /> I could optimize my code or try a different language and database, but I could only shave off miliseconds.<br /> Acessing a node via localhost would probably be substantially faster, but even then: to improve this there probably isn't much you can do in Python... <h2>Conclusion <p dir="auto">Hive's state changes. The only way to observe state changes is querying a node.<br /> In many ways, Hive is a like a black box.<br /> Goal was to build a white box, that has healthy data. (yo.db)<br /> How to best sanitize, store and handle the data depends on the application.<br /> You could build something like HiveSQL, hive-engine, a voting service or a game with a yo.db as engine, like above. <p dir="auto">Next post I'll cover error handling and deployment and build an application. <p dir="auto">Had I understood 8 years ago, when I first found this chain, how to build stuff like above... I could have made a lot of money. A lot of it isn't even Hive specific... <p dir="auto"><strong>Please test & comment!
Sort:  

Congratulations for your level of mastery. Hats off !

Thanks!
Certainly no master. I have advanced a level above writing long scripts.
Still haven't deployed anything cool.

Have a look at what we are trying to do with OffChain Luxembourg (@offchain-lux) and let me know if you are interested in collaborating. The coolest things come from collaboration IMO

Have a look at what we are trying to do with OffChain Luxembourg

Will do. Thanks!

The coolest things come from collaboration

Not only that.
Alone, I just can't operate any service 24/7.
It takes a few people - whatever the code may be.

thats interesting and i loke that you are caring about the performance from start. great job.

i was wondering how i could use custom_json without having a database. not sure yet what path i will take, but this solution was something like i had in mind. good to see that i was not wrong having a database to query it easier when need past data.