<p dir="auto">As promised in the <a href="/pt/@ataliba/backup-dos-posts-da-rede">first post about backing up posts as Markdown</a>, I have put together something better.</p>
<p dir="auto">The first post was quite confusing, so this time I want to be more concise and clear. It was really meant to show the Brazilian community that we can release small solutions that help everyone.</p>
<p dir="auto">As I mentioned in the first post, I used a solution like this a few years ago. It worked well with older versions of the <em>beem</em> library but, for some reason, it stopped working.</p>
<p dir="auto">So I started porting an old piece of code I had found on the Steemit network. The result has been almost completely rebuilt, and it backs up posts from the Steemit and Hive networks to your local disk.</p>
<p dir="auto">Since I use the <a href="https://gohugo.io" target="_blank" rel="nofollow noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">Hugo</a> platform, I built the code to work with it. If you need the output for Jekyll, <a href="https://import.jekyllrb.com/docs/home/" target="_blank" rel="nofollow noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">just use Jekyll's own tools</a> to convert the Hugo format to Jekyll.</p>
<p dir="auto">This script retrieves blog posts from a Hive or Steemit account, downloads the images they reference, and saves each post as a Markdown file. It is particularly useful for archiving or republishing content from Hive or Steemit.</p>
<h2>Main Features:</h2>
<ul>
<li><strong>Blockchain Connection:</strong> Connects to Hive or Steemit, depending on the user's selection, to access posts from a specific account.</li>
<li><strong>Post Filtering:</strong> Saves yesterday's posts by default, with options to take only the most recent post, only posts published today, or all posts. Posts tagged "actifit" are skipped unless you ask for them.</li>
<li><strong>Image Downloading:</strong> Detects image URLs in each post, both in the JSON metadata and in the Markdown content, and downloads each one under a unique filename.</li>
<li><strong>Metadata Handling:</strong> Extracts metadata such as title, tags, and publication date from each post and writes it to the YAML header of the Markdown file.</li>
</ul>
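<p dir="auto">As a rough illustration of the post-filtering rules, the sketch below isolates the decision as a pure function. The name <code>should_keep</code> and its parameters are hypothetical and do not appear in the script; the logic follows the script's behaviour as I read it (yesterday's posts by default, <code>actifit</code> posts skipped unless requested).</p>

```python
from datetime import date, timedelta


def should_keep(post_date, tags, today_date, all_posts=False,
                today=False, include_actifit=False):
    """Hypothetical helper: decide whether a post passes the filters."""
    # Posts tagged 'actifit' are skipped unless explicitly included.
    if "actifit" in tags and not include_actifit:
        return False
    # --all disables the date filter entirely.
    if all_posts:
        return True
    # --today keeps today's posts; the default keeps yesterday's.
    wanted = today_date if today else today_date - timedelta(days=1)
    return post_date == wanted


if __name__ == "__main__":
    now = date(2024, 10, 20)  # fixed date so the example is reproducible
    # Yesterday's post in default mode
    print(should_keep(date(2024, 10, 19), ["hive"], now))
    # Today's post with the --today behaviour
    print(should_keep(date(2024, 10, 20), ["hive"], now, today=True))
    # An actifit post with --all but without --actifit
    print(should_keep(date(2024, 10, 1), ["actifit"], now, all_posts=True))
```

<p dir="auto">Keeping the decision in one small function like this makes the combinations of flags easy to check without connecting to a node.</p>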
<h3>Usage Examples:</h3>
<h4>To get <strong>all posts</strong>:</h4>
<pre><code>venv/bin/python hive_posts_yesterday_to_md.py ataliba hive --all</code></pre>
<h4>To get only today's posts:</h4>
<pre><code>venv/bin/python hive_posts_yesterday_to_md.py ataliba hive --today</code></pre>
<h4>To get all posts, including those with <code>actifit</code>:</h4>
<pre><code>venv/bin/python hive_posts_yesterday_to_md.py ataliba hive --all --actifit</code></pre>
<h4>To get only the latest post of today:</h4>
<pre><code>venv/bin/python hive_posts_yesterday_to_md.py ataliba hive --last --today</code></pre>
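<p dir="auto">To make clear how these command lines are read, here is a small sketch that rebuilds the argument parser from the script's code (help texts mostly omitted) and parses one of the examples. The wrapper name <code>build_parser</code> is hypothetical. Going by that parser, the second positional argument (<code>hive</code> in the examples) is the output path, and the network is chosen with the <code>--steemit</code> flag.</p>

```python
import argparse


def build_parser():
    """Hypothetical wrapper around the same arguments the script defines."""
    parser = argparse.ArgumentParser()
    parser.add_argument("author", help="Account name on Hive or Steemit")
    parser.add_argument("path", help="Path where the Markdown files will be saved")
    parser.add_argument("--last", action="store_true")
    parser.add_argument("--actifit", action="store_true")
    parser.add_argument("--all", action="store_true")
    parser.add_argument("--today", action="store_true")
    parser.add_argument("--steemit", action="store_true")
    return parser


if __name__ == "__main__":
    # Same arguments as the "--all --actifit" example
    args = build_parser().parse_args(["ataliba", "hive", "--all", "--actifit"])
    print("author:", args.author)
    print("path:", args.path)
    print("steemit:", args.steemit)
```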
<p dir="auto">I believe it's now better organized.<br />
If you'd like to follow along, <a href="https://github.com/ataliba/hive-to-markown/tree/main" target="_blank" rel="nofollow noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">the script now lives in its own repository</a>.</p>
<h2>Code</h2>
<pre><code>#!/usr/bin/python
# -*- coding: utf-8 -*-
from beem import Hive
from beem.account import Account
import os
import io
import argparse
import requests
import uuid
from urllib.parse import urlparse
import re
from datetime import datetime, timedelta


def download_image(image_url, path):
    try:
        # Download the image
        response = requests.get(image_url)
        if response.status_code == 200:
            # Extract the file extension
            parsed_url = urlparse(image_url)
            _, ext = os.path.splitext(parsed_url.path)
            # Generate a unique filename with UUID
            unique_filename = f"{uuid.uuid4()}{ext}"
            file_path = os.path.join(path, unique_filename)
            # Save the image to disk
            with open(file_path, 'wb') as f:
                f.write(response.content)
            print(f"Image downloaded and saved as: {file_path}")
            return unique_filename
        else:
            print(f"Error downloading the image: {image_url} (Status Code: {response.status_code})")
            return None
    except Exception as e:
        print(f"Error processing the image {image_url}: {e}")
        return None


def extract_images_from_markdown(markdown_content):
    # Search for images in the format ![alt](image_url)
    image_urls = re.findall(r'!\[.*?\]\((.*?)\)', markdown_content)
    return image_urls


def main(author, path, last=False, include_actifit=False, all_posts=False, today=False, platform="hive"):
    # Select the blockchain based on the platform
    if platform == "hive":
        node_url = "https://api.hive.blog"
    else:  # steemit
        node_url = "https://api.steemit.com"
    # Connect to the Hive or Steemit blockchain
    hive = Hive(node=node_url)
    account = Account(author, blockchain_instance=hive)
    # Yesterday's and today's dates
    yesterday = (datetime.utcnow() - timedelta(days=1)).date()
    today_date = datetime.utcnow().date()
    # Get the account's posts
    posts = account.get_blog(limit=500)  # Adjust the limit as needed
    if last:
        # Get only the last post
        posts = [posts[0]] if posts else []
    # Process each post
    for post in posts:
        if post["author"] != author:
            continue
        # Check if the 'actifit' tag is in the post
        if 'actifit' in post.get('json_metadata', {}).get('tags', []):
            if not include_actifit:
                print(f"Post skipped due to 'actifit' tag: {post['title']}")
                continue
        # Use the 'created' field directly as datetime
        post_date = post["created"].date()
        # Conditions for --all, --today, and yesterday's posts
        if not all_posts:
            if today:
                if post_date != today_date:
                    continue
            else:
                if post_date != yesterday:
                    continue
        markdown_content = post['body']
        title = post['title']
        permlink = post['permlink']
        link_for_post = f'https://{platform}.blog/@{author}/{permlink}'
        # Download images and replace the links in markdown
        images = post.get('json_metadata', {}).get('image', [])
        if images:
            print(f"Images found in the post (json_metadata): {images}")
        # Extract images from markdown
        markdown_images = extract_images_from_markdown(markdown_content)
        if markdown_images:
            print(f"Images found in markdown: {markdown_images}")
        # Download all images found in json_metadata and markdown
        all_images = images + markdown_images
        for image_url in all_images:
            downloaded_image_name = download_image(image_url, path)
            if downloaded_image_name:
                markdown_content = markdown_content.replace(image_url, downloaded_image_name)
        post_final = f'---\n<br />**Originally posted on {platform.capitalize()} network: [{link_for_post}]({link_for_post})** <br />\n----'
        yaml_prefix = '---\n'
        TitleYaml = title.replace(':', '').replace('\'', '').replace('#', '').replace('(', '').replace(')', '')
        # Get the post tags and categories
        tags = post.get('json_metadata', {}).get('tags', [])
        tags_str = "\n".join([f" - {tag}" for tag in tags])
        # Set the category as the first tag or "General" if there are no tags
        category = tags[0] if tags else "General"
        if platform == 'hive':
            category_str = f' - {category.capitalize()}\n - Hive\n'
        else:
            category_str = f' - {category.capitalize()}\n - Steemit\n'
        # Build the YAML prefix
        yaml_prefix += f'title: {TitleYaml}\n'
        yaml_prefix += f'date: {post["created"]}\n'
        yaml_prefix += f'permlink: /{platform}/{permlink}\n'
        yaml_prefix += 'type: posts\n'
        yaml_prefix += f'categories:\n{category_str}\n'
        yaml_prefix += f'tags:\n{tags_str}\n'
        yaml_prefix += f'author: {author}\n---\n'
        # Filename
        filename = os.path.join(path, f"{post_date}_{permlink}.md")
        # Save the content to a Markdown file
        with io.open(filename, "w", encoding="utf-8") as f:
            f.write(yaml_prefix + markdown_content + post_final)
        print(f"Post saved: {filename}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("author", help="Account name on Hive or Steemit")
    parser.add_argument("path", help="Path where the Markdown files will be saved")
    parser.add_argument("--last", action="store_true", help="Get only the last post")
    parser.add_argument("--actifit", action="store_true", help="Include posts with the 'actifit' tag")
    parser.add_argument("--all", action="store_true", help="Get all posts, ignoring the date filter")
    parser.add_argument("--today", action="store_true", help="Get only today's posts")
    parser.add_argument("--steemit", action="store_true", help="Use the Steemit network instead of Hive")
    args = parser.parse_args()
    # Define the platform (Hive or Steemit)
    platform = "steemit" if args.steemit else "hive"
    main(args.author, args.path, args.last, args.actifit, args.all, args.today, platform)
</code></pre>
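<p dir="auto">To try the image-rewriting step in isolation, without touching the network, here is a minimal sketch. It reuses the regular expression from <code>extract_images_from_markdown</code>; <code>fake_download</code> and <code>rewrite_image_links</code> are hypothetical names, with <code>fake_download</code> standing in for <code>download_image</code> and only generating the local filename. Unlike the script, this sketch skips repeated URLs so that each image is handled once; that is my own assumption about what is desirable, not something the script does.</p>

```python
import os
import re
import uuid
from urllib.parse import urlparse


def extract_images_from_markdown(markdown_content):
    # Same pattern as the script: capture the URL in ![alt](url)
    return re.findall(r'!\[.*?\]\((.*?)\)', markdown_content)


def fake_download(image_url):
    """Hypothetical stand-in for download_image: builds the local name only."""
    _, ext = os.path.splitext(urlparse(image_url).path)
    return f"{uuid.uuid4()}{ext}"


def rewrite_image_links(markdown_content, downloader=fake_download):
    """Replace each remote image URL with the local filename returned for it."""
    urls = extract_images_from_markdown(markdown_content)
    # dict.fromkeys drops repeated URLs while keeping their original order
    for image_url in dict.fromkeys(urls):
        local_name = downloader(image_url)
        if local_name:
            markdown_content = markdown_content.replace(image_url, local_name)
    return markdown_content


if __name__ == "__main__":
    body = "Intro ![cover](https://example.com/img/cover.png) text"
    print(rewrite_image_links(body))
```

<p dir="auto">Passing the downloader in as a parameter keeps the rewrite logic testable on its own; the real <code>download_image</code> could be plugged in with a small wrapper that supplies the target path.</p>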
<p dir="auto">requirements.txt</p>
<pre><code>beem==0.26.0
requests==2.31.0</code></pre>
<p dir="auto"><strong>Image from Pixabay</strong></p>