The performance cost of pl/pgsql exception blocks in Postgres


One of the languages that Postgres offers for writing stored procedures is pl/pgsql. One of its features is the exception block, which is used to catch errors raised in a code block. Appropriate use of this construct can improve code reliability, but when overused it can lead to decreased performance.
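To make the construct concrete, here's a minimal, self-contained illustration (not part of the benchmark below): an inner block whose `exception` clause traps a division by zero.

```sql
do $$
begin
    begin
        perform 1 / 0;  -- raises division_by_zero
    exception
        when division_by_zero then
            raise notice 'caught division by zero';
    end;
end;
$$;
```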

<p dir="auto">The official documentation says the following:<br /><span> <a href="https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING" target="_blank" rel="nofollow noreferrer noopener" title="This link will take you away from hive.blog" class="external_link">https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING <blockquote> <p dir="auto">A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need. <p dir="auto">But what exactly <code>significantly more expensive means? Is it 50% slower? 100% slower? Let's check this out. <p dir="auto">First, we'll need a table with some data. I'll use the following one: <pre><code>create table ops(id bigint, body jsonb); <p dir="auto">I filled this table with about 20 million rows. Each row contains some json data in <code>body column. The exact structure varies, but there's a common field: all stored jsons are objects containing <code>value key. <p dir="auto">Having this, I'll do a simple processing of that data.<br /> For each row in that table I will invoke the following function: <pre><code>create or replace function process_op(op ops) returns void volatile language 'plpgsql' as $$ begin insert into data(id, body) select op.id, op.body->>'value'; end; $$; <p dir="auto">This basically extracts the <code>value element and writes it into a separate table <code>data, which has the following structure: <pre><code>create table data(id bigint, body text); <p dir="auto">I will compare obtained results with a second scenario, which will be identical to the first one, with the exception that the following function will be used instead: <pre><code>create or replace function process_op_exn(op ops) returns void volatile language 'plpgsql' as $$ begin begin insert into data(id, body) select op.id, op.body->>'value'; exception WHEN division_by_zero then end; end $$; <p dir="auto">The only difference is extra <code>exception block that catches specified exception type. The exact type doesn't matter here, because it will never occur. I have to specify it nonetheless, so it'll be <code>division_by_zero. <p dir="auto">Now, here's the script that will do the timing of our processing: <pre><code>#!/bin/sh sql() { psql -c "$@" ex } trunc() { sql 'truncate data' } trunc echo 'process_op' sql 'set jit=off; explain analyze select process_op(op) from ops as op' trunc echo 'process_op_exn' sql 'set jit=off; explain analyze select process_op_exn(op) from ops as op' <p dir="auto">This script will use an empty database <code>ex that only contains our two functions and the input and output tables. Before each test, the output table is truncated.<br /> Note that <code>jit is disabled. It's because it'd just introduce additional variance in timing results.<br /> I'll run this script 10 times and collect the time reported by <code>explain analyze. 
<p dir="auto">Here's the times I got on my machine: (PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1)) <div class="table-responsive"><table> <thead> <tr><th style="text-align:right">process_op<th style="text-align:right">process_op_exn <tbody> <tr><td style="text-align:right">97213.305<td style="text-align:right">134802.694 <tr><td style="text-align:right">97133.119<td style="text-align:right">134685.843 <tr><td style="text-align:right">104139.789<td style="text-align:right">141188.065 <tr><td style="text-align:right">103916.659<td style="text-align:right">141514.376 <tr><td style="text-align:right">104230.254<td style="text-align:right">140969.064 <tr><td style="text-align:right">104054.297<td style="text-align:right">140396.022 <tr><td style="text-align:right">104578.363<td style="text-align:right">140755.807 <tr><td style="text-align:right">104225.124<td style="text-align:right">140159.735 <tr><td style="text-align:right">104693.086<td style="text-align:right">136347.270 <tr><td style="text-align:right">100057.957<td style="text-align:right">142008.653 <div class="table-responsive"><table> <thead> <tr><th style="text-align:center"><th style="text-align:right">process_op<th style="text-align:right">process_op_exn<th style="text-align:right">overhead[%] <tbody> <tr><td style="text-align:center">min<td style="text-align:right">97133.119<td style="text-align:right">134685.843<td style="text-align:right">38.66 <tr><td style="text-align:center">max<td style="text-align:right">104693.086<td style="text-align:right">142008.653<td style="text-align:right">35.64 <tr><td style="text-align:center">avg<td style="text-align:right">102424.195<td style="text-align:right">139282.753<td style="text-align:right">35.99 <tr><td style="text-align:center">stddev<td style="text-align:right">3070.819<td style="text-align:right">2845.707<td style="text-align:right"> <p dir="auto">I repeated the measurements for half of the data size (so about 10 million rows): <div class="table-responsive"><table> <thead> <tr><th style="text-align:right">process_op<th style="text-align:right">process_op_exn <tbody> <tr><td style="text-align:right">54302.993<td style="text-align:right">76026.787 <tr><td style="text-align:right">54081.487<td style="text-align:right">72822.546 <tr><td style="text-align:right">53893.595<td style="text-align:right">79227.740 <tr><td style="text-align:right">116475.741<td style="text-align:right">71090.325 <tr><td style="text-align:right">52185.475<td style="text-align:right">71807.784 <tr><td style="text-align:right">55144.720<td style="text-align:right">79174.003 <tr><td style="text-align:right">59353.608<td style="text-align:right">123744.166 <tr><td style="text-align:right">59466.183<td style="text-align:right">77938.334 <tr><td style="text-align:right">77143.960<td style="text-align:right">83018.802 <tr><td style="text-align:right">61518.751<td style="text-align:right">74555.020 <div class="table-responsive"><table> <thead> <tr><th style="text-align:center"><th style="text-align:right">process_op<th style="text-align:right">process_op_exn<th style="text-align:right">overhead[%] <tbody> <tr><td style="text-align:center">min<td style="text-align:right">52185.475<td style="text-align:right">71090.325<td style="text-align:right">36.23 <tr><td style="text-align:center">max<td style="text-align:right">116475.741<td style="text-align:right">123744.166<td style="text-align:right">6.24 <tr><td style="text-align:center">avg<td style="text-align:right">64356.651<td style="text-align:right">80940.551<td 
style="text-align:right">25.77 <tr><td style="text-align:center">stddev<td style="text-align:right">19680.908<td style="text-align:right">15501.945<td style="text-align:right"> <p dir="auto">And here's measurements repeated for double the data size (about 40 million rows): <div class="table-responsive"><table> <thead> <tr><th style="text-align:right">process_op<th style="text-align:right">process_op_exn <tbody> <tr><td style="text-align:right">286233.511<td style="text-align:right">297223.475 <tr><td style="text-align:right">284152.678<td style="text-align:right">351405.713 <tr><td style="text-align:right">216810.749<td style="text-align:right">389101.092 <tr><td style="text-align:right">295131.977<td style="text-align:right">297492.770 <tr><td style="text-align:right">204913.139<td style="text-align:right">274064.567 <tr><td style="text-align:right">202276.802<td style="text-align:right">331195.117 <tr><td style="text-align:right">266659.591<td style="text-align:right">316588.293 <tr><td style="text-align:right">272660.505<td style="text-align:right">336403.196 <tr><td style="text-align:right">299557.913<td style="text-align:right">308608.560 <tr><td style="text-align:right">210020.255<td style="text-align:right">364334.027 <div class="table-responsive"><table> <thead> <tr><th style="text-align:center"><th style="text-align:right">process_op<th style="text-align:right">process_op_exn<th style="text-align:right">overhead[%] <tbody> <tr><td style="text-align:center">min<td style="text-align:right">202276.802<td style="text-align:right">274064.567<td style="text-align:right">35.49 <tr><td style="text-align:center">max<td style="text-align:right">299557.913<td style="text-align:right">389101.092<td style="text-align:right">29.89 <tr><td style="text-align:center">avg<td style="text-align:right">253841.712<td style="text-align:right">326641.681<td style="text-align:right">28.68 <tr><td style="text-align:center">stddev<td style="text-align:right">40312.450<td style="text-align:right">34891.699<td style="text-align:right"> <p dir="auto">The timings are pretty consistent and we clearly see that the function with <code>exception block takes longer time to complete.<br /> The slowdown due to the exception block is about 35-38%. This overhead is definitely noticeable, but it'll of course depend on the work load. The more work is performed in such block, the less noticeable it'll be. The important part is that the performance hit is taken even if the exception is never raised. <p dir="auto">EDIT 02-02-2024: Added measurements for half and double data set sizes.


The conclusion is far too premature. The overhead might be constant per call, or constant per row processed, or data-dependent in a different way than the main processing. You're doing two relatively heavy operations in the example: searching for `value` in the jsonb and filling up a table. What if you had a trivial operation instead? What if you did an even more complicated query? How about checking with half of the table rows? Or twice as many? Also, I understand turning off jit, but if you didn't do it, it might actually notice the exception can't be raised and optimize it out - you'd need to run the test with jit on and an exception that can be raised depending on the data (actually at least two tests: one that depends on indexed data, the other with data not included in any index).

<p dir="auto">Alternative is to look into the postgres sources to see what actually happens when exception handling is added to the block 😁

I added the measurements for half and double data sizes.
I really didn't want to explore the jit possibilities. I'm not very familiar with it, and I was also more interested in plpgsql itself.