Debugging Complex SQL Queries: A Structured Logging Approach

TLDR:

  • Use json_build_object to convert query relations (tables or CTEs/WITH clauses) into JSON.

  • Build a reusable helper in JavaScript to abstract away the logging process.

Background and problem

Imagine you are working on something exciting, colorful, and wonderful like a warehouse management system.

As the system evolves and new features, requirements, and processing logic keep expanding, SQL queries become more complex, so you resort to WITH clauses (CTEs) to keep everything organized and maintainable. It works great.

However, debugging intermediate steps, states, and variables is where things diverge. In the application layer you can reach for structured logging libraries like JavaScript’s Pino or Golang’s slog, but debugging SQL queries is a different beast altogether.

The idea

Postgres has advanced JSON support, so why not do something like the following?

JSON.stringify({
  data1: data1,
  data2: data2
});

In SQL, that would translate to:

SELECT json_build_object(
  'data1', data1,
  'data2', data2
)
FROM data1, data2;
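
Note that the FROM data1, data2 form cross-joins the two relations, producing one JSON object per row combination. A quick sketch with hypothetical inline data:

WITH data1(id) AS (VALUES (1), (2), (3)),
     data2(name) AS (VALUES ('a'), ('b'))
SELECT json_build_object('data1', data1, 'data2', data2)
FROM data1, data2;
-- 6 rows (3 x 2), one object per row combination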

Or, using scalar subqueries to avoid the cross join, and JSONB instead of JSON (supposedly both are better for performance):

SELECT jsonb_build_object(
  'data1', (SELECT jsonb_agg(data1) FROM data1),
  'data2', (SELECT jsonb_agg(data2) FROM data2)
);
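
With the same hypothetical inline data, each relation now collapses into a single JSON array of row objects, and the query returns exactly one row:

WITH data1(id) AS (VALUES (1), (2), (3)),
     data2(name) AS (VALUES ('a'), ('b'))
SELECT jsonb_build_object(
  'data1', (SELECT jsonb_agg(data1) FROM data1),
  'data2', (SELECT jsonb_agg(data2) FROM data2)
);
-- => {"data1": [{"id": 1}, {"id": 2}, {"id": 3}], "data2": [{"name": "a"}, {"name": "b"}]}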

Example

Imagine your project manager assigns the following user story to you:

User Story: Inventory Reorder Alert

As a warehouse manager, I need a report identifying products requiring immediate reorder based on low total stock quantities across all locations.

Requirements:

  • Use the provided sample data (locations: S1, S2, S3; products and quantities).
  • Calculate the total quantity per product across all locations.
  • Flag products as Reorder if total stock is below 50 units; otherwise, mark as Sufficient.
  • Display only products needing reorder in the final output.

Acceptance Criteria:

  • Output columns: product.
  • Only include rows where stock_status is Reorder.

You could solve it using a single SQL query.

Notice that we are using CTEs, and that for debugging we are only interested in step1 and step2.

-- Define raw data using the VALUES clause.
WITH raw_data(location, product, quantity) AS (
    VALUES
        ('S1', 'Cocoa beans', 40),
        ('S2', 'Cocoa beans', 10),
        ('S2', 'Sugar', 20),
        ('S3', 'Vanilla', 75)
),
-- Aggregation: Get total sum per product.
step1 AS (
    SELECT product,
           SUM(quantity) AS quantity
    FROM raw_data
    GROUP BY product
),
-- Transformation: Add a stock status based on total quantity.
step2 AS (
    SELECT
         product,
         quantity,
        CASE
            WHEN quantity < 50 THEN 'Reorder'
            ELSE 'Sufficient'
        END AS stock_status
    FROM step1
),
-- Filter
step3 AS (
    SELECT
        product
    FROM step2
    WHERE stock_status = 'Reorder'
),
logged_steps AS (
    SELECT jsonb_build_object(
        'step1', (SELECT jsonb_agg(step1) FROM step1),
        'step2', (SELECT jsonb_agg(step2) FROM step2)
    ) AS json_payload
)
SELECT *
FROM step3, logged_steps;
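
Running this against the sample data yields a single result row: Sugar is the only product below the 50-unit threshold (Cocoa beans total exactly 50, so they count as Sufficient), and json_payload exposes both intermediate steps. Roughly, with array order not guaranteed:

product: Sugar

json_payload:
{
  "step1": [
    {"product": "Cocoa beans", "quantity": 50},
    {"product": "Sugar", "quantity": 20},
    {"product": "Vanilla", "quantity": 75}
  ],
  "step2": [
    {"product": "Cocoa beans", "quantity": 50, "stock_status": "Sufficient"},
    {"product": "Sugar", "quantity": 20, "stock_status": "Reorder"},
    {"product": "Vanilla", "quantity": 75, "stock_status": "Sufficient"}
  ]
}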

Better DX

We can reuse this logic with the help of query builders like Knex and its modify method.

Helper:

// Assumes a `knex` instance is in scope. Note that relation names are
// interpolated directly into raw SQL, so only pass trusted, hard-coded
// CTE names -- never user input.
function withLoggedSteps(qb, ...relations) {
  // Build the 'name', (SELECT jsonb_agg(name) FROM name) pairs.
  const pairs = relations
    .map(rel => `'${rel}', (SELECT jsonb_agg(${rel}) FROM ${rel})`)
    .join(', ');
  // Attach the logging CTE...
  qb.with(
    'logged_steps',
    knex.raw(`SELECT jsonb_build_object(${pairs}) AS json_payload`)
  );
  // ...and surface its single row alongside the main result.
  qb.crossJoin('logged_steps');
  qb.select('logged_steps.json_payload');
}

Usage:

knex
  .select('step3.*')
  .with('raw_data', knex.raw('(VALUES ...)'))
  .with('step1', knex.raw('...'))
  .with('step2', knex.raw('...'))
  .with('step3', knex.raw('...'))
  .from('step3')
  .modify(withLoggedSteps, 'step1', 'step2')
  .then(rows => console.log(rows));
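
Under these assumptions, the builder should emit SQL along the lines of the hand-written query above (a sketch, not verbatim Knex output):

WITH raw_data AS (VALUES ...),
     step1 AS (...),
     step2 AS (...),
     step3 AS (...),
     logged_steps AS (
       SELECT jsonb_build_object(
         'step1', (SELECT jsonb_agg(step1) FROM step1),
         'step2', (SELECT jsonb_agg(step2) FROM step2)
       ) AS json_payload
     )
SELECT step3.*, logged_steps.json_payload
FROM step3
CROSS JOIN logged_steps;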

Interactive demo

See: Live demo

Performance Disclaimer

Let’s be honest - in my day-to-day development, I’m typically working with small, filtered datasets. Most of the time, I’m debugging queries for a specific product or a narrow time range.

So while these JSON logging techniques work wonderfully in my development environment, I haven’t extensively tested them on massive datasets that might process millions of rows.

When you’re in development mode, logging overhead is practically a non-issue. We’re here to understand why our queries behave strangely and to figure out where a complex query goes wrong.

I’m aware of other options—like using RAISE NOTICE or capturing intermediate results in temporary tables—but I haven’t tested those.
Since our day-to-day debugging and development are firmly rooted in Node, I opted for an application-level solution. Besides, RAISE is a PL/pgSQL command, which doesn’t work with plain SQL queries, and temporary tables tend to make later inspection a hassle, disconnecting the debugging output from the query itself.
While most tools log the entire statement, my focus is on the crucial intermediate steps within the WITH clauses.
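
For reference, the RAISE route looks roughly like this. It needs a PL/pgSQL DO block and a real table (a hypothetical inventory table here), since the block cannot see the CTEs of the query you are actually debugging:

DO $$
DECLARE
    total integer;
BEGIN
    -- inventory is a hypothetical permanent table; the CTEs of the query
    -- under debug are not visible from inside this block.
    SELECT sum(quantity) INTO total FROM inventory;
    RAISE NOTICE 'total quantity: %', total;
END $$;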

Conclusion

And there you have it: structured JSON logging in SQL, thanks to Postgres’s JSON features.


END.