Three SQL Tips for Your Startup Sanity

John Anders
4 min readSep 3, 2021

--

Every tech worker remembers the first time they got introduced to a live, real-world production database. Those projects and slides from school were nice—or at least all the names made sense. But now, several days into your first tech job, there are so many tables of data that starting with any one of them just seems trivial. But you’ve got to start somewhere, right?

What follows are 3 SQL tips that you may wish you’d learned months ago, but as the saying goes, the second-best time be in-the-know is now!

1. Finding An Elusive Column

Have you ever wondered if, across hundreds of data tables, there were any headers that stored a particular value that, if present, would make all your data dreams come true? If the answer’s yes, you’ve also likely gone on a wild chase, manually scrolling through headers to see if anything lines up.

STOP THAT!

Use this query instead:

SELECT TABLE_CATALOG,
TABLE_SCHEMA,
TABLE_NAME,
COLUMN_NAME,
ORDINAL_POSITION,
DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME ILIKE ‘%keyword%’
ORDER BY TABLE_NAME;

*note: queries written in PostgreSQL dialect.

All you need to change is the keyword to the desired search term, and you are able to search every header in the database. The primary part of the select statement to be aware of is, of course, “column_name”, but the others pull relevant info starting with the most general and ending with the more granular related info.

2. Prioritizing Complex Row Order

Let’s say you have a scenario where the outcome is highly dependent on the ordering of data. This is a case where a strategy of ORDER BY and GROUP BY for a subquery might do the trick.

But imagine in this case that you already have other logic preventing you from messing with the fundamental ordering of the table’s data. You might appreciate not needing to write yet another subquery…

Enter row partitioning. This technique allows for the ranking of rows within subsets. What makes it so useful is the calculation happens within the SELECT statement as a calculated column.

SELECT first_header, second_header,
ROW_NUMBER() OVER(PARTITION BY second_header order by first_header) as rn
FROM table;

The PARTITION BY clause is responsible for organizing subgroups, so it is followed by a header which determines how the subgroups are formed. Practically, this header is what the rows share in common, like a month or a departing airport or an order number. ORDER BY functions as you would expect, organizing the rows within each partition subgroup.

To walk through an example, imagine you want to rank the top players of a game by country. You would PARTITION BY player score and ORDER BY player country, giving each player a unique integer rank within their nation. Then you could easily pull not only top player from each country, but bottom or average ones, or top ten.

Wrapping this sort of query in an outer query, or using a WITH statement, is where row partitioning becomes most valuable, because you could use a CASE statement that provides rich directions. This would allow for instructions like “use this price if ranked first, or that price if ranked second.”

3. Handling JSON Values

At some point—it could have been last week or perhaps in your company’s infancy—an engineer decided to store JSON in a particular column, and this JSON continues to be updated even though it’s not really part of the data model yet.

You come along and realize that a particular key-value pair inside of the JSON segment may enable greater insight or potential functionality worth exploring. Thankfully, whether stored as arrays or objects (or both), most dialects allow direct interaction with JSON, translating those nested blobs into usable headers!

For grabbing a value by key, where JSONkey is the key-value key whose corresponding value you want to extract:

SELECT header_containing_json_obj::json->>’JSONKey’ as my_alias
FROM table;

For grabbing an array value, where X is an integer representing the array’s desired index:

SELECT header_containing_json_arr::json->>X as my_alias
FROM table;

This is straightforward, as long as you are somewhat familiar with key-value pairs.

These tips have been learned on the first 6 months of an analyst position at a rapidly expanding startup. I believe they can be of value to anyone, regardless of company size, and I hope they are a big time-saver for you.

Let me know what SQL snippets make your life easier in the comments below!

--

--

John Anders
John Anders

Written by John Anders

After spending several years working in the non-profit sector, I entered tech. I'm a lifelong devotee to refining how I lead and follow others.

No responses yet