/*
Q5: Select all products with "IMPORTED" in the category name.
*/
SELECT *
FROM products
WHERE category_name ILIKE '%IMPORTED%';
-- 2441 rows retrieved --
/*
Q6: Get the top 10 vendors of imported products ranked by total sales.
Note: There is a bit of ambiguity in the brief here. I would send a quick message to the
client asking for clarification; do they want the top 10 vendors overall who sell some
imported products OR the top ten vendors by total sales of imported products, specifically?
I have guessed that they are interested in the latter.
*/
-- CASE ONE --
SELECT DISTINCT vendor, sum(total)
FROM sales
WHERE vendor IN (SELECT vendor from sales WHERE category_name ILIKE '%IMPORTED%')
GROUP BY vendor
ORDER BY sum(total) DESC
LIMIT 10; -- 10 rows retrieved --
-- CASE TWO --
SELECT DISTINCT vendor, sum(total) as import_sales
FROM sales
WHERE category_name ILIKE '%IMPORTED%'
GROUP BY vendor
ORDER BY sum(total) DESC
LIMIT 10; -- 10 rows retrieved --
/*
Addendum: Because of who I am as a person, I couldn't help but notice that the Top 10
vendors by total sales who sell any imported liquor were identical to the Top 10 vendors
overall, unfiltered. Naturally, that made me wonder if my filter was doing anything, so
I played around a bit.
I thought that a nifty way to test that the filter works would be to negate it, adding NOT
before the IN clause so the output would be the Top 10 vendors in order of sales who only
sell domestic product. It occurred to me that this would be comically inefficient, running
O(n^2) in the best case scenario; the sub-query runs through the entire table for each
item in the main query, which is iterating through every item in the table. Add to that the
obvious insanity of doing this by searching for text in a string, which can't be optimal by
any stretch of the imagination. (Note to self: Look up how the engine runs that search, and
for that matter, what is the engine made out of? I’ll bet it’s C. I’ll give myself a gold star
if I remember to look this up and I’m right.)
*/
SELECT DISTINCT vendor, sum(total)
FROM sales
WHERE vendor NOT IN (SELECT vendor from sales WHERE category_name ILIKE '%IMPORTED%')
GROUP BY vendor
ORDER BY sum(total) DESC
LIMIT 10;
/*The good news is that absolutely no one is here to stop me from letting this query run just
to see how long it takes. Someone's paying for the resources, surely, but hopefully they're
running that server in a way that won't let me cause too much havoc ...
In the meantime, I'm going to have a cup of tea and think about how I can optimize this. It
is a fun challenge because I am a true beginner. As the ancient proverb goes, there are known
knowns, known unknowns, unknown unkno-- Ahem. What I'm trying to say here is that I have no idea
how to do this well, so I'm going to do it poorly. We've got one bad solution already bubbling
away in the cauldron. Now I'll find another way that's hopefully closer to glory.
Chapter 2:
I'm sure of two things: 1) I thought of a faster way to find the Top 10 vendors that only sell
domestic liquor in Iowa, and 2) That cup of tea didn't make me any smarter. This solution may
well be every bit as dumb as my first attempt, but as long as it's faster, I'm going to call it
a success.
Here's how it works: I get a list of the vendor_no of every vendor that sells imported liquor.
This actually shouldn't be a particularly long list of numbers, since there are (I think) not so
many distinct vendors to begin with. Next, I'll populate an array with the list of vendor_no's
so I can use it with the IN operator. I'll refactor the bonkers query from before -- it's going
strong at 3 hours and counting -- so that instead of running through all 40k+ entries searching
bit by bit (literally) for text times a zillion, there will be a relatively tiny number of trips
'round the bend. In theory. According to me. I tried asking my cat for advice, but she knows even
less about SQL than I do.
Here goes nothing, Round 2.
First, I found out how many are in the subset I’m after because if all of my assumptions are very
wrong — say, if I find out that there are 30k different vendors or something — I’ll be sorry.
*/
SELECT COUNT(DISTINCT vendor_no)
FROM sales;
SELECT COUNT(DISTINCT vendor_no)
FROM sales
WHERE category_name ILIKE '%IMPORTED%'; -- Returns 65 as expected --
/*
And now for the seeeecret sauce. I'm just pulling all 65 of those vendor_nos into Excel and =CONCAT("'",TEXTJOIN("', '",TRUE,A2:A66,"'")) et voila!
*/
SELECT COUNT(DISTINCT vendor_no)
FROM sales
WHERE vendor_no IN('10', '14', '35', '55', '65', '85', '90', '91', '108', '115',
'130', '150', '185', '190', '192', '239', '240', '255', '259',
'260', '267', '270', '277', '285', '288', '293', '295', '297',
'305', '312', '322', '330', '338', '357', '364', '365', '368',
'370', '375', '380', '386', '389', '390', '391', '395', '410',
'420', '421', '432', '434', '446', '459', '460', '461', '465',
'468', '469', '477', '487', '492', '501', '885', '962', '971',
'978'); -- Yes! Returns 65, only 15s to complete --
SELECT DISTINCT vendor, sum(total)
FROM sales
WHERE vendor_no NOT IN('10', '14', '35', '55', '65', '85', '90', '91', '108', '115',
'130', '150', '185', '190', '192', '239', '240', '255', '259',
'260', '267', '270', '277', '285', '288', '293', '295', '297',
'305', '312', '322', '330', '338', '357', '364', '365', '368',
'370', '375', '380', '386', '389', '390', '391', '395', '410',
'420', '421', '432', '434', '446', '459', '460', '461', '465',
'468', '469', '477', '487', '492', '501', '885', '962', '971',
'978')
GROUP BY vendor
ORDER BY sum(total) DESC
LIMIT 10;
/* Heck yeah! Done and done. Down to 5s of glorious linear time.
It ain't pretty and it ain't smart, but it works like a dream. */
Here are some projects I've worked on recently:
Project 1
Project 2
Project 3