CMU数据库系统作业自查(CMU Databases Systems / Fall 2019)

本文最后更新于:2024年2月12日 晚上

HOMEWORK #1 - SQL

Solution

课听的是2019年,作业作的是2021年,只因为Andy Pavlo的风格是我理想中老师的样子,听课就像看美剧一样。

HOMEWORK #1 - SQL

Q1 [0 POINTS] (Q1_SAMPLE):

sqlite3查询语句:

SELECT CategoryName FROM Category ORDER BY CategoryName;

输出:

Beverages
Condiments
Confections
Dairy Products
Grains/Cereals
Meat/Poultry
Produce
Seafood

Q2 [5 POINTS] (Q2_STRING_FUNCTION):

Get all unique ShipNames from the Order table that contain a hyphen '-'.

Details: In addition, get all the characters preceding the (first) hyphen. Return ship names alphabetically. Your first row should look like Bottom-Dollar Markets|Bottom

-- 从string截取从start到end之间的字符串,包括start和end位置的字符
substr( string, start, length )
-- SELECT substr('SQLite substr', 1, 6);

-- 返回指定字符串在str中出现的第一个位置的下标
instr(String str, String target)
-- SELECT INSTR('SQLite Tutorial','Tutorial') position = 8;
SELECT DISTINCT ShipName, SUBSTR(ShipName, 1, INSTR(ShipName, '-') - 1) 
FROM 'Order' 
WHERE ShipName LIKE "%-%" 
ORDER BY ShipName;
Bottom-Dollar Markets|Bottom
Chop-suey Chinese|Chop
GROSELLA-Restaurante|GROSELLA
HILARION-Abastos|HILARION
Hungry Owl All-Night Grocers|Hungry Owl All
LILA-Supermercado|LILA
LINO-Delicateses|LINO
QUICK-Stop|QUICK
Save-a-lot Markets|Save

答案:结果正确,sql语句基本一致,细节需要注意。

SELECT DISTINCT ShipName, substr(ShipName, 0, instr(ShipName, '-')) as PreHyphen
FROM 'Order'
WHERE ShipName LIKE '%-%'
ORDER BY ShipName ASC;

Q3 [5 POINTS] (Q3_NORTHAMERICAN):

Indicate if an order’s ShipCountry is in North America. For our purposes, this is 'USA', 'Mexico', 'Canada'

Details: You should print the Order Id, ShipCountry, and another column that is either 'NorthAmerica' or 'OtherPlace' depending on the Ship Country.
Order by the primary key (Id) ascending and return 20 rows starting from Order Id 15445 Your output should look like 15445|France|OtherPlace or 15454|Canada|NorthAmerica

SELECT 
	Id, 
	ShipCountry, 
	CASE ShipCountry 
		WHEN 'USA' 
			THEN 'NorthAmerica' 
		WHEN 'Mexico' 
			THEN 'NorthAmerica' 
		WHEN 'Canada' 
			THEN 'NorthAmerica' 
		ELSE 'OtherPlace' 
	END region
FROM 'Order'
WHERE Id >= 15445
ORDER BY Id ASC
LIMIT 20;
15445|France|OtherPlace
15446|Italy|OtherPlace
15447|Portugal|OtherPlace
15448|Argentina|OtherPlace
15449|Portugal|OtherPlace
15450|Venezuela|OtherPlace
15451|Brazil|OtherPlace
15452|France|OtherPlace
15453|France|OtherPlace
15454|Canada|NorthAmerica
15455|USA|NorthAmerica
15456|France|OtherPlace
15457|Mexico|NorthAmerica
15458|USA|NorthAmerica
15459|Germany|OtherPlace
15460|Argentina|OtherPlace
15461|Austria|OtherPlace
15462|Austria|OtherPlace
15463|Finland|OtherPlace
15464|Brazil|OtherPlace

答案:结果正确。sql语句冗长,需要改进。

SELECT Id, ShipCountry, 
       CASE 
              WHEN ShipCountry IN ('USA', 'Mexico','Canada')
              THEN 'NorthAmerica'
              ELSE 'OtherPlace'
       END
FROM 'Order'
WHERE Id >= 15445
ORDER BY Id ASC
LIMIT 20;

Q4 [10 POINTS] (Q4_DELAYPERCENT):

For each Shipper, find the percentage of orders which are late.

Details: An order is considered late if ShippedDate > RequiredDate. Print the following format, order by descending precentage, rounded to the nearest hundredths, like United Package|23.44

-- 四舍五入的保留小数点后n位有效数字
ROUND(real target, num n)
-- ROUND(1.235, 2) = 1.24

-- JOIN:连接两个表,默认是内连接(返回两表共有的部分)
SELECT 
	CASE a.ShipVia
		WHEN 1
			THEN 'Speedy Express'
		WHEN 2
			THEN 'United Package'
		ELSE 'Federal Shipping'
	END CompanyName,
	ROUND((a.cnt * 1.0 / b.cnt) * 100, 2) precentage
	FROM
        (
            SELECT ShipVia, COUNT(ShipVia) AS cnt
            FROM 'Order'
            WHERE ShippedDate > RequiredDate
            GROUP BY ShipVia
        ) a
        JOIN
        (
            SELECT ShipVia, COUNT(ShipVia) AS cnt
            FROM 'Order'
            GROUP BY ShipVia
        ) b ON (a.ShipVia = b.ShipVia)
ORDER BY precentage DESC;
Federal Shipping|23.61
Speedy Express|23.46
United Package|23.44

答案:结果正确。sql语句冗长,可以使用多次「JOIN」来连接多个表。

SELECT CompanyName, round(delayCnt * 100.0 / cnt, 2) AS pct
FROM (
      SELECT ShipVia, COUNT(*) AS cnt 
      FROM 'Order'
      GROUP BY ShipVia
     ) AS totalCnt
INNER JOIN (
            SELECT ShipVia, COUNT(*) AS delaycnt 
            FROM 'Order'
            WHERE ShippedDate > RequiredDate 
            GROUP BY ShipVia
           ) AS delayCnt
          ON totalCnt.ShipVia = delayCnt.ShipVia
INNER JOIN Shipper on totalCnt.ShipVia = Shipper.Id
ORDER BY pct DESC;

Q5 [10 POINTS] (Q5_AGGREGATES):

Compute some statistics about categories of products

Details: Get the number of products, average unit price (rounded to 2 decimal places), minimum unit price, maximum unit price, and total units on order for categories containing greater than 10 products.
Order by Category Id. Your output should look like Beverages|12|37.98|4.5|263.5|60

-- 在使用聚合函数时使用条件判断:Having 关键字
SELECT b.CategoryName, a.cnt, a.avg_price, a.min_price, a.max_price, a.sum_units
FROM
	(
        SELECT CategoryId, COUNT(CategoryId) cnt, ROUND(AVG(UnitPrice), 2) avg_price, MIN(UnitPrice) min_price, MAX(UnitPrice) max_price, SUM(UnitsONOrder) sum_units
        FROM Product
        GROUP BY CategoryId
        HAVING cnt >= 10
    ) a
    JOIN
    (
        SELECT DISTINCT(CategoryId), CategoryName
        FROM Productdetails_V
    ) b 
    ON a.CategoryId = b.CategoryId
ORDER BY a.CategoryId;
Beverages|12|37.98|4.5|263.5|60
Condiments|12|23.06|10|43.9|170
Confections|13|25.16|9.2|81|180
Dairy Products|10|28.73|2.5|55|140 -- 等于10个产品,不应该计算在内。
Seafood|12|20.68|6|62.5|120

答案:结果错误,结果应该大于10个产品。sql语句先连接后计算会比较简洁。

SELECT CategoryName
     , COUNT(*) AS CategoryCount
     , ROUND(AVG(UnitPrice), 2) AS AvgUnitPrice
     , MIN(UnitPrice) AS MinUnitPrice
     , MAX(UnitPrice) AS MaxUnitPrice
     , SUM(UnitsOnOrder) AS TotalUnitsOnOrder
FROM Product INNER JOIN Category on CategoryId = Category.Id
GROUP BY CategoryId
HAVING CategoryCount > 10
ORDER BY CategoryId;
Beverages|12|37.98|4.5|263.5|60
Condiments|12|23.06|10|43.9|170
Confections|13|25.16|9.2|81|180
Seafood|12|20.68|6|62.5|120

Q6 [10 POINTS] (Q6_DISCONTINUED):

Of the 8 discontinued products in the database, what is the customer’s CompanyName and ContactName of the first order ever made for this product?

Details: Print the following format, order by ProductName alphabetically: Alice Mutton|Consolidated Holdings|Elizabeth Brown

SELECT c.ProductName, d.CompanyName, d.ContactName
FROM
    (
        SELECT a.ProductName, b.CustomerId, MIN(b.OrderDate)
        FROM
            (
                SELECT od.OrderId, pd.ProductName
                FROM
                    OrderDetail od
                    JOIN
                    (
                        SELECT Id, ProductName
                        FROM Product
                        WHERE Discontinued = 1
                    ) pd
                    ON od.ProductId = pd.Id
            ) a
            JOIN
            (
                'Order'
            ) b
            ON a.OrderId = b.Id
        GROUP BY a.ProductName
    ) c
    JOIN
    (Customer) d
    ON c.CustomerId = d.id
ORDER BY c.ProductName;
Alice Mutton|Consolidated Holdings|Elizabeth Brown
Chef Anton's Gumbo Mix|Piccolo und mehr|Georg Pipps
Guaraná Fantástica|Piccolo und mehr|Georg Pipps
Mishi Kobe Niku|Old World Delicatessen|Rene Phillips
Perth Pasties|Piccolo und mehr|Georg Pipps
Rössle Sauerkraut|Piccolo und mehr|Georg Pipps
Singaporean Hokkien Fried Mee|Vins et alcools Chevalier|Paul Henriot
Thüringer Rostbratwurst|Piccolo und mehr|Georg Pipps

答案:结果正确。sql应先使用多次连接再查询,会简洁一些。

SELECT pname, CompanyName, ContactName
FROM (
      SELECT pname, min(OrderDate), CompanyName, ContactName
      FROM (
            SELECT Id AS pid, ProductName AS pname 
            FROM Product 
            WHERE Discontinued != 0
           ) as discontinued
      INNER JOIN OrderDetail on ProductId = pid
      INNER JOIN 'Order' on 'Order'.Id = OrderDetail.OrderId
      INNER JOIN Customer on CustomerId = Customer.Id
      GROUP BY pid
    )
ORDER BY pname ASC;

Q7 [15 POINTS] (Q7_ORDER_LAGS):

For the first 10 orders by CutomerId BLONP: get the Order’s Id, OrderDate, previous OrderDate, and difference between the previous and current. Return results ordered by OrderDate (ascending)

Details: The “previous” OrderDate for the first order should default to itself (lag time = 0). Use the julianday() function for date arithmetic (example).
Use lag(expr, offset, default) for grabbing previous dates.
Please round the lag time to the nearest hundredth, formatted like 17361|2012-09-19 12:13:21|2012-09-18 22:37:15|0.57

Note: For more details on window functions, see here.

-- LAG(express, offset, default):找到当前元组的上一offset个元组,并根据express返回前面行的值(必须是单一值)
-- express:是根据指定的偏移量对前一行的值求值的表达式。表达式必须返回单个值。
-- offset:两行之间的偏移量
-- default:默认值,当找不到上一行时返回该值
-- PARTITION BY:根据表达式(属性)来进行分组
LAG(express, offset, default) OVER (PARTITION BY expr1, expr2,... ORDER BY exp1 ASC, exp2 DESC)

-- julianday(now) - julianday(before):now和before之间的时间差,单位为天
SELECT 
	Id, 
	OrderDate, 
	LAG(OrderDate, 1, 0) OVER (ORDER BY OrderDate),
	ROUND
	(
        julianday(OrderDate) - 
        julianday(
            LAG(OrderDate, 1, OrderDate) 
            OVER (ORDER BY OrderDate)
        ),
        2
    )
FROM 'Order'
WHERE CustomerId = 'BLONP'
ORDER BY OrderDate ASC
LIMIT 10;
16766|2012-07-22 23:11:15|0|0.0 -- 初始默认值应该使用该天的日期。
10265|2012-07-25|2012-07-22 23:11:15|2.03
12594|2012-08-16 12:35:15|2012-07-25|22.52
20249|2012-08-16 16:52:23|2012-08-16 12:35:15|0.18
20882|2012-08-18 19:11:48|2012-08-16 16:52:23|2.1
18443|2012-08-28 05:34:03|2012-08-18 19:11:48|9.43
10297|2012-09-04|2012-08-28 05:34:03|6.77
11694|2012-09-17 00:27:14|2012-09-04|13.02
25613|2012-09-18 22:37:15|2012-09-17 00:27:14|1.92
17361|2012-09-19 12:13:21|2012-09-18 22:37:15|0.57

答案:答案错误,初始「LAG」函数默认时间应该是该天,也算合理。sql语句还行。

SELECT
     Id
     , OrderDate
     , PrevOrderDate
     , ROUND(julianday(OrderDate) - julianday(PrevOrderDate), 2)
FROM (
     SELECT Id
          , OrderDate
          , LAG(OrderDate, 1, OrderDate) OVER (ORDER BY OrderDate ASC) AS PrevOrderDate
     FROM 'Order' 
     WHERE CustomerId = 'BLONP'
     ORDER BY OrderDate ASC
     LIMIT 10
);
16766|2012-07-22 23:11:15|2012-07-22 23:11:15|0.0
10265|2012-07-25|2012-07-22 23:11:15|2.03
12594|2012-08-16 12:35:15|2012-07-25|22.52
20249|2012-08-16 16:52:23|2012-08-16 12:35:15|0.18
20882|2012-08-18 19:11:48|2012-08-16 16:52:23|2.1
18443|2012-08-28 05:34:03|2012-08-18 19:11:48|9.43
10297|2012-09-04|2012-08-28 05:34:03|6.77
11694|2012-09-17 00:27:14|2012-09-04|13.02
25613|2012-09-18 22:37:15|2012-09-17 00:27:14|1.92
17361|2012-09-19 12:13:21|2012-09-18 22:37:15|0.57

Q8 [15 POINTS] (Q8_TOTAL_COST_QUARTILES):

For each Customer, get the CompanyName, CustomerId, and “total expenditures”. Output the bottom quartile of Customers, as measured by total expenditures.

Details: Calculate expenditure using UnitPrice and Quantity (ignore Discount). Compute the quartiles for each company’s total expenditures using NTILE. The bottom quartile is the 1st quartile, order them by increasing expenditure.
Make sure your output is formatted as follows (round expenditure to nearest hundredths): Bon app|BONAP|4485708.49

Note: There are orders for CustomerIds that don’t appear in the Customer table. You should still consider these “Customers” and output them. If the CompanyName is missing, override the NULL to 'MISSING_NAME' using IFNULL.

-- sqlite不支持右连接,左连接保留左边表的所有值,未匹配的为NULL

-- IFNULL(express1, express2):当express1的值为NULL时用express2来代替

-- 分桶函数:num:桶的数量
-- NTILE(num) OVER(PARTITION BY expr1, ... ORDER expr1 [ASC|DESC],...)
SELECT 
	IFNULL(t2.CompanyName, "MISSING_NAME"),
	t1.CustomerId,
	ROUND(t1.s, 2)
FROM
	(
        SELECT 
            a.CustomerId, 
            SUM(b.Quantity * b.UnitPrice) s,
            NTILE(4) OVER (
                ORDER BY SUM(b.Quantity * b.UnitPrice) ASC
            ) bkt
        FROM
            'Order' a
            JOIN
            OrderDetail b
            ON a.Id = b.OrderId
        GROUP BY a.CustomerId
    ) t1
    LEFT JOIN 
    Customer t2
    ON t2.Id = t1.customerId
WHERE t1.bkt = 1;
MISSING_NAME|DUMO|1615.9
MISSING_NAME|OCEA|3460.2
MISSING_NAME|ANTO|7515.35
MISSING_NAME|QUEE|30226.1
Trail's Head Gourmet Provisioners|TRAIH|3874502.02
Blondesddsl père et fils|BLONP|3879728.69
Around the Horn|AROUT|4395636.28
Hungry Owl All-Night Grocers|HUNGO|4431457.1
Bon app|BONAP|4485708.49
Bólido Comidas preparadas|BOLID|4520121.88
Galería del gastrónomo|GALED|4533089.9
FISSA Fabrica Inter. Salchichas S.A.|FISSA|4554591.02
Maison Dewey|MAISD|4555931.37
Cactus Comidas para llevar|CACTU|4559046.87
Spécialités du monde|SPECD|4571764.89
Magazzini Alimentari Riuniti|MAGAA|4572382.35
Toms Spezialit?ten|TOMSP|4628403.36
Split Rail Beer & Ale|SPLIR|4641383.53
Santé Gourmet|SANTG|4647668.15
Morgenstern Gesundkost|MORGK|4676234.2
White Clover Markets|WHITC|4681531.74
La corne d'abondance|LACOR|4724494.22
Victuailles en stock|VICTE|4726476.33
Lonesome Pine Restaurant|LONEP|4735780.66

答案:结果正确。sql语句各有千秋,没有使用「WITH」语句来建立两个临时表。

WITH expenditures AS (
    SELECT
        IFNULL(c.CompanyName, 'MISSING_NAME') AS CompanyName,
        o.CustomerId,
        ROUND(SUM(od.Quantity * od.UnitPrice), 2) AS TotalCost
    FROM 'Order' AS o
    INNER JOIN OrderDetail od on od.OrderId = o.Id
    LEFT JOIN Customer c on c.Id = o.CustomerId
    GROUP BY o.CustomerId
),
quartiles AS (
    SELECT *, NTILE(4) OVER (ORDER BY TotalCost ASC) AS ExpenditureQuartile
    FROM expenditures
)
SELECT CompanyName, CustomerId, TotalCost
FROM quartiles
WHERE ExpenditureQuartile = 1
ORDER BY TotalCost ASC;

Q9 [15 POINTS] (Q9_YOUNGBLOOD):

Find the youngest employee serving each Region. If a Region is not served by an employee, ignore it.

Details: Print the Region Description, First Name, Last Name, and Birth Date. Order by Region Id.
Your first row should look like Eastern|Steven|Buchanan|1987-03-04

xxxx -- 多表连接,只能嵌套连接
-- 多表连接,可以使用多次「JOIN」
SELECT a.RegionDescription, b.FirstName, b.LastName, MAX(b.BirthDate)
FROM
    (
       SELECT a.EmployeeId, r.RegionDescription, r.Id
       FROM
            (
                SELECT et.EmployeeId, t.Id, t.RegionId
                FROM 
                    EmployeeTerritory et
                    JOIN 
                    Territory t
                    ON et.TerritoryId = t.Id
            ) a
            JOIN
            Region r
            ON a.RegionId = r.Id
    ) a
    JOIN
    Employee b
    ON a.EmployeeId = b.Id
GROUP BY a.Id
ORDER BY a.Id;
Eastern|Steven|Buchanan|1987-03-04
Western|Michael|Suyama|1995-07-02
Northern|Anne|Dodsworth|1998-01-27
Southern|Janet|Leverling|1995-08-30

答案:结果正确。

SELECT RegionDescription, FirstName, LastName, bday
FROM 
(
  SELECT RegionId AS rid, MAX(Employee.Birthdate) AS bday 
  FROM Employee
    INNER JOIN EmployeeTerritory ON Employee.Id = EmployeeTerritory.EmployeeId
    INNER JOIN Territory ON TerritoryId = Territory.Id
  GROUP BY RegionId
)
INNER JOIN (
            SELECT FirstName, LastName, Birthdate, RegionId, EmployeeId
            FROM Employee
              INNER JOIN EmployeeTerritory ON Employee.Id = EmployeeTerritory.EmployeeId
              INNER JOIN Territory ON TerritoryId = Territory.Id
           )
           ON Birthdate = bday AND rid = RegionId
INNER JOIN Region ON Region.Id = RegionId
GROUP BY EmployeeId
ORDER BY rid;

Q10 [15 POINTS] (Q10_CHRISTMAS):

Concatenate the ProductNames ordered by the Company 'Queen Cozinha' on 2014-12-25.

Details: Order the products by Id (ascending). Print a single string containing all the dup names separated by commas like Mishi Kobe Niku, NuNuCa Nuß-Nougat-Creme...
Hint: You might find Recursive CTEs useful.

-- 递归查询的练习
WITH RECURSIVE res(id, name) AS (
    SELECT 0, ''
    UNION ALL
    SELECT 
    	id + 1,
    	name || 
    	(
            SELECT a.ProductName
            FROM
            	(
                    SELECT 
                        b.Id, 
                        b.ProductName,
                        ROW_NUMBER() OVER(
                            ORDER BY b.Id
                        ) num
                    FROM
                        (
                            SELECT a.Id, b.ProductId
                            FROM
                                (
                                    SELECT a.Id
                                    FROM
                                        'Order' a
                                        JOIN 
                                        Customer b
                                        ON a.CustomerId = b.Id
                                    WHERE 
                                        b.CompanyName = 'Queen Cozinha' 
                                        AND 
                                        DATE(OrderDate) = DATE('2014-12-25')
                                ) a
                                JOIN
                                OrderDetail b
                                ON a.Id = b.OrderId
                        ) a
                        JOIN
                        Product b
                        ON a.ProductId = b.Id
                    ORDER BY b.Id
            	) a
            	WHERE a.num = res.id + 1
        ) || ', '
    FROM res
    WHERE res.id < 9
)
SELECT SUBSTR(name, 0, LENGTH(name) - 1)
FROM res
WHERE id = 9;
ProductId|ProductName|ROW_NUMBER
9|Mishi Kobe Niku|1
25|NuNuCa Nuß-Nougat-Creme|2
27|Schoggi Schokolade|3
32|Mascarpone Fabioli|4
34|Sasquatch Ale|5
40|Boston Crab Meat|6
51|Manjimup Dried Apples|7
74|Longlife Tofu|8
76|Lakkalikööri|9
Mishi Kobe Niku,NuNuCa Nuß-Nougat-Creme,Schoggi Schokolade,Mascarpone Fabioli,Sasquatch Ale,Boston Crab Meat,Manjimup Dried Apples,Longlife Tofu,Lakkalikööri

答案:结果错误,两个单词之间应该用逗号和空格隔开。sql语句就是坨屎。可以使用「WITH」语句将查询结果保存为一个临时表,这样方便多次调用查询。

with p as (
      select Product.Id, Product.ProductName as name
      from Product
            inner join OrderDetail on Product.id = OrderDetail.ProductId
            inner join 'Order' on 'Order'.Id = OrderDetail.OrderId
            inner join Customer on CustomerId = Customer.Id
      where DATE(OrderDate) = '2014-12-25' and CompanyName = 'Queen Cozinha'
      group by Product.id
),
c as (
      select row_number() over (order by p.id asc) as seqnum, p.name as name
      from p
),
flattened as (
      select seqnum, name as name
      from c
      where seqnum = 1
      union all
      select c.seqnum, f.name || ', ' || c.name
      from c join
            flattened f
            on c.seqnum = f.seqnum + 1
)
select name from flattened
order by seqnum desc limit 1;
Mishi Kobe Niku, NuNuCa Nuß-Nougat-Creme, Schoggi Schokolade, Mascarpone Fabioli, Sasquatch Ale, Boston Crab Meat, Manjimup Dried Apples, Longlife Tofu, Lakkalikööri

总体评价

结果总体正确,有些地方没有注意细节。自己写的sql语句有点像幼儿园,看答案学到很多。