CMU数据库系统作业自查(CMU Databases Systems / Fall 2019)
本文最后更新于:2024年2月12日 晚上
课听的是2019年,作业作的是2021年,只因为Andy Pavlo的风格是我理想中老师的样子,听课就像看美剧一样。
HOMEWORK #1 - SQL
Q1 [0 POINTS] (Q1_SAMPLE):
sqlite3查询语句:
SELECT CategoryName FROM Category ORDER BY CategoryName;
输出:
Beverages
Condiments
Confections
Dairy Products
Grains/Cereals
Meat/Poultry
Produce
Seafood
Q2 [5 POINTS] (Q2_STRING_FUNCTION):
Get all unique ShipNames
from the Order table that contain a hyphen '-'
.
Details: In addition, get all the characters preceding the (first) hyphen. Return ship names alphabetically. Your first row should look like Bottom-Dollar Markets|Bottom
-- 从string截取从start到end之间的字符串,包括start和end位置的字符
substr( string, start, length )
-- SELECT substr('SQLite substr', 1, 6);
-- 返回指定字符串在str中出现的第一个位置的下标
instr(String str, String target)
-- SELECT INSTR('SQLite Tutorial','Tutorial') position = 8;
SELECT DISTINCT ShipName, SUBSTR(ShipName, 1, INSTR(ShipName, '-') - 1)
FROM 'Order'
WHERE ShipName LIKE "%-%"
ORDER BY ShipName;
Bottom-Dollar Markets|Bottom
Chop-suey Chinese|Chop
GROSELLA-Restaurante|GROSELLA
HILARION-Abastos|HILARION
Hungry Owl All-Night Grocers|Hungry Owl All
LILA-Supermercado|LILA
LINO-Delicateses|LINO
QUICK-Stop|QUICK
Save-a-lot Markets|Save
答案:结果正确,sql语句基本一致,细节需要注意。
SELECT DISTINCT ShipName, substr(ShipName, 0, instr(ShipName, '-')) as PreHyphen
FROM 'Order'
WHERE ShipName LIKE '%-%'
ORDER BY ShipName ASC;
Q3 [5 POINTS] (Q3_NORTHAMERICAN):
Indicate if an order’s ShipCountry
is in North America. For our purposes, this is 'USA', 'Mexico', 'Canada'
Details: You should print the Order Id
, ShipCountry
, and another column that is either 'NorthAmerica'
or 'OtherPlace'
depending on the Ship Country.
Order by the primary key (Id
) ascending and return 20 rows starting from Order Id 15445
Your output should look like 15445|France|OtherPlace
or 15454|Canada|NorthAmerica
SELECT
Id,
ShipCountry,
CASE ShipCountry
WHEN 'USA'
THEN 'NorthAmerica'
WHEN 'Mexico'
THEN 'NorthAmerica'
WHEN 'Canada'
THEN 'NorthAmerica'
ELSE 'OtherPlace'
END region
FROM 'Order'
WHERE Id >= 15445
ORDER BY Id ASC
LIMIT 20;
15445|France|OtherPlace
15446|Italy|OtherPlace
15447|Portugal|OtherPlace
15448|Argentina|OtherPlace
15449|Portugal|OtherPlace
15450|Venezuela|OtherPlace
15451|Brazil|OtherPlace
15452|France|OtherPlace
15453|France|OtherPlace
15454|Canada|NorthAmerica
15455|USA|NorthAmerica
15456|France|OtherPlace
15457|Mexico|NorthAmerica
15458|USA|NorthAmerica
15459|Germany|OtherPlace
15460|Argentina|OtherPlace
15461|Austria|OtherPlace
15462|Austria|OtherPlace
15463|Finland|OtherPlace
15464|Brazil|OtherPlace
答案:结果正确。sql语句冗长,需要改进。
SELECT Id, ShipCountry,
CASE
WHEN ShipCountry IN ('USA', 'Mexico','Canada')
THEN 'NorthAmerica'
ELSE 'OtherPlace'
END
FROM 'Order'
WHERE Id >= 15445
ORDER BY Id ASC
LIMIT 20;
Q4 [10 POINTS] (Q4_DELAYPERCENT):
For each Shipper
, find the percentage of orders which are late.
Details: An order is considered late if ShippedDate > RequiredDate
. Print the following format, order by descending precentage, rounded to the nearest hundredths, like United Package|23.44
-- 四舍五入的保留小数点后n位有效数字
ROUND(real target, num n)
-- ROUND(1.235, 2) = 1.24
-- JOIN:连接两个表,默认是内连接(返回两表共有的部分)
SELECT
CASE a.ShipVia
WHEN 1
THEN 'Speedy Express'
WHEN 2
THEN 'United Package'
ELSE 'Federal Shipping'
END CompanyName,
ROUND((a.cnt * 1.0 / b.cnt) * 100, 2) precentage
FROM
(
SELECT ShipVia, COUNT(ShipVia) AS cnt
FROM 'Order'
WHERE ShippedDate > RequiredDate
GROUP BY ShipVia
) a
JOIN
(
SELECT ShipVia, COUNT(ShipVia) AS cnt
FROM 'Order'
GROUP BY ShipVia
) b ON (a.ShipVia = b.ShipVia)
ORDER BY precentage DESC;
Federal Shipping|23.61
Speedy Express|23.46
United Package|23.44
答案:结果正确。sql语句冗长,可以使用多次「JOIN」来连接多个表。
SELECT CompanyName, round(delayCnt * 100.0 / cnt, 2) AS pct
FROM (
SELECT ShipVia, COUNT(*) AS cnt
FROM 'Order'
GROUP BY ShipVia
) AS totalCnt
INNER JOIN (
SELECT ShipVia, COUNT(*) AS delaycnt
FROM 'Order'
WHERE ShippedDate > RequiredDate
GROUP BY ShipVia
) AS delayCnt
ON totalCnt.ShipVia = delayCnt.ShipVia
INNER JOIN Shipper on totalCnt.ShipVia = Shipper.Id
ORDER BY pct DESC;
Q5 [10 POINTS] (Q5_AGGREGATES):
Compute some statistics about categories of products
Details: Get the number of products, average unit price (rounded to 2 decimal places), minimum unit price, maximum unit price, and total units on order for categories containing greater than 10 products.
Order by Category Id
. Your output should look like Beverages|12|37.98|4.5|263.5|60
-- 在使用聚合函数时使用条件判断:Having 关键字
SELECT b.CategoryName, a.cnt, a.avg_price, a.min_price, a.max_price, a.sum_units
FROM
(
SELECT CategoryId, COUNT(CategoryId) cnt, ROUND(AVG(UnitPrice), 2) avg_price, MIN(UnitPrice) min_price, MAX(UnitPrice) max_price, SUM(UnitsONOrder) sum_units
FROM Product
GROUP BY CategoryId
HAVING cnt >= 10
) a
JOIN
(
SELECT DISTINCT(CategoryId), CategoryName
FROM Productdetails_V
) b
ON a.CategoryId = b.CategoryId
ORDER BY a.CategoryId;
Beverages|12|37.98|4.5|263.5|60
Condiments|12|23.06|10|43.9|170
Confections|13|25.16|9.2|81|180
Dairy Products|10|28.73|2.5|55|140 -- 等于10个产品,不应该计算在内。
Seafood|12|20.68|6|62.5|120
答案:结果错误,结果应该大于10个产品。sql语句先连接后计算会比较简洁。
SELECT CategoryName
, COUNT(*) AS CategoryCount
, ROUND(AVG(UnitPrice), 2) AS AvgUnitPrice
, MIN(UnitPrice) AS MinUnitPrice
, MAX(UnitPrice) AS MaxUnitPrice
, SUM(UnitsOnOrder) AS TotalUnitsOnOrder
FROM Product INNER JOIN Category on CategoryId = Category.Id
GROUP BY CategoryId
HAVING CategoryCount > 10
ORDER BY CategoryId;
Beverages|12|37.98|4.5|263.5|60
Condiments|12|23.06|10|43.9|170
Confections|13|25.16|9.2|81|180
Seafood|12|20.68|6|62.5|120
Q6 [10 POINTS] (Q6_DISCONTINUED):
Of the 8 discontinued products in the database, what is the customer’s CompanyName
and ContactName
of the first order ever made for this product?
Details: Print the following format, order by ProductName
alphabetically: Alice Mutton|Consolidated Holdings|Elizabeth Brown
SELECT c.ProductName, d.CompanyName, d.ContactName
FROM
(
SELECT a.ProductName, b.CustomerId, MIN(b.OrderDate)
FROM
(
SELECT od.OrderId, pd.ProductName
FROM
OrderDetail od
JOIN
(
SELECT Id, ProductName
FROM Product
WHERE Discontinued = 1
) pd
ON od.ProductId = pd.Id
) a
JOIN
(
'Order'
) b
ON a.OrderId = b.Id
GROUP BY a.ProductName
) c
JOIN
(Customer) d
ON c.CustomerId = d.id
ORDER BY c.ProductName;
Alice Mutton|Consolidated Holdings|Elizabeth Brown
Chef Anton's Gumbo Mix|Piccolo und mehr|Georg Pipps
Guaraná Fantástica|Piccolo und mehr|Georg Pipps
Mishi Kobe Niku|Old World Delicatessen|Rene Phillips
Perth Pasties|Piccolo und mehr|Georg Pipps
Rössle Sauerkraut|Piccolo und mehr|Georg Pipps
Singaporean Hokkien Fried Mee|Vins et alcools Chevalier|Paul Henriot
Thüringer Rostbratwurst|Piccolo und mehr|Georg Pipps
答案:结果正确。sql应先使用多次连接再查询,会简洁一些。
SELECT pname, CompanyName, ContactName
FROM (
SELECT pname, min(OrderDate), CompanyName, ContactName
FROM (
SELECT Id AS pid, ProductName AS pname
FROM Product
WHERE Discontinued != 0
) as discontinued
INNER JOIN OrderDetail on ProductId = pid
INNER JOIN 'Order' on 'Order'.Id = OrderDetail.OrderId
INNER JOIN Customer on CustomerId = Customer.Id
GROUP BY pid
)
ORDER BY pname ASC;
Q7 [15 POINTS] (Q7_ORDER_LAGS):
For the first 10 orders by CutomerId BLONP
: get the Order’s Id
, OrderDate
, previous OrderDate
, and difference between the previous and current. Return results ordered by OrderDate
(ascending)
Details: The “previous” OrderDate
for the first order should default to itself (lag time = 0). Use the julianday()
function for date arithmetic (example).
Use lag(expr, offset, default) for grabbing previous dates.
Please round the lag time to the nearest hundredth, formatted like 17361|2012-09-19 12:13:21|2012-09-18 22:37:15|0.57
Note: For more details on window functions, see here.
-- LAG(express, offset, default):找到当前元组的上一offset个元组,并根据express返回前面行的值(必须是单一值)
-- express:是根据指定的偏移量对前一行的值求值的表达式。表达式必须返回单个值。
-- offset:两行之间的偏移量
-- default:默认值,当找不到上一行时返回该值
-- PARTITION BY:根据表达式(属性)来进行分组
LAG(express, offset, default) OVER (PARTITION BY expr1, expr2,... ORDER BY exp1 ASC, exp2 DESC)
-- julianday(now) - julianday(before):now和before之间的时间差,单位为天
SELECT
Id,
OrderDate,
LAG(OrderDate, 1, 0) OVER (ORDER BY OrderDate),
ROUND
(
julianday(OrderDate) -
julianday(
LAG(OrderDate, 1, OrderDate)
OVER (ORDER BY OrderDate)
),
2
)
FROM 'Order'
WHERE CustomerId = 'BLONP'
ORDER BY OrderDate ASC
LIMIT 10;
16766|2012-07-22 23:11:15|0|0.0 -- 初始默认值应该使用该天的日期。
10265|2012-07-25|2012-07-22 23:11:15|2.03
12594|2012-08-16 12:35:15|2012-07-25|22.52
20249|2012-08-16 16:52:23|2012-08-16 12:35:15|0.18
20882|2012-08-18 19:11:48|2012-08-16 16:52:23|2.1
18443|2012-08-28 05:34:03|2012-08-18 19:11:48|9.43
10297|2012-09-04|2012-08-28 05:34:03|6.77
11694|2012-09-17 00:27:14|2012-09-04|13.02
25613|2012-09-18 22:37:15|2012-09-17 00:27:14|1.92
17361|2012-09-19 12:13:21|2012-09-18 22:37:15|0.57
答案:答案错误,初始「LAG」函数默认时间应该是该天,也算合理。sql语句还行。
SELECT
Id
, OrderDate
, PrevOrderDate
, ROUND(julianday(OrderDate) - julianday(PrevOrderDate), 2)
FROM (
SELECT Id
, OrderDate
, LAG(OrderDate, 1, OrderDate) OVER (ORDER BY OrderDate ASC) AS PrevOrderDate
FROM 'Order'
WHERE CustomerId = 'BLONP'
ORDER BY OrderDate ASC
LIMIT 10
);
16766|2012-07-22 23:11:15|2012-07-22 23:11:15|0.0
10265|2012-07-25|2012-07-22 23:11:15|2.03
12594|2012-08-16 12:35:15|2012-07-25|22.52
20249|2012-08-16 16:52:23|2012-08-16 12:35:15|0.18
20882|2012-08-18 19:11:48|2012-08-16 16:52:23|2.1
18443|2012-08-28 05:34:03|2012-08-18 19:11:48|9.43
10297|2012-09-04|2012-08-28 05:34:03|6.77
11694|2012-09-17 00:27:14|2012-09-04|13.02
25613|2012-09-18 22:37:15|2012-09-17 00:27:14|1.92
17361|2012-09-19 12:13:21|2012-09-18 22:37:15|0.57
Q8 [15 POINTS] (Q8_TOTAL_COST_QUARTILES):
For each Customer
, get the CompanyName, CustomerId, and “total expenditures”. Output the bottom quartile of Customers, as measured by total expenditures.
Details: Calculate expenditure using UnitPrice
and Quantity
(ignore Discount
). Compute the quartiles for each company’s total expenditures using NTILE. The bottom quartile is the 1st quartile, order them by increasing expenditure.
Make sure your output is formatted as follows (round expenditure to nearest hundredths): Bon app|BONAP|4485708.49
Note: There are orders for CustomerId
s that don’t appear in the Customer
table. You should still consider these “Customers” and output them. If the CompanyName
is missing, override the NULL
to 'MISSING_NAME'
using IFNULL
.
-- sqlite不支持右连接,左连接保留左边表的所有值,未匹配的为NULL
-- IFNULL(express1, express2):当express1的值为NULL时用express2来代替
-- 分桶函数:num:桶的数量
-- NTILE(num) OVER(PARTITION BY expr1, ... ORDER expr1 [ASC|DESC],...)
SELECT
IFNULL(t2.CompanyName, "MISSING_NAME"),
t1.CustomerId,
ROUND(t1.s, 2)
FROM
(
SELECT
a.CustomerId,
SUM(b.Quantity * b.UnitPrice) s,
NTILE(4) OVER (
ORDER BY SUM(b.Quantity * b.UnitPrice) ASC
) bkt
FROM
'Order' a
JOIN
OrderDetail b
ON a.Id = b.OrderId
GROUP BY a.CustomerId
) t1
LEFT JOIN
Customer t2
ON t2.Id = t1.customerId
WHERE t1.bkt = 1;
MISSING_NAME|DUMO|1615.9
MISSING_NAME|OCEA|3460.2
MISSING_NAME|ANTO|7515.35
MISSING_NAME|QUEE|30226.1
Trail's Head Gourmet Provisioners|TRAIH|3874502.02
Blondesddsl père et fils|BLONP|3879728.69
Around the Horn|AROUT|4395636.28
Hungry Owl All-Night Grocers|HUNGO|4431457.1
Bon app|BONAP|4485708.49
Bólido Comidas preparadas|BOLID|4520121.88
Galería del gastrónomo|GALED|4533089.9
FISSA Fabrica Inter. Salchichas S.A.|FISSA|4554591.02
Maison Dewey|MAISD|4555931.37
Cactus Comidas para llevar|CACTU|4559046.87
Spécialités du monde|SPECD|4571764.89
Magazzini Alimentari Riuniti|MAGAA|4572382.35
Toms Spezialit?ten|TOMSP|4628403.36
Split Rail Beer & Ale|SPLIR|4641383.53
Santé Gourmet|SANTG|4647668.15
Morgenstern Gesundkost|MORGK|4676234.2
White Clover Markets|WHITC|4681531.74
La corne d'abondance|LACOR|4724494.22
Victuailles en stock|VICTE|4726476.33
Lonesome Pine Restaurant|LONEP|4735780.66
答案:结果正确。sql语句各有千秋,没有使用「WITH」语句来建立两个临时表。
WITH expenditures AS (
SELECT
IFNULL(c.CompanyName, 'MISSING_NAME') AS CompanyName,
o.CustomerId,
ROUND(SUM(od.Quantity * od.UnitPrice), 2) AS TotalCost
FROM 'Order' AS o
INNER JOIN OrderDetail od on od.OrderId = o.Id
LEFT JOIN Customer c on c.Id = o.CustomerId
GROUP BY o.CustomerId
),
quartiles AS (
SELECT *, NTILE(4) OVER (ORDER BY TotalCost ASC) AS ExpenditureQuartile
FROM expenditures
)
SELECT CompanyName, CustomerId, TotalCost
FROM quartiles
WHERE ExpenditureQuartile = 1
ORDER BY TotalCost ASC;
Q9 [15 POINTS] (Q9_YOUNGBLOOD):
Find the youngest employee serving each Region
. If a Region is not served by an employee, ignore it.
Details: Print the Region Description, First Name, Last Name, and Birth Date. Order by Region Id.
Your first row should look like Eastern|Steven|Buchanan|1987-03-04
xxxx -- 多表连接,只能嵌套连接
-- 多表连接,可以使用多次「JOIN」
SELECT a.RegionDescription, b.FirstName, b.LastName, MAX(b.BirthDate)
FROM
(
SELECT a.EmployeeId, r.RegionDescription, r.Id
FROM
(
SELECT et.EmployeeId, t.Id, t.RegionId
FROM
EmployeeTerritory et
JOIN
Territory t
ON et.TerritoryId = t.Id
) a
JOIN
Region r
ON a.RegionId = r.Id
) a
JOIN
Employee b
ON a.EmployeeId = b.Id
GROUP BY a.Id
ORDER BY a.Id;
Eastern|Steven|Buchanan|1987-03-04
Western|Michael|Suyama|1995-07-02
Northern|Anne|Dodsworth|1998-01-27
Southern|Janet|Leverling|1995-08-30
答案:结果正确。
SELECT RegionDescription, FirstName, LastName, bday
FROM
(
SELECT RegionId AS rid, MAX(Employee.Birthdate) AS bday
FROM Employee
INNER JOIN EmployeeTerritory ON Employee.Id = EmployeeTerritory.EmployeeId
INNER JOIN Territory ON TerritoryId = Territory.Id
GROUP BY RegionId
)
INNER JOIN (
SELECT FirstName, LastName, Birthdate, RegionId, EmployeeId
FROM Employee
INNER JOIN EmployeeTerritory ON Employee.Id = EmployeeTerritory.EmployeeId
INNER JOIN Territory ON TerritoryId = Territory.Id
)
ON Birthdate = bday AND rid = RegionId
INNER JOIN Region ON Region.Id = RegionId
GROUP BY EmployeeId
ORDER BY rid;
Q10 [15 POINTS] (Q10_CHRISTMAS):
Concatenate the ProductName
s ordered by the Company 'Queen Cozinha'
on 2014-12-25
.
Details: Order the products by Id (ascending). Print a single string containing all the dup names separated by commas like Mishi Kobe Niku, NuNuCa Nuß-Nougat-Creme...
Hint: You might find Recursive CTEs useful.
-- 递归查询的练习
WITH RECURSIVE res(id, name) AS (
SELECT 0, ''
UNION ALL
SELECT
id + 1,
name ||
(
SELECT a.ProductName
FROM
(
SELECT
b.Id,
b.ProductName,
ROW_NUMBER() OVER(
ORDER BY b.Id
) num
FROM
(
SELECT a.Id, b.ProductId
FROM
(
SELECT a.Id
FROM
'Order' a
JOIN
Customer b
ON a.CustomerId = b.Id
WHERE
b.CompanyName = 'Queen Cozinha'
AND
DATE(OrderDate) = DATE('2014-12-25')
) a
JOIN
OrderDetail b
ON a.Id = b.OrderId
) a
JOIN
Product b
ON a.ProductId = b.Id
ORDER BY b.Id
) a
WHERE a.num = res.id + 1
) || ', '
FROM res
WHERE res.id < 9
)
SELECT SUBSTR(name, 0, LENGTH(name) - 1)
FROM res
WHERE id = 9;
ProductId|ProductName|ROW_NUMBER
9|Mishi Kobe Niku|1
25|NuNuCa Nuß-Nougat-Creme|2
27|Schoggi Schokolade|3
32|Mascarpone Fabioli|4
34|Sasquatch Ale|5
40|Boston Crab Meat|6
51|Manjimup Dried Apples|7
74|Longlife Tofu|8
76|Lakkalikööri|9
Mishi Kobe Niku,NuNuCa Nuß-Nougat-Creme,Schoggi Schokolade,Mascarpone Fabioli,Sasquatch Ale,Boston Crab Meat,Manjimup Dried Apples,Longlife Tofu,Lakkalikööri
答案:结果错误,两个单词之间应该用逗号和空格隔开。sql语句就是坨屎。可以使用「WITH」语句将查询结果保存为一个临时表,这样方便多次调用查询。
with p as (
select Product.Id, Product.ProductName as name
from Product
inner join OrderDetail on Product.id = OrderDetail.ProductId
inner join 'Order' on 'Order'.Id = OrderDetail.OrderId
inner join Customer on CustomerId = Customer.Id
where DATE(OrderDate) = '2014-12-25' and CompanyName = 'Queen Cozinha'
group by Product.id
),
c as (
select row_number() over (order by p.id asc) as seqnum, p.name as name
from p
),
flattened as (
select seqnum, name as name
from c
where seqnum = 1
union all
select c.seqnum, f.name || ', ' || c.name
from c join
flattened f
on c.seqnum = f.seqnum + 1
)
select name from flattened
order by seqnum desc limit 1;
Mishi Kobe Niku, NuNuCa Nuß-Nougat-Creme, Schoggi Schokolade, Mascarpone Fabioli, Sasquatch Ale, Boston Crab Meat, Manjimup Dried Apples, Longlife Tofu, Lakkalikööri
总体评价
结果总体正确,有些地方没有注意细节。自己写的sql语句有点像幼儿园,看答案学到很多。
本文作者: MerickBao
本文链接: https://merickbao.top/post/%E6%95%B0%E6%8D%AE%E5%BA%93/CMU%E6%95%B0%E6%8D%AE%E5%BA%93%E7%B3%BB%E7%BB%9F%E4%BD%9C%E4%B8%9A%E8%87%AA%E6%9F%A5.html
版权声明: 本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!