SQL Homework #2

더기덕·2022년 10월 17일
0

1-d / 1-e (answer) / 1-f / 1-g
3-a / 3-e / 3-f

Part1

(a)

SELECT TS.trip_id, TS.user_id, CEIL(TIMESTAMPDIFF(SECOND,TS.time,TE.time)/60) AS trip_length
FROM trip_start TS 
LEFT OUTER JOIN trip_end TE
ON TS.trip_id = TE.trip_id
ORDER BY TS.trip_id ASC;

(b)

SELECT TS.trip_id, TS.user_id, (CASE WHEN TE.time IS NULL THEN 500 ELSE 1+CEIL(TIMESTAMPDIFF(SECOND,TS.time,TE.time)/60)*0.15 END) AS trip_charge
FROM trip_start TS 
LEFT OUTER JOIN trip_end TE
ON TS.trip_id = TE.trip_id
ORDER BY TS.trip_id ASC;

(c)

SELECT trip_id, user_id, (CASE WHEN trip_length IS NULL THEN 500 ELSE 1+trip_length*0.15 END) AS trip_charge
FROM 
(SELECT TS.trip_id, TS.user_id, CEIL(TIMESTAMPDIFF(SECOND,TS.time,TE.time)/60) AS trip_length
FROM trip_start TS 
LEFT OUTER JOIN trip_end TE
ON TS.trip_id = TE.trip_id
ORDER BY TS.trip_id ASC) AS trip_time ;

(d)

By using the subquery, we could only refer to the column name when using the case function.
Therefore the legibility of the SQL query is improved than putting all the nested formula inside the CASE clause.
This is especially important in corporate environments where many people have to collaborate over a query or a code.

(e)

# 1514.05
SELECT TS.user_id, SUM((CASE WHEN TE.time IS NULL THEN 500 ELSE 1+CEIL(TIMESTAMPDIFF(SECOND,TS.time,TE.time)/60)*0.15 END)) AS monthly_total
FROM trip_start TS 
LEFT OUTER JOIN trip_end TE
ON TS.trip_id = TE.trip_id
WHERE DATE_FORMAT(TS.time,'%Y-%m') = '2018-03'
GROUP BY TS.user_id
ORDER BY TS.user_id ASC;

(f)

1) When using a condition in the ON clause, it means that the database is going to join the columns which meets the conditions mentioned in the condition.
This means that the DB will try to select all the rows in the left table and carry out LEFT OUTER JOIN on the rows which has March for its date. This results in leaving out the columns which belongs to other dates than March in the results.
2) You can use the conditions in the ON clause when using INNER JOINs but for other joins, using the 'ON' clause for conditions might lead to unanticipated consequences.

(g)

We need to use self-join and non-equi join.

Let's say there are three columns named 'trip_id' / 'trip_time' (when the trip end or start happened) / 'trip_type' (ENUM values - 'trip_start' ,'trip_end') in table named 'trips'

We would use queries like this :

SELECT *

FROM trips T1

INNER JOIN trips T2 (self-join)

ON T1.trip_id = T2.trip_id

AND T1.trip_type = T2.trip_type (non-equi join)

Part 2

(a)

SELECT COUNT(*) AS total_flight_num
FROM flights AS F
INNER JOIN aircraft AS A
ON F.tail = A.tail
WHERE A.type = 'B73G' 
OR A.type = 'B738'

(b)

SELECT flightnum 
FROM flights AS F 
WHERE F.tail NOT IN (SELECT AA.tail FROM airtran_aircraft AA)

(c)

SELECT F1.tail, F1.from AS origin, F2.to AS final_destination
FROM flights AS F1
INNER JOIN flights AS F2 
ON F1.tail = F2.tail 
AND flights1.to = flights2.from
AND F1.departure < F2.departure

Part3

(a)

1) Assumption : highway and area columns are columns that are like ENUM columns that have set of fixed values.
However, there's one row in highway = 'SR20' and area = '1N THE NORTHERN CALFIORNIA'
-there's a typo. 1 is put in instead of I.
2) The composite key contains 4 columns, and this might not be a good practice as the datset is more prone to duplicate columns. However strictly speaking, this is not a major irregularity.

(b)

SELECT DISTINCT highway, area
FROM caltrans CT
WHERE (status LIKE '%SNOW%' OR '%WINTER%')
AND status LIKE '%CLOSED%'
ORDER BY highway DESC, area DESC;

(c)

# Assumption here: When calculating the percentage, the denominator for each highway and area is the number is the total number of days when there was a reporting. 
SELECT close_yn.highway, close_yn.area, SUM(CASE WHEN close_yn.close_num>=1 THEN 1 ELSE 0 END)/COUNT(*)*100 AS percentage
FROM 
(SELECT highway, area, DATE_FORMAT(reported, '%Y-%m-%d') AS day, SUM(CASE WHEN status LIKE '%CLOSED%' THEN 1 ELSE 0 END) AS close_num
FROM caltrans
GROUP BY highway, area, day) close_yn
WHERE CONCAT(highway,area) IN (SELECT DISTINCT CONCAT(highway, area)
FROM caltrans
WHERE status LIKE '%SNOW%'
AND status LIKE '%CLOSED%'
ORDER BY highway DESC, area DESC)
GROUP BY close_yn.highway, close_yn.area;

(d)

SELECT CASE WHEN reported  <= '2017-03-19' THEN 'winter'
WHEN  reported <= '2017-06-19' THEN 'spring' 
WHEN  reported <= '2017-09-21' THEN 'summer' 
WHEN  reported <= '2017-12-20' THEN 'fall' 
ELSE 'winter' 
END AS season, CT.highway, CT.area, COUNT(DISTINCT DATE_FORMAT(reported, '%Y-%m-%d')) AS days_without_restriction 
FROM caltrans CT 
WHERE status LIKE 'NO TRAFFIC RESTRICTIONS ARE REPORTED FOR THIS AREA.'
GROUP BY season, CT.highway, CT.area;

(e)

1) Text in Status column after FROM or TO is the more specific location where the reporting was done.
2) However aforementioned patterns do not apply to certain columns such as 'NO TRAFFIC RESTRICTIONS ARE REPORTED FOR THIS AREA.'.

(f)

SELECT REGEXP_REPLACE(REGEXP_REPLACE(status,'(?:IN THE)',''), '(?:&)',',')
FROM caltrans 

0개의 댓글