获取date等于某一天的第一个id sql_leetcode(sql)之经典困难题合集

1.市场分析---用户行为,定义第二次

Users table:
+---------+------------+----------------+
| user_id | join_date  | favorite_brand |
+---------+------------+----------------+
| 1       | 2019-01-01 | Lenovo         |
| 2       | 2019-02-09 | Samsung        |
| 3       | 2019-01-19 | LG             |
| 4       | 2019-05-21 | HP             |
+---------+------------+----------------+

Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2019-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2019-08-04 | 1       | 4        | 2         |
| 5        | 2019-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |
+----------+------------+---------+----------+-----------+

Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+

写一个 SQL 查询确定每一个用户按日期顺序卖出的第二件商品的品牌是否是他们最喜爱的品牌。如果一个用户卖出少于两件商品，查询的结果是 no 。题目保证没有一个用户在一天中卖出超过一件商品

Result table:
+-----------+--------------------+
| seller_id | 2nd_item_fav_brand |
+-----------+--------------------+
| 1         | no                 |
| 2         | yes                |
| 3         | yes                |
| 4         | no                 |
+-----------+--------------------+

id 为 1 的用户的查询结果是 no，因为他什么也没有卖出
id为 2 和 3 的用户的查询结果是 yes，因为他们卖出的第二件商品的品牌是他们自己最喜爱的品牌
id为 4 的用户的查询结果是 no，因为他卖出的第二件商品的品牌不是他最喜爱的品牌

题目的难点在于:如何确定是第二次购买,而不是第三次、第四次。。。

Orders表作为一个中间表,其seller_id与Users表的user_id相连,item_id与Items表的item_id相连

解法一:oracle,用窗口函数获得,只能用case when,而不能用if

select u.user_id as "seller_id",
    (case when b.item_brand=u.favorite_brand then 'yes' else 'no' end)  "2nd_item_fav_brand"
from Users u
left join 
(select a.* from 
(select o.seller_id,i.item_brand,
row_number()over(partition by o.seller_id order by o.order_date)rk
from Orders o inner join Items i 
on o.item_id=i.item_id)a
where a.rk=2)b
on u.user_id=b.seller_id 
order by b.seller_id

解法二:sql,利用自连接来获取排名,但是这个方法很巧妙,和平时获取排名的情况不一样

select user_id seller_id, if(favorite_brand = item_brand, 'yes', 'no') 2nd_item_fav_brand
from users left join (
    select o1.seller_id, item_brand
    from orders o1 
    join orders o2
    on o1.seller_id = o2.seller_id
    join items i
    on o1.item_id = i.item_id
    group by o1.order_id
    having sum(o1.order_date > o2.order_date) = 1
) tmp
on user_id = seller_id

这里用到的是这个条件having sum(o1.order_date > o2.order_date) = 1,只要比当前的日期更早的订单数为1,那么就能保证是第二次卖出,而不是常规的做法count和min的结合或者是除了min以为的min.这里关于第二次卖出也可以这样写:

select a.seller_id, a.item_id
from orders a 
where (
    select count(*)
    from orders b 
    where a.seller_id = b.seller_id
    and a.order_date > b.order_date
) = 1

计数有一个

解法三:用变量

select u.user_id as seller_id, ifnull(t3.ind, "no") as 2nd_item_fav_brand
from Users as u
left join
(
    select t1.seller_id, if(@user = seller_id, @cnt := @cnt + 1, @cnt := 1) as tag, if(fb = ib, "yes", "no") as ind, (@user := seller_id) as uid
    from 
    (   
        select o.order_date, o.item_id, o.seller_id, u.favorite_brand as fb, i.item_brand as ib
        from Orders o, Users u, Items i
        where o.item_id = i.item_id and o.seller_id = u.user_id
        order by seller_id, order_date
    ) as t1, (select @cnt := 1, @user := 0) as t2
) as t3
on u.user_id = t3.seller_id and t3.tag = 2;

2.获取最近第二次的活动---第二次行为

UserActivity 表:
+------------+--------------+-------------+-------------+
| username   | activity     | startDate   | endDate     |
+------------+--------------+-------------+-------------+
| Alice      | Travel       | 2020-02-12  | 2020-02-20  |
| Alice      | Dancing      | 2020-02-21  | 2020-02-23  |
| Alice      | Travel       | 2020-02-24  | 2020-02-28  |
| Bob        | Travel       | 2020-02-11  | 2020-02-18  |
+------------+--------------+-------------+-------------+

写一条SQL查询展示每一位用户最近第二次的活动(如果用户仅有一次活动，返回该活动)

一个用户不能同时进行超过一项活动，以任意顺序返回结果

Result 表:
+------------+--------------+-------------+-------------+
| username   | activity     | startDate   | endDate     |
+------------+--------------+-------------+-------------+
| Alice      | Dancing      | 2020-02-21  | 2020-02-23  |
| Bob        | Travel       | 2020-02-11  | 2020-02-18  |
+------------+--------------+-------------+-------------+

Alice 最近第二次的活动是从 2020-02-24 到 2020-02-28 的旅行, 在此之前的 2020-02-21 到 2020-02-23 她进行了舞蹈
Bob 只有一条记录，我们就取这条记录

思路和上一道题一样,都是求第二次,不过不一样的是它还有一种情况那就是只有一次的话,返回该条记录

解法一:自连接,在having条件中表示两种情况,和上一题的解法二类似

select b.*  
from UserActivity a join UserActivity b
on a.username=b.username
group by b.username,b.startDate
having sum(a.startDate>b.startDate)=1 or count(1)=1

解法二:在where条件里列出两种情况,同时借助外面的表来完成,和上一题很类似

select a.*
from UserActivity a
where 
(select count(1)
from UserActivity b
where a.username=b.username
and (b.startDate>a.startDate))=1
or
(select count(1)
from UserActivity 
where a.username=username)=1

解法三:变量

select username, activity, startDate, endDate 
from
(select useractivity.*, @rk:=if(@name=username, @rk+1, 1)rk, @name:=username
from useractivity, (select @name:=null, @rk:=0) t1
order by username, startDate desc) t2
where username in 
(select username
from useractivity
group by username
having count(*) = 1) or t2.rk = 2

其实先做这道题比较好

3.用户购买平台---用户行为,出现原来没有的取值

Spending table:
+---------+------------+----------+--------+
| user_id | spend_date | platform | amount |
+---------+------------+----------+--------+
| 1       | 2019-07-01 | mobile   | 100    |
| 1       | 2019-07-01 | desktop  | 100    |
| 2       | 2019-07-01 | mobile   | 100    |
| 2       | 2019-07-02 | mobile   | 100    |
| 3       | 2019-07-01 | desktop  | 100    |
| 3       | 2019-07-02 | desktop  | 100    |
+---------+------------+----------+--------+

写一段 SQL 来查找每天仅使用手机端用户、仅使用桌面端用户和同时使用桌面端和手机端的用户人数和总支出金额。

Result table:
+------------+----------+--------------+-------------+
| spend_date | platform | total_amount | total_users |
+------------+----------+--------------+-------------+
| 2019-07-01 | desktop  | 100          | 1           |
| 2019-07-01 | mobile   | 100          | 1           |
| 2019-07-01 | both     | 200          | 1           |
| 2019-07-02 | desktop  | 100          | 1           |
| 2019-07-02 | mobile   | 100          | 1           |
| 2019-07-02 | both     | 0            | 0           |
+------------+----------+--------------+-------------+ 
在 2019-07-01, 用户1 同时 使用桌面端和手机端购买, 用户2 仅 使用了手机端购买，而用户3 仅 使用了桌面端购买。
在 2019-07-02, 用户2 仅 使用了手机端购买, 用户3 仅 使用了桌面端购买，且没有用户 同时 使用桌面端和手机端购买。

从结果集中可看出多了所谓的同时'both'情况,难点在于:如何加入一项之前没有的取值

添加表中不存在的列并设列的固定值为:select id,'value'extra from table ,value是添加的固定值,extra为列名.原本为desktop或者mobile的还是保持不变,而这多种情况需要用union来合并

select t2.spend_date, t2.platform, 
ifnull(sum(amount),0) total_amount, ifnull(count(user_id),0) total_users
from
(select distinct spend_date, "desktop" as platform from Spending
union
select distinct spend_date, "mobile" as platform from Spending
union
select distinct spend_date, "both" as platform from Spending 
) t2
left join
(select spend_date, sum(amount) amount, user_id, 
case when count(*) = 1 then platform else "both" end as platform
from Spending 
group by spend_date, user_id) t1
on t1.spend_date = t2.spend_date
and t1.platform = t2. platform
group by t2.spend_date, t2.platform

both也就是当以spend_date和user_id分组的时候,计数不为1的情况

4.报告系统状态的连续日期---连续区间的开始和结束

Failed table:
+-------------------+
| fail_date         |
+-------------------+
| 2018-12-28        |
| 2018-12-29        |
| 2019-01-04        |
| 2019-01-05        |
+-------------------+

Succeeded table:
+-------------------+
| success_date      |
+-------------------+
| 2018-12-30        |
| 2018-12-31        |
| 2019-01-01        |
| 2019-01-02        |
| 2019-01-03        |
| 2019-01-06        |
+-------------------+

系统每天运行一个任务。每个任务都独立于先前的任务。任务的状态可以是失败或是成功。

编写一个 SQL 查询2019-01-01到 2019-12-31期间任务连续同状态period_state的起止日期（start_date 和 end_date).即如果任务失败了，就是失败状态的起止日期，如果任务成功了，就是成功状态的起止日期。最后结果按照起始日期 start_date 排序

Result table:
+--------------+--------------+--------------+
| period_state | start_date   | end_date     |
+--------------+--------------+--------------+
| succeeded    | 2019-01-01   | 2019-01-03   |
| failed       | 2019-01-04   | 2019-01-05   |
| succeeded    | 2019-01-06   | 2019-01-06   |
+--------------+--------------+--------------+

结果忽略了 2018 年的记录，因为我们只关心从 2019-01-01 到 2019-12-31 的记录
从 2019-01-01 到 2019-01-03 所有任务成功，系统状态为 "succeeded"。
从 2019-01-04 到 2019-01-05 所有任务失败，系统状态为 "failed"。
从 2019-01-06 到 2019-01-06 所有任务成功，系统状态为 "succeeded"。

难点:如何记录每段时间对应的状态,怎样才能判断下一个时间的状态发生了变化

select period_state,
       min(task_date) as start_date,
       max(task_date) as end_date
from
(
    select period_state,task_date,
       case when datediff(task_date,@date) = 1 and @state = period_state then @num := @num
       else @num := @num + 1
       end num,
       @date := task_date,
       @state := period_state
       from
       (
           select 'succeeded' as period_state,success_date as task_date
           from Succeeded
           where success_date between '2019-01-01' and '2019-12-31'
           union all
           select 'failed' as period_state,fail_date as task_date
           from Failed
           where fail_date between '2019-01-01' and '2019-12-31'
           order by task_date
        ) t1,(select @date := null,@num := 0,@state := null) t2
)t3
group by period_state,num
order by start_date

state记录失败还是成功,然后随着时间的推移看状态有没有变化,如果没变,那么num还是那个标签,否则就会改变,那么就相应开启了下一个记录,变量都是和子查询来匹配的

5.学生地理信息报告---透视表

一所美国大学有来自亚洲、欧洲和美洲的学生，他们的地理信息存放在如下 student 表中。
| name   | continent |
|--------|-----------|
| Jack   | America   |
| Pascal | Europe    |
| Xi     | Asia      |
| Jane   | America   |

写一个查询语句实现对大洲（continent）列的透视表操作，使得每个学生按照姓名的字母顺序依次排列在对应的大洲下面。输出的标题应依次为美洲（America）、亚洲（Asia）和欧洲（Europe）。数据保证来自美洲的学生不少于来自亚洲或者欧洲的学生。

| America | Asia | Europe |
|---------|------|--------|
| Jack    | Xi   | Pascal |
| Jane    |      |        |

行转列,详见

Lotus：Leetcode经典SQL刷题笔记(六)-如何在SQL中实现数据透视表zhuanlan.zhihu.com

6.部门工资前三高的所有员工

Employee 表包含所有员工信息，每个员工有其对应的工号 Id，姓名 Name，工资 Salary 和部门编号 DepartmentId 。

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 85000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
| 7  | Will  | 70000  | 1            |
+----+-------+--------+--------------+

Department 表包含公司所有部门的信息。

+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+

编写一个 SQL 查询，找出每个部门获得前三高工资的所有员工。

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Randy    | 85000  |
| IT         | Joe      | 85000  |
| IT         | Will     | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+
IT 部门中，Max 获得了最高的工资，Randy 和 Joe 都拿到了第二高的工资，Will 的工资排第三。
销售部门（Sales）只有两名员工，Henry 的工资最高，Sam 的工资排第二。

难点:这个前三高,并不是说人数,而是说工资,可能会出现并列的情况

解法一:用in

select d.Name as Department,e.Name as Employee,e.Salary as Salary
from Employee as e inner join Department as d 
on e.DepartmentId = d.Id
where e.Id in
(
    select e1.Id
    from Employee as e1 left join Employee as e2
    on e1.DepartmentId = e2.DepartmentId and e1.Salary < e2.Salary
    group by e1.Id
    having count(distinct e2.Salary) <= 2
)
order by d.Id asc,e.Salary desc

注意是count(distinct e2.Salary)一定要去重,e1.Salary < e2.Salary,如果是<=,那么就是<=3了

解法二:新建一张表,然后将筛选条件放在外面

SELECT d.Name Department, e.Name Employee, e.Salary
FROM Employee e, Department d,
 (SELECT e1.Id, COUNT(DISTINCT e2.Salary) AS rank
  FROM Employee e1, Employee e2
  WHERE e1.DepartmentId = e2.DepartmentId
    AND e1.Salary <= e2.Salary
  GROUP BY e1.Id) AS r
WHERE r.rank <= 3
  AND e.DepartmentId = d.Id
  AND r.Id = e.Id
ORDER BY Department, Salary DESC;

解法三:用变量

select t3.name department, t2.name employee, t2.salary
from 
(select departmentid, name, salary, 
@rk:=if(@preDepartmentId=departmentId,if(@preSalary=salary, @rk, @rk+1 ),1) rk,
@preDepartmentId :=departmentId, @preSalary:=salary
from 
employee, (select @preDepartmentId:=null, @preSalary:= null, @rk:=0) t1
order by departmentId, salary desc) t2, department t3
where
t2.departmentid = t3.id and 
rk <= 3

解法四用dense_rank

7.游戏玩法分析---次日留存率

Activity 表：
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-01 | 0            |
| 3         | 4         | 2016-07-03 | 5            |
+-----------+-----------+------------+--------------+

编写一个 SQL 查询，查询每个首次登录日期、当天登录的玩家数量和次日留存率.

Result 表：
+------------+----------+----------------+
| install_dt | installs | Day1_retention |
+------------+----------+----------------+
| 2016-03-01 | 2 | 0.50 |
| 2017-06-25 | 1 | 0.00 |
+------------+----------+----------------+
玩家1和3在2016-03-01登录了游戏,但只有玩家1在 2016-03-02重新登录,所以2016-03-01的第一天保留时间是 1/2=0.50
玩家2在2017-06-25 登录了游戏,但在2017-06-26没有重新登录,因此2017-06-25的第一天保留为0/1=0.00

相当于把前一道题按照时间维度拆开,看每个首次登录时间下的比例情况,在03-01上为1/2,06-25上为0/1,正好合起来的话为(1+0)/(2+1)=1/3

select a.first_login as install_dt,count(1) as installs,
round(ifnull(count(b.event_date)/count(1),0),2) as Day1_retention
from
(select player_id,min(event_date) as first_login
from Activity
group by player_id)a
left join Activity b
on a.player_id=b.player_id and datediff(b.event_date,a.first_login)=1
group by a.first_login

注意有几点要注意:

(1)left join而不是inner join,如果用后者,由于条件datediff的原因,只能显示出2016-03-01的1玩家,因为只有它在第二天也登录了,但是并不满足题意

(2)datediff(b.event_date,a.first_login)=1不能写在where语句中,原因同上.

其他的详见

红酥手：leetcode(sql)之游戏玩法zhuanlan.zhihu.com

8.按年度列出销售总额---日期截断

Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 1          | LC Phone     |
| 2          | LC T-Shirt   |
| 3          | LC Keychain  |
+------------+--------------+

Sales table:
+------------+--------------+-------------+---------------------+
| product_id | period_start | period_end  | average_daily_sales |
+------------+--------------+-------------+---------------------+
| 1          | 2019-01-25   | 2019-02-28  | 100                 |
| 2          | 2018-12-01   | 2020-01-01  | 10                  |
| 3          | 2019-12-01   | 2020-01-31  | 1                   |
+------------+--------------+-------------+---------------------+

编写一段SQL查询每个产品每年的总销售额，并包含 product_id, product_name 以及 report_year 等信息。销售年份的日期介于 2018 年到 2020 年之间。你返回的结果需要按 product_id 和 report_year 排序

Result table:
+------------+--------------+-------------+--------------+
| product_id | product_name | report_year | total_amount |
+------------+--------------+-------------+--------------+
| 1          | LC Phone     |    2019     | 3500         |
| 2          | LC T-Shirt   |    2018     | 310          |
| 2          | LC T-Shirt   |    2019     | 3650         |
| 2          | LC T-Shirt   |    2020     | 10           |
| 3          | LC Keychain  |    2019     | 31           |
| 3          | LC Keychain  |    2020     | 31           |
+------------+--------------+-------------+--------------+
LC Phone 在 2019-01-25 至 2019-02-28 期间销售，该产品销售时间总计35天。销售总额 35*100 = 3500。
LC T-shirt 在 2018-12-01 至 2020-01-01 期间销售，该产品在2018年、2019年、2020年的销售时间分别是31天、365天、1天，2018年、2019年、2020年的销售总额分别是31*10=310、365*10=3650、1*10=10。
LC Keychain 在 2019-12-01 至 2020-01-31 期间销售，该产品在2019年、2020年的销售时间分别是：31天、31天，2019年、2020年的销售总额分别是31*1=31、31*1=31。

难点:如何对日期作相应的截断

解法一:巧用if和datediff作截断

select t.product_id,product_name,report_year,sum(total_amount)total_amount from (
select product_id,"2020" report_year,(datediff(if(period_end<"2021-01-01",period_end,date("2020-12-31")),if(period_start>"2020-01-01",period_start,date("2020-01-01")))+1)*average_daily_sales total_amount from Sales having total_amount>0 
union all
select product_id,"2019" report_year,(datediff(if(period_end<"2020-01-01",period_end,date("2019-12-31")),if(period_start>"2019-01-01",period_start,date("2019-01-01")))+1)*average_daily_sales total_amount from Sales having total_amount>0  
union all
select product_id,"2018" report_year,(datediff(if(period_end<"2019-01-01",period_end,date("2018-12-31")),if(period_start>"2018-01-01",period_start,date("2018-01-01")))+1)*average_daily_sales total_amount from Sales having total_amount>0  
)t left join product p on p.product_id=t.product_id                               
group by product_id,report_year order by product_id,report_year

解法二

select 
    s.PRODUCT_ID, PRODUCT_NAME, date_format(bound, '%Y') REPORT_YEAR,
    (datediff(
        if(bound < period_end, bound, period_end), 
        if(makedate(year(bound), 1) > period_start, makedate(year(bound), 1), period_start)
    ) + 1) * average_daily_sales TOTAL_AMOUNT
from product p join (
    select '2018-12-31' bound
    union all
    select '2019-12-31' bound
    union all
    select '2020-12-31' bound
) bounds join sales s
on 
    p.product_id = s.product_id 
    and year(bound) between year(period_start) and year(period_end)
order by s.product_id, report_year