Cities with more customers than average (subquery): answers

【Question title】: Cities with more customer than average. Subquery
【Posted】: 2020-12-05 12:49:10
【Question description】:

I have three tables:

country_table: id int, country_name string
city_table: id int, city_name string, postal_code int, country_id int
customer_table: id int, customer_name string, city_id int, customer_address string

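I am looking for a query that returns every city that has more customers than the average number of customers per city. For each such city, it should return the country name, the city name, and the number of customers.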

The output should be:

country_name, city_name, count

I tried using a subquery but got an error:

Select country_name, city_name, count(customer_name)
from country
inner join city on city.country_id = country.id
inner join customer on customer.city_id = city.id
where customer_name > (select avg(customer_name) 
                       from customer 
                       inner join customer on customer.city_id = city.id group by id)
                       group by 1, 2

Any help is greatly appreciated.

【Question discussion】:

Which DBMS are you using? You have tagged several.
Please tag only one RDBMS.
I am using MySQL.

Tags: mysql sql subquery inner-join having-clause

【Solution 1】:

Select country.country_name, city.city_name, count(customer.customer_name)
from country
inner join city on city.country_id = country.id
inner join customer on customer.city_id = city.id
group by country.country_name, city.city_name
having count(customer.customer_name) > 
(
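  -- distinct is needed here: over the joined rows, count(city.city_name)
  -- would count one row per customer, which makes the ratio 1
  Select count(customer.customer_name) / count(distinct city.city_name) as avg_per_city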
  from city
  inner join customer on customer.city_id = city.id
)

The subquery can be shorter:

Select count(*) / count(distinct city_id) from customer

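【Discussion】:

COUNT() should not be used on a NULL-able column (unless that is intended, of course), because rows where the value is NULL are not counted. The OP did not say whether customer_name or city_name can be NULL. I agree it is unlikely. But as a general best practice, if the goal is to count rows, it is better to use count(*) to avoid this kind of mistake. Details: dev.mysql.com/doc/refman/5.7/en/…
I am well aware of that. What specifically is the problem in my query?
Your query uses count(specific column) rather than count(*). If the column passed to count is NULL-able and actually contains NULLs, count will not produce the expected result, which in this case is the number of rows. We do not know what this application is about, so it is a possibility, since the OP did not specify whether these columns are NOT NULL. The comment is really meant for people reading this, not as criticism of your query.
In this query of mine, if city_name is NULL, for example, why should it be counted as 1? Not counting NULLs is absolutely correct in this query!
Because there are customers assigned to that city. You end up counting the customers in that NULL-named city via count(customer.customer_name), but count(city.city_name) does not count the city itself, because it is NULL. That skews the average.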
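To make the NULL point in the comments above concrete, here is a small sketch. It is an assumption-laden illustration, not anyone's production setup: it uses Python's sqlite3 with an in-memory database as a stand-in for MySQL, and the sample rows (including a city with a NULL name) are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, city_name TEXT, country_id INTEGER);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT, city_id INTEGER);
    -- Hypothetical data: city 3 has no name recorded.
    INSERT INTO city VALUES (1, 'Berlin', 1), (2, 'Hamburg', 1), (3, NULL, 1);
    INSERT INTO customer VALUES
        (1, 'A', 1), (2, 'B', 1), (3, 'C', 2), (4, 'D', 3), (5, 'E', 3), (6, 'F', 3);
""")

# COUNT(*) counts rows; COUNT(column) skips rows where the column is NULL.
rows_total, rows_named = conn.execute("""
    SELECT COUNT(*), COUNT(city.city_name)
    FROM city
    INNER JOIN customer ON customer.city_id = city.id
""").fetchone()

# COUNT(DISTINCT column) also ignores NULL, so the unnamed city drops out
# of the denominator even though its customers are still counted.
named_cities, all_cities = conn.execute("""
    SELECT COUNT(DISTINCT city.city_name), COUNT(DISTINCT city.id)
    FROM city
    INNER JOIN customer ON customer.city_id = city.id
""").fetchone()

print(rows_total, rows_named)    # 6 3
print(named_cities, all_cities)  # 2 3
```

With this data the six customers would be averaged over two cities instead of three, which is the skew the last comment describes.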

【Solution 2】:

The join logic of your query looks fine, but the subquery needs fixing. I would write it as:

Select co.country_name, ci.city_name, count(*) no_customers
from city ci
inner join country co on co.id = ci.country_id
inner join customer cu on cu.city_id = ci.id
group by co.id, co.country_name, ci.id, ci.city_name
having count(*) > (
    select count(*) / count(distinct cu1.id) from customer cu1
)

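Notes:

There is no need to bring the city table into the subquery; you can get the information you need from the customer table alone
I added the primary key columns to the group by clause, to handle cities/countries that might share the same name (which does happen in real life)
Table aliases make the query easier to write and read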
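The note about grouping by primary key can be checked with a quick sketch. Assumptions: Python's sqlite3 in-memory database stands in for MySQL, and the two same-named cities are invented sample data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, city_name TEXT, country_id INTEGER);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT, city_id INTEGER);
    INSERT INTO country VALUES (1, 'USA');
    -- Two different cities that share a name (hypothetical data).
    INSERT INTO city VALUES (1, 'Springfield', 1), (2, 'Springfield', 1);
    INSERT INTO customer VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
""")

# Grouping by names only: both cities collapse into a single group.
by_name = conn.execute("""
    SELECT co.country_name, ci.city_name, COUNT(*)
    FROM city ci
    INNER JOIN country co ON co.id = ci.country_id
    INNER JOIN customer cu ON cu.city_id = ci.id
    GROUP BY co.country_name, ci.city_name
""").fetchall()

# Grouping by primary keys as well: each city keeps its own count.
by_key = conn.execute("""
    SELECT co.country_name, ci.city_name, COUNT(*)
    FROM city ci
    INNER JOIN country co ON co.id = ci.country_id
    INNER JOIN customer cu ON cu.city_id = ci.id
    GROUP BY co.id, co.country_name, ci.id, ci.city_name
    ORDER BY ci.id
""").fetchall()

print(by_name)  # [('USA', 'Springfield', 3)]
print(by_key)   # [('USA', 'Springfield', 2), ('USA', 'Springfield', 1)]
```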

【Discussion】:

【Solution 3】:

with table1 as
  (
    select country_id, city_name, count(city_name) no_of_customers
    from city inner join customer
    on city.id = customer.city_id
    group by city_name, country_id
  )
select country_name, city_name, no_of_customers
from table1 inner join country
on table1.country_id = country.id
where no_of_customers >
(
    select count(*)/count(distinct city_id) from customer
)
order by country_name;

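Instead of joining all three tables in one go, I used a CTE to join the city and customer tables and collect the relevant information, namely country_id, city_name and no_of_customers.
After that, all that remains is to get the country_name for each country_id and select the records where no_of_customers is greater than the average.
To do that, I joined the CTE with the country table to get country_name, then applied a where clause to filter for the required rows.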
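For anyone who wants to try the CTE approach on sample data, here is a rough sketch. Assumptions: Python's sqlite3 in-memory database stands in for MySQL (CTEs need MySQL 8.0 or later), the rows are invented, and `1.0 *` is added only because SQLite truncates integer division, whereas MySQL's `/` already returns a decimal.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, city_name TEXT, country_id INTEGER);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT, city_id INTEGER);
    INSERT INTO country VALUES (1, 'Germany'), (2, 'France');
    INSERT INTO city VALUES (1, 'Berlin', 1), (2, 'Hamburg', 1), (3, 'Paris', 2);
    -- Hypothetical data: Berlin 3 customers, Hamburg 1, Paris 3 -> average 7 / 3.
    INSERT INTO customer VALUES
        (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
        (4, 'D', 2),
        (5, 'E', 3), (6, 'F', 3), (7, 'G', 3);
""")

result = conn.execute("""
    WITH table1 AS (
        -- Step 1: customers per city, keeping country_id for the later join.
        SELECT country_id, city_name, COUNT(city_name) AS no_of_customers
        FROM city INNER JOIN customer ON city.id = customer.city_id
        GROUP BY city_name, country_id
    )
    -- Step 2: attach the country name and keep only above-average cities.
    SELECT country_name, city_name, no_of_customers
    FROM table1 INNER JOIN country ON table1.country_id = country.id
    WHERE no_of_customers >
        (SELECT 1.0 * COUNT(*) / COUNT(DISTINCT city_id) FROM customer)
    ORDER BY country_name
""").fetchall()

print(result)  # [('France', 'Paris', 3), ('Germany', 'Berlin', 3)]
```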

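【Discussion】:

Why is doing it this way, "instead of using all 3 tables at once", an optimization?
It is not an optimization; it is just another way of solving the problem.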

【Solution 4】:

I think we can get the same answer with a simpler query, without using any joins:

SELECT country.country_name, city.city_name, count(*) 
FROM country, city, customer 
where customer.city_id = city.id and city.country_id = country.id
group by customer.city_id
having count(*) > (select count(*) / count(distinct customer.city_id) from customer);

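Let me know if I have missed anything.

【Discussion】:

Without using any joins? What do you think FROM a, b WHERE a.id = b.id is, other than "how we did joins 30 years ago, before better syntax came along"?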
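The point of that comment can be verified directly: the comma-separated FROM list with the conditions in WHERE is still an inner join. A minimal sketch, assuming Python's sqlite3 in-memory database as a stand-in for MySQL and invented sample rows:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, city_name TEXT, country_id INTEGER);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT, city_id INTEGER);
    -- Hypothetical data.
    INSERT INTO country VALUES (1, 'Germany'), (2, 'France');
    INSERT INTO city VALUES (1, 'Berlin', 1), (2, 'Paris', 2);
    INSERT INTO customer VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 2);
""")

# Old-style implicit join: tables listed with commas, conditions in WHERE.
implicit = conn.execute("""
    SELECT country.country_name, city.city_name, customer.customer_name
    FROM country, city, customer
    WHERE customer.city_id = city.id AND city.country_id = country.id
    ORDER BY customer.id
""").fetchall()

# Explicit INNER JOIN syntax with the same conditions in ON.
explicit = conn.execute("""
    SELECT country.country_name, city.city_name, customer.customer_name
    FROM country
    INNER JOIN city ON city.country_id = country.id
    INNER JOIN customer ON customer.city_id = city.id
    ORDER BY customer.id
""").fetchall()

print(implicit == explicit)  # True
```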

【Solution 5】:

SELECT co.country_name, ci.city_name, count(*) no_customers
FROM city ci
inner join country co on co.id = ci.country_id
inner join customer cu on cu.city_id = ci.id
group by co.id, co.country_name, ci.id, ci.city_name
having count(*) > (select count(*) / count(distinct cu1.city_id) from customer cu1)

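The average calculation in Solution 2 is slightly off, because the question asks for the average number of customers per city, so the subquery should divide by count(distinct city_id) rather than count(distinct id).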
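A quick way to see why the denominator matters, sketched under a few assumptions: Python's sqlite3 in-memory database stands in for MySQL, the rows are invented, and `1.0 *` is there only because SQLite truncates integer division.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT, city_id INTEGER);
    -- Hypothetical data: 6 customers spread over 3 cities.
    INSERT INTO customer VALUES
        (1, 'A', 1), (2, 'B', 1), (3, 'C', 1), (4, 'D', 2), (5, 'E', 2), (6, 'F', 3);
""")

# Dividing by the number of distinct customer ids always yields 1,
# because id is unique per row. Dividing by distinct city ids gives
# the average number of customers per city.
per_customer_id, per_city = conn.execute("""
    SELECT 1.0 * COUNT(*) / COUNT(DISTINCT id),
           1.0 * COUNT(*) / COUNT(DISTINCT city_id)
    FROM customer
""").fetchone()

print(per_customer_id, per_city)  # 1.0 2.0
```

With a threshold of 1, every city that has at least two customers would pass the having filter, which is not what the question asks for.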

【Discussion】: