The closer you look at something, the more interesting it gets. That’s probably why I love it when I’m able to pore through large amounts of data related to a topic that interests me. You keep twisting and turning it enough, and you’ll eventually start to uncover fascinating trends and insights that weren’t apparent on the surface.

That’s why I’m a big fan of BigQuery.

You can use it for your own private data, and I’ve worked at or with organizations that did, but there is also a wealth of public datasets available for you to dig into. I spend 90% of my time querying either the HTTP Archive or the Chrome User Experience Report (CrUX) data. Typically, I’m looking for performance trends, doing some level of competitive comparison, or digging through data for a company that doesn’t yet have a proper real-user monitoring (RUM) solution in place. But I’ve also dug into data from Libraries.io, GitHub, Stack Overflow, and World Bank, for example. There are more than enough public datasets to keep any curious individual busy for quite a while.

One of the great things about the BigQuery platform is how well it handles massive datasets and queries that can be incredibly computationally intensive. As someone who hasn’t had to write SQL as part of my job for at least 10 years, maybe longer, that raw power comes in handy as it makes up for my lack of efficient queries.

One thing that it doesn’t hide, though, is the cost. If you’re constantly querying BigQuery, you can end up with a pretty hefty bill fairly quickly.

Jeremy Wagner mentioned being concerned about this, and it’s something I was worried about when I first started playing around with it as well.

I’m no expert, but I do have a handful of tips for folks who want to start digging into these datasets on their own but are wary of racking up a big bill.

Don’t set up billing

If you’re just starting fresh, don’t even bother to set up billing yet. The free tier provides you with 1TB of query data each month. While it’s easier to burn through that than you might think, it’s also not a trivial amount.

If you stick with the free tier, then once you exhaust your limits BigQuery will refuse to execute your next query and tell you that you need to set up billing instead. It’s a safe way to play around, knowing that you aren’t going to be charged unless you explicitly decide to level up.

Set a budget

If you have moved beyond the free tier and your payment information is already set, then the next best thing you can do is use the different budgeting features BigQuery provides.

For each individual query, you can set a “maximum bytes billed” limit. If your query is going to exceed that limit, the query won’t run, and you won’t be charged. Instead, you’ll be told your limit is going to be exceeded. To run it successfully, you’d have to first up the budget or remove it entirely.

With a maximum bytes billed limit set on a query, the query will fail without charge if it will exceed that data limit.
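
The same cap can be applied outside the web console. Here is a minimal sketch that builds a `bq query` command with the `--maximum_bytes_billed` flag, assuming the `bq` command-line tool from the Google Cloud SDK is installed and authenticated. The helper name, the sample query, and the 1 GB cap are all illustrative choices of mine, not something from BigQuery itself.

```python
import shlex


def capped_query_command(sql: str, max_bytes: int) -> list[str]:
    """Build a `bq query` invocation that fails instead of billing past max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return [
        "bq", "query",
        "--use_legacy_sql=false",               # run as standard SQL
        f"--maximum_bytes_billed={max_bytes}",  # the per-query cap
        sql,
    ]


# Hypothetical query with an illustrative 1 GB cap.
command = capped_query_command("SELECT 1", 10**9)
print(shlex.join(command))  # shell-quoted form you could paste into a terminal
```

Building the command as a list keeps the SQL as a single argument, so it can be handed to `subprocess.run(command)` without worrying about shell quoting.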

You can also set a budget for the month as a whole. You can then set a few thresholds (BigQuery will default to 50%, 90%, and 100%), each of which will trigger an alert (like an email) warning you that they’ve been reached.

So, let’s say you set a monthly budget of $20. With alerts in place, you would be emailed as soon as you hit $10, again when you hit $18, and once more when you hit your $20 budget. With these in place, you can rest easy knowing you aren’t going to be surprised with an obnoxiously high bill.
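
The arithmetic behind those alerts is simple enough to sketch. The function below is purely illustrative (the name and signature are mine); it only multiplies a budget by the default threshold percentages mentioned above.

```python
def alert_amounts(budget: float, thresholds=(0.5, 0.9, 1.0)) -> list[float]:
    """Return the spend levels at which each threshold alert would fire."""
    return [round(budget * t, 2) for t in thresholds]


print(alert_amounts(20))  # [10.0, 18.0, 20.0]
```

Passing your own tuple of thresholds shows where custom alerts would land before you configure them.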

BigQuery lets you set a monthly budget, with different thresholds so you can be alerted as you get closer to using your budget.

Use BigQuery Mate for Query Cost Estimates

If you use Chrome, you can use BigQuery Mate to keep you informed of the anticipated cost of a query before you ever run it. BigQuery already tells you how much data you’re going to use in a given query. This extension adds the cost as well (something BigQuery should probably just do by default).

If you don’t use Chrome or don’t want to install the extension, you can also use Google’s cost calculator. It works, but it’s certainly a more manual and clunky process.
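
If you would rather do the math yourself, the estimate is just the data size multiplied by a per-terabyte rate. The sketch below assumes a rate of $5 per TiB, which is a placeholder of mine; check the current on-demand pricing before trusting the number. The 1TB free allowance comes from the free tier described earlier, and the 50 GiB query is hypothetical.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_cost(bytes_processed: int, price_per_tib: float = 5.0) -> float:
    """Rough on-demand cost in dollars; price_per_tib is an assumed rate."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must not be negative")
    return bytes_processed / TIB * price_per_tib


def share_of_free_tier(bytes_processed: int, free_bytes: int = TIB) -> float:
    """Fraction of the monthly free allowance that one query would consume."""
    return bytes_processed / free_bytes


# Hypothetical query that BigQuery reports will process 50 GiB.
query_bytes = 50 * 2 ** 30
print(f"~${estimate_cost(query_bytes):.2f}, "
      f"{share_of_free_tier(query_bytes):.1%} of the free tier")
```

Feed it the "this query will process..." figure that BigQuery shows before you run anything.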

Test against smaller tables first, if possible

Some datasets have numerous tables that represent the data, just sliced differently.

For example, the CrUX data is contained in one massive table, as well as broken up into smaller tables for each country’s traffic. The structure, however, is identical.

When I’m writing a new query against CrUX data, and I know it’s gonna take some tweaking to get it right, I’ll pick a country table to query against instead. That way I’m using less data on all my experiments. When I’ve got the query returning the data I’m after in the format I want, that’s when I’ll go back to the main table to query the aggregate data.
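
One way to make that switch painless is to keep the table name out of the query text. This is a sketch only: it assumes the public CrUX tables follow the `chrome-ux-report.all.YYYYMM` and `chrome-ux-report.country_xx.YYYYMM` naming pattern, and the helper, the month, and the sample query are hypothetical.

```python
from typing import Optional

# Illustrative query; only the table placeholder changes between runs.
QUERY_TEMPLATE = "SELECT COUNT(DISTINCT origin) AS origins FROM `{table}`"


def crux_table(month: str, country: Optional[str] = None) -> str:
    """Return a country table name while experimenting, or the full table."""
    dataset = f"country_{country.lower()}" if country else "all"
    return f"chrome-ux-report.{dataset}.{month}"


# Iterate against one country first, then switch to the aggregate table.
draft_query = QUERY_TEMPLATE.format(table=crux_table("201911", country="US"))
final_query = QUERY_TEMPLATE.format(table=crux_table("201911"))
print(draft_query)
print(final_query)
```

Because the structure of the tables is identical, the query body doesn’t need to change when you move from the draft to the final run.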

If there aren’t smaller tables, make them

For other datasets, the smaller tables don’t exist, but you can make your own.

For example, I was recently querying HTTP Archive data to find connections between JavaScript framework usage and performance metrics. Instead of running my queries against the main tables over and over, I ran a query to find all the URLs that were using one of the frameworks I was interested in. Then I grabbed all the data for those URLs and dumped it into a separate table.

From there, every query could be run against this table containing only the data that was relevant for what I was investigating. The impact was huge. One query which would have gone through 9.2GB of data had I queried CrUX directly instead ended up using only 826MB of data when I queried the subset I created.
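
A subset like that can be built with a single `CREATE TABLE ... AS SELECT` statement. The sketch below only assembles the SQL text; every table name, column name (`url`, `app`), and framework in it is an assumption for illustration and not necessarily what I ran, so adjust them to the dataset you are actually querying.

```python
def subset_table_sql(destination, pages_table, tech_table, frameworks):
    """Build a CREATE TABLE AS SELECT that keeps only pages using given frameworks."""
    for name in frameworks:
        if "'" in name:
            raise ValueError(f"unsupported framework name: {name!r}")
    quoted = ", ".join(f"'{name}'" for name in frameworks)
    return (
        f"CREATE TABLE `{destination}` AS\n"
        f"SELECT pages.*\n"
        f"FROM `{pages_table}` AS pages\n"
        f"WHERE pages.url IN (\n"
        f"  SELECT url FROM `{tech_table}` WHERE app IN ({quoted})\n"
        f")"
    )


# All names below are hypothetical placeholders.
sql = subset_table_sql(
    destination="my_project.scratch.framework_pages",
    pages_table="httparchive.summary_pages.2019_11_01_mobile",
    tech_table="httparchive.technologies.2019_11_01_mobile",
    frameworks=["React", "Vue.js"],
)
print(sql)
```

You pay for the full scan once when the subset is created; after that, every follow-up query reads only the smaller table, which is where a drop like 9.2GB to 826MB (roughly 90% less data) comes from.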

Plenty more, I’m sure

This is far from an exhaustive list of tips or advice, and I’m certain someone who spends more time than I do in BigQuery (or who actually knows what they’re doing in SQL) would have plenty more to add, but these have all been enough to make me really comfortable hopping into BigQuery whenever I think there might be something interesting to pull out.

Source: https://timkadlec.com/remembers/2019-12-10-using-bigquery-without-breaking-the-bank/
