In 2002, President Bush signed into law No Child Left Behind (NCLB) which was an education policy stating that all schools receiving public funding must administer an annual standardized assessment to their students. One of the stipulations of the law required that schools make adequate yearly progress (AYP) on standardized assessments year over year (i.e., third grade students taking an assessment in the current year would have had to perform better than third grade students in the previous year’s cohort). If schools were continuously unable to meet AYP requirements, there were drastic consequences including school restructuring and school closure. As such, many district administrators developed internal policies requiring that teachers increase their students’ test scores, using these scores as a metric for teacher quality. Eventually, with their jobs on the line, teachers began to “teach to the test.” In fact, a policy of this sort inadvertently incentivized cheating so that teachers and whole school systems could maintain necessary funding. One of the most prominent cases of alleged cheating was the Atlanta Public Schools cheating scandal.

2002年,布什总统签署了不让任何孩子落后》 (NCLB)法律,这是一项教育政策,规定所有接受公共资助的学校都必须对其学生进行年度标准化评估。 该法律的一项规定要求学校每年在标准化评估方面取得足够的年度进步(AYP)(即,在本年度接受评估的三年级学生的表现必须比上一年的三年级学生更好)。队列)。 如果学校持续无法满足AYP要求,则会产生严重后果,包括学校改组和停课。 因此,许多学区行政管理人员制定了内部政策,要求教师提高学生的考试成绩,并使用这些成绩作为衡量老师素质的指标。 最终,随着工作的进行,老师们开始“教考试” 。 实际上,这种政策无意中激励了作弊行为,因此教师和整个学校系统都可以维持必要的资金。 亚特兰大公立学校作弊丑闻是涉嫌作弊最突出的案例之一。

Unintended consequences of this sort are actually very common. Charles Goodhart, a British economist once said “When a measure becomes a target, it ceases to be a good measure.” This statement, known as Goodhart’s Law can actually be applied to a number of real-world scenarios beyond just social policies and economics.

这种意外后果实际上非常普遍。 英国经济学家查尔斯·古德哈特(Charles Goodhart)曾经说过:“当一项措施成为目标时,它就不再是一项好措施。” 这种被称为古德哈特定律的说法实际上可以应用于除社会政策和经济学之外的许多现实情况。

Another commonly cited example is that of a call center manager setting a target to increase the number of calls taken at the center each day. Eventually, call center employees increase their numbers at the cost of actual customer satisfaction. In observing employees’ conversations, the manager notices that some employees are rushing to end the call without ensuring that the customer is fully satisfied. This example, as well as the accountability measures of No Child Left Behind, stresses one of the most important elements of Goodhart’s Law — targets can and will be gamed.

另一个经常被引用的例子是呼叫中心经理设定一个目标,以增加每天在该中心接听的电话数量。 最终,呼叫中心员工以实际的客户满意度为代价增加了他们的人数。 在观察员工的对话时,经理注意到一些员工在不确保客户完全满意的情况下急于结束通话。 这个例子以及“不让任何一个孩子掉队”的问责措施强调了《古德哈特定律》最重要的要素之一-目标可以而且将会被游戏化。

商har法对数据科学的影响
Unsplash Unsplash》

The threat of gaming is much greater when considering how AI and machine-learning models may be susceptible to gaming and/or intrusion. A 2019 analysis of 84,695 videos from YouTube found that a video by Russia Today, a state-owned media outlet had been recommended by over 200 channels, far exceeding the number of recommendations that other videos on Youtube get, on average. The findings from the analysis were suggestive that Russia, in some way, gamed YouTube’s algorithm to propagate false information on the internet. The problem is further exacerbated by the platform’s reliance on viewership as a metric for user satisfaction. This created the unintended consequence of incentivizing conspiracy theories about the unreliability and dishonesty of major media institutions so that users would continue to source their information from YouTube.

考虑到AI和机器学习模型如何容易受到游戏和/或入侵的影响,游戏的威胁要大得多。 2019年对YouTube上的84,695部视频的分析发现,``俄罗斯今日'' (一家国有媒体)的视频已被200多个频道推荐,远远超过了YouTube上其他视频的平均推荐数量。 分析结果表明,俄罗斯在某种程度上利用YouTube的算法在互联网上传播虚假信息。 该平台对收视率作为用户满意度指标的依赖进一步加剧了该问题。 这引起了对主要媒体机构的不可靠性和不诚实行为的阴谋论激励,从而使用户继续从YouTube上获取信息。

“The question before us is the ethics of leading people down hateful rabbit holes full of misinformation and lies at scale just because it works to increase the time people spend on the site — and it does work” — Zeynep Tufekci

“摆在我们面前的问题是,引导人们走下充满错误信息的可恶兔子洞的道德观念,而且它之所以存在,只是因为它可以增加人们在网站上的停留时间,而且确实起作用。”- Zeynep Tufekci

那该怎么办呢? (So what can be done?)

In this vein, it’s important to think critically about how to effectively measure and achieve desired outcomes in a way that minimizes unintended consequences. A large part of this is not relying too heavily on a single metric. Rather, understanding how a combination of variables can influence a target variable or outcome could help to better contextualize data. Chris Wiggins, Chief Data Scientist at the New York Times, provides four useful steps for creating ethical computer algorithms to avoid harmful outcomes:

从这个角度出发,重要的是认真思考如何有效地衡量和实现期望的结果,从而最大程度地减少意外后果。 其中很大一部分不是过于依赖单个指标。 相反,了解变量组合如何影响目标变量或结果可以帮助更好地关联数据。 《纽约时报》首席数据科学家克里斯·威金斯( Chris Wiggins)提供了四个有用的步骤来创建合乎道德的计算机算法,从而避免有害的结果:

1. Start by defining your principles. I’d suggest [five in particular], which are informed by the collective research of the authors of the Belmont and Menlo reports on ethics in research, augmented by a concern for safety of the users of a product. The choice is important, as is the choice to define, in advance, the principles which guide your company, from the high level corporate goals to the individual product key performance indicators (KPIs) [or metrics].

1.首先定义您的原则。 我建议[尤其是五个],这是根据BelmontMenlo关于研究伦理的报告的作者的集体研究得出的,并增加了对产品使用者安全性的关注。 选择非常重要,因为选择要预先定义指导您的公司的原则,从高层公司目标到单个产品关键绩效指标(KPI)[或指标]。

2. Next: before optimizing a KPI, consider how this KPI would or would not align with your principles. Now document that and communicate, at least internally if not externally to users or simply online.

2.接下来:优化KPI之前,请考虑该KPI如何与您的原则保持一致。 现在记录下来并进行沟通,至少在内部(如果不是在外部)与用户交流或只是在线交流。

3. Next: monitor user experience, both quantitatively and qualitatively. Consider what unexpected user experiences you observe, and how, irrespective of whether your KPIs are improving, your principles are challenged.

3.下一步:定量定性监视用户体验。 考虑一下您观察到的意外用户体验,以及无论您的KPI是否在提高,您的原则都将受到挑战。

4. Repeat: these conflicts are opportunities to learn and grow as a company: how do we re-think our KPIs to align with our objectives and key results (OKRs), which should derive from our principles? If you find yourself saying that one of your metrics is the “de facto” goal, you’re doing it wrong.

4.重复:这些冲突是一个学习和成长为公司的机会:我们如何重新思考我们的KPI,以使其与我们的原则和目标和关键成果(OKR)保持一致? 如果您发现自己的指标之一是“事实上”的目标,那说明您做错了。

翻译自: https://towardsdatascience.com/on-the-implications-of-goodharts-law-for-data-science-8f4c5cd81d2e

相关文章:

  • 2021-10-16
  • 2021-06-12
  • 2022-12-23
  • 2022-12-23
  • 2021-09-29
  • 2021-11-25
  • 2021-11-06
  • 2021-07-23
猜你喜欢
  • 2022-12-23
  • 2021-07-28
  • 2022-12-23
  • 2021-09-14
  • 2021-08-12
  • 2021-08-10
  • 2021-09-17
相关资源
相似解决方案