Week9_1Anomaly Detection

第 1 题

For which of the following problems would anomaly detection be a suitable algorithm?

  • Given an image of a face, determine whether or not it is the face of a particular famous individual.
  • Given a dataset of credit card transactions, identify unusual transactions to flag them as possibly fraudulent.
  • Given data from credit card transactions, classify each transaction according to type of purchase
    (for example: food, transportation, clothing).
  • From a large set of primary care patient records, identify individuals who might have unusual health conditions.

*     答案: 2 4 *
*   说明: anomaly detection 是通过高斯概率去判断的, 其中三西格玛准则:3σ 0.9974,超过3σ认为是不正常. *
*   选项1: 通过人脸的照片,去判断这个脸是不是famous. 要判断必须得有所有的famous的人,不在这些famous的人里面就是非famous的人,即famous的人符合二项分布(只有0和1),不符合高斯分布. 不正确 *
*   选项2: 给出刷卡记录来判断哪些是不正常的交易. 刷卡记录符合正态分布. 正确 *
*   选项3: 给出刷卡记录来区分哪些交易用于食物 交通 及衣服.不能用于多分类. 不正确 *
*   选项4: 给出病人的信息,区分哪些人有不寻常的病. 正常的病的概率+不正常的病的概率=100%, 并且正常与不正常的病不是事先知道,是统计完定个标准,例如认为在3σ内是正常的,在3σ外是不正常的. 正确 *


第 2 题

Suppose you have trained an anomaly detection system that flags anomalies when p(x) is less than ε, and you find on the cross-validation set that it has too many false positives (flagging too many things as anomalies). What should you do?

  • Increase ε
  • Decrease ε

*     答案: 2 *
Week9_1Anomaly Detection
* 如上图所示, 阴影部分是正常的, 非阴影部分是不正常的,现在不正常的太多了,要增大阴影部分的面积怎么办? 不就是向左称动ε嘛,即减小ε *


第 3 题

Suppose you are developing an anomaly detection system to catch manufacturing defects in airplane engines. You model uses
p(x)=j=1np(xj;μj,σj2).
You have two features x1 = vibration intensity, and x2 = heat generated.
Both x1 and x2 take on values between 0 and 1 (and are strictly greater than 0),
and for most “normal” engines you expect that x1x2. One of the suspected anomalies is that a flawed
engine may vibrate very intensely even without generating much heat (large x1, small x2),
even though the particular values of x1 and x2 may not fall outside their typical ranges of values.
What additional feature x3 should you create to capture these types of anomalies:

  • x3=x12×x2
  • x3=x1x2
  • x3=x1+x2
  • x3=x1×x2

*     答案: 2 *
**    用正态分布去判断一个飞机引擎是正常的还是不正常的,现在选取了两个特征:振动强度
和产生的发热量,这两个特征的概率都在0-1之间,并且没有0值;对于正常的引擎来说,振动强度与产生的发热量是正相关的;但对于不正常的引擎振动强度很大但是发热量越很小,虽然不正常但是引擎振动强度也没有超出正常范围
**
*    关键点是: 正常时震动小发热量小;异常时震动小发热量大,所以出异常时这个比例值将会非常大,所以选2 *
*    关键点是: 不正常状况下:引擎振动强度也没有超出正常范围,所以用加乘都看不出异常来. *


第 4 题

Which of the following are true? Check all that apply.

  • If you do not have any labeled data (or if all your data has label y=0), then is is still possible to learn p(x), but it may be harder to evaluate the system or choose a good value of ϵ.
  • If you have a large labeled training set with many positive examples and many negative examples, the anomaly detection algorithm will likely perform just as well as a supervised learning algorithm such as an SVM.
  • If you are developing an anomaly detection system, there is no way to make use of labeled data to improve your system.
  • When choosing features for an anomaly detection system, it is a good idea to look for features that take on unusually large or small values for (mainly the) anomalous examples.
    *     答案: 1 4 *
    * *

第 5 题

You have a 1-D dataset {x(1),,x(m)} and you want to detect outliers in the dataset. You first plot the dataset and it looks like this:
Week9_1Anomaly Detection
Suppose you fit the gaussian distribution parameters μ1 and σ12 to this dataset. Which of the following values for μ1 and σ12 might you get?

  • μ1=3,σ12=4
  • μ1=6,σ12=4
  • μ1=3,σ12=2
  • μ1=6,σ12=2

*     答案: 1 *
* 从图中可以看出μ1=3因为在-3处概率密度最大,感觉图画得不标准,但概率密度总比-6处大,所以μ1选-3 *
* 再看σ1: 1个σ的概率0.6826,2个是0.9544,3个是0.9974, 图中点的分布是在[-9, 2]之间 *
* 3个是0.9974,, 当σ1取2时正好包括所有的点 *
* 3个是0.9974,图中的分布是在[-9, 2]之间, 当σ12=1.414时约在[-7.3, 1.3]之间则左右都会漏好几个点,则达不到0.9974的概率 *

相关文章:

  • 2021-09-16
  • 2021-07-26
  • 2021-07-07
  • 2021-12-02
  • 2021-11-22
  • 2022-12-23
  • 2022-12-23
猜你喜欢
  • 2022-12-23
  • 2021-12-02
  • 2021-05-25
  • 2021-10-11
  • 2021-09-09
  • 2021-10-14
  • 2021-06-30
相关资源
相似解决方案