【发布时间】:2017-06-09 04:03:50
【问题描述】:
由于现在的代码,我可以分析 DataSet 并根据存储在 LowConf 和 HighConf 中的值检测异常值,这些值基于 DataSet 的大小。但是,我只能确定其中的一个异常值。我想扩展或添加代码,以便执行以下操作:
- 首先确定 MOST 外围元素
- 判断它是否是异常值,如果是,则从DataSet中删除该元素,如果不是,则结束程序。
- 更改样本大小(减 1)并再次分析 DataSet 以查找下一个 MOST 离群元素并循环 k 次。
我的想法是创建一个新数组,它是每个 DataSet 元素与平均值的距离(绝对值),然后找到它的 UBound 并在下面的 If 语句中分析它。我的问题是,一旦我确定它是否是异常值,我如何返回 DataSet 中的相应元素并删除它?如果是这样,有没有更简单的方法来解决这个问题?另外,我是编码新手,所以任何关于可接受格式/如何清理代码的提示也值得赞赏。
Sub CalculateOutliers()
Dim n As Integer
Dim mean As Double
Dim SD As Double
Dim k As Integer
Dim DataSet As Variant
Dim LowConf As Single
Dim HighConf As Single
'--------------------------------------------------------
DataSet = Selection.Value
'Copies highlighted data into DataSet variable
'Cell A1 is (1,1) Because it starts at 0 which is out of range
'--------------------------------------------------------
'--------------------------------------------------------
n = Selection.CountLarge
'Counts number of entries
'--------------------------------------------------------
'--------------------------------------------------------
'DEFINES 95(LowConf) AND 99(HighConf) PERCENT CONFIDENCES BASED ON
'SAMPLE SIZE
If n <= 5 Then
LowConf = 1.72
HighConf = 1.76
End If
If n = 6 Then
LowConf = 1.89
HighConf = 1.97
End If
If n = 7 Then
LowConf = 2.02
HighConf = 2.14
End If
If n = 8 Then
LowConf = 2.13
HighConf = 2.28
End If
If n = 9 Then
LowConf = 2.21
HighConf = 2.39
End If
If n = 10 Then
LowConf = 2.29
HighConf = 2.48
End If
If n = 11 Then
LowConf = 2.36
HighConf = 2.56
End If
If n = 12 Then
LowConf = 2.41
HighConf = 2.64
End If
If n = 13 Then
LowConf = 2.46
HighConf = 2.7
End If
If n = 14 Then
LowConf = 2.51
HighConf = 2.75
End If
If n = 15 Then
LowConf = 2.55
HighConf = 2.81
End If
If n = 16 Then
LowConf = 2.59
HighConf = 2.85
End If
If n = 17 Then
LowConf = 2.62
HighConf = 2.9
End If
If n = 18 Then
LowConf = 2.65
HighConf = 2.93
End If
If n = 19 Then
LowConf = 2.68
HighConf = 2.97
End If
If n = 20 Then
LowConf = 2.71
HighConf = 3
End If
If n = 21 Then
LowConf = 2.73
HighConf = 3.03
End If
If n = 22 Then
LowConf = 2.76
HighConf = 3.06
End If
If n = 23 Then
LowConf = 2.78
HighConf = 3.08
End If
If n = 24 Then
LowConf = 2.8
HighConf = 3.11
End If
If n = 25 Then
LowConf = 2.82
HighConf = 3.14
End If
If n = 26 Then
LowConf = 2.84
HighConf = 3.16
End If
If n = 27 Then
LowConf = 2.86
HighConf = 3.18
End If
If n = 28 Then
LowConf = 2.88
HighConf = 3.2
End If
If n = 29 Then
LowConf = 2.89
HighConf = 3.22
End If
If n = 30 Then
LowConf = 2.91
HighConf = 3.24
End If
If n <= 35 And n > 30 Then
LowConf = 2.98
HighConf = 3.32
End If
If n <= 40 And n > 35 Then
LowConf = 3.04
HighConf = 3.38
End If
If n <= 45 And n > 40 Then
LowConf = 3.09
HighConf = 3.44
End If
If n <= 50 And n > 45 Then
LowConf = 3.13
HighConf = 3.48
End If
If n <= 60 And n > 50 Then
LowConf = 3.2
HighConf = 3.56
End If
If n <= 70 And n > 60 Then
LowConf = 3.26
HighConf = 3.62
End If
If n <= 80 And n > 70 Then
LowConf = 3.31
HighConf = 3.67
End If
If n <= 90 And n > 80 Then
LowConf = 3.35
HighConf = 3.72
End If
If n <= 100 And n > 90 Then
LowConf = 3.38
HighConf = 3.75
End If
If n <= 150 And n > 100 Then
LowConf = 3.52
HighConf = 3.89
End If
If n <= 200 And n > 150 Then
LowConf = 3.61
HighConf = 3.98
End If
If n <= 300 And n > 200 Then
LowConf = 3.72
HighConf = 4.09
End If
If n <= 400 And n > 300 Then
LowConf = 3.8
HighConf = 4.17
End If
If n <= 500 And n > 400 Then
LowConf = 3.86
HighConf = 4.32
End If
If n > 500 Then
MsgBox "Sample size cannot exceed 500."
End If
'--------------------------------------------------------
'--------------------------------------------------------
If n < 50 Then
k = Int(n / 10)
Else
k = 5
End If
'determines k = number of possible outliers
'--------------------------------------------------------
'--------------------------------------------------------
mean = Application.WorksheetFunction.Average(DataSet)
'Calculates mean of Data Set
'--------------------------------------------------------
'--------------------------------------------------------
SD = Application.WorksheetFunction.StDev(DataSet)
'Calculates Standard Deviation of Data Set
'--------------------------------------------------------
'--------------------------------------------------------
For Each element In DataSet
If (Abs(element - mean) / SD) > LowConf Then
MsgBox "95% outlier: " & element
End If
If (Abs(element - mean) / SD) > HighConf Then
MsgBox "99% outlier: " & element
End If
Next element
'--------------------------------------------------------
End Sub
更新:我已经找到了以下代码块。这个循环应该可以工作,但现在我只需要弄清楚如何从 DataSet 中删除 Suspect。一旦它被删除,它将循环返回并重新计算平均值、SD 和 Suspect。我意识到我没有关于 UBound 何时等于 LBound 的代码,但这是我将在此之后处理的事情。使用此代码,Suspect 一直显示值 1。我不确定为什么会这样,也不确定 DataSet.Remove(Suspect) 是否有效。
Dim i As Long
For i = 1 To k
'--------------------------------------------------------
mean = Application.WorksheetFunction.Average(DataSet)
'Calculates mean of Data Set
'--------------------------------------------------------
'--------------------------------------------------------
SD = Application.WorksheetFunction.StDev(DataSet)
'Calculates Standard Deviation of Data Set
'--------------------------------------------------------
'--------------------------------------------------------
Dim Suspect As Double
If (Abs(UBound(DataSet) - mean)) > (Abs(LBound(DataSet) - mean)) Then
Suspect = UBound(DataSet)
End If
If (Abs(UBound(DataSet) - mean)) < (Abs(LBound(DataSet) - mean)) Then
Suspect = LBound(DataSet)
End If
'Defines what the most outlying value is
'--------------------------------------------------------
'--------------------------------------------------------
Dim Retest As Boolean
If (Abs(Suspect - mean) / SD) > LowConf Then
MsgBox "95% outlier: " & Suspect
Retest = True
End If
If (Abs(Suspect - mean) / SD) > HighConf Then
MsgBox "99% outlier: " & Suspect
Retest = True
End If
If Retest = True Then
DataSet.Remove (Suspect)
End If
MsgBox Suspect
'--------------------------------------------------------
Next i
更新:我稍微改变了部分。我可以将其应用到最低限度,但如何指定 Suspect 的位置并将其用作删除和上移的范围?
Dim Retest As Boolean
If (Abs(Suspect - mean) / SD) > LowConf Then
MsgBox "95% outlier: " & Suspect
Retest = True
End If
If (Abs(Suspect - mean) / SD) > HighConf Then
MsgBox "99% outlier: " & Suspect
Retest = True
End If
Dim pos As Range
Set pos = Application.Match(Suspect, DataSet)
MsgBox pos
If Retest = True And Suspect = Application.WorksheetFunction.Max(DataSet) Then
Range(pos).Delete Shift:=xlUp
End If
【问题讨论】:
-
为什么不发布一个独立的子或函数?实际上,您有各种未定义的变量。甚至不清楚
DataSet是什么。它是一个 VBA 数组吗?收藏?范围?LowConf和HighConf是如何定义的?给出的代码太零碎了。最好能发minimal reproducible example。 -
知道了。 DataArray 是变体
-
切点:几乎没有理由使用
Integer或Single而不是Long或Double。Integer迟早会要求溢出错误,Single要求的精度比容易获得的要低。 -
警告警告:
UBound(DataSet) - mean),LBound(DataSet) - mean,这些没有意义。您正在将 indices 与 values 进行比较。这是两个不同的世界。mean派生自数组中的 值,UBound和LBound是数组的极端索引(比如地址)。 -
@A.S.H 有没有办法先从数据集的 UBound 和 LBound 中检索值,然后我可以存储它们并将它们替换到这些语句中?