If θ appears in a formula, n denotes the total number of samples and $\hat\theta$ denotes the estimator of the actual maximum ID. If M appears in a formula, k denotes the total number of samples and N denotes the actual maximum ID; M itself is the random variable for the maximum ID in a random sample.
Method 1: Probability of each sample
The estimator of the maximum value can also be derived by assuming that each sample is drawn with uniform probability, where θ represents the actual maximum ID on each day:

$$P(x) = \frac{1}{\theta}$$
Method 2: Probability of maximum sample
According to Assumption 1, we treat the observed maximum ID as a random variable M, and write m for the maximum ID encountered on one specific day (i.e. $x_{n:n}$). Assuming that N is the actual maximum ID and k is the number of ill samples, the probability mass function (PMF) of the maximum ID can be expressed as follows:

$$P(M=m) = \frac{\binom{m-1}{k-1}}{\binom{N}{k}}, \qquad m = k, k+1, \dots, N$$
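As a quick numerical check of this PMF (with hypothetical values N = 20 and k = 4, chosen only for illustration), the probabilities should sum to 1 by the hockey-stick identity, and $P(M=N)$ should reduce to $k/N$:

```python
from math import comb

# Hypothetical values for illustration: actual maximum ID N and sample size k.
N, k = 20, 4

# PMF of the sample maximum M under sampling without replacement:
# P(M = m) = C(m-1, k-1) / C(N, k) for m = k, ..., N.
pmf = {m: comb(m - 1, k - 1) / comb(N, k) for m in range(k, N + 1)}

# Hockey-stick identity: sum of C(m-1, k-1) over m = k..N equals C(N, k),
# so the probabilities sum to 1.
total = sum(pmf.values())
print(total)       # ≈ 1.0
print(pmf[N], k / N)  # P(M = N) = k/N
```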
Point Estimate
Estimators intuited from discrete uniform distribution
Estimator 1: 2*Mean-1
Consider the continuous version of this problem, i.e. $X_i \sim \mathrm{UNIF}(0,\theta)$.

For $\mathrm{UNIF}(0,\theta)$: $E(X) = \frac{\theta}{2}$ and $\operatorname{Var}(X) = \frac{\theta^2}{12}$.

We consider the following estimator:

$$\hat\theta_1 = \frac{2}{n}\sum_{i=1}^{n} X_i - 1 \quad \text{(discrete distribution)}, \qquad \hat\theta_1 = \frac{2}{n}\sum_{i=1}^{n} X_i \quad \text{(continuous distribution)}$$
$$E(\hat\theta_1) = E\!\left(\frac{2}{n}\sum_{i=1}^{n} X_i\right) = 2E(X) = \theta$$
$$\operatorname{Var}(\hat\theta_1) = \frac{4}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{4}{n^2}\sum_{i=1}^{n}\frac{\theta^2}{12} = \frac{\theta^2}{3n}$$

Therefore, $\hat\theta_1$ is an unbiased estimator with $\operatorname{Var}(\hat\theta_1) = \frac{\theta^2}{3n}$.
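These moments can be checked with a short Monte Carlo sketch (continuous case; θ = 100, n = 50, and the replication count are hypothetical choices):

```python
import random

random.seed(0)
theta, n, trials = 100.0, 50, 20000  # hypothetical parameter values

estimates = []
for _ in range(trials):
    xs = [random.uniform(0, theta) for _ in range(n)]
    estimates.append(2 * sum(xs) / n)   # continuous-case estimator: 2 * sample mean

mean_est = sum(estimates) / trials
var_est = sum((e - mean_est) ** 2 for e in estimates) / trials
print(mean_est)                      # close to θ = 100 (unbiased)
print(var_est, theta**2 / (3 * n))   # empirical vs. theoretical θ²/(3n) ≈ 66.7
```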
Estimator 2: Max + Avg GAP
Consider another improvement on the MLE, using an averaging approach to estimate the gap between the maximum and the upper limit:

$$\hat\theta_2 = X_{n:n} + \frac{1}{n-1}\sum_{i>j}\left(X_i - X_j - 1\right)$$

We calculate the expected value and variance to determine whether this estimator is biased:

$$E(\hat\theta_2) = E(X_{n:n}) + \frac{1}{n-1}\sum_{i>j}E(X_i - X_j) = \frac{n}{n+1}\theta$$
$$\operatorname{Var}(\hat\theta_2) = \frac{n\theta^2}{(n+1)(n-1)(n+2)}$$

Therefore, $\hat\theta_2$ is a biased estimator.
Estimator 3: Min + Max estimator
We know that the maximum sample ID is the one closest to the upper limit, and we can add more information to it. Intuitively, we first consider the sum of the minimum and maximum sample IDs:

$$\hat\theta_3 = x_{1:n} + x_{n:n}$$
$$F_{X_{n:n}}(x) = [F_X(x)]^n = \frac{x^n}{\theta^n}, \qquad f_{X_{n:n}}(x) = \frac{n\,x^{n-1}}{\theta^n}$$
$$E[X_{n:n}] = \int_0^\theta x\cdot\frac{n\,x^{n-1}}{\theta^n}\,dx = \frac{n}{n+1}\theta, \qquad E[X_{n:n}^2] = \int_0^\theta x^2\cdot\frac{n\,x^{n-1}}{\theta^n}\,dx = \frac{n}{n+2}\theta^2$$
$$F_{X_{1:n}}(x) = 1 - [1-F_X(x)]^n = 1 - \left(\frac{\theta-x}{\theta}\right)^n, \qquad f_{X_{1:n}}(x) = \frac{n(\theta-x)^{n-1}}{\theta^n}$$
$$E[X_{1:n}] = \int_0^\theta x\cdot\frac{n(\theta-x)^{n-1}}{\theta^n}\,dx = \frac{\theta}{n+1}, \qquad E[X_{1:n}^2] = \int_0^\theta x^2\cdot\frac{n(\theta-x)^{n-1}}{\theta^n}\,dx = \frac{2\theta^2}{(n+1)(n+2)}$$
$$E(\hat\theta_3) = E(X_{1:n}) + E(X_{n:n}) = \theta$$
$$\operatorname{Var}(\hat\theta_3) = \operatorname{Var}(X_{1:n}) + \operatorname{Var}(X_{n:n}) + 2\operatorname{Cov}(X_{1:n}, X_{n:n})$$

Since the joint density of two order statistics $U_{i} \le U_{j}$ of the uniform distribution on $[0,1]$ is

$$f_{U_i,U_j}(u,v) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,u^{i-1}(v-u)^{j-i-1}(1-v)^{n-j}, \qquad u \le v,$$

we have $\operatorname{Cov}(U_i, U_j) = \frac{i(n-j+1)}{(n+1)^2(n+2)}$, so $\operatorname{Cov}(X_{1:n}, X_{n:n}) = \frac{\theta^2}{(n+1)^2(n+2)}$ and

$$\operatorname{Var}(\hat\theta_3) = \frac{n\theta^2}{(n+1)^2(n+2)} + \frac{n\theta^2}{(n+1)^2(n+2)} + \frac{2\theta^2}{(n+1)^2(n+2)} = \frac{2\theta^2}{(n+1)(n+2)}$$

Since $E(\hat\theta_3) = \theta$, $\hat\theta_3$ is an unbiased estimator.
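A simulation sketch (continuous case, with hypothetical θ = 100 and n = 50) of the min + max estimator's moments; the theoretical variance $\frac{2\theta^2}{(n+1)(n+2)}$ follows from the order-statistic moments above:

```python
import random

random.seed(1)
theta, n, trials = 100.0, 50, 20000  # hypothetical values

estimates = []
for _ in range(trials):
    xs = [random.uniform(0, theta) for _ in range(n)]
    estimates.append(min(xs) + max(xs))   # θ̂₃ = x_{1:n} + x_{n:n}

mean_est = sum(estimates) / trials
var_est = sum((e - mean_est) ** 2 for e in estimates) / trials
print(mean_est)                                      # close to θ = 100
print(var_est, 2 * theta**2 / ((n + 1) * (n + 2)))   # both ≈ 7.54
```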
Estimator 5: Improved MLE

$$\hat\theta_5 = \frac{n+1}{n}x_{n:n} - 1 \quad \text{for the discrete case}, \qquad \hat\theta_5 = \frac{n+1}{n}x_{n:n} \quad \text{for the continuous case}$$

We compute the variance to confirm that this improvement yields an unbiased estimator, using $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$:

$$E(\hat\theta_5) = \theta$$
$$E(X_{n:n}^2) = \int_0^\theta x^2\cdot\frac{n\,x^{n-1}}{\theta^n}\,dx = \frac{n}{n+2}\theta^2$$
$$\operatorname{Var}(\hat\theta_5) = \left(\frac{n+1}{n}\right)^2\operatorname{Var}(X_{n:n}) = \left(\frac{n+1}{n}\right)^2\left(\frac{n}{n+2}\theta^2 - \left(\frac{n}{n+1}\theta\right)^2\right) = \frac{\theta^2}{n(n+2)}$$

Therefore, $\hat\theta_5$ is an unbiased estimator.
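A simulation sketch contrasting the raw maximum with the corrected estimator (continuous case; θ = 100 and n = 50 are hypothetical):

```python
import random

random.seed(2)
theta, n, trials = 100.0, 50, 20000  # hypothetical values

raw, adjusted = [], []
for _ in range(trials):
    m = max(random.uniform(0, theta) for _ in range(n))
    raw.append(m)                      # MLE x_{n:n}: biased low
    adjusted.append((n + 1) / n * m)   # θ̂₅ (continuous case): unbiased

mean_raw = sum(raw) / trials
mean_adj = sum(adjusted) / trials
var_adj = sum((e - mean_adj) ** 2 for e in adjusted) / trials
print(mean_raw)                           # close to nθ/(n+1) ≈ 98.04
print(mean_adj)                           # close to θ = 100
print(var_adj, theta**2 / (n * (n + 2)))  # both ≈ 3.85
```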
To estimate the maximum value of θ, the best option is the maximum of the random sample (i.e. $\hat\theta = x_{n:n}$). According to Ex 3.4, we know that

$$F_{x_{n:n}}(x) = [F_x(x)]^n = \frac{x^n}{\theta^n} \;\Rightarrow\; f_{x_{n:n}}(x) = \frac{n\,x^{n-1}}{\theta^n}$$

where n is the total number of samples in each day. Taking the expected value of m (i.e. $x_{n:n}$) shows whether the estimator is biased:

$$E(x_{n:n}) = \int_0^\theta x\cdot\frac{n\,x^{n-1}}{\theta^n}\,dx = \frac{n}{n+1}\theta = E(\hat\theta) \ne \theta$$

Therefore, the estimator is a biased estimator. For sufficiency, factor the joint density:

$$f(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{o.w.} \end{cases} \qquad f(x) = \frac{1}{\theta}\,I_{(0,\theta)}(x)$$
$$f(x_1,\dots,x_n) = \frac{1}{\theta^n}\prod_{i=1}^{n} I_{(0,\theta)}(x_i) = \frac{1}{\theta^n}\,I_{(0,\theta)}(x_{n:n}) = g(x_{n:n},\theta)\cdot h(x_1,\dots,x_n)$$
Therefore, $S = x_{n:n}$ is sufficient for θ. According to the Lehmann–Scheffé theorem, we have:

$$T = \frac{n+1}{n}X_{n:n}$$

which is unbiased for $\tau(\theta) = \theta$. Thus, T is the UMVUE.
UMVUE under method 2
The expected value of M can be calculated as follows, using $\binom{N}{k} = \frac{N!}{k!\,(N-k)!}$ and the identity $\sum_{m=k}^{N}\binom{m}{k} = \binom{N+1}{k+1}$:

$$E(M) = \sum_{m=k}^{N} m\,P(M=m) = \sum_{m=k}^{N} m\,\frac{(m-1)!}{(k-1)!\,(m-k)!}\cdot\frac{k!\,(N-k)!}{N!} = \frac{k}{\binom{N}{k}}\sum_{m=k}^{N}\binom{m}{k} = \frac{k\,\binom{N+1}{k+1}}{\binom{N}{k}} = \frac{k(N+1)}{k+1}$$

Since we are looking for the maximum ID from our observation, the best guess for M is the maximum ID m of that particular day. Setting $m = \frac{k(N+1)}{k+1}$ and solving for N gives

$$\hat N = \frac{(k+1)m}{k} - 1 = m + \frac{m}{k} - 1$$

Taking the expected value of $\hat N$, we have:

$$E(\hat N) = E\!\left[M + \frac{M}{k} - 1\right] = E(M) + \frac{E(M)}{k} - 1 = \frac{k(N+1)}{k+1} + \frac{N+1}{k+1} - \frac{k+1}{k+1} = \frac{N(k+1)}{k+1} = N$$

Therefore, $\hat N$ is unbiased.
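A simulation sketch of this estimator under sampling without replacement (N = 200 and k = 10 are hypothetical choices):

```python
import random

random.seed(3)
N, k, trials = 200, 10, 20000  # hypothetical values

estimates = []
for _ in range(trials):
    m = max(random.sample(range(1, N + 1), k))  # k distinct IDs, without replacement
    estimates.append(m + m / k - 1)             # N̂ = m + m/k - 1

mean_est = sum(estimates) / trials
print(mean_est)  # close to N = 200
```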
Estimator 6: Bayes Estimator
The Bayesian approach considers the credibility $P(N=n \mid M=m, K=k)$ that the actual maximum ID N equals n, given that the maximum observed serial number M equals m. We apply the conditional probability rule rather than committing to a particular proper prior distribution:

$$P(n\mid m,k)\,P(m\mid k) = P(m\mid n,k)\,P(n\mid k) = P(m,n\mid k)$$

$P(m\mid n,k)$ answers the question: "What is the probability that a specific serial number m is the highest number observed in a sample of k patients, given that there are n in total?" This probability is

$$P(m\mid n,k) = \frac{\binom{m-1}{k-1}}{\binom{n}{k}}\,I_{k\le m}\,I_{m\le n}$$

$P(m\mid k)$ is the probability that the maximum serial number equals m once k patients have been observed, but before the serial numbers themselves have been seen. By the law of total probability,

$$P(m\mid k) = \sum_{n=0}^{\infty} P(m\mid n,k)\,P(n\mid k)$$

$P(n\mid k)$ is the credibility that the total number of patients, N, equals n when the number of patients observed is known to be k, but before the serial numbers have been observed. Assume that it follows some discrete uniform distribution:
Hence, in the notation of the other chapters, $\mu_{\text{Bayes}} = \frac{(X_{n:n}-1)(n-1)}{n-2}$, with $E(\mu_{\text{Bayes}}) = \frac{n}{n+2}\theta$; the Bayes estimator is therefore biased.

To measure its uncertainty, we calculate its variance:

$$\mu^2 + \sigma^2 - \mu = \sum_{n} n(n-1)\,P(n\mid m,k) = (m-1)(m-2)\,\frac{k-1}{k-2}\,\binom{m-3}{k-3}\sum_{n\ge m}\frac{1}{\binom{n-2}{k-2}}$$
$$= (m-1)(m-2)\,\frac{k-1}{k-2}\,\binom{m-3}{k-3}\cdot\frac{k-2}{k-3}\cdot\frac{1}{\binom{m-3}{k-3}} = \frac{(m-1)(m-2)(k-1)}{k-3}$$
$$\sigma^2_{\text{Bayes}} = \frac{(m-1)(m-2)(k-1)}{k-3} - \left(\frac{(m-1)(k-1)}{k-2}\right)^2 + \frac{(m-1)(k-1)}{k-2} = \frac{(m-1)(k-1)(m+1-k)}{(k-3)(k-2)^2}$$
$$\operatorname{Var}(\theta_{\text{Bayes}}) = \frac{(x_{n:n}-1)(n-1)(x_{n:n}+1-n)}{(n-3)(n-2)^2}$$
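The posterior mean and variance can be cross-checked numerically. Under the improper uniform prior implied above, the posterior is $P(N=n\mid m,k) = \frac{k-1}{k}\,\frac{\binom{m-1}{k-1}}{\binom{n}{k}}$ for $n \ge m$ (a standard form, stated here as an assumption since the normalization step is not shown in this section). Summing it directly with hypothetical m = 60 and k = 5, truncating the tail (which decays like $n^{-k}$), reproduces the closed-form formulas:

```python
from math import comb

m, k = 60, 5       # hypothetical observed maximum and sample size
cutoff = 100_000   # truncation point; the posterior tail decays like n^{-k}

# Posterior under an (improper) uniform prior on n:
# P(N = n | m, k) = (k-1)/k * C(m-1, k-1) / C(n, k),  n >= m
post = {n: (k - 1) / k * comb(m - 1, k - 1) / comb(n, k)
        for n in range(m, cutoff)}

total = sum(post.values())
mu = sum(n * p for n, p in post.items())
var = sum(n * n * p for n, p in post.items()) - mu**2

print(total)                                  # ≈ 1
print(mu, (m - 1) * (k - 1) / (k - 2))        # both ≈ 78.67
print(var, (m - 1) * (k - 1) * (m + 1 - k) / ((k - 3) * (k - 2) ** 2))  # both ≈ 734.2
```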
Point Estimation Conclusion
Based on the distribution in this problem, we found several candidate estimators: four intuitive ones derived from the distribution and the background of the question, the maximum likelihood estimator (MLE), an improved estimator built from the MLE, and the Bayes estimator. We proved that the improved MLE is exactly the uniformly minimum-variance unbiased estimator (UMVUE), i.e. the unbiased estimator with the smallest variance.

Most importantly, $X_{n:n}$ plays a central role in estimating the upper limit of the discrete uniform distribution: intuitively, the maximum sample carries the closest information about the upper limit, and we also proved that it is a sufficient statistic for estimating N.

To compare the unbiasedness and efficiency of these estimators, we summarize the results in the following table:
| No. | Function | $E(\hat\theta)$ | $\operatorname{Var}(\hat\theta)$ |
|-----|----------|-----------------|----------------------------------|
| $\hat\theta_1$ | $\frac{2}{n}\sum_{i=1}^{n} X_i - 1$ | $\theta$ | $\frac{\theta^2}{3n}$ |
| $\hat\theta_2$ | $X_{n:n} + \frac{1}{n-1}\sum_{i>j}(X_i - X_j - 1)$ | $\frac{n}{n+1}\theta$ | $\frac{n\theta^2}{(n+1)(n-1)(n+2)}$ |
| $\hat\theta_3$ | $x_{1:n} + x_{n:n}$ | $\theta$ | $\frac{2\theta^2}{(n+1)(n+2)}$ |
| $\hat\theta_4$ | $\frac{1}{n}\sum_{i=1}^{n} X_i + \sqrt{3}\sqrt{\frac{\sum_i (X_i - \bar X)^2}{n-1}}$ | $\frac{1}{2}\theta + \frac{\sqrt{3}}{2\sqrt{3}}\theta$ | $\operatorname{Var}(\hat\theta_4) > \frac{\theta^2}{12n}$ |
| $\hat\theta_5$ | $\frac{n+1}{n}x_{n:n} - 1$ | $\theta$ | $\frac{\theta^2}{n(n+2)}$ |
| $\hat\theta_{\mathrm{MLE}}$ | $X_{n:n}$ | $\frac{n}{n+1}\theta$ | $\frac{n\theta^2}{(n+1)^2(n+2)}$ |
| $\hat\theta_{\mathrm{Bayes}}$ | $\frac{(X_{n:n}-1)(n-1)}{n-2}$ | $\frac{n}{n+2}\theta$ | $\frac{(x_{n:n}-1)(n-1)(x_{n:n}+1-n)}{(n-3)(n-2)^2}$ |
Interval Estimation
In addition to point estimation, interval estimation can be carried out. It rests on the observation that the probability that all k observations in the sample fall within an interval covering a fraction p of the range (0 ≤ p ≤ 1) is $p^k$ (assuming in this section that draws are with replacement, to simplify computations; if draws are without replacement, this overstates the likelihood, and the intervals will be overly conservative).
Thus the sampling distribution of the quantile of the sample maximum is the curve $x^{1/k}$ on $[0,1]$: the p-th to q-th quantiles of the sample maximum m span the interval $[p^{1/k}N,\; q^{1/k}N]$. Inverting this yields the corresponding confidence interval for the population maximum: $[m/q^{1/k},\; m/p^{1/k}]$.
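A sketch of this interval in code (hypothetical N = 1000, k = 20, nominal 95% level); the empirical coverage lands slightly below q − p because the population is discrete:

```python
import random

random.seed(4)
N, k, trials = 1000, 20, 5000   # hypothetical population maximum and sample size
p, q = 0.025, 0.975             # nominal 95% interval

covered = 0
for _ in range(trials):
    # Draws with replacement, matching the assumption in this section.
    m = max(random.randint(1, N) for _ in range(k))
    lo, hi = m / q ** (1 / k), m / p ** (1 / k)
    covered += lo <= N <= hi

cov = covered / trials
print(cov)  # a bit under q - p = 0.95 due to discreteness
```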