Paper Reading - Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices

1. Summary

1) For what problem?

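Many current DRAM chips use on-die ECC, so the test data that researchers can observe has already been corrected by ECC. This hides the distribution of the errors that originally occurred.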

Unfortunately, recent DRAM technology scaling issues are forcing manufacturers to adopt on-die error-correction codes (ECC), which pose a significant challenge for DRAM error characterization studies by obfuscating raw error distributions using undocumented, proprietary, and opaque error-correction hardware. As we show in this work, errors observed in devices with on-die ECC no longer follow expected, well-studied distributions (e.g., lognormal retention times) but rather depend on the particular ECC scheme used.

2) Key idea?

Our approach is based on the key idea that even though ECC obfuscates the exact locations of the pre-correction errors, we can leverage known statistical properties of pre-correction error distributions (e.g., uniform-randomness [5, 57, 98, 112]) in order to disambiguate the effects of different ECC schemes (Section 4).

3) Mechanism?

EIN uses maximum a posteriori (MAP) estimation over statistical models that we develop to represent ECC operation to: i) reverse-engineer the ECC scheme and ii) infer the pre-correction error rates given only the post-correction errors. We design and publicly release EINSim, a flexible open-source simulator that can apply EIN to a wide variety of DRAM devices and standards.

2020-10-22 Paper Reading: EIN

Given all of the known Cj', the probability of every possible w' can be computed.

Because there are too many possible w' to enumerate (2^64), w is grouped into a small number of classes Wn', n ∈ (0, N).
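The grouping idea can be sketched as follows. This is a minimal illustration, not the paper's model: it assumes that each bit of a 64-bit word fails independently with the same probability, and it groups words by how many bits are in error. The function name `error_count_distribution` and the error rate used are hypothetical.

```python
from math import comb

def error_count_distribution(word_bits, bit_error_rate):
    """Return P[exactly n bits in error] for n = 0..word_bits.

    Assumption (for illustration only): every bit fails independently
    with the same probability, so the count is binomially distributed.
    """
    p = bit_error_rate
    return [
        comb(word_bits, n) * p**n * (1 - p) ** (word_bits - n)
        for n in range(word_bits + 1)
    ]

# 2^64 possible error patterns collapse into 65 classes (0..64 bit errors).
dist = error_count_distribution(word_bits=64, bit_error_rate=1e-3)
print(len(dist), "classes instead of", 2**64, "patterns")
```

Under this assumption all patterns with the same number of errors are equally likely, which is why tracking the class is enough.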

All environmental factors (e.g., chip structure) are folded into θ. (Is (Fi, θ) in effect equivalent to some Fi'?)

Our task now is: for an unknown ECC scheme (F unknown), infer the most likely F from the observations O.

By Bayes' rule:

P[Fi | O] = P[O | Fi] · P[Fi] / P[O]

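The denominator P[O] is dropped because it is the same for every Fi and therefore does not affect which Fi attains the maximum.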
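A minimal sketch of this MAP selection step is shown below. The candidate schemes, their per-class probabilities, the priors, and the observed counts are all made up for illustration. The likelihood is treated as multinomial over error-count classes, which is an assumption of this sketch and not necessarily how EINSim computes it.

```python
from math import log

def log_likelihood(observed_counts, class_probs):
    """log P[O | F] up to a constant shared by all candidates."""
    total = 0.0
    for count, prob in zip(observed_counts, class_probs):
        if count == 0:
            continue  # classes that were never observed contribute nothing
        if prob == 0.0:
            return float("-inf")  # candidate cannot explain the data
        total += count * log(prob)
    return total

def map_estimate(observed_counts, candidates, priors):
    """Pick argmax over F of P[O | F] * P[F]; P[O] is never needed."""
    scores = {
        name: log_likelihood(observed_counts, probs) + log(priors[name])
        for name, probs in candidates.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical candidates: probability of observing 0, 1, 2, 3 errors per word.
candidates = {
    "scheme_A": [0.90, 0.02, 0.05, 0.03],
    "scheme_B": [0.90, 0.08, 0.01, 0.01],
}
priors = {"scheme_A": 0.5, "scheme_B": 0.5}
observed = [900, 25, 48, 27]  # made-up histogram of per-word error counts

best, scores = map_estimate(observed, candidates, priors)
print(best)
```

Working in log space avoids numerical underflow when many words are observed, and it makes explicit that any factor common to all candidates can be ignored.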

With j running from 0 to jmax as time increases, we can observe each n-j for the first j bursts, and the actual sequence must match N. (For example, if the running total of error bits over time is [0, 0, 0, 1, 2, 2, 2, ...], then new error bits appeared at times 4 and 5.)
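The parenthetical example can be checked with a few lines of code. The helper name `times_with_new_errors` is hypothetical, and time steps are counted from 1 to match the note.

```python
def times_with_new_errors(cumulative_errors):
    """Return the 1-based time steps at which the running total increases."""
    times = []
    previous = 0
    for t, total in enumerate(cumulative_errors, start=1):
        if total > previous:
            times.append(t)
        previous = total
    return times

print(times_with_new_errors([0, 0, 0, 1, 2, 2, 2]))
```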

Because θ is unknown, we first assume that, for every candidate Fi, the experiment was run under the conditions that maximize P[O | Fi].

This yields F unknown. Once F has been found, the θ of the experimental environment can be inferred in turn.
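The two-stage idea (maximize over θ for each candidate, pick the best candidate, then read off its θ) can be sketched with a toy example. Everything here is an assumption made for illustration: θ is reduced to a single raw bit error rate, words are 8 bits, the "sec_toy" model ignores miscorrection, the prior over candidates is uniform, and the observed counts are invented.

```python
from math import log

WORD_BITS = 8  # toy word size, chosen only to keep numbers readable

def raw_classes(theta):
    """P[0 errors], P[1 error], P[>=2 errors] for independent bit errors."""
    b0 = (1 - theta) ** WORD_BITS
    b1 = WORD_BITS * theta * (1 - theta) ** (WORD_BITS - 1)
    return b0, b1, max(1.0 - b0 - b1, 0.0)

def no_ecc(theta):
    return list(raw_classes(theta))

def sec_toy(theta):
    # Toy single-error correction: lone errors vanish, the rest pass through.
    b0, b1, rest = raw_classes(theta)
    return [b0 + b1, 0.0, rest]

def log_likelihood(counts, probs):
    total = 0.0
    for c, p in zip(counts, probs):
        if c == 0:
            continue
        if p == 0.0:
            return float("-inf")
        total += c * log(p)
    return total

def fit(counts, models, thetas):
    """For each model take the best theta, then return the best model overall."""
    best = None
    for name, model in models.items():
        for theta in thetas:
            score = log_likelihood(counts, model(theta))
            if best is None or score > best[0]:
                best = (score, name, theta)
    return best[1], best[2]

observed = [9950, 0, 50]  # made-up counts of words with 0, 1, >=2 errors
grid = [i / 1000 for i in range(1, 201)]
scheme, theta_hat = fit(observed, {"no_ecc": no_ecc, "sec_toy": sec_toy}, grid)
print(scheme, theta_hat)
```

A grid search is used only because it is easy to read; any one-dimensional optimizer would serve the same purpose.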

4) Result?

We show that EIN enables: i) reverse-engineering the on-die ECC scheme, which we find to be a single-error correction Hamming code with (n = 136, k = 128, d = 3), ii) inferring pre-correction error rates given only post-correction errors, and iii) recovering the well-studied pre-correction error distributions that on-die ECC obfuscates.
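To see why a single-error-correcting Hamming code obfuscates raw errors, the sketch below uses the small textbook Hamming(7,4) code as a stand-in for the (136, 128) code named above. The bit layout is the standard one with parity bits at positions 1, 2, and 4; it is not claimed to match the layout inside any real DRAM chip.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # 1-based positions that hold data bits

def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    code = [0] * 8  # index 0 unused so list index equals bit position
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]

def syndrome(codeword):
    """XOR of the positions of all set bits; zero for a valid codeword."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s

def decode(codeword):
    """Flip the bit the syndrome points at, then extract the data bits."""
    fixed = list(codeword)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return [fixed[pos - 1] for pos in DATA_POSITIONS]

def flip(codeword, positions):
    out = list(codeword)
    for pos in positions:
        out[pos - 1] ^= 1
    return out

data = [1, 0, 1, 1]
word = encode(data)

one_error = decode(flip(word, [5]))
two_errors = decode(flip(word, [3, 5]))
post_errors = sum(a != b for a, b in zip(data, two_errors))
print(one_error == data, post_errors)
```

A single raw error disappears entirely, while two raw errors in the data bits come out as three post-correction errors because the decoder "corrects" a third, innocent bit. This is the kind of distortion that makes post-correction error counts depend on the ECC scheme.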

2. Strengths

EIN can recover both the on-die ECC scheme and the pre-ECC error rates. The pre-ECC error pattern is essential for developing, testing, and evaluating new ECC techniques, and the retention-time example shows that the approach works in practice.

It also works for retention times measured at different temperatures.

3. Weaknesses

EIN cannot pinpoint the exact ECC algorithm with confidence, although a user with some background knowledge can get a basic idea of which algorithm is in use.

It cannot determine exactly where each error occurs (a simulator that could do so is probably not feasible to implement).

4. Takeaway

According to earlier studies, at a given temperature ECC shows no regular pattern over time -> this in fact indicates that error bits occur following a uniform distribution.
