Exploratory data analysis (EDA)

EDA is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.

In this book, we focus on:

1) summary statistics

2) visualization

3) online analytical processing (OLAP)


 

UCI Machine Learning Repository
http://www.ics.uci.edu/~mlearn/MLRepository.html


1. Summary statistics

1) The mean is very sensitive to outliers, so the median or a trimmed mean is also commonly used.

2) The variance is also sensitive to outliers.
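The sensitivity described above can be checked on a small example. The following is a minimal Python sketch, not taken from the book: the data values, the helper name `trimmed_mean`, and the 10% trimming fraction are all illustrative assumptions.

```python
import statistics

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the values from each end of the sorted data.

    Hypothetical helper for illustration; `fraction` is per side (0.10 = 10% from each end).
    """
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of values trimmed from each end
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.fmean(kept)

# Made-up data: the second list replaces the largest value with an outlier.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
dirty = clean[:-1] + [1000]

for name, data in (("clean", clean), ("dirty", dirty)):
    print(
        name,
        "mean:", statistics.fmean(data),
        "median:", statistics.median(data),
        "trimmed mean:", trimmed_mean(data, 0.10),
        "variance:", statistics.variance(data),  # sample variance
    )
```

On this data the single outlier moves the mean from 5.5 to 104.5 and inflates the variance, while the median and the 10% trimmed mean both stay at 5.5.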

 


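Average absolute deviation (AAD): the mean of the absolute differences between each value and the mean, AAD(x) = (1/m) Σ |x_i − mean(x)|, where m is the number of values.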
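As a rough illustration, average absolute deviation can be computed in a few lines of Python. This is a sketch under stated assumptions: the function name `average_absolute_deviation` and the sample values are made up for the example.

```python
import statistics

def average_absolute_deviation(values):
    """Mean of |x_i - mean(x)| over all values (illustrative helper)."""
    center = statistics.fmean(values)
    return statistics.fmean([abs(v - center) for v in values])

# Made-up data with and without one extreme value.
regular = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

for data in (regular, with_outlier):
    print(
        "AAD:", average_absolute_deviation(data),
        "variance:", statistics.pvariance(data),  # population variance, for comparison
    )
```

Because the deviations are not squared, the outlier raises the AAD far less than it raises the variance, although neither measure ignores it.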


2. Visualization

Box plot:

[Figure: box plot examples]
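The numbers a box plot draws can be computed without any plotting library. The sketch below assumes one common convention (quartiles for the box, whiskers at the most extreme points within 1.5 times the interquartile range, anything beyond that reported as an outlier); other conventions, such as whiskers at fixed percentiles, also exist. The function name `box_plot_summary` and the data are illustrative.

```python
import statistics

def box_plot_summary(values, whisker=1.5):
    """Return the quantities a box plot displays (assumed 1.5 * IQR whisker rule)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value inside the fences
        "whisker_high": inside[-1],  # largest value inside the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Made-up data with one value far from the rest.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(summary)
```

Here the value 40 falls outside the upper fence and is reported separately, so the upper whisker stops at 9.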

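Parallel coordinates:

No shared vertical axis is used. The attributes are laid out along the horizontal axis (their order affects how the plot reads). Each sample's value for each attribute is marked at the corresponding position above the horizontal axis, and the marks are joined, so each sample appears as one line.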

[Figure: parallel coordinates examples]
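The construction of a parallel coordinates plot can be sketched as a data transformation: each sample becomes a list of points, one per attribute, which a plotting tool would then connect. In the Python sketch below, the function name `to_polylines`, the min-max scaling of each attribute to the range 0 to 1, and the sample records are all assumptions made for illustration.

```python
def to_polylines(samples, attributes):
    """Return one polyline per sample as a list of (axis_position, scaled_value) points.

    Each attribute gets its own axis at x = 0, 1, 2, ... in the order given,
    and values are min-max scaled per attribute (an assumed choice).
    """
    lows = {a: min(s[a] for s in samples) for a in attributes}
    highs = {a: max(s[a] for s in samples) for a in attributes}
    lines = []
    for s in samples:
        points = []
        for x, a in enumerate(attributes):
            span = highs[a] - lows[a]
            y = (s[a] - lows[a]) / span if span else 0.5  # constant attribute: mid-height
            points.append((x, y))
        lines.append(points)
    return lines

# Made-up records with three numeric attributes.
samples = [
    {"length": 5.0, "width": 3.0, "weight": 10.0},
    {"length": 7.0, "width": 2.0, "weight": 30.0},
    {"length": 6.0, "width": 2.5, "weight": 20.0},
]

lines = to_polylines(samples, ["length", "width", "weight"])
reordered = to_polylines(samples, ["width", "length", "weight"])
print(lines[0])      # polyline for the first sample
print(reordered[0])  # same sample, different attribute order
```

Passing the attributes in a different order changes which axes sit next to each other, which is why the ordering affects what patterns are visible.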


 

3. OLAP

OLAP uses a multidimensional array representation.
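To make the multidimensional array idea concrete, the following Python sketch turns a small fact table into a three-dimensional array and then applies two typical operations, a slice and a roll-up. The dimension names (product, location, quarter), the sales figures, and the helper names `build_cube` and `roll_up_over_locations` are hypothetical.

```python
# Hypothetical fact table: (product, location, quarter, sales).
facts = [
    ("pen", "north", "Q1", 10),
    ("pen", "south", "Q1", 5),
    ("pen", "north", "Q2", 7),
    ("ink", "north", "Q1", 3),
    ("ink", "south", "Q2", 8),
]

def build_cube(rows):
    """Build a 3-D array of summed sales indexed by [product][location][quarter]."""
    dims = [sorted({r[i] for r in rows}) for i in range(3)]
    index = [{value: pos for pos, value in enumerate(d)} for d in dims]
    cube = [[[0] * len(dims[2]) for _ in dims[1]] for _ in dims[0]]
    for product, location, quarter, amount in rows:
        cube[index[0][product]][index[1][location]][index[2][quarter]] += amount
    return dims, index, cube

def roll_up_over_locations(cube):
    """Sum out the location dimension, leaving a product x quarter table."""
    return [
        [sum(plane[loc][q] for loc in range(len(plane))) for q in range(len(plane[0]))]
        for plane in cube
    ]

dims, index, cube = build_cube(facts)
pen_slice = cube[index[0]["pen"]]       # slice: fix product = "pen"
totals = roll_up_over_locations(cube)   # roll-up: aggregate over locations
print(dims)
print(pen_slice)
print(totals)
```

Each cell of the array holds the aggregate for one combination of dimension values, so slicing is plain indexing and rolling up is a sum along one axis.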

 
