Learning Notes of Dr. Bo Yuan (THU), 《Data: Theory and Algorithm》, Part I

  • Definition: Data mining is the process of automatically extracting interesting and useful hidden patterns from usually massive, incomplete and noisy data.
    It is not a fully automatic process.
    From data to intelligence.
    Data, information, knowledge, decision support
  • Classification
    Algorithms:
    Decision Tree, KNN (k-Nearest Neighbors), Neural Networks, SVM (Support Vector Machine)
    Overfitting
    Cross validation (training data vs. test data)
    Confusion matrix: TP (True Positive), FP (False Positive), FN (False Negative), TN (True Negative); TPR (True Positive Rate) = TP / (TP + FN), TNR (True Negative Rate) = TN / (TN + FP), Accuracy = (TP + TN) / (TP + FP + FN + TN)
    TP+FP+FN+TN = number of samples
    ROC: Receiver Operating Characteristic (the curve of TPR against FPR = 1 - TNR as the decision threshold varies)
    AUC: Area Under the ROC Curve (an AUC close to 1 is good; 0.5 is no better than random guessing)
    Cost-sensitive learning
    Lift analysis
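The KNN entry in the algorithm list can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a plain majority vote among the k nearest training points; the function name `knn_predict` and the toy data are hypothetical, not from the course.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points.

    Assumes Euclidean distance (math.dist). On a vote tie, Counter.most_common
    returns the label first seen in nearest-first order.
    """
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Two well-separated toy groups (hypothetical data)
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # a
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # b
```

KNN has no training phase beyond storing the data, so all the cost sits in prediction; the choice of k trades noise sensitivity (small k) against blurring class boundaries (large k).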
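Cross validation, mentioned above together with training and test data, is commonly done as k-fold splitting. The sketch below assumes a seeded shuffle followed by k nearly equal folds, each used once as test data while the rest is training data; `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation over n samples.

    Indices are shuffled with a fixed seed, then cut into k folds whose sizes
    differ by at most one. Every index lands in exactly one test fold.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    # In practice: fit the model on `train`, evaluate on `test`, then average the k scores.
    print(len(train), len(test))
```

Averaging the score over all folds gives a less noisy estimate of test performance than a single train/test split, which is what makes it useful for detecting overfitting.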
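The confusion-matrix quantities listed above can be computed directly from true and predicted labels. A minimal sketch for a binary problem, assuming label 1 is the positive class; the function names are hypothetical, and returning 0.0 for an undefined rate is a convention chosen here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Return (TPR, TNR, accuracy); an undefined rate is reported as 0.0 (assumed convention)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return tpr, tnr, accuracy


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)          # (3, 1, 1, 3)
print(rates(*counts))  # (0.75, 0.75, 0.75)
```

The four counts always add up to the number of samples, as the notes state.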
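ROC and AUC can be illustrated with two views of the same number: the area under the (FPR, TPR) curve traced by sweeping a threshold over the scores, and the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (ties counted as 1/2). This is a minimal O(P·N) sketch, assuming labels are 0/1 and higher scores mean "more likely positive"; all names are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points, predicting positive when score >= t, for each distinct score t."""
    P = sum(1 for y in labels if y == 1)
    N = len(labels) - P
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))                          # 8/9, about 0.889
print(trapezoid_area(roc_points(scores, labels)))  # same value
```

A perfect ranking gives AUC = 1, and a ranking unrelated to the labels gives about 0.5 on average.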
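For cost-sensitive learning, one simple rule is to pick the prediction with the lower expected cost. The sketch below assumes a binary problem where correct decisions cost nothing, a false positive costs `cost_fp` and a false negative costs `cost_fn`; under those assumptions, predicting positive is cheaper exactly when p > cost_fp / (cost_fp + cost_fn). The function names are hypothetical.

```python
def min_cost_decision(p_positive, cost_fp, cost_fn):
    """Return 1 (predict positive) if that has the lower expected cost, else 0.

    Expected cost of predicting positive: (1 - p) * cost_fp
    Expected cost of predicting negative: p * cost_fn
    (assumes correct predictions cost 0)
    """
    return 1 if (1 - p_positive) * cost_fp < p_positive * cost_fn else 0


def cost_threshold(cost_fp, cost_fn):
    """Probability threshold above which predicting positive minimizes expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def total_cost(fp, fn, cost_fp, cost_fn):
    """Total misclassification cost from confusion-matrix counts."""
    return fp * cost_fp + fn * cost_fn


# A missed positive is 9 times as costly as a false alarm (hypothetical costs)
print(cost_threshold(1, 9))                           # 0.1
print(min_cost_decision(0.2, cost_fp=1, cost_fn=9))   # 1
print(min_cost_decision(0.05, cost_fp=1, cost_fn=9))  # 0
```

With equal costs the threshold falls back to 0.5, which is the usual cost-blind classifier.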
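Lift analysis compares how concentrated the positives are among the highest-scored samples with the positive rate in the whole data set. A minimal sketch, assuming 0/1 labels and that the "top fraction" is rounded to a whole number of samples; `lift_at` is a hypothetical name.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of samples ranked by score:
    (positive rate among top-ranked samples) / (positive rate overall)."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, labels, 0.2))  # top 2 are both positive: 1.0 / 0.3, about 3.33
print(lift_at(scores, labels, 0.5))  # 3 of the top 5 are positive: 0.6 / 0.3 = 2.0
```

A lift above 1 means the model's top-ranked group contains more positives than random selection would; evaluating it at several fractions gives the points of a lift chart.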

  • Clustering
    Difference: clustering is unsupervised learning (no class labels are given), whereas classification is supervised learning (the model is trained on labeled data).
    Association rules
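As an example of clustering, the sketch below implements k-means (Lloyd's algorithm), which the notes do not name; it is used here only as one common clustering method. It assumes numeric points given as tuples, initial centroids drawn from the data with a fixed seed, and stopping when assignments no longer change; `k_means` is a hypothetical name.

```python
import math
import random


def k_means(points, k, iters=100, seed=0):
    """Cluster tuples into k groups; returns (centroids, assignment list)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = k_means(pts, 2)
print(assignment)
print(sorted(centroids))
```

No labels are used anywhere, which is the unsupervised setting described above; the result can depend on the initial centroids, so k-means is often rerun with several seeds.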
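Association rules are usually scored by support (how often an itemset appears) and confidence (how often the consequent appears given the antecedent). A minimal sketch, assuming transactions are sets of item names; the helper names and the shopping-basket data are hypothetical.

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(itemset, transactions):
    """Fraction of transactions containing the itemset."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent.
    Raises ZeroDivisionError if the antecedent never occurs."""
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / count_containing(antecedent, transactions)


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(support({"diapers", "beer"}, transactions))       # 3 of 5 transactions
print(confidence({"diapers"}, {"beer"}, transactions))  # 3 of the 4 with diapers
```

Algorithms such as Apriori search for all rules whose support and confidence exceed user-chosen minimum thresholds.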

  • Regression
    Underfitting
    Overfitting
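Underfitting shows up as high error even on the training data. The sketch below, assuming one input variable and squared-error loss, fits a least-squares line and compares it with an overly simple constant model that always predicts the mean; the function names and data are hypothetical. Detecting overfitting needs the same error measured on held-out data, since a very flexible model can drive training error to zero.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single input variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mse(pred, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)


xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # exactly y = 2x + 1
a, b = fit_line(xs, ys)
line_err = mse([a * x + b for x in xs], ys)
mean_err = mse([sum(ys) / len(ys)] * len(ys), ys)  # constant model: underfits
print(a, b)                # 2.0 1.0
print(line_err, mean_err)  # 0.0 5.0
```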

  • Data Preprocessing
    Garbage in, garbage out
    Cloud Computing
    Parallel Computing
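Since real data are often incomplete, one of the simplest preprocessing steps is filling in missing values. The sketch below shows mean imputation, which is only one possible choice and is not prescribed by the notes; it assumes a numeric column in which missing entries are `None`, and `impute_mean` is a hypothetical name.

```python
def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


print(impute_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so whether it is appropriate depends on the later analysis.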