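[Posted]: 2019-03-01 15:11:46
[Question]:
I am trying to build an unsupervised model on a dataset that contains both categorical and continuous variables. I think I have worked it out, but is this the right approach?
Load the libraries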
library(tidyr)
library(dummies)
library(fastDummies)
library(cluster)
library(dplyr)
Create a sample dataset
set.seed(3)
# Note: the length-10 vectors below are recycled five times to fill the 50 rows
sampleData <- data.frame(id = 1:50,
                         gender = sample(c("Male", "Female"), 10, replace = TRUE),
                         age_bracket = sample(c("0-10", "11-30", "31-60", ">60"),
                                              10, replace = TRUE),
                         income = rnorm(10, 40, 10),
                         volume = rnorm(50, 40, 100))
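Create dummy (one-hot) columns and scale the numeric columns
sd1 <- sampleData %>%
  dummy_cols(select_columns = c("gender", "age_bracket")) %>%  # one 0/1 column per level
  mutate(id = factor(id)) %>%                                  # keeps id out of the scaling step
  select(-c(gender, age_bracket)) %>%                          # drop the original categorical columns
  mutate_if(is.numeric, scale)                                 # scales the 0/1 dummy columns as well
glimpse(sd1)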
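One thing worth checking at this point: id has been turned into a factor, so it is skipped by mutate_if(is.numeric, scale) but still stays in sd1. As far as I understand, pam() coerces a data frame with data.matrix(), which replaces a factor by its integer codes, so id would enter the distance calculation as an unscaled variable. A minimal base-R sketch of that coercion follows; the toy data frame is made up for illustration and is not part of the original post.

```r
# Hypothetical toy data: a factor id next to one already-scaled numeric column
toy <- data.frame(id = factor(c(101, 102, 103)),
                  x  = c(0.1, -0.2, 0.3))

# data.matrix() replaces factors by their internal integer codes
m <- data.matrix(toy)
m[, "id"]   # the labels 101, 102, 103 are gone; only the codes remain

# Dropping the identifier before clustering avoids the problem
toy_no_id <- toy[, setdiff(names(toy), "id"), drop = FALSE]
names(toy_no_id)
```

If that reading of pam() is right, removing id (for example with select(-id)) before the clustering call would be the safer choice.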
Fit the clustering model with the pam() function and k = 3 (pam() performs k-medoids clustering rather than k-means)
sd2 <- pam(sd1, k = 3)
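For context, pam() (Partitioning Around Medoids) is a k-medoids method: each cluster is represented by one actual observation, whereas kmeans() represents a cluster by the mean of its members. A small sketch on made-up numeric data (the matrix x is an assumption for illustration only) shows the difference:

```r
library(cluster)

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)   # 20 hypothetical observations, 2 variables

km <- kmeans(x, centers = 3)       # centers are averages, usually not rows of x
pm <- pam(x, k = 3)                # medoids are rows of x

pm$id.med                          # row indices of the medoid observations
# The medoid coordinates are exactly the corresponding rows of the input
isTRUE(all.equal(unname(pm$medoids), unname(x[pm$id.med, , drop = FALSE])))
```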
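Extract the vector of cluster assignments from the model
sd3 <- sd2$clustering  # full component name; sd2$cluster only works through partial matching
Build the segment_customers data frame
sd4 <- mutate(sd1, cluster = sd3)
Count the size of each cluster
count(sd4, cluster)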
[Discussion]:
Tags: r cluster-analysis
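A commonly suggested alternative for mixed categorical and continuous data is to skip the dummy coding and compute a Gower dissimilarity with daisy() from the cluster package, then pass that dissimilarity to pam(). The sketch below is only one possible approach, not a definitive answer. It rebuilds a toy dataset similar to the one in the question, but with 50 independent draws per column and without the id column, which are my assumptions.

```r
library(cluster)

set.seed(3)
n <- 50
# Hypothetical data modelled on the question; categorical columns kept as factors
toy <- data.frame(
  gender      = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  age_bracket = factor(sample(c("0-10", "11-30", "31-60", ">60"), n, replace = TRUE)),
  income      = rnorm(n, 40, 10),
  volume      = rnorm(n, 40, 100)
)

# Gower dissimilarity handles factors and numeric columns together
gower_dist <- daisy(toy, metric = "gower")

# k-medoids on the precomputed dissimilarity
fit <- pam(gower_dist, k = 3, diss = TRUE)

table(fit$clustering)        # cluster sizes
fit$silinfo$avg.width        # average silhouette width, a rough quality check
```

Comparing the average silhouette width across several values of k is one simple way to judge whether k = 3 is a reasonable choice.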