Add a column with function result based on other columns

[Question title]: Add a column with function result based on other columns
[Posted]: 2018-07-02 09:43:18
[Question description]:

I have the following data frame:

Latitude , Longitude, Altitude
44.388401, 8.433392 , 463.000000
44.388571, 8.434575 , 471.000000
44.388740, 8.435758 , 507.000000
44.388910, 8.436941 , 563.000000
44.389079, 8.438123 , 606.000000
44.389249, 8.439306 , 629.000000
44.389418, 8.440489 , 639.000000
44.389588, 8.441672 , 640.000000
44.389757, 8.442854 , 590.000000
44.389927, 8.444037 , 564.000000
44.390096, 8.445220 , 543.000000
44.390265, 8.446403 , 527.000000
44.390435, 8.447585 , 469.000000

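The first two columns are latitude and longitude (in degrees), and the third is altitude. What I want to do is add a column giving the distance of each observed position from the position of the first observation, for example (the distances are not accurate, they are only for illustration):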

Latitude , Longitude, Distance , Altitude
44.388401, 8.433392 ,  0.000000, 463.000000
44.388571, 8.434575 , 10.000000, 471.000000
44.388740, 8.435758 , 21.000000, 507.000000
44.388910, 8.436941 , 25.231232, 563.000000
44.389079, 8.438123 , 33.211333, 606.000000
44.389249, 8.439306 , 55.000000, 629.000000
...

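I know I can use the distm function from the geosphere library, but the question is: how do I add a column whose values are computed by a function that takes, as arguments, values from the same observation and values from the first observation?

I have seen this post, but it only covers computing a new column from other values of the same observation, not from the same observation together with the first one, which is what I need.

[Comments]:

Tags: r dataframe calculated-columns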

[Solution 1]:

I am not sure why the distm function is written the way it is, but this should work:

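library(dplyr)
library(geosphere)  # provides distm()

# Put the data in a data frame (first two rows of the question's data)
df <- data.frame(Latitude = c(44.388401, 44.388571),
                 Longitude = c(8.433392, 8.434575),
                 Altitude = c(463, 471))

# Extract the two required columns; distm() expects longitude first, then latitude
start_point <- df %>% select(Longitude, Latitude) %>% filter(row_number() == 1)
lon_lat <- df %>% select(Longitude, Latitude)

# Calculate the distance (in metres) from every row to the first row.
# distm() returns an n x 1 matrix here, so flatten it into a plain numeric column.
df %>% mutate(Distance = as.numeric(distm(as.matrix(lon_lat), as.matrix(start_point))))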
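
As a possible alternative, distHaversine from geosphere accepts a two-column matrix of points and a single reference point, so the whole column can be computed in one call without building a distance matrix. The following is only a sketch under those assumptions: it uses the first three rows of the question's data, and the name `origin` is introduced here purely for illustration.

```r
library(geosphere)

# First three observations from the question
df <- data.frame(Latitude  = c(44.388401, 44.388571, 44.388740),
                 Longitude = c(8.433392, 8.434575, 8.435758),
                 Altitude  = c(463, 471, 507))

# Reference point (illustrative name): the first observation, longitude before latitude
origin <- c(df$Longitude[1], df$Latitude[1])

# distHaversine() takes a (lon, lat) matrix and a single point and
# returns one great-circle distance in metres per row
df$Distance <- distHaversine(cbind(df$Longitude, df$Latitude), origin)

# Optional: move Distance in front of Altitude, as in the desired output
df <- df[, c("Latitude", "Longitude", "Distance", "Altitude")]
```

The first row's distance is zero by construction, since it is compared with itself.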

[Comments]:

[Solution 2]:

If I understand the question correctly, you can use pmap_dbl from purrr:

library(dplyr)
library(geosphere)
library(purrr)

df %>%
  mutate(Distance = pmap_dbl(., ~distm(c(..2, ..1), 
                                       c(Longitude[1], Latitude[1]), 
                                       fun = distHaversine)))

Sample data:

df <- structure(list(Latitude = c(44.388401, 44.388571, 44.38874, 44.38891, 
44.389079, 44.389249, 44.389418, 44.389588, 44.389757, 44.389927, 
44.390096, 44.390265, 44.390435), Longitude = c(8.433392, 8.434575, 
8.435758, 8.436941, 8.438123, 8.439306, 8.440489, 8.441672, 8.442854, 
8.444037, 8.44522, 8.446403, 8.447585), Altitude = c(463, 471, 
507, 563, 606, 629, 639, 640, 590, 564, 543, 527, 469)), .Names = c("Latitude", 
"Longitude", "Altitude"), class = "data.frame", row.names = c(NA, 
-13L))
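
For readers who would rather not depend on geosphere, the same "compare every row with the first row" pattern can be written in base R. This is a sketch, not the answerer's method: `haversine_m` is a hypothetical helper written for this example, and the Earth radius of 6378137 m is an assumption chosen to match the default radius of geosphere's distHaversine.

```r
# Hypothetical helper: great-circle distance in metres via the haversine formula.
# All angles are in degrees; r is the assumed Earth radius in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# First three observations from the question
df <- data.frame(Latitude  = c(44.388401, 44.388571, 44.388740),
                 Longitude = c(8.433392, 8.434575, 8.435758),
                 Altitude  = c(463, 471, 507))

# Each row is compared with the first row: Longitude[1] and Latitude[1]
# are recycled across all rows because the helper is vectorised.
df$Distance <- with(df, haversine_m(Longitude[1], Latitude[1], Longitude, Latitude))
```

The same shape applies to any vectorised function: pass the full column for the "same observation" arguments and `column[1]` for the "first observation" arguments.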

[Comments]: