Referer: http://www.quora.com/How-can-R-and-Hadoop-be-used-together/answer/Jay-Kreps?srid=OVd9&share=1

 

Another way to answer this question is that they don't really integrate very well.

The advantage of R is not its syntax but rather the incredible library of primitives for visualization and statistics. These libraries are fundamentally non-distributed, and almost always operate on data resident in memory. So, for example, if you are finding R's glm method slow (or completely infeasible) on a particular dataset, there is really no way to make it run faster with Hadoop.

The reason it is important to point this out is that
1. Transparently distributed R is every data geeks wet dream.
2. I have sat through numerous presentations from distributed database vendors claiming to provide this.

What hadoop and database vendors can provide is the ability to run R in parallel on lots of little data sets. Virtually none of the libraries will work on a data set larger than memory.

 

Referer: http://www.quora.com/How-can-R-and-Hadoop-be-used-together/answer/Jay-Kreps?srid=OVd9&share=1

相关文章:

  • 2021-12-31
  • 2022-12-23
  • 2022-12-23
  • 2021-07-22
  • 2022-12-23
  • 2021-10-05
  • 2021-07-10
  • 2021-05-23
猜你喜欢
  • 2021-12-01
  • 2022-12-23
  • 2022-12-23
  • 2021-12-25
  • 2022-12-23
  • 2022-12-23
  • 2021-07-29
相关资源
相似解决方案