[Posted]: 2018-01-27 21:02:33
[Question]:
I set up an Azure Data Science Virtual Machine on Linux (Ubuntu) and ran the following in a terminal to enable the Remote R workspace, RStudio Server, R Server Operationalization, and Hadoop:
sudo apt update
sudo apt -y upgrade
# Hadoop is installed but doesn't seem to appear on the PATH or have its environment variable set by default
echo "" >> ~/.bashrc
echo 'export PATH=$PATH:/opt/hadoop/hadoop-2.7.4/bin' >> ~/.bashrc
echo 'export HADOOP_HOME=/opt/hadoop/hadoop-2.7.4' >> ~/.bashrc
#
source ~/.bashrc
#Setting up a password as none exists to begin with because of private key selection in the installation
#RStudio Server requires a password though
printf 'MyPassword\nMyPassword\n' | sudo passwd sshuser
#Unfortunately hadoop fails on Data Science Virtual Machine
#error: mkdir: Call From IM-DSonUbuntu/192.168.5.4 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
# hadoop fs -mkdir /user/RevoShare/rserve2
# hadoop fs -chmod uog+rwx /user/RevoShare/rserve2
sudo mkdir -p /var/RevoShare/rserve2
sudo chmod uog+rwx /var/RevoShare/rserve2
# hadoop fs -mkdir /user/RevoShare/sshuser
# hadoop fs -chmod uog+rwx /user/RevoShare/sshuser
sudo mkdir -p /var/RevoShare/sshuser
sudo chmod uog+rwx /var/RevoShare/sshuser
#Setting up R Server Operationalisation
cd /opt/microsoft/mlserver/9.2.1/o16n
sudo dotnet Microsoft.MLServer.Utils.AdminUtil/Microsoft.MLServer.Utils.AdminUtil.dll -silentoneboxinstall MyPassword
#They say this Data Science Virtual Machine already has RStudio Server, but even though the port 8787 is open, it's nowhere to be found! So installing it now, and after the installation it's accessible by refreshing the page that failed before.
#Perhaps it's not installed then? Or a service is not running like it should?
#https://www.rstudio.com/products/rstudio/download-server/
wget https://download2.rstudio.org/rstudio-server-1.1.414-amd64.deb
yes | sudo gdebi rstudio-server-1.1.414-amd64.deb
#The downloaded files are small, so leave them for debugging as evidence that the script ran this far.
#sudo rm rstudio-server-1.1.414-amd64.deb
# Remote R workspace Service needs dotnet sdk
curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
sudo mv microsoft.gpg /etc/apt/trusted.gpg.d/microsoft.gpg
sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-xenial-prod xenial main" > /etc/apt/sources.list.d/dotnetdev.list'
sudo apt update
sudo apt -y install dotnet-sdk-2.0.0
sudo apt -y install libxml2-dev
#Downloading and installing the Remote R service
wget -O rtvs-daemon.tar.gz https://aka.ms/r-remote-services-linux-binary-current
tar -xvzf rtvs-daemon.tar.gz
sudo ./rtvs-install -s
sudo systemctl enable rtvsd
sudo systemctl start rtvsd
#sudo rm rtvs-daemon.tar.gz
#sudo rm rtvs-install
#Fixing Remote R: For some reason, even though 'sudo systemctl enable rtvsd' runs, after every reboot the service won't become automatically active. So let's fix that.
wget https://sa0im0general.blob.core.windows.net/general-blob-container/StartRemoteRAfterReboot.sh
sudo mv StartRemoteRAfterReboot.sh /var/RevoShare/StartRemoteRAfterReboot.sh
sudo /sbin/shutdown -r 5
sudo chown root /etc/rc.local
sudo chmod 755 /etc/rc.local
sudo systemctl enable rc-local.service
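#With "sudo echo ... >> file" the redirection runs as the calling user, so append through "sudo tee -a" instead of opening a root shell
sudo find /etc/ -name "rc.local" -exec sed -i 's/exit 0//g' {} \;
echo "" | sudo tee -a /etc/rc.local > /dev/null
echo "sh /var/RevoShare/StartRemoteRAfterReboot.sh" | sudo tee -a /etc/rc.local > /dev/null
echo "exit 0" | sudo tee -a /etc/rc.local > /dev/null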
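Since the script notes that rtvsd does not come up after a reboot even though it was enabled, a rough way to see what systemd reports for the units involved is sketched below. The helper name service_state is hypothetical and not part of the original script; it only wraps the standard systemctl is-enabled and is-active queries.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): print what systemd
# reports for a unit, without failing when the unit or systemctl is missing.
service_state() {
    unit=$1
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "$unit: systemctl not available"
        return 0
    fi
    # Both queries exit non-zero for disabled/inactive units, hence "|| true".
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    active=$(systemctl is-active "$unit" 2>/dev/null) || true
    echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
}

# Unit names taken from the script above.
service_state rtvsd
service_state rc-local.service
```

If rtvsd shows as enabled but not active after a reboot, "sudo journalctl -u rtvsd -b" should show what the unit logged during the current boot.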
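I also tried the following one at a time to see whether any of them affected RStudio Server (they did not, and even if one had, I want a global solution that also covers the Remote R workspace service and R Server Operationalization, not just RStudio Server):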
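#Configuring RStudio Server to see the R Server R
#Appending through "sudo tee -a" because a plain ">>" redirection to /etc runs as the calling user
echo "rsession-which-r=/opt/microsoft/mlserver/9.2.1/bin/R/R" | sudo tee -a /etc/rstudio/rserver.conf > /dev/null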
export RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R
echo 'export RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R' >> ~/.profile
source ~/.profile
echo 'export RSTUDIO_WHICH_R=/opt/microsoft/mlserver/9.2.1/bin/R/R' >> ~/.bashrc
source ~/.bashrc
echo 'export PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R' >> ~/.bashrc
export PATH=$PATH:/opt/microsoft/mlserver/9.2.1/bin/R
source ~/.bashrc
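Because the attempts above all change PATH or environment variables for one login shell, it may help to see which R binary a given PATH actually resolves to. The sketch below is only an illustration: resolve_r is a hypothetical helper, and the second call assumes the ML Server directory used in the commands above. Services such as RStudio Server typically do not start from a login shell, so edits to ~/.bashrc may never reach them.

```shell
#!/bin/sh
# Hypothetical helper: report the real file that "R" resolves to under a
# given PATH value (defaults to the current PATH), following symlinks.
resolve_r() {
    search_path="${1:-$PATH}"
    # The command substitution is a subshell, so PATH is only changed there.
    r_bin=$(PATH="$search_path"; command -v R 2>/dev/null) || {
        echo "R not found"
        return 1
    }
    readlink -f "$r_bin"
}

echo "current shell:        $(resolve_r)"
# Assumption: the ML Server bin directory from the commands above, placed first.
echo "ML Server dir first:  $(resolve_r "/opt/microsoft/mlserver/9.2.1/bin/R:$PATH")"
```

The commands above append the ML Server directory to the end of PATH, so any R found in an earlier PATH entry would still win; comparing the two lines makes that visible.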
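The problem is that even though "which R" points to R Server's R (typing "sudo R" prints the message "Loading Microsoft R Server packages, version 9.2.1." and loads packages such as RevoScaleR), nothing else behaves the same way: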
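- Accessing RStudio Server at http://THE-IP-GOES-HERE.westeurope.cloudapp.azure.com:8787 and logging in as the initial user ("sshuser"), or as any other user, does not load R Server, and the RevoScaleR rx functions are not available.
- Accessing the remote workspace from my local Visual Studio 2017, via "Add Connection" on the "Workspaces" tab, loads MRO and reports:
  Installed R versions:
  [0] Microsoft R Open '3.4.1.1347' (Default)
- Finally, when I use R Server Operationalization and log in with "remoteLogin()" from the "mrsdeploy" package, R Server packages such as RevoScaleR are again not loaded, so something like "rxSummary(~., data=iris)" fails with the error 'could not find function "rxSummary"'.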
Exactly the same thing happened when I deployed "Machine Learning Server 9.2.1 on Linux (Ubuntu)" from Azure.
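To confirm which R installation on disk actually ships RevoScaleR, one option is to search each install tree for the package directory. This is a minimal sketch: find_r_package is a hypothetical helper, the ML Server root is taken from the paths above, and the Microsoft R Open root is an assumption about where MRO is installed on this VM.

```shell
#!/bin/sh
# Hypothetical helper: list copies of an R package under an install root.
# An installed R package is a directory that contains a DESCRIPTION file.
find_r_package() {
    root=$1
    pkg=$2
    if [ ! -d "$root" ]; then
        echo "no such directory: $root"
        return 0
    fi
    find "$root" -type f -path "*/$pkg/DESCRIPTION" 2>/dev/null || true
}

# ML Server root as used elsewhere in this question.
find_r_package /opt/microsoft/mlserver/9.2.1 RevoScaleR
# Assumed MRO root; adjust if Microsoft R Open lives elsewhere.
find_r_package /opt/microsoft/ropen RevoScaleR
```

If the package only appears under the ML Server tree, any session that reports Microsoft R Open as its R would not be expected to find the rx functions.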
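I don't want to use plain open-source R; I want to be able to use R Server, which is why I deployed this VM. How can I make everything load R Server's R instead of Microsoft R Open, the way running "R" from the terminal does?
Since I have tried all of the above, and since R Server does load in the console, I am now wondering about permissions. Could it be that the Data Science VM lacks the right permissions by default to allow this? I'm at a loss.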
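One way to test the permissions theory, sketched here with a hypothetical helper named show_path_perms, is to list the owner and mode of every directory on the way to the R Server binary. A component without read or execute permission for other users would stand out.

```shell
#!/bin/sh
# Hypothetical helper: print "ls -ld" for a path and each of its parent
# directories, or a "missing" line for components that do not exist.
show_path_perms() {
    p=$1
    while [ -n "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        ls -ld "$p" 2>/dev/null || echo "missing: $p"
        p=$(dirname "$p")
    done
}

# Path of the R Server binary as used earlier in this question.
show_path_perms /opt/microsoft/mlserver/9.2.1/bin/R/R
```

On Ubuntu, "namei -l /opt/microsoft/mlserver/9.2.1/bin/R/R" from util-linux should give a similar per-component listing.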
[Discussion]: