【Posted】: 2021-07-18 05:46:24
【Question】:
I want to install Scrapy on Windows Server 2019, running in a Docker container (see here and here for my installation history).
On my local Windows 10 machine I can run my Scrapy commands in Windows PowerShell (after simply starting Docker Desktop):
scrapy crawl myscraper -o allobjects.json in the folder C:\scrapy\my1stscraper\
For Windows Server, as recommended here, I first installed Anaconda following these steps: https://docs.scrapy.org/en/latest/intro/install.html.
Then I opened the Anaconda prompt and typed conda install -c conda-forge scrapy in D:\Programs
(base) PS D:\Programs> dir
Directory: D:\Programs
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 4/22/2021 10:52 AM Anaconda3
-a---- 4/22/2021 11:20 AM 0 conda
(base) PS D:\Programs> conda install -c conda-forge scrapy
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.9.2
latest version: 4.10.1
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: D:\Programs\Anaconda3
added / updated specs:
- scrapy
The following packages will be downloaded:
package | build
---------------------------|-----------------
automat-20.2.0 | py_0 30 KB conda-forge
conda-4.10.1 | py38haa244fe_0 3.1 MB conda-forge
constantly-15.1.0 | py_0 9 KB conda-forge
cssselect-1.1.0 | py_0 18 KB conda-forge
hyperlink-21.0.0 | pyhd3deb0d_0 71 KB conda-forge
incremental-17.5.0 | py_0 14 KB conda-forge
itemadapter-0.2.0 | pyhd8ed1ab_0 12 KB conda-forge
parsel-1.6.0 | py_0 15 KB conda-forge
pyasn1-0.4.8 | py_0 53 KB conda-forge
pyasn1-modules-0.2.7 | py_0 60 KB conda-forge
pydispatcher-2.0.5 | py_1 12 KB conda-forge
pyhamcrest-2.0.2 | py_0 29 KB conda-forge
python_abi-3.8 | 1_cp38 4 KB conda-forge
queuelib-1.6.1 | pyhd8ed1ab_0 14 KB conda-forge
scrapy-2.4.1 | py38haa95532_0 372 KB
service_identity-18.1.0 | py_0 12 KB conda-forge
twisted-21.2.0 | py38h294d835_0 5.1 MB conda-forge
twisted-iocpsupport-1.0.1 | py38h294d835_0 49 KB conda-forge
w3lib-1.22.0 | pyh9f0ad1d_0 21 KB conda-forge
------------------------------------------------------------
Total: 9.0 MB
The following NEW packages will be INSTALLED:
automat conda-forge/noarch::automat-20.2.0-py_0
constantly conda-forge/noarch::constantly-15.1.0-py_0
cssselect conda-forge/noarch::cssselect-1.1.0-py_0
hyperlink conda-forge/noarch::hyperlink-21.0.0-pyhd3deb0d_0
incremental conda-forge/noarch::incremental-17.5.0-py_0
itemadapter conda-forge/noarch::itemadapter-0.2.0-pyhd8ed1ab_0
parsel conda-forge/noarch::parsel-1.6.0-py_0
pyasn1 conda-forge/noarch::pyasn1-0.4.8-py_0
pyasn1-modules conda-forge/noarch::pyasn1-modules-0.2.7-py_0
pydispatcher conda-forge/noarch::pydispatcher-2.0.5-py_1
pyhamcrest conda-forge/noarch::pyhamcrest-2.0.2-py_0
python_abi conda-forge/win-64::python_abi-3.8-1_cp38
queuelib conda-forge/noarch::queuelib-1.6.1-pyhd8ed1ab_0
scrapy pkgs/main/win-64::scrapy-2.4.1-py38haa95532_0
service_identity conda-forge/noarch::service_identity-18.1.0-py_0
twisted conda-forge/win-64::twisted-21.2.0-py38h294d835_0
twisted-iocpsuppo~ conda-forge/win-64::twisted-iocpsupport-1.0.1-py38h294d835_0
w3lib conda-forge/noarch::w3lib-1.22.0-pyh9f0ad1d_0
The following packages will be UPDATED:
conda pkgs/main::conda-4.9.2-py38haa95532_0 --> conda-forge::conda-4.10.1-py38haa244fe_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
constantly-15.1.0 | 9 KB | ############################################################################ | 100%
itemadapter-0.2.0 | 12 KB | ############################################################################ | 100%
twisted-21.2.0 | 5.1 MB | ############################################################################ | 100%
pydispatcher-2.0.5 | 12 KB | ############################################################################ | 100%
queuelib-1.6.1 | 14 KB | ############################################################################ | 100%
service_identity-18. | 12 KB | ############################################################################ | 100%
pyhamcrest-2.0.2 | 29 KB | ############################################################################ | 100%
cssselect-1.1.0 | 18 KB | ############################################################################ | 100%
automat-20.2.0 | 30 KB | ############################################################################ | 100%
pyasn1-0.4.8 | 53 KB | ############################################################################ | 100%
twisted-iocpsupport- | 49 KB | ############################################################################ | 100%
python_abi-3.8 | 4 KB | ############################################################################ | 100%
hyperlink-21.0.0 | 71 KB | ############################################################################ | 100%
conda-4.10.1 | 3.1 MB | ############################################################################ | 100%
scrapy-2.4.1 | 372 KB | ############################################################################ | 100%
incremental-17.5.0 | 14 KB | ############################################################################ | 100%
w3lib-1.22.0 | 21 KB | ############################################################################ | 100%
pyasn1-modules-0.2.7 | 60 KB | ############################################################################ | 100%
parsel-1.6.0 | 15 KB | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) PS D:\Programs>
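Then, in PowerShell on my VPS, I tried to run scrapy via D:\Programs\Anaconda3\Scripts\scrapy.exe
I want to run the spider stored in the folder D:\scrapy\my1stscraper, see:
The Docker Engine service runs as a Windows service (I assume I don't need to explicitly start a container when running my scrapy command; if I do, I don't know how):
I tried to start my scraper like this: D:\Programs\Anaconda3\Scripts\scrapy.exe crawl D:\scrapy\my1stscraper\spiders\my1stscraper -o allobjects.json, which resulted in this error: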
Traceback (most recent call last):
File "D:\Programs\Anaconda3\Scripts\scrapy-script.py", line 6, in <module>
from scrapy.cmdline import execute
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\__init__.py", line 12, in <module>
from scrapy.spiders import Spider
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 11, in <module>
from scrapy.http import Request
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\http\__init__.py", line 11, in <module>
from scrapy.http.request.form import FormRequest
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\http\request\form.py", line 10, in <module>
import lxml.html
File "D:\Programs\Anaconda3\lib\site-packages\lxml\html\__init__.py", line 53, in <module>
from .. import etree
ImportError: DLL load failed while importing etree: The specified module could not be found.
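A minimal sketch (not from the original post) for narrowing down which import in the chain above actually fails. It assumes it is run with the same interpreter that backs scrapy.exe, here presumably D:\Programs\Anaconda3\python.exe; the helper name check_import is hypothetical.

```python
import importlib


def check_import(module_name):
    """Try to import a module and report the outcome as (ok, detail)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # "DLL load failed" surfaces as ImportError; a missing package
        # surfaces as ModuleNotFoundError, which is a subclass of it.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Walk the traceback's chain from the innermost import outwards.
    for name in ("lxml", "lxml.etree", "scrapy"):
        ok, detail = check_import(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
```

If lxml imports but lxml.etree does not, that would point at a native library that cannot be located rather than at Scrapy itself.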
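I checked here: from lxml import etree ImportError: DLL load failed: The specified module could not be found
That question talks about pip, which I did not use, but to be sure I did install the C++ build tools:
I still get the same error. How can I run my Scrapy crawler in a Docker container?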
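UPDATE 1
My VPS is my only environment, so I'm not sure how to test in a virtual environment.
What I have done now:
- Uninstalled Anaconda
- Installed Miniconda with Python 3.8 (https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe), did not add it to PATH, and did register Miniconda as the system's Python 3.8
Looking at your suggestions:
Get the steps to install the app manually on Windows Server - ideally test in a virtualized environment so you can reset it cleanly
- When you say app, what do you mean? Scrapy? Conda?
Convert all steps to a fully automated PowerShell script (e.g. for conda, you need to download the installer via wget, execute the installer, etc.)
- I have now installed Conda on the host OS, since I thought that would give me the least overhead. Or should I install it directly in the image, and if so, how do I avoid having to install it every time?
- Finally, just to be sure: I want to run multiple Scrapy scrapers, but with as little overhead as possible. I should just repeat the RUN command in the same Docker container for each scraper I want to execute, correct?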
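One possible way to keep overhead low, sketched here as an assumption rather than as the answer given in the thread, is to run several spiders in a single Python process with Scrapy's CrawlerProcess. The sketch assumes all spiders belong to one Scrapy project and that the script is started from the project directory (where scrapy.cfg lives); the spider name "other" is purely hypothetical.

```python
def parse_names(raw):
    """Split a comma-separated list of spider names, dropping blanks."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def run_spiders(spider_names):
    """Run several spiders of the current Scrapy project in one process."""
    # Imported lazily so this module still loads where Scrapy is absent.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    for name in spider_names:
        process.crawl(name)  # schedule each spider by its name
    process.start()  # blocks until every scheduled crawl has finished


if __name__ == "__main__":
    import sys

    # Example: python run_all.py "myscraper,other"
    if len(sys.argv) > 1:
        run_spiders(parse_names(sys.argv[1]))
```

With this shape, a single container start could cover all spiders instead of one start per spider.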
UPDATE 2
whoami does return user manager\containeradministrator
scrapy benchmark returns
Scrapy 2.4.1 - no active project
Unknown command: benchmark
Use "scrapy" to see available commands
The Scrapy project I want to run is in the folder D:\scrapy\my1stscraper. How can I run that project, given that there is no D:\ drive in my container?
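A hedged sketch of one way this might work: bind-mounting the host folder into the container with docker run -v and using it as the working directory. The image name scrapy and the spider name myscraper are taken from elsewhere in this post; the container path C:\scrapy\my1stscraper and the helper build_docker_run are assumptions, as is the expectation that scrapy is on PATH inside the image.

```python
def build_docker_run(host_dir, container_dir, image, spider, output):
    """Build the argv for a one-off crawl with the project bind-mounted."""
    return [
        "docker", "run", "--rm",              # remove the container afterwards
        "-v", f"{host_dir}:{container_dir}",  # host folder -> container folder
        "-w", container_dir,                  # run from the project directory
        image,
        "scrapy", "crawl", spider, "-o", output,
    ]


if __name__ == "__main__":
    cmd = build_docker_run(
        r"D:\scrapy\my1stscraper",   # project folder on the host
        r"C:\scrapy\my1stscraper",   # assumed mount point in the container
        "scrapy",                    # image tag from "docker build . -t scrapy"
        "myscraper",
        "allobjects.json",
    )
    # Only printed here; pass cmd to subprocess.run(cmd, check=True) to execute.
    print(" ".join(cmd))
```

Because the output file is written inside the mounted folder, allobjects.json would end up in D:\scrapy\my1stscraper on the host.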
UPDATE 3
A few months after we discussed this, the Dockerfile you proposed now breaks when I run it, and I get this output:
PS D:\Programs> docker build . -t scrapy
Sending build context to Docker daemon 1.644GB
Step 1/9 : FROM mcr.microsoft.com/windows/servercore:ltsc2019
---> d1724c2d9a84
Step 2/9 : SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
---> Running in 5f79f1bf9b62
Removing intermediate container 5f79f1bf9b62
---> 8bb2a477eaca
Step 3/9 : RUN setx /M PATH $('C:\Users\ContainerAdministrator\miniconda3\Library\bin;C:\Users\ContainerAdministrator\miniconda3\Scripts;C:\Users\ContainerAdministrator\miniconda3;' + $Env:PATH)
---> Running in f3869c4f64d5
SUCCESS: Specified value was saved.
Removing intermediate container f3869c4f64d5
---> 82a2fa969a88
Step 4/9 : RUN Invoke-WebRequest "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile miniconda3.exe -UseBasicParsing; Start-Process -FilePath 'miniconda3.exe' -Wait -ArgumentList '/S', '/D=C:\Users\ContainerAdministrator\miniconda3'; Remove-Item .\miniconda3.exe; conda install -y -c conda-forge scrapy;
---> Running in 3eb8b7bfe878
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with the existing python installation in your environment:
Specifications:
- scrapy -> python[version='2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.5,<3.6.0a0|3.4.*']
Your python: python=3.9
If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.
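I'm not sure I'm reading this correctly, but it seems Scrapy does not support Python 3.9, even though here I see "Scrapy requires Python 3.6+": https://docs.scrapy.org/en/latest/intro/install.html. Do you know what is causing this? I also checked here, but found no answer there either.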
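To make the solver message easier to read, here is a small sketch that checks a Python version against the constraint string printed in the output above. It is a simplification that compares only major.minor and is not conda's real matcher; the names SPEC and matches are hypothetical.

```python
# Constraint copied from the "scrapy -> python[version='...']" line above.
SPEC = (
    "2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|"
    ">=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.5,<3.6.0a0|3.4.*"
)


def _major_minor(text):
    """'3.9.0a0' -> (3, 9); anything after major.minor is ignored."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def matches(version, spec=SPEC):
    """Return True if version satisfies at least one '|' alternative."""
    have = _major_minor(version)
    for alternative in spec.split("|"):
        if alternative.endswith(".*"):
            # Wildcard form such as "3.6.*"
            if have == _major_minor(alternative[:-2]):
                return True
        else:
            # Range form such as ">=3.8,<3.9.0a0"
            lower, upper = alternative.split(",")
            if _major_minor(lower[2:]) <= have < _major_minor(upper[1:]):
                return True
    return False


if __name__ == "__main__":
    for candidate in ("3.8", "3.9"):
        print(candidate, matches(candidate))
```

Read this way, the message appears to say that the scrapy builds found in the searched channels stop at Python 3.8, while Miniconda3-latest now ships Python 3.9. As the output itself notes, conda will not change the minor version unless asked to, so explicitly requesting python=3.8 in the install command may be worth trying.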
【Discussion】:
Tags: python docker scrapy anaconda windows-server-2019