[Posted]: 2013-04-23 11:00:29
[Question]:
I am trying to crawl a website with Nutch and get this error:
java.net.MalformedURLException: no protocol:
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.nutch.crawl.Injector.inject(Injector.java:296)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
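The `no protocol:` text is the message that the `java.net.URL` constructor puts on the `MalformedURLException` it throws when a URL string has no scheme such as `http://`. A URL without a scheme, for example a seed written as `www.example.com`, is one common way to trigger it. That this is the cause of the failure above is an assumption, not something the trace confirms. The minimal sketch below reproduces the message. The class name `NoProtocolDemo`, the helper `check`, and the sample strings are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NoProtocolDemo {
    // Returns null if the string parses as a URL; otherwise returns the exception message.
    static String check(String spec) {
        try {
            new URL(spec); // deprecated since JDK 20, but still available
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical seed lines: one with a scheme, one without, and an empty one.
        String[] seeds = {"http://www.example.com/", "www.example.com", ""};
        for (String s : seeds) {
            String msg = check(s);
            System.out.println("\"" + s + "\" -> " + (msg == null ? "ok" : msg));
        }
    }
}
```

The empty string yields a message that ends right after `no protocol:`. This is one possible reading of the bare message in the trace above, but the original text may simply have been cut off.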
[Discussion]:
Tags: web-crawler nutch