Java – Apache Nutch – path problem
I am trying to set up Apache Nutch to crawl URLs by following this guide. Because the guide is old (it was written for 1.x, and I use 2.3), I made the changes needed for the different directory structure. However, when I try to run a crawl, I get this error:
root@IndiStage:~# /usr/local/nutch/framework/apache-nutch-2.3/src/bin/crawl urls FirstCrawl 2
No SOLRURL specified. Skipping indexing.
Injecting seed URLs
/usr/local/nutch/framework/apache-nutch-2.3/src/bin/nutch inject urls -crawlId FirstCrawl
Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob
Error running: /usr/local/nutch/framework/apache-nutch-2.3/src/bin/nutch inject urls -crawlId FirstCrawl
Failed with exit value 1.
root@IndiStage:~#
I am new to Ubuntu (14.04), so I find it difficult to manage the directory structure and paths here.
InjectorJob is located in /usr/local/nutch/framework/apache-nutch-2.3/src/java/org/apache/nutch/crawl
JAVA_HOME is set to /usr/lib/jvm/java-7-openjdk-amd64
Solution
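Make sure you have compiled the Nutch source code. The src/java directory holds only the InjectorJob source file, not a compiled class, which is why the script under src/bin cannot load it. Then run the crawl command from ${APACHE_NUTCH_HOME}/runtime/local (or ${APACHE_NUTCH_HOME}/runtime/deploy/bin).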
I hope it helps,
Le Quoc Do
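
The answer above can be sketched as a short shell snippet. This is a minimal sketch, not part of the original answer. It assumes the source tree is unpacked at the path from the question and that the build is done with Ant's `runtime` target, which is how the Nutch build normally produces runtime/local. The variable NUTCH_HOME and the helper find_crawl_script are hypothetical names introduced only for illustration.

```shell
# Assumption: NUTCH_HOME is the unpacked Nutch 2.3 source tree (path taken from the question).
NUTCH_HOME="${NUTCH_HOME:-/usr/local/nutch/framework/apache-nutch-2.3}"

# Hypothetical helper: print the path of the built crawl script,
# or explain that the build step has not been run yet.
find_crawl_script() {
    home="$1"
    if [ -x "$home/runtime/local/bin/crawl" ]; then
        echo "$home/runtime/local/bin/crawl"
    else
        echo "No runtime/local/bin/crawl under $home; build first (cd $home && ant runtime)" >&2
        return 1
    fi
}

# Typical sequence, left commented out so the snippet has no side effects:
#   cd "$NUTCH_HOME" && ant runtime                  # compile and create runtime/local
#   cd "$NUTCH_HOME/runtime/local"
#   bin/crawl urls FirstCrawl 2                      # same arguments as in the question
```

Calling `find_crawl_script "$NUTCH_HOME"` before crawling shows whether the compiled runtime exists. If it fails, the source has not been built yet, and the script under src/bin will keep reporting that it cannot load org.apache.nutch.crawl.InjectorJob.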