Cygwin is used to run Nutch on Windows. Of course, you may run Nutch on Linux if desired.
2) Go to the Cygwin site and download setup.exe.
2) Run setup.exe to set up Cygwin. No additional package is required to run Nutch.
3) Download the Nutch package (please choose at least version 0.8).
4) Unzip the package, preferably to the Cygwin home folder for easy access.
5) Test that the installation works by typing the following in the nutch folder:
Verify that the following is shown:
6) Set CLASSPATH to the Lucene core JAR (the core version may vary):
7) Set JAVA_HOME
Note: When setting CLASSPATH or JAVA_HOME, do not include folders that have names with spaces in them.
For example, naming the Nutch folder 'Nutch 0.9' instead of 'Nutch-0.9' will result in the CLASSPATH or JAVA_HOME not being recognized.
8) Type the following to verify that the paths are set correctly: './bin/nutch crawl'
The above output will appear if CLASSPATH is set correctly.
Nutch is now ready to crawl and index.
For further information on how to use Nutch, please follow the tutorials on the Nutch website and the java.net introduction to Nutch. The URLs are given in the Introduction post.
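As a rough sketch of steps 6 and 7 together with the note about spaces, the two variables could be set in a Cygwin shell as shown below. The folder names, the JDK location and the Lucene JAR file name are hypothetical placeholders, not values from a real installation, and `has_space` is just a helper made up for this example.

```shell
# Hypothetical install locations; substitute your own.
NUTCH_DIR="$HOME/nutch-0.9"
export JAVA_HOME="/cygdrive/c/Java/jdk1.6.0_24"
# Hypothetical JAR name; the Lucene core version may vary.
export CLASSPATH="$NUTCH_DIR/lib/lucene-core-2.1.0.jar"

# Succeed (return 0) when the given value contains a space.
has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether either variable contains a space.
for name in JAVA_HOME CLASSPATH; do
  eval "value=\${$name}"
  if has_space "$value"; then
    printf '%s contains a space: %s\n' "$name" "$value"
  else
    printf '%s has no spaces: %s\n' "$name" "$value"
  fi
done
```

If either variable is reported as containing a space, rename the folder (for example 'Nutch-0.9' instead of 'Nutch 0.9') before running './bin/nutch crawl'.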
9 comments:
If you encounter this problem:
$ ./bin/nutch
./bin/nutch: line 15: syntax error near un
'/bin/nutch: line 15: `case "`uname`" in
the script has Windows (CRLF) line endings. Run d2u (dos2unix) to convert it:
$ d2u bin/nutch
bin/nutch: done.
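If d2u is not available in your Cygwin install, the same check and fix can be sketched with standard tools. This is only an illustration: `count_crlf_lines` and `strip_cr` are helper names made up here, and the demo runs on a throwaway file rather than on bin/nutch itself.

```shell
# Print the number of lines that end in a carriage return (CRLF endings).
count_crlf_lines() {
  grep -c "$(printf '\r')\$" "$1" || true
}

# Remove every carriage return from the file, keeping its permissions.
# Note: this deletes all CR characters, not only those at line ends.
strip_cr() {
  tmp="$1.tmp"
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Demo on a temporary file written with Windows line endings.
demo=$(mktemp)
printf 'case "`uname`" in\r\nesac\r\n' > "$demo"
printf 'CRLF lines before: %s\n' "$(count_crlf_lines "$demo")"
strip_cr "$demo"
printf 'CRLF lines after: %s\n' "$(count_crlf_lines "$demo")"
rm -f "$demo"
```

To apply it to the real script, you would call `strip_cr bin/nutch` from the nutch folder.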
For those who are having trouble because of the space in 'Program Files' and are getting an error like the one below:
$ ./bin/nutch crawl
./bin/nutch: line 158: C:\Program Files\Java\jdk1.6.0_24;/bin/java: No such file or directory
./bin/nutch: line 268: exec: C:\Program: not found
Here is the solution!
Create a new environment variable named NUTCH_JAVA_HOME.
Set its value as below (note that there is no trailing ';' and that the folder name is written the DOS way, as a short 8.3 name).
C:\PROGRA~1\Java\jdk1.6.0_24
To be precise, echo should print exactly this:
$ echo $NUTCH_JAVA_HOME
C:\PROGRA~1\Java\jdk1.6.0_24
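The two conditions mentioned above (no trailing ';' and no space) can be checked before running Nutch. The following is a minimal sketch; `strip_trailing_semicolons` and `validate_java_home` are hypothetical helper names, not part of Nutch or Cygwin.

```shell
# Remove any trailing ';' characters from a path value.
strip_trailing_semicolons() {
  value=$1
  while [ "${value%;}" != "$value" ]; do
    value=${value%;}
  done
  printf '%s\n' "$value"
}

# Print the cleaned value, or an error if it is empty or contains a space.
validate_java_home() {
  cleaned=$(strip_trailing_semicolons "$1")
  case "$cleaned" in
    "")
      printf 'error: value is empty\n' >&2
      return 1 ;;
    *" "*)
      printf 'error: value contains a space: %s\n' "$cleaned" >&2
      return 1 ;;
  esac
  printf '%s\n' "$cleaned"
}

# Example with the short-name path from the comment above.
validate_java_home 'C:\PROGRA~1\Java\jdk1.6.0_24;'
```

A value such as 'C:\Program Files\Java\jdk1.6.0_24' is rejected by this check because of the space.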
@samy
Thanks a lot, man, you saved my day.
Hi, I've read your post, but I got this. Please, I need your help:
$ ./nutch crawl
cygpath: can't convert empty path
Error occurred during initialization of VM
java/lang/ClassNotFoundException: error in opening JAR file C:\PROGRA~1\Java\jdk1.7.0_03\jre\lib\rt.jar
cygpath: can't convert empty path
I have also got the same error as stetsa,
so please help me overcome this problem, or send me an email at mugeesh@gmail.com
cygpath: can't convert empty path
./nutch: line 268: exec: C:\Program: not found
Damn, I have been trying for a long time. Can anyone please help me?
I am getting the same error too: c:\Program Files\ not found. Can anyone please explain?
I am new to Nutch and I have a problem with my initial deployment on Cygwin:
$ cd apache-nutch-1.4-bin/runtime/
-bash: cd: apache-nutch-1.4-bin/runtime /: No such file or directory,
I would like to know why this happens and how to fix it. Can you please help me?
- Change your JDK path in the environment variables so that it has no spaces.
- In Windows 7, go to Control Panel > System > Advanced system settings, open the Advanced tab and click Environment Variables.
- Change JAVA_HOME in both the user variables and the system variables.
- To find the name of the path without spaces, open a command prompt and run "dir /x"; this shows the short names that Windows uses. They usually consist of the first six characters followed by ~1. On my 64-bit Windows 7 I installed the Java 8 JDK in Program Files (x86), so my path was c:\progra~2\java\jdk1.8.0.0_25
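From inside Cygwin, the short name can also be looked up with the cygpath tool instead of "dir /x". The sketch below assumes that short (8.3) names are enabled on the drive; `short_path` is a hypothetical wrapper name, and outside Cygwin it only prints a message.

```shell
# Print the DOS short form of a Windows path, if cygpath is available.
short_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -d "$1"
  else
    printf 'cygpath not found; run this inside Cygwin\n' >&2
    return 1
  fi
}

# Example: look up the short name for the usual JDK parent folder.
short_path 'C:\Program Files' || true
```

The printed value can then be used as the start of JAVA_HOME, as described in the list above.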