Searching
(from Nutch Website)
To search you need to put the nutch war file into your servlet container. (If instead of downloading a Nutch release you checked the sources out of SVN, then you'll first need to build the war file, with the command ant war.)
Assuming you've unpacked Tomcat as ~/local/tomcat, then the Nutch war file may be installed with the commands:
rm -rf ~/local/tomcat/webapps/ROOT*
cp nutch*.war ~/local/tomcat/webapps/ROOT.war
Ok, this is the first time that I have used Tomcat or any similar tool, which explains the fumbling. Anyway, I finally realized that the above 2 lines are only necessary if you want the Nutch search to be the default (ROOT) application. Otherwise, the only thing you need to do is to copy the .war file to Tomcat's webapps folder.
In other words, simply copy the .war file to the webapps folder without renaming it.
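A minimal sketch of that copy, assuming Tomcat is unpacked at ~/local/tomcat as above. Note that deploy_war is a hypothetical helper of my own naming, not something that ships with Nutch or Tomcat. It copies the war under its own name and prints the URL path Tomcat should serve it under, since Tomcat names the webapp after the war file.

```shell
# Hypothetical helper (not part of Nutch or Tomcat): copy a war file into
# Tomcat's webapps folder without renaming it.
deploy_war() {
  war="$1"
  # Assumed default location, taken from the Tomcat path used above.
  webapps="${2:-$HOME/local/tomcat/webapps}"
  if [ ! -f "$war" ]; then
    echo "war file not found: $war" >&2
    return 1
  fi
  cp "$war" "$webapps/" || return 1
  # Tomcat names the webapp after the war file, so nutch-0.9.war
  # ends up being served under /nutch-0.9/
  echo "/$(basename "$war" .war)/"
}

# Example: deploy_war nutch-0.9.war
```

The printed path is what you append to http://localhost:8080 to reach the search page.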
The webapp finds its indexes in ./crawl, relative to where you start Tomcat, so use a command like:
~/local/tomcat/bin/catalina.sh start
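Because that path is relative, the directory you start Tomcat from matters. Here is a small sketch, assuming the crawl folder sits directly under the directory you pass in. start_tomcat_from is a hypothetical helper, and the default catalina.sh path is the one assumed above.

```shell
# Hypothetical helper: start Tomcat from the directory that contains the
# crawl folder, so the webapp's relative ./crawl path resolves correctly.
start_tomcat_from() {
  crawl_parent="$1"
  # Assumed default, matching the Tomcat location used above.
  catalina="${2:-$HOME/local/tomcat/bin/catalina.sh}"
  if [ ! -d "$crawl_parent/crawl" ]; then
    echo "no crawl folder under $crawl_parent" >&2
    return 1
  fi
  # The subshell keeps the cd from changing your own working directory.
  ( cd "$crawl_parent" && "$catalina" start )
}

# Example: start_tomcat_from ~/nutch-0.9
```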
If you did not place your crawl contents in a folder named crawl,
you will need to define the search directory.
1) First, just start Tomcat.
The .war file that you just copied to
/tomcat_dir/webapps will be automatically
expanded into a folder of the same name.
If you have named your .war file,
say, abc.war, then it will expand
to an abc folder.
2) Amend the nutch-site.xml file in your
tomcat_dir/webapps/expanded_dir/WEB-INF/classes
folder. So, for my case, where the war expanded
to nutch-0.9, the file is in
tomcat_dir/webapps/nutch-0.9/WEB-INF/classes.
This is the content of my nutch-site.xml file.
(note: this nutch-site.xml file is the one
located in the tomcat_dir and not the
one in the nutch_dir folder.)
So you just need to put in the path of your
crawl directory where the indexes and segments
are placed after the crawl.
In this case, my folder is called crawl.test3.
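As a rough sketch of what such a file can look like: the helper below prints a nutch-site.xml that sets searcher.dir, which I believe is the property the Nutch 0.9 search webapp reads for the crawl directory (its default is crawl). print_nutch_site is a hypothetical helper, and the path in the example is made up, so substitute your own.

```shell
# Hypothetical helper: print a nutch-site.xml that points the search
# webapp at a crawl directory. searcher.dir is assumed to be the property
# the webapp reads; check nutch-default.xml in your release to confirm.
print_nutch_site() {
  crawl_dir="$1"
  # Note: the path is inserted as-is, with no XML escaping.
  cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>$crawl_dir</value>
  </property>
</configuration>
EOF
}

# Example (made-up path, use your own crawl directory):
# print_nutch_site /home/me/nutch-0.9/crawl.test3 \
#   > ~/local/tomcat/webapps/nutch-0.9/WEB-INF/classes/nutch-site.xml
```

An absolute path avoids depending on where Tomcat was started. You will probably need to restart Tomcat for the change to be picked up.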
Then visit http://localhost:8080/
and have fun!
Note: If you are using other ports for Tomcat, please use
the corresponding port number.
For example, if port 8888 is used, the address
will be http://localhost:8888.
This will bring you to the ROOT
application. If you have not changed
anything in the original root folder, then
you will be at the Tomcat start page.
However, if you have changed the root folder
like this:
rm -rf ~/local/tomcat/webapps/ROOT*
cp nutch*.war ~/local/tomcat/webapps/ROOT.war
Then, the nutch search page will be the
root application.
For my example, my Nutch search page
is at http://localhost:8080/nutch-0.9/
Now, you can verify that your search works
by entering some search queries. If there
are no hits when there should be, maybe
the search directory is not set correctly.
Or the problem may lie with the crawling part.
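One quick way to narrow this down is to check that the search directory really contains crawl output. This is only a sketch: check_crawl_dir is a hypothetical helper, and the subfolder names it looks for (indexes and segments) are an assumption based on what the crawl is described as producing above.

```shell
# Hypothetical helper: report whether a directory looks like a finished
# crawl. The expected subfolders (indexes, segments) are assumptions.
check_crawl_dir() {
  crawl_dir="$1"
  ok=0
  for sub in indexes segments; do
    if [ ! -d "$crawl_dir/$sub" ]; then
      echo "missing: $crawl_dir/$sub" >&2
      ok=1
    fi
  done
  # Returns 0 only when every expected subfolder exists.
  return $ok
}

# Example: check_crawl_dir ~/nutch-0.9/crawl.test3 && echo "looks ok"
```

If the folders are there but searches still return nothing, the crawl itself is the more likely culprit.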
Have fun!
10 comments:
Sorry for trying the Nutch app so late. I am ok with all the steps except the one that moves the .war file. I did that and started Tomcat, but surprisingly the .war file did not expand. What shall I do now?
Hi,
May I ask what version of Tomcat you are using?
Also, try expanding the .war file manually using WinRAR.
Thanks
Thanks for the great Nutch installation guide. Could you tell something more about the use of Luke? I tried but I could not select the index... (see also the article of Tom White: http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html where they use Nutch 0.7.1)
Thanks
This is a great article. I was able to set up nutch without any issues.
Good post.
To publish text with tags, replace the angle brackets in your tags with their character entities: replace < with &lt; and > with &gt;.
See http://www.w3schools.com/HTML/html_entities.asp
thanks! your tutorial works great!
Hi,
I'm having problems with the spaces in the path; due to the space in the windows 'Program Files' folder.
here is my java home path (directly copied from the Cygwin terminal).
$ echo $JAVA_HOME
C:\Program Files\Java\jdk1.6.0_24;
So when I run it,
$ ./bin/nutch crawl
./bin/nutch: line 158: C:\Program Files\Java\jdk1.6.0_24;/bin/java: No such file or directory
./bin/nutch: line 268: exec: C:\Program: not found
How can I solve this issue? Didn't you run into this?
Thanks!
For those who are having trouble due to the space in the 'Program Files', here is the solution!
Create a new environment variable named NUTCH_JAVA_HOME.
Set its value as below (note there is no trailing ';' and the folder name is written the DOS way).
C:\PROGRA~1\Java\jdk1.6.0_24
To check it with echo:
$ echo $NUTCH_JAVA_HOME
C:\PROGRA~1\Java\jdk1.6.0_24
VOILA!