Difference between revisions of "Swish-e"


Latest revision as of 19:56, 11 May 2009


Description

http://www.swish-e.org

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files.

Forum link

http://forums.contribs.org/index.php/topic,43486.0.html

Please add comments there so I can merge them here later!

Installation

Download the RPMs from http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/

wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-debuginfo-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-devel-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-api-2.4.5-4.i386.rpm
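
The five downloads differ only in the package name, so they can also be generated in a loop. This is a minimal sketch, assuming the same base URL and version as above; the function name swish_rpm_urls is made up for illustration.

```shell
#!/bin/sh
# Base URL and version copied from the wget lines above.
base_url="http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386"
version="2.4.5-4"

# Print the download URL of each swish-e package, one per line.
# (swish_rpm_urls is a hypothetical helper name.)
swish_rpm_urls() {
    for pkg in swish-e swish-e-debuginfo swish-e-devel swish-e-perl swish-e-perl-api; do
        echo "$base_url/$pkg-$version.i386.rpm"
    done
}

# To download everything, pipe the list into wget:
#   swish_rpm_urls | while read -r url; do wget "$url"; done
```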

Install the packages by issuing the command below on the SME Server shell. Dependencies are pulled from Dag's repository, which must be configured first.

How to enable Dag's repository: http://wiki.contribs.org/Dag

yum --enablerepo=dag localinstall swish-e-2.4.5-4.i386.rpm swish-e-d* swish-e-p*

There is no need to reboot. Test:

swish-e -h 

Setup Part 2

To have swish-e index .doc, .xls and .pdf files, install the following filter dependencies:

yum install --enablerepo=dag perl-Spreadsheet-ParseExcel perl-MIME-Types xpdf catdoc

Test filter:

swish-filter-test 
swish-filter-test -man
swish-filter-test -headers /path/to/xlsfile.xls
swish-filter-test -headers /path/to/docfile.doc
swish-filter-test -headers /path/to/pdffile.pdf

Configuration

As I was not interested in indexing web pages, only files in ibays, I used the following spider: /usr/libexec/swish-e/DirTree.pl

I modified its check_path function so that only .doc, .xls and .pdf files are indexed:

sub check_path {
   my $path = shift;
   return 1 if $path =~ /\.doc$/;  # true if the path ends in .doc
   return 1 if $path =~ /\.xls$/;  # true if the path ends in .xls
   return 1 if $path =~ /\.pdf$/;  # true if the path ends in .pdf
   return 0;  # skip everything else
}
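
Before indexing, it can help to preview which files such a filter would accept. The sketch below uses find with the same three extensions. It is only an approximation of check_path, not part of swish-e, and the function name list_indexable is made up. Like the regexes above, the match is case-sensitive, so REPORT.PDF would be skipped.

```shell
#!/bin/sh
# List the files under a directory that end in .doc, .xls or .pdf.
# (list_indexable is a hypothetical helper, not a swish-e command.)
list_indexable() {
    find "$1" -type f \( -name '*.doc' -o -name '*.xls' -o -name '*.pdf' \) | sort
}

# Example:
#   list_indexable /home/e-smith/files/ibays/ibayname/files
```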

Next create a config file: ibay.cfg

# ibay.cfg, a swish-e config file
# 
IndexDir /usr/libexec/swish-e/DirTree.pl
#
SwishProgParameters /home/e-smith/files/ibays/ibayname/files
#
StoreDescription HTML <body> 20000
#
# replace to make links to UNC
# works in IE, needs fix for Firefox
ReplaceRules remove /home/e-smith/files/ibays
ReplaceRules prepend //smeservername
# The next line will not work if you have directories called "files"...
ReplaceRules replace /files/ /
# 
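
To see what the three ReplaceRules do to a stored path, the same rewriting can be imitated with sed. This is a rough sketch and not how swish-e implements the rules: it assumes that remove and replace act on every occurrence in the path, and the function name to_unc is made up.

```shell
#!/bin/sh
# Imitate the ReplaceRules from ibay.cfg, in the same order:
#   1. remove  /home/e-smith/files/ibays
#   2. prepend //smeservername
#   3. replace /files/ with /
# (to_unc is a hypothetical helper; "all occurrences" is an assumption.)
to_unc() {
    echo "$1" | sed -e 's|/home/e-smith/files/ibays||g' \
                    -e 's|^|//smeservername|' \
                    -e 's|/files/|/|g'
}

to_unc /home/e-smith/files/ibays/ibayname/files/docs/report.pdf
# prints //smeservername/ibayname/docs/report.pdf
```

The last step is the one the comment in the config warns about: a directory that is itself named "files" further down the tree would also be rewritten.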

Next, run swish-e. The index files will be placed in the current directory.

swish-e -c ibay.cfg -S prog -v 9

This should create both index.swish-e and index.swish-e.prop in the current dir.
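
A quick way to confirm that the run produced both files is a small check like the one below. It is a sketch only: check_index is a made-up helper name, and the default directory is the current one.

```shell
#!/bin/sh
# Succeed only if both index files exist and are non-empty in the
# given directory (default: current directory).
# (check_index is a hypothetical helper, not a swish-e command.)
check_index() {
    dir=${1:-.}
    for f in index.swish-e index.swish-e.prop; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "index files present in $dir"
}

# Example, followed by a test search from the command line
# ("report" is just a sample search word):
#   check_index . && swish-e -f index.swish-e -w report
```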

swish.cgi

As a proof of concept I have set up this basic configuration in /home/e-smith/files/ibays/Primary/cgi-bin

Copy (or symlink) swish.cgi. I prefer a copy, as I can then modify the script without losing the original.

cp /usr/libexec/swish-e/swish.cgi /home/e-smith/files/ibays/Primary/cgi-bin/

Create /home/e-smith/files/ibays/Primary/cgi-bin/.swishcgi.conf:

return {
   title           => 'Just a Sample Title',  # Not required, but recommended
   swish_index     => '/home/e-smith/files/ibays/Primary/cgi-bin/index.swish-e',
#
#   The next line makes the results clickable
#
   prepend_path    => 'file:////',
#
   link_property   => 'swishdocpath',
   title_property  => 'swishtitle',
};
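
The swish_index setting above points at the cgi-bin directory, while the indexing step writes its files to whatever directory swish-e was run from. If those differ, the two index files have to be copied across. A minimal sketch, assuming both files sit in the same source directory; install_index is a made-up helper name, and the web server still needs read permission on the copies.

```shell
#!/bin/sh
# Copy index.swish-e and its .prop companion from one directory to another.
# (install_index is a hypothetical helper, not a swish-e command.)
install_index() {
    src=$1
    dest=$2
    cp "$src/index.swish-e" "$src/index.swish-e.prop" "$dest/"
}

# Example:
#   install_index . /home/e-smith/files/ibays/Primary/cgi-bin
```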

Options

Under construction

Usage

Search should now be available at http://smeservername/cgi-bin/swish.cgi