Swish-e

From SME Server
Revision as of 20:56, 11 May 2009 by Cactus

Description

http://www.swish-e.org

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files.

Forum link

http://forums.contribs.org/index.php/topic,43486.0.html

Please add comments there so I can merge them here later!

Installation

Download the RPMs from http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/

wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-debuginfo-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-devel-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-api-2.4.5-4.i386.rpm
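If you would rather not paste five wget lines, a small shell loop can generate the URL list. This is a sketch of my own, not part of the original steps: it writes the URLs to a file named rpm-urls.txt (an arbitrary name) and relies on wget's -i option, which reads URLs from a file.

```shell
# Base URL and version taken from the wget commands above.
BASE=http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386
VERSION=2.4.5-4

# Write one URL per package to rpm-urls.txt (file name is arbitrary).
for pkg in swish-e swish-e-debuginfo swish-e-devel swish-e-perl swish-e-perl-api; do
  echo "$BASE/$pkg-$VERSION.i386.rpm"
done > rpm-urls.txt

# Then download them all in one go:
#   wget -i rpm-urls.txt
```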

Install them, together with their dependencies from Dag's repository, by issuing the following command at the SME Server shell.

For how to enable Dag's repository, see http://wiki.contribs.org/Dag

yum --enablerepo=dag localinstall swish-e-2.4.5-4.i386.rpm swish-e-d* swish-e-p*

No reboot is needed. To check that swish-e works, run:

swish-e -h 

Setup Part 2

For swish-e to index .doc, .xls and .pdf files, install these filter helpers:

yum install --enablerepo=dag perl-Spreadsheet-ParseExcel perl-MIME-Types xpdf catdoc
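Before running swish-filter-test, a quick way to confirm the helpers are really there is to look for them directly. A minimal sketch, assuming the filters use pdftotext and pdfinfo (from the xpdf package), catdoc, and the two Perl modules installed above; check_progs and check_perl_modules are made-up names for this example.

```shell
# Report whether each helper program is on the PATH.
check_progs() {
  for prog in "$@"; do
    if command -v "$prog" >/dev/null 2>&1; then
      echo "ok: $prog"
    else
      echo "missing: $prog"
    fi
  done
}

# Report whether Perl can load each module (also "missing" if perl is absent).
check_perl_modules() {
  for mod in "$@"; do
    if perl -M"$mod" -e 1 >/dev/null 2>&1; then
      echo "ok: $mod"
    else
      echo "missing: $mod"
    fi
  done
}

check_progs pdftotext pdfinfo catdoc
check_perl_modules Spreadsheet::ParseExcel MIME::Types
```

Any line starting with "missing:" points at a package that did not install.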

Test the filters:

swish-filter-test 
swish-filter-test -man
swish-filter-test -headers /path/to/xlsfile.xls
swish-filter-test -headers /path/to/docfile.doc
swish-filter-test -headers /path/to/pdffile.pdf

Configuration

As I was not interested in indexing web pages, only files in ibays, I used the directory-walking program /usr/libexec/swish-e/DirTree.pl rather than a web spider.

I modified its check_path subroutine so that it would index .doc, .xls and .pdf files:

sub check_path {
   my $path = shift;
   return 1 if $path =~ /\.doc$/;  # accept paths ending in .doc
   return 1 if $path =~ /\.xls$/;  # accept paths ending in .xls
   return 1 if $path =~ /\.pdf$/;  # accept paths ending in .pdf
   return 0;  # reject everything else
}
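To preview roughly which files the modified check_path will let through, a plain find with the same three extensions is handy. This is an approximation, not DirTree.pl itself (DirTree.pl may skip some files for other reasons), list_indexable is a hypothetical helper, and the ibay path is the placeholder used throughout this howto.

```shell
# List files whose names end in .doc, .xls or .pdf (case-sensitive,
# like the patterns in check_path).
list_indexable() {
  find "$1" -type f \( -name '*.doc' -o -name '*.xls' -o -name '*.pdf' \) | sort
}

# Placeholder ibay path from this howto; replace ibayname with yours.
IBAY_FILES=/home/e-smith/files/ibays/ibayname/files
if [ -d "$IBAY_FILES" ]; then
  list_indexable "$IBAY_FILES"
fi
```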

Next, create a config file called ibay.cfg:

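# ibay.cfg, a swish-e config file
#
# With -S prog, IndexDir names the program that feeds documents to swish-e
IndexDir /usr/libexec/swish-e/DirTree.pl
#
# Directory handed to DirTree.pl (replace ibayname with your ibay's name)
SwishProgParameters /home/e-smith/files/ibays/ibayname/files
#
# Store up to 20000 characters of the <body> text for the result listing
StoreDescription HTML <body> 20000
#
# Rewrite paths into UNC links (replace smeservername with your server's name)
# Works in IE, needs a fix for Firefox
ReplaceRules remove /home/e-smith/files/ibays
ReplaceRules prepend //smeservername
# The next line will not work if you have directories called "files"...
ReplaceRules replace /files/ /
#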
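The three ReplaceRules run in order, so a stored path such as /home/e-smith/files/ibays/ibayname/files/reports/q1.pdf ends up as //smeservername/ibayname/reports/q1.pdf. The sed sketch below only mimics that rewriting to make the effect visible; it is not how swish-e implements ReplaceRules, to_unc is a made-up name, and the g flags assume every occurrence is rewritten (which is what the warning about directories called "files" suggests).

```shell
# Approximate the three ReplaceRules with sed, in the same order:
#   remove  /home/e-smith/files/ibays
#   prepend //smeservername
#   replace /files/ with /
to_unc() {
  printf '%s\n' "$1" | sed \
    -e 's#/home/e-smith/files/ibays##g' \
    -e 's#^#//smeservername#' \
    -e 's#/files/#/#g'
}

to_unc /home/e-smith/files/ibays/ibayname/files/reports/q1.pdf
# prints //smeservername/ibayname/reports/q1.pdf
```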

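Next, run swish-e. It writes the index to the current directory, so run it from the directory where the index should live (or move both files there afterwards); the .swishcgi.conf below expects /home/e-smith/files/ibays/Primary/cgi-bin/index.swish-e.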

swish-e -c ibay.cfg -S prog -v 9

This should create both index.swish-e and index.swish-e.prop in the current directory.
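Because swish-e writes both index files to the current directory, a small wrapper that changes into the target directory first avoids ending up with the index in the wrong place, for example when re-indexing from cron. A sketch under assumptions: the function name, the SWISH_E override and the example config location are illustrative, the config path must be absolute because the function changes directory, and -v 0 is used instead of -v 9 to keep the output quiet.

```shell
# Hypothetical wrapper: build the index inside INDEX_DIR, because
# swish-e writes index.swish-e and index.swish-e.prop to the current
# directory. CFG must be an absolute path since we cd first.
# SWISH_E can point at another swish-e binary (defaults to swish-e).
reindex() {
  index_dir=$1
  cfg=$2
  (
    cd "$index_dir" || exit 1
    "${SWISH_E:-swish-e}" -c "$cfg" -S prog -v 0 || exit 1
    # Fail unless both index files are present afterwards.
    [ -f index.swish-e ] && [ -f index.swish-e.prop ]
  )
}

# Example (paths are assumptions; adjust to your setup):
#   reindex /home/e-smith/files/ibays/Primary/cgi-bin /root/ibay.cfg
```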

swish.cgi

As a proof of concept, I set up this basic configuration in /home/e-smith/files/ibays/Primary/cgi-bin

Copy (or symlink) swish.cgi. I prefer copying, so I can modify the script without losing the original.

cp /usr/libexec/swish-e/swish.cgi /home/e-smith/files/ibays/Primary/cgi-bin/

Create /home/e-smith/files/ibays/Primary/cgi-bin/.swishcgi.conf:

return {
   swish_index     => '/home/e-smith/files/ibays/Primary/cgi-bin/index.swish-e',
   title           => 'Just a Sample Title',  # Not required, but recommended
#
#   Next line makes the results clickable links
#
   prepend_path    => 'file:////',
#
   link_property   => 'swishdocpath',
   title_property  => 'swishtitle',
};
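A typo in .swishcgi.conf usually shows up only as a broken search page. A quick way to catch it is to load the file with Perl and print the keys it returns. This sketch assumes swish.cgi evaluates the file as Perl code (the return { ... } form suggests so); check_swishcgi_conf is a made-up name, and the path should include a directory part (an absolute path is safest) because newer Perl versions no longer search the current directory for do FILE.

```shell
# Hypothetical check: evaluate the config file with Perl's "do" and list
# the keys of the hash it returns, one "key => value" per line.
check_swishcgi_conf() {
  perl -e '
    my $file = shift;
    my $conf = do $file;
    die "syntax error in $file: $@" if $@;
    die "cannot read $file: $!\n" unless defined $conf;
    die "$file does not return a hash reference\n" unless ref $conf eq "HASH";
    print "$_ => $conf->{$_}\n" for sort keys %$conf;
  ' "$1"
}

# Example:
#   check_swishcgi_conf /home/e-smith/files/ibays/Primary/cgi-bin/.swishcgi.conf
```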

Options

Under construction

Usage

Search should now be available at http://smeservername/cgi-bin/swish.cgi
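If the web page shows no results, it helps to query the index directly from the shell first, so you can tell an indexing problem from a CGI problem. A sketch, assuming the index path from .swishcgi.conf; search_index and the SWISH_E override are hypothetical, while -f (index file), -w (search words) and -m (maximum number of results) are swish-e command-line options. The curl line assumes swish.cgi takes its search words from a parameter named query.

```shell
# Hypothetical helper: search the index with swish-e directly.
# SWISH_E can point at another swish-e binary (defaults to swish-e).
search_index() {
  "${SWISH_E:-swish-e}" -f "$1" -w "$2" -m 10
}

# Example ("budget" is just a sample search word):
#   search_index /home/e-smith/files/ibays/Primary/cgi-bin/index.swish-e budget
#
# The same search through the CGI (parameter name is an assumption):
#   curl -sG --data-urlencode "query=budget" http://smeservername/cgi-bin/swish.cgi
```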