Swish-e

From SME Server


Description

http://www.swish-e.org

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files.

Forum link

http://forums.contribs.org/index.php/topic,43486.0.html

Please add comments there so I can merge them here later.

Installation

Download the RPMs from http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/

wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-debuginfo-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-devel-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-2.4.5-4.i386.rpm
wget http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386/swish-e-perl-api-2.4.5-4.i386.rpm
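The five downloads differ only in the package name, so they can be generated in a loop. This is a minimal sketch; the helper name rpm_urls and the two variables are made up for illustration, while the base URL, version and package names are copied from the wget commands above.

```shell
# Base URL and version are taken from the wget commands above.
BASE_URL="http://rpmbuild.joshr.com/swish-e-release/2.4.5-4/centos-4-i386"
VERSION="2.4.5-4"

# Print one download URL per package (hypothetical helper name).
rpm_urls() {
    for pkg in swish-e swish-e-debuginfo swish-e-devel swish-e-perl swish-e-perl-api; do
        echo "$BASE_URL/$pkg-$VERSION.i386.rpm"
    done
}

# Usage: fetch every package, stopping at the first failure.
#   rpm_urls | while read -r url; do wget "$url" || break; done
```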

Install the downloaded RPMs, together with their dependencies, by issuing the command below on the SME Server shell. The command enables Dag's repository for this one transaction.

How to enable Dag's repository: http://wiki.contribs.org/Dag

yum --enablerepo=dag localinstall swish-e-2.4.5-4.i386.rpm swish-e-d* swish-e-p*

There is no need to reboot. Test:

swish-e -h 
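If you want a check that can be scripted rather than read by eye, something like the following may help. It is only a sketch: check_tools is a hypothetical helper, and it verifies nothing more than that the named commands are on the PATH.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 only when every command is found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_tools swish-e swish-filter-test
```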

Setup Part 2

To have swish-e index .doc, .xls and .pdf files, install the following filter dependencies:

yum install --enablerepo=dag perl-Spreadsheet-ParseExcel perl-MIME-Types xpdf catdoc

Test filter:

swish-filter-test 
swish-filter-test -man
swish-filter-test -headers /path/to/xlsfile.xls
swish-filter-test -headers /path/to/docfile.doc
swish-filter-test -headers /path/to/pdffile.pdf
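To avoid typing real paths by hand, the filter test can be pointed at the first file of each type found in an ibay. A rough sketch follows; first_sample is a hypothetical helper and the ibay path in the usage comment is a placeholder.

```shell
# Print the first file under directory $1 whose name ends in extension $2.
first_sample() {
    find "$1" -type f -name "*.$2" | head -n 1
}

# Usage (the ibay path is a placeholder):
#   for ext in xls doc pdf; do
#       f=$(first_sample /home/e-smith/files/ibays/ibayname/files "$ext")
#       [ -n "$f" ] && swish-filter-test -headers "$f"
#   done
```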

Configuration

As I was not interested in indexing web pages, only files in ibays, I used the following spider: /usr/libexec/swish-e/DirTree.pl

I modified its check_path function so that it accepts only .doc, .xls and .pdf files:

sub check_path {
   my $path = shift;
   return 1 if $path =~ /\.doc$/;  # true if the path ends in .doc
   return 1 if $path =~ /\.xls$/;  # true if the path ends in .xls
   return 1 if $path =~ /\.pdf$/;  # true if the path ends in .pdf
   return 0;  # otherwise false: skip the file
}
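Before indexing, it can be useful to preview which files an extension check like this would let through. The sketch below uses find with the same three case-sensitive suffixes; list_indexable is a hypothetical helper, and it assumes the check is applied to regular files only.

```shell
# List files whose names end in .doc, .xls or .pdf (case-sensitive,
# like the Perl patterns in check_path).
list_indexable() {
    find "$1" -type f \( -name '*.doc' -o -name '*.xls' -o -name '*.pdf' \)
}

# Usage (placeholder path): count the files that would be indexed.
#   list_indexable /home/e-smith/files/ibays/ibayname/files | wc -l
```

Note that a file named REPORT.PDF is not listed, and the Perl patterns would skip it as well.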

Next create a config file: ibay.cfg

# ibay.cfg, a swish-e config file
# 
IndexDir /usr/libexec/swish-e/DirTree.pl
#
SwishProgParameters /home/e-smith/files/ibays/ibayname/files
#
StoreDescription HTML <body> 20000
#
# Rewrite the stored paths into UNC links.
# Works in IE; needs a fix for Firefox.
ReplaceRules remove /home/e-smith/files/ibays
ReplaceRules prepend //smeservername
# The next line will also rewrite any subdirectory called "files"...
ReplaceRules replace /files/ /
# 
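To see what the three ReplaceRules do to a stored path, they can be approximated with sed. This is a sketch, not swish-e's own implementation: rewrite_path is a hypothetical helper, the sample file names are invented, and it assumes the rules are applied in the order listed and that replace affects every occurrence, as the warning in the config suggests.

```shell
# Approximate the three ReplaceRules from ibay.cfg with sed, in order:
# remove the ibay root, prepend the server name, collapse "/files/".
rewrite_path() {
    printf '%s\n' "$1" | sed \
        -e 's|/home/e-smith/files/ibays||' \
        -e 's|^|//smeservername|' \
        -e 's|/files/|/|g'
}

rewrite_path /home/e-smith/files/ibays/ibayname/files/docs/report.pdf
# prints //smeservername/ibayname/docs/report.pdf
```

Under the same assumptions, a path such as .../ibayname/files/a/files/x.doc comes out as //smeservername/ibayname/a/x.doc, which no longer matches the real location. That is the problem the comment about directories called "files" refers to.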

Next, run swish-e. The index files will be written to the current directory.

swish-e -c ibay.cfg -S prog -v 9

This should create both index.swish-e and index.swish-e.prop in the current dir.
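A quick way to confirm that the run produced a usable index is to check that both files exist and are non-empty, then try a query from the command line. A minimal sketch; index_ready is a hypothetical helper and the search word in the usage comment is a placeholder.

```shell
# Return 0 when both index files exist and are non-empty in directory $1
# (defaults to the current directory).
index_ready() {
    dir=${1:-.}
    [ -s "$dir/index.swish-e" ] && [ -s "$dir/index.swish-e.prop" ]
}

# Usage: confirm the index, then show up to 5 hits for a placeholder word.
#   index_ready . && swish-e -f index.swish-e -w report -m 5
```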

swish.cgi

As a proof of concept, I set up this basic configuration in /home/e-smith/files/ibays/Primary/cgi-bin

Copy (or symlink) swish.cgi. I prefer a copy, as I can then modify the script without losing the original.

cp /usr/libexec/swish-e/swish.cgi /home/e-smith/files/ibays/Primary/cgi-bin/

Create /home/e-smith/files/ibays/Primary/cgi-bin/.swishcgi.conf:

return {
   swish_index     => '/home/e-smith/files/ibays/Primary/cgi-bin/index.swish-e',
   title           => 'Just a Sample Title',  # Not required, but recommended
#
#   Next line to make it clickable
#
   prepend_path    => 'file:////',
#
   link_property   => 'swishdocpath',
   title_property  => 'swishtitle',
};
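The swish_index setting points at the cgi-bin directory, so if the index was built somewhere else, both index files have to end up there. One possible way is a plain copy, sketched below; publish_index is a hypothetical helper, and the step assumes the web server can read the copied files.

```shell
# Copy both index files from directory $1 to directory $2.
publish_index() {
    cp "$1/index.swish-e" "$1/index.swish-e.prop" "$2/"
}

# Usage: publish_index . /home/e-smith/files/ibays/Primary/cgi-bin
```

Running swish-e from inside the cgi-bin directory would avoid the copy altogether.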

Options

Under construction

Usage

Search should now be available at http://smeservername/cgi-bin/swish.cgi
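The search page can also be exercised from a shell, which is handy for a smoke test. The sketch below only builds the URL; search_url is a hypothetical helper, the parameter name query is an assumption about the stock swish.cgi script, and only spaces are encoded.

```shell
# Build a search URL for swish.cgi. The "query" parameter name is an
# assumption about the stock script; only spaces are encoded here.
search_url() {
    printf 'http://smeservername/cgi-bin/swish.cgi?query=%s\n' \
        "$(printf '%s' "$1" | tr ' ' '+')"
}

# Usage: curl -s "$(search_url 'annual report')"
```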