Opus 2.30 Publisher's Manual





Upload Document Types

When a document is uploaded, Opus determines its type by examining its file extension, and it accepts only certain document types. The default set is MS/Word, Adobe PDF and Postscript. Opus uses helper applications to extract the raw text from these documents. For MS/Word it tries antiword and, if it can't find that, falls back to strings. For PDF and Postscript it uses pdftotext, part of the Xpdf package, or, failing that, ps2ascii, part of the Ghostscript package.
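The selection logic described above can be sketched in Python. This is a hypothetical illustration, not Opus's own implementation: the `DEFAULT_TYPES` table and the `pick_helper` function are names invented here, and matching the extension case-insensitively is an assumption. The `which` parameter defaults to Python's `shutil.which`, which searches the PATH.

```python
import os
import shutil
from typing import Callable, Optional

# Hypothetical table mirroring the defaults described above:
# extension -> (description, helpers in order of preference).
DEFAULT_TYPES = {
    "DOC": ("MS/Word", ["antiword", "strings"]),
    "PDF": ("Adobe PDF", ["pdftotext", "ps2ascii"]),
    "PS": ("Postscript", ["pdftotext", "ps2ascii"]),
}


def pick_helper(filename: str,
                which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the path of the first available helper for filename's type.

    Raises ValueError if the extension is not an accepted document type.
    Returns None if the type is accepted but no helper can be found.
    """
    # Assumption: the extension is compared in upper case, so "a.pdf" and "a.PDF" match.
    ext = os.path.splitext(filename)[1].lstrip(".").upper()
    if ext not in DEFAULT_TYPES:
        raise ValueError(f"unsupported document type: {filename!r}")
    _description, helpers = DEFAULT_TYPES[ext]
    # Try each helper in order and fall back to the next one if it is missing.
    for helper in helpers:
        path = which(helper)
        if path is not None:
            return path
    return None
```

Passing a custom `which` function makes the fallback easy to exercise without installing anything: if only `strings` is available, `pick_helper("report.doc", which=...)` returns the `strings` path rather than failing.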

If you are using shared hosting and your ISP doesn't provide these binaries, you can put them in ./php/ext and Opus will pick them up from there in preference to any other copy. Be careful when doing this: use statically linked binaries to avoid problems when the ISP changes its shared libraries. For example, a statically linked version of pdftotext can be found here.
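The lookup order described above (local directory first, then the system path) might look like the following minimal sketch. It assumes the binaries sit directly in php/ext relative to the Opus installation directory and must be executable; `find_binary` is a hypothetical name used only for illustration, and Opus's own search may differ in detail.

```python
import os
import shutil
from typing import Optional


def find_binary(name: str, local_dir: str = os.path.join("php", "ext")) -> Optional[str]:
    """Prefer an executable copy of name in local_dir; otherwise search the PATH.

    local_dir is assumed to be relative to the Opus installation directory.
    """
    candidate = os.path.join(local_dir, name)
    # A local copy wins only if it is a regular file we are allowed to execute.
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to whatever the ISP provides on the system PATH (None if absent).
    return shutil.which(name)
```

Checking the execute permission before preferring the local copy means a binary that was uploaded without its permissions set is skipped, rather than chosen and then failing when Opus tries to run it.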

You can override the list of accepted document types and the helper applications used for each by creating a file ./php/cfg/document_types and listing the valid file types in it, one per line. The fields on each line are separated by commas: the first is the file extension in upper case, the second is a description of the file format, and the remaining fields are a list of zero or more helper applications Opus can use to extract the raw text. Opus tries the first helper listed and, if it can't find it, falls back through the remaining entries in order.

Here's an example which emulates the default behaviour:

DOC,MS/Word,antiword,strings
PDF,Adobe PDF,pdftotext,ps2ascii
PS,Postscript,pdftotext,ps2ascii
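To check a document_types file before deploying it, a small parser like the following may help. It is a sketch under stated assumptions: blank lines are skipped, whitespace around fields is trimmed, and empty helper fields are dropped. `parse_document_types` and `DocType` are hypothetical names, and Opus's own parser may treat these edge cases differently. Because fields are comma separated, a description cannot itself contain a comma in this sketch.

```python
from typing import Dict, List, NamedTuple


class DocType(NamedTuple):
    description: str
    helpers: List[str]  # tried in order; may be empty


def parse_document_types(text: str) -> Dict[str, DocType]:
    """Parse lines of the form EXT,Description,helper1,helper2,..."""
    types: Dict[str, DocType] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: need at least EXT and description")
        ext, description, *helpers = fields
        # The manual requires the extension to be given in upper case.
        if ext != ext.upper():
            raise ValueError(f"line {lineno}: extension {ext!r} must be upper case")
        # Assumption: empty helper fields (e.g. a trailing comma) are dropped.
        types[ext] = DocType(description, [h for h in helpers if h])
    return types
```

Parsing the example above yields three entries, with `DOC` mapping to the description `MS/Word` and the helpers `antiword` then `strings`. A line such as `TXT,Plain text` would be accepted with an empty helper list, matching the "zero or more" rule.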

