Since Blueriq 12.10, the Runtime uses Apache Tika to check whether a file's extension matches its actual content. For example, if a user renames virus.exe to document.pdf and uploads it, the Runtime detects that the uploaded file is not a PDF and does not continue with the process.

This feature is enabled by default. If needed, you can disable it by setting the following properties:

blueriq.fileupload.detect-content-type=false

blueriq.fileupload.validate-content-type=false

Disabling it speeds up the upload process but introduces a security risk, so only do so after careful consideration.

Best effort

Apache Tika detects content on a best-effort basis, so false positives and false negatives are possible.

Office Documents

By default, Apache Tika detects the newer Office document formats (docx, xlsx, pptx, etc.) only as a generic "office document". This means that you can upload a Word document with an xlsx extension, and the Runtime will accept it.

You can add an additional Tika module to your custom Runtime so that the contents of Office documents are inspected more thoroughly. The inspection is more complete, but the Runtime needs extra time to inspect Office documents. Weigh this trade-off for your solution.

To add Office document detection, add this dependency to the dependencyManagement section of your parent pom file:

pom.xml
<dependencyManagement>
  <dependencies>
    ...
    <!-- Lets Tika distinguish the specific Office formats (docx, xlsx, pptx, ...) -->
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parser-microsoft-module</artifactId>
      <version>${apache.tika.version}</version>
    </dependency>
    ...
  </dependencies>
</dependencyManagement>

Then reference the dependency in the dependencies section. No version is needed there, because it is managed in dependencyManagement:

pom.xml
<dependencies>
  ...
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parser-microsoft-module</artifactId>
    <!-- Version comes from dependencyManagement; only needed at runtime -->
    <scope>runtime</scope>
  </dependency>
  ...
</dependencies>

Note that the core module of Apache Tika is already included in the Runtime. Depending on how your project is set up, you may have to redefine the apache.tika.version property so that it matches the Tika version shipped with the Runtime.
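If you do need to redefine the property, a minimal sketch is a properties entry in the same pom that declares the dependencyManagement section. The value X.Y.Z below is a placeholder, not a real version: replace it with the exact Apache Tika version that your Blueriq Runtime release ships with, which you can look up, for example, with mvn dependency:tree.

```xml
<properties>
  <!-- Placeholder: set this to the Apache Tika version shipped with your Runtime -->
  <apache.tika.version>X.Y.Z</apache.tika.version>
</properties>
```

Keeping tika-parser-microsoft-module on the same version as the Tika core module in the Runtime helps avoid mixing Tika versions on the classpath.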
