Java issues in transformer trying to convert base64 pdf to text - Mirth Community

  #1  
03-01-2017, 10:58 AM
agradinc
Mirth Newb
 
Join Date: Dec 2015
Posts: 14

I'm having strange issues in my transformer, even though the same code works perfectly in a Rhino shell. I'm using Apache PDFBox 2.0.4 (standalone version) to convert a PDF to text. Mirth version is 3.4.1.8057.

This code works in a Rhino shell:
Code:
importPackage(com.mirth.connect.server.userutil);

// in mirth I construct base64pdf by combining multiple OBX.5 segments. In Rhino I read a sample from a file.
var base64pdf = "base64 encoded pdf";
var bytearraypdf = FileUtil.decode(base64pdf);
var pdf = org.apache.pdfbox.pdmodel.PDDocument.load(bytearraypdf);
var stripper = new org.apache.pdfbox.text.PDFTextStripper();
var pdftext = stripper.getText(pdf);
The first issue when I run this in a transformer is on the var pdf line:
Code:
Wrapped java.io.FileNotFoundException: [B@59ad8d42 (The system cannot find the file specified)
For some reason it is not calling the correct PDDocument.load method that takes a byte[] and is instead trying to open bytearraypdf.toString() as a file.

I'm not sure why this was necessary, but I was able to work around the issue by changing the line to:
Code:
var pdf = org.apache.pdfbox.pdmodel.PDDocument.load(new java.io.ByteArrayInputStream(bytearraypdf));
This still ran in the Rhino shell without any issues. When trying to run it in Mirth, I now get:
Code:
Transformer error
ERROR MESSAGE: Error evaluating transformer
java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.getPages()Lorg/apache/pdfbox/pdmodel/PDPageTree;
	at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
	at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:126)
	at org.mozilla.javascript.NativeJavaMethod.call(NativeJavaMethod.java:225)
	at org.mozilla.javascript.Interpreter.interpretLoop(Interpreter.java:1479)
	at org.mozilla.javascript.Interpreter.interpret(Interpreter.java:815)
	at org.mozilla.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
	at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:393)
	at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3280)
	at org.mozilla.javascript.InterpretedFunction.exec(InterpretedFunction.java:120)
	at com.mirth.connect.server.util.javascript.JavaScriptTask.executeScript(JavaScriptTask.java:142)
	at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.doCall(JavaScriptFilterTransformer.java:143)
	at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.doCall(JavaScriptFilterTransformer.java:119)
	at com.mirth.connect.server.util.javascript.JavaScriptTask.call(JavaScriptTask.java:113)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
I put the PDFBox jar file in its own subdirectory (pdfbox-lib) in the Mirth Program Files folder (Windows 2008 R2). I include that directory in the classpath when I start the Rhino shell. In Mirth, I added the directory as a new Resource and then assigned that Resource to the connector from which I'm trying to use it.
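
One way to investigate the FileNotFoundException above is to list the `load` overloads that the transformer's PDDocument class actually exposes. If no byte[] variant appears, Rhino may be falling back to an overload that takes a String. The following is a minimal sketch, not a confirmed diagnosis: `listOverloads` is a hypothetical helper name, and the guarded block assumes a Rhino environment (Mirth or the Rhino shell) where the `java` and `org` globals exist and where PDDocument has a public no-argument constructor.

```javascript
// Hypothetical helper: return the signatures of all public methods called
// `name` on a java.lang.Class object.
function listOverloads(javaClass, name) {
    var methods = javaClass.getMethods();
    var found = [];
    for (var i = 0; i < methods.length; i++) {
        // Convert Java strings to JavaScript strings before comparing.
        if (String(methods[i].getName()) === name) {
            found.push(String(methods[i].toString()));
        }
    }
    return found;
}

// Only runs inside Rhino, where the `java` global exists.
if (typeof java !== "undefined") {
    // Assumption: an empty document can be created just to reach its Class.
    var probe = new org.apache.pdfbox.pdmodel.PDDocument();
    try {
        var overloads = listOverloads(probe.getClass(), "load");
        for (var j = 0; j < overloads.length; j++) {
            java.lang.System.out.println(overloads[j]);
        }
    } finally {
        probe.close();
    }
}
```

Comparing the output from the Rhino shell with the output from a transformer should show whether the two environments see the same set of `load` methods.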
  #2  
03-02-2017, 08:37 AM
agradinc

So after much digging, I discovered an older version of PDFBox sitting in C:\Program Files\Mirth Connect\extensions\doc\lib\pdfbox-1.8.4.jar. I was not including this in the classpath when running in Rhino.

That would explain the issues finding the correct classes/methods as there were apparently significant changes between the 1.8.x and 2.0.x branches.
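
To confirm which jar a class is really loaded from, the class's code source can be inspected from the same script. This is a minimal sketch, assuming a Rhino environment with the `java` and `org` globals available and a public no-argument PDDocument constructor; `jarLocationOf` is a hypothetical helper name.

```javascript
// Hypothetical helper: report the jar or directory a Java class was loaded from.
function jarLocationOf(javaClass) {
    var domain = javaClass.getProtectionDomain();
    var source = domain ? domain.getCodeSource() : null;
    // Classes loaded by the JVM's bootstrap loader have no code source.
    return source ? String(source.getLocation()) : "(no code source)";
}

// Only runs inside Rhino, where the `java` global exists.
if (typeof java !== "undefined") {
    // Assumption: an empty document can be created just to reach its Class.
    var probe = new org.apache.pdfbox.pdmodel.PDDocument();
    try {
        java.lang.System.out.println(jarLocationOf(probe.getClass()));
    } finally {
        probe.close();
    }
}
```

Running this once in the Rhino shell and once in a transformer should show whether the two environments resolve PDDocument from different jars.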

I removed the resource for the 2.0 branch that I had added. I updated my transformer code to be:
Code:
var pdf = org.apache.pdfbox.pdmodel.PDDocument.load(new java.io.ByteArrayInputStream(FileUtil.decode(base64pdf)));
var stripper = new org.apache.pdfbox.util.PDFTextStripper();
var pdftext = stripper.getText(pdf);
Then I got this error:
Code:
Transformer error
ERROR MESSAGE: Error evaluating transformer
java.lang.NoClassDefFoundError: org/apache/fontbox/afm/AFMParser
	at org.apache.pdfbox.pdmodel.font.PDFont.addAdobeFontMetric(PDFont.java:144)
	at ...
So I downloaded fontbox-1.8.4.jar and stuck it in custom-lib. Then I got this error:

Code:
Transformer error
ERROR MESSAGE: Error evaluating transformer
java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.font.PDTrueTypeFont
	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:100)
	at ...
I'm not sure what I'm missing at the moment, and this is all very frustrating considering how easily I got it working in Rhino. I'd prefer to use the newest release (2.0.4 at this time). Is there any way to do this without breaking Mirth?
  #3  
03-02-2017, 09:35 AM
agradinc

OK, I just found this thread: http://www.mirthcorp.com/community/forums/showthread.php?t=13960

I put pdfbox-app-1.8.13.jar in the extensions/doc/lib folder and edited extensions/doc/destination.xml to refer to it instead of pdfbox-1.8.4. I removed everything from custom-lib and the extra resources I had added.

My sample code from my second post works now.

The thread I linked asked the same question about using the 2.0 branch with Mirth, and it was not answered there either. Is this possible?

Tags
base64, java, pdf, rhino

All times are GMT -8.