how to handle invalid XML Characters - Mirth Community

  #1  
Old 03-25-2015, 02:20 AM
ihanbali
OBX.1 Kenobi

Join Date: Mar 2014
Posts: 27

Hi,
I am using Mirth 3.1.2.
I am trying to extract a PDF document from an XML message and write it to a PDF file.
I built the following channel:

(1) Source connector types:
Inbound XML
Outbound Raw
Channel reader
I added a transformer step to the source connector to map the XML element to a variable as follows:
var pdfdoc = msg['Body']['Part'][1]['Content'].toString();
globalChannelMap.put('pdfdoc', pdfdoc);

(2) Destination connector types:
Inbound Raw
Outbound Raw
File writer
File Type: Binary

I used the above variable in the destination template as follows:
${pdfdoc}

The problem:
The PDF has special characters, and I keep getting the following error at the source transformer step:
Transformer error
ERROR MESSAGE: Error evaluating transformer
com.mirth.connect.server.MirthJavascriptTransformerException:
CHANNEL: XDSb Retrieve Document Filter DOC
CONNECTOR: sourceConnector
SCRIPT SOURCE:
SOURCE CODE:
257: }
258: if ('xml' === typeof msg && msg.hasSimpleContent()) { msg = msg.toXMLString(); }if ('xml' === typeof tmp && tmp.hasSimpleContent()) { tmp = tmp.toXMLString(); }
259: }
260: if (doFilter() == true) { doTransform(); return true; } else { return false; }
261: }
LINE NUMBER: 262
DETAILS: TypeError: Character reference "" is an invalid XML character.
at 264932c5-eb67-4a90-b549-071d8ac1ec55:241 (doScript)
at 264932c5-eb67-4a90-b549-071d8ac1ec55:262
at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.call(JavaScriptFilterTransformer.java:153)
at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.call(JavaScriptFilterTransformer.java:118)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)



My trials:
(1) If I remove the special characters in the preprocessor script, I end up with a corrupted PDF.
(2) If I change the file type in my destination to "Text", I get a PDF file that opens as blank.
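
A minimal sketch of why trial (1) corrupts the file, assuming Node.js (Buffer is a Node API and is not part of the thread; the sample bytes are a hypothetical stand-in for the real PDF). It treats each byte as one character for simplicity. Bytes that are not legal XML 1.0 characters are real data in a binary document, so stripping them changes the file, while a Base64 round trip keeps every byte and uses only XML-safe characters:

```javascript
// Sketch only: runs under Node.js, not inside a Mirth transformer.

// True if the code point is allowed by the XML 1.0 "Char" production.
function isValidXmlChar(cp) {
  return cp === 0x9 || cp === 0xA || cp === 0xD ||
    (cp >= 0x20 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical sample: "%PDF-" followed by a few binary bytes and "A".
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x2D, 0x00, 0x01, 0x0C, 0x41]);

// Trial (1): drop every byte that is not a legal XML character.
const stripped = Buffer.from([...original].filter(isValidXmlChar));

// Alternative: Base64-encode, carry the text inside XML, decode again.
const base64 = original.toString('base64');
const roundTripped = Buffer.from(base64, 'base64');

console.log(stripped.equals(original));      // false: three bytes were lost
console.log(roundTripped.equals(original));  // true: nothing was lost
console.log([...base64].every(ch => isValidXmlChar(ch.codePointAt(0)))); // true
```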

P.S.: I attached samples of the XML message and the PDF document that should be extracted from this message.
Any help is much appreciated.
Attached Files
File Type: pdf forcare.pdf (60.9 KB, 20 views)
File Type: xml forcareMsg.xml (63.1 KB, 28 views)
  #2  
Old 03-25-2015, 07:43 AM
narupley
Mirth Employee

Join Date: Oct 2010
Posts: 7,124

That "XML" file you attached isn't XML at all. It's an HTTP multipart payload. How are you getting that? Are you receiving it with a TCP Listener or something? Consider using an HTTP Listener instead, and enabling XML Body and Parse Multipart. Then the actual message received by the channel will be a well-formatted XML document containing each part, and any binary content (like the PDF) will be encoded in Base64.
__________________
Step 1: JAVA CACHE...DID YOU CLEAR ...wait, ding dong the witch is dead?

Nicholas Rupley
Work: 949-237-6069
Always include what Mirth Connect version you're working with. Also include (if applicable) the code you're using and full stacktraces for errors (use CODE tags). Posting your entire channel is helpful as well; make sure to scrub any PHI/passwords first.


- How do I foo?
- You just bar.
  #3  
Old 03-25-2015, 01:51 PM
ihanbali
OBX.1 Kenobi

Join Date: Mar 2014
Posts: 27

Well, I tried to be brief, but here is what I actually did. I have a separate channel where I send an MTOM request and receive a response with an embedded PDF.
I built the MTOM body in the template and used the settings below for my HTTP Sender:
Multipart: No
Response Content: XML Body
Parse Multipart: Yes

I got the PDF as an XML element inside the response, as in the following example:
<HttpResponse>
<Body boundary="MIMEBoundaryurn_uuid_EB763C07A95A62816A1426668158042" multipart="yes">
<Part>
<Headers>
<Content-Type>application/xop+xml; charset=UTF-8; type="application/soap+xml"</Content-Type>
<Content-Transfer-Encoding>binary</Content-Transfer-Encoding>
<Content-ID>&lt;0.urn:uuid:EB763C07A95A62816A1426668158044@apache.org&gt;</Content-ID>
</Headers>
<Content multipart="no">&lt;?xml version='1.0' encoding='UTF-8'?&gt;&lt;soapenv:Envelope &gt;&lt;soapenv:Header&gt;&lt;wsa:Action&gt;urn:ihe:iti:2007:RetrieveDocumentSetResponse&lt;/wsa:Action&gt;&lt;wsa:RelatesTo&gt;bb1906f1-5fff-4ea5-979e-55c3a2f20489&lt;/wsa:RelatesTo&gt;&lt;/soapenv:Header&gt;&lt;soapenv:Body&gt;&lt;xdsb:RetrieveDocumentSetResponse &gt;&lt;rs:RegistryResponse status="urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success" /&gt;&lt;xdsb:DocumentResponse&gt;&lt;xdsb:RepositoryUniqueId&gt;1.19.6.24.109.42.1.5&lt;/xdsb:RepositoryUniqueId&gt;&lt;xdsb:DocumentUniqueId&gt;1.42.20141105202336.39&lt;/xdsb:DocumentUniqueId&gt;&lt;xdsb:mimeType&gt;text/plain&lt;/xdsb:mimeType&gt;&lt;xdsb:Document&gt;&lt;xop:Include href="cid:1.urn:uuid:EB763C07A95A62816A1426668158045@apache.org" /&gt;&lt;/xdsb:Document&gt;&lt;/xdsb:DocumentResponse&gt;&lt;/xdsb:RetrieveDocumentSetResponse&gt;&lt;/soapenv:Body&gt;&lt;/soapenv:Envelope&gt;</Content>
</Part>
<Part>
<Headers>
<Content-Type>text/plain</Content-Type>
<Content-Transfer-Encoding>binary</Content-Transfer-Encoding>
<Content-ID>&lt;1.urn:uuid:EB763C07A95A62816A1426668158045@apache.org&gt;</Content-ID>
</Headers>
<Content multipart="no">........ PDF document is here ....

</Content>
</Part>
</Body>
</HttpResponse>

Now in my response channel I added the following transformer step to extract the PDF into a string:
$gc('doc', msg['Body']['Part'][1]['Content'].toString());

I added a File Writer destination, set it to Binary, and put ${doc} in its template.

I have the following error:
ERROR MESSAGE: Error evaluating transformer
com.mirth.connect.server.MirthJavascriptTransformerException:
CHANNEL: XDSb Retrieve Document Filter
CONNECTOR: To Response 2
SCRIPT SOURCE:
SOURCE CODE:
LINE NUMBER: 262
DETAILS: TypeError: Character reference "" is an invalid XML character.

If I try to change the type of inbound/outbound connector to Raw I get:
246: $gc('doc', msg['Body']['Part'][1]['Content'].toString());
247: if ('xml' === typeof msg && msg.hasSimpleContent()) { msg = msg.toXMLString(); }if ('xml' === typeof tmp && tmp.hasSimpleContent()) { tmp = tmp.toXMLString(); }
248: }
249: if (doFilter() == true) { doTransform(); return true; } else { return false; }
250: }
LINE NUMBER: 246
DETAILS: TypeError: Cannot read property "Part" from undefined

If I change the connector type to Document Writer, I get:
Document Writer error
ERROR MESSAGE: Error writing document
org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 2; The content of elements must consist of well-formed character data or markup.

If I preprocess the message to remove control characters such as "", I end up with a "The content of elements must consist of well-formed character data or markup" error.
  #4  
Old 03-25-2015, 01:55 PM
narupley
Mirth Employee

Join Date: Oct 2010
Posts: 7,124

The problem is that you're receiving the PDF document with a content type of text/plain. That is incorrect. You need to go to whoever manages that server and get them to send correct responses with correct content types. Also, what is your Response Binary MIME Types set to on the HTTP Sender? By default it should be Base64 encoding anything with a content type that begins with "application/".
  #5  
Old 03-26-2015, 07:18 AM
ihanbali
OBX.1 Kenobi

Join Date: Mar 2014
Posts: 27

Thank you, Nick. The problem, as you indicated, was in the Response Binary MIME Types setting: it was blank.
Now I am able to extract both PDFs and text documents.
However, since the content type from the document source might be incorrect, I wonder what the best way is to detect the document type (PDF, CDA, TXT, etc.).
  #6  
Old 03-26-2015, 08:25 AM
narupley
Mirth Employee

Join Date: Oct 2010
Posts: 7,124

By default the Binary MIME Types field is set to "application/, image/, video/, audio/". That must have been changed at some point.

Since the server is sending you the incorrect content type, yeah there's no good way to know what type of document it is. That's the entire point of the content type header, so whoever manages that server must have really messed up.
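
If a best-effort guess is acceptable, one possible heuristic is to inspect the first bytes of the decoded content. The sketch below assumes Node.js (for Buffer) and that the content is already Base64 encoded; `sniffDocumentType` and its return labels are hypothetical names, not part of Mirth. It relies only on the facts that a PDF file starts with "%PDF-" and that a CDA document's root element is ClinicalDocument. It does not replace a correct Content-Type header:

```javascript
// Heuristic sketch only: guesses a document type from its leading bytes.
function sniffDocumentType(base64Content) {
  const bytes = Buffer.from(base64Content, 'base64');
  const sample = bytes.subarray(0, 1024);
  // latin1 maps every byte to exactly one character, so nothing is lost.
  const head = sample.toString('latin1');

  if (head.startsWith('%PDF-')) return 'pdf';
  // CDA: root element ClinicalDocument, with or without a namespace prefix.
  if (/<(\w+:)?ClinicalDocument[\s>]/.test(head)) return 'cda';
  if (head.trimStart().startsWith('<')) return 'xml';
  // No NUL bytes in the sample: treat it as plain text.
  if (!sample.includes(0)) return 'txt';
  return 'unknown';
}

// Usage with hypothetical inputs:
const asBase64 = (text) => Buffer.from(text, 'latin1').toString('base64');
console.log(sniffDocumentType(asBase64('%PDF-1.4\n')));                        // pdf
console.log(sniffDocumentType(asBase64('<ClinicalDocument xmlns="urn:x">'))); // cda
console.log(sniffDocumentType(asBase64('plain report text')));                // txt
```

A guess like this can be wrong (for example, a text file that happens to start with "<"), so it is best treated as a fallback.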

You could force all content types to be Base64 encoded, by checking Regular Expression and using ".*". That way at least you shouldn't get any of those "invalid XML character" issues.
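
To illustrate how the Binary MIME Types setting decides what gets Base64 encoded, here is a rough sketch. It is not Mirth's actual implementation; `isBinaryMimeType` is a hypothetical function, and it assumes plain "begins with" matching for the comma-separated list and an unanchored match when Regular Expression is checked:

```javascript
// Sketch only: models the setting as either a comma-separated list of
// prefixes or a single regular expression.
function isBinaryMimeType(contentType, setting, useRegex) {
  const type = (contentType || '').trim().toLowerCase();
  if (useRegex) {
    return new RegExp(setting).test(type);
  }
  return setting.split(',')
    .map((prefix) => prefix.trim().toLowerCase())
    .filter((prefix) => prefix.length > 0)
    .some((prefix) => type.startsWith(prefix));
}

const defaults = 'application/, image/, video/, audio/';

console.log(isBinaryMimeType('application/pdf', defaults, false)); // true
console.log(isBinaryMimeType('text/plain', defaults, false));      // false
console.log(isBinaryMimeType('application/pdf', '', false));       // false: blank setting
console.log(isBinaryMimeType('text/plain', '.*', true));           // true: everything matches
```

Under these assumptions, a blank setting matches nothing, which is consistent with the binary part arriving unencoded in this thread, and ".*" matches every content type, including the mislabeled text/plain PDF.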
All times are GMT -8.

