CarlosRoggan
Product and Topic Expert

SAP Cloud Integration offers iFlow steps for signing and verifying XML content according to the "XML Signature" standard. This standard offers some benefits and flexibility specifically for XML content.
This article introduces the "XML Signature" standard, as preparation for the hands-on tutorial in the next blog post.
I'm trying to explain everything in a simple way, with my simple understanding and my simple words - this is not a professional article.
In this blog post, I will try to answer many questions and show examples.
The following concepts are explained:
Hash/Digest, Digital Signature, XML Signature.
The next blog post shows how we can sign/verify XML payloads, according to the XML Signature spec, manually in a Groovy script.

Overview

  1. History
  2. Basics: Digest, Digital Signature
  3. Specifics: XML Signature
  4. Canonicalization
  5. General Info

1. History

How I imagine that everything started:
Timmy from Texas wanted to share some secret info with his friend Taku in Tokyo.
So he encrypted a message and sent it to Taku.
Taku was unable to decrypt and read the message.
So Timmy travelled to Tokyo to enjoy some food and to explain how he encrypts and packages his messages.
Afterwards, Taku in Tokyo was able to decrypt and read all messages (even before breakfast).
Some time later, the same situation happened with his friend Toto in Togo.
Although the food is said to be great, Timmy decided not to travel, but to invite his friends for a conference at home.
They had international food, late-night discussions and at the end, they agreed on a common way of sending secure messages.
As a consequence, everybody in the world can send secure messages and the recipients can understand the message, as long as they follow that agreement.

Does that make sense?
Really makes sense, especially the section about the international food (which didn’t make it into the specification).

What do we learn from this story?
People communicating with each other need to agree on some basic principles:
- how encryption is done, which steps in which order
- what exactly is encrypted
- which algorithms are used
- certificate information 
- where is that information stored

This intro was copied from my cms-post.     

2. Introduction to Digital Signatures

We start from the very beginning to explain the concepts of signing.
Experienced readers can skip this section.
Can we skip the blog post?
Experienced readers can skip the whole blog post.

Example:
You want to buy something, e.g. a cat 🐈
You take a piece of paper, write a contract which covers the product and the price, go to the post office and send it to the dealer.
The dealer calls you and says that your request is not valid.
Ohhh- What has happened?
The dealer doesn’t trust a contract that is not signed.

OK.
Try again: this time you sign the contract with your signature. ✍️
You receive a package…
Exciting…
… but it contains a hungry crocodile instead of a fluffy cat, plus the price is much higher.
OMG - What has happened?  🐊
Somebody modified the contract replacing animal and price.

OK.
Try again: this time you sign the contract, put it in an envelope which you close with a seal. ✉️
Afterwards, you finally receive your fluffy cat (a bit fat, though). 😺
Cool 👍What has happened?
The dealer trusts the signature and moreover, the contract couldn’t be altered, as it was secured.

What do we learn?
We need two mechanisms to ensure:
   🔹 The content is not modified ➡️ “integrity”
   🔹 The content is original ➡️ “authenticity”.

Nice. Ehm - why do we have to learn that?
Now let’s transfer this learning to the digital world.

How do we ensure integrity in a digital world?
Let's see a common example where integrity is required:
Downloading software from a web page.
Usually, in addition to the zip file, a checksum is published on the website.
This allows us to verify that the zip is not modified.
So we’re sure that there’s nothing malicious in it (allowing a hacker to e.g. steal our private cat photos)

What is a checksum?
In general, it is the same as: “hash” or “hash value” or “hash code” or “digest” or “fingerprint”.
Ehhmmm - what?😕
It is a code, a silly combination of characters and numbers.
Can we have an example?
This is a hash code:
f7a5f85f2b80792a7b4650f009b130dd1b955d855c99ef64d7b98e5f103f3709
And this is a digest:
f7a5f85f2b80792a7b4650f009b130dd1b955d855c99ef64d7b98e5f103f3709
It is the same... 🤔
Exactly.

How does it work?
To produce a hash code, we need to use a hash function, or better a cryptographic hash function (CHF).
What’s the difference?
Generally speaking, a CHF is more secure.
The differences are fine-grained and security-related.

What is a hash function?
Based on an algorithm, the hash function creates a hash code from any input data (e.g. text, image, etc).
Important properties:
- The input can be of any size, whereas the digest will always have a fixed size.
- It is not feasible to reconstruct the original text from the digest (one-way).
- Any small change in the input file will produce a completely different digest (avalanche effect). Two different inputs producing the same digest would be a "hash collision", and a good CHF makes it practically impossible to find one.
- As such, the digest (hash) proves that the original input is not changed (-> integrity).
BTW, no key or secret or password is required here
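
To make these properties tangible, here is a minimal sketch in Java that uses only the JDK's built-in MessageDigest class. The class name HashDemo, the helper sha256Hex and the sample texts are my own assumptions for illustration, they are not part of any standard:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashDemo {

    /** Returns the SHA-256 digest of the given text as a lowercase hex string. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform has to support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // short input and long input: the digest always has 64 hex characters (256 bits)
        System.out.println(sha256Hex("cat"));
        System.out.println(sha256Hex("I want to buy a fluffy cat, please"));
        // one changed letter: the digest looks completely different
        System.out.println(sha256Hex("I want to buy a fluffy bat, please"));
    }
}
```

Note that no key is involved anywhere: everybody who has the same input gets the same digest.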

Any examples for hash functions?
SHA-256 (and more), MD5 (and more), RIPEMD, BLAKE (etc), GOST

What are the differences?
Some are safer than others. MD5, for example, is considered broken and should not be used for signatures anymore.

What does SHA-256 mean?
It stands for "Secure Hash Algorithm" and the hash value has a size of 256 bits.
See below for more info.

Now a diagram would be helpful...
OK OK

diagram_hashFunction.jpg

 How is a cat function used?
You mean hash function.
We have an important document that should be signed.
We create a hash value with a (cryptographic) hash function (e.g. SHA-256).
We send both to the receiver.
The receiver views the document and wonders if it might have been altered.
He creates his own hash value with same hash algorithm (SHA-256).
He compares his own hash value with our value which we had sent to him.
As we know, even the slightest change in the document results in a different digest.
If both hashes are equal, he can be sure that the content was not altered.
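
The steps above can be sketched in a few lines of Java (JDK only). The names DigestCheck, digest and isUnchanged as well as the contract texts are hypothetical, chosen just for this illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestCheck {

    /** Sender side: compute the SHA-256 digest that travels along with the document. */
    static byte[] digest(String document) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(document.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: compute an own digest with the same algorithm and compare. */
    static boolean isUnchanged(String receivedDocument, byte[] receivedDigest) {
        return MessageDigest.isEqual(digest(receivedDocument), receivedDigest);
    }

    public static void main(String[] args) {
        String contract = "1 fluffy cat, 100 EUR";
        byte[] sentDigest = digest(contract);

        // document arrives as it was sent
        System.out.println(isUnchanged(contract, sentDigest));
        // document was altered on the way, the digest was not
        System.out.println(isUnchanged("1 hungry crocodile, 900 EUR", sentDigest));
    }
}
```

The first check yields true, the second one false. The sketch also hints at the limitation: whoever is able to replace the document can just as well call digest() and replace the hash.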

Can we look at a dia…
OK OK

diagram_hashFunction2.jpg

 

Nice. Can we try it out?
Yes. See next blog post

So the Signer in CPI is a hash function?
No.
What is it? A fat cat?
No, we have to go one more preparation step further.
Simply using hash is not secure enough.
Super, I’ve wasted my time ;-(
Wait.
A malicious hacker who intercepts e.g. the email can alter the document, create his own new hash and forward the email. Nobody would notice that the document was changed.
So the weak point is: we don’t have authenticity.
To overcome this weakness:
Use digital signature.

What is a digital signature?
In brief: create a hash value and then encrypt it.
In long?
Similar as before, we create a hash value to ensure integrity.
Now we want to protect the hash.
To avoid hacker attacks, we encrypt the hash with a private key.
(As prerequisite, we need a key pair.)
Then send the document along with  the encrypted hash to the receiver.
(The receiver needs our public key. The public key is public, so there’s no problem with it.)
The receiver can decrypt the encrypted hash with the public key.
Then he can proceed as explained before:
create his own hash and compare..
Why is this more secure?
A hacker can still alter the doc and create a new hash, but without our private key he cannot create the matching encrypted hash.
If the hacker alters the doc, creates a new hash and encrypts it himself (with his own private key), then the receiver, who decrypts with our public key, gets a value that doesn’t match his own hash.
This would mean that the doc was hacked.
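
A minimal sketch of this flow in Java, using only the JDK. The class name SignatureDemo, the helper names, the key size of 2048 bits and the sample texts are my assumptions. Note that the JDK's Signature class with "SHA256withRSA" performs the hashing and the private-key operation (including padding) in one go, so the two steps "hash" and "encrypt" are not visible separately:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;

final class SignatureDemo {

    /** Generates a fresh RSA key pair (private + public key belong together). */
    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sender side: hash the document and protect the hash with the private key. */
    static byte[] sign(String document, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(document.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: check the signature with the sender's public key. */
    static boolean verify(String document, byte[] signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(document.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (SignatureException e) {
            return false; // a malformed signature counts as "not valid"
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair ourKeys = newRsaKeyPair();
        KeyPair hackerKeys = newRsaKeyPair();
        String contract = "1 fluffy cat, 100 EUR";
        String forged = "1 hungry crocodile, 900 EUR";

        byte[] signature = sign(contract, ourKeys.getPrivate());
        System.out.println(verify(contract, signature, ourKeys.getPublic())); // original document
        System.out.println(verify(forged, signature, ourKeys.getPublic()));   // altered document
        // hacker alters the doc and signs it with his own private key
        byte[] hackerSignature = sign(forged, hackerKeys.getPrivate());
        System.out.println(verify(forged, hackerSignature, ourKeys.getPublic()));
    }
}
```

Only the first verification succeeds. The other two fail, because the hacker has no access to our private key.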

Oh, that sounds complex….
OKOK, here comes the visualization:

 

diagram_hashFunction3.jpg

 What about the keys?

In asymmetric encryption, we always talk about key pairs.
Private and public key are always generated together, they mathematically belong together.
The public key is public and can be published on the internet.
It is not feasible to derive the private key from it.
To generate a key pair, we use tools, like openSSL, or CPI, etc.
What about the practical example?
<Sigh>. See next blog post

So the Signer in CPI is a digital signature?
Almost.
Why?
We have to go one last education step further:
The signer in CPI is embedded into the XML Signature standard.
Now comes the next question...

 

What is XML Signature Standard??
It is not only a normal signature, it is more than that.
We’re signing a message and sending it out to a receiver.
To enable the receiver to decrypt/verify it, we need to add additional information/metadata to the message (e.g. algorithm info).
All must be nicely structured, as the receiver needs to know where to find everything he needs to decrypt and verify.

Is there anything else?
Yes...
OMG - don't want to know it...
The difference between XML Signature Standard and CMS (PKCS7) Standard:
The CMS Standard can be applied to sign any content, including xml, no problem.
However, the XML content itself is structured – so why not use the structure to add the signature-metadata-structure to it?
Furthermore, there’s a big advantage in having structured XML content:
Instead of always signing the content as a whole, we can choose to sign only an XML subtree.

Why is this an advantage?
The message content might contain a section containing info that does change, but doesn’t affect the integrity.
For example: a timestamp
We don’t want a changed timestamp to cause the signature-verification to fail.
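
The following sketch illustrates this advantage with the XML Digital Signature API that ships with the JDK (package javax.xml.crypto.dsig). It is just an illustration under assumptions: the sample message, the attribute name "Id", the class name SubtreeSignatureDemo and its methods are made up by me. Only the <order> subtree is referenced by the signature, the <timestamp> is not:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class SubtreeSignatureDemo {

    // hypothetical sample message: <order> will be signed, <timestamp> will not
    static final String XML =
            "<message><timestamp>2024-01-01T10:00:00Z</timestamp>"
          + "<order Id=\"order-1\"><customer>Joe</customer></order></message>";

    /** Returns {valid after signing, valid after timestamp change, valid after order change}. */
    static boolean[] run() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element order = (Element) doc.getElementsByTagName("order").item(0);

            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair keys = generator.generateKeyPair();

            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            // the Reference points to the <order> subtree only
            Reference ref = fac.newReference("#order-1",
                    fac.newDigestMethod(DigestMethod.SHA256, null));
            SignedInfo signedInfo = fac.newSignedInfo(
                    fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                            (C14NMethodParameterSpec) null),
                    fac.newSignatureMethod("http://www.w3.org/2001/04/xmldsig-more#rsa-sha256", null),
                    Collections.singletonList(ref));

            // the <Signature> element is appended as last child of <message>
            DOMSignContext signContext = new DOMSignContext(keys.getPrivate(), doc.getDocumentElement());
            signContext.setIdAttributeNS(order, null, "Id"); // declare "Id" as ID attribute
            fac.newXMLSignature(signedInfo, null).sign(signContext);

            boolean afterSigning = isValid(doc, order, keys.getPublic());
            doc.getElementsByTagName("timestamp").item(0).setTextContent("2024-01-01T10:05:00Z");
            boolean afterTimestampChange = isValid(doc, order, keys.getPublic());
            doc.getElementsByTagName("customer").item(0).setTextContent("Mallory");
            boolean afterOrderChange = isValid(doc, order, keys.getPublic());
            return new boolean[] { afterSigning, afterTimestampChange, afterOrderChange };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Validates the first <Signature> element found in the document. */
    static boolean isValid(Document doc, Element order, PublicKey publicKey) throws Exception {
        Node signature = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
        DOMValidateContext context = new DOMValidateContext(publicKey, signature);
        context.setIdAttributeNS(order, null, "Id");
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        return fac.unmarshalXMLSignature(context).validate(context);
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("valid after signing:          " + result[0]);
        System.out.println("valid after timestamp change: " + result[1]);
        System.out.println("valid after order change:     " + result[2]);
    }
}
```

The changed timestamp doesn't bother the verification, whereas a change inside the signed <order> subtree makes it fail.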

Is there a disadvantage?
Yes
I don’t want to know it (I shouldn't have asked).
Sorry: we have to go through it.
Later?😸
Agreed, we talk later..

Most of this intro was copied from my CMS Signer blog post

3. XML Signature

Up to now, we know that
🔹we have an xml content
🔹we create hash and encryption
🔹we have to store the hash and the metadata somehow in the xml.

It is time to look at the "XML Signature" standard itself.
The most important info that is defined by the standard:

  1. What is signed?
  2. How is it signed?
  3. Where is it stored?

In addition:

  1. the tedious disadvantage of xml

So let’s get to know it.
Our examples are based on this simple sample xml service payload:

xml_sample.jpg

3.1. Signing Modes

 

As mentioned, xml allows for flexibility:
A signature can be enveloped or enveloping.

Sounds confusing. What does it mean?
Personally, I translate it as follows:

  1. Enveloping = embracing = parent
    Parent embraces a child.
    If a signature is enveloping, this means:
    The signature is a parent node of the whole content.
    In other words: our message is a child node of a new signature-root node.

intro_enveloping.jpg

  2. Enveloped = embraced = child
    The signature is a child node of the message content.
    The xml content is enriched with an additional <Signature>-section:
    intro_enveloped.jpg

And finally, there’s an additional variant:

  3. Detached
    In this case, the signature is a standalone xml tree which lives somewhere next to the xml content.
    Like so:
    intro_detached.jpg

Understood?
ehhmm....
OK, let's repeat with more detailed pictures
Can we play a game?
Sigh 🙄
What kind of signature mode is applied below?

intro_enveloped2.jpg

Umm - enveloped?
Great.
And here?

intro_enveloping2.jpg

Umm - enveloping?
Perfect.
Is there a prize?
What do you want?
Ummm - a croco...?

3.2. Structure

The examples above were very simplified, let’s have a closer look now.
When calculating the XML Signature of some XML content, we get a result which itself is an XML tree.
The root element of this tree is <Signature>.
Roughly speaking, this tree has the following 3 main sections:

<Signature>
    <SignedInfo>...</SignedInfo>
    <SignatureValue>...</SignatureValue>
    <KeyInfo>...</KeyInfo>
</Signature>

Explanation of <Signature> section:

<🔸><SignedInfo>
This top-element contains the information and metadata that are considered during signing.
Personally, I don’t like the name, it is somehow confusing.
I would prefer to call it “Info about Signing”

<🔸><SignatureValue>
The overall result of the signing process is contained here.
Note that the result has to be encoded with Base64, before it can be inserted in an XML payload.

<🔸><KeyInfo>
This element can contain the public key, which is required to verify the signature.
Instead of the public key itself, the certificate (or parts of it) can be included.
This element is optional. The required public key can be already known to the receiver.

More details:

intro_sigSubtree.jpg

Explanation of <SignedInfo> section:

<🔸><CanonicalizationMethod>
This element contains the info about which canonicalization method is applied to the content, before signing.
See chapter below for more info.
Example:
http://www.w3.org/2006/12/xml-c14n11 

<🔸><SignatureMethod>
The algorithms used to digest and encrypt during signature creation.
For instance, use the SHA-256 algorithm (digest length of 256 bits) to create the hash value, then use an RSA private key to encrypt the hash.
Example value:
http://www.w3.org/2001/04/xmldsig-more#rsa-sha256 

<🔸><Reference>
We have to store the information about what content is signed.
As we know, it can be the whole content or just any xml-element somewhere in the tree, or even outside.
As such, it is necessary to point to the xml-element that is signed.
That’s done via the <Reference> tag which contains a URI attribute.
So the <Reference> stands for the content to be signed.
There can be multiple separate xml-sections specified by multiple <Reference> nodes.
Furthermore, the <Reference> contains the information, how the content is treated.
Such info is contained as <Transform> element

<🔸><Transforms>
There can be multiple transformations that are applied to the selected content, before signing.
One prominent example is the canonicalization.

<🔸><DigestMethod>
This element is a child of the <Reference>.
As such it is relevant only for the content which is specified by the <Reference>.
Here we can see which hash algorithm is applied to the content.
Example:
http://www.w3.org/2001/04/xmlenc#sha256 

Note that this URI points to the xmlenc ("XML Encryption") standard which defines the usage of the SHA-256 algorithm. The same definition is valid here.

Note that in XML Signature, there are 2 separate hashing steps, that’s why we have 2 elements containing info about hash-algorithm.
See below chapter for more explanation.

<🔸><DigestValue>
Again, this element is valid for the specific <Reference> only.
The hash of the referenced content is calculated with algorithm mentioned above (SHA-256).
The result is stored here.
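
As described in the process chapter (step 1.4), the digest is stored Base64-encoded. A minimal JDK-only sketch of how such a value could be produced, assuming the content has already been canonicalized into bytes. The class name DigestValueDemo and the method name digestValue are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class DigestValueDemo {

    /** Returns Base64(SHA-256(bytes)), i.e. the kind of text that goes into <DigestValue>. */
    static String digestValue(byte[] canonicalBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(canonicalBytes);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // assumption: this string already is the canonical form of the referenced content
        String canonical = "<order><customer>Joe</customer></order>";
        System.out.println("<DigestValue>"
                + digestValue(canonical.getBytes(StandardCharsets.UTF_8))
                + "</DigestValue>");
    }
}
```

A SHA-256 digest has 32 bytes, so the Base64 text always has 44 characters.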

<🔸><...>
There are a few more elements, but let’s ignore them here.

Example for <Signature> structure:

intro_sigSubtree_example.jpg

 Example with values:

intro_sigSubtree_example2.jpg

3.3. Process

To better understand what is happening, we should have a look at what is actually done, when creating an xml signature - and when verifying it.

3.3.1. Process of Signing

We learned above:
A digital signature means to create a hash and encrypt it.
Um?
No nooooooo...
Unfortunately, the XML Signature Standard is not so simple.
We’re not just creating a signature of some content.
To make it secure, even the metadata and the hash need to be signed.
Sigh ;-(
OK. Let's...
Yes, sign makes me sigh…..
OK.
And makes me sick….
Let’s get a rough overview of the overall process.

1. The content:
These steps are done with the content, e.g. the message payload.

1.1. The content that should be signed is identified.
Example:

process1_content.jpg

1.2. This content is canonicalized.
1.3. The digest of the content is calculated.
1.4. The digest is base64-encoded.

2. The <SignedInfo>
These steps are done with the <SignedInfo> element.

2.1. The <SignedInfo> element is constructed.
2.2. The calculated, base64-encoded digest (see 1.3. and 1.4.) is inserted in the subtree.
2.3. The <SignedInfo> element is canonicalized.
2.4. The signature of <SignedInfo> element is created.
         2.4.1. The hash of <SignedInfo> is calculated
         2.4.2. The hash is encrypted with private RSA key
2.5. The signature is base64-encoded

3. The whole Document
Finally, the overall message payload is affected.

3.1. The final result, the whole <Signature> element is constructed.
3.2. The base64-encoded signature (see 2.4. and 2.5.) is inserted
3.3. The super-final result, the document as a whole, is enriched with the <Signature> element.

Summary
The content is digested.
The <SignedInfo> is digested and signed.
In addition: beforehand the canonicalization – and afterwards the base64-encoding has to be applied.
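
To show the two separate hashing steps, here is a deliberately simplified sketch in Java (JDK only). It is not a spec-compliant implementation: it skips canonicalization (the strings are assumed to be in canonical form already), it leaves out <CanonicalizationMethod>, <Transforms> and namespaces, and all names (TwoStepSigningSketch, digestValue, signedInfo, signatureValue, signatureMatches) are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;
import java.util.Base64;

final class TwoStepSigningSketch {

    /** Steps 1.3 and 1.4: digest the (already canonical) content, then Base64-encode it. */
    static String digestValue(String canonicalContent) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonicalContent.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Steps 2.1 and 2.2: build a simplified <SignedInfo> which carries the digest. */
    static String signedInfo(String digestValue) {
        return "<SignedInfo>"
             + "<SignatureMethod Algorithm=\"http://www.w3.org/2001/04/xmldsig-more#rsa-sha256\"></SignatureMethod>"
             + "<Reference URI=\"\">"
             + "<DigestMethod Algorithm=\"http://www.w3.org/2001/04/xmlenc#sha256\"></DigestMethod>"
             + "<DigestValue>" + digestValue + "</DigestValue>"
             + "</Reference>"
             + "</SignedInfo>";
    }

    /** Steps 2.4 and 2.5: hash and sign the <SignedInfo> (not the content), then Base64-encode. */
    static String signatureValue(String canonicalSignedInfo, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(canonicalSignedInfo.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: check the signature over the <SignedInfo>. */
    static boolean signatureMatches(String canonicalSignedInfo, String signatureValue, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(canonicalSignedInfo.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(Base64.getDecoder().decode(signatureValue));
        } catch (SignatureException e) {
            return false; // a malformed signature counts as "not valid"
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newRsaKeyPair();
        String info = signedInfo(digestValue("<order><customer>Joe</customer></order>"));
        String signature = signatureValue(info, keys.getPrivate());
        System.out.println(info);
        System.out.println("<SignatureValue>" + signature + "</SignatureValue>");
        System.out.println(signatureMatches(info, signature, keys.getPublic()));
    }
}
```

The point of the sketch: the content is only digested, and this digest becomes part of the <SignedInfo>. The actual signature is then created over the <SignedInfo>, so that the digest and the algorithm info are protected as well.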

The simplified result of an enveloped signature: 

process1_result.jpg

 

3.3.2. Process of Verification

Let’s briefly repeat the process, this time from the verification perspective.

Ehmm - what does verification mean?
Repeat:
🔷Verification of a digest:
  -> compute a new hash
      -> then compare it to the original hash.

🔷Verification of digital signature:
  -> Decrypt the signature with public key
     -> compute a new hash
         -> then compare it to the original hash.

🔷Verification of an XML Signature:
See below. 

1. Verify the content-digest

Here, we're talking about the hash that was calculated from the original message payload (or a part of it).

1.1.  Identify the content (info found in <Reference>).
1.2. Apply the canonicalization and other transforms (info found in <Reference>).
1.3. Identify the algorithm for hashing (info found in <Reference>/<DigestMethod>).
1.4. Calculate new digest.
1.5. Identify the old digest from <Reference>/<DigestValue>.
1.6. Compare both.

process2_digest.jpg

 

2. Verify the signature

Here we're talking about the <SignedInfo> element, which was signed.
To verify a digital signature, a public key is required, which is either known or contained in the xml.

2.1. Identify the <SignedInfo> element.
2.2. Identify the method for canonicalization (info at <SignedInfo>/<CanonicalizationMethod>)
2.3. Canonicalize the <SignedInfo>.
2.4. Fetch the public key, which is either known, or contained in <Signature>/<KeyInfo> element.
2.5.  Identify the algorithm (info in <SignatureMethod>)
2.6. Perform the signature-verification
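
In practice, these steps are usually not done by hand. As a sketch, the following Java code signs a small document (enveloped mode) and validates it afterwards, using the XML Digital Signature API that is part of the JDK (javax.xml.crypto.dsig). The sample XML, the class name EnvelopedSignatureDemo and its methods are my assumptions. The <KeyInfo> is left out, as we assume that the public key is already known to the receiver:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class EnvelopedSignatureDemo {

    /** Signer side: adds an enveloped <Signature> as last child of the root element. */
    static void sign(Document doc, KeyPair keys) throws Exception {
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        // URI "" = the whole document; the enveloped transform excludes the <Signature> itself
        Reference ref = fac.newReference("",
                fac.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                        fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                null, null);
        SignedInfo signedInfo = fac.newSignedInfo(
                fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                        (C14NMethodParameterSpec) null),
                fac.newSignatureMethod("http://www.w3.org/2001/04/xmldsig-more#rsa-sha256", null),
                Collections.singletonList(ref));
        fac.newXMLSignature(signedInfo, null)
                .sign(new DOMSignContext(keys.getPrivate(), doc.getDocumentElement()));
    }

    /** Verifier side: checks the content digest and the signature over <SignedInfo>. */
    static boolean validate(Document doc, PublicKey publicKey) throws Exception {
        Node signature = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
        DOMValidateContext context = new DOMValidateContext(publicKey, signature);
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        return fac.unmarshalXMLSignature(context).validate(context);
    }

    /** Returns {valid after signing, valid after tampering with the content}. */
    static boolean[] run() {
        try {
            String xml = "<order><customer trusted=\"yes\" active=\"true\">Joe</customer></order>";
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for XML Signature processing
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair keys = generator.generateKeyPair();

            sign(doc, keys);
            boolean afterSigning = validate(doc, keys.getPublic());
            doc.getElementsByTagName("customer").item(0).setTextContent("Mallory");
            boolean afterTampering = validate(doc, keys.getPublic());
            return new boolean[] { afterSigning, afterTampering };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("valid after signing:   " + result[0]);
        System.out.println("valid after tampering: " + result[1]);
    }
}
```

The validate() call covers both parts described above: the verification of the content digest (part 1) and the verification of the signature over <SignedInfo> (part 2).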

process2_sig.jpg

4. Canonicalization

Finally, let's talk about that tedious topic, as I promised above.
Can we do that later...?

4.1. Intro

OK.
What is canonicalization?
We know that XML is used for storing structured data and for enriching it with metadata.
Example:

<order>
   <customer trusted="yes" active="true">Joe</customer>
</order>

When reading the data from it, we don’t care how it is formatted:

<order    >
   <customer 
         active="true" 
         trusted = 'yes'>Joe</customer>
</order   >

Both XML snippets are valid and have the same content - although they look different.
We don’t care about
   🔹silly spaces inside the tags
   🔹useless line feeds inside the tags
   🔹using single quotes or double quotes
   🔹using different order of attributes
Etc
Sure - I don't care at all 😹
However……
In case of cryptographic hashes, we know that changing one little cute byte invalidates the content and causes the verification to fail.
So just imagine that the xml payload goes through an iFlow……. It may well be re-formatted by some iFlow step. Normally, this doesn’t matter, because xml is parsed by machines anyway, so the format is irrelevant.
But if some xml is used for signature, then the format matters.
Really?
Really really matters 

How can it be solved?
We need one single, well-defined standard format for xml.
This must be agreed on, which results in another standard.
OMG - how boring 😴
And here it is:
The XML Canonicalization standard.
Applying canonicalization to both sample snippets above would result in a third representation:

<order>
   <customer active="true" trusted="yes">Joe</customer>
</order>

We can see that the attributes have a fixed order, useless spaces inside the tags have been removed, etc.
Note that whitespace in the text content between the tags is not touched by canonicalization.
What does the spec say?
Let’s have a look at the spec and copy a little excerpt:

   🔹 The XML declaration and document type declaration are removed.
   🔹 Attribute value delimiters are set to quotation marks (double quotes).
   🔹 Empty elements are converted to start-end tag pairs.
etc
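
If you want to try it out without any additional library, the canonicalizer that comes with the JDK's XML Digital Signature API can be called directly. This is a sketch under assumptions: the class name C14nDemo and the method canonicalize are made up, the empty anonymous DOMCryptoContext is just a placeholder context, and the inclusive c14n 1.0 method without comments is used:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.crypto.OctetStreamData;
import javax.xml.crypto.dom.DOMCryptoContext;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;

final class C14nDemo {

    static final String FIRST =
            "<order>\n   <customer trusted=\"yes\" active=\"true\">Joe</customer>\n</order>";

    static final String SECOND =
            "<order    >\n   <customer \n         active=\"true\" \n"
          + "         trusted = 'yes'>Joe</customer>\n</order   >";

    /** Canonicalizes the given XML text with inclusive c14n 1.0 (comments omitted). */
    static String canonicalize(String xml) {
        try {
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            CanonicalizationMethod c14n = fac.newCanonicalizationMethod(
                    CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null);
            OctetStreamData input = new OctetStreamData(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // placeholder context without any properties
            OctetStreamData output =
                    (OctetStreamData) c14n.transform(input, new DOMCryptoContext() { });
            return new String(output.getOctetStream().readAllBytes(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalize(FIRST));
        System.out.println(canonicalize(SECOND));
        // both snippets end up in the same canonical form
        System.out.println(canonicalize(FIRST).equals(canonicalize(SECOND)));
    }
}
```

Because both snippets have the same canonical form, they also have the same digest, no matter how they were formatted on the way.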

Why do we need different canonicalization methods?
Good question and I recommend the tutorial for visual understanding.
🔹There are small differences in the different versions of specs.
🔹Also, we can choose if we want to keep comments or not.
🔹And one important difference is the exclusive canonicalization (specified here)

Last question: c14n?
The word canonicalization is so long and hard to type and pronounce… people like to use c14n as abbreviation.
The number 14 stands for the number of characters between c and n.
Haha 

4.2. Optional: C14N Tutorial

You can imagine how much I enjoyed typing C14N instead of Canoni….
Let’s have a look at a few examples to see how c14n works.
I’ve prepared a little code sample which uses the Apache Santuario library for c14n.
The full code can be found in the Appendix.

Example 1: Simplest xml with Comment

Our first sample XML:

<!-- my comment -->
<parent    >
   <child    batt='yes'    att='no'    />
</parent   >

We apply this c14n method: http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments 

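We use this Java code:

// look up the canonicalizer for the chosen c14n method URI
Canonicalizer canon = Canonicalizer.getInstance(c14nMethod);
// canonicalize the XML bytes and write the result to the console
canon.canonicalize(xml.getBytes(StandardCharsets.UTF_8), System.out, false);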

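And the result:

<!-- my comment -->
<parent>
   <child att="no" batt="yes"></child>
</parent>

What we see:
🔸The comment has been preserved
🔸The blanks inside the <parent   > tag have been removed
🔸The attributes have been ordered (a comes before b)
🔸The single quotes have been replaced by double quotes
🔸The empty tag <child/> has been expanded to <child></child>
🔸Whitespace between the tags is not touched (it is part of the content)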

Example 2: Simplest xml removing Comment

Now we use the same input, but apply the c14n algorithm which removes comments:
http://www.w3.org/TR/2001/REC-xml-c14n-20010315 

The result is the same as above, but without the comment in the first line.

Example 3: Namespaces and subtree

An interesting aspect is the propagation of namespaces.
This becomes relevant, when we create a signature of a subtree only.
In this case, the child element inherits the namespaces declared at parent.

In the following examples we will apply the c14n on a child node only.
The XML content has a root element with 3 child elements.
The parent has 2 namespace declarations.
The children use one, both, or none of the declared namespaces.

<parent 
      xmlns:pans='/pauri' 
      xmlns:chns='/churi'>
   <child chns:att='good'/>
   <friend pans:att='OK' chns:att='cool'/>
   <brother/>
</parent>

We apply the c14n method on the element “child”:
http://www.w3.org/TR/2001/REC-xml-c14n-20010315 

The sample code uses a DOM method for retrieving the desired child node.
Note that in the context of XML Signature it is recommended to use xPath, not DOM.

// parse the XML into a DOM document
InputStream stream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
Document doc = XMLUtils.read(stream, true);
// pick the first element with the given tag name
Node myChild = doc.getElementsByTagName(element).item(0);

// canonicalize only the subtree below that node
Canonicalizer canon = Canonicalizer.getInstance(c14nMethod);
canon.canonicalizeSubtree(myChild, System.out);

The result:

<child xmlns:chns="/churi" xmlns:pans="/pauri" chns:att="good"></child>

We can see that both namespace declarations have been propagated to the child by the canonicalizer.

Example 4: Exclusive Canonicalization

To understand what "exclusive" means, we run the same example, but applying the exclusive method:
http://www.w3.org/2001/10/xml-exc-c14n# 

The result:

<child xmlns:chns="/churi" chns:att="good"></child>

We can see that only the one required namespace declaration has been propagated.

Example 5: Exclusive Canonicalization 2

Now let's compare to the <friend> element which has 2 attributes using both namespaces.
The result:

<friend xmlns:chns="/churi" xmlns:pans="/pauri" chns:att="cool" pans:att="OK"></friend>

We can see that both declarations have been propagated from the parent.

BTW, we can also see that the declarations and attributes are nicely ordered.

Example 6: Exclusive Canonicalization 3

Last example: we apply c14n on a child node that doesn’t have any attributes, hence doesn’t use any namespace.
As a result of applying the exclusive canonicalization method, we can see that none of the declarations has been propagated:

<brother></brother>

The next test would be:
Use the normal c14n method, not the exclusive one.
We would see that all declarations are inherited, without necessity.
We skip it here, but it is contained in the Appendix.

BTW, furthermore we can see that the shortcut for empty tags has been replaced:

<brother/> is c14n'ed to   <brother></brother>

5. Optional: Some General Info

There are several different names for the same thing:
🔹XML Signature is the official name, as used in the specification.
🔹It is also called XMLDSig, XML-DSig, XML-Sig.
🔹Personally, I would like to add the names XML-Digi-Sigi and xml-disi to the list (but up to now, nobody has adopted them).

The standard was developed by the World Wide Web Consortium (W3C) and published as a W3C Recommendation.
The specification can be found at https://www.w3.org/TR/xmldsig-core/

It has version 1.1 as of 2013, which is expressed in the internal link http://www.w3.org/TR/2013/REC-xmldsig-core1-20130411/

REC  stands for recommendation, this is the mature result of the definition process.
TR stands for Technical Report, this is a general hint towards the character of the standard.

The XML Signature is used in xml-based technologies like SAML, SOAP, WS-Security.

Summary

The XML Signature standard is used for creating a digital signature of xml content.
The signature is represented by an xml-tree
The standard defines

🔷  3 ways of signing xml content: enveloping, enveloped and detached.
- The signature can be inserted as subtree somewhere in the xml content (enveloped).
- Or the signature xml tree can contain the content as a subtree (enveloping).
- Alternatively, the Signature xml tree can be detached from the content and live as standalone xml.

🔷  a process of creating a hash of the desired content and in addition creating a signature over a part of the signature itself.

🔷  an xml structure for storing the signature and digest, that are required for verification.

Next Steps

Go through the tutorial in the next blog post to gain hands-on experience.

Links

SAP Help Portal
Docu for Message-Level Security

Specs
XML Signature: https://www.w3.org/TR/xmldsig-core/
C14N Version 1.0 https://www.w3.org/TR/xml-c14n/
C14N Version 2 (2013) https://www.w3.org/TR/xml-c14n2/
Exclusive c14n Vers 1 https://www.w3.org/TR/xml-exc-c14n/

Info
Wikipedia: CHF, Cryptographic Hash Function
Wikipedia: Comparison of cryptographic hash functions
Wikipedia: Digital Signature

Libs
Apache Santuario implements XML Enc and DigiSigi standards.

Blogs
Understanding the XML Encryption standard.
Understanding CMS (PKCS 7) standard.
Understanding the PKCS7/CMS Signer
Security Glossary Blog

Appendix: C14N Test Code

To get below code working, the Apache Santuario library is required.
It can be downloaded from here: https://mvnrepository.com/artifact/org.apache.santuario/xmlsec
Alternatively, the snippet below can be added to your Maven dependencies section:

<dependency>
   <groupId>org.apache.santuario</groupId>
   <artifactId>xmlsec</artifactId>
   <version>4.0.2</version>
</dependency>

This test class is for your convenience:

package example.c14n;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.utils.XMLUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Using Apache Santuario library for canonicalizing different XML payloads */
public class CanonTest {

    public static void main(String[] args) throws Exception {
        // the Santuario library has to be initialized before first use
        org.apache.xml.security.Init.init();

        // example for handling comment and basics
        String xmlSimple = ""
                + "<!-- my comment -->"
                + "<parent    >"
                +     "<child batt='yes' att='no'    />"
                + "</parent>";

        // apply different canonicalization methods
        canoFull(xmlSimple, Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS);
        canoFull(xmlSimple, Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);

        // example for handling namespaces
        String xmlWithNs = ""
                + "<parent "
                +       "xmlns:pans='/pauri' "
                +       "xmlns:chns='/churi'>"
                +     "<child chns:att='good'/>"
                +     "<friend pans:att='OK' chns:att='cool' />"
                +     "<brother/>"
                + "</parent>";

        // apply different canonicalization methods on different subtrees
        canoSubtree(xmlWithNs, "child", Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        canoSubtree(xmlWithNs, "child", Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);
        canoSubtree(xmlWithNs, "friend", Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);
        canoSubtree(xmlWithNs, "brother", Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        canoSubtree(xmlWithNs, "brother", Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);
    }

    /** Canonicalizes a complete XML document and prints the result. */
    private static void canoFull(String xml, String c14nMethod) throws Exception {
        System.out.println("\n\n- - - " + c14nMethod + " - - - \n");

        Canonicalizer canon = Canonicalizer.getInstance(c14nMethod);
        canon.canonicalize(xml.getBytes(StandardCharsets.UTF_8), System.out, false);
    }

    /** Canonicalizes only the first element with the given tag name and prints the result. */
    private static void canoSubtree(String xml, String element, String c14nMethod) throws Exception {
        System.out.println("\n\n- - - " + c14nMethod + " - - - \n");

        InputStream stream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = XMLUtils.read(stream, true);
        Node myChild = doc.getElementsByTagName(element).item(0);

        Canonicalizer canon = Canonicalizer.getInstance(c14nMethod);
        canon.canonicalizeSubtree(myChild, System.out);
    }
}