LibXML2 parsing whitespace and line breaks

When using libxml2 to parse HTML, by default, libxml2 normalizes and merges whitespace characters (including line breaks) in text nodes, which can cause line breaks within tags such as <pre>, <textarea>, <script>, <style>, etc. to be removed or merged. But for tags like <pre>, line breaks and whitespace are meaningful and need to be preserved.

Answered by DTS Engineer in 842027022
Apple provides a couple of higher-level XML wrappers, depending on your platform.

Indeed. And apropos that, the libxml2 that’s built in to Apple platforms is pretty much a direct copy of the open source version [1]. Given that, you’re more likely to find answers to questions like this in the support channel for that library, rather than here on DevForums.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] If you’re curious, the Open Source > Releases page shows which version of libxml2 is built in to which version of macOS, and the versions for other platforms are generally aligned.

Maybe you should include some code showing how you are doing this. As far as I'm aware, the default is to preserve whitespace.

And maybe explain what platforms you're targeting. Apple provides a couple of higher-level XML wrappers, depending on your platform.
