If I use (in VS2010 C#):
var reader = new Sgml.SgmlReader
{
    DocType = "HTML",
    WhitespaceHandling = WhitespaceHandling.Significant,
    CaseFolding = Sgml.CaseFolding.ToLower
};
to parse an InputStream containing:
<P><SPAN class=equation><I>I</I></SPAN><SUB>Z</SUB></P>
and call .ReadOuterXml() to get the XHTML as an XML string, then I get:
<html>
<p>
<span class="equation">
<i>I</i>
</span>
<sub>Z</sub>
</p>
</html>
The problem is that the line break between the span and sub elements shows up as an extra space in the rendered document.
Curiously, if I do the same thing with:
<p>Before</p>
<P><SPAN class=equation><I>I</I></SPAN><SUB>Z</SUB></P>
I receive:
<html>
<p>Before</p>
<p><span class="equation"><i>I</i></span><sub>Z</sub></p></html>
If I modify the ReadOuterXml method in SgmlReader.cs to set xw.Formatting = Formatting.None, the issue goes away, but I suspect that could collapse other whitespace.
Any ideas why this happens and how to fix it?
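
For reference, the Formatting.None change can probably be tried without patching SgmlReader.cs. The sketch below is only an illustration under some assumptions: ReadUnindented and the UnindentedXml class are hypothetical names, not part of SgmlReader, and it relies on SgmlReader being an XmlReader so that the standard XmlWriter.WriteNode can copy its nodes. As far as I understand, Indent = false only stops the writer from inserting its own line breaks and indentation. Whitespace nodes that the reader itself reports should still be written out unchanged, though that is worth checking against real input.

```csharp
using System;
using System.IO;
using System.Xml;

public static class UnindentedXml
{
    // Hypothetical helper (not part of SgmlReader): copies every node from
    // the reader into a string without letting the writer add indentation.
    public static string ReadUnindented(XmlReader reader)
    {
        var settings = new XmlWriterSettings
        {
            Indent = false,                           // no line breaks or indentation added by the writer
            OmitXmlDeclaration = true,                // return just the markup
            ConformanceLevel = ConformanceLevel.Auto  // accept a full document or a fragment
        };

        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            // When the reader is still in its initial state, WriteNode
            // copies all nodes until the end of the input.
            writer.WriteNode(reader, true);
        }
        return output.ToString();
    }
}
```

With the reader from the question, this would be called as UnindentedXml.ReadUnindented(reader) after setting reader.InputStream, in place of reader.ReadOuterXml().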


