SgmlReader unexpected white space insertion



pinkduck
06-19-2012, 11:01 AM
If I use (in VS2010 C#):

var reader = new Sgml.SgmlReader
{
    DocType = "HTML",
    WhitespaceHandling = WhitespaceHandling.Significant,
    CaseFolding = Sgml.CaseFolding.ToLower
};

to parse an InputStream containing:

<P><SPAN class=equation><I>I</I></SPAN><SUB>Z</SUB></P>

and read the XHTML back as an XML string with .ReadOuterXml(), then I get:

<html>
<p>
<span class="equation">
<i>I</i>
</span>
<sub>Z</sub>
</p>
</html>

The problem is that the line break between the span and sub elements causes an extra space to appear between "I" and the subscript "Z" in the rendered document.

Curiously, if I do the same thing with:

<p>Before</p>
<P><SPAN class=equation><I>I</I></SPAN><SUB>Z</SUB></P>

I receive:

<html>
<p>Before</p>
<p><span class="equation"><i>I</i></span><sub>Z</sub></p></html>

If I modify the ReadOuterXml method in SgmlReader.cs to use xw.Formatting = Formatting.None, the problem goes away, but I suspect that change could collapse other whitespace.
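
To show what the Formatting setting does on its own, here is a minimal sketch that uses only System.Xml, not SgmlReader itself. The class and method names (FormattingDemo, Serialize) are made up for illustration, and the input is the already well-formed XHTML rather than the original HTML. It copies the same nodes through an XmlTextWriter twice, once with Formatting.Indented and once with Formatting.None.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FormattingDemo
{
    // Hypothetical helper: copies the root element from a reader to a string,
    // using the requested XmlTextWriter formatting.
    public static string Serialize(string xml, Formatting formatting)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        using (var sw = new StringWriter())
        {
            var xw = new XmlTextWriter(sw) { Formatting = formatting };
            reader.MoveToContent();      // position on the root element
            xw.WriteNode(reader, true);  // copy the element and its subtree
            xw.Flush();
            return sw.ToString();
        }
    }

    public static void Main()
    {
        const string xml =
            "<p><span class=\"equation\"><i>I</i></span><sub>Z</sub></p>";

        // Indented: the writer adds its own line breaks between child elements.
        Console.WriteLine(Serialize(xml, Formatting.Indented));

        // None: the writer adds nothing; only nodes from the reader are written.
        Console.WriteLine(Serialize(xml, Formatting.None));
    }
}
```

As far as I can tell, Formatting.None only stops the writer from adding its own line breaks and indentation. Whitespace nodes that the reader actually reports are still copied through, so what survives would depend on the reader's WhitespaceHandling setting rather than on the writer.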

Any ideas why this happens and how to fix it?

SteveB
06-26-2012, 03:53 PM
Filed a tracking issue to look into it: http://youtrack.developer.mindtouch.com/issue/SR-8480

jasicajhon
04-08-2013, 10:33 AM
It caused unexpected rejection of empty namespace URI. ... We can avoid extraneous whitespace node creation for XmlTextReader by setting WhitespaceHandling. ... XmlParserInput.cs: Now to handle nested PE insertion and correct BaseURI.