
conversion of empty tags



teh1623
03-11-2010, 04:38 AM
Hi.
I tested version 1.8.6, and the parsed result of the attached document is as follows:

-----------------------------
<img alt="xxx" src="xxx.jpg"> <img alt="yyy" src="yyy.jpg">
<table>
<colgroup>
<col style="WIDTH: 100px">
<col style="WIDTH: 50px"></col></col></colgroup>
<tbody>
<tr>
<td>aaa</td>
<td>111</td></tr>
<tr>
<td>bbb</td>
<td>222</td></tr></tbody></table></img></img>
-----------------------------

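I think img and col tags should be treated as empty tags: they have no content and no end tag, so the parser should close each one immediately instead of nesting the following markup inside it.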
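To illustrate the behavior I expected, here is a minimal sketch in Python (not SgmlReader code). It uses the standard library's html.parser, and the names EMPTY_ELEMENTS, EmptyTagSerializer and serialize are made up for this example. The set lists the elements declared EMPTY in the HTML 4.01 Strict DTD; each one is written as a self-closing tag and never gets a separate end tag.

```python
from html import escape
from html.parser import HTMLParser

# Elements declared EMPTY in the HTML 4.01 Strict DTD.
EMPTY_ELEMENTS = {
    "area", "base", "br", "col", "hr",
    "img", "input", "link", "meta", "param",
}


class EmptyTagSerializer(HTMLParser):
    """Re-serializes HTML, closing EMPTY elements immediately (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Attributes without a value (e.g. "checked") are kept as bare names.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        closer = " />" if tag in EMPTY_ELEMENTS else ">"
        self.out.append("<" + " ".join(parts) + closer)

    def handle_endtag(self, tag):
        # An EMPTY element was already closed by its start tag.
        if tag not in EMPTY_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def serialize(html_text):
    serializer = EmptyTagSerializer()
    serializer.feed(html_text)
    serializer.close()
    return "".join(serializer.out)


print(serialize('<img alt="xxx" src="xxx.jpg"> <img alt="yyy" src="yyy.jpg">'))
```

With this approach the two img elements end up as siblings, and the table that follows them would not be wrapped inside either one.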

SteveB
03-12-2010, 03:25 PM
Oh, that's interesting. Yeah, I wonder how we're going to handle that case. Thanks for submitting it. I filed a bug for it (http://bugs.developer.mindtouch.com/view.php?id=7801).

teh1623
03-15-2010, 05:10 AM
Thank you for your prompt reply!
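As you may know, the HTML DTD defines many other elements whose end tag may be omitted.
For example, according to the HTML 4.01 Strict DTD, I believe the end tag is optional for the elements listed below (content models left out). In each declaration the two flags refer to the start tag and the end tag: "-" means required and "O" means omissible. EMPTY means the element has no content, so it never has an end tag.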

---------------------------------------
!ELEMENT BR - O EMPTY
!ELEMENT BODY O O
!ELEMENT AREA - O EMPTY
!ELEMENT LINK - O EMPTY
!ELEMENT IMG - O EMPTY
!ELEMENT HR - O EMPTY
!ELEMENT P - O
!ELEMENT DT - O
!ELEMENT DD - O
!ELEMENT LI - O
!ELEMENT INPUT - O EMPTY
!ELEMENT OPTION - O
!ELEMENT THEAD - O
!ELEMENT TFOOT - O
!ELEMENT TBODY O O
!ELEMENT COLGROUP - O
!ELEMENT COL - O EMPTY
!ELEMENT TR - O
!ELEMENT (TH|TD) - O
!ELEMENT HEAD O O
!ELEMENT BASE - O EMPTY
!ELEMENT META - O EMPTY
!ELEMENT HTML O O
---------------------------------------
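In case it helps to read the list, here is a rough Python sketch that turns declarations written in this shortened form into a lookup table. The names DECLARATIONS, PATTERN and classify are made up for illustration, and the sample holds only a few of the lines above; it is not how SgmlReader reads its DTD.

```python
import re

# A few of the declarations from the list, in the same shortened form.
DECLARATIONS = """\
!ELEMENT BR - O EMPTY
!ELEMENT BODY O O
!ELEMENT P - O
!ELEMENT LI - O
!ELEMENT COL - O EMPTY
!ELEMENT (TH|TD) - O
"""

# name or (NAME|NAME) group, start-tag flag, end-tag flag, optional EMPTY keyword
PATTERN = re.compile(
    r"^<?!ELEMENT\s+\(?([A-Za-z0-9|]+)\)?\s+([-O])\s+([-O])(?:\s+(EMPTY)\b)?"
)


def classify(text):
    """Map each lowercased element name to its tag-omission flags."""
    table = {}
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # skip separators and anything that is not a declaration
        names, start_flag, end_flag, empty = match.groups()
        for name in names.split("|"):
            table[name.lower()] = {
                "start_tag_optional": start_flag == "O",
                "end_tag_optional": end_flag == "O",
                "empty": empty is not None,
            }
    return table


for name, flags in classify(DECLARATIONS).items():
    print(name, flags)
```

The table separates two cases that need different handling: EMPTY elements (br, col) should be closed right away, while elements such as p, li, th and td have content and are closed implicitly by whatever follows them.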

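When I parsed another HTML document (attached) that uses some of the tags listed above, I got the following unexpected result. The end tags pile up at the end instead of being inserted where each element implicitly ends.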

---------------------------------------
<img alt="xxx" src="xxx.jpg"> <img alt="yyy" src="yyy.jpg">
<ul>
<li>aaa
<li>bbb </li></li></ul>
<table>
<colgroup>
<col style="WIDTH: 100px">
<col style="WIDTH: 50px"></col></col></colgroup>
<tbody>
<tr>
<td>aaa
<td>111 </td>
<tr>
<td>bbb
<td>222 </td></td></tr></td></tr></tbody></table></img></img>
---------------------------------------
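To narrow down which elements are affected in this result, one rough check is to look for elements that end up nested inside an element of the same name. The Python sketch below does that with the standard library's html.parser; SelfNestingFinder and find_self_nested are names made up for this example. It is only a heuristic: legitimately nested lists or tables would be flagged too, but this sample has none.

```python
from html.parser import HTMLParser

# The parsed result shown above, copied as-is.
PARSED_OUTPUT = """\
<img alt="xxx" src="xxx.jpg"> <img alt="yyy" src="yyy.jpg">
<ul>
<li>aaa
<li>bbb </li></li></ul>
<table>
<colgroup>
<col style="WIDTH: 100px">
<col style="WIDTH: 50px"></col></col></colgroup>
<tbody>
<tr>
<td>aaa
<td>111 </td>
<tr>
<td>bbb
<td>222 </td></td></tr></td></tr></tbody></table></img></img>
"""


class SelfNestingFinder(HTMLParser):
    """Collects tag names that open while the same tag is still open."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.self_nested = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.stack:
            self.self_nested.add(tag)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_self_nested(markup):
    finder = SelfNestingFinder()
    finder.feed(markup)
    finder.close()
    return sorted(finder.self_nested)


print(find_self_nested(PARSED_OUTPUT))  # ['col', 'img', 'li', 'td', 'tr']
```

So in this sample the affected elements are the EMPTY ones (img, col) and the ones with an optional end tag (li, td, tr).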

SteveB
03-17-2010, 12:17 AM
Can you be more specific? Are all of these tags handled incorrectly, or just a few? If only a few, which ones? I can't imagine SgmlReader would be performing as well as it has if this issue were not confined to a few tags.