bhaal
11-16-2009, 11:42 AM
I was wondering: shouldn't SgmlReader be able to ignore a document's SystemId and parse the document regardless of what the SystemId points to?
I am using the current version together with XDocument (System.Xml.Linq) to read various XML and SGML documents.
However, my test cases fail when:
- the input file is SGML (so the SgmlReader comes into play), and
- the file contains a DOCTYPE with a SystemId.
It usually fails with a FileNotFoundException or DirectoryNotFoundException, because Entity.Open cannot find the DTD file.
In my case I just want to open the document, not validate it, so I don't care whether it is actually valid against the referenced DTD. Often the DTD exists but is not next to the document, where SgmlReader expects it; in the remaining cases it is not available or not known at all.
Is there any way to load these documents anyway? XDocument alone succeeds for XML, and I think XmlReader also has options that allow this (though I'm not sure).
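For the plain-XML path, the framework's XmlReader does have such options. Below is a minimal sketch, assuming .NET 4.0 or later (for DtdProcessing; on .NET 3.5 the equivalent is ProhibitDtd = false) and using a hypothetical helper class named LenientXmlLoader. With DtdProcessing.Parse the DOCTYPE (name, PUBLIC and SYSTEM identifiers, internal subset) still ends up in XDocument.DocumentType, while XmlResolver = null means the file named by the SystemId is never opened. This only covers well-formed XML; it does not change how SgmlReader resolves its DTD.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper (not part of SgmlReader or the framework): loads XML,
// keeps the DOCTYPE, and never tries to open the DTD named by its SystemId.
public static class LenientXmlLoader
{
    public static XDocument Load(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            // Parse the DOCTYPE so XDocument.DocumentType is populated
            // (requires .NET 4.0+; on .NET 3.5 set ProhibitDtd = false instead).
            DtdProcessing = DtdProcessing.Parse,
            // With no resolver, external resources such as the DTD referenced
            // by the system identifier are not resolved at all.
            XmlResolver = null
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            return XDocument.Load(reader);
        }
    }

    public static XDocument Load(string fileName)
    {
        // File.OpenText returns a StreamReader, which is a TextReader.
        using (StreamReader input = File.OpenText(fileName))
        {
            return Load(input);
        }
    }
}
```

Usage would be `var doc = LenientXmlLoader.Load("some.xml");`. The trade-off is that anything declared only in the external DTD (default attributes, entities) is unavailable, so a document that references an entity defined only there will still fail to load.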
This is the snippet I call in my tests:
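using System;
using System.Xml;
using System.Xml.Linq;

public static XDocument LoadDocument(string fileName)
{
    try
    {
        // First attempt: treat the file as well-formed XML.
        return XDocument.Load(fileName);
    }
    catch
    {
        // Fall back to SgmlReader for SGML input (e.g. unclosed tags).
        try
        {
            using (var sgmlReader = new Sgml.SgmlReader())
            {
                sgmlReader.Href = fileName;
                sgmlReader.StripDocType = false; // keep the DOCTYPE so DocumentType is populated
                return XDocument.Load(sgmlReader);
            }
        }
        catch (Exception ex)
        {
            // Keep the real cause (e.g. the FileNotFoundException from Entity.Open) as InnerException.
            throw new XmlException("Could not load " + fileName + " as Xml Document", ex);
        }
    }
}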
void Test()
{
    // Contents of some.xml:
    // <!DOCTYPE root PUBLIC "publicId" "systemId" [subset]>
    // <root/>
    var doc = LoadDocument("some.xml");
    Assert.AreEqual("publicId", doc.DocumentType.PublicId);
    Assert.AreEqual("systemId", doc.DocumentType.SystemId);
    Assert.AreEqual("subset", doc.DocumentType.InternalSubset);

    // Contents of some.sgm (the unclosed <root> makes it SGML rather than XML):
    // <!DOCTYPE root PUBLIC "publicId" "systemId" [subset]>
    // <root>
    doc = LoadDocument("some.sgm");
    Assert.AreEqual("publicId", doc.DocumentType.PublicId);
    Assert.AreEqual("systemId", doc.DocumentType.SystemId);
    Assert.AreEqual("subset", doc.DocumentType.InternalSubset);
}
Loading "some.sgm" fails with "Unable to find 'current\working\directory\\systemId'": the SystemId is resolved against the current working directory, and no such file exists there.
Any chance I can get those files to load?
Regards, BhaaL