Thread: Timeout

  1. #1

    Timeout

    I have some very simple code that scrapes a particular website and parses the HTML. It was written against a version of SgmlReader that predates the NuGet package; that DLL reports its version as 1.8.7.0. If I keep using this version, my code works without problems. However, if I switch to the NuGet version (the DLL reports version 1.8.8.0), the code fails with a WebException ("The operation has timed out"). Here's basically what the code looks like:

    Code:
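    // Requires: System.IO, System.Net, System.Text, System.Xml, System.Xml.Linq, Sgml
    using (var client = new WebClient() { Encoding = Encoding.UTF8 })
    {
       // Download the page up front, so SgmlReader only parses in-memory text.
       var sgml = client.DownloadString(address);
       using (var stringReader = new StringReader(sgml))
       {
          using (var sgmlReader = new SgmlReader())
          {
              sgmlReader.DocType = "HTML";
              sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
              sgmlReader.CaseFolding = CaseFolding.ToLower;
              sgmlReader.InputStream = stringReader;
              // The WebException is thrown from this statement.
              return XDocument.Load(sgmlReader);
          }
       }
    }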
    The exception is thrown from the "return XDocument.Load(sgmlReader);" statement. Any ideas what the problem is here and how to correct it?

  2. #2
    Join Date: Jul 2006 | Location: San Diego, CA | Posts: 5,450

    I haven't used the NuGet version yet. The pull request for it was accepted only recently. What exception are you getting?
    Steve G. Bjorg - Chief Architect

  3. #3

    I gave the exception in the original post. "return XDocument.Load(sgmlReader)" is throwing a WebException with the message "The operation has timed out".

  4. #4
    I filed a tracking issue to look into it: http://youtrack.developer.mindtouch.com/issue/SR-8480
    Steve G. Bjorg - Chief Architect

  5. #5

    Originally posted by wekempf:
    I gave the exception in the original post. "return XDocument.Load(sgmlReader)" is throwing a WebException with the message "The operation has timed out".
    I guess it's too late to answer, but I got the same error.
    After debugging into SgmlReader's code, I found that the problem is in loading the DTD; in my case it is http://www.w3.org/TR/html4/sgml/loose.dtd.
    More specifically, it is in the Entity.Open method, which creates an HttpWebRequest with a 10-second timeout to load the file. Unfortunately, from my ISP it takes about 20 seconds to load that URL, so the method fails with a timeout exception.

    To work around the exception you can either:
    1) Load an SgmlDtd object from a cached file and assign it to SgmlReader.Dtd
    2) (my option) Set SgmlReader.IgnoreDtd = true, so the DTD is never requested
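    A minimal sketch of the second workaround, applied to the code from the first post. It assumes that skipping the DTD download means setting IgnoreDtd to true (false is the property's default and would change nothing). The HtmlLoader class and its Parse method are hypothetical names used only for illustration. Whether the reader still applies HTML-specific rules once the DTD is ignored may depend on the SgmlReader version, so the resulting XML should be checked against a known page.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using Sgml; // from the SgmlReader NuGet package

// Hypothetical helper, not part of SgmlReader.
static class HtmlLoader
{
    // Parses already-downloaded HTML into an XDocument.
    public static XDocument Parse(string html)
    {
        using (var stringReader = new StringReader(html))
        using (var sgmlReader = new SgmlReader())
        {
            sgmlReader.DocType = "HTML";
            // Assumption: ignoring the DTD avoids the HTTP request for
            // loose.dtd that timed out in the posts above.
            sgmlReader.IgnoreDtd = true;
            sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
            sgmlReader.CaseFolding = CaseFolding.ToLower;
            sgmlReader.InputStream = stringReader;
            return XDocument.Load(sgmlReader);
        }
    }
}
```

    The download step stays as in the first post: pass the string returned by client.DownloadString(address) to HtmlLoader.Parse.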

  6. #6

    I was facing the same error, but I have found the answer now...
    It's too late to post it here, though.
