Thread: Inconsistent Search Results

  1. #31
    Join Date: Jan 2010 | Location: Oak Ridge, TN | Posts: 35

    OK... Whenever the index is rebuilt, I always end up with random articles missing. However, I have found that if I do what is described here:

    http://developer.mindtouch.com/en/re...e-index_a_Page

    ...the article shows up. So here is a very DIRTY Perl script that does this for ALL of my articles, and it appears to have worked. On one instance, a search right after an index rebuild returned only 1 article; I ran this script, repeated the exact same search and, behold, got the 10 or so articles I was expecting. So this might help some of you:


    Code:
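    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    
    # Database and MindTouch admin settings (replace with your own values)
    my $database    = "mydb";
    my $username    = "mydbuser";
    my $password    = "mydbpass";
    my $hostname    = "localhost";
    my $protocol    = "http";
    my $url         = "myurl.domain.com";
    my $kbadmin     = "mykbadmin";
    my $kbadminpass = "mykbadminpass";
    
    my $dbh = DBI->connect("DBI:mysql:database=$database;host=$hostname",
                           $username, $password, { RaiseError => 1 });
    
    # Re-index every page that is not hidden
    my $sth = $dbh->prepare("SELECT page_id FROM pages WHERE page_is_hidden = 0 ORDER BY page_id");
    $sth->execute();
    while (my $ref = $sth->fetchrow_hashref()) {
        my $pageid = $ref->{'page_id'};
        print "Found Row: $pageid\n";
        # POST an empty body to the page's /index endpoint. The list form of
        # system() skips the shell, so special characters in the password are safe.
        my $rc = system("curl", "-u", "$kbadmin:$kbadminpass",
                        "-H", "Content-Type: text/plain",
                        "-d", "", "-i",
                        "$protocol://$url/\@api/deki/pages/$pageid/index");
        # Non-zero means curl itself failed (e.g. a connection error)
        warn "curl failed for page $pageid\n" if $rc != 0;
    }
    $sth->finish();
    $dbh->disconnect();
    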
    Regards,

    Billy S.

  2. #32
    Join Date: Feb 2008 | Location: Southern California | Posts: 258

    While the Perl script is an interesting stop-gap, it is not a solution. This thread shows that both Core and paying customers are seeing this issue. It appears to have been introduced in version 10.0 and persists through 10.1. I find it hard to believe that the MindTouch team is not taking this more seriously; after all, indexing is one of the most basic features. A bug report has been filed and remains unassigned. What is going on???

    Regards,
    Mark C.

  3. #33
    Join Date: Jan 2010 | Location: Oak Ridge, TN | Posts: 35

    Well... I have a ticket open with Support, and just the other day I sent them a TON of information to look at. I backed up the indexes and rebuilt, then backed up and rebuilt again... and I could clearly see the counts were off each time. Not by the same number either; it just randomly skips articles. So I don't have a bug ticket open, but I am working with Support on this. I wish I had more information, but I just don't.

    Regards,

    Billy S.

  4. #34
    Join Date: Jan 2010 | Location: Oak Ridge, TN | Posts: 35

    Here is an example of what I am seeing:

    Rebuild #1 results: Number of Articles in Index: 2026, Number of Terms in Index: 40700
    Go to the site and click Rebuild
    Rebuild #2 results: Number of Articles in Index: 1198, Number of Terms in Index: 30218
    Stop dekiwiki
    Go to the luceneindex directory and delete all directories
    Restart dekiwiki
    Go to the site and click Rebuild if indexing hadn't already begun
    Rebuild #3 results: Number of Articles in Index: 722, Number of Terms in Index: 20745
    Go to the site and click Rebuild
    Rebuild #4 results: Number of Articles in Index: 1148, Number of Terms in Index: 29273

    So... as you can see, the number of articles in the index varies from one rebuild to the next. And yes, I wait until indexing has finished before checking the report.
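    To put the variation in numbers, here is a small Perl sketch that takes the four article counts reported above and expresses each rebuild as a share of the largest one. It assumes no pages were added or deleted between rebuilds, so every rebuild should have produced the same count; the numbers are copied from the list above and nothing else is measured.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max min);

# Article counts from the four rebuilds listed above.
# Assumption: the wiki content did not change between rebuilds.
my @articles = (2026, 1198, 722, 1148);

my $best = max(@articles);
for my $i (0 .. $#articles) {
    # Each rebuild as a percentage of the most complete one
    printf "Rebuild #%d: %4d articles (%.1f%% of the largest count)\n",
        $i + 1, $articles[$i], 100 * $articles[$i] / $best;
}
printf "Spread between best and worst rebuild: %d articles\n",
    $best - min(@articles);
```

    Rebuild #3 comes out at roughly 36% of rebuild #1, and the best and worst runs differ by 1304 articles.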
    Regards,

    Billy S.

  5. #35
    Join Date: Sep 2008 | Posts: 195

    Sorry for not jumping in on this thread earlier.

    At some point during 10.0 we upgraded Lucene to the latest version of Lucene.Net because it provides significantly faster querying plus usable wildcard support. I've heard from a number of customers about search issues, but I have not been able to create a reproducible case myself. Our local dev environments seem to behave properly, and Developer, which is one of our largest sites, is also fine. I am certainly not doubting that you are having problems, but until I can reproduce them there isn't a lot I can do, unfortunately. If someone can give me step-by-step instructions for creating a bad index, I would really appreciate it.

    I also wanted to address whether this is caused by our commercial adaptive search. I can assure you it is not. Adaptive search is a layer on top of Lucene that adjusts result positions using a couple of additional metrics, but it relies completely on Lucene providing a good initial ranking of documents. I.e., if the Lucene results are bad, the adaptive search results will be bad too. This is not a scenario where the non-commercial version gets degraded search results.
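    The point that adaptive search can only reorder what Lucene hands it can be shown with a toy model. The rerank sub, the boost multipliers and the page names below are hypothetical and not MindTouch's actual adaptive search code; the sketch only illustrates why a page that never made it into the Lucene index cannot be brought back by a re-ranking layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical re-ranking layer: it multiplies each hit's base score by a boost
# and re-sorts, but it can only work with the hits the search engine returned.
sub rerank {
    my ($hits, $boost) = @_;    # $hits: [ [page, score], ... ]; $boost: { page => multiplier }
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
           map  { [ $_->[0], $_->[1] * ($boost->{ $_->[0] } // 1) ] } @$hits;
}

# Base hits for a query; page-c is absent because it never got into the index.
my @lucene_hits = ( [ 'page-a', 2.0 ], [ 'page-b', 1.5 ] );
my %boost       = ( 'page-b' => 2.0, 'page-c' => 10.0 );

my @ranked = rerank(\@lucene_hits, \%boost);
print join(', ', @ranked), "\n";    # prints: page-b, page-a
```

    The boost lets page-b overtake page-a, but page-c stays out of the results no matter how large its boost, because the layer never sees it.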

    I hope we can get to the bottom of this. There is nothing I hate more than intermittent problems I cannot help with.

    cheers,
    arne
    Arne Claassen - Software Architect
    Found a bug? Report it.
    Follow me on Twitter!

  6. #36
    Join Date: Jul 2009 | Location: Girona, SPAIN | Posts: 547

    Hi Arnec,

    I think we can give you access to our instance so you can check it. For the past six months the search function has been totally off; I think it returns results in reverse order.

    Thanks in advance,
    Carles Coll

  7. #37
    Join Date: Jan 2010 | Location: Oak Ridge, TN | Posts: 35

    As I said, I sent my results to Support the other day (Ticket #19068). I have done web sessions with the tech staff and can do more if needed. I have easily replicated this issue in my lab. If another tech at MindTouch wants to schedule time for debugging, I am more than happy to get on the phone with you all and do another web session.
    Regards,

    Billy S.

  8. #38
    Join Date: Oct 2008 | Posts: 38

    Strader,

    Are any of the items missing from the Lucene index attachments? MindTouch does index file attachments such as Word and Excel documents, and sometimes the system is missing a few dependencies, which prevents these attachments from being indexed.

    Let me know.

    Thanks!

  9. #39
    Join Date: Jan 2010 | Location: Oak Ridge, TN | Posts: 35

    Not that I am aware of... We commented out the indexing of attachments because we believe it was causing the MindTouch service to crash.
    Regards,

    Billy S.

  10. #40
    Join Date: Sep 2008 | Posts: 195

    Carles,

    So there are two problems I'm aware of:

    The first is documents not appearing in the index. We've seen it, but even looking at before-and-after index snapshots and all the logs in between has not shown how it occurred. To address this one, I have to get a local repro that I can attach a debugger to.

    The second problem is search results being wrong, as you seem to be indicating. I haven't witnessed this, and it would certainly be useful to see it in action and to get a copy of that index to inspect manually. If the index isn't too huge and isn't full of proprietary data, I'd appreciate a copy along with the keywords that are failing for you. The easiest way to do this would be to create a private page on Developer and attach the index as a zip; I should be able to access it via admin.

    I've created a ticket to track this issue here:
    http://youtrack.developer.mindtouch.com/issue/MT-10766
    If you have data that isn't proprietary, you can also attach it to the ticket.

    cheers,
    arne
    Arne Claassen - Software Architect
