Opened 6 years ago

Last modified 11 months ago

#767 new enhancement

Make testcases using different backends run in parallel

Reported by: Guruprasad Hegde
Owned by: Guruprasad Hegde
Priority: normal
Milestone:
Component: Test Suite
Version: git master
Severity: normal
Keywords:
Cc: Olly Betts, Gaurav Arora
Blocked By:
Blocking:
Operating System: All

Description

All test cases currently run sequentially. Test cases which use different backends could run in parallel, which should significantly reduce the total elapsed time.

This ticket tracks initial planning and work done to speed up the test suite.

Plan: We have decided to fork a child process for each backend and run the test cases related to that backend in the child process. We must devise a way to report progress from the multiple processes running test cases in parallel. Olly suggested using a pipe to communicate progress to the parent process, which performs the display job.

BackendManagerSingleFile, BackendManagerMulti and BackendManagerRemote share the glass database, hence these can't be run in parallel with each other.
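A minimal sketch of the plan above (not the actual testrunner code): one child is forked per backend, each child reports progress over a pipe, and the parent drains the pipes and reaps the children. run_backend_tests() and the backend names here are just illustrative stand-ins for do_tests_for_backend() and the real backend managers.

    // Sketch only: fork one child per backend, give each a pipe back to the
    // parent for progress reports, then wait for all of them.
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct Child { pid_t pid; int read_fd; std::string backend; };

    static void run_backend_tests(const std::string& backend, int write_fd) {
        // Child: report progress as lines on the pipe; the parent displays them.
        dprintf(write_fd, "%s: starting\n", backend.c_str());
        // ... run the testcases for this backend here ...
        dprintf(write_fd, "%s: done\n", backend.c_str());
    }

    int main() {
        std::vector<Child> children;
        for (const std::string& backend : { "inmemory", "honey", "none" }) {
            int fds[2];
            if (pipe(fds) < 0) return 1;
            pid_t pid = fork();
            if (pid == 0) {
                close(fds[0]);
                run_backend_tests(backend, fds[1]);
                _exit(0);
            }
            close(fds[1]);
            children.push_back({ pid, fds[0], backend });
        }

        // Parent: drain each pipe and echo the progress, then reap the children.
        // A real implementation would poll()/select() across the pipes so output
        // from different children can be interleaved as it arrives.
        for (const Child& c : children) {
            char buf[256];
            ssize_t n;
            while ((n = read(c.read_fd, buf, sizeof buf)) > 0)
                fwrite(buf, 1, n, stdout);
            close(c.read_fd);
        }
        int status;
        while (wait(&status) > 0) { }
        return 0;
    }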

Attachments (1)

gp_cpu_info.png (27.6 KB) - added by Guruprasad Hegde 6 years ago.


Change History (11)

comment:1 by Guruprasad Hegde, 6 years ago

Currently BackendManagerGlass creates a set of databases in the .glass directory by default. Can we make the backend manager create a separate set of databases in a directory given as a parameter? The .glass directory takes 5MB of disk space; is that already too much? Or does this change not fit well?

comment:2 by Guruprasad Hegde, 6 years ago

I tried this in testrunner.cc (ignoring exception handling and output formatting for now), run with Valgrind disabled:

    int children_count = 0;
    pid_t pid;
#ifdef XAPIAN_HAS_HONEY_BACKEND
    // Fork a child to run the honey backend's testcases.
    pid = fork();
    if (pid == 0) {
        do_tests_for_backend(BackendManagerHoney(datadir));
        exit(0);
    }
    children_count++;
#endif

    // Fork a child for the backend-less ("none") testcases.
    pid = fork();
    if (pid == 0) {
        do_tests_for_backend(BackendManager(string()));
        exit(0);
    }
    children_count++;

#ifdef XAPIAN_HAS_INMEMORY_BACKEND
    // Fork a child for the inmemory backend.
    pid = fork();
    if (pid == 0) {
        do_tests_for_backend(BackendManagerInMemory(datadir));
        exit(0);
    }
    children_count++;
#endif

#ifdef XAPIAN_HAS_GLASS_BACKEND
    // The glass-based backends share the glass databases, so the parent runs
    // them sequentially while the children above run in parallel.
    {
        BackendManagerGlass glass_man(datadir);
        do_tests_for_backend(glass_man);
        do_tests_for_backend(BackendManagerSingleFile(datadir, &glass_man));
        do_tests_for_backend(BackendManagerMulti(datadir, &glass_man));
# ifdef XAPIAN_HAS_REMOTE_BACKEND
        do_tests_for_backend(BackendManagerRemoteProg(&glass_man));
        do_tests_for_backend(BackendManagerRemoteTcp(&glass_man));
# endif
    }
#endif
    // Wait for all the forked children to finish.
    int status;
    for (int i = children_count; i != 0; --i) {
        wait(&status);
    }

Run time with parallelism:

real	1m49.777s
user	0m48.546s
sys	0m31.740s

Run time without parallelism:

real	2m2.414s
user	0m42.655s
sys	0m30.834s

With Valgrind enabled, I got messages like "LEAK SUMMARY:" and "304 bytes in 1 blocks are possibly lost in loss record 52 of 57" for a few testcases. I guess these come from Valgrind.

by Guruprasad Hegde, 6 years ago

Attachment: gp_cpu_info.png added

comment:3 by Guruprasad Hegde, 6 years ago

We need to update the result_so_far variable too, so each child process must send its subtotal.
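A sketch of one way a child could hand its subtotal back to the parent before exiting. The subtotal struct and its field names are made up for illustration; they are not the harness's actual result type.

    // Illustrative only: a made-up per-child results subtotal sent over the pipe.
    #include <unistd.h>

    struct subtotal {
        int succeeded;
        int failed;
        int skipped;
    };

    // Child side: after running its backend's tests, write the counts to the
    // pipe just before exiting.
    void send_subtotal(int write_fd, const subtotal& s) {
        if (write(write_fd, &s, sizeof s) != sizeof s) {
            // handle the short write / error in real code
        }
    }

    // Parent side: read one subtotal per child and accumulate it into
    // result_so_far.
    void accumulate(int read_fd, subtotal& result_so_far) {
        subtotal s;
        if (read(read_fd, &s, sizeof s) == sizeof s) {
            result_so_far.succeeded += s.succeeded;
            result_so_far.failed += s.failed;
            result_so_far.skipped += s.skipped;
        }
    }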

comment:4 by Guruprasad Hegde, 6 years ago

Update about the leak summary errors mentioned in comment:2: all Valgrind errors are reported for the honey backend.

All errors are similar to the one below. I am not sure which testcase this error relates to, since the outputs from the processes are interleaved.

==7621== 304 bytes in 1 blocks are possibly lost in loss record 64 of 70
==7621==    at 0x4C2EEF5: calloc (vg_replace_malloc.c:711)
==7621==    by 0x40112B2: allocate_dtv (in /usr/lib/ld-2.27.so)
==7621==    by 0x4011C3D: _dl_allocate_tls (in /usr/lib/ld-2.27.so)
==7621==    by 0x6B8DBAA: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.27.so)
==7621==    by 0x5673E6F: ??? (in /usr/lib/librt-2.27.so)
==7621==    by 0x6B94A5E: __pthread_once_slow (in /usr/lib/libpthread-2.27.so)
==7621==    by 0x5672CDB: timer_create (in /usr/lib/librt-2.27.so)
==7621==    by 0x53B2B9F: TimeOut (matchtimeout.h:84)
==7621==    by 0x53B2B9F: ProtoMSet (protomset.h:159)
==7621==    by 0x53B2B9F: Matcher::get_local_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:315)
==7621==    by 0x53B4611: Matcher::get_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight::Internal&, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:440)
==7621==    by 0x528EDA8: Xapian::Enquire::Internal::get_mset(unsigned int, unsigned int, unsigned int, Xapian::RSet const*, Xapian::MatchDecider const*) const (enquire.cc:327)
==7621==    by 0x528F293: Xapian::Enquire::get_mset(unsigned int, unsigned int, unsigned int, Xapian::RSet const*, Xapian::MatchDecider const*) const (enquire.cc:205)
==7621==    by 0x2462CB: test_matchtimelimit1() (api_postingsource.cc:657)
==7621== 
==7621== 304 bytes in 1 blocks are possibly lost in loss record 65 of 70
==7621==    at 0x4C2EEF5: calloc (vg_replace_malloc.c:711)
==7621==    by 0x40112B2: allocate_dtv (in /usr/lib/ld-2.27.so)
==7621==    by 0x4011C3D: _dl_allocate_tls (in /usr/lib/ld-2.27.so)
==7621==    by 0x6B8DBAA: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.27.so)
==7621==    by 0x5673D33: ??? (in /usr/lib/librt-2.27.so)
==7621==    by 0x6B8D0BB: start_thread (in /usr/lib/libpthread-2.27.so)
==7621== 
==7621== LEAK SUMMARY:
==7621==    definitely lost: 0 bytes in 0 blocks
==7621==    indirectly lost: 0 bytes in 0 blocks
==7621==      possibly lost: 608 bytes in 2 blocks
==7621==    still reachable: 175,526 bytes in 68 blocks
==7621==         suppressed: 0 bytes in 0 blocks
==7621== Reachable blocks (those to which a pointer was found) are not shown.
==7621== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==7621== 

I tried running only the honey backend in a separate process (using -bhoney) and that gives no errors. A single process might not trigger this error; that is just one observation.

comment:5 by Guruprasad Hegde, 6 years ago

Regarding the display of output:

All output is written to out (an ostream object sharing cout's buffer). Instead of writing to the cout buffer, could we store the result in a buffer owned by out and write to the pipe wherever the endl manipulator is applied?

Still, one question is how the parent should display output requests from multiple child processes. Is it possible to allot a set of lines on the console to each backend and update those lines on each request according to the backend type?
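As one possible answer to that last question, here is an illustrative sketch that reserves a console line per backend and rewrites it in place using ANSI cursor-movement escapes (assuming a terminal that honours them). The class and its interface are invented for this example.

    // Purely illustrative: one console line per backend, updated in place.
    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    class BackendStatusDisplay {
        std::vector<std::string> backends_;
    public:
        explicit BackendStatusDisplay(std::vector<std::string> backends)
            : backends_(std::move(backends)) {
            // Print one placeholder line per backend so we can move back to them.
            for (const auto& b : backends_)
                std::printf("%-12s waiting...\n", b.c_str());
        }

        // Rewrite the line belonging to backend `index` with the latest message.
        void update(std::size_t index, const std::string& msg) {
            std::size_t up = backends_.size() - index;  // lines above the cursor
            std::printf("\x1b[%zuA\r\x1b[2K%-12s %s\n", up,
                        backends_[index].c_str(), msg.c_str());
            if (up > 1)
                std::printf("\x1b[%zuB", up - 1);       // move back below the block
            std::fflush(stdout);
        }
    };

For example, BackendStatusDisplay d({"glass", "honey", "inmemory"}); d.update(1, "42 tests passed"); would rewrite the honey line in place while leaving the other backends' lines alone.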

comment:6 by Olly Betts, 6 years ago

Currently BackendManagerGlass creates a set of databases in the .glass directory by default. Can we make the backend manager create a separate set of databases in a directory given as a parameter? The .glass directory takes 5MB of disk space; is that already too much? Or does this change not fit well?

We could just use separate cache directories for glass, remotetcp_glass, etc, but not only does that use more disk space, but we have to build the same database several times, and there's more disk cache pressure, both of which work against trying to speed things up.

I think more work would be needed to get valgrind to work properly here. In runtest we tell valgrind not to follow child processes after fork() (and changing that would make things complicated for remote tests). Probably when valgrind is in use the child process needs to exec() valgrind with a command to run apitest on just the backend of interest with some option to tell it to report output in TAP format.

Perhaps the simplest first step is to parallelise via the makefile - automake's parallel test harness understands TAP format test output and, as the name suggests, can run tests in parallel.

For output display, I think sending output in TAP format is the best approach. The child process can just write to the pipe by hooking up its end of the pipe as fd 1 and then using cout - no need to do anything special at the iostreams level. The parent process will need to handle displaying test results from multiple children in a sensible way - I think just showing each completed test and recording the failed ones to summarise at the end is probably the best approach.
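A minimal sketch of that suggestion: the child dup2()s its end of the pipe onto fd 1 so its plain cout output reaches the parent, which echoes each line and remembers the failures to summarise at the end. The "ok"/"not ok" lines are only schematic TAP-style output, not apitest's actual format.

    // Sketch: child's cout goes to the pipe via fd 1; parent reads line by line.
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        int fds[2];
        if (pipe(fds) < 0) return 1;

        pid_t pid = fork();
        if (pid == 0) {
            close(fds[0]);
            dup2(fds[1], 1);        // stdout now feeds the pipe
            close(fds[1]);
            // From here on the child just uses cout as normal.
            std::cout << "ok 1 apitest backend=inmemory\n";
            std::cout << "not ok 2 apitest backend=inmemory\n";
            std::cout.flush();
            _exit(0);
        }

        close(fds[1]);
        FILE* in = fdopen(fds[0], "r");
        std::vector<std::string> failures;
        char line[1024];
        while (fgets(line, sizeof line, in)) {
            std::string s(line);
            std::cout << s;                          // show each completed test
            if (s.compare(0, 7, "not ok ") == 0)
                failures.push_back(s);               // remember failures
        }
        fclose(in);
        int status;
        waitpid(pid, &status, 0);

        std::cout << failures.size() << " failure(s) to summarise at the end\n";
        return 0;
    }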

I'm surprised you don't see more gain from parallelism - is one child running all the tests which involve glass?

comment:7 by Guruprasad Hegde, 6 years ago

Perhaps the simplest first step is to parallelise via the makefile - automake's parallel test harness understands TAP format test output and, as the name suggests, can run tests in parallel.

OK, I am getting familiar with automake's parallel test harness.

I'm surprised you don't see more gain from parallelism - is one child running all the tests which involve glass?

Yes. All tests which involve glass run in a single process. Tests related to other backends complete very quickly.

comment:8 by Olly Betts, 5 years ago

Component: Other → Test Suite
Type: task → enhancement
Version: set to git master

comment:9 by Olly Betts, 11 months ago

https://github.com/xapian/xapian/pull/210 has the prototype using automake's parallel test harness.

I tried updating that (and merging honey into the list of glass-based testsuite backends since we compact the glass DB to give the honey one) and the speed up is disappointing (~20% IIRC), I think mostly because the glass-based list is most of the work. I didn't copy the stats off my laptop but I'll try to remember to add them next time I turn it on.

I have come up with a cheap way to schedule though.

If we annotate testcases (could be automatically derived) with the database names they use then we can partition the testcases such that any which use the same DB are in the same partition. Some use more than one, and that will kind of span between DBs and pull them into the same partition, but many use their own DB or share a DB but without such overlaps.

We order these partitions by decreasing expected processing time (this could just be the number of testcases as a simple approximation, but we could feed back actual runtimes), then each worker subprocess just gets the next partition from the list to work through when it needs more to do. This simple greedy algorithm should work well as we have a load of small partitions which should help even out the end of the run between workers.

These partitions can take into account testsuite backends, overlap between them (e.g. glass and honey) and which testcases run for each.

If we aren't running in parallel we could even run testcases in the same order as currently by having a suitable alternative partition set for that.
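A sketch of the partitioning and greedy hand-out described above, under the assumption that each testcase is annotated with the databases it uses: union-find groups testcases sharing a DB into one partition, and the partitions are sorted by decreasing size as a stand-in for expected runtime. The types and names are invented for illustration.

    // Sketch: partition testcases so any two that use the same DB end up in the
    // same partition, then order partitions biggest-first for greedy hand-out.
    #include <algorithm>
    #include <cstddef>
    #include <map>
    #include <numeric>
    #include <string>
    #include <vector>

    struct Testcase { std::string name; std::vector<std::string> dbs; };

    struct UnionFind {
        std::vector<int> parent;
        explicit UnionFind(std::size_t n) : parent(n) {
            std::iota(parent.begin(), parent.end(), 0);
        }
        int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
        void unite(int a, int b) { parent[find(a)] = find(b); }
    };

    std::vector<std::vector<int>> partition(const std::vector<Testcase>& tests) {
        UnionFind uf(tests.size());
        std::map<std::string, int> first_user;   // DB name -> first testcase using it
        for (std::size_t i = 0; i < tests.size(); ++i) {
            for (const auto& db : tests[i].dbs) {
                auto it = first_user.find(db);
                if (it == first_user.end())
                    first_user[db] = int(i);
                else
                    uf.unite(int(i), it->second); // same DB => same partition
            }
        }
        std::map<int, std::vector<int>> groups;
        for (std::size_t i = 0; i < tests.size(); ++i)
            groups[uf.find(int(i))].push_back(int(i));

        std::vector<std::vector<int>> parts;
        for (auto& g : groups) parts.push_back(std::move(g.second));
        // Biggest partitions first; the many small ones left at the end help
        // even out the finish times between workers.
        std::sort(parts.begin(), parts.end(),
                  [](const std::vector<int>& a, const std::vector<int>& b) {
                      return a.size() > b.size();
                  });
        return parts;
    }

An idle worker would then just take the next partition off this list; feeding back measured runtimes instead of testcase counts would only change the sort key.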

comment:10 by Olly Betts, 11 months ago

Here are the timings - the saving was actually just under 11% for 4-way parallelism. This was on x86 with eatmydata and without valgrind (just so the runs didn't take so long). /proc/cpuinfo reports 8 CPUs, but I think it's really 4 + hyperthreading.

time make check -sj4 VALGRIND= AUTOMATED_TESTING=1

real    2m26.917s
user    1m8.072s
sys     0m36.666s

time make check -sj2 VALGRIND= AUTOMATED_TESTING=1

real    2m29.914s
user    1m8.009s
sys     0m36.334s

time make check -s VALGRIND= AUTOMATED_TESTING=1

real    2m44.534s
user    1m8.860s
sys     0m37.316s

Also notable is that there's not much speed up from 2 to 4 processes.

User and system time is reassuringly similar across the runs too.
