CPU scaling in Mecway/CCX

I ran a few tests to
compare compute time vs number of threads in Mecway and CCX (2.10 and
2.12MTNO are from kwip, 2.11 from the Mecway7 beta). It seems that
there is very little benefit in using more than 2 threads and none
for more than 4. Running multiple instances in parallel is more
efficient, at least on my machine. Is this behaviour typical, or are
there some Windows settings that I should try? Also, was CCX 2.11
supposed to be faster than 2.10? I got practically the same
results for both.

Finally, is there any
significant performance gain from running CCX on Linux compared to
Windows? And can threads be used more efficiently?

Thanks!
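
A minimal sketch of how a thread-scaling comparison like the one described above could be scripted. It assumes the solver honours the standard OpenMP variable OMP_NUM_THREADS; the function names `time_run` and `scaling_table` are hypothetical, and the solver command line is left to the caller rather than taken from this thread.

```python
import os
import subprocess
import time


def time_run(command, threads):
    """Run `command` once and return its wall-clock time in seconds.

    Assumption: the program reads OMP_NUM_THREADS to pick its thread count.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def scaling_table(command, thread_counts):
    """Return a list of (threads, seconds, speedup vs. the first entry)."""
    timings = [(n, time_run(command, n)) for n in thread_counts]
    baseline = timings[0][1]
    return [(n, seconds, baseline / seconds) for n, seconds in timings]
```

Calling `scaling_table(cmd, [1, 2, 4, 8])` with `cmd` set to your own solver command (as a list of strings) gives one row per thread count. A single run per thread count is noisy, so repeating each run a few times and keeping the best time would make the comparison more reliable.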

Comments

  • While I can't answer your questions, that's a nice comparison! It's reasonable to expect speed to plateau. I don't expect that CCX is particularly optimized for running in parallel, unlike a supercomputer code. It's hard to do that and probably not worth the effort if it's targeting PCs with only 4-8 cores.

    Here's a graph of Amdahl's law (theoretical speedup for increasing parallelization) for a program where 60% of it can be run in parallel. It looks quite similar to your graph. The formula I used is Normalized time = (1-p)+p/n, where p is the fraction of the program that can run in parallel (I chose p=60% to get a similar-looking curve) and n is the number of threads.
  • That's what I was expecting as well; I just wanted to make sure I wasn't missing anything obvious. Thanks!
  • After some digging around, I found answers to my questions.

    * There is no real difference between (optimized) builds of CCX 2.10, 2.11 and 2.12 under Windows.

    * Under a less than optimal Linux set-up, I have seen 5-20% faster computing than in Windows. It can also solve models of up to at least 600 000 nodes (RAM full), compared to Windows, where 300 000 seems to be the limit.

    * If you need performance (up to 4 times faster and 30% less memory), Pardiso is the way to go. There might be some more optimization possible (I use the script from feacluster.com to build it) to make things more consistent on a specific system if you have the time, know-how and money.

    I hope this can be of some help.
  • I have tried on a Xeon with 16 cores, but even when I set all the environment variables to use the 16 cores, CCX runs on only 8. I have also run some problems on a bigger cluster and, again, there was no big difference compared to my old i7 with 8 cores. Maybe adding an SSD would help speed up the disk operations.

    I have almost the same limitation with big models: I think 500 000 nodes was the biggest model I was able to solve, no matter how much memory was available.
  • I remember that in the Windows XP era I had a standard model that I ran on every new workstation as a kind of benchmark (not using CalculiX but Zebulon, another solver). One thing I noticed was that the state of the Windows installation mattered: running the same problem on the same hardware after one or two years of use took 40 or 50% longer to solve.
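
The Amdahl's law formula quoted in the first comment, Normalized time = (1-p)+p/n, can be evaluated with a short sketch like the one below. The function names are hypothetical, and p=0.6 is only the value chosen in that comment to resemble the posted graph, not a measured property of CCX.

```python
def normalized_time(p, n):
    """Amdahl's law: run time on n threads relative to 1 thread.

    p is the fraction of the work that can run in parallel (0..1).
    """
    return (1.0 - p) + p / n


def speedup(p, n):
    """Speedup over a single thread, the inverse of the normalized time."""
    return 1.0 / normalized_time(p, n)


if __name__ == "__main__":
    p = 0.6  # assumed parallel fraction, taken from the comment above
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads: time {normalized_time(p, n):.4f}, "
              f"speedup {speedup(p, n):.2f}x")
```

With p=0.6 the normalized time can never drop below 1-p = 0.4, so the speedup is capped at 2.5x however many threads are added. That is the plateau shape the comment describes.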