CCX PARDISO Low Multi-Core Usage

I've been noticing that regardless of the type of solve, the multi-core usage is very low. I'm using the CCX PARDISO exe from the Calculix website and the latest Intel MKL. Has anyone else noticed the same thing?

Comments

  • edited September 2021
    Ran a few tests:

    bolted joint with contact, 100k DOF, run times:
    ccx_PARDISO.exe from calculix_2.17.1 with 1 processor: 11:55
    ccx_PARDISO.exe from calculix_2.17.1 with 6 processors: 6:58
    ccx_PASTIX 2.17 by 3rav, 31 Aug 2021, with 6 processors: 6:42
    ccx_PASTIX 2.17 by 3rav, 24 Aug 2020, with 6 processors: 6:16
    ccx_pardiso_dynamic216.exe by sob with 6 processors: 6:21

    Test case has contact but is relatively "stable".

    Interesting extra:
    I ran a "pop" test with a hyperelastic model with a buckling behavior.
    Only the ccx_pardiso_dynamic216.exe ran through, all others failed at the "pop" event.



  • edited September 2021
    thanks john,

    i'll have to see what pardiso dynamic is; i'm not familiar with it. i attached a picture of some recent benchmarks i did to find the optimum number of cores to use on my computer. i chose to go with three cores, based on the results. however, the thing i am curious about is the amount of time spent in multi-core. i am seeing very little multi-core usage: it primarily uses only one core and only occasionally goes multi-core. it doesn't seem to matter what type of analysis i run; that characteristic remains.

    update: see the latest version in a post below
  • edited September 2021
    here is typical system usage of what i have been seeing. it loads up the memory, then uses multi-core, then goes back to single core. the disappointing thing is that it's primarily in single core.


  • That seems pretty poor. Here's my result for:
    • simple_cube.liml
    • CCX_PARDISO.exe dated "26 Jul 2020 19:57:20" with the MKL dlls in Mecway 14.
    • OMP_NUM_THREADS=12
    It's using multiple cores the whole time.

  • that's strange. it ran in single core the whole time on my computer.
  • i guess ccx_pardiso_dynamic.exe was an old file that is no longer in the download; the readme file mentions that name. i don't know if ccx_pardiso.exe is the new version. either way, it's calling mkl_sequential.dll and running in single core most of the time. i only get brief periods of multi-core.
  • here is an old benchmark that looks similar to yours:
    https://mecway.com/forum/discussion/750/propeller-hub/p1

    I ran the cube with PASTIX
    1 proc - 21% CPU usage 1:45
    6 proc - 70% CPU usage 0:58

    similar with PARDISO
  • I see similar performance to prop_design's with CCX_PARDISO.exe, but I think it is using all cores, not just one. The point is that each core seems to be limited in its utilization percentage. That limitation disappears with PaStiX, which goes to full capacity.


  • edited September 2021
    the best i can tell, the ccx_PARDISO.exe in the download from the ccx website isn't set up right. it's not calling the files that the readme file says it should. my testing found the following:

    files used from the intel mkl:

    mkl_intel_thread.1.dll (have to rename it mkl_intel_thread.dll)
    mkl_sequential.1.dll (from the readme, this file isn't supposed to be used)
    mkl_def.1.dll
    mkl_core.1.dll

    the following don't seem to be used:

    libiomp5md.dll
    mkl_intel_lp64_dll.lib
    mkl_avx.1.dll
    mkl_avx2.1.dll (from the readme, this file is supposed to be used)
    mkl_avx512.1.dll

    rafal is credited with creating the file. i messaged him about it, and he said a new exe will be released at some point and to wait for that.

    my cpu supports avx2 and avx512, but i don't think either is being used. the mkl_sequential.dll is being called as well (see the dispatch-check sketch below).
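
    (A hedged way to check which instruction set MKL actually dispatches to, assuming you can build a small C program against the same MKL redistributables the exe uses: setting MKL_VERBOSE=1 in the environment makes any MKL-linked program log the ISA it picked, and mkl_enable_instructions can cap the dispatch for comparison. A minimal sketch:)

    #include <stdio.h>
    #include "mkl.h"

    int main(void)
    {
        /* Cap MKL's dispatch at AVX2 or older (returns 1 on success).
           Must be called before any other MKL function to take effect. */
        int ok = mkl_enable_instructions(MKL_ENABLE_AVX2);
        printf("AVX2 cap %s\n", ok ? "accepted" : "rejected");

        /* Make a real MKL call so MKL_VERBOSE=1 has something to log;
           the verbose line names the instruction set that was used. */
        double v[4] = {1.0, 2.0, 3.0, 4.0};
        printf("norm = %f\n", cblas_dnrm2(4, v, 1));
        return 0;
    }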

    there are a lot of iparm settings for mkl pardiso, and i'm not sure anyone has optimized them. so there are a lot of areas that may need work as far as creating the windows exe goes (a sketch of what iparm tuning looks like follows).
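
    (For the curious, a minimal, hedged sketch of what tuning the parallelism-related iparm values looks like when calling MKL PARDISO directly from C. This is not CCX's actual setup -- CCX picks its own defaults internally -- just an illustration on a tiny SPD test matrix:)

    #include <stdio.h>
    #include "mkl_pardiso.h"
    #include "mkl_types.h"
    #include "mkl_service.h"

    int main(void)
    {
        /* 5x5 SPD tridiagonal matrix, upper triangle, 1-based CSR. */
        MKL_INT n = 5;
        MKL_INT ia[6] = {1, 3, 5, 7, 9, 10};
        MKL_INT ja[9] = {1, 2, 2, 3, 3, 4, 4, 5, 5};
        double  a[9]  = {2, -1, 2, -1, 2, -1, 2, -1, 2};
        double  b[5]  = {1, 0, 0, 0, 1}, x[5];

        void   *pt[64] = {0};            /* PARDISO internal handle   */
        MKL_INT iparm[64] = {0};
        MKL_INT maxfct = 1, mnum = 1, mtype = 2;   /* 2 = real SPD    */
        MKL_INT nrhs = 1, msglvl = 0, error = 0, phase, idum;
        double  ddum;

        iparm[0]  = 1;  /* use the iparm values given here, not defaults */
        iparm[1]  = 3;  /* OpenMP-parallel nested-dissection reordering  */
        iparm[23] = 1;  /* two-level factorization; helps on many cores  */
        mkl_set_num_threads(6);          /* like OMP_NUM_THREADS=6 */

        phase = 13;                      /* analyze + factorize + solve */
        pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                &idum, &nrhs, iparm, &msglvl, b, x, &error);
        printf("error=%d  x[0]=%f\n", (int)error, x[0]);

        phase = -1;                      /* release internal memory */
        pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja,
                &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
        return 0;
    }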

    i would try compiling it myself, but i would like to use the intel oneapi compilers instead of msys2. the biggest issue is that i have no idea how to do this; it seems really complex.

    hopefully, the next ccx windows exe will run better. i think it ran better for me in the past, but i didn't save those exe files. the latest one seems to run really poorly for me.

    thanks for the feedback though. it seems like some of you are getting better multi-core usage than i am. i suspect we may not be using the same exe file; if we are, then perhaps it varies with cpu type or mkl version. i have one of the latest intel cpus, so i can't imagine that's the issue.


  • 4 threads PaStiX (25 Sep 2020 08:54:48 Windows build from 3rav), shown above, is about 3 seconds slower than 8 threads. 6 threads is a tad less than 8 threads; 5 threads is a tad worse than 4. CPU utilization is a function of the number of threads for either version of CCX.
    My system is nearly the same as Victor's, except my version of Pardiso CCX may use a very slightly newer version of MKL.

    Max Threads   PARDISO MKL time (tot / ccx)   PaStiX time (tot / ccx)
    2             1:13 / 1:07                    -
    4             0:55 / 0:49                    0:38 / 0:31
    5             0:56 / 0:49                    -
    6             0:52 / 0:45.5                  -
    8             0:52 / 0:45.7                  0:35 / 0:28

    This PaStiX tends to have undefined memory problems on large problems before the system runs out of available RAM, but it is fast. I tend to run with 4 threads, as the cores run about 28% less for the same problem. Intel chips of various varieties may behave differently.
  • For ccx 2.18 I changed the linking from "Dynamic" to "Single Dynamic Library" (see the run-time selection sketch after the links below), per: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl/link-line-advisor.html

    Base description (http://portal.nacad.ufrj.br/online/intel/mkl/common/mkl_userguide/GUID-7091CAB6-0506-443A-ABA0-CCE2245A1A1C.htm):
    "You can simplify your link line through the use of the Intel MKL Single Dynamic Library (SDL).

    To use SDL, place libmkl_rt.so on your link line. For example:

    icc application.c -lmkl_rt

    SDL enables you to select the interface and threading library for Intel MKL at run time. By default, linking with SDL provides:

    Intel LP64 interface on systems based on the Intel® 64 architecture
    Intel interface on systems based on the IA-32 architecture
    Intel threading"


    https://software.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/linking-your-application-with-the-intel-oneapi-math-kernel-library/linking-in-detail/dynamically-selecting-the-interface-and-threading-layer.html

    https://software.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/linking-your-application-with-the-intel-oneapi-math-kernel-library/linking-quick-start/using-the-single-dynamic-library.html
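
    (To make the quoted SDL mechanism concrete: a minimal sketch, assuming a program linked only against mkl_rt, that selects the interface and threading layer at run time in code rather than via the MKL_INTERFACE_LAYER / MKL_THREADING_LAYER environment variables. These calls must come before any other MKL call:)

    #include <stdio.h>
    #include "mkl.h"

    int main(void)
    {
        /* Equivalent to MKL_INTERFACE_LAYER=LP64 and
           MKL_THREADING_LAYER=INTEL in the environment. */
        mkl_set_interface_layer(MKL_INTERFACE_LP64);
        mkl_set_threading_layer(MKL_THREADING_INTEL);

        /* The first "real" MKL call locks the layer choice in. */
        printf("MKL may use up to %d threads\n", mkl_get_max_threads());
        return 0;
    }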


  • edited September 2021
    The ccx 2.18 exe has just been posted to the ccx website. I'll post updated runtime results once I get them.
  • edited September 2021
    So the new version seems to run the PaStiX solver with Scotch reordering, while the previous version ran the Pardiso solver with METIS reordering. I got slightly better results with the previous version. Oddly enough, multi-core usage was much better with PaStiX; however, it didn't translate into runtime improvements. Moreover, it's using more power to generate slower results, which is much different from what others have seen. I attached a pdf file with the updated results. PaStiX did seem to scale better with core count and is certainly making good use of the cores. I suspect that if the Pardiso iparm values were tweaked it could use more of the cores, but I don't know that.

    ps, a couple of notes: the processor used has four physical cores; the other four are hyperthreading. pastix didn't like the hyperthreading. one other oddity with pastix was that it used one additional core beyond what was specified, so if i specified three cores, it actually used four. pardiso was fine with hyperthreading and only used the number of cores specified. for now, i've switched back to the ccx 2.17 pardiso build.

    update: see the latest version in a post below
  • I saw prop_design's intermittent multi-core use with earlier versions last year.
  • thanks mike. that makes me feel better. another user here shared some of the old ccx exe files. i tested them and saw the same type of usage as with the 2.17 build.
  • hi,

    i probably have a dumb question here, but i can't figure out how to get pardiso to run using the latest exe file. it defaults to the pastix solver. looking at the ccx manual, it seems you have to specify the solver type for any given analysis, and that's not something mecway currently does. i couldn't find any 'easy' way to call one of the solvers (spooles, pastix, pardiso) within the ccx_dynamic.exe file.
  • I have a long list of ccx.exe files. They are all different depending on what was included in the compilation; some don't work for me. Whatever Victor provides works, but is not the bleeding-edge latest. Pardiso is proprietary, so he can't provide it or integrate it without paying a fee (not trivial). You can compile it yourself into a CCX executable, but I needed a make file by others and had to get the link files named right. PaStiX and a few other things use the GPL, but need to be compiled (linked, actually) into the ccx build. There was a PaStiX & Pardiso compile a ways back, dated Friday, September 25, 2020, 1:19:31 AM, ccx_PASTIXandPARDISO.exe, where you put a "solver=" line in the ccx input to choose the solver. I have not tried it. I just use the version I compiled with the most recent Intel MKL and a PaStiX version dated Friday, September 25, 2020, 1:18:38 AM. I downloaded a new PaStiX version yesterday, created Tuesday, August 31, 2021, 9:48:52 AM, but have not tried it yet.
  • thanks mike. yeah, compiling it myself seems out of the question for now; way too many things i don't understand still. in any event, you mentioned the solver= line in the inp file. that is what i'm wondering about. it looks like the new ccx_dynamic.exe will support that; it's just that mecway doesn't have it built in. there is probably a way to do it with the custom ccx cards. it would be nice if there was an easy way, though. so far the new ccx_dynamic.exe is running pastix. on my cpu, pastix isn't impressive at all. it did solve the issue of this post, that being that the last pardiso build didn't use much multi-core. pastix is doing great in that regard. however, it doesn't translate to speed improvements on my cpu, so i would rather use pardiso for now. i can switch back to the last build; that's easy. however, i'd be interested to try the new build using pardiso. i set up the mkl environment variables based on what 3rav posted. i want to test it and see if that fixes the pardiso multi-core usage issue.
  • edited September 2021
    Hi,

    Try this: with the "modify keyword" option, select *STATIC and add a SOLVER= parameter, e.g. *STATIC, SOLVER=PARDISO. The choices are:

    SPOOLES
    PARDISO
    PaStiX
  • edited September 2021
    ah, brilliant. thanks 3rav

    update: for some reason spooles and pardiso still won't run. i also tried ccx_static.exe and spooles.
  • Seems to work (running now pastix) on yesterday's release.
  • i can only run pastix, but it ran by default anyway. when i switch to solver=spooles or solver=pardiso, it stops without any indication of why.
  • 3rav---
    I am successfully running your 31 August version of ccx with PaStiX thanks to your instructions, but it gags on 1,890,000 nodes in iteration 1. I believe this is due to nnz being about 6.9 GB. The PaStiX documentation recommends changing to 8-byte integers in the PaStiX compile for large problems. I have run problems of up to 4.1 million nodes using the Mecway internal solver.

  • edited September 2021
    @prop_design

    I had the same symptoms when I had not copied all the necessary oneAPI DLL libraries.

    In my case (Haswell) ccx needs the following (a way to verify they are found is sketched after the list):
    libiomp5md.dll
    mkl_avx2.1.dll
    mkl_core.1.dll
    mkl_intel_thread.1.dll
    and most important: mkl_rt.1.dll
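
    (A quick way to verify the needed DLLs are actually findable from the exe's folder, sketched below under the assumption you can compile a small Windows C program; the DLL names are taken from the list above and change with the MKL release:)

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        /* DLL names per the list above; adjust for your MKL version. */
        const char *dlls[] = {
            "libiomp5md.dll", "mkl_avx2.1.dll", "mkl_core.1.dll",
            "mkl_intel_thread.1.dll", "mkl_rt.1.dll"
        };
        for (int i = 0; i < 5; i++) {
            HMODULE h = LoadLibraryA(dlls[i]);  /* searches exe dir + PATH */
            printf("%-24s %s\n", dlls[i], h ? "found" : "MISSING");
            if (h) FreeLibrary(h);
        }
        return 0;
    }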



  • Hi,

    I'm completely puzzled about which Windows variables I have to use to set/optimize ccx performance on my PC.

    I have :

    CCX_NPROC_STIFFNESS
    NUMBER_OF_PROCESSORS
    OMP_NUM_THREADS

    My processor is an Intel i7-4700MQ which, according to Intel, has:

    Nº of Cores (4 cores)
    Nº of Threads (8 Threads). # A Thread, or thread of execution, is a software term for the basic ordered sequence of instructions that can be passed through or processed by a “single core”.

    Which is the correct OMP_NUM_THREADS value (see the sketch at the end of this post)?

    2 threads / core
    4 cores x 2 = 8 threads total
    4 cores x 8 threads/core (Intel) = 32

    Why are the solvers showing "using up to 8 Cpu(s)" when I have defined CCX_NPROC_STIFFNESS = NUMBER_OF_PROCESSORS = 4?
    When using 16 or 32 threads PaStiX works but runs super slow.

    Thanks
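
    (A small, hedged sketch of the distinction this question turns on: on a 4-core/8-thread CPU like the i7-4700MQ, OpenMP reports 8 logical processors, which is likely why the solvers say "using up to 8 Cpu(s)"; OMP_NUM_THREADS then picks how many of those a run may use. Compiled with OpenMP enabled:)

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* 8 on a 4-core hyperthreaded i7-4700MQ: logical processors,
           i.e. 4 cores x 2 hardware threads, never cores x 8 = 32. */
        printf("logical processors: %d\n", omp_get_num_procs());

        omp_set_num_threads(4);      /* same effect as OMP_NUM_THREADS=4 */
        #pragma omp parallel
        {
            #pragma omp single
            printf("team size: %d\n", omp_get_num_threads());  /* 4 */
        }
        return 0;
    }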
  • thanks again 3rav,

    that is really weird. i had the file locations in my environment variables; those worked in the past. however, i tried copying the files to the folder of ccx_dynamic.exe and it worked, like you said. i haven't had that happen before.

    in any event, this version is working better. i set two env variables for the intel sdl. those are:

    MKL_INTERFACE_LAYER value LP64
    MKL_THREADING_LAYER value INTEL

    the files i copied to the ccx_dynamic.exe folder are:

    libiomp5md.dll
    mkl_avx2.1.dll
    mkl_avx512.1.dll
    mkl_core.1.dll
    mkl_def.1.dll
    mkl_intel_thread.1.dll
    mkl_rt.1.dll

    sometimes it will give you help and tell you what's missing, and sometimes it won't. if i switch to the sequential mkl, then i think i will also need:

    mkl_sequential.1.dll

    i haven't tested spooles yet, and i need to run more tests. i'll get back with results later. using 3 cores, pardiso is now the fastest option for me. it's only a little faster, but this is what i expected with better multi-core usage. i think the previous versions were stuck in sequential usage for some reason.
  • Thank you 3rav.

    Another question.

    Do you know if PROCESSORS in CCX_NPROC_STIFFNESS and NUMBER_OF_PROCESSORS refers to logical processors or cores?

    Thanks
  • I had spooles work on a small model; on a large model it just exits.