Mecway PC benchmarks

Comments

  • Interesting comparison, 3rav. Victor actually suspected this from viewing their website; the examples they used to show a speed-up were not relevant to what we use it for. Nice to have confirmation, though.
  • edited May 2020
    Hi guys,
    I’ve run the calculation on Mecway 13.0 with ccx 2.16 PARDISO using 8 cores.

    Hardware configuration:
    CPU: AMD Ryzen 9 3900x
    RAM: Crucial UDIMM 16GB ECC 2666MHz
    SSD: Samsung SSD 970 EVO Plus 500GB

    Run time: 1:09

    As antte mentioned, I also added MKL_DEBUG_CPU_TYPE=5 as an environment variable.
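
    A minimal sketch of how to set it (Windows cmd); the variable name is the one above, and setx vs. set is just persistent vs. current-session:

    :: persists for the current user (takes effect in new consoles)
    setx MKL_DEBUG_CPU_TYPE 5

    :: or only for the current console session, before launching Mecway/ccx
    set MKL_DEBUG_CPU_TYPE=5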

  • edited May 2020
    Hi, I would be grateful for your help: how can I download Intel MKL or EX studio?
    I'm trying to compile PARDISO; with one core it takes 12 min ;(

    I can only see that ↑
    Thanks in advance

  • You can get the Intel MKL from here: https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html

    You do have to sign up to get it. I don't know what 'EX studio' is.

    Anthony
  • Hi all
    I ran the calculation on Mecway 13.1 with ccx 2.16 PARDISO as follows:

    Hardware configuration:
    CPU: AMD Ryzen 9 3950x (16 cores/32 virtual processors, runs at about 4.2 GHz)
    RAM: Crucial Ballistix UDIMM 32GB clocked at 3200MHz
    SSD: Samsung NVMe 1 TB

    Run time: 1:01

    I have not added the MKL_DEBUG_CPU_TYPE=5 environment variable this time.
  • Time to go computer shopping...
  • I've re-calculated the bolt assembly (benchmark):

    CPU: i7-6700, 3.40 GHz, 4 cores/8 threads
    MEMORY: 64GB DDR4
    Run time: 0 minutes 52 seconds (MW + CalculiX 2.17 PaStiX!)
  • That's great. Are you using the PaStiX that is available for download on the CalculiX forum?
  • Yes, JohnM. I downloaded the brand new compilation from the CalculiX forum.
    It seems to work, but I hit a bug running a modal analysis.
  • Where is the compiled version of PaStiX? All I see on the CalculiX page is the source code to be compiled.
  • It doesn't look like anyone has been able to compile it on Windows using PaRSEC. PaRSEC seems to take serial code and automatically turn it into parallel code, but I'm not entirely sure.
  • Oh. I was curious where mmartin got the brand new compilation he refers to, and whether it was available to install on Windows for his benchmark run above. Knowing where to get that would be very useful.
  • I believe he got it here: https://calculix.discourse.group/t/calculix-and-pastix-solver-windows-version/130

    However, the Windows version isn't working completely; mainly it's missing PaRSEC, which would make it work even better.

    I'd hold off for a while, until there is a fully functional Windows version.
  • As you can see in the attached ccx 2.17 PaStiX report, many options are still disabled. We will have to wait for another compilation.

    +-------------------------------------------------+
    + PaStiX : Parallel Sparse matriX package +
    +-------------------------------------------------+
    Version: 6.0.1
    Schedulers:
    sequential: Enabled
    thread static: Started
    thread dynamic: Disabled
    PaRSEC: Disabled
    StarPU: Disabled
    Number of MPI processes: 1
    Number of threads per process: 8
    Number of GPUs: 0
    MPI communication support: Disabled
    Distribution level: 2D( 128)
    Blocking size (min/max): 1024 / 2048

    Matrix type: General
    Arithmetic: Float
    Format: CSC
    N: 12339
    nnz: 457155

    +-------------------------------------------------+
    Ordering step :
    Ordering method is: Metis
    Time to compute ordering: 0.0592
    +-------------------------------------------------+
    Symbolic factorization step:
    Symbol factorization using: Fax Direct
    Number of nonzeroes in L structure: 1750149
    Fill-in of L: 3.828349
    Time to compute symbol matrix: 0.0040
    +-------------------------------------------------+
    Reordering step:
    Split level: 0
    Stoping criteria: -1
    Time for reordering: 0.0074
    +-------------------------------------------------+
    Analyse step:
    Number of non-zeroes in blocked L: 3500298
    Fill-in: 7.656698
    Number of operations in full-rank LU : 670.40 MFlops
    Prediction:
    Model: AMD 6180 MKL
    Time to factorize: 0.0569
    Time for analyze: 0.0010
    +-------------------------------------------------+
    Factorization step:
    Factorization used: LU
    Time to initialize internal csc: 0.0361
    Time to initialize coeftab: 0.0050
    Time to factorize: 0.0276 (23.71 GFlop/s)
    Number of operations: 670.40 MFlops
    Number of static pivots: 0
    RHS only consists of 0.0
    ________________________________________

    CSC Conversion Time: 0.003112
    Init Time: 0.075961
    Factorize Time: 0.068795
    Solve Time: 0.000013
    Clean up Time: 0.000000
    ---------------------------------
    Sum: 0.147881

    Total PaStiX Time: 0.147881
    CCX without PaStiX Time: 2.520762
    Share of PaStiX Time: 0.055414
    Total Time: 2.668644
    Reusability: 0 : 1
    ________________________________________

  • OK, guess I'll wait too
  • One thing I wonder is whether they are using the parallel versions of METIS (ParMETIS) and SCOTCH (PT-SCOTCH). Hopefully they are, but I don't know that that's the case.
  • Is there some sort of recipe for compiling this solver similar to what Victor gave us for Pardiso? I'm wondering how the current compilation was done.
  • Hi,

    Important: for the best possible performance with PaStiX, set:

    set OPENBLAS_NUM_THREADS=1

    and use only the physical processors (set OMP_NUM_THREADS to the number of physical cores).

    Please try with these settings.
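
    A minimal session sketch (Windows cmd), assuming a machine with 8 physical cores; the job name is hypothetical:

    set OPENBLAS_NUM_THREADS=1
    set OMP_NUM_THREADS=8
    ccx_2.17.exe -i jobname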
  • We now run with the PaStiX ccx.exe available on the CalculiX Discourse forum.

    I typically see about a 30-35% speedup over PARDISO.

    I tried OPENBLAS_NUM_THREADS=1 and 4; both seemed to run slower. I deleted the variable and it ran the fastest.

    I also noticed that the OMP_NUM_THREADS setting makes only a few seconds' difference, and seems to be best when I set it to 1.

    Thoughts?


  • Try:
    set PASTIX_MIXED_PRECISION=1
  • Went through things a little more methodically; here is my best current recipe on my 8-processor laptop.

    PaStiX + all DLLs from PARDISO

    OMP_NUM_THREADS=6 (8 ran 20% slower)
    OPENBLAS_NUM_THREADS=1 (helped by 8%)
    PASTIX_MIXED_PRECISION=1 (helped by 8%)

    Now over 40% speedup over PARDISO.
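
    Wrapped up as a small launcher batch file (a sketch of the recipe above; the file and job names are hypothetical):

    :: run_pastix.bat (hypothetical launcher for this recipe)
    set OMP_NUM_THREADS=6
    set OPENBLAS_NUM_THREADS=1
    set PASTIX_MIXED_PRECISION=1
    ccx_2.17.exe -i benchmark_bolt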

  • Looks like the PaStiX build has matured. Is it available compiled for Windows, or must one still go through the build procedure?
  • I downloaded it from the CalculiX Discourse group; it is compiled by 3rav.
  • Where exactly is the latest PaStiX compile for Windows?
  • edited September 2020
    I've re-calculated the bolt assembly (benchmark) with the latest CalculiX version (by 3rav):

    CPU: i7-6700, 3.40 GHz, 4 cores, 8 threads
    MEMORY: 64GB DDR4
    Run time: 0 minutes 44 seconds (MW + CalculiX 2.17, PaStiX+PARDISO, SCOTCH + static scheduler, OPENBLAS_NUM_THREADS=1)
  • I finally found the latest version of the solver being discussed (3rav's) and tried it on the benchmark bolt assembly on two different machines, with the following results:

    PASTIX+PARDISO, OPENBLAS_NUM_THREADS=1, PASTIX_MIXED_PRECISION=1
    MW+ccx2.17

    Intel Xeon W-2123, 3.6GHz, 4 cores, 8 threads
    Memory: 16GB
    Run time: 0 min 43 sec

    and

    AMD Ryzen 9 3950X, 4.2GHz, 16 cores, 32 threads
    Memory: 32GB DDR4-3200
    Run time: 0 min 30 sec

    I like it!
  • edited June 2021
    @3rav, we'd like to try a modification to CCX that I discussed with Guido, but we are not proficient at compiling a Windows executable. Can you provide a recipe here?
  • edited June 2021
    1. Install the base packages needed:
    $ pacman -Sy mingw-w64-x86_64-toolchain
    $ pacman -Sy make

    2. Install packages needed for CalculiX:
    $ pacman -Sy mingw-w64-x86_64-openblas
    $ pacman -Sy mingw-w64-x86_64-spooles
    $ pacman -Sy mingw-w64-x86_64-arpack

    3. Make a simple modification to the ccx_2.17 sources: CalculiXstep.c like this:

    (screenshot of the modification attached in the original post)

    and ccx_2.17.c (already after modification):

    (screenshot of the modified file attached in the original post)

    4. Modify the Makefile to this form:

    CFLAGS = -Wall -O2 -DARCH="Linux" -DARPACK -DMATRIXSTORAGE -DNETWORKOUT -fcommon
    FFLAGS = -Wall -O2 -fallow-argument-mismatch

    #SPOOLES
    CFLAGS+= -I /mingw64/include/spooles -DSPOOLES

    CC=gcc
    FC=gfortran

    .c.o :
        $(CC) $(CFLAGS) -c $<
    .f.o :
        $(FC) $(FFLAGS) -c $<

    include Makefile.inc

    SCCXMAIN = ccx_2.17.c

    OCCXF = $(SCCXF:.f=.o)
    OCCXC = $(SCCXC:.c=.o)
    OCCXMAIN = $(SCCXMAIN:.c=.o)

    LIBS = -lpthread -lm
    LDFLAGS = -lspoolesMT -lspooles
    LDFLAGS += -lopenblas -larpack

    ccx_2.17.exe: $(OCCXMAIN) ccx_2.17.a $(LIBS)
        ./date.pl; $(CC) $(CFLAGS) -c ccx_2.17.c; $(FC) -Wall -O2 -o $@ $(OCCXMAIN) ccx_2.17.a $(LIBS) $(LDFLAGS)

    ccx_2.17.a: $(OCCXF) $(OCCXC)
        ar vr $@ $?
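
    Then build from the MSYS2 MinGW 64-bit shell in the ccx source directory (a sketch; the source path shown is hypothetical):

    $ cd /c/CalculiX/ccx_2.17/src
    $ make
    $ ./ccx_2.17.exe -v    # quick check that the binary runs and prints its version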
  • THANKS!
  • @3rav, what version of pthreads are you using?