Mecway PC benchmarks

prop_design · April 2020

interesting comparison 3rav. victor actually suspected this from viewing their website. the examples they used to show a speed-up were not relevant to what we use it for. nice to have confirmation though.

Franjo · May 2020

Hi guys,
I've run the calculation on Mecway 13.0 with ccx 2.16 (PARDISO) using 8 cores.

Hardware configuration:
CPU: AMD Ryzen 9 3900x
RAM: Crucial UDIMM 16GB ECC 2666MHz
SSD: Samsung SSD 970 EVO Plus 500GB

Run time: 1:09

As antte mentioned, I also set the environment variable MKL_DEBUG_CPU_TYPE=5.
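
For anyone launching the solver from a script rather than from inside Mecway, the variable can be set for a single process instead of system-wide. A minimal sketch in Python; the executable name `ccx.exe` and the job name are placeholders, not taken from this thread:

```python
import os
import subprocess


def env_with_mkl_debug(base_env=None):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE=5 added."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DEBUG_CPU_TYPE"] = "5"  # value quoted in the post above
    return env


def run_ccx(job_name, ccx_path="ccx.exe"):
    """Run one CalculiX job with the variable set for that process only.

    ccx_path and job_name are placeholders; adjust them to your installation.
    """
    return subprocess.run(
        [ccx_path, "-i", job_name],
        env=env_with_mkl_debug(),
        check=True,
    )
```

Calling `run_ccx("bolt_assembly")` would then solve a hypothetical `bolt_assembly.inp` without changing the environment of any other program.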

MMK66 · May 2020

Hi, I would be grateful for your help: how can I download Intel MKL or EX studio?

I'm trying to compile pardiso, one core → 12min ;(

I can only see that ↑
thanks in advance

prop_design · May 2020

you can get the intel mkl from here; https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html

you do have to sign up to get it. i don't know what 'EX studio' is

anthony

tk1537 · August 2020

Hi all
I ran the calculation on Mecway 13.1 with ccx 2.16 pardiso as follows:

Hardware configuration:
CPU: AMD Ryzen 9 3950X (16 cores / 32 virtual processors), runs at about 4.2GHz
RAM: Crucial Ballistix UDIMM 32GB clocked at 3200MHz
SSD: Samsung NVMe 1TB

Run time: 1:01

I have not set the MKL_DEBUG_CPU_TYPE=5 environment variable at this time.

JohnM · August 2020

Time to go computer shopping...

mmartin · August 2020

I've re-calculated the bolt assembly (benchmark):

CPU: i7 6700, 3.40GHz, 4 cores / 8 threads
MEMORY: 64GB DDR4
Running time: 0 minutes 52 seconds (MW + CalculiX 2.17 PaStiX!)

JohnM · August 2020

That's great. Are you using the pastix that is available for download in the Calculix forum?

mmartin · August 2020

Yes, JohnM. I downloaded the brand new build from the CalculiX forum.
It seems to work, but I hit a bug running a modal analysis.

tk1537 · August 2020

Where is the compiled version of Pastix? All I see on the Calculix page is the source code to be compiled.

prop_design · August 2020

it doesn't look like anyone has been able to compile it on windows using parsec. parsec seems to take a serial code and automatically turn it into a parallel code. but not entirely sure.

tk1537 · August 2020

Oh. I was curious where mmartin got the brand new compilation he used in his benchmark run above, and whether it is available to install on Windows. Knowing where to get that would be very useful.

prop_design · August 2020

i believe he got it here; https://calculix.discourse.group/t/calculix-and-pastix-solver-windows-version/130

however, the windows version isn't working completely. mainly it's missing parsec, which would make it work even better.

i'd hold off for awhile, until there is a fully functional windows version

mmartin · August 2020

As you can see in the attached ccx 2.17 PaStiX report, many options are still disabled. We will have to wait for another build.

+-------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+-------------------------------------------------+
Version: 6.0.1
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Disabled
StarPU: Disabled
Number of MPI processes: 1
Number of threads per process: 8
Number of GPUs: 0
MPI communication support: Disabled
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048

Matrix type: General
Arithmetic: Float
Format: CSC
N: 12339
nnz: 457155

+-------------------------------------------------+
Ordering step :
Ordering method is: Metis
Time to compute ordering: 0.0592
+-------------------------------------------------+
Symbolic factorization step:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure: 1750149
Fill-in of L: 3.828349
Time to compute symbol matrix: 0.0040
+-------------------------------------------------+
Reordering step:
Split level: 0
Stoping criteria: -1
Time for reordering: 0.0074
+-------------------------------------------------+
Analyse step:
Number of non-zeroes in blocked L: 3500298
Fill-in: 7.656698
Number of operations in full-rank LU : 670.40 MFlops
Prediction:
Model: AMD 6180 MKL
Time to factorize: 0.0569
Time for analyze: 0.0010
+-------------------------------------------------+
Factorization step:
Factorization used: LU
Time to initialize internal csc: 0.0361
Time to initialize coeftab: 0.0050
Time to factorize: 0.0276 (23.71 GFlop/s)
Number of operations: 670.40 MFlops
Number of static pivots: 0
RHS only consists of 0.0
________________________________________

CSC Conversion Time: 0.003112
Init Time: 0.075961
Factorize Time: 0.068795
Solve Time: 0.000013
Clean up Time: 0.000000
---------------------------------
Sum: 0.147881

Total PaStiX Time: 0.147881
CCX without PaStiX Time: 2.520762
Share of PaStiX Time: 0.055414
Total Time: 2.668644
Reusability: 0 : 1
________________________________________
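
The derived figures in the report can be cross-checked from the raw numbers it prints. A minimal sketch in Python, using only values copied from the log above; reading "fill-in" as nnz(L) divided by nnz of the input matrix is an assumption, although it does reproduce the printed values:

```python
# Values copied from the PaStiX report above.
nnz_a = 457155            # "nnz" of the input matrix
nnz_l = 1750149           # "Number of nonzeroes in L structure"
nnz_l_blocked = 3500298   # "Number of non-zeroes in blocked L"

step_times = {            # seconds, from the timing summary
    "csc_conversion": 0.003112,
    "init": 0.075961,
    "factorize": 0.068795,
    "solve": 0.000013,
    "clean_up": 0.000000,
}
ccx_without_pastix = 2.520762

# Assumed definition: fill-in = nonzeroes in the factor / nonzeroes in the matrix.
fill_in = nnz_l / nnz_a
fill_in_blocked = nnz_l_blocked / nnz_a

pastix_total = sum(step_times.values())
total = pastix_total + ccx_without_pastix
pastix_share = pastix_total / total

print(f"Fill-in of L:         {fill_in:.6f}")
print(f"Fill-in (blocked):    {fill_in_blocked:.6f}")
print(f"Total PaStiX time:    {pastix_total:.6f}")
print(f"Share of PaStiX time: {pastix_share:.6f}")
```

The two fill-in values, the PaStiX total and the share agree with the report to six decimal places. On this small matrix the solver accounts for only about 5.5% of the total time, so this particular report says little about solver speed on the bolt benchmark.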

tk1537 · August 2020

OK, guess I'll wait too

prop_design · August 2020

one thing i wonder is if they are using the parallel versions of METIS (ParMETIS) and SCOTCH (PT-SCOTCH). hopefully they are, but I don't know if that's the case.

tk1537 · August 2020

Is there some sort of recipe for compiling this solver similar to what Victor gave us for Pardiso? I'm wondering how the current compilation was done.

3rav · September 2020

Hi,

Important: for the best possible performance with PaStiX, set:

set OPENBLAS_NUM_THREADS=1
use only physical cores (set OMP_NUM_THREADS to the number of physical cores)

Please try with these settings.
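
These two settings can also be built programmatically when the solver is started from a script. A minimal sketch in Python; the helper names are hypothetical, and the physical core count is only a rough guess that assumes two hardware threads per core, so pass the real number if you know it:

```python
import os


def guess_physical_cores(logical_cpus=None, threads_per_core=2):
    """Rough guess: logical CPUs divided by threads per core (an assumption)."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // threads_per_core)


def pastix_thread_env(physical_cores, base_env=None):
    """Return an environment with the two thread settings suggested above."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_NUM_THREADS"] = "1"             # one BLAS thread per call
    env["OMP_NUM_THREADS"] = str(physical_cores)  # one thread per physical core
    return env
```

For example, `pastix_thread_env(guess_physical_cores())` gives a dictionary that can be passed as `env=` to `subprocess.run` when starting ccx.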

JohnM · September 2020

We now run with the PaStiX ccx.exe available on the CalculiX Discourse forum.

I typically see about 30-35% speedup over PARDISO.

I tried OPENBLAS_NUM_THREADS=1 and 4, and it seemed to run slower. With the variable deleted, it ran the fastest.

I also noticed that the OMP_NUM_THREADS setting makes only a few seconds' difference, and seems best when set to 1.

Thoughts?

3rav · September 2020

Try:
set PASTIX_MIXED_PRECISION=1

JohnM · September 2020

Went through things a little more methodically, and here is my best current recipe on my 8proc laptop.

PASTIX + all DLLS from PARDISO

OMP_NUM_THREADS=6 (8 ran 20% slower)
OPENBLAS_NUM_THREADS=1 (helped by 8%)
PASTIX_MIXED_PRECISION=1 (helped by 8%)

Now over 40% speedup over PARDISO.
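
A comparison like this can be scripted so that each combination is timed the same way. A minimal sketch in Python, assuming the solver is run from the command line; the function names and labels are hypothetical, and the variable values are the ones from the recipe above:

```python
import os
import subprocess
import time


def time_command(cmd, env_overrides):
    """Run cmd once with extra environment variables; return elapsed seconds."""
    env = dict(os.environ)
    env.update(env_overrides)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def compare_settings(cmd, candidates):
    """Time cmd once per candidate environment; return {label: seconds}."""
    return {label: time_command(cmd, overrides)
            for label, overrides in candidates.items()}


# Candidate environments built up one variable at a time.
CANDIDATES = {
    "OMP=6": {"OMP_NUM_THREADS": "6"},
    "OMP=6, OPENBLAS=1": {
        "OMP_NUM_THREADS": "6",
        "OPENBLAS_NUM_THREADS": "1",
    },
    "OMP=6, OPENBLAS=1, mixed precision": {
        "OMP_NUM_THREADS": "6",
        "OPENBLAS_NUM_THREADS": "1",
        "PASTIX_MIXED_PRECISION": "1",
    },
}
```

A call such as `compare_settings(["ccx.exe", "-i", "bolt_assembly"], CANDIDATES)` would time a placeholder job under each setting. A single run per setting is noisy, so repeating each a few times and keeping the best time is probably worthwhile.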

Sergio · September 2020

Looks like the PaStiX build has matured. Is it available compiled for Windows, or must one go through the build procedure?

JohnM · September 2020

I downloaded it from the CalculiX Discourse group; it was compiled by 3rav.

tk1537 · September 2020

Where exactly is the latest Pastix compile for windows?

mmartin · September 2020

I've re-calculated the bolt assembly (benchmark) with the latest CalculiX version (by 3rav):

CPU: i7 6700, 3.40GHz, 4 cores / 8 threads
MEMORY: 64GB DDR4
Running time: 0 minutes 44 seconds (MW + CalculiX 2.17, PASTIX+PARDISO=SCOTCH+STATIC+OPENBLAS_NUM_THREADS=1)

tk1537 · September 2020

I finally found the latest version of the solver being discussed (3rav) and tried it on the benchmark bolt assy on two different machines with the following results:

PASTIX+PARDISO, OPENBLAS_NUM_THREADS=1, PASTIX_MIXED_PRECISION=1
MW+ccx2.17

Intel Xeon W-2123, 3.6GHz, 4 cores, 8 threads
Memory: 16GB
Run time: 0 min 43 sec

and

AMD Ryzen 9 3950X, 4.2GHz, 16 cores, 32 threads
Memory: 32GB DDR4-3200
Run time: 0 min 30 sec

I like it!
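
For a quick side-by-side view, the run times posted in this thread can be put in one table. A minimal sketch in Python; the times are copied from the posts above and the labels are shorthand. The runs differ in hardware, solver build and settings, so the ratios are only indicative:

```python
# (label, run time in seconds) as reported in this thread.
RUNS = [
    ("Ryzen 9 3900X, ccx 2.16 PARDISO", 69),
    ("Ryzen 9 3950X, ccx 2.16 PARDISO", 61),
    ("i7 6700, ccx 2.17 PaStiX, first build", 52),
    ("i7 6700, ccx 2.17 PaStiX, latest build", 44),
    ("Xeon W-2123, ccx 2.17 PaStiX", 43),
    ("Ryzen 9 3950X, ccx 2.17 PaStiX", 30),
]


def speedup(before_seconds, after_seconds):
    """Ratio of two run times; values above 1 mean the second run was faster."""
    return before_seconds / after_seconds


slowest = max(seconds for _, seconds in RUNS)
for label, seconds in RUNS:
    print(f"{label:<42} {seconds:>3d} s  {speedup(slowest, seconds):.2f}x")
```

The two same-machine pairs are the most meaningful: the 3950X went from 61 s to 30 s and the i7 6700 from 52 s to 44 s.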

JohnM · June 2021

@3rav, we'd like to try a modification to CCX that I discussed with Guido, but we are not proficient at compiling a Windows executable. Can you provide a recipe here?

3rav · June 2021

1. Install base needed packages:
$ pacman -Sy mingw-w64-x86_64-toolchain
$ pacman -Sy make

2. Install packages needed for CalculiX:
$ pacman -Sy mingw-w64-x86_64-openblas
$ pacman -Sy mingw-w64-x86_64-spooles
$ pacman -Sy mingw-w64-x86_64-arpack

3. Simple file modification ccx_2.17 and CalculiXstep.c like:

and ccx_2.17.c (already after modification):

4. Modify the Makefile to this form:

CFLAGS = -Wall -O2 -DARCH="Linux" -DARPACK -DMATRIXSTORAGE -DNETWORKOUT -fcommon
FFLAGS = -Wall -O2 -fallow-argument-mismatch

#SPOOLES
CFLAGS+= -I /mingw64/include/spooles -DSPOOLES

CC=gcc
FC=gfortran

.c.o :
	$(CC) $(CFLAGS) -c $<
.f.o :
	$(FC) $(FFLAGS) -c $<

include Makefile.inc

SCCXMAIN = ccx_2.17.c

OCCXF = $(SCCXF:.f=.o)
OCCXC = $(SCCXC:.c=.o)
OCCXMAIN = $(SCCXMAIN:.c=.o)

LIBS = -lpthread -lm
LDFLAGS = -lspoolesMT -lspooles
LDFLAGS += -lopenblas -larpack

ccx_2.17.exe: $(OCCXMAIN) ccx_2.17.a $(LIBS)
	./date.pl; $(CC) $(CFLAGS) -c ccx_2.17.c; $(FC) -Wall -O2 -o $@ $(OCCXMAIN) ccx_2.17.a $(LIBS) $(LDFLAGS)

ccx_2.17.a: $(OCCXF) $(OCCXC)
	ar vr $@ $?
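
Before running make, it may help to confirm that the tools this Makefile calls are on the PATH of the MSYS2 shell. A minimal sketch in Python; the tool list is read off the Makefile above (`perl` because the link step runs `./date.pl`), and the source directory argument is a placeholder:

```python
import shutil
import subprocess

# Tools invoked by the Makefile above.
REQUIRED_TOOLS = ["make", "gcc", "gfortran", "ar", "perl"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def build_ccx(source_dir):
    """Run the ccx_2.17.exe target in source_dir if all tools are present.

    source_dir is a placeholder for the CalculiX src directory.
    """
    missing = missing_tools()
    if missing:
        raise RuntimeError("missing build tools: " + ", ".join(missing))
    subprocess.run(["make", "ccx_2.17.exe"], cwd=source_dir, check=True)
```

This only checks that the executables exist; it does not verify that the OpenBLAS, SPOOLES and ARPACK packages from step 2 are installed.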

JohnM · June 2021

THANKS!

JohnM · June 2021

@3rav, what version of pthreads are you using?
