Here's a summary of useful options for CCX. I'll keep this post updated as an easy reference. Feel free to add other knowledge as a reply and I'll incorporate it here if appropriate.
1) SPOOLES. The default CCX included with Mecway.
Speed: Slow
Node limit*: 350 000
Difficulty: None
2)** MKL CCX downloaded from http://www.dhondt.de/ where it says "For an update of the bconverged distribution replace the executables in the bconverged download by the following files.".
Speed: Fast
Node limit*: over 550 000
Difficulty: Hard or impossible
3)** MKL CCX as in 2) but also install Intel oneAPI Base Toolkit from
https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?operatingsystem=window&distributions=webdownload&options=online
and copy all the DLL files from %ProgramFiles(x86)%/Intel/oneAPI/mkl/latest/redist/intel64/ to the same location as the CCX .exe file. This takes advantage of CPU features like AVX2.
Speed: Faster
Node limit*: over 550 000
Difficulty: Hard or impossible
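The DLL copy step in option 3) can be scripted. Here is a minimal Python sketch, assuming the redist folder named above exists on your machine. The function name copy_dlls and the folders in the usage note are made up for illustration and are not part of any CCX or Mecway tooling.

```python
import shutil
from pathlib import Path


def copy_dlls(redist_dir, ccx_dir):
    """Copy every .dll file from redist_dir into ccx_dir.

    Returns the sorted list of file names that were copied.
    """
    src = Path(redist_dir)
    dst = Path(ccx_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    copied = []
    for dll in sorted(src.glob("*.dll")):
        shutil.copy2(dll, dst / dll.name)  # overwrites an older copy of the same name
        copied.append(dll.name)
    return copied
```

For example, copy_dlls(r"C:\Program Files (x86)\Intel\oneAPI\mkl\latest\redist\intel64", r"C:\ccx") would copy into a hypothetical C:\ccx folder, assuming oneAPI was installed in its default location. Copying rather than moving leaves the oneAPI install intact, so the script can be run again after a toolkit update.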
4)** Compile CCX with MKL and a patch to enable Out-Of-Core (OOC) mode. The source code with step-by-step instructions is in
https://mecway.com/download/ccx_win64_mkl_pardiso_source_2.21_2.zip . After compiling, set it in Mecway through Tools -> Options -> CalculiX -> Solver.
Speed: Fast or Faster
Node limit*: over 1 400 000
Difficulty: Hard
5) Download CCX 2.22 compiled with PastiX from
https://dhondt.de/calculix_2.22_4win.zip. Extract ccx_static.exe then set it in Mecway through Tools -> Options -> CalculiX -> Solver. For better correctness, set the environment variable PASTIX_MIXED_PRECISION=0.
Speed: Fast
Node limit*: over 700 000
Difficulty: Easy
*Node limit is the approximate maximum number of nodes for hex20 elements with static analysis. It also depends on mesh connectivity, available RAM and available disk space.
**For multithreading, set the environment variable OMP_NUM_THREADS to the number of threads, e.g. 8.
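If you launch CCX from a script instead of from Mecway, the thread count can be set for that one process only. This is a rough Python sketch under that assumption. The helper names ccx_environment and ccx_command, and the path and job name in the usage note, are invented for illustration.

```python
import os


def ccx_environment(threads, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)  # environment values must be strings
    return env


def ccx_command(ccx_exe, job_name):
    """Build the command line; CCX takes the job name without the .inp extension."""
    return [ccx_exe, "-i", job_name]
```

A run would then look something like subprocess.run(ccx_command(r"C:\ccx\ccx_static.exe", "model"), env=ccx_environment(8), check=True), where the executable path and the job name "model" are placeholders. Setting the variable this way does not change the system-wide setting.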
Comments
Basically, run PASTIX but keep PARDISO handy
We have found that 4-6 processors are useful; beyond that there are diminishing returns.
For model size, we try to keep things under 500k nodes.
How much of the Intel oneAPI Base Toolkit is required?
For example, the Intel Distribution for Python takes up a lot of space, so I'd prefer to install only what is necessary for this to work.
Also, the file "mkl_core.1.dll" did not exist, but "mkl_core.2.dll" did, and the same was true of every file with a number before the .dll extension. Renaming the files allowed the build to complete in MSYS64.
After following option 4) and solving with ccx.exe, the solver output still states that the symmetric spooles solver was used for a static analysis, and ccx_MKL.exe does not work. Solving with ccx_MKL.exe gives the response "solver did not produce an output file".
I notice that the files in /mecway 14/ccx do not have .1 before the .dll. Removing it from the new files changed nothing.
Yes, the DLL renumbering is annoying. Intel keeps adding numbers to the DLL file names for some reason. First it was .1, now it's .2; previously there were no numbers. This causes many of the CCX distributions to not work unless you rename one particular file.
I know what you mean about the install size. First you have to install Microsoft Visual Studio, then the Intel Base Toolkit, then the Intel HPC Toolkit, so this can get huge. There is one DLL file where you have to install at least one compiler to get the DLL. I'm not 100% sure what options would lead to the minimum install size. I actually use Intel Fortran, so I install that. But I have since added most of the others, just in case someone comes up with compiler instructions for CCX using the Intel compilers. If there is a language you actually use, I would try that first. If you don't use them, the base C or C++ compiler, or whatever they are calling it, will work.
Below are my latest install instructions.
------------------------------------------------------------------------------------------------------
~ Getting the Calculix Windows build to run properly ~
------------------------------------------------------------------------------------------------------
Download the Calculix Windows binary files from the Calculix website.
Set the following Windows system environment variables:
MKL_INTERFACE_LAYER=LP64
MKL_THREADING_LAYER=INTEL
OMP_NUM_THREADS=(Set to desired number of cores)
OPENBLAS_NUM_THREADS=1
PASTIX_MIXED_PRECISION=1
Copy the following files to the same folder as the ccx_dynamic.exe file. The following files come from installing the Intel oneAPI Base and HPC toolkits.
libiomp5md.dll (Doesn't come with the 'Base' toolkit, have to install the 'HPC' toolkit)
mkl_core.2.dll
mkl_def.2.dll
mkl_intel_thread.2.dll
mkl_rt.2.dll (rename to mkl_rt.1.dll)
One of the following three files will also be needed. You will have to experiment to find out which your computer can use. Move each file in and out of the folder with the ccx_dynamic.exe file, to find out which one you need. Try to run PARDISO each time you move a different file into the folder. You may get a message saying a file is missing or the solver may just quit without any indication of what's wrong. Only have one of the files in the folder when you test.
mkl_avx512.2.dll (fastest)
mkl_avx2.2.dll (faster)
mkl_sequential.2.dll (slowest)
Make sure to keep the Intel oneAPI toolkits up to date. After you update the toolkits, copy all of the files you needed into the ccx_dynamic.exe folder again.
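Since the solver may quit without saying which file is missing, a quick check of the folder contents can save some trial and error. The sketch below is only an illustration in Python. The file names are taken from the list above (with mkl_rt already renamed), and the function name check_ccx_folder is made up.

```python
from pathlib import Path

# Files listed above that must sit next to ccx_dynamic.exe.
REQUIRED = [
    "libiomp5md.dll",
    "mkl_core.2.dll",
    "mkl_def.2.dll",
    "mkl_intel_thread.2.dll",
    "mkl_rt.1.dll",
]

# Only one of these should be in the folder while testing.
ONE_OF = ["mkl_avx512.2.dll", "mkl_avx2.2.dll", "mkl_sequential.2.dll"]


def check_ccx_folder(folder):
    """Return (missing required files, CPU-specific files found)."""
    present = {p.name.lower() for p in Path(folder).iterdir() if p.is_file()}
    missing = [name for name in REQUIRED if name.lower() not in present]
    cpu_specific = [name for name in ONE_OF if name.lower() in present]
    return missing, cpu_specific
```

An empty first list and exactly one entry in the second list would match the setup described above. The names will need editing if Intel changes the number in the file names again.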
Use the Calculix SOLVER= option to call one of three available solvers:
SPOOLES (This solver requires a lot of memory for large problems)
PARDISO (I generally get the lowest run times by using this solver. It also has the least hardware utilization)
PASTIX (Best multi-core utilization, but not necessarily the fastest option)
Examples: SOLVER=SPOOLES, SOLVER=PARDISO, SOLVER=PASTIX
If the above command is not specified, the ccx_dynamic.exe file uses PASTIX by default.
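In the input file, the SOLVER= option goes on the analysis keyword line, for example *STATIC, SOLVER=PARDISO. If you edit .inp files outside Mecway, something like the following Python sketch could add or replace it. The function name set_static_solver is invented for illustration, and it only handles *STATIC, not other analysis types.

```python
def set_static_solver(inp_text, solver):
    """Return the deck text with SOLVER=<solver> set on every *STATIC line."""
    out = []
    for line in inp_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if fields[0].upper() == "*STATIC":
            # Keep the other parameters, drop any existing SOLVER= entry.
            kept = [f for f in fields[1:]
                    if f and not f.upper().startswith("SOLVER")]
            line = ", ".join(["*STATIC"] + kept + ["SOLVER=" + solver])
        out.append(line)
    return "\n".join(out)
```

For instance, set_static_solver("*STEP\n*STATIC\n*END STEP", "PARDISO") changes only the second line, to "*STATIC, SOLVER=PARDISO". Data lines and other keywords pass through untouched. A trailing newline at the end of the deck is not preserved.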
The run times you get with the solvers seem to depend greatly on the computer hardware you have. For my budget laptop, PARDISO is the fastest solver. It also uses the hardware the most efficiently, meaning the power draw is the lowest. PASTIX does a great job using multiple cores. However, the run times I get are longer than with PARDISO, and it uses the most power. SPOOLES really isn't an option for me, because it cannot solve large models with a reasonable amount of RAM.
------------------------------------------------------------------------------------------------------
Looks like I need to update the build script and makefiles for the new oneAPI filenames. If you want to do it yourself sooner, the two files that refer to these DLLs are:
ccx/src/build.sh
ccx/src/patches/CalculiX/ccx_2.17/src/Makefile_MKL
I don't recommend renaming them since they may refer to each other and expect the 2 in the name.
I had upgraded my PC to Windows 10 some time back and got spun around trying to regain Pardiso functionality. Never tried Pastix, but would also like to.
Pardiso remains elusive. Option #2) above says extract ccx_pardiso.exe, but I don't think it's called that anymore. Tried pointing to ccx_dynamic.exe, but got the red death screen (No solve). I confess ignorance. I don't know if the pardiso binary I'm searching for was compiled with the libraries, or if I have to add the libraries to its directory, or if I have to build the thing myself. Victor's update RE: Version 15.0 release -- "CCX updated to 2.19 with source code that includes all required MKL files" -- does that include the library files for Pardiso?
So many forum members have been good enough to post their (evolving) "recipe", I'm just stuck in the intersection and need a helping Scout to cross the street. Line-by-line instruction set for third graders would suit me fine.
I've posted the setup quite a few times here and in the Calculix forum. Not sure if it's worth repeating. Victor has a different method. In fact, I think everyone does it slightly differently. So it's hard to answer. It also changes when Intel renames their files.
The last time I tested the Calculix for Windows files, the dynamic version was running slightly faster than the static version. The person who created the files was surprised by that. I run the ccx_dynamic.exe file. It runs Pastix by default now. So I have to manually force Pardiso. I prefer to stick with Pardiso. To run that, you have to get all the files from the Intel oneAPI distribution. I'm not sure what the current version of Calculix for Windows is doing. I would have to download it and see. The version I downloaded was when 2.19 first came out.
Thanks. That clears up a few things.
Are you also using a modify keyword in the CCX folder: *STATIC ==> Name=SOLVER, Value=PARDISO?
I attached my personal readme file. It covers getting the ccx dynamic build (without i8) running. To use PARDISO, there are things you can modify in Mecway. I have a few different ways of doing it. One way is by importing the attached file.
Whatever libraries are missing for the ccx dynamic i8 build, they don't appear to be anything from Intel oneAPI. I tried all of the files there and it still doesn't run.
Thanks, I have extracted static i8 and just need to persuade someone with the right admin permissions to copy it over. I have neither the confidence nor the permissions to attempt anything other than Victor's option 5! Will I have the option to change the SOLVER keyword as per cwharpe & prop_design's suggestion above?
My models often have very large numbers of nodes. We do a lot of thin film thermal analysis, also components that have one thin dimension. It is very difficult to be parsimonious with the nodes and have sufficient resolution within these components. The bits of the models that I can ditch are already meshed very coarsely so it doesn't save me much. I quite often have no symmetry to take advantage of.
One more question: I have a laptop with nearly 16 GB RAM and an Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz (copied straight from my computer settings). How many threads should I specify? By using more threads is there an increased danger of 'out of memory' failures?
You should be able to use the SOLVER keyword if you are using the CCX solver. I don't think the number of threads affects the memory. The memory is mainly set by the number of nodes. You can experiment with the number of threads. New CPUs don't always scale the way you would think, especially in laptops. On my laptop, PASTIX seems to make things slower. It looks like PASTIX hammers the CPU a lot more than PARDISO, and that makes the CPU frequency go down. So PARDISO ends up working better for me. For the CPU model you specified, you could try 2, 4, 6, and 8 threads and see how they perform.
I'm not exactly sure what the i8 binary applies to. I know it means 8-byte (64-bit) integers. However, I'm not sure which solvers it applies to (SPOOLES, PASTIX, or PARDISO). I think I saw something on the CCX forum that mentioned it only applied to one of the solvers. Perhaps you or someone else knows. I tried benchmarking an earlier version of the i8 binary and it was a lot slower than the normal binary. I also have a laptop with 16GB of memory. I have been keeping my models in core to keep the solve times reasonable. It sounds like you can't do that. Once things go out of core, it takes so long that I have abandoned the solve. I think I keep the node count around 700k. You may very well need the i8 binary for higher node counts.
anthony
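The thread-count experiment suggested above (2, 4, 6 and 8 threads) could be automated along these lines. This is a sketch only, in Python. The function names are invented, and the run and clock arguments exist only so the loop can be tried without a real solver.

```python
import os
import subprocess
import time


def time_thread_counts(command, thread_counts,
                       run=subprocess.run, clock=time.perf_counter):
    """Run the same command once per thread count and return {threads: seconds}."""
    timings = {}
    for n in thread_counts:
        env = dict(os.environ)
        env["OMP_NUM_THREADS"] = str(n)
        start = clock()
        run(command, env=env, check=True)  # raises if the solver exits with an error
        timings[n] = clock() - start
    return timings


def fastest(timings):
    """Return the thread count with the shortest wall time."""
    return min(timings, key=timings.get)
```

A call such as time_thread_counts([r"C:\ccx\ccx_dynamic.exe", "-i", "model"], [2, 4, 6, 8]) would time a hypothetical job named "model". A single run per setting is noisy on laptops where the CPU frequency drops under load, so repeating each setting a few times is probably worthwhile.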
I experimented once without success with:
3)** MKL CCX as in 2) but also install Intel oneAPI Base Toolkit from
https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?operatingsystem=window&distributions=webdownload&options=online
and copy all the DLL files from %ProgramFiles(x86)%/Intel/oneAPI/mkl/latest/redist/intel64/ to the same location as the CCX .exe file. This takes advantage of CPU features like AVX2.
Speed: Faster
Node limit*: over 550 000
Difficulty: Hard or impossible
I have an AMD chip, and it looked like there were third-party items that wouldn't be compatible with it, or it was just me getting lost.
After that I selected:
2)** MKL CCX downloaded from http://www.dhondt.de/ where it says "For an update of the bconverged distribution replace the executables in the bconverged download by the following files .".
Speed: Fast
Node limit*: over 550 000
Difficulty: Hard or impossible
and this one worked just fine for me, with no issues at all. But after installing a new version of Mecway and uninstalling the previous one, I realised some DLLs were missing. This made me explore the option I am using now, which is:
4)** Compile CCX with MKL and a patch to enable Out-Of-Core (OOC) mode. The source code with step-by-step instructions is in ccx_win64_mkl_pardiso_source_2.19.zip in Mecway's install location. After compiling, set it in Mecway through Tools -> Options -> CalculiX -> Solver.
Speed: Fast or Faster
Node limit*: over 1 400 000
Difficulty: Hard
It was a slightly longer process, but it is pretty well defined in the description file. The advantage for me is that I put all the .exe and .dll files in a standalone directory straight away, so any new install will hopefully be easy to connect. I must admit that a speed of fast vs very fast isn't too descriptive, but running a model of 1.4m nodes rather than 550k is a massive improvement.
My spec:
AMD Ryzen 5 2600 Six-Core Processor 3.40 GHz
RAM DDR4 64,0 GB
I have both HDD and SSD
And I am quite happy with the time to results and very happy with the 1.4m node limit.
4)** Compile CCX with MKL and a patch to enable Out-Of-Core (OOC) mode. The source code with step-by-step instructions is in ccx_win64_mkl_pardiso_source_2.19.zip in Mecway's install location. After compiling, set it in Mecway through Tools -> Options -> CalculiX -> Solver.
Speed: Fast or Faster
Node limit*: over 1 400 000
Difficulty: Hard
But when running, ccx_MKL.exe takes more time than ccx.exe
Is this normal?
Maybe it doesn't include the right MKL dlls for your platform (what CPU model?) and is defaulting to something generic.
Processor Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz 2.60 GHz
Installed RAM 8.00 GB (7.85 GB usable)
Device ID 61AFF701-BB57-4C66-8CD8-FAB777CE7D60
Product ID 00325-95924-00879-AAOEM
System type 64-bit operating system, x64-based processor
Pen and touch Pen input support
With everything else equal and after running several times, ccx_MKL is twice as fast as ccx.
I'm looking for the latest version of the PARDISO CCX solver and couldn't find it online. I saw this post on building one from your option 4) and followed the instructions, but no files were in the install folder after the last step. Could you help me with this please?
Where can I download ccx_dynamic.exe for 2.20? Could you help me?
A common reason it fails is if there's a space in the home directory name.
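A quick way to check for that before starting a build is sketched below in Python. The function name path_has_space is made up, and it assumes the home directory Python reports is the same one your build shell uses, which may not hold for every setup.

```python
from pathlib import Path


def path_has_space(path):
    """Return True if the given path contains a space character."""
    return " " in str(path)


# Check the current user's home directory as seen by Python.
if path_has_space(Path.home()):
    print("Home directory contains a space:", Path.home())
```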