P-9 Not enough memory for matrix solver

Hi all,

We keep getting the P-9 error code for large models.
Does this mean that there is not enough RAM available (we have 192 GB in our workstation), or could there be something else wrong with the model or the setup?
Is there a maximum number of nodes that the (internal) solver can handle, independent of the available RAM?

Comments


  • Sorry about the bad quality, but maybe it is helpful nonetheless.
    Approx. 1,500,000 nodes in the model.
  • It probably isn't really a RAM problem. There is a limit because the solver uses 32-bit integers (~= 16 GB of 8-byte doubles), and a similar one limits the in-core part of out-of-core mode to 16 GB. Sorry it can't exploit all that RAM.
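
For what it's worth, the 16 GB figure can be checked with a few lines of arithmetic. This is only a sketch in Python, assuming signed 32-bit indices and 8-byte doubles as described above; the variable names are made up for illustration and nothing here is taken from Mecway or the solver.

```python
# Illustrative arithmetic only; not Mecway or solver code.
INT32_MAX = 2**31 - 1      # largest signed 32-bit integer
BYTES_PER_DOUBLE = 8       # one double-precision value

# Indices 0 .. INT32_MAX can address this many array entries.
max_entries = INT32_MAX + 1
max_bytes = max_entries * BYTES_PER_DOUBLE
max_gib = max_bytes / 2**30

print(f"{max_entries} doubles occupy {max_gib:.0f} GiB")
```

So an array of doubles indexed with 32-bit integers tops out at 16 GiB no matter how much RAM is installed.
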
  • If this uses the Pastix solver, I had this issue a couple of years back. Someone (3rav?) did a recompile with i8 for me that corrected this. Otherwise, use Pardiso with the Intel speedups. The advantage of Pastix seems to fade somewhat for larger problems anyway, at least on my system (64 GB), seemingly because it needs more memory. I can't say about the other solvers.
  • We are using the internal solver. CCX is not an option for several reasons. So I assume Pastix / Pardiso do not apply?

    I did solve a simple model with more than two million nodes in the meantime on my laptop.





    @Victor, any idea what could be the limiting factor in the model? It does not seem to be the mere number of nodes that causes problems.
  • Apparently it is an issue with the workstation.
    The same model that solves on the laptop will not solve on the workstation for some reason. It gets the P-9 error shortly after the "solving matrix" phase begins.


  • Now the laptop has also solved a model with ~1.5 million nodes and many bonded contacts. Could it be a hardware-related problem?



    Laptop: (image not preserved)

    Workstation: (image not preserved)



    The workstation does have two different types of RAM installed. But the issue we have now was already there before we upgraded the RAM. At that point, all RAM modules were identical.

    Does anyone have a clue?
    I remember that there were problems with the Intel Xeon before, but they were already fixed with v19, I think...
  • @kuhl I can't imagine it's hardware. But there are stress test programs to find that sort of thing.

    There is a parameter for the MKL solver which controls how much memory it can use. If this is set too low, it causes that P-9 error. I'm not sure what happens if it's set too high but it will likely also cause big models to fail. The default value is the minimum of the available memory and 14.4 GB. I've added an option to choose the maximum value under the Labs menu in the (released today) version 20. My guess is that on your laptop it might not have enough memory to use that 14.4 GB maximum but the workstation does. So maybe try reducing it?
  • edited July 2023
    Hi @Victor
    To be honest, I don't quite understand what you mean. You are suggesting that reducing the value in Mecway with the Labs option might help, although the P-9 error is caused if it is set too low. If the workstation wants to use more than 14.4 GB, shouldn't the value be increased?

    I'll give it a try anyway and see what happens.

    I'll also try to find a stress test and see if we have a hardware issue.

    Thanks for the advice!
  • I don't really know either :p I suspect P-9 might happen both for too low and too high.

    From what I understand, OOC requires some RAM too, and it's not allowed to use more than this value. A 2 million node model might require more than 14.4 GB of RAM for OOC and fail if it hits the limit. But it might also fail if you allow it to use more than it can index with its 32-bit integers (hence OK on laptop if there isn't enough RAM to attempt that). Not sure which problem is happening or which direction to adjust the value.
  • OK, I'll just try it out when I get the chance and let you know.
    The laptop has 64 GB, by the way, so it should have enough RAM available to use or try to exceed the specified value of 14.4 GB.
  • Seems to work with the above value set to 20 GB on the workstation:



    The allocation and assembling phases also seem quite a bit quicker in V20 compared to V20Beta. I don't know if that is due to the above parameter (20 GB) or due to the solver improvements.

    Trying to set the value to 100 GB next, just for the sake of it...

    V20Beta, parameter not set (default)



    V20. Parameter set to 20 GB
  • Parameter set to 100 GB:
  • Setting the parameter to 100 GB makes the entire solution about 45% faster than with the parameter set to 20 GB, mainly because the solving matrix phase goes down from approx. 6 h 15 min to 3 h 30 min.

    Rocketship.... B)

    Trying 160 GB next ...
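
As a quick check on the matrix-phase numbers quoted in this comment, here is the arithmetic as a small Python sketch. Only the two durations come from the post; the helper name is made up, and the result covers the solving matrix phase alone, not the whole solution.

```python
def minutes(hours: int, mins: int) -> int:
    """Convert an hours/minutes pair to total minutes."""
    return hours * 60 + mins

before = minutes(6, 15)   # matrix phase with the parameter at 20 GB
after = minutes(3, 30)    # matrix phase with the parameter at 100 GB

time_saved_pct = 100 * (before - after) / before
speedup = before / after

print(f"{time_saved_pct:.0f}% less time in the matrix phase, {speedup:.2f}x speedup")
```
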
  • This may help me as well. 100 GB might have been faster simply because it never went out of core. I dealt with some of these issues by increasing virtual memory, which worked pretty well for problems needing up to 160% of available RAM; above that it got too slow. Capabilities of Mecway seem to be improving rapidly.
  • This is amazing - it seems like it can properly use all that memory! I never thought it would be able to.

    I wonder if it should always be set to ~infinity. I'm not even sure there was ever a good reason for having that 14.4GB limit.

    @MikeMcMullen I agree that it's probably using less or no OOC speeding it up.
  • Setting the value to 160 GB gave me no further improvement. Solving time is nearly identical to that with the parameter set to 100 GB.
    I'll try setting a value that is higher than the available RAM and see what happens.
  • 14.4 GB was probably chosen with an 8 GB or 16 GB laptop with a slow hard drive in mind.
  • edited July 2023
    If the parameter is set to a value greater than the available RAM, I get the P-9 error shortly after the solving matrix phase begins (so setting it to infinity by default may not be a good idea).
    I had the parameter set to 1000 GB, which is above the capacity of the system hard drive.
    When I checked the value in Mecway after the solver failed, the parameter had been reset to zero, which had not been the case before. So maybe Mecway figured out that I don't have as much RAM as requested and then reset the value to zero for some reason, which then again was not enough to solve...

    I'll try something above the available RAM but below the available hard drive space and see what happens.

    By the way, I checked the Task Manager, and comparing the RAM usage at idle and when solving, it appears that Mecway is using about 64 GB of RAM during the matrix solver phase.
  • The good thing when the solver fails is that I don't have to wait too long.
    Setting the parameter to 200 GB caused the P-9 error (the machine has 192 GB RAM) and the parameter has again been reset to zero. I'll try 190 GB next, so just below the installed RAM.
  • The solver finished with the parameter set to 190 GB, but it was no quicker than with the parameter set to 100 GB.

    So it seems that it does not hurt to set it as high as you like as long as you stay below the installed RAM.
  • Turns out there's a bug in the input box for that value. Clicking cancel sets it to zero! Sorry. It's possible it was already zero (which is treated as 14.4) before solving in those cases.
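
Putting the statements in this thread together (the default is the minimum of the available memory and 14.4 GB, and a stored value of zero is treated as the default), the limit the solver ends up with might be sketched as below. The function name and its arguments are hypothetical, and this is a guess at the logic, not Mecway's actual code.

```python
# Hypothetical helper; names and structure are illustrative only.
DEFAULT_CAP_GB = 14.4  # default ceiling mentioned in the thread

def effective_limit_gb(setting_gb: float, available_gb: float) -> float:
    """Return the memory limit implied by the Labs setting.

    A setting of 0 is read as "not set" and falls back to the default:
    the smaller of the available memory and 14.4 GB (assumption based
    on the descriptions in this thread).
    """
    if setting_gb == 0:
        return min(available_gb, DEFAULT_CAP_GB)
    return setting_gb

print(effective_limit_gb(0, 192))    # value reset to zero on the 192 GB workstation
print(effective_limit_gb(100, 192))  # explicit 100 GB setting
```

On this reading, a cancelled dialog that resets the value to zero behaves like never having set it, which would explain a P-9 on a model that needs more than 14.4 GB.
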
  • OK, so I tried again with the parameter set to 1000 GB and it finished without error but, of course, again no faster. The sky is the limit. B)
  • Awesome. thanks for that confirmation.
  • I'm getting the P-9 error, but there is no option in the labs to adjust memory. Version 21. Did this option move, or get eliminated?
  • I removed the setting from v21 and made it effectively infinite. Do you have a model which can solve in v20 but not in v21?
  • No, I thought I had recently updated, but I was on V19. I have a large model (1.8M nodes) with bonded contacts which was applying constraints for about 2 weeks. The PC finally ran out of memory (not P-9) when I tried running another app. I have the motherboard maxed at 32 GB, and the pagefile was also practically max size. The processor is i7 12th gen and the system benchmarks well. When I saw this thread, I updated in hope that it would improve the situation. Much, much faster with the bonded contact! But now it P-9s within an hour(?).
    Looks like it's time for another PC. Is there any way to know roughly how much RAM a model will take to solve? I'm importing STEP geometry, and the largest part is 100MB. That one part meshes to well over 1M nodes...and that was after some simplification so meshing would succeed. Of primary concern is buckling, and likely nonlinear 3D analysis is appropriate.
  • I'm not sure about memory requirements, but your problem sounds like a similar size to kuhl's above and he seemed to need a little over 100 GB to avoid out-of-core.

    Without more RAM, you should compile MKL Pardiso CCX from the source included in Mecway to get the out-of-core (OOC) functionality. That's faster than letting it use Windows's paging and could prevent out-of-memory if the pagefile reaches its limit. I would never solve a model too big for RAM without using OOC.
  • The last time I compiled anything it was sometime in the 20th century! But the build instructions made it easy. After setting the environment variables and pointing Mecway to the new .exe, I was off. Now to fix the problems with my model so it will solve.
    Thank you!
  • Thanks for confirming that it's easy! A lot of people are put off by the idea of compiling things because it's notoriously frustrating and time-consuming.

    You don't need to set the MKL environment variables since Mecway does that when it calls CCX.
  • edited August 2023
    So I believe that I fixed the model issues. I had to mesh the large part as a solid instead of a surface (only tension is relevant for that part). Mecway indicates the size is now over 2M nodes and 1M elements. CCX reports the same number of elements, but it counts the nodes as over 46M. The previous errors are gone, but CCX is now failing after 10 minutes on a memory-related issue.

    The solver I have Mecway pointed to is "ccx_MKL.exe". The unnecessary/redundant environment variables previously mentioned that I set are no longer set, so maybe CCX deleted them, but I ran the solve a second time just in case the variables that I set were present the first run, and caused a problem. Same failure the second run.

    I tried changing the analysis type from Nonlinear Static 3D to Static 3D, thinking that it might be less demanding on memory. It ran for 16 minutes and ended with the same error.

    The compile for the MKL version appeared to go smoothly: I searched buildlog.txt for the string "Error:" and found no matches.

    For the C-literate (whatever I knew is mostly forgotten), this is the mentioned line 59 in insert.c (line numbers added by me).

    57 if(*ifree>=*nzs_){
    58 *nzs_=(ITG)(1.1**nzs_);
    59 RENEW(mast1,ITG,*nzs_);
    60 RENEW(next,ITG,*nzs_);
    61 }

    Any thoughts on how to move forward?

    Entire CCX output follows.

    ************************************************************

    CalculiX Version 2.19, Copyright(C) 1998-2021 Guido Dhondt
    CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
    software, and you are welcome to redistribute it under
    certain conditions, see gpl.htm

    ************************************************************

    You are using an executable made on Mon Aug 14 18:47:44 PDT 2023

    The numbers below are estimated upper bounds

    number of:

    nodes: 46574461
    elements: 1059979
    one-dimensional elements: 0
    two-dimensional elements: 692928
    integration points per element: 9
    degrees of freedom per node: 3
    layers per element: 1

    distributed facial loads: 182016
    distributed volumetric loads: 0
    concentrated loads: 0
    single point constraints: 49890816
    multiple point constraints: 83151361
    terms in all multiple point constraints: 532168705
    tie constraints: 0
    dependent nodes tied by cyclic constraints: 0
    dependent nodes in pre-tension constraints: 0

    sets: 7
    terms in all sets: 2330845

    materials: 2
    constants per material and temperature: 2
    temperature points per material: 1
    plastic data points per material: 0

    orientations: 692928
    amplitudes: 1
    data points in all amplitudes: 1
    print requests: 0
    transformations: 0
    property cards: 0

    *INFO reading *STEP: nonlinear geometric
    effects are turned on

    *WARNING reading *STATIC:
    the minimum increment 0.0000000000000000
    is smaller then 1.e-6 times the
    step time;
    the minimum increment is changed
    to 9.9999999999999995E-007
    which is the minimum of the initial
    increment time and 1.e-6 times the step time
    *WARNING in calinput: PEEQ-output requested
    yet no (visco)plastic calculation


    STEP 1

    Static analysis was selected

    Newton-Raphson iterative procedure is active

    Nonlinear geometric effects are taken into account

    Decascading the MPC's

    Determining the structure of the matrix:
    *ERROR in u_realloc: error allocating memory
    variable=mast1, file=insert.c, line=59, size(bytes)=0, oldaddress=261025856

    ------------End of the CCX output
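
One thing that can be checked from the quoted insert.c lines is when the 10% growth step stops fitting in a 32-bit integer. The sketch below does that arithmetic in Python. It ASSUMES that ITG is a signed 32-bit int in this build and that the reported byte count is shown as a 32-bit field; neither is confirmed by the output above, and the helper names are made up. Whether this is what actually happened in this run cannot be told from the log alone.

```python
# Illustration only. ASSUMPTIONS: ITG is a signed 32-bit int in this build,
# and the printed size is a 32-bit field. Neither is confirmed by the log.
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31
ITG_BYTES = 4  # assumed sizeof(ITG)

def grown(nzs: int) -> int:
    """Value of 1.1 * nzs truncated toward zero, before any cast to ITG."""
    return int(1.1 * nzs)

def fits_int32(value: int) -> bool:
    """True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX

# Largest nzs whose 10% growth still fits in a signed 32-bit integer.
max_safe_nzs = int(INT32_MAX / 1.1)
while not fits_int32(grown(max_safe_nzs)):
    max_safe_nzs -= 1
while fits_int32(grown(max_safe_nzs + 1)):
    max_safe_nzs += 1

def low32(n: int) -> int:
    """Low 32 bits of n, i.e. what a 32-bit field would show."""
    return n % 2**32

print("largest nzs that can still grow by 10%:", max_safe_nzs)
# If an out-of-range cast produced INT32_MIN (an assumption about the
# platform), the byte count nzs * 4 would show as 0 in a 32-bit field.
print("low 32 bits of INT32_MIN * 4 bytes:", low32(INT32_MIN * ITG_BYTES))
```

If the assumptions hold, a matrix structure with roughly 1.95 billion entries or more could not grow any further with 32-bit integers, regardless of installed RAM.
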