"Error allocating memory" error - different outcomes on different computers

Hello,

I'm wondering if anyone could help me with an issue I have running a particularly large model.

I have two AMD-based computers running the same problem. The lower-spec computer runs the model without any issues (although memory usage often nears 100%), but the higher-spec computer stops at the first iteration with the following error:

*ERROR in u_calloc: error allocating memory
variable=aub1, file=mafillsmmain.c, line=191, num=135840256, size=8
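For reference, the single allocation that failed can be sized from the `num` and `size` fields in the error message (a quick check, not part of the original post), which suggests the process as a whole was out of memory rather than this one request being unusually large:

```python
# Size of the single allocation that failed (values from the error message above)
num = 135840256   # number of items requested
size = 8          # bytes per item (8-byte doubles)
total = num * size
print(total, "bytes =", round(total / 2**30, 2), "GiB")  # about 1.01 GiB
```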

I have attached screenshots from both computers and also the solver output, which is identical up until the first iteration.

My question is: why is the higher-spec computer failing while the lower-spec one isn't? For other problems the opposite has happened: the higher-spec machine has worked while the lower-spec machine hasn't on the same problem.

Mecway and CCX have been set up the same way on both machines.

Is anyone able to advise what the problem might be?

Many thanks

Bob

Comments

  • Some ideas:

    1. This first one is only relevant if you're using CCX compiled from the source in Mecway.

    Mecway versions 19 and earlier restricted the in-core memory available to Pardiso to 14.4 GB. I don't think you're using one of those versions, and you shouldn't be.
    Version 20 has an option in the Labs menu (Maximum memory use for OOC) to configure this limit.
    Versions 21 and 22 automatically set that to use 90% of the available RAM.

    If you're on 21 or 22, it might be worth trying version 20 and setting the limit to something between 48 and 64 GB. I'm just guessing here, but maybe something goes wrong because it tries to use more memory when more is available. Other people have had success with over 100 GB being fully used, though, so it's a long shot.

    2. Perhaps the higher-spec computer has less swap space or free disk space? I don't expect it should need any with 64 GB of RAM and 600k nodes, but it might.

    3. Does Task Manager show memory is full when it's solving on the lower-memory computer, or that a lot of the memory is already used by something else on the higher-spec one?

    4. Make sure memory isn't wasted by existing solution data, which can be significant for dynamic analyses. Open a new instance of Mecway with a file that has no solution data, then solve. It's also a good idea to remove unneeded solution variables before solving, or you might hit another memory problem when loading the solution.
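On point 1: a sketch of how the out-of-core limit could also be set directly, assuming Mecway's CCX build is linked against Intel MKL's Pardiso, which reads standard MKL environment variables for its out-of-core mode (the 48 GB value and the scratch path are illustrative, not from the thread):

```python
import os

# Assumption: the CCX build uses MKL Pardiso, which reads these variables.
# The in-core limit is given in megabytes: 48 GB = 48 * 1024 MB.
os.environ["MKL_PARDISO_OOC_MAX_CORE_SIZE"] = str(48 * 1024)
# Optional: keep the out-of-core scratch files on a fast SSD (hypothetical path).
os.environ["MKL_PARDISO_OOC_PATH"] = "/scratch/pardiso_ooc"
print(os.environ["MKL_PARDISO_OOC_MAX_CORE_SIZE"])  # prints 49152
```

These would need to be set before CCX starts, e.g. in the shell that launches it.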
  • Hi Victor,

    Many thanks for your comments.

    1. Both computers are using version 22. I may have a look at version 20; however, I have currently set the higher-spec computer to 24 processors rather than 36, and for the current run this seems to do the trick.

    2. Both computers have 400 GB of free space on SSD drives, so I don't think it's this.

    3. Task Manager on the lower-spec computer shows the memory reaching up to 95%, but not continuously. On both computers, Mecway and CCX are the only applications running. Memory doesn't seem to be the problem on the higher-spec computer.

    4. Before reducing the number of processors on the higher-spec computer, I tried using a new instance of Mecway with a file that has no solution data. The error still occurred.

    I plan to play around with the number of processors initially and see how it goes. I'm currently using the Pardiso solver but was thinking of installing PaStiX once I've deciphered the best way of installing it (I've been looking at your post, Improving performance of CCX solver, but am still having a few problems installing it).

    Bob
  • That's great that you found a solution.

    set the higher-spec computer to 24 processors rather than 36


    Perhaps I should get Mecway to limit the number of threads. 24 is probably far too many to give any speed advantage.
  • May be vaguely related to what I found running largish problems on my Ryzen 3700X, an 8-core, 16-thread, dual-channel machine. I found that assigning 8 cores/threads was slower than 6, and 4 was only a few percent slower than 6; more memory was needed with more cores. 5 was slower than either 4 or 6. Some digging, mostly in the paper documenting the logic behind the creation of PaStiX, "Linear Equation Solvers on GPU Architectures for Finite Element Methods in Structural Mechanics" by Peter Wauligmann, revealed the issue was likely a mismatch between the number of cores and the memory bandwidth of the dual-channel architecture. Note that I also found PaStiX is actually slower for larger problems (1.1 million nodes and up), apparently because it needs more memory than Pardiso, and therefore more memory bandwidth and memory size, to run those problems. Note too that the research behind PaStiX was done around 2020, so current architectures may give somewhat different results.
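One way to find the core-count sweet spot described above is to sweep the thread count externally when running CCX standalone. A sketch under some assumptions: a `ccx` binary on the PATH and an input deck whose name is a stand-in here; CCX's multithreading honours the OMP_NUM_THREADS and CCX_NPROC_EQUATION_SOLVER environment variables:

```python
import os
import subprocess
import time

def time_ccx_run(threads, cmd=("ccx", "-i", "model")):
    """Run one CCX job with a given thread count and return wall time in
    seconds. The default command is illustrative; 'model' stands in for
    the real input deck name (model.inp)."""
    env = dict(os.environ,
               OMP_NUM_THREADS=str(threads),
               CCX_NPROC_EQUATION_SOLVER=str(threads))
    t0 = time.perf_counter()
    subprocess.run(list(cmd), env=env, check=True)
    return time.perf_counter() - t0

if __name__ == "__main__":
    # Sweep a few thread counts and print the wall time for each run.
    for n in (2, 4, 6, 8):
        print(f"{n} threads: {time_ccx_run(n):.1f} s")
```

Watching peak memory alongside wall time (e.g. in Task Manager) for each run would also show the extra memory cost per added core that's mentioned above.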