Multiple simulations at the same time

Hi all,

Recently I have sometimes had issues with parallel simulations. When I solve one model and then start to solve another model before the first one is finished, I sometimes do not get a solver window as confirmation that the solver is running. Also, the task manager does not show the second solver process. But when I close Mecway, I get a warning that there is still a solver process running. Does anyone else experience the same issue, or is this an issue at all? What could be the cause of this behaviour?

Kind Regards!

Comments

  • Is this the internal solver or CCX?

    In either case, it's not supposed to start solving again if it's already solving, but there could be a bug. Can you reproduce the problem?
  • Internal Solver.
    I was not very precise in my first post.
    Let's say I have a large model with two configurations. If I used "solve all configurations", Mecway would solve them one after the other and the workstation would not be at capacity.
    Therefore I save the model as "model_config_1" and "model_config_2" and open those two models in separate instances of Mecway. I then solve config_1 in one instance and config_2 in the other so that they can run at the same time.
    And sometimes I get the above-mentioned behaviour. I have not been able to see a pattern yet, so I can't really reproduce it.
  • That's a strange one. I hope you can identify what led to it if it happens again.

    There should be no communication between different instances of Mecway and they should be protected from each other by Windows even in the case of bugs.
  • edited May 2023
    Hi Victor,

    This is not 100% validated, but it seems that multiple simultaneous solves are possible whenever the second model is saved before starting to solve.

    Workflow that does not work:
    1: prepare the models on laptop
    2: open and start to solve model 1 on workstation
    3: open and start to solve model 2 on workstation

    Workflow that does seem to work:
    1: prepare the models on laptop
    2: open model 1 on workstation
    3: save model 1 to the current path/filename
    4: start to solve model 1 on workstation

    5: open model 2 on workstation
    6: save model 2 to the current path/filename
    7: start to solve model 2 on workstation

    Regards
  • Thanks for investigating @kuhl. That's still very confusing, hope you don't mind more questions:

    Are the models set to output any files when solving, such as matrices or tables exported to .csv?

    Is step 1 necessary to reproduce it, or does it also fail if you try to solve them both a 2nd time without editing them?

    Are these only big models (100,000s of nodes or more) that would be using the out-of-core mode of the solver?

    Are you opening them through a network? If so, could you give an (anonymized if needed) example of what the path looks like? Maybe it's too long or in an unexpected format that breaks only some parts of Mecway.
  • One more question - are you using either version 19 or the patch for version 18 that updates MKL? Older versions had a very old MKL library that has problems on some workstations.
  • No file outputs are requested. But "Save after solve" is set.
    The model has about 450000 nodes.
    I suppose step 1 is only an issue if preparing the model on the laptop. Because if we edit on the workstation, we would save it to a new file name "..._SOLVE_" on that machine before solving anyway.
    Running tests is not really feasible. Solving a model takes about 15 h to 25 h, mainly because there are many bonded contacts involved that take a loooong time (models without bonded contact with about 1.5 million solve within 15 minutes).

    We save to a local drive before solving. The path has an "&" in it; other than that, nothing extraordinary.

    We are using Mecway 19. I was the one asking for the MKL fix which has worked for Mecway 18 and our workstation.
  • Sorry I didn't notice it was you.

    If you like, I can prepare a special version of Mecway 19 that records some debugging information but otherwise functions normally.

    It sounds like you can test it quickly by starting two solves and the problem will appear straight away so you don't have to finish them?

    Another question - are you using "Solve" instead of "Solve all configurations" when it's split into two files?
  • Hi Victor, yes you are right, testing is quick, I got that wrong.
    At the moment it seems that a "save as" is necessary to get it to work. I'll report back once we have tried doing the last edit on the workstation and using "save as" before solving.
    We are using "solve" because we want to solve simultaneously and the result files would be even bigger if we had multiple results in one file.


  • Hi Victor,
    We did some more tests. Even if the last edit has been done on the workstation, a "save as" is necessary to get the solver running as usual. If we use "save" or do not save at all, the solver window does not open and we are unsure what is happening in the background.
  • I'm not sure I'll be able to do anything unless more information comes up or I can reproduce it. Save and Save As are almost identical except for one showing the file picker dialog box and updating the recent files list.

    The way it fails is consistent with Mecway thinking the solver thread is already running. In that case, it will do nothing if you press solve, and will warn that it's running when you exit. But I have no idea how it could get that wrong, or have access to another instance's solver thread.
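
    As an illustration only, the failure mode described above can be modelled with a simple "already running" flag. This is a minimal sketch and not Mecway's actual code; the class `SolverState` and its methods are hypothetical names made up for the example. It shows how a flag that is wrongly set would produce exactly the two observed symptoms: pressing solve does nothing, and exiting gives a warning.

```python
import threading


class SolverState:
    """Hypothetical model of a 'solver already running' guard (not Mecway's code)."""

    def __init__(self):
        self.running = False            # flag the application consults
        self._lock = threading.Lock()   # protects the flag across threads

    def solve(self, job):
        """Start job on a solver thread, or do nothing if one is flagged as running."""
        with self._lock:
            if self.running:
                return None             # pressing solve is silently ignored
            self.running = True

        def run():
            try:
                job()
            finally:
                # Clear the flag even if the job raises.
                with self._lock:
                    self.running = False

        thread = threading.Thread(target=run)
        thread.start()
        return thread

    def exit_warning(self):
        """Return the warning shown on exit, or None if no solver is flagged."""
        with self._lock:
            return "A solver process is still running." if self.running else None


# Normal case: the flag is cleared once the job finishes.
ok = SolverState()
ok.solve(lambda: None).join()

# Assumed faulty case: the flag is set although no solver thread exists.
stuck = SolverState()
stuck.running = True
```

    In the assumed faulty case, `stuck.solve(...)` returns `None` without starting anything and `stuck.exit_warning()` returns the warning text, which matches the behaviour reported in this thread. It does not explain how the flag could become wrong in the first place.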
  • - Could this be one of those cases where the window goes off screen or behind another window unnoticed? Could you try switching between windows with the Alt+Tab key combination to see if the second job monitor shows up?

    - Are you running both files in the same directory? I say this because some files have common names no matter your file naming, like WarnNodeMissMultiStage.nam. If you have some contacts, is it possible they both try to write to the same file?
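
    A quick way to check the same-directory question is to compare the parent folders of the two model files. This is a minimal sketch; the function name `share_directory` and the example paths are hypothetical, and the `.liml` extension is assumed for the saved models.

```python
from pathlib import PureWindowsPath


def share_directory(model_a, model_b):
    """Return True if both model files sit in the same Windows folder.

    PureWindowsPath compares case-insensitively and treats / and \\ alike,
    so this works regardless of how the paths were typed.
    """
    return PureWindowsPath(model_a).parent == PureWindowsPath(model_b).parent


# Hypothetical example paths, including an "&" as mentioned earlier in the thread.
same = share_directory(r"C:\Work\R&D\model_config_1.liml",
                       "c:/work/r&d/model_config_2.liml")
different = share_directory(r"C:\Work\run_1\model_config_1.liml",
                            r"C:\Work\run_2\model_config_2.liml")
```

    If the two files do share a folder, moving each one into its own folder before solving would rule out any clash over commonly named side files.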
  • Thanks for the input @Victor & @disla.
    I don't think that hidden windows are the issue, but we'll double check and let you know.

    We don't have contact in the model, so that should not be the issue either.

    Both files are derived from the same base model. Maybe that could be an issue because they stay linked to one another via some kind of file (maybe some log file) and the connection is only broken if "save as" is used.

    For the moment it is working for us as long as we make sure to use the "save as" command.
    I'll let you know if we discover any more clues.