{"id":926,"date":"2024-06-25T18:08:10","date_gmt":"2024-06-25T16:08:10","guid":{"rendered":"https:\/\/spgoo.org\/?page_id=926"},"modified":"2025-09-08T16:53:48","modified_gmt":"2025-09-08T14:53:48","slug":"cluster-cascimodot","status":"publish","type":"page","link":"https:\/\/spgoo.org\/?page_id=926","title":{"rendered":"Cluster Cascimodot"},"content":{"rendered":"\n<p class=\"has-medium-font-size\"><strong>Parallelizing a Python function on a cluster using SLURM<\/strong><\/p>\n\n\n\n<p>Below, we propose a method for parallelizing a Python function on a cluster that uses SLURM.<\/p>\n\n\n\n<p>The prerequisites are:<br>&#8211; create a conda environment on the cluster, conda_env, compatible with the local environment, in particular regarding the pickle protocol version;<br>&#8211; create a directory \/home\/{user_name}\/modules on the cluster, into which the modules will be copied.<\/p>\n\n\n\n<p>The function do_on_cluster below is the decorator that performs this parallelization.<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code>\ndef do_on_cluster(\n        Lx0=None,\n        x0=None,\n        Dnet=None,\n        with_GNU_parallel=True,\n        ref=None,\n        level=\"INFO\", # logging level on the cluster\n        nLmin_by_job=200, # minimum number of items to process per job\n        ncore_max=80, # maximum number of cores to use\n        njob_max=1000,\n        dir_Lres=\"dir_Lres\",\n        with_return_Lres=True,\n        with_fuse_LL=True, # fuse the results of the different jobs\n        Lmodule_to_update=&#91;],\n        return_Lres=False,\n\n    ):\n    \"\"\"\n    ref: https:\/\/www.artima.com\/weblogs\/viewpost.jsp?thread=240845#decorator-functions-with-decorator-arguments\n    \n    \"\"\"\n    # -----------------------------------------------\n    logger.info(\"do_on_cluster\")\n    if Dnet is not None:\n        user_cluster = Dnet&#91;\"user_cluster\"]\n        IP_cluster = Dnet&#91;\"IP_cluster\"]\n        conda_env = Dnet&#91;\"conda_env\"]\n        cmd_smina = Dnet&#91;\"cmd_smina\"]\n        queue = Dnet&#91;\"queue\"]\n    def inner(f0):\n        # f0 is only bound here, so the default value of ref is resolved inside inner\n        nonlocal ref\n        if ref is None:\n            ref = f0.__name__\n        logger.info(\"\\t inside decorator\")\n        def wrapper(*args, **kwargs):\n            # --------------------------------------------------------------------\n            logger.info(f\"\\t update of the modules\")\n            update_module_on_cluster(\n                Lmodule_to_update,\n                Dnet=Dnet,\n                )\n            # --------------------------------------------------------------------------\n            path_cluster = f\"\/home\/{user_cluster}\"\n            path_local = get_path()\n            logger.info(f\"* path_local={path_local}\")\n            # make sure that the same job is not still running\n            # --------------------------------------------------------------------------\n            cmd = f\"scancel --user {user_cluster} --name {ref}\"\n            bash_on_remote(\n                                cmd,\n                                Dnet,\n                            )\n            # --------------------------------------------------------------------------\n            logger.info(f\"Removal of directory {ref} on the cluster ...\")\n            Lcmd = &#91;f\"rm -rf {path_cluster}\/{ref}\"]\n            bash_on_remote(\n                                Lcmd, \n                                Dnet, \n                            )\n            # --------------------------------------------------------------------------\n            logger.info(f\"Removal of the local results directory\")\n            remove_dir(dir_Lres)\n            # ========================================================================\n            # Creation of a new directory gathering all the information\n            # needed to launch the 
script on the cluster.\n            # --------------------------------------------------------------------------\n            logger.info(f\"Creation of the directory to send\")\n            make_dir(ref)\n            os.chdir(ref)\n            dump_var(args, \"args\")\n            dump_var(kwargs, \"kwargs\")\n            make_dir(dir_Lres)\n            path_local_ref = get_path()\n            # --------------------------------------------------------------------------\n            LL = split_L_in_LL(\n                                Lx0,\n                                nL=nLmin_by_job,\n                                nLL_max=njob_max,\n                                )\n            n_core = min(len(LL), ncore_max)\n            Ls = dump_Lvar(LL, \"L\")\n            create_txt_from_L(Ls, \"Ls.txt\")\n            nLs = len(Ls)\n            create_txt_from_s(str(nLs), \"nLs.txt\")\n            # --------------------------------------------------------------------------\n            # Dump of information that cannot be passed directly\n            # --------------------------------------------------------------------------\n            # ========================================================================\n            # Creation of job_all.sh to launch the different jobs\n            # can be sbatch or bash\n            # --------------------------------------------------------------------------\n            Ls = \"\"\"\n#!\/bin\/bash\nsbatch --wait job1.sh\nwait\n\"\"\".split(\"\\n\")\n            create_txt_from_L(Ls&#91;1:], \"job_all.sh\")\n            # ========================================================================\n            # Creation of the job1.sh SLURM file\n            # --------------------------------------------------------------------------\n            create_slurm_job(\n                                \"do1.py\",\n        
                        with_GNU_parallel=with_GNU_parallel,\n                                ref=ref,\n                                Dnet=Dnet,\n                                fname=\"job1.sh\",\n                                n_core=n_core,\n                                cmd_to_insert_in_0=f\"cd \/home\/{user_cluster}\/{ref}\",\n                                s_parallel=\"parallel1\",\n                                )\n            # --------------------------------------------------------------------------\n            # Creation of the do1.py worker script\n            # --------------------------------------------------------------------------\n            module = f0.__module__\n            sf0 = f0.__name__\n            Ls_py = f\"\"\"\nimport sys\nimport os\nimport traceback\nsys.path.insert(0, \"\/home\/{user_cluster}\/modules\")\n# =============================================================================\nfrom tools_SBC.logging_SBC import (set_log, logger)\nfrom tools_SBC.basic import (load_var, dump_var, get_line_n_in_file, move_file, touch,\n                remove_file, remove_Lfile)\n# =============================================================================\nfrom {module} import {sf0}\n# ------------------------------------------------------------------------------\ni_line = int(sys.argv&#91;1])\nfile = get_line_n_in_file(\"Ls.txt\", i_line)\nLx = load_var(file)\nargs = load_var(\"args\")\nkwargs = load_var(\"kwargs\")\nLres = &#91;]\nfor x in Lx:\n    kwargs&#91;\"{x0}\"] = x\n    res = {sf0}(**kwargs)\n    Lres.append(res)\nif {return_Lres}:\n    dump_var(Lres, f\"{dir_Lres}\/{{file}}\")\nremove_file(file)\n\"\"\".split(\"\\n\")&#91;1:]\n            # -----------------------------------------------------------------------\n            create_txt_from_L(Ls_py, \"do1.py\")\n            # ========================================================================\n            # 
Export of the directory to the cluster and launch of the job\n            # -------------------------------------------------------------------------\n            os.chdir(\"..\")\n            logger.info(f\"Transfer of {ref} to the cluster ...\")\n            bash(\n                            f\"scp -C -r {ref} {user_cluster}@{IP_cluster}:{path_cluster}\", \n                            info=True,\n                )\n            # --------------------------------------------------------------------------\n            # Launch of the jobs\n            # -------------------------------------------------------------------------\n            logger.info(f\"Launch of jobs ...\")\n            Lcmd = &#91;f\"cd {path_cluster}\/{ref}\",\n                    \"bash job_all.sh\"]\n            bash_on_remote(\n                            Lcmd, \n                            Dnet, \n                        )\n            logger.info(f\"in progress ...\")\n            logger.debug(f\"waiting for results\")\n            # --------------------------------------------------------------------------\n            # Loading of the results\n            # -------------------------------------------------------------------------\n            if return_Lres:\n                logger.info(f\"loading of the results\")\n                os.chdir(path_local_ref)\n                remove_dir(dir_Lres)\n                bash(\n                                f\"scp -C -r {user_cluster}@{IP_cluster}:{path_cluster}\/{ref}\/{dir_Lres} {path_local_ref}\", \n                                info=True,\n                            )\n                os.chdir(dir_Lres)\n                LLres = load_Lvar(\"L*\")\n                os.chdir(path_local)\n                return fuse_LL(LLres)\n            os.chdir(path_local)\n        return wrapper\n    return inner\n\n\n\n<\/code><\/pre>\n\n\n\n<p>You now have the decorator <strong>do_on_cluster<\/strong>, which lets you parallelize a function <strong>f0<\/strong> on a cluster.<\/p>\n\n\n\n<p><strong>Example 1<\/strong>: minimization of a list of molecules<\/p>\n\n\n\n<pre class=\"wp-block-code has-small-font-size\"><code>    do_on_cluster(\n            # --------------\n            Lx0=Lmol,\n            x0=\"mol\",\n            # --------------\n            Dnet=Dnet,\n            with_GNU_parallel=True,\n            # --------------\n            ref=\"Lscav2\",\n            level=level,\n            nLmin_by_job=nLmin_by_job,\n            ncore_max=ncore_max, \n            njob_max=njob_max,\n            Lmodule_to_update=Lmodule_to_update,\n            return_Lres=True,\n            )(get_mol_minimized_with_smina)()\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Parallelizing a Python function on a cluster using SLURM Below, we propose a method for parallelizing a Python function on a cluster that uses SLURM. The prerequisites are:&#8211; create a conda environment on the cluster, conda_env, compatible with the local environment, in particular regarding the pickle protocol version.&#8211; create a 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-926","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages\/926","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/spgoo.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=926"}],"version-history":[{"count":11,"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages\/926\/revisions"}],"predecessor-version":[{"id":9924,"href":"https:\/\/spgoo.org\/index.php?rest_route=\/wp\/v2\/pages\/926\/revisions\/9924"}],"wp:attachment":[{"href":"https:\/\/spgoo.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=926"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}