time series loop logic should have proper stop event #772

Closed
Mletter1 opened this issue Oct 19, 2017 · 0 comments


### The following code block in time series needs better loop logic for exiting

    # TODO this function needs to be migrated to the implementation of the computation interface
    def checkjob_thread(mid, sid, jid, request_from, stop_event, callback):
        """
        Routine running on a separate thread which checks on the status of remote
        jobs running on a SLURM infrastructure.

        :param mid:          model ID
        :param sid:          session ID
        :param jid:          job ID
        :param request_from:
        :param stop_event:   event stopping the thread when the job completes
        :param callback:     callback methods when the job successfully completes
        """
        cherrypy.request.headers["x-forwarded-for"] = request_from
        retry_counter = 5

        while True:
            try:
                response = slycat.web.server.checkjob(sid, jid)
            except Exception as e:
                cherrypy.log.error("Something went wrong while checking on job %s status, trying again..." % jid)
                retry_counter = retry_counter - 1

                if retry_counter == 0:
                    fail_model(mid,
                               "Something went wrong while checking on job %s status: check for the generated files "
                               "when the job completes." % jid)
                    slycat.email.send_error("slycat-timeseries-model.py checkjob_thread",
                                            "An error occurred while checking on a remote job: %s" % e.message)
                    stop_event.set()
                    cherrypy.log.error("[TIMESERIES] An error occurred while checking on a remote job error_message: %s"
                                       % e.message)
                    raise Exception("An error occurred while checking on a remote job: %s" % jid)

                response = {"status": {"state": "ERROR"}}
                time.sleep(60)

            state = response["status"]["state"]
            cherrypy.log.error("checkjob %s returned with status %s" % (jid, state))

            if state == "RUNNING" or state == "PENDING":
                retry_counter = 5
                database = slycat.web.server.database.couchdb.connect()
                model = database.get("model", mid)
                if "job_running_time" not in model:
                    model["job_running_time"] = datetime.datetime.utcnow().isoformat()
                    slycat.web.server.update_model(database, model)

            if state == "CANCELLED" or state == "REMOVED":
                fail_model(mid, "Job %s was cancelled." % jid)
                stop_event.set()
                break
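
One possible shape for a cleaner exit, as a minimal sketch rather than the project's actual fix: make `stop_event` the loop condition and replace `time.sleep(60)` with `stop_event.wait(interval)`, so the thread wakes up as soon as another thread sets the event. The function name `poll_until_stopped`, the `check_status` callable (standing in for `slycat.web.server.checkjob(sid, jid)`), and the parameters `interval`, `max_retries`, and `terminal_states` are all hypothetical; the default terminal states are only the two that appear in the excerpt above.

```python
import threading


def poll_until_stopped(check_status, stop_event, interval=60.0, max_retries=5,
                       terminal_states=("CANCELLED", "REMOVED")):
    """Poll check_status() until a terminal state, retry exhaustion, or stop_event.

    check_status is a hypothetical zero-argument callable that returns a state
    string; it is a stand-in for the real job status call.
    Returns the last observed state, or None if stop_event was already set.
    """
    retries_left = max_retries
    last_state = None

    # The event is the loop condition, so any thread can end the loop.
    while not stop_event.is_set():
        try:
            last_state = check_status()
            retries_left = max_retries  # a successful check resets the budget
        except Exception:
            last_state = "ERROR"
            retries_left -= 1
            if retries_left == 0:
                break  # give up after max_retries consecutive failures

        if last_state in terminal_states:
            break

        # Unlike time.sleep(), wait() returns early when the event is set.
        stop_event.wait(interval)

    # Every exit path leaves the event set, so observers see one stop signal.
    stop_event.set()
    return last_state


if __name__ == "__main__":
    # Simulated status sequence; a short interval keeps the demo quick.
    states = iter(["PENDING", "RUNNING", "CANCELLED"])
    event = threading.Event()
    print(poll_until_stopped(lambda: next(states), event, interval=0.01))
```

With this structure every exit path goes through the same `stop_event.set()` call, so model failure handling and notification could be done once by the caller based on the returned state instead of being repeated in each branch.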