Marathon

A cluster-wide init and control system for services in cgroups or Docker containers

Download Marathon v0.7.1

v0.7.1 SHA-256 Checksum · v0.7.1 Release Notes

Overview

The graphic below shows Marathon running on top of Apache Mesos together with the Chronos framework. In this case, Marathon is the first framework to be launched and it runs alongside Mesos: the Marathon scheduler processes were started outside of Mesos using init, upstart, or a similar tool. Marathon launches two instances of the Chronos scheduler as Marathon tasks. If either of the two Chronos tasks dies (because of an underlying slave crash, power loss in the cluster, or a similar failure), Marathon restarts a Chronos instance on another slave. This approach ensures that two Chronos processes are always running.
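As a rough illustration of the "two Chronos instances" setup described above, the following sketch builds an app definition of the kind Marathon accepts. The app id, the start command, and the resource figures are placeholders chosen for this example, not values taken from a real deployment. The assumption is that such a definition would be submitted to Marathon's REST API (for instance by POSTing it to `/v2/apps`).

```python
import json


def chronos_app_definition(instances=2):
    """Build an app definition asking Marathon to keep `instances` copies running."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "id": "chronos",                    # hypothetical app id
        "cmd": "./bin/start-chronos.bash",  # placeholder start command
        "instances": instances,             # number of tasks Marathon should keep alive
        "cpus": 1.0,                        # placeholder CPU share per task
        "mem": 1024,                        # placeholder memory per task, in MiB
    }


# Serialize the definition as the JSON body that would be sent to Marathon.
payload = json.dumps(chronos_app_definition(), indent=2)
print(payload)
```

The `instances` field is what expresses the guarantee from the text: Marathon treats it as the desired task count and starts a replacement whenever a task is lost.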

Since Chronos itself is a framework and receives Mesos resource offers, it can start tasks on Mesos. In the use case shown below, Chronos is currently running two tasks. One dumps a production MySQL database to S3, while the other sends an email newsletter to all customers via Rake. Meanwhile, Marathon also runs the other applications that make up our website, such as JBoss servers, a Jetty service, Sinatra, Rails, and so on.

The next graphic shows a more application-centric view of Marathon running three applications, each with a different number of tasks: Search (1), Jetty (3), and Rails (5).

As the website gains traction and the user base grows, we decide to scale out the search service and our Rails-based application. This is done with a REST call to the Marathon API that adds more tasks. Marathon takes care of placing the new tasks on machines with spare capacity, honoring the constraints we previously set.
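A minimal sketch of such a scaling call is shown here. It assumes that Marathon's REST API is reachable at `http://localhost:8080`, that the Rails application was registered under the id `rails`, and that the instance count is changed with a `PUT` to `/v2/apps/{appId}`. The host, the app id, and the target of eight tasks are illustrative assumptions. The helper only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def build_scale_request(base_url, app_id, instances):
    """Build (but do not send) a PUT request that sets an app's instance count."""
    if instances < 0:
        raise ValueError("instances must not be negative")
    url = f"{base_url.rstrip('/')}/v2/apps/{app_id.strip('/')}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical usage: grow the Rails app from 5 to 8 tasks.
request = build_scale_request("http://localhost:8080", "rails", 8)
print(request.get_method(), request.full_url)
# To actually send it against a running Marathon instance:
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```

Only the desired count is sent; choosing which machines receive the new tasks remains Marathon's job, as described above.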

Imagine that one of the datacenter workers trips over a power cord and a server gets unplugged. This is no problem for Marathon: it moves the affected search service and Rails tasks to a node that has spare capacity. The engineer may be temporarily embarrassed, but Marathon saves him from having to explain a difficult situation!