===========
 QmemTools
===========
| QmemTools is a set of tools to monitor memory usage in an SGE cluster.
| 
| It will include more tools in the future but you can develop you own client
| tools to fit your needs. Details about data structure returned in JSON by the
| server are described below
|
| **Note:** QmemTools has been tested on ubuntu, centos and redhat distributions.
| QmemTools is compatible with POSIX system. Windows is not supported.

How it works
============
| A daemon (qmemserver) generate and parse cached xml data from qhost and qstat
| xml output command.
| A client (ie: qmemview.py) request the server to receive JSON structured data
| (refer to Data structure) and do something useful with it.

Dependencies
============
QmemTools requires following packages:

 - python (>=2.4 and <3.0)
 - web.py (>=0.34)
 - simplejson (>=2.0.9)
 - libxml2

Installation
============
 **sudo python setup.py install**

| After installation, a file is create in /etc/init.d/qmemserver, so if
| you want to enable qmemserver at boot time you have to configure manually
| according to your distribution.

Configuration
=============
| The server configuration is locate in **/etc/qmem/qmemserver.conf** and contain two
| sections, files and system which are described.

Server
======
| **qmemserver.py** handle requests and return JSON data as response.
| Qmemserver accept the following url requests:

 - http://localhost:8080/qhost for qhost data
 - http://localhost:8080/qstat for qstat data

The JSON format for a successful request is::

    {"success" : true, "message" : "", "data" : {json_data}}

The JSON format for a failed request is::

    {"success" : false, "message" : "an_error_message", "data" : {}}

Tools
=====
**qmemview.py:** display information about memory usage on your cluster

.. image:: img/qmemview-cap.png

**qmemview.py** has the following options::

    qmemview.py <url:port> : display all hosts
    qmemview.py <url:port> -h : display this help
    qmemview.py <url:port> -u : display all job details / host
    qmemview.py <url:port> -j <jobid>: display details for one job (set -u automatically)
    qmemview.py <url:port> -o <owner>: display owner's job details (set -u automatically)
    qmemview.py <url:port> -h <hostname> : display only selected host
    qmemview.py <url:port> -u -h <hostname> : display only selected host with job details for this host
    qmemview.py <url:port> -h <hostname> -j <jobid> : display only selected host with job details for jobid only
    <url:port> argument should be set to point on qmemserver address and port. ie: qmemview.py localhost:8080

**TIPS,** for simplicity you can create a shell alias:
    alias qmemview="qmemview.py localhost:8080"

Data structure
==============

**Returned by qhost**::

    qhost_data[hostname]['num_proc']:str
                        ['mem_total']:str
                        ['jobs'][jobid]['jobcount']:int
                                       ['master']:bool
                                       ['taskid']:list
                                       ['owner']:str
                                       ['jobname']:str

- qhost_data[hostname]['num_proc']
    number of processors on hostname
- qhost_data[hostname]['mem_total']
    total memory available on hostname
- qhost_data[hostname]['jobs']
    contain jobid running on hostname
- qhost_data[hostname]['jobs'][jobid]['jobcount']
    slots used by this job on hostname (don't rely on it for array task, use taskid list length instead)
- qhost_data[hostname]['jobs'][jobid]['master']
    is master run on hostname ?
- qhost_data[hostname]['jobs'][jobid]['taskid']
    list of taskid running on hostname
- qhost_data[hostname]['jobs'][jobid]['owner']
    owner of jobid
- qhost_data[hostname]['jobs'][jobid]['jobname']
    job name

**Returned by qstat**::

    qstat_data[owner]['uid']:str
                     ['jobs'][jobid]['requested_h_vmem_strval']:str
                                    ['requested_h_vmem_dblval']:float
                                    ['hostname'][hostname]['master']:str
                                                          ['slave']:str
                                                          [taskid]:str

- qstat_data[owner]['uid']
    userid of owner
- qstat_data[owner]['jobs'][jobid]['requested_h_vmem_strval']
    requested h_vmem (string format) for owner's jobid
- qstat_data[owner]['jobs'][jobid]['requested_h_vmem_dblval']
    requested h_vmem (double format) for owner's jobid
- qstat_data[owner]['jobs'][jobid]['hostname']
    hostnames where jobid run
- qstat_data[owner]['jobs'][jobid]['hostname'][hostname]['master']
    memory consummed by jobid master on hostname
- qstat_data[owner]['jobs'][jobid]['hostname'][hostname]['slave']
    memory consummed by jobid slave on hostname
- qstat_data[owner]['jobs'][jobid]['hostname'][hostname][taskid]
    memory consummed by jobid task on hostname