.todo

                          ~~~~~~~~ TODO  ~~~~~~~~

Priority
~~~~~~~~

* Add a queue length method to smith:

  Smith.queue_length("queue name") do |length|
  end

Good Ideas
~~~~~~~~~~

* At the moment there is a global SMITH_HOME. This causes problems when you have
  multiple agent paths. Make SMITH_HOME the same as each agent path.

* Add a hook that gets run when agents are started and stopped. This could be a
  shell script. Might be handly for monitoring.

Normal
~~~~~~

* Replace Smith::Messaging::Responder with a Completion and use success and
  fail to return data back to smithctl thereby ensuring that errors get
  reported.

* If an agent hasn't started after 20 seconds (ie there are no keepalives)
  set the state to dead.

* Add --config to the agency and smithctl

* Have a top like tool. Send the agent state to a queue so that anything
  can attach to the queue and do with it what it will. If this is done on a
  topic queue with fanout then a tool would be able to get the state of every
  agent in the network. Obviously if nothing is listening on the queue then no
  messages would be sent.

* Allow an agent to verify that it's the same version as that on disc. Maybe
  implement an autoload system. Look at shogun(?)

* Think about whether to not send messages to agents if they aren't alive.
  At the moment this is a little inconsistent: some commands check to see it the
  agent is alive and others check to see if there is anything listening on the
  queue.

* Don't send messages to queues that don't exist [send, smithctl messages,
  pop-queue.

* Think about the idea of a run list. Or is this already implemented in
  the state machine.

* Have a whitelist/blacklist mechanism listing agents that an agent can
  receive messages from.

* Have a SMITH_LOAD_PATH environment variable that overrides the path set up in
  the config

* Have something monitor the agents' state machine for signs of problems.
  For example if an agent is in the starting state for longer than a minutes or
  so

* Be able to change log level for individual loggers (ie per class).

* Setup a dead letter queue in case of error. Have a look at the dead letter
  stuff in rabbitmq.

* The number of threads should be configurable programatically
  queues per process.

* Have message counters for every agent. [partially done]

* Keep stats on each sender. Messages per second, etc.

* Add checking to the config. For example the logging class can only accept File
  or Stdout appenders but this is not checked for.

* Check for the existance of the config and throw an error if it can't be found.

* Think about a generator to write the directory structure and what that
  directory structure might look like. Change Smith.root to the root of directory
  structure. You will need to think about what to do with the agent_path (ie the
  fact that multiple paths are allowed).

* I've added a method called add_agent_load_path to the bootstrapper. When
  thinking about the above make sure you check whether that method fits.

* Make an equivalent of ripl-rails: ripl-smith

* Add command to smithctl to return the queue names an agent is using. This
  could be done by interrogating the channel.

* Add the ability to clone an agent. See next point (It looks as if the agent
  lifecycles ...).

* Message headers:
  * sent time,
  * message checksum - not sure if this is required
  * TTL - this can be used to get rid of stray messages.

* Add the message_id to all log messages.

* Add i18n. It will also make error messages more consistent.

* Make sure application specific queues don't have the smith namespace.

* Allow the agent to be configured so auto_ack is off. auto_ack is nice when
  you are starting out but in high throughput applications it's unlikely to
  be the right thing to do so the auto_ack => false in the queue declaration
  is noisy.

* Add command line options to the agency:
  * fix --daemonise
  * specify the environment.
  * specify a different config file.
  * start the agency with a clean db.
  * dump a specific queues' acl.
  * specify the agency log level
  * add acl reload function to the agency.

* Add smitch options/commands
  * dump the config -- this should be in smithctl
  * sort the long output of the list command.
  * add --kill option to the agency. This is addition to the stop agency command
  * add --list option to the config command
  * add --path to the agents command: displays the load path.
  * add --clear-cache option to the acls command.
  * add --agency & --smithctl option to the version command.
  * add --pretty-print option to pop command
  * add --regex to kill command
  * add --name to kill command
  * add queue length command.
  * add --number to the start command.

  * pop should not die if there is an unkown message type on the queue being pop'ed
  * add -0 option to smithctl push
  * stat should use the management plugin. It's seriously powerful!
  * logger command should list the current log level.

  In fact look at smith1 agency to see what options it had.

* New commands:

  * ping command -- ping an agent.

* Allow ACL types to be specified using their camel case representation.

* Use prefetch with the pop command.

* Write a startup script.

* Have an option to freeze an agent. Not sure of a use case but it could be
  quite useful. It might be quite complicated to implement though.

* Clean up the pb cache handling. Have something like:

    Smith.set_acl_path

  or maybe add it to the Smith run, Smith is usless without it.
  [done in the sense that I've added load_acls method but not included it in
  Smith.start]

* Have a pb cache reload mechanism at the moment you need to restart the agency.

* Think about passing a class to the receiver. This might be a nice way of
  structuring the message handlers.

* Add support for connecting to rabbitmq using ssl with certificates, see
  http://hg.rabbitmq.com/rabbitmq-auth-mechanism-ssl/file/default/README

* Fix incorrect message handling (if an acl gets sent to an agent that can't
  handle it). At the moment it doesn't catch the exception and the agent dies. It
  should probably send it a dead letter queue. Either way at the moment the
  message has to be manually removed from the queue.

  It's actually a bit tricky to know what to do here: should I assume the agent
  is correct and that the message is in error in which case the message should
  be sent to the dead letter queue; or should I assume the message is correct
  and the agent has a bug in which case the current behaviour is correct.

* Add warning that if there is an error of type:

    Channel level exception: PRECONDITION_FAILED - unknown delivery tag 2. Class
    id: 60, Method id: 80, Status code : 406

  it probably means that you've already acked the message.

* Clean up some of the logging messages, for example:

  Messaging::Sender prints:
    Publishing to: agency.control. [message]: [agency_command] -> {:command=>"list"}

  where Receiver::Reply prints:
    Payload content: [queue]: a1bf3328e2f3db87 [message]: [string] -> No agents running

  this is inconsistent.

* If nothing is listening on the reply queue to agency command then don't send
  the message back and delete the queue. I think this is going to be quite hard
  to do. I could provide an option to Reply#reply but that smells.

* Check all the commands for consistency.

* Add next_ticks around things like requeue and general recovery.

* Put a lot more consideration into recovery. Fox example the agency will die if
  there is a queue error.

* Add a class to define the queue name and the type of message they can listen
  on. This can be used to define the queues in a centralised place which should
  avoid configuration errors.

* Add message versioning.

* Add code to see if the reactor has started. If someone runs Smith.start it
  appears to hang; at least put a warning.

* Check to see if the acl directory actually exists. This logs a message it
  isn't. Is that good enough?

* Add command to list what agents are in each group. Add a --groups option to
  the agents command

* Add a headers hash to all messages so there are no
  AMQP::IncompatibleOptionsError due to the queue being defined with or without
  the headers hash.

* Add an option to pop that returns immediately. If I remove a lot of messages
  from a queue smithctl will timeout due to the time taken to delete the
  messages. Add an option that returns the number of messages to be removed
  without waiting for the messages to be removed.

* Add default on_requeue proc. Check to see if it makes sense to do so or throw
  and exception if it doesn't.

* Add a message TTL so that messages will get timed out after a period of time.
  This might be particularly useful for smithctl.

* Create a mixin that can be included in the agent to allow a key-value store.
  This would be useful for storing transient data that needs to be persisted.

* Check for the existence of the various paths in the config and log a message
  if it doesn't exist.

* Payload should be able to be instantiated from an undecoded message.

* Add better error reporting. When someone sets the message content that isn't a
  payload object and exception should be raised.

* If the list command is passed an agent just give details about that agent -
  maybe get rid of the state command.

* Add a stopping? method to the Smith class. Agents can then check this & do the
  right thing when shutting down.

* Add check for ACLs of the same name.

* Add column(1) like formatting to the output of smithctl.

* Add support for flapping in the agent monitor. At the very least allow the
  agent stop message to be queued (this might be fraught however).

* Add durable to each message.

* /var/cache/smith/acl is not being created properly.

* Fix the formatting of return data from the commands. I'm relying on pp and
  it's not working particularly well.

* Define === for the options Aray in BaseCommand. I can then just have a case
  statement for each option. This is the closest I'm going to get to pattern
  matching.

* Use git instead of leveldb for the agent config. This can then be used to
  implement rollback.

* Have the equivelent of a utmp log.

                          ~~~~~~~~ BUGS  ~~~~~~~~

* Smith::ACL::Payload.new(<payload type>, :from => payload) doesn't work for
  :default messages.

* Have a callback per message type. In effect implement pattern matching instead
  of using a case statement.

* Fix force in protocol_buffer_compiler. There's no way to currently force the
  recompilation the pb files.

* Fix the incredibly slow Payload creation time. It's about 500 times slower
  than instantiating the protocol buffer itself. [ I think this is fixed -- it
  was due to not using lazy log messages. Needs checking.]

* Make sure all time fields in pb files are integers.

* Check queue/exchange leaks. The ones I know of at the moment are:
  * if the publish_and_receive method is used and the message is not replied too
    then there is a exchange/queue leak.
  * I think there is another case but I cannot think of it.

* Fix this bug. I think it's probably due to blah ||= blah. But that's just a guess.

  2012/05/14 06:23:08.057818543 [10753]   ERROR -              Smith::AgentBootstrap:55  - ArgumentError: Unknown option: monitor
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:102:in `merge_options'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:95:in `block in options'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:95:in `each'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:95:in `options'
          /var/lib/digivizer/collection/digivizer-collection/agents/twitter_agent.rb:9:in `<class:TwitterAgent>'
          /var/lib/digivizer/collection/digivizer-collection/agents/twitter_agent.rb:7:in `<top (required)>'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:40:in `load'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:40:in `load_agent'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:146:in `block in <main>'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith.rb:156:in `call'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith.rb:156:in `block (2 levels) in start'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amqp-0.9.5.pre/lib/amqp/channel.rb:241:in `call'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amqp-0.9.5.pre/lib/amqp/channel.rb:241:in `block (2 levels) in initialize'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `call'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `block in exec_callback_once_yielding_self'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `each'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `exec_callback_once_yielding_self'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/channel.rb:400:in `handle_open_ok'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/channel.rb:425:in `block in <class:Channel>'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapter.rb:539:in `call'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapter.rb:539:in `receive_frameset'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapter.rb:519:in `receive_frame'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapters/event_machine.rb:327:in `receive_data'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/eventmachine.rb:179:in `run_machine'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/eventmachine.rb:179:in `run'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amqp-0.9.5.pre/lib/amqp/connection.rb:38:in `start'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith.rb:98:in `start'
          /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:145:in `<main>'


* Not sure what this actually is. I assume it's because ret is a RuntimeError
  but I've no idea how this might be so.

  smith-0.5.10/lib/smith/application/agency.rb:22:in `block (4 levels) in setup_queues': undefined method `empty?' for #<RuntimeError: Group does not exist: ndb> (NoMethodError)
        from /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/em/deferrable.rb:151:in `call'
        from /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/em/deferrable.rb:151:in `set_deferred_status'