~~~~~~~~ TODO ~~~~~~~~
Priority
~~~~~~~~
* Add a queue length method to smith:
Smith.queue_length("queue name") do |length|
end
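
A minimal sketch of what the queue length call above might look like. SmithSketch, FakeQueue and register are made-up names used only for illustration, and the in-memory queue stands in for a real broker queue. I believe the amqp gem offers a similar status callback that yields message and consumer counts, but that is an assumption to check before wiring it in.

```ruby
# Hypothetical sketch: SmithSketch and FakeQueue are illustrative names only,
# not part of Smith.
module SmithSketch
  # Stand-in for a broker queue. It mimics a status call that yields
  # (message_count, consumer_count) to a block.
  class FakeQueue
    def initialize(messages = [])
      @messages = messages
    end

    def status
      yield @messages.size, 0
    end
  end

  @queues = {}

  class << self
    # Remember a queue under a name so queue_length can find it.
    def register(name, queue)
      @queues[name] = queue
    end

    # Yields the number of messages waiting on the named queue.
    def queue_length(name)
      queue = @queues.fetch(name) { raise ArgumentError, "Unknown queue: #{name}" }
      queue.status { |message_count, _consumer_count| yield message_count }
    end
  end
end

SmithSketch.register("queue name", SmithSketch::FakeQueue.new(%w[a b c]))
SmithSketch.queue_length("queue name") do |length|
  puts "queue length: #{length}"
end
```

The block form keeps the call shape proposed above, which suits an asynchronous broker call where the count arrives later.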
Good Ideas
~~~~~~~~~~
* At the moment there is a global SMITH_HOME. This causes problems when you have
multiple agent paths. Make SMITH_HOME the same as each agent path.
* Add a hook that gets run when agents are started and stopped. This could be a
shell script. Might be handy for monitoring.
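
A rough sketch of the start/stop hook idea, assuming the hook is an executable script that receives the event name as its first argument. AgentHooks and the SMITH_AGENT environment variable are hypothetical names, not existing Smith ones.

```ruby
# Hypothetical sketch: AgentHooks and SMITH_AGENT are illustrative
# assumptions, not existing Smith names.
module AgentHooks
  # Runs the hook script for an agent lifecycle event (:started or :stopped).
  # Returns true only if the script exists, is executable and exits with 0.
  def self.run(hook_path, event, agent_name)
    return false unless File.file?(hook_path) && File.executable?(hook_path)

    # Pass the event as an argument and the agent name in the environment so
    # a plain shell script can read both.
    system({ "SMITH_AGENT" => agent_name }, hook_path, event.to_s) == true
  end
end
```

A missing or non-executable hook is treated as "nothing to do" rather than an error, so agents still start when no hook is installed.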
Normal
~~~~~~
* Replace Smith::Messaging::Responder with a Completion and use success and
fail to return data back to smithctl thereby ensuring that errors get
reported.
* If an agent hasn't started after 20 seconds (ie there are no keepalives)
set the state to dead.
* Add --config to the agency and smithctl
* Have a top like tool. Send the agent state to a queue so that anything
can attach to the queue and do with it what it will. If this is done on a
topic queue with fanout then a tool would be able to get the state of every
agent in the network. Obviously if nothing is listening on the queue then no
messages would be sent.
* Allow an agent to verify that it's the same version as that on disc. Maybe
implement an autoload system. Look at shogun(?)
* Think about whether to not send messages to agents if they aren't alive.
At the moment this is a little inconsistent: some commands check to see if the
agent is alive and others check to see if there is anything listening on the
queue.
* Don't send messages to queues that don't exist [send, smithctl messages,
pop-queue].
* Think about the idea of a run list. Or is this already implemented in
the state machine.
* Have a whitelist/blacklist mechanism listing agents that an agent can
receive messages from.
* Have a SMITH_LOAD_PATH environment variable that overrides the path set up in
the config
* Have something monitor the agents' state machine for signs of problems.
For example if an agent is in the starting state for longer than a minute or
so
* Be able to change log level for individual loggers (ie per class).
* Setup a dead letter queue in case of error. Have a look at the dead letter
stuff in rabbitmq.
* The number of threads should be configurable programmatically
queues per process.
* Have message counters for every agent. [partially done]
* Keep stats on each sender. Messages per second, etc.
* Add checking to the config. For example the logging class can only accept File
or Stdout appenders but this is not checked for.
* Check for the existence of the config and throw an error if it can't be found.
* Think about a generator to write the directory structure and what that
directory structure might look like. Change Smith.root to the root of directory
structure. You will need to think about what to do with the agent_path (ie the
fact that multiple paths are allowed).
* I've added a method called add_agent_load_path to the bootstrapper. When
thinking about the above make sure you check whether that method fits.
* Make an equivalent of ripl-rails: ripl-smith
* Add command to smithctl to return the queue names an agent is using. This
could be done by interrogating the channel.
* Add the ability to clone an agent. See next point (It looks as if the agent
lifecycles ...).
* Message headers:
* sent time,
* message checksum - not sure if this is required
* TTL - this can be used to get rid of stray messages.
* Add the message_id to all log messages.
* Add i18n. It will also make error messages more consistent.
* Make sure application specific queues don't have the smith namespace.
* Allow the agent to be configured so auto_ack is off. auto_ack is nice when
you are starting out but in high throughput applications it's unlikely to
be the right thing to do so the auto_ack => false in the queue declaration
is noisy.
* Add command line options to the agency:
* fix --daemonise
* specify the environment.
* specify a different config file.
* start the agency with a clean db.
* dump a specific queue's acl.
* specify the agency log level
* add acl reload function to the agency.
* Add smithctl options/commands
* dump the config -- this should be in smithctl
* sort the long output of the list command.
* add --kill option to the agency. This is in addition to the stop agency command
* add --list option to the config command
* add --path to the agents command: displays the load path.
* add --clear-cache option to the acls command.
* add --agency & --smithctl option to the version command.
* add --pretty-print option to pop command
* add --regex to kill command
* add --name to kill command
* add queue length command.
* add --number to the start command.
* pop should not die if there is an unknown message type on the queue being pop'ed
* add -0 option to smithctl push
* stat should use the management plugin. It's seriously powerful!
* logger command should list the current log level.
In fact look at smith1 agency to see what options it had.
* New commands:
* ping command -- ping an agent.
* Allow ACL types to be specified using their camel case representation.
* Use prefetch with the pop command.
* Write a startup script.
* Have an option to freeze an agent. Not sure of a use case but it could be
quite useful. It might be quite complicated to implement though.
* Clean up the pb cache handling. Have something like:
Smith.set_acl_path
or maybe add it to the Smith run, Smith is useless without it.
[done in the sense that I've added load_acls method but not included it in
Smith.start]
* Have a pb cache reload mechanism; at the moment you need to restart the agency.
* Think about passing a class to the receiver. This might be a nice way of
structuring the message handlers.
* Add support for connecting to rabbitmq using ssl with certificates, see
http://hg.rabbitmq.com/rabbitmq-auth-mechanism-ssl/file/default/README
* Fix incorrect message handling (if an acl gets sent to an agent that can't
handle it). At the moment it doesn't catch the exception and the agent dies. It
should probably send it to a dead letter queue. Either way at the moment the
message has to be manually removed from the queue.
It's actually a bit tricky to know what to do here: should I assume the agent
is correct and that the message is in error in which case the message should
be sent to the dead letter queue; or should I assume the message is correct
and the agent has a bug in which case the current behaviour is correct.
* Add warning that if there is an error of type:
Channel level exception: PRECONDITION_FAILED - unknown delivery tag 2. Class
id: 60, Method id: 80, Status code : 406
it probably means that you've already acked the message.
* Clean up some of the logging messages, for example:
Messaging::Sender prints:
Publishing to: agency.control. [message]: [agency_command] -> {:command=>"list"}
where Receiver::Reply prints:
Payload content: [queue]: a1bf3328e2f3db87 [message]: [string] -> No agents running
this is inconsistent.
* If nothing is listening on the reply queue to agency command then don't send
the message back and delete the queue. I think this is going to be quite hard
to do. I could provide an option to Reply#reply but that smells.
* Check all the commands for consistency.
* Add next_ticks around things like requeue and general recovery.
* Put a lot more consideration into recovery. For example the agency will die if
there is a queue error.
* Add a class to define the queue name and the type of message they can listen
on. This can be used to define the queues in a centralised place which should
avoid configuration errors.
* Add message versioning.
* Add code to see if the reactor has started. If someone runs Smith.start it
appears to hang; at least put a warning.
* Check to see if the acl directory actually exists. This logs a message if it
doesn't. Is that good enough?
* Add command to list what agents are in each group. Add a --groups option to
the agents command
* Add a headers hash to all messages so there are no
AMQP::IncompatibleOptionsError due to the queue being defined with or without
the headers hash.
* Add an option to pop that returns immediately. If I remove a lot of messages
from a queue smithctl will timeout due to the time taken to delete the
messages. Add an option that returns the number of messages to be removed
without waiting for the messages to be removed.
* Add default on_requeue proc. Check to see if it makes sense to do so or throw
an exception if it doesn't.
* Add a message TTL so that messages will get timed out after a period of time.
This might be particularly useful for smithctl.
* Create a mixin that can be included in the agent to allow a key-value store.
This would be useful for storing transient data that needs to be persisted.
* Check for the existence of the various paths in the config and log a message
if it doesn't exist.
* Payload should be able to be instantiated from an undecoded message.
* Add better error reporting. When someone sets message content that isn't a
payload object an exception should be raised.
* If the list command is passed an agent just give details about that agent -
maybe get rid of the state command.
* Add a stopping? method to the Smith class. Agents can then check this & do the
right thing when shutting down.
* Add check for ACLs of the same name.
* Add column(1) like formatting to the output of smithctl.
* Add support for flapping in the agent monitor. At the very least allow the
agent stop message to be queued (this might be fraught however).
* Add durable to each message.
* /var/cache/smith/acl is not being created properly.
* Fix the formatting of return data from the commands. I'm relying on pp and
it's not working particularly well.
* Define === for the options Array in BaseCommand. I can then just have a case
statement for each option. This is the closest I'm going to get to pattern
matching.
* Use git instead of leveldb for the agent config. This can then be used to
implement rollback.
* Have the equivalent of a utmp log.
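
One possible reading of the === idea for command options in the list above, as a sketch. OptionList and describe are hypothetical names and this is not the existing BaseCommand code. It relies on Ruby's case calling === on the when operand with the case subject as its argument.

```ruby
# Hypothetical sketch: OptionList is an illustrative name, not Smith's
# BaseCommand API.
class OptionList
  def initialize(*flags)
    @flags = flags.map(&:to_sym)
  end

  # `case flag when options` evaluates options === flag, so defining ===
  # as a membership test lets each option get its own case statement.
  def ===(flag)
    @flags.include?(flag.to_sym)
  end
end

# Builds a description of what a command would do for the given options.
def describe(options)
  output = []

  case :long
  when options then output << "long listing"
  end

  case :all
  when options then output << "all agents"
  end

  output
end

p describe(OptionList.new(:long, :all))
```

Each option needs its own case statement because a single case stops at the first matching branch.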
~~~~~~~~ BUGS ~~~~~~~~
* Smith::ACL::Payload.new(<payload type>, :from => payload) doesn't work for
:default messages.
* Have a callback per message type. In effect implement pattern matching instead
of using a case statement.
* Fix force in protocol_buffer_compiler. There's currently no way to force the
recompilation of the pb files.
* Fix the incredibly slow Payload creation time. It's about 500 times slower
than instantiating the protocol buffer itself. [ I think this is fixed -- it
was due to not using lazy log messages. Needs checking.]
* Make sure all time fields in pb files are integers.
* Check queue/exchange leaks. The ones I know of at the moment are:
* if the publish_and_receive method is used and the message is not replied to
then there is an exchange/queue leak.
* I think there is another case but I cannot think of it.
* Fix this bug. I think it's probably due to blah ||= blah. But that's just a guess.
2012/05/14 06:23:08.057818543 [10753] ERROR - Smith::AgentBootstrap:55 - ArgumentError: Unknown option: monitor
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:102:in `merge_options'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:95:in `block in options'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:95:in `each'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/agent.rb:95:in `options'
/var/lib/digivizer/collection/digivizer-collection/agents/twitter_agent.rb:9:in `<class:TwitterAgent>'
/var/lib/digivizer/collection/digivizer-collection/agents/twitter_agent.rb:7:in `<top (required)>'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:40:in `load'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:40:in `load_agent'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:146:in `block in <main>'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith.rb:156:in `call'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith.rb:156:in `block (2 levels) in start'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amqp-0.9.5.pre/lib/amqp/channel.rb:241:in `call'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amqp-0.9.5.pre/lib/amqp/channel.rb:241:in `block (2 levels) in initialize'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `call'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `block in exec_callback_once_yielding_self'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `each'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/callbacks.rb:63:in `exec_callback_once_yielding_self'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/channel.rb:400:in `handle_open_ok'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/channel.rb:425:in `block in <class:Channel>'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapter.rb:539:in `call'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapter.rb:539:in `receive_frameset'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapter.rb:519:in `receive_frame'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amq-client-0.9.2/lib/amq/client/async/adapters/event_machine.rb:327:in `receive_data'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/eventmachine.rb:179:in `run_machine'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/eventmachine.rb:179:in `run'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/amqp-0.9.5.pre/lib/amqp/connection.rb:38:in `start'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith.rb:98:in `start'
/usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/smith-0.5.8/lib/smith/bootstrap.rb:145:in `<main>'
* Not sure what this actually is. I assume it's because ret is a RuntimeError
but I've no idea how this might be so.
smith-0.5.10/lib/smith/application/agency.rb:22:in `block (4 levels) in setup_queues': undefined method `empty?' for #<RuntimeError: Group does not exist: ndb> (NoMethodError)
from /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/em/deferrable.rb:151:in `call'
from /usr/local/ruby-1.9.3-p194/lib/ruby/gems/1.9.1/gems/eventmachine-1.0.0.beta.4/lib/em/deferrable.rb:151:in `set_deferred_status'