forked from filterfish/smith2
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path.todo
316 lines (210 loc) · 12.4 KB
/
.todo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
~~~~~~~~ TODO ~~~~~~~~
* If an agent hasn't started after 20 seconds (ie there are no keepalives)
set the state to dead.
* Have an top like tool. Send the agent state to a queue so that anything
can attach to the queue and do with it what it will. If this is done on a
topic queue with fanout then a tool would be able to get the state of every
agent in the network. Obviously if nothing is listening on the queue then no
messages would be sent.
* Agents should be namespace-able. The agency and smithctl should understand
this. Agents should be controllable on a namespace basis, e.g. shutdown all
agents in a particular namespace. Consider using vhosts.
* Allow an agent to verify that it's the same version as that on disc. Maybe
implement an autoload system. Look at shogun(?)
* Think about whether to not send messages to agents if they aren't alive.
At the moment this is a little inconsistent: some commands check to see it the
agent is alive and others check to see if there is anything listening on the
queue.
* Have an option that shuts down all agents when the agency is closed. Maybe a
-e option to stop agency
* Have a config item where I can put a list of agents that I can then start
(or stop or kill for that matter) all at the same time. For example the command
might be:
smithctl group start <group name> smithctl group stop <group name>
See "Agents should be namespace-able"
* The list command should check to see if the agent is actually alive and
add that information to the output. [done - check]
* Have a flag for pop-queue that doesn't ack. [done]
* Don't send messages to queues that don't exist [send, smithctl messages,
pop-queue.
* list command should display the state of the agent. [partially done]
* Think about the idea of a run list. Or is this already implemented in
the state machine.
* Have a whitelist/blacklist mechanism listing agents that an agent can
receive messages from.
* Have an errback if publish_and_receive times out. [done - check]
* Add a message retry mechanism. [done]
* Name reply queues based on the agent name. This would only work with the
default queue.
* Delete queue should be part of smithctl.
* smithctl message should have some sort of printf formatting string
that can be used to construct the protocol buffer message.
* Change all log messages to use the block form instead of the argument passing.
* Have an option to not ack the pop-queue command. See Receiver#pop for a more
detailed description. [done]
* Have a SMITH_LOAD_PATH environment variable that overrides the path set up in
the config
* Have a path where encoder files can be loaded from. [done]
* Have something monitor the agents' state machine for signs of problems.
For example if an agent is in the starting state for longer than a minutes or
so
* Be able to change log level for individual loggers (ie per class). [done]
* Add to_s to the Encoders, so when I call inspect on the message it
gives a proper representation.
* Setup a dead letter queue in case of error. Have a look at the dead letter
stuff in rabbitmq.
* The number of threads should be configurable programatically
queues per process.
* Have message counters for every agent. [partially done]
* Set default config for all options that make sense (which is most of them).
* Add checking to the config. For example the logging class can only accept File
or Stdout appenders but this is not checked for.
* Think about a generator to write the directory structure and what that
directory structure might look like. Change Smith.root to the root of directory
structure. You will need to think about what to do with the agent_path (ie the
fact that multiple paths are allowed).
* I've added a method called add_agent_load_path to the bootstrapper. When
thinking about the above make sure you check whether that method fits.
* Think about how to make ./bin/send a general tool. Maybe write some rake
tasks.
* Make an equivalent of ripl-rails: ripl-smith
* Add command to smithctl to return the queue names an agent is using.
* Add the ability to clone an agent. See next point (It looks as if the agent
lifecycles ...).
* It looks as if the agent lifecycles are keyed on name (see
Agency#acknowledge_start & Agency#acknowledge_stop). This should be the pid
or maybe both but just using name I can't have more than one agent which
stops things like cloning.
* Add code to verify the protocol buffer payload type. It should go in
Sender#_publish. [done - needs checking]
* Message headers:
* sent time,
* message checksum.
* Add the message_id to all log messages.
* Add i18n. It will also make error messages more consistent.
* Make sure application specific queues don't have the smith namespace.
* Allow the agent to be configured so auto_ack is off. auto_ack is nice when
you are starting out but in high throughput applications it's unlikely to
be the right thing to do so the auto_ack => false in the queue declaration
is noisy.
* Add command line options to the agency:
* dump the config.
* daemonise the agency (and create the pid file.)
* specify the pid file.
* specify the environment
* specify a different config file.
* start the agency with a clean db
* dump a specific queues' acl.
In fact look at smith1 agency to see what options it had.
* Write a startup script.
* Have an option to freeze an agent. Not sure of a use case but it could be
quite useful. It might be quite complicated to implement though.
* Clean up the pb cache handling. Have something like:
Smith.set_acl_path
oar maybe add it to the Smith run, Smith is useless without it.
* Have a pb cache reload mechanism at the moment you need to restart the agency.
* Think about passing a class to the receiver. This might be a nice way of
structuring the message handlers.
* Add support for connecting to rabbitmq using ssl with certificates, see
http://hg.rabbitmq.com/rabbitmq-auth-mechanism-ssl/file/default/README
* Fix incorrect message handling (if an acl gets sent to an agent that can't
handle it). At the moment it doesn't catch the exception and the agent dies. It
should probably send it a dead letter queue. Either way at the moment the
message has to be manually removed from the queue.
It's actually a bit tricky to know what to do here: should I assume the agent
is correct and that the message is in error in which as the message should be
sent to the dead letter queue; or should I assume the message is correct and
the agent has a bug in which case the current behaviour is correct.
* Add warning that if there is an error of type:
Channel level exception: PRECONDITION_FAILED - unknown delivery tag 2. Class
id: 60, Method id: 80, Status code : 406
it probably means that you've already acked the message.
* Sort the output of the agents command into alphabetical order. [done]
* Clean up some of the logging messages, for example:
Messaging::Sender prints:
Publishing to: agency.control. [message]: [agency_command] -> {:command=>"list"}
where Receiver::Reply prints:
Payload content: [queue]: a1bf3328e2f3db87 [message]: [string] -> No agents running
this is inconsistent.
* Remove the commas in the agents command. [done]
* If nothing is listening on the reply queue to agency command then don't send
the message back and delete the queue. I think this is going to be quite hard
to do. I could provide an option to Reply#reply but that smells.
* Check all the commands for consistency.
* Split the functionality of smithctl. At the moment this simply sends any
commands to the agency but some commands are better served by executing the
command directly. For example removing a queue.
Removing a queue is quite problematic as I cannot get the queue attributes
from the broker (not via amqp anyway -- at least I cannot see a way of doing
it) but I can declare the queue with the :passive option which will use any
existing queues but will cause a channel exception if the queue doesn't exist
which makes life really awkward for the agency. If I do this in smithctl then
a channel exception can be handled properly without impacting the agency.
So in short have local and remote commands!
* Add next_ticks around things like requeue and general recovery.
* Put a lot more consideration into recovery. Fox example the agency will die if
there is a queue error.
* Add a class to define the queue name and the type of message they can listen
on. This can be used to define the queues in a centralised place which should
avoid configuration errors.
* Separate out the Payload and ACL classes. Or at least check that they should
be separated out.
Have an ACL factory that returns decorated ACL objects and have the payload
use the factory.
If I set the content in the constructor is that good enough? How about
instantiating nested objects?
* When an ACL is incomplete add code to check which field is missing. At the
moment an exception is being raised, the problem is that I've overloaded the
#inspect and it doesn't display what field is nil.
* Add message versioning.
* Add code to see if the reactor has started. If someone runs Smith.start it
appears to hang; at least put a warning.
* Check to see if the acl directory actually exists.
* change encoders to be acl.
* Add version command to smith, obviously with support from the agency. [done]
* Add command to list what agents are in each group. Add a --groups option to
the agents command
* Add a headers hash to all messages so there are no
AMQP::IncompatibleOptionsError due to the queue being defined with or without
the headers hash.
* Add an option to pop that returns immediately. If I remove a lot of messages
from a queue smithctl will timeout due to the time taken to delete the
messages. Add an option that returns the number of messages to be removed
without waiting for the messages to be removed.
* Add default on_requeue proc. Check to see if it makes sense to do so or throw
and exception if it doesn't.
* Add a message TTL so that messages will get timed out after a period of time.
This might be particularly useful for smithctl.
* Create a mixin that can be included in the agent to allow a key-value store.
This would be useful for storing transient data that needs to be persisted.
* Check for the existence of the various paths in the config and log a message
if it doesn't exist.
* Payload should be able to be instantiated from an undecoded message.
* Add better error reporting. When someone sets the message content that isn't a
payload object and exception should be raised.
* If the list command is passed an agent just give details about that agent -
maybe get rid of the stats command.
~~~~~~~~ BUGS ~~~~~~~~
* Add the internal (to smith) pb directory to the pb load path. [done]
* Smith::ACL::Payload.new(<payload type>, :from => payload) doesn't work for
:default messages.
* Have a callback per message type. In effect implement pattern matching instead
of using a case statement.
* Fix bug where I get an exception if the pb class has already been loaded:
/usr/local/ruby-1.9.3-p0/lib/ruby/gems/1.9.1/gems/ruby_protobuf-0.4.11/lib/protobuf/message/message.rb:43:in `define_field': Field tag 1 has already been used in Smith::ACL::AgencyCommand. (Protobuf::TagCollisionError)
from /usr/local/ruby-1.9.3-p0/lib/ruby/gems/1.9.1/gems/ruby_protobuf-0.4.11/lib/protobuf/message/message.rb:23:in `required'
from /var/cache/smith/pb/agency_command.pb.rb:18:in `<class:AgencyCommand>'
from /var/cache/smith/pb/agency_command.pb.rb:16:in `<module:ACL>'
from /var/cache/smith/pb/agency_command.pb.rb:15:in `<module:Smith>'
from /var/cache/smith/pb/agency_command.pb.rb:14:in `<top (required)>'
In practice this only happens when a pb is compiled.
* Fix force in protocol_buffer_compiler. There's no way to currently force the
recompilation the pb files.
* Fix the incredibly slow Payload creation time. It's about 500 times slower
than instantiating the protocol buffer itself.
* Make sure all time fields in pb files are integers.
* Check queue/exchange leaks. The ones I know of at the moment are:
* if the publish_and_receive method is used and the message is not replied too
then there is a exchange/queue leak.
* I think there is another case but I cannot think of it.
* Start should check to see if the agent exists before starting it.