-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STORM-200 Proposal for Multilang's Metrics feature #38
Conversation
@@ -60,6 +60,9 @@ public Object getValueAndReset() { | |||
m.put("unusedBytes", memUsage.getCommitted() - memUsage.getUsed()); | |||
return m; | |||
} | |||
public void updateMetricFromRPC(List<Object> params) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using tabs instead of spaces here. For consistency it would be nice to switch to spaces for all of the indentation.
The code also seems to have missed ShellSpout.java. |
@revans2 , thank you very much for your review, I will redesign the code according your advice |
@revans2 , I have redesign the code and the usage according your advice. please review again. |
import backtype.storm.metric.api.IMetric; | ||
|
||
public interface IShellMetric extends IMetric { | ||
public static final String SHELL_METRICS_UPDATE_METHOD_NAME = "updateMetricFromRPC"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor whitespace issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced all tab to whitespace, thx
Looks good. All of the reflection code I would like to see replaced. Also I'd like to hear back from you about my suggestion on the single method call. Like I said I am fine with how you did it, but I would like to hear your opinion on it. |
I have changed the code as follows:
As 3), when I modify the TopologyContext.java,I found a bug in registerMetrics( https://issues.apache.org/jira/browse/STORM-254 ), So I commented getRegisteredMetricByName when implemention. again, please review. |
@@ -26,6 +26,8 @@ | |||
import backtype.storm.metric.api.ICombiner; | |||
import backtype.storm.metric.api.ReducedMetric; | |||
import backtype.storm.metric.api.CombinedMetric; | |||
import backtype.storm.metric.api.rpc.IShellMetric; | |||
import backtype.storm.spout.ShellSpout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not needed any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has removed
The code looks fine to me once you remove the unneeded imports. But I would like to see a test updated to at least exercise some of these data paths, but preferably some unit tests too. I should have mentioned it before, sorry. You are doing great with the changes keep up the good work! |
I have removed the unneeded imports in TopologyContext.java. |
Great to hear that they are working well in production. Yes currently all of the tests are written with test/clj, even the ones that test pure java. This is something that Nathan wanted. If you want to write them in junit I have no objection to it, but we probably need to have that discussion again on the mailing lists before you spend too much time on writing them one way. |
@revans2 , I have merged your update, all test cases always run success. thank you very much. And I submit all changed to github, please review pull request again. Thanks for your carefully review. |
+1 Thanks again for your efforts on this. |
@revans2 , this pull request have anything else need I do? or just waiting for you merge into master? |
if (nameObj != null && nameObj instanceof String) { | ||
metricName = (String) nameObj; | ||
} | ||
shellMsg.setMetricName(metricName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not throw a ClassCastException here like with several other keys above (i.e., "command"
). We would only see a NullPointerException in ShellSpout or ShellBolt when we check that the String is not empty.
This is really minor.
I just had a couple of minor comments. I'm +1 even without further changes. I think we will want to reformat the clojure to match the style guidelines, but this can be done when we merge this in. |
@dashengju Sorry I didn't respond to your comment sooner. It takes 2 +1s from committers to have the code merged in. d2r is also a committer so when he is ready we can merge it in. Thanks for your patience. |
+1 looks okay to me. |
Metrics test hangs for me (tried three times). OSX/1.7.0_60 Still looking...
|
There is an issue with the shutdown of the local cluster when using simulated time. It looks as though we need to keep advancing the cluster time until we fully shut down the cluster, or else it will sleep forever. I will continue looking into this a when I get more time. |
@d2r , should I solve "the issue with the shutdown of the local cluster when using simulated time" at this pull request? |
@dashengju, I looked again at the issue last night and saw the same hang when trying to shut down the workers. When the cluster shuts down after the test, it has the supervisor kill all workers. But when it waits for worker termination there is a hang since simulated cluster time is no longer advancing. Right now I do not think this is happening as a result of your changes. If I commented out the new tests, I was able to run all tests successfully. However, I need to see if this is happening with other environments. Are you able to run all of the tests successfully? |
@d2r , yes, I can run all of the tests successfully with "mvn clean install"。I ran the tests 5 times with 4 times successfully and 1 time failure(not hang, 2 error asserts about other test). And I changed one of timeout: --- a/storm-core/src/clj/backtype/storm/testing.clj -(def TEST-TIMEOUT-MS 5000) |
Another info: I have run just metrics_test.clj 10 times with successfully. |
OK, @revans2 confirms this hangs for him as well when merged into master. It hangs for me in a Redhat environment and on OSX. I will try a fix today. |
OK, can you try with this patch? diff --git a/storm-core/src/clj/backtype/storm/testing.clj b/storm-core/src/clj/backtype/storm/testing.clj
index c3eb76f..70c783a 100644
--- a/storm-core/src/clj/backtype/storm/testing.clj
+++ b/storm-core/src/clj/backtype/storm/testing.clj
@@ -234,7 +234,10 @@
(log-error t# "Error in cluster")
(throw t#))
(finally
- (kill-local-storm-cluster ~cluster-sym)))))
+ (let [keep-waiting?# (atom true)]
+ (future (while @keep-waiting?# (simulate-wait ~cluster-sym)))
+ (kill-local-storm-cluster ~cluster-sym)
+ (reset! keep-waiting?# false))))))
(defmacro with-simulated-time-local-cluster
[& args] The above seems to fix the issue. It could prevent spurious hangs we see with other tests as well.
This change is not needed. If it does not progress within 5 seconds, then it probably will not progress after another 25 seconds either. (It is unrelated to the hang.) |
@d2r , I have merged your patch. ========== below is the stdout error info ============================================================== ========== below is jstack of main thread ============================================================== |
new information: I am not familiar with with-simulated-time-local-cluster. compare with test-builtin-metrics-1, I found that test-builtin-metrics-2 and test-builtin-metrics-3 have follow code: |
@dashengju, thanks for trying it out.
The hang that was happening on your test should be fixed with the patch. This time-out error is a different issue that existed before your changes, and it does not appear to be related to your changes specifically. I think we should file a separate JIRA for this issue and proceed with this pull request. If you agree, would you merge the patch and update your branch so that this pull request is updated? |
@d2r , I agree with you to file a separate JIRA for time-out error. I have merged your patch and updated this pull request. Thanks for your review and your patch. Please review again. |
I am +1 with the new changes. |
I merged this into master, but I could only find a name of "DashengJu" on both github and JIRA. if you would like me to update the README with a different name to give you proper credit for your work please post a comment here and I'll be happy to update it. |
Make Config.ThriftPurpose enum static.
BUG-41859: Apply to 2.2-maint
for https://issues.apache.org/jira/browse/STORM-200
Storm 0.9.2 exposes a metrics interface to report summary statistics across the full topology. We can build our own metric, and build metrics consumer to use those statistics.
But when we use Multilang(ie. Python), we can not use this feature. So we want to summit a proposal for multilang's metrics.
The specifics of the proposal:
a) first, create a metric object and register in ShellSpout/ShellBolt's sub-class,
b) then update the metric in Python spout/bolt process through RPC call.
In package backtype.storm.metric.api add new package backtype.storm.metric.api.rpc, which includes:
a) IShellMetric interface, which extends IMetirc interface, to support:
public void updateMetricFromRPC(Object params);
b) IShellMetric implemention: AssignableShellMetric, CombinedShellMetric, CountShellMetirc, ReducedShellMetric;
a) add a command "metrics" for shell process to make RPC call. The protocol is:
{"command":"metrics", "name":"metric-registerd-name", "params": param-value}
b) moidfy ShellSpout/ShellBolt to support command metrics: we define a method handleMetrics to handle RPC call from Python spout/bolt process;
c) modify TopologyContext.java, add getRegisteredMetricByName to get registerd metric to update;
c) storm.py add method rpcMetrics(name, params), user can update remote metric through this RPC call.
--------usage example----------------------------------------------------------------------------------------------------------------
--------ExamplePythonBolt.java----------------------------------------------------------------------------------------------------
--------ExampleBolt.py---------------------------------------------------------------------------------------------------
--------ExamplePythonSpout.java---------------------------------------------------------------------------------------------------
--------ExampleSpout.py---------------------------------------------------------------------------------------------------