-
Notifications
You must be signed in to change notification settings - Fork 39
fix: #39 - Improve constructor performance #45
Conversation
Updated the isCID() implementation to use a (private) intermediate method to validate CID without relying on the error being thrown during validation. The validateCID() method also uses this intermediate method, and its behavior has been preserved (e.g., with regards to throwing an error for invalid CIDs).
package.json
Outdated
@@ -1,6 +1,6 @@ | |||
{ | |||
"name": "cids", | |||
"version": "0.5.2", | |||
"version": "0.5.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@realharry version gets updated on publish. Please revert this change.
package.json
Outdated
@@ -43,6 +43,10 @@ | |||
"aegir": "^12.0.8", | |||
"chai": "^4.1.2", | |||
"dirty-chai": "^2.0.1", | |||
"karma-mocha": "^1.3.0", | |||
"karma-mocha-webworker": "^1.3.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets installed with aegir
. Did it not work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diasdavid It was working on my machine, but the CI builds were failing (especially, Jenkins) as shown in my checkin history. I added these to make the CI builds work. I'll revert it back.
package.json
Outdated
@@ -43,6 +43,10 @@ | |||
"aegir": "^12.0.8", | |||
"chai": "^4.1.2", | |||
"dirty-chai": "^2.0.1", | |||
"karma-mocha": "^1.3.0", | |||
"karma-mocha-webworker": "^1.3.0", | |||
"mocha": "^5.0.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
package.json
Outdated
"karma-mocha": "^1.3.0", | ||
"karma-mocha-webworker": "^1.3.0", | ||
"mocha": "^5.0.1", | ||
"multihashing": "^0.3.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only use multihashing-async
to use WebCrypto in the browser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was "experimenting", and likely I forgot to remove this package after (accidentally) adding it. I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I must have misunderstood your comment. I'll remove multihashing-async
, but I will still need multihashing
in dev-dependencies. I use multihashing
in my test case cidperf-a.js
.
src/index.js
Outdated
@@ -219,12 +219,11 @@ class CID { | |||
*/ | |||
static isCID (other) { | |||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this try
block still here? shouldn't this be just return !CID._checkCIDComponents(other)
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dignifiedquire If you look at the original implementation of validateCID()
, it relies on the behavior of mh.validate(other.multihash)
, which is supposed to throw an error for invalid multihash. That behavior is preserved in the updated validateCID()
. And, in order to be able to use/share a common implementation between isCID()
and validateCID()
, I had to preserve that behavior in _checkCIDComponents()
as well. Therefore isCID()
still needs to catch the error to return false/true in case an error is thrown (e.g., from mh.validate(other.multihash)
). Contractually (in terms of interface), isCID()
never throws an error since we now have an "outer" try/catch
containing its implementation (even through future updates). I hope my answer makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I forgot that. This is unfortunate, mh.validate
should at least have a counter part that doesn't throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. but, as I indicated in my comment on #39, the try/catch itself is not an issue. It's the cost of creating an error object in the heap that takes time. Depending on the use cases, if we rarely hit mh.validate
part in the "if chaining", the cost of having that is not much. The question is, out of 100 times or a million times calling isCID()
in a typical use case (e.g., are CIDs generally expected to be valid in most cases?, etc.), how many times we end up creating the error object? Having said that, I do agree with you. The code will be cleaner if we don't have to include try/catch
. Thanks.
I removed the karma/mocha dev-dependencies. Getting the same build error as before. @diasdavid Any suggestions? |
Sorry for the mixup. All CI builds pass at this point. Thanks. |
src/index.js
Outdated
} | ||
|
||
if (!Buffer.isBuffer(other.multihash)) { | ||
throw new Error('multihash must be a Buffer') | ||
return 'multihash must be a Buffer' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be thrown errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
src/index.js
Outdated
* @param {any} other | ||
* @returns {string} | ||
*/ | ||
static _checkCIDComponents (other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function can be extracted to a separate file given it is an utility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can create a file cid-util.js
or util.js
or util/cid-util.js
. Any preferences?
test/profiling/alt-impl/cid1.js
Outdated
|
||
CID1.codecs = codecs | ||
|
||
module.exports = CID1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call it CID_A and CID_B given that we already use 0 and 1 for version numbers? Pretty cool to include profiling tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this. Instead of duplicating the code. This could just be installing the CID version on npm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it this way will require creating a new npm project folder (with package.json) for testing/profiling (I think). This only works for the "reference" impl. We may still need a separate/duplicated CID for comparison. (See the comment below.)
test/profiling/alt-impl/cid2.js
Outdated
|
||
CID2.codecs = codecs | ||
|
||
module.exports = CID2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why duplicate instead of using the version in src?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diasdavid Thanks for pointing that out. I wasn't really happy with this solution, but I couldn't come up with a better one. The issue was: (1) I wanted to use, as a reference, a "snapshot" of the CID implementation as of, say, Sunday. The CID implementation will (likely) change over time, and using the "current" version (in src) will not make sense for this particular test (say, 2 months from now). (2) I wanted to try a different "alternative" implementations for comparison. And, putting alternative (trial) implementations in the prod source code (e.g., as an extra method, etc.) would not have made sense. (See the comment above.)
That's why I ended up duplicating the code for both. Obviously, there is something "funny" about doing it this way. I'm not sure what would be the best way though. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought about it a bit more, I think your suggestion, if I understand correctly, seems to make more sense. I now have to reverse reference and alternative (in my mind):
- Reference: CID version X (in NPM)
- Alternative: CID in src (with alternative/trial implementation)
(When I started out, the CID in src was the reference, and "my version" was the alternative.)
BTW, if we do this, then we cannot have just one test runner (which has an advantage of being simple). We'll need two separate js files/runners, I think. (Correct me if I'm wrong.) And, run two test cases separately to compare the results, which might be an overkill for this particular task. Please let me know what you would prefer, of if you have any alternative suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can npm install a locked version on the profiling test folder (therefore one version) and require the other from the src folder (and you get the other version to compare). Makes sense?
(1) Put checkCIDComponents() function in a separate file. (2) Added a test spec for the new module. (3) Simplify the perf test runner. (Now it tests one CID implementation at a time.)
Updated per feedback: |
@diasdavid Jenkins build fails on Mac. Is there anything I can do to fix this? |
*/ | ||
checkCIDComponents: function (other) { | ||
if (other == null) { | ||
return 'null values are not valid CIDs' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be a throw Error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"not anymore, since throwing errors makes everything slower.
} | ||
|
||
if (!(other.version === 0 || other.version === 1)) { | ||
return 'Invalid version, must be a number equal to 1 or 0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plus this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
if (typeof other.codec !== 'string') { | ||
return 'codec must be string' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plus this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/cid-util.js
Outdated
} | ||
|
||
if (!Buffer.isBuffer(other.multihash)) { | ||
throw new Error('multihash must be a Buffer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like this one :)
src/cid-util.js
Outdated
/** | ||
* Test if the given input is a valid CID object. | ||
* Returns an error message if it is not, or | ||
* Throws an error its multihash is invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's simplify it and just make it throw an error if it is doesn't have valid CID components, the error message will tell which component is invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diasdavid I'm a little bit confused.
The new implementation of validateCID()
already does that. And, this behavior has been preserved from the existing implementation. The purpose of checkCIDComponents()
was to use the same logic in isCID()
without relying on the error being thrown. The idea was that isCID()
was slow because of the error objects being created on the heap (which obviously we need to verify in the long run). If we change checkCIDComponents()
to throw errors for all these conditions, we are effectively going back to the old implementation where isCID()
directly uses validateCID()
. The purpose of checkCIDComponents()
was to avoid using error objects in CID validation in the constructor (via isCID()
) without having to duplicate the code.
Please advise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see.
If that is the case. Let's wrap the mh.validate on a tryCatch block inside the checkCIDComponents itself and only make that function return strings or undefined (should enable V8 to optimize it even more).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll check in the updated changes shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
REF: #39
Updated the isCID() implementation to use a (private) intermediate method to validate CID without relying on the error being thrown during validation.
The validateCID() method also uses this intermediate method, and its behavior has been preserved (e.g., with regards to throwing an error for invalid CIDs).