Skip to content
This repository has been archived by the owner on Sep 3, 2021. It is now read-only.

fix: #39 - Improve constructor performance #45

Merged
merged 7 commits into from
Mar 12, 2018

Conversation

realharry
Copy link
Contributor

@realharry realharry commented Mar 5, 2018

REF: #39

Updated the isCID() implementation to use a (private) intermediate method to validate CID without relying on the error being thrown during validation.

The validateCID() method also uses this intermediate method, and its behavior has been preserved (e.g., with regards to throwing an error for invalid CIDs).

Updated the isCID() implementation to use a (private)
intermediate method to validate CID
without relying on the error being thrown during validation.
The validateCID() method also uses this intermediate method,
and its behavior has been preserved (e.g., with regards to
throwing an error for invalid CIDs).
package.json Outdated
@@ -1,6 +1,6 @@
{
"name": "cids",
"version": "0.5.2",
"version": "0.5.3",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@realharry version gets updated on publish. Please revert this change.

package.json Outdated
@@ -43,6 +43,10 @@
"aegir": "^12.0.8",
"chai": "^4.1.2",
"dirty-chai": "^2.0.1",
"karma-mocha": "^1.3.0",
"karma-mocha-webworker": "^1.3.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets installed with aegir. Did it not work for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diasdavid It was working on my machine, but the CI builds were failing (especially, Jenkins) as shown in my checkin history. I added these to make the CI builds work. I'll revert it back.

package.json Outdated
@@ -43,6 +43,10 @@
"aegir": "^12.0.8",
"chai": "^4.1.2",
"dirty-chai": "^2.0.1",
"karma-mocha": "^1.3.0",
"karma-mocha-webworker": "^1.3.0",
"mocha": "^5.0.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

package.json Outdated
"karma-mocha": "^1.3.0",
"karma-mocha-webworker": "^1.3.0",
"mocha": "^5.0.1",
"multihashing": "^0.3.2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use multihashing-async to use WebCrypto in the browser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was "experimenting", and likely I forgot to remove this package after (accidentally) adding it. I'll remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I must have misunderstood your comment. I'll remove multihashing-async, but I will still need multihashing in dev-dependencies. I use multihashing in my test case cidperf-a.js.

src/index.js Outdated
@@ -219,12 +219,11 @@ class CID {
*/
static isCID (other) {
try {
Copy link
Member

@dignifiedquire dignifiedquire Mar 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this try block still here? shouldn't this be just return !CID._checkCIDComponents(other) now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dignifiedquire If you look at the original implementation of validateCID(), it relies on the behavior of mh.validate(other.multihash), which is supposed to throw an error for invalid multihash. That behavior is preserved in the updated validateCID(). And, in order to be able to use/share a common implementation between isCID() and validateCID(), I had to preserve that behavior in _checkCIDComponents() as well. Therefore isCID() still needs to catch the error to return false/true in case an error is thrown (e.g., from mh.validate(other.multihash)). Contractually (in terms of interface), isCID() never throws an error since we now have an "outer" try/catch containing its implementation (even through future updates). I hope my answer makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I forgot that. This is unfortunate, mh.validate should at least have a counter part that doesn't throw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. but, as I indicated in my comment on #39, the try/catch itself is not an issue. It's the cost of creating an error object in the heap that takes time. Depending on the use cases, if we rarely hit mh.validate part in the "if chaining", the cost of having that is not much. The question is, out of 100 times or a million times calling isCID() in a typical use case (e.g., are CIDs generally expected to be valid in most cases?, etc.), how many times we end up creating the error object? Having said that, I do agree with you. The code will be cleaner if we don't have to include try/catch. Thanks.

@realharry
Copy link
Contributor Author

I removed the karma/mocha dev-dependencies. Getting the same build error as before. @diasdavid Any suggestions?

@realharry
Copy link
Contributor Author

Sorry for the mixup. All CI builds pass at this point. Thanks.

src/index.js Outdated
}

if (!Buffer.isBuffer(other.multihash)) {
throw new Error('multihash must be a Buffer')
return 'multihash must be a Buffer'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be thrown errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

src/index.js Outdated
* @param {any} other
* @returns {string}
*/
static _checkCIDComponents (other) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be extracted to a separate file given it is an utility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can create a file cid-util.js or util.js or util/cid-util.js. Any preferences?


CID1.codecs = codecs

module.exports = CID1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it CID_A and CID_B given that we already use 0 and 1 for version numbers? Pretty cool to include profiling tests!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this. Instead of duplicating the code. This could just be installing the CID version on npm.

Copy link
Contributor Author

@realharry realharry Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it this way will require creating a new npm project folder (with package.json) for testing/profiling (I think). This only works for the "reference" impl. We may still need a separate/duplicated CID for comparison. (See the comment below.)


CID2.codecs = codecs

module.exports = CID2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why duplicate instead of using the version in src?

Copy link
Contributor Author

@realharry realharry Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diasdavid Thanks for pointing that out. I wasn't really happy with this solution, but I couldn't come up with a better one. The issue was: (1) I wanted to use, as a reference, a "snapshot" of the CID implementation as of, say, Sunday. The CID implementation will (likely) change over time, and using the "current" version (in src) will not make sense for this particular test (say, 2 months from now). (2) I wanted to try a different "alternative" implementations for comparison. And, putting alternative (trial) implementations in the prod source code (e.g., as an extra method, etc.) would not have made sense. (See the comment above.)

That's why I ended up duplicating the code for both. Obviously, there is something "funny" about doing it this way. I'm not sure what would be the best way though. Any suggestions?

Copy link
Contributor Author

@realharry realharry Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about it a bit more, I think your suggestion, if I understand correctly, seems to make more sense. I now have to reverse reference and alternative (in my mind):

  • Reference: CID version X (in NPM)
  • Alternative: CID in src (with alternative/trial implementation)

(When I started out, the CID in src was the reference, and "my version" was the alternative.)

BTW, if we do this, then we cannot have just one test runner (which has an advantage of being simple). We'll need two separate js files/runners, I think. (Correct me if I'm wrong.) And, run two test cases separately to compare the results, which might be an overkill for this particular task. Please let me know what you would prefer, of if you have any alternative suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can npm install a locked version on the profiling test folder (therefore one version) and require the other from the src folder (and you get the other version to compare). Makes sense?

(1) Put checkCIDComponents() function in a separate file.
(2) Added a test spec for the new module.
(3) Simplify the perf test runner. (Now it tests one CID implementation at a time.)
@realharry
Copy link
Contributor Author

realharry commented Mar 6, 2018

Updated per feedback:
(1) Created a new cid-util.js and its corresponding test spec.
(2) Simplified the test runner so that it now takes only one CID implementation. Will need to run the test multiple times for each alternative implementation to compare.

@realharry
Copy link
Contributor Author

@diasdavid Jenkins build fails on Mac. Is there anything I can do to fix this?

*/
checkCIDComponents: function (other) {
if (other == null) {
return 'null values are not valid CIDs'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be a throw Error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"not anymore, since throwing errors makes everything slower.

}

if (!(other.version === 0 || other.version === 1)) {
return 'Invalid version, must be a number equal to 1 or 0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plus this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

if (typeof other.codec !== 'string') {
return 'codec must be string'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plus this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/cid-util.js Outdated
}

if (!Buffer.isBuffer(other.multihash)) {
throw new Error('multihash must be a Buffer')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like this one :)

src/cid-util.js Outdated
/**
* Test if the given input is a valid CID object.
* Returns an error message if it is not, or
* Throws an error its multihash is invalid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify it and just make it throw an error if it is doesn't have valid CID components, the error message will tell which component is invalid.

Copy link
Contributor Author

@realharry realharry Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diasdavid I'm a little bit confused.

The new implementation of validateCID() already does that. And, this behavior has been preserved from the existing implementation. The purpose of checkCIDComponents() was to use the same logic in isCID() without relying on the error being thrown. The idea was that isCID() was slow because of the error objects being created on the heap (which obviously we need to verify in the long run). If we change checkCIDComponents() to throw errors for all these conditions, we are effectively going back to the old implementation where isCID() directly uses validateCID(). The purpose of checkCIDComponents() was to avoid using error objects in CID validation in the constructor (via isCID()) without having to duplicate the code.

Please advise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see.

If that is the case. Let's wrap the mh.validate on a tryCatch block inside the checkCIDComponents itself and only make that function return strings or undefined (should enable V8 to optimize it even more).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll check in the updated changes shortly.

@daviddias daviddias requested a review from vmx March 7, 2018 08:07
Copy link
Member

@daviddias daviddias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@daviddias daviddias merged commit dc0bfd3 into multiformats:master Mar 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants