Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store sharding function used in the repo. #13

Merged
merged 17 commits into from
Jan 17, 2017
Merged

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Dec 22, 2016

Store the sharding function used in the file "SHARDING" in the repo.
To make this work the sharding function is now always specified as a
string.

@whyrusleeping
Copy link
Member

What are some example strings? We should make sure that these are somewhat nice multicodecs, cc @Kubuxu

Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, a few things i noticed

path: path,
getDir: getDir,
sync: sync,
func New(path string, fun0 string, sync bool) (*Datastore, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have a New that detects the type from disk, and a NewWithFunc that has this signature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but I will still need the "auto" test case as that is used by the conversion utility.

Copy link
Contributor Author

@kevina kevina Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do this yet. Do you basically want:

func New(path string, sync bool) (*Datastore, error) {
  NewWithFunc(path, "auto", sync)
}

func NewWithFunc(path string, fun0 string, sync bool) {...}

Is it that you don't like the special "auto" value? (I don't want to eliminate the use of that string as it makes writing the conversion function/utility easier, so using two separate functions won't really simplify any code.)

str := padding + noslash
return str[len(str)-suffixLen:]
fun := ""
if fun0 == "auto" && fun1 == "auto" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should make this a switch statement to be more idiomatic go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I don't particular see the difference, but don't object either. In the future I will try to remember that switch is generally preferred over multiple else ifs.

@Kubuxu
Copy link
Member

Kubuxu commented Jan 6, 2017

What is the content of the SHARDING file directly? It isn't clear from the code?

@kevina
Copy link
Contributor Author

kevina commented Jan 6, 2017

What is the content of the SHARDING file directly? It isn't clear from the code?

Example file:

v1/next-to-last/2

As discussed in ipfs/kubo#3463

@Kubuxu
Copy link
Member

Kubuxu commented Jan 6, 2017

Could you change it to previously proposed: /repo/flatfs/shard/v1/next-to-last/2, with leading slash. There are no costs in it but it might make someone's day some time in future.

@kevina
Copy link
Contributor Author

kevina commented Jan 6, 2017

@whyrusleeping agree with @Kubuxu? I have no problem doing it.

@whyrusleeping
Copy link
Member

Yeap, agreed with @Kubuxu

@kevina
Copy link
Contributor Author

kevina commented Jan 7, 2017

Could you change it to previously proposed: /repo/flatfs/shard/v1/next-to-last/2

Done. Also refactored the parsing of the shard identifier string to make it more obvious what it going on.

return nil, fmt.Errorf("shard function not specified")
case fun0 == "auto":
fun = fun1
case fun1 == "auto":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should separate the task of opening an existing datastore and creating a new one. That way we can remove a lot of the weirdness here

Copy link
Contributor Author

@kevina kevina Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not. The "weirdness" is also there to allow opening an existing repo without the SHARDING file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm.... We're going to have to run a migration anyways. I think thats acceptable.

Copy link
Contributor Author

@kevina kevina Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping, okay to do things the way you want I will basically have to redo this pull request and the conversion code will require some adjusting. Let me know. (I'm only asking because you indicated you might do it in a private conversation and don't want to duplicate work.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started work on this. Will have it ready some time Wed.

@kevina
Copy link
Contributor Author

kevina commented Jan 11, 2017

@whyrusleeping okay please have another look, there are two fixme, but I think this is close to what you want. There is a separate Create and Open method.

This fix also makes the flatfs a little more robust, espacally after the fixme to check if an existing directory without a SHARDING file is not empty.

Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Githubs review thing is really annoying me... I thought I submitted this a day ago...

func parseShardFunc(str string) shardId {
str = strings.TrimSpace(str)
parts := strings.Split(str, "/")
// ignore prefix for now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check to make sure that the string has the expected prefix

}
}

func parseShardFunc(str string) shardId {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should a pointer to the shardId and an error here

return shardId{funName: parts[0], param: parts[1]}
case 1:
return shardId{funName: parts[0]}
default: // can only happen for len == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case should be an error

}
switch len(parts) {
case 3:
return shardId{version: parts[0], funName: parts[1], param: parts[2]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of returning strings here, we should probably do the parsing from Func() in here. That way we don't end up creating an 'invalid' shardId object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will see what I can do. I originally had parseShardFunc return an error but decided to push all the error checking into Func(). Will try refactoring again and see if I can make it work better the way I think you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now.

}

func ReadShardFunc(dir string) (string, error) {
fun := "auto"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldnt ever return 'auto' from reading the func from disk. Either its a valid function, or its an error, try to validate input as much as possible as soon as possible.

if err != nil {
return err
}
if str == IPFS_DEF_SHARD {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets separate the step of writing the readme from writing the sharding function, and also make sure to error out or print a log message if we try to write a readme for an unsupported sharding type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping I don't think returning an error is a good idea. Especially if the flatfs datastore could be used outside of IPFS. I also don't like the idea of a warning for the same reason, but can add one if you still think it is a good idea.

I will separate the step from the writing of the sharding function.

@@ -214,40 +218,42 @@ func testStorage(p *params, t *testing.T) {
if !seen {
t.Error("did not see the data file")
}
if fs.ShardFunc() == flatfs.IPFS_DEF_SHARD && !haveREADME {
t.Error("expected _README file")
} else if fs.ShardFunc() == flatfs.IPFS_DEF_SHARD && !haveREADME {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these two cases the same?

Copy link
Contributor Author

@kevina kevina Jan 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Yes. Fixed.

@kevina
Copy link
Contributor Author

kevina commented Jan 11, 2017

@whyrusleeping I should have addressed all your concerns in the latest review except for returning an error if no README can be created because I am not sure that it was a good idea. See my comment.

@kevina
Copy link
Contributor Author

kevina commented Jan 11, 2017

Note the two special files are named:

  _README
  SHARDING

I use '_README' because that is what @jbenet wanted. For consistency I can also rename 'SHARDING' to '_SHARDING' if you want. Let me know. For the record, I think using the special name '_README' is a good idea to help make it clear there is something special about it.

@whyrusleeping whyrusleeping requested a review from Kubuxu January 13, 2017 17:51
getDir: getDir,
sync: sync,
var DatastoreExists = errors.New("Datastore already exist")
var DatastoreDoesNotExist = errors.New("Datastore directory does not exist")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use one var block and Err name prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

str = str[len(PREFIX):]
}
parts := strings.Split(str, "/")
if len(parts) == 3 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the v1 optional? it doesn't have to be. Let's make it simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree (with "it doesn't have to be") but @whyrusleeping would probably say the same thing so I will change it.

}

func ReadShardFunc(dir string) (*ShardIdV1, error) {
buf, err := ioutil.ReadFile(filepath.Join(dir, "SHARDING"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the name of the file constant as it is reused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a policy on this? In this particular case it seams like an overkill to me. We also don't seam to be very consistent about it elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to use constants for these sorts of things when we can, There are definitely cases where we forget to do so, but in general its a good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, Done. However, I think it makes code harder to read and I also am horrible at naming the things.

return nil, err
}

if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double err handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Will fix.

return nil, fmt.Errorf("invalid prefix in shard identifier: %s", str)
}
str = str[len(PREFIX):]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use TrimPrefix, and check if output string is different than the input, if it is then prefix was trimmed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

}

func Prefix(prefixLen int) ShardFunc {
padding := strings.Repeat("_", prefixLen)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It there a better way to do it? It seems clever but it might have performance penalty (string concat), and it is done for every key, potentially multiple time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to other ShardFuncs too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be yes. But this code has not changed, just moved. The same thing was also done before any of my shading changes. So for now (at least this p.r.) I would rather leave it alone.

[Sorry if this is a duplicate. GitHub is being annoying.]

@kevina
Copy link
Contributor Author

kevina commented Jan 13, 2017

@Kubuxu @whyrusleeping I should have addressed all your concerns. I also implemented the fixme.

The only outstanding issue is returning an error if no README can be created because I am not sure that it was a good idea. From my comment above:

Lets separate the step of writing the readme from writing the sharding function, and also make sure to error out or print a log message if we try to write a readme for an unsupported sharding type

@whyrusleeping I don't think returning an error is a good idea. Especially if the flatfs datastore could be used outside of IPFS. I also don't like the idea of a warning for the same reason, but can add one if you still think it is a good idea.


func WriteReadme(dir string, id *ShardIdV1) error {
if id.String() == IPFS_DEF_SHARD {
err := ioutil.WriteFile(filepath.Join(dir, README_FN), []byte(IPFS_DEF_SHARD), 0444)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be README_DEF_SHARD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Told you I was bad a naming things. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Misunderstood and didn't see my own error. Will fix.

if err != nil {
return err
}
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All above can be just return err.

if err != nil {
return err
}
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be just return err too (no need for if).


err := os.Mkdir(path, 0777)
if err != nil && !os.IsExist(err) {
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be return err?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@kevina
Copy link
Contributor Author

kevina commented Jan 13, 2017

@Kubuxu anything else :)

Store the sharding function used in the file "SHARDING" in the repo.
To make this work the sharding function is now always specified as a
string.
@kevina
Copy link
Contributor Author

kevina commented Jan 16, 2017

Note: Just rebased to clean up the commits, nothing else has changed.

@kevina
Copy link
Contributor Author

kevina commented Jan 16, 2017

Note: I just pushed some more commits to avoid the use of strings wherever possible.

Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, Thanks for all the review iteration @Kubuxu and @kevina :)

@whyrusleeping whyrusleeping merged commit 9553971 into master Jan 17, 2017
@whyrusleeping whyrusleeping deleted the kevina/SHARDING branch January 17, 2017 20:37
@whyrusleeping whyrusleeping removed the status/in-progress In progress label Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants