Skip to content
This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

worktree, status and reset implementation based on merkletrie #339

Merged
merged 9 commits into from
Apr 12, 2017
Merged

worktree, status and reset implementation based on merkletrie #339

merged 9 commits into from
Apr 12, 2017

Conversation

mcuadros
Copy link
Contributor

@mcuadros mcuadros commented Apr 11, 2017

This is a implementation of the status and the reset implementation based on merkletrie.

Things to be agreed:

  • Two new implementations of merkletrie.Noder have been implemented, those implementations are inside of utils/merkletrie/<use-case>, and the original one is at plumbing/objects/, worth to put all the implementations in the same place?

  • Due to the lack of error in the noder.Hash interface method, the implementation contains panic, We should add an error to this method?

NOTE: hard reset is not implemented, I am finishing this and some additional Worktree.Reset tests tomorrow

Required by #256

@mcuadros mcuadros requested review from alcortesm and smola April 11, 2017 02:47
@alcortesm
Copy link
Contributor

Two new implementations of merkletrie.Noder has been implemented, those implementations are ?
inside of utils/merkletrie/, and the original one is at plumbing/objects/, worth to put all the > implementations in the same place?

I don't think putting all the implementations in the same place is worth it.

Due to the lack of error in the noder.Hash interface method, the implementation contains panic, We should add an error to this method?

I don't think so.

@smola and I talked about it a few months ago when I was working on merkletrie. We agreed that the hash method should not return errors, as valid objects should have valid hashes, while objects, with a hash that cannot be calculated, should return errors upon construction.

Let us know if you would rather do it any other way.

Copy link
Collaborator

@smola smola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR looks good.

With respect to your questions:

  • Moving newTreeNoder/NewTreeRootNode to other package will probably result in a circular dependency.
  • For the Hash thing, let's look first if we should compute the Hash in Hash() at all. If it can error, maybe hashes should be calculated on TreeNoder creation.

@@ -21,10 +21,11 @@ type treeNoder struct {
name string // empty string for the root node
mode filemode.FileMode
hash plumbing.Hash
children []noder.Noder // memoized
children []noder.Noder // memorized
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memoized was actually the right term ;-)

}

func newTreeNoder(t *Tree) *treeNoder {
// NewTreeRootNode returns the root node of a Tree
func NewTreeRootNode(t *Tree) *treeNoder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be just NewTreeNode? The Root part seems redundant?

@@ -74,7 +75,7 @@ func (t *treeNoder) Children() ([]noder.Noder, error) {
return noder.NoChildren, nil
}

// children are memoized for efficiency
// children are memorized for efficiency
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/memorized/memoized/

}

func (n *Node) Hash() []byte {
if n.IsDir() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't Hash() be cached? I think DiffTree does a lot of repeated calls to Hash. So either implementations cache Hash, or DiffTree caches them internally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, but I could see any performance improvement.

options.go Outdated
// Branch to be checked out, if empty uses `master`
Branch plumbing.ReferenceName
Hash plumbing.Hash
// RemoteName is the name of the remote to be pushed to.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is godoc for RemoteName and the variable is Force, copypasta?

options.go Outdated

// CheckoutOptions describes how a checkout operation should be performed.
type CheckoutOptions struct {
// Branch to be checked out, if empty uses `master`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End the sentence with a full stop.

options.go Outdated
// Branch to be checked out, if empty uses `master`
Branch plumbing.ReferenceName
Hash plumbing.Hash
// RemoteName is the name of the remote to be pushed to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment does not match any code.

options.go Outdated
type CheckoutOptions struct {
// Branch to be checked out, if empty uses `master`
Branch plumbing.ReferenceName
Hash plumbing.Hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation.

options.go Outdated
Branch plumbing.ReferenceName
Hash plumbing.Hash
// RemoteName is the name of the remote to be pushed to.
Force bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation.

}

// Validate validates the fields and sets the default values.
func (o *CheckoutOptions) Validate() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this method return an error?

On top of that, maybe validate is not the best name for this method, when validating you return true or false according to some checks, while this method overwrites some attributes (and only some of them, leaving other invalid values untouched) probably because there is not constructor or setters to this properly. Maybe a better name will be FillInDefaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already discussed on previous PRs, and is not related to this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it was in #178.

I'll open an issue, as we agreed back then.

".git": true,
}

func IsEquals(a, b noder.Hasher) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be documented extensively, including what it will be the algorithm for comparing hashes in directories.


func (n *Node) NumChildren() (int, error) {
files, err := n.readDir()
return len(files), err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this, be careful with submodules and empty dirs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, since now the children are pre-computed, return just the len of the children slice.

The submodule are not tested, and yes for sure are failing since, the hash of a submodule is not computed in the same way. I am implementing this.

@@ -0,0 +1,128 @@
package filesystem
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be moved to the billy package, but definately it should not be under merkletrie utilities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't make any sense at billy, is a very specific things to git, maybe outside of the merkletrie package.

@@ -0,0 +1,113 @@
package index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably no go under the merkletrie package, but on the index.

Please document extensively how are you planning to map an index to a merkletrie, the challenges, surprises and your solutions. Otherwise, I can only guess your intentions and it is very time consuming and error prone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't place this in index, you will end with circular dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably place it in an internal package under the index package.

func IsEquals(a, b noder.Hasher) bool {
pathA := a.(noder.Path)
pathB := b.(noder.Path)
if pathA[len(pathA)-1].IsDir() || pathB[len(pathB)-1].IsDir() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this, reviewing this is too time consuming due to lack of documentation.

Can you can explain the general guidelines of how are you planning to map a filesystem into a merkletrie at the very begining?, also the main challenges, surprises and your solutions.

@alcortesm
Copy link
Contributor

alcortesm commented Apr 11, 2017

I will continue reviewing this later, it is too long. I advise to break this kind of PR into smaller contributions that can be reviewed easily and faster.

@mcuadros
Copy link
Contributor Author

@alcortesm with this goal I created nice commits, to allow you review commit by commit if you wish.

@codecov
Copy link

codecov bot commented Apr 12, 2017

Codecov Report

Merging #339 into master will increase coverage by <.01%.
The diff coverage is 65.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #339      +/-   ##
==========================================
+ Coverage    77.3%   77.31%   +<.01%     
==========================================
  Files         117      122       +5     
  Lines        8062     8268     +206     
==========================================
+ Hits         6232     6392     +160     
- Misses       1164     1184      +20     
- Partials      666      692      +26
Impacted Files Coverage Δ
plumbing/format/index/index.go 0% <0%> (ø)
plumbing/object/difftree.go 77.77% <100%> (ø) ⬆️
submodule.go 65% <100%> (ø) ⬆️
plumbing/object/treenoder.go 80.35% <100%> (ø) ⬆️
plumbing/object/tree.go 77.88% <100%> (+3.51%) ⬆️
status.go 50% <50%> (ø)
options.go 78.57% <54.54%> (-8.53%) ⬇️
utils/merkletrie/filesystem/node.go 63.33% <63.33%> (ø)
worktree.go 66.51% <66.42%> (+9.82%) ⬆️
worktree_status.go 67.16% <67.16%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b45f46...5bcf802. Read the comment docs.

@mcuadros mcuadros merged commit 932ced9 into src-d:master Apr 12, 2017
Copy link
Contributor

@alcortesm alcortesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the hash calculation is universal to all the Noder implementations in go-git, otherwise you can compare it. That's why I believe that all the implementations should be in the same place.

@smola

I see. Then we are in a very different page. The whole point of the Hasher interface, defining the IsEqual function and passing it to Difftree was to not have the same hash implementation for all the types. We will need to refactor all that part so it is consistent with your comment above.

@@ -178,6 +178,72 @@ type SubmoduleUpdateOptions struct {
RecurseSubmodules SubmoduleRescursivity
}

// CheckoutOptions describes how a checkout 31operation should be performed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/31//

@@ -178,6 +178,72 @@ type SubmoduleUpdateOptions struct {
RecurseSubmodules SubmoduleRescursivity
}

// CheckoutOptions describes how a checkout 31operation should be performed.
type CheckoutOptions struct {
// Hash to be checked out, if used HEAD will in detached mode. Branch and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing a verb:
if used HEAD will be in detached mode.

// Hash to be checked out, if used HEAD will in detached mode. Branch and
// Hash are mutual exclusive.
Hash plumbing.Hash
// Branch to be checked out, if Branch and Hash are empty is set to `master`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing subject:
if Branch and Hash are empty it is set to master.

// Branch to be checked out, if Branch and Hash are empty is set to `master`.
Branch plumbing.ReferenceName
// Force, if true when switching branches, proceed even if the index or the
// working tree differs from HEAD. This is used to throw away local changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End the sentence with a full stop.

}

// Validate validates the fields and sets the default values.
func (o *CheckoutOptions) Validate() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it was in #178.

I'll open an issue, as we agreed back then.

}, nil
}

func (n *node) calculateHash(path string, file billy.FileInfo) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 71, the method calculateChildren sets the children of the node and only returns an error.

Here, the method calculateHash doesn't set the hash but return it instead.

I think we should use the same strategy in both methods, for coherency, or just make this a function instead of a method.


func Test(t *testing.T) { TestingT(t) }

type NoderSuite struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to call it NodeSuite instead.


func IsEquals(a, b noder.Hasher) bool {
if bytes.Equal(a.Hash(), empty) || bytes.Equal(b.Hash(), empty) {
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very weird, as it is defeating the whole purpose of using the merkletrie package and implementing Noder.

Maybe it is ok for the tests, but in that case I would mention it in a comment.


var _ = Suite(&NoderSuite{})

func (s *NoderSuite) TestDiff(c *C) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this way of testing node will make debugging harder than if we do unit testing of its public methods instead: it would be easier to detect errors with the unit tests than having to trace back difftree's results mismatches in the tests to see what errors in node caused them.

I would write unit tests for node and maybe left a cople of these as integrations tests.

@@ -0,0 +1,113 @@
package index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably place it in an internal package under the index package.

@alcortesm
Copy link
Contributor

I am exhausted, I will continue reviewing this the next day.

@alcortesm
Copy link
Contributor

alcortesm commented Apr 12, 2017

I think you merged before my review, you should consider reopening for my comments today.

I will probably add a few more in the next days.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants