Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: avoid sql -> c-dep dependency edges #30001

Closed
dt opened this issue Sep 10, 2018 · 5 comments
Closed

*: avoid sql -> c-dep dependency edges #30001

dt opened this issue Sep 10, 2018 · 5 comments
Assignees
Labels
A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@dt
Copy link
Member

dt commented Sep 10, 2018

Some of cockroach's code is likely useful outside of just the cockroach binary. Most obvious is some of our general purpose util packages, but additionally anyone building tools that interact with Cockroach or its data such as api responses, on-disk data, backups, etc may want to reuse our defined protobufs and structs as well as the helpers and other functions we've defined around them. Such external tools, both from first or third parties, are common in healthy, thriving OSS ecosystems, and we ideally should, where practical, make it as easy as possible to use our reusable helpers both in and outside the cockroach repo.

However a point of friction currently is that Cockroach uses make to build, including to run just-in-time code generation and to handle building c dependencies for linking by cgo. Non-make driven builds of the majority of the codebase simply do not work. Solving the generation problem has been previously addressed by the creation of the cockroach-gen mirror of the repo, maintained with pre-generated code by CI.

However cgo remains a problem, rendering any package with transitive dependencies on cgo unusable outside the cockroach binary. This is unlikely to change given the platform-specific nature of our c builds and how dependent they are on make. However we can minimize the number of our Go packages that become impossible to consume externally by avoiding depending, including transitively, on cgo.

Current examples:
mvcc encoding in storage/engine is used in sql and elsewhere, but since storage/engine also handles interfacing with rocksdb, pulling in pure-Go mvcc logic pulls in cgo.
distsqlrun uses interfaces that are defined in engine and even creates stores that are backed by rocksdb, thus depending on in rocksdb directly.

This will likely require care in splitting some of the low-level packages, defining interfaces and splitting cgo-backed implementations out into separate packages. In the of the temp-store example, it would also require some form of dependency injection as it is actually calling directly into a cgo-backed implementation.

Additionally all of this is bound to rot if not enforced by machines, as large dependency graphs can easily hide edges from human inspection, so we'll want to make a list of packages that a lint inspects, to ensure their transitive dependency closures remain free of cgo imports.

dt added a commit to dt/cockroach that referenced this issue Sep 10, 2018
As discussed in cockroachdb#30001, many packages contain code that ideally is usable outside of the cockroach binary and build process.
However, making that code buildable outside out usual build process requires ensuring that it does not depend on our make-built c-deps.

This adds a (skipped) linter that inspects the transtive deps of some key packages (protos and sql so far).

Release note: none.
craig bot pushed a commit that referenced this issue Sep 10, 2018
30019: roachtest: for dryrun, skip initBinaries r=ridwanmsharif a=ridwanmsharif

For a dryRun since nothing is actually being run, it is not necessary to
have any of the binaries at all. I ran into this when just trying to
list the roachtests but couldn't because of this issue. Can do so now,
 without any problems. cc @andreimatei  for helping solve the issue.
If this was a desired behaviour do let me know

Release note (roachtest): changes behaviour of `roachtest run -n` to not
check or initialise binaries as they are not required for a dry run

30029: *: linter to keep c-deps out of some packages r=dt a=dt

As discussed in #30001, many packages contain code that ideally is usable outside of the cockroach binary and build process.
However, making that code buildable outside out usual build process requires ensuring that it does not depend on our make-built c-deps.

This adds a (skipped) linter that inspects the transtive deps of some key packages (protos and sql so far).

Release note: none.

30030: build: revert to outputing the stress output to the build log r=andreimatei a=andreimatei

My recent change to teamcity-stress.sh piped the stress output to
test2json cause it sounded like a good idea, but I've then realized that
in the process we've lost that output from the build log (relegating it
only to artifacts/stress.log).
I like the build log, so this patch puts the output back there.

Release note: None

30032: distsql: reduce memory usage for subqueries exec r=jordanlewis a=arjunravinarayan

Subqueries in distsql are executed, and their results stored in a
RowContainer. The rows are then serialized into expressions and sent
to the outer query execution. Previously, the expressions were
constructed by scanning over the RowContainer, so twice as much memory
was used. Now, we construct the expressions by popping rows
one-by-one, so that we use less memory when there are many rows.

Release note: None

Co-authored-by: Ridwan Sharif <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Arjun Narayan <[email protected]>
dt added a commit to dt/cockroach that referenced this issue Sep 11, 2018
More work on cockroachdb#30001.

This splits some pure-go MVCC encoding/decodeing logic out of
stroage/engine into a new package storage/mvcc so it no longer
shares a package any of the rocksdb interfacing code that lives in
storage/engine. This is so that callers can import and use these functions
without introducing our c-deps code to their dependency closure, which
is an issue if callers might end up being used outside of the cockroach build
environment or share packages with code that is.

This is just a mechanical code move with a following rename to avoid
repeating the package name in func/var prefixes, eg. MVCCKey ->
mvcc.Key, and only includes MVCCKey as well as those functions that were
in-use within sqlbase. Significant MVCC encoding/decoding code reamins that can
be moved later as needed.

Release note: none.
dt added a commit to dt/cockroach that referenced this issue Sep 11, 2018
Working on cockroachdb#30001.

This moves a couple key decoding helpers from engine, where they share a package with lots of c-deps, to enginepb.
This leaves the old function signatures in place, but calling through to their moved implementations.

This allows sqlbase to call these helpers without pulling in engine and its cdeps dependencies.

Release note: none.
dt added a commit to dt/cockroach that referenced this issue Sep 12, 2018
Working on cockroachdb#30001.

This moves a couple key decoding helpers from engine, where they share a package with lots of c-deps, to enginepb.
This leaves the old function signatures in place, but calling through to their moved implementations.

This allows sqlbase to call these helpers without pulling in engine and its cdeps dependencies.

Release note: none.
dt added a commit to dt/cockroach that referenced this issue Sep 12, 2018
Working on cockroachdb#30001.

This moves a couple key decoding helpers from engine, where they share a package with lots of c-deps, to enginepb.
This leaves the old function signatures in place, but calling through to their moved implementations.

This allows sqlbase to call these helpers without pulling in engine and its cdeps dependencies.

Release note: none.
dt added a commit to dt/cockroach that referenced this issue Sep 12, 2018
Working on cockroachdb#30001.

This moves a couple key decoding helpers from engine, where they share a package with lots of c-deps, to enginepb.
This leaves the old function signatures in place, but calling through to their moved implementations.

This allows sqlbase to call these helpers without pulling in engine and its cdeps dependencies.

Release note: none.
craig bot pushed a commit that referenced this issue Sep 12, 2018
30082:  engine: move a couple decoding helpers to enginepb r=nvanbenschoten a=dt

Working on #30001.

This moves a couple key decoding helpers from engine, where they share a package with lots of c-deps, to enginepb.
This leaves the old function signatures in place, but calling through to their moved implementations.

This allows sqlbase to call these helpers without pulling in engine and its cdeps dependencies.

Release note: none.

Smaller alternative to #30043.

Co-authored-by: David Taylor <[email protected]>
@knz knz added A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Sep 17, 2018
@dt dt removed their assignment Sep 17, 2018
@maddyblue
Copy link
Contributor

maddyblue commented Sep 18, 2018

I would also love to get the parser package to not depend on util/log or build. Just putting this on the wishlist. I may work on it at some point. This would make, for example, sqlfmt buildable without a buncha weird stuff that prevents easy building on windows and wasm.

@lopezator
Copy link
Contributor

lopezator commented Sep 18, 2018

+1 I just made a project to be able to use sqlfmt outside cockroach binary. To use in dev time and CI time. The fight with CGO was huge.

https://github.com/lopezator/sqlfmt

Works out of the box simply with a "go get" and has cross-platform precompiled binaries.

craig bot pushed a commit that referenced this issue Oct 12, 2018
31292: *: move RowFetcher and co to a new package, row r=jordanlewis a=jordanlewis

This was purely mechanical renaming. The purpose is to lessen the width
of the sqlbase package, which is imported in many places.

The new package now depends on sqlbase, but far fewer packages consume
it than sqlbase.

Major changes:

RowFetcher -> row.Fetcher
RowUpdater -> row.Updater
RowDeleter -> row.Deleter

FK and cascade stuff -> row package

The only unfortunate thing here is that the FK enums all begin with
`row` now (e.g. `row.CheckDeletes`). It would be better if those got a
different package name, like `fk`, but I'll leave that refactor for
another time.

Somewhat related to #30001.

cc @dt, @benesch 

Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
@jordanlewis
Copy link
Member

jordanlewis commented Oct 12, 2018

  • sql -> jobs -> storage (for Liveness). Solvable by making a storagepb package within storage.
  • sql -> serverpb -> status (for the Status proto). Solvable by making a statuspb package within status.
  • sql -> status (for some protos, and MetricsRecorder). Solvable by statuspb for the protos, and a new exported interface for MetricsRecorder that lives in different package.
  • sql -> storage (NodeLiveness, Liveness, StoreTestingKnobs, MergeQueueEnabled). Liveness solved by storagepb. NodeLiveness needs a new interface in a new package for a single method, IsLive. StoreTestingKnobs can move to a new package - it's a fully exported struct. MergeQueueEnabled can move to a new package - it's exported and just a field.

I think this is the full list. Very manageable.

@dt
Copy link
Member Author

dt commented Oct 12, 2018

fwiw, making storage c-deps free is also pretty manageable: moving MVCCKey out of engine breaks a most of the storage -> engine edges.

@jordanlewis
Copy link
Member

Fair enough, but I feel like sql->storage is kind of wonky too.

craig bot pushed a commit that referenced this issue Oct 12, 2018
31317: storage: move TestingKnobs to storagebase r=jordanlewis a=jordanlewis

... to permit users of TestingKnobs not to have to import all of
storage.

Also, delete an unused TestingKnob that accepted Replica as a parameter.

Updates #30001.

Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
craig bot pushed a commit that referenced this issue Oct 13, 2018
31322: storage: move protos to storagepb r=jordanlewis a=jordanlewis

This is a mechanical change that moves the protobufs that used to live
in storage to a new package, storagepb. The purpose of this change is to
permit packages (such as sql) that need just protobuf definitions in
storage to not have to depend on the whole storage package.

First 2 commits are #31309.

Updates #30001.

Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
craig bot pushed a commit that referenced this issue Oct 13, 2018
31312: storage: move MergeQueueEnabled to storagebase r=jordanlewis a=jordanlewis

... to avoid dependency edge from sql to storage.

Updates #30001.

Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
craig bot pushed a commit that referenced this issue Oct 15, 2018
31325: sql: assert no dependencies on storage or CGo r=jordanlewis a=jordanlewis

No longer necessary for sql to depend on either storage or CGo - let's keep it that way!

All commits but last one from dependent PRs.

Closes #30001.

31337: distsqlrun: fix tablereader microbenchmark r=jordanlewis a=jordanlewis

It wasn't doing any work besides the initial fetch by accident. Oops!
Numbers look much more realistic now.

```
name                      old time/op    new time/op    delta
TableReader/rows=16-8       68.1µs ± 2%    77.7µs ± 1%    +14.20%
(p=0.000 n=10+10)
TableReader/rows=256-8       139µs ± 3%     243µs ±11%    +75.45%
(p=0.000 n=10+10)
TableReader/rows=4096-8      745µs ± 1%    2172µs ± 1%   +191.62%
(p=0.000 n=10+9)
TableReader/rows=65536-8    9.94ms ± 2%   33.45ms ± 1%   +236.60%
(p=0.000 n=10+10)

name                      old speed      new speed      delta
TableReader/rows=16-8     3.76MB/s ± 2%  3.29MB/s ± 1%    -12.44%
(p=0.000 n=10+10)
TableReader/rows=256-8    29.5MB/s ± 3%  16.9MB/s ±10%    -42.89%
(p=0.000 n=10+10)
TableReader/rows=4096-8   88.0MB/s ± 1%  30.2MB/s ± 1%    -65.71%
(p=0.000 n=10+9)
TableReader/rows=65536-8   106MB/s ± 2%    31MB/s ± 1%    -70.29%
(p=0.000 n=10+10)

name                      old alloc/op   new alloc/op   delta
TableReader/rows=16-8       9.60kB ± 2%   10.06kB ± 2%     +4.78%
(p=0.000 n=10+10)
TableReader/rows=256-8      9.50kB ± 1%   17.93kB ± 2%    +88.67%
(p=0.000 n=9+9)
TableReader/rows=4096-8     9.58kB ± 2%  142.82kB ± 3%  +1390.41%
(p=0.000 n=8+10)
TableReader/rows=65536-8    20.8kB ±93%  2090.0kB ± 1%  +9924.14%
(p=0.000 n=10+9)

name                      old allocs/op  new allocs/op  delta
TableReader/rows=16-8         78.6 ± 1%      79.7 ± 1%     +1.40%
(p=0.001 n=10+10)
TableReader/rows=256-8        79.0 ± 0%      80.0 ± 0%     +1.27%
(p=0.000 n=9+8)
TableReader/rows=4096-8       79.0 ± 0%      83.7 ±12%     +5.95%
(p=0.000 n=8+10)
TableReader/rows=65536-8       100 ±45%       453 ± 3%   +354.81%
(p=0.000 n=10+9)
```

Release note: None

31362: ui: Have the range debug page correctly handle missing lease times r=BramGruneir a=BramGruneir

Before this change, the value was always assumed to be not-null and it was null
would crash. It will now correctly handled the missing value and display a
`no timestamp` warning.

Fixes #31260.

Release note (bug fix): The range debug page will now correctly handle cases in
which there is no lease start or expiration time.

![screen shot 2018-10-15 at 10 35 07](https://user-images.githubusercontent.com/1614265/46958094-08a46300-d067-11e8-92cb-182882f8dd19.png)


Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Bram Gruneir <[email protected]>
@craig craig bot closed this as completed in #31325 Oct 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

6 participants