allow specifying regexp to split
brentp committed Jun 29, 2016
1 parent 734bb8a commit 2572de0
Showing 2 changed files with 78 additions and 9 deletions.
57 changes: 53 additions & 4 deletions README.md
@@ -1,3 +1,11 @@
<!--
rm -rf binaries
mkdir -p binaries/
VERSION=0.1.0
for os in darwin linux windows; do
GOOS=$os GOARCH=$arch go build -o binaries/gargs_${os} main.go
done
-->
gargs
=====

@@ -8,11 +16,11 @@ gargs is like xargs but it addresses the following limitations in xargs:
+ it keeps the output serialized even when using multiple threads
+ easy to specify multiple arguments

As an example, this will keep the output in order and send 3 arguments to each process.
As an example that currently works, this will keep the output in order and send 3 arguments to each process.
It is using 4 processes to parallelize.

```
$ seq 12 -1 1 | go run main.go -p 4 -n 3 "sleep {}; echo {} {}"
$ seq 12 -1 1 | gargs -p 4 -n 3 "sleep {}; echo {} {}"
11 10
8 7
5 4
@@ -24,9 +32,50 @@ Note that for each line, we slept 12, 9, 6, 3 seconds respectively but the outpu

For now, the -n 3 is redundant with seeing the "{}"'s. In the future, it may be possible to use numbered arguments:

Example
=======
Let's say we have a file `t.txt` like:
```
# not currently possible
sleep {0}; echo {1} {2}
chr1 22 33
chr2 22 33
chr3 22 33
chr4 22 33
```
That has a mixture of tabs and spaces. We can convert to chrom:start-end format with:

```
cat t.txt | gargs --sep "\s+" -p 2 "echo '{}:{}-{}'"
```
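
A minimal sketch of what `--sep` does to each line, assuming the tokens fill the `{}` placeholders from left to right. The names `fillTemplate` and `whitespace` are hypothetical and are not taken from gargs, which may substitute the tokens differently:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// whitespace matches any run of tabs and/or spaces, like --sep "\s+".
var whitespace = regexp.MustCompile(`\s+`)

// fillTemplate is a hypothetical helper: it splits line on sep and writes the
// tokens, in order, into the "{}" placeholders of tmpl. Placeholders with no
// matching token are left empty; extra tokens are ignored.
func fillTemplate(tmpl, line string, sep *regexp.Regexp) string {
	toks := sep.Split(strings.TrimRight(line, "\r\n"), -1)
	parts := strings.Split(tmpl, "{}")
	var b strings.Builder
	for i, p := range parts {
		b.WriteString(p)
		if i < len(parts)-1 && i < len(toks) {
			b.WriteString(toks[i])
		}
	}
	return b.String()
}

func main() {
	// Mixed tabs and spaces, as in t.txt.
	for _, line := range []string{"chr1 22 33", "chr2\t22  33"} {
		fmt.Println(fillTemplate("{}:{}-{}", line, whitespace))
	}
}
```

Splitting the template on `{}` rather than replacing repeatedly avoids re-expanding a token that itself contains `{}`.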

In this case, we're using **2** processes to run this in parallel which will make more of a difference
if we do something time-consuming rather than `echo`. The output will be kept in the order dictated by
`t.txt` even if the processes finish in a different order. This is sometimes at the expense of parallelization
efficiency.
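
One way to get ordered output from parallel workers is to queue one result channel per job in input order and always read from the oldest one. The sketch below is an illustration under assumptions: it uses goroutines instead of shell processes, and `runOrdered` is a hypothetical name, not the gargs implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// runOrdered applies work to each input with at most procs jobs in flight
// and returns the results in input order, whatever order the jobs finish in.
func runOrdered(inputs []string, procs int, work func(string) string) []string {
	if procs < 1 {
		procs = 1
	}
	// pending carries one result channel per job, in input order. With a
	// buffer of procs-1, plus the job the reader is waiting on, at most
	// procs jobs have been started and not yet collected.
	pending := make(chan chan string, procs-1)
	go func() {
		for _, in := range inputs {
			res := make(chan string, 1)
			pending <- res // blocks while the oldest job is still running
			go func(in string) { res <- work(in) }(in)
		}
		close(pending)
	}()
	out := make([]string, 0, len(inputs))
	for res := range pending {
		out = append(out, <-res) // wait for the oldest outstanding job
	}
	return out
}

func main() {
	// Later inputs sleep less and finish first, yet output follows input order.
	out := runOrdered([]string{"30", "20", "10"}, 2, func(s string) string {
		ms, _ := strconv.Atoi(s)
		time.Sleep(time.Duration(ms) * time.Millisecond)
		return "slept " + s + "ms"
	})
	fmt.Println(strings.Join(out, "\n"))
}
```

Because new jobs are only dispatched once the oldest one has been collected, a slow early job can leave workers idle, which is the efficiency cost mentioned above.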


Usage
=====

```
usage: gargs [--procs PROCS] [--nlines NLINES] [--sep SEP] [--shell SHELL] [--verbose] COMMAND

positional arguments:
  command                command to execute

options:
  --procs PROCS, -p PROCS
                         number of processes to use [default: 1]
  --nlines NLINES, -n NLINES
                         number of lines to consume for each command. -s and -n are mutually exclusive. [default: 1]
  --sep SEP, -s SEP      regular expression split line with to fill multiple template spots default is not to split. -s and -n are mutually exclusive.
  --shell SHELL          shell to use [default: bash]
  --verbose, -v          print commands to stderr before they are executed.
  --help, -h             display this help and exit
```
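
The rule that `-s` and `-n` are mutually exclusive can be checked right after parsing. The sketch below uses the standard library `flag` package purely for illustration. It is not the argument parser used by gargs, and `options` and `parseOptions` are hypothetical names.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options mirrors a subset of the flags listed in the usage text.
type options struct {
	procs  int
	nlines int
	sep    string
}

// parseOptions parses argv and rejects the combination of a separator
// with more than one line per command.
func parseOptions(argv []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("gargs-sketch", flag.ContinueOnError)
	fs.IntVar(&o.procs, "p", 1, "number of processes to use")
	fs.IntVar(&o.nlines, "n", 1, "number of lines to consume for each command")
	fs.StringVar(&o.sep, "s", "", "regular expression to split each line with")
	if err := fs.Parse(argv); err != nil {
		return o, err
	}
	if o.sep != "" && o.nlines > 1 {
		return o, errors.New("must specify either sep (-s) or n-lines (-n), not both")
	}
	return o, nil
}

func main() {
	if _, err := parseOptions(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
}
```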

TODO
====

+ --unordered flag to specify that we don't care about the output order. Will improve parallelization for some cases.
+ {0}, {1}, {2} place-holders?
+ combinations of `-n` and `--sep`.
30 changes: 25 additions & 5 deletions main.go
@@ -14,11 +14,13 @@ import (
 	"github.com/brentp/xopen"
 )

+const VERSION = "0.1.0"
+
 type Args struct {
 	Procs   int    `arg:"-p,help:number of processes to use"`
-	Nlines  int    `arg:"-n,help:number of lines to consume for each command"`
+	Nlines  int    `arg:"-n,help:number of lines to consume for each command. -s and -n are mutually exclusive."`
 	Command string `arg:"positional,required,help:command to execute"`
-	Sep     string `arg:"-s,help:split line(s) with this to fill multiple template spots default is not to split NOT IMPLEMENTED."`
+	Sep     string `arg:"-s,help:regular expression split line with to fill multiple template spots default is not to split. -s and -n are mutually exclusive."`
 	Shell   string `arg:"help:shell to use"`
 	Verbose bool   `arg:"-v,help:print commands to stderr before they are executed."`
 }
@@ -31,7 +33,10 @@ func main() {
 	args.Sep = ""
 	args.Shell = "bash"
 	args.Verbose = false
-	arg.MustParse(&args)
+	p := arg.MustParse(&args)
+	if args.Sep != "" && args.Nlines > 1 {
+		p.Fail("must specify either sep (-s) or n-lines (-n), not both")
+	}
 	if !xopen.IsStdin() {
 		fmt.Fprintln(os.Stderr, "ERROR: expecting input on STDIN")
 		os.Exit(255)
@@ -48,6 +53,11 @@ func check(e error) {

 func genLines(n int, sep string) chan []interface{} {
 	ch := make(chan []interface{})
+	var resep *regexp.Regexp
+	if sep != "" {
+		resep = regexp.MustCompile(sep)
+	}
+
 	go func() {
 		rdr, err := xopen.Ropen("-")
 		check(err)
@@ -57,8 +67,18 @@ func genLines(n int, sep string) chan []interface{} {
 		for {
 			line, err := rdr.ReadString('\n')
 			if err == nil || (err == io.EOF && len(line) > 0) {
-				lines[k] = re.ReplaceAllString(line, "")
-				k += 1
+				line = re.ReplaceAllString(line, "")
+				if resep != nil {
+					toks := resep.Split(line, -1)
+					itoks := make([]interface{}, len(toks))
+					for i, t := range toks {
+						itoks[i] = t
+					}
+					ch <- itoks
+				} else {
+					lines[k] = line
+					k += 1
+				}
 			} else {
 				if err == io.EOF {
 					break