* Noteworthy changes in release ?.? (????-??-??) [?]
** Changes in Behavior
datamash(1), decorate(1): Add short options -h and -V for --help and --version
respectively.
datamash(1): the rand operation now uses getrandom(2) for generating a random
seed, instead of relying on date/time/pid mixing.
** New Features
rand(1): new program - simulates random variables from popular probability
distributions, analogous to functions like runif, rexp and rnorm in the R
statistical programming language.
datamash(1): add operation dotprod for calculating the scalar product of two
columns.
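The scalar product that dotprod computes is the sum of the pairwise products
of the two columns. A minimal Python sketch of that arithmetic, assuming both
columns hold the same number of values (how datamash reports a mismatch is
not covered here; the function name is only illustrative):

```python
def dotprod(xs: list[float], ys: list[float]) -> float:
    """Scalar (dot) product of two equally long columns."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same number of values")
    # sum of the pairwise products x_i * y_i
    return sum(x * y for x, y in zip(xs, ys))


print(dotprod([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```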
datamash(1): Add option -S/--seed to set a specific seed for pseudo-random
number generation.
datamash(1): Add option --vnlog to enable experimental support for the vnlog
format. More about vnlog is at https://github.com/dkogan/vnlog.
datamash(1): -g/groupby now accepts ranges of columns (e.g. 1-4).
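A column range such as 1-4 stands for the individual columns 1, 2, 3 and 4.
The hypothetical `expand_columns` function below sketches that expansion for
purely numeric lists; it is not datamash's own parser and ignores field
names (which may themselves contain escaped dashes):

```python
def expand_columns(spec: str) -> list[int]:
    """Expand a numeric column list such as "1,3-5" into [1, 3, 4, 5]."""
    columns: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            # a range "a-b" contributes every column from a to b inclusive
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid column range: {part}")
            columns.extend(range(start, end + 1))
        else:
            columns.append(int(part))
    return columns


print(expand_columns("1-4"))  # [1, 2, 3, 4]
```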
** Bug Fixes
datamash(1) now correctly calculates the "antimode" for a sequence
of numbers. Problem reported by Kingsley G. Morse Jr. in
<https://lists.gnu.org/archive/html/bug-datamash/2023-12/msg00003.html>.
When using the locale's decimal separator as field separator, numeric
datamash(1) operations now work correctly. Problem reported by Jérémie
Roquet in
<https://lists.gnu.org/archive/html/bug-datamash/2018-09/msg00000.html>
and by Jeroen Hoek in
<https://lists.gnu.org/archive/html/bug-datamash/2023-11/msg00000.html>.
datamash(1): The "getnum" operation now stays inside the specified field.
* Noteworthy changes in release 1.8 (2022-07-23) [stable]
** Changes in Behavior
Using -f/--full with non-linewise operations is now scheduled for
deprecation. In a future release, -f/--full will only be usable with
operations where it makes sense. For now, we print a warning to stderr
when -f/--full is used with non-linewise operations; such usage should be
considered unsupported.
The bin operation now uses more intuitive bins. Previously, a command
such as `datamash bin 1 <<< -0` would output -100, and -100 did not fall
in its own bin. We now require all bins to take the form `[nx,(n+1)x)`
with integer n and bin width x. We discard the sign on -0 and place such
inputs into the `[0,x)` bin.
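With bin width x, a value v belongs to bin n = floor(v / x), labeled by its
lower edge n*x. A minimal Python sketch of that rule, assuming the width of
100 seen in the `-0` example as the default; `bin_value` is a hypothetical
name, not datamash code:

```python
import math


def bin_value(value: float, width: float = 100) -> float:
    """Return the lower edge n*width of the bin [n*width, (n+1)*width) holding value."""
    if value == 0:
        value = 0.0  # discard the sign of -0 so it lands in [0, width)
    n = math.floor(value / width)  # integer n with n*width <= value < (n+1)*width
    return n * width
```

Here `bin_value(-1)` and `bin_value(-100)` both return -100, while
`bin_value(-0.0)` returns 0.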
Operations taking more than one argument now provide more complete output
with --header-out. Previously, an operation such as `pcov x:y` would
produce an output header like `pcov(y)`, discarding the `x`. The new
behavior will output header `pcov(x,y)`.
datamash(1) no longer ignores --output-delimiter with the rmdup operation.
** New Features
New datamash option --sort-cmd specifies the program used by the -s
option to sort input. The security and portability of building sort
command lines have also been improved.
New datamash option -c/--collapse-delimiter=X uses character X instead
of a comma to separate values in collapse and unique lists.
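collapse lists every value of a group, while unique lists each distinct
value once in sorted order. A minimal Python sketch of how the delimiter is
applied; the plain code-point sort used for unique is an assumption, since
datamash may order strings differently:

```python
def collapse(values: list[str], delimiter: str = ",") -> str:
    # every value, in input order
    return delimiter.join(values)


def unique(values: list[str], delimiter: str = ",") -> str:
    # duplicates removed, then sorted (plain string order is an assumption)
    return delimiter.join(sorted(set(values)))


print(collapse(["b", "a", "b"], ";"))  # b;a;b
print(unique(["b", "a", "b"], ";"))    # a;b
```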
New datamash operations: mean square (ms) and root mean square (rms).
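The mean square is the average of the squared values, and the root mean
square is its square root. A minimal Python sketch of the two formulas
(function names chosen to match the operation names, not datamash code):

```python
import math


def ms(values: list[float]) -> float:
    """Mean square: the arithmetic mean of the squared values."""
    if not values:
        raise ValueError("need at least one value")
    return sum(v * v for v in values) / len(values)


def rms(values: list[float]) -> float:
    """Root mean square: the square root of the mean square."""
    return math.sqrt(ms(values))


print(ms([3, 4]))   # (9 + 16) / 2 = 12.5
print(rms([3, 4]))  # sqrt(12.5), about 3.5355
```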
Decorate now supports sorting IP addresses of both versions 4 and 6
together. IPv4 addresses are logically converted to IPv6 addresses,
either as IPv4-Mapped (ipv6v4map) or IPv4-Compatible (ipv6v4comp)
addresses.
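An IPv4-Mapped address places the 32-bit IPv4 value after a 0xffff marker
(`::ffff:a.b.c.d`), whereas an IPv4-Compatible address places it after zero
bits only (`::a.b.c.d`); both forms are defined in RFC 4291, section 2.5.5.
A minimal Python sketch of that conversion using the standard `ipaddress`
module; `to_ipv6` is a hypothetical helper, not decorate's code:

```python
import ipaddress


def to_ipv6(address: str, mapped: bool = True) -> ipaddress.IPv6Address:
    """Convert an IPv4 or IPv6 address string to an IPv6Address for sorting."""
    ip = ipaddress.ip_address(address)
    if isinstance(ip, ipaddress.IPv6Address):
        return ip
    if mapped:
        # IPv4-Mapped: ::ffff:a.b.c.d (RFC 4291, section 2.5.5.2)
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    # IPv4-Compatible (deprecated): ::a.b.c.d (RFC 4291, section 2.5.5.1)
    return ipaddress.IPv6Address(int(ip))


addresses = ["2001:db8::1", "10.0.0.1", "::1"]
print(sorted(addresses, key=to_ipv6))  # ['::1', '10.0.0.1', '2001:db8::1']
```

Passing `key=lambda a: to_ipv6(a, mapped=False)` sorts with the
IPv4-Compatible form instead.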
Add two command aliases:
'echo' may now be used instead of 'cut'.
'uniq' may now be used instead of 'unique'.
** Improvements
Updated the bash completion script to reflect recent additions.
** Bug Fixes
Datamash now passes the -z/--zero-terminated flag to the sort(1) child
process when used with "--sort --zero-terminated". Additionally,
if the system's sort(1) does not support -z, datamash reports the error
and exits. Previously it would omit the "-z" when running sort(1),
resulting in incorrect results.
Documentation fixes and spelling corrections.
Fix an incorrect format in a decorate(1) error message that broke
compilation on some systems.
datamash(1), decorate(1): Fix some minor memory leaks.
datamash(1) no longer crashes when the unique or countunique operations
are used with input data containing NUL bytes. The problem was reported
in https://lists.gnu.org/archive/html/bug-datamash/2020-11/msg00001.html
by Catalin Patulea.
datamash(1) no longer crashes when crosstab with --header-in is called
by field name instead of index. I.e. `datamash --header-in ct x,y` now
works as expected.
* Noteworthy changes in release 1.7 (2020-04-23) [testing]
** New Features
decorate(1): new program - sorts input in non-standard ordering, e.g.
IPv4, IPv6, roman numerals.
New operations: sha224/sha384.
New operations: geomean (Geometric mean) and harmmean (Harmonic mean).
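The geometric mean is the n-th root of the product of n values, and the
harmonic mean is n divided by the sum of their reciprocals. A minimal Python
sketch of both formulas; rejecting non-positive input is a choice of this
sketch, not necessarily datamash's behavior:

```python
import math


def geomean(values: list[float]) -> float:
    """Geometric mean: the n-th root of the product, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean sketch needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmmean(values: list[float]) -> float:
    """Harmonic mean: n divided by the sum of reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean sketch needs positive values")
    return len(values) / sum(1 / v for v in values)


print(geomean([2, 8]))   # sqrt(2 * 8) = 4, up to rounding
print(harmmean([1, 4]))  # 2 / (1 + 1/4) = 1.6
```

Summing logarithms instead of multiplying directly avoids overflow when many
large values are combined.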
* Noteworthy changes in release 1.6 (2020-02-24) [stable]
** Bug Fixes
The 'getnum' operation (introduced in version 1.5) now correctly
prints detected numbers without truncating them.
* Noteworthy changes in release 1.5 (2019-09-17) [stable]
** New Features
Datamash now accepts backslash-escaped characters in field names.
This allows working with named fields containing dashes/minus signs,
colons or commas, or field names starting with digits (note the interplay
between backslash and shell quoting). The following are equivalent,
and sum an input field named 'FOO-BAR':
datamash -H sum FOO\\-BAR < input.txt
datamash -H sum 'FOO\-BAR' < input.txt
datamash -H sum "FOO\\-BAR" < input.txt
New operations: dirname, basename
These behave just like dirname(1) and basename(1):
$ echo /home/foo/bar.txt | datamash dirname 1 basename 1
/home/foo bar.txt
New operations: extname, barename
'extname' extracts the extension of the file name.
'barename' (not to be confused with 'basename') extracts the basename
without the extension.
Example:
$ echo /home/foo/bar.tar.gz | datamash barename 1 extname 1
bar tar.gz
New operation: getnum
This operation extracts a number from a string.
'getnum' accepts an optional single letter option:
getnum:n - natural numbers (positive integers, including zero)
getnum:i - integers
getnum:d - decimal point numbers
getnum:p - positive decimal point numbers (this is the default)
getnum:h - hex numbers
getnum:o - octal numbers
Examples:
$ echo foo-42.0-bar | datamash getnum 1
42.0
$ echo foo-42.0-bar | datamash getnum:n 1
42
$ echo foo-42.0-bar | datamash getnum:i 1
-42
$ echo foo-42.0-bar | datamash getnum:d 1
-42.0
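The four getnum modes shown above differ only in whether a sign and a
fractional part are accepted. A regex approximation in Python that
reproduces those four examples; it returns the matched text rather than a
parsed number, omits the h and o modes, and is not datamash's
implementation (`PATTERNS` and `getnum` are illustrative names):

```python
import re
from typing import Optional

# Regex approximations of four getnum modes (hypothetical, not datamash's code).
# The h (hex) and o (octal) modes are left out of this sketch.
PATTERNS = {
    "n": r"\d+",                 # natural numbers: digits only, no sign
    "i": r"[-+]?\d+",            # integers: optional sign
    "d": r"[-+]?\d+(?:\.\d+)?",  # decimal numbers: optional sign and fraction
    "p": r"\d+(?:\.\d+)?",       # positive decimal numbers (the default)
}


def getnum(text: str, mode: str = "p") -> Optional[str]:
    """Return the first substring of text that looks like a number of the given kind."""
    match = re.search(PATTERNS[mode], text)
    return match.group(0) if match else None


for mode in ("p", "n", "i", "d"):
    print(mode, getnum("foo-42.0-bar", mode))
```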
New operation: cut
Similar to cut(1), it copies the input field to the output as-is.
The advantage over cut(1) is that combined with datamash's other features,
input fields can be specified by name instead of column number, and
output fields can be re-ordered and duplicated.
Example:
$ printf "a b c\n1 X 6\n" | datamash -W -H cut c,a,c
cut(c) cut(a) cut(c)
6 1 6
** Bug fixes
Datamash now correctly calculates mode/antimode for negative values.
In version 1.4 and earlier, the following produced incorrect results:
$ echo -1 | datamash-1.4 mode 1
1.844674407371e+19
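The mode is the most frequent value and the antimode the least frequent
one. A minimal Python sketch that keeps values as signed numbers, so the
mode of `-1` is `-1`; breaking ties in favor of the smallest value is an
assumption of this sketch, not documented datamash behavior:

```python
from collections import Counter


def mode(values: list[float]) -> float:
    """Most frequent value; ties go to the smallest value (assumed rule)."""
    counts = Counter(values)
    return min(counts, key=lambda v: (-counts[v], v))


def antimode(values: list[float]) -> float:
    """Least frequent value; ties go to the smallest value (assumed rule)."""
    counts = Counter(values)
    return min(counts, key=lambda v: (counts[v], v))


print(mode([-1]))  # -1
```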
* Noteworthy changes in release 1.4 (2018-12-22) [stable]
** New Features
New option: -C/--skip-comments to skip comment lines (lines starting
with '#' or ';', optionally preceded by whitespace).
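The comment test described above can be sketched in Python as follows,
assuming "optional whitespace" means whitespace before the '#' or ';'; the
helper name is hypothetical:

```python
def is_comment(line: str) -> bool:
    """True if line starts with '#' or ';' after optional leading whitespace."""
    return line.lstrip().startswith(("#", ";"))


lines = ["# header comment", "   ; indented comment", "1 2", "3 # not a comment"]
print([line for line in lines if not is_comment(line)])  # ['1 2', '3 # not a comment']
```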
* Noteworthy changes in release 1.3 (2018-03-16) [stable]
** New Features
New option: --format=FMT sets printf style floating-point format.
Example:
$ echo '50.5' | datamash --format "%07.3f" sum 1
050.500
$ echo '50.5' | datamash --format "%07.3e" sum 1
5.050e+01
New option: -R/--round=N rounds numeric values to N decimal places.
New option: --output-delimiter=X overrides -t/-W.
New operation: trimmean (trimmed mean value).
To calculate 20% trimmed mean:
$ printf "%s\n" 13 3 7 33 3 9 | datamash trimmean:0.2 1
8
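R's mean(x, trim = ...) sorts the data, drops floor(n * trim) values from
each end and returns the median when trim is 0.5. This Python sketch
assumes trimmean follows the same rule, which reproduces the example above:
sorted 3 3 7 9 13 33 loses one value at each end, leaving 3 7 9 13 with
mean 8.

```python
import math
import statistics


def trimmean(values: list[float], fraction: float) -> float:
    """Mean after dropping floor(n * fraction) values from each end of the sorted data."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= fraction <= 0.5:
        raise ValueError("fraction must be between 0 and 0.5")
    data = sorted(values)
    if fraction == 0.5:
        return statistics.median(data)  # R returns the median at trim = 0.5
    k = math.floor(len(data) * fraction)  # how many values to drop at each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


print(trimmean([13, 3, 7, 33, 3, 9], 0.2))  # 8.0
```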
** Bug fixes
Datamash now builds correctly with external OpenSSL libraries
(./configure --with-openssl=yes). The 'configure' script now reports
whether internal or external libraries are used:
$ ./configure [OPTIONS]
[...]
Configuration summary for datamash
md5/sha*: internal (gnulib)
OR
md5/sha*: external (-lcrypto)
* Noteworthy changes in release 1.2 (2017-08-22) [stable]
** New Features
New operations:
perc (percentile),
range (max-min of values in group/column)
Improved 'check' operation:
The expected number of lines/fields can be specified as parameters.
** Improvements
Improved bash-completion script installation path (see README for details).
* Noteworthy changes in release 1.1.1 (2017-01-19) [stable]
** Bug fixes
'check' command correctly counts a trailing delimiter at end of lines.
'transpose' command correctly handles missing fields on the last line.
* Noteworthy changes in release 1.1.0 (2016-01-16) [stable]
** New Features
Bumped version to 1.1.0 to better comply with semver.
New operations:
crosstab (cross-tabulation / pivot tables),
check (verify tabular structure),
bin (bin numeric values),
strbin (bin string values),
Pearson correlation,
covariance,
rounding functions: round, floor, ceil, trunc, frac
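Covariance averages the products of deviations from each column's mean,
and Pearson correlation divides it by both standard deviations. A minimal
Python sketch of the textbook formulas; the names pcov and scov mirror the
population (divide by n) and sample (divide by n - 1) operation names, while
`pearson` is a generic name chosen for this sketch:

```python
import math


def pcov(xs: list[float], ys: list[float]) -> float:
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty columns of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def scov(xs: list[float], ys: list[float]) -> float:
    """Sample covariance: same sum of products, divided by n - 1 instead of n."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample covariance needs at least two values")
    return pcov(xs, ys) * n / (n - 1)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    # the 1/n factors cancel, so population and sample versions give the same r
    return pcov(xs, ys) / math.sqrt(pcov(xs, xs) * pcov(ys, ys))


xs, ys = [1, 2, 3], [2, 4, 6]
print(pcov(xs, ys), scov(xs, ys), pearson(xs, ys))
```

For these columns the three results are approximately 1.333, 2.0 and 1.0;
a column with zero variance makes the correlation undefined and raises
ZeroDivisionError here.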
** Improvements
Speed, Portability, Tests, Coverage improvements.
* Noteworthy changes in release 1.0.7 (2015-06-29) [stable]
** New Features
New operations: md5, sha1/256/512, base64, rmdup.
New option --narm to ignore NaN/NA values.
New feature: ability to specify fields by name instead of number
(requires --header-in or -H).
New translations added.
** Improvements
Speed, Portability, Coverage improvements.
* Noteworthy changes in release 1.0.6 (2014-07-29) [stable]
** New Features
New operations: transpose, reverse.
** Improvements
Tests: improve portability, add I/O error tests, add a few edge-case tests.
Build: improve man-page generation, cross-compiling, auxiliary build scripts.
Documentation: expand and fix man-page (and shorten --help screen).
* Noteworthy changes in release 1.0.5 (2014-07-15) [stable]
First release as GNU Datamash.