Command-line-syntax examples:
mlr --csv cut -f hostname,uptime mydata.csv
mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
mlr stats2 -a linreg-pca -f u,v -g shape data/*
mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
mlr --from estimates.tbl put '
for (k,v in $*) {
if (is_numeric(v) && k =~ "^[t-z].*$") {
$sum += v; $count += 1
}
}
$mean = $sum / $count # no assignment if count unset'
mlr --from infile.dat put -f analyze.mlr
mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
Data-format examples:
DKVP: delimited key-value pairs (Miller default format)
+---------------------+
| apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| dish=7,egg=8,flint | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
+---------------------+
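The key assignment shown above, including the positional key given to a field with no "=", can be imitated with a few lines of awk. This is only a sketch of the rule as the diagram states it, not Miller's own parser; the function name dkvp_pairs is made up for illustration, and the default "," and "=" separators are assumed.

```shell
# Sketch only: imitate how a DKVP line is split into key/value pairs.
# dkvp_pairs is a hypothetical helper, not part of Miller.
dkvp_pairs() {
  awk -F, '{
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")
      if (eq > 0) {
        # Normal case: text before the first "=" is the key.
        key = substr($i, 1, eq - 1)
        val = substr($i, eq + 1)
      } else {
        # No "=": the 1-based field position becomes the key.
        key = i
        val = $i
      }
      printf "%s\"%s\" => \"%s\"", (i > 1 ? ", " : ""), key, val
    }
    printf "\n"
  }'
}

printf 'apple=1,bat=2,cog=3\ndish=7,egg=8,flint\n' | dkvp_pairs
```

The second input line has no key for "flint", so the sketch reports it under key "3", matching Record 2 in the diagram.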
Tabular JSON: nested objects are supported, although arrays within them are not:
+---------------------+
|{ |
| "apple": 1, | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| "bat": 2, |
| "cog": 3 |
|} |
|{ |
| "dish": { | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
| "egg": 7, |
| "flint": 8 |
| }, |
| "garlic": "" |
|} |
+---------------------+
Help options:
-h or --help Show this message.
--version Show the software version.
{verb name} --help Show verb-specific help.
--help-all-verbs Show help on all verbs.
-l or --list-all-verbs List only verb names.
-L List only verb names, one per line.
-f or --help-all-functions Show help on all built-in functions.
-F Show a bare listing of built-in functions by name.
-k or --help-all-keywords Show help on all keywords.
-K Show a bare listing of keywords by name.
Verbs:
altkv bar bootstrap cat check clean-whitespace count-distinct count-similar
cut decimate fill-down filter format-values fraction grep group-by
group-like having-fields head histogram join label least-frequent
merge-fields most-frequent nest nothing put regularize remove-empty-columns
rename reorder repeat reshape sample sec2gmt sec2gmtdate seqgen shuffle
skip-trivial-records sort stats1 stats2 step tac tail tee top uniq
unsparsify
--mmap --no-mmap --mmap-below {n} Use mmap for files whenever possible, never, or
for files less than n bytes in size. Default is for
files less than 4294967296 bytes in size.
'Whenever possible' means always, except when reading
standard input, which is not mmappable. If you don't know
what this means, don't worry about it -- it's a minor
performance optimization.
Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
DKVP-formatted input and pretty-printed output.
Comments in data:
--skip-comments Ignore commented lines (prefixed by "#")
within the input.
--skip-comments-with {string} Ignore commented lines within input, with
specified prefix.
--pass-comments Immediately print commented lines (prefixed by "#")
within the input.
--pass-comments-with {string} Immediately print commented lines within input, with
specified prefix.
Notes:
* Comments are only honored at the start of a line.
* In the absence of any of the above four options, comments are data like
any other text.
* When pass-comments is used, comment lines are written to standard output
immediately upon being read; they are not part of the record stream.
Results may be counterintuitive. A suggestion is to place comments at the
start of data files.
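The ordering effect described in the last note can be imitated without Miller. The sketch below is an emulation under stated assumptions: awk stands in for the reader, printing comment lines at once, and sort stands in for any verb that must see all records before emitting output. The function name pass_comments_then_sort is made up for illustration.

```shell
# Sketch only: emulate "comments printed immediately, records emitted later".
# pass_comments_then_sort is a hypothetical helper, not part of Miller.
pass_comments_then_sort() {
  # Comment lines go straight to stdout; data lines are piped to sort,
  # which emits nothing until it has read all of its input.
  awk '/^#/ { print; fflush(); next } { print | "sort" }'
}

printf 'b=2\n# note about the next line\na=1\n' | pass_comments_then_sort
```

Here the comment sits between the two records in the input but comes out ahead of both, which is why placing comments at the start of a data file gives the least surprising result.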
Compressed-data options:
--prepipe {command} This allows Miller to handle compressed inputs. You can do
without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
However, when multiple input files are present, between-file separations are
lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
specify an action to be taken on each input file. This pre-pipe command must
be able to read from standard input; it will be invoked with
{command} < {filename}.
Examples:
mlr --prepipe 'gunzip'
mlr --prepipe 'zcat -cf'
mlr --prepipe 'xz -cd'
mlr --prepipe cat
Note that this feature is quite general and is not limited to decompression
utilities. You can use it to apply per-file filters of your choice.
For output compression (or other) utilities, simply pipe the output:
mlr ... | {your compression command}
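The per-file invocation described above, {command} < {filename}, amounts to a loop of the following shape. This is a sketch of the idea only, not Miller's implementation; run_prepipe is a hypothetical helper, and the temporary file names are made up for the demonstration.

```shell
# Sketch only: what a per-file pre-pipe amounts to.
# run_prepipe is a hypothetical helper, not part of Miller.
run_prepipe() {
  cmd=$1
  shift
  for f in "$@"; do
    # Same shape as in the help text: {command} < {filename}
    sh -c "$cmd" < "$f"
  done
}

tmpdir=$(mktemp -d)
printf 'a=1\n' | gzip > "$tmpdir/one.gz"
printf 'a=2\n' | gzip > "$tmpdir/two.gz"
run_prepipe 'gunzip' "$tmpdir/one.gz" "$tmpdir/two.gz"
rm -rf "$tmpdir"
```

Because the command runs once per file, the file boundaries remain visible to the caller, which is what a single "gunzip < ... | mlr ..." pipeline loses.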
Numerical formatting:
--ofmt {format} E.g. %.18lf, %.0lf. Please use sprintf-style codes for
double-precision. Applies to verbs which compute new
values, e.g. put, stats1, stats2. See also the fmtnum
function within mlr put (mlr --help-all-functions).
Defaults to %lf.
Other options:
--seed {n} with n of the form 12345678 or 0xcafefeed. Seeds the put/filter
random-number functions urand()/urandint()/urand32().
--nr-progress-mod {m}, with m a positive integer: print filename and record
count to stderr every m input records.
--from {filename} Use this to specify an input file before the verb(s),
rather than after. May be used more than once. Example:
"mlr --from a.dat --from b.dat cat" is the same as
"mlr cat a.dat b.dat".
-n Process no input files, nor standard input either. Useful
for mlr put with begin/end statements only. (Same as --from
/dev/null.) Also useful in "mlr -n put -v '...'" for
analyzing abstract syntax trees (if that's your thing).
-I Process files in-place. For each file name on the command
line, output is written to a temp file in the same
directory, which is then renamed over the original. Each
file is processed in isolation: if the output format is
CSV, CSV headers will be present in each output file;
statistics are only over each file's own records; and so on.
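The temp-file-then-rename behavior described for -I follows a common shell pattern, sketched below. It is an illustration under stated assumptions, not Miller's code: in_place is a hypothetical helper, and the filter command (here sort) stands in for an mlr invocation.

```shell
# Sketch only: write to a temp file in the same directory, then rename it
# over the original. in_place is a hypothetical helper, not part of Miller.
in_place() {
  cmd=$1
  shift
  for f in "$@"; do
    # Each file is handled on its own, with its own temp file.
    tmp=$(mktemp "$(dirname "$f")/.inplace.XXXXXX") || return 1
    if sh -c "$cmd" < "$f" > "$tmp"; then
      mv "$tmp" "$f"
    else
      # Leave the original untouched if the command fails.
      rm -f "$tmp"
      return 1
    fi
  done
}

workdir=$(mktemp -d)
printf 'b=2\na=1\n' > "$workdir/data.dkvp"
in_place 'sort' "$workdir/data.dkvp"
cat "$workdir/data.dkvp"
rm -rf "$workdir"
```

Keeping the temp file in the same directory means the final rename stays on one filesystem, so the original is replaced in a single step rather than being partially overwritten.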
Then-chaining:
Output of one verb may be chained as input to another using "then", e.g.
mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color
Auxiliary commands:
Miller has a few otherwise-standalone executables packaged within it.
They do not participate in any other parts of Miller.
Available subcommands:
aux-list
lecat
termcvt
hex
unhex
netbsd-strptime
For more information, please invoke mlr {subcommand} --help.