Sort Component Group:
Sort
Sort sorts and merges records.
Sort does the following:
1. Reads the records from all flows connected to the in port until it reaches the
number of bytes specified in the max-core parameter.
2. Sorts the records and writes the results to a temporary file on disk.
3. Repeats Steps 1 and 2 until it has read all records.
4. Merges all temporary files, maintaining the sort order.
5. Writes the result to the out port.
Sort stores temporary files in the working directories specified by its layout.
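The five steps above describe an external merge sort. The following is a minimal Python sketch of that idea, not the component's actual implementation: the function name external_sort, the use of pickle for the temporary files, and the sys.getsizeof size estimate are all assumptions made for illustration.

```python
import heapq
import os
import pickle
import sys
import tempfile


def external_sort(records, key, max_core=100663296, tmp_dir=None):
    """Yield records in key order, spilling sorted runs to temporary files.

    Hypothetical helper: max_core mirrors the parameter described above,
    but sizes are only estimated with sys.getsizeof.
    """
    run_paths = []
    buffer = []
    used = 0

    def spill():
        # Step 2: sort the buffered records and write them out as one run.
        buffer.sort(key=key)
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
        with os.fdopen(fd, "wb") as f:
            for rec in buffer:
                pickle.dump(rec, f)
        run_paths.append(path)
        buffer.clear()

    # Steps 1 and 3: fill the buffer up to max_core, spill, and repeat.
    for rec in records:
        buffer.append(rec)
        used += sys.getsizeof(rec)  # rough estimate, not exact record bytes
        if used >= max_core:
            spill()
            used = 0
    if buffer:
        spill()

    def read_run(path):
        # Stream one sorted run back from disk, one record at a time.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    # Steps 4 and 5: merge the sorted runs and emit the result in order.
    try:
        yield from heapq.merge(*(read_run(p) for p in run_paths), key=key)
    finally:
        for p in run_paths:
            os.remove(p)
```

In this sketch a smaller max_core produces more, shorter runs and therefore more temporary files to merge, while a larger max_core keeps more records in memory per run.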
You can use the Sort component to order records before you send them to a
component (such as Rollup, Scan, or Join) that requires grouped or sorted records.
NOTE: The Sort component is relatively expensive in terms of computing resources
because it writes files to disk, which breaks pipeline parallelism. Therefore,
place Sort in a graph where it processes the smallest possible number of
records.
Tip: You do not need a Gather component before a Sort component, because Sort
can gather internally on its in port.
Parameters:
key:
Name(s) of the key field(s) and the sequence specifier(s) you want the component to
use when it orders records.
max-core:
Maximum memory usage in bytes.
Default is 100663296 (about 100 MB; exactly 96 MiB).
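To illustrate how a key with sequence specifiers orders records, here is a small Python sketch. It is an analogy only, not Ab Initio syntax: the helper sort_by_key, the (field, direction) pair format, and the sample employee records are all hypothetical.

```python
from operator import itemgetter


def sort_by_key(records, key_spec):
    """Sort records by a list of (field, direction) pairs, most significant first."""
    result = list(records)
    # Python's sort is stable, so sorting by the least significant field first
    # and the most significant field last produces the combined ordering.
    for field, direction in reversed(key_spec):
        result.sort(key=itemgetter(field), reverse=(direction == "descending"))
    return result


# Made-up sample data for illustration only.
employees = [
    {"name": "Asha", "dept": "ops", "salary": 52000},
    {"name": "Ben", "dept": "dev", "salary": 67000},
    {"name": "Chen", "dept": "dev", "salary": 52000},
]

# Salary descending, then name ascending to break ties.
ranked = sort_by_key(employees, [("salary", "descending"), ("name", "ascending")])
names = [e["name"] for e in ranked]
print(names)
```

The same pattern covers a single-field key: passing only ("salary", "descending") orders the employees from highest to lowest salary.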
Q. Display the names of the employees in descending order of salary.
Q. What is max-core? What value would you provide in Sort?
Q. What happens if I set max-core to a high value?
Q. What happens if I set max-core to a low value?
Q. What are the advantages and disadvantages of Sort?
Sort within Groups
Sort within Groups refines the sorting of records already sorted according to one
key specifier: it sorts the records within the groups formed by the first sort
according to a second key specifier.
Sort within Groups does the following:
1. Reads records from all the flows connected to the in port until it either
reaches the end of a group or reaches the number of bytes specified in the max-core
parameter.
2. Sorts the records in the group according to the minor-key parameter. Sort within
Groups assumes input records are sorted according to the major-key parameter.
If Sort within Groups encounters input that is not sorted according to the
major-key parameter, and the allow-unsorted parameter is not set to True, it stops
execution of the graph with an error message reporting the first two out-of-order
records.
3. Writes the results to the out port.
4. When it reaches the end of a group, repeats Steps 1 through 3 for the next
group.
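The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the component's implementation: the function name sort_within_groups is hypothetical, the major key is assumed to be ascending, the max-core limit is left out, and the behavior shown for allow_unsorted=True (treating every change in the major key as a group boundary) is an assumption that the text above does not confirm.

```python
def sort_within_groups(records, major_key, minor_key, allow_unsorted=False):
    """Yield records sorted by minor_key inside each run of equal major_key."""
    group = []
    current = None
    prev = None
    seen_any = False
    for rec in records:
        k = major_key(rec)
        if seen_any and k != current:
            # The major key changed, so the current group has ended.
            if not allow_unsorted and k < current:
                # Report the two out-of-order records and stop.
                raise ValueError(
                    f"input not sorted on major key: {prev!r} then {rec!r}"
                )
            # Steps 2 and 3: sort the finished group and write it out.
            yield from sorted(group, key=minor_key)
            group = []
        current, prev, seen_any = k, rec, True
        group.append(rec)
    # Flush the final group.
    yield from sorted(group, key=minor_key)


# Made-up sample data: (department, salary), already ordered by department.
rows = [("dev", 3), ("dev", 1), ("ops", 2), ("ops", 0)]
result = list(sort_within_groups(rows, lambda r: r[0], lambda r: r[1]))
print(result)
```

Because only one group is held in memory at a time, this approach needs far less memory than a full sort when the input is already ordered on the major key.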
Parameters:
major-key:
Name(s) of the key field(s) and the sequence specifier(s). Sort within Groups
assumes that the input has already been ordered according to major-key.
minor-key:
Name(s) of the key field(s) and the sequence specifier(s) you want the component to
use when it orders records.
max-core:
Maximum memory usage in bytes before the component stops execution of the graph.
Default is 10485760 (10 MB).
Q. When do we use Sort within Groups?
Q. Is there any advantage of Sort within Groups over Sort?
Q. What happens if my data is not sorted on major-key?
Q. Can I use a composite key in major-key, minor-key, or both?
Q. In the Sort within Groups component, if I set the allow-unsorted parameter to
True and provide unsorted input, what happens to the graph?