
LSORT 4.04 (C) Copyright London Computing, 1983-1999.

All rights reserved.

What is LSORT

LSORT is a general purpose sort/merge utility written in Microsoft
Visual C++ for Microsoft Windows NT 3.1 and above and for Windows 95.
It runs on IBM PCs and compatibles with at least 16MB of RAM and a
fixed disk.

LSORT is User Supported Software. If this program proves useful, please
make a contribution ($35 suggested) to:

London Computing, P.O. Box 696 Cherry Hill, NJ 08003

Support is available at www.londoncomputing.com, as are the
latest versions of LSORT. You can email support questions to:
londoncomputing@abac.com

Anyone sending a contribution will receive a disk containing the source
code to LSORTNT as well as a copy of the LSRTNT sort filter. LSRTNT is
similar to the SORT filter but works much faster and will sort on
multiple fields. Once you have registered, you will be able to
write user exits to LSORT using Visual C++ 5.0 or later.

You may make copies of this software and distribute to other users as
long as there is no charge or other consideration and this notice is not
removed or bypassed.

LSORTNT will sort MSDOS, Windows NT, OS2 files and dBase II and dBase III
databases. (dBase III memo files and FOXPRO memo files are not sorted
but .DBF files will be sorted.) Each file may be sorted using 1 to 32
sort fields. The file to be sorted may contain either fixed length
records, variable length records or comma delimited records. Variable
length records are records ending with cr/lf. Comma delimited records
are variable length records where the fields are also variable length
and separated by a comma. Character fields may be enclosed in either
single or double quotes. It will merge up to 5 files using 1 to 32
sort fields. dBase databases may not be merged. Any field may be
sorted in either ascending or descending sequence. LSORT allows for
three user defined field types to be used: X,Y and Z. You must write
your own comparison subroutine to compare user defined fields.

The sort knows about the following field types:

Type  Description

B     Binary fields (to 127 bytes). A binary field is compared left
      to right based on the value of the code in each byte (0-255).
      It is useful for comparing strings where binary zeros are
      embedded and for comparing IBM Mainframe style binary numbers.
P     Packed decimal fields (1-8 bytes), stored as on IBM Mainframe
      computers. Each digit position is stored in 4 bits as a binary
      value between 0 and 9. The digits are stored left to right,
      with the rightmost position containing a sign: 0x0D for
      negative, 0x0C or 0x0F for positive. A packed decimal field
      can store between 1 and 15 digits depending on the length of
      the field. If the field contains an invalid sign, the sort
      won't produce what you would expect. Packed decimal values are
      only meaningful in fixed length records.
C     Character fields (to 127 bytes). Character fields compare up to
      the first binary zero in the field, following C language
      conventions.
U     Upper case character fields (sort fields are translated to
      upper case before compare).
I     2 byte integers in internal format.
L     4 byte integers in internal format.
F     Floating point numbers (IEEE).
D     Double precision floating point numbers (IEEE).
N     Zoned decimal numbers (text format numbers; decimals are
      allowed). LSORT now supports scientific notation as well,
      using E notation, e.g. .98 == 9.8E-1. xBase N and F fields
      are sorted as type N.
T     1 byte logical fields (dBase II or III).
2     2 position year field used as part of a date to be sorted,
      i.e. as the yy part of a yymmdd date. If the value of yy is
      > 30, it is assumed to be a year in 1900. If the value of yy
      is <= 30, it is assumed to be a year in 2000.
3     6-8 position mm/dd/yy date. The delimiter can be any
      punctuation character, e.g. period, comma, slash, ... This
      date field is considered Y2K compliant using the rule of 30 as
      above: if the value of yy is > 30, it is assumed to be a year
      in 1900; if the value of yy is <= 30, it is assumed to be a
      year in 2000. yy, mm and dd may be either 1 or 2 positions
      long. Leading and trailing spaces are ignored.
4     6-8 position dd/mm/yy date. The delimiter can be any
      punctuation character, e.g. period, comma, slash, ... This
      date field is considered Y2K compliant using the rule of 30 as
      above: if the value of yy is > 30, it is assumed to be a year
      in 1900; if the value of yy is <= 30, it is assumed to be a
      year in 2000. yy, mm and dd may be either 1 or 2 positions
      long. Leading and trailing spaces are ignored.
5     6-8 position yy/mm/dd date. The delimiter can be any
      punctuation character, e.g. period, comma, slash, ... This
      date field is considered Y2K compliant using the rule of 30 as
      above: if the value of yy is > 30, it is assumed to be a year
      in 1900; if the value of yy is <= 30, it is assumed to be a
      year in 2000. yy, mm and dd may be either 1 or 2 positions
      long. Leading and trailing spaces are ignored.
6     6-10 position mm/dd/yyyy date. The delimiter can be any
      punctuation character, e.g. period, comma, slash, ... dd and
      mm may be either 1 or 2 positions long. yyyy may be from 1-4
      positions long. Leading and trailing spaces are ignored.
7     6-10 position dd/mm/yyyy date. The delimiter can be any
      punctuation character as above. dd and mm may be either 1 or
      2 positions long. yyyy may be from 1-4 positions long. Leading
      and trailing spaces are ignored.
8     6-10 position yyyy/mm/dd date. The delimiter can be any
      punctuation character as above. dd and mm may be either 1 or
      2 positions long. yyyy may be from 1-4 positions long. Leading
      and trailing spaces are ignored.
X     User defined field type X.
Y     User defined field type Y.
Z     User defined field type Z.
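
To make the packed decimal layout (field type P) concrete, here is a
minimal sketch in C of how such a field could be decoded. It is not
taken from the LSORT source; the function name unpack_decimal and its
return convention are assumptions made for this illustration.

```c
#include <stddef.h>

/* Decode a packed decimal field of len bytes (1-8) into *value.
   Hypothetical helper, not part of LSORT.
   Returns 0 on success, -1 if a digit or the sign is invalid. */
int unpack_decimal(const unsigned char *field, size_t len, long long *value)
{
    long long v = 0;
    unsigned sign;
    size_t i;

    if (len < 1 || len > 8)
        return -1;
    sign = field[len - 1] & 0x0F;       /* rightmost 4 bits hold the sign */
    if (sign != 0x0C && sign != 0x0D && sign != 0x0F)
        return -1;
    for (i = 0; i < len; i++) {
        unsigned hi = field[i] >> 4;
        unsigned lo = field[i] & 0x0F;
        if (hi > 9)
            return -1;
        v = v * 10 + hi;
        if (i + 1 < len) {              /* low half of the last byte is the sign */
            if (lo > 9)
                return -1;
            v = v * 10 + lo;
        }
    }
    *value = (sign == 0x0D) ? -v : v;
    return 0;
}
```

For example, the two bytes 0x12 0x3D would decode to -123 under this
reading of the format.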

A zoned decimal number is stored as a character string and may contain
leading and trailing spaces, minus sign, decimal point and digits.
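
As an illustration of comparing such text format numbers, the sketch
below converts each field with the standard strtod function, which
skips leading spaces and accepts a sign, a decimal point and E
notation. It is only a sketch: the names zoned_value and zoned_cmp and
the 127 byte buffer limit are assumptions, not LSORT internals.

```c
#include <stdlib.h>
#include <string.h>

/* Convert a text number field of len bytes to a double.
   The field need not be 0 terminated, so it is copied first. */
double zoned_value(const char *field, size_t len)
{
    char buf[128];                      /* assumed upper limit for this sketch */

    if (len > sizeof buf - 1)
        len = sizeof buf - 1;
    memcpy(buf, field, len);
    buf[len] = '\0';
    return strtod(buf, NULL);           /* stops at trailing spaces */
}

/* Returns -1, 0 or 1 as field a is lower than, equal to or higher than b. */
int zoned_cmp(const char *a, const char *b, size_t len)
{
    double x = zoned_value(a, len);
    double y = zoned_value(b, len);

    return x < y ? -1 : x > y ? 1 : 0;
}
```

Converting text on every comparison is costly, which is consistent with
zoned decimal fields sorting slowly.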

NOTE: zoned decimal numbers and comma delimited files sort very slowly!
The only reasonable field types for comma delimited files are C or N.
LSORT will accept other field types, but the results are undefined.

Sorting Dates:

Until the Y2K issues became important, it was not necessary for LSORT to
specifically deal with date fields. Release 4.02 of LSORT has added a Y2K
compliant 2 character year field to handle dates that exceed the year 2000.
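
The rule of 30 used by the Y2K compliant year field (type 2 in the
field type list) amounts to a simple mapping. The sketch below shows
it in C; the function name expand_year is made up for this example.

```c
/* Expand a 2 digit year using the rule of 30:
   yy > 30 is taken as 19yy, yy <= 30 as 20yy. */
int expand_year(int yy)
{
    return (yy > 30) ? 1900 + yy : 2000 + yy;
}
```

Under this rule 31 becomes 1931 and 30 becomes 2030, so a plain
comparison of the expanded years gives the correct order.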

The internal representation of date fields is generally application dependent
and can be simulated using other field types:

o Unix style date fields (seconds since 1970)
  can be sorted as a 4 byte binary integer.
o xBase style date fields can be sorted as the
  appropriate sized floating point number.
o yyyymmdd fields (stored as character strings)
  can be sorted as an 8 byte character string.
o mm/dd/yyyy and dd/mm/yyyy date fields can be
  sorted as 3 character strings where the yyyy
  field is sorted first, the mm field is sorted
  next and the dd field is sorted last.
o dd/mm/yy, mm/dd/yy and yymmdd date fields
  can be sorted as 3 fields where the yy field
  is sorted first, the mm field is sorted next
  and the dd field is sorted last. If all dates
  are part of the 20th Century 19yy, then use
  a two position character field for the year.
  If the dates can be in either the 20th or
  21st centuries, 19yy or 20yy, use the new
  Y2K compliant YY field instead, which will
  translate yy fields <= 30 to 20yy dates and
  yy fields over 30 to 19yy dates.
o dd/mm/yy, mm/dd/yy and yy/mm/dd fields can be
  sorted using the special field types for these
  date fields. When the special types are used
  the dd, mm and yy fields may be either 1 or
  2 positions long and may contain a leading or
  trailing space. The rule of 30 is used to
  make the dates Y2K compliant. If the dates
  can be in either the 20th or 21st centuries,
  19yy or 20yy, yy fields <= 30 are changed to
  20yy dates and yy fields over 30 to 19yy dates.
o dd/mm/yyyy, mm/dd/yyyy and yyyy/mm/dd fields
  can be sorted using the special field types
  for these dates. When the special types are
  used the dd and mm fields may be either 1 or
  2 positions long and may contain a leading or
  trailing space. The yyyy field may be 1-4
  positions long and may contain leading or
  trailing spaces.
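
The behaviour described for the special mm/dd/yy date type can be
sketched as a conversion to a single yyyymmdd number that compares in
date order. This is an illustration only, not the LSORT
implementation: the names read_number and mdy_key are hypothetical,
and the input is assumed to be a 0 terminated string.

```c
#include <ctype.h>

/* Read 1 to max_digits digits at *p and advance *p past them.
   Returns the value read, or -1 if no digit was found. */
static int read_number(const char **p, int max_digits)
{
    int n = 0, count = 0;

    while (count < max_digits && isdigit((unsigned char)**p)) {
        n = n * 10 + (**p - '0');
        (*p)++;
        count++;
    }
    return count ? n : -1;
}

/* Convert an mm/dd/yy date (any punctuation delimiter, 1 or 2 digit
   parts, optional leading and trailing spaces) to a yyyymmdd key.
   Returns -1 if the text does not have that shape. */
long mdy_key(const char *s)
{
    int mm, dd, yy;

    while (*s == ' ')
        s++;
    mm = read_number(&s, 2);
    if (mm < 0 || !ispunct((unsigned char)*s))
        return -1;
    s++;
    dd = read_number(&s, 2);
    if (dd < 0 || !ispunct((unsigned char)*s))
        return -1;
    s++;
    yy = read_number(&s, 2);
    if (yy < 0)
        return -1;
    while (*s == ' ')
        s++;
    if (*s != '\0')
        return -1;
    yy += (yy > 30) ? 1900 : 2000;      /* rule of 30 */
    return yy * 10000L + mm * 100L + dd;
}
```

With this key, 12/31/99 (19991231) sorts before 1/1/00 (20000101), which
is the point of the rule of 30.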

LSORT can be run as a Windows application or as a command line
application.

The maximum record length is 4096 bytes. Files will be sorted in memory
if possible.

Files larger than available memory are sorted in pieces and then merged
together.

Running the LSORTNT console application:

SYNTAX:

LSORTNT [flags] @sort.inp -- All sort specifications are stored in sort.inp.
or
LSORTNT [flags] sort specifications -- will take the specifications given
                                       on the command line.
LSORTNT -R -- will restart a sort.

Flags:

-R      -- Restart an existing sort.
-V      -- Verify that all delimited fields are present.
-D="x"  -- Use character x as a field delimiter for delimited files.
-W      -- Display sort statistics at end of sort.
-Q      -- Quiet mode. Do not display any messages except error messages
           while running LSORT.
-B      -- Batch mode. Do not prompt for any information when running
           LSORT. Send error messages to STDERR or the logfile instead
           of opening a message box.
-Uxx    -- Use xx amount of memory for sorting. If xx < 100, then it is
           the percentage of system memory to use. If xx > 100, it is
           the number of KB to use.
-Fxx    -- Leave xx amount of memory free for use by the system. If
           xx < 100, then it is the percentage of memory to leave for
           other users, otherwise it is the number of KB to leave free
           for other uses.
-Llogfi -- logfi is the name of a log file showing LSORT progress.
           If not present, LSORT.LOG is used.
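
The way the xx value of the -U and -F flags is read can be summarised
in a few lines of C. This is a sketch of the rule as stated above, not
LSORT code; the name memory_kb is hypothetical, and treating a value of
exactly 100 as KB is an assumption based on the "otherwise" wording of
the -F flag.

```c
/* Turn the xx value of -Uxx or -Fxx into a number of KB.
   system_kb is the total system memory in KB. */
long memory_kb(long xx, long system_kb)
{
    if (xx < 100)
        return system_kb * xx / 100;    /* xx is a percentage */
    return xx;                          /* xx is already in KB */
}
```

On a 16MB machine (16384 KB), -U50 would therefore mean 8192 KB and
-U4096 would mean 4096 KB.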

Sort Specifications:
You must specify either a SORT or MERGE operation.

If you ask for a SORT, you may tell LSORTNT to use either a QUICKSORT or
HEAPSORT for internal sorting. You will also be asked to specify two
devices to hold merge files if any are needed. Merge files may be placed on
floppy disk, hard disk or RAM disk. The specified drive must be large
enough to hold the entire input file.

If you specify SORT or MERGE you must enter your input file(s) and
output file as well as the definition of the key fields to be used in the
comparisons. Fields are specified by their starting position and length.
The types of fields have been listed above.

The sort specifications must be entered on the command line or in the
redirection file in the order requested by LSORTNT. Each parameter should
be separated from the others with one or more spaces.

The sort needs the following information in the order shown:

Type of Sort:
S -- for QUICKSORT
H -- for HEAPSORT

Merge Drive 1: You may reply with any drive letter, although it is best to
specify a fixed disk (if any).

Merge Drive 2: This should be different from drive 1 if you are using
floppy disks, but should be a fixed disk if you have one.

Name of input file: You may specify any name including drive letter and
path. Specify :X to use a user specified input routine.

Name of output file: See above. Specify :X to use a user specified output
routine.

File Type (Unless you are sorting a dBase file):

F nnnn -- Fixed length file (all records are the same length),
nnnn is the length of each record.
V -- for a varying length file (records must end with CR LF.)
D -- for delimited files.

You must then enter field definitions. Each field definition has
four parts:
starting position (from 1) or starting field (delimited files)
field length (in bytes) (no prompt for delimited files)
field type (See above list of valid types)
sort order (A--Ascending, D--Descending)
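
One way to picture a field definition and how a list of them orders
two records is the C sketch below. It handles only the B and C field
types and is not LSORT's own code; the struct and function names are
assumptions made for the illustration.

```c
#include <string.h>

/* The four parts of a field definition. */
struct sort_field {
    int  start;     /* starting position, counted from 1 */
    int  length;    /* field length in bytes */
    char type;      /* 'B' or 'C' in this sketch */
    char order;     /* 'A' ascending, 'D' descending */
};

/* Compare one field of two records; returns -1, 0 or 1. */
static int compare_field(const struct sort_field *f,
                         const char *rec_a, const char *rec_b)
{
    const char *a = rec_a + (f->start - 1);
    const char *b = rec_b + (f->start - 1);
    int r;

    if (f->type == 'B')
        r = memcmp(a, b, (size_t)f->length);    /* every byte counts */
    else
        r = strncmp(a, b, (size_t)f->length);   /* stops at a binary zero */
    r = (r > 0) - (r < 0);
    return (f->order == 'D') ? -r : r;          /* descending flips the result */
}

/* Apply the fields in order; the first field that differs decides. */
int compare_records(const struct sort_field *fields, int nfields,
                    const char *rec_a, const char *rec_b)
{
    int i;

    for (i = 0; i < nfields; i++) {
        int r = compare_field(&fields[i], rec_a, rec_b);
        if (r != 0)
            return r;
    }
    return 0;
}
```

Later fields act only as tie breakers: they are consulted only when all
earlier fields compare equal.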

In order to work as efficiently as possible, LSORTNT does not check the
starting position of a field against the actual length of a record. If some
field starts past the end of a record (e.g. sort field 1 starts in column 10
but the record is only 8 bytes long), the results will be undefined and most
certainly not what you want. Please be careful.
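
Since LSORTNT does not make this check itself, a program that prepares
data for sorting may want to make it beforehand. A minimal sketch,
using a hypothetical helper name field_fits:

```c
/* Returns 1 if a field starting at position start (counted from 1)
   with the given length lies entirely inside a record of record_len
   bytes, 0 otherwise. */
int field_fits(int start, int length, int record_len)
{
    return start >= 1 && length >= 1 && start - 1 + length <= record_len;
}
```

For the case mentioned above, field_fits(10, 1, 8) is 0, because column
10 lies beyond the end of an 8 byte record.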

Enter a '0' for the starting position to end the prompt for field
definitions.

If you are sorting a dBase file, you may specify a field by name, in
which case you will only need to enter the sort order {A|D}.
Alternatively, you may enter starting position, length, type and order
as above.

example 1:

Sort file test.dat on positions 1-5, char, ascending and 6-7, binary integer,
descending. Use drive C for the work files and put the sorted file in
test.srt.

Issue the following command:

LSORTNT S C C test.dat test.srt V 1 5 C A 6 2 I D 0

  S         sort using quicksort
  C         merge drive 1
  C         merge drive 2
  test.dat  input file name
  test.srt  output file name
  V         file type
  1 5 C A   sort field 1: starts at byte 1, is a 5 byte character
            string, sorted ascending
  6 2 I D   sort field 2: starts at byte 6, is a 2 byte integer,
            sorted descending
  0         ends list of sort fields

Merge Specification:

Enter 'M' to indicate the merge operation.

You will be asked to enter the number of files to be merged followed by 1-5
files to be merged. They are entered one at a time.

You will be asked to enter a file type, output file and a field list as
above.

example:

Merge files t1.dat t2.dat and t3.dat on positions 4-7 defined as a character
field, ascending.

LSORTNT M 3 t1.dat t2.dat t3.dat test.mrg V 4 4 c a 0 y y

  M         do a merge
  3         merge 3 files
  t1.dat    input file 1
  t2.dat    input file 2
  t3.dat    input file 3
  test.mrg  output file
  V         file type
  4 4 c a   merge field 1
  0         end of list of merge fields
  y y       response to mount messages

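The idea behind a merge of up to 5 inputs on a character key can be
sketched as follows. The sketch works on records already held in
memory rather than on files and is not the LSORT implementation; the
struct run and the function merge_runs are hypothetical names. Each
input is assumed to be sorted ascending on the key already, and every
record is assumed to be long enough to contain the key.

```c
#include <string.h>

#define MAX_INPUTS 5

/* One sorted input: an array of records and a read position. */
struct run {
    const char **recs;
    int count;
    int next;
};

/* Merge nruns sorted inputs (nruns <= MAX_INPUTS) into out, comparing
   key_len bytes starting at position key_start (counted from 1).
   Returns the number of records written. */
int merge_runs(struct run *runs, int nruns,
               int key_start, int key_len, const char **out)
{
    int written = 0;

    for (;;) {
        int best = -1, i;

        /* Pick the input whose current record has the lowest key. */
        for (i = 0; i < nruns && i < MAX_INPUTS; i++) {
            if (runs[i].next >= runs[i].count)
                continue;               /* this input is exhausted */
            if (best < 0 ||
                strncmp(runs[i].recs[runs[i].next] + key_start - 1,
                        runs[best].recs[runs[best].next] + key_start - 1,
                        (size_t)key_len) < 0)
                best = i;
        }
        if (best < 0)
            break;                      /* all inputs are exhausted */
        out[written++] = runs[best].recs[runs[best].next++];
    }
    return written;
}
```

Because only the current record of each input is examined, a merge needs
very little memory compared with a sort.
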
Restarting:

If a sort stops in the middle due to lack of space, or is stopped by you
by pressing ^BREAK, it may be restarted by issuing the LSORTNT -R command,
provided the dataset(s), SORTPARM.DAT (and DB3PARM.DAT, for dBase III
files only) and all LSMERGE?.DAT files are still available. The sort will
be restarted at the beginning of the LSSORT phase (where the input file is
read and sorted) or at the beginning of an LSMERGE pass, where several
partially sorted files are combined.

User Exits:

You may define your own user exits to read and write data and you may define
your own compare routines for the standard field types or for user defined
field types. These routines must be written in Microsoft Visual C++ 5.0 or in
any other language that can be linked to Microsoft Visual C++.

User Exits are only available to registered users, who will receive the
source and object files for LSORT.

User input: (Available for Sorting Only)

Specify :X as the name of the input file. LSORTNT uses a routine named
USERIP to read the records to be sorted. You may write your own
version of USERIP and link it with LSORT to create a custom version
containing your own input routine.

USERIP is used as follows:

int l,userip();
char buffer[...];

l = userip(buffer);

USERIP must return the length of the record read which must be <= 4096
or -1 for end of file. If you have specified V type files, USERIP must
return a string ending with a '\0'. The string length must include the
trailing '\0'.
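
To illustrate this contract, here is a small USERIP that hands back V
type records from a table in memory instead of a file. It is a sketch
for experimentation only: the table sample_records and its contents are
made up, and ending each record with a newline is an assumption.

```c
#include <string.h>

/* Made-up records to be returned one at a time. */
static const char *sample_records[] = { "pear\n", "apple\n", "fig\n" };

/* Copies the next record into buffer and returns its length,
   counting the trailing '\0', or -1 at end of file. */
int userip(char *buffer)
{
    static size_t next = 0;
    size_t n_records = sizeof sample_records / sizeof sample_records[0];

    if (next >= n_records)
        return -1;                      /* end of file */
    strcpy(buffer, sample_records[next++]);
    return (int)strlen(buffer) + 1;     /* length includes the '\0' */
}
```

Successive calls return 6, 7 and 5 for the three records and then -1.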

User Output:

Specify :X as the name of the output file. LSORT uses a routine called
USEROP to write to the :X file. You may write your own user output
routine to be used to write the final sorted or merged output by
creating a custom version of USEROP and relinking LSORT to create a
custom LSORT. USEROP works as follows:

int buflen;
char buffer[...];

userop(buffer,buflen); /* userop must write buflen bytes from buffer */
                       /* buflen == 0 means that you want to write a
                          0 terminated string */
userop(NULL,-1);       /* userop must perform end of file processing */

Sample versions of userip and userop appear below:


/* Userip to return a varying length string */
#include <stdio.h>
#include <string.h>

int userip(char *s)
{
    static int firsttime = 1;
    static FILE *inchan;        /* fopen returns a FILE pointer, not an int */

    /* s is the string buffer, max length 4k, 4k always available */
    /* this routine must return length of string or EOF if end of file */
    /* (note: length of string includes 0 byte at end) */
    if (firsttime) {
        firsttime = 0;
        inchan = fopen("usertest.dat", "r");
    }
    /* a file that could not be opened is treated as end of file */
    if (inchan != NULL && fgets(s, 4096, inchan))
        return (int)strlen(s) + 1;
    else
        return EOF;
}

/* Userip to return a fixed length record */
#include <stdio.h>
#define STRLEN 128

int userip(char *s)
{
    static int firsttime = 1;
    static FILE *inchan;

    /* s is the record buffer, max length 4k, 4k always available */
    /* this routine must return length of record or EOF if end of file */
    if (firsttime) {
        firsttime = 0;
        /* binary mode, so that no CR/LF translation takes place */
        inchan = fopen("usertest.dat", "rb");
    }
    /* read one full record into s; a short read means end of file */
    if (inchan != NULL && fread(s, 1, STRLEN, inchan) == STRLEN)
        return STRLEN;
    else
        return EOF;
}

#include <stdio.h>

int userop(char *s, int l)
{
    /* s is string to write, l is length or 0 if 0 terminated or -1 for close */
    static int firsttime = 1;
    static FILE *otchan;

    if (firsttime) {
        firsttime = 0;
        otchan = fopen("usertest.srt", "w");
    }
    if (otchan == NULL)         /* not open, or already closed */
        return 0;

    if (l == -1 || s == NULL) { /* end of file processing */
        fclose(otchan);
        otchan = NULL;
    }
    else if (l)                 /* write an F type record */
        while (l--) fputc(*s++, otchan);
    else
        fputs(s, otchan);       /* write a V type record */
    return 0;
}

User Compare Routines:


You may define up to three user defined fields: X, Y, Z. You must write a
compare routine for each field type used. The routine names are:

sxcmp -- for field type X.
sycmp -- for field type Y.
szcmp -- for field type Z.

The compare routines are called with three arguments: the address of the
first field, the address of the second field and the field length. As in
the sample routines in this section, the routine must return -1 if
field 1 < field 2, 0 if field 1 == field 2 and 1 if field 1 > field 2.

Sample routines are shown below:

int sxcmp(long int *a, long int *b, int l)
{   /* this routine compares two long integers */
    /* the values are compared directly; subtracting them could overflow */
    return *a < *b ? -1 : *a == *b ? 0 : 1;
}

int sycmp(short *a, short *b, int l)
{   /* this routine compares two integers (2 bytes) */
    /* short is used because int is 4 bytes under 32-bit Visual C++ */
    return *a < *b ? -1 : *a == *b ? 0 : 1;
}

int szcmp(float *a, float *b, int l)
{   /* this routine compares two floating numbers */
    return *a < *b ? -1 : *a == *b ? 0 : 1;
}
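
As a further illustration, a type X field could hold 4 byte unsigned
integers, which none of the built-in types describes explicitly. The
sketch below follows the sign convention of the sample routines above
(-1 when the first field is lower). The const void * parameter types
are an assumption; the actual prototype expected by LSORT should be
taken from the registered source.

```c
#include <stdint.h>
#include <string.h>

/* Compare two 4 byte unsigned integers stored in machine byte order. */
int sxcmp(const void *a, const void *b, int l)
{
    uint32_t x, y;

    (void)l;                    /* the field length is fixed at 4 here */
    memcpy(&x, a, sizeof x);    /* memcpy avoids unaligned access */
    memcpy(&y, b, sizeof y);
    return x < y ? -1 : x > y ? 1 : 0;
}
```

Copying the values out with memcpy is a deliberate choice: a sort field
can start at any byte of a record, so its address need not be aligned.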
