
The OTDR (Optical Time-Domain Reflectometer) Data Format

(Last Revised 2018-01-04)

Introduction

The SOR ("Standard OTDR Record") data format is used to store OTDR (optical time-domain
reflectometer) fiber data. The format is defined by the Telcordia SR-4731, issue 2 standard. While it
is a standard, it is unfortunately not open, in that the specifics of the data format are not openly
available. You can buy the standards document from Telcordia for $750 US (last time I checked),
but this was too much for my budget. (And likely comes with all sorts of licensing restrictions. I
wouldn't know; I have never seen the document!)

There are several OTDR trace readers freely available for download on the web, but most do not
allow exporting the trace data into, say, a CSV file for further analysis, and I am aware of only one
that runs natively on Linux (although some will work under Wine). There have been requests on
various Internet forums asking for information on how to extract the trace data, but I am not aware
of anyone providing answers beyond pointing to the free readers and the Telcordia standard.

Fortunately the data format is not particularly hard to decipher. The table of contents on the
Telcordia SR-4731, issue 2 page provides several clues, as does the Wikipedia page on optical
time-domain reflectometers. Using a binary-file editor/viewer and comparing the outputs from some
free OTDR SOR file readers, I was able to piece together most of the encoding in the SOR data
format. In this article I will describe my findings, in the hope that they will be useful to other
people. But use them at your own risk! The information provided here is based on guesswork from
looking at a limited number of sample files. I cannot guarantee that there are no mistakes, or that I
have uncovered all possible exceptions to the rules that I have deduced from the sample files. You
have been warned!

A Simple SOR File Reader

For the impatient, I have written a simple program pubOTDR (hosted at GitHub) that parses a SOR
file and dumps the trace curve into a TAB delimited data file. The program is written in Perl, and is
far from efficient, but it should work! There is also a Python version (pyOTDR), a Ruby version
(rbOTDR), a JavaScript/Node.js version (jsOTDR, or use "npm install jsotdr" to install)
and a Clojure version (cljotdr). Since Clojure runs inside a JVM (Java Virtual Machine), the Clojure
version can also be adapted for use with Java.
Instructions on using the stand-alone programs (or using them as a library/module inside your own
program) can be found on the GitHub pages cited above.

Organization of the SOR file


There are actually two main versions of the OTDR SOR files. The earlier version is from Bellcore
(the 1.x versions); the newer version is 2.x. The files are binary data files; all values are encoded as
little-endian signed or unsigned integers, with fractional values represented as scaled integers
(i.e., the integer is multiplied by some factor, typically some power of 10, to become the actual
value). Floating-point numbers are not used.
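
As a concrete illustration, here is a minimal Python sketch (the helper name is my own) of reading
such a scaled little-endian integer:

    import struct

    def read_uint(data, offset, nbytes):
        """Read an unsigned little-endian integer of nbytes at offset."""
        fmt = {2: "<H", 4: "<I"}[nbytes]
        return struct.unpack_from(fmt, data, offset)[0]

    # Example: an index of refraction stored as 10^5 times its actual value
    raw = read_uint(b"\x88\x3d\x02\x00", 0, 4)  # 146824
    ior = raw * 1e-5                            # 1.46824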

In both versions, the data is arranged in blocks; some are required, some are optional. They are:
• Map block (required): Map
• General parameters block (required): GenParams
• Supplier parameters block (required): SupParams
• Fixed parameters block (required): FxdParams
• Key events block (required if data points block is not present): KeyEvents
• Link parameters block (optional): LnkParams
• Data points block (required if key events block is not present): DataPts
• Special proprietary blocks (optional): these appear to be vendor-specific.
• Checksum block (optional): Cksum

The Map block is the first block, containing the format version number and details of the blocks to
follow. Each of the individual blocks in the file is described by its own "map", which consists of the
name of the block (a string), a version number, and the size of the block (in bytes). These "maps"
also specify the order in which the blocks appear in the file; the order can differ from vendor to
vendor. However, the checksum block always appears as the last block, an arrangement that makes
sense for calculating the checksum of the file and then appending it.

After the Map block come the individual blocks that contain the actual data, in the order described
in the Map block.

One difference between the older 1.x version and the newer 2.x version is that blocks in the 2.x
version are preceded by the name of the block (e.g., GenParams), while in the older version they are
not. The preceding block name is redundant, but it affords an extra layer of sanity checking.

The map block

In the newer 2.x version, the Map block starts with the string "Map", followed by a terminating '\0'
character. The older 1.x version does not have the 'Map\0' heading. Following the 'Map\0' heading
(or at the very beginning of the file, in the 1.x version) are 8 bytes. These are:
• 0-1: version number
• 2-5: number of bytes in the Map block
• 6-7: number of blocks

All numbers are unsigned integers (recall that all are in little-endian order). The version number is
encoded as 100 times the version number. For example, version 1.10 would be encoded as the
number 110. The number of bytes contained in the Map block includes the 'Map\0' header and the
8 bytes of information that follow.

The 8 bytes of information are followed by individual "maps" of the blocks that are to follow. Each
of the block "maps" consists of the block name (a string terminated by the '\0' character), followed
by 6 bytes. The 6 bytes are:
• 0-1: version number
• 2-5: number of bytes in the block

The version numbers are usually the same, especially for the standard/required blocks, but they can
differ for the special proprietary blocks. The version number actually cited as the version number
of the file appears to be the version number from the FxdParams block.
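
Putting the above together, here is a minimal Python sketch of a Map block parser for the 2.x
format (the function and variable names are my own; a 1.x file would be handled the same way,
minus the 'Map\0' header):

    import struct

    def parse_map_block(data):
        """Parse the Map block at the start of a 2.x SOR file."""
        assert data[:4] == b"Map\x00", "missing 'Map\\0' header (1.x file?)"
        pos = 4
        # nblocks is the number of blocks in the file (not used further here)
        version, map_bytes, nblocks = struct.unpack_from("<HIH", data, pos)
        pos += 8
        blocks = []
        # One "map" per block: '\0'-terminated name, version (u16), size (u32);
        # the maps end where the Map block itself ends (map_bytes).
        while pos < map_bytes:
            end = data.index(b"\x00", pos)
            name = data[pos:end].decode("ascii")
            bver, bsize = struct.unpack_from("<HI", data, end + 1)
            blocks.append((name, bver / 100.0, bsize))
            pos = end + 7  # name + '\0' + 6 bytes
        return version / 100.0, blocks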

The general parameters block


The formats of the 1.x version and the 2.x version are slightly different. I will describe the newer
2.x version first.

The general parameters block starts with the "GenParams\0" heading (string followed by a
terminating '\0' character), then two bytes that indicate the language (EN for English). This is
followed by the fields below (all strings include a terminating '\0' character unless indicated
otherwise):

1. cable ID: string

2. fiber ID: string

3. fiber type: 2-byte unsigned integer

4. wavelength: 2-byte unsigned integer

5. location A (starting location): string

6. location B (ending location): string

7. cable code (or fiber type): string

8. build condition: 2 characters (no terminating '\0')

9. user offset: 4-byte integer

10. user offset distance: 4-byte integer (only in version 2.x)

11. operator: string

12. comments: string

The interpretation of the cable code field (7th field) seems to vary from vendor to vendor, with
some using it as the fiber type.

The fiber type (3rd field) is an integer that indicates the type of fiber. The encoding is as follows
(see here for more details):
• 651: ITU-T G.651 (multi-mode fiber)
• 652: ITU-T G.652 (standard single-mode fiber)
• 653: ITU-T G.653 (dispersion-shifted fiber)
• 654: ITU-T G.654 (1550nm loss-minimized fiber)
• 655: ITU-T G.655 (nonzero dispersion-shifted fiber)

The wavelength (4th field) is the wavelength in nanometers: 1310 means 1310.0nm. Note that the
wavelength encoding in the fixed parameters block (described below) has a scaling factor of 10, but
the encoding here (in the general parameters block) does not.

Build condition (8th field) consists of two characters. The encoding is as follows:
• BC: as-built
• CC: as-current
• RC: as-repaired
• OT: other

The string fields may contain the newline or carriage-return characters.
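
As an illustration of how these fields might be read, here is a minimal Python sketch of a 2.x
GenParams parser (the names are mine, and treating the two user offset fields as signed is my
assumption):

    import struct

    def read_string(data, pos):
        """Read a '\0'-terminated string; return (string, new position)."""
        end = data.index(b"\x00", pos)
        return data[pos:end].decode("latin-1"), end + 1

    def parse_genparams(data, pos):
        """Parse a 2.x GenParams block starting at pos."""
        assert data[pos:pos + 10] == b"GenParams\x00"
        pos += 10
        rec = {"language": data[pos:pos + 2].decode("ascii")}
        pos += 2
        for key in ("cable ID", "fiber ID"):
            rec[key], pos = read_string(data, pos)
        rec["fiber type"], rec["wavelength (nm)"] = struct.unpack_from("<HH", data, pos)
        pos += 4
        for key in ("location A", "location B", "cable code"):
            rec[key], pos = read_string(data, pos)
        rec["build condition"] = data[pos:pos + 2].decode("ascii")
        pos += 2
        # treating both offset fields as signed is an assumption
        rec["user offset"], rec["user offset distance"] = struct.unpack_from("<ii", data, pos)
        pos += 8
        for key in ("operator", "comments"):
            rec[key], pos = read_string(data, pos)
        return rec, pos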


The format for the 1.x version is similar, but it does not have the 'GenParams\0' header, and the
fiber type and user offset distance fields are missing. As far as I can tell, there is no encoding of the
fiber type in the 1.x version format. (Thanks to Andrew from the UK for the information on the
user offset and user offset distance parameters; I am not sure about the units though.)

The supplier parameters block


The supplier parameters block starts with the 'SupParams\0' string in the 2.x version format; this
header is absent in the 1.x version format. The fields in the supplier parameters block in the 2.x
version format are as follows (all are strings terminated by the '\0' character):

1. supplier name

2. OTDR name

3. OTDR serial number

4. module name

5. module serial number

6. software version

7. other

The fixed parameters block


The fixed parameters block starts with the 'FxdParams\0' string in the 2.x version format; this
header is absent in the 1.x version format. The parameters in this block are often listed as
"measurement parameters".

The fields for the 2.x version format are as follows (all are unsigned integers unless otherwise
noted):

• 0-3: date/time: 4 bytes

• 4-5: units: 2 characters

• 6-7: wavelength: 2 bytes


• 8-11: acquisition offset: 4 bytes integer

• 12-15: acquisition offset distance: 4 bytes integer

• 16-17: number of pulse width entries: 2 bytes (the next three parameters are repeated
according to the number of entries)

• 18-19: pulse-width: 2 bytes (repeated)

• 20-23: sample spacing: 4 bytes (repeated)

• 24-27: number of data points in trace: 4 bytes (repeated)

• 28-31: index of refraction: 4 bytes

• 32-33: backscattering coefficient: 2 bytes

• 34-37: number of averages (?): 4 bytes

• 38-39: averaging time: 2 bytes

• 40-43: range (?): 4 bytes

• 44-47: acquisition range distance: 4 bytes signed int

• 48-51: front panel offset: 4 bytes signed int

• 52-53: noise floor level: 2 bytes

• 54-55: noise floor scaling factor: 2 bytes signed int

• 56-57: power offset first point: 2 bytes

• 58-59: loss threshold: 2 bytes

• 60-61: reflection threshold: 2 bytes

• 62-63: end-of-transmission threshold: 2 bytes


• 64-65: trace type: 2 characters

• 66-69: X1: 4 bytes signed int

• 70-73: Y1: 4 bytes signed int

• 74-77: X2: 4 bytes signed int

• 78-81: Y2: 4 bytes signed int

The date/time field is a 4-byte unsigned integer that is Unix (or POSIX) time, i.e., the number of
seconds that have elapsed since 00:00:00 UTC on January 1st, 1970. The two bytes that follow it
may be related to the time zone (there are cases where the time displayed by the free OTDR readers
is off by one hour from what I get by interpreting it as Unix time), but so far I have not been able
to determine how it is encoded.

"Units" is represented by two characters. These are:


• km: kilometers
• mt: meters
• ft: feet
• kf: kilo-feet
• mi: miles

Since distances and positions are expressed as time-of-travel (more on this later), the units
specification is only a request for how the distances and positions are displayed.

The wavelength is encoded as an unsigned integer that is 10 times the wavelength in nanometers
(this is different from the wavelength field in the general parameters block).

Pulse-width is an unsigned integer in nanoseconds.

Sample spacing is an unsigned integer, representing the time interval of the sample points. These
are in units of 0.01 picoseconds (for example, a value of 2 means 0.02 picoseconds). To convert it
into meters, multiply the integer by 10⁻⁸ to turn it into microseconds, then multiply by the speed of
light,
c = 299.792458 m/µsec,
and finally divide by the index of refraction (IOR) of the fiber (which is explained next). This then
becomes the "resolution" of the measurement.
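
In code, the conversion might look like this (a minimal sketch; the function name is mine):

    def spacing_to_meters(spacing, ior):
        """Convert a raw sample-spacing integer (units of 0.01 ps) to meters."""
        time_usec = spacing * 1e-8           # 0.01 ps = 1e-8 µsec
        return time_usec * 299.792458 / ior  # c = 299.792458 m/µsec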

The refractive index is represented as an unsigned integer that is 10⁵ times the value of the index of
refraction (IOR).

The backscattering coefficient is an unsigned integer. Multiply the integer by -0.1 to get dB.

The number of averages is an unsigned integer. This one is also a mystery. There are cases where it
matches the values from the established software programs; in other cases, the values are off by
some multiple, but I have failed to discover what determines the multiple. I might be completely
wrong in assuming that this is the number of averages!

The averaging time is an unsigned integer, in seconds.

The range is an unsigned integer. To convert it into kilometers, multiply by 2×10⁻⁵. However, this is
not quite right. In some instances the numbers agree with the established (commercial) software
programs; in other cases, the values come up short. It might be that these software programs round
this value up to standard values, but this is only a guess.

The loss, reflection, and end-of-transmission (EOT) thresholds are the specified values for
determining when one of the "events" occurs. They are unsigned integers. To convert the integers
into dB values, multiply the loss and EOT thresholds by 0.001, and multiply the reflection value by
-0.001.
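
A sketch collecting the scaling factors described above in one place (the function and field names
are mine):

    def decode_fxdparams(wavelength, ior, backscatter, loss_thr, refl_thr, eot_thr):
        """Apply the scaling factors to the raw FxdParams integers."""
        return {
            "wavelength (nm)": wavelength * 0.1,  # stored as 10x nanometers
            "index of refraction": ior * 1e-5,    # stored as 10^5 x IOR
            "backscattering (dB)": backscatter * -0.1,
            "loss threshold (dB)": loss_thr * 0.001,
            "reflection threshold (dB)": refl_thr * -0.001,
            "EOT threshold (dB)": eot_thr * 0.001,
        }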

Trace type is represented by two characters. These are:


• ST: standard trace
• RT: reverse trace
• DT: difference trace
• RF: reference

The format for version 1.x is similar, but does not have some of the fields of the 2.x version. The fields are:

• 0-3: date/time: 4 bytes

• 4-5: units: 2 characters

• 6-7: wavelength: 2 bytes

• 8-11: acquisition offset: 4 bytes integer


• 12-13: number of pulse width entries: 2 bytes (the next three parameters are repeated
according to the number of entries)

• 14-15: pulse-width: 2 bytes (repeated)

• 16-19: sample spacing: 4 bytes (repeated)

• 20-23: number of data points in trace: 4 bytes (repeated)

• 24-27: index of refraction: 4 bytes

• 28-29: backscattering coefficient: 2 bytes

• 30-33: number of averages: 4 bytes

• 34-37: range (?): 4 bytes

• 38-41: front panel offset: 4 bytes signed int

• 42-43: noise floor level: 2 bytes

• 44-45: noise floor scaling factor: 2 bytes signed int

• 46-47: power offset first point: 2 bytes

• 48-49: loss threshold: 2 bytes

• 50-51: reflection threshold: 2 bytes

• 52-53: end-of-transmission threshold: 2 bytes

The parameters for the 1.x format are handled and converted in the same way as the 2.x format,
with the same caveats described earlier. Note that there is no trace-type or averaging time in the 1.x
format.

As mentioned above, the wavelength is scaled by a factor of 10 (13100 for 1310.0nm); however, I
have seen cases in the 1.x format where there is no ×10 scaling factor.

Thanks to Andrew for supplying the information about the acquisition offset (AO), acquisition
offset distance (AOD), and number of pulse width entries (TPW). I am not sure about the units. The
number of pulse width entries determines the number of times that the next three parameters (pulse-
width, sample spacing, and number of data points) are repeated. This corresponds to the number of
traces in the data block (to be described below). (Reader Tom Denesyk had pointed this out in an
earlier comment, but I had failed to understand what he meant at that time.)

Neither Andrew nor I have ever seen any SOR files "in the wild" with more than one trace,
however.

Andrew also supplied information about the acquisition range distance, front panel offset, noise
floor level, noise floor scaling factor, and power offset first point. I am (again) not sure about the
units. The noise floor is presumably the negative of the noise floor level divided by the noise floor
scaling factor, in dB.

Finally, for the 2.x format, there are the X1,Y1, X2,Y2 parameters, which presumably define some
region or viewing window in the trace plot (thanks to Andrew again). However the details are not
known.

Key events block


The key events block starts with the 'KeyEvents\0' string in the 2.x version format; this header is
absent in the 1.x version format. The formats of the 1.x version and 2.x version are slightly
different. I will start with the 2.x version:

The first two bytes following the header are an unsigned integer that is the total number of events.
Each event is a fixed 42-byte record followed by a '\0'-terminated comment string (which may be
empty). The fixed 42 bytes are as follows:
• 00-01: event number (1, 2, 3, etc.); 2 bytes, unsigned integer
• 02-05: time-of-travel; 4 bytes, unsigned integer
• 06-07: slope; 2 bytes, signed integer
• 08-09: splice loss; 2 bytes, signed integer
• 10-13: reflection loss; 4 bytes, signed integer
• 14-21: event type: 8 characters
• 22-25: end of previous event (is 0 if first event)
• 26-29: beginning of current event
• 30-33: end of current event
• 34-37: beginning of next event (equals range if last event)
• 38-41: peak point in current event

The event number count starts from 1. The time-of-travel (which stands in for distance) is
represented as an unsigned integer, in units of 0.1 nanoseconds (for example, a value of 2 means
0.2 nanoseconds). To convert to distance, use a formula similar to the one used in the fixed
parameters block:
(distance in kilometers) = (integer value) × 10⁻⁴ × c / (refractive index)

where c = 0.299792458 km/µsec is the speed of light in vacuum.
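
A minimal sketch of this conversion (the function name is mine):

    C_KM_PER_USEC = 0.299792458  # speed of light in vacuum, km/µsec

    def travel_time_to_km(value, ior):
        """Convert a time-of-travel integer (units of 0.1 ns) to kilometers."""
        return value * 1e-4 * C_KM_PER_USEC / ior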

Slope, splice loss, and reflection loss are all signed integers, and are multiplied by 0.001 to become
dB/km (for slope) and dB.

Event type is represented by a string of the form nx9999LS, where n and x are single characters. x
appears to represent (or correlate with) the "mode" in which the event was added or declared:
• when x is 'A', it is manual mode; otherwise it is auto mode.
• x can also be the characters 'E', 'F', 'M', or 'D', but I have not discovered what they signify,
except that 'E' appears to signify the end of the fiber.

The n character is a number: 0, 1, or 2. 0 is a loss or gain in power; 1 is a reflection; and 2 means
that it is a "multiple event".

The start and end positions of the current, previous, and next events, and the peak position of the
current event, are all integers that also represent time-of-travel (in units of 0.1 nanoseconds).
Translation of these integers to kilometers follows the same formula as the distance encoding of the
event.
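
Putting the pieces together, a 2.x event record might be parsed like this (a sketch; the names are
mine, I am assuming the five position fields are unsigned, and the travel_time_to_km helper from
above is reused):

    import struct

    def parse_event(data, pos, ior):
        """Parse one 42-byte event record plus its trailing comment string."""
        number, travel, slope, splice, refl = struct.unpack_from("<HIhhi", data, pos)
        ev_type = data[pos + 14:pos + 22].decode("ascii")
        positions = struct.unpack_from("<5I", data, pos + 22)  # prev end ... peak
        pos += 42
        end = data.index(b"\x00", pos)                         # comment string
        comment = data[pos:end].decode("latin-1")
        return {
            "number": number,
            "distance (km)": travel_time_to_km(travel, ior),
            "slope (dB/km)": slope * 0.001,
            "splice loss (dB)": splice * 0.001,
            "reflection loss (dB)": refl * 0.001,
            "type": ev_type,
            "positions (km)": [travel_time_to_km(p, ior) for p in positions],
            "comment": comment,
        }, end + 1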

Following the end of all the event records are 22 bytes. These are encoded as follows:
• 00-03: total loss: 4 bytes, signed integer
• 04-07: fiber start position: 4 bytes, signed integer
• 08-11: fiber length: 4 bytes, unsigned integer
• 12-13: Optical return loss (ORL): 2 bytes, unsigned integer
• 14-17: duplicate of 04-07 (fiber start position)
• 18-21: duplicate of 08-11 (fiber length)

The total loss integer and ORL values are multiplied by 0.001 to become dB. The fiber start position
and fiber length are handled the same way as before, namely:

(distance in kilometers) = (integer value) × 10⁻⁴ × c / (refractive index)

Note that the fiber start position can be a negative number. In all examples that I've seen, the last 8
bytes are just duplicates of the fiber start position and the fiber length.

The format for version 1.x is similar, but each event record is a fixed 22 bytes plus a '\0'-terminated
comment string:
• 00-01: event number (1, 2, 3, etc.); 2 bytes, unsigned integer
• 02-05: unadjusted distance; 4 bytes, unsigned integer
• 06-07: slope; 2 bytes, signed integer
• 08-09: splice loss; 2 bytes, signed integer
• 10-13: reflection loss; 4 bytes, signed integer
• 14-21: event type: 8 characters

The only difference is that the start/end position information is absent. The trailing 22 bytes in the
version 1.x format appear to be the same as in the 2.x version, but the numbers for the fiber starting
position do not match or make sense in the examples that I have studied.

The data points block


We finally come to the data points block, which encodes the trace curve itself. Similar to the other
blocks, this block starts with the header string 'DataPts\0' in the 2.x version format, but the header
is absent in the 1.x version format. The formats of the 1.x and 2.x versions are otherwise the same.

After the header (if applicable), the data points block starts with 12 bytes. The first 4 bytes are an
unsigned integer that is the number of data points (this will be the same as the number of data
points from the fixed parameters block). This is followed by 2 bytes that give the number of traces
(corresponding to the number of pulse width entries in the FxdParams block).

The next 4 bytes are a repeat of the number of data points, followed by another 2 bytes (unsigned
integer) that are the scaling factor (almost always 1000).

After the initial 12 bytes comes the OTDR trace data (dB vs. distance). Each data point is a 2-byte
unsigned integer. Multiply it by the negative of the scaling factor times 10⁻⁶ to translate the value
into dB (this makes all values zero or negative). Different OTDR SOR file readers offset the data
differently; some offset the data so that the highest reading is 0 dB, while others add an offset to
make the minimum reading 0 dB.

If there is more than one trace, the pattern repeats: 4 bytes for the number of data points, 2 bytes
for the scaling factor, then the trace data.
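
A minimal sketch of a data points parser, including the (hypothetical) multi-trace case (the names
are mine):

    import struct

    def parse_datapts(data, pos):
        """Parse the data points block, starting just after any 'DataPts\0' header."""
        total_points, ntraces = struct.unpack_from("<IH", data, pos)
        pos += 6
        traces = []
        for _ in range(ntraces):
            npoints, scale = struct.unpack_from("<IH", data, pos)
            pos += 6
            raw = struct.unpack_from("<%dH" % npoints, data, pos)
            pos += 2 * npoints
            # multiply by -(scale x 1e-6) to get dB (zero or negative values)
            traces.append([-(scale * 1e-6) * v for v in raw])
        return traces, pos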

The data points are equally spaced, by the "sample spacing" value specified in the fixed parameters
block (after converting it into distance, i.e., the "resolution" of the measurement). One can apply
the "fiber start position" value specified in the key events block, but some OTDR SOR file readers
do not do this.

The scaling factor is almost always 1000. In that case, the 2 bytes per data point can only give you
a maximum of 65.535 dB of dynamic range. I have personally never encountered a SOR file where
this scaling factor was not 1000; the information about the scaling factor was kindly communicated
to me by a reader (see the acknowledgment section at the end), which I am reporting here, but
unfortunately I have not been able to verify it personally.

(Thanks again to Andrew for explaining the structure for multiple traces. As noted above, neither of
us has actually seen SOR files with more than one trace.)

The checksum block

The checksum block starts with a 'Cksum\0' header in the 2.x format; as with the other blocks, the
header is absent in the 1.x format. The checksum value itself is 2 bytes (16 bits). The algorithm for
calculating the checksum uses a particular 16-bit CRC (cyclic redundancy check) function. For
readers not familiar with CRC algorithms, please see the excellent article A Painless Guide to CRC
Error Detection Algorithms by Ross N. Williams (1993). The specific flavor used for the OTDR
SOR format is sometimes known as "CRC-16/CCITT-FALSE" (for a catalog of different CRC-16
algorithms, please see Catalogue of parametrised CRC algorithms with 16 bits).

For a Perl implementation of CRC functions, please see the Digest::CRC module; for a Python
implementation, please see the crcmod module.

Since there are several variants of the CRC-16 algorithm, and there is some confusion of the names
and exact definitions, I will spell out the exact parameters below, following the convention in the
Painless Guide document:
• Width: 16
• Poly: 0x1021
• Init: 0xFFFF
• RefIn: False
• RefOut: False
• XorOut: 0x0000
• Check: 0x29B1 (with an input string of "123456789")

The last item is useful for checking whether the implementation you use is the correct one: when
given the string "123456789", the checksum should come out to be 0x29B1.
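
With the crcmod module mentioned above, these parameters translate into something like the
following (note that crcmod expects the polynomial with its leading bit set, hence 0x11021):

    import crcmod

    # CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, not reflected, no final XOR
    crc16 = crcmod.mkCrcFun(0x11021, initCrc=0xFFFF, rev=False, xorOut=0x0000)

    assert crc16(b"123456789") == 0x29B1  # the "check" value from the list above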

The exact algorithm for calculating the checksum (for both versions 1.x and 2.x) is as follows: take
the whole content of the file up to and including the 'Cksum\0' header (i.e., everything except the
final two bytes) as one huge binary string, and calculate the checksum on this string. The checksum
will be two bytes (16 bits), which are then appended to the file (following the 'Cksum\0' header).
However, the two bytes need to be swapped, because the convention in the SOR file is to store
numbers in little-endian byte order. For example, if the checksum is 0xD680, the last two bytes of
the SOR file are 0x80, 0xD6.

This turns out to be very awkward in some ways. A very interesting property of the CRC algorithm
is that, if you append the two bytes of the 16-bit checksum to the file (or string) and then run the
CRC function on the result, the checksum will be zero! But this only works if the checksum is
appended in big-endian byte order. Said another way: take the last two bytes of the SOR file, swap
them, and run the checksum function on the whole thing. If the result is zero, then everything is
okay (or rather, is most likely okay; the CRC error-detection code will detect most errors, but not
all of them). It would have been nice if the byte-swapping of the two checksum bytes were not
necessary.
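
In code, the verification might look like this (a sketch using crcmod):

    import crcmod

    crc16 = crcmod.mkCrcFun(0x11021, initCrc=0xFFFF, rev=False, xorOut=0x0000)

    def verify_sor(filename):
        """Verify a SOR file's checksum using the swap-and-rerun trick above."""
        with open(filename, "rb") as f:
            data = f.read()
        # Swap the last two (little-endian) checksum bytes back to big-endian
        # order, then CRC the whole thing; zero means the file checks out.
        swapped = data[:-2] + data[-1:] + data[-2:-1]
        return crc16(swapped) == 0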

One last word of caution: I have checked this CRC-16 algorithm against many versions of 1.x and
2.x files, and almost all of them check out. I believe that the algorithm is correct, but I have found
exceptions in a very few sample SOR files from some established software programs, and there is
one particular software program that generates the checksum in a very different way (it does not
match any of the commonly used CRC-16 algorithms, or other commonly known 16-bit checksum
algorithms). On the other hand, that software program also accepts SOR files with the "correct" (as
I see it) checksum, so it is still a mystery to me.

(One reader, gazlan, has noted that JDSU, Anritsu, and EXFO do not use the CRC-16 algorithm.)

Closing Remarks and Acknowledgment

There are still a few parts of the SOR format that need clarification, but I believe almost all of the
encoding, as described in this article, is correct, thanks to the many people who commented and
wrote to me with information and corrections. The simple (and extremely inefficient) pubOTDR
program is basically an implementation of the findings described above (although incomplete). A
Python version is also available: pyOTDR. You are free to use these programs and the information
in this article as you see fit (although I would appreciate some acknowledgment). But once again,
all of this is provided at no cost, with no guarantees and no warranties; use it at your own risk!

After I posted the original article, I had several email exchanges with Dmitry Vaygant
(www.sortraceviewer.ru), who has written his own SOR trace viewer program. I would like to thank
him here for correcting several mistakes and explaining several other parameters in the OTDR
format, specifically in the KeyEvents block and the fixed parameters block (there are significant
corrections in these two sections). In particular, I had originally misinterpreted the "sample
spacing" ("distance spacing") as distance instead of time; what I thought was the "magic number"
was really half the speed of light in vacuum! Dmitry also communicated to me the information
about the scaling factor in the data points section, which is much appreciated.

I would also like to thank John Leis from Australia, who pointed out a few more mistakes in the
descriptions, especially with regard to the wavelength encoding in the general parameters section
(for what it's worth, the code I posted on GitHub was correct; it was my description that was
wrong). I am also very grateful to Andrew from the UK for filling in almost all of the remaining
gaps, as noted in the sections above. There are still a few questions about the units that I haven't yet
figured out, but the description should be essentially complete. Any remaining mistakes that you
see here are of course my own, and I welcome any further corrections, comments, and suggestions.
