
Chapter xx: Exporting NX Data to Excel

■ Introduction
There are many situations where it is useful to exchange information (in either direction) between NX and Excel.
For example, you might want to export an NX bill-of-material or other report to Excel. Alternatively, you might want
to import data from an Excel spreadsheet and use this to assign values to attributes in NX. There are two somewhat
separate steps in the data exchange process: getting data into and out of NX, and getting data into and out of Excel.
These two steps are both discussed in this chapter, even though the second one is not really related to NX.
There are (at least) three different ways to programmatically “push” data to Excel:
• The Automation Approach: use the Excel API to write data into an Excel document.
• The CSV Approach: write a CSV file that can then be imported into Excel.
• The XML Approach: write an XML file that can then be imported into Excel.
These are discussed in the sections below. However, note that the first and second of these are somewhat related,
so you should read about the Automation Approach before reading about the CSV Approach.

■ The Automation Approach


Since its earliest days, Excel has had an API that you can call to gain access to its functions. Driving Excel from a
user-written program in this way is known as “Excel Automation”, and the API is often known as the “Automation
API”. The Excel API was originally built around a technology called COM, which was popular in the 1990s.
Specifically, the Excel objects that are exposed by this original API are COM objects, and Excel is said to be a “COM
server”. Although Excel has changed dramatically over the last 20 years, its API is still fundamentally based on COM.
Today, many people want to use .NET languages like Visual Basic and C# to automate Excel, so there needs to be
some way to access the COM API from these languages. The solution is an interoperability layer — a collection of
functions that “wrap” the COM API and make it easier to call from .NET code. The wrapper functions are contained
in DLLs called “interoperability assemblies”, which you must reference in order to access Excel functions. Of course,
if you want to call Excel functions, you still need a copy of Excel itself; the interoperability assemblies by
themselves are just the glue, so they are not sufficient.

Excel Macros
The functions in the Excel API (whether wrapped or not) correspond very closely with the functions of interactive
Excel. You can understand this correspondence by recording macros in Excel and examining their contents. For
example, if you record the actions of typing “xyz” into cell C3, and making it bold, your macro will contain the
following code:

Range("C3").Select
ActiveCell.FormulaR1C1 = "xyz"
Selection.Font.Bold = True

This code uses the VBA language (Visual Basic for Applications), but translating it into VB.NET or other languages
is typically straightforward. This is the same process that we use to discover functions in the NX/Open API, though
NX has the added advantage that it can record macro code in several languages, not just in VBA.
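As a rough illustration, the recorded macro above might be translated into VB.NET along the following lines. This is only a sketch: it assumes a reference to Microsoft.Office.Interop.Excel and an installed copy of Excel, the class name MacroTranslation is made up for this illustration, and it leaves Excel open without doing any COM cleanup.

```vbnet
Imports Excel = Microsoft.Office.Interop.Excel

Public Class MacroTranslation

    Public Shared Sub Main()

        ' Start Excel and create an empty workbook to work in
        Dim app As Excel.Application = New Excel.Application()
        app.Visible = True
        Dim workBook As Excel.Workbook = app.Workbooks.Add()
        Dim workSheet As Excel.Worksheet = CType(workBook.ActiveSheet, Excel.Worksheet)

        ' VBA: Range("C3").Select : ActiveCell.FormulaR1C1 = "xyz"
        ' Here we address the cell directly instead of selecting it first
        Dim cell As Excel.Range = workSheet.Range("C3")
        cell.Value = "xyz"

        ' VBA: Selection.Font.Bold = True
        cell.Font.Bold = True

    End Sub

End Class
```

The main difference from the recorded macro is that the code works with an explicit Range object rather than relying on the current selection, which is usually more robust when Excel is being driven from another program.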

Example
This example shows you how you can write data into an Excel spreadsheet and format the cells. We use a fictitious
set of part records to illustrate the techniques. Each part has an 18-digit part number, a weight, a cost, and a
purchase date. An array of part records is returned from a function called GetParts, which you have to provide. In
an NX scenario, the GetParts function would probably read data from NX attributes.
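For experimenting with the code that follows, a stand-in for GetParts could look something like the sketch below. The PartRecord class and its property types are assumptions made for illustration (the chapter only names the four fields), and the values are simply hard-coded rather than read from NX attributes. Note that the part number is held as a string, so that all 18 digits are preserved.

```vbnet
Imports System

' Hypothetical record type; the field names match those used in the example code
Public Class PartRecord
    Public Property PartNumber As String      ' 18 digits, held as text
    Public Property Weight As Double
    Public Property Cost As Double
    Public Property LastPurchased As Date
End Class

Public Module PartData

    ' Stand-in for the real GetParts function: returns three hard-coded records.
    ' In an NX scenario, this data would come from NX attributes instead.
    Public Function GetParts() As PartRecord()
        Return New PartRecord() {
            New PartRecord With {.PartNumber = "123456123456123456", .Weight = 14.75,
                                 .Cost = 1995, .LastPurchased = #2/3/2012#},
            New PartRecord With {.PartNumber = "234567234567234567", .Weight = 2.75,
                                 .Cost = 675, .LastPurchased = #6/11/2012#},
            New PartRecord With {.PartNumber = "345678345678345678", .Weight = 0.25,
                                 .Cost = 69, .LastPurchased = #12/17/2011#}
        }
    End Function

End Module
```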

Getting Started with SNAP Chapter xx: Sharing Data with MS Office Page 1
To use the code shown below, you will need to have a reference to Microsoft.Office.Interop.Excel. As its name
suggests, this is the interoperability assembly for Excel, and you can find it on the .NET tab of the Add Reference
dialog in Visual Studio, as shown below:

On the COM tab, you can find another Excel library, called Microsoft Excel 14.0 Object Library. This will work, too,
but the one on the .NET tab is preferable. Then, the code is as follows:

Imports Excel = Microsoft.Office.Interop.Excel

Public Class AutomateExcel

    Public Shared Sub Main()

        Dim app As Excel.Application = New Excel.Application()
        app.Visible = True
        Dim workBook As Excel.Workbook = app.Workbooks.Add()
        Dim workSheet As Excel.Worksheet = workBook.ActiveSheet
        Dim cells As Excel.Range

        ' Change the formatting of the columns
        cells = workSheet.Columns(1)
        cells.NumberFormat = "@"              ' Format column #1 as text
        cells.ColumnWidth = 22                ' Adjust the column width
        cells = workSheet.Columns(2)
        cells.NumberFormat = "0.00"           ' Format with 2 decimal places
        cells = workSheet.Columns(3)
        cells.NumberFormat = "$#,##0"         ' Format column #3 as currency
        cells = workSheet.Columns(4)
        cells.NumberFormat = "dd-mmm-yyyy"    ' Format column #4 as dates

        ' Get the part data (from somewhere)
        Dim parts As PartRecord() = GetParts()

        ' Write the part data into cells (Excel rows start at 1, array indices at 0)
        For i As Integer = 1 To parts.Length
            cells = workSheet.Cells(i, 1) : cells.Value = parts(i-1).PartNumber
            cells = workSheet.Cells(i, 2) : cells.Value = parts(i-1).Weight
            cells = workSheet.Cells(i, 3) : cells.Value = parts(i-1).Cost
            cells = workSheet.Cells(i, 4) : cells.Value = parts(i-1).LastPurchased
        Next

        ' Save, close, and quit
        workBook.SaveAs("D:\public\parts.xlsx")
        workBook.Close()
        app.Quit()

        ' Clean up COM objects (details later)
        Cleanup(cells, workSheet, workBook, app)

    End Sub

End Class

Most of the code is straightforward, and should be easy to understand. One thing to note is that cell numbering in
Excel starts at 1, not at 0. So, cell “C2” is Cells(2,3), for example.
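If you find the (row, column) ordering easy to mix up, a small helper that converts 1-based indices into the familiar letter-and-number cell name can be handy for logging or debugging. The function below is a hypothetical utility written for this illustration, not part of the Excel API.

```vbnet
Imports System

Public Module CellNaming

    ' Converts 1-based (row, column) indices into an A1-style cell name,
    ' so that (2, 3) gives "C2" and (1, 27) gives "AA1"
    Public Function CellName(row As Integer, column As Integer) As String
        If row < 1 OrElse column < 1 Then
            Throw New ArgumentOutOfRangeException("Row and column numbers start at 1")
        End If

        ' Column letters work like a base-26 number that has no zero digit
        Dim letters As String = ""
        Dim n As Integer = column
        While n > 0
            Dim remainder As Integer = (n - 1) Mod 26
            letters = ChrW(AscW("A"c) + remainder) & letters
            n = (n - 1) \ 26
        End While

        Return letters & row.ToString()
    End Function

End Module
```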

The only real mystery is the last step, where we “clean up COM objects”. This step is necessary because we are
using (indirectly) a COM API. When our VB code defines objects like the Excel application, the workbook, and the
worksheet, hidden COM objects are created, and the normal .NET garbage collection does not release these
promptly, which can leave a hidden Excel process running after our program has finished. So, when we are
finished with these objects, we have to take care of releasing them ourselves. Some code to do this cleanup is
shown below. Don’t worry if you don’t understand this code; many experienced programmers don’t understand it,
either. Just place this code inside your AutomateExcel class, accept that it’s necessary, and try not to worry
about it too much.

Private Shared Sub Cleanup(ParamArray objs As Object())
    GC.Collect()
    GC.WaitForPendingFinalizers()
    For Each obj As Object In objs
        System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
        obj = Nothing
    Next
End Sub

If you really want to know more, you can start here:


http://stackoverflow.com/questions/158706/how-to-properly-clean-up-excel-interop-objects?rq=1
Assuming you have provided a working GetParts function, running the code should produce an Excel spreadsheet
that looks something like this:

Text Versus Numbers


You may have noticed that our code formatted the first column as text, even though our part numbers are
numerical. Excel thinks this might be a mistake, and displays the little green triangles in the “A” column to get our
attention. But, in fact, there is no mistake — text formatting is the right choice in this situation. To explore this
further, remove the two lines of code that say:

cells = workSheet.Columns(1)
cells.NumberFormat = "@" ' Format column #1 as text

and run the code again. The result will be this spreadsheet:

Since we are no longer providing any help, Excel tries to make a guess about the contents of column #1, and it
guesses that they should be numbers, and stores them internally as numbers. But, Excel numbers only have around
15 digits of precision, so, as the display in the formula bar shows, the last 3 digits of each part number have been
lost. Clearly, text formatting (and text storage) is needed, here.
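When exporting data you may not know in advance which values are at risk. A simple check like the one sketched below can flag all-digit strings that would not survive storage as an Excel number, either because they have more than 15 significant digits or because they have leading zeros (which numeric storage drops). The function name NeedsTextFormat is made up for this illustration, and the test is deliberately crude.

```vbnet
Imports System

Public Module NumberFormatGuard

    ' Returns True if an all-digit string should be written to a text-formatted
    ' cell, because Excel's numeric storage would alter it
    Public Function NeedsTextFormat(value As String) As Boolean
        If value Is Nothing Then Return False
        Dim digits As String = value.Trim()
        If digits.Length = 0 Then Return False

        ' Anything that is not purely digits is outside the scope of this check
        For Each c As Char In digits
            If Not Char.IsDigit(c) Then Return False
        Next

        ' Leading zeros are dropped when a value is stored as a number
        Dim hasLeadingZero As Boolean = (digits.Length > 1 AndAlso digits(0) = "0"c)

        ' Excel numbers keep only about 15 significant digits
        Dim tooLong As Boolean = (digits.TrimStart("0"c).Length > 15)

        Return hasLeadingZero OrElse tooLong
    End Function

End Module
```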

■ The CSV Approach


A “CSV” file is a simple text file in which each line of text represents a data record that will become a row of the
eventual spreadsheet. Each line of text consists of data “fields” that are separated by commas or other delimiters.
These files can be imported into Excel either manually, by a user running Excel, or programmatically using
functions in the Excel automation API. The main benefit of CSV files is their simplicity. This means that they are
easy to write, and can be imported into a wide range of spreadsheet and database applications.

Their main disadvantage is that they contain no formatting information. However, some simple formatting options
can be specified when importing the data into Excel using the Text Import Wizard:

Roughly these same formatting controls are available if the file is imported into Excel programmatically by calling the
OpenText function. This function has the following arguments:
void OpenText(
    string Path,
    [object Origin = System.Type.Missing],
    [object StartRow = System.Type.Missing],
    [object DataType = System.Type.Missing],
    [Excel.XlTextQualifier TextQualifier = Excel.XlTextQualifier.xlTextQualifierDoubleQuote],
    [object ConsecutiveDelimiter = System.Type.Missing],
    [object Tab = System.Type.Missing],
    [object Semicolon = System.Type.Missing],
    [object Comma = System.Type.Missing],
    [object Space = System.Type.Missing],
    [object Other = System.Type.Missing],
    [object OtherChar = System.Type.Missing],
    [object FieldInfo = System.Type.Missing],
    [object TextVisualLayout = System.Type.Missing],
    [object DecimalSeparator = System.Type.Missing],
    [object ThousandsSeparator = System.Type.Missing],
    [object TrailingMinusNumbers = System.Type.Missing],
    [object Local = System.Type.Missing])

The square brackets indicate optional arguments, as usual. The meanings of the arguments are as follows:

Argument Name         Data Type                   Description

Path                  String                      Specifies the path name (including extension) of the
                                                  text file to be imported.
Origin                Object                      Specifies the platform or geographic origin of the text
                                                  file. See below for details.
StartRow              Object (Integer)            The line number at which to start parsing lines of text
                                                  in the input file. The numbering starts at “1”, which is
                                                  the default value.
DataType              Object (XlTextParsingType)  Specifies the column format of the data in the file. Can
                                                  be one of the following XlTextParsingType constants:
                                                  xlDelimited or xlFixedWidth. If this argument is not
                                                  specified, Microsoft Excel attempts to guess the column
                                                  format when it opens the file.
TextQualifier         XlTextQualifier             The text qualifier. This is explained below.
ConsecutiveDelimiter  Object (Boolean)            True to have consecutive delimiters considered as one
                                                  single delimiter. The default is False.
Tab                   Object (Boolean)            True to have the tab character be a field delimiter.
                                                  The default value is False.
Semicolon             Object (Boolean)            True to have the semicolon character be a field
                                                  delimiter. The default value is False.
Comma                 Object (Boolean)            True to have the comma character be a field delimiter.
                                                  The default value is False.
Space                 Object (Boolean)            True to have the space character be a field delimiter.
                                                  The default value is False.
Other                 Object (Boolean)            True to have the character specified by the OtherChar
                                                  argument be a field delimiter. The default value is False.
OtherChar             Object (String)             Required if Other is True. Specifies the delimiter
                                                  character when Other is True. If more than one character
                                                  is specified, only the first character of the string is
                                                  used.
FieldInfo             Object(2, n)                A two-dimensional array containing parse information for
                                                  individual columns of data. See below for further details.
TextVisualLayout      XlTextVisualLayoutType      The visual layout (direction) of the text. The default is
                                                  the system setting (I think) which will usually be
                                                  xlTextVisualLTR (left-to-right), unless you are using a
                                                  language like Hebrew.
DecimalSeparator      Object (String)             The decimal separator that Microsoft Excel uses when
                                                  recognizing numbers. The default setting is the system
                                                  setting.
ThousandsSeparator    Object (String)             The thousands separator that Excel uses when recognizing
                                                  numbers. The default setting is the system setting.
TrailingMinusNumbers  Object (Boolean)            Specify True if numbers with a minus character at the end
                                                  should be treated as negative numbers. If False or
                                                  omitted, numbers with a minus character at the end are
                                                  treated as text.
Local                 Object (Boolean)            Specify True if regional settings of the machine should
                                                  be used for separators, numbers and data formatting.

Some of the more puzzling parameters are described in detail in the paragraphs below.

Origin
This can be one of the following XlPlatform constants: xlMacintosh, xlWindows, or xlMSDOS. Alternatively, this
could be an integer indicating the number of the desired code page. The allowable integer values are shown in the
“File origin” menu in the Text Import Wizard:

If this argument is omitted, the method uses the current setting from the Text Import Wizard.

TextQualifier
This is a character that can be used to enclose a sequence of characters, thereby forcing them to become one cell,
even if they include a delimiter character. For example, suppose that commas are being used as delimiters. Then the
string 1,260 would be split into two cells, even though the intention is probably to create a single cell containing
the number 1260. Similarly, we would probably want to force the string “Monday, July 4th” to be a single cell. Of
course, there is no need for a TextQualifier if you choose delimiter characters that don’t appear within the data
itself.
So, to make three cells from the three numbers 1,260 1,261 1,262, there are two possible approaches:
(1) If you have control over how the file is generated, create it with semicolons or some other characters as
delimiters (not commas), like this: 1,260; 1,261; 1,262. You can then use
TextQualifier = Excel.XlTextQualifier.xlTextQualifierNone. This is usually the easiest approach.

(2) If you’re forced to use commas as delimiters, then you must use a TextQualifier to properly group the data. For
example, you might use TextQualifier = Excel.XlTextQualifier.xlTextQualifierDoubleQuote, and write
your input data as "1,260","1,261","1,262" (using plain double-quote characters).
The three allowable values of XlTextQualifier are:
Excel.XlTextQualifier.xlTextQualifierNone
Excel.XlTextQualifier.xlTextQualifierSingleQuote
Excel.XlTextQualifier.xlTextQualifierDoubleQuote
There is widespread confusion about the TextQualifier argument, possibly as a result of its poorly chosen name.
Many people think that using this argument will force Excel to format the enclosed strings as text (rather than as
numbers). This is not correct. To force Excel to format data as text, you must use the “FieldInfo” parameter.
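If you are writing the file yourself and have to use commas as delimiters, the quoting described in approach (2) can be automated. The sketch below wraps a field in double quotes only when it contains the delimiter or a quote character, doubling any embedded quotes, which is the usual CSV convention. The module and function names are made up for this illustration.

```vbnet
Imports System

Public Module CsvQuoting

    ' Encloses a field in double quotes if it contains the delimiter or a quote
    ' character; embedded quotes are doubled, following the usual CSV convention
    Public Function QuoteField(field As String, delimiter As Char) As String
        If field.IndexOf(delimiter) < 0 AndAlso field.IndexOf(""""c) < 0 Then
            Return field
        End If
        Return """" & field.Replace("""", """""") & """"
    End Function

    ' Builds one line of the file from a list of field values
    Public Function BuildLine(fields As String(), delimiter As Char) As String
        Dim quoted(fields.Length - 1) As String
        For i As Integer = 0 To fields.Length - 1
            quoted(i) = QuoteField(fields(i), delimiter)
        Next
        Return String.Join(delimiter.ToString(), quoted)
    End Function

End Module
```

A file written this way would be imported with TextQualifier set to Excel.XlTextQualifier.xlTextQualifierDoubleQuote. Remember that the quotes only group characters into one cell; they do not make Excel treat the cell as text.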

FieldInfo
This is a two-dimensional array indicating how various columns of data should be parsed and formatted during
import. It is easiest to think of it as a list of pairs of the form (columnNumber, dataType), where columnNumber
indicates which column is under consideration, and dataType is one of the enumerated values from
Excel.XlColumnDataType. The most interesting values of this enum are:

Argument Value                            How Data is Parsed and Formatted

Excel.XlColumnDataType.xlTextFormat       As text
Excel.XlColumnDataType.xlGeneralFormat    Using Excel’s built-in general rules
Excel.XlColumnDataType.xlDMYFormat        As a D/M/Y date
Excel.XlColumnDataType.xlSkipColumn       Skipped (not parsed and not imported)

Here is an example. The code:

Dim textFormat As Excel.XlColumnDataType = Excel.XlColumnDataType.xlTextFormat
Dim dateFormat As Excel.XlColumnDataType = Excel.XlColumnDataType.xlMDYFormat
Dim format As Object(,) = { {1, textFormat}, {3, dateFormat}, {4, textFormat} }

says that
• columns #1 and #4 (the “A” column and the “D” column) should be formatted as text,
• column #3 (the “C” column) should be formatted as dates,
• all other columns should be parsed and formatted as “general”.
The order of the pairs doesn’t matter. The code

Dim format As Object(,) = { {4, textFormat}, {1, textFormat}, {3, dateFormat} }

gives the same result as the code above. If there's no column specifier for a particular column in the input data, the
column is parsed with the General setting, which means that Excel will try to guess the correct format. If the
column contains strings that Excel can recognize as dates, for example, then this column will be formatted as dates
even though you specified a “general” format or no format at all.
If you specify that a column is to be skipped, you must explicitly state the type for all the other columns, or the data
will not parse correctly.
The xlDMYFormat date format seems to have some bugs, but the xlMDYFormat one works fine. Having spaces at the
beginning of a date field will confuse the parsing, just as it does when typing into Excel.
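Since skipping a column means that every other column has to be listed explicitly, it can be convenient to build the FieldInfo array from a simple list of column types, one per column. The helper below is a sketch written for this illustration (BuildFieldInfo is not an Excel function); it assumes a reference to Microsoft.Office.Interop.Excel.

```vbnet
Imports Excel = Microsoft.Office.Interop.Excel

Public Module FieldInfoBuilder

    ' Builds a FieldInfo array that lists every column explicitly.
    ' columnTypes(0) applies to column #1, columnTypes(1) to column #2, and so on.
    Public Function BuildFieldInfo(columnTypes As Excel.XlColumnDataType()) As Object(,)
        Dim info(columnTypes.Length - 1, 1) As Object
        For i As Integer = 0 To columnTypes.Length - 1
            info(i, 0) = i + 1              ' Column numbers start at 1
            info(i, 1) = columnTypes(i)     ' How this column should be parsed
        Next
        Return info
    End Function

End Module
```

For example, with t, g, s and d set to xlTextFormat, xlGeneralFormat, xlSkipColumn and xlMDYFormat respectively, BuildFieldInfo({t, g, s, d}) gives the same array as { {1, t}, {2, g}, {3, s}, {4, d} }, and the result can be passed as the FieldInfo argument.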

Calling the OpenText Function


Since the OpenText function has 18 arguments, calling it can be a bit complicated. However, note that all but the
first argument (the file pathname) are optional, so we can omit them, if we want to.

In its full glory, a call would look something like this:

app.Workbooks.OpenText(
    pathName, origin, startRow,
    dataType, textQualifier, consecutiveDelimiter,
    useTab, useSemicolon, useComma, useSpace, useOther, otherChar,
    myFormat,
    textVisualLayout,
    decimalSeparator, thousandsSeparator, trailingMinusNumbers, local)

But we can take advantage of Visual Basic’s ability to omit optional arguments, and write this, instead:

app.Workbooks.OpenText(
    pathName,
    Semicolon := True,
    DataType := Excel.XlTextParsingType.xlDelimited,
    FieldInfo := myFormat)

The “:=” syntax is used to give values to optional named arguments. Some people use sequences of commas when
omitting arguments, or they use System.Type.Missing as a placeholder, but the approach shown above seems
easier to read and less error-prone.

A Simple Example
Suppose we have the following simple text file, containing part records:
123456123456123456; 14.75; 1,995 ;2/3/2012
234567234567234567; 2.75; 675 ;6/11/2012
345678345678345678; 0.25; 69 ;12/17/2011
The fields represent the part number (18 digits), the weight, the cost in U.S. dollars, and the purchase date. As you
can see, semicolons are used as delimiters. Using commas would complicate things since commas also appear
within the cost field.
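A file like this one is easy to produce from your own data. The sketch below shows one possible way to format a single record and write the lines to disk; the module and function names are made up for this illustration. It uses the invariant culture so that the decimal point, the thousands separator, and the month/day/year date order do not depend on the regional settings of the machine, and it avoids padding the fields with spaces.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Public Module PartFileWriter

    ' Formats one part record as a semicolon-delimited line of text
    Public Function FormatPartLine(partNumber As String, weight As Double,
                                   cost As Double, lastPurchased As Date) As String
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        Return String.Join(";",
                           partNumber,
                           weight.ToString("0.00", inv),
                           cost.ToString("#,##0", inv),
                           lastPurchased.ToString("M/d/yyyy", inv))
    End Function

    ' Writes the formatted lines to a text file, one record per line
    Public Sub WritePartFile(fileName As String, lines As String())
        File.WriteAllLines(fileName, lines)
    End Sub

End Module
```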

The code to import this file would be as follows. Again, note that you’ll need a reference to the
Microsoft.Office.Interop.Excel assembly to use this code:

Imports Excel = Microsoft.Office.Interop.Excel

Class OpenTextExample

    Public Shared Sub Main()

        Dim app As Excel.Application = New Excel.Application()
        app.Visible = True

        Dim path As String = "D:\public\data.txt"
        Dim myType As Excel.XlTextParsingType = Excel.XlTextParsingType.xlDelimited

        ' Define some abbreviations for column formats, for convenience
        Dim t As Excel.XlColumnDataType = Excel.XlColumnDataType.xlTextFormat
        Dim g As Excel.XlColumnDataType = Excel.XlColumnDataType.xlGeneralFormat
        Dim d As Excel.XlColumnDataType = Excel.XlColumnDataType.xlMDYFormat

        ' Format column #1 as text, #2 as general, and #4 as date
        Dim myFormat As Object(,) = { {1, t}, {2, g}, {4, d} }

        ' Import the data file, which will add a new item to the Workbooks collection
        Dim workBooks As Excel.Workbooks = app.Workbooks
        workBooks.OpenText(path, DataType:=myType, Semicolon:=True, FieldInfo:=myFormat)

        ' Save, close, and quit
        Dim workBook As Excel.Workbook = app.ActiveWorkbook
        Dim fileFormat As Excel.XlFileFormat = Excel.XlFileFormat.xlOpenXMLWorkbook
        workBook.SaveAs("D:\public\parts.xlsx", fileFormat)
        workBook.Close()
        app.Quit()

        ' Clean up COM objects
        Cleanup(workBook, workBooks, app)

    End Sub

End Class

This produces the following result:

Specifying the DataType as xlDelimited is necessary, or else Excel will interpret the file as having fixed-width
fields. As you can see, we have asked for the first column to be parsed and formatted as text. Without this request,
the part numbers would be interpreted and stored as numbers, which would cause problems. Also, we need to
specify the “general” format for the second column, or else Excel will mysteriously interpret the 2.75 in cell B2 as a
date (February 1st 1975). Please refer to the discussion earlier in this chapter for more information about the
Cleanup function.

Once we have imported the data, we can use other Excel API functions to format it further, as we saw earlier in this
chapter. To do this, replace the code after the workBooks.OpenText line with the following:

Dim workBook As Excel.Workbook = app.ActiveWorkbook
Dim workSheet As Excel.Worksheet = workBook.ActiveSheet
Dim cells As Excel.Range

cells = workSheet.Columns(1)
cells.Font.Bold = True                ' Make column #1 bold
cells.ColumnWidth = 22                ' Adjust its width
cells = workSheet.Columns(2)
cells.NumberFormat = "0.00"           ' Format column #2 with 2 decimal places
cells = workSheet.Columns(3)
cells.NumberFormat = "$#,##0"         ' Format column #3 as currency
cells = workSheet.Columns(4)
cells.NumberFormat = "dd-mmm-yyyy"    ' Change the date format in column #4

' Save, close, and quit
Dim fileFormat As Excel.XlFileFormat = Excel.XlFileFormat.xlOpenXMLWorkbook
workBook.SaveAs("D:\public\parts.xlsx", fileFormat)
workBook.Close()
app.Quit()

' Clean up COM objects
Cleanup(cells, workSheet, workBook, workBooks, app)

This produces the following results in Excel:

Note the little green triangles in the “A” column. Excel is telling us that the items in this column look like numbers,
but we have formatted them as text, which might be a mistake. It’s not a mistake, in this case, of course, but Excel
gives us the helpful hint, anyway.

Import First and Format Later?


As we have seen, the Excel API has a rich set of functions for formatting cells. So, since the FieldInfo argument of
the OpenText function is a bit complicated, it’s tempting to just import data in the simplest way possible, and then
reformat it later. Unfortunately, this doesn’t always work. In the example above, suppose we did not specify any
formatting information at all when calling OpenText. Our code would be quite a bit simpler. We wouldn’t need the
code to define the FieldInfo argument, and our code to do the import would just be:

workBooks.OpenText(path, DataType:= myType, Semicolon:=True)

But the result in Excel would be this:

Again, Excel has stored the part numbers as numbers, rather than text, so we have lost the last three digits, and no
subsequent reformatting operation will be able to recover them.

■ The XML Approach


Beginning with Office 2007, the file formats of Word, Excel and PowerPoint were changed completely. The new
format is conceptually very simple: a document is just a collection of XML files that are bundled together in a zip
container. The overall scheme is called Office Open XML, and the constituent pieces use formats called
SpreadsheetML, DrawingML, and so on. Microsoft provides a software library called the Open XML SDK containing
functions that make it easier to work with the XML data. For many scenarios, this is now the recommended way of
reading and writing MS Office documents.
One advantage of the XML-based approach is that it allows a spreadsheet document to be created and formatted
without using the Excel API functions, which means that it will work on a machine that has no access to Excel itself.
The same is true of the CSV-based approach, to some extent, although you will need to use Excel API functions if
you want to do any formatting or other operations.
<example>
http://blogs.msdn.com/b/brian_jones/archive/2008/11/04/document-assembly-solution-for-spreadsheetml.aspx
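To give a rough idea of what the XML approach involves, here is a minimal sketch that writes a one-row workbook using the Open XML SDK, without starting Excel. It is not taken from the article linked above. It assumes a reference to the DocumentFormat.OpenXml assembly; the module name, the sheet name “Parts” and the cell values are made up for this illustration, and no formatting is applied.

```vbnet
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Spreadsheet

Public Module OpenXmlSketch

    ' Writes a workbook containing one sheet with one row of data
    Public Sub WriteWorkbook(fileName As String)
        Using document As SpreadsheetDocument =
                SpreadsheetDocument.Create(fileName, SpreadsheetDocumentType.Workbook)

            ' The workbook part holds the list of sheets
            Dim bookPart As WorkbookPart = document.AddWorkbookPart()
            bookPart.Workbook = New Workbook()

            ' A worksheet part holds the cell data for one sheet
            Dim sheetPart As WorksheetPart = bookPart.AddNewPart(Of WorksheetPart)()
            Dim data As New SheetData()
            sheetPart.Worksheet = New Worksheet(data)

            ' Register the worksheet in the workbook's list of sheets
            Dim sheetList As Sheets = bookPart.Workbook.AppendChild(New Sheets())
            sheetList.Append(New Sheet() With {
                .Id = bookPart.GetIdOfPart(sheetPart),
                .SheetId = 1UI,
                .Name = "Parts"})

            ' One row: the part number stored as text, the weight as a number
            Dim firstRow As New Row()
            firstRow.Append(New Cell() With {
                .DataType = CellValues.String,
                .CellValue = New CellValue("123456123456123456")})
            firstRow.Append(New Cell() With {
                .DataType = CellValues.Number,
                .CellValue = New CellValue("14.75")})
            data.Append(firstRow)

            bookPart.Workbook.Save()
        End Using
    End Sub

End Module
```

Because the part number is written with a string data type, all 18 digits are preserved, just as with the text formatting used in the earlier approaches.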
