Author:        Brendan Furey
Creation Date: 1 August 2011
Version:       1.1
Last Updated:  2 August 2011
64256367.doc
Table of Contents
Methods
Code Listings
    XL_Array Package
    T_XL_Array Driver Script
Example
    Data
    Output
References
Change Record
Date         Author  Version  Change Reference
01-Aug-2011  BPF     1.0      Initial
02-Aug-2011  BPF     1.1      Code headers
Introduction
I was involved in a project to migrate data from several Oracle databases into a new system by means of staging tables with the same structure as the interface tables in the target system. The structure of these tables, and the mappings from the various source systems, were defined in tabs in a Microsoft Excel file. Most of the Oracle code for the migration was generated using two Perl scripts: a driving script, and a package that I wrote for reading Excel data into arrays in a Perl object with accessor methods. This article provides the Perl code for the object, with high-level specifications and an example of its use based on Oracle's demonstration HR schema. The migration code itself is outside the scope of this article.
Hardware/Software Summary
Component         Description
Perl              ActivePerl 5.12.4.1205, by ActiveState Software Inc.
Excel             2007 version, files in compatibility mode
Diagrammer        Microsoft Visio 2003 (11.3216.5606)
Operating System  Microsoft Windows 7 Home Premium (64 bit)
Computer          Samsung 900X3A, 4GB memory, Intel I5-2537M @ 1.4GHz x 2
Package Specification
Object Structure
Data Model Diagram
Each entry in the cell list is a cell value. Each entry in the master row list is the index of a new master row in the 2-dimensional expansion of the cell list.

XL Array
Attribute        Description
Master Index     Index of the master column in the 2-dimensional expansion of the cell list
Number of Rows   Number of rows in the 2-dimensional expansion of the cell list
Cell List        Reference to array of cell values, columns stored sequentially
Master Row List  Reference to array of the indexes of the new master rows in the 2-dimensional expansion of the cell list
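The storage scheme above can be illustrated without Excel. The following is a minimal sketch, assuming a small hand-written set of rows; the function name flatten_rows and the sample data are hypothetical and are not part of the package. It stores the columns sequentially in one list and records the row index at which each new master value starts, with a final dummy entry equal to the number of rows.

```perl
use strict;
use warnings;

# Hypothetical sample rows (table name, column name); not read from Excel.
my @rows = (
    ['COUNTRIES',   'COUNTRY_ID'],
    ['COUNTRIES',   'COUNTRY_NAME'],
    ['COUNTRIES',   'REGION_ID'],
    ['DEPARTMENTS', 'DEPARTMENT_ID'],
    ['DEPARTMENTS', 'DEPARTMENT_NAME'],
);

# Flatten a list of rows into the column-sequential layout described above.
# Returns (number of rows, ref to cell list, ref to master row list).
sub flatten_rows {
    my ($rows, $master_col) = @_;
    my $num_rows = @$rows;
    my $num_cols = $num_rows ? scalar @{ $rows->[0] } : 0;

    # Cell list: all of column 0, then all of column 1, and so on.
    my @cells;
    for my $col (0 .. $num_cols - 1) {
        push @cells, map { $_->[$col] } @$rows;
    }

    # Master row list: the row index wherever the master value changes,
    # plus a dummy final entry equal to the number of rows.
    my @master_rows;
    my $prev;
    for my $row (0 .. $num_rows - 1) {
        my $val = $rows->[$row][$master_col];
        if (!defined $prev || $val ne $prev) {
            push @master_rows, $row;
            $prev = $val;
        }
    }
    push @master_rows, $num_rows;

    return ($num_rows, \@cells, \@master_rows);
}

my ($num_rows, $cells, $master_rows) = flatten_rows(\@rows, 0);

# The cell at (row, col) of the 2-dimensional expansion sits at
# offset num_rows * col + row in the cell list.
my ($row, $col) = (3, 1);
printf "Cell (%d, %d): %s\n", $row, $col, $cells->[$num_rows * $col + $row];
printf "Master rows: %s\n", join(' ', @$master_rows);
```

With the sample rows, the master row list is 0, 3, 5: COUNTRIES starts at row 0, DEPARTMENTS at row 3, and 5 is the dummy end marker that lets the last master's details be delimited in the same way as the others.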
Methods
new (Constructor)
This method reads the Excel file and stores the relevant data in memory.
Parameters
Name           Type             Notes
XL File        Character        Fully qualified Excel file name
Worksheet      Integer          Worksheet number
Start Row      Integer          Starting row in the worksheet
Column List    Array Reference  List of numbers of columns to include (Excel convention, starting with 1)
Master Column  Integer          Master column index in the above list (which starts at 0)

listMaster
This method returns the distinct master values as an array.
Parameters (none)

listDetail
This method returns the detail values for given master and column indexes as an array.
Parameters
Name          Type     Notes
Master Index  Integer  Master index
Column Index  Integer  Column index

listColumn
This method returns all the values for a given column index as an array.
Parameters
Name          Type     Notes
Column Index  Integer  Column index
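The index arithmetic behind the three accessor methods can be tried out on a hand-built structure. This is a sketch only, assuming an in-memory array reference with the same four slots as the object (master index, number of rows, cell list, master row list); the function names list_master, list_detail and list_column and the sample data are hypothetical stand-ins, not the package's methods.

```perl
use strict;
use warnings;

# Hypothetical in-memory structure with the four slots described earlier:
# [master index, number of rows, cell list ref, master row list ref].
my $obj = [
    0,
    5,
    [ qw(COUNTRIES COUNTRIES COUNTRIES DEPARTMENTS DEPARTMENTS),
      qw(COUNTRY_ID COUNTRY_NAME REGION_ID DEPARTMENT_ID DEPARTMENT_NAME) ],
    [ 0, 3, 5 ],
];

# Distinct master values: the master column's cell at each new master row
# (the dummy last entry of the master row list is skipped).
sub list_master {
    my ($o) = @_;
    my ($m_col, $n, $cells, $m_rows) = @$o;
    return map { $cells->[$n * $m_col + $_] } @{$m_rows}[0 .. $#{$m_rows} - 1];
}

# Detail values of column $col for master $m_ind: from this master's first
# row up to the row before the next master's first row.
sub list_detail {
    my ($o, $m_ind, $col) = @_;
    my ($n, $cells, $m_rows) = @{$o}[1 .. 3];
    my $beg = $n * $col + $m_rows->[$m_ind];
    my $end = $n * $col + $m_rows->[$m_ind + 1] - 1;
    return @{$cells}[$beg .. $end];
}

# All values of column $col: exactly $n consecutive cells.
sub list_column {
    my ($o, $col) = @_;
    my ($n, $cells) = @{$o}[1, 2];
    return @{$cells}[$n * $col .. $n * ($col + 1) - 1];
}

print join(', ', list_master($obj)), "\n";
print join(', ', list_detail($obj, 1, 1)), "\n";
my @all = list_column($obj, 1);
print scalar(@all), " values in column 1\n";
```

Note that master and column indexes here are 0-based positions in the column list passed to the constructor, whereas the column list itself holds Excel column numbers starting at 1.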
Code Listings
XL_Array Package
package XL_Array;
use strict;
use warnings;
use Win32::OLE;
##################################################################################################
#
# Author:      Brendan Furey
# Date:        2 August 2011
# Description: A Perl object for reading Excel data into arrays, with accessor methods. See:
#              'A Perl Object for Flattened Master-Detail Data in Excel', www.scribd.com/BrendanP
#
##################################################################################################
my ($num_rows, @cells, @master_indexes);

sub new {
    my $class = shift;
    my $this = [];
    bless $this, $class;
    $this->[0] = _getArrays (@_);       # master index
    $this->[1] = $num_rows;             # number of rows
    $this->[2] = [@cells];              # cells by columns
    $this->[3] = [@master_indexes];     # master row indexes (first new value indexes, with a dummy last one)
    undef (@cells);
    undef (@master_indexes);
    return $this;
}

sub listMaster {
#
# This returns the list of master values, taking the master column, with rows as the master indexes
#
    my $this = shift;
    my @master_indexes = @{$this->[3]};
    return @{$this->[2]}[map {$_ + $this->[1] * $this->[0]} @master_indexes[0..$#master_indexes-1]];
}

sub listDetail {
#
# return elements from $col for rows corresponding to the master record specified in $m_ind
#
    my $this = shift;
    my ($m_ind, $col) = @_;
    my $offset_beg = $this->[1] * $col + $this->[3][$m_ind];
    my $offset_end = $this->[1] * $col + $this->[3][$m_ind+1] - 1;
    return @{$this->[2]}[$offset_beg..$offset_end];
}

sub listColumn {
#
# return elements from $col for all rows
#
    my $this = shift;
    my $col = shift;
    my $offset_beg = $this->[1] * $col;
    my $offset_end = $this->[1] * ($col + 1) - 1;   # range is inclusive, so stop at the column's last cell
    return @{$this->[2]}[$offset_beg..$offset_end];
}

sub _getArrays {
    my ($xlfile, $wk_sheet, $start_row, $ref_col_list, $master_col) = @_;

    # get already active Excel application or open new
    my $Excel = Win32::OLE->GetActiveObject('Excel.Application')
             || Win32::OLE->new('Excel.Application', 'Quit');
    # open Excel file
    my $Book = $Excel->Workbooks->Open("$xlfile") or die Win32::OLE->LastError;
    # select worksheet number $wk_sheet (you can also select a worksheet by name)
    my $Sheet = $Book->Worksheets($wk_sheet);

    # $tab_old starts as a sentinel that no real master value is expected to match
    my ($row, $tab_old) = ($start_row, 'X!Y?Z');
    my $m_xl_col = $ref_col_list->[$master_col];
    my $cell_value;

    # scan the master column until the first empty cell, recording each row where the value changes
    while (defined $Sheet->Cells($row, $m_xl_col)->{'Value'}) {
        my $tab = $Sheet->Cells($row, $m_xl_col)->{'Value'};
        if ($tab ne $tab_old) {
            $tab_old = $tab;
            push @master_indexes, $row - $start_row;
        }
        $row++;
    }
    push @master_indexes, $row - $start_row;    # dummy last entry, equal to the number of rows
    $num_rows = $row - $start_row;

    # store the selected columns one after another, with empty cells as 'NULL'
    foreach my $col (@{$ref_col_list}) {
        for (my $row = $start_row; $row < $start_row + $num_rows; $row++) {
            if (defined $Sheet->Cells ($row, $col)->{'Value'}) {
                $cell_value = $Sheet->Cells ($row, $col)->{'Value'};
            } else {
                $cell_value = 'NULL';
            }
            push @cells, $cell_value;
        }
    }
    $Book->Close;
    return $master_col;
}
1;
T_XL_Array Driver Script
# Increment time for each detail
# Increment aggregate time for all
    for (my $j = 0; $j < @cols; $j++) {printf ("\t%-30s\t%-10s\t%3s\n", $cols[$j], $typs[$j], $lens[$j])};
}
$timer->writeTimes;
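Since only the tail of the driver script survives here, the following is a rough sketch of how a driver might walk the masters and print each one's details in the same printf format. It is an assumption-laden illustration, not the original script: Stub_Array is a hypothetical stand-in with hard-coded data so that the loop runs without Excel, format_master is a hypothetical helper, the mapping of column indexes 1 to 3 onto column name, data type and length is assumed, and the timing calls are omitted.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XL_Array object, with hard-coded data, so the
# loop can run without Excel. Only listMaster and listDetail are provided.
package Stub_Array;
my %data = (
    COUNTRIES => [ [qw(COUNTRY_ID COUNTRY_NAME REGION_ID)],
                   [qw(CHAR VARCHAR2 NUMBER)],
                   [2, 40, 22] ],
    REGIONS   => [ [qw(REGION_ID REGION_NAME)],
                   [qw(NUMBER VARCHAR2)],
                   [22, 25] ],
);
my @masters = sort keys %data;
sub new        { return bless {}, shift }
sub listMaster { return @masters }
# Assumed mapping: column index 1 = column name, 2 = data type, 3 = length.
sub listDetail {
    my ($this, $m_ind, $col) = @_;
    return @{ $data{ $masters[$m_ind] }[$col - 1] };
}

package main;

# Format one master value followed by one line per detail row.
sub format_master {
    my ($xl, $m_ind, $name) = @_;
    my @cols = $xl->listDetail($m_ind, 1);
    my @typs = $xl->listDetail($m_ind, 2);
    my @lens = $xl->listDetail($m_ind, 3);
    my $text = "$name\n";
    for my $j (0 .. $#cols) {
        $text .= sprintf("\t%-30s\t%-10s\t%3s\n", $cols[$j], $typs[$j], $lens[$j]);
    }
    return $text;
}

my $xl   = Stub_Array->new;
my @tabs = $xl->listMaster;
print format_master($xl, $_, $tabs[$_]) for 0 .. $#tabs;
```

With a real XL_Array object in place of the stub, the same loop would be driven by the worksheet contents rather than by the hard-coded hash.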
Example
Data
Table Name        Column Name      Data Type  Length
COUNTRIES         COUNTRY_ID       CHAR            2
COUNTRIES         COUNTRY_NAME     VARCHAR2       40
COUNTRIES         REGION_ID        NUMBER         22
DEPARTMENTS       DEPARTMENT_ID    NUMBER         22
DEPARTMENTS       DEPARTMENT_NAME  VARCHAR2       30
DEPARTMENTS       LOCATION_ID      NUMBER         22
DEPARTMENTS       MANAGER_ID       NUMBER         22
EMPLOYEES         COMMISSION_PCT   NUMBER         22
EMPLOYEES         DEPARTMENT_ID    NUMBER         22
EMPLOYEES         EMAIL            VARCHAR2       25
EMPLOYEES         EMPLOYEE_ID      NUMBER         22
EMPLOYEES         FIRST_NAME       VARCHAR2       20
EMPLOYEES         HIRE_DATE        DATE            7
EMPLOYEES         JOB_ID           VARCHAR2       10
EMPLOYEES         LAST_NAME        VARCHAR2       25
EMPLOYEES         MANAGER_ID       NUMBER         22
EMPLOYEES         PHONE_NUMBER     VARCHAR2       20
EMPLOYEES         SALARY           NUMBER         22
EMP_DETAILS_VIEW  CITY             VARCHAR2       30
EMP_DETAILS_VIEW  COMMISSION_PCT   NUMBER         22
EMP_DETAILS_VIEW  COUNTRY_ID       CHAR            2
EMP_DETAILS_VIEW  COUNTRY_NAME     VARCHAR2       40
EMP_DETAILS_VIEW  DEPARTMENT_ID    NUMBER         22
EMP_DETAILS_VIEW  DEPARTMENT_NAME  VARCHAR2       30
EMP_DETAILS_VIEW  EMPLOYEE_ID      NUMBER         22
EMP_DETAILS_VIEW  FIRST_NAME       VARCHAR2       20
EMP_DETAILS_VIEW  JOB_ID           VARCHAR2       10
EMP_DETAILS_VIEW  JOB_TITLE        VARCHAR2       35
EMP_DETAILS_VIEW  LAST_NAME        VARCHAR2       25
EMP_DETAILS_VIEW  LOCATION_ID      NUMBER         22
EMP_DETAILS_VIEW  MANAGER_ID       NUMBER         22
EMP_DETAILS_VIEW  REGION_NAME      VARCHAR2       25
EMP_DETAILS_VIEW  SALARY           NUMBER         22
EMP_DETAILS_VIEW  STATE_PROVINCE   VARCHAR2       25
JOBS              JOB_ID           VARCHAR2       10
JOBS              JOB_TITLE        VARCHAR2       35
JOBS              MAX_SALARY       NUMBER         22
JOBS              MIN_SALARY       NUMBER         22
JOB_HISTORY       DEPARTMENT_ID    NUMBER         22
JOB_HISTORY       EMPLOYEE_ID      NUMBER         22
JOB_HISTORY       END_DATE         DATE            7
JOB_HISTORY       JOB_ID           VARCHAR2       10
JOB_HISTORY       START_DATE       DATE            7
LOCATIONS         CITY             VARCHAR2       30
LOCATIONS         COUNTRY_ID       CHAR            2
LOCATIONS         LOCATION_ID      NUMBER         22
LOCATIONS         POSTAL_CODE      VARCHAR2       12
LOCATIONS         STATE_PROVINCE   VARCHAR2       25
LOCATIONS         STREET_ADDRESS   VARCHAR2       40
REGIONS           REGION_ID        NUMBER         22
REGIONS           REGION_NAME      VARCHAR2       25
Output
COUNTRIES
    COUNTRY_ID       CHAR        2
    COUNTRY_NAME     VARCHAR2   40
    REGION_ID        NUMBER     22
DEPARTMENTS
    DEPARTMENT_ID    NUMBER     22
    DEPARTMENT_NAME  VARCHAR2   30
    LOCATION_ID      NUMBER     22
    MANAGER_ID       NUMBER     22
EMPLOYEES
    COMMISSION_PCT   NUMBER     22
    DEPARTMENT_ID    NUMBER     22
    EMAIL            VARCHAR2   25
    EMPLOYEE_ID      NUMBER     22
    FIRST_NAME       VARCHAR2   20
    HIRE_DATE        DATE        7
    JOB_ID           VARCHAR2   10
    LAST_NAME        VARCHAR2   25
    MANAGER_ID       NUMBER     22
    PHONE_NUMBER     VARCHAR2   20
    SALARY           NUMBER     22
EMP_DETAILS_VIEW
    CITY             VARCHAR2   30
    COMMISSION_PCT   NUMBER     22
    COUNTRY_ID       CHAR        2
    COUNTRY_NAME     VARCHAR2   40
    DEPARTMENT_ID    NUMBER     22
    DEPARTMENT_NAME  VARCHAR2   30
    EMPLOYEE_ID      NUMBER     22
    FIRST_NAME       VARCHAR2   20
    JOB_ID           VARCHAR2   10
    JOB_TITLE        VARCHAR2   35
    LAST_NAME        VARCHAR2   25
    LOCATION_ID      NUMBER     22
    MANAGER_ID       NUMBER     22
    REGION_NAME      VARCHAR2   25
    SALARY           NUMBER     22
    STATE_PROVINCE   VARCHAR2   25
JOBS
    JOB_ID           VARCHAR2   10
    JOB_TITLE        VARCHAR2   35
    MAX_SALARY       NUMBER     22
    MIN_SALARY       NUMBER     22
JOB_HISTORY
    DEPARTMENT_ID    NUMBER     22
    EMPLOYEE_ID      NUMBER     22
    END_DATE         DATE        7
    JOB_ID           VARCHAR2   10
    START_DATE       DATE        7
LOCATIONS
    CITY             VARCHAR2   30
    COUNTRY_ID       CHAR        2
    LOCATION_ID      NUMBER     22
    POSTAL_CODE      VARCHAR2   12
    STATE_PROVINCE   VARCHAR2   25
    STREET_ADDRESS   VARCHAR2   40
REGIONS
    REGION_ID        NUMBER     22
    REGION_NAME      VARCHAR2   25

Timer                Elapsed         CPU      = User    + System
----------------  ----------  ----------  ----------  ----------
Construct               1.11        0.13        0.11        0.02
Master                  0.00        0.00        0.00        0.00
Details for All         0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
Details for             0.00        0.00        0.00        0.00
(Other)                 0.00        0.00        0.00        0.00
----------------  ----------  ----------  ----------  ----------
Totals                  1.11        0.13        0.11        0.02
----------------  ----------  ----------  ----------  ----------
References
REF    Document                             Location
REF-1  A Simple PL SQL Code Timing Object   www.scribd.com/BrendanP