
A PERL OBJECT FOR FLATTENED MASTER-DETAIL DATA IN EXCEL

Author:        Brendan Furey
Creation Date: 1 August 2011
Version:       1.1
Last Updated:  2 August 2011

64256367.doc


Table of Contents

Introduction
    Hardware/Software Summary
Package Specification
    Object Structure
        Data Model Diagram
        Data Types
    Methods
        new (Constructor)
        listMaster
        listDetail
        listColumn
Code Listings
    XL_Array Package
    T_XL_Array Driver Script
Example
    Data
    Output
References

Change Record
Date         Author  Version  Change Reference
01-Aug-2011  BPF     1.0      Initial
02-Aug-2011  BPF     1.1      Code headers


Introduction
I was involved in a project to migrate data from several Oracle databases into a new system by means of staging tables having the same structure as the interface tables in the target system. The structure of these tables and the mappings from the various source systems were defined in tabs in a Microsoft Excel file. Most of the Oracle code for doing the migration was generated using two Perl scripts: a driving script, and a package that I wrote for reading Excel data into arrays in a Perl object with accessor methods. This article provides the Perl code for the object, with high-level specifications and an example of its use based on Oracle's demonstration HR schema. The migration code itself is outside the scope of this article.

Hardware/Software Summary
Component         Description
Perl              ActivePerl 5.12.4.1205, by ActiveState Software Inc.
Excel             2007 version, files in compatibility mode
Diagrammer        Microsoft Visio 2003 (11.3216.5606)
Operating System  Microsoft Windows 7 Home Premium (64 bit)
Computer          Samsung 900X3A, 4GB memory, Intel I5-2537M @ 1.4GHz x 2


Package Specification
Object Structure
Data Model Diagram

Data Types
Type Name        Element           Category         Description
Cell List                          Array
                 Cell Value        Character        Cell value
Master Row List                    Array
                 Master Row Index  Number           Index of new master row in the 2-dimensional expansion of the cell list
XL Array                           Array
                 Master Index      Integer          Index of the master column in the 2-dimensional expansion of the cell list
                 Number of Rows    Integer          Number of rows in the 2-dimensional expansion of the cell list
                 Cell List         Cell List        Reference to array of cell values, columns stored sequentially
                 Master Row List   Master Row List  Reference to array of the indexes of the new master rows in the 2-dimensional expansion of the cell list
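As a rough illustration of the data types above, the following sketch builds a Cell List and a Master Row List from a few rows held in memory, and shows how a (row, column) position in the 2-dimensional expansion maps to an offset in the flat cell list. It is not the package code: the sample rows and the cell_offset helper are assumptions made for this example, and nothing is read from Excel.

```perl
use strict;
use warnings;

# Hypothetical sample rows (table name, column name); not read from Excel.
my @rows = (
    ['COUNTRIES',   'COUNTRY_ID'],
    ['COUNTRIES',   'COUNTRY_NAME'],
    ['DEPARTMENTS', 'DEPARTMENT_ID'],
    ['REGIONS',     'REGION_ID'],
    ['REGIONS',     'REGION_NAME'],
);
my $master_col = 0;              # Master Index: position of the master column
my $num_rows   = scalar @rows;   # Number of Rows

# Cell List: all of column 0, then all of column 1, in one flat array.
my @cells;
for my $col (0 .. $#{ $rows[0] }) {
    push @cells, map { $_->[$col] } @rows;
}

# Master Row List: row index at which each new master value starts,
# followed by a dummy final entry equal to the number of rows.
my @master_rows;
my $previous;
for my $r (0 .. $num_rows - 1) {
    my $value = $rows[$r][$master_col];
    if (!defined $previous || $value ne $previous) {
        push @master_rows, $r;
        $previous = $value;
    }
}
push @master_rows, $num_rows;

# Cell (row, col) of the 2-dimensional expansion lives at this flat offset.
sub cell_offset {
    my ($n_rows, $row, $col) = @_;
    return $n_rows * $col + $row;
}

print "Cell list:       @cells\n";
print "Master row list: @master_rows\n";
print "Cell (3, 1):     $cells[ cell_offset($num_rows, 3, 1) ]\n";
```

Storing the columns one after another means that every column, and every block of detail rows within a column, is a contiguous range of the flat array, so each can be returned with a single array slice.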

Methods
new (Constructor)
This method reads the Excel file and stores the relevant data in memory.
Parameters
Name           Type             Notes
XL File        Character        Fully qualified Excel file name
Worksheet      Integer          Worksheet number
Start Row      Integer          Starting row in the worksheet
Column List    Array Reference  List of numbers of columns to include (Excel convention, starting with 1)
Master Column  Integer          Master column index in the above list (which starts at 0)

listMaster
This method returns the distinct master values as an array.


Parameters
(none)

listDetail
This method returns the detail values for given master and column indexes as an array.
Parameters
Name          Type     Notes
Master Index  Integer  Master index
Column Index  Integer  Column index

listColumn
This method returns all the values for a given column index as an array.
Parameters
Name          Type     Notes
Column Index  Integer  Column index
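To make the method specifications more concrete, here is a minimal sketch of the three accessors working on data that is already in memory. The package name FlatSketch, its hash-based layout and its named constructor arguments are assumptions made for this illustration; they are not the XL_Array package, which is array-based and reads its data from Excel.

```perl
use strict;
use warnings;

package FlatSketch;   # hypothetical name, not the article's XL_Array

# Build from values already in memory: master index, row count,
# column-major cell list, and master row list (with dummy last entry).
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Distinct master values: master column read at each master start row.
sub listMaster {
    my $self = shift;
    my @starts = @{ $self->{master_rows} };
    pop @starts;                                   # drop the dummy last entry
    my $base = $self->{num_rows} * $self->{master_index};
    return @{ $self->{cells} }[ map { $base + $_ } @starts ];
}

# Values of column $col for the rows belonging to master $m_ind.
sub listDetail {
    my ($self, $m_ind, $col) = @_;
    my $base  = $self->{num_rows} * $col;
    my $first = $base + $self->{master_rows}[$m_ind];
    my $last  = $base + $self->{master_rows}[$m_ind + 1] - 1;
    return @{ $self->{cells} }[ $first .. $last ];
}

# All values of column $col.
sub listColumn {
    my ($self, $col) = @_;
    my $first = $self->{num_rows} * $col;
    return @{ $self->{cells} }[ $first .. $first + $self->{num_rows} - 1 ];
}

package main;

# Hypothetical data: five rows, two columns, master column first.
my $xl = FlatSketch->new(
    master_index => 0,
    num_rows     => 5,
    cells        => [qw(COUNTRIES COUNTRIES DEPARTMENTS REGIONS REGIONS
                        COUNTRY_ID COUNTRY_NAME DEPARTMENT_ID REGION_ID REGION_NAME)],
    master_rows  => [0, 2, 3, 5],
);

my @master = $xl->listMaster;
for my $i (0 .. $#master) {
    print "$master[$i]: ", join(', ', $xl->listDetail($i, 1)), "\n";
}
print "Column 1: ", join(', ', $xl->listColumn(1)), "\n";
```

The dummy last entry in the master row list is what lets listDetail find the end of the final master's rows without a special case.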


Code Listings
XL_Array Package
package XL_Array;
use strict;
use warnings;
use Win32::OLE;
##################################################################################################
#
# Author:      Brendan Furey
# Date:        2 August 2011
# Description: A Perl object for reading Excel data into arrays with accessor methods. See:
#              'A Perl Object for Flattened Master-Detail Data in Excel', www.scribd.com/BrendanP
#
##################################################################################################
my ($num_rows, @cells, @master_indexes); # filled by _getArrays, copied into the object by new

sub new {
    my $class = shift;
    my $this = [];
    bless $this, $class;
    $this->[0] = _getArrays (@_);     # master index
    $this->[1] = $num_rows;           # number of rows
    $this->[2] = [@cells];            # cells by columns
    $this->[3] = [@master_indexes];   # master row indexes (first new value indexes, with a dummy last one)
    undef (@cells);
    undef (@master_indexes);
    return $this;
}

sub listMaster {
#
# This returns the list of master values, taking the master column, with rows as the master indexes
#
    my $this = shift;
    my @master_indexes = @{$this->[3]};
    return @{$this->[2]}[map {$_ + $this->[1] * $this->[0]} @master_indexes[0..$#master_indexes-1]];
}

sub listDetail {
#
# return elements from $col for rows corresponding to the master record specified in $m_ind
#
    my $this = shift;
    my ($m_ind, $col) = @_;
    my $offset_beg = $this->[1] * $col + $this->[3][$m_ind];
    my $offset_end = $this->[1] * $col + $this->[3][$m_ind+1] - 1;
    return @{$this->[2]}[$offset_beg..$offset_end];
}

sub listColumn {
#
# return elements from $col for all rows
#
    my $this = shift;
    my $col = shift;
    my $offset_beg = $this->[1] * $col;
    my $offset_end = $this->[1] * ($col + 1) - 1;   # last cell of this column (the range is inclusive)
    return @{$this->[2]}[$offset_beg..$offset_end];
}

sub _getArrays {
    my ($xlfile, $wk_sheet, $start_row, $ref_col_list, $master_col) = @_;

    # get already active Excel application or open new
    my $Excel = Win32::OLE->GetActiveObject('Excel.Application')
        || Win32::OLE->new('Excel.Application', 'Quit');

    # open Excel file
    my $Book = $Excel->Workbooks->Open("$xlfile") or die Win32::OLE->LastError;

    # select worksheet number $wk_sheet (you can also select a worksheet by name)
    my $Sheet = $Book->Worksheets($wk_sheet);

    my ($row, $tab_old) = ($start_row, 'X!Y?Z');
    my $m_xl_col = $ref_col_list->[$master_col];
    my $cell_value;

    # scan the master column down to the first empty cell, recording the row offset of each new master value
    while (defined $Sheet->Cells($row, $m_xl_col)->{'Value'}) {
        my $tab = $Sheet->Cells($row, $m_xl_col)->{'Value'};
        if ($tab ne $tab_old) {
            $tab_old = $tab;
            push @master_indexes, $row - $start_row;
        }
        $row++;
    }
    push @master_indexes, $row - $start_row;   # dummy last entry
    $num_rows = $row - $start_row;

    # store the cells column by column, with empty cells as the string 'NULL'
    foreach my $col (@{$ref_col_list}) {
        for (my $row = $start_row; $row < $start_row + $num_rows; $row++) {
            if (defined $Sheet->Cells ($row, $col)->{'Value'}) {
                $cell_value = $Sheet->Cells ($row, $col)->{'Value'};
            } else {
                $cell_value = 'NULL';
            }
            push @cells, $cell_value;
        }
    }
    $Book->Close;
    return $master_col;
}
1;

T_XL_Array Driver Script


The driver script constructs an object for the given file, obtains the master values, which are HR tables here, then lists the values of the three detail columns (column name, data type and length) for each master. Note that the script uses a timing object not described here, which is based on one that I wrote in Oracle PL/SQL (REF-1) but is an improved version in which the timers are named only on first use and indexes are not required.
use strict;
use warnings;
##################################################################################################
#
# Author:      Brendan Furey
# Date:        2 August 2011
# Description: A simple test driving script for the Perl object for reading Excel data into arrays
#              with accessor methods. See:
#              'A Perl Object for Flattened Master-Detail Data in Excel', www.scribd.com/BrendanP
#
##################################################################################################
use TimerH;
use XL_Array;

my $mig_dir = "C:/Users/Brendan/Documents/Home";
my $xlfile = "$mig_dir/HR.xls";
my @cols = (1, 2, 3, 4);

my $timer = TimerH->new ();
my $xl = XL_Array->new ($xlfile, 1, 2, \@cols, 0); # tab 1, start row 2, master (array) col 0
$timer->incrementTime ('Construct');

my @master = ($xl->listMaster());
$timer->incrementTime ('Master');

for (my $i = 0; $i < @master; $i++) {
    print "$master[$i]\n";
    my @names = $xl->listDetail ($i, 1);   # column names (renamed from @cols to avoid shadowing the column list)
    my @typs = $xl->listDetail ($i, 2);
    my @lens = $xl->listDetail ($i, 3);
    $timer->incrementTime ("Details for $master[$i]");   # Increment time for each detail individually
    $timer->incrementTime ("All Details");               # Increment aggregate time for all details

    for (my $j = 0; $j < @names; $j++) {printf ("\t%-30s\t%-10s\t%3s\n", $names[$j], $typs[$j], $lens[$j])};
}
$timer->writeTimes;
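The TimerH module used by the driver script is not listed in this article. Purely as a guess at the kind of interface the script relies on (new, incrementTime and writeTimes, with timers created the first time a name is used), a sketch might look like the following. The package name TimerSketch and all of its internals are assumptions, and it omits the '(Other)' and 'Totals' lines that appear in the real output.

```perl
use strict;
use warnings;
use Time::HiRes ();

package TimerSketch;   # hypothetical stand-in, not the author's TimerH

sub new {
    my $class = shift;
    my ($user, $system) = times;
    return bless {
        names => [],                      # timer names in first-use order
        stats => {},                      # name => [elapsed, user, system]
        last  => [Time::HiRes::time(), $user, $system],
    }, $class;
}

# Add the time since the previous call (or construction) to the named timer,
# creating the timer the first time its name is seen.
sub incrementTime {
    my ($self, $name) = @_;
    my ($user, $system) = times;
    my @now = (Time::HiRes::time(), $user, $system);
    if (!exists $self->{stats}{$name}) {
        push @{ $self->{names} }, $name;
        $self->{stats}{$name} = [0, 0, 0];
    }
    $self->{stats}{$name}[$_] += $now[$_] - $self->{last}[$_] for 0 .. 2;
    $self->{last} = \@now;
    return;
}

# Print one line per timer: elapsed, CPU (user + system), user, system.
sub writeTimes {
    my $self = shift;
    printf "%-24s %10s %10s %10s %10s\n", 'Timer', 'Elapsed', 'CPU', '= User', '+ System';
    for my $name (@{ $self->{names} }) {
        my ($elapsed, $user, $system) = @{ $self->{stats}{$name} };
        printf "%-24s %10.2f %10.2f %10.2f %10.2f\n",
            $name, $elapsed, $user + $system, $user, $system;
    }
    return;
}

package main;

my $timer = TimerSketch->new;
my $sum = 0;
$sum += $_ for 1 .. 100_000;
$timer->incrementTime('Sum');
$timer->incrementTime('Idle');
$timer->incrementTime('Sum');             # reuses the existing 'Sum' timer
$timer->writeTimes;
```

Keying the timers by name in a hash, with a separate array for first-use order, is one way to avoid the numeric timer indexes mentioned above.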


Example
Data
Table Name        Column Name       Data Type  Length
COUNTRIES         COUNTRY_ID        CHAR            2
COUNTRIES         COUNTRY_NAME      VARCHAR2       40
COUNTRIES         REGION_ID         NUMBER         22
DEPARTMENTS       DEPARTMENT_ID     NUMBER         22
DEPARTMENTS       DEPARTMENT_NAME   VARCHAR2       30
DEPARTMENTS       LOCATION_ID       NUMBER         22
DEPARTMENTS       MANAGER_ID        NUMBER         22
EMPLOYEES         COMMISSION_PCT    NUMBER         22
EMPLOYEES         DEPARTMENT_ID     NUMBER         22
EMPLOYEES         EMAIL             VARCHAR2       25
EMPLOYEES         EMPLOYEE_ID       NUMBER         22
EMPLOYEES         FIRST_NAME        VARCHAR2       20
EMPLOYEES         HIRE_DATE         DATE            7
EMPLOYEES         JOB_ID            VARCHAR2       10
EMPLOYEES         LAST_NAME         VARCHAR2       25
EMPLOYEES         MANAGER_ID        NUMBER         22
EMPLOYEES         PHONE_NUMBER      VARCHAR2       20
EMPLOYEES         SALARY            NUMBER         22
EMP_DETAILS_VIEW  CITY              VARCHAR2       30
EMP_DETAILS_VIEW  COMMISSION_PCT    NUMBER         22
EMP_DETAILS_VIEW  COUNTRY_ID        CHAR            2
EMP_DETAILS_VIEW  COUNTRY_NAME      VARCHAR2       40
EMP_DETAILS_VIEW  DEPARTMENT_ID     NUMBER         22
EMP_DETAILS_VIEW  DEPARTMENT_NAME   VARCHAR2       30
EMP_DETAILS_VIEW  EMPLOYEE_ID       NUMBER         22
EMP_DETAILS_VIEW  FIRST_NAME        VARCHAR2       20
EMP_DETAILS_VIEW  JOB_ID            VARCHAR2       10
EMP_DETAILS_VIEW  JOB_TITLE         VARCHAR2       35
EMP_DETAILS_VIEW  LAST_NAME         VARCHAR2       25
EMP_DETAILS_VIEW  LOCATION_ID       NUMBER         22
EMP_DETAILS_VIEW  MANAGER_ID        NUMBER         22
EMP_DETAILS_VIEW  REGION_NAME       VARCHAR2       25
EMP_DETAILS_VIEW  SALARY            NUMBER         22
EMP_DETAILS_VIEW  STATE_PROVINCE    VARCHAR2       25
JOBS              JOB_ID            VARCHAR2       10
JOBS              JOB_TITLE         VARCHAR2       35
JOBS              MAX_SALARY        NUMBER         22
JOBS              MIN_SALARY        NUMBER         22
JOB_HISTORY       DEPARTMENT_ID     NUMBER         22
JOB_HISTORY       EMPLOYEE_ID       NUMBER         22
JOB_HISTORY       END_DATE          DATE            7
JOB_HISTORY       JOB_ID            VARCHAR2       10
JOB_HISTORY       START_DATE        DATE            7
LOCATIONS         CITY              VARCHAR2       30
LOCATIONS         COUNTRY_ID        CHAR            2
LOCATIONS         LOCATION_ID       NUMBER         22
LOCATIONS         POSTAL_CODE       VARCHAR2       12
LOCATIONS         STATE_PROVINCE    VARCHAR2       25
LOCATIONS         STREET_ADDRESS    VARCHAR2       40
REGIONS           REGION_ID         NUMBER         22
REGIONS           REGION_NAME       VARCHAR2       25

Output
COUNTRIES
    COUNTRY_ID        CHAR        2
    COUNTRY_NAME      VARCHAR2   40
    REGION_ID         NUMBER     22
DEPARTMENTS
    DEPARTMENT_ID     NUMBER     22
    DEPARTMENT_NAME   VARCHAR2   30
    LOCATION_ID       NUMBER     22
    MANAGER_ID        NUMBER     22
EMPLOYEES
    COMMISSION_PCT    NUMBER     22
    DEPARTMENT_ID     NUMBER     22
    EMAIL             VARCHAR2   25
    EMPLOYEE_ID       NUMBER     22
    FIRST_NAME        VARCHAR2   20
    HIRE_DATE         DATE        7
    JOB_ID            VARCHAR2   10
    LAST_NAME         VARCHAR2   25
    MANAGER_ID        NUMBER     22
    PHONE_NUMBER      VARCHAR2   20
    SALARY            NUMBER     22
EMP_DETAILS_VIEW
    CITY              VARCHAR2   30
    COMMISSION_PCT    NUMBER     22
    COUNTRY_ID        CHAR        2
    COUNTRY_NAME      VARCHAR2   40
    DEPARTMENT_ID     NUMBER     22
    DEPARTMENT_NAME   VARCHAR2   30
    EMPLOYEE_ID       NUMBER     22
    FIRST_NAME        VARCHAR2   20
    JOB_ID            VARCHAR2   10
    JOB_TITLE         VARCHAR2   35
    LAST_NAME         VARCHAR2   25
    LOCATION_ID       NUMBER     22
    MANAGER_ID        NUMBER     22
    REGION_NAME       VARCHAR2   25
    SALARY            NUMBER     22
    STATE_PROVINCE    VARCHAR2   25
JOBS
    JOB_ID            VARCHAR2   10
    JOB_TITLE         VARCHAR2   35
    MAX_SALARY        NUMBER     22
    MIN_SALARY        NUMBER     22
JOB_HISTORY
    DEPARTMENT_ID     NUMBER     22
    EMPLOYEE_ID       NUMBER     22
    END_DATE          DATE        7
    JOB_ID            VARCHAR2   10
    START_DATE        DATE        7
LOCATIONS
    CITY              VARCHAR2   30
    COUNTRY_ID        CHAR        2
    LOCATION_ID       NUMBER     22
    POSTAL_CODE       VARCHAR2   12
    STATE_PROVINCE    VARCHAR2   25
    STREET_ADDRESS    VARCHAR2   40
REGIONS
    REGION_ID         NUMBER     22
    REGION_NAME       VARCHAR2   25

Timer                            Elapsed         CPU      = User    + System
----------------------------  ----------  ----------  ----------  ----------
Construct                           1.11        0.13        0.11        0.02
Master                              0.00        0.00        0.00        0.00
Details for COUNTRIES               0.00        0.00        0.00        0.00
All Details                         0.00        0.00        0.00        0.00
Details for DEPARTMENTS             0.00        0.00        0.00        0.00
Details for EMPLOYEES               0.00        0.00        0.00        0.00
Details for EMP_DETAILS_VIEW        0.00        0.00        0.00        0.00
Details for JOBS                    0.00        0.00        0.00        0.00
Details for JOB_HISTORY             0.00        0.00        0.00        0.00
Details for LOCATIONS               0.00        0.00        0.00        0.00
Details for REGIONS                 0.00        0.00        0.00        0.00
(Other)                             0.00        0.00        0.00        0.00
----------------------------  ----------  ----------  ----------  ----------
Totals                              1.11        0.13        0.11        0.02
----------------------------  ----------  ----------  ----------  ----------


References
REF    Document                             Location
REF-1  A Simple PL SQL Code Timing Object   www.scribd.com/BrendanP
