
Project Report

High Technologies Solutions


Project submitted in partial fulfillment of the requirements for the award of the degree of

Master of Computer Application

Undertaken By
Ana Das
Institute of Informatics and Instrumentation
University of Rajasthan

Under the Guidance of
Project Guide
Mr. Avinash Choudhary

Plot No:
Email: Avinash.choudhary@hts.com

CERTIFICATE
This is to certify that the project report entitled Data Warehousing Migration (Capacity Boxing) from High Technologies Solution, Gurgaon (Haryana) is submitted in partial fulfillment of the requirements for the degree of Master of Computer Application, University of Rajasthan, Jaipur (Raj.). The project work has been completed by Arpita Tillani, a bona fide student of the University of Rajasthan, Jaipur (Raj.), at High Technologies Solution from August 2007 to December 2007 under our supervision. It is further certified that this project has not been submitted earlier. During this training period her performance was found to be _______________.

Director

(Dr. P. R. Sharma)

ACKNOWLEDGEMENT

I express my sincere thanks to Inter Globe Technologies for providing me the opportunity to carry out my project work at Inter Globe Technologies, Gurgaon (Haryana). I acknowledge my deep gratitude to Mr. Rajendra Sureka, Project Manager, who gave me the opportunity to carry out project work in this prestigious organization.

I express my profound gratitude to Madhusudan Reddy and Avinash Choudhary for their constant support, guidance and encouragement, and for helping me learn many new things during my stay at Inter Globe Technologies.

My sincere regards and thanks go to my guide, Mr. Avinash Choudhary (Team Leader), who advised and encouraged me at each step and led me in the right direction. His constant encouragement and invaluable technical support and guidance have been of immense help in realizing this project.

I also extend my special thanks to our faculty, Dr. P. R. Sharma, Mr. Hanuman Choudhary and the other staff members, who were always ready to extend a helping hand. Their guidance was always fruitful.

Arpita Tillani Semester - VI

Table of Contents
Chapter   Particulars
1         Corporate Profile
2         Environment Specification
3         Introduction of DataWarehouse Migration
3.1       Objective of the System
3.2       Modules Details
4         Overview of Used Technologies
4.1       INFORMATICA
4.2       SQL Server
5         Database Design
5.1       Table Design
5.2       Modules Dataflow Diagram
5.3       Detailed Dataflow Diagram
6         Snapshots
7         Input Design / Output Design
7.1       Input Design
7.2       Output Design
8         System Testing and Validation
9         System Implementation
10        System Maintenance
11        Scope of Future Enhancement
12        Conclusion

1. Corporate Profile
High Technologies Solution
High Technologies Solution (HTS) is a leading global travel technology company, with integrated IT & BPO offerings across the travel domain. HTS provides unique domain & operational expertise to deliver solutions & services to airlines, travel distribution providers and large travel agencies. Headquartered in India, HTS has a presence in the Americas, Europe, Australasia, the Middle East and Africa.

HTS's Background
High Technologies Solution (HTS) is a joint venture between one of India's foremost travel groups, InterGlobe Enterprises, and the leading worldwide global distribution service provider, Galileo. Since its inception in 1998, HTS has leveraged its unique combination of domain and operational expertise to offer best-of-breed IT and BPO services to the travel industry worldwide.

Services offered
Mainframe TPF, Airline Distribution System, Departure Control System, Revenue Ticketing Implementation, Agency Back Office Automation, Internet Booking Engine, Travel Technology Consulting, Airline Contact Centre, Travel Application Interfaces, Web Services, Enterprise Management Solution, Rules Coding & Quality Check, Processing Handling, Teletype Services, Loyalty Program

Low cost carriers (LCCs), or no-frills airlines, are fast filling the skies across the globe. The new generation of low-cost carriers is springing up at breakneck speed. Though the road has not been easy, the LCCs have succeeded in taking over a large part of the market. In a market where input costs such as fuel and manpower are on the rise, it is technological innovation and the outsourcing of operational processes that give the LCCs the edge and the support they need to compete in their respective markets.

HTS's Offerings to LCCs
As a pure-play travel industry solutions provider, HTS has profound travel domain knowledge and a thorough understanding of the challenges and issues faced by low cost airlines. HTS has been the preferred technology vendor of many well-known airlines from across the world.

Airline Distribution System: HTS offers a web-based system that helps airlines manage their inventory, flight schedules, fares and sales for their scheduled flight services. The system facilitates bookings and ticket sales through multiple points of sale.

Departure Control System: In addition to the ADS, HTS also provides a web-based Departure Control System that can automate all processes related to the airline's airport management operations. The designated users connect to the application via a Virtual Private Network (VPN) to ensure the integrity and security of mission-critical data at all airports.
Our portfolio of prime products and superior services

Revenue Management System: The Revenue Management System provided by HTS facilitates the creation of dynamic fares for flights. The system facilitates the creation of fare pools and the assignment of seats to defined buckets. The fares for each bucket are defined in terms of, or as a percentage of, the base fare.

Cargo Management System: The Cargo Management System (CMS) offered by HTS is a B2B application that allows cargo to be booked and tracked. The system acts as an online interface for the application and can be used by all desired stakeholders. It provides the airline cargo operation with reliable booking, inventory control and shipment tracking capabilities.

BPO Services: HTS offers an array of voice and back office services that can help airlines achieve better efficiency while keeping their costs under control. The range of services offered to low cost carriers includes Ticket Reservations, Customer Service, Duplicate Booking, Schedule Change (Planned & Ad hoc), Group Booking, Fraud Control, Fare Loading, etc.

Customized Solutions & Other Services: Apart from the ready-to-use solutions mentioned above, HTS also offers a gamut of customized IT & BPO services, including Development of Web Check-in & Handheld Check-in, Dynamic Packaging of airline ticketing with Non Air Solutions, Load Planning, Telemarketing, and Promotions Management.

Mainframe TPF: With an increasing focus on reducing IT spend while retaining control of mission-critical projects, airlines are finding that outsourcing production control, development and testing work on Mainframe TPF is now a critical imperative. With more than 7 years of experience in the TPF development environment on the Mainframe operating system, HTS has delivered more than 600 person-years of software services to its clients in the application areas of Ticketing Services, Subscriber, Teletype and Availability.

As the leading travel-technology-focused organization with a very strong Mainframe TPF practice, HTS has extensive implementation and maintenance experience on TPF projects and is engaged in the implementation of highly mission-critical projects for GDS and airline customers. HTS provides development, enhancement and maintenance services for mission-critical Central Reservation Systems (CRS) / Airline Distribution System (ADS) frameworks and has the largest skilled TPF / ALCS resource base in the country, with a dedicated development centre for TPF / ALCS / MVS projects.

HTS's Mainframe TPF/ALCS services include:

TPF Consulting: With vast experience and domain expertise in the travel industry, HTS offers consulting services to airlines to build robust and competitive systems.

Software Development Services: HTS offers its rich experience of having implemented end-to-end solutions for clients such as Galileo International and Saudi Arabian Airlines. Project management best practices have helped HTS deliver the best to the customer in terms of quality, cost and time. Projects successfully delivered by HTS include ATB Fujitsu Printers, Automated MCOs, XML Select Enhancements, and E-Tickets and Interline.

Software Testing: Testing assures the organization of the quality and integrity of its mainframe solutions. With proven proficiency in mainframe architecture, the HTS testing team has successfully undertaken testing assignments for several prestigious clients. Business-critical testing projects undertaken by HTS include regression testing for Ticketing Services, Subscriber and Amtrak. The software tools used for testing include XML Select, Viewpoint, Focal point, and Galileo.

Technical Writing / Documentation: HTS provides a complete documentation solution and disaster recovery plan, and is able to create software manuals, instruction materials and other publications. Examples of projects undertaken include Castle Grey Skull Ticketing, Fares and Pricing, XML Select version 2.1/2.2, and Automated Refunds and Exchanges.

Software Maintenance: HTS provides an offshore, trained, experienced and dedicated team for maintenance requests. The maintenance team has fixed more than 1500 maintenance requests (ORBITS) in the Ticketing Services, Subscriber, Sell seating and Teletype domains. HTS has also taken up maintenance projects such as Ticketing Input Editor Rewrite (TIER) and ADI / ADL.

Data Warehousing: HTS assists airlines and travel industry players in turning heaps of unorganized information into business intelligence by using specialised tools that analyse and mine data, thus enabling users to make informed decisions and making the organisation more efficient and competitive. HTS helps its clients identify their enterprise business intelligence needs for today and for the future.

Production Support: HTS's production support services include:
1. Monitoring the daily and monthly batch cycles and fixing any abend (abrupt end of a batch cycle) that occurs.
2. Updating the data in production flat files and DB2 tables to avoid abends in production.
3. Analyzing the problems that occur in production and fixing them permanently by modifying the source code.
4. Enhancement and maintenance of the above-mentioned systems.

HTS provides 24 x 7 production support for the critical Revenue Data System of Galileo International. These systems run on MVS and are developed in COBOL, PL/1, FOCUS, and JCL. Data is stored in flat files, GDG, VSAM, and DB2.

1.4 Our Mission

HTS's commitment to its shareholders is Company Value
- Accelerated revenue growth and profitability (20% PBT)
- Enhance travel and hospitality industry expertise partnerships
- Institutionalize business development process
- Improve client concentration

HTS's commitment to its clients is Predictable Quality Deliverables
- Consistent, high quality outputs
- Value: the appropriate balance of risk mitigation and cost
- Deep knowledge of clients' business and core technologies
- Alignment of cultures: how we partner to do business together

HTS's commitment to its associates is to create a Fulfilling Work Environment
- Competitive remuneration and benefits
- Growth and development opportunities
- Values-based culture
- Connected to a successful company: ownership mentality

2. Environment Specification
HARDWARE SPECIFICATION

Hardware Configuration

Processor        : Intel Pentium IV 1.7GHz
Memory (RAM)     : 512 MB
Hard Disc        : 40GB

SOFTWARE ENVIRONMENT

Software Configuration

Operating System : Windows 2000 Professional
Front-end        : INFORMATICA
Back-end         : SQL Server
Connectivity     : Using ODBC


3. Introduction of DataWarehouse Migration


"A Data Warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process."

This definition states the main purpose a data warehouse (DWh) has to support: it contains data and delivers it to executives as knowledge on which they can build their decisions. The four adjectives in the definition distinguish DWhs, as informational systems, from so-called operational systems.

1. Subject orientation. A DWh is subject-oriented because the data it contains is structured in a way that reflects the business objects of the company (e.g. products, clients, sales). Operational systems, on the other hand, tend to be "organized around the applications of the company", e.g. databases handling all data relevant for booking passengers onto flights.
2. Integration. Integration is the main characteristic of a DWh. A DWh contains data stemming from several sources (i.e. operational systems) that are spread all over the enterprise. These heterogeneous sources have to be integrated so that data can be accessed in a uniform and clear way. Integration means that all data loaded into the DWh is transformed into a single representation: no matter how the gender of persons is represented in the various operational (source) systems (e.g. male/female, m/f, 0/1, X/Y), one representation is selected and all others are transformed into it.
3. Time variance. A DWh is a time-variant collection of data, i.e. it contains current data as well as historic data. In contrast, operational systems contain only up-to-date data, so no trends are recognizable within such a system. The DWh contains a sequence of snapshots taken periodically from operational-level data.
4. Nonvolatility. Nonvolatility of a DWh means that everything put into a DWh remains there in one way or another. Operational systems are highly volatile, i.e. records are frequently added, accessed, updated, or deleted.
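The integration step described above can be illustrated with a minimal Python sketch. The mapping table is hypothetical: which raw value stands for which gender in each source system is an assumption made purely for illustration, as are the unified codes 'M', 'F' and 'U'.

```python
# Hypothetical mapping from source-system encodings to one warehouse code.
# Which raw value means which gender is an assumption for illustration only.
GENDER_MAP = {
    "male": "M", "female": "F",
    "m": "M", "f": "F",
    "0": "M", "1": "F",
    "x": "M", "y": "F",
}

def normalize_gender(raw):
    """Return the unified code 'M' or 'F', or 'U' (unknown) for unmapped input."""
    key = str(raw).strip().lower()
    return GENDER_MAP.get(key, "U")

# Values as they might arrive from different operational systems.
source_values = ["Male", "f", 0, "Y", "n/a"]
unified = [normalize_gender(value) for value in source_values]
print(unified)
```

Values that do not match any known encoding are mapped to an explicit "unknown" code rather than dropped, so that the load does not silently lose rows.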


A DWh is organized along at least two orthogonal dimensions: a time dimension (see above) and a granularity dimension. Data loaded into the DWh from an operational system enters as up-to-date, detailed data (see figure 1). All detailed data can be aggregated under several criteria to yield lightly summarized data, and these summaries can be aggregated further to yield highly summarized data, and so on. For example, daily sales could be stored at the detailed level (i.e. one snapshot of sales data is taken each day), with the lightly summarized data representing weekly and the highly summarized data representing monthly aggregation. Several levels of granularity are thus stored in a DWh, although this produces some redundancy. Because of the enormous amounts of data stored in a DWh, some analytical tasks can be computed within an acceptable time only if some of the required data is pre-aggregated.

Since all data remains in the DWh, it ages with time, and both its importance and the chances of it being accessed decrease. The time horizon for a DWh (normally 5 to 10 years) is significantly longer than that for operational systems (normally 60 to 90 days). Despite its age, data may still be accessed in the future, so it stays in the DWh but moves to external (slower but cheaper) storage media, e.g. optical disks, tapes, or microfiche, while the more interesting data is stored on direct access storage devices, e.g. hard disks. Even data stored on these external media is considered part of the DWh, because it can be accessed for analyses if needed.

Besides raw and aggregated data, a DWh contains metadata describing its contents, the sources of the data, and the transformation procedures that convert raw data into aggregated data or source data into integrated, cleansed data. Metadata also serves as a navigation aid for the DWh users, i.e. the data analysts, who consult it when planning data analyses. The DWh has been defined as a "collection of data" with the goal of supporting "decision making processes": the DWh provides data for analyses, which in turn support decision making.
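The granularity levels described above can be sketched in a few lines of Python. The daily figures are made up for illustration, and the choice of ISO weeks for the "lightly summarized" level is an assumption; the report does not say how weeks are delimited.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical detailed data: one sales snapshot per day for January 2007.
daily_sales = {date(2007, 1, 1) + timedelta(days=i): 100.0 for i in range(31)}

def summarize(daily, key_func):
    """Aggregate daily values into coarser buckets chosen by key_func."""
    totals = defaultdict(float)
    for day, amount in daily.items():
        totals[key_func(day)] += amount
    return dict(totals)

# Lightly summarized level: one total per ISO (year, week).
weekly = summarize(daily_sales, lambda d: tuple(d.isocalendar())[:2])
# Highly summarized level: one total per (year, month).
monthly = summarize(daily_sales, lambda d: (d.year, d.month))

print(weekly)
print(monthly)
```

Storing `weekly` and `monthly` alongside `daily_sales` is redundant, since both can be recomputed from the detail, but it is exactly this pre-aggregation that keeps analytical queries fast on large volumes.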

Data Warehouse and Knowledge Management

The DWh contributes to company-wide knowledge management. In fact, a DWh could serve as one main component of a knowledge management system. The data contained in a DWh represents a large part of a company's knowledge, e.g. the company's clients and their demographic attributes. The DWh is an enterprise-wide data collection that is central and defines a common basis for the several enterprise units accessing it. From the stored data, new knowledge can be derived using technologies such as On-Line Analytical Processing (OLAP) or Knowledge Discovery in Databases (KDD).

Data analyses may consist of several reporting and visualisation mechanisms, presenting the data at different levels of aggregation, from different angles (i.e. dimensions), and using different types of diagrams. These reporting facilities can be used interactively through OLAP technology, which enables the data analyst to formulate queries and to decide on further queries depending on the outcome of earlier ones.

Data is stored in the DWh. The data analyst interprets parts of the data, which are presented in a form better suited to human users. Interpreting data requires some knowledge, and if the resulting information leads to decisions or actions by management, this information becomes knowledge.

Another way of gaining knowledge from the DWh's data is through the algorithms provided by Knowledge Discovery in Databases (KDD). These mostly mathematical and statistical methods are able to detect knowledge previously unknown to the owners of the data. "Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." To gain valid and useful patterns from data, the underlying database must contain as little noise as possible. Through its integration mechanisms, a DWh aims to ensure that all data is consistent and cleansed, so that data mining algorithms will work properly. Management may then base its decisions on the results produced by these algorithms.

Star Schema and Snow Flake Schema

What is Star Schema?


Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more dimension tables and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star, with one fact table connected to multiple dimensions. The center of the star consists of a large fact table, which points towards the dimension tables. The advantages of a star schema are easy slicing of data, improved query performance and ease of understanding.

Steps in designing Star Schema
1. Identify a business process for analysis (like sales).
2. Identify measures or facts (sales dollar).
3. Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
4. List the columns that describe each dimension (region name, branch name, etc.).
5. Determine the lowest level of summary in a fact table (sales dollar).

Important aspects of Star Schema & Snow Flake Schema
In a star schema, every dimension has a primary key.
In a star schema, a dimension table does not have any parent table, whereas in a snow flake schema a dimension table has one or more parent tables.
In a star schema, hierarchies for the dimensions are stored in the dimension table itself, whereas in a snow flake schema hierarchies are broken into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.

Glossary:

Hierarchy: A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, and from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals or not.


Level: A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.

Fact Table: A table in a star schema that contains facts and is connected to dimensions. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key made up of all of its foreign keys. A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead). A fact table usually contains facts at the same level of aggregation.

Example of Star Schema

In the example in figure 1.6, the sales fact table is connected to the dimensions location, product, time and organization. This shows that data can be sliced across all dimensions and can also be aggregated across multiple dimensions. "Sales Dollar" in the sales fact table can be calculated across all dimensions independently or in combination, as follows:
- Sales Dollar value for a particular product
- Sales Dollar value for a product in a location
- Sales Dollar value for a product in a year within a location
- Sales Dollar value for a product in a year within a location, sold or serviced by an employee
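As a rough illustration of the slicing described above, the following Python sketch builds a tiny star schema in an in-memory SQLite database (used here only as a stand-in for the project's SQL Server back end). All table names, column names and figures are hypothetical, and the organization dimension is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, region_name  TEXT);
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds one measure plus a foreign key per dimension.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    time_key     INTEGER REFERENCES time_dim(time_key),
    sales_dollar REAL
);
INSERT INTO product_dim  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO location_dim VALUES (1, 'North'), (2, 'South');
INSERT INTO time_dim     VALUES (1, 2006), (2, 2007);
INSERT INTO sales_fact VALUES
    (1, 1, 1, 100.0), (1, 2, 1, 50.0), (1, 1, 2, 70.0), (2, 1, 2, 30.0);
""")

def sales_by(*columns):
    """Sum sales_dollar grouped by the given dimension columns."""
    cols = ", ".join(columns)
    sql = f"""
        SELECT {cols}, SUM(f.sales_dollar)
        FROM sales_fact f
        JOIN product_dim  p ON p.product_key  = f.product_key
        JOIN location_dim l ON l.location_key = f.location_key
        JOIN time_dim     t ON t.time_key     = f.time_key
        GROUP BY {cols} ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()

per_product = sales_by("p.product_name")
per_product_location = sales_by("p.product_name", "l.region_name")
per_product_location_year = sales_by("p.product_name", "l.region_name", "t.year")
print(per_product)
print(per_product_location)
```

Each additional grouping column corresponds to one more dimension in the list above; the fact table itself never changes, only the set of dimensions the measure is rolled up over.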


Snowflake Schema

A snowflake schema is a star schema structure normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables. In the star schema example we had four dimensions (location, product, time, organization) and a fact table (sales). In the snowflake schema example there are 4 dimension tables, 4 lookup tables and 1 fact table. The reason is that the hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME, respectively) and shown separately. In OLAP, the snowflake schema approach increases the number of joins and degrades data retrieval performance. A few organizations normalize the dimension tables to save space. Since dimension tables take up comparatively little space, the snowflake schema approach may be avoided.

Example of Snowflake Schema
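As a minimal sketch of why snowflaking adds joins, the Python example below breaks a category hierarchy out of a product dimension into a lookup table. SQLite is used only as a convenient stand-in, and every table name, column name and value is hypothetical rather than taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outrigger (lookup) table split out of the product dimension.
CREATE TABLE category_lkp (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES category_lkp(category_key)
);
CREATE TABLE sales_fact (product_key INTEGER, sales_dollar REAL);
INSERT INTO category_lkp VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO product_dim  VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'App', 2);
INSERT INTO sales_fact   VALUES (1, 100.0), (2, 40.0), (3, 25.0);
""")

# Two joins are needed to reach the category name; a star schema would keep
# category_name inside product_dim and need only one.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim  p ON p.product_key  = f.product_key
    JOIN category_lkp c ON c.category_key = p.category_key
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()
print(sales_by_category)
```

The space saved is the repeated category name in `product_dim`; the cost is one extra join on every query that reports by category.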


Fact Table

The central table in a star schema is called the fact table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key made up of all of its foreign keys. In the example in fig 1.6, "Sales Dollar" is a fact (measure), and it can be added across several dimensions. Fact tables store different types of measures: additive, non-additive and semi-additive.

Measure Types
Additive - Measures that can be added across all dimensions.
Non Additive - Measures that cannot be added across any dimension.
Semi Additive - Measures that can be added across some dimensions but not others.
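The three measure types can be made concrete with a small Python sketch. The account-balance data below is a hypothetical example chosen for illustration; it does not come from the project.

```python
# Hypothetical fact rows: (day, account, amount_deposited, closing_balance).
rows = [
    ("Mon", "A", 100.0, 100.0),
    ("Mon", "B",  50.0,  50.0),
    ("Tue", "A",  20.0, 120.0),
    ("Tue", "B",   0.0,  50.0),
]

# Additive: deposits can be summed across every dimension (days and accounts).
total_deposits = sum(r[2] for r in rows)

# Semi-additive: balances can be summed across accounts for a single day...
tuesday_balance = sum(r[3] for r in rows if r[0] == "Tue")
# ...but summing them across days counts the same money twice, so this
# number has no business meaning.
meaningless_total = sum(r[3] for r in rows)

# Non-additive: a ratio must be recomputed from its components rather than
# summed. Here: the share of Tuesday's total balance held by account A.
share_a = 120.0 / tuesday_balance

print(total_deposits, tuesday_balance, meaningless_total, round(share_a, 3))
```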

A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead). In the real world, it is possible to have a fact table that contains no measures or facts. Such tables are called factless fact tables.

Steps in designing Fact Table
1. Identify a business process for analysis (like sales).
2. Identify measures or facts (sales dollar).
3. Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
4. List the columns that describe each dimension (region name, branch name, etc.).
5. Determine the lowest level of summary in a fact table (sales dollar).


Example of a Fact Table with an Additive Measure in Star Schema: Figure 1.6

In the example in figure 1.6, the sales fact table is connected to the dimensions location, product, time and organization. The measure "Sales Dollar" in the sales fact table can be added across all dimensions independently or in combination, as follows:
- Sales Dollar value for a particular product
- Sales Dollar value for a product in a location
- Sales Dollar value for a product in a year within a location
- Sales Dollar value for a product in a year within a location, sold or serviced by an employee


3.1 OBJECTIVES OF THE SYSTEM


In this data warehousing migration, the databases previously maintained in Teradata are moved to Microsoft SQL Server using the Informatica platform.

Capacity Boxing Enhancement to KM

Objectives
1. To pick up data from the existing GLM database, merge that data with CPU measurements from KM's insights data and pass the merged data to the IBM performance team.
2. To store the capacity boxing data in KM, making it available to the Galileo performance team through Cognos.

Because our main center is in Denver and has a collaboration with Microsoft, we use Microsoft SQL Server as the back-end server for maintaining our databases, and the Informatica ETL (Extraction, Transformation and Loading) tools at the front end.

Background: Capacity Boxing is the name of a method used on the 1V and 1G host systems to throttle users (agencies, web sites) that generate too much structured data traffic. A user may be throttled if the user fits into a box, where a box is defined as a list of up to 10 pseudos using up to 10 SD procedures. For each pseudo in a box (or is it the entire box?), only a defined number of instances of the procs in the box may be active on a host processor (e.g. PRE-A) at any given time. Once that number of procs is active, any subsequent procs in the box from pseudos in the same box are rejected. The host systems write a report record containing the information below every 30 minutes, and this record is stored in GLM. The 30-minute frequency is variable and will change.
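The throttling rule described above can be sketched as a small Python class. This is only an illustration under stated assumptions: the limit is applied to the box as a whole (the text leaves open whether it is per pseudo or per box), a single host processor is modelled, and the class, method and sample identifier names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CapacityBox:
    """A box: up to 10 pseudos, up to 10 SD procedures, one shared limit."""
    pseudos: set
    procs: set
    max_active: int
    active: int = 0
    rejected: int = 0

    def __post_init__(self):
        if len(self.pseudos) > 10 or len(self.procs) > 10:
            raise ValueError("a box holds at most 10 pseudos and 10 procs")

    def try_start(self, pseudo, proc):
        """Return True if the call may run, False if it is rejected."""
        if pseudo not in self.pseudos or proc not in self.procs:
            return True              # not governed by this box
        if self.active >= self.max_active:
            self.rejected += 1       # limit reached: throttle the caller
            return False
        self.active += 1
        return True

    def finish(self):
        """Mark one active proc instance as completed."""
        self.active = max(0, self.active - 1)

box = CapacityBox(pseudos={"PCC1", "PCC2"}, procs={"PROC_A"}, max_active=2)
results = [
    box.try_start("PCC1", "PROC_A"),
    box.try_start("PCC2", "PROC_A"),
    box.try_start("PCC1", "PROC_A"),   # third concurrent call is rejected
]
print(results, box.rejected)
```

The `rejected` counter plays the role of the reject figures that the host systems report periodically; in this sketch it is a single total rather than the several reject columns mentioned later in the report.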

Problem or Requirements:
1. To pick up data from the existing GLM database, merge that data with CPU measurements from KM's insights data and pass the merged data to the IBM performance team.
2. To store the capacity boxing data in KM, making it available to the Galileo performance team through Cognos.

This walkthrough pertains to the mappings and workflow created for Capacity Boxing.

Approach: We have assumed this to be an hourly workflow.
1. Get the maximum RPT_TMSTMP (datetime field) from the KM_WHSE.SDDBOXRJCTRPT table and write it to the parameter file ($$CAPBOXDATE).
2. Populate the KM_WHSE.SDDBOXRJCTRPT table from GLM.SDDBOXRJCTRPT where RPT_TMSTMP > $$CAPBOXDATE.
3. SCD2 will be triggered in case there is a change in the MAX value for a BOX_NBR, based on SYS_CD and PROCESSOR_ID.
4. Load (insert/update) the bridge tables based on BOX_NBR, SYS_CD and PROCESSOR_ID.
5. Populate the fact table with keys from the source table, using lookups on BOX_DIM and the bridge tables.
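The first two steps of the approach amount to a watermark-driven incremental load. The Python sketch below imitates them outside Informatica, under several assumptions: rows are plain dictionaries, the timestamp format is inferred from the sample value '10/06/2007 23:59:12', and the function names are hypothetical. The lower bound is strict (newer than the watermark) and an upper bound of one hour is added to reflect the hourly workflow.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"   # assumed format, matching '10/06/2007 23:59:12'

def build_parameter_file(target_rows, section="Capacity_Boxing.s_m_populate_fact"):
    """Return parameter-file text holding the newest RPT_TMSTMP already loaded."""
    watermark = max(row["RPT_TMSTMP"] for row in target_rows)
    return f"[{section}]\n$$CAPBOXDATE='{watermark.strftime(FMT)}'\n"

def read_watermark(param_text):
    """Parse $$CAPBOXDATE back out of the parameter-file text."""
    for line in param_text.splitlines():
        if line.startswith("$$CAPBOXDATE="):
            return datetime.strptime(line.split("=", 1)[1].strip("'"), FMT)
    raise ValueError("$$CAPBOXDATE not found")

def rows_to_load(source_rows, watermark):
    """Select source rows newer than the watermark, at most one hour ahead."""
    upper = watermark + timedelta(hours=1)
    return [r for r in source_rows if watermark < r["RPT_TMSTMP"] <= upper]

# Rows already in the target table, and rows waiting in the source.
target = [{"RPT_TMSTMP": datetime(2007, 10, 6, 23, 29, 12)},
          {"RPT_TMSTMP": datetime(2007, 10, 6, 23, 59, 12)}]
source = [{"RPT_TMSTMP": datetime(2007, 10, 6, 23, 59, 12)},
          {"RPT_TMSTMP": datetime(2007, 10, 7, 0, 29, 12)},
          {"RPT_TMSTMP": datetime(2007, 10, 7, 0, 59, 12)},
          {"RPT_TMSTMP": datetime(2007, 10, 7, 1, 29, 12)}]

param_text = build_parameter_file(target)
pending = rows_to_load(source, read_watermark(param_text))
print(param_text)
print([r["RPT_TMSTMP"].strftime(FMT) for r in pending])
```

Using a strict lower bound keeps the row that produced the watermark from being loaded a second time on the next run.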

Design:

1. Mapping: m_create_parameter

This mapping reads the maximum RPT_TMSTMP value from the KM_WHSE.SDDBOXRJCTRPT table and writes it to the parameter file, e.g.

[Capacity_Boxing.s_m_populate_fact]
$$CAPBOXDATE='10/06/2007 23:59:12'

2. Mapping: m_load_SDDBOXRJCTRPT

This mapping loads data from the GLM.SDDBOXRJCTRPT table with the filter SDDBOXRJCTRPT.RPT_TMSTMP between $$CAPBOXDATE and dateadd(hh,1,$$CAPBOXDATE), i.e. it loads one hour of data into the KM_WHSE.SDDBOXRJCTRPT table. E.g. the following data will be loaded by this mapping for BOX_NBR = 9:

3. Mapping: m_box_dim_scd2


In SCD2 we check for a change in the MAX_MSG_CNT column for a BOX_NBR, based on the HST_PROC_ID, TFP_SYS_ID and RPT_TMSTMP columns. If MAX_MSG_CNT changes, the BOX_DIM table is updated with the new value in the BOX_END_DT column. For the active MAX_MSG_CNT record, the BOX_END_DT value will be 9999-12-31 00:00:00.000.
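The Type 2 slowly changing dimension logic described above can be sketched in Python as follows. This is an illustration, not the Informatica mapping itself: the dimension is modelled as a list of dictionaries, BOX_START_DT is a hypothetical column name, and closing the old version at the incoming RPT_TMSTMP is an assumption about how the end date is chosen.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)   # end date marking the active version

def apply_scd2(box_dim, incoming):
    """Apply one source record to box_dim (a list of dicts) in SCD2 style."""
    key = (incoming["BOX_NBR"], incoming["HST_PROC_ID"], incoming["TFP_SYS_ID"])
    current = next(
        (row for row in box_dim
         if (row["BOX_NBR"], row["HST_PROC_ID"], row["TFP_SYS_ID"]) == key
         and row["BOX_END_DT"] == OPEN_END),
        None,
    )
    if current is not None and current["MAX_MSG_CNT"] == incoming["MAX_MSG_CNT"]:
        return box_dim                                   # nothing changed
    if current is not None:
        current["BOX_END_DT"] = incoming["RPT_TMSTMP"]   # close the old version
    box_dim.append({
        "BOX_NBR": key[0], "HST_PROC_ID": key[1], "TFP_SYS_ID": key[2],
        "MAX_MSG_CNT": incoming["MAX_MSG_CNT"],
        "BOX_START_DT": incoming["RPT_TMSTMP"],          # hypothetical column
        "BOX_END_DT": OPEN_END,
    })
    return box_dim

def record(max_cnt, hour):
    """Build a sample source record for box 9 on processor PRE-A."""
    return {"BOX_NBR": 9, "HST_PROC_ID": "PRE-A", "TFP_SYS_ID": "1G",
            "MAX_MSG_CNT": max_cnt, "RPT_TMSTMP": datetime(2007, 10, 6, hour)}

dim = []
for rec in (record(68, 10), record(68, 11), record(80, 12)):
    apply_scd2(dim, rec)
print([(r["MAX_MSG_CNT"], r["BOX_END_DT"]) for r in dim])
```

After the three records, the dimension holds two versions of box 9: the closed one with a maximum of 68 and the active one with a maximum of 80, which keeps the history needed to interpret older fact rows.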

4. Mapping: m_load_bridge_table


In this mapping we perform a lookup on the existing bridge tables to check whether a record already exists. If the record exists, we update it with the new values; otherwise we insert it. We also use a SQL cursor to write the records for a box to a temporary table, TMPSDDBOXRJCTRPT. E.g.

For the above record we should have 8 records in the TMPSDDBOXRJCTRPT table.

After the lookup, the box_proc_brdg and box_psdo_brdg tables will contain the following values for the box with a max of 68:

BOX_PSDO_BRDG

BOX_PROC_BRDG
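The lookup-then-update-or-insert behaviour of m_load_bridge_table described above could be sketched in Python as below. The bridge table is modelled as a list of dictionaries, the key columns follow the approach section (BOX_NBR, SYS_CD, PROCESSOR_ID), and the PROC_NM column and sample values are hypothetical.

```python
def upsert_bridge(bridge, records, key_cols=("BOX_NBR", "SYS_CD", "PROCESSOR_ID")):
    """Update rows whose key already exists in the bridge; insert the rest."""
    index = {tuple(row[c] for c in key_cols): row for row in bridge}
    inserted = updated = 0
    for rec in records:
        key = tuple(rec[c] for c in key_cols)
        if key in index:
            index[key].update(rec)      # lookup hit: overwrite with new values
            updated += 1
        else:
            row = dict(rec)
            bridge.append(row)          # lookup miss: insert a new row
            index[key] = row
            inserted += 1
    return inserted, updated

bridge = [{"BOX_NBR": 9, "SYS_CD": "1G", "PROCESSOR_ID": "PRE-A", "PROC_NM": "OLD"}]
incoming = [
    {"BOX_NBR": 9, "SYS_CD": "1G", "PROCESSOR_ID": "PRE-A", "PROC_NM": "NEW"},
    {"BOX_NBR": 9, "SYS_CD": "1G", "PROCESSOR_ID": "PRE-B", "PROC_NM": "X"},
]
counts = upsert_bridge(bridge, incoming)
print(counts, bridge)
```

Adding each inserted row to the index as it is written mirrors the idea of a dynamic lookup: a key that appears twice in the same batch is inserted once and then updated, rather than inserted twice.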

5. Mapping: m_load_LNKCPNFACT


In this mapping we are loading data into LNKCAPBOXFACT table.

Workflow Name: wfk_CapBox_hourly

We need to look into the following:
1. The data populated by the bridge tables, e.g. do we need to keep a history of the Proc and PCC associated with a box?
2. For retrieval of data we have to run SQL joining the BOX_DIM, BOX_PSDO_BRDG, BOX_PROC_BRDG and LNKCAPBOXFACT tables.
3. We need to understand how we are going to merge TRANS_COST_FACT data with this data.


3.2 Module Details


Data Warehousing Migration
Folder Name: Capacity Boxing
1. Box definition - Hourly
2. IBM Handoff - Weekly
3. Data maintenance - Monthly

Capacity Boxing Overview

Steps to create mapping

Step 1. Identify Dimensions: we have four dimensions
(a) Box_dim
(b) Psdo_dim (Lookup table)
(c) Proc_dim (Lookup table)
(d) Comp_sys_dim (Lookup table)
Identify Facts: LNKCAPBOXFACT

Step 2. Identify columns of the source table: Index, Host, Date, BoxNo, Max, Rej0, Rej1, Rej2, Rej3, MsgCnt.
Identify columns of the target table: CAP_Key, CRC_code, Processor_Id, Rej0, Rej1, Rej2, Rej3, Peak.

Step 3. In the first mapping we need to update the bridge tables (Box_Proc_Brdg and Box_Psdo_Brdg) by using dynamic lookups on the dimension tables (Psdo_Dim and Proc_Dim).

Step 4. KM will get its source from the GLM database, so the source connection will be GLM and the target connection will be KM_whse_ss. The target here will be LNKCAPBOXFACT.

Step 5. The source table will have the following columns: Index, Host, Date, BoxNo, Max, Rej0, Rej1, Rej2, Rej3, MsgCnt. We will get these columns from the new tables in the existing GLM database.

Step 6. We need to create a table (LNKCAPBOXFACT) by getting columns from the GLM database and from dimension tables such as psdo_dim, Proc_dim and comp_sys_dim. The rest of the columns will be removed by a join to the cap_box_dim table.

Step 7. We will then apply SCD-II to the cap_box_dim table to reflect changes in the definition of the Capacity Box dimension (Cap_Box_Dim).

Step 8. In the final mapping we will take cap_box_dim as the source table, and the target table will be the fact table LNKCAPBOXFACT.

KM workflow overview

1. Daily workflow


In the daily workflow we take source data from the new tables of the existing GLM database. We create two bridge tables, which are updated with the help of dynamic lookups on two dimension tables (Psdo_Dim and Proc_Dim). In the next step we use SCD-II for Cap_Box_Dim, which is executed every time there is a change in the Max value. We then create a fact table (LNKCAPBOXFACT) containing a Cap_Key, which is a foreign key in this table and the primary key in Cap_Box_Dim. The Cap_Key in LNKCAPBOXFACT is resolved by a join on four columns (Max, Box_No, Pcc and Procs) in Cap_Box_Dim. We have to join to the Comp_sys_dim table to get the value for the Crs_Code column. The reject values come from the GLM source tables as facts. We add one more column, named Peak, which holds the maximum peak within a one-hour period. Finally, we run a daily task that loads the previous day's data from Cap_Box_Dim into LNKCAPBOXFACT.

2. Weekly workflow
In the weekly workflow we load Monday's 24 hours of data into the target table in our database. The data is calculated on an hourly basis.

3. Monthly workflow
In the monthly workflow we have a session that loads monthly data into the database, and a task that deletes data older than 40 days.
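The 40-day retention task could look roughly like the Python sketch below. SQLite stands in for SQL Server, and two points are assumptions rather than facts from the project: that the purge targets LNKCAPBOXFACT, and that the table carries an RPT_TMSTMP column to age rows by.

```python
import sqlite3
from datetime import datetime, timedelta

def purge_old_rows(conn, now, keep_days=40):
    """Delete fact rows whose RPT_TMSTMP is more than keep_days before now."""
    cutoff = (now - timedelta(days=keep_days)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute("DELETE FROM LNKCAPBOXFACT WHERE RPT_TMSTMP < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LNKCAPBOXFACT (CAP_KEY INTEGER, RPT_TMSTMP TEXT)")
conn.executemany(
    "INSERT INTO LNKCAPBOXFACT VALUES (?, ?)",
    [(1, "2007-08-01 00:00:00"),
     (2, "2007-09-15 00:00:00"),
     (3, "2007-10-06 00:00:00")],
)

# Run the purge as of 7 October 2007: the cutoff falls on 28 August 2007.
deleted = purge_old_rows(conn, now=datetime(2007, 10, 7))
remaining = [row[0] for row in
             conn.execute("SELECT CAP_KEY FROM LNKCAPBOXFACT ORDER BY CAP_KEY")]
print(deleted, remaining)
```

Passing `now` in as a parameter instead of reading the system clock keeps the task repeatable, which helps when a failed monthly run has to be re-executed for a specific date.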


[Figure: KM workflow diagram]

Daily Workflow: GLM (AIX) -> wait for file from GLM database -> update dims (Box_proc_Bridge and Box_Psdo_Bridge) -> use SCD-II on Cap_Box_Dim -> run daily mapping -> km_whse_ss

Weekly Workflow: task to calculate hourly changes in data on Monday. Boxes shown: source table (cap_box_dim), insert previous day's data into KM_whse_ss database, target table (lnkcapboxfact).

Monthly cron job: task which will delete data older than 40 days.


TERADATA AND MICROSOFT SQL SERVER

Teradata is a massively parallel processing system running a shared-nothing architecture. The Teradata DBMS is linearly and predictably scalable in all dimensions of a database system workload (data volume, breadth, number of users, complexity of queries).[1] This scalability explains its popularity for enterprise data warehousing applications.

Operating System Compatibility

Teradata offers a choice of several operating systems:

- NCR UNIX SVR4.2 MP-RAS, a variant of System V UNIX from AT&T
- Microsoft Windows 2000 and Windows Server 2003
- SUSE Linux Enterprise Server on 64-bit Intel servers

Teradata Enterprise Data Warehouses are often accessed via ODBC or JDBC by applications running on operating systems such as Microsoft Windows or flavors of UNIX. The warehouse typically sources data from operational systems via a combination of batch and trickle loads. Teradata acts as a single data store that can accept large numbers of concurrent requests from multiple client applications.

Significant features include:

- Unconditional parallelism, with load distribution shared among several servers.
- Complex ad hoc queries with up to 64 joins.
- Parallel efficiency, such that the effort for creating 100 records is the same as that for creating 100,000 records.
- Scalability, so that increasing the number of processors of an existing system linearly increases the performance. Performance thus does not deteriorate with an increased number of users.

MICROSOFT SQL SERVER

Whether you're upgrading, migrating, or adopting an enterprise database application for the first time, Microsoft SQL Server 2005 offers retail businesses:

- Heightened system performance, scalability, and reliability
- A rich family of new business analytics and reporting tools
- A means to significantly increase operational productivity and profitability


Using Microsoft SQL Server as a tool for maintaining databases, we get the following benefits:

Improved, faster application performance: "One of the biggest immediate out-of-the-box performance benefits is in the area of user queries," Johnson says. "They're typically completed 20 percent faster." In addition, range queries, which let you manage and manipulate data in bulk, can be 50 to 100 percent faster with Microsoft SQL Server 2005.

More streamlined, effective analysis and reporting processes: Equipped with an enhanced set of data collection and forecasting tools, many of which are new, retail businesses can better anticipate and react to sales and customer purchasing patterns. Doing so enables you to stock inventory accordingly from store to store, helping reduce the amount of unsold stock.

Savings in time, costs, and human resources: Faster performance and improved data management can significantly reduce the amount of time your information technology (IT) staff needs to spend developing and maintaining your databases. Also, users can reach customer, order, and inventory data more quickly, meaning they're able to keep their focus on their jobs and not on using the software.

Superior integration with third-party applications and tools: This critical capability can help you extend the value of your existing applications, regardless of your underlying platform.

SQL Server Integration Services (SSIS): This new tool can enable you to easily integrate and analyze data across a wide array of operational systems, giving you a more holistic understanding of your business. The ability to mine and quickly interpret data from multiple sources across the enterprise can save retail businesses countless human resource hours.

Report Builder: Featuring a user-friendly interface with a look similar to Microsoft Office System programs, this new component of SQL Server 2005 Reporting Services makes it easy for employees to create, edit, and publish their own reports.

Analysis Services: Improved from Microsoft SQL Server 2000, this component is now a much more flexible and richer environment for data reporting, analysis, and mining, Johnson says. "You can now report on dozens if not hundreds of dimensions [of data tables] or attributes."

4. Introduction of INFORMATICA


4.1 INFORMATICA
The Informatica suite is a powerful data warehousing development suite. It supports the complete process of extracting data from data sources, carrying out transformations, and then loading the data into the target database using Informatica tools. As data extraction, transformation and loading (ETL) is the most challenging task in data warehouse development, a good understanding of the tool is required.

Closing the Gap between Business and IT

Informatica is committed to providing an infrastructure that closes the gap between organizational needs and IT delivery capacity. This commitment guides the products, solutions, and services it builds:

1. To break the barriers that separate people from the data they need.
2. To break the barriers that separate IT from the owners of the data.
3. To build an infrastructure you can trust to support ever-growing demand.

Informatica products provide a set of seamlessly integrated tools built upon a single, unified data integration platform based on a service-oriented architecture (SOA). This platform consists of universal data access and a common set of metadata services, data services, infrastructure services, data quality services, and data integration services.

The various Informatica products are:

Informatica PowerCenter: allows companies and government organizations of all sizes to access and integrate data from virtually any business system, in any format, and deliver that data throughout the enterprise at any speed. This single, unified enterprise data integration platform addresses the challenges of data integration as a mission-critical, enterprise-wide solution to complex problems such as migrating off legacy systems, consolidating application instances, and synchronizing data across multiple operational systems.

Informatica PowerExchange: PowerExchange, in conjunction with the PowerCenter platform, helps organizations unlock mission-critical operational data and deliver it, on demand, to people and processes across the organization. Data can be extracted, converted, and filtered without programming, and delivered in batch or in real time. PowerExchange supports dozens of data sources and targets, all available without intermediate staging of data, through an easy-to-use, vendor-neutral set of metadata-driven tools.

Informatica Complex Data Exchange: provides a platform-independent infrastructure and tools that enable the automated transformation of complex data, including unstructured data (e.g. spreadsheets, documents, binary files, print streams), semi-structured data (e.g. legacy formats such as COBOL, standards such as HIPAA, EDI, HL7, SWIFT), and complex structured data (e.g. ACORD, MISMO, or data in XML documents with deeply hierarchical and recursive structures).

Informatica Data Explorer: provides a complete and accurate picture of enterprise data through an automated process called data profiling. This product alerts users to incompatibilities between the source and target and identifies issues that can cause downstream integration problems. Informatica Data Explorer helps ensure that there are no surprises, so valuable IT initiatives can proceed on time, on budget, with less risk.

Informatica Data Quality: is specifically designed to put control of data quality processes into the hands of business professionals. With unparalleled ease of use, the software delivers powerful data quality profiling, cleansing, matching, and monitoring capabilities in a single solution. Data analysts and data stewards use the intuitive Informatica Data Quality interface to design, manage, deploy, and control individual and enterprise-wide data quality initiatives. Informatica Data Quality empowers business information owners to implement and manage effective and lasting data quality processes.

Informatica PowerCenter 8.5


Powering the Real-Time Integration Competency Center

PowerCenter 8.5, the latest release from Informatica, is specifically designed to respond to the mission-critical needs of a real-time Integration Competency Center (ICC). Meeting enterprise demands for security, scalability, and performance, PowerCenter 8.5 serves as the ideal foundation for enterprise-wide data integration initiatives.

Key Features

Universal Data Access

1. Access more enterprise data types from a single platform, including:
   - Structured, unstructured, and semi-structured data
   - Relational, mainframe, file, and standards-based data
   - Message queues
2. Use a single engine and framework for real-time, batch and CDC data sources
3. Leverage the PowerCenter Real Time Option, which processes data in real time, providing transaction-aware, non-stop execution optimized for real-time connectivity
4. Support JMS and other messaging systems, such as IBM MQ Series and TIBCO Rendezvous, by being able to publish and subscribe to these message queuing systems
5. Extend data access further by combining traditional physical and virtual data integration approaches in a single platform with the PowerCenter Data Federation Option
6. Take advantage of PowerCenter Advanced Edition (AE) Metadata Manager, which collects and links metadata from a wide variety of sources and provides rich metadata analysis and reporting capabilities, increasing visibility into where data comes from, where it's going, and how it's changed

Mission-Critical, Enterprise-Wide Data Integration

1. Handle mission-critical, enterprise-wide data integration with a single, unified platform
2. Provide an ideal foundation for data integration services in a service-oriented architecture
3. Meet the enterprise's performance and scalability demands through high availability, failover and seamless recovery, support for grid computing, pushdown optimization, and dynamic partitioning

New and Enhanced Features in PowerCenter 8.5

Better Access to Timely, Trusted Data

PowerCenter 8.5 features new and enhanced capabilities for delivering real-time data to the enterprise and increasing trust in that data.

- Enhanced PowerCenter Real Time Option capabilities work together with PowerExchange to guarantee that messages are read, processed and delivered only once and in order, even in system failure scenarios
- Enhanced PowerCenter AE Metadata Manager delivers major usability improvements with intuitive, interactive data lineage, and advanced search, filtering and personalization capabilities

More Secure, Scalable Platform

PowerCenter 8.5 builds upon Informatica's heritage of supporting enterprise-wide, mission-critical data integration deployments with improved scalability and new security features designed for large, distributed teams.

Better Collaboration for Increased Cross-Functional Productivity

- Enhanced and now unified user, group, privilege, and role management greatly simplifies administration across PowerCenter tools
- New Web services development wizard and enhanced testing and monitoring tools speed development, testing, deployment, and monitoring of data integration and data quality services
- New and enhanced pre-defined wizards and templates for mapping generation take the complexity out of common data integration scenarios, including slowly changing dimensions and incremental load

Open, Platform-Neutral Architecture

Informatica offers a data integration platform based on an open, platform-neutral architecture that:

- Delivers return on all technology infrastructure investments, including business process, messaging, portal, packaged application, and database systems
- Reduces total cost of ownership with "design once, run anywhere" capabilities
- Minimizes risk by eliminating conflicts with third-party software and hardware partners in today's heterogeneous IT environment

Heterogeneous, Enterprise Grid Support

The Informatica platform supports heterogeneous grid deployments to make the best use of existing resources. Through adaptive, dynamic load balancing, the platform scales intelligently.

End-to-End, Integrated Data Quality

The Informatica platform provides integrated, fully functional data quality capabilities to deliver proactive data quality management, a core component of any data integration initiative. These capabilities leverage the platform's universal data access, scalability and data services capabilities to support enterprise deployments and meet the needs of the business.

Benefits or Advantages

Provides Organizations with the Right Information, at the Right Time

PowerCenter helps the business meet its analytical and operational needs for trusted, timely data delivered throughout the enterprise. PowerCenter raises an organization's confidence in its data through enterprise-wide visibility into data definitions, lineage, and relationships.
The platform minimizes compliance exposure by supporting monitoring and audit activities for data governance and stewardship. Providing comprehensive access to high-quality, real-time data, PowerCenter helps organizations resolve business-to-IT data-related questions and make more timely business decisions.

Meets Demands for Enterprise-Wide, Mission-Critical Deployment

Meeting enterprise demands for security, scalability, and performance, PowerCenter serves as the ideal foundation for enterprise-wide data integration initiatives. PowerCenter reduces IT costs by minimizing development complexity and increasing productivity, accelerating time to delivery. The platform helps IT organizations cost-effectively scale to meet increased data demand, save on hardware costs, and reduce the costs and risks associated with data downtime. PowerCenter also reduces the risk of security and privacy breaches with ICC-grade security features.

Enhances Cross-Team Productivity and Cross-Functional Collaboration

PowerCenter enables development teams to easily share and reuse work and results. Cross-functional teams that collaborate easily are more productive and efficient, which keeps IT project development and deployment costs down.

Business Users

With universal data access, the Informatica platform provides a consistent, accurate view of all enterprise data and delivers that data to the right place at the right time. With a comprehensive view of enterprise data, organizations improve their ability to respond quickly and effectively to changing business requirements.

IT Organizations

The Informatica platform makes unparalleled performance, scalability, security, and high availability consistently available for the full range of data integration initiatives and enterprise data assets. IT organizations don't have to rewrite data integration applications to keep up with increasing data volumes and business demands. They can reduce costs by standardizing data transformation logic across all their different data integration initiatives.

Developers and Global IT Teams

Based on a unique metadata-driven architecture, the Informatica platform enables global development teams to easily reuse development assets across a wider range of platforms and projects. The platform provides a single set of tools that developers can use across all data integration projects, reducing training costs and accelerating ramp-up times.

The Informatica suite contains five tools:

1. Repository Server Administration Console
2. Repository Manager
3. Designer
4. Workflow Manager
5. Workflow Monitor

The Repository Server Administration Console is used to connect to and disconnect from the repository server. The Repository Manager is used to create, organize and manage repository objects such as folders and users, and to configure permissions and privileges for users and groups. The Designer is used to create mappings that contain transformation instructions for the Informatica Server. The Workflow Manager is used to create and run workflows and tasks. The Workflow Monitor is used to monitor scheduled and running workflows for each Informatica Server.

Designer Windows

The Designer consists of the following windows:

- Navigator. Use to connect to and work in multiple repositories and folders. You can also copy and delete objects and create shortcuts using the Navigator.
- Workspace. Use to view or edit sources, targets, mapplets, transformations, and mappings. You can work with a single tool at a time in the workspace. You can use the workspace default or workbook format.
- Status bar. Displays the status of the operation you perform.
- Output. Provides details when you perform certain tasks, such as saving your work or validating a mapping. Right-click the Output window to access window options, such as printing output text, saving text to file, and changing the font size.
- Overview. An optional window to simplify viewing workbooks containing large mappings or a large number of objects. Outlines the visible area in the workspace and highlights selected objects in color. To open the Overview window, choose View > Overview Window.
- Instance Data. View transformation data while you run the Debugger to debug a mapping.
- Target Data. View target data while you run the Debugger to debug a mapping.

Source

To extract data from a source, you must first define sources in the repository. You can import or create the following types of source definitions in the Source Analyzer:

- Relational tables, views, and synonyms
- Fixed-width and delimited flat files that do not contain binary data
- COBOL files
- XML files
- Data models using certain data modeling tools through Metadata Exchange for Data Models (an add-on product)

Target

Before we create a mapping, we must define targets in the repository. Use the Warehouse Designer to import and design target definitions. Target definitions include properties, such as column names and datatypes. We can create and maintain target definitions in the Warehouse Designer. We can also use the Warehouse Designer to create target tables in the target database.

Transformations Overview: A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Transformations in a mapping represent the operations the Informatica Server performs on the data. Data passes into and out of transformations through ports that you connect in a mapping or mapplet.

Mappings Overview: Mappings represent the data flow between sources and targets. When the Informatica Server runs a session, it uses the instructions configured in the mapping to read, transform, and write data. Every mapping must contain the following components:

- Source definition. Describes the characteristics of a source table or file.
- Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions.
- Target definition. Defines the target table or flat file.
- Connectors. Connect sources, targets, and transformations so the Informatica Server can move the data as it transforms it.

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the configured filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.
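The difference between active and passive transformations can be illustrated with two small Python functions. This is only an analogy, not Informatica code: the function names, the sample rows and the used_pct column are invented for the example.

```python
def filter_transformation(rows, condition):
    """Active: may return fewer rows than it received."""
    return [row for row in rows if condition(row)]


def expression_transformation(rows, expression):
    """Passive: returns exactly one output row per input row."""
    return [{**row, **expression(row)} for row in rows]


# Illustrative rows; the values are made up.
rows = [
    {"box_nbr": 1, "msg_cnt": 40, "max_msg_cnt": 100},
    {"box_nbr": 2, "msg_cnt": 0, "max_msg_cnt": 100},
    {"box_nbr": 3, "msg_cnt": 90, "max_msg_cnt": 100},
]

# Active: rows without messages are dropped, so the row count changes.
busy = filter_transformation(rows, lambda r: r["msg_cnt"] > 0)

# Passive: every row passes through with one extra calculated column.
with_pct = expression_transformation(
    rows, lambda r: {"used_pct": 100 * r["msg_cnt"] / r["max_msg_cnt"]}
)
```

Here the filter returns two of the three rows, while the expression returns all three with the added column.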


Mapplets Overview: A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as you need. You create mapplets in the Mapplet Designer. Create a mapplet when you want to use a standardized set of transformation logic in several mappings. For example, if you have several fact tables that require a series of dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key. You can then use the mapplet in each fact table mapping, rather than recreate the same lookup logic in each mapping.
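The idea of packaging a series of dimension key lookups for reuse can be sketched in Python as a function that builds the lookups once and is then applied in every fact load. The helper names (make_lookup, make_key_mapplet) and the sample dimension rows are hypothetical; a real mapplet is built graphically in the Mapplet Designer.

```python
def make_lookup(dim_rows, code_col, key_col):
    """Build a lookup from a natural code to a surrogate key."""
    index = {row[code_col]: row[key_col] for row in dim_rows}
    return index.get  # returns None when the code is not found


def make_key_mapplet(lookups):
    """Bundle several lookups into one reusable step.

    lookups maps a source column to (output key column, lookup function).
    """
    def mapplet(row):
        out = dict(row)
        for src_col, (key_col, lookup) in lookups.items():
            out[key_col] = lookup(row[src_col])
        return out
    return mapplet


# Illustrative dimension contents; the values are made up.
proc_dim = [{"PROC_CD": "A", "PROC_KEY": 1}, {"PROC_CD": "B", "PROC_KEY": 2}]
comp_sys_dim = [{"SYS_CD": "GLM", "COMP_SYS_KEY": 10}]

resolve_keys = make_key_mapplet({
    "proc_cd": ("proc_key", make_lookup(proc_dim, "PROC_CD", "PROC_KEY")),
    "sys_cd": ("comp_sys_key", make_lookup(comp_sys_dim, "SYS_CD", "COMP_SYS_KEY")),
})

fact_row = resolve_keys({"proc_cd": "B", "sys_cd": "GLM", "msg_cnt": 40})
```

The same resolve_keys function could be applied to the rows of any fact table that needs these two keys, which is the reuse a mapplet provides.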


4.2 SQL Server Overview


Microsoft SQL Server 2000 is a full-featured relational database management system (RDBMS) that offers a variety of administrative tools to ease the burdens of database development, maintenance and administration. In this section, we'll cover six of the more frequently used tools: Enterprise Manager, Query Analyzer, SQL Profiler, Service Manager, Data Transformation Services and Books Online. Let's take a brief look at each:

1. Enterprise Manager is the main administrative console for SQL Server installations. It provides you with a graphical "bird's-eye" view of all of the SQL Server installations on your network. You can perform high-level administrative functions that affect one or more servers, schedule common maintenance tasks, or create and modify the structure of individual databases.

2. Query Analyzer offers a quick and dirty method for performing queries against any of your SQL Server databases. It's a great way to quickly pull information out of a database in response to a user request, test queries before implementing them in other applications, create or modify stored procedures, and execute administrative tasks.

3. SQL Profiler provides a window into the inner workings of your database. You can monitor many different event types and observe database performance in real time. SQL Profiler allows you to capture and replay system "traces" that log various activities. It's a great tool for optimizing databases with performance issues or troubleshooting particular problems.

4. Service Manager is used to control the MSSQLServer (the main SQL Server process), MSDTC (Microsoft Distributed Transaction Coordinator) and SQLServerAgent processes. An icon for this service normally resides in the system tray of machines running SQL Server. You can use Service Manager to start, stop or pause any one of these services.

5. Data Transformation Services (DTS) provide an extremely flexible method for importing and exporting data between a Microsoft SQL Server installation and a large variety of other formats. The most commonly used DTS application is the "Import and Export Data" wizard found in the SQL Server program group.

SQL Server 2000 introduces updateable distributed views. This option allows SQL Server systems to share a logical database, thus increasing scalability. The logical database can become large, and you can spread it across many computers to increase its capacity.

Database Layout


An important part of designing the SQL Server system is laying out the database. This process involves the physical placement of transaction logs, data files, and so forth. This is one of the most important tasks involved in designing a SQL Server system because placement decisions are so difficult to reverse. Chapters 5 and 6 include tips on the physical placement of the transaction log and data files.

Transaction Log

The transaction log is critical to the operation, the stability, and the performance of the database server. Each database has its own transaction log; thus, each transaction log should be properly placed. The transaction log is used to record changes to the database, thus allowing the system to recover in the event of a failure. Because recovery relies on the transaction log, it is important that you use a RAID I/O device to protect this component of the database from possible faults. In the event of the loss of a disk drive, the transaction log should still be available.

In addition to protecting the transaction log from disk failure, you should ensure that the transaction log is on a high-performance device. If the transaction log is too slow, transactions must wait, which drastically affects the performance of the system. The transaction log should also be configured as fault tolerant. These requirements are covered in more detail in the next chapter.

Finally, there must be sufficient space within the transaction log so that the system can run uninterrupted for a long period of time. If the transaction log fills up, all transaction processing ceases until space is freed up. Space is freed up by backing up the transaction log. However, backing up the transaction log can affect performance. Some DBAs prefer to create a sufficiently large transaction log so that it is necessary to back it up only once per hour or once per day. The transaction log should be sized to run for at least eight hours without having to be backed up.
As you will learn later in this book, this is a simplification of the transaction log process.

Data Files

Data file placement is an entirely different process from transaction log placement. Depending on how the data files are accessed, you should place all of them on as many disks as possible, distributing the I/O load among all of the disk drives. You should size data files so that there is enough capacity to handle system growth. You will sometimes be surprised by how fast your database grows. As data grows, so do indexes. Periodically you should check your system and perform a sizing and capacity-planning exercise. So that you can plan the proper layout for the data files, the space should be calculated, performance needs should be assessed, and the proper number of disk drives should be created using a RAID subsystem. Whether or not fault tolerance is used will depend on your specific needs. Once the I/O subsystem has been determined, the data files should be evenly spread across controllers and disk drives.

Application

A major part of your system is the application, which should be designed to perform well now and in the future. In this section, you will learn how to design an application with performance, scalability, and growth in mind.

Architecture

The basic architecture of an application can take one of many forms. The major differences between application architectures have to do with the number of systems involved in the application. This distinction is known as the number of tiers. Many of the most popular applications are advertised based on the number of tiers they comprise.

One-Tier Architecture

- The one-tier, or single-tier, architecture is a system in which the database, application, and presentation services (the user interface) all reside on one system. This type of system does no processing external to the platform on which it is running. An example of single-tier architecture is a Microsoft Access database with local presentation services.
- It is rare nowadays to find a substantial single-tier application, especially on a Windows 2000 platform. However, many smaller, single-user applications are single tier. Examples of this are Microsoft Money, Quicken, and TurboTax. These applications typically reside on the same system on which they are running. It is much harder to find an example that uses SQL Server. In fact, even though you can run Enterprise Manager on the same system that the database resides on, it isn't really a single-tier application because the application uses SQL Server networking components. The fact that you happen to be running them on the same system is irrelevant.

Two-Tier Architecture

- A two-tier application is one in which the presentation services and the database reside on different systems. The presentation services (user interface) layer usually includes application logic.
A good example of a two-tier application is one that uses SQL Server Enterprise Manager. For this type of application, the user interface and the application logic reside in Enterprise Manager, but all of the data that the application uses to function resides in a SQL Server database on a different system.

- Two-tier applications are common. You might have worked with many of these applications already. These applications are typically created in languages that support the Windows programming APIs, such as Microsoft Visual C++ or Visual Basic. With a two-tier application, each user must have one or more connections into the SQL Server database. This architecture can be inefficient because most of those connections will be idle for most of the time.

Three-Tier Architecture

- Three-tier applications separate the database layer, the application layer, and the presentation services layer into three distinct components. Typical three-tier applications use the middle layer to multiplex connections from the presentation services layer, which reduces the number of connections into SQL Server. In addition, the middle layer can perform a great deal of the business logic, leaving the database free to do what it does best: deliver data.
- There is some debate over whether Web-based applications are two-tier or three-tier applications. You can use this simple test: if the data presented in the presentation services layer could just as easily use a terminal or a Web browser, the application probably has two tiers.
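How a middle layer multiplexes many client requests over a few database connections can be sketched with a small connection pool. This is a simplified illustration, not SQL Server code: ConnectionPool and FakeConnection are hypothetical names, and the fake connection only counts how many connections were opened.

```python
import queue


class ConnectionPool:
    """Share a fixed number of connections among many callers."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def run(self, work):
        conn = self._idle.get()  # blocks while all connections are in use
        try:
            return work(conn)
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


class FakeConnection:
    """Stand-in for a database connection; counts how many are opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, text):
        return f"result of {text}"


# Ten client requests are served through only two database connections.
pool = ConnectionPool(FakeConnection, size=2)
results = [pool.run(lambda c, i=i: c.query(f"request {i}")) for i in range(10)]
```

In a two-tier design each of the ten clients would hold its own connection; here the middle layer opens two and reuses them for every request.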


5. Database Design

5.1 Tables Design


1. SDDBOXRJCTRPT

Column Name     Data Type      Precision   Scale   Nullable
RPT_ID          decimal        19          0       No
RPT_TMSTMP      datetime       23          3       Yes
TPF_SYS_ID      char           3           -       No
HST_PROC_ID     char           1           -       No
BOX_NBR         int            10          0       No
RJCT_0          int            10          0       Yes
RJCT_1          int            10          0       Yes
RJCT_2          int            10          0       Yes
RJCT_3          int            10          0       Yes
MSG_CNT         int            10          0       Yes
PEAK_CNT        int            10          0       Yes
PCC_TXT         varchar        100         -       Yes
PROC_TXT        varchar        100         -       Yes
MAX_MSG_CNT     int            10          0       Yes
MNT_TMSTMP      datetime       23          3       No

2. Box_dim

Column Name     Data Type      Precision   Scale   Nullable
Box_Key         int identity   10          0       No
Box_nbr         int            10          0       Yes
Box_max         int            10          0       Yes
Comp_Sys_cd     int            10          0       Yes
Pros_ID         varchar        1           -       Yes
Box_eff_dt      datetime       23          3       Yes
Box_end_dt      datetime       23          3       Yes


3. Box_proc_brdg

Column Name     Data Type      Precision   Scale   Nullable
Box_key         int            10          0       No
proc_key        int            10          0       No
Box_nbr         int            10          0       Yes
Box_eff_dt      datetime       23          3       Yes
Box_end_dt      datetime       23          3       Yes

4. Box_psdo_brdg

Column Name     Data Type      Precision   Scale   Nullable
Box_key         int            10          0       No
psdo_key        int            10          0       No
Box_nbr         int            10          0       Yes
Box_eff_dt      datetime       23          3       Yes
Box_end_dt      datetime       23          3       Yes

5. LNKCAPBOXFACT

Column Name     Data Type      Precision   Scale   Nullable
RPT_TMSTMP      datetime       23          3       Yes
Box_key         int            10          0       Yes
comp_sys_key    int            10          0       Yes
Processor_id    char           1           -       Yes
Rej0            int            10          0       Yes
Rej1            int            10          0       Yes
Rej2            int            10          0       Yes
Rej3            int            10          0       Yes
Peak            int            10          0       Yes
Trans_cnt       int            10          0       Yes

6. PROC_DIM

Column Name     Data Type      Precision   Scale   Nullable
PROC_KEY        int            10          0       No
PROC_CD         char           12          -       No
PROC_DESC       varchar        30          -       Yes
PROC_CATEG      char           10          -       No
PROC_LKP_CD     char           8           -       Yes


7. PSDO_DIM

Column Name      Data Type     Precision   Scale   Nullable
PSDO_KEY         int           10          0       No
CRS_CD           char          2           -       No
PSDO_CTY_CD      char          4           -       No
ARC_IATA_NBR     char          9           -       No
AAT_ACCT_TYPE    char          1           -       Yes
CHANNEL_ID       char          6           -       Yes
TST_PSDO_IND     char          1           -       Yes
PSDO_STAT_CD     char          1           -       Yes
PSDO_STAT_DESC   varchar       30          -       Yes
PSDO_EFF_DT      datetime      23          3       No
PSDO_END_DT      datetime      23          3       No
MNT_RQSTR_NM     varchar       30          -       Yes
MNT_INSRT_DT     datetime      23          3       Yes
MNT_UPD_DT       datetime      23          3       Yes

8. COMP_SYS_DIM

Column Name     Data Type      Precision   Scale   Nullable
COMP_SYS_KEY    int            10          0       No
SYS_CD          char           3           -       No
SYS_DESC        varchar        30          -       Yes
SYS_TYPE        char           10          -       Yes
CRS_KEY         int            10          0       No
CRS_CD          char           2           -       No
CRS_DESC        varchar        20          -       Yes


9. TRANS_COST_FACT

Column Name        Data Type   Precision   Scale   Nullable   Comment
SYS_CRS_KEY        int         10          0       No         ?
PROC_KEY           int         10          0       No         ?
TRANS_DBLK_KEY     int         10          0       No         ?
KLR_KEY            int         10          0       No         ?
DIST_CHANNEL_KEY   int         10          0       No         ?
SEG_DAY_KEY        int         10          0       No         ?
SUBSCR_KEY         int         10          0       No         ?
HOUR#              smallint    5           0       No         ?
NO_OF_PROC         int         10          0       No         ?
NO_OF_TRANS_DB     int         10          0       Yes        ?
NO_OF_KLR          int         10          0       Yes        ?
ESTIM_COST         decimal     14          7       Yes        ?
PSDO_KEY           int         10          0       Yes        ?
TRANS_DT           datetime    23          3       No         ?

10. DIST_CHANNEL_DIM

Column Name        Data Type   Precision   Scale   Nullable   Comment
DIST_CHANNEL_KEY   int         10          0       No         ?
SRC_ID             char        6           ?       No         ?
SRC_TYPE           char        1           ?       No         ?
VEND_ID            char        4           ?       No         ?
VEND_TYPE          char        1           ?       No         ?
PROD_DESC          varchar     40          ?       Yes        ?
PROD_CATEG         varchar     20          ?       Yes        ?
SPNSR_NM           varchar     30          ?       Yes        ?
SPNSR_TYPE         varchar     20          ?       Yes        ?
CHANNEL_ID         varchar     20          ?       Yes        ?
PROD_VAR           varchar     20          ?       Yes        ?
USR_GRP            varchar     20          ?       Yes        ?
USR_SUB_GRP        varchar     20          ?       Yes        ?


11. TRANS_DBLK_DIM

Column Name          Data Type   Precision   Scale   Nullable   Comment
TRANS_DBLK_KEY       int         10          0       No         ?
TRANS_DBLK_CD        char        5           ?       No         ?
TRANS_DBLK_MAJ_VER   char        2           ?       No         ?
TRANS_DBLK_MIN_VER   char        2           ?       No         ?
START_DT             datetime    23          3       No         ?
END_DT               datetime    23          3       No         ?
CRS_CD               char        2           ?       No         ?
COST                 decimal     14          7       Yes        ?
TRANS_DBLK_DESC      varchar     36          ?       Yes        ?
TRANS_DBLK_CATEG     char        10          ?       Yes        ?
STRUCT_FLG           char        1           ?       Yes        ?
BILL_IND             char        1           ?       Yes        ?

12. ECB_HRLY_FACT

Column Name           Data Type   Precision   Scale   Nullable   Comment
COMP_SYS_KEY          int         10          0       No         ?
PROC_KEY              int         10          0       No         ?
TRANS_DBLK_KEY        int         10          0       No         ?
DIST_CHANNEL_KEY      int         10          0       No         ?
ECB_DT_KEY            int         10          0       No         ?
SUBSCR_KEY            int         10          0       No         ?
HOUR#                 smallint    5           0       Yes        ?
NO_OF_PROC            int         10          0       Yes        ?
NO_OF_TRANS_DB        int         10          0       Yes        ?
NO_OF_ECB             int         10          0       Yes        ?
IO_AMNT               int         10          0       Yes        ?
CPU_AMNT              decimal     17          3       Yes        ?
NORM_PROC_CNT         int         10          0       Yes        ?
NORM_TRANS_DBLK_CNT   int         10          0       Yes        ?
NORM_ECB_CNT          int         10          0       Yes        ?
NORM_IO_AMT           decimal     15          0       Yes        ?
NORM_CPU_AMT          decimal     17          3       Yes        ?
AVG_MIPS              decimal     18          6       Yes        ?
PSDO_KEY              int         10          0       Yes        ?
AGCY_KEY              int         10          0       Yes        ?
ECB_DT                datetime    23          3       No         ?


5.2 Module Data Flow Diagram

Main Flow: Data Warehouse Migration

1. Box Definition - Hourly
2. IBM Handoff - Weekly
3. Data Maintenance - Monthly


5.3 Detailed Data Flow Diagram

1. Box Definition - Hourly

Workflow: Wfk_Capbox_hourly

o Get date
o Load SDDBOXRJCTRPT
o Update Box_psdo_brdg
o Load LNKCAPBOXFACT
o Use SCD2 tools
o Update Box_psdo_brdg
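
The "Use SCD2 tools" step refers to Type 2 slowly changing dimension handling, in which a changed record is not overwritten: the current version is closed with an end date and a new version is inserted. In the project this was done with mappings; the sketch below only illustrates the idea in Python. It assumes that Box_max is the tracked attribute (suggested by the session name s_m_load_scd2_max_box) and that an empty Box_end_dt marks the current version. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class BoxDimRow:
    box_key: int                           # surrogate key (Box_Key)
    box_nbr: int                           # natural key (Box_nbr)
    box_max: int                           # tracked attribute (assumed: Box_max)
    box_eff_dt: datetime                   # version start (Box_eff_dt)
    box_end_dt: Optional[datetime] = None  # None marks the current version (assumption)


def apply_scd2(dim: List[BoxDimRow], incoming: Dict[int, int],
               load_dt: datetime) -> List[BoxDimRow]:
    """Apply Type 2 changes to dim in place; return the rows that were inserted."""
    current = {r.box_nbr: r for r in dim if r.box_end_dt is None}
    next_key = max((r.box_key for r in dim), default=0) + 1
    inserted = []
    for box_nbr, box_max in sorted(incoming.items()):
        row = current.get(box_nbr)
        if row is not None and row.box_max == box_max:
            continue                       # unchanged: keep the current version
        if row is not None:
            row.box_end_dt = load_dt       # changed: expire the old version
        new_row = BoxDimRow(next_key, box_nbr, box_max, load_dt)
        dim.append(new_row)                # new or changed: insert a new version
        inserted.append(new_row)
        next_key += 1
    return inserted


# Example: box 101 changes its maximum, box 102 is new.
dim = [BoxDimRow(1, 101, 500, datetime(2007, 8, 1))]
new_rows = apply_scd2(dim, {101: 650, 102: 300}, datetime(2007, 9, 1))
```

After the call, the original row for box 101 carries an end date and two new rows exist, so the history of Box_max is preserved rather than overwritten.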


2. IBM Handoff - Weekly

Workflow: Wkf_create_IBM_Handoff

o Extract data from Ecb_Hrly_fact
o Extract data from Trans_cost_fact
o IBM-Handoff

3. Data Maintenance - Monthly

Report is prepared using Cognos


6. Snapshots

1. Main workflow of Box Definition (hourly)

2. Mapping used to get the date, in session s_m_get_date

3. Mapping used to load SDDBOXRJCTRPT, in session s_m_load_SDDBOXRJCTRPT

4. Mapping used to detect changes using SCD2 tools, in session s_m_load_scd2_max_box

5. Mapping used to update Box_psdo_brdg, in session s_m_load_new_PCC

6. Mapping used to update Box_proc_brdg, in session s_m_load_new_Proc

7. Mapping used to load LNKCAPBOXFACT, in session s_m_load_LNKCAPBOXFACT

8. Main workflow of IBM Handoff (weekly)

9. Mapping used to extract data from Ecb_hrly_fact, in session s_m_extract_ECB_data

10. Mapping used to extract data from Trans_cost_fact, in session s_m_extract_TRANSLOG_data

11. Loading data into the IBM_handoff file, in session s_m_create_IBM_HAND_OFF

7. Input Design / Output Design

7.1 INPUT DESIGN

The main objective of input design is to specify how information is put into a form that is acceptable to the computer. Files and databases are maintained through the timely and accurate input of data. The volume of information, its frequency, and the verification requirements are considered when selecting the input format. Another objective is to ensure that the input is acceptable to and understandable by the user. Input design is the process of converting user-originated input into a computer-based format. The goal of designing input data is to make data entry as easy and as free from errors as possible. A well-designed input serves four common purposes:

o To control workflow
o To reduce redundancies in recording data
o To increase clerical accuracy
o To allow easier checking of data

Inputs are important because, in almost all instances, they are the only means of contact a user or customer has with the system. Various forms serve as data entry screens for entering data into the system. The user should key the data in the order in which it occurs on the form, and the computer should reformat it as and when required.


7.2 OUTPUT DESIGN


The primary consideration in output design is to arrange the data in the form most convenient to the user. In dealing with output, the basic elements of a form, preprinted information and spaces to be filled in by the user, must again be considered. The standards for output design are given below:

o Give each output a specific name or title
o State whether each output field is to include significant zeroes, spaces between fields, and alphabetic or any other data
o Specify the procedure for proving the accuracy of the output data

Efficient, intelligent output design should improve the system's relationship with the user and help in decision making. A major form of output is hard copy from the printer. Printouts should be designed around the output requirements of the user. The output devices to consider depend on factors such as compatibility of the device with the system, response time requirements, and expected print quality.


8. System Testing and Validation


SYSTEM TESTING

System testing is a series of different tests whose primary purpose is to fully exercise the computer-based system. Although each test has a different purpose, all work to verify that all system elements have been properly integrated and perform their allocated functions. During testing we tried to make sure that the product does exactly what it is supposed to do. Testing is the final verification and validation activity within the organization itself. In the testing stage, we tried to achieve the following goals: to affirm the quality of the product, to find and eliminate any residual errors from previous stages, to validate the software as a solution to the original problem, to demonstrate the presence of all specified functionality in the product, and to estimate the operational reliability of the system. During testing, the major activities are concentrated on the examination and modification of the source code.

TESTING METHODOLOGIES

The following are the testing methodologies:

o Unit Testing
o Integration Testing
o User Acceptance Testing
o Output Testing
o Validation Testing

Unit Testing

Unit testing focuses verification effort on the smallest unit of software design, the module. It exercises specific paths in a module's control structure to ensure complete coverage and maximum error detection. This test focuses on each module individually, ensuring that it functions properly as a unit; hence the name unit testing.

Integration Testing

Integration testing addresses the issues associated with the dual problems of verification and program construction. After the software has been integrated, a set of high-order tests is conducted. The main objective of this testing process is to take unit-tested modules and build the program structure that has been dictated by the design.


User Acceptance Testing

User acceptance of a system is the key factor in its success. The system under consideration was tested for user acceptance by keeping in constant touch with the prospective users during development and making changes wherever required, with regard to the following points:

o Input screen design
o Output screen design
o Menu-driven system

Output Testing

After validation testing, the next step is output testing of the proposed system, since no system is useful if it does not produce the required output in the specified format. The outputs generated or displayed by the system are tested by asking the users about the format they require. The output format is therefore considered in two ways: on screen and in printed form.

Validation Checking

Validation checks are performed on the following fields.

Text Field: A text field can contain only a number of characters less than or equal to its size. Text fields are alphanumeric in some tables and alphabetic in others. An incorrect entry always flashes an error message.

Numeric Field: A numeric field can contain only the digits 0 to 9. An entry of any other character flashes an error message.

The individual modules are checked for accuracy and for what they have to perform. Each module is subjected to a test run with sample data. The individually tested modules are then integrated into a single system. Testing involves executing the program with real data; the existence of any program defect is inferred from the output. The testing should be planned so that all the requirements are individually tested.

Preparation of Test Data

The above testing is done using various kinds of test data. Preparation of test data plays a vital role in system testing. After the test data are prepared, the system under study is tested using them. While the system is tested with test data, errors are again uncovered and corrected using the above testing steps, and the corrections are noted for future use.
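
The text-field and numeric-field checks described under Validation Checking can be sketched as two small functions. This is only an illustration under stated assumptions: the function names and the wording of the error messages are hypothetical, and the report does not say how the actual checks were implemented.

```python
from typing import List


def check_text_field(value: str, size: int, alphabetic_only: bool = False) -> List[str]:
    """Return error messages for a text field; an empty list means the value is valid."""
    errors = []
    if len(value) > size:
        errors.append(f"text is {len(value)} characters; field size is {size}")
    # Alphabetic fields accept letters only; other text fields accept letters and digits.
    allowed = str.isalpha if alphabetic_only else str.isalnum
    if not all(allowed(ch) for ch in value):
        kind = "alphabetic" if alphabetic_only else "alphanumeric"
        errors.append(f"text must be {kind}")
    return errors


def check_numeric_field(value: str) -> List[str]:
    """Return error messages for a numeric field, which accepts only the digits 0 to 9."""
    if value and all(ch in "0123456789" for ch in value):
        return []
    return ["numeric field accepts only the digits 0 to 9"]
```

For example, check_text_field("BOX123", 5) reports that the value exceeds the field size, while check_numeric_field("2007") returns an empty list.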

Using Live Test Data

Live test data are those that are actually extracted from organization files. After a system is partially constructed, programmers or analysts often ask users to key in a set of data from their normal activities. The systems person then uses this data to partially test the system. In other instances, programmers or analysts extract a set of live data from the files and enter it themselves. It is difficult to obtain live data in sufficient amounts to conduct extensive testing. Although live data are realistic and show how the system will perform for typical processing requirements, assuming that the data entered are in fact typical, such data generally will not test all the combinations or formats that can enter the system. This bias toward typical values does not provide a true system test and in fact ignores the cases most likely to cause system failure.

Using Artificial Test Data

Artificial test data are created solely for test purposes, so they can be generated to test all combinations of formats and values. In other words, the artificial data, which can quickly be prepared by a data-generating utility program in the information systems department, make possible the testing of all logic and control paths through the program.
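
A data-generating utility of the kind mentioned above can be very small. The sketch below builds one test row for every combination of candidate values. The column names are taken from SDDBOXRJCTRPT, but the candidate values are hypothetical and were chosen only to include boundary and missing cases; the function name is likewise an assumption.

```python
import itertools
from typing import Dict, Iterator, List


def generate_test_rows(field_values: Dict[str, List[object]]) -> Iterator[Dict[str, object]]:
    """Yield one row for every combination of the candidate values."""
    names = list(field_values)
    for combo in itertools.product(*(field_values[n] for n in names)):
        yield dict(zip(names, combo))


# Hypothetical candidate values: smallest, typical and largest int, plus a missing value.
candidates = {
    "BOX_NBR": [0, 1, 2147483647],
    "HST_PROC_ID": ["A", "Z"],
    "RJCT_0": [None, 0, 10],
}
rows = list(generate_test_rows(candidates))
```

Three candidate values for BOX_NBR, two for HST_PROC_ID and three for RJCT_0 give 3 x 2 x 3 = 18 rows, which is why the number of candidates per field should be kept small.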


9. System Implementation

Implementation is the stage of the project where the theoretical design is turned into a working system. At this stage the main workload, the greatest upheaval, and the major impact on the existing system shift to the user department. If the implementation is not carefully planned and controlled, it can cause chaos and confusion.

Implementation includes all those activities that take place to convert from the old system to the new one. The new system may be totally new, replacing an existing manual or automated system, or it may be a major modification to an existing system. Proper implementation is essential to provide a reliable system that meets the organization's requirements. Successful implementation may not guarantee improvement in the organization using the new system, but improper installation will prevent it.

The process of putting the developed system into actual use is called system implementation. The system can be implemented only after thorough testing is done and it is found to be working according to the specifications. The system personnel check the feasibility of the system. The most crucial stage is achieving a successful new system and giving the user confidence that it will work efficiently and effectively. It involves careful planning, investigation of the current system and its constraints on implementation, and design of methods to achieve the changeover. The more complex the system being implemented, the more involved the systems analysis and design effort required just for implementation.

The system implementation has three main aspects: education and training, system testing, and changeover. The implementation stage involves the following tasks:

o Careful planning
o Investigation of the system and constraints
o Design of methods to achieve the changeover
o Training of the staff in the changeover phase
o Evaluation of the changeover method

The method of implementation and the time scale to be adopted are determined initially. Next, the system is tested properly, and at the same time users are trained in the new procedures.

IMPLEMENTATION PROCEDURES

Implementation of software refers to the final installation of the package in its real environment, to the satisfaction of the intended users and the operation of the system. In many organizations, someone who will not be operating the software commissions the software development project. The people who will operate it may not be sure that the software is meant to make their job easier, and in the initial stage they have doubts about it. We have to ensure that resistance does not build up by making sure that:

o The active user is aware of the benefits of using the system
o Their confidence in the software is built up
o Proper guidance is imparted to the user so that he is comfortable using the application

Before viewing the system, the user must know that, to view the result, the server program must be running on the server. If the server object is not up and running on the server, the actual processes will not take place.


10. System Maintenance

The maintenance phase of the software cycle is the time in which a software product performs useful work. After a system is successfully implemented, it should be maintained in a proper manner. System maintenance is an important aspect of the software development life cycle. Maintenance is needed to keep the system adaptable to changes in its environment. There may be social, technical, and other environmental changes that affect a system that is being implemented. Software product enhancements may involve providing new functional capabilities, improving user displays and modes of interaction, and upgrading the performance characteristics of the system. Only through proper system maintenance procedures can the system be adapted to cope with these changes.

Software maintenance is, of course, far more than finding mistakes. We may define maintenance by describing four activities that are undertaken after a program is released for use.

The first maintenance activity occurs because it is unreasonable to assume that software testing will uncover all latent errors in a large software system. During the use of any large program, errors will occur and be reported to the developer. The process that includes the diagnosis and correction of one or more errors is called corrective maintenance.

The second activity that contributes to a definition of maintenance occurs because of the rapid change that is encountered in every aspect of computing. Therefore, adaptive maintenance, an activity that modifies software to properly interface with a changing environment, is both necessary and commonplace.

The third activity that may be applied to a definition of maintenance occurs when a software package is successful. As the software is used, recommendations for new capabilities, modifications to existing functions, and general enhancements are received from users. To satisfy requests in this category, perfective maintenance is performed. This activity accounts for the majority of all effort expended on software maintenance.

The fourth maintenance activity occurs when software is changed to improve future maintainability or reliability, or to provide a better basis for future enhancements. Often called preventive maintenance, this activity is characterized by reverse engineering and re-engineering techniques.


11. Scope for Further Enhancement

In order for IBM Capacity Planning to properly forecast, report, and investigate TPF capacity, we require the following:

A daily* flat file (CSV file) of the Box report containing the following tags: INDEX, CPU(PID), BoxNo, Date(When), Rej0-Rej3, MsgCount

o The report is to include each five (5) minute observation period, not just occurrences where the throttle was active
o This file will come from the Travelport GLM server and either be pushed or pulled to the IBM YODA server
o Available by the start of the following business day
o IBM will then have the file transferred from the YODA server to MVS/TSO

*The Box report file may need to be pushed/pulled to the YODA server on a more frequent basis (a possible TPF operations requirement).
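
As an illustration of the daily Box report file, the sketch below writes one CSV row per five-minute observation, using the tags listed above as the header. It writes every observation it is given, including those with no rejections, since the requirement is not limited to periods where the throttle was active. Treating INDEX as a running row number is an assumption, as are the function name and the sample values.

```python
import csv
import io
from typing import Iterable, Mapping

BOX_REPORT_TAGS = ["INDEX", "CPU(PID)", "BoxNo", "Date(When)",
                   "Rej0", "Rej1", "Rej2", "Rej3", "MsgCount"]


def write_box_report(observations: Iterable[Mapping[str, object]], out) -> int:
    """Write every five-minute observation as one CSV row; return the row count."""
    writer = csv.DictWriter(out, fieldnames=BOX_REPORT_TAGS, lineterminator="\n")
    writer.writeheader()
    count = 0
    for index, obs in enumerate(observations, start=1):
        row = dict(obs)
        row["INDEX"] = index           # assumption: INDEX is a running row number
        writer.writerow(row)           # no filtering: idle periods are written too
        count += 1
    return count


# Hypothetical sample: the first period has no rejections but is still reported.
sample = [
    {"CPU(PID)": "A", "BoxNo": 12, "Date(When)": "2007-09-03 00:00",
     "Rej0": 0, "Rej1": 0, "Rej2": 0, "Rej3": 0, "MsgCount": 840},
    {"CPU(PID)": "A", "BoxNo": 12, "Date(When)": "2007-09-03 00:05",
     "Rej0": 3, "Rej1": 0, "Rej2": 0, "Rej3": 0, "MsgCount": 1210},
]
buffer = io.StringIO()
rows_written = write_box_report(sample, buffer)
```

Passing an open file instead of the in-memory buffer would produce the flat file itself; the transfer to the YODA server is outside the scope of this sketch.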

A weekly flat file (CSV file) containing the measurement (MIPS or Instructions) of SDD PROCs by PCC and BoxNo (path length per SDD PROC by PCC and BoxNo), which includes:

o Only Monday's data
o 24 hours of data rolled up into 60-minute averages
o This file will come from the Travelport GLM server and either be pushed or pulled to the IBM YODA server
o IBM will then have the file transferred from the YODA server to MVS/TSO
o To be provided by EOD every Thursday

These two data feeds will be used to create a SAS database in which the data can be manipulated and reports can be produced.

The email from IBM will specify the handoff from KM to IBM as a weekly flat file (CSV file) containing the measurement (MIPS or Instructions) of SDD PROCs by PCC and BoxNo (path length per SDD PROC by PCC and BoxNo), which includes:

o Only Monday's data
o 24 hours of data rolled up into 60-minute averages (on the half hour - 00:00 and 00:30)
o This file will come from the Travelport GLM server and either be pushed or pulled to the IBM YODA server
o IBM will then have the file transferred from the YODA server to MVS/TSO
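
The weekly roll-up described above (Monday's data only, averaged into 60-minute periods by PCC and BoxNo) can be sketched as follows. The input layout, the function name and the sample values are hypothetical, and the sketch buckets samples by clock hour; the "on the half hour" wording in the email may call for a different bucket boundary.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Iterable, List, Tuple

# Assumed input layout: (timestamp, PCC, BoxNo, SDD PROC, MIPS)
Sample = Tuple[datetime, str, int, str, float]


def monday_hourly_averages(samples: Iterable[Sample]) -> List[Tuple[str, str, int, str, float]]:
    """Keep Monday samples and average MIPS per hour, PCC, BoxNo and PROC."""
    buckets = defaultdict(list)
    for ts, pcc, box_no, proc, mips in samples:
        if ts.weekday() != 0:              # datetime.weekday(): Monday is 0
            continue
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(hour_start, pcc, box_no, proc)].append(mips)
    return [
        (hour.strftime("%Y-%m-%d %H:%M"), pcc, box_no, proc, mean(values))
        for (hour, pcc, box_no, proc), values in sorted(buckets.items())
    ]


# Hypothetical samples: 3 September 2007 was a Monday, 4 September a Tuesday.
samples = [
    (datetime(2007, 9, 3, 10, 5), "PCC1", 12, "PROCA", 2.0),
    (datetime(2007, 9, 3, 10, 35), "PCC1", 12, "PROCA", 4.0),
    (datetime(2007, 9, 4, 10, 5), "PCC1", 12, "PROCA", 9.0),
]
weekly_rows = monday_hourly_averages(samples)
```

The two Monday samples fall into the same hour and are averaged into one row; the Tuesday sample is dropped. The resulting rows can be written to the flat file with the standard csv module.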


12. Conclusion

The objective of this project was a proper migration from Teradata to Microsoft SQL Server. The system developed is able to meet all the basic requirements. The security of the system is also one of the prime concerns. There is always room for improvement in any software, however efficient the system may be. The important thing is that the system should be flexible enough for future modifications. The system has been factored into different modules so that it can adapt to further changes. Every effort has been made to cover all user requirements and make it user friendly.

Goal achieved: The system is able to provide the interface to the user so that he can replicate his desired data.

User friendliness: Though most of the system is supposed to act in the background, efforts have been made to make the foreground interaction with the user as smooth as possible.
