TCS understands that the Louisiana Department of Revenue (LDR) currently operates a number of applications, such as tax administration, budget development, and performance measurement, which reside in different relational databases (DB2, SQL Server, MS Access) and non-relational data sources (sequential and VSAM files). All operational data lies in isolated organisational silos: information for the various parts of the business is held either within the owning division and/or the Data Warehouse. The LDR captures administrative and general information that could serve many purposes but is currently under-utilised in the absence of a centralised architecture and an integrated information approach. There is no information governance over access to data within the data warehouse, and data quality in some source systems is not up to the mark.

The LDR aims to build a highly accessible, intuitive, single point of access to the information contained within its various applications. In essence, data from the different applications needs to be integrated into a common enterprise-wide structure to model and support the business management process. Data pertaining to all organizational entities and stakeholders will be held at the lowest level of granularity possible, to maximize the number and type of management information requirements that can be satisfied. The data warehouse will initially be sourced from the LDR's own systems, but in the future the LDR will look to enhance this data with third-party information providers. The future requirements of the client cannot be met by the present system.
To build a single, coherent, common information platform for LDR, TCS proposes a solution based on SQL Server 2005 at the back end, using the Integration Services component of SQL Server 2005 for data integration and Cognos for enterprise-wide reporting as the medium through which this information is made available to end users. The figure below depicts the overall architecture for the solution:
Microsoft SQL Server 2005 Integration Services (SSIS) will be used for extraction, transformation, and loading (ETL) of data from the various data sources to the staging database, and from there to the enterprise data warehouse. Cognos 8 Framework Manager/Report Studio will be used to build and generate different kinds of prepared reports from the Data Mart/Enterprise Data Warehouse.

Task flow and data flow engine

SSIS consists of both an operations-oriented task-flow engine and a scalable, fast data-flow engine. The data flow exists in the context of an overall task flow, and it is the task-flow engine that provides the runtime resources and operational support for the data-flow engine. This combination of task flow and data flow enables SSIS to be effective in traditional ETL and data warehouse (DW) loading scenarios.

Pipeline architecture

At the core of SSIS is the data transformation pipeline. This pipeline has a buffer-oriented architecture that is extremely fast at manipulating rowsets of data once they have been loaded into memory. The approach is to perform all data transformation steps of the ETL process in a single operation without staging data, although staging may still be used where specific transformation, operational, or hardware requirements need to be addressed.
Nevertheless, for maximum performance, the architecture avoids staging. Even copying the data in memory is avoided as far as possible. With SSIS, all types of data (structured, unstructured, XML, etc.) are converted to a tabular (columns and rows) structure before being loaded into its buffers. Any data operation that can be applied to tabular data can be applied to the data at any step in the data-flow pipeline. This means that a single data-flow pipeline can integrate diverse sources of data and perform arbitrarily complex operations on these data without having to stage the data. It should also be noted, though, that if staging is required for business or operational reasons, SSIS has good support for such implementations as well.

SSIS for data warehouse loading

SSIS can consume data from (and land data into) a variety of sources including OLE DB, managed (ADO.NET), ODBC, flat file, Excel, and XML, using a specialized set of components called adapters. SSIS can even consume data from custom data adapters (developed in-house or by third parties). This allows the wrapping of legacy data-loading logic into a data source that can be seamlessly consumed in the SSIS data flow. SSIS also includes a set of powerful data transformation components that allow the data manipulations that are essential for building data warehouses. These transformation components include:
• Aggregate: Performs multiple aggregates in a single pass.
• Sort: Sorts data in the flow.
• Lookup: Performs flexible cached lookup operations against reference datasets.
• Pivot and UnPivot: Two separate transformations that pivot and unpivot data.
• Merge, Merge Join, and UnionAll: Perform join and union operations.
• Derived Column: Performs column-level manipulations such as string, numeric, and date/time operations, and code page translations.
• Data Conversion: Converts data between various types (numeric, string, etc.).
• Audit: Adds columns with lineage metadata and other operational audit data.
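As a rough illustration only, the following T-SQL expresses the kind of work a Lookup, a Derived Column, and an Aggregate perform, using hypothetical staging and reference tables (stg_TaxReturn and DimTaxType are assumed names, not actual LDR objects). In the proposed solution the equivalent logic would run inside the SSIS data-flow pipeline rather than as SQL in the database:

    -- Illustrative only: table and column names are hypothetical.
    SELECT
        d.TaxTypeKey,                            -- Lookup: key resolved from a reference table
        YEAR(s.FilingDate)  AS FilingYear,       -- Derived Column: value computed from an existing column
        COUNT(*)            AS ReturnCount,      -- Aggregate
        SUM(s.TaxDue)       AS TotalTaxDue       -- Aggregate
    FROM stg_TaxReturn AS s
    JOIN DimTaxType    AS d
        ON d.TaxTypeCode = s.TaxTypeCode         -- Lookup join key
    GROUP BY d.TaxTypeKey, YEAR(s.FilingDate);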
Programmability

In addition to providing a professional development environment, SSIS exposes all its functionality via a set of rich APIs. These APIs are both managed (.NET Framework) and native (Win32), and they allow developers to extend the functionality of SSIS by developing custom components in any language supported by the .NET Framework (such as Visual C# or Visual Basic .NET) or in C++. These custom components can be work-flow tasks and data-flow transformations (including source and destination adapters). This allows legacy data and functionality to be easily included in SSIS integration processes, so that past investments in legacy technologies can be effectively leveraged. It also allows easy inclusion of third-party components.

Extensibility

Extensibility is not limited to re-usable custom components; it also includes script-based extensibility. SSIS has script components both for task flow and for data flow. These allow users to write scripts in Visual Basic .NET to add ad hoc functionality (including data sources and destinations) and to re-use any pre-existing functionality packaged as .NET Framework assemblies. SSIS is fully programmable, embeddable, and extensible, with a development environment hosted in a Visual Studio shell and the ability to build workflows and pipelines from a rich set of pre-built or custom components.
The proposed solution will be built using SQL Server 2005 at the back end, with Cognos as the front-end reporting tool and the Integration Services component of SQL Server used for data extraction.

Design of the Data Warehouse

The data warehouse design should organize large amounts of stable data for ease of analysis and retrieval, that is, for rapid access to information for analysis and reporting. Dimensional modeling is used in the design of data warehouse databases to organize the data for the efficiency of queries that are intended to analyze and summarize large volumes of data. Data updates consist primarily of periodic additions of new data. The design must be centralized so that all of the organization's data warehouse information is consistent and usable.

A data warehouse with a star schema will be designed. It will comprise the data required for reporting and analysis by different users in the organization. In a star schema, each dimension table has a single-part primary key that links to one part of the multipart primary key in the fact table. In a snowflake schema, one or more dimension tables are decomposed into multiple tables, with the subordinate dimension tables joined to a primary dimension table instead of to the fact table. In most designs, star schemas are preferable to snowflake schemas because they involve fewer joins for information retrieval and are easier to manage.

Fact Tables

Each data warehouse or data mart includes one or more fact tables. Central to a star or snowflake schema, a fact table captures the data that measures the organization's business operations. A fact table contains numerical data (facts) that can be summarized to provide information about the history of the operation of the organization. It contains large numbers of rows, sometimes in the hundreds of millions of records when it holds one or more years of history for a large organization. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the keys that relate the facts to the dimension tables.

Aggregation Tables

Aggregation tables contain summaries of fact table information. These tables are used to improve query performance when SQL is used as the query mechanism.

Dimension Tables

Dimension tables contain attributes that describe the fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. Dimensional modeling produces dimension tables in which each table contains attributes that are independent of those in other dimensions.
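For illustration only, a minimal T-SQL sketch of such a star schema is shown below. The table and column names (DimTaxpayer, DimTaxType, DimDate, FactTaxCollection, AggTaxCollectionByMonth) are hypothetical examples; the actual LDR dimensional model would be derived from the detailed requirements.

    -- Hypothetical dimension tables: each has a single-part surrogate primary key.
    CREATE TABLE DimTaxpayer (
        TaxpayerKey   INT IDENTITY(1,1) PRIMARY KEY,
        TaxpayerId    VARCHAR(20)  NOT NULL,   -- natural key from the source system
        TaxpayerName  VARCHAR(100) NOT NULL,
        ParishName    VARCHAR(50)  NOT NULL    -- attribute used for summarization
    );

    CREATE TABLE DimTaxType (
        TaxTypeKey    INT IDENTITY(1,1) PRIMARY KEY,
        TaxTypeCode   VARCHAR(10) NOT NULL,
        TaxTypeName   VARCHAR(50) NOT NULL
    );

    CREATE TABLE DimDate (
        DateKey       INT PRIMARY KEY,          -- e.g. 20090131
        CalendarDate  DATETIME NOT NULL,
        CalendarYear  SMALLINT NOT NULL,
        CalendarMonth TINYINT  NOT NULL
    );

    -- Hypothetical fact table: numeric measures plus the dimension keys
    -- that together form its multipart primary key.
    CREATE TABLE FactTaxCollection (
        TaxpayerKey     INT NOT NULL REFERENCES DimTaxpayer (TaxpayerKey),
        TaxTypeKey      INT NOT NULL REFERENCES DimTaxType  (TaxTypeKey),
        DateKey         INT NOT NULL REFERENCES DimDate     (DateKey),
        AmountAssessed  MONEY NOT NULL,
        AmountCollected MONEY NOT NULL,
        CONSTRAINT PK_FactTaxCollection
            PRIMARY KEY (TaxpayerKey, TaxTypeKey, DateKey)
    );

    -- Hypothetical aggregation table: a pre-summarized copy of the fact data
    -- used to speed up common queries.
    CREATE TABLE AggTaxCollectionByMonth (
        TaxTypeKey     INT      NOT NULL,
        CalendarYear   SMALLINT NOT NULL,
        CalendarMonth  TINYINT  NOT NULL,
        TotalCollected MONEY    NOT NULL,
        CONSTRAINT PK_AggTaxCollectionByMonth
            PRIMARY KEY (TaxTypeKey, CalendarYear, CalendarMonth)
    );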
Data Extraction using SQL Server Integration Services

The major functions of data extraction and integration are:
• Getting the data out of the sources in the format that is necessary for data integration.
• Cleansing the data and mapping data from multiple sources into one coherent and meaningful format.
• Providing an ETL package/program/process that is scalable and performs well.

Data from transactional systems, legacy data, and unmanaged data will be transferred into the staging database using SQL Server 2005 Integration Services (SSIS). The solution is flexible enough to accommodate data from future data sources as well, by making appropriate changes to the SSIS packages. SSIS will integrate data from the disparate data sources in a staging database. Another set of SSIS packages will take the data from the staging database, transform it, and load it into the enterprise data warehouse. The connection to a source database can be made using different connection managers, such as OLE DB, Flat File, Excel, or XML connection managers, or a custom adapter for VSAM files. Data reconciliation between the sources and the data warehouse will be done through incremental loads using SSIS; a sketch of this pattern is given below. The data refresh process for loading the data will be initiated manually or automatically.
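The following is a minimal sketch of the incremental-load pattern, assuming the hypothetical tables from the schema sketch above plus a staging table stg_TaxCollection in which the dimension keys have already been resolved. Because SQL Server 2005 has no MERGE statement, existing fact rows are updated and new rows are inserted in two steps; in the SSIS packages the equivalent logic would typically be implemented with Lookup transformations and OLE DB destination components in the data flow:

    -- Illustrative only: stg_TaxCollection and FactTaxCollection are the
    -- hypothetical tables from the schema sketch above.

    -- Step 1: update facts that already exist in the warehouse.
    UPDATE f
    SET    f.AmountAssessed  = s.AmountAssessed,
           f.AmountCollected = s.AmountCollected
    FROM   FactTaxCollection AS f
    JOIN   stg_TaxCollection AS s
           ON  s.TaxpayerKey = f.TaxpayerKey
           AND s.TaxTypeKey  = f.TaxTypeKey
           AND s.DateKey     = f.DateKey;

    -- Step 2: insert rows present in staging but not yet in the warehouse
    -- (the incremental portion of the load).
    INSERT INTO FactTaxCollection
           (TaxpayerKey, TaxTypeKey, DateKey, AmountAssessed, AmountCollected)
    SELECT s.TaxpayerKey, s.TaxTypeKey, s.DateKey, s.AmountAssessed, s.AmountCollected
    FROM   stg_TaxCollection AS s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   FactTaxCollection AS f
                       WHERE  f.TaxpayerKey = s.TaxpayerKey
                       AND    f.TaxTypeKey  = s.TaxTypeKey
                       AND    f.DateKey     = s.DateKey);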
The following figure depicts the technical architecture of the proposed solution:

Security Considerations for SQL Server Integration Services

Security in SQL Server 2005 Integration Services (SSIS) comprises several layers that together provide a rich and flexible security environment. Integration Services security combines the use of package-level properties, SQL Server database roles, operating system permissions, and digital signatures. SSIS implements security on the client and on the server using the following security features:
• Setting the ProtectionLevel property of the package to specify whether sensitive data should be protected by encryption or removed before the package is saved.
• Setting the ProtectionLevel and PackagePassword properties of the package to protect packages by encrypting all or part of a package using passwords or user keys.
• Controlling access to packages by using SQL Server database-level roles. Integration Services includes the three fixed database-level roles db_dtsadmin, db_dtsltduser, and db_dtsoperator for controlling access to packages. A reader and a writer role can be associated with each package. You can also define custom database-level roles to use in Integration Services packages. Roles can be implemented only on packages that are saved to the msdb database in an instance of SQL Server (a sketch of such a role assignment is shown after this list).
• Securing the operational environment by protecting file locations and limiting access to packages in SQL Server Management Studio. Packages can be saved to the file system as XML files, using the .dtsx file name extension, or to the msdb database in an instance of SQL Server 2005. Saving the packages to msdb provides security at the server, database, and table levels.
• Guaranteeing the integrity of packages by signing packages with certificates. A package can be signed with a certificate and can be configured to check the signature when the package is loaded and to issue a warning if the package has been altered.
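As a minimal sketch of how access to packages stored in msdb could be granted with these roles, assuming a hypothetical Windows login LDR\etl_operator (the actual logins and role assignments would be defined during the security design):

    -- Illustrative only: LDR\etl_operator is a hypothetical Windows login.
    USE msdb;
    GO
    -- Create a user in msdb, where Integration Services packages are stored.
    CREATE USER [LDR\etl_operator] FOR LOGIN [LDR\etl_operator];
    GO
    -- Members of db_dtsoperator can view and execute packages but not modify them.
    EXEC sp_addrolemember N'db_dtsoperator', N'LDR\etl_operator';
    GO

Operators who need to create or change packages would instead be added to db_dtsltduser or db_dtsadmin, in line with the access matrix agreed with LDR.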
Reporting

TCS proposes a reporting solution based on Cognos. The solution comprises Cognos Framework Manager, where the model is developed, and Query Studio and/or Report Studio, where reports are developed. The high-level architecture of the LDR reporting process is as follows:
LDR Data Mart/Data Warehouse: Tax-processing-related data from the different LDR tax processing systems will be stored in the proposed data mart, which will be used as a single data store for the needs of all LDR departments (identified during requirements). This data mart will be fed through a scheduled ETL process. In the case of LDR, this will be a SQL Server 2005 database holding a dimensional data mart.

Application Layer

This layer contains the Cognos 8 servers and web servers. A Cognos 8 server runs requests, such as reports, analyses, and queries, that are forwarded by a gateway. A Cognos 8 server also renders the Cognos Connection and other interfaces.
Cognos 8 Framework Manager

Framework Manager is a metadata modeling tool. A model is a business presentation of the information in one or more data sources. Security and multilingual capabilities can be added to the business presentation, so that the model can serve the reporting, ad hoc querying, and analysis needs of many groups of users around the globe.
Cognos 8 Content Manager
Content Manager is the Cognos 8 service that manages the storage of customer application data, including security, configuration data, models, report specifications, and report output. It is a logical representation of the database used for reporting. Content Manager is needed to publish models, retrieve or store report specifications, manage scheduling information, and manage the Cognos namespace. Content Manager stores all of this information in a content store database.

Presentation Layer

This layer will deliver the data required for reports and analysis to the LDR users in the form of different types of reports. Reports can be viewed on demand using a browser-based front end (Microsoft IE 5.5 or later) or with the Cognos reporting services. Alternatively, they can be delivered to users in multiple formats, including HTML, Excel, PDF, and image files, via email or file share, based on scheduled subscriptions. Using Cognos Report Studio / Query Studio, the following functionalities can be provided:
• Ad hoc reporting
• Producing multiple copies of a given report, or a portion of it, in different output formats
• Delivering a single report to multiple destinations simultaneously
• Providing reports with a standard banner page
• Scheduling reports by date/time
• Running a report once and sending the output to different sets of users according to the requirements or security matrix
• Bursting one large report instance into many small ones
• Flexible output formats and destinations
• Drill-down and drill-through options
The Cognos presentation layer will comprise the following components to provide powerful reporting capabilities:
• Cognos Connection
• Report Studio
• Query Studio
Cognos Connection is the Web portal provided with Cognos 8, giving a single access point to the corporate data available through Cognos products. It provides a single point of entry for querying, analyzing, and organizing data, and for creating reports, scorecards, and events. Users can run all their Web-based Cognos 8 applications through Cognos Connection. Other business intelligence applications, and URLs to other applications, can be integrated with Cognos Connection. Like the other Web browser interfaces in Cognos 8, Cognos Connection uses the default configurations of your browser. It does not require the use of Java, ActiveX, or plug-ins, and does not install them.
Query Studio lets users with little or no training quickly design, create and save reports to meet reporting needs not covered by the standard, professional reports created in Report Studio.
Report Studio lets report authors create, edit, and distribute a wide range of professional reports. They can also define corporate-standard report templates for use in Query Studio, and edit and modify reports created in Query Studio or Analysis Studio.