Real-Time Sensor Data Warehouse Architecture Using MySQL Database

Jacob Nikom, MIT Lincoln Laboratory
The MySQL Users Conference 2005, 19 April 2005
MIT Lincoln Laboratory

MySQL Users Conf. 04-19-2005

This work was sponsored by the U.S. Army Space and Missile Defense Command under Air Force Contract# F19628-00-C-0002. Opinions, interpretations, recommendations and conclusions are those of the author and are not necessarily endorsed by the United States Government.

Outline
• Introduction
• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
• Designing ROCC DMA using CIF architecture
• Summary


Outline
• Introduction
  – Reagan Test Site (RTS) and its instrumentation
  – What is RTS Operations Coordination Center (ROCC)?
  – ROCC primary operations
  – ROCC logical component block diagram
  – ROCC modernization
  – New ROCC Data Management Architecture
• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
• Designing ROCC DMA based on CIF architecture
• Summary


Reagan Test Site (RTS) and its Instrumentation
• The Reagan Test Site (RTS) range instrumentation
  – Multiple RF sensors collecting data in several regions of electromagnetic spectrum
  – Multiple optical sensors collecting objects' metrics and spectral characteristics
  – Telemetry systems capable of tracking multiple targets
  – Mobile and fixed ground safety instrumentation

What is RTS Operations Coordination Center (ROCC)?
• RTS instrumentation is controlled by the ROCC
[Diagram: current DMA. Labels: Data Analysis Algorithms, Decision Algorithms, Network, Displays, Flat Files, Sensors.]
• ROCC primary operations
  – Executes the prepared scenario for the acquisition session
  – Manages the data flow from multiple sensors
  – Processes the acquired data
  – Provides operator displays to track and predict the path of space objects
  – Stores the acquired data for later analysis and reporting
  – Facilitates training and simulation of performed activities

What kind of system is ROCC?
[Diagram: feedback control system block diagram. Forward path: reference input r(t) -> comparator (+/-) -> error signal e(t) -> controller -> actuating signal m(t) -> plant -> controlled variable c(t). Feedback path: c(t) -> feedback processor -> feedback signal b(t) -> comparator.]
• Control is the process of making a system variable adhere to a particular value, called the reference value
• A system designed to follow a changing reference is called a tracking control system
• ROCC is a tracking control system following the predefined reference input
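The feedback loop on this slide can be illustrated with a small discrete-time simulation. This is only a sketch under stated assumptions: the proportional controller, the integrator plant, the pass-through feedback processor, and the gain value are all hypothetical choices made for illustration; the slides do not say how the ROCC loop is actually implemented.

```python
def track(reference, gain=0.5, initial=0.0):
    """Simulate a discrete feedback loop that tries to follow `reference`.

    Signal names follow the block diagram: r (reference input),
    b (feedback signal), e (error signal), m (actuating signal),
    c (controlled variable). Controller and plant are assumptions.
    """
    c = initial
    outputs = []
    for r in reference:
        b = c          # feedback processor: assumed to pass c(t) through unchanged
        e = r - b      # comparator: error = reference - feedback
        m = gain * e   # controller: assumed proportional
        c = c + m      # plant: assumed to integrate the actuating signal
        outputs.append(c)
    return outputs


# A changing reference (a unit ramp), as in a tracking control system.
reference = [float(k) for k in range(20)]
outputs = track(reference)

# How far the controlled variable trails the reference at each step.
lag = [r - c for r, c in zip(reference, outputs)]
print(f"final reference={reference[-1]:.3f} output={outputs[-1]:.3f} lag={lag[-1]:.3f}")
```

With these assumed components the output follows the ramp with a bounded lag rather than diverging, which is the behavior the term "tracking" refers to.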

Current ROCC DMA Block Diagram
• ROCC controls the data acquisition, analysis and distribution processes
• Maximizes the quality of delivered data over specified time
[Diagram: tactical decision control loop. Labels: Reference Data, Planning, Data, Plant (Sensors, Simulation), Output Data, Report: Data analysis, Manual Processing & Analysis (Displays, Voice, Operators), Automatic Real-Time Processing & Analysis (Tracking, Fusion, Classification, Identification, Trajectory Estimation).]

ROCC Modernization
• Obsolete system hardware
  – Old central processors and boards are no longer supported
  – Not enough computational power to perform new tasks
  – Old components and interfaces are incompatible with modern technology
• Aging system software
  – Centralized monolithic architecture
  – Flat files for storing data
  – Use of old procedural languages
  – Alphanumeric displays
• Modernized system
  – Industry standard 32/64-bit Xeon or Opteron servers
  – Software vendor independence: Linux and Java
  – Database-based storage
  – Distributed architecture using publish/subscribe paradigm
  – Graphical user interface for visualization tools
  – Targeted dataflow rates: 5 MB/s (sustained), 10 MB/s (peak)
  – Data accumulation rate: 1 TB/year
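The targeted rates can be related to each other with some simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and, purely for illustration, that the whole yearly volume is written at one constant rate; the slides state neither assumption.

```python
MB = 10**6   # assumed decimal megabyte
GB = 10**9
TB = 10**12  # assumed decimal terabyte

sustained_rate = 5 * MB    # bytes per second, from the slide
peak_rate = 10 * MB        # bytes per second, from the slide
yearly_volume = 1 * TB     # bytes per year, from the slide

# Volume produced by one hour of acquisition at the sustained rate.
gb_per_hour_sustained = sustained_rate * 3600 / GB

# Hours of acquisition per year needed to accumulate the yearly volume,
# if every byte arrived at the sustained rate or at the peak rate.
hours_at_sustained = yearly_volume / sustained_rate / 3600
hours_at_peak = yearly_volume / peak_rate / 3600

print(f"{gb_per_hour_sustained:.1f} GB per hour at the sustained rate")
print(f"{hours_at_sustained:.1f} h/year at sustained, {hours_at_peak:.1f} h/year at peak")
```

Under these assumptions the yearly volume corresponds to tens of hours of acquisition, which is consistent with a system that records in sessions rather than continuously.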

New Data Management Architecture
• ROCC data management challenges
  – Support powerful high-precision instrumentation with almost real-time response
  – Support intensive and costly data collection process involving many human operators with high level of reliability
  – Support data analysis leading to changes in data acquisition environment
  – Be adequate for the wide range of transaction types, from simple real-time record reads and inserts to complex multidimensional analytical queries
  – Manage combination of streaming data with traditional structures
  – Provide request management, configuration management and data quality management capabilities
• Search for new data management architecture
  – New system represents conceptual change from the old architecture
  – Instrumentation and Control software traditionally concentrates on algorithm development and lacks good data architecture
  – Need for framework supporting "analysis – decision – execution" paradigm
  – Enterprise software is a leading implementer of distributed architecture and publish/subscribe paradigm

Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture
  – What is Corporate Information Factory (CIF)?
  – CIF data flow diagram
  – CIF data
  – CIF layers
  – CIF logical component block diagram
• Designing ROCC data management architecture using CIF architecture
• Summary

What is Corporate Information Factory (CIF)? *
• Information ecosystem is a model of corporate information processing
  – "CIF is the physical embodiment of the notion of an information ecosystem"
• CIF consists of the following components
  – External world
  – Applications
  – An integration and transformation layer (I & T layer)
  – An operational data store (ODS)
  – A data warehouse (DW) with current and historical detailed data
  – A data mart(s)
  – An internet and intranet
  – A metadata repository
  – An exploration and data mining warehouse
  – Alternative (secondary) storage
  – Decision support system (DSS)
• CIF approach could be used for modeling information processing in any organization ("forest vs. trees" view)
* "Corporate Information Factory", by W.H. Inmon, Claudia Imhoff, Ryan Sousa. Wiley, 2 edition (December 18, 2000)

CIF Data Flow Diagram
[Diagram: CIF data flow. External world and external data -> Application layer (eComm (tx), ERP (tx), CRM (tx), BI (tx); enterprise transactions) -> Integration & Transform layer -> Operational layer (ODS, operational reports) -> Warehouse layer (DW, raw detailed data, alternative storage) -> Report & Analysis layer (data marts: Finance, Sales, Marketing, Accounting; exploration warehouse; data mining warehouse; statistical analysis; DSS applications; eComm (rpt), CRM (rpt), ERP (rpt), BI (rpt)). Supporting functions: data acquisition, primary storage management, data delivery, metadata management, reference data, historical reference data, Internet.]
CRM = Customer Relation Management, BI = Business Intelligence, ERP = Enterprise Resource Planning

CIF Data
• External data
  – Data is defined outside of corporation. Could have erroneous, redundant or unnecessary items
  – Data format is defined outside of corporation. Reformatting could be required
• Reference data
  – Allows standardizing on commonly used names for important and frequently used information
  – Allows consistent interpretation of corporate data across different departments
  – Could be aliases for common and often referred names
• Historical data
  – Volume of data: the longer the history, the more data
  – Usefulness of data: recent data is more useful than older data
  – Granularity of data: older data is likely to be used at summary level
[Diagram: corporate timeline. Ancient history and recent history -> DW; most current activity -> ODS; immediate future -> Applications.]

CIF Layers
• Application layer
  – Interacting directly with end user
  – Gathering detailed transaction data
  – Auditing and adjusting data
  – Editing data
[Diagram labels: eComm (tx), ERP (tx), CRM (tx), BI (tx)]
• Integration and transformation layer
  – Combined non-integrated data from multiple application
  – Transform external data into corporate data
  – Creating appropriate metadata
  – Mathematical transformation
  – Reformatting and resequencing

CIF Layers (Continued)
• Operational layer (ODS)
  – Subject-oriented
  – Integrated
  – Volatile
  – Current-valued
  – Detailed
  – Normalized
• Warehouse layer (Data Warehouse)
  – Subject-oriented
  – Integrated
  – Nonvolatile
  – Time-variant
  – Comprised of both summary and detailed data
  – Summary data optimized for Report & Analyses queries
  – Normalized and de-normalized data
• Report & Analysis layer
  – Statistical analysis
    – Exploration reporting
    – Data mining reporting
  – DSS analysis and reporting
    – Finance
    – Sales
    – Marketing
    – Accounting
[Diagram labels: Statistics, eComm (rpt), CRM (rpt), ERP (rpt), BI (rpt)]

CIF Logical Component Block Diagram
• System controls the corporation resources using real-time and long-term DSS
• Maximizes the expected profit of corporation over specified time
[Diagram: tactical decision control loop (Applications, Real-time DSS, Operational Data Store) nested inside the strategic decision control loop (Long-term DSS, Data Warehouse). Other labels: Reference Data, Corporate Goals, Data, Plant, Output Data, Corporate Report.]

Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture (DMA)
• Designing ROCC DMA using CIF architecture
  – ROCC data flow diagram
  – ROCC data
  – ROCC layers
  – ROCC logical component block diagram
  – Database selection
  – Three dangers of database design
• Summary

ROCC Data Flow Diagram
[Diagram: ROCC data flow. External world -> Integration & Transform layer (RIBs on multicast middleware) -> Operational layer (ODS; DSS applications: Classifier, Best Choice, Smoother, Data Fusion) -> Warehouse layer (DW, secondary storage) -> Report & Analysis layer (data marts, data mining warehouse, bias modeling). Other labels: Data acquisition, Reference data, Operational data, Archived data, Planning, Sensor control data, Short-term reporting & analysis, Long-term reporting & analysis, BET, Post overview, Impact, Space, Quick Look reports.]
RIB = ROCC Interface Box

ROCC Data
• External data
  – Data is defined outside of ROCC. Could have erroneous, redundant, or unnecessary items
  – Data format is defined outside of ROCC. Reformatting or object conversion could be required
• Reference data
  – Comprise geophysics models and constants necessary for external data interpretation
  – Comprise common locations, sensor names, programs
  – Comprise the user names, name of computers, passwords, access rights and privileges
• Historical data
  – Operational data being migrated to the warehouse become historical data
  – Detailed historical data are used to produce summarized historical data
  – Historical data only inserted, never updated
• Planning data
  – Comprise configuration data for the sensors' acquisition procedures
  – Comprise ROCC software components' configuration data (XML format)
  – Comprise data to plan specific activities to acquire space objects' coordinates

ROCC Layers
• External world
  – Simultaneous output from multiple sensors up to 10 MB/s
  – Capable of producing data autonomously
  – Capable of working under the guidance of DSS applications
  – Produces data as streams with considerable output rates
[Diagram: sensors feeding RIBs, with feedback from DSS applications]
• Integration and transformation layer (RIB)
  Plays a vitally important role in reconciling the incoming external data content and format with the internal data requirements
  – Converts incoming data into appropriate Java objects
  – Creates necessary metadata
  – Mathematical transformation
  – Reformatting and resequencing
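The integration and transformation steps listed on this slide can be sketched as follows. The slides state that the RIB produces Java objects; Python is used here only to keep the examples in one language. The record layout, field names, sensor names, and unit choices are hypothetical and serve only to show conversion to common units, metadata creation, and resequencing by time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Internal object in common units (hypothetical layout)."""
    sensor: str      # metadata: which sensor produced the record
    time_s: float    # seconds since session start
    range_m: float   # meters
    az_rad: float    # radians
    el_rad: float    # radians


def transform(raw):
    """Convert one raw record (a dict in a sensor-specific format) to common units."""
    range_m = raw["range"] * (1000.0 if raw["range_unit"] == "km" else 1.0)
    to_rad = math.radians if raw["angle_unit"] == "deg" else (lambda x: x)
    return Measurement(raw["sensor"], raw["time"], range_m,
                       to_rad(raw["az"]), to_rad(raw["el"]))


def integrate(raw_records):
    """Transform every record, then resequence the merged stream by time."""
    return sorted((transform(r) for r in raw_records), key=lambda m: m.time_s)


# Two hypothetical sensors reporting in different units, arriving out of order.
raw_records = [
    {"sensor": "radar-1", "time": 2.0, "range": 1200.0, "range_unit": "m",
     "az": 0.5, "el": 0.25, "angle_unit": "rad"},
    {"sensor": "optical-1", "time": 1.0, "range": 1.5, "range_unit": "km",
     "az": 90.0, "el": 45.0, "angle_unit": "deg"},
]
integrated = integrate(raw_records)
for m in integrated:
    print(m)
```

Keeping the unit conversion in one place means every downstream consumer can assume the same physical units and coordinate conventions.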

ROCC Layers (continued)
• Operational Layer (ODS)
  – Subject-oriented: Focusing on basic transaction processing. Inserts and reads the streams of integrated and transformed sensor data
    • Tracks, Ids, Control blocks, etc.
  – Integrated: Physical unification and cohesiveness
    • Uniform key structures
    • Table naming conventions
    • Common physical units and coordinate systems
    • Data layouts and Metadata
  – Volatile: ODS data could be updated (replaced) as a normal part of processing. After acquisition session is done the data are moved to the DW
  – Current-valued: ODS data values are related to the current event (current acquisition session). For the next mission the ODS will be updated and its content will be moved to the DW (data migration)
  – Detailed: ODS contains inserted values of the published sensor objects and does not expect to have summary data
  – Normalized: ODS contains normalized data
• Decision Support System Applications
  – Makes real-time operational decisions like ID assignment, sensor allocation, etc.
[Diagram labels: DSS applications: Classifier, Best Choice, Smoother, Data Fusion]

ROCC ODS Specifics
• Data streams of objects
  – Streams of measurements usually don't have very complex structures
  – Object-relational mapping is straightforward and not computationally intensive
• Indices
  – High-speed insertion does not allow the use of indices
  – Relatively small size of the ODS allows working without indices
  – Indices do exist in the DW
• Real-time DSS feedback
  – Could control the sensors, which in turn influences the input data
  – Typical analytical applications assume that the data producer is not changed during the query
• Fault-tolerance (primary and secondary ODS)
[Diagram: ODS Primary System, ODS Secondary System and DW Archive System connected over the network]
• Additional benefits
  – Necessary operations could be performed during the copying
  – Two operational databases could be used in parallel right after the acquisition
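The index policy and the post-session data migration described above can be sketched with two tables. This is an illustration under stated assumptions: an in-memory SQLite database from the Python standard library stands in for the MySQL servers, both tables live in one connection for brevity, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real ODS and DW servers

# ODS table: no indices, so that high-rate inserts stay cheap.
conn.execute(
    "CREATE TABLE ods_track (session_id INTEGER, sensor TEXT, time_s REAL, range_m REAL)")

# DW table: same layout, but indexed for analytical queries.
conn.execute(
    "CREATE TABLE dw_track (session_id INTEGER, sensor TEXT, time_s REAL, range_m REAL)")
conn.execute("CREATE INDEX dw_track_by_sensor_time ON dw_track (sensor, time_s)")

# During the acquisition session, measurements are only inserted into the ODS.
rows = [
    (1, "radar-1", 0.0, 1000.0),
    (1, "radar-1", 1.0, 1010.0),
    (1, "optical-1", 0.5, 1005.0),
]
conn.executemany("INSERT INTO ods_track VALUES (?, ?, ?, ?)", rows)


def migrate(conn, session_id):
    """Move one finished session from the ODS to the DW in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO dw_track SELECT * FROM ods_track WHERE session_id = ?",
            (session_id,))
        conn.execute("DELETE FROM ods_track WHERE session_id = ?", (session_id,))


migrate(conn, 1)
ods_count = conn.execute("SELECT COUNT(*) FROM ods_track").fetchone()[0]
dw_count = conn.execute("SELECT COUNT(*) FROM dw_track").fetchone()[0]
print(f"ODS rows: {ods_count}, DW rows: {dw_count}")
```

The cost of maintaining the index is paid once, during migration, instead of on every real-time insert.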

ROCC Layers (continued)
• Historical (data warehouse) layer
  – Subject-oriented: Organized like ODS around major ROCC entities, but focused on the modeling and analysis of data
  – Integrated: Data migrated into DW from ODS are integrated with the rest of DW data
  – Time-variant: Every datum in the data warehouse is identified with a particular time period. All summarized data are correct only for the particular period with which the corresponding detailed data are identified
  – Non-volatile: There are no updates in the warehouse, only inserts. The past cannot be changed, only expanded
  – Comprised of both summary and detailed data: Once detailed data from ODS are migrated into DW, they become a part of history. In addition to the detailed historical data DW contains summary data. They are pre-calculated to reduce analytical query times
  – ROCC DW specifics: ROCC DW does not use multidimensional data model yet, only summarized tables
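Pre-calculated summary tables of the kind described above can be sketched as an insert-only aggregation over the detailed data. As before, SQLite from the Python standard library stands in for MySQL, and the table names, column names, and the choice of summary statistics are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the DW server
conn.execute(
    "CREATE TABLE dw_track_detail (session_id INTEGER, sensor TEXT, time_s REAL, snr_db REAL)")
conn.execute(
    "CREATE TABLE dw_track_summary "
    "(session_id INTEGER, sensor TEXT, n_points INTEGER, mean_snr_db REAL)")

# Detailed historical data for one finished session.
detail = [
    (1, "radar-1", 0.0, 10.0),
    (1, "radar-1", 1.0, 20.0),
    (1, "optical-1", 0.5, 30.0),
]
conn.executemany("INSERT INTO dw_track_detail VALUES (?, ?, ?, ?)", detail)


def summarize_session(conn, session_id):
    """Pre-calculate one summary row per sensor; existing rows are never updated."""
    with conn:
        conn.execute(
            "INSERT INTO dw_track_summary "
            "SELECT session_id, sensor, COUNT(*), AVG(snr_db) "
            "FROM dw_track_detail WHERE session_id = ? "
            "GROUP BY session_id, sensor",
            (session_id,))


summarize_session(conn, 1)
summary = conn.execute(
    "SELECT sensor, n_points, mean_snr_db FROM dw_track_summary ORDER BY sensor").fetchall()
print(summary)
```

Each summary row is tied to the session whose detailed data produced it, so the summary stays valid for that period as new sessions are added.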

ROCC Layers (continued)
• Analysis and Reporting layer
  Continuous automatic monitoring of sensor metric performance
Example: Angle Bias Modeling using ROCC Data Warehouse
What is Angle Bias Modeling?
  Creation of a mathematical model to describe differences between reported and actual antenna pointing positions
[Diagram: sensor data collection -> RIB -> ODS (storing sensor data streams, real-time queries) -> data migration -> Data Warehouse -> analytical queries -> Bias Modeling Application -> bias model coefficients -> Sensor Control System -> corrected pointing information]

Angle Bias Modeling using ROCC Data Warehouse
Organization of Sensor-Specific Summary Track Data in the Warehouse
  Observed Data: Source, Time, Range, Az, El, Iono Corr, Tropo Corr, SNR
  Truth Data (time-aligned and in sensor coordinates): Range, Az, El
  Residual Data: Delta Rng, Delta Az, SNR
Bias Modeling Application Data Flow
[Diagram: strategic decision control loop. Data Warehouse (Truth Data, Observed Data) -> Generate Residuals -> Residual Data -> Multivariate Regression (with Atmospheric Data) -> Bias Model Analytic Equation, Bias Model Coefficients -> Report, Data Warehouse, Sensor Control System]
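The residual-generation and regression steps in this data flow can be sketched as follows. The slides do not give the bias model equation, so the model used here (a constant term plus sine and cosine terms in azimuth) is a hypothetical stand-in, the data are synthetic, and the small least-squares solver is written out only to keep the example free of external libraries.

```python
import math


def residuals(observed, truth):
    """Residual = observed value minus time-aligned truth value."""
    return [o - t for o, t in zip(observed, truth)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_bias_model(az_truth, delta_az):
    """Least-squares fit of the assumed model delta_az = c0 + c1*sin(az) + c2*cos(az)."""
    rows = [[1.0, math.sin(a), math.cos(a)] for a in az_truth]
    k = 3
    # Normal equations: (A^T A) c = A^T d
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atd = [sum(r[i] * d for r, d in zip(rows, delta_az)) for i in range(k)]
    return solve_linear(ata, atd)


# Synthetic truth and observed azimuths (radians) with a known, made-up bias.
true_coeffs = (0.002, 0.0005, -0.0003)
az_truth = [math.radians(d) for d in range(0, 360, 30)]
az_observed = [a + true_coeffs[0] + true_coeffs[1] * math.sin(a)
               + true_coeffs[2] * math.cos(a) for a in az_truth]

delta_az = residuals(az_observed, az_truth)
coeffs = fit_bias_model(az_truth, delta_az)
print([round(c, 6) for c in coeffs])
```

In the architecture on the slide, the fitted coefficients would be reported and fed back to the sensor control system; here they are simply compared with the synthetic values used to generate the data.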

ROCC Logical Component Block Diagram
• ROCC controls the RTS resources using tactical and strategic DSS
• Maximizes the quality of collected data over specified time
[Diagram: tactical decision control loop (Tactical real-time DSS: Tracking, Fusion, Classification, Identification, Trajectory Estimation; Displays, Voice, Operators; Operational Data Store) nested inside the strategic decision control loop (Strategic long-term DSS: Bias Modeling, Sensor Comparison; Operators; Data Warehouse). Other labels: Reference Data, Planning, Data, Plant (Sensors, Simulation), Output Data, Report, Data Analysis.]

Database Selection
• The same server should work adequately for both ODS and DW
• Deficiency in sophistication could be mitigated by custom programming

Comparison criteria (qualitative values)

Criterion                  MySQL     Oracle    DB2 (IBM)  SQL Server (Microsoft)  PostgreSQL
Speed                      High      High      High       High                    Low
Sophistication             Moderate  High      High       High                    High
Reliability                High      High      High       Moderate                Low
Administration simplicity  High      Low       Low        Moderate                High
Standardization            High      Moderate  Moderate   Moderate                Moderate
Savings                    High      Low       Low        Low                     High

Three dangers of ROCC DMA design
• "Balkanization" of data
  – Different groups of data have different design
  – Attempt to fit data definitions into requirements of the existing tool
  – In the long run increases the maintenance cost
• Dialectism
  – Usage of specific database dialects
  – Deviation from existing SQL standards
  – Locks the user in with a specific vendor
• "Dirty" repository design
  – Part of the data stored in the database, another (closely related one) stored in the file system
  – Duplication of data between database and file system
  – Increases the maintenance cost

Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture
• Designing ROCC data management architecture using CIF Architecture
• Summary

Summary
• Modernization of the ROCC calls for a new type of data management architecture
  – New high-performance hardware
  – Significant increase of generated and managed volumes of data
  – Introduction of new services
• CIF satisfies the requirements
  – Designed to support large scale information system
  – Effectively manages different types of information queries
  – Provides flexibility in distributing data between multiple producers and consumers
• ODS and DW represent two types of repositories for information request
  – ODS supports near real-time storage requirements and targeted, low granular queries
  – DW is used for complex queries against summary-level data
• ODS and DW are parts of different control loops
  – ODS provides information for tactical decisions about near real-time data acquisition
  – DW delivers feedback for strategic decisions leading to system improvements
• MySQL is a good fit for ODS and DW databases
  – Good performance for fast queries in ODS
  – Capable of storing large amount of data in DW
  – Simple installation and licensing allow many independent servers to run inside one system being used as ODS, DW, data marts, etc.
  – Excellent Java support allows seamless integration with the rest of the software