Fuel is one of the largest cost components for a shipping company, and finding the optimum fuel consumption for a given vessel speed is a tough balancing act for most operators. The data collected daily by the fleet is essential for deriving a best-fit speed-consumption curve. Figure 1 shows an example: an exponential curve of fuel consumption versus speed, plotted to determine the optimum speed range at which the ships should operate. Yet even a few data-entry errors by the crew (such as a misplaced decimal point) render the analysis unusable for decision making. As Figure 1 shows, the poor quality of the data makes it impossible to determine the relationship between a change in speed and the corresponding change in fuel consumption.
Figure 1: Speed – Fuel consumption curve (raw data, outliers included).
If the outliers are removed, the analysis shown in Figure 2 provides a clear correlation between the speed of the vessel and its fuel consumption.
[Chart: fuel consumption (y-axis) plotted against speed in knots (x-axis) for Vessels A, C and F, each in ballast and laden condition.]
Figure 2: Speed – Fuel consumption curves (cleaned data by removing outliers).
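The cleaning step behind Figure 2 can be sketched in a few lines. The snippet below is a minimal illustration rather than the authors' actual procedure: the vessel readings are hypothetical stand-ins for daily reports, and a median-based residual cut is just one common way to flag outliers such as misplaced decimal points.

```python
# Minimal sketch of cleaning a speed-consumption data set, assuming only
# numpy. The readings are hypothetical; the last two mimic misplaced
# decimal points (43.0 keyed as 430.0, and 33.0 as 3.3).
import numpy as np

speed = np.array([10.2, 11.0, 11.8, 12.5, 13.1, 13.9, 14.6, 12.7, 11.4])
fuel = np.array([28.0, 31.5, 36.0, 41.0, 46.5, 53.0, 60.5, 430.0, 3.3])

def fit_exponential(v, c):
    """Fit c = exp(a + b*v) by regressing log(c) on v."""
    b, a = np.polyfit(v, np.log(c), 1)
    return a, b

# First pass: fit on everything, then drop points whose log-residual sits
# far from the median (a robust cut that misplaced decimals fail easily).
a, b = fit_exponential(speed, fuel)
resid = np.log(fuel) - (a + b * speed)
mad = np.median(np.abs(resid - np.median(resid)))
keep = np.abs(resid - np.median(resid)) < 3 * 1.4826 * mad  # ~3 sigma

a, b = fit_exponential(speed[keep], fuel[keep])  # refit on the clean subset
print(f"kept {keep.sum()} of {keep.size} points; c(v) = exp({a:.2f} + {b:.2f}*v)")
```

Fitting in log space turns the exponential into a straight line, so a plain least-squares regression is enough to recover the curve once the bad points are gone.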
In some cases, removing outliers deletes a significant number of data points from the analysis. But can users get the answer they are looking for while ignoring 40 percent of the data set? Companies need to determine the speed at which their vessels are most efficient with far more certainty, and data quality issues only reduce confidence in the analysis. In the shipping example, a difference in speed of 1 to 2 knots can translate into a difference of $500,000 to $700,000 in fuel cost for a round-trip US West Coast to Arabian Gulf voyage at current bunker prices; a rough calculation below shows how figures of this size arise.
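Every input in the back-of-envelope sketch below is an assumption made for illustration (a cubic speed-consumption law, the round-trip distance, the bunker price); none of these numbers come from the article's data.

```python
# Back-of-envelope estimate of the fuel-cost impact of speed. Assumed
# figures: a cubic consumption law calibrated to 35 t/day at 12 knots,
# an 18,000 nm round trip and a $650/tonne bunker price.
DISTANCE_NM = 18_000   # assumed round-trip distance
USD_PER_TONNE = 650    # assumed bunker price

def daily_consumption(speed_knots, base=35.0, base_speed=12.0):
    """Tonnes per day under an assumed cubic speed-consumption law."""
    return base * (speed_knots / base_speed) ** 3

def voyage_fuel_cost(speed_knots):
    days_at_sea = DISTANCE_NM / (speed_knots * 24)
    tonnes_burned = daily_consumption(speed_knots) * days_at_sea
    return tonnes_burned * USD_PER_TONNE

extra = voyage_fuel_cost(14.0) - voyage_fuel_cost(12.0)
print(f"steaming at 14 kn instead of 12 kn adds about ${extra:,.0f} in fuel")
```

Under these assumptions the 2-knot increase adds roughly half a million dollars for the round trip, which is why a consumption curve distorted by bad data is so costly.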
Does this mean that data needs to be 100 percent validated before it can be used for analytics? Does the entire universe of data need to be clean before it is useful? Absolutely not. In fact, companies should only clean the data they intend to use; the right approach is to determine which quality issues are actually worth addressing.
For instance, in the shipping example above, it may matter more that the data used for analysis is accurate than that all of the data is available. In other words, using 80 percent of the data at 100 percent accuracy to generate the trend is better than using 100 percent of the data at only 80 percent accuracy. An organization should focus most of its energy on the data used by its high-impact business processes.
To manage the quality of data, organizations need a robust data quality management framework. This will enable them
to control, monitor and improve data as it relates to various analytics use cases.
Although each type of data needs a distinct plan and approach for management, there is a generic framework that can
be leveraged to effectively manage all types of data. As shown in Figure 3, the data quality management framework
consists of three components: control, monitor and improve.
[Figure 3: Data quality management framework: Control (validate before loading), Monitor (assess periodically), Improve (fix when data quality drops).]
Control
The best way to manage the quality of data in an information system is to ensure that only data meeting the desired standards is allowed to enter the system. This can be achieved by putting strong controls in place at the front end of each data entry system, or by putting validation rules in the integration layer responsible for moving data from one system to another. Unfortunately, this is not always feasible or economically viable, for example when data is captured manually and only later entered into a system, or when modifications to applications are too expensive, particularly with commercial off-the-shelf (COTS) software.

In one case, a company decided against implementing changes to one of its main COTS data capture applications that would have enforced stricter data controls. It relied instead on training, monitoring and reporting on the use of the system to improve its business process, and as a result experienced improved data quality. That said, companies that have implemented strong quality controls at the entry gates of every system have achieved very effective data quality management.

Monitor
It is natural to think that if a company has strong controls at each system's entry gate, then the data managed within its systems will always be of high quality. In reality, as processes mature, the people responsible for managing the data change, systems grow old and quality controls are not always maintained at the desired levels. This creates the need for periodic data quality monitoring: running validation rules against stored data to verify that quality still meets the desired standards.

In addition, as information is copied from one system to another, the company needs to monitor the data to ensure it is consistent across systems or against a "system of record." Data quality monitors enable organizations to proactively uncover issues before they impact the business decision-making process. As shown in Figure 4, an industry-standard five-dimension model can be leveraged to set up effective data quality monitors.
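As a concrete illustration of the control component, the sketch below validates records at the entry gate before loading them. The field names, rule thresholds and sample records are invented for the example; they are not drawn from any system described above.

```python
# Minimal sketch of entry-gate controls: validate each inbound record
# against simple rules before it may enter the store. Field names,
# thresholds and the sample records are all hypothetical.
RULES = {
    "vessel_id": lambda v: isinstance(v, str) and v != "",
    "speed_knots": lambda v: isinstance(v, (int, float)) and 0 < v < 30,
    "fuel_tonnes_per_day": lambda v: isinstance(v, (int, float)) and 0 < v < 200,
}

def violations(record):
    """Return the fields of one record that fail a rule."""
    return [field for field, rule in RULES.items() if not rule(record.get(field))]

def load(records, store):
    """Append only records that pass every rule; quarantine the rest."""
    quarantined = []
    for rec in records:
        (quarantined if violations(rec) else store).append(rec)
    return quarantined

store = []
rejected = load(
    [{"vessel_id": "A", "speed_knots": 12.5, "fuel_tonnes_per_day": 41.0},
     {"vessel_id": "A", "speed_knots": 12.7, "fuel_tonnes_per_day": 430.0}],
    store,
)
print(f"loaded {len(store)}, quarantined {len(rejected)}")
```

Run at load time, these rules act as the control; run periodically over data already in the store, or compared against a system of record, the same violations() routine becomes a monitor.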
An example of a monitoring dashboard is shown in Figure 5. It is built to provide early detection of data quality issues.
This enables organizations to perform root-cause analysis and to prioritize their investments in training, business
process alignment or redesign.
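Figure 4's five-dimension model is not reproduced here, so the sketch below scores a table of records along a few commonly cited dimensions (completeness, validity, timeliness, uniqueness) as assumed stand-ins. Scores like these, tracked over time, are the kind of input a dashboard such as the one in Figure 5 could surface.

```python
# Hypothetical sketch of feeding a monitoring dashboard: score stored
# records along assumed quality dimensions. The sample rows are invented;
# each score is the fraction of records passing that dimension's check.
from datetime import date, timedelta

records = [
    {"id": 1, "speed": 12.5, "fuel": 41.0, "reported": date(2015, 6, 1)},
    {"id": 2, "speed": 12.7, "fuel": 430.0, "reported": date(2015, 6, 2)},
    {"id": 2, "speed": None, "fuel": 33.0, "reported": date(2015, 5, 1)},
]
today = date(2015, 6, 3)

def score(check):
    """Fraction of records passing one dimension's check."""
    return sum(1 for r in records if check(r)) / len(records)

dashboard = {
    "completeness": score(lambda r: all(v is not None for v in r.values())),
    "validity": score(lambda r: r["fuel"] is not None and 0 < r["fuel"] < 200),
    "timeliness": score(lambda r: today - r["reported"] <= timedelta(days=7)),
    "uniqueness": len({r["id"] for r in records}) / len(records),
}
for dimension, value in dashboard.items():
    print(f"{dimension:>13}: {value:.0%}")  # a drop on any line flags an issue
```

Trending these scores over time is what turns a one-off data profile into the early-warning dashboard the text describes.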
5. Communication—Any data quality initiative is likely to meet resistance from some groups of stakeholders and poor
communication can make matters worse. Therefore, a well-thought-out communication plan must be put in place
to inform and educate people about the initiative and quantify how it may impact them. Also, it is important to clarify
that the objective is not just to fix the existing bad data, but to also put tools and processes in place to improve and
maintain the quality at the source itself. This communication can be in the form of mailers, roadshows or lunch-
and-learn sessions. Further, the sponsors and stakeholders must be kept engaged throughout the lifecycle of the
program to maintain their support.
6. Remediation—Every attempt should be made to make the lives of data stewards easier. They should not view data
quality monitoring and remediation routines as excessive or a hindrance to their day-to-day job. If data collection can
be integrated and the concept of a single version of truth replicated across the value chain, it will ultimately improve
the quality of data. For example, if the operational data captured by a trading organization (such as cargo type,
shipment size or counterparty information) is integrated with pipeline or marine systems, it will ultimately enable
pipeline and shipping companies to focus on collecting and maintaining data that is intrinsic to their operation.
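To make the single-version-of-truth idea concrete, here is a hypothetical sketch in which a shipping system inherits shared commercial fields from the trading system's deal record instead of re-keying them. All system names, fields and identifiers are invented.

```python
# Hypothetical sketch of remediation through integration: the shipping
# system pulls shared commercial fields from the trading system's deal
# record (the single version of truth) and captures only data intrinsic
# to its own operation.
trading_deals = {  # owned and maintained by the trading organization
    "DEAL-1001": {"cargo_type": "crude", "shipment_size_bbl": 500_000,
                  "counterparty": "ACME Refining"},
}

def book_voyage(deal_id, vessel_id, laycan_start):
    """Create a voyage record, pulling commercial fields rather than re-keying."""
    deal = trading_deals[deal_id]
    return {
        "deal_id": deal_id,                            # link back to the source
        "cargo_type": deal["cargo_type"],              # inherited from trading
        "shipment_size_bbl": deal["shipment_size_bbl"],
        "counterparty": deal["counterparty"],
        "vessel_id": vessel_id,                        # intrinsic to shipping
        "laycan_start": laycan_start,                  # intrinsic to shipping
    }

voyage = book_voyage("DEAL-1001", "Vessel A", "2015-07-01")
print(voyage["counterparty"])  # consistent with trading by construction
```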
CONCLUSION
As organizations increasingly rely on their vast collections of data for analytics in search of a competitive advantage, they need to take a practical and fit-for-purpose approach to data quality management. This critical dependency for analytics is attainable by following these principles:
› Tackle analytics with an eye on data quality
› Use analytics use cases to prioritize data quality hot spots

THE AUTHORS
Niko Papadakos is a Director at Sapient Global Markets in Houston, focusing on data. He has more than 20 years of experience across financial services, energy and transportation. Niko joined Sapient Global Markets in 2004 and has led project engagements in key accounts involving data modeling, reference and market data strategy and implementation, information architecture, data governance and data quality.
npapadakos@sapient.com
Mohit Arora is a Senior Manager at Sapient Global Markets and is based in Houston. He has over 11 years of experience leading large data management programs for energy trading and risk management clients as well as for major investment banks and asset management firms. Mohit is an expert in data management and has a strong track record of delivering data programs that include reference data management, trade data centralization, data migration, analytics, data quality and data governance.
marora@sapient.com
Kunal Bahl is a Senior Manager in Sapient Global Markets' Midstream Practice based in San Francisco. He is focused on Marine Transportation, and his recent assignments include leading a data integration and analytics program for an integrated oil company, process automation for another integrated oil company and power trading system integration for a regional transmission authority.
kbahl@sapient.com