• Balancing your approach to Big Data • Criteria for evaluating your enterprise approach • Tips for getting started
Years of Research Into Big Data for Analysts
For the last four years, the research team at CTOlabs.com has been contributing to studies, analysis and community events on the topic of Big Data. Through our leadership of venues like the yearly Government Big Data Forum and our continuous dialog with thought leaders via our Weekly Government Big Data Newsletter, we have sought to highlight lessons learned, share best practices, and foster a greater dialog among practitioners fielding real-world solutions in the Big Data space. Our research team, led by former CTO of the Defense Intelligence Agency Bob Gourley, has been collecting community advice and success tips on the implementation of Big Data projects, with the goal of continually feeding those back to the community to enhance as many efforts as possible. This paper presents design criteria, best practices and lessons learned in a way you can use to enhance your organization’s approach to Big Data. We constructed this piece in a way we hope you will find logical and compelling, but we are ready at any time to provide more background, insights, introductions to thought leaders or other additional information. Contact us at CTOlabs.com at any time to weigh in with your thoughts.
Empowering Analysts With Big Data
Enterprises are awash in more data than they can make sense of, and it is only getting worse. Every agency is realizing that if it does not act now to think through this challenge, it will be far harder to address in the future. These challenges are highlighted in the graph to the left. The ability of humans to analyze data, represented by the red arrow, is growing only slightly, whereas the amount of data available to support national security missions, represented by the blue arrow, has grown far beyond the ability of analysts to make sense of it. For militaries and intelligence organizations, data has been growing due to the proliferation of collection systems. But new open source information, including social media feeds, is also having a dramatic impact on data growth. This curve is relevant to enterprises everywhere, but lessons from national security community successes may be most relevant due to the scale of data that community has been working with.
Lessons learned from the national security community give us a framework to solve Big Data challenges
National security enterprises, including military and intelligence organizations and the commanders that depend on them, were among the first to face today’s big data challenges. National security missions have long required a rigorous analytical tradecraft, often called sensemaking to emphasize the action orientation of operational analysis. Sensemaking is the creation of knowledge and the optimization of decisions from data. Sensemaking enables organizations to develop situational awareness and make maximum use of their data holdings. In the national security space, the most promising big data solutions are those that enable sensemaking in a balanced way, where analysts are empowered to do what they do best but are supported by technologies that do what humans cannot, and where both are governed by policies forged from experience. We will leverage this conceptual framework of people, technology and policy in a more prescriptive way below.
What do humans do best, and what do computers do best?
Our years of operational experience in enterprise IT and continuous interaction with the emerging Big Data community have made something incredibly clear: Organizations are optimized for analysis when they design systems that empower their analysts to do what they do best and leverage IT to do what it does best. Here is the logic behind this observation:
• Analysts leverage the greatest processor on earth, their brains. They generate knowledge that supports their organization’s mission. Humans develop insights and inferences and produce actionable intelligence for decision makers to act upon. Analysts can be great at utilizing pattern recognition and sensemaking skills, up to a point. Even the most trained analyst can only process a fixed number of objects at any one time. Once analysts pass that threshold, human processing power degrades rapidly.
• No human can handle the multi-dimensional correlations of factors present in large data challenges. The need to comprehend the large arrays of data in modern enterprises and the need to assess how data interrelates is beyond human capabilities. And although the trained analyst takes steps to avoid bias, enterprises should consider leveraging automation in ways that mitigate human bias.
• Computers exist to compute. They can conduct repetitive tasks at scale and can also apply logical reasoning over large and intricate data sources. When the right architecture is in place, computers can operate over vast quantities of data of all formats, at speeds that no person or team could ever hope to match. Computers can deliver to humans new inferences based on complex evidentiary discovery and insight. A well-functioning enterprise architecture can enable computers to apply analytics holistically over data holdings, comparing many millions of relationships and correlations to each other, leading to enhanced discovery and knowledge creation for presentation to analysts.
The difference between human and machine computational reasoning is significant. Humans are incredibly flexible, adaptive, and broad in their reasoning constructs but are not ‘deep’, and are therefore not able to handle large amounts of information or reasoning tasks. Computers, on the other hand, are incredibly efficient in handling large amounts of information or reasoning tasks, but are not flexible, adaptive or broad in their reasoning constructs. As a side effect, the vast majority of human reasoning happens at the subconscious level, meaning it is difficult or nearly impossible to audit the logic trail. Computer reasoning is exposed and therefore auditable. Because auditability is difficult with human reasoning, confidence assessments must be based on past performance rather than on a logic assessment of the analysis in question.
The key takeaway for heightened analysis efficiency is to balance the human and the computer in an analytic functional pairing. This takes advantage of the best of both worlds. This pairing is not limited to human sensory assist functions such as data visualization; it also extends to compare and contrast operations (pairwise analysis) that effectively discover more from difficult evidence in a Big Data environment. Without this pairing, the ability to exploit non-obvious relationships (NOR) is limited and the results are sub-par.
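As a rough illustration of the compare and contrast (pairwise) operations mentioned above, the following Python sketch compares every pair of entities and surfaces pairs that share attributes for an analyst to review. It is a minimal sketch under stated assumptions, not a description of any particular product: the entity records, the attribute strings, the Jaccard overlap score and the 0.3 threshold are all hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of attributes two entities have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_links(entities: dict, threshold: float = 0.3) -> list:
    """Compare every pair of entities; keep pairs whose overlap meets the threshold."""
    links = []
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(sorted(entities.items()), 2):
        score = jaccard(attrs_a, attrs_b)
        if score >= threshold:
            shared = sorted(attrs_a & attrs_b)
            links.append((name_a, name_b, round(score, 2), shared))
    # Strongest candidate relationships first, for analyst review.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical records: each entity maps to attributes drawn from different sources.
entities = {
    "person_a": {"phone:555-0101", "addr:12 Elm St", "vehicle:ABC123"},
    "person_b": {"phone:555-0101", "addr:9 Oak Ave", "vehicle:ABC123"},
    "person_c": {"phone:555-0199", "addr:4 Pine Rd"},
}

for link in pairwise_links(entities):
    print(link)
```

The number of pairs grows roughly with the square of the number of entities, which is one reason this kind of comparison suits machines rather than people; the human contribution is judging whether a surfaced link actually matters.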
The problem of Overwhelming Data
Gartner analyst Doug Laney created a widely used construct for understanding the enterprise data landscape, using Volume, Velocity and Variety as dimensions. Many government and military organizations add fourth and fifth dimensions: Veracity and Volatility. These dimensions are important constructs for considering the contributions of big data technologies to the modern enterprise:
• Volume: Enterprise data holdings have all grown exponentially. Making use of data at rest requires computer-based automation that can index, search, correlate and discover connections. This must be done in ways that bring new insights to analysts.
• Velocity: Data streams into new organizations and in operational organizations must be quickly understood. The velocity of data poses many architectural challenges (how fast can it be stored?) but introduces the biggest issues around quickly determining the relevance of new data in the context of existing knowledge.
• Variety: The widely varying formats in data include both structured data that comes in fields and unstructured data that must have structure divined. Technologies that work over all types of data help balanced organizations leverage all their data holdings.
• Veracity: What is the true meaning of the data? Pre-processing of data ensures the system knows data provenance and can assess validity, at speed. This is important with all data sources but is especially relevant in data created or touched by humans, such as social media. Technological contributions to veracity should also include advanced identity assessment/entity extraction and relationship building.
• Volatility: When is data valuable, or when is the data most valuable? Data often has a ‘half-life’ of value, meaning it is valuable for a certain period of time. What makes this even more complex is that this volatility is often related to the availability and detection of other, similarly volatile data. It’s like putting a puzzle together on a moving board. Responsive technologies like analytic visualization combined with automated compare/contrast (pairwise) operations assist here. A balanced approach is needed; one or the other alone may be incomplete.
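The ‘half-life’ idea in the Volatility dimension can be made concrete with a small sketch. The Python below assumes, purely for illustration, that the value of a data item decays exponentially and halves every half-life period; the text does not specify a decay model, and the item names, initial values and half-lives are hypothetical.

```python
def current_value(initial_value: float, age_hours: float, half_life_hours: float) -> float:
    """Assumed exponential decay: the value halves every half_life_hours."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    return initial_value * 0.5 ** (age_hours / half_life_hours)


# Hypothetical items: (name, initial value, age in hours, half-life in hours).
items = [
    ("social media post", 1.0, 6.0, 3.0),
    ("field report", 1.0, 6.0, 24.0),
]

# Rank items by the value they still hold, most valuable first.
ranked = sorted(items, key=lambda item: current_value(*item[1:]), reverse=True)

for name, initial, age, half_life in ranked:
    print(f"{name}: {current_value(initial, age, half_life):.2f}")
```

Under these assumed numbers, two items of the same age end up with very different remaining value, which is why a system that prioritizes data for analysts might track a per-source half-life rather than age alone.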
Big Data alone, however, is not enough to fully grasp the significance of the intelligence and/or investigatory analysis problem. Doug Laney’s highly applicable Big Data characterization model is focused on data. Within that data we need to further explore the impact of complex evidence that is difficult to isolate, identify, and comprehend. This is the additional concept of Difficult Evidence. Like Big Data, Difficult Evidence has several dimensions:
• Sparse: This is the ratio of the evidence that matters (evidence that relates to and has analytic impact on the question or information goal) to the information at hand. These are the proverbial “needles in the haystack” or, often, “needles in the needle stack”. Technology assists with filtering out the overall body of non-applicable information, but elevating the essential elements with analytics is essential to finding the key evidence.
• Obscure: Key evidence is rarely obvious. It can be incomplete, inaccurate, vague, or intermixed with non-applicable information in many cases. These obscuration factors make ‘pulling’ the essential evidentiary patterns extremely challenging. Obscuration cloaks the essential ‘meaning’ of the evidence.
• Ambiguous: Evidence can mean different things to different analysts. Like obscuration, ambiguity cloaks the ‘meaning’ of the evidence in relation to the other evidence. This is a contextual obscuration. Disambiguation is best accomplished by relating other evidence to the ambiguous information, thus enhancing the context and eliminating multiple ‘meanings’.
• Fragmented: Evidence is often not complete. Fragmentation occurs due to the nature of the information or as an artifact of its gathering and storage. Whether the ‘silo-ing’ of the information is internal or external, the result is the same: evidence must often be identified, partially understood, holistically recognized, and associated with what’s missing before context is achieved.
What does an Analyst-Centric Big Data framework look like?
How will you know if you are building towards a balanced, analyst-centric big data framework? We offer insights below based on our interactions with experienced technologists through the Government Big Data Forum and through direct consultations with leaders across government. We review key capabilities you should consider for your enterprise in three broad categories:
• Analyst-facing capabilities
• Enterprise IT capabilities
• Enterprise policy considerations
Evaluation criteria in each of these areas are presented below:
Evaluating Analyst Facing Capabilities
Capability: Operability
Evaluation factors:
How well can the analysts in your organization operate their tools? To what degree do they require assistance from the IT department or from specialized outside contractors?
Are technologies in place that funnel the right data and assessments to the right person? Do technologies support analysts’ needs for social network analysis (SNA)? To what degree does the functionality of tools provided to your analysts support the full spectrum of functions (find who, what, where, when, connections and concepts and changes to all the above)? Do the capabilities your analysts use help them discover connections and concepts over large/diverse data sets? Can analysts evaluate data veracity? Can analysts evaluate data relevance?
When analysts need to tailor their capabilities for new data sources, can they import them themselves? Or do they require assistance from outsiders? Are there flexible import specifications or are rigid schemas used? Flexible schemas allow analysts the opportunity to get data into their analysis tools and do analysis.
Can analysts access enterprise capabilities where the mission requires it? Are there thin client and mobility options? Are there stand-alone options that can synchronize with the enterprise when reconnected?
Can analysts work with other analysts both inside and outside their organization? Solutions should be integrated and configurable across domains and work across internal boundaries and with partners. Interoperability should include an ability to work with all standard GIS solutions.
Can analysts work across the collaborative spectrum, from one independent analyst to an entire enterprise collaborating together? Can analysts move their conclusions quickly to others on the team and to decision-makers? Have coalition sharing capabilities been engineered that enable sanitization while protecting the essence of the information?
As new conclusions and insights are developed, they need to be smartly captured to build upon and for continued fusion and analysis. This can include knowledge from partners and others outside the organization.
The criteria above are best assessed in conjunction with experienced analysts who know your organization’s mission and function and are familiar with their current tools. But keep in mind that analysts are not paid to know the full potential of modern technologies. Additional evaluation factors below will be best evaluated in conjunction with both your internal technology team and the broader technology community.
Evaluating Enterprise IT Capabilities For Big Data
Capability: Data Layers
Evaluation factors:
Does your data layer connect all relevant data? Have you established a trusted information layer? Does the system require loading all information into a proprietary repository, or does it allow federated search among distributed data sources to gather required information for analysis, and does it do this leveraging open architectures?
Does your trusted information layer include incorporation of unstructured information into a semi-structured format? Key here is being able to crawl massive amounts of unstructured data, identify documents of interest, extract and know entities, and prepare this information for use. Seek solutions that automatically extract entities based on semantic rules, extracting directly into the intelligence repositories available for analysis. Any manual process at this point has a direct impact on time spent on analysis.
Does your enterprise security model support getting all the information to those that need it? Have you engineered for a multidimensional security model that ensures your policies are always enforced and the mission is still always supported with the best possible analysis? Additionally, this security model needs to interoperate with the existing access and security systems.
Have you engineered in an ability to synchronize data sources? Does this enable smooth interoperability between single users, workgroups and enterprises?
Capability: True Service Orientation
Evaluation factors:
Do you have an architecture that facilitates sharing of secure information for both service (request/response) and notification (publish/subscribe) via widely supported standards and best practices? Does this architecture provide the flexibility and adaptability needed to keep pace with the change and evolution of the data type and volume, the analytic tools, and the analytic mission? Can enterprise IT staff tailor the capabilities for analyst use, or do they need to task an outside vendor to re-code capabilities?
Most modern enterprises, especially those in the national security community, have already been building towards more service-oriented, data smart structures, so it is very likely that your organization has a good foundation along this path. But remember it is a journey, and the balanced approach your mission requires may well require changes to your configuration and perhaps even more modern technologies to optimize your ability to support the mission.
Evaluating Enterprise Policy Factors for Big Data
Capability: Efficient
Evaluation factors: Do your policies emphasize the need for automation for efficiency? Do you have measures of Return on Investment or Return for Mission that are used to inform architecture decisions?
Capability: Frictionless
Evaluation factors: Do your policies seek out and eliminate barriers to collaboration that impact data design?
Capability: Interoperable
Evaluation factors: Do you seek out and remove capabilities that do not play well with others?
Capability: Learning
Evaluation factors: Before selecting new capabilities do you conduct market assessments and solicit the opinions of others with similar mission needs?
Capability: Governing
Evaluation factors: Do you enforce mandates for open APIs and SOA best practices?
Whatever the status of your technology infrastructure, you will need a good governance process in place to move to a more optimized infrastructure.
Accelerating balanced analytical solutions:
Ready to move out? Here are four steps to consider as you do:
1) Evaluate your enterprise in light of the recommended criteria above. Use that to build your plan.
2) Enlist the aid of your analyst community to prioritize the analytical capabilities to deliver.
3) After prioritizing the analytical capabilities your mission requires, address the enterprise technology gaps required to enhance support to mission.
4) Track improvements to your enterprise like a project: watch cost, schedule and performance.
Every enterprise is different, with different missions, different infrastructures and architectures. You may find that many of the criteria we outlined above are already met by your existing enterprise. A quick inventory of capabilities and gaps will help you assess the challenge and prioritize how you architect for improvement. We most strongly recommend a structured engagement with your organization’s analysts. They understand your organization’s mission and vision and will likely be strong supporters in your move to bring more balance to your organization’s approach to big data analytical solutions. Their prioritization of needs and capabilities should help drive organizational improvement plans. However, keep in mind that your analysts are not paid to understand the power of modern computing. External advice and assistance in this area, including connecting with other organizations that have met similar challenges, will provide important insights into your road ahead. We have observed organizations making this type of transformation around the globe, including commercial organizations, government agencies and militaries. One thing all seem to have in common is a deep need to automate with efficiency. For some this translates to a calculation of Return on Investment. For militaries it can be a more operationally focused Return on Mission. But in every case, understanding the efficiencies and total cost to the enterprise of a solution is critically important to ensuring success.
For more federal technology and policy issues visit:
• CTOvision.com - A blog for enterprise technologists with a special focus on Big Data.
• CTOlabs.com - A reference for research and reporting on all IT issues.
• J.mp/ctonews - Sign up for the government technology newsletters including the Government Big Data Weekly.
About the Authors
Bob Gourley is CTO and founder of Crucial Point LLC and editor in chief of CTOvision.com. He is a former federal CTO. His career included service in operational intelligence centers around the globe, where his focus was operational all-source intelligence analysis. He was the first director of intelligence at DoD’s Joint Task Force for Computer Network Defense, served as director of technology for a division of Northrop Grumman and spent three years as the CTO of the Defense Intelligence Agency. Bob serves on numerous government and industry advisory boards. Contact Bob at email@example.com
Ryan Kamauff is a technology research analyst at Crucial Point LLC, focusing on disruptive technologies of interest to enterprise technologists. He writes at http://ctovision.com. He researches and writes on developments in technology and government best practices for CTOvision.com and CTOlabs.com, and has written numerous whitepapers on these subjects. Contact Ryan at Ryan@crucialpointllc.com
For More Information
If you have questions or would like to discuss this report, please contact me. As an advocate for better IT use in enterprises, I am committed to keeping this dialogue open on technologies, processes and best practices that will keep us all continually improving our capabilities and ability to support organizational missions.
Bob Gourley firstname.lastname@example.org