
US010042834B2

(12) United States Patent
     Hinterbichler et al.

(10) Patent No.: US 10,042,834 B2
(45) Date of Patent: Aug. 7, 2018

(54) DYNAMIC FIELD EXTRACTION OF DATA

(71) Applicant: VMware, Inc., Palo Alto, CA (US)

(72) Inventors: Erik Hinterbichler, Mountain View, CA (US); Chengdu Huang, Mountain View, CA (US); Zhenmin Li, San Jose, CA (US); Ron Oded Gery, Kirkland, WA (US)

(73) Assignee: VMware, Inc., Palo Alto, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 565 days.

(21) Appl. No.: 14/790,189

(22) Filed: Jul. 2, 2015

(65) Prior Publication Data
     US 2015/0301996 A1    Oct. 22, 2015

Related U.S. Application Data

(63) Continuation of application No. 13/827,037, filed on Mar. 14, 2013, now Pat. No. 9,075,718.

(51) Int. Cl.
     G06F 3/00      (2006.01)
     G06F 17/24     (2006.01)
     G06F 11/36     (2006.01)
     G06F 11/07     (2006.01)
     G06F 17/30     (2006.01)
     G06F 3/0484    (2013.01)

(52) U.S. Cl.
     CPC: G06F 17/241 (2013.01); G06F 11/0712 (2013.01); G06F 11/0778 (2013.01); G06F 11/3656 (2013.01); G06F 17/30657 (2013.01); G06F 17/30716 (2013.01); G06F 3/0484 (2013.01)

(58) Field of Classification Search
     CPC: G06F 3/0484
     USPC: 715/738, 739
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

     7,836,439 B2 *     11/2010   Shenfield       G06F 9/44521    717/143
     7,925,729 B2 *     4/2011    Bush            H04L 12/2807    709/223
     8,239,754 B1 *     8/2012    Orthlieb        G06F 17/243     715/232
     8,265,925 B2 *     9/2012    Aarskog         G06F 17/271     704/1
     8,498,987 B1 *     7/2013    Zhou            G06F 17/30946   707/741
     9,075,718 B2 *     7/2015    Hinterbichler   G06F 11/3656
     9,460,074 B2 *     10/2016   Huang           G06F 17/2705
     9,507,848 B1 *     11/2016   Li              G06F 17/30911
     2007/0005535 A1 *  1/2007    Salahshour      G06F 9/542      706/20
     2007/0220031 A1 *  9/2007    MacMahon        A63F 13/10
     (Continued)

Primary Examiner: William Titcomb
(74) Attorney, Agent, or Firm: Patterson + Sheridan, LLP

(57) ABSTRACT

A log analytics graphical user interface enables a user to dynamically extract and define a field from unstructured log data. The log analytics module automatically determines a definition for a field based on log text selected by the user. A portion of each log message is highlighted to reflect what the extracted field may be, to help users understand whether the input parameters select the intended log data. Changes to the definition of the field by the user may cause further highlighting that indicates an incomplete or erroneous field definition.

20 Claims, 8 Drawing Sheets

[Representative drawing: excerpt of a log message in which text 302 has been selected with cursor 300 in log text 208, and an "Extract Field" button 304 appears in field label area 212 next to the existing field labels "status", "return_size", and "browser_name".]
(56) References Cited (continued)

U.S. PATENT DOCUMENTS

     2008/0127043 A1 *  5/2008    Zhou            G06F 11/3604    717/104
     2013/0117679 A1 *  5/2013    Polis           H04L 67/00      715/738
     2014/0282031 A1 *  9/2014    Hinterbichler   G06F 11/3656    715/738

* cited by examiner
Sheet 1 of 8. FIG. 1A: block diagram of computing system 100, showing server systems 102-1, 102-2, and 102-3 (each with CPU 104, memory 106, network interface 110, storage interface 114, operating system 120, and applications 122), together with log analytics module 132 and log data 134.
Sheet 2 of 8. FIG. 1B: block diagram of virtualized computing system 150, showing hosts 108-1 to 108-4 (each with VMs 112, hypervisor 116, and hardware platform 118), network 110, virtualization management software 130, log analytics module 132, and log data 134.
Sheet 3 of 8. FIG. 2: screenshot of user interface 200, showing log area 202 with log messages 204 (including log message 204-1), each having a timestamp 206, log text 208, and a field label area 212, and fields area 210 listing fields such as "source", "resource", "http_status", "return_size", "browser_name", and "severity". The log area shows the page indicator "1 to 20 out of 200 log events".
Sheet 4 of 8. FIGS. 3A-3D: screenshots of the user interface during field extraction.
FIG. 3A: cursor 300 selecting text 302 in log message 204-1.
FIG. 3B: "Extract Field" button 304 shown in field label area 212.
FIG. 3C: highlighted portions 306, 308, and 310 of the log text, with name 312 ("http_code") shown in field label area 212.
FIG. 3D: enlarged view of definition area 320, with value-type element 322 (value type list element 326 set to "Decimal", value type field element 328), context element 324 (before-context input element 330, after-context input element 332), name element 334, test button 336, and save button 338.
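The value-type list of FIG. 3D can be illustrated with a small lookup table. The following is a minimal sketch, not the patented implementation: only the decimal pattern is quoted in the description, while the table name `VALUE_TYPES`, the function `infer_value_type`, the remaining patterns, and the ordering of the table are assumptions made for illustration.

```python
import re

# Hypothetical value-type table. Only the "decimal" pattern appears in the
# description; every other pattern here is an illustrative assumption.
VALUE_TYPES = [
    ("integer", r"-?\d+"),
    ("decimal", r"-?\d*\.?\d+"),
    ("hexadecimal", r"0[xX][0-9a-fA-F]+"),
    ("ipv4", r"\d{1,3}(?:\.\d{1,3}){3}"),
    ("word", r"\w+"),            # letters, digits, and underscores
    ("non_whitespace", r"\S+"),  # any character except whitespace
]


def infer_value_type(selected_text):
    """Return the first (name, pattern) whose pattern matches the whole selection.

    Falls back to a "custom" pattern that matches the selection literally.
    """
    for name, pattern in VALUE_TYPES:
        if re.fullmatch(pattern, selected_text):
            return name, pattern
    return "custom", re.escape(selected_text)


if __name__ == "__main__":
    for text in ("1.1", "200", "38.101.148.126"):
        print(text, "->", infer_value_type(text)[0])
```

Ordering the table from most specific to most general means a selection such as "200" is reported as an integer rather than as a generic word; a user could still override the choice, as the editable value-type elements allow.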
Sheet 5 of 8. FIG. 3E: screenshot of the user interface showing log messages 204-1 to 204-5, with highlighted portions 306, 308, and 310 in each log message that has an instance of the extracted field, and definition area 320 (value type "Decimal", context, name "http_code", Test and Save buttons).
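Applying one definition across several log messages, as in FIG. 3E, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function `highlight`, the bracket markers, and the sample messages are hypothetical, and brackets are used only because the description lists inserted delimiting characters as one form of highlighting.

```python
import re

DECIMAL = r"-?\d*\.?\d+"  # decimal pattern quoted in the description


def _wrap(text, opener, closer):
    """Wrap non-empty text in delimiters; leave empty text untouched."""
    return opener + text + closer if text else ""


def highlight(log_text, before, pattern, after):
    """Mark context with [ ] and the field instance with [[ ]].

    Returns None when the message has no instance of the field.
    """
    regex = re.compile(
        "(" + re.escape(before) + ")(" + pattern + ")(" + re.escape(after) + ")"
    )
    match = regex.search(log_text)
    if match is None:
        return None
    ctx_before, value, ctx_after = match.groups()
    return (
        log_text[: match.start()]
        + _wrap(ctx_before, "[", "]")
        + _wrap(value, "[[", "]]")
        + _wrap(ctx_after, "[", "]")
        + log_text[match.end():]
    )


if __name__ == "__main__":
    # Hypothetical log messages modeled on the example in the description.
    messages = [
        'GET "example.com/search.php HTTP/1.1" 200 15587',
        'GET "example.com/pics/blank.gif HTTP/1.0" 200 6249',
        "error: connection refused",
    ]
    for message in messages:
        print(highlight(message, "HTTP/", DECIMAL, '" 200'))
```

Messages without an instance of the field return `None`, which corresponds to a log message that is left unhighlighted.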
Sheet 6 of 8. FIGS. 4A-4B: screenshots of the user interface for modifying a definition of an extracted field, showing definition area 320 (value-type element 322, context element 324, before-context input element 330, Test and Save buttons) and highlighted portions 306, 308, 310, 406, and 408 of the log text.
Sheet 7 of 8. FIG. 5: flow diagram of method 500.

502: Display a plurality of log messages.
504: Receive indication to extract a field based on a specified portion of log text of a first log message.
506: Determine a pattern for the extracted field that matches the specified portion of log text.
508: Determine a context for the extracted field based on the specified portion of log text.
510: Generate a definition of the extracted field having the determined pattern and context.
512: Modify display of the plurality of log messages in the graphical user interface that have instances of the extracted field according to the generated definition.

FIG. 5
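Steps 504 to 510 of FIG. 5 can be read as a function from a text selection to a field definition. The sketch below is one possible reading, not the method claimed in the patent: the class `FieldDefinition`, the function `generate_definition`, the field name `http_version`, and the context heuristic (take the nearest word on each side plus any punctuation in between) are all assumptions chosen so that the result reproduces the "HTTP/" and `" 200` context given in the description.

```python
import re
from dataclasses import dataclass

DECIMAL = r"-?\d*\.?\d+"  # decimal pattern quoted in the description


@dataclass
class FieldDefinition:
    """Hypothetical container for an extracted-field definition."""
    name: str
    pattern: str  # regular expression for the field value
    before: str   # literal context preceding the value
    after: str    # literal context following the value

    def regex(self):
        # Context is matched literally; only the value uses the pattern.
        return re.compile(
            re.escape(self.before) + "(" + self.pattern + ")" + re.escape(self.after)
        )


def generate_definition(log_text, start, end, name, pattern=DECIMAL):
    """Build a definition from the selection log_text[start:end]."""
    selected = log_text[start:end]
    if not re.fullmatch(pattern, selected):
        pattern = re.escape(selected)  # fall back to a literal pattern
    # Assumed heuristic: nearest word on each side plus adjoining punctuation.
    before_match = re.search(r"\w+\W*$", log_text[:start])
    after_match = re.match(r"\W*\w+", log_text[end:])
    before = before_match.group(0) if before_match else ""
    after = after_match.group(0) if after_match else ""
    return FieldDefinition(name, pattern, before, after)


log_text = ('38.101.148.126 - GET '
            '"example.com/products/solutions/search.php HTTP/1.1" 200 15587')
start = log_text.index("1.1", log_text.index("HTTP/"))  # skip the "1.1" inside the IP
definition = generate_definition(log_text, start, start + 3, "http_version")

if __name__ == "__main__":
    print(repr(definition.before), repr(definition.after))
    print(definition.regex().search(log_text).group(1))
```

Without the context, the decimal pattern alone would also match parts of the IP address and the status code, which is why the definition pairs the pattern with the surrounding text.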
Sheet 8 of 8. FIG. 6: flow diagram of method 600.

602: Receive indication that context associated with the extracted field has been modified.
604: Determine the modified context partially matches a token of log text adjacent to an instance of the extracted field that matches the pattern.
606: Modify display of a portion of the adjacent token which matches the modified context to indicate an incomplete match to the user.
608: Modify display of a remainder of the adjacent token to suggest a completion of the modified context to the user.

FIG. 6
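The partial-match check in steps 604 to 608 of FIG. 6 can be sketched for the "after" context as follows. This is a hedged illustration only: the function `suggest_after_context`, the definition of an adjacent token as optional punctuation followed by one word, and the sample log text are assumptions, since the flow diagram does not say how tokens are delimited.

```python
import re

DECIMAL = r"-?\d*\.?\d+"  # decimal pattern quoted in the description


def suggest_after_context(log_text, pattern, typed_after):
    """Find instances whose following token is only partially matched.

    Returns a list of (value, matched_part, suggested_remainder) tuples:
    matched_part would be displayed as an incomplete match (step 606) and
    suggested_remainder as a proposed completion (step 608).
    """
    results = []
    for match in re.finditer(pattern, log_text):
        # Assumed token boundary: punctuation/whitespace plus the next word.
        token_match = re.match(r"\W*\w+", log_text[match.end():])
        if token_match is None:
            continue
        token = token_match.group(0)
        is_partial = (
            bool(typed_after)
            and token.startswith(typed_after)
            and len(typed_after) < len(token)
        )
        if is_partial:
            results.append((match.group(0), typed_after, token[len(typed_after):]))
    return results


if __name__ == "__main__":
    # Hypothetical log text; the user has typed '" 2' as the after-context.
    text = 'GET "example.com/search.php HTTP/1.1" 200 15587'
    print(suggest_after_context(text, DECIMAL, '" 2'))
```

A context that already equals the whole adjacent token yields no suggestion, because nothing remains to complete.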
DYNAMIC FIELD EXTRACTION OF DATA

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a continuation of prior U.S. application Ser. No. 13/827,037, filed Mar. 14, 2013, the entire contents of which are incorporated by reference herein.

BACKGROUND

System administrators provide virtualized computing infrastructure, which typically includes a plurality of virtual machines executing on a shared set of physical hardware components, to offer highly available, fault-tolerant distributed systems. However, a large-scale virtualized infrastructure may have many (e.g., thousands of) virtual machines running on many physical machines. High availability requirements provide system administrators with little time to diagnose or bring down parts of infrastructure for maintenance. Fault-tolerant features ensure the virtualized computing infrastructure continues to operate when problems arise, but generate many intermediate states that have to be reconciled and addressed. As such, identifying, debugging, and resolving failures and performance issues for virtualized computing environments have become increasingly challenging.

Many software and hardware components generate log data to facilitate technical support and troubleshooting. However, over an entire virtualized computing infrastructure, massive amounts of unstructured log data can be generated continuously by every component of the virtualized computing infrastructure. As such, finding information within the log data that identifies problems of virtualized computing infrastructure is difficult, due to the overwhelming scale and volume of log data to be analyzed.

SUMMARY

One or more embodiments disclosed herein provide a method for displaying a graphical user interface for analyzing a plurality of log messages for a computing environment. The method includes displaying a plurality of log messages, including a first log message comprised of log text, and receiving an indication to extract a field based on a specified portion of log text of the first log message. The method further includes generating, by operation of one or more processing units, a definition of the extracted field having (1) a pattern that matches the specified portion of the log text, and (2) a context for the extracted field, wherein the context is determined based on the specified portion of the first log message. The method further includes annotating a first portion of the log text of the first log message which matches the pattern, and annotating a second portion of the log text of the first log message which matches the context.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited aspects are attained and can be understood in detail, a more particular description of embodiments of the invention, briefly summarized above, may be had by reference to the appended drawings.

FIG. 1A depicts a block diagram that illustrates a computing system with which one or more embodiments of the present invention may be utilized.

FIG. 1B depicts a block diagram that illustrates a virtualized computing system with which one or more embodiments of the present invention may be utilized.

FIG. 2 depicts a screenshot of a user interface for viewing and analyzing log data, according to one embodiment of the invention.

FIGS. 3A-3E depict screenshots of a user interface for dynamically extracting a field from log data, according to one embodiment of the invention.

FIGS. 4A-4B depict screenshots of a user interface for modifying a definition of a field extracted from log data, according to one embodiment of the invention.

FIG. 5 is a flow diagram that illustrates steps for a method for providing a user interface for analyzing log messages for a computer infrastructure, according to an embodiment of the present invention.

FIG. 6 is a flow diagram that illustrates steps for a method for providing a user interface for modifying a definition of an extracted field from log messages, according to an embodiment of the present invention.

DETAILED DESCRIPTION

One or more embodiments disclosed herein provide methods, systems, and computer programs for displaying and analyzing log data for a computing infrastructure. In one embodiment, log data, sometimes referred to as runtime logs, error logs, debugging logs, or event data, is displayed in a graphical user interface. A log analytics application may parse each entry of the log data to extract several statically defined, pre-determined fields, such as a timestamp. However, due to the unstructured format of log data, there may be information within log data that a user, such as a system administrator, may wish to identify and extract from the log data for additional analysis. According to one embodiment, the user may select text, via user input, from the log data and dynamically extract a definition of a field from the selected text. The extracted field definition can be applied to the entirety of log data and be used in the same way as statically defined, pre-determined fields, for example, in searches, filters, charts, and statistical analysis.

FIG. 1A is a block diagram that illustrates a computing system 100 with which one or more embodiments of the present invention may be utilized. As illustrated, computing system 100 includes a plurality of server systems, identified as server system 102-1, 102-2, 102-3, and referred to collectively as servers 102. Each server 102 includes CPU 104, memory 106, networking interface 110, storage interface 114, and other conventional components of a computing device. Each server 102 further includes an operating system 120 configured to manage execution of one or more applications 122 using the computing resources (e.g., CPU 104, memory 106, networking interface 110, storage interface 114).

As mentioned earlier, software and infrastructure components of computing system 100, including servers 102, operating systems 120, and applications 122 running on top of operating system 120, may generate log data during operation. Log data may indicate the state, and state transitions, that occur during operation, and may record occurrences of failures, as well as unexpected and undesirable events. In one embodiment, log data may be unstructured text comprised of a plurality of log messages, including status updates, error messages, stack traces, and debugging messages. With thousands to millions of different processes running in a complex computing environment, an overwhelmingly large volume of heterogeneous log data, having
varying syntax, structure, and even language, may be generated. While some information from log data may be parsed out according to pre-determined fields, such as time stamps, other information in the log messages may be relevant to the context of a particular issue, such as when troubleshooting or proactively identifying issues occurring in the computing system 100.

Accordingly, embodiments of the present invention provide a log analytics module 132 configured to store and analyze log data 134 from software and infrastructure components of computing system 100. In one embodiment, log analytics module 132 may be configured to perform lexical analysis on log data 134 to convert the sequence of characters of log text for each log message in log data 134 into a sequence of tokens (i.e., categorized strings of characters). As described later, log analytics module 132 may use lexical analysis to generate definitions for fields dynamically extracted from log text, and to provide instant visual feedback regarding changes to the definition for the extracted field.

According to some embodiments, users, such as system administrators, can use log analytics module 132 to access, process, and analyze log data 134 in an interactive visualization via the graphical user interface. The graphical user interface may be configured to enable the user to select text from log data 134 to dynamically define one or more fields based on the selected text. The graphical user interface may highlight portions of log data 134 based on the generated definition for the field. While the user edits the definition, the graphical user interface may dynamically highlight portions of log data 134 based on the changes to the definition to indicate the effects of the modified definition to the user. In some embodiments, the graphical user interface of log analytics module 132 may be configured to graphically suggest changes to the definition of an extracted field, for example, by highlighting portions of log data 134 that would be affected by a suggested change. Log analytics module 132 may store the definition of the extracted field, and apply the definition of the extracted field to other log messages in log data 134. One example of the graphical user interface of log analytics module 132 is shown in FIG. 2.

While embodiments of the present invention are described in conjunction with a computing environment having physical components, it should be recognized that log data 134 may be generated by components of other alternative computing architectures, including a virtualized computing system as shown in FIG. 1B. FIG. 1B is a block diagram that illustrates a computing system 150 with which one or more embodiments of the present invention may be utilized. As illustrated, computing system 150 includes a host group 124 of host computers, identified as hosts 108-1, 108-2, 108-3, and 108-4, and referred to collectively as hosts 108. Each host 108 is configured to provide a virtualization layer that abstracts computing resources of a hardware platform 118 into multiple virtual machines (VMs) 112 that run concurrently on the same host 108. Hardware platform 118 of each host 108 may include conventional components of a computing device, such as a memory, processor, local storage, disk interface, and network interface. The VMs 112 run on top of a software interface layer, referred to herein as a hypervisor 116, that enables sharing of the hardware resources of host 108 by the virtual machines. One example of hypervisor 116 that may be used in an embodiment described herein is a VMware ESXi hypervisor provided as part of the VMware vSphere solution made commercially available from VMware, Inc. Hypervisor 116 may run on top of the operating system of host 108 or directly on hardware components of host 108. Each VM 112 includes a guest operating system (e.g., Microsoft Windows, Linux) and one or more guest applications and processes running on top of the guest operating system.

In the embodiment shown in FIG. 1B, computing system 150 includes a virtualization management software 130 that may communicate with the plurality of hosts 108 via network 110. Virtualization management software 130 is configured to carry out administrative tasks for the computing system 150, including managing hosts 108, managing VMs running within each host 108, provisioning VMs, migrating VMs from one host to another host, and load balancing between hosts 108 of host group 124. In one embodiment, virtualization management software 130 is a computer program that resides and executes in a central server, which may reside in computing system 150, or alternatively, may run as a VM in one of hosts 108. One example of a virtualization management software is the vCenter® Server product made available from VMware, Inc. Similar to the software and infrastructure components of computing system 100, the software and infrastructure components of computing system 150, including host group(s) 124, hosts 108, VMs 112 running on hosts 108, guest operating systems, applications, and processes running within VMs, may generate large amounts of log data during operation.

While log analytics module 132 is depicted in FIG. 1B as a separate component that resides and executes on a separate server or virtual machine, it is appreciated that log analytics module 132 may alternatively reside in any one of the computing devices of the virtualized computing system 150, for example, the same central server where the virtualization management software 130 resides. In one embodiment, log analytics module 132 may be embodied as a plug-in component configured to extend functionality of virtualization management software 130. Access to the log analytics module 132 can be achieved via a client application (not shown). For example, each analysis task, such as searching for log messages, filtering for log messages, or analyzing log messages over a period of time, can be accomplished through the client application. One embodiment provides a stand-alone application version of the client application. In another embodiment, the client application is implemented as a web browser application that provides management access from any networked device.

FIG. 2 depicts a screenshot of a user interface 200 for managing log data of a computing system 100, according to various embodiments of the invention. The screenshot shown in FIG. 2 is an example of a user interface that is displayed in the log analytics module 132. As described in greater detail below, the user interface 200 includes a log area 202 and a fields area 210. Log area 202 displays a plurality of log messages 204 (including a first log message 204-1) generated over a period of time. In some embodiments, log area 202 may display one view of log messages 204 that constitute one page of log messages from a paginated set of log messages (e.g., "1 to 20 out of 200 log events"). Log area 202 may display one view of log messages 204 that satisfy a specified criterion or constraint. Log area 202 may display one view of log messages 204 in a specified order, such as by time, later first.

As shown in FIG. 2, user interface 200 includes a global fields area 210 that displays a list of fields 214 aggregated from all log messages shown in log area 202. In one embodiment, log area 202 includes a field label area 212, for each displayed log message 204, which represents a list of
existing fields (e.g., "source," "resource," "http_status") that have been parsed from log text 208 of a particular log message.

In the embodiment shown, each of the plurality of log messages 204 includes a timestamp 206 (e.g., "2011-05-18 23:58:04.000") that indicates a date and time corresponding to the creation of the corresponding log message 204, and a text description, herein referred to as log text 208 (e.g., "38.101.148.126 - GET 'example.com/products/solutions/search.php HTTP/1.1' 200 15587"). While each log message 204 is depicted as a separate line of text delimited by carriage returns for sake of illustration, it should be recognized that log messages 204 may be arranged in a variety of formats, including log messages that span several lines.

FIGS. 3A-3E depict screenshots of a user interface for dynamically extracting a field from log data, according to one embodiment of the invention. The screenshots shown in FIGS. 3A-3D may be enlarged views of user interface 200 that is displayed in log analytics module 132 and depict a series of user interactions with user interface 200. According to one embodiment, a user interacting with user interface 200 of log analytics module 132 may select a portion of log text 208 from one of log messages 204 in log area 202. As shown in FIG. 3A, the user manipulates a cursor 300 to generate a selection of the text "1.1" from log message 204-1 (depicted as selected text 302), using known text manipulation techniques, including clicking and dragging a text caret, and double clicking on a word.

In response to a selection of log text from one of log messages 204, user interface 200 displays a graphical user interface element, such as a button, that enables the user to execute a process for dynamically extracting a field from selected text 302. As shown in FIG. 3B, responsive to selecting text 302, an "Extract Field" button 304 appears within field label area 212 to indicate to the user that a field, in addition to existing fields such as "return_size" and "browser_name" already displayed in field label area 212, may be extracted using selected text 302. The user activates (e.g., clicks on) button 304 to dynamically extract a field from selected text 302. In response to activating button 304, log analytics module 132 automatically generates a definition of an extracted field based on selected text 302. As described in greater detail later, the definition of the extracted field may include at least a pattern that matches selected text 302 and may further include a context that matches text portions surrounding selected text 302. Parameters of the definition may be displayed to the user in a definition area 320 of the user interface 200, which is shown in greater detail in FIG. 3D, within fields area 210.

As shown in FIG. 3C, user interface 200 highlights a portion 306 of log text of log message 204-1 that corresponds to an instance of the extracted field that matches the pattern (e.g., "1.1"). User interface 200 further highlights portions 308, 310 of log text of log message 204-1 that match the context of the extracted field (e.g., "HTTP/" and "" 200"). In the embodiment shown, highlighted portion 306 is rendered with a first background color representing the matched pattern and highlighted portions 308, 310 are rendered with a second background color different from the first background color (depicted as different textured portions). In some embodiments, the background colors may be selected such that the first background color has a different color saturation or intensity (e.g., greater color saturation, different intensity) than the second background color. For example, highlighted portion 306 corresponding to the instance of the extracted field may be highlighted with a dark green, and highlighted portions 308, 310 corresponding to text that matches the context of the extracted field may be highlighted with a light green. While highlighted portions 306, 308, 310 are depicted as having highlighted backgrounds, it should be appreciated that "highlighting" text includes a variety of techniques for displaying and rendering text in a manner that graphically distinguishes the text from other text, including rendering text using particular background colors, background patterns, background textures, font colors, font styles such as bold-face, italics, underlines, borders, font families, font sizes, font animations, insertion of delimiting characters such as brackets, and any combination thereof.

FIG. 3D depicts an enlarged view of definition area 320 that displays the parameters that define the extracted field, as initially determined by log analytics module 132. Definition area 320 includes input elements having the parameters that define the extracted field to enable the user to view, modify, test, and save changes to the definition of the extracted field. In the embodiment shown, definition area 320 includes a value-type element 322, a context element 324, a name element 334, a test button 336, and a save button 338.

Value-type element 322 indicates the pattern determined to match selected text 302. In one embodiment, value-type element 322 includes a value type list element 326 that provides a pre-determined list of value-types (e.g., "Decimal") that may be used for matching selected text 302 and a value type field element 328 that displays a pattern associated with the selected value-type (e.g., regular expression "-?\d*\.?\d+"). As described earlier, value type list element 326 and value type field element 328 may be user editable fields, such as a drop-down list or a text field, configured to enable the user to modify the value-type used for matching the extracted field. Examples of value-types that may be specified by value-type list element 326 include integer values, decimal values, hexadecimal values, values consisting of letters, digits, and underscores, Internet Protocol (IP) addresses v4 or v6, Media Access Control (MAC) addresses, currency values, values consisting of any character except whitespace, and a custom pattern (e.g., regular expression).

Context element 324 indicates the context determined to match text surrounding selected text 302. In one embodiment, the determined context associated with the extracted field may be comprised of string values, patterns, or regular expressions that match log text before and after selected text 302. As shown in FIG. 3D, context element 324 includes a "before" context input element 330 and an "after" context input element 332 that display the context before and the context after the selected text 302, respectively, as determined by log analytics module 132. As shown, before-context input element 330 and after-context input element 332 may be editable text fields configured to enable the user to modify and adjust the context of the extracted field initially determined by log analytics module 132.

In one embodiment, name element 334 may be a text field configured to receive text input from the user that specifies a name or label associated with the extracted field. As shown, user interface 200 may display the name 312 associated with the extracted field within field label area 212, for example, the "http_code" label shown in FIG. 3C.

According to one embodiment, in addition to highlighting portions of log text from the same log message of which selected text 302 is a part (e.g., log message 204-1), user interface 200 may highlight other log messages (e.g., log messages 204-2 to 204-5) that also have instances of the extracted field, as shown in FIG. 3E. As shown, user interface 200 determines log messages 204-2, 204-3, 204-5
have instances of the extracted field http_code and highlights portions 306 of log text that match the pattern associated with a decimal value-type and portions 308, 310 of log text that match the before-context “HTTP/” and the after-context “" 200”. For example, user interface 200 highlights portions 306 of log messages 204-2 and 204-5 that have the value “1.1” according to the definition of the extracted field. In another example, a portion 306 of log message 204-2 is highlighted even though log message 204-2 has a value (e.g., “1.0”) for the extracted field http_code different than the selected text (e.g., “1.1”) from which the definition is based, because the portion matches the definition of the extracted field. It should be appreciated that log analytics module 132 determines log message 204-4 does not contain an instance of the extracted field based on the determined definition and therefore does not highlight portions of log text of log message 204-4.

In some embodiments, the user may test the definition of the extracted field beyond those log messages displayed in the graphical user interface. As such, user interface 200 includes test button 336 which the user may press to re-run search results or filtering using the extracted field as a tentatively defined field. If satisfied, the user may press save button 338 of user interface 200 to save the definition of the extracted field.

FIGS. 4A-4B depict screenshots of a user interface for modifying a definition of a field extracted from log data, according to one embodiment of the invention. The screenshots shown in FIGS. 4A-4B may be enlarged views of user interface 200 that is displayed in log analytics module 132 and depict a series of user interactions with user interface 200. According to one embodiment, the user interacting with user interface 200 of log analytics module 132 may modify the definition of the extracted field using definition area 320. For example, the user may change the value-type specified by value-type element 322 or edit the text fields of context element 324. In the example shown in FIG. 4A, the user edits the before-context specified by before-context input element 330 (e.g., by editing the text value therein) by deleting the characters “HT” from the existing before-context “HTTP/”.

In one embodiment, as the user makes changes to context element 324, user interface 200 actively modifies the highlighted log text of log messages 204 based on the changes. User interface 200 highlights portions of log text in a manner that indicates to the user that the modified context incompletely matches existing instances of the extracted field. As shown in FIG. 4B, user interface 200 highlights a portion 408 of log text of one or more log messages (e.g., log messages 204-1) that matches the modified context and further highlights another portion 406 of log text that constitutes a remainder of a text token parsed by log analytics module 132. For example, when the user modifies the before-context to be the match string “TP/,” user interface 200 highlights portion 408 (e.g., with one background color, such as red), and highlights portion 406 as the remainder of the text token (i.e., “HT”) differently (e.g., with another background color, such as light pink). In one embodiment, user interface 200 may present an alert indication, such as icon 404 within the text field of context input element 330, which indicates to the user that the modified context does not match a full token in at least some log messages, and directs the user to complete the token as suggested by the highlighting shown in FIG. 3E.

FIG. 5 is a flow diagram that illustrates steps for a method 500 for providing a user interface for analyzing log messages for a computer infrastructure, according to an embodiment of the present invention. It should be recognized that, even though the method 500 is described in conjunction with the system of FIGS. 1A-1B, any system configured to perform the method steps is within the scope of embodiments of the invention.

The method 500 begins at step 502, where log analytics module 132 displays, in a graphical user interface, a plurality of log messages 204 from log data 134 generated by software and infrastructure components of computing system 100. In one embodiment, log analytics module 132 receives a stream of log data 134 generated by software and infrastructure components of computing system 100. In other embodiments, log analytics module 132 may be configured to retrieve log data (e.g., log files) from software and infrastructure components of virtualized computing system 150, including hypervisors 116, guest application and operating systems running within VMs 112. In some embodiments, software and infrastructure components of computing system 100 may be configured to write log files to a common destination, such as an external storage, from which log analytics module 132 may periodically retrieve log data. In another embodiment, log data 134 may be imported by a user (e.g., system administrator) into log analytics module 132 using one or more file transfer methods.

At step 504, log analytics module 132 receives an indication, via user input, to extract a field based on a specified portion of log text of a first log message of the plurality of log messages shown in the graphical user interface. In one embodiment, the graphical user interface of the log analytics module 132 detects a text selection (e.g., blocking) of log text in the first log message and dynamically reveals a button (e.g., “Extract field”) responsive to the text selection. In some embodiments, the received indication may specify a string of the selected log text. In other embodiments, the received indication may specify a position identifier locating the text selection within the string (e.g., string index 15 to 20), which log analytics module 132 may use to parse a substring of the selected log text.

At step 506, responsive to receiving the indication to extract a field, log analytics module 132 determines a pattern for the extracted field that matches the specified portion of log text. In one embodiment, log analytics module 132 determines whether the specified log text matches a type of value (or “value-type”) based on a pre-determined list of patterns (e.g., regular expressions). The list of patterns may be generated based on common value-types found in log messages. For example, the list of patterns may include a regular expression that matches an integer value (e.g., “-?\d+”), a regular expression that matches decimal values (e.g., “-?\d*\.?\d+”), regular expressions that match hexadecimal values, regular expressions that match IP addresses (e.g., “\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}”), regular expressions that match values consisting of letters, digits, and underscores, regular expressions that match currency values, regular expressions that match values consisting of any character except whitespace, etc. In some embodiments, the list of patterns may have an order or priority, for example, based on specificity or frequency of occurrence. Log analytics module 132 iterates through the pre-determined list of patterns until one of the patterns matches the specified log text. In some embodiments, if log analytics module 132 is unable to find a match to a value-type, log analytics module 132 may then default to a pattern of any characters (e.g., “.*”), thereby relying on the context of the extracted field to identify instances of the field.
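The value-type lookup described at step 506 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names VALUE_TYPES, FALLBACK, and detect_value_type are hypothetical and do not appear in the patent, the ordering of the list is one possible priority, and the sketch assumes that a pattern "matches" only when it covers the whole selected text. The integer, decimal, and IP-address expressions are the ones quoted in the description of step 506; the other two entries are assumed stand-ins for the categories it names.

```python
import re

# Hypothetical ordered list of (value-type name, regular expression).
# More specific patterns come first; this ordering is an assumption.
VALUE_TYPES = [
    ("integer", r"-?\d+"),
    ("decimal", r"-?\d*\.?\d+"),
    ("ipv4", r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"),
    ("word", r"\w+"),            # letters, digits, and underscores
    ("non_whitespace", r"\S+"),  # any character except whitespace
]

# Default used when no value-type matches: any characters, so that the
# surrounding context alone identifies instances of the field.
FALLBACK = ("any", r".*")


def detect_value_type(selected_text):
    """Return the first (name, pattern) whose regex matches the whole selection."""
    for name, pattern in VALUE_TYPES:
        if re.fullmatch(pattern, selected_text):
            return name, pattern
    return FALLBACK


if __name__ == "__main__":
    for sample in ["1.1", "200", "192.168.0.1", "http_code", "a b"]:
        print(repr(sample), "->", detect_value_type(sample)[0])
```

With this ordering, the selection “1.1” from the running example resolves to the decimal value-type, while a selection containing whitespace falls through to the any-characters default.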
At step 508, log analytics module 132 determines a context for the extracted field based on the specified log text in the first log message. In one embodiment, log analytics module 132 determines a context before and after for the extracted field based on text before and after the specified log text. In one implementation, log analytics module 132 performs lexical analysis on the entire log text of the first log message to determine tokens before and after the specified log text. For example, log analytics module 132 may send log text and a position of the specified log text to a lexical component, referred to as a tokenizer, which is configured to break up the log text into a plurality of tokens according to one or more heuristics (e.g., tokens are separated by whitespace characters; contiguous strings of alphanumeric characters constitute a token; tokens are separated by punctuation characters within certain contexts). The tokenizer processes the log text and returns back tokens comprised of log text that are before and after the specified position of log text.

In some embodiments, log analytics module 132 may determine the context for the extracted field to be the literal string of characters of the tokens before and after the specified log text, for example, the literal string “HTTP/”. In some embodiments, log analytics module 132 may generalize the context of the extracted field from before- and after-tokens into patterns or regular expressions. In one implementation, log analytics module 132 may choose a generalization, for example, by running the before and after tokens through the pre-determined list of patterns, similar to a process described at step 506 earlier, and testing the generalization with similar log messages displayed by log analytics module 132 to verify the generalization matches common contexts.

At step 510, log analytics module 132 generates a definition of the extracted field having the determined pattern and context. In some embodiments, log analytics module 132 may save the definition for later use by an individual user or for a plurality of users accessing log analytics module 132. In some embodiments, log analytics module 132 may assign a name to the extracted field (e.g., “http_code”) or receive a name via user input for the extracted field.

At step 512, log analytics module 132 modifies display of the plurality of log messages, including the first log message, in the graphical user interface which have instances of the extracted field according to the generated definition. A particular log message may be deemed to have an instance of the extracted field if the log message satisfies the pattern and the context of the extracted field. In one embodiment, log analytics module 132 annotates a first portion of log text of at least one log message that matches the pattern, and annotates a second portion of the log text that matches the context. For example, log analytics module 132 applies text highlighting to log text that matches the pattern of the extracted field and applies additional text highlighting to log text that matches the context of the extracted field. In embodiments where log analytics module 132 is a web application, log analytics module 132 provides live, client-side highlighting of log text in the graphical user interface, for example, using JavaScript, HTML5, or other client-side technologies, to apply the regular expressions of the extracted field to the plurality of log messages. Embodiments of the invention provide highlighting of log messages for visual feedback to the user of the accuracy and precision of the extracted field.

FIG. 6 is a flow diagram that illustrates steps for a method 600 for providing a user interface for modifying a definition of an extracted field from log messages, according to an embodiment of the present invention. It should be recognized that, even though the method 600 is described in conjunction with the system of FIGS. 1A-1B, any system configured to perform the method steps is within the scope of embodiments of the invention. The method 600 begins at step 602, where log analytics module 132 receives, via user input, an indication that the context associated with the extracted field has been modified. In one embodiment, the user interface of log analytics module 132 may detect changes to a before-context or after-context associated with the extracted field as the user interacts with the user interface (e.g., by typing new characters, deleting existing characters).

At step 604, log analytics module 132 determines whether the modified context partially matches a token of log text adjacent to an instance of the extracted field in one or more of the log messages displayed in the graphical user interface. For example, log analytics module 132 may determine a modified before-context matches some, but not all, of the token before the instance of the extracted field. If the modified context matches the entirety of the token of log text adjacent to an instance of the extracted field, log analytics module 132 may continue to highlight portions of log text as described in method 500 above. If the modified context matches none of the token of log text adjacent to an instance of the extracted field, log analytics module 132 may remove highlighting from portions of log text to indicate to the user that the modified context no longer matches portions of the plurality of log messages.

At step 606, responsive to determining the modified context partially matches an adjacent token, log analytics module 132 modifies display of a portion of the adjacent token by annotating the portion which matches the modified context to indicate an incomplete match to the user. In the example shown in FIG. 4B, log analytics module 132 may determine the token adjacent to the specified log text “1.1”, the token containing the log text “HTTP/”, partially matches the new before-context “TP/”. As such, log analytics module 132 highlights the portion “TP/” (as depicted by textured portion 408) to indicate to the user that only part of the token matches the new before-context “TP/”.

In one embodiment, log analytics module 132 highlights the matching portion of the token using a first warning color. In some embodiments, the first warning color may be different than colors used for highlighting as described in steps 510 and 512 earlier. For example, a matched value may be highlighted in dark green, matched context may be highlighted in light green, and partially matched context may be highlighted in red.

At step 608, log analytics module 132 further modifies display of a remainder of the token adjacent to the instance of the extracted field by annotating the remaining portion of the token to suggest a completion of the modified context to the user. In the example shown in FIG. 4B, log analytics module 132 highlights the remainder of the token “HTTP/” which does not match the modified context (i.e., highlights the log text “HT”) to suggest, to the user, that adding “HT” to the before-context “TP/” would complete a pattern that matches that adjacent token in the first log message 204-1.

In one embodiment, log analytics module 132 highlights the remainder of the adjacent token using a second warning color, the second warning color being different than the first warning color. The log analytics module 132 displays the remainder of the token using the second warning color to indicate to the user how to complete the token as suggested by the highlighting (e.g., by typing in the remainder of the token in the before-context text field). In some embodiments, the first warning color may be different from the second
warning color. For example, the warning colors may be selected such that the first warning color has greater color saturation or different color intensity than the second warning color. Specifically, the partially matched context may be highlighted in red, and the remainder of the token may be highlighted in light pink.

Accordingly, embodiments of the present invention provide a technique for dynamically extracting fields from unstructured log data generated by many software and infrastructure components of a computer system 100. In contrast to conventional approaches, embodiments described herein advantageously reduce the need for users to learn complex, technical programming to specify fields found within log data. Embodiments of the invention provide live highlighting which changes highlighting of log text while the user is typing, thereby assisting the user in understanding if their field parameters are selecting the log data the user intends to select.

Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.

The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities which usually, though not necessarily, take the form of electrical or magnetic signals where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations. In addition, one or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the description provided herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD-ROM (Compact Disc-ROM), a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).

What is claimed is:

1. A method for displaying a graphical user interface for analyzing unstructured data, the method comprising:
    displaying a plurality of items of unstructured data, including a first item of unstructured data comprised of text;
    receiving an indication to extract a field based on a specified portion of text of the first item;
    generating, by operation of one or more processing units, a definition of the extracted field having (1) a pattern that matches the specified portion of the text, and (2) a context for the extracted field, wherein the context is determined based on the specified portion of the first item;
    annotating a first portion of the text of the first item which matches the pattern; and
    annotating a second portion of the text of the first item which matches the context.

2. The method of claim 1, wherein receiving the indication to extract the field based on the specified portion of text of the first item further comprises receiving a text selection, from a user via the graphical user interface, which indicates the specified portion of text.

3. The method of claim 1, wherein the pattern associated with the definition of the extracted field is a value type determined based on a match from a pre-determined list of regular expressions.

4. The method of claim 1, wherein the context associated with the definition of the extracted field comprises a before pattern that matches a token of text before an instance of the extracted field and an after pattern that matches a token of text after the instance of the extracted field.

5. The method of claim 1, wherein annotating of the first and second portions of the text of the first item comprises:
    highlighting the first portion of the text using a first color; and
    highlighting the second portion of the text using a second color, wherein the first color has different color intensity than the second color.

6. The method of claim 1, further comprising:
    annotating the plurality of items of unstructured data in the graphical user interface, such that for each of the plurality of items of unstructured data having an instance of the extracted field that satisfies the generated definition:
        annotating a first portion of the item to indicate a match with the pattern of the extracted field associated with the item; and
        annotating a second portion of the item, the second portion which matches with the context for the extracted field.

7. The method of claim 1, further comprising:
    receiving an indication that the context associated with the extracted field has been modified;
    annotating the second portion of the first item to indicate an incomplete match with the modified context.

8. The method of claim 7, wherein the annotating the second portion to indicate the incomplete match with the modified context further comprises:
    determining the modified context partially matches a token of text adjacent to an instance of the extracted field that matches the pattern;
    highlighting a portion of the token that matches the modified context with a first color; and
    highlighting a remainder of the token with a second color, wherein the first color has a different color intensity than the second color.

9. A non-transitory computer readable storage medium having stored thereon computer software executable by a processor, the computer software embodying a method for displaying a graphical user interface for analyzing unstructured data, the method comprising:
    displaying a plurality of items of unstructured data, including a first item of unstructured data comprised of text;
    receiving an indication to extract a field based on a specified portion of text of the first item;
    generating a definition of the extracted field having (1) a pattern that matches the specified portion of the text, and (2) a context for the extracted field, wherein the context is determined based on the specified portion of the first item;
    annotating a first portion of the text of the first item which matches the pattern; and
    annotating a second portion of the text of the first item which matches the context.

10. The non-transitory computer readable storage medium of claim 9, wherein receiving the indication to extract the field based on the specified portion of text further comprises:
    receiving a text selection, from a user via the graphical user interface, which indicates the specified portion of text.

11. The non-transitory computer readable storage medium of claim 9, wherein the pattern associated with the definition of the extracted field is a value type determined based on a match from a pre-determined list of regular expressions.

12. The non-transitory computer readable storage medium of claim 9, wherein the context associated with the definition of the extracted field comprises a before pattern that matches a token of text before an instance of the extracted field and an after pattern that matches a token of text after the instance of the extracted field.

13. The non-transitory computer readable storage medium of claim 9, wherein annotating display of the first and second portions of text of the first item comprises:
    annotating the first portion of the text using a first color; and
    annotating the second portion of the text using a second color, wherein the first color has different color intensity than the second color.

14. The non-transitory computer readable storage medium of claim 9, further comprising:
    annotating the plurality of items of unstructured data in the graphical user interface:
    for each of the plurality of items of unstructured data having an instance of the extracted field that satisfies the generated definition, annotating a first portion of the item to indicate a match with the pattern of the extracted field associated with the item and annotating a second portion of the item which matches the context for the extracted field.

15. The non-transitory computer readable storage medium of claim 9, further comprising:
    receiving an indication that the context associated with the extracted field has been modified; and
    annotating the second portion of the first item to indicate an incomplete match with the modified context.

16. The non-transitory computer readable storage medium of claim 15, wherein annotating the second portion to indicate the incomplete match with the modified context further comprises:
    determining the modified context partially matches a token of text adjacent to an instance of the extracted field that matches the pattern;
    annotating a portion of the token that matches the modified context with a first color; and
    annotating a remainder of the token with a second color, wherein the first color has a different color intensity than the second color.

17. A computer system for displaying a graphical user interface for analyzing unstructured data for a computing environment, the computer system comprising:
    a system memory;
    a storage device having a plurality of items of unstructured data including a first item of unstructured data comprised of text; and
    a processor programmed to carry out the steps of:
        displaying the plurality of items of unstructured data;
        receiving an indication to extract a field based on a specified portion of text of the first item;
        generating a definition of the extracted field having (1) a pattern that matches the specified portion of the text, and (2) a context for the extracted field, wherein the context is determined based on the specified portion of the first item;
        modifying display of a first portion of the text of the first item which matches the pattern; and
        modifying display of a second portion of the text of the first item which matches the context.

18. The computer system of claim 17, wherein the context associated with the definition of the extracted field comprises a before pattern that matches a token of text before an instance of the extracted field and an after pattern that matches a token of text after the instance of the extracted field.

19. The computer system of claim 17, wherein the processor programmed to carry out the step of annotating the first and second portions of text of the first item is further programmed to carry out the steps of:
    annotating the first portion of the text using a first color; and
    annotating the second portion of the text using a second color, wherein the first color has different color intensity than the second color.

20. The computer system of claim 17, wherein the processor is further programmed to carry out the steps of:
    annotating the plurality of items of unstructured data in the graphical user interface, such that for each of the
    plurality of items of unstructured data having an instance of the extracted field that satisfies the generated definition:
        annotating a first portion of the item to indicate a match with the pattern of the extracted field associated with the item; and
        annotating a second portion of the item, the second portion which matches with the context for the extracted field.

* * * * *
