
CTCORP Link.ML

Process Steps:

1. Website Crawling
Websites are crawled using Sequentum. All information for an
entity is crawled and harvested, then stored in an Excel file.
This Excel file is an intermediate file used for further processing.
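
As an illustration of this hand-off, the following is a minimal
sketch of loading the intermediate Excel file, assuming pandas is
available; the file name is hypothetical and not part of the actual
crawl configuration.

    # Minimal sketch: load the Sequentum Excel output as the intermediate
    # dataset for the uploading and normalization step.
    import pandas as pd

    harvest = pd.read_excel("sequentum_harvest.xlsx")  # hypothetical file name
    print(harvest.head())  # inspect the raw, per-state field names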

2. Database Uploading and Normalization

The output of Sequentum (in Excel format) is uploaded to a
database. The field names for each entity are normalized across
the different states for easy implementation. For example, the
Entity Name field may be labeled differently from state to state;
in the normalized database it is assigned a single canonical name,
and the same applies to the other fields. During the normalization
process, the application detects which records are new, which are
updates, and which are marked for removal. This is made possible
by the detection functionality in Sequentum.

The uploading and normalization process runs on the back end.
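
To make the normalization and change detection concrete, here is a
minimal sketch assuming a hand-maintained mapping from state-specific
field labels to one canonical name; the mapping entries, record
layout, and classification logic are assumptions, not the actual
Sequentum detection mechanism.

    # Hypothetical mapping from state-specific field labels to one
    # canonical field name; a real mapping would cover every state.
    from typing import Optional

    FIELD_MAP = {
        "Entity Name": "entity_name",
        "Business Name": "entity_name",
        "Company Name": "entity_name",
        "File Number": "registration_id",
        "Entity Number": "registration_id",
    }

    def normalize(record: dict) -> dict:
        """Rename state-specific fields to their canonical names."""
        return {FIELD_MAP.get(key, key): value for key, value in record.items()}

    def classify(crawled: dict, existing: Optional[dict]) -> str:
        """Label a normalized record as new, an update, or unchanged.
        Removals are entities in the database absent from the crawl."""
        if existing is None:
            return "new"
        return "update" if crawled != existing else "unchanged"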

3. Entity Searching (Link.ML)

This is a web-based search engine conceptualized around project
needs. It is highly configurable and easy to change or enhance, as
it is developed in-house.
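
Since Link.ML is developed in-house, its internals are not
documented here; the following is only a minimal sketch of the kind
of lookup it might run against the normalized database. The table
and column names are assumptions.

    # Minimal sketch of an entity search against the normalized database.
    # The 'entities' table and its columns are assumed, not documented.
    import sqlite3

    def search_entities(conn: sqlite3.Connection, term: str):
        """Case-insensitive substring match on the canonical entity name."""
        cur = conn.execute(
            "SELECT entity_name, state, registration_id "
            "FROM entities WHERE entity_name LIKE ?",
            (f"%{term}%",),
        )
        return cur.fetchall()
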
4. On Demand Crawling

On Demand Crawling is used when an entity needs to be crawled
outside the prescribed crawling schedule because its updated
information must be harvested. Once an entity name is marked for
‘On Demand Crawling’, the user can extract the list of entity
names and feed it to Connotate using the existing Connotate
set-up and license. The user can do this on a defined schedule or
at any time of day for an urgent need.
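
A minimal sketch of that extraction step follows, assuming an
‘on_demand’ flag column in the normalized database and a simple CSV
hand-off; the actual Connotate input format is not specified here.

    # Minimal sketch: pull entities flagged for on-demand crawling and
    # write them to a CSV for hand-off to Connotate. The flag column
    # and CSV layout are assumptions.
    import csv
    import sqlite3

    def export_on_demand_list(conn: sqlite3.Connection, path: str) -> None:
        rows = conn.execute(
            "SELECT entity_name, state FROM entities WHERE on_demand = 1"
        ).fetchall()
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["entity_name", "state"])
            writer.writerows(rows)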
