Professional Documents
Culture Documents
When the initial corpus is developed, it is likely that a lot of data will be imported
using extract‐transform‐load (ETL) tools. These tools may have risk
management, security, and regulatory features to help the user guard against data
misuse or provide guidance when sources are known to contain sensitive data.
The availability of these tools doesn’t absolve the developers from responsibility
to ensure that the data and metadata is in compliance with applicable rules and
regulations. Protected data may be ingested (for example, personal identifiers) or
generated (for example, medical diagnoses) when the corpus is updated by the
cognitive computing system. Planning for good corpus management should
include a plan to monitor relevant policies that impact data in the corpus. The data
access layer tools described in the next section must be accompanied by or embed
compliance policies and procedures to ensure that imported and derived data and
metadata remain in compliance. That includes consideration of various
deployment modalities, such as cloud computing, which may distribute data
across geopolitical boundaries.