A Comprehensive Analysis of Thales’ Machine Learning-Powered Data Discovery and Classification Solution
In the rapidly expanding datasphere, where an estimated 80% of data is unstructured, manual data classification is becoming impractical because it is labor-intensive and error-prone. Thales addresses this challenge with CipherTrust Data Discovery and Classification (DDC), a solution enhanced by Machine Learning (ML) models. These models automate data classification and significantly improve accuracy and efficiency in complex hybrid IT environments.
The first step in data classification is data discovery, which is crucial for compliance with global data protection regulations. CipherTrust DDC scans diverse data repositories and classifies data by sensitivity and risk, whether it is stored on-premises, on third-party servers, or in the cloud.
Thales’ innovative approach combines pattern matching with ML to establish meaningful relationships between disparate data points. This hybrid technique not only locates data across IT systems but also contextualizes it, which improves classification accuracy. The ML component uses different models for different tasks, such as categorization and Named Entity Recognition (NER).
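To make the hybrid idea concrete, here is a minimal Python sketch of pattern matching followed by a context score. It is not how CipherTrust DDC is implemented: the regular expression, the keyword weights, the bias, and the function names are all hypothetical, and the hand-set logistic score only stands in for what a trained model would learn from data.

```python
import math
import re

# Stage 1: a pattern that finds candidates shaped like US Social Security numbers.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Stage 2: hypothetical hand-set weights standing in for learned model parameters.
CONTEXT_WEIGHTS = {
    "ssn": 2.5, "social": 1.5, "security": 1.0, "employee": 0.5,
    "order": -2.0, "invoice": -2.0, "tracking": -2.0,
}
BIAS = -0.5


def context_score(window):
    """Map the words around a candidate to a 0..1 score with a logistic function."""
    tokens = set(re.findall(r"[a-z]+", window.lower()))
    z = BIAS + sum(w for word, w in CONTEXT_WEIGHTS.items() if word in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def classify(text, radius=40, threshold=0.5):
    """Return (candidate, score, is_sensitive) for each pattern hit in the text."""
    results = []
    for m in SSN_RE.finditer(text):
        # Look at the characters on both sides of the match for context.
        window = text[max(0, m.start() - radius): m.end() + radius]
        score = context_score(window)
        results.append((m.group(), score, score >= threshold))
    return results
```

With these assumed weights, `classify("Employee SSN: 123-45-6789")` flags its match as sensitive, while the same digit shape in `"Order tracking ref 987-65-4321"` is not flagged, because the surrounding words point away from personal data.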
CipherTrust DDC’s pattern matching, powered by Ground Labs’ GLASS™ engine, covers a wide range of data types and supports compliance with numerous data privacy laws. The covered types include personal, financial, and health data, as well as information that may expose systems to compromise, such as hardcoded private keys.
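As a rough illustration of pattern-based detection, the Python sketch below finds payment card numbers (validated with the Luhn checksum) and PEM private key headers. It does not reproduce the GLASS™ engine; the two patterns, the labels, and the `scan` function are assumptions made for this example, and a production engine would cover far more data types.

```python
import re

# Hypothetical patterns for illustration only.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
PRIVATE_KEY_RE = re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text):
    """Return a list of (label, matched_value) findings."""
    findings = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        # The checksum filters out digit runs that only look like card numbers.
        if luhn_valid(digits):
            findings.append(("payment_card", digits))
    for m in PRIVATE_KEY_RE.finditer(text):
        findings.append(("private_key", m.group()))
    return findings
```

Pairing each pattern with a validation step such as a checksum is a common way to cut false positives before any ML stage sees the candidates.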
NER, a key feature of CipherTrust DDC, utilizes Natural Language Processing (NLP) to extract entities like names and dates from unstructured text, eliminating the need for manual analysis. This process is highly scalable and adaptable to various document types and languages.
Lastly, CipherTrust DDC employs ML for category classification, determining the nature of documents, such as financial or legal, based on their content. This ability to categorize documents accurately is pivotal in identifying and protecting Personally Identifiable Information (PII).
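The article does not say which model CipherTrust DDC uses for category classification. As one simple example of the idea, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on a handful of made-up documents. The class name, the training texts, and the two labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs):
        for text, label in docs:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                logp += math.log((count + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up toy training data, for illustration only.
TRAINING_DOCS = [
    ("invoice payment due balance account", "financial"),
    ("quarterly revenue profit balance sheet", "financial"),
    ("contract agreement party liability clause", "legal"),
    ("court plaintiff defendant agreement", "legal"),
]
model = NaiveBayes().fit(TRAINING_DOCS)
```

Calling `model.predict("the balance on this invoice is due")` returns `"financial"`, and `model.predict("liability clause in the agreement")` returns `"legal"`. A predicted category can then guide which detection rules or protection policies apply to a document.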
Key Takeaways for Cybersecurity Professionals:
- Embrace ML in data classification to handle the growing volume and complexity of unstructured data.
- Utilize hybrid approaches combining pattern matching and ML for comprehensive data discovery and classification.
- Leverage advanced ML models like NER for scalable and efficient data analysis.
- Ensure compliance with data protection laws by using tools that cover a wide range of data types and privacy regulations.
Link to the article: Data Classification with Machine Learning in CipherTrust DDC