5.25.1. Encrypt (High-Speed action)
Icon:
Function: Encrypt
Property window:
Short description:
Encrypt/Decrypt some fields.
Long Description:
Encryption and decryption are based on a symmetric key (i.e. one single key is used for both encryption and decryption). The key is saved inside a “Key File”. You need a “Key File” to encrypt or decrypt your data.
To select a “Key File” (using the “Browse” button) or to create a new “Key File” (using the “Create New Key File” button), you need to switch to “expert-user-mode”. To switch to “expert-user-mode”, click the button in the main toolbar of the application. Once you are in “expert-user-mode”, the “Browse” button and the “Create New Key” button are enabled.
When you click the “Create New Key” button, Anatella opens this Window:
By moving your mouse randomly inside this window, you generate random numbers that are used to create a 100% random key. This encryption key is saved either inside a “Key File” or inside a string (the latter option allows you to use a “Global Parameter” to specify the key when decrypting the data).
NOTE:
Never lose your “Key File”. If you lose your key file, you’ll never be able to decrypt your data later.
NOTE:
Never send your “Key File” to third parties.
NOTE:
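The encryption algorithm used is DES (for the short keys) and 3DES (for the long keys). Both are well-studied encryption algorithms. Note that single DES has an effective key length of only 56 bits, which can be brute-forced with modern hardware, so the long (3DES) keys are the safer choice.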
The encryption algorithm used inside Anatella is symmetric and, like any encryption, reversible: each encrypted value decrypts to exactly one original value. This guarantees that there will never be any “collisions”. For example, assume that you are encrypting many MSISDNs (i.e. many phone numbers): because there are no collisions, the number of distinct MSISDNs before and after encryption is the same. Two different unencrypted MSISDNs are never “mapped” to the same encrypted MSISDN.
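The no-collision property can be checked exhaustively on a toy cipher. The sketch below is not DES and not Anatella's implementation: it is a deliberately tiny Feistel network (the same general construction that DES is built on) over 16-bit blocks, with SHA-256 as an assumed round function, and all names (`encrypt_block`, `decrypt_block`, `check_no_collisions`) are hypothetical. It only illustrates why a keyed, reversible transformation cannot map two different inputs to the same output.

```python
import hashlib

ROUNDS = 4
HALF_BITS = 8                      # 16-bit blocks, small enough to enumerate fully
HALF_MASK = (1 << HALF_BITS) - 1


def round_value(key: bytes, round_no: int, half: int) -> int:
    """Key-dependent round function; it does not have to be invertible itself."""
    return hashlib.sha256(key + bytes([round_no, half])).digest()[0]


def encrypt_block(key: bytes, block: int) -> int:
    left, right = block >> HALF_BITS, block & HALF_MASK
    for r in range(ROUNDS):
        # One Feistel round: (L, R) -> (R, L xor F(R))
        left, right = right, left ^ round_value(key, r, right)
    return (left << HALF_BITS) | right


def decrypt_block(key: bytes, block: int) -> int:
    left, right = block >> HALF_BITS, block & HALF_MASK
    for r in reversed(range(ROUNDS)):
        # Undo one round: (L', R') -> (R' xor F(L'), L')
        left, right = right ^ round_value(key, r, left), left
    return (left << HALF_BITS) | right


def check_no_collisions(key: bytes) -> bool:
    """Encrypt every possible block; verify round-trips and distinct outputs."""
    size = 1 << (2 * HALF_BITS)
    seen = set()
    for plain in range(size):
        cipher = encrypt_block(key, plain)
        if decrypt_block(key, cipher) != plain:
            return False
        seen.add(cipher)
    return len(seen) == size


if __name__ == "__main__":
    print(check_no_collisions(b"demo key"))
```

Because every ciphertext decrypts back to its plaintext with the same key, two different plaintexts can never share a ciphertext. The same argument applies to DES and 3DES, only over a much larger block size.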
Since there are no collisions, you can safely use the Encrypt Action to anonymize your datasets. However, when anonymizing datasets containing MSISDN numbers, you’ll lose, after encryption, some valuable information about the MSISDN. The lost information is:
•Is it a “short” phone number? (e.g. like the voice-mail number)
•Is it an international call?
These pieces of information are *very* important when analyzing communication graphs using SNA (Social Network Analysis) algorithms. To preserve them, you can use:
•The “Extract Original Prefixes” option to keep the first few digits of the MSISDN unencrypted (this makes it possible to detect international calls).
•The “Extract Original Lengths” option to save the length of the unencrypted MSISDN (this makes it possible to detect “short” phone numbers such as the voice-mail number).
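The effect of these two options can be pictured with a short sketch. Everything in it is an assumption made for illustration: the function and field names, the 2-digit prefix, the “00” international prefix, the 4-digit threshold for short numbers, and above all `placeholder_encrypt`, which merely reverses the string and stands in for the real encryption.

```python
from typing import Callable, NamedTuple


class AnonymizedMsisdn(NamedTuple):
    encrypted: str          # output of the encryption step
    original_prefix: str    # first digits kept in clear
    original_length: int    # length of the unencrypted number


def placeholder_encrypt(msisdn: str) -> str:
    """NOT encryption: a reversible stand-in so the sketch is runnable."""
    return msisdn[::-1]


def anonymize_msisdn(
    msisdn: str,
    encrypt: Callable[[str], str],
    prefix_digits: int = 2,
) -> AnonymizedMsisdn:
    """Encrypt the number but keep its prefix and length alongside it."""
    return AnonymizedMsisdn(
        encrypted=encrypt(msisdn),
        original_prefix=msisdn[:prefix_digits],
        original_length=len(msisdn),
    )


def is_international(record: AnonymizedMsisdn, international_prefix: str = "00") -> bool:
    # Assumed dialing convention; adapt the prefix to your own data.
    return record.original_prefix.startswith(international_prefix)


def is_short_number(record: AnonymizedMsisdn, max_short_length: int = 4) -> bool:
    # Assumed threshold for "short" numbers such as voice-mail.
    return record.original_length <= max_short_length


if __name__ == "__main__":
    for number in ("0032475123456", "1969"):
        record = anonymize_msisdn(number, placeholder_encrypt)
        print(record, is_international(record), is_short_number(record))
```

The downstream SNA analysis then works on the encrypted value plus the two extra columns, without ever seeing the original number.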
Anonymizing a dataset using a non-reversible encoding (such as a one-way hash like MD5) can lead to “collisions”, especially when the hash is truncated to a short code. Such encodings are thus bad and dangerous alternatives when anonymizing datasets.
Let’s take an example. Assume that you are anonymizing 2 million different MSISDNs using a 5-character MD5 code. A 5-character hexadecimal code can have at most about 1 million different values (16^5 = 1,048,576). This means that you will have a *catastrophic* number of collisions that will make your anonymized dataset completely useless. Even with a 6-character MD5 code (16^6, about 16.8 million values), there is still a better than 99% chance of collisions on the same population, typically enough of them to make the anonymized dataset useless as well.
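The arithmetic behind this example can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, the code is assumed to be the first hexadecimal characters of the MD5 digest, and the estimate of distinct codes assumes that MD5 output behaves like uniformly random values.

```python
import hashlib
import math


def md5_code(msisdn: str, length: int) -> str:
    """Truncated MD5 code: the first `length` hex characters of the digest."""
    return hashlib.md5(msisdn.encode("ascii")).hexdigest()[:length]


def code_space(length: int) -> int:
    """Number of different codes of `length` hexadecimal characters."""
    return 16 ** length


def min_forced_collisions(n_values: int, length: int) -> int:
    """Pigeonhole bound: values that must share a code with another value."""
    return max(0, n_values - code_space(length))


def expected_distinct_codes(n_values: int, length: int) -> float:
    """Expected distinct codes when n_values are hashed uniformly at random.

    Computes space * (1 - (1 - 1/space) ** n_values) in a numerically
    stable way.
    """
    space = code_space(length)
    return space * -math.expm1(n_values * math.log1p(-1.0 / space))


if __name__ == "__main__":
    population = 2_000_000
    for length in (5, 6):
        distinct = expected_distinct_codes(population, length)
        print(
            f"{length} characters: {code_space(length)} possible codes, "
            f"at least {min_forced_collisions(population, length)} forced collisions, "
            f"about {population - distinct:.0f} MSISDNs expected to lose their own code"
        )
```

With 5 characters, the pigeonhole bound alone forces at least 2,000,000 - 1,048,576 = 951,424 MSISDNs to share a code with another one. With 6 characters no collision is forced, but the expected number of MSISDNs that lose their own distinct code is still far from negligible.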