Command line example
This brief example shows how to use anonlink, the command line tool that is packaged with the anonlink-client library. Using anonlink-client is not a requirement for working with the Entity Service REST API.
We assume you have access to a command line prompt with Python and pip installed.
Install anonlink-client:
$ pip install anonlink-client
Generate some mock personally identifiable data and split it into two overlapping datasets, one for Alice and one for Bob:
$ anonlink generate 2000 raw_pii_2k.csv
$ head -n 1 raw_pii_2k.csv > alice.txt
$ tail -n 1500 raw_pii_2k.csv >> alice.txt
$ head -n 1000 raw_pii_2k.csv > bob.txt
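The effect of the split can be checked with a small simulation. This is only a sketch: it assumes the generated file has one header line followed by 2000 records numbered consecutively from 0, which matches the record ids and CLK counts shown later in this example. The function name split_rows is hypothetical.

```python
def split_rows(n_records=2000):
    """Mimic the head/tail commands on a file with one header line."""
    lines = ["header"] + list(range(n_records))  # line 1 is the header
    alice = lines[:1] + lines[-1500:]  # head -n 1, then tail -n 1500 appended
    bob = lines[:1000]                 # head -n 1000 (header + 999 records)
    return alice, bob

alice, bob = split_rows()
alice_ids = alice[1:]  # drop the header
bob_ids = bob[1:]
shared = sorted(set(alice_ids) & set(bob_ids))

print(len(alice_ids), len(bob_ids), len(shared))  # 1500 999 499
print(shared[0], shared[-1])                      # 500 998
```

Under these assumptions Alice holds 1500 records, Bob holds 999, and 499 records appear in both files; those are the ones the linkage should find.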
A corresponding hashing schema can be generated as well:
$ anonlink generate-default-schema schema.json
Process the personally identifying data into Cryptographic Longterm Keys (CLKs):
$ anonlink hash alice.txt horse_staple schema.json alice-hashed.json
generating CLKs: 100%|████████████████████████████████████████████| 1.50K/1.50K [00:00<00:00, 6.69Kclk/s, mean=522, std=34.4]
CLK data written to alice-hashed.json
$ anonlink hash bob.txt horse_staple schema.json bob-hashed.json
generating CLKs: 100%|████████████████████████████████████████████| 999/999 [00:00<00:00, 5.14Kclk/s, mean=520, std=34.2]
CLK data written to bob-hashed.json
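To give a feel for why both parties pass the same secret (horse_staple above), here is a toy keyed Bloom filter. It is an illustration only and is not the encoding used by anonlink-client, which is driven by the schema and is more involved. The names bigrams and toy_clk, the filter size, and the number of hashes per token are all assumptions made for this sketch.

```python
import hashlib
import hmac

def bigrams(value):
    """Split a padded string into overlapping two-character tokens."""
    padded = f" {value} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def toy_clk(fields, secret, size=1024, hashes_per_token=2):
    """Return the set of bit positions a record would switch on."""
    bits = set()
    for field in fields:
        for token in bigrams(field):
            for k in range(hashes_per_token):
                # Keyed hash: the positions depend on the shared secret.
                digest = hmac.new(secret.encode(), f"{k}:{token}".encode(),
                                  hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits

record = ["Sandra Boone", "1974/10/30", "F"]
same_secret_a = toy_clk(record, "horse_staple")
same_secret_b = toy_clk(record, "horse_staple")
other_secret = toy_clk(record, "a_different_secret")

print(same_secret_a == same_secret_b)  # True
```

In this toy version, the same record hashed with the same secret always yields the same bit pattern, while a different secret yields an unrelated pattern, which is why Alice and Bob must agree on the secret before hashing.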
Now interact with an Entity Service. First check that the service is healthy and responds to a status check:
$ anonlink status --server https://anonlink.easd.data61.xyz
{"rate": 53129, "status": "ok", "project_count": 1410}
Then create a new linkage project and set the output type to groups:
$ anonlink create-project \
--server https://anonlink.easd.data61.xyz \
--type groups \
--schema schema.json \
--output credentials.json
The Entity Service replies with a project id and credentials, which are saved into the file credentials.json.
The file contains the project id, two update tokens (used when uploading data) and a result token:
{
"update_tokens": [
"21d4c9249e1c70ac30f9ce03893983c493d7e90574980e55",
"3ad6ae9028c09fcbc7fbca36d19743294bfaf215f1464905"
],
"project_id": "809b12c7e141837c3a15be758b016d5a7826d90574f36e74",
"result_token": "230a303b05dfd186be87fa65bf7b0970fb786497834910d1"
}
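If you would rather not copy the tokens by hand, they can be read out of the file with a few lines of Python. The sketch below parses the sample content shown above from a string so that it is self-contained; in practice you would load credentials.json from disk with json.load. Which update token goes to Alice and which to Bob is an assumption that simply mirrors the upload commands in this example.

```python
import json

# Sample content of credentials.json, copied from the output above.
credentials_text = """
{
  "update_tokens": [
    "21d4c9249e1c70ac30f9ce03893983c493d7e90574980e55",
    "3ad6ae9028c09fcbc7fbca36d19743294bfaf215f1464905"
  ],
  "project_id": "809b12c7e141837c3a15be758b016d5a7826d90574f36e74",
  "result_token": "230a303b05dfd186be87fa65bf7b0970fb786497834910d1"
}
"""

credentials = json.loads(credentials_text)
alice_token, bob_token = credentials["update_tokens"]  # one per upload
project_id = credentials["project_id"]
result_token = credentials["result_token"]

print("project:", project_id)
print("alice --apikey:", alice_token)
print("bob --apikey:", bob_token)
print("results --apikey:", result_token)
```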
These credentials are substituted into the following commands. Each CLK dataset is uploaded to the Entity Service with its own update token:
$ anonlink upload --server https://anonlink.easd.data61.xyz \
--apikey 21d4c9249e1c70ac30f9ce03893983c493d7e90574980e55 \
--project 809b12c7e141837c3a15be758b016d5a7826d90574f36e74 \
alice-hashed.json
{"receipt_token": "05ac237462d86bc3e2232ae3db71d9ae1b9e99afe840ee5a", "message": "Updated"}
$ anonlink upload --server https://anonlink.easd.data61.xyz \
--apikey 3ad6ae9028c09fcbc7fbca36d19743294bfaf215f1464905 \
--project 809b12c7e141837c3a15be758b016d5a7826d90574f36e74 \
bob-hashed.json
{"receipt_token": "6d9a0ee7fc3a66e16805738097761d38c62ea01a8c6adf39", "message": "Updated"}
Now we can compute linkages using various thresholds. For example, to only see relationships where the similarity is above 0.9:
$ anonlink create --server https://anonlink.easd.data61.xyz \
--apikey 230a303b05dfd186be87fa65bf7b0970fb786497834910d1 \
--project 809b12c7e141837c3a15be758b016d5a7826d90574f36e74 \
--name "Tutorial mapping run" \
--threshold 0.9
{"run_id": "31a6d3c775151a877dcac625b4b91a6659317046ea45ad11", "notes": "Run created by anonlink-client 0.1.2", "name": "Tutorial mapping run", "threshold": 0.9}
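To make the threshold more concrete, the sketch below computes a similarity score between two sets of bit positions. It assumes the score is the Sørensen-Dice coefficient of the two CLKs; treat that as an assumption of this illustration and check the service documentation for the exact definition. The function name dice_similarity and the tiny example sets are hypothetical.

```python
def dice_similarity(bits_a, bits_b):
    """Sørensen-Dice coefficient of two sets of 'on' bit positions."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen for this sketch
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))

threshold = 0.9
identical = dice_similarity({1, 2, 3, 4}, {1, 2, 3, 4})
partial = dice_similarity({1, 2, 3, 4}, {2, 3, 4, 5})

print(identical, identical >= threshold)  # 1.0 True
print(partial, partial >= threshold)      # 0.75 False
```

With a threshold of 0.9, only the first pair would be reported as a candidate link; lowering the threshold admits less similar pairs.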
After a small delay the linkage result will have been computed, and we can use anonlink to retrieve it:
$ anonlink results --server https://anonlink.easd.data61.xyz \
--apikey 230a303b05dfd186be87fa65bf7b0970fb786497834910d1 \
--project 809b12c7e141837c3a15be758b016d5a7826d90574f36e74 \
--run 31a6d3c775151a877dcac625b4b91a6659317046ea45ad11
State: completed
Stage (3/3): compute output
Downloading result
Received result
{
"groups": [
[
[0, 403],
[1, 903]
],
[
[0, 402],
[1, 902]
],
[
[0, 401],
[1, 901]
],
...
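This output shows the linked pairs between Alice and Bob that have a similarity above 0.9. Each entry is a [dataset index, row index] pair: here dataset 0 is Alice's upload and dataset 1 is Bob's, and row indices count from 0, excluding the header line.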
Looking at the corresponding entities in Alice’s data:
$ head -n 405 alice.txt | tail -n 3
901,Sandra Boone,1974/10/30,F
902,Lucas Hernandez,1937/06/11,M
903,Ellis Stevens,2008/06/02,M
And the corresponding entities in Bob’s data:
$ head -n 905 bob.txt | tail -n 3
901,Sandra Boone,1974/10/30,F
902,Lucas Hernandez,1937/06/11,M
903,Ellis Stevens,2008/06/02,M
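The line numbers passed to head above follow from the row indices in the result: add one for the header line and one because line numbers start at 1. A minimal sketch of that mapping, assuming dataset 0 corresponds to alice.txt and dataset 1 to bob.txt as in this example (describe_groups is a hypothetical helper, not part of anonlink-client):

```python
def describe_groups(groups, dataset_names=("alice.txt", "bob.txt")):
    """Turn [dataset index, row index] pairs into (file name, line number)."""
    described = []
    for group in groups:
        members = []
        for dataset_index, row_index in group:
            # +1 for the CSV header line, +1 because line numbers start at 1
            members.append((dataset_names[dataset_index], row_index + 2))
        described.append(members)
    return described

# The first three groups from the result shown above.
groups = [
    [[0, 403], [1, 903]],
    [[0, 402], [1, 902]],
    [[0, 401], [1, 901]],
]

for members in describe_groups(groups):
    print(members)
# [('alice.txt', 405), ('bob.txt', 905)]
# [('alice.txt', 404), ('bob.txt', 904)]
# [('alice.txt', 403), ('bob.txt', 903)]
```

Lines 403 to 405 of alice.txt and lines 903 to 905 of bob.txt are exactly the ranges selected by the head and tail commands above.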