{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"pycharm": {}
},
"outputs": [],
"source": [
"import csv\n",
"import json\n",
"import os\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"KEY1 = 'correct'\n",
"KEY2 = 'horse'\n",
"\n",
"SERVER = os.getenv(\"SERVER\", \"https://testing.es.data61.xyz\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"# Scenario\n",
"\n",
"There are three parties named Alice, Bob, and Charlie, each holding a dataset of about 3200 records. They know that they have some entities in common, but with incomplete overlap. The common features describing those entities are given name, surname, date of birth, and phone number.\n",
"\n",
"They all have some additional information about those entities in their respective datasets, Alice has a person's gender, Bob has their city, and Charlie has their income. They wish to create a table for analysis: each row has a gender, city, and income, but they don't want to share any additional information. They can use Anonlink to do this in a privacy-preserving way (without revealing given names, surnames, dates of birth, and phone numbers)."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Alice, Bob, and Charlie: agree on secret keys and a linkage schema\n",
"\n",
"They keep the keys to themselves, but the schema may be revealed to the analyst."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"keys: correct, horse\n"
]
}
],
"source": [
"print(f'keys: {KEY1}, {KEY2}')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"{\n",
" \"version\": 2,\n",
" \"clkConfig\": {\n",
" \"l\": 1024,\n",
" \"kdf\": {\n",
" \"type\": \"HKDF\",\n",
" \"hash\": \"SHA256\",\n",
" \"salt\": \"SCbL2zHNnmsckfzchsNkZY9XoHk96P/G5nUBrM7ybymlEFsMV6PAeDZCNp3rfNUPCtLDMOGQHG4pCQpfhiHCyA==\",\n",
" \"info\": \"c2NoZW1hX2V4YW1wbGU=\",\n",
" \"keySize\": 64\n",
" }\n",
" },\n",
" \"features\": [\n",
" {\n",
" \"identifier\": \"id\",\n",
" \"ignored\": true\n",
" },\n",
" {\n",
" \"identifier\": \"givenname\",\n",
" \"format\": {\n",
" \"type\": \"string\",\n",
" \"encoding\": \"utf-8\"\n",
" },\n",
" \"hashing\": {\n",
" \"ngram\": 2,\n",
" \"positional\": false,\n",
" \"strategy\": {\"k\": 15}\n",
" }\n",
" },\n",
" {\n",
" \"identifier\": \"surname\",\n",
" \"format\": {\n",
" \"type\": \"string\",\n",
" \"encoding\": \"utf-8\"\n",
" },\n",
" \"hashing\": {\n",
" \"ngram\": 2,\n",
" \"positional\": false,\n",
" \"strategy\": {\"k\": 15}\n",
" }\n",
" },\n",
" {\n",
" \"identifier\": \"dob\",\n",
" \"format\": {\n",
" \"type\": \"string\",\n",
" \"encoding\": \"utf-8\"\n",
" },\n",
" \"hashing\": {\n",
" \"ngram\": 2,\n",
" \"positional\": true,\n",
" \"strategy\": {\"k\": 15}\n",
" }\n",
" },\n",
" {\n",
" \"identifier\": \"phone number\",\n",
" \"format\": {\n",
" \"type\": \"string\",\n",
" \"encoding\": \"utf-8\"\n",
" },\n",
" \"hashing\": {\n",
" \"ngram\": 1,\n",
" \"positional\": true,\n",
" \"strategy\": {\"k\": 8}\n",
" }\n",
" },\n",
" {\n",
" \"identifier\": \"ignoredForLinkage\",\n",
" \"ignored\": true\n",
" }\n",
" ]\n",
"}\n",
"\n"
]
}
],
"source": [
"with open('data/schema_ABC.json') as f:\n",
" print(f.read())"
]
},
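{
"cell_type": "markdown",
"metadata": {},
"source": [
"The schema sets an encoding length `l` of 1024 bits, an `ngram` size of 2, and a `k` of 15 for the name fields. As a rough illustration of what such an encoding does, the toy sketch below splits each value into bigrams and sets `k` keyed-hash positions per bigram in a Bloom filter. It is not clkhash's implementation: the helper names `bigrams` and `encode`, the use of HMAC-SHA256, and the toy key are assumptions made for this example only.\n",
"\n",
"```python\n",
"import hashlib\n",
"import hmac\n",
"\n",
"\n",
"def bigrams(value):\n",
"    # Pad with spaces so the first and last characters each appear in two bigrams.\n",
"    padded = f' {value} '\n",
"    return [padded[i:i + 2] for i in range(len(padded) - 1)]\n",
"\n",
"\n",
"def encode(fields, key, length=1024, k=15):\n",
"    # Hypothetical helper: returns the set of bit positions that are switched on.\n",
"    bits = set()\n",
"    for value in fields:\n",
"        for token in bigrams(value):\n",
"            for seed in range(k):\n",
"                msg = f'{seed}:{token}'.encode('utf-8')\n",
"                digest = hmac.new(key, msg, hashlib.sha256).digest()\n",
"                bits.add(int.from_bytes(digest[:8], 'big') % length)\n",
"    return bits\n",
"\n",
"\n",
"key = b'correct horse'  # toy key, for illustration only\n",
"a = encode(['erin', 'campbell'], key)\n",
"b = encode(['erih', 'campbell'], key)  # same person with a typo\n",
"c = encode(['joshua', 'arkwright'], key)  # unrelated person\n",
"print(len(a & b), len(a & c))\n",
"```\n",
"\n",
"Records that differ by a typo share most of their bigrams, so their encodings overlap far more than those of unrelated records. That overlap is what makes approximate matching on the hashes possible."
]
},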
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"# Sneak peek at input data\n",
"### Alice"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" givenname | \n",
" surname | \n",
" dob | \n",
" phone number | \n",
" gender | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0 | \n",
" tara | \n",
" hilton | \n",
" 27-08-1941 | \n",
" 08 2210 0298 | \n",
" male | \n",
"
\n",
" \n",
" | 1 | \n",
" 3 | \n",
" saJi | \n",
" vernre | \n",
" 22-12-2972 | \n",
" 02 1090 1906 | \n",
" mals | \n",
"
\n",
" \n",
" | 2 | \n",
" 7 | \n",
" sliver | \n",
" paciorek | \n",
" NaN | \n",
" NaN | \n",
" mals | \n",
"
\n",
" \n",
" | 3 | \n",
" 9 | \n",
" ruby | \n",
" george | \n",
" 09-05-1939 | \n",
" 07 4698 6255 | \n",
" male | \n",
"
\n",
" \n",
" | 4 | \n",
" 10 | \n",
" eyrinm | \n",
" campbell | \n",
" 29-1q-1983 | \n",
" 08 299y 1535 | \n",
" male | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id givenname surname dob phone number gender\n",
"0 0 tara hilton 27-08-1941 08 2210 0298 male\n",
"1 3 saJi vernre 22-12-2972 02 1090 1906 mals\n",
"2 7 sliver paciorek NaN NaN mals\n",
"3 9 ruby george 09-05-1939 07 4698 6255 male\n",
"4 10 eyrinm campbell 29-1q-1983 08 299y 1535 male"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv('data/dataset-alice.csv').head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"### Bob"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" givenname | \n",
" surname | \n",
" dob | \n",
" phone number | \n",
" city | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 3 | \n",
" zali | \n",
" verner | \n",
" 22-12-1972 | \n",
" 02 1090 1906 | \n",
" perth | \n",
"
\n",
" \n",
" | 1 | \n",
" 4 | \n",
" samuel | \n",
" tremellen | \n",
" 21-12-1923 | \n",
" 03 3605 9336 | \n",
" melbourne | \n",
"
\n",
" \n",
" | 2 | \n",
" 5 | \n",
" amy | \n",
" lodge | \n",
" 16-01-1958 | \n",
" 07 8286 9372 | \n",
" canberra | \n",
"
\n",
" \n",
" | 3 | \n",
" 7 | \n",
" oIji | \n",
" pacioerk | \n",
" 10-02-1959 | \n",
" 04 4220 5949 | \n",
" sydney | \n",
"
\n",
" \n",
" | 4 | \n",
" 10 | \n",
" erin | \n",
" kampgell | \n",
" 29-12-1983 | \n",
" 08 2996 1445 | \n",
" perth | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id givenname surname dob phone number city\n",
"0 3 zali verner 22-12-1972 02 1090 1906 perth\n",
"1 4 samuel tremellen 21-12-1923 03 3605 9336 melbourne\n",
"2 5 amy lodge 16-01-1958 07 8286 9372 canberra\n",
"3 7 oIji pacioerk 10-02-1959 04 4220 5949 sydney\n",
"4 10 erin kampgell 29-12-1983 08 2996 1445 perth"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv('data/dataset-bob.csv').head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Charlie"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" givenname | \n",
" surname | \n",
" dob | \n",
" phone number | \n",
" income | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1 | \n",
" joshua | \n",
" arkwright | \n",
" 16-02-1903 | \n",
" 04 8511 9580 | \n",
" 70189.446 | \n",
"
\n",
" \n",
" | 1 | \n",
" 3 | \n",
" zal: | \n",
" verner | \n",
" 22-12-1972 | \n",
" 02 1090 1906 | \n",
" 50194.118 | \n",
"
\n",
" \n",
" | 2 | \n",
" 7 | \n",
" oliyer | \n",
" paciorwk | \n",
" 10-02-1959 | \n",
" 04 4210 5949 | \n",
" 31750.993 | \n",
"
\n",
" \n",
" | 3 | \n",
" 8 | \n",
" nacoya | \n",
" ranson | \n",
" 17-08-1925 | \n",
" 07 6033 4580 | \n",
" 102446.131 | \n",
"
\n",
" \n",
" | 4 | \n",
" 10 | \n",
" erih | \n",
" campbell | \n",
" 29-12-1i83 | \n",
" 08 299t 1435 | \n",
" 331476.599 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id givenname surname dob phone number income\n",
"0 1 joshua arkwright 16-02-1903 04 8511 9580 70189.446\n",
"1 3 zal: verner 22-12-1972 02 1090 1906 50194.118\n",
"2 7 oliyer paciorwk 10-02-1959 04 4210 5949 31750.993\n",
"3 8 nacoya ranson 17-08-1925 07 6033 4580 102446.131\n",
"4 10 erih campbell 29-12-1i83 08 299t 1435 331476.599"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv('data/dataset-charlie.csv').head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Analyst: create the project\n",
"\n",
"The analyst keeps the result token to themselves. The three update tokens go to Alice, Bob and Charlie. The project ID is known by everyone."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Project created\n"
]
}
],
"source": [
"!clkutil create-project --server $SERVER --type groups --schema data/schema_ABC.json --parties 3 --output credentials.json\n",
"\n",
"with open('credentials.json') as f:\n",
" credentials = json.load(f)\n",
" project_id = credentials['project_id']\n",
" result_token = credentials['result_token']\n",
" update_token_alice = credentials['update_tokens'][0]\n",
" update_token_bob = credentials['update_tokens'][1]\n",
" update_token_charlie = credentials['update_tokens'][2]"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Alice: hash the data and upload it to the server\n",
"The data is hashed according to the schema and the keys. Alice's update token is needed to upload the hashed data. No PII is uploaded to the service—only the hashes."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"generating CLKs: 0%| | 0.00/3.23k [00:00, ?clk/s, mean=0, std=0]\n",
"generating CLKs: 6%|6 | 200/3.23k [00:02<00:31, 96.1clk/s, mean=372, std=32.6]\n",
"generating CLKs: 25%|##4 | 800/3.23k [00:02<00:17, 136clk/s, mean=371, std=35.5] \n",
"generating CLKs: 63%|######2 | 2.03k/3.23k [00:02<00:06, 193clk/s, mean=372, std=34.7]\n",
"generating CLKs: 100%|##########| 3.23k/3.23k [00:02<00:00, 1.29kclk/s, mean=372, std=34.9]\n",
"CLK data written to dataset-alice-hashed.json\n"
]
}
],
"source": [
"!clkutil hash data/dataset-alice.csv $KEY1 $KEY2 data/schema_ABC.json dataset-alice-hashed.json --check-header false"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"message\": \"Updated\", \"receipt_token\": \"c54597f32fd969603efba706af1556abee3cc35f2718bcb6\"}\n"
]
}
],
"source": [
"!clkutil upload --server $SERVER --apikey $update_token_alice --project $project_id dataset-alice-hashed.json"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Bob: hash the data and upload it to the server"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"generating CLKs: 0%| | 0.00/3.24k [00:00, ?clk/s, mean=0, std=0]\n",
"generating CLKs: 6%|6 | 200/3.24k [00:01<00:25, 119clk/s, mean=369, std=32.4]\n",
"generating CLKs: 31%|### | 1.00k/3.24k [00:01<00:13, 168clk/s, mean=371, std=35]\n",
"generating CLKs: 56%|#####5 | 1.80k/3.24k [00:01<00:06, 238clk/s, mean=371, std=35.5]\n",
"generating CLKs: 100%|##########| 3.24k/3.24k [00:02<00:00, 1.45kclk/s, mean=372, std=35.3]\n",
"CLK data written to dataset-bob-hashed.json\n"
]
}
],
"source": [
"!clkutil hash data/dataset-bob.csv $KEY1 $KEY2 data/schema_ABC.json dataset-bob-hashed.json --check-header false"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"message\": \"Updated\", \"receipt_token\": \"6ee2fe5df850b795ee6ddff1aaf4dfb03f6d4398dedcc248\"}\n"
]
}
],
"source": [
"!clkutil upload --server $SERVER --apikey $update_token_bob --project $project_id dataset-bob-hashed.json"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Charlie: hash the data and upload it to the server"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"generating CLKs: 0%| | 0.00/3.26k [00:00, ?clk/s, mean=0, std=0]\n",
"generating CLKs: 6%|6 | 200/3.26k [00:01<00:24, 122clk/s, mean=371, std=33.3]\n",
"generating CLKs: 55%|#####5 | 1.80k/3.26k [00:01<00:08, 174clk/s, mean=372, std=34.5]\n",
"generating CLKs: 100%|##########| 3.26k/3.26k [00:01<00:00, 1.73kclk/s, mean=372, std=34.8]\n",
"CLK data written to dataset-charlie-hashed.json\n"
]
}
],
"source": [
"!clkutil hash data/dataset-charlie.csv $KEY1 $KEY2 data/schema_ABC.json dataset-charlie-hashed.json --check-header false"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"message\": \"Updated\", \"receipt_token\": \"064664ed9fd1f58c4da05c62a4832b813276d09342137a42\"}\n"
]
}
],
"source": [
"!clkutil upload --server $SERVER --apikey $update_token_charlie --project $project_id dataset-charlie-hashed.json"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Analyst: start the linkage run\n",
"\n",
"This will start the linkage computation. We will wait a little bit and then retrieve the results."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"pycharm": {}
},
"outputs": [],
"source": [
"!clkutil create --server $SERVER --project $project_id --apikey $result_token --threshold 0.7 --output=run-credentials.json\n",
"\n",
"with open('run-credentials.json') as f:\n",
" run_credentials = json.load(f)\n",
" run_id = run_credentials['run_id']"
]
},
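{
"cell_type": "markdown",
"metadata": {},
"source": [
"The run above passes `--threshold 0.7`. Assuming the similarity between two encodings is a Dice coefficient over their set bits (an assumption of this sketch; the helper `dice` and the toy bit sets are hypothetical), the effect of the threshold can be illustrated as follows.\n",
"\n",
"```python\n",
"def dice(a, b):\n",
"    # Dice coefficient of two sets of set-bit positions.\n",
"    if not a and not b:\n",
"        return 0.0\n",
"    return 2 * len(a & b) / (len(a) + len(b))\n",
"\n",
"\n",
"THRESHOLD = 0.7  # same value as the --threshold flag above\n",
"\n",
"x = {1, 2, 3, 4, 5, 6, 7, 8}\n",
"y = {1, 2, 3, 4, 5, 6, 9, 10}  # shares six of eight bits with x\n",
"z = {1, 20, 21, 22}  # shares one bit with x\n",
"\n",
"print(dice(x, y), dice(x, y) >= THRESHOLD)\n",
"print(dice(x, z), dice(x, z) >= THRESHOLD)\n",
"```\n",
"\n",
"A higher threshold keeps only near-identical encodings and so trades recall for precision; a lower one admits more candidate pairs."
]
},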
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Analyst: retreve the results"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"State: completed\n",
"Stage (3/3): compute output\n",
"State: completed\n",
"Stage (3/3): compute output\n",
"State: completed\n",
"Stage (3/3): compute output\n",
"Downloading result\n",
"Received result\n"
]
}
],
"source": [
"!clkutil results --server $SERVER --project $project_id --apikey $result_token --run $run_id --watch --output linkage-output.json"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"pycharm": {}
},
"outputs": [],
"source": [
"with open('linkage-output.json') as f:\n",
" linkage_output = json.load(f)\n",
" linkage_groups = linkage_output['groups']"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Everyone: make table of interesting information\n",
"\n",
"We use the linkage result to make a table of genders, cities, and incomes without revealing any other PII."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"pycharm": {}
},
"outputs": [],
"source": [
"with open('data/dataset-alice.csv') as f:\n",
" r = csv.reader(f)\n",
" next(r) # Skip header\n",
" genders = tuple(row[-1] for row in r)\n",
" \n",
"with open('data/dataset-bob.csv') as f:\n",
" r = csv.reader(f)\n",
" next(r) # Skip header\n",
" cities = tuple(row[-1] for row in r)\n",
" \n",
"with open('data/dataset-charlie.csv') as f:\n",
" r = csv.reader(f)\n",
" next(r) # Skip header\n",
" incomes = tuple(row[-1] for row in r)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" gender | \n",
" city | \n",
" income | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" | \n",
" peGh | \n",
" 395273.665 | \n",
"
\n",
" \n",
" | 1 | \n",
" | \n",
" sydnev | \n",
" 77367.636 | \n",
"
\n",
" \n",
" | 2 | \n",
" | \n",
" pertb | \n",
" 323383.650 | \n",
"
\n",
" \n",
" | 3 | \n",
" | \n",
" syd1e7y | \n",
" 79745.538 | \n",
"
\n",
" \n",
" | 4 | \n",
" | \n",
" perth | \n",
" 28019.494 | \n",
"
\n",
" \n",
" | 5 | \n",
" | \n",
" canberra | \n",
" 78961.675 | \n",
"
\n",
" \n",
" | 6 | \n",
" female | \n",
" brisnane | \n",
" | \n",
"
\n",
" \n",
" | 7 | \n",
" male | \n",
" canbetra | \n",
" | \n",
"
\n",
" \n",
" | 8 | \n",
" | \n",
" sydme7 | \n",
" 106849.526 | \n",
"
\n",
" \n",
" | 9 | \n",
" | \n",
" melbourne | \n",
" 68548.966 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" gender city income\n",
"0 peGh 395273.665\n",
"1 sydnev 77367.636\n",
"2 pertb 323383.650\n",
"3 syd1e7y 79745.538\n",
"4 perth 28019.494\n",
"5 canberra 78961.675\n",
"6 female brisnane \n",
"7 male canbetra \n",
"8 sydme7 106849.526\n",
"9 melbourne 68548.966"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = []\n",
"for group in linkage_groups:\n",
" row = [''] * 3\n",
" for i, j in group:\n",
" row[i] = [genders, cities, incomes][i][j]\n",
" if sum(map(bool, row)) > 1:\n",
" table.append(row)\n",
"pd.DataFrame(table, columns=['gender', 'city', 'income']).head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last 20 groups look like this."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"pycharm": {},
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[[[0, 2111], [1, 2100]],\n",
" [[0, 2121], [2, 2131], [1, 2111]],\n",
" [[1, 1146], [2, 1202], [0, 1203]],\n",
" [[1, 2466], [2, 2478], [0, 2460]],\n",
" [[0, 429], [1, 412]],\n",
" [[0, 2669], [1, 1204]],\n",
" [[1, 1596], [2, 1623]],\n",
" [[0, 487], [1, 459]],\n",
" [[1, 1776], [2, 1800], [0, 1806]],\n",
" [[1, 2586], [2, 2602]],\n",
" [[0, 919], [1, 896]],\n",
" [[0, 100], [2, 107], [1, 100]],\n",
" [[0, 129], [1, 131], [2, 135]],\n",
" [[0, 470], [1, 440]],\n",
" [[0, 1736], [1, 1692], [2, 1734]]]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linkage_groups[-15:]"
]
},
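{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each group is a list of `[dataset_index, row_index]` pairs, where the table-building code above treats dataset indices 0, 1, and 2 as Alice, Bob, and Charlie. A minimal sketch with toy data shows how one group turns into one table row. The `toy_` tuples and the helper `group_to_row` are hypothetical names used only for this illustration.\n",
"\n",
"```python\n",
"# Toy stand-ins for the three parties' non-linking columns.\n",
"toy_genders = ('male', 'female', 'male')\n",
"toy_cities = ('perth', 'sydney')\n",
"toy_incomes = ('50194.118', '31750.993', '70189.446')\n",
"toy_columns = (toy_genders, toy_cities, toy_incomes)\n",
"\n",
"# Same shape as linkage_groups: lists of [dataset_index, row_index] pairs.\n",
"toy_groups = [\n",
"    [[0, 2], [1, 0]],\n",
"    [[1, 1], [2, 0], [0, 1]],\n",
"]\n",
"\n",
"\n",
"def group_to_row(group):\n",
"    # Place each record's value in the column of the dataset it came from.\n",
"    row = [''] * len(toy_columns)\n",
"    for dataset_index, row_index in group:\n",
"        row[dataset_index] = toy_columns[dataset_index][row_index]\n",
"    return row\n",
"\n",
"\n",
"rows = [group_to_row(g) for g in toy_groups]\n",
"print(rows)\n",
"```\n",
"\n",
"A group that lacks a record from one party simply leaves that party's column empty, which matches the blank cells in the table above."
]
},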
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"# Sneak peek at the result\n",
"\n",
"We obviously can't do this in a real-world setting, but let's view the linkage using the PII. If the IDs match, then we are correct."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"pycharm": {}
},
"outputs": [],
"source": [
"with open('data/dataset-alice.csv') as f:\n",
" r = csv.reader(f)\n",
" next(r) # Skip header\n",
" dataset_alice = tuple(r)\n",
" \n",
"with open('data/dataset-bob.csv') as f:\n",
" r = csv.reader(f)\n",
" next(r) # Skip header\n",
" dataset_bob = tuple(r)\n",
" \n",
"with open('data/dataset-charlie.csv') as f:\n",
" r = csv.reader(f)\n",
" next(r) # Skip header\n",
" dataset_charlie = tuple(r)"
]
},
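{
"cell_type": "markdown",
"metadata": {},
"source": [
"The check described above (a group is right when all of its records share an ID) can be sketched on toy data. The helper `group_is_correct` and the `toy_` tuples are hypothetical; the only assumption carried over from the datasets is that the ground-truth ID is the first column of each row.\n",
"\n",
"```python\n",
"def group_is_correct(group, datasets):\n",
"    # Collect the ID (column 0) of every record in the group.\n",
"    ids = {datasets[i][j][0] for i, j in group}\n",
"    return len(ids) == 1\n",
"\n",
"\n",
"toy_alice = (['3', 'saJi', 'vernre'], ['7', 'sliver', 'paciorek'])\n",
"toy_bob = (['3', 'zali', 'verner'], ['4', 'samuel', 'tremellen'])\n",
"toy_charlie = (['3', 'zal:', 'verner'], ['7', 'oliyer', 'paciorwk'])\n",
"toy_datasets = (toy_alice, toy_bob, toy_charlie)\n",
"\n",
"# First group links ID 3 across all parties; second wrongly links ID 7 with ID 4.\n",
"toy_groups = [[[0, 0], [1, 0], [2, 0]], [[0, 1], [1, 1]]]\n",
"\n",
"correct = [group_is_correct(g, toy_datasets) for g in toy_groups]\n",
"print(correct, sum(correct) / len(correct))\n",
"```\n",
"\n",
"The fraction of correct groups gives a rough precision figure; it says nothing about entities that should have been linked but were missed."
]
},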
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"pycharm": {}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" given name | \n",
" surname | \n",
" dob | \n",
" phone number | \n",
" non-linking | \n",
"
\n",
" \n",
" \n",
" \n",
" | 6426 | \n",
" 1171 | \n",
" isabelle | \n",
" bridgland | \n",
" 30-03-1994 | \n",
" 04 5318 6471 | \n",
" mal4 | \n",
"
\n",
" \n",
" | 6427 | \n",
" 1171 | \n",
" isalolIe | \n",
" riahgland | \n",
" 30-02-1994 | \n",
" 04 5318 6471 | \n",
" sydnry | \n",
"
\n",
" \n",
" | 6428 | \n",
" 1171 | \n",
" isabelle | \n",
" bridgland | \n",
" 30-02-1994 | \n",
" 04 5318 6471 | \n",
" 63514.217 | \n",
"
\n",
" \n",
" | 6429 | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" | 6430 | \n",
" 1243 | \n",
" thmoas | \n",
" doaldson | \n",
" 13-04-1900 | \n",
" 09 6963 1944 | \n",
" male | \n",
"
\n",
" \n",
" | 6431 | \n",
" 1243 | \n",
" thoma5 | \n",
" donaldson | \n",
" 13-04-1900 | \n",
" 08 6962 1944 | \n",
" perth | \n",
"
\n",
" \n",
" | 6432 | \n",
" 1243 | \n",
" thomas | \n",
" donalsdon | \n",
" 13-04-2900 | \n",
" 08 6963 2944 | \n",
" 489229.297 | \n",
"
\n",
" \n",
" | 6433 | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" | 6434 | \n",
" 2207 | \n",
" annah | \n",
" aslea | \n",
" 02-11-2906 | \n",
" 04 5501 5973 | \n",
" male | \n",
"
\n",
" \n",
" | 6435 | \n",
" 2207 | \n",
" hannah | \n",
" easlea | \n",
" 02-11-2006 | \n",
" 04 5501 5973 | \n",
" canberra | \n",
"
\n",
" \n",
" | 6436 | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" | 6437 | \n",
" 5726 | \n",
" rhys | \n",
" clarke | \n",
" 19-05-1929 | \n",
" 02 9220 9635 | \n",
" mqle | \n",
"
\n",
" \n",
" | 6438 | \n",
" 5726 | \n",
" ry5 | \n",
" clarke | \n",
" 19-05-1939 | \n",
" 02 9120 9635 | \n",
" | \n",
"
\n",
" \n",
" | 6439 | \n",
" 5726 | \n",
" rhys | \n",
" klark | \n",
" 19-05-2938 | \n",
" 02 9220 9635 | \n",
" 118197.119 | \n",
"
\n",
" \n",
" | 6440 | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id given name surname dob phone number non-linking\n",
"6426 1171 isabelle bridgland 30-03-1994 04 5318 6471 mal4\n",
"6427 1171 isalolIe riahgland 30-02-1994 04 5318 6471 sydnry\n",
"6428 1171 isabelle bridgland 30-02-1994 04 5318 6471 63514.217\n",
"6429 \n",
"6430 1243 thmoas doaldson 13-04-1900 09 6963 1944 male\n",
"6431 1243 thoma5 donaldson 13-04-1900 08 6962 1944 perth\n",
"6432 1243 thomas donalsdon 13-04-2900 08 6963 2944 489229.297\n",
"6433 \n",
"6434 2207 annah aslea 02-11-2906 04 5501 5973 male\n",
"6435 2207 hannah easlea 02-11-2006 04 5501 5973 canberra\n",
"6436 \n",
"6437 5726 rhys clarke 19-05-1929 02 9220 9635 mqle\n",
"6438 5726 ry5 clarke 19-05-1939 02 9120 9635 \n",
"6439 5726 rhys klark 19-05-2938 02 9220 9635 118197.119\n",
"6440 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = []\n",
"for group in linkage_groups:\n",
" for i, j in sorted(group):\n",
" table.append([dataset_alice, dataset_bob, dataset_charlie][i][j])\n",
" table.append([''] * 6)\n",
" \n",
"pd.DataFrame(table, columns=['id', 'given name', 'surname', 'dob', 'phone number', 'non-linking']).tail(15)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}