{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "import csv\n", "import json\n", "import os\n", "\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "SECRET = 'my_secret'\n", "\n", "SERVER = os.getenv(\"SERVER\", \"https://anonlink.easd.data61.xyz\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "# Multiparty Linkage with Clkhash\n", "\n", "## Scenario\n", "\n", "There are three parties named Alice, Bob, and Charlie, each holding a dataset of about 3200 records. They know that they have some entities in common, but the overlap is incomplete. The common features describing those entities are given name, surname, date of birth, and phone number.\n", "\n", "Each party also holds one additional attribute about those entities: Alice has each person's gender, Bob has their city, and Charlie has their income. They wish to create a table for analysis in which each row holds a gender, city, and income, without sharing any other information. Anonlink lets them do this in a privacy-preserving way, without revealing given names, surnames, dates of birth, or phone numbers." ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Alice, Bob, and Charlie: agree on secret keys and a linkage schema\n", "\n", "They keep the keys to themselves, but the schema may be revealed to the analyst."
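, "\n", "\n", "The schema also fixes how each field is split into tokens before hashing. Given names and surnames use bigrams (`n` = 2) without positions. The date of birth (bigrams) and the phone number (unigrams) use positional n-grams, so that swapped digits produce different tokens. The hypothetical helper `ngrams` below is a minimal sketch of that tokenization for illustration only. It is not part of the clkhash API, and padding with one space when `n` > 1 is an assumption that may not match clkhash exactly.\n", "\n", "
```python
def ngrams(value, n, positional=False):
    # Hypothetical helper for illustration only; not part of clkhash.
    # Assumption: for n > 1, pad the value with one space on each side so that
    # the first and last characters also start or end an n-gram.
    if n > 1:
        value = ' ' + value + ' '
    grams = [value[i:i + n] for i in range(len(value) - n + 1)]
    if positional:
        # Prefix each n-gram with its 1-based position in the (padded) value.
        grams = [f'{i + 1} {g}' for i, g in enumerate(grams)]
    return grams


print(ngrams('tara', 2))                   # [' t', 'ta', 'ar', 'ra', 'a ']
print(ngrams('0298', 1, positional=True))  # ['1 0', '2 2', '3 9', '4 8']

# Without positions, swapping the digits of 12 gives the same set of unigrams.
# With positions, the two values no longer share any token.
print(set(ngrams('12', 1)) == set(ngrams('21', 1)))              # True
print(set(ngrams('12', 1, True)) & set(ngrams('21', 1, True)))   # set()
```
", "\n", "All three parties must hash with the same schema and secret. Otherwise, identical values would not lead to comparable encodings."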
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "keys: my_secret\n" ] } ], "source": [ "print(f'keys: {SECRET}')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"version\": 3,\n", " \"clkConfig\": {\n", " \"l\": 1024,\n", " \"kdf\": {\n", " \"type\": \"HKDF\",\n", " \"hash\": \"SHA256\",\n", " \"salt\": \"SCbL2zHNnmsckfzchsNkZY9XoHk96P/G5nUBrM7ybymlEFsMV6PAeDZCNp3rfNUPCtLDMOGQHG4pCQpfhiHCyA==\",\n", " \"info\": \"c2NoZW1hX2V4YW1wbGU=\",\n", " \"keySize\": 64\n", " }\n", " },\n", " \"features\": [\n", " {\n", " \"identifier\": \"id\",\n", " \"ignored\": true\n", " },\n", " {\n", " \"identifier\": \"givenname\",\n", " \"format\": {\n", " \"type\": \"string\",\n", " \"encoding\": \"utf-8\"\n", " },\n", " \"hashing\": {\n", " \"strategy\": {\n", " \"bitsPerToken\": 15\n", " },\n", " \"comparison\": {\n", " \"type\": \"ngram\",\n", " \"n\": 2,\n", " \"positional\": false\n", " }\n", " }\n", " },\n", " {\n", " \"identifier\": \"surname\",\n", " \"format\": {\n", " \"type\": \"string\",\n", " \"encoding\": \"utf-8\"\n", " },\n", " \"hashing\": {\n", " \"strategy\": {\n", " \"bitsPerToken\": 15\n", " },\n", " \"comparison\": {\n", " \"type\": \"ngram\",\n", " \"n\": 2,\n", " \"positional\": false\n", " }\n", " }\n", " },\n", " {\n", " \"identifier\": \"dob\",\n", " \"format\": {\n", " \"type\": \"string\",\n", " \"encoding\": \"utf-8\"\n", " },\n", " \"hashing\": {\n", " \"strategy\": {\n", " \"bitsPerToken\": 15\n", " },\n", " \"comparison\": {\n", " \"type\": \"ngram\",\n", " \"n\": 2,\n", " \"positional\": true\n", " }\n", " }\n", " },\n", " {\n", " \"identifier\": \"phone number\",\n", " \"format\": {\n", " \"type\": \"string\",\n", " \"encoding\": \"utf-8\"\n", " },\n", " \"hashing\": {\n", " \"strategy\": {\n", " 
\"bitsPerToken\": 8\n", " },\n", " \"comparison\": {\n", " \"type\": \"ngram\",\n", " \"n\": 1,\n", " \"positional\": true\n", " }\n", " }\n", " },\n", " {\n", " \"identifier\": \"ignoredForLinkage\",\n", " \"ignored\": true\n", " }\n", " ]\n", "}\n" ] } ], "source": [ "with open('data/schema_ABC.json') as f:\n", " print(f.read())" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Sneak peek at input data\n", "\n", "### Alice" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id givenname surname dob phone number gender\n", "0 0 tara hilton 27-08-1941 08 2210 0298 male\n", "1 3 saJi vernre 22-12-2972 02 1090 1906 mals\n", "2 7 sliver paciorek NaN NaN mals\n", "3 9 ruby george 09-05-1939 07 4698 6255 male\n", "4 10 eyrinm campbell 29-1q-1983 08 299y 1535 male" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_csv('data/dataset-alice.csv').head()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "### Bob" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id givenname surname dob phone number city\n", "0 3 zali verner 22-12-1972 02 1090 1906 perth\n", "1 4 samuel tremellen 21-12-1923 03 3605 9336 melbourne\n", "2 5 amy lodge 16-01-1958 07 8286 9372 canberra\n", "3 7 oIji pacioerk 10-02-1959 04 4220 5949 sydney\n", "4 10 erin kampgell 29-12-1983 08 2996 1445 perth" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_csv('data/dataset-bob.csv').head()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "### Charlie" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id givenname surname dob phone number income\n", "0 1 joshua arkwright 16-02-1903 04 8511 9580 70189.446\n", "1 3 zal: verner 22-12-1972 02 1090 1906 50194.118\n", "2 7 oliyer paciorwk 10-02-1959 04 4210 5949 31750.993\n", "3 8 nacoya ranson 17-08-1925 07 6033 4580 102446.131\n", "4 10 erih campbell 29-12-1i83 08 299t 1435 331476.599" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_csv('data/dataset-charlie.csv').head()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Analyst: create the project\n", "\n", "The analyst keeps the result token to themselves. The three update tokens go to Alice, Bob and Charlie. The project ID is known by everyone." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mProject created\u001b[0m\r\n" ] } ], "source": [ "!clkutil create-project \\\n", " --server $SERVER \\\n", " --type groups \\\n", " --schema data/schema_ABC.json \\\n", " --parties 3 \\\n", " --output credentials.json\n", "\n", "with open('credentials.json') as f:\n", " credentials = json.load(f)\n", " project_id = credentials['project_id']\n", " result_token = credentials['result_token']\n", " update_token_alice = credentials['update_tokens'][0]\n", " update_token_bob = credentials['update_tokens'][1]\n", " update_token_charlie = credentials['update_tokens'][2]" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Alice: hash the data and upload it to the server\n", "The data is hashed according to the schema and the keys. Alice's update token is needed to upload the hashed data. No PII is uploaded to the service—only the hashes." 
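, "\n", "\n", "Conceptually, each CLK is a Bloom filter. It is a bit array of length `l` (1024 in this schema) in which every token sets up to `bitsPerToken` bits, at positions chosen by keyed hashes. The sketch below illustrates that idea under stated assumptions. The function `encode_tokens` is hypothetical, and its double-hashing scheme built on HMAC-SHA256 is a stand-in. clkhash's actual hash functions, its HKDF key derivation (the `kdf` block of the schema), and its per-field handling differ.\n", "\n", "
```python
import hashlib
import hmac


def encode_tokens(tokens, key, num_bits=1024, bits_per_token=15):
    # Illustrative Bloom-filter encoding; NOT clkhash's exact algorithm.
    bits = [0] * num_bits
    for token in tokens:
        data = token.encode('utf-8')
        # Assumption: two keyed hashes derived from HMAC-SHA256 with
        # different key suffixes serve as the base hashes h1 and h2.
        h1 = int.from_bytes(hmac.new(key + b'1', data, hashlib.sha256).digest(), 'big')
        h2 = int.from_bytes(hmac.new(key + b'2', data, hashlib.sha256).digest(), 'big')
        for i in range(bits_per_token):
            # Double hashing: the i-th bit position is (h1 + i * h2) mod num_bits.
            bits[(h1 + i * h2) % num_bits] = 1
    return bits


clk = encode_tokens([' t', 'ta', 'ar', 'ra', 'a '], b'my_secret')
print(len(clk), sum(clk))  # 1024 bits, of which at most 5 * 15 = 75 are set
```
", "\n", "Records with similar values share many tokens and therefore many set bits. This overlap is what lets the server compare encodings without seeing the underlying PII."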
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mCLK data written to dataset-alice-hashed.json\u001b[0m\r\n" ] } ], "source": [ "!clkutil hash \\\n", " data/dataset-alice.csv \\\n", " $SECRET \\\n", " data/schema_ABC.json \\\n", " dataset-alice-hashed.json \\\n", " --check-header false" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"message\": \"Updated\", \"receipt_token\": \"c202d98eb83c7e55e6177ba9bcf55cb35f40ac1d21714897\"}" ] } ], "source": [ "!clkutil upload \\\n", " --server $SERVER \\\n", " --apikey $update_token_alice \\\n", " --project $project_id \\\n", " dataset-alice-hashed.json" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Bob: hash the data and upload it to the server" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mCLK data written to dataset-bob-hashed.json\u001b[0m\r\n" ] } ], "source": [ "!clkutil hash \\\n", " data/dataset-bob.csv \\\n", " $SECRET \\\n", " data/schema_ABC.json \\\n", " dataset-bob-hashed.json \\\n", " --check-header false" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"message\": \"Updated\", \"receipt_token\": \"75083f544df8e944cc590089bb3e31c134e810992f08ea80\"}" ] } ], "source": [ "!clkutil upload \\\n", " --server $SERVER \\\n", " --apikey $update_token_bob \\\n", " --project $project_id \\\n", " dataset-bob-hashed.json" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Charlie: hash the data and upload it to the server" ] }, { "cell_type": 
"code", "execution_count": 13, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mCLK data written to dataset-charlie-hashed.json\u001b[0m\r\n" ] } ], "source": [ "!clkutil hash \\\n", " data/dataset-charlie.csv \\\n", " $SECRET \\\n", " data/schema_ABC.json \\\n", " dataset-charlie-hashed.json \\\n", " --check-header false" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"message\": \"Updated\", \"receipt_token\": \"814b4a226453d7261348a403e134b0764501432bf679658f\"}" ] } ], "source": [ "!clkutil upload \\\n", " --server $SERVER \\\n", " --apikey $update_token_charlie \\\n", " --project $project_id \\\n", " dataset-charlie-hashed.json" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Analyst: start the linkage run\n", "\n", "This will start the linkage computation. We will wait a little bit and then retrieve the results." 
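, "\n", "\n", "The `--threshold 0.7` argument sets the similarity cut-off for this run. Anonlink scores pairs of CLKs with the Sørensen-Dice coefficient, `2 * |A AND B| / (|A| + |B|)`, where `|A|` is the number of bits set in `A`. The brute-force sketch below shows that comparison for two small lists of bit arrays. The function names are hypothetical, and counting a score equal to the threshold as a match is an assumption. The service's candidate search and its grouping of records across three parties are more involved than this.\n", "\n", "
```python
def dice_coefficient(a, b):
    # Sørensen-Dice coefficient of two equal-length bit lists:
    # 2 * (bits set in both) / (bits set in a + bits set in b).
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


def candidate_pairs(clks_a, clks_b, threshold=0.7):
    # Hypothetical brute-force comparison of every CLK in clks_a with every
    # CLK in clks_b; keeps (row_a, row_b, score) when score >= threshold.
    pairs = []
    for i, a in enumerate(clks_a):
        for j, b in enumerate(clks_b):
            score = dice_coefficient(a, b)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


a = [1, 1, 1, 1, 0, 0]
b = [1, 1, 1, 0, 1, 0]
c = [0, 0, 0, 1, 1, 1]
print(candidate_pairs([a], [b, c]))  # [(0, 0, 0.75)]
```
", "\n", "A higher threshold yields fewer false matches but more missed links. The value 0.7 is the one chosen in this notebook, not a general recommendation."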
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "!clkutil create \\\n", " --server $SERVER \\\n", " --project $project_id \\\n", " --apikey $result_token \\\n", " --threshold 0.7 \\\n", " --output=run-credentials.json\n", "\n", "with open('run-credentials.json') as f:\n", " run_credentials = json.load(f)\n", " run_id = run_credentials['run_id']" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Analyst: retrieve the results" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mState: completed\n", "Stage (3/3): compute output\u001b[0m\n", "\u001b[31mState: completed\n", "Stage (3/3): compute output\u001b[0m\n", "\u001b[31mState: completed\n", "Stage (3/3): compute output\u001b[0m\n", "\u001b[31mDownloading result\u001b[0m\n", "\u001b[31mReceived result\u001b[0m\n" ] } ], "source": [ "!clkutil results \\\n", " --server $SERVER \\\n", " --project $project_id \\\n", " --apikey $result_token \\\n", " --run $run_id \\\n", " --watch \\\n", " --output linkage-output.json" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/plain": [ "[[[0, 1787], [1, 1751], [2, 1784]],\n", " [[0, 565], [1, 557], [2, 564]],\n", " [[0, 836], [1, 815], [2, 850]],\n", " [[0, 505], [2, 495]],\n", " [[0, 536], [2, 525], [1, 512]],\n", " [[0, 1641], [2, 1608], [1, 1584]],\n", " [[0, 2234], [1, 2228], [2, 2242]],\n", " [[0, 781], [1, 762], [2, 799]],\n", " [[0, 918], [2, 2840]],\n", " [[1, 1393], [2, 1421], [0, 1451]],\n", " [[1, 1587], [2, 1609], [0, 1642]],\n", " [[1, 1730], [2, 1767]],\n", " [[1, 2808], [2, 2813]],\n", " [[0, 2765], [2, 2794], [1, 2789]],\n", " [[1, 351], [2, 356]]]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "with open('linkage-output.json') as f:\n", " linkage_output = json.load(f)\n", " linkage_groups = linkage_output['groups']\n", "linkage_groups[-15:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a list of groups; the cell above shows the last 15. All records in a group are believed to refer to the same entity. Each record is a pair of values: the party index (0 for Alice, 1 for Bob, 2 for Charlie, matching the order of the update tokens used above) and the row index within that party's dataset, counting from 0 and excluding the header:\n", "```\n", "[\n", " [[party_id, row_index], ... ],\n", " ...\n", "]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Everyone: make a table of interesting information\n", "\n", "We use the linkage result to make a table of genders, cities, and incomes. Each party shares only its extra column, so no other PII is revealed." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "with open('data/dataset-alice.csv') as f:\n", " r = csv.reader(f)\n", " next(r) # Skip header\n", " genders = tuple(row[-1] for row in r)\n", " \n", "with open('data/dataset-bob.csv') as f:\n", " r = csv.reader(f)\n", " next(r) # Skip header\n", " cities = tuple(row[-1] for row in r)\n", " \n", "with open('data/dataset-charlie.csv') as f:\n", " r = csv.reader(f)\n", " next(r) # Skip header\n", " incomes = tuple(row[-1] for row in r)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " gender city income\n", "0 male melbourne \n", "1 femalr 277039.294\n", "2 pertb 21407e.192\n", "3 mlebourne 56899.522\n", "4 male canberra \n", "5 femaoe sydn3y \n", "6 male 154195.553\n", "7 female 44652.704\n", "8 male sydnely \n", "9 mal3 sydney " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table = []\n", "for group in linkage_groups:\n", " row = [''] * 3\n", " for i, j in group:\n", " row[i] = [genders, cities, incomes][i][j]\n", " # Keep only rows that combine values from at least two parties\n", " if sum(map(bool, row)) > 1:\n", " table.append(row)\n", "pd.DataFrame(table, columns=['gender', 'city', 'income']).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each row combines the attributes from one linked group, and only groups with non-empty values from at least two parties are kept. An empty cell means that the corresponding party has no record in that group. Misspellings such as `femalr` and `mlebourne` come from the noisy input data." ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "## Sneak peek at the result\n", "\n", "This is impossible in a real-world setting, since it requires every party's PII. For illustration only, we view each linked group using the original records, with a blank row between groups. The `id` column is a ground-truth entity identifier that the schema ignores during linkage: if all records in a group share the same `id`, that group was linked correctly." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "with open('data/dataset-alice.csv') as f:\n", " r = csv.reader(f)\n", " next(r) # Skip header\n", " dataset_alice = tuple(r)\n", " \n", "with open('data/dataset-bob.csv') as f:\n", " r = csv.reader(f)\n", " next(r) # Skip header\n", " dataset_bob = tuple(r)\n", " \n", "with open('data/dataset-charlie.csv') as f:\n", " r = csv.reader(f)\n", " next(r) # Skip header\n", " dataset_charlie = tuple(r)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id given name surname dob phone number non-linking\n", "6450 5436 nikki spears 10-02-2097 06 9447 1767 156639.106\n", "6451 \n", "6452 5833 nell rud 06-1p-1956 08 5510 5369 sydnev\n", "6453 5833 ned reif 06-20-1956 08 5510 5369 117275.089\n", "6454 \n", "6455 872 jackson green 06-09-1920 \n", "6456 872 jackson gnn 06-00-1920 08 3409 2246 147663.277\n", "6457 \n", "6458 8662 luct pulfort 05-03-1903 02 0726 9479 male\n", "6459 8662 lucy pulford 05-03-1903 melbourrie\n", "6460 8662 lusy pulford 05-03-1993 02 0726 0489 192230.309\n", "6461 \n", "6462 1885 nicholas robson 06-01-1914 02 7799 6803 canberra\n", "6463 1885 nicho|as robson 06-91-1914 02 7799 6803 61333.218\n", "6464 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table = []\n", "for group in linkage_groups:\n", " for i, j in sorted(group):\n", " table.append([dataset_alice, dataset_bob, dataset_charlie][i][j])\n", " table.append([''] * 6)\n", " \n", "pd.DataFrame(table, columns=['id', 'given name', 'surname', 'dob', 'phone number', 'non-linking']).tail(15)\n", "\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mProject deleted\u001b[0m\r\n" ] } ], "source": [ "# Deleting the project\n", "!clkutil delete-project --project=\"{credentials['project_id']}\" \\\n", " --apikey=\"{credentials['result_token']}\" \\\n", " --server=\"{SERVER}\"" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }