Benchmarking¶
In the benchmarking folder is a benchmarking script and associated Dockerfile.
The docker image is published at https://quay.io/repository/n1analytics/entity-benchmark
The container/script is configured via environment variables.
SERVER
: (required) the url of the server.EXPERIMENT
: json file containing a list of experiments to run. Schema of experiments is defined in ./schema/experiments.json.DATA_PATH
: path to a directory to store test data (useful to cache).RESULT_PATH
: full filename to write results file.SCHEMA
: path to the linkage schema file used when creating projects. If not provided it is assumed to be in the data directory.TIMEOUT
: this timeout defined the time to wait for the result of a run in seconds. Default is 1200 (20min).
Run Benchmarking Container¶
Run the container directly with docker - substituting configuration information as required:
docker run -it
-e SERVER=https://testing.es.data61.xyz \
-e RESULTS_PATH=/app/results.json \
quay.io/n1analytics/entity-benchmark:latest
By default the container will pull synthetic datasets from an S3 bucket and run default benchmark experiments
against the configured SERVER
. The default experiments (listed below) are set in
benchmarking/default-experiments.json
.
The output will be printed and saved to a file pointed to by RESULTS_PATH
(e.g. to /app/results.json
).
Cache Volume¶
For speeding up benchmarking when running multiple times you may wish to mount a volume at the DATA_PATH
to store the downloaded test data. Note the container runs as user 1000
, so any mounted volume must be read
and writable by that user. To create a volume using docker:
docker volume create linkage-benchmark-data
To copy data from a local directory and change owner:
docker run --rm -v `pwd`:/src \
-v linkage-benchmark-data:/data busybox \
sh -c "cp -r /src/linkage-bench-cache-experiments.json /data; chown -R 1000:1000 /data"
To run the benchmarks using the cache volume:
docker run \
--name ${benchmarkContainerName} \
--network ${networkName} \
-e SERVER=${localserver} \
-e DATA_PATH=/cache \
-e EXPERIMENT=/cache/linkage-bench-cache-experiments.json \
-e RESULTS_PATH=/app/results.json \
--mount source=linkage-benchmark-data,target=/cache \
quay.io/n1analytics/entity-benchmark:latest
Experiments¶
Experiments to run can be configured as a simple json document. The default is:
[
{
"sizes": ["100K", "100K"],
"threshold": 0.95
},
{
"sizes": ["100K", "100K"],
"threshold": 0.80
},
{
"sizes": ["100K", "1M"],
"threshold": 0.95
}
]
The schema of the experiments can be found in benchmarking/schema/experiments.json
.