Road map for the entity service

  • Baseline benchmarking against known datasets (accuracy and speed), e.g., recordspeed datasets
  • Blocking
  • Schema specification and tooling
  • Algorithmic improvements, e.g., implementing a canopy clustering solver
  • A web front end including authentication and access control
  • Uploading multiple hashes per entity and handling multiple schemas
  • Check how we deal with missing information, old addresses, etc.
  • Semi-supervised machine learning methods to learn thresholds
  • Handle one-to-many relationships, e.g., familial groups
  • Larger scale graph solving methods
  • Remove the bottleneck of sparse links having to fit in Redis
  • Improve uploads by allowing direct binary file transfer into the object store
  • Optimise anonlink memory management and C++ code

Bigger projects

  • Consider more than two organizations participating in one mapping
  • GPU implementation of core similarity scoring
  • Somewhat homomorphic encryption could be used for computing similarity scores
  • Consider allowing users to upload raw PII