Road map for the entity service

  • Baseline benchmarking against known datasets (accuracy and speed), e.g. recordspeed datasets
  • Schema specification and tooling
  • Algorithmic improvements, e.g. implementing a canopy clustering solver
  • A web front end including authentication and access control
  • Uploading multiple hashes per entity. Handle multiple schemas.
  • Check how we deal with missing information, old addresses, etc.
  • Semi-supervised machine learning methods to learn thresholds
  • Handle one-to-many relationships, e.g. familial groups
  • Larger scale graph solving methods
  • Optimise anonlink memory management and C++ code
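
As a rough illustration of the canopy clustering item above, the following is a minimal sketch, not the anonlink API. It assumes each hash is represented as a set of set-bit positions and compared with the Dice coefficient; the function names and the `loose` and `tight` thresholds are hypothetical. In this variant every record is compared against each canopy centre, so canopies may overlap.

```python
def dice(a, b):
    """Dice coefficient of two sets of set-bit positions."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def canopy_clusters(records, loose, tight):
    """Group records into (possibly overlapping) canopies.

    records: dict mapping a record id to a frozenset of set-bit positions.
    loose:   minimum similarity to a centre needed to join its canopy.
    tight:   similarity at or above which a record is dropped from the
             list of candidate centres (must be >= loose).
    """
    if tight < loose:
        raise ValueError("tight must be >= loose")
    remaining = list(records)  # dict order keeps the result deterministic
    canopies = []
    while remaining:
        centre = remaining[0]
        centre_bits = records[centre]
        # Everything loosely similar to the centre joins this canopy.
        members = [r for r in records if dice(centre_bits, records[r]) >= loose]
        canopies.append((centre, members))
        # The centre and anything tightly similar to it cannot seed a canopy.
        remaining = [
            r for r in remaining
            if r != centre and dice(centre_bits, records[r]) < tight
        ]
    return canopies
```

Only pairs that share a canopy would then need the full, more expensive comparison, which is the motivation for using canopies as a pre-grouping step.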

Bigger Projects

  • GPU implementation of core similarity scoring
  • Somewhat homomorphic encryption could be used for similarity scoring
  • Consider allowing users to upload raw PII