#855: RDF Canonicalization

Opened Jun 9, 2023

Hello, TAG!

I'm requesting a TAG review of RDF Data Canonicalization.

There are a variety of use cases that depend on the ability to calculate a unique and deterministic hash value of RDF Datasets, such as Verifiable Credentials, the publication of biological and pharmaceutical data, or consumption of mission critical RDF vocabularies that depend on the ability to verify the authenticity and integrity of the data being consumed. See the use cases for more examples. These use cases require a standard way to process the underlying graphs contained in RDF Datasets that is independent of the serialization itself.
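To illustrate the problem described above, here is a minimal Python sketch. It is not the algorithm from the specification. It assumes a dataset with no blank nodes, serialized as N-Quads with one statement per line and each term spelled identically in every copy; `naive_dataset_hash` and the example.org IRIs are hypothetical. Under those assumptions, sorting the statements before hashing is enough to make the hash independent of statement order. Datasets that contain blank nodes need the labelling procedure defined in the spec, which this sketch does not attempt.

```python
import hashlib


def naive_dataset_hash(nquads: str) -> str:
    """Hash N-Quads text so that statement order does not matter.

    Assumption: no blank nodes, so a plain sort gives a deterministic
    order. This is an illustration only, not the spec's algorithm.
    """
    # Drop empty lines and duplicate statements, then sort.
    lines = sorted({line.strip() for line in nquads.splitlines() if line.strip()})
    canonical = "".join(line + "\n" for line in lines)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two serializations of the same two statements, in different order.
doc_a = (
    "<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> .\n"
    '<http://example.org/bob> <http://example.org/name> "Bob" <http://example.org/g1> .\n'
)
doc_b = (
    '<http://example.org/bob> <http://example.org/name> "Bob" <http://example.org/g1> .\n'
    "<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> .\n"
)

# Hashing the raw bytes depends on the order of the statements.
raw_a = hashlib.sha256(doc_a.encode("utf-8")).hexdigest()
raw_b = hashlib.sha256(doc_b.encode("utf-8")).hexdigest()
print(raw_a == raw_b)  # False

# Hashing the sorted form does not.
print(naive_dataset_hash(doc_a) == naive_dataset_hash(doc_b))  # True
```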

  • An explainer was created to support the WG's charter. The current draft of the specification (2023-06-09) indicates that we plan to link to the explainer document and also to augment that section of the spec with further detail covering aspects that have come to light as the spec has evolved.
  • Specification URL: https://www.w3.org/TR/2023/WD-rdf-canon-20230609/
  • Tests: https://w3c.github.io/rdf-canon/tests/
  • Current implementations are listed at https://github.com/w3c/rdf-canon/wiki/List-of-available-implementations
  • User research: N/A
  • Security and Privacy self-review: https://github.com/w3c/rdf-canon/issues/70 (reviews have been requested simultaneously with this request to the TAG)
  • GitHub repo (if you prefer feedback filed there): https://github.com/w3c/rdf-canon/issues/118 please
  • Primary contacts (and their relationship to the specification):
    • Gregg Kellogg (gkellogg), [Invited Expert] (editor)
    • Dave Longley (dlongley), [Digital Bazaar] (editor)
    • Dan Yamamoto (yamdan), [Invited Expert] (editor)
    • Phil Archer (philarcher), [GS1] (WG co-chair)
    • Markus Sabadello (peacekeeper), [Danube Tech] (WG co-chair)
  • Organization(s)/project(s) driving the specification: Although the work is not exclusively about Verifiable Credentials, that technology is a major driver and there is a lot of overlap in personnel with that group.
  • Key pieces of existing multi-stakeholder review or discussion of this specification: please note the extensive list of existing implementations.
  • External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): N/A

Further details:

  • [✓] I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: We hope to go to CR in July or August at the latest, i.e. before TPAC. The VCWG's work on ECDSA has a dependency on RDF Dataset Canonicalization.
  • The group where the work on this specification is currently being done: RDF Dataset Canonicalization and Hash
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): N/A
  • Major unresolved issues with or opposition to this specification: There are open issues at this time but no disputes.
  • This work is being funded by:

You should also know that...

The spec has a long history and has implementations using the original version in production software.

We'd prefer the TAG provide feedback as:

💬 leave review feedback as a comment in this issue and @-notify gkellogg, dlongley, yamdan, philarcher, peacekeeper.

Discussions

2023-06-19

Minutes

bumped

2023-07-mos-eisley

Minutes

Hadley: use cases are making sense.

Amy: the explainer reads as 'this is the work we will do' rather than 'this is the work we have done', as the explainer was originally written for the charter (noted by phila). Would be useful to have it updated to what was actually done. But assume they would have mentioned if they'd done anything radically different. They haven't filled out S&P questionnaire, but have S&P in the spec. We should ask them to fill the questionnaire.

Hadley: they are using quads, i.e. a triple plus the graph name. Sounds complicated and repetitive. If you're hashing, you should just be able to do that once.

Amy: could ask the rationale for that. There's a 'todo' in privacy considerations in the spec.

Hadley: what if the hashing algorithm is no longer secure? SHA256 is okay for now.

Amy: it would be good to see that mentioned in the security considerations.

We (@hadleybeeman and I) reviewed this in our virtual face-to-face this week. We like the direction of the work, and the design is sensible.

We noticed you haven't yet filled out the privacy and security questionnaire. Understanding that not all of the questions may be relevant, please could you do this?

Also, we see that you are using quads, rather than using triples and adding in the graph name once. It sounds more complex, but we suspect you have considered this at length. We are just interested in your thought process here. (This is the sort of thing we normally expect to see in an [explainer](https://github.com/w3ctag/tag.w3.org/blob/main/explainers/template.md).)
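For readers less familiar with the terms, the difference between a triple and a quad can be pictured as follows. This Python sketch illustrates the data model only and says nothing about the WG's rationale; `Quad` and `to_nquads_line` are hypothetical names, and terms are assumed to be already serialized as N-Quads strings.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """A triple plus an optional graph name (None = default graph)."""
    subject: str
    predicate: str
    object: str
    graph: Optional[str] = None


def to_nquads_line(quad: Quad) -> str:
    """Serialize one quad as a single N-Quads statement."""
    terms = [quad.subject, quad.predicate, quad.object]
    if quad.graph is not None:
        # The graph name is the optional fourth term of the statement.
        terms.append(quad.graph)
    return " ".join(terms) + " ."


s, p, o = "<http://example.org/s>", "<http://example.org/p>", '"v"'

in_default = Quad(s, p, o)
in_g1 = Quad(s, p, o, "<http://example.org/g1>")

# The same triple in two different graphs yields two distinct statements.
print(to_nquads_line(in_default))
print(to_nquads_line(in_g1))
```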

Also, we'd love to see the explainer once you've updated it to bring it in line with the spec.

And finally, what happens if the hashing algorithm becomes insecure? It might be helpful to put a comment in the security considerations section to advise implementers in the future to consider that possibility.
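One way an implementer could prepare for that is to keep the hash function a parameter rather than hard-coding it. The following Python sketch is an assumption about how that might look, not something the specification prescribes; `hash_canonical_form` and the `algorithm:digest` output format are hypothetical, and the input is assumed to be an already canonicalized N-Quads string.

```python
import hashlib


def hash_canonical_form(canonical_nquads: str, algorithm: str = "sha256") -> str:
    """Hash canonical N-Quads text with a caller-selected algorithm.

    The algorithm name is stored with the digest so that a verifier
    knows which function to re-run, and so that old and new digests
    can coexist while a weakened algorithm is phased out.
    """
    digest = hashlib.new(algorithm)  # raises ValueError for unknown names
    digest.update(canonical_nquads.encode("utf-8"))
    return f"{algorithm}:{digest.hexdigest()}"


canonical = "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"

current = hash_canonical_form(canonical)             # SHA-256 by default
stronger = hash_canonical_form(canonical, "sha384")  # swapped in without code changes

print(current.split(":")[0], stronger.split(":")[0])  # sha256 sha384
```

Recording the algorithm name next to the digest is a design choice of this sketch; it lets a consumer reject digests made with an algorithm it no longer trusts.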