design-reviews#758: TAG review of Data Catalog Vocabulary (DCAT)

#758: TAG review of Data Catalog Vocabulary (DCAT) - Version 3

Opened Jul 14, 2022

Wotcher TAG!

I'm requesting a TAG review of Data Catalog Vocabulary (DCAT) - Version 3.

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.

DCAT enables a publisher to describe datasets and data services in a catalog using a standard model and vocabulary that facilitates the consumption and aggregation of metadata from multiple catalogs. This can increase the discoverability of datasets and data services. It also makes it possible to have a decentralized approach to publishing data catalogs and makes federated search for datasets across catalogs in multiple sites possible using the same query mechanism and structure. Aggregated DCAT metadata can serve as a manifest file as part of the digital preservation process.
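As a rough illustration of the aggregation idea described above, the following minimal sketch models two catalogs as plain lists of RDF-style triples and runs one query over their union. The `dcat:Catalog`, `dcat:Dataset`, `dcat:dataset` and `dcterms:title` terms come from the vocabularies themselves; the `example.org` identifiers, the dataset titles and the helper function `all_datasets` are assumptions made up for this example. They are not taken from the specification or the explainer.

```python
# Namespace IRIs for the vocabularies used below.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical catalog published by site A (identifiers are illustrative).
site_a = [
    ("https://example.org/a/catalog", RDF_TYPE, DCAT + "Catalog"),
    ("https://example.org/a/catalog", DCAT + "dataset",
     "https://example.org/a/dataset/air-quality"),
    ("https://example.org/a/dataset/air-quality", RDF_TYPE, DCAT + "Dataset"),
    ("https://example.org/a/dataset/air-quality", DCT + "title",
     '"Air quality readings"'),
]

# Hypothetical catalog published by site B.
site_b = [
    ("https://example.org/b/catalog", RDF_TYPE, DCAT + "Catalog"),
    ("https://example.org/b/catalog", DCAT + "dataset",
     "https://example.org/b/dataset/rainfall"),
    ("https://example.org/b/dataset/rainfall", RDF_TYPE, DCAT + "Dataset"),
    ("https://example.org/b/dataset/rainfall", DCT + "title",
     '"Daily rainfall"'),
]


def all_datasets(triples):
    """Return every dataset IRI that some catalog lists via dcat:dataset."""
    return sorted({o for _, p, o in triples if p == DCAT + "dataset"})


# Because both sites use the same terms, aggregation is just a union of
# triples, and the same query works on each catalog and on the merged set.
aggregated = site_a + site_b
print(all_datasets(aggregated))
```

A real deployment would use an RDF library and a query language such as SPARQL; tuples are used here only to keep the sketch self-contained.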

Explainer (minimally containing user needs and example code): https://github.com/w3c/dxwg/blob/gh-pages/docs/explainer.md
Specification URL: https://www.w3.org/TR/2022/WD-vocab-dcat-3-20220510/
Security and Privacy self-review: https://github.com/w3c/dxwg/issues/1507
GitHub repo (if you prefer feedback filed there): https://github.com/w3c/dxwg/
Primary contacts (and their relationship to the specification):
- Peter Winstanley (@pwin), Invited expert: WG Chair & Specification Editor
- Caroline Burle (@cburle), Invited expert: WG Chair
- Riccardo Albertoni (@riccardoAlbertoni), Invited expert: Specification Editor
- David Browning (@davebrowning), Invited expert: Specification Editor
- Simon Cox (@dr-shorthair), Invited expert: Specification Editor
- Alejandra Gonzalez Beltran (@agbeltran), Invited expert: Specification Editor
- Andrea Perego (@andrea-perego), Invited expert: Specification Editor
Key pieces of existing multi-stakeholder review or discussion of this specification: https://github.com/w3c/dxwg/issues
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): https://github.com/w3c/dxwg/issues

Further details:

I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: End of September or early October, as we plan to move to CR in autumn.
The group where the work on this specification is currently being done: Dataset Exchange Working Group
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Dataset Exchange Working Group
Major unresolved issues with or opposition to this specification: The DXWG received considerable feedback, requests for additional features and specific use cases. The group has considered the feedback and included suggestions and revisions received, closing more than 160 GitHub issues and discussing an even greater number. However, the total number of issues exceeded the group's capacity in terms of effort and available resources. As a result, DXWG was forced to prioritize its efforts on some of the most urgent extensions, dealing with versioning and dataset series. The remaining pending issues are left open in the GitHub repository for consideration in future DCAT versions.

We'd prefer the TAG to provide feedback as:

☂️ open a single issue in our GitHub repo for the entire review

Discussions

Discussed Jul 1, 2022 (See Github)

Amy: would be good if the explainer gave more context.

Hadley: we can ask them. We could also say "it looks sensible enough - no architectural problems" and [close the review].

Amy: I don't see any architectural problems.

Hadley: [channeling Danbri] interest in series - being able to represent a series - diff says they've done it. But not sure how well it's done.

Dan: alignment with schema.org...

Amy: I'm going to look at their open issues...

Amy: Status?

Dan: "The Working Group plans to request Candidate Recommendation no sooner than 10 August 2022"

Amy: some privacy tracker issues...

Dan: reviewing Nick's issue on authenticity of data - it's a good point but is it in scope?

Hadley: I don't think it's in scope for what they want to do - feels like work for another working group - a substantial piece of work, but worth doing.

Dan: .. overspecialised for just clean dataset use case ...

Hadley: librarian world where everything is organised ... vs the search engine world where ...

Amy: balance between flexibility and interoperability. No way to reliably combine data... Fine with the goal of starting with clean data - and that is what this is for. Should be clear.

Hadley: messy data not in scope - or wasn't in scope for first ...

Amy: organisations exist to help make messy data better... As long as it's obvious what part of the pipeline this is for ... But in the grand scheme of things there's a lot more messy data.

Hadley: I'm aware of use cases... when DCAT 1.0 became a Rec, there was a lot more momentum behind data catalogs but now there's less... more data being surfaced via non-web APIs... stand-alone websites without a catalog... I feel it would be interesting to mention in our comments that it would be great to hear their thoughts on how this should evolve. Can they look more to the horizon?

Amy: they're coming at it from a "data in the lab" perspective vs "data in the wild".

Hadley: rdf itself has first/next/last

They could have used RDF lists (https://www.w3.org/TR/rdf-schema/#ch_collectionvocab) but chose dataset-specific terms instead. Perhaps we should ask whether it's worth creating new terms (dcat:first, dcat:prev and dcat:last)?

Thanks for your review request. It would be really helpful for us if the explainer could go into a bit of detail about the "more pressing use cases and requests among those left unaddressed in the previous standardization round" that version 3 is addressing. It would help us to understand if/how you've accomplished what you set out to do with DCAT 3.0.

We also try to join up work across working groups and W3C specs. We see that you've created new terms (such as dcat:first, dcat:prev and dcat:last) which could instead be covered by something like RDF lists (https://www.w3.org/TR/rdf-schema/#ch_collectionvocab) - could you tell us why you made that choice? Are there any other opportunities to reuse existing work in DCAT 3.0?

agreed to paste and wait for response

Comment by @hadleybeeman Jul 26, 2022 (See Github)

Hello! Thanks for your review request, and it's nice to see the progress on DCAT.

It would be really helpful for us if the explainer could go into a bit of detail about the "more pressing use cases and requests among those left unaddressed in the previous standardization round" that version 3 is addressing. It would help us to understand if/how you've accomplished what you set out to do with DCAT 3.0.

We also try to join up work across working groups and W3C specs. We see that you've created new terms (such as dcat:first, dcat:prev and dcat:last) which could instead be covered by something like RDF lists — could you tell us why you made that choice? Are there any other opportunities to reuse existing work in DCAT 3.0?
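To make the two options in this question concrete, here is a minimal sketch that encodes the same three-member series twice: once with DCAT-style links (`dcat:first`, `dcat:last`, `dcat:prev`) and once as an RDF list (`rdf:first`, `rdf:rest`, `rdf:nil`). The `example.org` identifiers, the `ex:members` property and the helper functions are hypothetical and exist only for this illustration. The sketch assumes the series node points at its first and last members and that each member points at its predecessor; it does not represent the working group's rationale, which is recorded in w3c/dxwg#1530.

```python
DCAT = "http://www.w3.org/ns/dcat#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "https://example.org/"  # hypothetical namespace for this sketch

# Option 1: DCAT-specific terms. Links sit directly on the series and
# on the member datasets; no helper nodes are needed.
dcat_triples = [
    (EX + "series", DCAT + "first", EX + "budget-2020"),
    (EX + "series", DCAT + "last", EX + "budget-2022"),
    (EX + "budget-2021", DCAT + "prev", EX + "budget-2020"),
    (EX + "budget-2022", DCAT + "prev", EX + "budget-2021"),
]

# Option 2: an RDF list. Each member is wrapped in a list cell (blank
# node) holding rdf:first (the item) and rdf:rest (the next cell).
rdf_list_triples = [
    (EX + "series", EX + "members", "_:c1"),  # ex:members is made up
    ("_:c1", RDF + "first", EX + "budget-2020"),
    ("_:c1", RDF + "rest", "_:c2"),
    ("_:c2", RDF + "first", EX + "budget-2021"),
    ("_:c2", RDF + "rest", "_:c3"),
    ("_:c3", RDF + "first", EX + "budget-2022"),
    ("_:c3", RDF + "rest", RDF + "nil"),
]


def one(triples, subject, predicate):
    """Return the first object matching (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def walk_dcat_series(triples, series):
    """Start at dcat:last and follow dcat:prev back to the first member."""
    members = []
    current = one(triples, series, DCAT + "last")
    while current is not None:
        members.append(current)
        current = one(triples, current, DCAT + "prev")
    members.reverse()  # collected newest-first, so flip to oldest-first
    return members


def walk_rdf_list(triples, head):
    """Follow rdf:rest from the head cell until rdf:nil, reading rdf:first."""
    members = []
    cell = head
    while cell is not None and cell != RDF + "nil":
        members.append(one(triples, cell, RDF + "first"))
        cell = one(triples, cell, RDF + "rest")
    return members
```

Both walks recover the same ordering. The visible difference in this sketch is where the ordering lives: on the datasets themselves in the first encoding, and in separate list cells in the second.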

Discussed Aug 29, 2022 (See Github)

Amy: we left them some feedback after the f2f and they have not replied yet. I will leave them a nudge. I can open an issue on their repo.

Discussed Sep 19, 2022 (See Github)

Hadley: still in progress - we put some comments in our issues. Amy opened an issue in their repo which got a response but they haven't fully answered our question... why they are not reusing RDF lists... They came back and said DCAT is hierarchical. Danbri has responded and they haven't replied... Hopefully we can get a response from them and wind this up. Also they have done some edits on their explainer that I need to review.

Comment by @riccardoAlbertoni Oct 18, 2022 (See Github)

DXWG replies are tracked by https://github.com/w3c/dxwg/issues/1530

Comment by @riccardoAlbertoni Dec 6, 2022 (See Github)

DXWG replies are tracked by w3c/dxwg#1530

Hi @rhiaro, @danbri and @hadleybeeman, Can you live with the explanations supplied in w3c/dxwg#1530?

If there is nothing else to clarify, please let us know when we can consider the review from TAG concluded.

Thanks a lot.

Discussed Dec 12, 2022 (See Github)

Amy: there has been back and forth on their issue. Can we close this?

Hadley: sounds sensible. This discussion came out of joining up groups of work, so looking for an opportunity to reuse rather than create from scratch. From what they said it doesn't seem suitable unless they wanted to readdress all of RDF Lists, which isn't in their charter.

Peter: We can propose close

Hadley: will ping danbri to close the loop

proposed closing

Discussed Dec 19, 2022 (See Github)

Amy: danbri replied to say it's fine. I think we can close this.

Dan: agreed

Amy: closing comment

closed

Comment by @rhiaro Dec 20, 2022 (See Github)

Hi @riccardoAlbertoni, sorry for the delay in following up. We are satisfied with your responses, and are happy to conclude this review. Thanks!