#758: TAG review of Data Catalog Vocabulary (DCAT) - Version 3


Opened Jul 14, 2022

Wotcher TAG!

I'm requesting a TAG review of Data Catalog Vocabulary (DCAT) - Version 3.

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.

DCAT enables a publisher to describe datasets and data services in a catalog using a standard model and vocabulary that facilitates the consumption and aggregation of metadata from multiple catalogs. This can increase the discoverability of datasets and data services. It also makes it possible to have a decentralized approach to publishing data catalogs and makes federated search for datasets across catalogs in multiple sites possible using the same query mechanism and structure. Aggregated DCAT metadata can serve as a manifest file as part of the digital preservation process.

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: end of September or early October, as we plan to move to CR in autumn.
  • The group where the work on this specification is currently being done: Dataset Exchange Working Group
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Dataset Exchange Working Group
  • Major unresolved issues with or opposition to this specification: The DXWG received considerable feedback, requests for additional features and specific use cases. The group has considered the feedback and included suggestions and revisions received, closing more than 160 GitHub issues and discussing an even greater number. However, the total number of issues exceeded the group's capacity in terms of effort and available resources. As a result, DXWG was forced to focus its efforts on some of the most urgent extensions, dealing with versioning and dataset series. The remaining pending issues are left open in the GitHub repository for consideration in future DCAT versions.

We'd prefer the TAG to provide feedback as:

☂️ open a single issue in our GitHub repo for the entire review

Discussions

2022-07-London

Minutes

Amy: would be good if the explainer gave more context.

Hadley: we can ask them. We could also say "it looks sensible enough - no architectural problems" and [close the review].

Amy: I don't see any architectural problems.

Hadley: [channeling Danbri] interest in series - being able to represent a series - diff says they've done it. But not sure how well it's done.

Dan: alignment with schema.org...

Amy: I'm going to look at their open issues...

Amy: Status?

Dan: "The Working Group plans to request Candidate Recommendation no sooner than 10 August 2022"

Amy: some privacy tracker issues...

Dan: reviewing Nick's issue on authenticity of data - it's a good point but is it in scope?

Hadley: i don't think it's in scope for what they want to do - feels like work for another working group - a substantial piece of work, but worth doing.

Dan: .. overspecialised for just clean dataset use case ...

Hadley: librarian world where everything is organised ... vs the search engine world where ...

Amy: balance between flexibility and interoperability. No way to reliably combine data... Fine with the goal of starting with clean data - and that is what this is for. Should be clear.

Hadley: messy data not in scope - or wasn't in scope for first ...

Amy: organisations exist to help make messy data better... As long as it's obvious what part of the pipeline this is for ... But in the grand scheme of things there's a lot more messy data.

Hadley: I'm aware of use cases... when DCAT 1.0 became a Rec, there was a lot more momentum behind data catalogs but now there's less.. more data being surfaced via non-web APIs... stand-alone websites without a catalog... I feel it would be interesting to mention in our comments it would be great to hear their thoughts on how this should evolve. Can they look more to the horizon?

Amy: they're coming at it from a "data in the lab" perspective vs "data in the wild".

Hadley: rdf itself has first/next/last

They could have used RDF lists (https://www.w3.org/TR/rdf-schema/#ch_collectionvocab) but chose dataset-specific terms instead. Perhaps we should ask if it's worth creating new terms (dcat:first, dcat:prev and dcat:last)?
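To make the comparison concrete, here is a minimal sketch of the two ways an ordered series could be written down as triples. It is illustrative only: triples are plain Python tuples, the `ex:` identifiers and helper names are hypothetical, and attaching `dcat:first` / `dcat:last` to the series and `dcat:prev` to each member is an assumption made for the example, not a statement of how the DCAT 3 draft defines those terms.

```python
# Illustrative sketch: triples are (subject, predicate, object) tuples.
# All ex: identifiers and function names are hypothetical.

def as_rdf_list(head, items):
    """Encode items as an RDF collection (rdf:first / rdf:rest / rdf:nil)."""
    # One list cell per item; cells after the head are blank nodes.
    nodes = [head] + [f"_:b{i}" for i in range(1, len(items))]
    triples = []
    for i, item in enumerate(items):
        triples.append((nodes[i], "rdf:first", item))
        rest = nodes[i + 1] if i + 1 < len(items) else "rdf:nil"
        triples.append((nodes[i], "rdf:rest", rest))
    return triples

def as_dcat_series(series, items):
    """Encode a non-empty series with direct links between members.

    Assumption: dcat:first / dcat:last hang off the series and
    dcat:prev hangs off each member after the first.
    """
    triples = [(series, "dcat:first", items[0]),
               (series, "dcat:last", items[-1])]
    for prev, cur in zip(items, items[1:]):
        triples.append((cur, "dcat:prev", prev))
    return triples

def walk_back(triples, series):
    """Recover the order by following dcat:prev from the last member."""
    lookup = {(s, p): o for s, p, o in triples}
    cur = lookup[(series, "dcat:last")]
    order = [cur]
    while (cur, "dcat:prev") in lookup:
        cur = lookup[(cur, "dcat:prev")]
        order.append(cur)
    return list(reversed(order))

if __name__ == "__main__":
    items = ["ex:budget-2020", "ex:budget-2021", "ex:budget-2022"]
    for triple in as_rdf_list("_:b0", items):
        print(triple)
    for triple in as_dcat_series("ex:budget", items):
        print(triple)
```

In the list encoding the order lives in separate list cells, while in the second encoding each dataset carries its own link to its neighbour. Which of the two suits DCAT better is the question being put to the working group.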

Thanks for your review request. It would be really helpful for us if the explainer could go into a bit of detail about the "more pressing use cases and requests among those left unaddressed in the previous standardization round" that version 3 is addressing. It would help us to understand if/how you've accomplished what you set out to do with DCAT 3.0.

We also try to join up work across working groups and W3C specs. We see that you've created new terms (such as dcat:first, dcat:prev and dcat:last) which could instead be covered by something like [RDF lists](https://www.w3.org/TR/rdf-schema/#ch_collectionvocab) - could you tell us why you made that choice? Are there any other opportunities to reuse existing work in DCAT 3.0?

agreed to paste and wait for response

2022-08-29

Minutes

Amy: we left them some feedback after the f2f and they have not replied yet. I will leave them a nudge. I can open an issue on their repo.

2022-09-19

Minutes

Hadley: still in progress - we put some comments in our issues. Amy opened an issue in their repo which got a response but they haven't fully answered our question... why they are not reusing RDF lists... They came back and said DCAT is hierarchical. Danbri has responded and they haven't replied... Hopefully we can get a response from them and wind this up. Also they have done some edits on their explainer that I need to review.

2022-12-12

Minutes

Amy: there has been back and forth on their issue. Can we close this?

Hadley: sounds sensible. This discussion came out of joining up groups of work, so looking for an opportunity to reuse rather than create from scratch. From what they said it doesn't seem suitable unless they wanted to readdress all of RDF Lists, which isn't in their charter.

Peter: We can propose close

Hadley: will ping danbri to close the loop

proposed closing

2022-12-19

Minutes

Amy: danbri replied to say it's fine. I think we can close this.

Dan: agreed

Amy: closing comment

closed