DBpedia Ontology (dbpedia.org)
68 points by belter on Dec 21, 2022 | hide | past | favorite | 17 comments


Related: Ontolex [1], an ontology for lexicons.

DBnary [2], an RDF extraction of Wiktionary, uses it. I worked on DBnary almost 10 years ago during an internship, during which I added support for extracting French inflections in the extractor. A colleague did German. It increased the size of DBnary a lot, so it was an optional feature. I don't know the status of the project today. Back then, DBnary was based on the Lemon ontology, which Ontolex apparently extends, and Ontolex is now a W3C standard, yay!

I don't knowingly use such ontologies in my daily life, but I find them fascinating for reasons I can't quite pin down.

[1] https://www.w3.org/2016/05/ontolex/

[2] http://kaiko.getalp.org/about-dbnary/


I wonder how much cross-pollination with Cyc [1] is possible, and how DBpedia's ~5M terms compare to Cyc's ~25M assertions.

[1] https://en.wikipedia.org/wiki/Cyc


I had an example on my blog about 10 years ago blending OpenCyc as OWL/RDF with SPARQL queries to DBPedia (using the SERVICE operator). At least for me, this is at best a clumsy process.


I've seen a handful of threads and articles in the last year or so about knowledge graphs, but I still don't understand what they're used for.

I understand the idea, I think, but I don't know what problems they actually solve. When would I want to consider building a knowledge graph, or using an existing one in a project?


It's a structure for representing knowledge and, optionally, reasoning over it. A relational database is fine for lots of instances of data that are basically the same (i.e. rows in a table). But if you want the table to be flexible enough to model a great number of different things, you end up bastardising the relational model with strings and other quirks, and the semantics of what it all means lives not in the data but in the applications and humans that interpret it. Knowledge graphs deal with this. Additionally, formalisms like RDF support an Open World view of data, where missing information mostly just means you don't have it (the world has a lot of information), rather than the Closed World assumption that missing data is false. They also have other affordances for modelling the world, especially when you have many data sources and need to integrate them.
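A toy illustration of the open-world point above, in plain Python rather than any real RDF library; the triples and names are made-up examples:

```python
# Triples as (subject, predicate, object) tuples -- hypothetical data.
facts = {
    ("Alice", "bornIn", "Paris"),
    ("Paris", "locatedIn", "France"),
}

def closed_world(query):
    # Closed-world assumption: anything not stated is false.
    return query in facts

def open_world(query):
    # Open-world assumption: anything not stated is merely unknown.
    return True if query in facts else None  # None means "unknown"

q = ("Alice", "speaks", "French")
print(closed_world(q))  # False -- treated as definitely not true
print(open_world(q))    # None  -- we simply don't have this fact
```

The difference matters most when integrating sources: a missing triple from one dataset shouldn't contradict a present triple from another.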

It's more than just a graph database, which is something most devs (including me) misunderstand when they start looking at this.

However, designing an ontology is hard, just as getting a good database schema is hard. Most systems fail because the data model builds up tension over time, which leads to code complexity and hairballs, effort in adding new concepts to the model, and operational workarounds, which then start to poison the data quality (freeform text fields pressed into service to store 'data').

There is no god-given ontology or single way of modelling the world. Crowdsourcing is a good approach, but looking at this one you can see some questionable practices (tradeoffs, to be fair), and there is a big gap in people who have the skills and experience.


I would say they are useful, especially compared with a relational database, when you have an ever-growing or dynamic set of entity types that are all related to each other. With a knowledge graph you not only avoid creating new tables every time you define a new entity type, you can also give semantic meaning to the relationships between them.
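A minimal sketch of that point, assuming nothing beyond an in-memory list of triples; all entity names and predicates here are hypothetical:

```python
# Bare-bones triple store: no schema, no per-type tables.
triples = []

def add(subject, predicate, obj):
    triples.append((subject, predicate, obj))

# Entities of different "types" coexist without dedicated tables:
add("acme:widget-7", "rdf:type", "Product")
add("acme:widget-7", "madeOf", "aluminium")

# Introducing a brand-new entity type later needs no schema
# migration, just more triples; the predicate carries the meaning.
add("acme:plant-3", "rdf:type", "Factory")
add("acme:plant-3", "produces", "acme:widget-7")

def objects(subject, predicate):
    # Look up all objects for a given subject/predicate pair.
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("acme:plant-3", "produces"))  # ['acme:widget-7']
```

In a relational design, the Factory type would typically mean a new table plus a join table for the `produces` relationship; here it is just data.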


Speaking for myself, ontologies and knowledge graphs provide a quantifiable function of equivalence for determining whether a product is a variant or a whole new thing.

You'd think that systems teams would be able to predict whether something works with something else, but you'd be wrong. Systems architecture is still mostly a matter of faith, right up until it crashes against the reality of operations and maintenance.

The traditional way of making a family of systems goes like this: a containerized group of systems guys makes a big graph showing that product variants X, Y, and Z are 98% equivalent, so they can use the same tooling, the same procedures, etc. Downstream, the org is mandated to take this on faith, even when (and after) the family of systems has been falsified by experience in the field, on the production floor, and in the repair station. This isn't just expensive; it's dangerous as hell.

Knowledge graphs are one path towards falsifiable product architectures. You have a pile of procedures, each set is specialized, and the graph, properly configured, is a snapshot of just how related those products are.

Another angle is data-warehousing the product data with the aim of computing an "Equivalence Factor" for an arbitrary subset of assemblies. When it goes below a certain threshold, the assemblies are NOT the same product, and they should be handled accordingly. That Equivalence Factor is something I'm still working on, but it includes: O&D, materials, density, mass, interfaces, topology, disassembly, tooling, fasteners. One of the hard sells is that the number is not a hard number: it's a baseline, part of a check, and it has a range. People like hard yes-or-no answers, but with the Equivalence Factor I can at least give you some probabilities of maintaining a system as it's configured.
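The commenter doesn't say how the factor is computed; one hedged sketch, assuming a simple weighted mix of set overlap (Jaccard) on categorical attributes and relative closeness on numeric ones. All names, weights, and data below are illustrative, not the commenter's actual method:

```python
# Hypothetical "Equivalence Factor" sketch over a few of the
# attributes listed above (interfaces, materials, fasteners, mass).
def jaccard(a, b):
    # Set overlap: 1.0 means identical sets, 0.0 means disjoint.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def closeness(x, y):
    # Relative closeness of two positive numeric values.
    return 1.0 - abs(x - y) / max(abs(x), abs(y), 1e-9)

def equivalence_factor(p, q):
    # Illustrative weights; a real factor would be calibrated.
    parts = [
        (0.3, jaccard(p["interfaces"], q["interfaces"])),
        (0.3, jaccard(p["materials"], q["materials"])),
        (0.2, jaccard(p["fasteners"], q["fasteners"])),
        (0.2, closeness(p["mass"], q["mass"])),
    ]
    return sum(w * s for w, s in parts)

variant_a = {"interfaces": {"usb", "sata"}, "materials": {"al", "abs"},
             "fasteners": {"m3"}, "mass": 1.20}
variant_b = {"interfaces": {"usb", "sata"}, "materials": {"al", "abs"},
             "fasteners": {"m3"}, "mass": 1.25}

print(round(equivalence_factor(variant_a, variant_b), 3))  # 0.992
```

Which matches the comment's point: the output is a score with a range to compare against a threshold, not a hard yes-or-no answer.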


I think they can be good for factual queries and entity relations. One example is the Google Knowledge Graph which they use in Search [1].

[1] https://en.wikipedia.org/wiki/Google_Knowledge_Graph


It feels a bit like the GOFAI way of teaching machines to play chess: through hand-organized human knowledge, versus the more successful approaches that rely on computational scale. At the same time, it feels like it could give large language models a sense of "truth."


The "computational scale" approach to semantic networks has been tried - that's pretty much what Freebase was. The Wikidata community refused a direct import of Freebase because the resulting dataset was full of errors; and even today it similarly rejects bot-generated data.


Right. Makes sense. But then GPT seems to be on the verge of accomplishing something very similar in a much more effective way. I wonder if it could use something like this for training or reinforcement learning, for instance.


GPT routinely dreams up totally fake data, it's just how the model works. It's a fun toy but absolutely not helpful for real-world scenarios.


Am I misunderstanding something or are DBpedia and Wikidata essentially trying to do the same thing?


They're not the same thing. DBPedia tries to extract structured information from the existing versions of Wikipedia, which are segregated by language and licensed under CC BY-SA. Wikidata is a newer project focused on providing high-quality structured data from scratch (with labels in any natural language), under a CC0 license that comes as close as legally possible to releasing the data into the public domain.


DBPedia is extracting data from infoboxes in Wikipedia, but increasingly Wikidata is the source of the data in the infoboxes.


They overlap. Wikidata uses generated URIs for entities, properties, etc. that are not human-readable; they can be resolved by hovering the mouse cursor over them in the SPARQL endpoint web app, or by adding a pattern to your query to fetch the labels for these URIs.

Many of the Knowledge Graph people I have worked with clearly prefer Wikidata, but I am in the other camp: I find it so much easier to use the human-readable, meaningful URIs for entities, properties, etc. in DBPedia. Just yesterday I wrote SPARQL queries against both DBPedia and Wikidata.


Is there a good example of what this is useful for? I’d love to see examples of where it is used. What does it do?



