Raghotham Sripadraj

23 Jul 2020

Knowledge Graph Notes

How should AI explicitly represent knowledge?

My notes from CS520 - Knowledge Graph Seminar

Session 1 - What is a Knowledge Graph?

Speakers: Denny Vrandečić, Jans Aasman, Mikhail Galkin


Denny spoke about how knowledge graphs (KGs) are used for web search, question answering systems and data integration systems.

Specific use case of Wikidata

  • KG built on 80M nodes, 1B edges
  • Uses RDF open format (W3C)
  • Schema.org annotation
  • SPARQL for querying

Following an open format has clear advantages. One example given was how Wikipedia can leverage other services that expose their data: Wikipedia can run a SPARQL query against OpenStreetMap (OSM) data, for example to pick up the latitude and longitude of ATMs in Munich for a specific bank network.


Discussion around modern KGs:

  • Semantic graph DB
  • Ontologies & Taxonomies
  • Rule based processing
  • ML & NLP based processing

AllegroGraph's KG implementation:

  1. Document / NLP - Chomsky KG

    The Noam Chomsky Knowledge Graph will link to over 1,000 articles & over 100 books that Chomsky has authored about linguistics, mass media, politics & war.

  2. Event Based for Montefiore Health Care

    Montefiore’s Patient-centered Analytical Learning Machine (PALM), a machine learning platform built from the ground up to predict & prevent life-threatening medical conditions & minimize wait times.


  • Think of KG as world models in terms of entities & relations
  • Encoding can be based on different representations


  1. Symbolic - Logic, DB

    • Store as triples

      Rajakumara starring Puneet
      Puneet born Chennai
  2. Vector - NLP, Computer Vision

    • Embeddings - Leverage high dimensional space & a function to group similar things nearby
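
The symbolic encoding above can be sketched with a few lines of plain Python. This is only a toy in-memory triple store, not how any of the systems in the talk work; the function names `match` and `birthplaces_of_cast` are made up for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# The two example facts from the notes, stored as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("Rajakumara", "starring", "Puneet"),
    ("Puneet", "born", "Chennai"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for ts, tp, to in triples:
        if s in (None, ts) and p in (None, tp) and o in (None, to):
            yield (ts, tp, to)


def birthplaces_of_cast(triples: List[Triple], film: str) -> List[str]:
    """Two-hop traversal: film -starring-> actor -born-> place."""
    places = []
    for _, _, actor in match(triples, s=film, p="starring"):
        for _, _, place in match(triples, s=actor, p="born"):
            places.append(place)
    return places


print(birthplaces_of_cast(TRIPLES, "Rajakumara"))
```

The point of the triple form is that a multi-hop question becomes a chain of simple pattern matches, which is roughly what a SPARQL engine does at a much larger scale.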

KGs can be viewed from different points of view:

  • Logic programming, RDF way

  • RDBMS - entities are cells, relations are columns

  • Computer Vision - CNN + RPN ⇒ graph inference

  • NLP

    • Knowledge Graph ⇒ Named Entity Recognition

    • Information Retrieval ⇒ Relation Linking

    • Unstructured Sources ⇒ Question Answer System

    • Language Models

Session 2 - How to create a Knowledge Graph?

Speakers: Juan Sequeda, Chris Ré, Xiao Ling

Xiao Ling

Discussion on how Siri Knowledge is built. It is based on triples - subject, predicate, object.

Sources are:

  1. Unstructured text articles
  2. Semi structured
  3. Structured features
  4. Human Curated

All these are fused together to build the Siri KG. Techniques used: infobox extraction from Wikipedia and entity resolution.


  • Fields do not match

    Date of Birth, DoB, Birth Date - all of them mean the same


  • RIBE - Robust Info Box Extraction (HTML input ⇒ Triple as output)
  • Candidate Extraction Models
  • Entity Linking Models
  • Entity Resolution Models
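
To make the "HTML input ⇒ triple as output" idea and the field-mismatch problem concrete, here is a minimal sketch using only the standard library. It is not RIBE or anything from Siri; the alias table, the `InfoboxParser` class, the sample HTML and the person in it are all invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical alias table: different surface field names, one canonical predicate.
FIELD_ALIASES = {
    "date of birth": "birth_date",
    "dob": "birth_date",
    "birth date": "birth_date",
}


class InfoboxParser(HTMLParser):
    """Collect (header, value) text pairs from <tr><th>..</th><td>..</td></tr> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # "th" or "td" while inside a cell, else None
        self._header = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._header, self._value = "", ""
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        if self._cell == "th":
            self._header += data
        elif self._cell == "td":
            self._value += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._header.strip() and self._value.strip():
            self.rows.append((self._header.strip(), self._value.strip()))


def extract_triples(subject, html):
    """Turn infobox rows into (subject, predicate, object) triples."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    triples = []
    for field, value in parser.rows:
        key = field.lower()
        # Unknown fields fall back to a snake_case version of the header.
        predicate = FIELD_ALIASES.get(key, key.replace(" ", "_"))
        triples.append((subject, predicate, value))
    return triples


# Made-up sample input.
SAMPLE = """
<table class="infobox">
  <tr><th>DoB</th><td>1 January 1970</td></tr>
  <tr><th>Occupation</th><td>Actor</td></tr>
</table>
"""
print(extract_triples("Example Person", SAMPLE))
```

A hand-written alias table obviously does not scale to the whole web, which is presumably why the talk listed learned extraction, linking and resolution models instead.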

Session 3 - What are some advanced knowledge graphs?

Speakers: Mike Tung, Cogan Shimizu, Marie-Laure Mugnier


Diffbot KG is built on the full public web

Their pipeline looks like this:

  1. Page type classification - classify page type & language
  2. Visual Extraction - extract product information, metadata links, images, price
  3. NLU - language detection, entity detection
  4. Record Linking

Session 4 - What are some knowledge graph inference algorithms?

Speakers: An Hai Doan, Yuxiao Dong, Georg Gottlob

An Hai Doan

Discussion on Entity Matching use case - The Magellan Project

Entity matching steps

  • Blocking - reduce number of pair comparisons
  • Matching - Rule based / ML based
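
The two steps above can be sketched as follows. This is a toy version using only the standard library, not the Magellan API; the records, the choice of `city` as blocking key and the 0.8 threshold are assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Made-up records from two sources that should be matched.
TABLE_A = [
    {"id": "a1", "name": "Dave Smith", "city": "Madison"},
    {"id": "a2", "name": "Joe Wilson", "city": "San Jose"},
]
TABLE_B = [
    {"id": "b1", "name": "David Smith", "city": "Madison"},
    {"id": "b2", "name": "Mary Jones", "city": "Madison"},
    {"id": "b3", "name": "Joseph Wilson", "city": "San Jose"},
]


def block_by(key, table_a, table_b):
    """Blocking: only pair up records that share the same value for `key`."""
    index = defaultdict(list)
    for rec in table_b:
        index[rec[key].lower()].append(rec)
    for rec_a in table_a:
        for rec_b in index.get(rec_a[key].lower(), []):
            yield rec_a, rec_b


def is_match(rec_a, rec_b, threshold=0.8):
    """Matching: a simple rule based on string similarity of the names."""
    ratio = SequenceMatcher(
        None, rec_a["name"].lower(), rec_b["name"].lower()
    ).ratio()
    return ratio >= threshold


candidates = list(block_by("city", TABLE_A, TABLE_B))
matches = [(a["id"], b["id"]) for a, b in candidates if is_match(a, b)]

print(f"{len(candidates)} candidate pairs instead of {len(TABLE_A) * len(TABLE_B)}")
print(matches)
```

Blocking trades recall for speed: a pair that disagrees on the blocking key is never compared, so the key has to be chosen with care. The rule in `is_match` could be swapped for a trained classifier over several similarity features.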


Discussion on Microsoft Academic Graph (MAG)

Leverages a heterogeneous graph transformer


Mostly spoke about VADALOG

Session 5 - How to evolve a knowledge graph?

Speakers: Héctor Pérez-Urbina, José Manuel Gómez-Pérez, Mike Uschold


How to model a dynamic world?

Example of ambiguity - Vocaloids. Not only humans can be artists; anime characters can be too.

Modifying a KG is far easier than modifying an RDBMS. Easy to change → Add properties.

Evolution of KG

  • Can UI still work with the change?
  • Can all downstream applications work?
  • Schema validation

Test, Test, Test!


Discussion on KG for NLP

  • ML driven NLP
    • Pros - flexible, SoTA, broad
    • Cons - black box, lack real world understanding
  • KG based NLP
    • Pros - curated, logical graph, no training, rich & deep representation
    • Cons - rigid, brittle, expensive manual curation

Real world use cases:

  1. COGITO - Expert NLP based on KG
    • Sentence split / parse
    • Morphological analysis
    • Sentence / logical / grammar analysis
    • Semantic analysis / disambiguation
  2. Vecsigrafo
    • Learns word & concept embeddings in shared space
    • Combines corpus based & graph based knowledge to build word representation
    • Uses & extends swivel algorithm
  3. Transigrafo - Transformers + KG


  • Get rid of data silos with KG
  • Use triple stores instead of RDBs
  • Use SHACL - Shapes Constraint Language
  • Use OWL + SHACL
  • Build 1 enterprise ontology

Session 6 - How do users interact with knowledge graphs?

Speakers: Amit Prakash, Chaomei Chen, Leilani Gilpin


Discussion on work at ThoughtSpot

  • ThoughtSpot success is mostly attributed to great UX. They took a year to crack the UX.
  • Stick to simple algorithms, they work 80% of the time
  • Figure out ways to collect data, labels with simple algorithms based on user feedback & interaction
  • Figure out success KPIs


Discussion on work at Drexel University

  • KG on top of research papers & citations
  • Temporal movement analysis
  • Pay attention to relation across domains. Look for bridges across clusters & their importance


Discussion on explaining explanations

  • Explainability ≠ Interpretability
  • Interpretable ⇒ understandable to humans
  • Completeness ⇒ describe operations in an accurate way
  • An explanation needs to be both interpretable & complete

Session 7 - What are some prevalent graph engines in industry?

Speakers: Philip Rathle, Brad Bebee, Matei Zaharia


  • Showcased Neo4j & customers - NASA, eBay, DZD

  • Spoke about the property graph way of modelling
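
As a rough illustration of the property graph model, here is a sketch with plain dataclasses. It does not use Neo4j or any graph database driver; the class names, field names and sample data are assumptions for the example. The key difference from plain triples is that a relationship can carry its own properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Node:
    id: str
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: str  # id of the start node
    type: str
    end: str    # id of the end node
    properties: Dict[str, Any] = field(default_factory=dict)


# Made-up sample data.
alice = Node("n1", "Person", {"name": "Alice"})
acme = Node("n2", "Company", {"name": "Acme"})
works_at = Relationship("n1", "WORKS_AT", "n2", {"since": 2018})


def to_triples(rel: Relationship, rel_id: str) -> List[Tuple[str, str, Any]]:
    """Express one relationship as plain triples.

    Because a triple has no room for edge properties, the relationship
    itself becomes a node (rel_id) that the properties hang off.
    """
    triples: List[Tuple[str, str, Any]] = [
        (rel_id, "start", rel.start),
        (rel_id, "type", rel.type),
        (rel_id, "end", rel.end),
    ]
    triples += [(rel_id, key, value) for key, value in rel.properties.items()]
    return triples


print(works_at)
print(to_triples(works_at, "r1"))
```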


Showcased AWS Neptune


Showcased Databricks & their graph framework

Highlighted Use Cases

  1. FINRA

    • Detect illegal trading activity
    • data source - 100 B events / day
    • 30 PB of historical data
  2. Drug Discovery - AstraZeneca

    Recommend new compounds to test using NLP, BERT, GNN.

  3. Network Security - Apple

    • Data source - logins, TCP, SSH
    • Find security threats
    • 100 TB / day
    • 300 B events / day
    • Leverage Delta Lake

Session 8 - What is the role of knowledge graphs in machine learning?

Speakers: Jure Leskovec, Luna Dong, Robert Hoffman


Discussion on reasoning in KGs using embeddings. KGs are heterogeneous graphs.

  • Traditional tasks - Link prediction / KG completion

    Obama born in US

    Obama nationality?

    learn projections, intersections & other operations.

  • Query2Box
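
The link prediction idea above can be sketched with a translation-style score, where a relation moves the head entity towards the tail in embedding space. Note that this is a simpler technique swapped in for illustration (in the spirit of TransE), not Query2Box, and the 2-D vectors are hand-picked rather than learned.

```python
import math

# Hand-picked toy embeddings (assumption: real ones are learned and much larger).
ENTITY = {
    "Obama": (0.0, 0.0),
    "US": (1.0, 0.0),
    "Merkel": (0.0, 3.0),
    "Germany": (1.0, 3.0),
}
RELATION = {
    "born_in": (1.0, 0.0),
    "nationality": (0.9, 0.1),
}


def distance(head, relation, tail):
    """Distance between (head + relation) and tail; lower is more plausible."""
    hx, hy = ENTITY[head]
    rx, ry = RELATION[relation]
    tx, ty = ENTITY[tail]
    return math.hypot(hx + rx - tx, hy + ry - ty)


def rank_tails(head, relation):
    """Rank every other entity as a candidate answer for (head, relation, ?)."""
    candidates = [e for e in ENTITY if e != head]
    return sorted(candidates, key=lambda tail: distance(head, relation, tail))


# "Obama born in US" is known; "Obama nationality ?" is the missing link.
print(rank_tails("Obama", "nationality"))
```

Because `born_in` and `nationality` point in almost the same direction here, knowing where someone was born makes the same country the top guess for nationality, which is the kind of regularity a trained model is expected to pick up from data.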


Discussion on Amazon Product Graph

Use KG + ML for search, Alexa, product recommendations

Knowledge Extraction involves

  • Knowledge Alignment
  • Knowledge cleaning
  • Knowledge mining

Session 9 - What are some high value use cases of knowledge graphs?

Speakers: Jay Yu, Apoorv Saxena, David Newman


Discussion on KG at Intuit.

Use case of Tax programming using logic graph & KG.


Finance data is mostly RDBMS

Risk assessment with company KG

Use Cases:

  • Link Traversal - How much % revenue comes from Boeing?
  • Page Rank
  • Community Detection
  • Link Prediction
  • Graph embedding
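
The link traversal use case can be sketched as a walk over "revenue comes from" edges. The supplier names and all percentages below are made-up toy numbers, not real financial data, and `exposure` is a hypothetical helper written for this example.

```python
# Made-up edges: (supplier, customer) -> share of the supplier's revenue
# that comes from that customer.
REVENUE_SHARE = {
    ("PartsCo", "Boeing"): 0.40,
    ("PartsCo", "EngineCo"): 0.20,
    ("EngineCo", "Boeing"): 0.50,
    ("EngineCo", "OtherAir"): 0.50,
}


def exposure(company, target, shares, _seen=None):
    """Fraction of `company` revenue that traces back to `target`.

    Counts direct sales plus indirect sales through intermediate customers.
    `_seen` guards against cycles in the graph.
    """
    seen = (_seen or frozenset()) | {company}
    total = 0.0
    for (supplier, customer), share in shares.items():
        if supplier != company or customer in seen:
            continue
        if customer == target:
            total += share
        else:
            total += share * exposure(customer, target, shares, seen)
    return total


print(f"{exposure('PartsCo', 'Boeing', REVENUE_SHARE):.0%}")
```

In this toy graph PartsCo sells to Boeing directly and also through EngineCo, so the indirect path adds to the direct share. In a relational schema the same question needs recursive joins, which is one reason a graph model is attractive for this kind of risk assessment.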

Leveraging BERT to translate natural language to Cypher query

Session 10 - What are some open research questions on knowledge graphs?

Speakers: Richard Socher, Mark Musen, RV Guha


Deep Dive on real world use cases of KG with ML

Neural Tensor Network for knowledge base completion

How to think about multi-hop link prediction / reasoning models?


Contrarian view - What do KGs really know?

Spoke about the MYCIN use case, developed in the 1970s, which was SoTA back then.

KGs were earlier known as semantic networks. Fast forward 50 years and we are in the same place.