When Google crawls the Web, it collects information about the content of the pages it finds, as well as the links on those pages. How much information does it collect about facts on the Web? Microsoft showed off an object-based search about 10 years ago, in the paper Object-Level Ranking: Bringing Order to Web Objects.
The team from Microsoft Research Asia tells us in that paper:
Existing Web search engines generally treat a whole Web page as the unit for retrieval and consuming. However, there are various kinds of objects embedded in the static Web pages or Web databases. Typical objects are products, people, papers, organizations, etc. We can imagine that if these objects can be extracted and integrated from the Web, powerful object-level search engines can be built to meet users’ information needs more precisely, especially for some specific domains.
This patent from Google focuses on extracting factual information about entities on the Web. It’s an approach that goes beyond building the Web index Google is known for, because it collects facts along with how those facts relate to one another. The patent tells us:
Information extraction systems automatically extract structured information from unstructured or semi-structured documents. For example, some information extraction systems …
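The excerpt above doesn't show how the patent actually does its extraction. As a rough illustration of what "extracting structured information from semi-structured documents" can mean, here is a minimal Python sketch. It assumes a hypothetical page fragment that lists one "Attribute: Value" pair per line, like a simple infobox, and turns each pair into an (entity, attribute, value) fact. The function name `extract_triples` and the sample entity are invented for this example; this is not Google's method, and real extraction systems are far more involved.

```python
import re
from typing import List, Tuple

# Hypothetical assumption: semi-structured text with one "Attribute: Value"
# pair per line, similar to a simple infobox on a Web page.
PAIR_PATTERN = re.compile(r"^\s*([A-Za-z][\w ]*?)\s*:\s*(.+?)\s*$")

Triple = Tuple[str, str, str]


def extract_triples(entity: str, text: str) -> List[Triple]:
    """Turn 'Attribute: Value' lines into (entity, attribute, value) facts.

    Lines without a colon-separated pair (free text, blank lines) are skipped.
    """
    triples: List[Triple] = []
    for line in text.splitlines():
        match = PAIR_PATTERN.match(line)
        if match:
            # Normalize attribute names so "Born" and "born" compare equal.
            attribute = match.group(1).lower()
            value = match.group(2)
            triples.append((entity, attribute, value))
    return triples


if __name__ == "__main__":
    snippet = """
    Born: 1942
    Occupation: Physicist
    This line is free text and is skipped.
    """
    for fact in extract_triples("Example Person", snippet):
        print(fact)
```

Storing facts as triples keyed by entity is one common way to let separately extracted facts about the same object be merged later, which is the kind of object-level view the Microsoft paper describes.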