This article describes the relationship between entities and expertise, authority and trust (E-A-T).

Post by **Reddi2** » Thu Jan 30, 2025 8:56 am

This scientific paper from Google deals with how to determine the trustworthiness of online sources. In addition to analyzing links, a new method is presented that is based on checking the accuracy of the information published.

We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.

For this purpose, data mining methods are used, which I have already discussed in detail in the articles "How can Google identify and interpret entities from unstructured content?" and "The role of natural language processing for data mining, entities & search queries".

We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources.

The current approach of assessing the credibility of sources from links and browser data on website usage has a weakness: less popular sources are at a disadvantage and are unfairly neglected, even though they may provide very good information.

This approach allows sources to be rated with a "trustworthiness score" without taking popularity into account. Websites that frequently provide false information are downgraded. Websites that publish information in line with the general consensus are rewarded. This also reduces the likelihood that websites that attract attention through fake news will gain visibility on Google.
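The idea of scoring a source by the correctness of its facts, independent of popularity, can be illustrated with a deliberately simplified sketch. It is not the model from the paper: here the "truth" for each fact is assumed to be the value claimed by the most sources (a plain majority vote), and the score is the share of a source's claims that match it. The site names, the example claims and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical extracted claims: (source, subject, predicate, value).
claims = [
    ("site-a.example", "Eiffel Tower", "located_in", "Paris"),
    ("site-b.example", "Eiffel Tower", "located_in", "Paris"),
    ("site-c.example", "Eiffel Tower", "located_in", "Lyon"),
    ("site-a.example", "Paris", "capital_of", "France"),
    ("site-c.example", "Paris", "capital_of", "France"),
]

def consensus_values(claims):
    """Assumption: the most frequently claimed value per fact counts as correct."""
    votes = defaultdict(Counter)
    for _source, subject, predicate, value in claims:
        votes[(subject, predicate)][value] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

def trust_scores(claims):
    """Share of each source's claims that agree with the consensus value."""
    truth = consensus_values(claims)
    correct = Counter()
    total = Counter()
    for source, subject, predicate, value in claims:
        total[source] += 1
        if truth[(subject, predicate)] == value:
            correct[source] += 1
    return {source: correct[source] / total[source] for source in total}

scores = trust_scores(claims)
for source, score in sorted(scores.items()):
    print(f"{source}: {score:.2f}")
```

Note that no link or traffic data enters the score: a small site with only correct claims receives the same score as a large one.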

Producing a ranking for pages using distances in a web-link graph
The latest version of this Google patent dates from 2017, and its status is active. It describes how a ranking score can be computed for linked documents based on their proximity to selected seed websites. The seed pages themselves are individually weighted.

In a variation on this embodiment, a seed page s_i in the set of seed pages is associated with a predetermined weight w_i, wherein 0 < w_i ≤ 1. Furthermore, the seed page s_i is associated with an initial distance d_i, wherein d_i = −log(w_i).
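To make the weight-to-distance relationship concrete, here is a small sketch that converts seed weights into initial distances. The seed names and weights are invented for illustration, and the natural logarithm is an assumption, since the quoted passage does not state the base.

```python
import math

# Hypothetical seed pages with predetermined weights in the range (0, 1].
seed_weights = {
    "seed-a.example": 1.0,
    "seed-b.example": 0.5,
    "seed-c.example": 0.1,
}

def initial_distance(weight):
    """Return -log(weight), written as log(1 / weight), which is the same value.

    Assumption: natural logarithm; the patent excerpt does not name the base.
    """
    if not 0 < weight <= 1:
        raise ValueError("seed weight must satisfy 0 < w <= 1")
    return math.log(1 / weight)

initial_distances = {page: initial_distance(w) for page, w in seed_weights.items()}
for page, distance in initial_distances.items():
    print(f"{page}: weight={seed_weights[page]} distance={distance:.3f}")
```

A seed with the maximum weight of 1 starts at distance 0, and the lower the weight, the larger the starting distance, so pages reached only through weakly weighted seeds begin further away.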

The seed pages themselves are of high quality and the sources are highly credible. The following can be read about these pages in the patent:

In one embodiment of the present invention, seeds 102 are specially selected high-quality pages which provide good web connectivity to other non-seed pages. More specifically, to ensure that other high-quality pages are easily reachable from seeds 102, seeds in seeds 102 need to be reliable, diverse to cover a wide range of fields of public interests, as well as well-connected with other pages (i.e., having a large number of outgoing links). For example, Google Directory and The New York Times are both good seeds which possess such properties. It is typically assumed that these seeds are also “closer” to other high-quality pages on the web. In addition, seeds with large number of useful outgoing links facilitate identifying other useful and high-quality pages, thereby acting as “hubs” on the web.

According to the patent, these seed pages must be selected manually and the number should be limited to prevent manipulation. The length of a link between a seed page and the document to be ranked can be determined using the following criteria:

Position of the link
Font of the link
Degree of thematic deviation of the source page
Number of outgoing links of the source page

Interestingly, pages that cannot be reached from at least one seed page, directly or indirectly, are not included in the scoring at all.

Note that however, not all the pages in the set of pages receive ranking scores through this process. For example, a page that cannot be reached by any of the seed pages will not be ranked.
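The distance-based ranking and the exclusion of unreachable pages can be sketched with a standard shortest-path search (Dijkstra's algorithm) started from the seeds. This is only an illustration under assumptions: the link lengths are made-up numbers rather than values derived from the criteria listed above, the page names are hypothetical, and ordering pages by ascending distance stands in for whatever score function the patent actually applies.

```python
import heapq

# Hypothetical link graph: page -> {linked page: link length}.
graph = {
    "seed.example": {"a.example": 1.0, "b.example": 3.0},
    "a.example": {"b.example": 1.0},
    "b.example": {"c.example": 2.0},
    "island.example": {"c.example": 1.0},  # no seed links to this page
}

# Seeds start with their initial distance (0.0 corresponds to a weight of 1).
seed_distances = {"seed.example": 0.0}

def shortest_distances(graph, seed_distances):
    """Dijkstra's algorithm with several possible start nodes (the seeds)."""
    dist = dict(seed_distances)
    heap = [(d, page) for page, d in seed_distances.items()]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for target, length in graph.get(page, {}).items():
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist

distances = shortest_distances(graph, seed_distances)
# Assumption: a shorter distance means a better rank.
ranked = sorted(distances, key=distances.get)
print(ranked)
print("island.example" in distances)
```

In this sketch `island.example` links out to a ranked page but is not reachable from any seed, so it never receives a distance and is left out of the ranking, which mirrors the behaviour described in the quoted passage.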