forum.opennet.ru

Original message

"Fundraising to support the OpenNET news feed in 2019..."
Posted by Ordu, 09-Apr-19 20:01

I asked Google, which sent me to Stack Overflow, where there is this answer:
> You need at least one <p> tag around the text you want to see in Reader View, and at least 516 characters in 7 words inside the text.
https://stackoverflow.com/questions/30661650/how-does-firefo...
And another answer from the same thread:
I followed Martin's link to the Readability.js GitHub repository, and had a look at the source code. Here's what I make of it.
The algorithm works with paragraph tags. First of all, it tries to identify parts of the page which are definitely not content - like forms and so on - and removes them. Then it goes through the paragraph nodes on the page and assigns a score based on content-richness: it gives them points for things like number of commas, length of content, etc. Notice that a paragraph with fewer than 25 characters is immediately discarded.
Scores then "bubble up" the DOM tree: each paragraph adds part of its score to all of its ancestor nodes - a direct parent gets the full score added to its total, a grandparent only half, a great-grandparent a third and so on. This allows the algorithm to identify higher-level elements which are likely to be the main content section.
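To make the scoring and "bubbling" concrete, here is a rough Python sketch of what the answer describes. It is not the Readability.js code: the `Node` class, the function names and the exact weights in `paragraph_score` are my own assumptions for illustration. Only the 25-character cutoff and the full / half / third split come from the text above.

```python
from __future__ import annotations
from dataclasses import dataclass, field

MIN_PARAGRAPH_CHARS = 25  # cutoff mentioned in the quoted answer


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def paragraph_score(text: str) -> int:
    # Assumed weights: one base point, one per comma,
    # one per 100 characters (capped at 3).
    return 1 + text.count(",") + min(len(text) // 100, 3)


def iter_nodes(root: Node):
    yield root
    for child in root.children:
        yield from iter_nodes(child)


def score_candidates(root: Node) -> dict[Node, float]:
    scores: dict[Node, float] = {}
    for node in iter_nodes(root):
        if node.tag != "p":
            continue
        text = node.text.strip()
        if len(text) < MIN_PARAGRAPH_CHARS:
            continue  # short paragraphs are discarded
        base = paragraph_score(text)
        # Bubble up: parent gets base/1, grandparent base/2, then base/3, ...
        ancestor, level = node.parent, 1
        while ancestor is not None:
            scores[ancestor] = scores.get(ancestor, 0.0) + base / level
            ancestor, level = ancestor.parent, level + 1
    return scores


def best_candidate(root: Node) -> Node | None:
    scores = score_candidates(root)
    return max(scores, key=scores.get) if scores else None


body = Node("body")
nav = body.add(Node("nav"))
nav.add(Node("p", "Home, About"))  # under 25 characters, ignored
article = body.add(Node("article"))
article.add(Node("p", "Reader View looks for paragraphs, commas, and length."))
article.add(Node("p", "A second paragraph, also long enough, adds to the same parent."))
print(best_candidate(body).tag)  # article
```

Both long paragraphs give their full score to `article` and only half of it to `body`, so the direct container wins.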
Though this is just Firefox's algorithm, my guess is if it works well for Firefox, it'll work well for other browsers too.
In order for these Reader View algorithms to work for your website, you want them to correctly identify the content-heavy sections of your page. This means you want the more content-heavy nodes on your page to get high scores in the algorithm.
So here are some rules of thumb to improve the quality of the page in the eyes of these algorithms:
    Use paragraph tags in your content! Many people tend to overlook them in favor of <br /> tags. While it may look similar, many content-related algorithms (not only Reader View ones) rely heavily on them.
    Use HTML5 semantic elements in your markup, like <article>, <nav>, <section>, <aside>. Even though they're not the only criterion (as you noted in the question), these are very useful to computers reading your page (not just Reader View) to distinguish different sections of your content. Readability.js uses them to guess which nodes are likely or unlikely to contain important content.
    Wrap your main content in one container, like an <article> or <div> element. This will receive score points from all the paragraph tags inside it, and be identified as the main content section.
    Keep your DOM tree shallow in content-dense areas. If you have a lot of elements breaking your content up, you're only making life harder for the algorithm: there won't be a single element that stands out as being the parent of a lot of content-heavy paragraphs, but many separate ones with low scores.
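
These rules of thumb can be checked roughly with a small script. The sketch below uses Python's standard `html.parser` to count `<p>` and `<br>` tags, note which semantic elements appear, and record how deeply the paragraphs are nested. The class name, the `audit` function and the report keys are made up for illustration, and it assumes markup with explicit closing tags (an unclosed `<p>` would throw the depth count off).

```python
from html.parser import HTMLParser

SEMANTIC_TAGS = {"article", "nav", "section", "aside"}
# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link",
             "area", "base", "col", "embed", "source", "track", "wbr"}


class MarkupAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.paragraphs = 0
        self.line_breaks = 0
        self.semantic = set()
        self.max_paragraph_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.line_breaks += 1
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag == "p":
            self.paragraphs += 1
            self.max_paragraph_depth = max(self.max_paragraph_depth, self.depth)
        if tag in SEMANTIC_TAGS:
            self.semantic.add(tag)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def audit(html: str) -> dict:
    parser = MarkupAudit()
    parser.feed(html)
    parser.close()
    return {
        "paragraphs": parser.paragraphs,
        "line_breaks": parser.line_breaks,
        "semantic": sorted(parser.semantic),
        "max_paragraph_depth": parser.max_paragraph_depth,
    }


page = "<body><article><p>First.</p><p>Second.</p></article></body>"
print(audit(page))
```

A page with many line breaks, no paragraphs and no semantic elements is the kind of markup the list above warns about. What counts as "too deep" is a judgment call, so the script only reports the number.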


Created 1996-2024 by Maxim Chirkov