Difference between revisions: 00:58, 13 February 2005 by Netoholic (edit summary: rvt) and 07:52, 13 February 2005 by Jamesday (edit summary: Add a technical overview of how pages are built and flushed from cache to provide some background information. Change one use of Squid to just cache, to include parser cache).
Revision as of 07:52, 13 February 2005
- This is a proposed new Wikipedia policy or guideline as part of Wikipedia:Policy thinktank
Template messages allow certain standard text to be included on many pages, usually with the idea that any future change to that text can be made in one place. "Meta-templates", as used in this article, are templates that are created and used to keep other templates in a standard format. This page is an attempt to gather information supporting the assertion that so-called "meta-templates" (or master templates) are harmful to Wikipedia. In other contexts, the term "meta-templates" may refer to templates containing "meta" information (i.e., the templates do not add content to an article but rather describe or classify the article in some way that is relevant only in the context of Wikipedia).
Convenience is often cited as the reason for using meta-templates, as it is for all templates. Certainly, they offer a central place from which to effect changes across a large number of articles.
Examples of meta-templates currently in use on Wikipedia:
- Template:Message box - currently used on many top-of-page notice templates, such as Template:NPOV and Template:FAC.
- Template:Metastub & Template:MetaPicstub - used as part of the Wikipedia:WikiProject Stub sorting and Wikipedia:Stub categories to facilitate topical stub messages.
How a page is built and cached
Here's some technical background which may be of use. Jamesday 07:52, 13 Feb 2005 (UTC)
When a page has been removed from the caches, the next view of the page triggers several steps:
- Each item in the base page portion is requested from the database (images and CSS aren't in the main part): the page you edit, each template, each template included in that template, and so on. Two templates mean two database records to be retrieved. One template on its own is one read; one template including another is two. Add one more read for the base page.
- Once that and the rest of what is called the parsing is done, the page is saved in the parser cache, which is kept in RAM in memcached.
- Finally the skin is applied and the page is passed on to the Squids, which send it on to the person who originally requested it and cache it in RAM and on disk (for larger capacity at the cost of slower access) for all who aren't logged in (the cached copy will only be useful if it's the normal skin).
- Whenever any part of a page is changed, be it the page itself or a template or image used in it:
  - The page is marked as changed ("touched") and will be regenerated the next time it is requested. It is removed from both the Squid and parser caches. This is necessary so that people see the correct version.
  - The touching process involves a database update to every affected page, which for many pages can produce "replication lag": outdated information displayed to those using the page. This also shows up as slower response times when the lag is less than ten seconds. The effect is minimised by affecting small numbers of pages and maximised by affecting a large number, in part because the wait of up to ten seconds makes batches in the few-thousand range effectively invisible except for a delay in page load times. Touching about 18,000 pages currently takes a database slave with 4GB of RAM, a 6-drive RAID 10 array and a write-caching disk controller some 90 seconds (that's from a real touch operation).
    - Assuming even distribution and 8 templates instead of one meta-template, each of the 8 edits would flush one eighth of the pages from cache. On the database side, let's look at that 90-second case:
    - 90/8 = 11.25 seconds, call it 12.
    - Those who view in the first 2 seconds will wait ten seconds, then see out-of-date information.
    - Those who view in the remaining ten seconds will see a delay of up to ten seconds and then completely current data.
    - So splitting it has removed most of the visible lag and the visible problem.
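The arithmetic in the 90-second example can be sketched in a few lines of Python. This is only an illustrative model, not how the servers compute anything: it assumes a viewer waits at most ten seconds for the database slave to catch up, sees stale data if the remaining lag is longer than that, and sees current data otherwise. The function name `lag_windows` and its parameters are made up for this sketch.

```python
import math

# Assumption taken from the text: a viewer waits up to ten seconds for the
# slave to catch up before being shown out-of-date information.
MAX_WAIT_SECONDS = 10


def lag_windows(total_touch_seconds, batches, max_wait=MAX_WAIT_SECONDS):
    """Split one long touch operation into equal batches (hypothetical model).

    Returns (seconds_per_batch, stale_window, delayed_window):
      - seconds_per_batch: batch duration, rounded up to a whole second
      - stale_window: seconds at the start of a batch during which a viewer
        waits the full max_wait and still sees out-of-date data
      - delayed_window: seconds during which a viewer waits at most max_wait
        and then sees current data
    """
    per_batch = math.ceil(total_touch_seconds / batches)
    stale_window = max(0, per_batch - max_wait)
    delayed_window = min(per_batch, max_wait)
    return per_batch, stale_window, delayed_window


one_meta_template = lag_windows(90, 1)    # a single edit touching every page
eight_templates = lag_windows(90, 8)      # eight separate edits, 1/8 each
```

Under this model the single 90-second touch leaves an 80-second window of stale views, while each of the eight smaller touches leaves only the 2-second window described above.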
The Squids do much less work per page because of the limitations in the way they can work, so they are inherently the fastest way of serving pages, and just 4 machines can serve some 75% of all hits to the site. But they are restricted in what they can serve. The next step is using the parser cache via the Apache web servers. That allows all of the user settings for logged-in people but uses more web server CPU, so it's much less efficient.
We could switch everyone to using the Apaches, but that would be far less efficient and would require something like 4-5 times as many Apaches and database servers as we have today, far more than the 4 machines gained by not using them as Squids. And the page views would be slower, because it's an inherently slower process.
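The order in which the layers are tried can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the site's actual code: the two caches are plain dictionaries, and `serve`, `parse` and `apply_skin` are hypothetical names standing in for the Squid layer, the database reads plus parsing, and the skin step.

```python
def parse(title):
    """Stand-in for the slow step: database reads plus parsing (hypothetical)."""
    return f"<p>{title}</p>"


def apply_skin(parsed):
    """Stand-in for wrapping parsed content in the skin (hypothetical)."""
    return f"<html>{parsed}</html>"


def serve(title, logged_in, squid, parser_cache):
    """Return (html, source), where source names the layer that did the work."""
    # Fastest path: anonymous readers can be served straight from the Squids.
    if not logged_in and title in squid:
        return squid[title], "squid"

    # Next: the parser cache, reached through the web servers.
    if title in parser_cache:
        parsed, source = parser_cache[title], "parser cache"
    else:
        # Slowest path: rebuild the page and remember the parsed result.
        parsed, source = parse(title), "full parse"
        parser_cache[title] = parsed

    html = apply_skin(parsed)
    if not logged_in:
        # Only anonymous views are stored in the Squid layer.
        squid[title] = html
    return html, source


squid, parser_cache = {}, {}
first = serve("Example", False, squid, parser_cache)   # nothing cached yet
second = serve("Example", False, squid, parser_cache)  # anonymous repeat view
third = serve("Example", True, squid, parser_cache)    # logged-in view
```

In this toy run the first view needs a full parse, the second is answered by the Squid dictionary, and the logged-in view skips the Squid layer but still benefits from the parser cache.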
While all template use causes an extra database read and flushes all pages using it, meta-templates are a special case because they use twice the number of database queries and can cause flushing of many more pages than other templates. If a meta-template is used in only a few or perhaps a few dozen templates which are fundamentally unrelated, it's debatable whether the extra equipment costs are worth it, compared to the relatively modest work involved in updating the individual templates. The replication lag issue can't so easily be addressed: that work is done by however many database servers are purchased. We're trying to reduce the effect, but updates affecting many pages are inherently more problematic in this area than those affecting fewer.
Harmful effects
Server load
In normal cases, a single template entry is checked each time a page is displayed. When using a meta-template, each non-cached view adds one additional call. That is, one template within another doubles the database work of using the template compared to putting it all in one.
For templates that include parameters, this operation also adds incrementally to the processing time.
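The doubling can be made concrete by counting reads for a non-cached view. The following Python sketch assumes one database read per page or template inclusion, no de-duplication of repeated inclusions, and no circular includes; the `includes` mapping and the names in it are hypothetical.

```python
def count_reads(page, includes):
    """Count database reads needed to build `page` on a non-cached view.

    `includes` maps a page or template name to the templates it pulls in.
    Assumptions of this sketch: one read per inclusion, no caching between
    inclusions, and no template that (directly or indirectly) includes itself.
    """
    return 1 + sum(count_reads(name, includes) for name in includes.get(page, []))


# A page using one self-contained template: base page + template.
flat = {"Article": ["Template:Some-stub"]}

# The same page when the template is itself built on a meta-template.
nested = {
    "Article": ["Template:Some-stub"],
    "Template:Some-stub": ["Template:Meta"],
}

flat_reads = count_reads("Article", flat)
nested_reads = count_reads("Article", nested)
```

Here the template-related reads go from one to two, on top of the single read for the base page.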
Each time a page is saved, all of the links from it are refreshed, including the reference to the template (and meta-template) being used. So when I update a page like Republic of Canada, the database creates all the normal links, and one to Template:Canada-stub and one to Template:MetaPicstub. Look at Special:Whatlinkshere/Template:MetaPicstub: none of those articles are actually calling it directly; they are just "false links". Because all links are purged and re-created on page save, it takes a lot more time and server resources than even the database read in the first place.
When a template is changed, all pages involving that template are marked as invalid and uncacheable (by changing the database entry of every page which uses the template) and must be purged from the cache servers. On the next view of each of those pages, the Apache web servers must carry out the slow operation of rebuilding the page, increasing the load on them and slowing down the site. When individual work templates are involved, the flushing is limited to only the individual work concerned. When a meta-template is changed, the effect purges the cache of thousands of pages.
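How a change fans out can be sketched as a walk over a "what links here" style mapping. This is an illustration only: the `used_by` dictionary and the function `pages_to_touch` are hypothetical, and apart from the three names mentioned above the entries are invented placeholders.

```python
from collections import deque


def pages_to_touch(changed, used_by):
    """Return every page that directly or indirectly includes `changed`.

    `used_by` maps a template name to the pages and templates that include it.
    The result contains intermediate templates as well as articles, since in
    this sketch both count as pages to be marked as changed.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


used_by = {
    "Template:MetaPicstub": ["Template:Canada-stub", "Template:Other-stub"],
    "Template:Canada-stub": ["Republic of Canada", "Placeholder article A"],
    "Template:Other-stub": ["Placeholder article B"],
}

child_change = pages_to_touch("Template:Canada-stub", used_by)
meta_change = pages_to_touch("Template:MetaPicstub", used_by)
```

Editing the child template touches only its own two articles, while editing the meta-template reaches every child template and every article beneath them.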
Vandalism
Meta-templates that are featured on a very high percentage of pages are an excellent denial-of-service attack vector, since changing one, or any component used in it, would flush a substantial percentage of the site caches, which are critical to site performance and normally serve some 75-80% of all hits. Even one subtle change, like the addition of a space, causes the effect.
In addition to the specific update issues for each template, the template's Whatlinkshere, image description pages and all other components involved in the template become denial-of-service attack vectors, since they can involve reading very large numbers of database entries or updating very large numbers of pages when they are changed.
Of course, more obvious forms of direct vandalism are possible when using these meta-templates.
Scope creep
The existence of some meta-templates has resulted in an explosion of related "child" templates. This is most often because creating a new one is simple given the existence of the meta-template. The primary example of this is Wikipedia:Stub categories, which lists templates created using Template:Metastub & Template:MetaPicstub. There are literally hundreds of these stub templates, many of which have been created on the spur of the moment for use on only a few articles and are not even documented on the main page. Although each new one was probably created in good faith, the sheer number of these templates has grown tremendously. Already, many of these new "child" templates have come up for deletion.
One of the almost paradoxical effects of this growth is that, as more and more "child" templates are created, there is often less of a chance for any particular change to be made to the meta-template, because of the wider effect.
Alternatives
- Design, document, and implement - In the case of Wikipedia:Sister projects, a proposal was made to use a meta-template. In this area, it is much better to decide on a common look and format, and then implement it across the few templates being used. When there is consensus for a change, interested editors can apply it. Creating a page which displays each template also helps to locate templates which don't follow the standard format.
- Use lists, not templates and categories - In the case of Wikipedia:Stub categories, if the goal is to assist people with specific topic interests in finding articles that need work, then create lists of articles to be worked on. Many WikiProjects also maintain an area for reporting articles which need work.
- Fix the problem, don't mark it with a template - If more effort were made to correct the problems which usually call for a notice or template, fewer templates would be needed, less often.
- CSS - Some meta-templates serve as glorified stylesheets. If these were identified, CSS classes could be added to the site's global stylesheets. Then, those meta-templates could be replaced with the CSS classes in relevant Wikipedia pages. This would accomplish the same purpose, maintaining uniform style across the site, without placing a burden on the server.