Revision as of 20:51, 28 December 2009
Manual of Style
Misplaced Pages Help Project-class
This is the talk page for discussing improvements to the Citing sources page.

Archives: Index, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56. Auto-archiving period: 14 days
Retrieval dates: redundant for sources with official publication dates?
This subject keeps coming up. There are extensive discussions in the archive: 1, 2. Please add new comments here, not in the archive. --EnOreg (talk) 14:20, 27 April 2009 (UTC)
Replacing duplicate footnotes with named footnotes
Note: Discussion was started by User:Hegvald at the Village Pump (Policy) page at 17:43, 14 November 2009. User:CBM started a new discussion at Misplaced Pages talk:Citing sources later that same day. The two discussions went on in parallel for a few days until CBM moved the original discussion to this page. It is now in the section in the blue box below. --Hegvald (talk) 11:05, 22 November 2009 (UTC)
Link to discussion from July 2009
- Btw, the initial discussion on list-defined refs, with good points made on both sides, is at Wikipedia_talk:Citing_sources/Archive_26#Improving_.3Cref.3E. - Dank (push to talk) 17:16, 15 November 2009 (UTC)
Refname footnotes
- Moved from Village pump (policy) to consolidate discussion — Carl (CBM · talk) 17:50, 19 November 2009 (UTC)
- Moved to top of section and put in coloured box in order to clarify chronology. --Hegvald (talk) 10:58, 22 November 2009 (UTC)
I don't like these. At all. So far I have had to revert changes to the Charles Boit article three times, after clueless people or bots changed the referencing, apparently taking for granted that this system is somehow inherently and obviously superior. It is not. It introduces unnecessary complexity to the wiki "code" and makes it more difficult to edit and source articles.
I just have to ask: is there really a general consensus that these are good and useful, or is it just that a bunch of computer geeks like them and use their bots and scripts to force them on everyone else? --Hegvald (talk) 17:43, 14 November 2009 (UTC)
- I don't like them. It makes the footnote numbers out of order within the article. It prevents people from combining multiple supporting references for the same sentence into a single footnote. And it prevents (or at least discourages) annotations from being added to each footnote to explain or elaborate upon how the reference supports the statement in the article. I think its only benefit is labor-saving, which should never trump other concerns. It's one thing for an article's editors to agree to use them in a given context (though I would prefer a guideline discouraging it). But we certainly shouldn't have bots running around imposing them on every article. postdlf (talk) 17:53, 14 November 2009 (UTC)
- Personally, I completely disagree with the criticisms of list-defined references here. I mean... I just find that everything said in criticism of them is simply not true, in my view. That being said, opinion in this area is decidedly split. I think that we're going to have to "impose" a rule on everyone not to change the existing system on pages, due to the fact that opinions are so split. Those of us who like them can add them to articles that are created, and those of you who don't like them will simply have to live with it when you do come across them.
- AWB and SmackBot (along with all of the other AWB bots and users) are a fairly serious problem in relation to LDRs, however. I've been tempted for a while now to bring the issue up and, if necessary, to temporarily de-authorize AWB's use site-wide until they get up to speed with the change. That a tool is not up to date with the site software is a completely unacceptable situation, to me.
— V = I * R (talk to Ω) 18:19, 14 November 2009 (UTC)
- Hey, wait... you're just talking about using the name parameter? what in the world is the problem with that? Using the name parameter doesn't do any of the things that you guys are talking about... I don't get it.
— V = I * R (talk to Ω) 18:24, 14 November 2009 (UTC)
Regardless whether these new footnotes are better or not, no editors should be going around in an automated way switching articles from one referencing style to another. As WP:CITE says, "You should follow the style already established in an article, if it has one; where there is disagreement, the style used by the first editor to use one should be respected." It is extremely uncommon that there is a need to change an article from one referencing style to another. — Carl (CBM · talk) 18:27, 14 November 2009 (UTC)
- It turns out that no one is. I'm not 100% clear on what Postdlf is complaining about, but Hegvald at least is simply talking about using the name parameter with references. I don't really understand his complaint, since the name parameter is often essential to the proper organization of references in this sort of instance, so I'm hoping that he'll follow up with more info on his position here.
— V = I * R (talk to Ω) 18:29, 14 November 2009 (UTC)
- Actually SmackBot seems to be doing this. I want to wait a little while, but I may block the bot temporarily if it starts a new task that includes this non-feature. — Carl (CBM · talk) 18:34, 14 November 2009 (UTC)
- Well right, but it's doing that because that's what WP:CITE says to do. That's the way that references are supposed to be formed, if they are able to be.
— V = I * R (talk to Ω) 18:37, 14 November 2009 (UTC)
- WP:CITE says to use the style already established. If the established style does not use this sort of named footnotes, then switching to it is a change. Later, WP:CITE says, "Optionally, one may add the name attribute by using <ref name="name">details of the citation</ref>. Thereafter, the same footnote may be used multiple times by adding <ref name="name"/>." This makes it more clear that using named references is an option that a particular page might or might not adopt. Automated processes such as bots, in particular, should not be changing references from one optional style to another. — Carl (CBM · talk) 18:42, 14 November 2009 (UTC)
- "Optionally" here refers to the fact that a name parameter can be added to any ref, including non-repeating ones. This we did at Fort Hood shootings, for example, where we wanted a clear scheme for managing the references. AWB does not add names to non-repeating references. Rich Farmbrough 19:13 14 November 2009 (UTC).
- Since one can simply repeat the same reference twice, using the name parameter even for that sort of reference is also optional. In general people should not be making widespread changes to referencing styles, whether with AWB, by bot, or by hand. — Carl (CBM · talk) 19:19, 14 November 2009 (UTC)
- we're not talking about style here though, we're talking about correct usage of cite.php. WP:CITE is talking about changing {{ref}}, {{note}}, etc... template use to use cite.php references instead. There is clear and widespread consensus that organizing references to eliminate duplicates, with the added benefit of reducing the size and adding organization to reference lists, is a maintenance task which we should all accomplish whenever we are able to. AWB added the ability to add the name parameter to references more than a year ago, with broad support to do so, and that feature is now a part of the "general fixes" that it will run all the time. All of this is documented on the AWB pages, in the bot working group, and someplace on cite (at least, it used to be. Maybe someone has changed it recently?)
— V = I * R (talk to Ω) 18:50, 14 November 2009 (UTC)
- If the editors of an article feel that the citation style of the article includes not using the "name" parameter, then nobody should be adding the "name" parameter, automatically or otherwise. Apparently Hegvald and postdlf feel this way. This is the "clear and widespread consensus" about not making changes from one optional style to another that is described both at WP:CITE and at WP:MOS. As I pointed out on user talk:SmackBot, I didn't find any bot approval for reformatting references. Editors using AWB manually are a separate problem; I often see them make changes that they should not be making, although I only revert them on articles that I already follow. — Carl (CBM · talk) 19:00, 14 November 2009 (UTC)
Well, Hegvald does have a point, because the refs in that particular article are things like "Smith p.34", so replacing them with name=Smith p.34 is of marginal if any benefit. I will drop a feature request for AWB to ignore refs that will be equivalent to their name. Rich Farmbrough 19:25 14 November 2009 (UTC).
I have to side with User:Ohms law on this one: it's not changing the style, it's just replacing duplicate refs with a named ref. I would think that'd be a good thing, because if a ref needs updating you don't have to go through and update 20-odd references. Q 19:31, 14 November 2009 (UTC)
- That is a change in style nevertheless, as repeating each reference whenever it is used is a plausible choice that can be made by the editors of an article (and was, apparently, made by the first two editors in this thread). Can you point to any bot approval for SmackBot to change references? That seems like exactly the sort of thing that requires human discretion and would not be given approval as a bot task. — Carl (CBM · talk) 19:37, 14 November 2009 (UTC)
- (e/c) While I generally agree that the existing style should be maintained throughout the article, I take exception when the exact same footnote is being used in multiple places. The name parameter should be used if using the same source more than once. This is not only for whatever minor performance issues there may be (duplicating the reference adds unnecessary length and file size to the page), but for maintenance reasons as well. Dealing with dead links, archived versions, malformed text or other problems in a reference should not need to be done multiple times on the same article. In this particular case, though, it seems almost moot because of the use of the shortened notes + references system. But this is not a change in style, simply adding a parameter that is used in the current style. Jim Miller 19:40, 14 November 2009 (UTC)
- If this is correct, the wording of WP:CITE should be changed to require, or at least encourage, named footnotes when there are duplicates. At the moment the matter is treated as completely optional. The fundamental reason that we avoid changing from one optional style to another is to avoid having people discuss back and forth about them. If replacing duplicates is clearly worthwhile, WP:CITE should reflect that. However, WP:CITE has historically been written to give editors an extremely broad amount of freedom about exactly how to handle references in each article. — Carl (CBM · talk) 19:44, 14 November 2009 (UTC)
- I'll start the discussion at WT:CITE. — Carl (CBM · talk) 19:45, 14 November 2009 (UTC)
- I don't think this should be discussed anywhere else. I suspect that whatever imagined consensus people may have thought existed is based on the fact that this has previously mostly been aired among people who are already mostly interested in the technical aspects of referencing. It is likely to reach a wider group of people right here. --Hegvald (talk) 20:08, 14 November 2009 (UTC)
The issues Jim Miller mentions are better served by using an alphabetic list of references at the end of the article for complete bibliographic data on the sources. Then, if a URL or something changes, you only have to change it once. In any case, an article published in 1925 or 1933 is not usually affected by things like dead links. Even if a link to an online facsimile goes dead, the Misplaced Pages article should contain enough bibliographic information to find the original print publication in a library. The text remains the same.
That said, I will allow for different needs in different types of articles. An article on a recent event may well need a different technical solution from an article on an 18th century painter. I also think this may be a science vs. humanities issue. The only place I have ever seen this type of non-consecutive footnoting outside Misplaced Pages was in a science journal. In publications in the humanities I have never seen it. --Hegvald (talk) 20:08, 14 November 2009 (UTC)
- The only slightly unusual aspect to the article in question is that it uses Harvard style referencing. APA/Chicago style references are much more common on Misplaced Pages, but Harvard style referencing is hardly unheard of. In reality, with a bit of familiarization with the citation system I think that you'll realize that the group and name parameters are there to help you. You can create nice and compact reference lists using groups and names, and they then behave in a manner that you seem to be touching on in your description of an "ideal". Smackbot and Drilbot weren't making the optimal change, but they were trying to help at least. For a live example of grouped and named Harvard style referencing, take a look at the bibliographical portion of the references section in Moon landing conspiracy theories.
— V = I * R (talk to Ω) 20:18, 14 November 2009 (UTC)
- Sorry, I don't follow you. Which article are you referring to (the one with a "slightly unusual aspect")?
- As for the Moon landing conspiracy theories article - are you serious? That article has seven different sets of footnotes, each with its own numbering! And it uses those hideous source templates in the middle of the text! Can you imagine how difficult most people will find it to make changes to an article like that? Perhaps it is a good thing to make it difficult to edit an article on conspiracy theories, but in most cases this would not be the case. The average retired professor of art history wanting to make a small referenced correction to an article would just give up if he opened the edit window and encountered something looking like this.
- I am well acquainted with footnotes and how they are used outside Misplaced Pages. I like them just the way I am used to everywhere else. I don't want Misplaced Pages-specific referencing systems. Don't try to tell me that I really should like your preferred system. There are good reasons why footnotes in a normal academic publication do not look like those in Moon landing conspiracy theories, and why adding footnotes in a normal word processor is as simple as it usually is. --Hegvald (talk) 20:41, 14 November 2009 (UTC)
- Are the "hideous source templates" that you mentioned {{cite web}}, {{cite news}}, etc...?
- As for the later comment, I think that you'll find that putting aside your personal preferences in favor of working with the Misplaced Pages community norms will lead to a happier time here. You're far from the only person with strong personal feelings on issues, especially when it comes to referencing, but we all have to work together here. The community norms on Misplaced Pages have developed over years, and tons of people have collectively decided to use the rather unique style that tends to predominate on Misplaced Pages. You'll likely be able to impose your own preference on an out of the way article such as the Charles Boit article for quite some time, as long as you're actually active, but eventually the wider community standard will wash over it.
— V = I * R (talk to Ω) 20:52, 14 November 2009 (UTC)
- OK, I realize that it is fairly pointless to discuss with someone setting themselves up as an incarnation of the "Misplaced Pages community norms". A small number of bots and their keepers do not represent the "Misplaced Pages community norms", and neither do you. --Hegvald (talk) 21:05, 14 November 2009 (UTC)
- The WP:CITE guideline probably does more to make articles harder to edit than any one thing on Misplaced Pages. By encouraging the use of half a dozen different reference formats and not allowing changes, our articles are a complete and utter mess when it comes to references. We've developed ways to make references less distracting when trying to edit (named refs, list-defined refs), but our ability to use them is hampered by poor policy. The whole purpose of the MoS is to keep article styles consistent, but for some reason, we prefer to do the exact opposite when it comes to references. Mr.Z-man 22:40, 14 November 2009 (UTC)
Using named references is not a change in citation style; it is how you are supposed to use the software. Duplicated references should be transformed to uses of a named reference at every opportunity. The reasons why are detailed extensively in a variety of places. OrangeDog (τ • ε) 16:11, 16 November 2009 (UTC)
Use of named references to consolidate duplicates is described as optional in the guideline, and this is intentional. Naming references changes the final appearance of the article, and there is a difference of opinion as to whether it is an improvement. In articles that use short footnotes, there is also usually little saving of space (often AWB will add to the length of such articles). Christopher Parham (talk) 14:25, 18 November 2009 (UTC)
- "I like them just the way I am used to everywhere else." only makes sense if you are used to them in a single subject field with well-defined and universally followed conventions. In most of the subject fields I am familiar with, the way they are used differs widely from journal to journal, and the only Misplaced Pages style conforming even roughly to the way they are used elsewhere is Harvard, which in the real world is very rarely found outside the humanities and which most Misplaced Pages editors would find very cumbersome. Most librarians are prepared to teach several different ones, and the first question we ask a student who wants help about this is "who is your professor--did he give any handouts on what he wants?" DGG ( talk ) 00:11, 19 November 2009 (UTC)
- DGG, I already said above: "I also think this may be a science vs. humanities issue. The only place I have ever seen this type of non-consecutive footnoting outside Misplaced Pages was in a science journal. In publications in the humanities I have never seen it." There may be good reasons for this, such as the way footnotes are used in different fields, or even the average length and scope of publications in the sciences vs the humanities. (And I am not talking about a single subject field, but a wide range of fields within the humanities: history, art history, literature etc.) In this case, we seem to see science/engineering people using bots and scripts to impose their own preferred style on everyone else.
- In any case, I have asked User:CBM who started the new discussion at the CITE page to move this entire discussion over there. We shouldn't have two parallel discussions and be forced to repeat the same arguments in both places. --Hegvald (talk) 16:45, 19 November 2009 (UTC)
Named references
It has been suggested that WP:CITE should have more firm guidance about replacing duplicate footnotes with the same content with multiple references to the same, named footnote (as in this edit). Right now, WP:CITE says,
- "Optionally, one may add the name attribute by using <ref name="name">details of the citation</ref>. Thereafter, the same footnote may be used multiple times by adding <ref name="name"/>."
Apparently, AWB automatically implements this "optional" aspect, thus changing the referencing style in articles where duplicate footnotes were intentionally kept separate. Should the policy here say that named footnotes are recommended? Required? — Carl (CBM · talk)
- As I noted on WP:VPP, the inconsistency encouraged by this "style guideline" is frankly, terrible. I would strongly support this, and any change that helps us move toward making articles easier to edit (as refs are one of the most confusing aspect, and having half a dozen or so different options just makes it worse). Mr.Z-man 22:43, 14 November 2009 (UTC)
- When refs need to be repeated multiple times in an article, it is largely in the interest of editors to use named refs. This should be encouraged. Dragons flight (talk) 00:16, 15 November 2009 (UTC)
- Clearly not in the interest of all editors. --Hegvald (talk) 01:51, 15 November 2009 (UTC)
My problem with list-defined references is that they're meant to make the editing screen easier to read, but Harvard refs, aka shortened refs, (<ref>Harris 2006, p. 4.</ref>) have less distracting code than even the list-defined refs, so I prefer Harvard refs. OTOH, most of my references are books that I cite more than once ... people who tend to use only one citation per reference sometimes say that Harvard references and list-defined references are extra work. Maybe there's some script that could save keystrokes, I don't know. - Dank (push to talk) 14:29, 15 November 2009 (UTC)
- Dank this isn't about LDR. Just names. Rich Farmbrough, 13:58, 18 November 2009 (UTC).
- I use shortened footnotes a lot, and often choose ref names which work for article editors like Harvard refs, e.g., <ref name=smith1996pp5-6 />. I haven't been using LDRs to move the ref declarations out of the article prose, though. I'm currently working up a new article here; perhaps I'll give LDRs a try there and see how they "feel" style-wise using this ref naming style. Wtmitchell (talk) (earlier Boracay Bill) 00:43, 23 November 2009 (UTC)
Discussion location
Is it really a good idea to have this discussion in two places? Wouldn't it be better to keep it at the village pump, where it is more likely to be seen by more people? --Hegvald (talk) 01:51, 15 November 2009 (UTC)
- Generally, I prefer a pointer from VPP to whichever is the appropriate guideline or policy page, because people will actually be able to find previous conversations about citations if you keep them at WT:CITE. - Dank (push to talk) 14:29, 15 November 2009 (UTC)
Postdlf's three arguments against named references
I'm strongly opposed to any mandatory imposition of named refs. The only virtue I can see in them is that they condense the footnotes section of an article, but that certainly shouldn't come at the expense of clarity within the body of an article, or at the expense of more information being provided. I think ref names have at least three main drawbacks.
- 1. Because all cites to the same ref name will display the same numbered superscript link (i.e., footnote) within the body of the article, the use of ref names will cause the numbering of those footnotes within the article body to be confusingly out of order.
- 2. The use of ref name for all duplicate references discourages editors from combining multiple cites for the same statement of fact into one footnote; this is commonly seen in many articles, where one single sentence will end with half a dozen or more footnotes. This causes distracting and confusing clutter within the article body (particularly given the inconsistent numbering that results from ref name usage), all apparently for no other purpose than to condense the footnotes section. Which is not a sensible trade-off.
- 3. The use of ref name for all duplicate references discourages editors from adding explanative annotations (or often even specific page cites) to footnotes for separate citations to the same reference, where those citations are in support of different statements of fact within the article.
The first drawback is a simple fact, a consequence of how ref name works. The second and third drawbacks are easily observed in practice, even though ref name doesn't absolutely prohibit the individualization of some cites. But it definitely discourages it, whether because editors think they can't deviate from using a ref name for all instances of a particular cite once that's begun within an article, or because editors are simply more likely to lean on the crutch that ref name provides of relieving the need to retype or copy a full ref in a new place. And certainly editors are less likely to expand upon the content within individual footnotes when they aren't already maintained in separate form. postdlf (talk) 15:02, 15 November 2009 (UTC)
Discussion of Postdlf's three arguments against named references
- 1. Because all cites to the same ref name will display the same numbered superscript link (i.e., footnote) within the body of the article, the use of ref names will cause the numbering of those footnotes within the article body to be confusingly out of order.
- Not so, AWB can and does order cites so they appear not as but .Rich Farmbrough, 13:58, 18 November 2009 (UTC).
- I think the point is, rather, they will appear as . That's an odd way to count to 7. Christopher Parham (talk) 14:19, 18 November 2009 (UTC)
- Sorry, I don't follow the last comment. Why would an article ever cite exactly the same source twice at the same place? I've never seen a Misplaced Pages article that said anything like "It is possible that regressive autism is a specific subtype." and if I did see one, I'd change that trailing "" to "" without a second thought. Eubulides (talk) 20:02, 18 November 2009 (UTC)
- Sorry, I assumed it was clear that there were intervening sentences each being cited; that's not one block of citations for a single fact. I mean A. B. C. Christopher Parham (talk) 20:08, 18 November 2009 (UTC)
- AWB, and all other automated tools that I am aware of, don't change the layout of references which are separated by text at all. Meaning, in the A. B. C. example which you gave, AWB wouldn't change that layout at all. It may convert the second reference to a named ref, which is important and a good change because a) it's shorter, and b) doing so prevents later editors from changing one ref without changing the others (it helps maintenance).
— V = I * R (talk to Ω) 20:34, 19 November 2009 (UTC)
- Yes, the problem is that it changes the second reference to a named reference. It's not necessarily shorter (oftentimes longer when the change is made by AWB). However, it is easy to revert AWB editors, so I don't see this as a serious issue. Christopher Parham (talk) 22:27, 19 November 2009 (UTC)
- I guess that I really don't understand the criticism of using the name parameter. It's not as though the parameter is a recent addition either, which actually starts me scratching my head when it comes to this discussion. The increase in maintainability and readability (both of the page source and the list of references) easily justifies the general use of the name parameter, though.
— V = I * R (talk to Ω) 22:56, 19 November 2009 (UTC)
- To be clear, this discussion as I understand it is about whether use of the name parameter for duplicate references should be mandated or encouraged by this guideline. I've no problem with it as an option for editors, where it is indeed quite longstanding. I don't really see much improvement in maintainability or readability that comes from use of this feature, and it has a number of features that can make editing more difficult, including separating the content of a citation from the text it references in the edit box (getting them together was one of the main innovations behind cite.php). I can see that if you are using citation templates, it might be a helpful feature to reduce the size of references. Sometimes it's useful, others not; I think maintaining its optional status is the best choice. Christopher Parham (talk) 14:30, 20 November 2009 (UTC)
- 2. The use of ref name for all duplicate references discourages editors from combining multiple cites for the same statement of fact into one footnote; this is commonly seen in many articles, where one single sentence will end with half a dozen or more footnotes. This causes distracting and confusing clutter within the article body (particularly given the inconsistent numbering that results from ref name usage), all apparently for no other purpose than to condense the footnotes section. Which is not a sensible trade-off.
- I don't see what this has to do with the WP:CITE document and/or named references, though. I agree that having a single sentence (or even a paragraph) referenced to 8 different news stories gets to be ridiculous, but that problem isn't something which should be addressed through the reference name parameter at all really. An actual solution to that problem is to editorially select 2 or maybe 3 of the more important references and get rid of the others.
— V = I * R (talk to Ω) 20:38, 19 November 2009 (UTC)
- 3. The use of ref name for all duplicate references discourages editors from adding explanatory annotations (or often even specific page cites) to footnotes for separate citations to the same reference, where those citations are in support of different statements of fact within the article.
- I understand this criticism, but I don't find it very convincing. That sort of practice really requires larger editorial planning and work in general, so I think that it falls a bit outside of the scope of this discussion. However, just to take this criticism at face value, I am generally willing to accept this as a tradeoff to using the name parameter in the interests of reducing lists of references to more manageable numbers. What may be acceptable in a small article with up to a dozen references isn't as good of an idea in an article with 120 references (which often could be more like 300 references if all instances of use of the name parameter were deprecated).
— V = I * R (talk to Ω) 20:44, 19 November 2009 (UTC)
- There is no reason for this to happen if combined cites are appropriate. In fact there is no reason a footnote cannot read:
- [1] See for example Bloggs , chapter 4, LeBowski especially the section on pigeons, and Bond - Annals of Defenstration, July 1999 pp228.
- Doubtless the much requested "page number" fix would provide relief from these uncertainties too.
- Anyway no-one is talking about mandatory, we are talking about names being provided for duplicate refs.
- Rich Farmbrough, 13:58, 18 November 2009 (UTC).
- I agree with Postdlf on all three points; the implementation of named references, particularly the way they are displayed in a non-intuitive manner that defies the conventions found in most printed and online matter (sequentially numbered footnotes), is not ideal. I support use of named references being optional and regularly revert AWB edits to add them when they pop up on my watchlist. Christopher Parham (talk) 14:27, 18 November 2009 (UTC)
- You know, your "non-intuitive manner" is the standard convention in many scientific publications, including both Science and Nature. It's a choice of style and not a particularly unusual one. Dragons flight (talk) 20:14, 18 November 2009 (UTC)
- I'm aware this is a standard in some fields. The use of the style for articles on scientific topics would seem eminently reasonable, if that is the style with which writers and readers in that field would be most familiar. Nobody is suggesting it not be an option. Christopher Parham (talk) 20:25, 18 November 2009 (UTC)
- This is another "Oh I'm a professor, they re-used ref-1 after ref-4 my head is going to explode." type effect? Because the people who read these learned journals can only cope with increasing numbers, while we expect Little Johnny to grasp abbreviations like pp. ed. Ann. Phy. and so forth intuitively? Or is it the "ZOMG the people who write in APA style will never take WP seriously if we do thus and so?" Because if that's the problem - news flash - "Those that mind don't matter and those that matter don't mind." Storm meet teacup. Rich Farmbrough, 15:03, 19 November 2009 (UTC).
- I am quite happy to call this a storm in a teacup and keep the current language in which the use of named references to consolidate duplicate citations is optional. Christopher Parham (talk) 16:28, 19 November 2009 (UTC)
- I came late to this discussion and would want to be counted among those who want to keep the optional nature of named notes. I much prefer consolidating several references in a single note (sometimes with comments like Jones , p. 123 and Smith, p. 45 but Brown, p. 67 disagrees.) to a string of numbers at the end of a sentence. But then I'm a humanist (who also writes in the sciences and social sciences sometimes). --SteveMcCluskey (talk) 16:45, 19 November 2009 (UTC)
- Personally I think it would make it easier for editors to move towards one (or a small number of) "recommended" styles rather than allowing editors to do whatever they want. Having an encyclopedia broken up into a dozen different styles makes it harder for editors working in one area to move into other areas. I have a personal opinion about what should be recommended, but I'd say having recommendations is more important than exactly what they are. Dragons flight (talk) 17:00, 19 November 2009 (UTC)
Basically, I think that we should essentially "do nothing" here. I agree that it would make things easier if we could all agree on one (or a small number of) "recommended" styles, but as is probably evident to most here already I don't see any firm consensus developing on this topic (for some reason). The curious fact (to me) is that we already use our own house reference style, so it's not as though we're forcing people to choose some standard over another. The real issue is that our house referencing style is ad hoc I guess, but it seems to me that is at least somewhat intentional.
— V = I * R (talk to Ω) 20:58, 19 November 2009 (UTC)
- I personally am in favor of the style using names to group references, since it makes articles shorter. Not doing that makes sense in a multi-page printed edition, but not on a one-page-a-time website. That be as it may, I can understand the arguments against it, and therefore agree that automated tools should not interfere in this matter (unless it would be decided to use one style only). Debresser (talk) 23:38, 14 December 2009 (UTC)
Wait, why wouldn't we want named references? Consider this:
Content Section
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque et eros lorem. Nullam mi dui, euismod id ultrices ut, posuere id ante. Maecenas metus felis, bibendum in iaculis eget, pretium vitae turpis. Donec vel condimentum sem. Quisque sit amet pharetra libero. Aliquam dolor urna, sagittis et bibendum sit amet, pulvinar sed sem. Pellentesque imperdiet, dui at ornare rutrum, urna est vulputate enim, nec porta nibh risus eu massa. Aliquam imperdiet fringilla metus eu aliquet. Maecenas convallis nulla id sem euismod suscipit. Etiam fringilla lorem placerat augue convallis cursus. Nullam quis lectus id risus volutpat sagittis. Integer quis turpis lacus. Praesent adipiscing nisl porttitor lacus gravida lacinia. Fusce lectus justo, rutrum vitae consectetur sit amet, faucibus vitae lacus.
References
- google.com
- google.com
- google.com
- Versus
Content Section
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque et eros lorem. Nullam mi dui, euismod id ultrices ut, posuere id ante. Maecenas metus felis, bibendum in iaculis eget, pretium vitae turpis. Donec vel condimentum sem. Quisque sit amet pharetra libero. Aliquam dolor urna, sagittis et bibendum sit amet, pulvinar sed sem. Pellentesque imperdiet, dui at ornare rutrum, urna est vulputate enim, nec porta nibh risus eu massa. Aliquam imperdiet fringilla metus eu aliquet. Maecenas convallis nulla id sem euismod suscipit. Etiam fringilla lorem placerat augue convallis cursus. Nullam quis lectus id risus volutpat sagittis. Integer quis turpis lacus. Praesent adipiscing nisl porttitor lacus gravida lacinia. Fusce lectus justo, rutrum vitae consectetur sit amet, faucibus vitae lacus.
References
- I like the second one more, because the same source is grouped into one reference, instead of having duplicates. Tim1357 (talk) 16:27, 21 November 2009 (UTC)
- Evidently you would wish to employ this optional feature in articles you are writing. Others prefer not to use them for the reasons cited above. Christopher Parham (talk) 00:40, 22 November 2009 (UTC)
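For readers less familiar with what the AWB/SmackBot edits discussed in this thread do to the wikitext, here is a minimal Python sketch of the general technique: unnamed `<ref>...</ref>` pairs with identical contents get a name, the first occurrence keeps the full citation, and later occurrences become self-closing `<ref name=... />` tags. This is not AWB's code; the function name `name_duplicate_refs` and the `ref1`, `ref2` naming scheme are hypothetical, and the regex assumes refs carry no attributes and do not nest.

```python
import re
from collections import Counter

# Matches an attribute-less <ref>...</ref> pair (unnamed references only).
REF_RE = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)


def name_duplicate_refs(wikitext: str) -> str:
    """Name duplicated unnamed refs; later copies become <ref name="..." />."""
    counts = Counter(REF_RE.findall(wikitext))
    names = {}    # ref content -> generated (hypothetical) name
    seen = set()  # ref contents whose first occurrence was already emitted

    def replace(match):
        body = match.group(1)
        if counts[body] < 2:
            return match.group(0)  # single-use ref: leave untouched
        if body not in names:
            names[body] = f"ref{len(names) + 1}"
        name = names[body]
        if body in seen:
            return f'<ref name="{name}" />'  # later use: short self-closing form
        seen.add(body)
        return f'<ref name="{name}">{body}</ref>'  # first use keeps the citation

    return REF_RE.sub(replace, wikitext)
```

Note that nothing is reordered: only the second and later copies change form, which is why the rendered footnote numbers can repeat out of sequence, as several comments above point out.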
SmackBot is at it again
SmackBot is again changing reference system. I am taking this to the administrators' page now. --Hegvald (talk) 10:02, 5 December 2009 (UTC)
The discussion from Misplaced Pages:Administrators' noticeboard/Incidents is now archived at Misplaced Pages:Administrators' noticeboard/IncidentArchive584#SmackBot changing referencing style, again (dearchived). --Hegvald (talk) 14:30, 13 December 2009 (UTC)
Bot from changing multi line refs to one line refs
Currently many refs are spread over 10–15 lines of text. Is there a bot that could put the ref all on one line to make editing easier? For example, this:
{{Cite journal
| last =
| first =
| authorlink =
| coauthors =
| title =
| journal =
| volume =
| issue =
| pages =
| publisher =
| location =
| date =
| url =
| issn =
| doi =
| id =
| accessdate =
}}
to this
{{cite journal| author = | title = | journal = | volume = | issue = | pages = | year = | pmid = | doi = | month =| issn =}}
On highly referenced pages it is hard to find the text between the citations in the first example and much easier in the second. Thanks. Doc James (talk · contribs · email) 15:12, 4 December 2009 (UTC)
- Have posted over at Misplaced Pages:Bot_requests#Bot_from_changing_multi_line_refs_to_one_line_refs to find consensus there. Doc James (talk · contribs · email) 16:03, 4 December 2009 (UTC)
- I suspect that the bot will have a poor reception as many people prefer the expanded format - that is why you see it used so often. =) Christopher Parham (talk) 21:14, 4 December 2009 (UTC)
- I have been told that a simple script can be created to give everyone the appearance that they wish. Thus all shall be happy. Doc James (talk · contribs · email) 21:47, 4 December 2009 (UTC)
- "put the ref all on one line to make editing easier" -- huh??? It is the expanded version that is much easier to edit, surely. -- Alarics (talk) 23:03, 6 December 2009 (UTC)
- Indeed. When it's broken up into multiple lines, it's easy to find the end of the {{cite}} tag -- when it's all collapsed into one line, the content and references blur together. --SarekOfVulcan (talk) 18:27, 7 December 2009 (UTC)
- I find advantages for both forms. Maybe a compromise? Allow the compression to one line and leave the trailing }} on a line by itself. That would allow finding the end of the citation more easily. Vegaswikian (talk) 00:12, 15 December 2009 (UTC)
- That's fine if you are editing only the text and not the footnote, but if you are editing the footnote itself (which I often am, because I constantly find them being done incorrectly) it doesn't help at all. -- Alarics (talk) 08:23, 15 December 2009 (UTC)
- (outdent) I don't think this is the sort of thing which a bot should be let loose to enforce wikipedia-wide. Wtmitchell (talk) (earlier Boracay Bill) 06:27, 15 December 2009 (UTC)
- I usually believe that the vertical style is more readable, but neither style is nearly as readable as the new list-defined references feature. List-defined references allow you to stick all of the messy templates in the ==References== section, and use only <ref name=blah /> throughout the text. WhatamIdoing (talk) 20:25, 15 December 2009 (UTC)
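The one-line conversion requested above is mechanically simple for well-behaved calls. Here is a minimal Python sketch, assuming the template contains no nested templates, comments, or piped wikilinks (any `|` inside a value would be mis-split). The function `collapse_template` and its `keep_closing_on_own_line` flag, which implements Vegaswikian's compromise of leaving `}}` on its own line, are hypothetical names, not an existing bot's interface.

```python
def collapse_template(template: str, keep_closing_on_own_line: bool = False) -> str:
    """Collapse a multi-line {{...}} template call onto a single line."""
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single {{...}} template call")
    # Split the inside of the call on pipes and trim the line breaks and
    # padding around each piece (template name first, then each parameter).
    parts = [part.strip() for part in text[2:-2].split("|")]
    one_line = "{{" + " | ".join(parts)
    return one_line + ("\n}}" if keep_closing_on_own_line else "}}")
```

A real bot would need a proper template parser; naive splitting on `|` is exactly the kind of shortcut that breaks on URLs or wikilinks containing a pipe.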
Making pages faster to load
SlimVirgin recently pointed out in Misplaced Pages talk:Featured article candidates #Citation templates that citation templates make pages very slow to load, and that this is hurting the editing of featured articles. This is a problem I've noticed as well. It's pretty bad: when I'm editing, citation template formatting consumes the vast majority of the time spent waiting for pages to load.
- I have developed experimental versions of {{cite journal}} and {{cite book}} that attack this problem. In my tests with Autism they have reduced page generation times by over a factor of three (from about 30 seconds to about 8 seconds for that very long article); this is a dramatic improvement to the editing experience.
- These templates also reduce the size of the HTML generated, by about 35% in my test case (this is counting all the HTML generated for the entire article, including all the article text); this is a noticeable improvement in the reading experience, especially for readers who are not on broadband (this is still the majority of the Internet).
- These templates use Vancouver system format rather than the hybrid style currently used by {{cite journal}} etc., partly because the Vancouver system is a bit more efficient textually, and partly because the Vancouver system is standardized in the biomedical field, which is where I spend most of my editing time.
- I'm currently calling these templates {{cit journal}} and {{cit book}}; the name is just one character shorter so that it's easy to use the Diberri tool to generate calls to these templates. You can see an example of their use in my sandbox, which is a copy of this version of Autism, modified only by replacing "{{cite" with "{{cit".
- The templates need documentation and some more testing. Also, substitutes for the other cite XXX templates are still needed: currently {{cit web}} is just a redirect to {{cit journal}} (I haven't verified whether this satisfies the Vancouver rules, but I hope it does), and less-commonly-used templates like {{cite video}} haven't been rewritten yet.
- Feedback is welcome.
Eubulides (talk) 21:29, 6 December 2009 (UTC)
- You made an edit to Template:Cite journal/doc that belongs on the doc for your own template; I think I fixed it but check. Otherwise, I think it is a welcome change. One problem with templates like these is feature creep and there's obviously a major problem when this bloat, multiplied hundreds of times, is significantly slowing page load times. Christopher Parham (talk) 03:06, 7 December 2009 (UTC)
- This is OK, although I won't use it because of where it puts the date. I don't really understand COinS and don't see how that information is omitted (could try to explain that in a less technical way or provide an example). The Vancouver style of putting the date at the end beside the journal volume was poorly thought out, and I hope this doesn't spur a major shift to that (I will fight it); author-date citation styles are common in the sciences, extremely common in Misplaced Pages (including medicine articles) despite footnotes, and put a very important piece of information where it's easy to see rather than hard to see. Differences will be jarring to readers, who are not always scientific elites and typically expect author-date. The omission of the wikilinking PMID and doi is something you should do to the regular templates. II | (t - c) 04:01, 7 December 2009 (UTC)
- There has been recent discussion at {{cite journal}} about standardising Misplaced Pages "cite X" style in a document outside of the implementation, and describing it as an optional "in house" style. Additionally, I would prefer the existence of a Turabian to allow history articles the full breadth of citation requirements they often need. There is agitation regarding the quality of citations in general. Fifelfoo (talk) 04:13, 7 December 2009 (UTC)
- It would make sense to support Turabian style better, too, for fields that prefer it; that would probably need to be a different template, as the details differ so much from the Vancouver system. Turabian style also puts dates after the journal title and volume, so it's not like the Vancouver system is odd in this regard, and it's certainly not an issue of "scientific elites". The vast majority of readers won't care or be affected by the location of the date, and an advantage of using a standard style such as Vancouver or Turabian is that it should lessen the number of disputes over trivia like date placement. Eubulides (talk) 17:28, 7 December 2009 (UTC)
- But will it increase the number of disputes over which style should be used? Anomie⚔ 19:11, 7 December 2009 (UTC)
- No answer on the COinS question? Chicago "humanities style notes and bibliography" puts the date in parentheses after the journal volume and edition, as does Turabian (they're related). Vancouver puts the date before the journal title and volume. You may want to fix your error (you said "Turabian style also puts dates after the journal title and volume, so it's not like the Vancouver system is odd in this regard", which is incorrect). Chicago scientific author-date (generally parenthetical) style puts the date after the author, and the vast majority of Misplaced Pages uses this style with footnotes. Whether or not readers will care is impossible to say, unless you're a mindreader. However, it is clear that this will just introduce a lot more complexity into citations for uncertain benefit. I know plenty about citation styles, but if we move towards "standard styles" I will probably have to spend a significant amount of my time referring to them. The placement of dates for books, newspapers, and other things gets more complicated. We'll have to have several different autogeneration software options, and newbies (and people like me) will continually have to be corrected. Turabian, "Like the Chicago Manual of Style on which it is based ... offers those in the natural and social sciences the option of using an author-date system with notes and parenthetical references". I also suspect that Chicago allows the hybrid style currently used in Misplaced Pages articles. Anyway, I can't stop you from doing this, but you'll need consensus to change articles with a current standard to a Vancouver or Chicago/Turabian humanities style. And I think it's a very bad idea, and a bit sad that you're going to campaign for greater complexity in citations. For history and maybe humanities articles which rely on less clearly published works, sure, a different style may be needed, but not for scientific articles. II | (t - c) 20:02, 7 December 2009 (UTC)
- Please see #Complexity in citations below. Eubulides (talk) 20:47, 7 December 2009 (UTC)
- COinS is a type of metadata; the citation templates use the info users input into the template to create these metadata tags and attach them to Misplaced Pages pages. Eubulides's versions omit this feature; whether people have actually found this feature useful in the existing templates is unknown. Christopher Parham (talk) 20:09, 7 December 2009 (UTC)
- In practice the COinS data significantly hurts all readers and editors, for the benefit of a very few readers whose needs can easily be satisfied by some other method. Eubulides (talk) 20:47, 7 December 2009 (UTC)
- What in the template produces COinS? The cit journal template has these parameters: {{cit journal |author= |title= |journal= |date= |volume= |pages= |url= |doi= |pmid= |pmc= }}. The cite journal template: {{cite journal |author= |title= |journal= |volume= |issue=|pages= |year= |doi= }} What's producing the COinS? II | (t - c) 00:03, 8 December 2009 (UTC)
- The {{cit journal}} template does not produce COinS, for efficiency reasons. The {{cite journal}} template does, and this generates a COinS tag that contains article title, author names, date, series, volume, issue, edition, etc., etc., all derived from the template arguments. This contributes significantly to article size and to page load time. Eubulides (talk) 00:40, 8 December 2009 (UTC)
- What exactly in the cite journal template example above is producing the COinS, which does not exist in cit journal? There should be something textual producing it. I don't see what it is. II | (t - c) 01:04, 8 December 2009 (UTC)
- There is nothing special in the arguments to {{cite journal}} that causes the template to produce COinS. {{cite journal}} always produces COinS, no matter what, just as it always produces a textual citation in HTML, no matter what. Eubulides (talk) 02:21, 8 December 2009 (UTC)
- Regardless of what format we use, I agree there is definitely a need to improve the efficiency of the citation templates. Perhaps some of the more common ones could be split into two templates. One would include only the basic fields which would be sufficient for 90% of usage, the other would include all the less-common fields as well. Mr.Z-man 20:36, 7 December 2009 (UTC)
- Unfortunately the main problem is the sheer number of subtemplates invoked, and these templates would still be invoked even if there were separate templates for common patterns. By itself I don't think this sort of factoring will buy much. Eubulides (talk) 20:47, 7 December 2009 (UTC)
- The problem is not really the number of templates, it's the complexity of the templates: the number of parser functions and variable substitutions it has to do for each template. Even when you're not using the trans_chapter field in {{cite book}} the parser still has to set it to an empty string and evaluate it in a bunch of parser functions. If simplified versions were made (using a simplified version of Template:Citation/core, or not using a single meta-template for all types), it could speed things up significantly. It's mainly feature bloat that's causing them to be so slow. Mr.Z-man 00:14, 8 December 2009 (UTC)
- OK, well then, please consider {{cit journal}} to be a first cut at that. It rarely uses parser functions and it tries to substitute each variable at most twice, once to see whether it's empty and once to use it. (It doesn't always succeed in that.) But even if this technique were applied to {{cite journal}}, it would still be significantly worse than {{cit journal}} due to the COinS data. Eubulides (talk) 00:40, 8 December 2009 (UTC)
- Outputting citation metadata is one of the fundamental purposes of citation templates; ignoring metadata is not a very good solution to template bloat. It looks to me like the better solution would be to implement the core template in a MediaWiki extension, and get rid of all the parserfunctions entirely. — Carl (CBM · talk) 03:11, 8 December 2009 (UTC)
- As long as the extension is fast, well written, and provides services for a wide variety of citation styles, media types, and various "tricks" of the trade. We've been locked into an in-house style by happenstance, and one which has a confusing variety of terms for the title of the object cited versus the work contained in (as one example of poor interface). The in-house style we've picked has a number of crudities due to its accretion of features, which is a result mostly of poor social decision-making as a community rather than any criticism of the template writers. Fifelfoo (talk) 03:16, 8 December 2009 (UTC)
- If such an extension could be built, that would be a good thing. One other point: the extension needs to generate COinS data conditionally. That is, it should be possible to turn off COinS, and the default (when reading an ordinary Misplaced Pages page) should have COinS disabled, so that the vast majority of readers who don't need COinS aren't penalized by its large overhead (over a <s>50%</s> 20% overhead for Autism, in terms of number of bytes that go across the Internet). Eubulides (talk) 03:28, 8 December 2009 (UTC)
- The 50% figure seems high; the HTML is 425kb and the data in the COinS title fields is 92kb. — Carl (CBM · talk) 03:43, 8 December 2009 (UTC)
- Sorry, you're right, it was a 23% overhead by my measurements (22% in yours, no doubt because the article mutated between measurements). The 58% figure includes all the bloat generated by the citation templates; only about half of that is COinS. I struck the wrong figure in my previous comment, and inserted the correct one. Eubulides (talk) 18:42, 8 December 2009 (UTC)
Complexity in citations
- "Vancouver puts the date before the journal title and volume." This seems inconsistent with an earlier comment by the same editor: "The Vancouver style of putting the date at the end beside the journal volume". Anyway, let me try to resolve the confusion. The Vancouver system as specified by the NLM, which is what I've based the template on (see Citation Rules with Examples for Journal Articles) puts the date after the journal title and edition, just as Turabian does. Because Vancouver and Turabian are similar in this respect, I'm not sure what the date-format fuss is about.
- "And I think it's a very bad idea, and a bit sad that you're going to campaign for greater complexity in citations." No matter what change is made, it's going to increase complexity in the short run. I have asked for the bloat to be removed from {{cite journal}} on multiple occasions, and each time the answer has been no, for various reasons. I see no improvement on the horizon: for good or ill, these templates seem to have become ossified. The result is an environment in which it is becoming more and more intolerable to edit. If I want to fix this, apparently I have to work outside the {{cite journal}} system; so I might as well use a well-specified format rather than the hodgepodge format that {{cite journal}} uses now: after all, that hodgepodge is part of the reason {{cite journal}} is so slow.
Eubulides (talk) 20:47, 7 December 2009 (UTC)
- Sample citations:
- Vancouver from the template page: Baird G, Cass H, Slonims V. Diagnosis of autism. BMJ. 2003;327:488–93. doi:10.1136/bmj.327.7413.488. PMID 12946972.
- Chicago: John Maynard Smith, “The Origin of Altruism,” Nature 393 (1998): 639.
- Turabian: Christopher Policano, "Dueling Colas," Public Relations Journal 41, no. 11 (1985): 16.
- You can see what I'm referring to: Vancouver puts the date before the volume, Chicago/Turabian put the date after. It's not the end of the world, but it illustrates the irritating inconsistencies among citation systems. Earlier Eubulides said "Turabian style also puts dates after the journal title and volume"; yes, it does, but Vancouver does not. I'll admit I got confused and used the word title when I meant volume. I'll admit that I like Vancouver better than Chicago. However, the NLM book does not have documentation on how to cite an online newspaper. It's designed for medicine, and complex articles with medical and social aspects will probably be tricky to deal with. The NLM example on citing an internet source is annoyingly crowded and complicated: Hooper JF. Psychiatry & the Law: Forensic Psychiatric Resource Page . Tuscaloosa (AL): University of Alabama, Department of Psychiatry and Neurology; 1999 Jan 1 . Available from: http://bama.ua.edu/~jhooper/.
- As Christopher said, how many people benefit from COinS is unknown. Thus far there hasn't been a grounded, concrete explanation of COinS, how it is produced from the template HTML, and who uses it (and for what). II | (t - c) 00:03, 8 December 2009 (UTC)
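To make the date-placement difference concrete, here is a minimal Python sketch that renders the Baird et al. record from the Vancouver sample above in both orders. The `JournalArticle` record and both formatter functions are hypothetical illustrations, not the parameters of any template; the Chicago-style output is deliberately simplified (initials instead of full given names, sentence-case title, no italics).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JournalArticle:
    # Hypothetical record layout, for illustration only.
    authors: list[tuple[str, str]]  # (surname, initials), e.g. ("Baird", "G")
    title: str
    journal: str
    year: int
    volume: int
    pages: str


def vancouver(a: JournalArticle) -> str:
    # NLM/Vancouver order: Authors. Title. Journal. Year;Volume:Pages.
    names = ", ".join(f"{last} {initials}" for last, initials in a.authors)
    return f"{names}. {a.title}. {a.journal}. {a.year};{a.volume}:{a.pages}."


def chicago_note(a: JournalArticle) -> str:
    # Simplified Chicago/Turabian note order: the year follows the volume.
    names = [f"{'. '.join(initials)}. {last}" for last, initials in a.authors]
    if len(names) == 1:
        who = names[0]
    elif len(names) == 2:
        who = f"{names[0]} and {names[1]}"
    else:
        who = ", ".join(names[:-1]) + ", and " + names[-1]
    return f'{who}, "{a.title}," {a.journal} {a.volume} ({a.year}): {a.pages}.'


baird = JournalArticle(
    authors=[("Baird", "G"), ("Cass", "H"), ("Slonims", "V")],
    title="Diagnosis of autism",
    journal="BMJ",
    year=2003,
    volume=327,
    pages="488–93",
)
print(vancouver(baird))     # Baird G, Cass H, Slonims V. Diagnosis of autism. BMJ. 2003;327:488–93.
print(chicago_note(baird))  # G. Baird, H. Cass, and V. Slonims, "Diagnosis of autism," BMJ 327 (2003): 488–93.
```

Apart from punctuation, the only structural difference between the two outputs is whether the year comes before or after the volume.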
- Ah, sorry, I see that I contributed to the confusion. When I read your first comment about author-date styles being common, and Vancouver being bad because it puts the date at the end, I thought the key point was whether the date immediately followed the author name (as in {{cite journal}}) or followed somewhere after the article and journal titles (as in Vancouver, Turabian, etc., although the details differ in relatively minor ways, and I agree this is irritating but I cannot fix that I'm afraid).
- "However, the NLM book does not have documentation on how to cite an online newspaper." That topic is covered in Chapter 8; there is a section 23 Newspaper article on the Internet. I don't expect editors to routinely use all the NLM-required fields such as "", though they may if they wish to. Currently {{cit news}} is a crude redirect to {{cit journal}}, so it doesn't implement all the stuff in that chapter on newspapers; it shouldn't be that hard to fix this but I haven't gotten around to it, as I wanted to get journals right first.
- I, like most readers and editors, don't use COinS and don't benefit from it. If it didn't cost much I wouldn't mind, but it's slowing down page edits by a factor of three or more, and it's bloating pages by 50%, so it needs to go back to the drawing board. Sorry, that's about all I know about it; I don't really want to know about COinS internals.
- Eubulides (talk) 00:37, 8 December 2009 (UTC)
- For what it is worth, I also think COinS template support is a poor solution to a problem nearly nobody has. It is intended for computers so Misplaced Pages should provide a separate computer-readable (e.g., XML) interface for that purpose if it is felt important. For many citations, the ISBN or PMID is the only key field a computer program needs; the rest is baggage. Colin° 09:13, 8 December 2009 (UTC)
- Turabian and Chicago require journal titles in italics, so I changed the examples. Turabian also has at least 5 presentation formats: refs / reflist; footnote / shortfoot / bib. Fifelfoo (talk) 01:05, 8 December 2009 (UTC)
- Yes, isn't it wonderful that these standards have so many options? Anyway, it's OK to put journal titles in italics in the Vancouver system, as it says nothing about fonts and does not care about fonts. My template uses italics for such titles because that's the Misplaced Pages style for such works (regardless of whether they occur in citations). Eubulides (talk) 02:21, 8 December 2009 (UTC)
- It's wonderfully necessary for certain fields; for other fields it's a useless bell or whistle. Fifelfoo (talk) 02:32, 8 December 2009 (UTC)
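Since the thread asks what COinS actually is, here is a minimal Python sketch of its general shape: an empty `<span class="Z3988">` whose `title` attribute carries a URL-encoded OpenURL (Z39.88-2004) key/value string describing the work. The keys used are standard KEV journal keys, but which keys {{cite journal}} actually emits, and in what order, is not reproduced here; `coins_span` is a hypothetical helper, and this sketch records only the first author.

```python
from __future__ import annotations

from html import escape
from urllib.parse import urlencode


def coins_span(fields: dict[str, str], doi: str | None = None) -> str:
    """Build a COinS span for a journal article (hypothetical helper)."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Only non-empty fields are emitted, each under an "rft." key.
    pairs += [(f"rft.{key}", value) for key, value in fields.items() if value]
    if doi:
        pairs.append(("rft_id", f"info:doi/{doi}"))
    # urlencode percent-encodes the values; escape() turns "&" into "&amp;"
    # so the string is safe inside an HTML attribute.
    title = escape(urlencode(pairs), quote=True)
    return f'<span class="Z3988" title="{title}"></span>'


span = coins_span(
    {"aulast": "Baird", "auinit": "G", "atitle": "Diagnosis of autism",
     "jtitle": "BMJ", "date": "2003", "volume": "327", "spage": "488", "epage": "493"},
    doi="10.1136/bmj.327.7413.488",
)
visible = "Baird G, Cass H, Slonims V. Diagnosis of autism. BMJ. 2003;327:488–93."
print(len(span), len(visible))
```

The span is invisible to human readers and repeats data already shown in the visible citation; that redundancy is what the size measurements in this thread are counting.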
Writing a citation formatter as a Mediawiki extension
It seems to me that the better solution would be to implement a Mediawiki extension that formats citations. This could have parameters for many styles, and would eliminate the slowness of parserfunctions entirely. — Carl (CBM · talk) 03:14, 8 December 2009 (UTC)
- That would no-doubt help with the page-load problem, though it would raise a new problem of its own: it might well make it harder for editors to develop their own citation formatting styles, or to improve on the hardwired styles. I assume that the extension would let one turn off COinS selectively, so that an ordinary page read would not have COinS data at all, but a user could extract COinS data using a gadget or button or suchlike. Eubulides (talk) 03:28, 8 December 2009 (UTC)
- I am not sure why you believe COinS is a significant factor in page loading times, apart from the complexity of templates that produce it. Do you have any data that suggests that the volume of COinS data is large enough to make a difference? For example, Autism, with 171 references, is only a 425kb HTML file, and this is at the extreme end of the spectrum. Of that 425kb, about 92kb seems to be in the "title" fields corresponding to COinS, which comes to about 500 bytes per reference for the COinS data itself, plus the tag overhead. — Carl (CBM · talk) 03:32, 8 December 2009 (UTC)
- Sure: I've created sandbox versions of {{cite journal}} etc. that differed only in not generating COinS data. They showed that COinS imposed a 23% overhead in the total size of the HTML for Autism. (This is not the total size of the references section; it's the total size of the whole page.) I measured the same thing for Virus, and COinS imposed a 26% overhead for that article. Please see Template talk:Cite journal/Archive 2009 October #COiNS bloat for details.
- Not all the bloat in the output of {{cite journal}} is COinS bloat. Some of it is other stuff. I recently took Autism and generated one version with the experimental templates (which omit the bloat), and one with the standard templates (which generate COinS data and the other bloat). The former contains 265,716 bytes of HTML, the latter 418,969 bytes. The page's HTML is thus 58% larger than it has to be, for no particularly good reason: there's no more information in the bloated version than in the trimmed-down version.
- Eubulides (talk) 04:01, 8 December 2009 (UTC)
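The "58% larger" figure is a relative increase over the trimmed-down page; the same two byte counts can also be read as the share of the bloated page that the trimming removes. A small Python check of both readings, using only the Autism byte counts quoted above (the function names are illustrative):

```python
def percent_larger(bloated: int, trimmed: int) -> float:
    # How much bigger the bloated page is, relative to the trimmed page.
    return 100.0 * (bloated - trimmed) / trimmed


def percent_removable(bloated: int, trimmed: int) -> float:
    # What share of the bloated page disappears when it is trimmed.
    return 100.0 * (bloated - trimmed) / bloated


BLOATED, TRIMMED = 418_969, 265_716  # bytes of HTML for Autism, as reported above
print(f"{percent_larger(BLOATED, TRIMMED):.1f}% larger")        # 57.7% larger
print(f"{percent_removable(BLOATED, TRIMMED):.1f}% removable")  # 36.6% removable
```

Both numbers describe the same difference of 153,253 bytes; they differ only in which page is used as the baseline.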
- The COinS data you are omitting is more information, in a somewhat trivial way. — Carl (CBM · talk) 13:42, 8 December 2009 (UTC)
- OK, but I was talking about information in the sense of information theory; the COinS data is completely redundant with the information already presented to the human reader in textual form, and so it conveys no additional information in that sense. Eubulides (talk) 17:58, 8 December 2009 (UTC)
- Even if the added cost is rather small, say 5-10%, this is still significant for a feature that virtually no users benefit from. There must be a solution that can deliver this information to users who require it as promoted, not to every user every time a page is loaded. Should usage of this feature gain more widespread traction, it would be easy to turn back on. Christopher Parham (talk) 13:34, 8 December 2009 (UTC)
- One main point of using citation templates is to add the citation meta information, so that we don't have to type it by hand into articles. A 10% increase in page length would not be particularly significant. — Carl (CBM · talk) 13:42, 8 December 2009 (UTC)
- The problem is that the benefit, so far as I can tell, is not significant at all, and for users with slow connections, the added time is meaningful. If it could be explained who uses this feature that could not easily manage it another way (say writing a tool to read the source and create appropriate meta tags when needed), that would be useful in understanding why anyone should care. Christopher Parham (talk) 13:52, 8 December 2009 (UTC)
- Plus, we're not talking about a 10% increase in page length; we're talking about increases that are around 25% for the two featured articles that I checked. I think many Misplaced Pages editors are spoiled by being on broadband connections, where it's not too bad. We tend to forget that most of the Internet has low-bandwidth connections and is not on broadband (my source is the 2009-09-24 Economist report on telecoms in emerging markets). Many of these people disable images, for example, because the page load time is too long otherwise. A 20% to 25% bloat hurts these guys significantly, and discourages them from reading Misplaced Pages's best-sourced articles. Eubulides (talk) 17:58, 8 December 2009 (UTC)
- You're aware that HTML page content is sent over the network compressed? The current version of Autism takes 80kb to send over the network, which would be about 20 seconds on 28.8kb modem. That's with all the COinS data, and this is already an exceptional article in terms of number of references. — Carl (CBM · talk) 18:55, 8 December 2009 (UTC)
- Yes, I'm aware of that. In my measurements the COinS data compressed slightly worse than the rest of the HTML, which means that the 23% and 26% bloat I measured for the uncompressed text of Autism and Vaccine articles actually underestimates the amount of actual bloat over the Internet. It's true that shorter articles don't suffer as much, but Misplaced Pages technology should support our high-quality and well-sourced featured articles; it shouldn't be getting in their way. For what it's worth, my impression is that African Internet connections are typically in the 9.6 Kb/s to 28 Kb/s range nominal, with actual rates somewhat slower. Eubulides (talk) 19:27, 8 December 2009 (UTC)
- It seems to me that you are stretching the justification here. If you actually have statistics on the number of Misplaced Pages readers with a 9.6 kbaud connection that indicate that page size (rather than latency, etc) is a problem, I would be interested to see them. But we expect things to be somewhat bad for such users anyway, just as we expect things to be somewhat bad for users with an 800x600 display. If we really wanted to minimize average page size, I don't think that COinS data would be the place to start, since few articles have that many references.
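The modem estimate in this exchange is simple arithmetic: kilobytes times eight, divided by the link rate. A minimal Python sketch, assuming 1 kB = 1000 bytes and ignoring protocol overhead and latency (so real transfers would be somewhat slower); `transfer_seconds` is an illustrative name:

```python
def transfer_seconds(kilobytes: float, link_kbit_per_s: float) -> float:
    """Idealized transfer time: kilobytes -> kilobits, divided by the link rate."""
    return kilobytes * 8 / link_kbit_per_s


# 80 kB of compressed HTML for Autism, as quoted above.
for rate in (28.8, 9.6):
    print(f"{rate:>5} kbit/s: {transfer_seconds(80, rate):.1f} s")
```

This gives about 22 s at 28.8 kbit/s and about 67 s at 9.6 kbit/s for the full compressed page.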
- Do you have any statistics on the average number of citation templates over the pages that use citation templates? I would guess that articles such as Autism are far from the median in that regard. Without statistics like that, I don't see any reason to think that COinS tags actually make a noticeable difference when averaged over multiple page views, especially given that most readers are logged-out and will benefit from caching of the compiled pages. — Carl (CBM · talk) 19:43, 8 December 2009 (UTC)
- Please see #Real-world effects of citation bloat below.
Real-world effects of citation bloat
- "If you actually have statistics on the number of Misplaced Pages readers with a 9.6 kbaud connection" Nobody has readership statistics like that, and it's not clear how relevant they would be anyway. If article bloat discourages people with slow connections, and they are therefore a small percentage of Misplaced Pages readers, that does not mean that article bloat should be of no concern; on the contrary. We don't need to commission a scientific research study to know that we have a very real problem here.
- "Do you have any statistics on the average number of citation templates over the pages that use citation templates? I would guess that articles such as Autism are far from the median in that regard." Sorry, but I'm afraid that intuition is incorrect in an important practical sense. What matters is the articles that are commonly read, and here the statistics show that Autism and Vaccine are typical, not unusual. Many very highly accessed pages have citation counts in the same ballpark as, or even higher than, Autism (171 citations) and Vaccine (36 citations). Here is a list of the ten most-accessed pages on English Misplaced Pages in July (the most recent month available) according to Wikistics. I am omitting self-referential pages such as Main Page and Misplaced Pages:
- The Beatles (112,000 hits/day, 128 citations)
- Michael Jackson (80,000 hits/day, 267 citations)
- YouTube (72,000 hits/day, 97 citations)
- Barack Obama (49,000 hits/day, 240 citations)
- Deaths in 2009 (49,000 hits/day, 0 citations)
- United States (46,000 hits/day, 204 citations)
- Facebook (43,000 hits/day, 215 citations)
- Swine influenza (40,000 hits/day, 106 citations)
- Eminem (33,000 hits/day, 108 citations)
- Lost (TV series) (33,000 hits/day, 162 citations)
- Except for the outlier Deaths in 2009, these pages all take waaaaayyy too long to load in my browser (unless they happen to be cached, which in my ad hoc tests is not as often as I thought it would be). And they're often the first articles that new readers see. This is a terrible advertisement for Misplaced Pages.
- For uncached articles it appears that most of the delay is due to COinS data and other citation bloat. I do not have the detailed statistics, but from what we've seen so far I would guess that more than half of the CPU cycles consumed by Misplaced Pages page-rendering servers are squandered on citation bloat, and that more than 10% of all the bytes shipped to browsers are wasted due to citation bloat.
- The citation bloat might be worth it if it was useful. But it's not. Almost nobody uses COinS data, and the people who do use it would be satisfied with another button that shipped COinS data only on request. And the non-COinS citation bloat provides almost zero benefit for a relatively high cost.
Eubulides (talk) 21:37, 8 December 2009 (UTC)
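For comparison with Autism's 171 citations, the ten articles listed above can be summarized in a few lines of Python. The traffic-weighted mean weights each article's citation count by its hits per day, which is closer to what a typical page view experiences; all figures are copied from the list, nothing else is assumed.

```python
from statistics import mean, median

# (hits per day, citations) for the ten most-accessed articles listed above.
TOP10 = {
    "The Beatles": (112_000, 128),
    "Michael Jackson": (80_000, 267),
    "YouTube": (72_000, 97),
    "Barack Obama": (49_000, 240),
    "Deaths in 2009": (49_000, 0),
    "United States": (46_000, 204),
    "Facebook": (43_000, 215),
    "Swine influenza": (40_000, 106),
    "Eminem": (33_000, 108),
    "Lost (TV series)": (33_000, 162),
}

citations = [c for _, c in TOP10.values()]
weighted = sum(h * c for h, c in TOP10.values()) / sum(h for h, _ in TOP10.values())

print(f"median citations: {median(citations)}")  # 145.0
print(f"mean citations: {mean(citations):.1f}")   # 152.7
print(f"traffic-weighted mean: {weighted:.1f}")   # 154.8
```

On this small sample, all three summaries land in the same range as Autism's 171 citations.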
- "but from what we've seen so far I would guess that more than half of the CPU cycles consumed by Misplaced Pages page-rendering servers are squandered on citation bloat," — do you have any evidence to back that up? We can all describe our personal opinions; I'm looking for something more scientific than numbers pulled out of the air, or short lists of articles chosen at random. As to readership statistics, we do have (at least) data on geographical distribution of readers. That is, it's in the server logs; I don't know if anyone has analyzed it. — Carl (CBM · talk) 21:59, 8 December 2009 (UTC)
- A list of the most accessed articles is hardly random. At any rate, trying to understand the portion of readers negatively impacted here is not useful unless we also have an understanding of the benefits derived from these features. Are readers using COinS data to improve their Misplaced Pages experience? I suspect that the rare readers who use it would be technically sophisticated enough to employ an alternate method of extracting this metadata, which wouldn't interfere with our general reader experience. In short, whether the impact of this bloat is great or small, it probably justifies eliminating a feature whose value is minimal. Christopher Parham (talk) 22:35, 8 December 2009 (UTC)
- My limited data obviously do not come from an exhaustive survey of all the articles on Misplaced Pages: far from it! But it is good evidence that we have a significant performance problem. Here are two more data points. I just put into my sandbox copies of The Beatles and of Michael Jackson. The bloated versions required 21 and 22 seconds to load (uncached), respectively; the trimmed-down ones 12 and 13 seconds. The bloated versions consume 92,355 and 98,482 bytes of compressed HTML; the trimmed-down ones 81,976 and 85,655 bytes. This indicates that for The Beatles, the citation bloat consumes 75% more CPU time and 13% more network traffic, and for Michael Jackson it's 69% more CPU time and 15% more network traffic. I will concede that my CPU-time estimate of 100% was too high, but a 70% bloat is still quite bad. And my 10% network-traffic estimate seems to have been too low, if these two high-volume samples are any indication. I agree with Christopher Parham that we need to compare costs to benefits here. Eubulides (talk) 23:19, 8 December 2009 (UTC)
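The four percentages in that comment are relative increases of each bloated sandbox copy over its trimmed-down counterpart. A short Python check using the quoted uncached load times and compressed byte counts:

```python
# (bloated, trimmed) pairs, copied from the comment above.
MEASUREMENTS = {
    ("The Beatles", "load time (s)"): (21, 12),
    ("Michael Jackson", "load time (s)"): (22, 13),
    ("The Beatles", "compressed bytes"): (92_355, 81_976),
    ("Michael Jackson", "compressed bytes"): (98_482, 85_655),
}

for (article, metric), (bloated, trimmed) in MEASUREMENTS.items():
    increase = 100.0 * (bloated - trimmed) / trimmed
    print(f"{article:16} {metric:17} +{increase:.0f}%")
```

This prints +75%, +69%, +13% and +15%, in the order listed.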
- Again, it looks to me like you are cherry-picking specific articles where you know the use of citation templates will slow things down. We have thousands of commonly-read articles, most of which are not FAs. In particular, we have 17,000 articles that average over 1000 hits per day. So picking 5 or 10 articles off the top of your head is not any sort of fair sample, and isn't evidence that there is any widespread performance problem.
Moreover, your "trimmed-down" templates do more than just remove COinS, they also rewrite the template structure, and change the output format. In particular, does your template support all the parameters that the original ones do? A fair test would use a "lean" template that accepts all the same parameters and produces the same output as the current templates. At the very least, you could see whether adding COinS to your templates actually causes the performance to degrade to its prior levels. Lacking that sort of testing, there doesn't seem to be evidence that COinS is what is slowing things down.
In case it might help you generate actual data, I have put a list of the top 1000 articles by average daily hitcount at . — Carl (CBM · talk) 01:00, 9 December 2009 (UTC)
- P.S. Also, because of caching, compile time for pages is not likely to be a significant factor for not-logged-in users. Once a revision is compiled once, it can be served to additional logged-out editors with no parsing overhead. This casts more doubt on the argument that parsing time is relevant to how quickly most of our readers are served pages. — Carl (CBM · talk) 01:09, 9 December 2009 (UTC)
- The charge of cherry-picking is completely unfounded. I chose the top 10 regular articles on Misplaced Pages, and had no idea ahead of time how many citations they had. I reported the results even though one of these articles had no citations whatsoever, and obviously did not favor my case. I am just a single editor editing by hand, and do not have the resources to benchmark thousands of revisions of thousands of pages. Nor is it realistic to expect me to. The evidence I've presented so far is compelling evidence that there's a significant overhead to the citation templates, and that a sizeable fraction of that is due to COinS data. If you don't choose to believe the numbers, and think that the fraction is smaller than what's stated, that's fine, and you're free to generate your own number. But there's no way that the fraction is insignificant: it's a noticeable performance degradation on the articles that I regularly edit. In contrast, there's zero evidence that COinS is providing benefit to Misplaced Pages readers; certainly zero evidence that the current approach, where everybody pays to get COinS data that almost nobody wants, is any better than a button that would generate COinS only on demand. The cost–benefit tradeoff here is clear. Eubulides (talk) 01:19, 9 December 2009 (UTC)
- Actually, I do think it is realistic to expect that somebody who is arguing that a certain template has performance problems has actually done some benchmarking of a significant number of pages to back up their claims. Otherwise, there's no way to distinguish between reality and perception. The problem of limited samples is exacerbated by the fact that caching works much better for non-logged in readers than it does for logged-in editors. — Carl (CBM · talk) 01:23, 9 December 2009 (UTC)
- Was that same performance case made when COinS data was put in? If not, then why have a higher burden for proof for removing a feature that has obvious costs (even if we may disagree with how high they are) and no demonstrated benefits? Anyway, I can do some more benchmarking when I find the time, but it's unrealistic to expect me to do tests involving thousands of pages. Eubulides (talk) 02:16, 9 December 2009 (UTC)
The median article, in a traffic-weighted sense, gets about 300 hits per day, meaning that 50% of traffic goes to pages with more hits per day and 50% goes to pages with fewer. (Some 15% of total traffic goes to the 85% of articles averaging less than one hit per hour.) Also, please keep in mind that render times can easily vary by a factor of two or more simply depending on which server gets asked to do the rendering. Any measurement based on these "served by" numbers should be repeated many times before conclusions are drawn from it. Dragons flight (talk) 01:42, 9 December 2009 (UTC)
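A traffic-weighted median differs from an ordinary median over articles: sort pages by hit rate and find the rate at which cumulative traffic crosses half of the total. A minimal Python sketch with made-up numbers (the hit rates below are hypothetical, not Misplaced Pages statistics, and `traffic_weighted_median` is an illustrative name):

```python
from __future__ import annotations

from statistics import median


def traffic_weighted_median(hits_per_day: list[float]) -> float:
    """Largest hit rate h such that pages with rate >= h get at least half of all traffic."""
    if not hits_per_day:
        raise ValueError("hits_per_day must not be empty")
    ordered = sorted(hits_per_day, reverse=True)
    half = sum(ordered) / 2
    running = 0.0
    for h in ordered:
        running += h
        if running >= half:
            return h
    return ordered[-1]  # not reached for non-negative input


# Hypothetical hit rates for ten pages.
pages = [900, 300, 300, 200, 100, 50, 30, 10, 5, 5]
print(median(pages))                   # 75.0
print(traffic_weighted_median(pages))  # 300
```

A few heavily read pages pull the traffic-weighted figure well above the plain per-article median, which is why the two can point to very different "typical" articles.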
- I do realize that the render times can vary, and that this explains part of the noise in the figures I've generated so far. The preliminary measurements I've made provide compelling evidence, and they are easily reproducible (you can visit the links I've provided yourself). Eubulides (talk) 02:16, 9 December 2009 (UTC)
Page version (times in seconds) | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 |
---|---|---|---|---|---|---|---|---|---|---|
Autism (current version) | 26.152 | 23.118 | 16.281 | 16.235 | 22.776 | 16.468 | 28.366 | 28.318 | 16.391 | 16.437 |
Autism (Eubulides's bloated version) | 19.261 | 19.33 | 18.962 | 16.278 | 23.622 | 16.239 | 39.378 | 16.309 | 16.492 | 23.775 |
Autism (Eubulides's slim version) | 12.08 | 8.341 | 8.335 | 8.255 | 8.531 | 14.862 | 8.373 | 8.369 | 8.356 | 8.513 |
These are essentially indistinguishable in median (19.622 vs. 19.1115) and mean (21.0542 vs. 20.9646). There is no meaningful difference between your approach and the original as far as I can tell. Eubulides, I think you are fooling yourself into believing there is a significant improvement when there isn't. Dragons flight (talk) 02:29, 9 December 2009 (UTC)
- That's because you measured the wrong page! The version you labeled "Autism (Eubulide's version)" is the bloated version: it is a clone of Autism except that stuff that should not be in user pages (such as {{featured article}} and categories) is commented out to avoid trouble. The bloated version generates all the COinS data and other unnecessary gorp. The version that you should have measured is the non-bloated version; and, to be fair, you should compare it to the bloated version rather than to Autism directly. I just now did this, and got the following timings (measured by wall-clock time between pressing "shift-reload" and seeing the "Done" message on Firefox):
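Version (wall-clock seconds per load) | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 |
---|---|---|---|---|---|---|---|---|---|---|
the non-bloated version | 11 | 16 | 16 | 11 | 17 | 17 | 12 | 12 | 13 | 14 |
the bloated version | 32 | 32 | 27 | 27 | 22 | 27 | 22 | 27 | 22 | 22 |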
- That averages out to about 14 seconds for the non-bloated version, and 26 seconds for the bloated version. My timings for the bloated version are higher than yours most likely because I'm measuring the total turnaround times (including network latency, and browser rendering on an older desktop) rather than just the CPU on the server side. Eubulides (talk) 06:24, 9 December 2009 (UTC)
- Okay, so I can be dumb sometimes. I've updated the table above with the correct versions. I agree that there does appear to be a significant difference, median (19.1115 vs. 8.371), mean (20.965 vs. 9.402). Dragons flight (talk) 07:44, 9 December 2009 (UTC)
- Running a series of 10 with just COinS suppressed suggests that roughly 1/3 of the CPU difference is caused by COinS, while presumably the other 2/3 comes from the other changes you made to the templates. Dragons flight (talk) 08:36, 9 December 2009 (UTC)
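The medians and means quoted in this thread can be recomputed from the posted runs with Python's standard `statistics` module. This is a minimal sketch; the lists are copied from the two timing tables in this thread, and the variable names are illustrative.

```python
from statistics import mean, median

# Server-side times (seconds), ten loads per version, from the first timing table.
current = [26.152, 23.118, 16.281, 16.235, 22.776, 16.468, 28.366, 28.318, 16.391, 16.437]
bloated = [19.261, 19.33, 18.962, 16.278, 23.622, 16.239, 39.378, 16.309, 16.492, 23.775]
slim = [12.08, 8.341, 8.335, 8.255, 8.531, 14.862, 8.373, 8.369, 8.356, 8.513]

# Wall-clock load times (seconds), ten loads per version, from Eubulides's table.
wall_slim = [11, 16, 16, 11, 17, 17, 12, 12, 13, 14]
wall_bloated = [32, 32, 27, 27, 22, 27, 22, 27, 22, 22]


def summarize(runs):
    """Return (median, mean) for a list of timings."""
    return median(runs), mean(runs)


for label, runs in [("current", current), ("bloated", bloated), ("slim", slim),
                    ("wall slim", wall_slim), ("wall bloated", wall_bloated)]:
    med, avg = summarize(runs)
    print(f"{label:>12}: median {med:.4f}  mean {avg:.4f}")
```

This reproduces the server-side figures (19.622/21.0542, 19.1115/20.9646, 8.371/9.4015) and the wall-clock means of about 14 and 26 seconds. The median is less sensitive than the mean to single slow loads such as the 39.378 run, which matters given how much render times vary between servers.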
Basically a (lack of) caching issue
The basic problem is that we don't cache template calls, so every time the page changes every template has to be rerun even though most changes affect very little of the page. At the expense of more disk space, one could cache the results of all template calls (or more likely just the large and/or slow ones) in order to avoid re-rendering all templates on every edit. I can think of a couple of different approaches to such a cache, though it's not immediately obvious what would provide the best balance of performance gains without ridiculous resource requirements. People have also argued for caching <ref> render results, which would be another approach if caching all templates is too aggressive. Dragons flight (talk) 03:38, 8 December 2009 (UTC)
- Possibly a solution would be section caching. This is the same data that is already being cached, just handled slightly differently. Clearly there are dependencies, such as ref numbers changing, that might mean an intermediate level of caching was needed, but again there may be ways to deal with that. Rich Farmbrough, 20:06, 8 December 2009 (UTC).
- So the problem is really one for editors rather than readers. Rich Farmbrough, 20:06, 8 December 2009 (UTC).
- Apparently this is done on Wikia - it helps with the pre-render buffer limit too. Rich Farmbrough, 16:03, 22 December 2009 (UTC).
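The template-call cache Dragons flight describes amounts to memoizing each expansion on its template name and parameters, so an edit only re-runs the calls whose inputs changed. The sketch below is hypothetical: `TemplateCache` and the renderer it wraps are illustrative names, not MediaWiki code, and it assumes a template's output depends only on its name and parameters. In practice invalidation is the hard part (nested templates being edited, date-dependent output, and ref numbering, as noted above).

```python
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical cache key: template name plus its parameters in a canonical order.
CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class TemplateCache:
    """Memoize template expansions so unchanged calls are not re-rendered."""

    def __init__(self, render: Callable[[str, Mapping[str, str]], str]) -> None:
        self._render = render
        self._cache: Dict[CacheKey, str] = {}
        self.misses = 0  # number of times the (expensive) renderer actually ran

    def expand(self, name: str, params: Mapping[str, str]) -> str:
        # Parameter order does not affect output, so sort it for a stable key.
        key = (name, tuple(sorted(params.items())))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._render(name, params)
        return self._cache[key]


def render_page(cache: TemplateCache, calls: List[Tuple[str, Dict[str, str]]]) -> str:
    """Expand every template call on a page, reusing cached results."""
    return "".join(cache.expand(name, params) for name, params in calls)
```

For example, a page with three citations costs three renders on first view; an edit that changes one citation costs one more render rather than three.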
Size increases due to COinS vs non-COinS bloat
At CBM's request I separated out the citation bloat into two parts: one part due to COinS, and the other part due to the other unnecessary gorp that {{cite journal}} etc. put into templates. I measured three pages, Autism (which is what started this thread), and The Beatles and Michael Jackson, the two most popular regular articles in July. I've been asked to compile similar numbers for thousands of articles, but I don't have the resources to do that, so I did it for just these three, as this should at least give us a clue as to how much bloat is going on. I measured the compressed and uncompressed HTML of the resulting pages, in my sandbox (see wikilinks below, in each compressed entry). Here are the byte counts and bloating factors that I observed:
Article (HTML bytes) | Current templates | COinS removed | COinS and other bloat removed | COinS bloat | Other bloat | Total bloat |
---|---|---|---|---|---|---|
Autism (compressed) | 78,212 A1 | 63,110 A2 | 61,214 A3 | 24% | 3% | 28% |
Autism (uncompressed) | 418,969 | 311,383 | 265,716 | 35% | 17% | 58% |
The Beatles (compressed) | 92,355 B1 | 85,118 B2 | 81,976 B3 | 9% | 4% | 13% |
The Beatles (uncompressed) | 439,035 | 376,052 | 346,839 | 17% | 8% | 27% |
Michael Jackson (compressed) | 98,482 C1 | 89,164 C2 | 85,655 C3 | 10% | 4% | 15% |
Michael Jackson (uncompressed) | 473,333 | 398,305 | 358,031 | 19% | 11% | 32% |
All byte counts are total HTML bytes; for each page, the first row is for compressed byte counts, and the second row for uncompressed. For example, in Autism adding COinS increases the number of compressed HTML bytes from 63,110 to 78,212, which is an increase by a factor of 1.239..., so it counts as 24% bloat. Adding just the other bloat increases the number of compressed HTML bytes from 61,214 to 63,110, a factor of 1.030..., so it counts as a 3% bloat. Bloat factors multiply, so the total bloat is 78,212 / 61,214, or a factor of 1.27768..., so it counts as a 28% bloat. And so forth.
Looking at these numbers, my earlier comments about COinS not compressing as well as the rest of the article seem to be incorrect; the reverse is true for these articles. Sorry about that: I don't know how I got the wrong figures there. Overall, for these (large, well-referenced) articles, the overall bloat, after compression, is in the range of 13% to 28%, with COinS alone responsible for a bloat of 9% to 24%.
Eubulides (talk) 08:24, 9 December 2009 (UTC)
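The bloat percentages in the table follow from simple ratios of byte counts. This small sketch (function names are illustrative) recomputes a row and shows why the factors multiply: (current / no COinS) × (no COinS / no bloat) = current / no bloat.

```python
from math import isclose


def growth_factor(bigger: int, smaller: int) -> float:
    """Factor by which the byte count grows from `smaller` to `bigger`."""
    return bigger / smaller


def bloat_percentages(current: int, no_coins: int, no_bloat: int):
    """Return rounded (COinS bloat %, other bloat %, total bloat %) for one table row."""
    coins = growth_factor(current, no_coins)    # e.g. 78,212 / 63,110
    other = growth_factor(no_coins, no_bloat)   # e.g. 63,110 / 61,214
    total = growth_factor(current, no_bloat)    # e.g. 78,212 / 61,214
    # The two partial factors multiply to the total factor.
    assert isclose(coins * other, total)
    return tuple(round((f - 1) * 100) for f in (coins, other, total))


print(bloat_percentages(78_212, 63_110, 61_214))     # Autism, compressed
print(bloat_percentages(418_969, 311_383, 265_716))  # Autism, uncompressed
```

Because the factors multiply rather than add, the total can exceed the sum of the two rounded percentages (24% and 3% give 28%, not 27%).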
DOI usage
Where a citation of a journal article includes a DOI, it seems to me that some other citation elements become redundant and should be avoided (I am really thinking about citation template parameters, but the same principle applies to manually entered citations).
- The article's URL is redundant, because the DOI links to the same target web page. That can frustrate the reader who clicks both.
- An ISSN number is redundant, because the DOI gives the same information and much more.
- A PMID number contributes no useful information. Since the DOI links to the article or abstract, a reader who also clicks the PMID gains nothing.
- An access date is irrelevant, because a DOI is supposed to be permanent.
Comments?—Finell 09:25, 11 December 2009 (UTC)
- The URL is redundant if it links to the same page as the DOI. However, often it links to a different page (the author's own copy, or some other institution's copy), and in this case it can be quite useful, as often the URL version is free but the DOI version is not.
- Yes, the ISSN number is almost always unnecessary. The exception might be if the publisher goes out of business or sells the journal and the DOI no longer works and it's an obscure journal.
- The PMID points to metadata that may not be accessible via the DOI, e.g., commentary, or a PMC link. I generally include the PMID even if it's redundant, as the PubMed servers are typically more reliable than various publishers' servers. In my experience DOIs are not nearly as reliable as PMIDs (that's why there is a |doi_brokendate= parameter but no |pmid_brokendate=).
- Completely agree about access date: they're not useful for archival journals.
- Eubulides (talk) 18:02, 11 December 2009 (UTC)
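The points in this thread could be expressed as a simple check over a citation's parameters. The sketch below is hypothetical (no such tool is proposed above): it flags |issn= and |accessdate= when a DOI is present, flags |url= only when it resolves to the same target as the DOI, and never flags |pmid=, following Eubulides's replies. The resolver prefixes are assumptions.

```python
from typing import Dict, List

# Assumed DOI resolver prefixes; a URL built from one of these points at the DOI target.
DOI_RESOLVERS = ("https://doi.org/", "http://dx.doi.org/")


def redundant_with_doi(params: Dict[str, str]) -> List[str]:
    """List citation parameters that add little when a DOI is present (hypothetical rule)."""
    doi = params.get("doi", "").strip()
    if not doi:
        return []
    flagged = []
    if params.get("issn"):
        flagged.append("issn")
    if params.get("accessdate"):
        flagged.append("accessdate")  # DOIs are meant to be persistent
    url = params.get("url", "").strip().lower()
    # Keep URLs to other copies (often free); flag only links to the DOI itself.
    # DOI names are case-insensitive, so compare in lower case.
    if url and url in {(prefix + doi).lower() for prefix in DOI_RESOLVERS}:
        flagged.append("url")
    # PMID is deliberately never flagged: it can lead to PubMed/PMC metadata.
    return flagged
```

For example, a citation with a DOI, an ISSN, a PMID, and a URL pointing at a free author's copy would have only |issn= flagged.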
Full citations obsolete?
I confess that I like full, well formatted bibliographic citations. However, I am coming to the conclusion that some traditional citation elements are obsolete. If a citation of a book includes the ISBN number, is it useful to include the name and location of the publisher or the edition number? I would still retain citation elements that convey information on sight: The author(s)' name(s), title, and year of publication. But is the rest of it helpful? In citing the Encyclopædia Britannica, does it really help the reader to add Chicago: Encyclopædia Britannica, Inc.—especially if Encyclopædia Britannica is wikilinked?—Finell 09:48, 11 December 2009 (UTC)
- EB has always been so well known that it hasn't needed a publication location. I find publishers to be more or less essential in judging article quality in the humanities, obscure publishers require locations, and then once you've got locations for a couple. Fifelfoo (talk) 09:58, 11 December 2009 (UTC)
- One very good reason for including the name, publisher location and edition number or year of publication is that some books are published simultaneously in two or more countries under different ISBNs. If the reader is in a different country to the writer, then the reader might choose to look at the version that was published in their own country rather than the one that was published in the writer's country. —Preceding unsigned comment added by Martinvl (talk • contribs) 12:45, 11 December 2009 (UTC)
- I find it helpful to give the name of the publisher even if the ISBN is available, to give the reader a hint as to the establishedness of a book without having to click on the link. The location is useful only for obscure publishers. If the publisher is the same as the book name (as in the Encyclopædia Britannica) there's no point to listing the publisher separately. The question of book editions is separate: some books come out in different editions even with the same publisher, and the edition needs to be cited regardless of the publisher. Eubulides (talk) 18:02, 11 December 2009 (UTC)
- I agree that location is the least important element of a "full" cite these days. On many new books it is hard to even pick a location, because it may have been simultaneously published in several places in identical editions, with all of them listed on the title page. Publisher, on the other hand, can be important as a clue to the nature of the source. Some publishers are known as vanity presses or for ideological bias, so seeing their name (instead of, say, a major university press) can trigger further investigation of the source. You could find this out by following an ISBN, but that takes extra steps. Typos and other mistakes are also more likely with number strings than with ordinary names. --RL0919 (talk) 16:00, 15 December 2009 (UTC)
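Typos in ISBNs are partly detectable, because ISBNs end in a check digit. This is a minimal sketch of the standard ISBN-10 (weights 10 down to 1, modulo 11, with "X" meaning 10) and ISBN-13 (alternating weights 1 and 3, modulo 10) checks; it only validates the checksum and says nothing about whether the ISBN belongs to the book cited.

```python
DIGITS = "0123456789"


def isbn_is_valid(isbn: str) -> bool:
    """Check the check digit of an ISBN-10 or ISBN-13 (hyphens and spaces ignored)."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) == 10:
        if not all(c in DIGITS for c in chars[:9]):
            return False
        last = chars[9].upper()
        if last == "X":
            check = 10
        elif last in DIGITS:
            check = int(last)
        else:
            return False
        values = [int(c) for c in chars[:9]] + [check]
        # Weights 10, 9, ..., 1; a valid ISBN-10 sums to a multiple of 11.
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(chars) == 13 and all(c in DIGITS for c in chars):
        # Weights alternate 1, 3, 1, 3, ...; a valid ISBN-13 sums to a multiple of 10.
        return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(chars)) % 10 == 0
    return False


print(isbn_is_valid("978-0-306-40615-7"))
print(isbn_is_valid("978-0-306-40615-8"))
```

Any single mistyped digit fails the check, but a checksum cannot tell the reader anything about the publisher, which is why a name is still useful at a glance.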
Disappearing website
I've got a problem with sources in an article I wrote today. For the Bronzewing Gold Mine article, I used a lot of references from the website of View Resources, a former owner, which is in administration. Apparently, once the administration process is over, the company will be liquidated and, I would think, its website will disappear. That will make all my references dead links. Any ideas what can be done about that? I could save all the company announcements I used on my computer, but that won't make them accessible to others. Calistemon (talk) 10:07, 15 December 2009 (UTC)
- You can archive individual pages using WebCite, but I don't know whether this will work for PDFs. I've queued the 6 Feb 08 Bronzewing update PDF for archiving. More news when I have it. - Pointillist (talk) 12:14, 15 December 2009 (UTC)
- Yes, WebCite works for PDF documents available on websites. Calistemon could also check if web pages have already been automatically archived by http://www.archive.org. — Cheers, JackLee 12:22, 15 December 2009 (UTC)
- Alternatively, upload your PDFs to Google docs and link to them there. -- Alarics (talk) 15:32, 15 December 2009 (UTC)
- In theory the resource http://www.viewresources.com.au/documents/1A2-BronzewingUpdate_6Feb08.pdf is now available at http://www.webcitation.org/5m2TWATQz, but I can't make it work. Perhaps Jacklee could have a go? The Google docs suggestion is novel: at first glance it seems that this technique would breach the View Resources copyright and fails to provide an audit trail from the source website to the archive. I can also see potential problems with Google's terms (e.g. 8.3, 9.6, 11.1-11.4). - Pointillist (talk) 22:31, 15 December 2009 (UTC)
- I have no problem viewing the archived file at http://www.webcitation.org/5m2TWATQz. What problem are you having with it? By the way, Calistemon, note the use of the |archiveurl= and |archivedate= parameters to refer to archived web pages if you are using citation templates such as {{citation}} in your Misplaced Pages article. — Cheers, JackLee 07:20, 16 December 2009 (UTC)
Thanks for all the advice, guys! I'm also having problems viewing the http://www.webcitation.org/5m2TWATQz file: my computer tries to open it with Adobe and then tells me it can't find the file. I will try some more and see whether I can get it to work. Calistemon (talk) 10:13, 16 December 2009 (UTC)
- http://www.webcitation.org/5m2TWATQz works fine for me. -- Alarics (talk) 10:31, 16 December 2009 (UTC)
- Strange! I tried archiving another one: http://www.viewresources.com.au/announcements/15.06.04AcquiresBronzewing.pdf is now under http://www.webcitation.org/5m3tUbSdp, and it seemed happy to do so, but I can't access this one either! Any ideas what's locking me, and possibly also Pointillist, out? Calistemon (talk) 10:34, 16 December 2009 (UTC)
- As a last try, I've archived a third source, this time not a PDF file, and I can access this one fine, without trouble. I would say it's me, then, having trouble with the PDFs. Thanks for all your help, guys, you put me in the right direction! Calistemon (talk) 11:14, 16 December 2009 (UTC)
- Hi Calistemon, what PDF reader/writer application do you use? My problems only seem to happen on machines that have Adobe Acrobat (writer) installed. For example: http://www.webcitation.org/5m2TWATQz opens OK using FF3.5 with Adobe Reader 8 on Vista Home, OK using FF3 with Adobe Reader 8 on XP Pro, and OK using IE6 with Adobe Reader 8 on Windows 2000 Server. It fails using either FF3.0 or IE8 with Acrobat 9.2 Standard on XP Pro and fails using IE8 with Acrobat 9.2 Pro on Windows 7 x64 Pro, Hmmm. - Pointillist (talk) 13:13, 16 December 2009 (UTC).
- However, it opens OK using Chrome 3.0.195.38 with Acrobat 9.2 Pro on Windows 7 x64 Pro, Hmmm again. Pointillist (talk) 13:57, 16 December 2009 (UTC)
- http://www.webcitation.org/5m3tUbSdp also works fine for me. I use Firefox browser with Foxit PDF reader. I do not have Acrobat installed. It sounds as if your problem is with Adobe. -- Alarics (talk) 14:28, 16 December 2009 (UTC)
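JackLee's two suggestions, checking archive.org for an existing copy and then citing it with |archiveurl= and |archivedate=, can be sketched in a few lines. This assumes the Internet Archive's Wayback Availability JSON endpoint (`https://archive.org/wayback/available?url=...`) and its `archived_snapshots.closest` response shape; the function names are hypothetical and the wikitext output is only an illustration of the {{citation}} parameters mentioned above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(response: dict) -> Optional[Tuple[str, str]]:
    """Extract (snapshot URL, YYYY-MM-DD date) from an availability API response."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    ts = closest["timestamp"]  # assumed format: YYYYMMDDhhmmss
    return closest["url"], f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}"


def lookup(url: str) -> Optional[Tuple[str, str]]:
    """Ask the Wayback Machine for its closest snapshot of `url` (network call)."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=30) as resp:
        return closest_snapshot(json.load(resp))


def citation_with_archive(title: str, url: str, archiveurl: str, archivedate: str) -> str:
    """Build {{citation}} wikitext that records both the original and archived copy."""
    return ("{{citation |title=" + title + " |url=" + url
            + " |archiveurl=" + archiveurl + " |archivedate=" + archivedate + "}}")
```

Keeping the original |url= alongside the archive preserves the audit trail back to the source website that Pointillist raised, which a re-upload to a third-party host would lose.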
Transvestites,shemales and kathoies
I am writing a best seller book and would like to hear from individuals who have a story to tell.>>>> —Preceding unsigned comment added by Freedatingservice (talk • contribs) 20:50, 28 December 2009 (UTC)