The Wayback Machine - https://web.archive.org/web/20120122061250/https://bugzilla.wikimedia.org/show_bug.cgi?id=6455

Last modified: 2011-12-31 06:11:48 UTC

Bug 6455 - Set $wgPFEnableStringFunctions = true on WMF wikis
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Highest enhancement with 86 votes
Target Milestone: ---
Assigned To: Wikibugs
URL: http://www.mediawiki.org/wiki/Extensi...
Depends on: 26092
Blocks: 29087

Reported: 2006-06-26 22:04 UTC by Aryeh Gregor
Modified: 2011-12-31 06:11 UTC (History)
CC List: 52 users

See Also:
Web browser: ---


Attachments
New version of string functions (26.59 KB, patch)
2009-03-30 11:07 UTC, Robert Rohde
Merge string functionality into ParserFunctions (11.17 KB, patch)
2009-04-06 07:44 UTC, Robert Rohde

Description Aryeh Gregor 2006-06-26 22:04:46 UTC
Uses of string-related functions would doubtless be myriad.  One in particular
that would be handy, from my point of view, is checking whether input to a
hatnote contains "[[" and "]]": if it does, then the wrapping "[[" and "]]"
could be dropped.  This is useful because not requiring [[]] in the parameter
input has become the norm on enwiki, and this makes it impossible to replace an
intended single link with "[[Link1]] or [[Link2]]" due to parsing oddness.  If
StringFunctions were installed, "[[Link1]] or [[Link2]]" could be used as the
parameter, and the template would drop the default brackets.
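A minimal sketch of the bracket check described above, written in Python rather than wikitext for readability. The function name `hatnote_target` is hypothetical, and the plain substring test stands in for whatever #pos-style function a template would actually call.

```python
def hatnote_target(param: str) -> str:
    """Return the link markup a hatnote template would emit for `param`.

    Hypothetical logic: if the editor already supplied "[[" and "]]",
    pass the parameter through untouched; otherwise wrap it as one link.
    """
    if "[[" in param and "]]" in param:
        # Editor-supplied links, e.g. "[[Link1]] or [[Link2]]".
        return param
    # Bare page name: add the default brackets.
    return "[[" + param + "]]"


print(hatnote_target("Foo"))                     # [[Foo]]
print(hatnote_target("[[Link1]] or [[Link2]]"))  # [[Link1]] or [[Link2]]
```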

Okay, so perhaps that's a slightly pathetic reason, but there's no request open
for this yet, so let others add their own reasons.
Comment 1 Polonium 2006-10-17 23:50:49 UTC
There apparently was a problem with these functions in MediaWiki 1.8.0, but it
has been fixed (see
[http://meta.wikimedia.org/wiki/Talk:StringFunctions#Update_for_1.8.0 this]).
The current version is 1.9, so they would need to be tested against it first.
Since these functions would be very useful, they should be added as soon as
possible.
Comment 2 Aryeh Gregor 2006-10-21 23:25:46 UTC
*** Bug 7654 has been marked as a duplicate of this bug. ***
Comment 3 jquinn 2007-01-10 17:58:40 UTC
These would be very useful for a number of tasks, above all #strlen and
#substr. There has been much concern that functions such as repeat and
regexp-related functions could be worse than O(n) in input length and thus
usable for DoS attacks. However, these functions are NOT included in
StringFunctions; all the functions there are O(n) and, while they would
contribute to server load, would not break caching and thus would be an
incremental increase.

Please please include these. I can think of several places where they would be
immediately useful, not just as toys - above all for any task involving
formatting, as mentioned above.
Comment 4 Polonium 2007-01-11 21:42:45 UTC
I totally agree with the above comment. The most important functions to add are
strlen and substr. Several other functions could be derived from them, and they
would be useful for many applications. Since they are DoS-safe and O(n), I cannot
see any reason not to install them. They should be installed right away (the
code already exists). Beyond this, other functions, like the
[[m:VariablesExtension|variables]] could be added.
Comment 5 Polonium 2007-01-12 20:10:25 UTC
There already is a way to find the length of a string, but it is very limited
and is a hack instead of a proper solution. See
http://en.wikipedia.org/wiki/Template:Strlen
Comment 6 Alon Lischinsky 2007-01-12 21:54:46 UTC
(In reply to comment #5)
> There already is a way to find the length of a string, but it is very limited
> and is a hack instead of a proper solution. See
> http://en.wikipedia.org/wiki/Template:Strlen

But it works only for ASCII strings, because padright: and the like measure
byte length rather than the apparent number of characters.
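The byte-versus-character distinction in the comment above can be shown with a short Python sketch; the helper names are hypothetical. Note that even code-point counting is only an approximation of "apparent" characters, since combining marks are counted separately.

```python
def byte_len(s: str) -> int:
    """Length in UTF-8 bytes, which is what a byte-oriented pad measures."""
    return len(s.encode("utf-8"))


def char_len(s: str) -> int:
    """Length in Unicode code points, closer to what a reader counts."""
    return len(s)


for word in ["Oslo", "Zürich", "日本"]:
    print(word, char_len(word), byte_len(word))
# Oslo 4 4
# Zürich 6 7
# 日本 2 6
```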
Comment 7 jquinn 2007-01-14 00:38:44 UTC
See my comments at the end of [[m:Talk:Stringfunctions#Proposals]] for why some
of these are NOT currently safely O(n) and how that could be easily fixed.
Comment 8 Antoine "hashar" Musso 2007-01-14 00:40:56 UTC
jquinn > please paste your comment here. The above link does not
work and might be deleted one day.
Comment 9 Polonium 2007-01-14 01:45:42 UTC
(In reply to comment #8)
> jquinn > please paste your comment here. The above link does not
> work and might be deleted one day.

The correct link is to [[Talk:StringFunctions#Proposals]]. Go and view it, it is
too long to be posted directly. Still, strlen and substr are totally O(n) and
are the most important string functions. That is why the consensus is to install
them now and install other functions later.
Comment 10 Aryeh Gregor 2007-01-14 02:23:28 UTC
The quote is:

"See the page - pad is O(10^n) and replace is O(n^2). Suggest limiting the "from" and "to" values 
of replace and the "delimiter" of explode to 30 characters in length (as with pos - I would 
support anything from 10 to 30 characters, I'd vote against anything outside this range) and the 
length value of pad to 99. I'd also recommend limiting "value" or "string" for everything EXCEPT 
len and sub to some length limit in the range of .5K - 5K, preferably 1K. Even O(n) is a DOS 
attack if it is easy to make n be a 100K page templated from somewhere. And finally, even with 
all these limits, it would be easy to make a "replace" call that returned a 30K-long string - 
there needs to be a further limit targeted to the output of replace (and possibly urlencode?). 
For ease of programming (just check limits then expose the PHP, rather than building your own 
function), this could be that len(to)*len(value)/len(from) can't be over twice the limit on 
len(value) - that is, the "to" can only be twice as long as the "from" unless you're sure that 
your "value" string is a fraction of the length limit. The "len" in question is byte-length, ie 2 
for each standard-codepage unicode char and 1 for each ascii."

Pad is O(10^n)?  o_O
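The limits proposed for replace in the quote above can be sketched as a guard that runs before the actual replacement. This is an illustration under assumptions: the limit values are taken from the quote's suggestions (30 bytes for "from"/"to", about 1K for "value"), lengths are in UTF-8 bytes as the quote specifies, and `guarded_replace` is a hypothetical name.

```python
VALUE_LIMIT = 1000   # assumed byte limit on "value" (the quote suggests ~1K)
FROM_TO_LIMIT = 30   # assumed cap on "from"/"to" length, per the quote


def blen(s: str) -> int:
    """Byte length in UTF-8, the measure the quote asks for."""
    return len(s.encode("utf-8"))


def guarded_replace(value: str, old: str, new: str) -> str:
    """Replace only if the worst-case output stays within twice VALUE_LIMIT."""
    if blen(old) == 0:
        raise ValueError("'from' must be non-empty")
    if blen(old) > FROM_TO_LIMIT or blen(new) > FROM_TO_LIMIT:
        raise ValueError("'from' or 'to' is too long")
    if blen(value) > VALUE_LIMIT:
        raise ValueError("'value' is too long")
    # Worst case: every len(from)-byte chunk of value matches and becomes len(to) bytes.
    worst_case = blen(new) * blen(value) / blen(old)
    if worst_case > 2 * VALUE_LIMIT:
        raise ValueError("replacement could produce an over-long result")
    return value.replace(old, new)
```

The check is deliberately conservative: it rejects a call whose worst case is too large even if the string contains no matches at all, which matches the quote's aim of checking limits up front and then simply calling the native function.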
Comment 11 Polonium 2007-01-14 17:07:24 UTC
When the StringFunctions are installed, string length should be measured in
characters, not bytes, so that they work with Unicode characters.
Comment 12 jquinn 2007-01-24 09:59:43 UTC
The really correct link is [[m:Talk:StringFunctions#Proposals]]. Dratted
case-sensitivity. 

Yes, pad is O(10^n) because n is the length of the argument - the number of
digits, not the value. But by the same token, you can limit n to just 2 or 3 and
still get 99 or 999 characters of padding.

Note that the issue is the memory & processor usage of the internal PHP call,
because on the level of the visible wiki the template buffer is limited anyway.
Comment 13 Aryeh Gregor 2007-01-24 15:53:11 UTC
The maximum pad length permitted in current code is 500 characters, so it's
O(1) worst-case.
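A sketch of why the cap makes padding O(1): the length argument arrives as a string of n digits, so without a cap it can request roughly 10^n characters of output, while clamping it bounds the work no matter how long the argument is. The 500 figure comes from the comment above; the function name and its signature are hypothetical, not the extension's actual code.

```python
MAX_PAD = 500  # maximum pad length, per the comment above


def pad(value: str, length_arg: str, fill: str = " ", left: bool = False) -> str:
    """Pad `value` to the requested length, never exceeding MAX_PAD characters."""
    # n digits could encode a target near 10**n; the clamp keeps the work bounded.
    target = min(int(length_arg), MAX_PAD)
    if not fill or len(value) >= target:
        return value
    needed = target - len(value)
    padding = (fill * (needed // len(fill) + 1))[:needed]
    return padding + value if left else value + padding


print(pad("ab", "5", "-"))             # ab---
print(pad("ab", "5", "xy", left=True))  # xyxab
print(len(pad("", "999999999")))        # 500
```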
Comment 14 Juraj Simlovic 2007-01-24 16:21:37 UTC
I shall limit the length of input parameters of the features in question ASAP.
Then the possibility of a DoS attack through them will be removed.
Comment 15 Jared 2007-02-03 21:48:27 UTC
This set of Functions is very useful, and so I advocate for the approval of this
proposed change.
Comment 16 Juraj Simlovic 2007-02-04 19:06:14 UTC
All parser functions in question have finally been limited in the latest code.
Full support for UTF-8 was also added (along with other small fixes and improvements).
Comment 17 Bogumił "A_Bach" Cieniek 2007-02-04 21:24:37 UTC
These functions will be very helpful in Wiktionary.
Comment 18 hillgentleman 2007-02-05 17:31:50 UTC
#sub would be very useful for poetry annotations (cf. [[w:zh:WP:VPT]]).
Comment 19 Aryeh Gregor 2007-02-05 17:43:02 UTC
We're aware that it will be useful.  It needs to be reviewed by Tim or someone.
Please do not comment on this bug just to say you support it; there's a "vote
for this bug" link near the bottom of the page for that, which *doesn't* spam
everyone who's voted for the bug plus half the developers with your support.

A couple of comments: 1) urlencode and urldecode are now available as core
parser functions, so they don't need to be part of this.  2) It should fail
gracefully if mbstring is not installed.  Either it should define workalikes
for the mbstring functions it uses, or fall back to non-UTF-8 versions, or just
die if mbstring isn't installed.
Comment 20 百楽兎 2007-02-06 03:58:26 UTC
I support this request. I plan to make a template for the Chinese Wikipedia and
Wikisource, and this function will be useful.
Comment 21 Jared 2007-02-06 15:09:26 UTC
As I'm unfamiliar with this site, how long will it take for this to be 
approved? Are there steps that need to be taken? The code is already 
written, so it shouldn't take too long.
Comment 22 Juraj Simlovic 2007-02-06 20:34:47 UTC
> It should fail gracefully if mbstring is not installed.

You are right about mbstring. It is currently a hidden prerequisite of the
extension. However, if you do not have mbstring support at hand, all you have
to do is replace all occurrences of "mb_" with an empty string throughout the
source code.

All Wikipedia servers do have mbstring installed, don't they?
Comment 23 Rob Church 2007-02-06 21:17:39 UTC
(In reply to comment #22)
> now. However, if you do not have mbstring support at hand, all you have to do is to
> replace all occurrences of "mb_" with an empty string all over the source code..

This is not how we like our code to work; we want our code to work without
people having to hack at it. Requiring that mbstring functions be
installed for the extension to work is acceptable.

As far as I know, mbstring is available throughout the Wikimedia cluster.
Comment 24 Aryeh Gregor 2007-02-06 22:58:43 UTC
(In reply to comment #21)
> As I'm unfamiliar with this site, how long will it take for this to be 
> approved? Are there steps that need to be taken?

Basically, someone needs to get on IRC and nag Tim Starling (or Brion, but he tends to be more 
overworked).  Or you could e-mail him, I suppose.  It just needs to be brought to his attention so 
he can review it.
Comment 25 Rob Church 2007-02-18 16:10:22 UTC
*** Bug 9023 has been marked as a duplicate of this bug. ***
Comment 26 fdcn 2007-04-13 08:24:24 UTC
waiting
Comment 27 Rob Church 2007-04-13 14:56:00 UTC
(In reply to comment #26)
> waiting

I am aware that users are waiting for a response to this, but it isn't as
straightforward as might be desired; enabling an extension like this requires a
thorough review in terms of suitability for inclusion, and also a full
performance test. Some patience is therefore required; that patience might need
to stretch to several months, but that's the way of the world.

Any further comments on this bug are to be restricted to discussing the
technical issues raised above, and not, "we're waiting".
Comment 28 Juraj Simlovic 2007-04-13 18:39:03 UTC
I was quick-searching the above comments, and all technical issues seem to have
been resolved and closed by now, haven't they? I hope nothing has been
forgotten. If I am wrong, please draw my attention to it. Otherwise I shall just
continue to wait until it is approved, or until Tim gets to me with new issues.
Comment 29 Raimond Spekking 2007-05-20 16:07:22 UTC
*** Bug 9979 has been marked as a duplicate of this bug. ***
Comment 30 Arath 2007-05-23 11:56:57 UTC
Another month has passed since comment #26, but the string functions have not
been installed yet. What does "several months" mean?
Comment 31 Eno1 2007-05-27 14:55:16 UTC
Can someone please ask whoever needs to be asked to make these functions
available on Wikipedia as a matter of priority?

There are more legitimate uses for these than you think. Most important are
#sub, #pos and #len.
Comment 32 Duken 2007-05-30 17:28:00 UTC
I'm waiting for #sub and #len too, for Wiktionary Templates. It really would
help me, so... I really hope you'll fix this bug. ^^'
Comment 33 Raimond Spekking 2007-06-12 19:36:16 UTC
*** Bug 10231 has been marked as a duplicate of this bug. ***
Comment 34 百楽兎 2007-09-19 06:58:20 UTC
Is this request still alive? Any new information regarding this issue?
Comment 35 Tisza Gergő 2007-11-01 14:25:25 UTC
#rpos would also be useful, especially in [[agglutinative language]]s with
[[vowel harmony]] where you can't just put a number or date between two words
and get a grammatically correct sentence, but need to select the matching
suffix too.
Comment 36 Tim Starling 2007-11-14 04:35:29 UTC
Can someone just add these functions to the ParserFunctions extension please? 
Comment 37 Tim Starling 2007-11-15 03:25:49 UTC
On second thoughts: presumably these functions, especially substr, would have
the potential to destroy strip markers, and even to capture the marker prefix
and make fraudulent markers. That is a bug.
Comment 38 Juraj Simlovic 2007-11-15 15:22:52 UTC
> the potential to destroy strip markers

And how do we fix that?
Comment 39 Steve Sanbeg 2007-11-15 17:36:15 UTC
(In reply to comment #38)
> > the potential to destroy strip markers
> 
> And how do we fix that?
> 

Probably a substr operation would go something like this: take the input
wikitext, parse it and convert it to HTML, strip the HTML down to plain text,
then take the substring of that text.

A substring probably isn't meaningful, or at least not simple, on anything
other than plain text.
Comment 40 Ross M 2007-11-16 08:54:25 UTC
(In reply to comment #37)
> On second thoughts: presumably these functions, especially substr, would have
> the potential to destroy strip markers, and even to capture the marker prefix
> and make fraudulent markers. That is a bug.
> 

If this is the case, then surely it's not a bug with this extension, but rather
with the extension parser, isn't it?  If third-party extensions should never be
permitted to affect strip markers, then those markers clearly shouldn't be
provided to them.
Comment 41 Aryeh Gregor 2007-11-16 19:13:01 UTC
Some extensions may wish to fiddle with strip markers, for some reason.  This
is not one of them.
Comment 42 Ross M 2007-11-17 02:01:41 UTC
After looking further through the code, I have to ask: why would having the
potential to alter strip markers qualify as a bug?  As far as I can tell,
creating a fraudulent marker would do nothing apart from putting some
unpleasant gibberish in the output, and removing a marker altogether would
simply prevent that marker's content from being displayed.  Neither of these
outcomes appears to break the code in any meaningful way.  What am I missing
here?
Comment 43 Aryeh Gregor 2007-11-17 23:54:49 UTC
Input: {{#sub:<nowiki>Hello!!!!!!111</nowiki>|10}}

Expected output: Either "<nowiki>He" or "<nowiki>Hello!!!!!111</nowiki>" or
"Hello!!!!!" (not sure which, but something like one of those)

Actual output: ef021ae6742-nowiki-00000013-QINU (notice the 0x7 byte at the
end, for good measure!)

That looks like a bug.  Shouldn't be difficult at all to fix.
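The failure above can be reproduced with a marker-unaware substring. The marker string here is hypothetical, modeled on the output quoted in the comment (a "\x7fUNIQ" prefix is assumed, and the exact format may differ); the point is only that slicing by character position can cut straight through an opaque token the parser inserted in place of the <nowiki> content.

```python
# Hypothetical strip marker modeled on the output quoted above: the parser has
# already replaced <nowiki>...</nowiki> with this opaque token before #sub runs.
MARKER = "\x7fUNIQ12345ef021ae6742-nowiki-00000013-QINU\x7f"


def naive_sub(s: str, start: int, length=None) -> str:
    """Substring by character position, with no knowledge of strip markers."""
    return s[start:] if length is None else s[start:start + length]


print(repr(naive_sub(MARKER, 10)))  # 'ef021ae6742-nowiki-00000013-QINU\x7f'
```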
Comment 44 Tim Starling 2007-11-18 08:51:26 UTC
(In reply to comment #41)
> Some extensions may wish to fiddle with strip markers, for some reason.  This
> is not one of them.

#if, for example, "fiddles" with strip markers, by passing through the strip
markers in the followed branch and discarding the ones in the non-followed
branch. It's not an exotic scenario.

Yes, you could unstrip the string and then apply the substring operation, but
that would cause the output to be escaped. So it would destroy the output of
tags such as <gallery> -- in fact pretty much all the extension tags other than
<nowiki>. And if you unstripped <nowiki>, the content escaped by <nowiki> would
be erroneously parsed by the main parser stage, defeating its purpose. 

The solution is to identify strip markers, and then to skip them when you are
counting characters. But that might be inefficient. 
Comment 45 Ross M 2007-11-19 01:21:11 UTC
Yes, that's inefficient.  Since PHP's mb_string methods operate in O(n) time,
stepping through each string character by character would make these methods
O(n^2) rather than O(n).  Once PHP6 is rolled out, we'll be able to use
TextIterator to perform this operation efficiently.  But do we really need to
wait until then before adding these functions?  Allowing the occasional marker
to be corrupted won't break the system, and the benefits far outweigh this
minor flaw.
Comment 46 Tim Starling 2007-11-21 11:50:14 UTC
You could just use the preg_match_all /./u trick, that would be O(n). Or you
could split the string at the markers with preg_split(), and then loop through
the fragments, counting characters with mb_strlen(). Or use an strpos() loop
for marker identification instead of preg_split() to conserve memory -- maybe
that'd even be faster. There's lots of ways to do it, there's no need to throw
your hands up and wait for PHP 6.
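Two of the approaches named above (split at the markers, or locate them with a find loop) can be sketched in Python; both skip markers entirely when counting, as Tim proposes. The marker delimiters are an assumption for illustration, and Python's `len` on a `str` plays the role of mb_strlen().

```python
import re

# Assumed marker shape, for illustration only: "\x7fUNIQ" + id + "-QINU\x7f".
PREFIX = "\x7fUNIQ"
SUFFIX = "-QINU\x7f"
MARKER_RE = re.compile(re.escape(PREFIX) + ".*?" + re.escape(SUFFIX), re.S)


def len_split(s: str) -> int:
    """preg_split-style: cut the string at markers, count the pieces' characters."""
    return sum(len(piece) for piece in MARKER_RE.split(s))


def len_find_loop(s: str) -> int:
    """strpos-loop style: walk from marker to marker without building a list."""
    total, pos = 0, 0
    while True:
        start = s.find(PREFIX, pos)
        if start == -1:
            break
        end = s.find(SUFFIX, start + len(PREFIX))
        if end == -1:  # unterminated prefix: count the rest as plain text
            break
        total += start - pos         # plain text before this marker
        pos = end + len(SUFFIX)      # skip the whole marker
    return total + len(s) - pos      # plain text after the last marker


marker = PREFIX + "0001-nowiki-00000001" + SUFFIX
text = "ab" + marker + "cd"
print(len_split(text), len_find_loop(text))  # 4 4
```

The split version is shorter; the find loop avoids the intermediate list, which is the memory concern raised above.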
Comment 47 Brion Vibber 2007-11-30 21:14:15 UTC
My own inclination would be to unstrip markers and do a wiki-text-escape on
output... possibly with HTML tag stripping on the expanded markers first. (E.g.,
a big ol' table will be reduced to the text contained in it.)

Is there a reason one would want string functions to output non-plaintext or
otherwise that this wouldn't be appropriate?
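The unstrip, strip-tags and escape pipeline suggested above might look roughly like this. It is a sketch under assumptions: the marker-to-HTML table, the naive tag-stripping regex and the set of characters treated as wiki-significant are all hypothetical, not MediaWiki's actual escaping rules.

```python
import html
import re
from typing import Dict

TAG_RE = re.compile(r"<[^>]*>")  # naive tag stripper, adequate for a sketch only
# Hypothetical set of characters that could be re-read as wiki markup.
WIKI_SPECIAL = set("[]{}|<>'=*#:;&")


def unstrip_to_text(s: str, markers: Dict[str, str]) -> str:
    """Replace each strip marker with the plain text of its expanded HTML."""
    for marker, expanded_html in markers.items():
        s = s.replace(marker, html.unescape(TAG_RE.sub("", expanded_html)))
    return s


def escape_wikitext(s: str) -> str:
    """Turn markup-significant characters into numeric character references."""
    return "".join(f"&#{ord(c)};" if c in WIKI_SPECIAL else c for c in s)


def plain_sub(s: str, markers: Dict[str, str], start: int, length: int) -> str:
    """Substring on plain text, escaped so the result cannot turn into markup."""
    text = unstrip_to_text(s, markers)
    return escape_wikitext(text[start:start + length])
```

A table marker thus collapses to its cell text, and any brackets or pipes in the result come out as entities rather than live markup.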
Comment 48 Steve Sanbeg 2007-11-30 22:06:33 UTC
(In reply to comment #47)
> My own inclination would be to unstrip markers and do a wiki-text-escape on
> output... possibly with HTML tag stripping on the expanded markers first. (Eg,
> a big ol' table will be reduced to the text contained in it.)
> 
> Is there a reason one would want string functions to output non-plaintext or
> otherwise that this wouldn't be appropriate?
> 

Yeah, that was my thought.  Transplant a few LOC from
http://www.mediawiki.org/wiki/Extension:Strip_Markup, maybe implement the
stripping as a degenerate case of #substr.  It would kill two bugs with one
stone, and the string functions don't even imply they can do more than that.
Comment 49 Ross M 2007-11-30 23:52:34 UTC
(In reply to comment #47)
> My own inclination would be to unstrip markers and do a wiki-text-escape on
> output... possibly with HTML tag stripping on the expanded markers first. (Eg,
> a big ol' table will be reduced to the text contained in it.)
> 
> Is there a reason one would want string functions to output non-plaintext or
> otherwise that this wouldn't be appropriate?
> 

I can think of situations where one might want to splice wikiformatting into a
string, or to simply trim a formatted string by a few characters.  There's no
need to convert to plaintext in order to solve this bug; Tim's suggestion to
overlook markers works perfectly well.

I've coded up an alpha version of these functions which employs preg_match_all
to split the string into characters while keeping strip markers separate.  It's
up and running at
http://www.undefined.net/w/index.php?title=User:Algorithm/StringFunctions2 --
feedback is welcome.
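One plausible reading of the approach described above (split into characters while keeping each strip marker intact) is a regex tokenizer that tries a whole marker first and otherwise takes a single character. The marker pattern is assumed, and counting a marker as one token is an assumption about the semantics, not a description of the linked code. Building one list entry per character is also exactly the kind of overhead that can make this style slow on long inputs.

```python
import re

# Assumed marker shape (hypothetical): "\x7fUNIQ...-QINU\x7f".
MARKER_PATTERN = r"\x7fUNIQ.*?-QINU\x7f"
# Try to match a whole marker first; otherwise consume exactly one character.
TOKEN_RE = re.compile(MARKER_PATTERN + r"|.", re.S)


def tokens(s: str) -> list:
    """Split s into single characters, keeping each strip marker as one token."""
    return TOKEN_RE.findall(s)


def marker_safe_sub(s: str, start: int, length=None) -> str:
    """Substring over tokens, so a marker is either kept whole or dropped whole."""
    toks = tokens(s)
    end = None if length is None else start + length
    return "".join(toks[start:end])


marker = "\x7fUNIQ01-nowiki-00000001-QINU\x7f"
text = "ab" + marker + "cd"
print(len(tokens(text)))                             # 5
print(marker_safe_sub(text, 1, 3) == "b" + marker + "c")  # True
```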
Comment 50 Ross M 2007-12-10 10:03:03 UTC
StringFunctions 2.0 has just been released on MediaWiki.  This version
completely fixes the strip marker problem, and removes the reliance on
mb_string as well.  Unless there are any other flaws in the implementation, it
should be ready to install.
Comment 51 Anders Einar Hilden 2008-03-14 21:07:11 UTC
Four months, what's happening? We could really use the functions in some
complicated date templates on no.wikipedia.
Comment 52 Raimond Spekking 2008-05-02 17:26:41 UTC
*** Bug 13895 has been marked as a duplicate of this bug. ***
Comment 53 Wersad Daverbelt 2008-06-11 03:56:24 UTC
Question: Who exactly is the person who could implement this, and what exactly
is he waiting for?

Wikipedia could seriously use these functions.
Comment 54 Aryeh Gregor 2008-06-11 21:58:26 UTC
(In reply to comment #53)
> Question: Who exactly is the person who could implement this

Anyone with commit access, if the course of action is (as Tim has suggested)
merging them into ParserFunctions.

> and what exactly is he waiting for?

Remarkably, developers are not all automatons whose only goal in life is to
serve Wikipedia template programmers.  Some may actually wish to spend their
time on doing other things, maybe even not MediaWiki-related!  If you want this
done, either ask nicely in recommended channels (e.g., wikitech-l or
irc://irc.freenode.org/mediawiki, not Bugzilla), or get commit access and do it
yourself.
Comment 55 Daniel Friesen 2008-06-12 07:27:43 UTC
(In reply to comment #54)
> (In reply to comment #53)
> > Question: Who exactly is the person who could implement this
> 
> Anyone with commit access, if the course of action is (as Tim has suggested)
> merging them into ParserFunctions.

I don't think that's a very good idea. Just because someone wants
StringFunctions, doesn't mean they want ParserFunctions. Those two extensions
don't belong together.

If you want an extension that has both, then it should be a new extension, not
a merge of one extension's functionality into the other.
That's one of the many purposes of the WikiCode extension I'm working on
anyway:
http://svn.nadir-point.com/viewvc/mediawiki-extensions/trunk/WikiCode/
http://wiki-tools.com/wiki/WikiCode/Drafting
Comment 56 Aryeh Gregor 2008-06-12 13:35:43 UTC
(In reply to comment #55)
> I don't think that's a very good idea. Just because someone wants
> StringFunctions, doesn't mean they want ParserFunctions. Those two extensions
> don't belong together.

You could argue that just because someone wants {{#expr}} doesn't mean they
want {{#if}}, too.  I find it very unlikely, in fact, that anyone would want
things like {{#strlen}} without {{#expr}}, because as soon as you're
automatically generating numbers, you almost certainly want to do basic
arithmetic with them.  ParserFunctions is an extension that adds advanced
computational capabilities of various sorts to MediaWiki; if you only want a
subset of that functionality, tell your users not to use the rest, or manually
disable it.  If you really want, some config options could be added to disable
parts of it, but first I'd be interested to hear of anyone who actually *wants*
such a feature.
Comment 57 Fran Rogers 2008-08-18 22:21:03 UTC
Merged the functionality of StringFunctions into ParserFunctions in r39618. :)
Comment 58 Brion Vibber 2008-08-19 18:54:30 UTC
Reverted in r39653. These functions look *extremely* inefficient, for instance
reimplementing mb_strlen() by apparently splitting the entire input string into
an array of individual characters and counting up the elements.
Comment 59 Juraj Simlovic 2008-08-26 23:29:00 UTC
I finally got some free time to look into it and rewrite the functions into
something more efficient. Is anyone working on it right now? Anyway, any
comments are welcome, sooner rather than later. I will place my rewrite at  
http://www.mediawiki.org/wiki/Extension:StringFunctions/Code, since I do not
have SVN commit access at Wikimedia.
Comment 60 Daniel Friesen 2008-08-27 06:20:30 UTC
Updates have been committed as of r40068.

However, I'm not convinced it's good enough to put into ParserFunctions. It
duplicates parser code in ugly ways, and uses preg_match_all (Tim pokes me
saying the _all is evil...)

I've actually been working on string handling inside of WikiCode. I too found
the StringFunctions code extremely ugly, so I implemented my own
functions. They still need some tweaks, but I believe the code is far
better than that inside of StringFunctions.
Comment 61 Juraj Simlovic 2008-08-29 17:26:40 UTC
Thank you for your input. Actually, you got me hooked, so I put together a
simple benchmarking script for a few different implementations of strlen:
http://www.mediawiki.org/wiki/Extension:StringFunctions/Bench

It turns out that Brion was more than right about the extreme inefficiency. The
preg_match_all version, which splits the input into an array of individual
chars, can be more than a hundred times slower than the other implementations
and more than a thousand times slower than the native mb_strlen().

Increasing the size of the benchmark data ruled out the other implementations
as well. So here are the results for the best two only (plus the plain
mb_strlen() version for comparison):

  benching 256 loops of length: 1120kB
    runLen0: 2.3116    ..using mb_strlen() only
    runLen2: 18.6722   ..using preg_match_all() through markers
    runLen4: 10.3728   ..using strpos()

So, as it turns out, pregs, even when they are not abused, take some time to
process the input. On the other hand, using strpos() for counting markers seems
to be quite efficient, considering the native mb_strlen() is only about 4.5
times faster for the above benchmark case. I ran more benchmarks and found that
when the data is only 112kB long, mb_strlen() is up to 10 times faster; when it
is 28kB long, mb_strlen() is up to 12 times faster.

Any other ideas of how to implement such a simple thing as strlen()? ;))
Comment 62 Soroush 2008-09-19 20:57:43 UTC
Someone asked me to make a template on fa.wiktionary, and I was told that this
extension is required for it. I have read about its potential uses on en.wiki,
and I support installing it on en.wiki as well.
Comment 63 Raimond Spekking 2008-09-20 14:01:43 UTC
*** Bug 15658 has been marked as a duplicate of this bug. ***
Comment 64 Mike.lifeguard 2009-03-19 15:53:20 UTC
The extension isn't ready, so I've removed the shell keyword.
Comment 65 Robert Rohde 2009-03-30 11:07:10 UTC
Created attachment 5978 [details]
New version of string functions

The attached file is a complete rewrite of the StringFunctions extension.

It implements the parser functions:

#len - string length
#pos - finding substring position
#rpos - reverse oriented #pos
#sub - fetch a substring specified by start and length
#replace - substring replacement
#explode - partition string by a delimiter and find a specific piece

The other functions, which are mostly already in the core, have been dropped.
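The six functions listed above can be sketched in Python to make their intended behavior concrete. The names, argument order, negative-index handling and the use of `None` for "not found" are all assumptions for illustration; the attachment's actual return conventions are not stated in this comment.

```python
from typing import Optional


def str_len(s: str) -> int:
    return len(s)  # code points, like mb_strlen on UTF-8 text


def str_pos(s: str, needle: str, offset: int = 0) -> Optional[int]:
    i = s.find(needle, offset)
    return None if i == -1 else i  # None marks "not found" in this sketch


def str_rpos(s: str, needle: str) -> Optional[int]:
    i = s.rfind(needle)  # search from the end of the string
    return None if i == -1 else i


def str_sub(s: str, start: int, length: Optional[int] = None) -> str:
    if start < 0:  # assumed: a negative start counts back from the end
        start = max(len(s) + start, 0)
    if length is None:
        return s[start:]
    return s[start:start + max(length, 0)]  # negative lengths unsupported here


def str_replace(s: str, old: str, new: str) -> str:
    return s.replace(old, new) if old else s


def str_explode(s: str, delim: str, index: int) -> str:
    pieces = s.split(delim) if delim else [s]
    if -len(pieces) <= index < len(pieces):  # assumed: negative index from the end
        return pieces[index]
    return ""
```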

In addition, I implemented it so that the unique markers generated by <nowiki>,
<gallery>, <math>, etc. are universally stripped (this is a partial change in
behavior from prior versions).  So the behavior will be more uniform and
predictable than prior versions and there is no risk of partial or unexpected
markers bleeding through.

Where possible, PHP's built-in multi-byte string functions are used to provide
fast results.  If the mb_ functions are unavailable, their behavior is simulated
in regex in order to provide a graceful (if slower) failure mode.
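A regex fallback of the kind described above can be sketched in Python by working on raw UTF-8 bytes: each character is either one ASCII byte or a lead byte followed by its continuation bytes. This assumes valid UTF-8 input (the same assumption discussed later in this thread), and the function names are hypothetical stand-ins for mb_strlen and mb_substr, not the patch's code.

```python
import re

# One UTF-8 encoded character: an ASCII byte, or a lead byte (0xC0-0xFF)
# followed by its continuation bytes (0x80-0xBF). Assumes valid UTF-8 input.
UTF8_CHAR = re.compile(rb"[\x00-\x7f]|[\xc0-\xff][\x80-\xbf]*")


def utf8_strlen(data: bytes) -> int:
    """Count characters in UTF-8 bytes without decoding (mb_strlen stand-in)."""
    return sum(1 for _ in UTF8_CHAR.finditer(data))


def utf8_substr(data: bytes, start: int, length: int) -> bytes:
    """Character-based substring on UTF-8 bytes (mb_substr stand-in)."""
    chars = UTF8_CHAR.findall(data)
    return b"".join(chars[start:start + length])


word = "héllo".encode("utf-8")
print(len(word), utf8_strlen(word))             # 6 5
print(utf8_substr(word, 1, 2).decode("utf-8"))  # él
```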

A global variable is used to define a hard limit for the size of a string to
operate on.  I've set this to 1000 characters for now, but I haven't experimented
too much to decide what is reasonable or whether different limits should be
enforced for different functions.  #replace is armored against replacements
that would generate strings longer than this limit.
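The comment does not say how #replace is armored against over-long output. One way, sketched here under assumptions (the constant name and function name are hypothetical; the 1000-character limit is the value mentioned above), is to compute the exact result length before replacing: Python's `str.count` counts non-overlapping matches left to right, the same ones `str.replace` substitutes.

```python
STRING_LIMIT = 1000  # the limit mentioned above; the name is hypothetical


def limited_replace(value: str, old: str, new: str, limit: int = STRING_LIMIT) -> str:
    """Replace only if both the input and the exact output fit within `limit`."""
    if len(value) > limit:
        raise ValueError("input string exceeds the limit")
    if not old:
        return value  # nothing sensible to replace
    matches = value.count(old)  # non-overlapping, same matches replace() uses
    result_len = len(value) + matches * (len(new) - len(old))
    if result_len > limit:
        raise ValueError("replacement would exceed the limit")
    return value.replace(old, new)
```

Counting matches first costs one extra O(n) pass but gives the exact output length rather than a worst-case estimate.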

I believe that this version of StringFunctions (or something close to it)
should be suitable for implementation on WMF sites.
Comment 66 Aryeh Gregor 2009-03-30 13:22:51 UTC
(In reply to comment #65)
> The attached file is complete rewrite of the StringFunctions extension.

Shouldn't they just be added to ParserFunctions?

> In addition, I implemented it so that the unique markers generated by <nowiki>,
> <gallery>, <math>, etc. are universally stripped (this is a partial change in
> behavior from prior versions).  So the behavior will be more uniform and
> predictable than prior versions and there is no risk of partial or unexpected
> markers bleeding through.

I'm not sure if stripping them outright is the best solution, but I can't think
of a better one.

> Where possible PHP's built-in multi-byte string functions are used provide fast
> results.  If the mb_ functions are unavailable, their behavior is simulated in
> regex in order to provide a graceful (if slower) failure mode.

We already have some of these compatibility functions in GlobalFunctions.php
(mb_strlen and mb_substr).  You should use those, and add any additional ones
there.
Comment 67 Ted Kandell 2009-03-30 16:34:54 UTC
A good use of the string functions would be to parse Newick tree format (Newick
notation) files, which are the standard way of minimally representing
phylogenetic trees. Trees are now an important data structure in Wikipedia, and
it's very difficult to edit these by hand and to get them to align and display
properly. A simple {{newick}} template could then convert a Newick string into
a properly displayed tree.

This may seem trivial compared to the other reasons, but just check the myriad
ways that trees are now represented in MediaWiki. Having this template would
allow trees to be created and edited in external tools and just dropped in.

I don't see any other way of parsing such a format without the String
Functions.
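To give a sense of the parsing a {{newick}} template would have to express with string functions, here is a minimal recursive-descent Newick reader in Python. It is a sketch, not a full implementation: only topology and plain labels are handled, branch lengths would stay inside the label, quoted labels are not supported, and the `Node`/`render` names are hypothetical.

```python
from typing import List


class Node:
    def __init__(self, name: str = "") -> None:
        self.name = name
        self.children: List["Node"] = []


def parse_newick(text: str) -> Node:
    """Parse a Newick string such as "(A,B,(C,D)E)F;" into a tree of Nodes."""
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":             # internal node: (child,child,...)
            pos += 1
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos                      # optional label after the children
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1
        node.name = text[start:pos].strip()
        return node

    root = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return root


def render(node: Node, depth: int = 0) -> List[str]:
    """Indented outline, a stand-in for whatever a template would draw."""
    lines = ["  " * depth + (node.name or "(unnamed)")]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(parse_newick("(A,B,(C,D)E)F;"))))
```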
Comment 68 Robert Rohde 2009-03-30 17:05:37 UTC
(In reply to comment #66)
> (In reply to comment #65)
> > The attached file is complete rewrite of the StringFunctions extension.
> 
> Shouldn't they just be added to ParserFunctions?

I'm happy to write it up that way instead, though I don't know which is
preferred.  Given how long we and other sites have gone without working
StringFunctions, it almost feels more natural to segregate them so that site
operators have a choice.  

My main interest though is getting an implementation somewhere that is
sufficiently reasonable that it can be used on the WMF sites.

> > Where possible PHP's built-in multi-byte string functions are used provide fast
> > results.  If the mb_ functions are unavailable, their behavior is simulated in
> > regex in order to provide a graceful (if slower) failure mode.
> 
> We already have some of these compatibility functions in GlobalFunctions.php
> (mb_strlen and mb_substr).  You should use those, and add any additional ones
> there.

Okay.  The one caveat is that my functions more or less assume they are being
passed valid UTF-8 strings, and the encoding parameter for mb_strpos, etc. is
not implemented.  It appears that mb_strlen in GlobalFunctions is making the
same assumption, so I'll assume that is okay for MediaWiki's purposes.
Comment 69 Robert Rohde 2009-03-30 19:20:18 UTC
(In reply to comment #68)
> (In reply to comment #66)
> > We already have some of these compatibility functions in GlobalFunctions.php
> > (mb_strlen and mb_substr).  You should use those, and add any additional ones
> > there.
> 
> Okay.  The one caveat is that my functions more or less assume they are being
> passed valid UTF-8 strings, and the encoding parameter for mb_strpos, etc. is
> not implemented.  It appears that mb_strlen in GlobalFunctions is making the
> same assumption, so I'll assume that is okay for Mediawiki's purposes.

Added the necessary mb_ fallbacks to GlobalFunctions in r49043.

Figuring out the merge with ParserFunctions will take more time.

I'll probably post that as an alternative patch here and let someone with more
familiarity decide whether it is better to build StringFunctions as a separate
stand-alone or to merge it into the ParserFunctions.
Comment 70 Robert Rohde 2009-04-06 07:44:01 UTC
Created attachment 5993 [details]
Merge string functionality into ParserFunctions

Comments suggested it may be preferable to merge string functionality into
ParserFunctions.  The attached patch would accomplish that.  The logic should
be the same as the other StringFunctions patch, so one should choose one patch
or the other depending on whether it is preferred for StringFunctions to
operate as a separate stand-alone extension or as a component of
ParserFunctions.  I'm not sure which approach is preferable.  Minor tweaks were
made to more or less follow the existing layout conventions in ParserFunctions.

Also note that StringFunctions and ParserFunctions were originally written
under different copyleft schemes.  I asked for and received permission from the
referenced authors to GPL the StringFunction code in order to facilitate the
merge.
Comment 71 Le Chat 2009-05-14 16:25:06 UTC
Sorry, I'm bemused. Every programming language I've met (admittedly that's not
very many) has these string functions as an absolute basic standard. How does it
take three years to find a way to expose them through MW?
Comment 72 Minh Nguyễn 2009-05-14 19:57:18 UTC
The wiki syntax (especially the subset used on Wikimedia sites) isn't quite
intended as a full-fledged programming language, though it's getting to be one.
Think of it more as a language for macros. Notice that there's no built-in
support for iteration, either, and that's an absolute basic standard for
programming languages too.
Comment 73 Le Chat 2009-05-14 21:00:54 UTC
I think you missed my point - I don't mean MW has to have something because
programming languages have it, I mean if programming languages have it as
standard, AND we want to have it (as we clearly do in this case), then it
surely must be a pretty trivial matter to code. Surely there are standard PHP
libraries which have all these functions?
Comment 74 Aryeh Gregor 2009-05-14 22:03:37 UTC
It's already implemented.  Robert has a patch, which he can commit if he likes.
He hasn't so far.
Comment 75 Roan Kattouw 2009-05-15 10:59:44 UTC
(In reply to comment #73)
> I think you missed my point - I don't mean MW has to have something because
> programming languages have it, I mean if programming languages have it as
> standard, AND we want to have it (as we clearly do in this case), then it
> surely must be a pretty trivial matter to code. Surely there are standard PHP
> libraries which have all these functions?
> 

If you read the comments (granted, 74 is a lot), you'll see that there were
issues with previous implementations, such as the need to use Unicode-aware
string functions, the need to fall back to alternative implementations if those
functions aren't available (they're a PHP extension) and the need to do all
this efficiently.

Robert has attached a patch, which he could (and probably should, or maybe
already has?) commit to StringFunctions in SVN; Tim or Brion can then review
that and, if it passes, enable it on Wikipedia.
Comment 76 Aryeh Gregor 2009-05-15 13:45:52 UTC
The patch is to ParserFunctions, so it wouldn't need review beyond the normal
process.
Comment 77 Robert Rohde 2009-05-26 00:49:03 UTC
I made some additional tweaks to the second patch and committed it as r50997.
Comment 78 Aryeh Gregor 2009-05-26 00:52:01 UTC
Marking FIXED, then.  Close enough to the original request.
Comment 79 Happy-melon 2009-06-19 11:51:01 UTC
The spirit of this bug is clearly "enable StringFunctions on WMF wikis".  So
now we need $wgPFEnableStringFunctions = true; to be set on WMF wikis.  But the
substance of this bug is not resolved.  Reopening.
Comment 80 Aryeh Gregor 2009-06-19 16:54:26 UTC
Tim has stated pretty clearly that string functions will not be enabled on
Wikimedia wikis, so I'll mark this WONTFIX.
Comment 81 Aryeh Gregor 2009-06-19 17:36:06 UTC
(In reply to comment #80)
> Tim has stated pretty clearly that string functions will not be enabled on
> Wikimedia wikis, so I'll mark this WONTFIX.

. . . that's in r51497.  Quote from diff:

+/**
+ * Enable string functions.
+ *
+ * Set this to true if you want your users to be able to implement their own 
+ * parsers in the ugliest, most inefficient programming language known to man: 
+ * MediaWiki wikitext with ParserFunctions.
+ *
+ * WARNING: enabling this may have an adverse impact on the sanity of your users.
+ * An alternative, saner solution for embedding complex text processing in
+ * MediaWiki templates can be found at: http://www.mediawiki.org/wiki/Extension:Lua
+ */

It's pretty clear this isn't going to be enabled on Wikimedia.
Comment 82 Tisza Gergő 2009-06-19 18:36:10 UTC
Opened bug 19298 for enabling Lua as per Tim's suggestion.
Comment 83 Kevin Norris 2009-06-24 16:21:19 UTC
Can we mark this bug as LATER instead of WONTFIX given the disagreement with
Tim's decision expressed in the comments for the Lua bug?
Comment 84 Aryeh Gregor 2009-06-24 21:42:53 UTC
The people disagreeing with Tim don't get to make decisions like this, Tim
does.  So not much point.  Any WONTFIX could be revisited later, of course.
Comment 85 Robert Rohde 2009-06-24 22:27:14 UTC
(In reply to comment #84)
> The people disagreeing with Tim don't get to make decisions like this, Tim
> does.  So not much point.  Any WONTFIX could be revisited later, of course.
> 

Well, in point of fact, they are sort of disagreeing with you, Aryeh, since you
are the one who tagged it WONTFIX.

Tim's comments are discouraging, but it isn't clear to me that they represent a
final conclusion on the subject.  That's doubly true since Brion has already
said Lua won't be installed in the near-term (if ever), so Tim's preferred
solution is pretty much no solution at all.

While Tim's concern for the sanity of wikicode is well-intentioned, I've yet to
see any template coder (i.e. the people who would really be working with this)
come forward to say that the incremental burden of enabling this would be
terrible.  Given the evident desire of the community, and the fact that Tim's
alternative isn't really available, I am wondering if this should be reopened
and given more developer discussion.
Comment 86 Aryeh Gregor 2009-06-24 23:07:45 UTC
I was taking Tim's statement as a fait accompli.  If you want to reopen this
and maybe start a wikitech thread, go ahead; I don't agree with him.
Comment 87 Rich Farmbrough 2009-09-04 17:50:49 UTC
Hm, there seem to be facilities for manipulating strings enabled now. So this is
either fixed or it is being done by an almighty kludge and probably far less
efficiently than "fixing" this. See my comment to 19298.
Comment 88 Happy-melon 2009-09-04 21:17:35 UTC
(In reply to comment #87)
> either fixed or it is being done by an almighty kludge 

Oh yes.  Those templates put all other hacks to shame. But they work, and
they're now very widely used.  Which demonstrates the need for this
functionality to be supported *somehow*.
Comment 89 Kevin Norris 2009-09-08 22:44:42 UTC
(In reply to comment #88)
> But they work

Really?  IIRC we don't have substrings (yet)...
Comment 90 Rich Farmbrough 2009-09-12 09:44:53 UTC
I wrote {{Sub right}} and another one recently, trying to get a title-case
template to work.  See Category:String manipulation templates; most have been
around for a while.
Comment 91 Rich Farmbrough 2010-11-17 17:41:24 UTC
This really needs some attention.  We have perfectly good templates for doing
minor stuff that work, provided there are less than "X" of them on a page,
where "X" is a small number.  Wontfix is not a good status for this bug. 
Reopening.
Comment 92 Gurch 2010-11-17 17:58:43 UTC
(In reply to comment #91)
> This really needs some attention.  We have perfectly good templates for doing
> minor stuff that work, provided there are less than "X" of them on a page,
> where "X" is a small number.  Wontfix is not a good status for this bug. 
> Reopening.

I take issue with the description "perfectly good".

What happened was, a while back well-meaning people asked for "padleft" and
"padright" string functions. The devs decided to add support for these specific
functions, assuming -- foolishly -- that they wouldn't be abused within an inch
of their life.

Since then, various string functions (length, string search functions,
substring-based functions) have been implemented using unmaintainable,
indecipherable nested MediaWiki templates IN TERMS OF PADLEFT AND PADRIGHT.
This is something I didn't even know was possible and probably constitutes an
interesting academic exercise... oh, except this is in production use on one of
the world's busiest websites.
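The core trick behind such templates can be modelled in a few lines. The Python sketch below is a hypothetical illustration, not a transcription of Template:Str len. It assumes that `{{padleft:|n|s}}` (padding an empty string, with `s` as the pad text) returns the first `n` characters of `s`, repeating `s` when `n` exceeds its length. The 500-character cap is also an assumption. A length test then falls out of comparing prefixes inside a binary search. The names `padleft`, `prefix`, `has_at_least` and `str_len` are invented for the example.

```python
def padleft(text: str, length: int, pad: str) -> str:
    """Model of {{padleft:text|length|pad}} (assumed semantics)."""
    if len(text) >= length or pad == "":
        return text
    needed = length - len(text)
    filler = (pad * needed)[:needed]  # pad text repeated, cut to size
    return filler + text


def prefix(s: str, n: int) -> str:
    # Padding an EMPTY string with s yields the first n characters of s,
    # repeating s if n is larger than len(s).
    return padleft("", n, s)


def has_at_least(s: str, n: int, sentinel: str = "\x00") -> bool:
    # If s has >= n characters, appending a sentinel cannot change the
    # first n characters; if it is shorter, position len(s) differs
    # (assumes the sentinel is not equal to the first character of s).
    return prefix(s, n) == prefix(s + sentinel, n)


def str_len(s: str, cap: int = 500) -> int:
    # Binary search for the largest n <= cap with has_at_least(s, n).
    lo, hi = 0, cap
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_at_least(s, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(str_len("hello"))  # 5
```

In Python each comparison is a cheap string operation. In the template version, each step is instead a round of nested template expansion that the parser must process, which is where the overhead described in this comment comes from.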

The algorithms involved are hideously inefficient, given the huge overhead
incurred by having to parse wikitext every step of the way.

Have a look at how "str len" is implemented. This:

http://en.wikipedia.org/w/index.php?title=Template:Str_len/core&action=edit

is just part of it.

When you've finished washing your eyes out with bleach, look at the "str find"
template. Note its reliance on the aforementioned "str len", as well as "str
left" and various other horrendous string functions. Note that at the bottom of
this hierarchy of {{{{}{}{}{}{}{}{}{{}}}} lies #padleft, #titleparts and
various other functions that you wouldn't normally expect to be roped into
string searching, unless you were in a batshit insane environment where they
were the only primitive functions available... oh wait.

It probably takes as long to evaluate one of these string functions on a
modern, top-of-the-range multicore server machine as it would to evaluate a
sane implementation on a 1980s home computer. The algorithm for "str find"
wouldn't even be too bad if it was implemented directly in C or something, but
don't pretend that MediaWiki template syntax isn't the least efficient
programming language ever created. Including several joke ones.

Come to think of it, yes, there is a really fucking good argument for enabling
StringFunctions on Wikimedia wikis. And also for tracking down the people who
implemented templates like [[Template:Str find]] and murdering them for crimes
against programming.
Comment 93 Max Semenik 2010-11-17 18:06:08 UTC
(In reply to comment #92)

> Come to think of it, yes, there is a really fucking good argument for enabling
> StringFunctions on Wikimedia wikis. And also for tracking down the people who
> implemented templates like [[Template:Str find]] and murdering them for crimes
> against programming.

No, it's a great reason to disable #padleft and friends instead. Things
ParserFunctions are (ab)used for are insane, and the more of them are there,
the more insane things they allow. This spiral dive has to stop somewhere.
Comment 94 Phillip Patriakeas 2010-11-17 18:33:53 UTC
There will be a lot of angry people (and broken functionality, with no obvious
way to fix, replace, or remove it) if the only currently enabled way to
implement string parsing in wikicode on WMF wikis is simply disabled or
removed. It is not the template coders' fault for abusing the hell out of
padleft and padright, they are simply making do with the only tool they can
use, and would certainly use something else if it were available (and I do mean
they'd use pretty much *anything* else, as just about anything would be an
improvement over the current situation). It's not like padleft and company are
being blindly used either; these templates are massively optimized and, however
bad it is, it could be far, far worse.
Comment 95 Le Chat 2010-11-17 20:23:47 UTC
Of course the use of padleft and so on shouldn't be happening, but it's not the
fault of the people who worked out those hacks. This really is a no-brainer -
PLEASE *enable the efficient string functions*, and we won't be using the
mind-blowingly inefficient ones any more. (Notice that the servers are still up
and running in spite of the use of the inefficient hacks, so replacing them
with more efficient functions will certainly not be any kind of performance
hit.)
Comment 96 MZMcBride 2010-11-18 00:00:48 UTC
(In reply to comment #91)
> This really needs some attention.  We have perfectly good templates for doing
> minor stuff that work, provided there are less than "X" of them on a page,
> where "X" is a small number.  Wontfix is not a good status for this bug. 
> Reopening.

I don't have any problem with users overturning a WONTFIX with a valid reason.
I've certainly done so a number of times. However, this bug as currently
summarized reads "Enable StringFunctions on WMF wikis" and the most senior
active sysadmin and developer has (essentially) said this is never going to
happen. Re-reading comment 0 (way the hell up there), this bug was not
originally about a specific extension, just about the functionality.

Either this bug should be re-closed as WONTFIX or the bug summary should be
genericized. The current match-up is disingenuous and misleading.
Comment 97 Phillip Patriakeas 2010-11-18 01:57:51 UTC
Looking at this bug's history, the very first entry is Rob Church changing the
summary from "Install StringFunctions" to "Install the StringFunctions
extension". Unless there's missing history here (which is doubtful, since the
change was made less than a half-hour after the bug report was filed), this bug
was indeed originally about a specific extension, and a careful reading of
comment 1 and comment 3 supports this.
Comment 98 Kevin Norris 2010-11-18 13:56:20 UTC
MZ, are you seriously suggesting that the developers will completely
re-implement an extension, when the concerns about the original are *not*
implementation-specific?  I seriously doubt that.
Comment 99 MZMcBride 2010-11-18 20:16:23 UTC
(In reply to comment #98)
> MZ, are you seriously suggesting that the developers will completely
> re-implement an extension, when the concerns about the original are *not*
> implementation-specific?  I seriously doubt that.

I'm suggesting that the sysadmins in charge of running Wikimedia wikis have
said rather unequivocally that this extension is not going to be installed. The
StringFunctions extension is a means to an end. There are plenty of other ways
to implement string manipulation. For years, there has been discussion of
implementing a proper programming language into MediaWiki. The current
favorite is not Lua but JavaScript, actually.

I don't believe that there is any legitimate objection to letting users
manipulate strings. However, there are legitimate objections to enabling this
extension on Wikimedia wikis. This bug is about enabling a specific extension
on Wikimedia wikis. Unless there is some reason to believe this is ever going
to happen, this bug should be re-closed as WONTFIX. A subsequent, generic bug
should be filed about the ability to manipulate strings on Wikimedia wikis
(though there's little hope of that bug being resolved anytime soon). Keeping
this bug unresolved in the REOPENED state does not change the reality of the
situation. It just misleads people into believing that this is still up for
debate.
Comment 100 Tisza Gergő 2010-11-18 20:57:54 UTC
(In reply to comment #99)
> I don't believe that there is any legitimate objection to letting users
> manipulate strings. However, there are legitimate objections to enabling this
> extension on Wikimedia wikis.

Actually, what are those? Tim's oft-cited comment stated that StringFunctions
should be deprecated in favor of Lua, but since then it was decided that Lua is
an even worse option. As for a hypothetical server-side Javascript-based string
manipulation extension, it has most of the drawbacks of Lua (denial-of-service
vulnerability, incompatibility of Wikipedia with a MediaWiki at an average web
host), with the added bonus that Lua at least exists and does not need to be
implemented from scratch.

More importantly, what are the disadvantages of StringFunctions compared to the
current situation? #padleft-based string manipulation is slower, less reliable,
harder to understand and maintain, and more limited in its abilities. It used
to be said that SF should not be enabled because then a lot of pages will
depend on it, and it will be difficult to switch to a superior solution when
one is found, but we already crossed that river a long time ago.
Comment 101 Victor Vasiliev 2010-11-18 21:29:42 UTC
(In reply to comment #100)
> Actually, what are those? Tim's oft-cited comment stated that StringFunctions
> should be deprecated in favor of Lua

Tim's comment was that we are not going to expand the parser functions in any
way, and that all further development should be concentrated on developing a
sensible scripting engine instead of turning parser functions into a
programming language. As far as I am aware he has not changed his mind about
that, so this bug is closed and should not be reopened unless the policy
mentioned above is changed as a result of discussion among the developers (you
may initiate it).
Comment 102 Krinkle 2010-11-18 21:35:27 UTC
A request was once made to implement padleft to pad left. Then more advanced
functions were wanted, and existing functions were (ab)used to achieve them.
I can imagine developers not wanting to natively implement those now-wanted
advanced functions, as it would likely lead to history repeating itself, namely
some other advanced feature being implemented with these, and so on.

There are way too many scripts and templates that should be and can be written
as an Extension instead.

So how about opening bugs for the actual functionality wikipedians want instead
of requesting functions to achieve them in templates? The same was done with
Babel, instead of creating lots and lots of templates and decentralized stuff
all over the place it was written into a native Extension and everybody's
happy.

I realise this is not a solution for everything, though.
Comment 103 Krinkle 2010-11-18 21:36:31 UTC
mid-air collision mistake. Self-reverting status change
Comment 104 Marcus Buck 2010-11-18 21:52:35 UTC
(In reply to comment #102)
> So how about opening bugs for the actual functionality wikipedians want instead
> of requesting functions to achieve them in templates? The same was done with
> Babel, instead of creating lots and lots of templates and decentralized stuff
> all over the place it was written into a native Extension and everybody's
> happy.

??? Have I missed some developments? The extension was created, then not
reviewed by the developers and everybody is still unhappy with the old system.

And I'm suspecting the exact same thing will happen with StringFunctions...
Comment 105 Aryeh Gregor 2010-11-18 22:08:20 UTC
Note that string function support was added to ParserFunctions proper in
r50997, and disabled by default by Tim in r51497 -- a separate extension is no
longer needed.  I don't know if anything has happened since June 2009 to cause
him to reconsider his opinion.  I personally have thought for a long time that
enabling string functions is the lesser evil here, given the givens, but it's
not my call.
Comment 106 Kevin Norris 2010-11-19 02:04:16 UTC
Could Tim Starling please indicate whether his veto on this bug is still
outstanding?

If it is, I intend to bring it up on [[WP:VPT]] or somewhere and make the
following proposal:

"There is community consensus to enable StringFunctions; if the developers do
not enable it themselves, the community hereby requests that the WMF instruct
the developers to do so."

I really hate to go over your heads on this one, but it appears to be
necessary.  As Aryeh said, SF is clearly "the lesser evil" and it's patently
ridiculous that the nth most popular site in the world (n=whatever our current
Alexa rank is) is using [[Category:String manipulation templates]] instead of
native PHP, especially when the functionality to do so is available and tested.
Comment 107 msh210 2010-11-19 07:21:39 UTC
(In reply to Kevin Norris's comment #106)
> If it is, I intend to bring it up on [[WP:VPT]] or somewhere and make the
> following proposal:
> 
> "There is community consensus to enable StringFunctions; if the developers do
> not enable it themselves, the community hereby requests that the WMF instruct
> the developers to do so."

If you do bring up such a proposal on-wiki, please link to it in a comment on
this bug so that people on other wikis know. Thanks.
Comment 108 Rich Farmbrough 2010-11-19 14:49:13 UTC
I'm pretty sure it would garner extensive support.  

#Pro: less server load
#Pro: less page breakage
#Pro: easier template programming
#Pro: faster page load/render times
#Pro: less obscure limits on lengths
#Pro: fewer templates which work fine in test but are useless on an actual page

#Con: If we implement a scripting language we may need to migrate some stuff -
which we would anyway.

Things have moved on in four years, but we are still struggling with ancient
functionality. Wikia has more powerful facilities than the WMF projects.

I'm disappointed someone re-closed the bug; it was not re-opened lightly.

Anyone in doubt as to the importance of this bug is invited to look at VP(T)
where I believe almost half the threads are related to it.
Comment 109 Happy-melon 2010-11-19 15:49:52 UTC
(In reply to comment #108)
> I'm pretty sure it would garner extensive support.  
> ...
> Anyone in doubt as to the importance of this bug is invited to look at VP(T)
> where I believe almost half the threads are related to it.

You[1] are making a mistake in assuming that if the enwiki community supports a
technical change then, ipso facto, that change should be implemented,
irrespective of any 'big picture' considerations.  You're[1] not in Kansas any
more; the consensus of the enwiki community is not sovereign here.  

> I'm disappointed someone re-closed the bug; it was not re-opened lightly.

It was re-opened mistakenly under a [[WP:BRD]] principle which just doesn't
apply here.  It is perfectly acceptable to comment, where appropriate, on
closed bugs; the status applies only to the bug title, not to the discussion
underneath.  Tim has said that the status of the request "Set
$wgPFEnableStringFunctions=true on WMF wikis" is WONTFIX; that conclusion
stands until something (maybe discussion under the closed bug, maybe something
else) convinces *him* or *another sysadmin of equal standing* to reconsider it.
Someone else changing the status does not somehow reshape the world to make it
so.

[1] I'm speaking generally, not to anyone specifically.
Comment 110 Aryeh Gregor 2010-11-19 16:33:30 UTC
(In reply to comment #108)
> #Pro: less server load
> #Pro: faster page load/render times

Experience has shown that people will just write pages that use up whatever the
resource limits are.  They'll use the functions to write still more complicated
templates, which currently they can't write because of preinclusion size
limits.  It's not at all obvious it will make anything faster, it will just
allow more complexity for the same length limit.

In support of this, observe that ParserFunctions was only introduced to provide
a sane replacement for [[Template:Qif]], much as this bug requests that
StringFunctions be enabled to replace [[Template:Str len]] and friends.  The
explosion of template complexity after ParserFunctions were turned on would
have been impossible (given performance limits) with template hacks.  It's a
certainty that that will happen again if we enable StringFunctions, with
template editing becoming even more arcane.

Maybe we should enable the string functions, but reduce preinclusion length
limit, or impose other limits on template complexity.

> #Pro: less page breakage

How so?

> #Pro: easier template programming

Not if things get even more complicated to compensate, which they will.

> #Pro: less obscure limits on lengths

The limits on length will be the same, it's just people will write even more
complicated templates to use up the length limits.

#Con: Templates like {{str len}} will no longer count as much against the
length limit, so the effective limit will be higher and people will be able to
make even more complicated and unmaintainable wikitext pages for things that
should have been written in a real language to start with.


I agree that enabling string functions is the lesser evil, but it's still evil.
People shouldn't have been writing programs in wikitext to begin with; they
should use proper scripts of some type -- extensions or bots or such. 
Personally I'd also be okay with restricting or disabling any functions that
people are abusing to emulate string functions, like padright/left, but that
would be much more disruptive, and people will always find ways to abuse
innocent functionality.  So unless someone is willing to implement a systematic
solution like a Lua extension, we may as well resign ourselves to making
template programming less painful.
Comment 111 Le Chat 2010-11-19 17:21:09 UTC
>...abuse...

It's not abuse (which would be putting good tools to bad use), this is putting
bad tools to good use.

>...proper scripts...extensions...

Yes, this seems to be the vicious circle we're in... someone *has* written an
extension, but what good did it do him - we're now deprived of the use of the
extension, just in case someone "abuses" it by making better use of it than was
anticipated. It would obviously be much much better to have non-trivial logic
compiled into the software than to do it via templates, but what choice are we
given?
Comment 112 Marcus Buck 2010-11-19 19:35:53 UTC
As far as I can see all of the StringFunctions are already present in
template-implemented versions now. Just in an inefficient way. So any "abuse"
(quotation marks because of Le Chat's good remark) would be possible already
now.

Does anybody have any ideas in which direction possible "abuse" could go? I
cannot think of any new class of functionality that would become possible if we
allowed StringFunctions. The template-based string functions too were not
enabled by ParserFunctions alone. Template-based string functions would be
impossible without "padleft:" and "padright:". These two are string functions.
It's clear that when you provide a single string function and simple logic,
that other string functions can be emulated. That door was left open and people
walked through. But if StringFunctions do not open new doors nothing bad can
happen. I don't see open doors in them. If you do see them, please report.

I guess we can safely assume that when you provide functionality people will
_always_ test the limits of the functionality. It doesn't matter how few or how
amazingly much functionality you provide. They will test its limits. It's almost
a law of nature. That's normal and we will never have success with "We provide
this functionality but please don't fully utilize it".

We have to put limitations on functionality because we need to limit the
computation cost and rendering time. If we replace the template-based string
functions with extension-based StringFunctions we will reduce computation cost
and rendering time. That's a good thing. If you want to secure that this gain
will not be consumed by increased use of the functions then set limitations on
how many instances of the functions can be called on a single page.
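One way to read the suggestion above is to keep a counter per page render and stop evaluating string functions once it reaches a limit. The Python sketch below is hypothetical throughout. The class name, the limit and the error markup are invented for illustration, and it makes no claim about how MediaWiki's parser actually tracks its limits.

```python
from typing import Callable


class PageStringFunctionBudget:
    """Hypothetical per-page cap on string-function calls."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # maximum calls allowed for one page render
        self.calls = 0      # calls actually evaluated so far

    def run(self, func: Callable[..., str], *args: str) -> str:
        if self.calls >= self.limit:
            # Return an inline error instead of aborting the whole render.
            return '<strong class="error">string function limit exceeded</strong>'
        self.calls += 1
        return func(*args)


# One budget object would be created per page render.
budget = PageStringFunctionBudget(limit=2)
print(budget.run(str.upper, "abc"))              # ABC
print(budget.run(str.replace, "a-b", "-", "+"))  # a+b
print(budget.run(str.upper, "def"))              # the error marker
```

Returning an inline error rather than raising keeps the rest of the page rendering. That is a design assumption, chosen so that hitting the limit degrades a single template call instead of the whole page.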

By the way, I'm sure there are wikis with activated StringFunctions. Are there
any reports that these wikis had problems with it? If there are any open doors
in them, I'm sure somebody must have discovered them already!?
Comment 113 Phillip Patriakeas 2010-11-19 20:01:14 UTC
(In reply to comment #112)
> By the way, I'm sure there are wikis with activated StringFunctions. Are there
> any reports that these wikis had problems with it? If there are any open doors
> in them, I'm sure somebody must have discovered them already!?

Wikia - *all* of Wikia - has had StringFunctions enabled for years. I've never
heard about any problems they've had as a result of this.
Comment 114 Tisza Gergő 2010-11-20 11:35:35 UTC
(In reply to comment #110)
> Experience has shown that people will just write pages that use up whatever the
> resource limits are.  They'll use the functions to write still more complicated
> templates, which currently they can't write because of preinclusion size
> limits.  It's not at all obvious it will make anything faster, it will just
> allow more complexity for the same length limit.
> [...]
> Maybe we should enable the string functions, but reduce preinclusion length
> limit, or impose other limits on template complexity.

You make it sound as if complexity would be a bad thing in itself. That is not
so - complex tasks require complex solutions, most of the time. MediaWiki
itself has become much more complex over the years, the editing workflow
became more complex, Vector was a huge jump in the complexity of the editing
GUI, and so on. Everyone accepts these as necessary, so why not the same for
complex templates? Seems like a bit of NIH syndrome to me (or more precisely,
Not Invented By Us, because it *is* invented here, just not by the developers).

I sense a good amount of developer hubris in the debates about templates - "you
should leave this stuff to us, we could do it better". Sure you could - but you
could do much less of it. By the same token, we should leave writing
encyclopedia articles to professionals, because they are much better at it
(except that Nupedia had some 100 articles after three years). This line of
thinking is completely contrary to Wikipedia philosophy. Wikipedia is about
generativity, community empowerment and ultra-low barriers to entry - you can't
seriously suggest that making a feature request and waiting for some developer
to pick it up every time someone needs a new template would be a scalable
approach.

> People shouldn't have been writing programs in wikitext to begin with, they
> should use proper scripts of some type -- extensions or bots or such. 

This gets thrown around a lot, but how those proper scripts could replace the
current template system is never demonstrated. Bots are not much help with
dynamic text (and do have problems of their own, like littering page
histories). Extensions, as I tried to point above, are not scalable (whatever
you might think of the template syntax, it is a lot easier to learn than
writing secure and scalable MediaWiki extensions, and we didn't even consider
yet the epic fail of code review). The conclusion of the bug about Lua was that
templates using scripts interpreted by some external tool are out of the
question - they have security issues, and they would break compatibility of
Wikipedia with pretty much all other MediaWiki installations. What is left
then? Inventing another template language and writing another parser in PHP?
IIRC Werdna actually offered to do that and was turned down, because that is
still not a "proper" solution. The proper solution, apparently, is to deny the
Wikimedia community a useful tool, out of purely aesthetic reasons.
Comment 115 Victor Vasiliev 2010-11-20 12:20:21 UTC
(In reply to comment #114)
> What is left then? Inventing another template language and writing another
> parser in PHP? IIRC Werdna actually offered to do that and was turned down,
> because that is still not a "proper" solution.

I was working on a template scripting extension called InlineScripts. It is in
Subversion and it was working last time I checked (its most severe problem was
the documentation, or, to be more specific, the absence of it). It was
discussed at the developers' conference in April, and the only reason I stopped
working on it was the lack of time.
Comment 116 Ted Kandell 2010-11-21 02:18:18 UTC
I would like to add a concrete example to this debate, an actual use case.

Many entries in Wikipedia describe some sort of phylogenetic data, from
genealogies, to the phylogenies of language families, to Y and mitochondrial
DNA haplogroups.

A standard way of representing such trees is through the Newick format:
http://en.wikipedia.org/wiki/Newick_format

There are all sorts of template hacks in Wikipedia to represent family trees,
genetic trees, language families, and all sorts of other related information.

Wouldn't it be better to just add the standard Newick format tree
representation to articles, and then use templates to display the data in
various sorts of ways? The fundamental information would then be preserved in a
standardized display-independent format. Also there are a large number of tools
out there that can generate graphic images based on the Newick format. 

It isn't very difficult to parse a Newick format string and create a basic tree
display template from it. However, all this would really need is a full set of
string functions. It's true that PHP and MediaWiki weren't designed to be a kind
of parser or compiler, but what sort of alternative can anyone think of?
Should we put in a request for MediaWiki developers to support the Newick
format, and any number of other important display-independent representations
of data widely used in Wikipedia? Who decides, and who does the work?
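To give a sense of the scale involved in the claim that a Newick string is not hard to parse, here is a small recursive-descent parser. It is a hypothetical Python illustration, not a proposed MediaWiki feature, and the names `Node`, `parse_newick` and `outline` are invented. It handles nested parentheses, comma-separated children and node labels, skips optional `:length` branch lengths, and renders an indented outline. Quoted labels, comments and the rest of the full Newick grammar are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simplified Newick string such as '((A,B)AB,C)root;'."""
    pos = 0

    def peek() -> str:
        # Current character, or "" once the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while peek() == ",":
                pos += 1  # consume ","
                node.children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # A label runs until the next structural character.
        start = pos
        while peek() not in ("", ",", "(", ")", ";", ":"):
            pos += 1
        node.name = text[start:pos].strip()
        # Skip an optional branch length such as ":0.25".
        if peek() == ":":
            pos += 1
            while peek() not in ("", ",", ")", ";"):
                pos += 1
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("expected ';' at end of tree")
    return root


def outline(node: Node, depth: int = 0) -> list[str]:
    # Unnamed nodes are shown as "*".
    lines = ["  " * depth + (node.name or "*")]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(parse_newick("((A:0.1,B:0.2)AB,C)root;"))))
```

The parser keeps the tree separate from any particular display. This separation is the property the comment argues for: the same parsed data could feed different display templates.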

There is a good argument for "doing it right" and implementing a full scripting
language (aside from Javascript?) but in the meantime, all sorts of important
data that can't quite be represented as text is being added to Wikipedia in the
form of templates. How can all the various sorts of tree data now in Wikipedia
be extracted - or just redisplayed using whatever new and better display
template comes along?

I don't know if this can be added as a "bug" in and of itself, but it does
point out the fundamental problem. MediaWiki has text, graphic, audio, and
video formats, but is missing the ability to parse certain other critical basic
information storage formats that the developers never considered.
Comment 117 Gurch 2010-11-21 04:34:31 UTC
(In reply to comment #106)
> "There is community consensus to enable StringFunctions; if the developers do
> not enable it themselves, the community hereby requests that the WMF instruct
> the developers to do so."

That's not really how it works. The developers *are* WMF, or at least a subset
thereof. (Or were you under the impression that volunteer devs' opinions
mattered in such cases? LOL)
Comment 118 Alex Z. 2010-11-21 06:23:56 UTC
(In reply to comment #116)
> It isn't very difficult to parse a Newick format string and create a basic tree
> display template from it. However, all this would really need is a full set of
> string functions. It's true that PHP and MediaWiki weren't designed to be a kind
> of parser or compiler, but what sort of alternative can anyone think of?
> Should we put in a request for MediaWiki developers to support the Newick
> format, and any number of other important display-independent representations
> of data widely used in Wikipedia? 
> 
...
> I don't know if this can be added as a "bug" in and of itself, but it does
> point out the fundamental problem. MediaWiki has text, graphic, audio, and
> video formats, but is missing the ability to parse certain other critical basic
> information storage formats that the developers never considered.

This is kind of the main argument against string functions. Letting users
create parsers in wikitext is pretty much exactly the kind of thing that those
against it want to avoid. Wikitext is not supposed to be a programming
language.[1] This is also a good example of what Aryeh was talking about in
comment #110.

A well-defined language that has applications in thousands of pages is an
excellent candidate for something that should be handled by an extension.

> Who decides, and who does the work?

The same person who decides whether or not to enable string functions would
decide whether to enable a Newick extension. Anyone who knows PHP can do the work.

[1] http://lists.wikimedia.org/pipermail/wikitech-l/2009-June/043609.html
Comment 119 Kevin Norris 2010-11-21 08:08:12 UTC
(In reply to comment #117)
> (In reply to comment #106)
> > "There is community consensus to enable StringFunctions; if the developers do
> > not enable it themselves, the community hereby requests that the WMF instruct
> > the developers to do so."
> 
> That's not really how it works. The developers *are* WMF, or at least a subset
> thereof. (Or were you under the impression that volunteer devs' opinions
> mattered in such cases? LOL)

The non-volunteer devs *work for* the WMF.  If the WMF decides to listen to the
community (and that's a big if), I don't think the devs can reasonably say no.

What's more, the developers are primarily responsible for making functionality
the community wants available to the community.  They aren't doing that here,
and that's a Bad Thing.
Comment 120 Happy-melon 2010-11-21 11:44:02 UTC
(In reply to comment #119)
> The non-volunteer devs *work for* the WMF.  If the WMF decides to listen to the
> community (and that's a big if), I don't think the devs can reasonably say no.
> 
> What's more, the developers are primarily responsible for making functionality
> the community wants available to the community.  They aren't doing that here,
> and that's a Bad Thing.

You're confusing developers (who write code for new features) with sysadmins
(who manage the servers and turn features on and off).  The developers are
their own community around their own project: the MediaWiki software.  That
community is structured slightly differently to a wiki community (there is a
clear hierarchy of authority and other different ways of doing things) but
fundamentally it is a volunteer project like any of the WMF's others:
developers code things that interest them.  Most developers work on areas of
MediaWiki which will be of use on Wikimedia wikis, as seeing their code in
action on the world's 6th largest website is the most tangible reward for their
time, but neither the paid nor unpaid devs are beholden to the other WMF
communities (and please remember that enwiki is just one of 800 such groups),
any more than one wiki community is beholden to another.  Many developers work
on parts of MediaWiki which will never be installed on Wikimedia wikis.  To say
that "the developers are primarily responsible for making functionality the
community wants available to the community" is arrogant and false.

The *sysadmins*, most (but not all) of whom are also active developers, are the
ones who decide which components of MediaWiki are installed on WMF wikis. 
There is a strict hierarchy amongst sysadmins, and most of them are WMF paid
staff.  They *are* expected to take the communities' sentiments into account
when making changes, and they are indeed accountable to the Foundation.  The
sysadmin you're talking about here reports directly to the Foundation's CTO;
the CTO reports to the CEO, and the CEO reports to the board.  The sysadmin who
has made this decision is 'above' 90% of the Foundation's paid staff in the
organisational hierarchy.  Where, exactly, are you planning to go to get this
decision overturned?
Comment 121 Le Chat 2010-11-21 12:00:04 UTC
> Where, exactly, are you planning to go to get this
> decision overturned?

Rather than initiate some kind of power battle, I think we ought simply to
politely draw the sysadmin's attention to this discussion and the apparently
strong arguments in favour of changing this decision, and hope that he'll now
be persuaded. (If it's Tim Starling, then I've already left a note on his en.wp
user page, though others may know of more effective ways of giving him a
friendly poke.)
Comment 122 Max Semenik 2010-11-21 12:42:08 UTC
(In reply to comment #121)
> Rather than initiate some kind of power battle, I think we ought simply to
> politely draw the sysadmin's attention to this discussion and the apparently
> strong arguments in favour of changing this decision, and hope that he'll now
> be persuaded. (If it's Tim Starling, then I've already left a note on his en.wp
> user page, though others may know of more effective ways of giving him a
> friendly poke.)

Thinking that he doesn't know about this bug or that he is not watching it is
way too naive, so your pokes achieve nothing but annoyance.
Comment 123 Juraj Simlovic 2010-11-22 23:22:45 UTC
(In reply to comment #122)
> > If it's Tim Starling, then I've already left a note on his en.wp user page,
> Thinking that he doesn't know about this bug or that he is not watching it is
> way too naive, so your pokes achieve nothing but annoyance.

Actually, this bug has 123 comments as of right now. Based on my experience
with other big projects I am (or used to be) part of, my humble guess is that
Tim no longer bothers to read this bug and has probably had it on his ignore
list for a long time already. And I'd fully understand him. The decision has
been made (I hope it was not taken lightly) and none of the above changes that
(though it pollutes what should have been a technical discussion). The only
reason I still read this bug is that it is getting funny, not because I am
interested in it as a dev.

jsimlo


ps. Yes, this comment also pollutes this bug. But I simply no longer see any
downside to doing it. :)
Comment 124 Le Chat 2010-11-23 05:13:36 UTC
> none of the above changes that

It should change it really, as we now know that (a) there is continuing user
demand for this functionality (b) nothing is happening or likely to happen
towards providing it in any other sensible way than the one proposed (c) the
use of the very inefficient workarounds without ill effect, the use of the
proposed functions on Wikia, etc. prove that this functionality will not (as
feared) damage performance. Presumably sysadmins don't have completely closed
minds, and are capable of listening to users and arguments and taking a second
look at past decisions...
Comment 125 MZMcBride 2010-11-23 06:16:45 UTC
(In reply to comment #124)
> It should change it really, as we now know that (a) there is continuing user
> demand for this functionality (b) nothing is happening or likely to happen
> towards providing it in any other sensible way than the one proposed (c) the
> use of the very inefficient workarounds without ill effect, the use of the
> proposed functions on Wikia, etc. prove that this functionality will not (as
> feared) damage performance. Presumably sysadmins don't have completely closed
> minds, and are capable of listening to users and arguments and taking a second
> look at past decisions...

Hahahahaha

You're obviously not very familiar with Wikimedia's software development
processes. Right now, some of this ParserFunctions mess (and its use in high-use
templates like "Template:Cite") causes page renderings to take upward of 30
seconds on a large article. And still nobody cares.™  If you think a bit of
whining (or is it whinging?) in bug comments or attempting to rally some folks
on a village pump is going to push anything forward, you're insane. You'd be
better off trying to raise some money for a grant, to be honest. (Though not
really; Wikimedia is apparently trying to stop accepting money with strings
attached.)

If you wrote an extension that implemented JavaScript into MediaWiki templates
that also doubled as donation-related software, you might be able to attract
some attention to this bug before the 12th of Never. ;-)  Otherwise, it's
probably best to save your energy for battles you can possibly win.
Comment 126 Le Chat 2010-11-23 07:29:13 UTC
Don't see any reason for the negativity and sarcasm concerning this bug (as in
comment above) - it's just a perfectly normal and well-reasoned feature
request, which will actually *reduce* these page-rendering times you mention,
and will hopefully be considered on its technical merits.
Comment 127 Juraj Simlovic 2010-11-23 22:42:44 UTC
(In reply to comment #126)
> Don't see any reason for the negativity and sarcasm concerning this bug
> (as in comment above) - it's just a perfectly normal and well-reasoned
> [...] and will hopefully be considered on its technical merits.

Simply put: I do see one. No, it is not. I thought it was back then.

The long story short: I developed these StringFunctions (not all by myself,
of course; eventually there were three of us :) because I needed them back
then in my own wikis. Then someone started this bug and Tim said no. Then we,
out of interest, tried to optimize the extension to be more "suitable" for the
Wikimedia cluster. And again, Tim said no. Then someone managed to merge
StringFunctions into ParserFunctions, which were/are installed on the Wikimedia
cluster. And guess what happened: Tim said no. If it ain't clear already, Tim
has had his chance to reconsider.

The only thing left now is: let it go. The more comments are posted to this
bug, the more it becomes an unusable kid chat wall. No developer is likely
going to invest in reading through a hundred comments, even if a nugget of
gold was lost somewhere within. ...Ahh, who am I kidding? This attempt at
explanation is pointless anyway.
Comment 128 Phillip Patriakeas 2010-11-24 02:45:54 UTC
I've filed bug 26092 for *some* form of string parsing functionality to be
enabled on WMF wikis, could we please maybe try to keep from turning it into
the same mess this bug is (i.e. if you have something *useful* to contribute,
by all means do, but if not, no comments saying "we need this soooo bad, the
devs aren't being [fair/reasonable/humane/etc]")?

Not sure if this bug should be marked as blocking it, but it probably doesn't
matter anyways since this one is closed.
Comment 129 Juraj Simlovic 2010-11-24 19:56:25 UTC
(In reply to comment #128)
> I've filed bug 26092 for

Unbelievable! :)) Yesterday, I was kinda wondering if there was any way of
luring someone into creating a brand new bug as a copy of this one. Despicable
me, sorry about that.. :) Of course this solves nothing, but right now I am $50
richer! And all it took was mentioning the devs' reluctance to read a
hundred-odd comments. :)))))

ps. Perhaps I should be banned for disrupting, but it was worth it.
Comment 130 Phillip Patriakeas 2010-11-24 23:48:21 UTC
(In reply to comment #129)
> (In reply to comment #128)
> > I've filed bug 26092 for
> 
> Unbelievable! :)) Yesterday, I was kinda wondering if there was any way of
> luring someone into creating a brand new bug as a copy of this one. Despicable
> me, sorry about that.. :) Of course this solves nothing, but right now I am $50
> richer! And all it took was mentioning the devs' reluctance to read a
> hundred-odd comments. :)))))
> 
> ps. Perhaps I should be banned for disrupting, but it was worth it.

Actually, I'd been thinking about it for a while, I just finally decided to
stop being lazy and do it already. =)
Comment 131 Dmitriy Sintsov 2010-12-29 08:09:35 UTC
(In reply to comment #99)
> (In reply to comment #98)
> > MZ, are you seriously suggesting that the developers will completely
> > re-implement an extension, when the concerns about the original are *not*
> > implementation-specific?  I seriously doubt that.
> 
> I'm suggesting that the sysadmins in charge of running Wikimedia wikis have
> said rather unequivocally that this extension is not going to be installed. The
> StringFunctions extension is a means to an end. There are plenty of other ways
> to implement string manipulation. For years, there has been discussion of
> implementing a proper programming language into MediaWiki. The current
> preferred favorite is not Lua, but JavaScript, actually.
> 
If JavaScript is the language of choice, there is the PHP SpiderMonkey extension.
It is still not absolutely stable (only a beta); however, I know that some WMF
programmers are good at C, so it is probably possible to make a few fixes. The
question is how to make these scripts run on "ordinary" hosts, where there
will be no such PHP extension. In such a case, one might try client-side
JavaScript (in the browser); however, passing function / template parameters
from the server side to the client side might become too inefficient. Perhaps
one might limit the JS language features to a basic subset, then run it through
the PHP module when available and slowly interpret it in PHP otherwise.
Co-location (where you can compile and install a PHP module yourself) has
become more affordable in recent years, anyway.
Comment 132 Dmitriy Sintsov 2010-12-29 09:05:23 UTC
The mod can also register PHP classes in JS:
http://devzone.zend.com/article/4704

There is also interesting JavaScript-based server Jaxer:
http://jaxer.org/

It allows a lot of code to be shared between the server and client sides. For
example, it can run jQuery on the server. Things like parsers could be written
in JavaScript and then used on both sides, thus minimizing code duplication.
Comment 133 MZMcBride 2010-12-29 09:10:37 UTC
(In reply to comment #132)

These are interesting, yes. However, these comments are really outside the
scope of this bug. File a separate bug (if there isn't one already) or start a
thread on the wikitech-l@lists.wikimedia.org mailing list if you're interested
in further discussion about this.
Comment 134 Mark A. Hershberger 2011-09-24 18:02:19 UTC
*** Bug 31136 has been marked as a duplicate of this bug. ***
Comment 135 Daniel Werner 2011-11-17 13:44:38 UTC
One argument brought up a few times against string functions, that people
would always push template programming to its limits and just write more
complicated templates once string functions were enabled, might be true. So
why not simply scale down the limits after installing these functions?
Existing string templates can be rewritten as wrappers around the string
functions, so functionality wouldn't even be broken. We would have lower limits
on what's possible using templates and functions, but we would have more
powerful and sane functions. They could serve the same uses as today's string
templates, with less load on the servers.
Comment 136 Happy-melon 2011-11-17 16:21:12 UTC
(In reply to comment #135)
> So why not simply scale down the limits after installing these functions?
> Existing string templates can be re-written as wrappers for using string
> functions, functionality wouldn't even be broken, we would have lower limits
> for whats possible using templates and functions but we would have more
> powerful and sane functions provided. They could be used in a sane way as they
> are being used right now with less load on the servers.

Template limits are not just hit using string functions, indeed they're not
even the major cause.  The citation templates used on a large article consume
much more of the template resources than string functions, as well as stupid
things like the innumerable {{SubatomicParticle}} calls (and their endless
subtemplates) on [[List of baryons]] etc.  Reducing the template limits would
break all these cases, and they're not scenarios which could be 'fixed' with
proper string functions.
Comment 137 Dan Wolff 2011-11-17 16:28:16 UTC
A solution would be to define how expensive a parser function is, and set the
string functions as "expensive" while not changing anything else. That way,
other parser functions would work as they currently do, while we get the power
of string functions, just that you can't use so many.

(I think there is already something like this in place for some parser
functions - not sure though)
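To make the proposal concrete: MediaWiki does have a per-page cap along these lines, $wgExpensiveParserFunctionLimit, which applies to functions such as #ifexist. The Python sketch below only models the idea described above (each function has a cost, unlisted functions cost nothing, and a page stops accepting expensive calls once its budget is spent). The class name, the cost table, and the choice to treat #replace and #pos as expensive are all hypothetical, and the limit of 100 is an arbitrary illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExpensiveCallBudget:
    """Hypothetical per-page budget for "expensive" parser functions.

    Each function name maps to a cost; functions not listed cost nothing,
    so ordinary parser functions behave exactly as before.
    """

    limit: int = 100  # arbitrary illustrative cap per page
    used: int = 0
    # Illustrative weights only: marking string functions as expensive is the
    # proposal in the comment above, not existing MediaWiki behaviour.
    costs: Dict[str, int] = field(
        default_factory=lambda: {"#ifexist": 1, "#replace": 1, "#pos": 1}
    )

    def charge(self, function_name: str) -> bool:
        """Record one call; return False (and charge nothing) if it would exceed the limit."""
        cost = self.costs.get(function_name, 0)
        if self.used + cost > self.limit:
            return False
        self.used += cost
        return True


if __name__ == "__main__":
    budget = ExpensiveCallBudget(limit=2)
    print(budget.charge("#if"), budget.charge("#replace"), budget.charge("#pos"))
    print(budget.charge("#replace"))  # third expensive call is refused
```

In this model, cheap functions like #if never touch the budget, which matches the "while not changing anything else" part of the suggestion.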
Comment 138 Rich Farmbrough 2011-12-30 17:07:15 UTC
There is, but we have seen no evaluation of "expensive" - result is that stuff
that is essential is split over several pages...

Cite templates would benefit enormously from parser functions: instead of
jumping through hoops, they could make simple tests, such as whether something
already has a full stop at the end.
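For comparison, the full-stop test Rich describes is a one-line string check once string operations are available. Below is a Python sketch of that check. The function name `terminate` and the decision to trim trailing whitespace first are assumptions made for illustration; this is not how any existing cite template behaves.

```python
def terminate(text: str, mark: str = ".") -> str:
    """Append `mark` unless the text (ignoring trailing whitespace) already ends with it."""
    stripped = text.rstrip()
    # Leave empty fields empty rather than emitting a lone full stop.
    if not stripped or stripped.endswith(mark):
        return stripped
    return stripped + mark


if __name__ == "__main__":
    print(terminate("Smith, J"))   # adds the missing full stop
    print(terminate("Smith, J."))  # already terminated, left alone
```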

The bug should be changed from WONTFIX. Keeping it at that status because of a
stray comment on a mailing list years ago makes no sense when, at Wikimania
2011, Tim was undecided as to which solution (parser functions, scripting
language or Victor's extension) was best.

Really I have looked at all three, ANY ONE WILL DO. And if you change your mind
from parser functions to one of the others, I WILL PERSONALLY MIGRATE ALL
TEMPLATES TO THE NEW SOLUTION. 

I am re-opening this bug.  Please do not casually re-close it.
Comment 139 Ted Kandell 2011-12-31 03:40:25 UTC
Finally, some common sense here.

There are a huge number of templates that now do pretty much everything. My
personal interest is in displaying trees and phylogenies. These are incredibly
hard to edit now, not even worth it. I've tried to edit genealogical trees, and
have given up, because the "presentation" is mixed up with the data. My browser
would crash before I could even get part of it right by repeated
experimentation. 

"Expensive"? All of these "hoops" that everyone has to go through to validate
templates without any sort of parser functions really have a collective impact
on MediaWiki and Wikipedia. "NO solution" is much, much worse than an attempt
at a "bad solution".

I don't think anyone even realizes the *lack* of editing by knowledgeable
people that is taking place, because of the sheer difficulty in editing data
that is not text or inline images. There's a price here, and it isn't whether
"this or that implementation of trim()" regular expressions is more or less
efficient.

It's been 5 1/2 years since this bug was first opened. 
Maybe someone can get moving on it before a decade has passed?
Comment 140 Daniel Friesen 2011-12-31 04:11:05 UTC
Are string functions "really" the solution to the difficulty of editing
specialized data?

To me that sounds like a really horrible solution that won't actually solve the
issue. If data really is complex then string functions sound like something
that will only allow a change to 'another' string based data format that will
still be too complex for the knowledgeable people to edit.

I'd like to see some of those complex data formats. I'm pretty sure that what
most of them really need is specialized code in a proper programming language
to create a format that knowledgeable people can actually understand, and
perhaps even a UI to make that possible.
Comment 141 Jim Craigie 2011-12-31 04:34:05 UTC
String functions certainly are a solution to the problem that brought me here -
attempting to construct a template to create the slightly unusual URLs used by
an external site, which requires replacing each instance of a non-alphanumeric
character by an underbar. Easy to do with {{#replace:}}

I was horrified to discover that a perfectly good solution has been implemented
but its activation is being blocked for reasons I still cannot understand.
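For comparison, the transformation Jim describes (every non-alphanumeric character becomes an underbar) is a single character-class substitution in a general-purpose language. Here is a Python sketch. The function name is hypothetical, and treating only ASCII letters and digits as "alphanumeric" is an assumption about what the external site expects.

```python
def external_site_slug(title: str) -> str:
    """Replace every character that is not an ASCII letter or digit with "_"."""
    # Assumption: "alphanumeric" means ASCII only; drop the isascii() check
    # if the external site also accepts accented letters.
    return "".join(ch if ch.isascii() and ch.isalnum() else "_" for ch in title)


if __name__ == "__main__":
    print(external_site_slug("Rachi (1040-1104)"))
```

`str.isascii()` requires Python 3.7 or later. Without it, `isalnum()` alone would also keep letters such as "é".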
Comment 142 Bawolff 2011-12-31 04:55:28 UTC
This is pointless. Can we stop beating the dead horse already?
Comment 143 Ted Kandell 2011-12-31 05:31:37 UTC
Yes, string functions are *a* solution that can work right now.
Why? How would you implement parsing of, say, a Newick file, or any specialized
data format that you didn't know about yourself, beforehand? 

There are hundreds of such data formats. Some may be very useful for common
sorts of representations in Wikipedia. Will we have to open a bug for each and
every one, then hardcode a parser for it, then have someone update that parser
whenever a slight change in the format comes out? Or would you rather just
implement AJAX and Java instead? 

BTW, how complex is it to parse a phylogenetic tree format which merely uses
nested parentheses, and then display it, when these can be copied from
anywhere?
http://en.wikipedia.org/wiki/Newick_format

The point is that often data in these specialized formats *already exists* out
there, somewhere, and just needs to be displayed. 

If you mean "stop beating a dead horse and just release these functions" I say
yes. But if you mean "stop asking for them, you'll never ever get them, forget
it  ... "
Comment 144 Ted Kandell 2011-12-31 05:38:11 UTC
Examples?
Here is the complete grammar for the "complex specialized" Newick format:

The grammar rules

Note, "|" separates alternatives.

   Tree --> Subtree ";" | Branch ";"
   Subtree --> Leaf | Internal
   Leaf --> Name
   Internal --> "(" BranchSet ")" Name
   BranchSet --> Branch | BranchSet "," Branch
   Branch --> Subtree Length
   Name --> empty | string
   Length --> empty | ":" number

Examples:

(,,(,));                               no nodes are named
(A,B,(C,D));                           leaf nodes are named
(A,B,(C,D)E)F;                         all nodes are named
(:0.1,:0.2,(:0.3,:0.4):0.5);           all but root node have a distance to parent
(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;       all have a distance to parent
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);       distances and leaf names (popular)
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;     distances and all names
((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;    a tree rooted on a leaf node (rare)
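To make the grammar above concrete, here is a sketch of a recursive-descent parser in Python with one method per grammar rule, plus a function that prints the result as an indented outline. It assumes unquoted names and treats any run of characters other than ( ) , : ; as a name. It does not handle quoted labels, bracketed comments, or other extensions that real Newick files may contain. The class and function names are illustrative. Whether this kind of logic belongs in templates or in an extension is exactly what the thread is arguing about.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


class NewickParser:
    """Recursive-descent parser for the grammar above (no quoting, comments, or whitespace rules)."""

    PUNCT = "(),:;"

    def __init__(self, text: str):
        self.text = text.strip()
        self.pos = 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse(self) -> Node:
        # Tree --> Subtree ";" | Branch ";"  (Subtree is a Branch with an empty Length)
        node = self.branch()
        self.expect(";")
        if self.pos != len(self.text):
            raise ValueError("trailing characters after ';'")
        return node

    def branch(self) -> Node:
        # Branch --> Subtree Length
        node = self.subtree()
        node.length = self.length()
        return node

    def subtree(self) -> Node:
        # Subtree --> Leaf | Internal
        if self.peek() == "(":
            # Internal --> "(" BranchSet ")" Name
            self.pos += 1
            children = [self.branch()]
            while self.peek() == ",":  # BranchSet --> BranchSet "," Branch
                self.pos += 1
                children.append(self.branch())
            self.expect(")")
            return Node(name=self.name(), children=children)
        return Node(name=self.name())  # Leaf --> Name

    def token(self) -> str:
        # Consume a run of non-punctuation characters (possibly empty).
        start = self.pos
        while self.peek() and self.peek() not in self.PUNCT:
            self.pos += 1
        return self.text[start:self.pos]

    def name(self) -> str:
        # Name --> empty | string
        return self.token()

    def length(self) -> Optional[float]:
        # Length --> empty | ":" number
        if self.peek() != ":":
            return None
        self.pos += 1
        return float(self.token())


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines; unnamed nodes are shown as '*'."""
    label = node.name or "*"
    if node.length is not None:
        label += f" ({node.length})"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = NewickParser("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;").parse()
    print("\n".join(outline(tree)))
```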
Comment 145 Ted Kandell 2011-12-31 05:40:53 UTC
Here is an example of a current genealogical tree, using templates:

http://fr.wikipedia.org/wiki/Rachi#G.C3.A9n.C3.A9alogie

=== Généalogie ===
<center>
{{Arbre généalogique/début|style=font-size:75%;}}
{{Arbre généalogique | SAM | | | | | | | | | | | | | | |RSH| | |
|SAM=Samuel|RSH='''Rachi ([[1040]]-[[1104]])'''}}
{{Arbre généalogique | |!| | | |
|,|-|-|-|-|-|-|-|v|-|-|-|^|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|.}}
{{Arbre généalogique |SMH| | |RHL|-|AZR| |KVD|v|RMB| | | | | | | | |SMA| |
|MRM|v|YBN||SMH=Simha ben Samuel de Vitry| RHL=Rachel ''Bellassez''|
AZR=Eliézer ''Jocelyn'' | KVD=Yokheved | RMB=Meïr ben Samuel | MRM=Myriam |
YBN=Judah ben Nathan | SMA=Shémaiah}}
{{Arbre généalogique | |!| | | |,|-|-|-|v|-|-|-|v|-|-|^|-|-|-|-|v|-|-|-|.| | |
|!| | | | |,|-|^|-|.}}
{{Arbre généalogique |SAM|v|HAN| |SLM| |RTM|v|MRM| |RVM| |SBM|v|INC| | |YTV|
|AZR|SAM=Samuel de Vitry|HAN=Hanna| SLM=Salomon |RTM=[[Rabbénou Tam]]
(~[[1100]]-[[1171]])|MRM=Myriam |RVM=Isaac Rivam|SBM=Samuel [[Rashbam]]
(~[[1085]]-[[1158]])|INC=?|YTV=Yom Tov de Falaise|AZR=Eléazar }}
{{Arbre généalogique | | | |!| | | | | |,|-|-|-|v|-|^|-|v|-|-|-|.| | | | | |!|
| | |,|-|-|^|-|-|-|.}}
{{Arbre généalogique | | |RI| | | |ITS| |SLM| |MSH| |ISF| | | |ITS | |YHD| | |
| |ISF|RI=[[Isaac ben Samuel de Dampierre|Isaac de Dampierre]] dit le Ri
(~[[1120]]-[[1195]])|ITS=Isaac|SLM=Salomon|MSH=Moïse|ISF=Joseph| YHD=Judah |
ITS=Isaac}}
{{Arbre généalogique | | | |!| | | | | | | | | | | | | | | | | | | | | | | |
|,|-|-|^|.| | | |,|-|^|.}}
{{Arbre généalogique | | |HNN| | | | | | | | | | | | | | | | | | | | | | |ITS|
|AZR|v|BLA| |LAH|HNN=Elhanan (mort [[1184]])|ITS=Isaac|AZR=Eléazar  | BLA=Bila
| LAH=Léah}}
{{Arbre généalogique | | | |!| | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | |!| | | | | }}
{{Arbre généalogique | | |SML| | | | | | | | | | | | | | | | | | | | | | | | |
| | | | |YHD| SML=Samuel|YHD=Judah de Paris Sir Léon ([[1166]]-[[1224]])}}
{{Arbre généalogique/fin}}
</center>
Comment 146 Ted Kandell 2011-12-31 05:48:42 UTC
http://fr.wikipedia.org/wiki/Rachi#G.C3.A9n.C3.A9alogie

In the above tree, I need to add a father for Shémaiah and make Shémaiah the
father of Eliézer Jocelyn. 

That should be a simple change using the above templates, right? 

No.

= "Lack of string functions"

In the Newick format it would take 1 second, and there are tools to create and
edit such files. 

Now think of the hundreds of other easy-to-parse useful standard data formats
...
Comment 147 Daniel Friesen 2011-12-31 06:10:07 UTC
This Newick format and that genealogy stuff look like a perfect example of what
string functions will NOT solve.

String functions are for simple text replacements and tests. What are you going
to do, write a whole Newick parser in string functions? If, and that's a big if
given that we don't have variables inside WikiText, you can manage to implement
Newick parsing inside of a template. That template is going to be insanely
complex, trying to make minor tweaks to the template which would be sane in a
normal programming language are going to become so hard it's nearly impossible.
And to top it off that template is going to be so heavy that it slows down
parsing for every page you use it on (multiplied by how much you use it and how
much data you input).

If we actually have a use for it, then what we could use is a real Newick
parser. The same goes for any other formats that are in fact useful to
Wikipedia. Yes, there are hundreds of formats, but when we talk about Wikipedia
and implementation we only care about the ones that will output things we want
on Wikipedia, and within that only the few formats we actually need. We don't
have to implement parsing for dozens of formats that do the same thing when
there's one format most people can use that'll work.

But I would also like to make the point that I can't consider the output of
those genealogy templates acceptable. It's horrible, absolutely disgusting. A
complete abuse of html tables in a presentational way. I don't want to see a
new template that outputs the same garbage. Not only do those need a better
system of inputting the information, they need a better output. Something you
can't do in templates because it likely involves building a .svg or something.

From what I see of Newick and your example your argument also falls short.
Newick seems to describe trees that only branch outwards. But that genealogy
tree appears to re-connect at various points. In other words, it looks like
your example tree actually CAN'T be expressed in Newick.

Frankly, it looks like you could use DOT. Wonder what happened to graphviz in
all this.
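Daniel's DOT suggestion can be sketched as follows. A genealogy is a directed graph of parent-to-child edges, and a person can have two parents, so lines of descent can re-connect. DOT expresses this directly, while Newick cannot. The Python sketch below writes DOT text from a list of edges. The names are hypothetical placeholders, not taken from the Rachi tree. Rendering would be a separate step with Graphviz, for example `dot -Tsvg family.dot -o family.svg`.

```python
from typing import Iterable, Tuple


def quote_id(name: str) -> str:
    """Wrap a name as a DOT double-quoted ID, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges: Iterable[Tuple[str, str]], graph_name: str = "family") -> str:
    """Render (parent, child) pairs as a Graphviz DOT digraph.

    Unlike a Newick tree, a node may appear as the child of several parents,
    so lines of descent that re-connect are representable.
    """
    lines = [f"digraph {quote_id(graph_name)} {{"]
    for parent, child in edges:
        lines.append(f"  {quote_id(parent)} -> {quote_id(child)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical people: one child with two parents, which Newick cannot express.
    print(to_dot([("Parent A", "Child"), ("Parent B", "Child"), ("Child", "Grandchild")]))
```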
Comment 148 John Du Hart 2011-12-31 06:11:48 UTC
I'm reclosing as WONTFIX.

It's very clear that we're going to have a new solution in the next year to
handle these situations. Whether it be Lua, built in Javascript or an extension
to handle cite templates. Whatever the fix is, I think the developers have made
a point that string functions simply won't be enabled.

Therefore, the bug's original request of setting $wgPFEnableStringFunctions =
true on Wikimedia wikis will not happen. Hence, WONTFIX.

Please don't change this unless you are a developer.

(In reply to comment #142)
> This is pointless. Can we stop beating the dead horse already?

Agreed.
