TTMServer performance and coverage issues
Open, Public

Description

The latest fixes to TTMServer, made some months ago, are not enough. During the translation rally at translatewiki.net, the translation memory was using too much CPU time. In addition, there have been reports and observations that suggestions are not found, for example when translating Tech News, which has many repeating parts.

During the Lyon hackathon I spoke with David Chan, who suggested replacing the current FuzzyLikeThis query with a check of some n-grams from the beginning and end of the strings. Those n-grams need to be stored separately at indexing time, unless there is a way to instruct ES to do it for us. In any case, short strings of one to three words need special attention.
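To make the n-gram suggestion concrete, here is a minimal sketch assuming word-level n-grams of a hypothetical size `n = 3`, with an in-memory dict standing in for the ES index. Each string is indexed under its first and last n-gram, and strings shorter than `n` words are indexed whole, which is only one possible way to give short strings special treatment. The names `edge_ngrams`, `build_index` and `candidates` are hypothetical and are not TTMServer code.

```python
from __future__ import annotations

from collections import defaultdict


def edge_ngrams(text: str, n: int = 3) -> set[str]:
    """Hypothetical index keys for a string: its first and last word n-grams.

    Strings with fewer than n words get a single key holding the whole
    normalized string (one assumed way to handle short strings).
    """
    words = text.lower().split()
    if len(words) < n:
        return {"short:" + " ".join(words)}
    return {"head:" + " ".join(words[:n]), "tail:" + " ".join(words[-n:])}


def build_index(strings: dict[int, str], n: int = 3) -> dict[str, set[int]]:
    """Indexing time: map each edge n-gram key to the ids of strings having it."""
    index: dict[str, set[int]] = defaultdict(set)
    for string_id, text in strings.items():
        for key in edge_ngrams(text, n):
            index[key].add(string_id)
    return index


def candidates(index: dict[str, set[int]], query: str, n: int = 3) -> set[int]:
    """Query time: ids of strings sharing at least one edge n-gram with the query."""
    found: set[int] = set()
    for key in edge_ngrams(query, n):
        found |= index.get(key, set())
    return found
```

For example, "Recent changes to the visual editor" and "Recent changes to the toolbar" share the key `head:recent changes to`, so a query starting with those three words retrieves both without fetching any other string contents.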

It seems that the current performance bottleneck is fetching too many string contents for comparison and scoring, not the scoring itself.
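If fetching dominates, one option is to bound how many candidates ever reach the expensive comparison. This is a minimal sketch under stated assumptions: the in-memory `corpus` list stands in for indexed strings, the length-ratio prefilter stands in for any check that can run on cheap indexed fields, and `difflib.SequenceMatcher` stands in for the real similarity function. `suggest`, `length_ratio`, `max_candidates`, `min_len_ratio` and `threshold` are hypothetical names and values.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def length_ratio(a: str, b: str) -> float:
    """Shorter length divided by longer length (1.0 for two empty strings)."""
    longer = max(len(a), len(b))
    return min(len(a), len(b)) / longer if longer else 1.0


def suggest(query: str, corpus: list[str], max_candidates: int = 25,
            min_len_ratio: float = 0.5,
            threshold: float = 0.75) -> list[tuple[float, str]]:
    """Hypothetical pipeline: cheap prefilter, capped shortlist, then scoring."""
    # Step 1: prefilter on lengths only; no string contents are compared yet.
    plausible = [s for s in corpus if length_ratio(query, s) >= min_len_ratio]
    # Step 2: cap how many candidates reach the expensive step, closest lengths first.
    plausible.sort(key=lambda s: abs(len(s) - len(query)))
    shortlist = plausible[:max_candidates]
    # Step 3: expensive similarity scoring, run on the bounded shortlist only.
    scored = [(SequenceMatcher(None, query, s).ratio(), s) for s in shortlist]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
```

Because `SequenceMatcher.ratio()` is at most 2r/(1+r) for a length ratio r, a `min_len_ratio` of 0.5 cannot discard any string that would score 0.75 or higher (that requires r ≥ 0.6); only the `max_candidates` cap can drop real matches.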

Nikerabbit created this task. (via Web, Jun 3 2015, 11:25 AM)
Nikerabbit added a subscriber: Nikerabbit.
Restricted Application added a subscriber: Aklapper. (via Herald, Jun 3 2015, 11:25 AM)
Nemo_bis added a project: Elasticsearch. (via Web, Jun 3 2015, 11:53 AM)
Nemo_bis added a subscriber: Nemo_bis.
Arrbee added a project: LE-Sprint-88. (via Web, Jun 3 2015, 3:12 PM)
Arrbee set Security to None.
Stryn added a subscriber: Stryn. (via Web, Jun 3 2015, 7:01 PM)
Purodha added a subscriber: Purodha. (via Web, Jun 3 2015, 8:46 PM)
gpaumier awarded a token. (via Web, Jun 3 2015, 8:48 PM)
Liuxinyu970226 added a subscriber: Liuxinyu970226. (via Web, Jun 3 2015, 11:51 PM)
siebrand added subscribers: siebrand, dchan. (via Web, Jun 4 2015, 6:00 AM)
Glaisher added a subscriber: Glaisher. (via Web, Jun 6 2015, 11:43 AM)
santhosh set Story Points to 1. (via Web, Jun 10 2015, 6:28 AM)
santhosh added a subscriber: santhosh. (via Web, Jun 10 2015, 8:18 AM)
Arrbee assigned this task to Nikerabbit. (via Web, Jun 10 2015, 8:21 AM)
Arrbee added a subscriber: Arrbee.
Restricted Application added a project: Discovery. (via Herald, Jun 17 2015, 5:48 AM)
gerritbot added a subscriber: gerritbot. (via Conduit, Jun 25 2015, 2:12 PM)

Change 219388 had a related patch set uploaded (by Nikerabbit):
Use Filtered query instead of post_filter for TTMServer suggestion.

https://gerrit.wikimedia.org/r/219388

gerritbot added a project: Patch-For-Review. (via Conduit, Jun 25 2015, 2:12 PM)
gerritbot added a comment. (via Conduit, Jun 25 2015, 2:16 PM)

Change 219388 merged by jenkins-bot:
Use Filtered query instead of post_filter for TTMServer suggestion.

https://gerrit.wikimedia.org/r/219388
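The merged change replaces a `post_filter` with a filtered query. Roughly, `post_filter` is applied to the hits only after the main query has matched and scored them, while a filtered query narrows the document set before scoring. The sketch below builds both request bodies in Elasticsearch 1.x syntax (`fuzzy_like_this` and `filtered` were deprecated or removed in later ES versions). The field names `content` and `language`, the helper names and the default `size` are hypothetical; this is not the actual TTMServer code.

```python
import json


def fuzzy_clause(text: str) -> dict:
    """Elasticsearch 1.x fuzzy_like_this clause on a hypothetical 'content' field."""
    return {"fuzzy_like_this": {"fields": ["content"], "like_text": text}}


def with_post_filter(text: str, lang: str, size: int = 100) -> dict:
    # post_filter: the fuzzy query matches and scores documents in every
    # language; the language restriction is applied to the hits afterwards.
    return {
        "query": fuzzy_clause(text),
        "post_filter": {"term": {"language": lang}},
        "size": size,
    }


def with_filtered_query(text: str, lang: str, size: int = 100) -> dict:
    # filtered query: the language filter narrows the document set that the
    # fuzzy query has to match and score.
    return {
        "query": {
            "filtered": {
                "query": fuzzy_clause(text),
                "filter": {"term": {"language": lang}},
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(with_filtered_query("Tech News", "en"), indent=2))
```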

Nemo_bis added a project: user-notice. (via Web, Jun 25 2015, 2:16 PM)
Nemo_bis added a subscriber: Phoenix303.

We should probably let users know that starting next week, thanks to @Phoenix303, they should get faster translation suggestions and that they should report any weirdness.

gpaumier moved this task to Announce in next Tech/News on the user-notice workboard. (via Web, Jun 25 2015, 3:10 PM)
gpaumier moved this task to In current Tech News draft on the user-notice workboard. (via Web, Jun 25 2015, 3:58 PM)
gpaumier moved this task to Recently announced in Tech/News on the user-notice workboard. (via Web, Jun 26 2015, 9:19 PM)
Legoktm removed a subscriber: Forrestbot. (via Web, Jun 29 2015, 5:49 PM)
gpaumier moved this task to Archive on the user-notice workboard. (via Web, Jul 2 2015, 8:02 PM)
Arrbee placed this task up for grabs. (via Web, Jul 21 2015, 9:27 PM)
Arrbee removed a project: LE-Sprint-88.
Purodha added a comment. (via Web, Jul 22 2015, 5:26 AM)

I do not notice a significant speedup in TM suggestions.

At least when translating the weekly tech newsletter in MW, invariant or nearly invariant strings longer than about five characters are never found in the TM. This may be a separate issue that needs to be investigated on its own.
