Lorenzo Franceschi-Bicchierai recently interviewed (over Twitter) Guccifer 2.0, who released documents hacked from the Democratic National Committee and claims to be the hacker of the DNC. He concluded that Guccifer 2.0 is probably not Romanian, as he claims to be, and may very well be Russian. (Other evidence also points to the hack being perpetrated by the Russians.)

What, if anything, does Guccifer 2.0's language usage in the interview say about his likely national origin?
I performed a linguistic and stylistic analysis of Guccifer 2.0's English-language responses in the interview. I examined all the clear errors in English grammar and style/usage, and considered whether each indicates that the writer is more likely a native Romanian or a native Russian speaker.

Summary of Results

There are seven unusual (non-native) features in the English of the interview. Out of these seven, five clearly point to the author being a native Russian speaker, one weakly points in that direction, and the last says nothing. Hence we can conclude that the author is far more likely to be Russian than Romanian.

Detailed Analysis

  1. He refers to himself as a "women lover," with "women" in the plural. In Romanian this would be "iubitor de femei" (with the plural "femei" rather than the singular "femeie"), while in Russian it would be "женщина любовник." The odds of the Russian combination occurring as a phrase (based on Google searches for the phrases with the constraints site:ru and site:ro, respectively) are nearly five times those of the Romanian combination occurring as a phrase. Also note that in Romanian, the phrase contains the preposition "de," corresponding to English "lover of women," while in Russian there is no preposition. Thus the phrase is much more likely a calque from Russian than from Romanian.
  2. He writes "I've already told" instead of "I've already told you." This is a case of pro-drop, where an argument to a verb (usually a pronoun) is dropped when understandable from context. In Russian, the equivalent sentence "Я уже сказал" (or "я уже ответил") is grammatically and contextually correct. In Romanian, the equivalent sentence, "Deja am spus," is technically correct, but it sounds odd and robotic to native speakers without the pronoun "ti" (you). This example is thus evidence that the writer is more likely Russian than Romanian.
  3. He uses the word "deal" multiple times to mean "hack" or "operation" (e.g., "DNC isn't my first deal," "all my deals"). In Russian, a deal is "сделка," business is "бизнес," an affair is "дело," an operation is "операция," an effort is "усилие," and an enterprise is "предприятие." In Romanian, these are, respectively, "afacere," "afaceri," "afacere," "operație," "efort," and "afacere." This may point towards Romanian, since several of the common translations for this concept are the same word, but it more likely points towards Russian, since "дело" (delo) and "сделка" (sdelka) both resemble "deal," which is an apparent cognate.
  4. He replies to a request to reply in Romanian as proof of his origin with "Man, I'm not a pupil at school." This is odd because of the use of the word "pupil" ("kid" would be more likely in colloquial English), the use of the preposition "at" instead of "in," and the start of the sentence with "Man,".
    1. In both Russian and Romanian, the phrase "pupil at school" ("ученик в школе" and "elev la școală," respectively) is slightly rarer than either "kid at school" or "boy at school." There is no clear difference here.
    2. In Russian, both English prepositions "in" and "at" are generally translated by "в," while Romanian has different prepositions, "în" (in) and "la" (at), giving evidence for a Russian native language.
    3. The odds of the phrase "I'm not" (Russian: "я не"; Romanian: "eu nu") being preceded by "Man," (Russian: "Человек," or "Товарищ,"; Romanian: "Omule,") are more than five times greater in Russian than in Romanian. This phrasing is evidence for a Russian native language over Romanian.
  5. One of the most prominent features of Russian (and other Slavic-language) speakers writing in English is the general lack of articles (the words "a" and "the"), as those languages lack them. Romanian, on the other hand, as a Romance language, has both definite and indefinite articles. While many of the interview responses used English articles correctly, there were a number which did not (e.g., "I used [a] 0-day exploit" and "the NGP VAN soft"). The inconsistency is not a complete proof of Russian (or Slavic) L1 authorship, but is strong evidence that the writer is a Russian speaker with relatively good English.
In total, there are seven oddities in the English text that can point to the native language of the writer. Five of the seven point clearly to Russian over Romanian as the native language, one ("deal") points weakly to Russian, and the last ("pupil") is inconclusive.
Overall, therefore, the linguistic evidence consistently points towards the writer being a native Russian speaker. It is also possible that the writer is a Romanian speaker who has studied Russian (often L2 features spill into a third language more than L1 features do); however, the writer denied knowing any Russian, and so the most reasonable conclusion is that he is a Russian native speaker rather than a Romanian native speaker. This evidence, combined with the evidently problematic Romanian language use in the interview, indicates clearly that Guccifer 2.0 is a Russian pretending to be a Romanian.
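
For readers who want to see the arithmetic behind the phrase-frequency comparisons (items 1 and 4.3) and the final tally, here is a minimal Python sketch. It assumes the comparison is a ratio of conditional rates, that is, hits for the full phrase divided by hits for a baseline phrase, computed separately for site:ru and site:ro; the analysis above does not spell out the exact normalization, so this is only one plausible reading. The function names and the hit counts in the usage note are hypothetical placeholders, not measured values.

```python
from collections import Counter


def relative_rate(ru_phrase_hits: int, ru_base_hits: int,
                  ro_phrase_hits: int, ro_base_hits: int) -> float:
    """How many times more often the phrase follows the baseline in
    Russian-domain results than in Romanian-domain results.

    Assumed normalization (not stated in the article):
        rate = phrase_hits / base_hits, per domain.
    Cross-multiplying avoids dividing two small fractions.
    """
    if min(ru_base_hits, ro_base_hits, ro_phrase_hits) <= 0:
        raise ValueError("baseline hits and Romanian phrase hits must be positive")
    return (ru_phrase_hits * ro_base_hits) / (ro_phrase_hits * ru_base_hits)


# Verdicts exactly as stated in the detailed analysis above.
VERDICTS = {
    "1 'women lover'": "russian",
    "2 pro-drop": "russian",
    "3 'deal'": "weak_russian",
    "4.1 'pupil'": "inconclusive",
    "4.2 'at' vs 'in'": "russian",
    "4.3 'Man,'": "russian",
    "5 missing articles": "russian",
}


def tally_verdicts(verdicts: dict) -> Counter:
    """Count how many features fall under each verdict."""
    return Counter(verdicts.values())
```

For example, with placeholder counts of 50 phrase hits out of 10,000 baseline hits on site:ru and 10 out of 10,000 on site:ro, `relative_rate(50, 10_000, 10, 10_000)` gives a fivefold difference, and `tally_verdicts(VERDICTS)` reproduces the five clear, one weak, and one inconclusive split described above.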
