Ethics in Data Mining and Cryptography

Posted 4 Dec 2002 at 16:57 UTC by exa

In recent years, computer science has become more of an applied science than a pure discipline. It is true that much of the driving force behind the proliferation of computing devices is commercial. However, over-commercialization has begun to cultivate products that give rise to ethical issues.

In this brief article, I shall mention two such areas which require our immediate attention, both in making the public aware and in warning future researchers of the implications.

When we refer to ethics in science, we imagine fields such as nuclear physics or genetic engineering. It is harder to conceive that similar ethical controversies can exist in a mostly mathematical discipline with seemingly less connection to the physical world. Computer Science, however, walked out of the lab three decades ago and has tightened its hold on our homes with the Internet. That is not to say that Computer Science threatens our lives in a mean way. On the contrary, being a computer scientist, I think the field has enormous benefits for mankind.

Nevertheless, we are in a period where we have kin that is lower than common slime. The phenomenon known as SPAM has truly battered the heart of "Matrix", the asynchronous facet of the Internet. E-mail was previously a highly efficient means of communication; reading it is now like searching for a needle in a haystack. The sad part of the story is that SPAM is a work of man; it is not a computer fault in any way.

Another threat to our society comes from the overarching greed of corporations that want to control our life in every possible aspect. By making use of legal privileges such as copyright law and contracts that allow "private law", they try to extend their power by "making" law. The assaults of copyright protection and shrink-wrap licenses have recently introduced to our lives sneaky concepts such as "Digital Rights Management", which is nothing short of a first and bold step towards a truly cybernetic society, one that allows "control at a distance". Such "perfect control" is undesirable with regard to our freedom, since the myriad of control systems effectively erode our existing rights. For example, you can buy a book and lend or sell it to anybody you like, or read it anywhere you please. With DRM, unfortunately, that is a thing of the past in the world of "digital media".

While research such as supercomputing and artificial intelligence has brought about results that have improved our lives in many ways, Computer Science, in its ugliest form, reinforces the aforementioned trends and helps ruin our lives. First we shall explain what this research amounts to, and then we will deliberate over the ethical problems with it. I do not directly offer any solution, but it should be evident that I am profoundly against research of this sort for several reasons.

Data Mining is the application of algorithmic methods for knowledge discovery in vast amounts of data. It can be used to glean useful information in both scientific and business domains. When we talk of data mining, we generally refer to a system that is more sophisticated than a simple query or statistics over a database. A data mining algorithm performs a non-trivial computation on data that can be terabytes in size. A typical example is inferring customer or product groups from market-basket data. There are also scientific achievements, such as the SKICAT project at NASA, which has automatically classified thousands of celestial objects.
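
To make the market-basket example concrete, here is a minimal sketch in Python that counts which pairs of products are bought together often enough to be interesting. The function name frequent_pairs, the min_support threshold and the sample baskets are all made up for illustration; real systems use far more elaborate algorithms and far larger data.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets.

    Hypothetical helper for illustration only; not taken from any paper.
    """
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        items = sorted(set(basket))
        counts.update(combinations(items, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up sample data: each inner list is one customer's purchase.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]

pairs = frequent_pairs(baskets, min_support=2)
print(pairs)
```

The same counting, applied to purchase histories tied to individual customers rather than anonymous baskets, is what makes the technique attractive to marketers.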

In recent years, an important "justification" for research has been "Online Marketing", which is not too different from what you know as SPAM. I have seen papers that claimed to introduce wonderful methods to extract unique email addresses from the web, or other facilities that would only increase the amount of SPAM. While one may argue that these might actually result in less SPAM since they make "targeted advertisement" easier, that has NOT been the case, as we have all observed.

Another important area of research is cryptography, which tends to be very mathematical by nature, and is one of the respected areas in computer science/mathematics departments. A well-known cryptographer often appears as a wizard to the outside world. We have, however, witnessed many attempts by electrical engineers, computer scientists and mathematicians to come up with an ultimate means of "copyright protection". One of the techniques is called "watermarking", which involves embedding a permanent "tag" in various media. For instance, a watermarked image carries a "stamp" that can be verified and that is impossible (they say) to duplicate, much like the ones in ordinary paper such as bills. Such digital marks remain even when the image is cropped or modified. A recent "justification" for research reads: "Prof. Dr. Murat Tekalp informed us that today a lot of people are making unauthorized changes on digital images and with the application of the new software in digital machines it will be much easier to detect unauthorized changes". While this reads as fairly innocent, it is impossible not to detect the tone of draconian leanings in the passage. Much of the research in this area is justified solely by the aim to "bring total control to our digital lives".
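
For readers who have not seen a watermark up close, the following is a toy sketch in Python of the simplest possible kind: a secret key decides the lowest bit of every pixel, and anyone holding the key can check whether those bits are still intact. This is only an assumption-laden illustration of tamper detection. It is a fragile mark that a crop or recompression would destroy, not the robust kind described above, and the names keyed_bits, embed and mismatches are invented for this example.

```python
import random


def keyed_bits(key, n):
    """Derive n pseudo-random bits from a secret key (illustrative only)."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n)]


def embed(pixels, key):
    """Overwrite the least significant bit of each pixel with a keyed bit."""
    bits = keyed_bits(key, len(pixels))
    return [(p & ~1) | b for p, b in zip(pixels, bits)]


def mismatches(pixels, key):
    """Return the positions whose lowest bit no longer matches the key."""
    bits = keyed_bits(key, len(pixels))
    return [i for i, (p, b) in enumerate(zip(pixels, bits)) if (p & 1) != b]


# Made-up 8-pixel grayscale "image" with values in 0..255.
image = [12, 200, 37, 90, 255, 0, 128, 64]

marked = embed(image, key=42)
print(mismatches(marked, key=42))    # an untouched copy verifies cleanly

tampered = list(marked)
tampered[3] ^= 1                     # flip the lowest bit of one pixel
print(mismatches(tampered, key=42))  # the altered position is reported
```

Each pixel changes by at most one grey level, so the mark is invisible to the eye, which is exactly why the holder of the key ends up knowing more about the image than its viewer does.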

The problem is that some Computer Scientists do this consciously and on purpose. They are in a similar position to "military scientists". While the actions of the latter kind can be explained in terms of "patriotism", I can find no explanation for this sort of research and the way it is conducted.

More disturbing than the actual results of this research is the way it is being conducted. For a scientific work to be published, one should be addressing some sort of a problem. Here, the deepest ethical problem is that there are researchers who justify their research by goals that are in conflict with the best interests of the society.

If the research in question had been abstract mathematical results, it would not have been as "evil". However, the research is of such a nature that it is useless without the stated "goals". It seems to make almost no contribution to humanity except SPAM and the Big Brother.

What is more, these researchers are doing most of this research with public funding or, in the case of large corporations, with the money of stockholders among the public. I find that to be the second major ethical complication in this perverse scenario. The general public has no interest in supporting advances that are likely to damage its communication facilities or freedoms that have existed for several hundred years.

I raise these ethical considerations for discussion. I believe that every computer scientist, and every computer-literate person alike, must think about these issues.

I am not citing any specific paper right now; that is left as an exercise for the reader! I shall give example papers when I have some time. Simply search for "online marketing" among data mining papers.


Eray (exa) Ozkural

Ethics, posted 6 Dec 2002 at 10:47 UTC by neil » (Master)

You're right, of course, but unfortunately ethics questions are rarely posed in such direct, isolated terms, so the answers aren't always so easy.

People don't get asked to analyze databases to violate privacy, they get asked to search for a known criminal, for evidence to convict a suspect, to help prevent crime, or to help save lives.

People don't get asked to write encryption or watermarking software to prop up obsolete business models. They get asked to protect vulnerable releases from being exploited by soulless companies creating software networks to organize international hordes of freeloaders.

And of course, beyond that, there are other issues. If one has a choice between not having enough money to buy food, or writing software that scans for open relays to send spam, ethics might not come to one's mind. Being backed into a corner changes one's reactions.

However..., posted 8 Dec 2002 at 05:27 UTC by piman » (Journeyer)

For every advance in data mining that leads to commercial research, there are equally possible ones in useful areas, like automatic organization, text summarization, or interface usability. Or even better, it might lead to automatic spam detection and elimination.

An advance in cryptology might create watermarks, or it might protect your privacy or identity online. Or it might even break earlier watermarks.

The problems lie in how the discoveries are being used, not in the discoveries themselves. I do think it's important that we don't push forward as "progress for progress's sake" and keep in mind that work can be used for malicious purposes. However, inventing a new text extraction method is not nearly as clear-cut an issue as, say, inventing a new nuclear detonation device.

Additionally, in almost every example I can think of for computer science, the research into the problem is almost the same as the research for the cure. Data mining finds spam, new decrypting techniques break old crypto. Since computers are so predictable, most of these bad things can be easily reversed once we understand them. (In this respect, CS seems a lot like biology to me; I imagine if you do a lot of research into how to create some sort of disease, you're closer to curing it too.)
