What limitations should be imposed on datamining of email traffic patterns?

E-mail traffic is one of the main targets of intelligence surveillance aimed at detecting terrorist and other malicious activities. It is relatively easy to analyze, and more result-oriented, when compared with wiretapping other data streams. This is because messages sent or received can easily be linked to the chain (thread) they belong to, and that chain can in turn be used to identify the community the participants belong to. I would personally suggest an unlimited right to analyze any suspicious traffic identified by pattern analysis. Various accepted methods for doing this are currently being experimented with.
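
As a rough illustration of the chaining idea, the Python sketch below groups messages into reply chains using only the standard Message-ID and In-Reply-To headers, never the message body. It is a simplification (real threading usually also uses the References header), and the function name build_threads and the sample messages are hypothetical.

```python
from collections import defaultdict
from email.parser import Parser


def build_threads(raw_messages):
    """Group raw messages into reply chains using only their headers.

    Returns a dict mapping the Message-ID of each chain's first known
    message to the list of Message-IDs in that chain.
    """
    parent = {}
    for raw in raw_messages:
        # headersonly=True: the message body is never parsed
        msg = Parser().parsestr(raw, headersonly=True)
        mid = msg["Message-ID"]
        if mid is None:
            continue  # a message without an ID cannot be chained
        reply_to = msg["In-Reply-To"]
        parent[mid.strip()] = reply_to.strip() if reply_to else None

    threads = defaultdict(list)
    for mid in parent:
        root, seen = mid, {mid}
        # Walk up the chain while the parent is a message we hold;
        # the seen set guards against malformed, cyclic headers.
        while parent[root] in parent and parent[root] not in seen:
            root = parent[root]
            seen.add(root)
        threads[root].append(mid)
    return dict(threads)


# Hypothetical sample traffic, headers only.
SAMPLE = [
    "Message-ID: <1@example.org>\nFrom: alice@example.org\nTo: bob@example.org\n\n",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n"
    "From: bob@example.org\nTo: alice@example.org\n\n",
    "Message-ID: <3@example.org>\nIn-Reply-To: <2@example.org>\n"
    "From: alice@example.org\nTo: bob@example.org\n\n",
    "Message-ID: <4@example.org>\nFrom: carol@example.org\nTo: dave@example.org\n\n",
]

if __name__ == "__main__":
    for root, members in build_threads(SAMPLE).items():
        print(root, members)
```

On the sample, messages 1 to 3 end up in one chain rooted at <1@example.org>, while message 4 forms a chain of its own.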

Rather than directly wiretapping and analyzing the content of individual e-mails in detail, the suggested method is to “look for the critical links that form bridges or betweenness of separate groups” (Muir, 2003). This would bring out groups of people whose communication may include terrorist activities. The suggested approach is to use automated pattern analysis to detect suspicious communication; if any is identified, intelligence agencies may then use CALEA (the Communications Assistance for Law Enforcement Act) to take further action.
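
“Betweenness” here corresponds to the standard graph measure betweenness centrality: how often a person lies on the shortest communication paths between other people. The sketch below computes it with Brandes' algorithm, a common way to do so. It illustrates the general idea only, not Muir's specific method, and the helper names and the toy edge list are hypothetical.

```python
from collections import deque


def graph_from_edges(edges):
    """Build an undirected graph (node -> set of neighbours) from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected betweenness centrality.

    Returns unnormalised scores: for each node, the sum over node pairs
    (s, t) of the fraction of shortest s-t paths that pass through it.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, queue = [], deque([s])
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: each pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical traffic: two tight groups joined only through "m".
EDGES = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("x", "y"), ("x", "z"), ("y", "z"),
    ("c", "m"), ("m", "x"),
]

if __name__ == "__main__":
    scores = betweenness(graph_from_edges(EDGES))
    print(max(scores, key=scores.get))
```

On the toy graph, “m”, the only link between the two groups, scores 9 because it lies on the shortest path between every pair of people from opposite groups; “c” and “x” score 8, and everyone else scores 0.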

Here is a link to ‘Process Mining’, which introduces a new, result-oriented data-mining method for uncovering social networks from e-mail traffic. The method works on event logs created by e-mail clients and tries to uncover the social relationships that connect people, which could potentially be applied to tracing terrorist groups.

Applied to e-mail traffic, process mining involves two steps:

1. Create event logs from e-mail metadata (subject, sender and recipient IDs, sent/received dates, mail headers, etc.), such as the messages handled by MS Outlook, usually exported into a database.

2. Use the ProM framework to mine the event log and uncover social relationships.
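
Step 1 can be sketched with Python's standard email package: each message is reduced to a metadata-only event record, and a simple sender-to-recipient count then gives a first, crude social network. ProM itself is a separate toolkit and is not used here; the function names, record fields and sample messages are hypothetical, and the count is far simpler than the analysis ProM performs.

```python
from collections import Counter
from email.parser import Parser
from email.utils import getaddresses, parsedate_to_datetime


def to_event(raw):
    """Turn one raw message into a metadata-only event record.

    The body is never parsed; only sender, recipients, date and subject
    are kept, mirroring the fields listed in step 1.
    """
    msg = Parser().parsestr(raw, headersonly=True)
    senders = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    date = msg["Date"]
    return {
        "sender": senders[0][1].lower() if senders else "",
        "recipients": [addr.lower() for _, addr in recipients if addr],
        "timestamp": parsedate_to_datetime(date) if date else None,
        "subject": msg["Subject"] or "",
    }


def sender_recipient_counts(events):
    """Count how often each sender wrote to each recipient."""
    counts = Counter()
    for event in events:
        for recipient in event["recipients"]:
            counts[(event["sender"], recipient)] += 1
    return counts


# Hypothetical messages (headers only).
SAMPLE = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\nCc: carol@example.org\n"
    "Date: Mon, 01 Jan 2024 09:00:00 +0000\nSubject: meeting\n\n",
    "From: bob@example.org\nTo: alice@example.org\n"
    "Date: Mon, 01 Jan 2024 09:30:00 +0000\nSubject: Re: meeting\n\n",
    "From: alice@example.org\nTo: Bob <bob@example.org>\n"
    "Date: Tue, 02 Jan 2024 08:15:00 +0000\nSubject: follow-up\n\n",
]

if __name__ == "__main__":
    events = sorted((to_event(r) for r in SAMPLE), key=lambda e: e["timestamp"])
    for (sender, recipient), n in sender_recipient_counts(events).items():
        print(sender, "->", recipient, n)
```

For the sample, alice writes to bob twice and to carol once, and bob writes to alice once. Exporting such records into a format ProM can import is a separate step not shown here.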

At the same time, data mining should be limited so that it does not search for specific content in an e-mail, since that raises privacy concerns. All data mining techniques should be “privacy-preserving”. Here is a useful article, ‘Privacy preserving data mining’, which outlines the current state of this approach; it could be used effectively for controlled data mining in intelligence surveillance, including e-mail traffic.
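
One simple way to keep mining at the level of traffic patterns is keyed pseudonymization: addresses are replaced by HMAC-SHA256 values before analysis and the subject is dropped, so the same person still maps to the same node in the graph while the actual identities stay hidden from analysts. This is a generic technique shown under stated assumptions, not necessarily one covered in the article; the function names and the key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(address, key):
    """Map an e-mail address to a stable keyed pseudonym (HMAC-SHA256).

    Addresses are normalised to lower case first (an assumption), so the
    same mailbox always yields the same pseudonym under the same key.
    """
    normalized = address.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def pseudonymize_event(event, key):
    """Keep only traffic metadata: hash the addresses, drop the subject."""
    return {
        "sender": pseudonymize(event["sender"], key),
        "recipients": [pseudonymize(r, key) for r in event["recipients"]],
        "timestamp": event["timestamp"],
    }


# Hypothetical key; who holds it is a policy decision outside this sketch.
KEY = b"hypothetical-secret-key"

if __name__ == "__main__":
    event = {"sender": "alice@example.org", "recipients": ["bob@example.org"],
             "timestamp": None, "subject": "meeting"}
    print(pseudonymize_event(event, KEY))
```

Because the mapping is keyed, only whoever holds the key can check whether a given pseudonym belongs to a known address, for example once suspicious traffic has been flagged by pattern analysis.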