Creating a Real-World Dataset Simulating Organized Crime Communication

The ROXANNE project is developing a solution that helps Law Enforcement Agencies (LEAs) investigate more efficiently by combining biometric technologies (such as Phonexia Voice Biometrics), automatic speech recognition, and automated large-scale data processing, all within the boundaries of a privacy-first legal framework. That last point, however, also poses a significant challenge for evaluating the ROXANNE solution itself. How can one evaluate the solution’s real-world performance when privacy, GDPR, and other ethical constraints limit the real-world data that can be used? This is where ROXANNE’s unique evaluation dataset comes in.

Because the ROXANNE platform promises actionable intelligence based on the automated analysis of multiple sources of information, including audio, video, and text, testing the solution requires a very intricate set of evaluation data that closely resembles real-world scenarios.

Obtaining such complex data would be difficult in itself. Add the strict nature of data privacy policies and the European Union’s GDPR legislation, and the task becomes very challenging.

Therefore, the ROXANNE project’s consortium decided to create its own evaluation dataset that would be as close to the real world as possible.

Not only would this dataset be helpful for evaluation purposes, but it would also be extremely useful for the demonstration of the solution’s capabilities to Law Enforcement Agencies.

ROXSD: a Dataset Simulating Organized Crime Communication

After extensive discussions, it became obvious that the only way to create such a dataset was to start from scratch. The recordings focus primarily on telephone conversations of the kind that law enforcement intercepts legally.

To include the aspect of a real-world investigation, the ROXANNE simulated dataset’s (ROXSD) audio recordings are based on the following three fictional cases (inspired by real cases), investigated separately by the Prague anti-drug unit of the Czech Police:

A Drug Distribution Case A

A university student in Prague (Kryštof) is suspected of selling drugs. The police wiretap two of his mobile phones to get more information about his contacts and a dealer. Most of the communication occurs at the point where the drugs change hands, and, typically, they all speak in Czech or Slovak. The police decide to wiretap three of Kryštof’s contacts, one of them being a Russian (Sergej) who speaks to him in English. During one call, Sergej talks to another Russian who reveals his ties to Kryštof. The police therefore decide to wiretap this man’s phone as well, which leads to the interception of information about a drug delivery to London.

A Drug Lab Case

In this case, the police suspect two Vietnamese (Tuấn and Hoàng) of dealing large quantities of drugs and believe that Hoàng may have a production site. Their phones are wiretapped, revealing that they speak mostly Vietnamese, call each other multiple times, and talk to a few other Vietnamese contacts. During one call, however, Tuấn has an English conversation with an unknown individual, discussing a large delivery of drugs.

A Drug Distribution Case B

In this third case, the police investigate an Austrian student, Max, who studies at Charles University. They suspect him of distributing drugs in Prague’s city center. His wiretapped phone reveals that he speaks German and English and is in contact with several unknown individuals.

As you can see from all three scenarios above, the audio recordings in the ROXSD are designed to contain a great variety of conversations related to real-world organized crime.

What is not so obvious is the scope of work necessary to record all these fictional conversations.

Every conversation had to be recorded over the phone channel, and speakers of different nationalities were needed. Transcriptions and translations of each conversation had to be provided, and corresponding metadata such as date, time, age, and gender had to be recorded to support link analysis.
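To make these requirements concrete, the information attached to a single call could be modeled roughly as follows. This is only a sketch: the article does not describe the actual ROXSD schema, so the class names, field names, and sample values below are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Speaker:
    # Hypothetical speaker-level metadata (age and gender are named in the text).
    speaker_id: str
    age: int
    gender: str


@dataclass
class CallRecord:
    # Hypothetical call-level record: date/time, language, both parties,
    # plus the transcription and its translation.
    call_id: str
    started_at: datetime
    language: str
    caller: Speaker
    callee: Speaker
    transcript: str
    translation_en: str


# Invented sample values, not taken from the dataset.
example = CallRecord(
    call_id="call_0001",
    started_at=datetime(2020, 1, 1, 12, 0),
    language="cs",
    caller=Speaker("spk_01", 30, "male"),
    callee=Speaker("spk_02", 25, "female"),
    transcript="(original-language transcript)",
    translation_en="(English translation)",
)
```

Keeping speaker attributes separate from call attributes means the same speaker can appear in many calls without duplicating age or gender, which is what a link analysis would rely on.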

To widen the scope further and highlight the advantages of cross-case analysis, all scenarios were designed to contain information that collectively helps to uncover a much broader organized crime network. In other words, all three scenarios are connected, and these connections can be uncovered by analyzing the recordings with voice recognition, automatic speech recognition, metadata, and link analysis.
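The idea behind cross-case analysis can be illustrated with a very small sketch: once each call has been attributed to speakers, anyone who appears in more than one case is a candidate link between investigations. The case labels, speaker IDs, and call list below are invented for illustration and do not come from ROXSD. The sketch also assumes that speaker identities have already been resolved (for example by voice biometrics), which is the hard part in practice.

```python
from collections import defaultdict

# Toy call records: (case label, caller ID, callee ID).
# All values are invented and do not reflect the actual dataset.
calls = [
    ("case_a", "spk_01", "spk_02"),
    ("case_a", "spk_02", "spk_03"),
    ("lab", "spk_04", "spk_05"),
    ("lab", "spk_05", "spk_03"),
    ("case_b", "spk_06", "spk_07"),
    ("case_b", "spk_07", "spk_05"),
]


def speakers_linking_cases(call_records):
    """Return speakers that appear in calls from more than one case."""
    cases_by_speaker = defaultdict(set)
    for case, caller, callee in call_records:
        cases_by_speaker[caller].add(case)
        cases_by_speaker[callee].add(case)
    return {
        speaker: sorted(cases)
        for speaker, cases in cases_by_speaker.items()
        if len(cases) > 1
    }


bridges = speakers_linking_cases(calls)
for speaker, cases in sorted(bridges.items()):
    print(speaker, "appears in", ", ".join(cases))
```

In this toy data, two speakers each appear in two cases, so together they tie all three investigations into one network. A real link analysis would work on a much richer graph and would have to cope with uncertain speaker matches.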

On top of that, each speaker had to provide valid consent in order for the ROXANNE team to go ahead with the recordings of their voice and speech for the simulated evaluation dataset.

Once all legal and privacy aspects were successfully addressed, the audio recordings took place, and all necessary transcriptions, translations, and corresponding metadata were created.

In total, the evaluation dataset contains 481 recorded calls from 104 speakers. There is a mix of 13 languages: English, German, Czech, Russian, Greek, Farsi, French, Slovak, Romanian, Arabic, Polish, Vietnamese, and Swedish.

Six national groups are involved in the fictional criminal organization. Each group uses its native language internally and speaks English when communicating with other groups.

In total, the ROXSD contains 19 hours and 32 minutes of net speech (utterances).
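For a rough sense of scale, the totals above can be turned into simple averages. The short calculation below uses only the figures quoted in this article. The averages are indicative only, since the article does not say how speech is distributed across calls or speakers.

```python
# Figures quoted in the article.
calls = 481
speakers = 104
net_speech_seconds = (19 * 60 + 32) * 60  # 19 h 32 min expressed in seconds

# Simple averages; the real distribution is not described in the article.
avg_seconds_per_call = net_speech_seconds / calls
avg_minutes_per_speaker = net_speech_seconds / speakers / 60

print(f"Average net speech per call: {avg_seconds_per_call:.1f} s")
print(f"Average net speech per speaker: {avg_minutes_per_speaker:.1f} min")
```

This works out to roughly two and a half minutes of net speech per call and about eleven minutes per speaker.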

Its data richness currently makes the ROXSD one of the most advanced evaluation datasets available for testing and demonstration purposes. It provides unprecedented possibilities for evaluating law enforcement solutions and has already proved extremely valuable for the ROXANNE project’s performance evaluation, testing, and presentation.

From now on, law enforcement agencies, as well as researchers, can compare the performance of their algorithms, systems, and solutions using fictional but realistic organized crime communication data.

And because the ROXANNE Simulated Dataset comes with extremely rich metadata, it can be used not only to test the accuracy of voice biometric and automatic speech recognition technologies but also to evaluate complex approaches to link analysis powered by Artificial Intelligence (AI), advanced configurations of Natural Language Processing (NLP) neural networks, and other cutting-edge speech technologies such as language identification, age estimation, gender identification, and keyword spotting.

And, of course, to demonstrate the ROXANNE platform’s capabilities to Law Enforcement Agencies.