Computer Laboratory

Cambridge Cybercrime Centre: Guide for data recipients

This page is a guide for researchers who would like to use datasets from the Cybercrime Centre in their research. You will also need to read the formal agreement that you will have to sign.

You might find it helpful to read the corresponding 'Guide for data suppliers' as well.

Why do I have to sign this agreement?

Cybercrime data is not public. In order to grow the field of cybercrime research, a number of suppliers have agreed to make their data available to researchers under certain conditions. The agreement sets out those conditions.

What does it say?

The agreement is just over two pages, plus two short schedules. This guide is not a substitute for reading the agreement, nor does it change what the agreement means. Follow-up questions are always welcome.

OK, but what are the main conditions?

Every person accessing the data needs to agree to:

  1. Use the data only for cybercrime research and analysis;
  2. Not reverse engineer the data or use it for any commercial purpose;
  3. Secure the data at all times;
  4. Respect the law, particularly data protection law;
  5. Publish in open access formats and respect all attribution requirements; and
  6. Keep the Cambridge Cybercrime Centre in the loop about all publications and problems.

Who signs the agreement?

Each agreement must be signed on behalf of the research institution, usually by someone in the "Research Office" or possibly a "Head of Department". It's unusual for researchers to be allowed to sign on behalf of the institution.

Each individual researcher who intends to access data needs to have read the agreement and agreed to be bound by it. This is set out in condition #4. An example of the sort of paperwork that would be appropriate is acknowledgement-201812.pdf. This paperwork needs to be kept on file by the research institution, but does not need to be sent to Cambridge -- we merely need to be told who the researchers are.

So all my graduate students and interns have to sign?

If they use the data, yes.

What if I move from my current research institution?

You will need to sign a new agreement.

What if I am doing a project with people from other institutions?

It is unlikely that a research institution will be prepared to take responsibility for researchers who work somewhere else, so -- if the data is to be used in different places -- it will be necessary to have two (or more) agreements signed.

Do I have to be doing research at a university?

The agreement focuses on the nature of the research, not the nature of the institution. If you are a legitimate cybercrime researcher, you are welcome to apply. To be clear, we will never make commercial datasets available to a supplier's competitors or client companies.

Will I be identified as working with the Centre's data?

The Cybercrime Centre will publish the names of research institutions and lead researchers on its website. Demonstrating that a range of researchers are working on the data helps to secure future funding and to attract more data suppliers. In addition to this public list, we will maintain a private list of all researchers working on the data.

Will other people know what I'm working on?

For our due diligence, you will need to tell the Centre in general terms what you propose to do. We will not make this public before you publish results. We do not need to see a full project proposal.

If you decide to do something very different with the data we may need to sort out another agreement, so please keep the Centre updated.

How long does the agreement last?

We recognise that research can take time. Under condition #10, the agreement needs to be renewed annually, providing an opportunity for a regular check-in between researchers and the Centre. Once the agreement ends, you need to delete or return the data, but you may continue to publish results from work on the data, subject to the publishing conditions.

How much of the agreement is standard?

Conditions #9 to #19 are mostly what lawyers call 'boilerplate' terms. They are standard terms, typical of any contract.

What is the deal on intellectual property rights?

Legally, a data transfer is a transfer of intellectual property rights in data. The Cybercrime Centre is sub-licensing datasets to you for research purposes. Any new intellectual property you generate is yours, provided other terms of the licence are satisfied (for example, you are not allowed to reverse engineer the data or use it for commercial purposes). So if you invent a fantastic new way of detecting bad things, then it's yours.

How does the confidentiality condition work?

The cybercrime data you receive is never intended to be open data. This is to prevent it from getting into the hands of criminals, or of the competitors or clients of our data suppliers. It is also possible, but unlikely, that you might obtain confidential information about the data suppliers themselves, or about confidential aspects of the Cybercrime Centre's work. Condition #5 prevents you from disclosing any of this sort of confidential information, including after the agreement ends.

I have worked out where the Cybercrime Centre's data is coming from.

Sorry, but you are prohibited from "reverse engineering" the source of the dataset. If data suppliers wish to be identified, you will be told explicitly in the schedules.

Can I combine the data you provide with other data?

Yes, of course.

What data security is required?

Condition #7 deals with this question. As cybercrime researchers, you are expected to be particularly vigilant and to use best available security practices and systems. You must promptly notify the Cybercrime Centre of any breach. Note we explicitly require the data be encrypted at rest -- if you leave your laptop on a bus, the Centre's data should not be exposed.

What privacy measures are required?

A high standard of compliance with privacy and data protection is also expected and required. Condition #8 requires you to comply with data protection law. As a precaution, condition #8.1 allows extra data protection terms to be introduced if your geographic location and local laws make this necessary.

So your datasets include personal data?

We go to significant effort to ensure that the vast majority of our data does not contain any personal data because that leads to all sorts of unnecessary legal complications for us and for researchers. So, for example, when a phishing URL originally contained an email address we will replace the specific email address with a dummy. This is usually good practice anyway since it removes specious duplicates. If we have overlooked any of this type of anonymisation then please let us know as soon as possible and we will address the issue.
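As an illustration of the kind of replacement described above, here is a minimal Python sketch. It is not the Centre's actual tooling: the dummy address `user@example.com`, the function names and the deliberately simple email pattern are all assumptions made for this example.

```python
import re

# Hypothetical placeholder; the dummy value the Centre actually uses is not stated here.
DUMMY_EMAIL = "user@example.com"

# Deliberately simple, illustrative pattern. It matches an email-like string
# whose "@" is written either literally or percent-encoded as "%40".
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._+-]+(?:@|%40)[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def redact_emails(url: str) -> str:
    """Replace every email-like substring in a URL with the dummy address."""
    return EMAIL_RE.sub(DUMMY_EMAIL, url)


def dedupe_redacted(urls):
    """Redact each URL, then drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(redact_emails(u) for u in urls))


if __name__ == "__main__":
    samples = [
        "http://phish.example/login?email=alice@victim.org",
        "http://phish.example/login?email=bob%40victim.org",
    ]
    for url in dedupe_redacted(samples):
        print(url)
```

In this sketch the two sample URLs differ only in the recipient address, so after replacement they collapse into a single entry. A real pipeline would need a more careful pattern (for example, for addresses that are encoded in other ways).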

If, for some reason, a particular set of data contains personal data (perhaps we have appropriate consent) then this will be made clear in the description of the data and appropriate text will be added to the agreement that you sign.

What's the position as regards ethics?

The Centre collects data in an ethical manner that meets all the requirements of Cambridge's ethics regime. This ethics case allows us to share the data via our agreements. The ethics case does NOT give blanket permission for any and all research on the data -- even internally, each project that will use the data must go through the Cambridge ethics approval process. You will need to ensure that your project and your use of the data do not run counter to the ethics regime at your institution.

I want to publish some findings.

Great! Condition #6 sets out the requirements on publishing. These are important.

Publications must be open access. You need to acknowledge the Centre in all cases, plus any data supplier that has asked to be identified. The Centre also wants to see publications at the stage they go to peer review. If you blog or create other informal write-ups, we need to be told as soon as they are published. This will allow us to meet our obligations to data suppliers by ensuring the terms of all agreements are being met.

Can I include data extracts in publications?

Brief illustrative extracts are likely to be OK, but it is important to think very carefully about this. You should not provide information which would allow the reverse engineering of the original source of the data and you must ensure that you do not breach your obligations regarding the processing of personal data. Be careful about what you include if this might identify individuals or data sources. Members of the Cybercrime Centre are happy to advise further in any particular case.

What do you mean by open access publication?

Both gold (publisher-level) and green (self-archiving) open access are fine. The most important thing is that non-academics, including commercial data suppliers, need to be able to see all publications easily and without charge.

My conference or funder requires research on open data.

There will be an exception for datasets which cannot be published in this way. Other researchers can apply to the Centre to work on the same data and verify and build on your results. This is the whole motivation for the Centre's existence: to square the circle of being able to undertake high-quality, reproducible research on data that, by its nature, is not and should not be open data.

Can I publish a derived dataset?

No. However, we would be delighted to talk to you about incorporating this data back into the Cybercrime Centre's available datasets.

Will law enforcement people be interested in this data? If they request access, what should I do?

If law enforcement wishes to access data, they should go to data suppliers, not researchers. Condition #5.4 asks you to pass any such requests to the Cybercrime Centre immediately, and we will help.

I don't have £100,000 lying around.

Condition #9 is a standard limitation of liability clause. This needs to contain a money cap, so that there is no risk of your liability being much higher. The sum of £100,000 is a standard ceiling. It is extremely unlikely that any financial liability will ever be imposed. This sort of money would only be charged if there was a gross breach of the agreement leading to clear damage and a decision on the matter by a court.

I don't like clause X. Can I change it?

We are most reluctant to change this document, because our agreements with data suppliers have been made on the basis that this agreement, or something like it, will be signed by all recipients of data. Nevertheless, talk to us about your concerns and we will find a solution.

I'm actually interested in your "extremist" datasets, not the cybercrime ones.

Everything above applies to these datasets as well -- and there are some minor changes to the wording of the agreements to remove explicit references to cybercrime.

I have another question.

No problem, write to < datarequest AT > and we will do our best to answer it.

Formal "outgoing" agreement:

Guide for data suppliers: https://www.cambridgecybercrime/inGuide.html

Legal overview:

Process for working with our data: