There are various ways to approach a project. You can develop a novel software tool, create novel algorithms, evaluate existing algorithms, and much more. In the following, we explain which types of research projects exist.
A research project is what researchers normally do. It aims to do something truly novel. In computer science, this ‘something’ is typically a novel algorithm or a groundbreaking extension to an existing algorithm. Imagine you were the person who originally proposed algorithms or techniques such as Perceptrons or Support Vector Machines, or extensions to them such as the kernel trick. Something like this should be your ideal goal. Obviously, it is very unlikely that you will achieve that goal. But always keep in mind that you want to create something novel.
A typical research project can be very simple. For instance, some years ago, we wanted to extract titles from PDF files. We realized that all existing software libraries for title extraction were relatively difficult to use. They applied machine learning and had to be trained on expensive training data (we would have had to spend weeks annotating PDFs and titles), and they were slow and not very accurate. So, within a few days, we wrote our own algorithm that simply extracted the largest text from the upper part of a PDF’s first page. We took that extracted text as the title. This super-simple algorithm did not need any training data, was 10% more accurate than the machine learning algorithms, and was 4 times as fast. A project like this can be a perfect research paper.
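To illustrate how little code such a heuristic needs, here is a minimal sketch of the "largest text in the upper part of the first page" idea. It is not our original implementation. It assumes that some PDF library has already turned the first page into text spans with a font size and a vertical position. The `TextSpan` class, the `upper_fraction` and `tolerance` parameters, and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextSpan:
    """One piece of text from the first page (hypothetical structure)."""
    text: str
    font_size: float  # in points
    top: float        # distance from the top edge of the page, in points


def extract_title(spans: List[TextSpan], page_height: float,
                  upper_fraction: float = 0.4,
                  tolerance: float = 0.1) -> Optional[str]:
    """Return the largest text found in the upper part of the page."""
    # Only consider non-empty spans in the upper part of the page.
    cutoff = page_height * upper_fraction
    candidates = [s for s in spans if s.top <= cutoff and s.text.strip()]
    if not candidates:
        return None
    # A title may span several lines, so keep every span whose font size
    # is (almost) equal to the largest one, in top-to-bottom order.
    largest = max(s.font_size for s in candidates)
    title_spans = [s for s in candidates if largest - s.font_size <= tolerance]
    title_spans.sort(key=lambda s: s.top)
    return " ".join(s.text.strip() for s in title_spans)


# Made-up spans for a first page that is 792 points high.
spans = [
    TextSpan("Conference Proceedings 2010", 9.0, 20.0),
    TextSpan("A Simple Approach to", 18.0, 80.0),
    TextSpan("Title Extraction", 18.0, 102.0),
    TextSpan("Jane Doe", 11.0, 130.0),
    TextSpan("BIG FOOTER", 30.0, 700.0),  # large, but outside the upper part
]
print(extract_title(spans, page_height=792.0))
# prints: A Simple Approach to Title Extraction
```

The sketch needs no training data, which is the point of the example above: whether such a rule is also more accurate than a learned model is something the evaluation has to show.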
Some examples of relatively simple, yet good, research projects are the following publications, which all resulted from Bachelor’s/Master’s theses:
What makes them ‘good’ is that they answer a question that has not been answered before (paper 3), or they propose a novel algorithm/concept (papers 1 and 2). In either case, they provide evidence that their answer is true, or their novel algorithm is better than the state of the art.
Build a real Intelligent Hardware System
We have a few project ideas for creating something ‘real’. We will provide you with all the tools and resources needed to realize the projects. We are also very open to hearing about your own crazy ideas.
Resource Projects (Datasets or Software)
A resource project creates a novel software tool, dataset, or benchmark. While it is generally relatively easy to write a piece of software or create some dataset, the requirements for a resource project are high. The novel resource normally must be really useful and impactful, and your thesis must provide evidence for that. For instance, if you introduce a novel software tool, you must provide comprehensive comparisons with other software tools.
In “applied research” you aim at improving one field of application, e.g. movie recommendation, lung-cancer prediction, face recognition, or stock market prediction. Typically in such a project, you ‘throw’ a large number of existing algorithms at the novel scenario and see which algorithms perform best. For a Studienarbeit, Bachelor’s or Master’s thesis, such a project is fine. However, from a scientific point of view, such projects are normally considered second-class. They often involve a lot of trial and error and fewer theoretically grounded ideas. To illustrate the point: Imagine a person A) who proposed the idea of Support Vector Machines and evaluated the first SVM on a dataset of handwritten digits, and a person B) who later proposed to apply SVMs to classifying images of cats and dogs. Person A) clearly made a much more significant contribution to the world. Nevertheless, applied research papers can make valuable contributions to the field (we have published many applied research papers ourselves).
Reproducibility projects are about confirming or disconfirming the results of someone else's work. There are three different scenarios.
Some authors of a novel algorithm have not released their source code and/or data
It often happens that authors publish a research article about a novel algorithm. But they do not release their source code or data. Consequently, it is difficult or even impossible for others to really use the novel algorithm. Especially large IT companies do this. They publish a paper and showcase how great their technology is, but they do not publish the source code so that others (competitors) cannot use their work. They also often refuse to publish their data due to privacy concerns.
If the authors have published neither their code nor their data, your task would be to a) implement the algorithm yourself based on the description in the authors' paper and b) evaluate your re-implementation on some datasets. Be warned: this type of work can be very time-consuming, and there is a high risk that you will fail to implement the algorithm and an even higher risk that the algorithm will not perform as expected. Nevertheless, if you succeed, and if you have picked a promising algorithm, you may benefit from that kind of work. For instance, one of our previous Bachelor’s students implemented the Neural Turing Machine algorithm (NTM). The NTM algorithm was originally proposed by Google, but Google did not release the source code. Dozens of researchers had tried to re-implement the algorithm but failed. Our Bachelor’s student succeeded, received a best-paper award for his work, and joined the Google AI residency program.
If the authors have released their source code, but not the data, your task is simpler. You run their code, and the baselines the authors used, on some other datasets to see if you obtain similar results. For instance, if the original authors claim that their novel algorithm N is better than baselines b1 and b2 on their private data D, then you check whether that claim also holds on other datasets.
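The check described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not a complete evaluation protocol: the names `evaluate` and `claim_holds`, the toy algorithms standing in for N, b1, and b2, and the two tiny datasets are all made up. In a real study you would plug in the authors' released code, their original baselines, and public datasets, and you would also test for statistical significance.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(algorithms, datasets, metric=accuracy):
    """Run every algorithm on every dataset; return dataset -> name -> score."""
    results = {}
    for ds_name, (train, test_x, test_y) in datasets.items():
        results[ds_name] = {
            name: metric(test_y, algo(train, test_x))
            for name, algo in algorithms.items()
        }
    return results


def claim_holds(results, novel, baselines):
    """For each dataset: does the novel algorithm beat all baselines?"""
    return {
        ds: all(scores[novel] > scores[b] for b in baselines)
        for ds, scores in results.items()
    }


# Toy stand-ins (assumptions): train is a list of (x, label) pairs.
def nearest_mean(train, test_x):  # plays the role of "N"
    sums = {}
    for x, y in train:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    means = {y: total / count for y, (total, count) in sums.items()}
    return [min(means, key=lambda y: abs(x - means[y])) for x in test_x]


def majority_label(train, test_x):  # plays the role of "b1"
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return [label for _ in test_x]


def first_label(train, test_x):  # plays the role of "b2"
    return [train[0][1] for _ in test_x]


algorithms = {"N": nearest_mean, "b1": majority_label, "b2": first_label}
datasets = {  # name -> (train, test inputs, test labels), all made up
    "D1": ([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (1.5, 0)],
           [1.2, 8.5, 2.2, 9.1], [0, 1, 0, 1]),
    "D2": ([(1.0, 1), (2.0, 0), (3.0, 0), (4.0, 0), (9.0, 1)],
           [1.0, 2.5, 3.5, 4.5], [1, 0, 0, 0]),
}

results = evaluate(algorithms, datasets)
print(claim_holds(results, "N", ["b1", "b2"]))
# prints: {'D1': True, 'D2': False}
```

In this toy run the claim "N is better than b1 and b2" holds on D1 but not on D2, which is exactly the kind of finding a reproducibility study would report.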
The authors of the novel algorithm have released their source code and data
In this case, you run the novel algorithm and the original baselines on the same data as the authors, plus on some additional datasets. Your goal is to find out if the novel algorithm is really as promising as claimed by the original authors.
You suspect that a large number of algorithms in a particular domain are not reproducible
In a comparative study, you compare a large number of algorithms or tools to identify the best one or the advantages and disadvantages of each. This type of work can be similar to a reproducibility study, but the framing/writing of the paper is different. Also, a comparative study typically targets well-established software tools (e.g. machine learning tools like Weka, scikit-learn, H2O, TensorFlow, …). A reproducibility study instead targets novel algorithms that are not yet integrated into software libraries like TensorFlow.
The examples above are relatively brief, but you may make such studies very comprehensive and even publish them in journals.
Industry or Application Projects
An industry or application paper could be a case study or a description of how a software tool performs in a certain environment/company. It could also be a paper in which you apply the well-known algorithm X to the novel scenario Z. Not many conferences accept industry or application papers. However, depending on the framing, you could probably ‘sell’ this kind of work as a research paper or comparative study, too.
Once you are familiar with a research field, you may want to initiate discussions or change current best practices. For this purpose, a position paper is suitable. In a position paper, you can either just write down your opinion on a certain topic, or provide some evidence that backs up your claims.
A literature survey summarizes a large amount of literature relating to one particular topic. Every type of research project, be it a research article, a resource paper, or a comparative study, must contain some literature survey, summarized in the related work section. As a Bachelor’s student, your entire thesis may be a literature survey. However, your survey must be really comprehensive, and it is not easy to do this well! We would only agree to supervise a literature survey as a thesis project if you are already very familiar with that research field. Furthermore, you must be an excellent writer and very good at organizing and structuring.