I need to scrape data from Wikipedia for a master's thesis.
Here is the problem statement:
Zero-shot learning refers to predicting classes for which a classifier saw no samples during training. Typically, this is achieved by embedding both samples and classes and assigning samples to classes in that embedding space. Some research already exists in the image domain, but less in the text domain. Many documents also have a network structure (e.g. citations). However, the availability of evaluation datasets with both modalities (network + text) is limited. Knowledge graphs such as DBpedia offer the potential to create such datasets because they often contain textual descriptions of nodes.
The aim of the work is to extract a dataset from DBpedia that allows zero-shot approaches covering both modalities to be evaluated. Evaluation should be possible at both small and large scale, and the work should provide the basis for the dataset.
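For the extraction itself, a possible starting point is DBpedia's public SPARQL endpoint, which exposes both modalities: `dbo:abstract` for the text and `dbo:wikiPageWikiLink` for the link network. A minimal sketch (the choice of category, predicates, and result shape is my assumption, not part of the task spec):

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia endpoint (assumption: reachable and rate limits respected).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_query(category: str, limit: int = 100) -> str:
    """SPARQL query for entities in a Wikipedia category, with their
    English abstract (text modality) and outgoing wiki links (network
    modality)."""
    return f"""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?entity ?abstract ?link WHERE {{
      ?entity dct:subject <http://dbpedia.org/resource/Category:{category}> ;
              dbo:abstract ?abstract ;
              dbo:wikiPageWikiLink ?link .
      FILTER (lang(?abstract) = "en")
    }} LIMIT {limit}
    """


def fetch(query: str) -> dict:
    """Run the query against the endpoint; requires network access."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{DBPEDIA_ENDPOINT}?{params}") as resp:
        return json.load(resp)


def parse_bindings(result_json: dict) -> list[tuple[str, str, str]]:
    """Flatten the standard SPARQL JSON results format into
    (entity, abstract, link) rows."""
    return [
        (b["entity"]["value"], b["abstract"]["value"], b["link"]["value"])
        for b in result_json["results"]["bindings"]
    ]
```

For a real extraction this would need paging (OFFSET), retries, and deduplication of the (entity, abstract) pairs, since one entity yields one row per link.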
The main part is the dataset extraction from Wikipedia, together with a log of all problems and procedures, as well as the source code.
Only the dataset extraction and the accompanying log are needed. The introduction etc. I will of course write myself.