|Availability||This resource is freely available; you should be able to download it now.|
Encoding format: TEI Lite P5 XML; GATE XML; GATE column format; plain text
|Creation Date||The corpus was constructed between 2008 and 2011.|
Expanded and revised version of http://ota.ox.ac.uk/id/2537
Various: see documentation in the download package.
Following the model of the ARCHER corpus and given the aim of representativeness, the GerManC corpus consists of text samples of about 2000 words from eight genres: drama, newspapers, sermons and personal letters (to represent orally oriented registers) and narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts (to represent more print-oriented registers). In order to facilitate tracing historical developments, the whole period was divided into fifty-year sections (in this case 1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each of these sub-periods.
The complete corpus thus consists of 360 samples, comprising approximately 800,000 words. Appendix 1 in the download package contains a list of the files in the corpus with full documentation in an Excel spreadsheet.
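As a rough check on the design figures quoted above, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only, and the per-cell count and mean sample length are derived here from the stated totals rather than taken from the corpus documentation, which may describe further sampling dimensions not mentioned in this record.

```python
# Design figures as stated in the description above.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
SUB_PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360
APPROX_TOTAL_WORDS = 800_000  # approximate figure, as quoted

# One "cell" is a genre within a sub-period.
cells = len(GENRES) * len(SUB_PERIODS)

# An equal number of texts per genre per sub-period implies an even split.
samples_per_cell, remainder = divmod(TOTAL_SAMPLES, cells)

# Derived estimate (an assumption of this sketch, not a documented value).
mean_words_per_sample = APPROX_TOTAL_WORDS / TOTAL_SAMPLES

print(f"genre/sub-period cells: {cells}")
print(f"samples per cell: {samples_per_cell} (remainder {remainder})")
print(f"implied mean sample length: {mean_words_per_sample:.0f} words")
```

The implied mean is somewhat above the nominal 2000 words, which is consistent with both figures being approximate.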