GerManC
| Title | GerManC |
| Author | Martin Durrell; Paul Bennett; Silke Scheible; Richard J. Whitt |
| Availability | This resource is freely available for download. |
| Languages | German |
| Editorial Practice | Encoding format: TEI Lite P5 XML; GATE XML; GATE column format; plain text |
| OTA keywords | Linguistic corpora; Corpus |
| LC keywords | |
| Extent | |
| Creation Date | The corpus was constructed between 2008 and 2011. |
| Source Description | Expanded and revised version of http://ota.ox.ac.uk/id/2537. Various sources: see documentation in the download package. Following the model of the ARCHER corpus and given the aim of representativeness, the GerManC corpus consists of text samples of about 2000 words from eight genres: drama, newspapers, sermons and personal letters (to represent orally oriented registers) and narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts (to represent more print-oriented registers). In order to facilitate tracing historical developments, the whole period was divided into fifty-year sections (in this case 1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each of these sub-periods. The complete corpus thus consists of 360 samples, comprising approximately 800,000 words. Appendix 1 in the download package contains a list of the files in the corpus with full documentation in an Excel spreadsheet. |
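
The sampling design described above can be checked with a little arithmetic. The following is a minimal sketch, not part of the corpus documentation: the variable names are illustrative, and the per-cell sample count is derived from the figures in this record rather than stated in it.

```python
# Sampling design as described in the Source Description.
# All names here are illustrative assumptions, not part of the corpus tooling.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360
NOMINAL_SAMPLE_WORDS = 2000  # the record says "about 2000 words" per sample

# Each genre/period combination is one cell of the design.
cells = len(GENRES) * len(PERIODS)

# An equal number of texts per genre per sub-period implies an even split.
samples_per_cell, remainder = divmod(TOTAL_SAMPLES, cells)

# Word count if every sample were exactly the nominal length.
nominal_total_words = TOTAL_SAMPLES * NOMINAL_SAMPLE_WORDS

print(cells, samples_per_cell, remainder, nominal_total_words)
```

Under these figures the design has 24 genre/period cells with 15 samples each. The nominal total of 720,000 words is somewhat below the stated total of approximately 800,000, which suggests that samples run slightly over 2000 words on average.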
