This corpus contains hitherto unedited medical texts produced in the period 1700-1900. The research team aims to build a corpus that is balanced across the taxonomy of early English medical writing, i.e. theoretical treatises, surgical treatises and remedies.

The present version of the corpus holds over 2.5 million words, of which 1.5 million belong to the 18th century and the other million to the 19th.

The corpus has been compiled in three stages, each corresponding to a different version of the corpus. Each version may be used for different research purposes:

Corpus Files

  1. Plain text corpus (.txt): these files hold the transcribed version of the text as found in the original sources, where spelling and word division have been preserved.

A Medicine for the Falling sickness.

Take a penny weight of the powder of Gold, six penny weight of Pearl, six penny weight of Am-ber, six penny weight of Corral, eight grains of Bezar, half an ounce of Piony seeds; also you must  put some powder of a dead mans skull, that hath been an Anatomy, for a Woman, and the powder of a Woman for a Man, compound all these toge-ther; and take as much of the powder of all these  as will lie upon a two-pence for nine mornings to-gether in Endive-water, and drink a good draught of Endive water after it.


  2. Normalised corpus (.norm): these files contain the normalised transcriptions of the corpus material. Normalisation has been carried out with VARD, which standardises variant forms to Present-Day English and inserts an XML tag so that the original word can still be consulted.

A Medicine for the Falling sickness.

Take a penny weight of the powder of Gold, six penny weight of Pearl, six penny weight of Amber, six penny weight of Corral, eight grains of Bezoar, half an ounce of Peony seeds; also you must put some powder of a dead man’s skull, that has been an Anatomy, for a Woman, and the powder of a Woman for a Man, compound all these together; and take as much of the powder of all these as will lie upon a two-pence for nine mornings together in Endive water, and drink a good draught of Endive water after it.
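
Because the normalised files keep each original form inside an XML tag, both readings can in principle be recovered from a single file. The sketch below illustrates the idea in Python. The tag layout it assumes, `<normalised orig="..." auto="...">...</normalised>`, is an assumption about the VARD output and is not shown in the description above, so it should be checked against an actual .norm file. The sample string is constructed for illustration from two variants in the recipe (Am-ber, Bezar), and the function names are hypothetical.

```python
import re

# ASSUMPTION: tag layout of the VARD output; verify against a real .norm file.
NORMALISED = re.compile(r'<normalised\s+orig="([^"]*)"[^>]*>(.*?)</normalised>')


def restore_original(text):
    """Replace every tagged word with the original spelling stored in `orig`."""
    return NORMALISED.sub(lambda m: m.group(1), text)


def strip_tags(text):
    """Keep only the normalised spelling and drop the XML tags."""
    return NORMALISED.sub(lambda m: m.group(2), text)


def variant_pairs(text):
    """List (original, normalised) pairs found in the text."""
    return NORMALISED.findall(text)


# Illustrative sample, not copied from a corpus file.
sample = (
    'six penny weight of <normalised orig="Am-ber" auto="true">Amber</normalised>, '
    'eight grains of <normalised orig="Bezar" auto="true">Bezoar</normalised>'
)

original_reading = restore_original(sample)
normalised_reading = strip_tags(sample)
pairs_found = variant_pairs(sample)
```

Listing the (original, normalised) pairs in this way is one possible starting point for a study of spelling variation, since it collects the variant forms without consulting the .txt files separately.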


  3. POS-tagged corpus (.pos): these files contain the POS-tagged version of the corpus. Tagging has been carried out with CLAWS, which assigns a morpho-syntactic tag to each word in the corpus, punctuation marks included. The C7 tagset has been employed.

A_AT1 Medicine_NN1 for_IF the_AT Falling_JJ sickness_NN1 ._.

Take_VV0 a_AT1 penny_NNU1 weight_NN1 of_IO the_AT powder_NN1 of_IO Gold_NN1 ,_, six_MC penny_NNU1 weight_NN1 of_IO Pearl_NN1 ,_, six_MC penny_NNU1 weight_NN1 of_IO Amber_NN1 ,_, six_MC penny_NNU1 weight_NN1 of_IO Corral_NN1 ,_, eight_MC grains_NN2 of_IO Bezoar_NN1 ,_, half_DB an_AT1 ounce_NNU1 of_IO Peony_NN1 seeds_NN2 ;_; also_RR you_PPY must_VM put_VVI some_DD powder_NN1 of_IO a_AT1 dead_JJ man_NN1 ’s_GE skull_NN1 ,_, that_DD1 has_VHZ been_VBN an_AT1 Anatomy_NN1 ,_, for_IF a_AT1 Woman_NN1 ,_, and_CC the_AT powder_NN1 of_IO a_AT1 Woman_NN1 for_IF a_AT1 Man_NN1 ,_, compound_VV0 all_DB these_DD2 together_RL ;_; and_CC take_VV0 as_RG much_DA1 of_IO the_AT powder_NN1 of_IO all_DB these_DD2 as_CSA will_VM lie_VVI upon_II a_AT1 two-pence_NN for_IF nine_MC mornings_NNT2 together_RL in_II Endive_NN1 water_NN1 ,_, and_CC drink_VV0 a_AT1 good_JJ draught_NN1 of_IO Endive_NN1 water_NN1 after_CS it_PPH1 ._.
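
The sample above follows a `word_TAG` layout, with tokens separated by spaces and punctuation tagged with itself (e.g. `._.`). A minimal Python sketch for reading that layout is given below. It assumes that every token in a .pos file follows this pattern and that the tag is whatever comes after the last underscore. The function name `parse_pos_line` is hypothetical and is not part of any corpus tooling.

```python
from collections import Counter


def parse_pos_line(line):
    """Split a line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST underscore, so punctuation tokens
        # such as "._." yield (".", ".") and a word containing an
        # underscore would keep it.
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # skip anything that carries no tag
        pairs.append((word, tag))
    return pairs


# First line of the sample recipe as it appears in the .pos version.
sample = "A_AT1 Medicine_NN1 for_IF the_AT Falling_JJ sickness_NN1 ._."

pairs = parse_pos_line(sample)
tag_counts = Counter(tag for _, tag in pairs)

print(pairs)
print(tag_counts.most_common(3))
```

Counting tags in this way gives a quick profile of a text, for example the proportion of singular common nouns (NN1) or measurement units (NNU1) in the remedies.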


  4. CQPweb format: the corpus can also be queried online through CQPweb at https://latemodernmss.uma.es/cqpweb

How to cite

Calle-Martín, Javier, Miriam Criado-Peña, Verónica Hernández, Sinéad Linehan-Gómez and Juan Lorente-Sánchez. 2016. The Málaga Corpus of Late Modern English Scientific Prose (MCLModESP). Málaga: University of Málaga. Available from https://latemodernmss.uma.es.