
Empirical Bayesian models for analysing molecular serotyping microarrays

Newton, R; Hinds, J; Wernisch, L (2011) Empirical Bayesian models for analysing molecular serotyping microarrays. BMC Bioinformatics, 12 (88). ISSN 1471-2105
SGUL Authors: Hinds, Jason



Background: Microarrays offer great potential as a platform for molecular diagnostics, testing clinical samples for the presence of numerous biomarkers in highly multiplexed assays. In this study, applied to infectious diseases, we used data from a microarray designed for molecular serotyping of Streptococcus pneumoniae, which identifies the presence of any one of 91 known pneumococcal serotypes from DNA extracts. This microarray incorporated oligonucleotide probes for all known capsular polysaccharide synthesis genes and required a statistical analysis of the microarray intensity data to determine which serotype, or combination of serotypes, was present within a sample based on the combination of genes detected.

Results: We propose an empirical Bayesian model for calculating the probabilities of combinations of serotypes from the microarray data. The model takes into consideration the dependencies between serotypes, induced by genes they have in common and by homologous genes which, although not identical, are similar to each other in sequence. For serotypes which are very similar in capsular gene composition, extra probes are included on the microarray, providing additional information which is integrated into the Bayesian model. For each serotype combination with high probability, a second model, a Bayesian random effects model, is applied to determine the relative abundance of each serotype.

Conclusions: To assess the accuracy of the proposed analysis, we applied our methods to experimental data from samples containing individual serotypes and samples containing combinations of serotypes with known levels of abundance. All but two of the known serotypes of S. pneumoniae that were tested as individual samples could be uniquely determined by the Bayesian model. The model also enabled the presence of combinations of serotypes within samples to be determined. Serotypes with very low abundance within a combination of serotypes can be detected (down to 2% abundance in this study). As well as detecting the presence of serotype combinations, an approximate measure of the percentage abundance of the serotypes within the combination can be obtained.
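
As a rough illustration of the idea in the Results paragraph, the toy sketch below scores serotype combinations by Bayes' theorem from a set of detected genes. It is not the authors' model: the serotype labels, gene names, detection probabilities (p_hit, p_false) and the uniform prior are all hypothetical, and the published method works on probe intensity data rather than on simple present/absent calls. It only shows how genes shared between serotypes make the probabilities of combinations depend on one another.

```python
from itertools import combinations
import math

# Hypothetical gene content per serotype (illustrative, NOT real pneumococcal data).
GENES = ["g1", "g2", "g3", "g4"]
SEROTYPE_GENES = {
    "A": {"g1", "g2"},
    "B": {"g2", "g3"},  # shares g2 with A and g3 with C
    "C": {"g3", "g4"},
}

def log_likelihood(detected, combo, p_hit=0.95, p_false=0.05):
    """Log-probability of the detected gene set given a serotype combination.

    A gene is expected if any serotype in the combination carries it.
    p_hit and p_false are assumed detection and false-positive rates.
    """
    expected = set().union(*(SEROTYPE_GENES[s] for s in combo))
    ll = 0.0
    for gene in GENES:
        p = p_hit if gene in expected else p_false
        ll += math.log(p if gene in detected else 1.0 - p)
    return ll

def posterior(detected, max_size=2):
    """Posterior over all combinations of up to max_size serotypes,
    under a uniform prior (an assumption made for this sketch)."""
    combos = [c for k in range(1, max_size + 1)
              for c in combinations(sorted(SEROTYPE_GENES), k)]
    logs = {c: log_likelihood(detected, c) for c in combos}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(v - top) for c, v in logs.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

if __name__ == "__main__":
    post = posterior({"g1", "g2", "g3"})
    for combo, prob in sorted(post.items(), key=lambda kv: -kv[1]):
        print("+".join(combo), round(prob, 3))
```

With genes g1, g2 and g3 detected, the combination A+B receives most of the posterior mass, because it is the only candidate that explains all three detected genes without also predicting the undetected g4.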

Item Type: Article
Additional Information: Copyright: 2011 Newton et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Bacterial Typing Techniques, Bayes Theorem, Oligonucleotide Array Sequence Analysis, Probability, Reproducibility of Results, Serotyping, Streptococcus pneumoniae, Science & Technology, Life Sciences & Biomedicine, Biochemical Research Methods, Biotechnology & Applied Microbiology, Mathematical & Computational Biology, Biochemistry & Molecular Biology, Capsular Biosynthetic Loci, Bioinformatics, 06 Biological Sciences, 08 Information And Computing Sciences, 01 Mathematical Sciences
SGUL Research Institute / Research Centre: Academic Structure > Infection and Immunity Research Institute (INII)
Journal or Publication Title: BMC Bioinformatics
ISSN: 1471-2105
Published: 31 March 2011
Web of Science ID: WOS:000289458300001
