CD Maps – dynamic profiling of CD1 to CD100 surface expression on human leukocyte and lymphocyte subsets

Tomas Kalina,¹* Karel Fišer,¹* Martin Pérez-Andrés,² Daniela Kužílková,¹ Marta Cuenca,³ Sophinus J.W. Bartol,⁴ Elena Blanco,² Pablo Engel,³ Menno C. van Zelm,⁴,⁵ on behalf of the Human Cell Differentiation Molecules (HCDM) organization

¹ CLIP - Childhood Leukaemia Investigation Prague, Department of Paediatric Haematology and Oncology, Charles University, Prague, Czech Republic and University Hospital Motol, Prague, Czech Republic
² Department of Medicine, Cancer Research Centre (IBMCC, USAL-CSIC), Cytometry Service (NUCLEUS), University of Salamanca (USAL), Institute of Biomedical Research of Salamanca (IBSAL), Salamanca, Spain and Biomedical Research Networking Centre Consortium of Oncology (CIBERONC) Instituto de salud Carlos III, Madrid, Spain
³ Department of Biomedical Sciences, University of Barcelona, Barcelona, Spain
⁴ Department of Immunology, Erasmus MC, University Medical Center, Rotterdam, the Netherlands
⁵ Department of Immunology and Pathology, Monash University and The Alfred Hospital, Melbourne, VIC, Australia

* equal contribution

Correspondence:
A/Prof Menno C. van Zelm
Department of Immunology and Pathology
Monash University and Alfred Hospital
89 Commercial road
Melbourne VIC 3004
E: menno.vanzelm@monash.edu

Abstract

CD molecules are surface molecules expressed on cells of the immune system that play key roles in immune cell-cell communication and sensing the microenvironment. These molecules are essential markers for the identification and isolation of leukocytes and lymphocyte subsets.

Here, we present the results of the first phase of the CD Maps study, mapping the expression of CD1-100 (n=110) on 47 immune cell subsets from blood, thymus and tonsil using an 8-color standardized EuroFlow approach and quantification of expression.

The resulting dataset included median antibody binding capacities (ABC) and percentage of positivity for all markers on all subsets and was developed into an interactive CD Maps web resource. Using the resource, we examined differentially expressed proteins between granulocyte, monocyte and dendritic cell subsets, and profiled dynamic expression of markers during thymocyte differentiation, T-cell maturation, and between functionally distinct B-cell subset clusters.

The CD Maps resource will serve as a benchmark of antibody reactivities, ensuring improved reproducibility of flow cytometry-based research. Moreover, it will provide a full picture of the surfaceome of human immune cells and serve as a useful platform to increase our understanding of leukocyte biology, as well as to facilitate the identification of new biomarkers and therapeutic targets.

List of Figures and Tables

Figure 1. shiny_overview
Figure 2. “Bubble Figure” = big_picture
Figure 3. single_pops_and_fluorescence_cv A - Candidate for Supp 5 B substitute, B - inter-population, C - intra-population, D-F PEpos_single_pops
Suppl Figure 6. Supplementary Figure MedQb_all_individual
Suppl Table 1. Supplementary Table MedQb_all_individual
Suppl Figure 7. Supplementary Figure PEpos_distribution_all.
Suppl Table 2. Supplementary Table PEpos_distribution_all, below figure. List of CD markers based on modelling, not included, but I would like to have this. If included, then a similar table should be done for supp_all_MedQb.html.
Figure 4. pheatmap
Suppl Figure 8. Supplementary Figure hca_big, clusters of samples AND CDs
Figure 5. Granulocyte, monocyte and dendritic cell analysis. MedQb_distribution_pairwise_menno_fig A - Monocytes, B - Dendritic, C + D - Granulocytes
Figure 6. T-cell maturation CD8Tsubsets_plot
TBD: Suppl Figure 9. …
Figure 7. Thymocyte differentiation: thymus_to_blood_plot
Figure 8. Antigen-dependent B-cell maturation in tonsils: martin_maturation_cd_groups_plot

Methods

Suppl Table 1: Table tab_panels
Table tab_source_per_panel. Will NOT be used.
Suppl Table 2: CD1-CD100 marker (do not use following (but check naming): Supplementary Table 100 CD markers)
Suppl Figure 1 - 4: Supplementary Figure gating_strategy
Suppl Table 3: Subsets: Supplementary Table cell_subsets
Table tab_variables. Will NOT be used.
Suppl Figure 5 A: scheme.
Suppl Table 4: Supplementary Table supp_tab_pckgs_used

Introduction

Leukocytes display molecules on their surface that are crucial for sensing hazardous environmental changes and for mediating cell adhesion and communication, both within the immune system and with stroma. These include receptors, transporters, channels, cell-adhesion proteins and enzymes. The complexity of surface-expressed proteins, also called the surfaceome, is emphasized by the fact that an estimated 26% of human genes encode transmembrane proteins (~5500).1 However, recent in silico evaluations predict that 2886 proteins are actually expressed at the outer cell membrane, i.e. the cell surface.2 Experimental evidence exists for ~1492 proteins across multiple tissues,3 and for 1015 proteins that are expressed in one or more immune cell types and lymphoid tissues.4

Over the past four decades, a vast array of cell surface molecules has been discovered through the production of monoclonal antibodies (mAbs).5 These mAbs, together with the development of multicolor flow cytometric analysis,6 have been instrumental in determining their expression and function. Human Leukocyte Differentiation Antigen (HLDA) Workshops have led to the characterization and formal designation of more than 400 surface molecules,7, 8 known as CD molecules. CD nomenclature provides a unified designation system for mAbs, as well as for the cell surface molecules that they recognize. These molecules include receptors, adhesion molecules, membrane-bound enzymes and glycans that play multiple roles in leukocyte development, activation and differentiation. CD molecules are routinely used as cell markers, allowing the identification of the presence and proportions of specific leukocyte populations and lymphocyte subsets, and their isolation, using combinations of fluorochrome-labeled antibodies and flow cytometry. Importantly, analysis of CD molecules, known as immunophenotyping, is a fundamental component of the diagnosis, classification and follow-up of hematological malignancies and immune deficiencies, and of the monitoring of immune system disorders such as autoimmune diseases. More recently, mAbs recognizing CD molecules have been established as invaluable tools for the treatment of cancer, such as checkpoint inhibitors,9 and autoimmune diseases.10 Development and testing of such therapeutics relies on accurate knowledge of the expression and function of the target molecule, as was negatively illustrated by the disaster in the Phase I TGN1412 study with an anti-CD28 superagonist.11

Currently, there are extensive gaps in our knowledge of CD molecule expression patterns, mainly because of discordance in the setup of expression studies and the major changes in flow cytometry technology over the last 30 years.12 As a result, summarizing tables have been overinterpreted and can be misleading, and an important part of the literature is incorrect. Thus, there is an urgent need to construct an accurate, higher-resolution map of the expression profiles of CD molecules to visualize the leukocyte surface landscape.

To correct current misinterpretation and to overcome gaps in knowledge, the HCDM has initiated the CD Maps project, a multi-institute research program to generate a high-resolution map of the cell surface of human immune cells using standardized multicolor flow cytometry protocols. Here, we present the results of the first phase of the CD Maps study, which includes the expression signature of CD1-CD100 on 47 cell populations and subsets, 41 of which were non-overlapping. The data have been acquired across 4 expert flow cytometry laboratories to ensure reproducibility, and have been built into an online web resource with free user access. Expression profiling of CD markers across immune cell subsets revealed dynamic changes in expression levels, and hinted at further immune cell diversity for markers that were expressed on a fraction of defined populations.

Results

I. Generation of a web resource for expression profiling of CD1-CD100 on major immune cell lineages and their subsets

To investigate the expression levels on major leukocyte subsets of the first surface molecules, which were defined in the 1980s and early 1990s as CD markers 1-100,8, 13, 14 we developed a multi-color immune phenotyping panel consisting of 4 tubes. Seven detection channels were utilized for backbone markers to define cell subsets (Suppl Table 3), and one channel was reserved for a PE-labeled drop-in monoclonal antibody directed against one of the CD1-CD100 antigens (Suppl Table 4). The backbone markers in tubes A and B were directed against innate and adaptive immune cell subsets from blood, respectively, in tube C against B-cell subsets from tonsil tissue and in tube D against T-cell progenitors in thymus. Through stepwise gating strategies for all 4 tubes (Suppl Figures 1-4), a total of 47 cell types and subsets therein were defined (Suppl Table 5).

Following protocol optimization and standardization, 12 biological repeats for blood, tonsil and thymus tissue were subjected to expression analysis of 110 unique targets of CD1-CD100 across four laboratories. After curation (details in Methods), expression analysis was performed on 9 biological repeats for tube A, 11 for tube B, 7 for tube C and 5 for tube D. Multiple descriptors of CD marker expression were defined for each gated cell subset and exported (Supplementary Figure 5A). Most notable are the median fluorescence intensity, which was converted to antibody binding capacity (ABC) using the QuantiBRITE bead measurements, and the percentage of positive cells, using the FMO control value as cut-off.
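
The two descriptors above can be illustrated with a minimal sketch. It assumes that the bead calibration is a least-squares line on log10-transformed values, that one PE molecule corresponds to one bound antibody, and that the FMO cut-off is a single fluorescence threshold. The bead values, event intensities, cut-off and function names are hypothetical and are not taken from the study.

```python
import math
from statistics import median


def fit_bead_calibration(bead_mfi, bead_pe_molecules):
    """Least-squares line: log10(PE molecules) = slope * log10(MFI) + intercept."""
    xs = [math.log10(m) for m in bead_mfi]
    ys = [math.log10(p) for p in bead_pe_molecules]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mfi_to_abc(mfi, slope, intercept):
    """Convert a PE median fluorescence intensity to ABC (assumes 1 PE per antibody)."""
    return 10 ** (slope * math.log10(mfi) + intercept)


def percent_positive(events, fmo_cutoff):
    """Percentage of events with a PE signal above the FMO-derived cut-off."""
    if not events:
        return 0.0
    return 100.0 * sum(1 for e in events if e > fmo_cutoff) / len(events)


# Hypothetical bead populations (MFI and PE molecules per bead); real values are lot-specific.
bead_mfi = [150.0, 1500.0, 15000.0, 150000.0]
bead_pe = [500.0, 5000.0, 50000.0, 500000.0]
slope, intercept = fit_bead_calibration(bead_mfi, bead_pe)

# Hypothetical PE intensities of gated events and a hypothetical FMO cut-off.
stained = [120.0, 900.0, 3000.0, 4500.0, 6000.0]
fmo_cutoff = 1000.0

abc = mfi_to_abc(median(stained), slope, intercept)
pct = percent_positive(stained, fmo_cutoff)
print(f"median ABC = {abc:.0f}, positive = {pct:.1f}%")
```

How the cut-off is derived from the FMO control (for example, a fixed percentile of the FMO distribution) is left open here and passed in as a plain number.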

The resulting dataset consisted of over a million datapoints of derived statistics and annotation information that together form a quantitative insight into the cell surfaceome of the human immune system. To make the data accessible as a major resource for detailed studies by us and the scientific community, we constructed an interactive web-based application (Figure 1). The resource contains multiple features to visualize the complete dataset (e.g. Principal Component Analysis; PCA), and to examine specific cell lineages and/or subsets (e.g. significantly differentially expressed markers or patterns of expression during cell maturation).

Figure 1. Overview of the CD Maps web resource available at hcdm.org. A) The user interface allows for interactive data interrogations. The resource uses several scenarios for data exploration and allows for data download, bookmarking of analysis state and image export. Analysis examples: B) Paired analysis of CD marker expression on cell subsets, e.g. in a differentiation setting; C) Visualization of expression of multiple CD markers including a measure of variation for a single subset; D) Analysis and visualization of statistically differentially expressed CD markers between two subsets or two groups of cell subsets.

II. Overview of expression of CD1-CD100 on all cell subsets in the dataset

For a complete overview of the total data set, the combined information of CD marker expression levels and percentages of positive cells was depicted as a “drop plot” (Figure 2), in which colors represent the ABC and the dot sizes the percentage of positivity. The CD markers displayed a wide range of expression patterns. For example, CD44, CD45, CD46 and CD47 were highly expressed on nearly all cells within the majority of defined subsets, whereas CD49a, CD49b and CD49c were typically expressed at low levels. Importantly, all markers showed positivity for at least one subset, and the expression patterns of molecules such as CD3, CD4, CD8, CD14, CD19 and CD20 were in agreement with their designation as well-defined lineage markers (example in Suppl Figure 5B).

Figure 2. Expression map of CD1-CD100 on all 42 non-overlapping cell subsets. CD markers are ordered numerically along the vertical axis, with the FMO on the bottom row. The cell subsets are grouped (innate cells; thymocytes; T cells; B cells) and sorted within each lineage by maturity. The median expression level is visualized by color, and the median percentage of positive cells by the size of the dot. For cell type abbreviations, see Suppl Table 5.

III. CD marker expression on individual cell subsets and its variation across cell subsets

The nature of expression of the 110 cell surface markers was further addressed through examination of the heterogeneity between subsets and between donors. First, we assessed the relative intensity of expression of all CD markers in all defined cell subsets (Suppl Fig 6). The most highly expressed markers (e.g. CD45 on naive CD4 T cells; Figure 3A) reached 10⁵ ABC units, whereas gradually lower expression levels were found for other markers (e.g. CD3 and CD27 at 10⁴, and CD31 and CD49f at 10³). CD markers expressed at about 10² ABC units were down to background (FMO) levels. To study variation of expression across different subsets, box whisker plots were generated per CD marker depicting the median expression level and variation of all 42 non-overlapping cell subsets (Figure 3B). Importantly, ubiquitously expressed molecules on immune cells such as CD44, CD45, CD46, CD47, CD50, CD98 and CD99 have a low coefficient of variation (CV) across the studied subsets, as do some molecules with overall low expression levels (e.g. CD49c). In contrast, as expected, markers with lineage- and/or subset-specific expression patterns show a greater degree of heterogeneity in expression over the examined subsets (e.g. CD19, CD24, CD35).

Subsequently, the CV across donors was calculated per cell subset for each marker. These CVs were displayed as box whisker plots combining the CVs per subset for each marker to visualize inter-donor variation within populations (Figure 3C). In general, the highly expressed markers were found to have relatively low inter-donor variability, while the CVs were higher for CD markers that were expressed at low levels (Figure 3C). Indeed, some of the markers with small boxes in Figure 3B (CD44, CD45, CD46, CD47, CD98, CD99) were highly expressed and showed a relatively low CV. Still, some CD markers had a higher variability of expression in all cell subsets (CD15, CD36, CD66b), and some CD markers with higher ABC also had relatively high CVs (CD43, CD48).
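
As a minimal sketch of this calculation, the snippet below computes a CV per marker and subset across donors and then summarizes the CVs per marker. It assumes the CV is the sample standard deviation divided by the mean, expressed as a percentage; the exact estimator is not stated in the text. All ABC values are invented for illustration only.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    m = mean(values)
    if len(values) < 2 or m == 0:
        return float("nan")
    return 100.0 * stdev(values) / m


# Hypothetical median ABC values, one per donor, keyed by (marker, subset).
abc_by_marker_subset = {
    ("CD45", "naive CD4 T cells"): [98000.0, 102000.0, 100000.0],
    ("CD45", "naive B cells"): [80000.0, 84000.0, 82000.0],
    ("CD15", "naive CD4 T cells"): [100.0, 300.0, 200.0],
    ("CD15", "naive B cells"): [150.0, 450.0, 300.0],
}

# Collect the per-subset CVs for each marker (the values behind one box in a box plot).
cv_by_marker = {}
for (marker, subset), donor_values in abc_by_marker_subset.items():
    cv_by_marker.setdefault(marker, []).append(coefficient_of_variation(donor_values))

# Median CV per marker, usable for ordering markers from low to high variability.
median_cv = {marker: median(cvs) for marker, cvs in cv_by_marker.items()}
for marker in sorted(median_cv, key=median_cv.get):
    print(marker, round(median_cv[marker], 1))
```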

The amount of surface protein (here expressed as ABC) is perhaps the most used measure of protein expression in a cell subset, and corresponds most closely to measures of expression in other forms of analysis with bulk cells. However, flow cytometry, being a single-cell technique, has the advantage of distinguishing individual cells that do or do not express a marker. This can be shown as percentage of positivity, which has been defined relative to FMO for all measured CD markers in each cell subset (Suppl Fig 7). Most of the distributions of frequency of positive cells followed a clear sigmoidal curve (Figure 3D and Suppl Fig 7), allowing the identification of markers that were negative on all, positive on all, or positive on a fraction of the cells within the subset. Frequency of positive cells was tightly associated with the fluorescence intensity (shown by coloring), although for some markers that were positive on a fraction of cells, the expression level was relatively low (e.g. CD9 in naive B cells) or relatively high (CD48; Figure 3D).

Figure 3. Expression levels and heterogeneity of expression of cell surface markers across cell types. A) Median fluorescence (in antibody binding capacity; ABC) for all markers on one cell subset (naïve CD4 T cells) ordered from low to high median expression. FMO is highlighted in orange and a horizontal orange line depicts the median FMO background. Similar plots for all cell subsets are provided in Suppl Figure 6. B) Fluorescence (in ABC) across all cell subsets per CD marker, shown as box whisker plots (median, IQR and range of ABC across all cell subsets). The CD markers are ordered from low to high median expression (black horizontal lines). C) Coefficients of variation (CV) of expression across all cell subsets per CD marker. The CD markers are ordered from low to high median CV (black horizontal lines), and boxes represent interquartile ranges with the color representing the median expression level (ABC). Values below one are not shown. D) Frequency of positive cells for all markers on one cell subset (naive B cells) ordered from low to high frequency. Similar plots for all cell subsets are provided in Suppl Figure 7. CD markers in red font are discussed in the main text.

IV. Clustering cell subsets and CD markers

To interrogate and visualize common expression patterns of markers and how these relate across the defined cell subsets, we performed unsupervised hierarchical clustering analysis (HCA; Figure 4). The analysis revealed three main cell clusters: T-cells, B-cells and myeloid cells. Within both B- and T-cells, the blood and tissue subsets were grouped into two separate subclusters.

Regarding CD marker patterns, CD19, CD20, CD21, CD22, CD72 and CD74 clustered together with predominant expression among B-cell subsets, whereas CD11b, CD11c, CD13, CD14, CD16, CD33 and CD88 were found to be expressed in the myeloid cell cluster (Supplementary Figure 8). The thymocyte cluster contained CD9, CD10, CD1a, CD1b, CD1d, CD71, CD69, CD90 and CD34, which are known markers for progenitor cells and for cell activation. A cluster of CD markers expressed on all subsets and at all stages included CD45, CD44, CD99, CD47 and CD50. Lastly, a T cell cluster was apparent, containing CD2, CD3, CD4, CD5, CD6, CD7, CD8, CD26, CD28, CD49e, CD49f, CD62L, CD84, CD95 and CD96. In addition to these dominant clusters, the heatmap also clearly visualizes expression of CD markers outside of the dominant cluster, such as CD24 expression on neutrophils and eosinophils, and CD21 expression on immature thymocytes (Figure 4).

Figure 4. Hierarchical clustering analysis of CD marker expression on leukocyte subsets. Unsupervised clustering was performed on all CD markers (n = 113) and all cell subsets (n = 47) based on log10-transformed median ABC. Cell subsets are color coded based on their lineage and their tissue of origin. The 10 lineage groups are: T4 - naïve CD4 T cells, Effector Memory CD4 T cells, Central Memory CD4 T cells, TEMRA CD4 T cells, CD4 T cells, CD4+ immature single positive, CD4+ single positive CD1a+, CD4+ single positive; T8 - naïve CD8 T cells, Effector Memory CD8 T cells, Central Memory CD8 T cells, TEMRA CD8 T cells, CD45RAdim CD27+ CD8 T cells, CD8 T cells, CD8+ single positive CD1a+, CD8+ single positive; B - CD27- IgM- IgD- B cells, naïve B cells, Natural Effector B cells, IgM only B cells, Switched Memory B cells, plasma cells, B cells, Naive B-cells Tonsil, Centrocytes, Centroblasts, Unswitched Memory B-cells, Switched Memory B-cells, Plasma cells, CD138negPlasma cells, CD138posPlasma cells; T - gamma delta T cells, T cells, Double positive CD3-, Double positive CD3+; Lymphocytes - Lymphocytes; Thymocytes - CD34+ double negative, CD34+CD1a+ double negative; Granulocytes - Eosinophils, Neutrophils, Basophils; Monocytes - non-classical monocytes, classical monocytes, intermediate monocytes; Dendritic - plasmacytoid DC, myeloid DC; NK - NK cells. The hierarchical tree was algorithmically cut into clusters (k = 5). See Supplementary Figure 8 for a larger version of the same figure, including CD marker and cell subset labeling.
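
The clustering step described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for Figure 4: Euclidean distance and average linkage are assumed because the text does not name the metric or linkage, the tree is cut by stopping the merging at k clusters, and the median ABC values for three markers (CD3, CD19, CD14) on six subsets are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_subsets(median_abc, k):
    """Agglomerative clustering on log10(median ABC); merging stops at k clusters."""
    names = list(median_abc)
    profiles = [[math.log10(v) for v in median_abc[n]] for n in names]
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(names[i] for i in c) for c in clusters]


# Hypothetical median ABC values for (CD3, CD19, CD14) per subset.
median_abc = {
    "naive CD4 T cells": [30000.0, 100.0, 100.0],
    "naive CD8 T cells": [28000.0, 120.0, 100.0],
    "naive B cells": [100.0, 20000.0, 110.0],
    "switched memory B cells": [110.0, 18000.0, 100.0],
    "classical monocytes": [100.0, 100.0, 50000.0],
    "non-classical monocytes": [120.0, 110.0, 9000.0],
}

result = cluster_subsets(median_abc, k=3)
for members in result:
    print(members)
```

The log10 transformation keeps highly expressed markers from dominating the distances; clustering the CD markers instead of the subsets would use the same routine on the transposed table.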

V. Granulocyte, monocyte and dendritic cell analysis

Three monocyte subsets can typically be defined based on differential expression of CD14 and CD16 (Suppl Figure 1), and these subsets have been shown to be associated with distinct diseases.15, 16 Of the 111 CD markers tested, 31 were significantly different in ABC (p<0.01) between any 2 of the three subsets (Figure 5A). Remarkably, multiple integrins (CD11b, CD49e) and other adhesion molecules (CD33, CD62P), as well as the antigen-presenting molecule CD1d, were specifically downregulated on non-classical monocytes as compared to the classical and intermediate subsets.

By definition, CD16 (FcγRIII) was upregulated on intermediate and non-classical monocytes. In contrast, CD64 (FcγRI) was specifically upregulated on non-classical monocytes, whereas all subsets expressed relatively similar levels of CD32 (FcγRIIa and FcγRIIb). The CD35 antigen (complement receptor 1) was specifically downregulated on non-classical monocytes. Within the family of tetraspanins, CD63 expression was specifically high on classical monocytes, and CD9 and CD82 expression levels were significantly reduced on non-classical monocytes, whereas no differences were seen for CD37, CD53 and CD81.

Similar to the monocyte subsets, we performed a detailed phenotypic comparison between the major two DC subsets in blood: myeloid (m)DC and plasmacytoid (p)DC. pDC were defined on the basis of co-expression of HLA-DR and CD123 (Supplemental Figure 1 and Supplemental Table 5). Due to the limitations in markers we could use in the backbone, we defined one mDC population on the basis of HLA-DR+CD11c+CD14-CD16-, which includes both the CD141+ cDC1 and the CD1c+ cDC2 subsets.17 Forty of the 111 CD molecules differed significantly in expression level between mDC and pDC (p<0.01), and 19 of these had a p-value <0.001 (Figure 5B). Most of the differences were the result of higher expression of markers on pDCs. Markers with low expression included molecules typically found on lymphocytes (CD3, CD10 and CD19), and this probably does not represent actual expression. In addition, pDC expressed higher levels of multiple integrins (CD29, CD49a, c, d) and the adhesion molecule CD54 (ICAM-1), as well as the previously reported immunoregulatory receptor CD5 and tolerogenic receptors CD85d, j and k,17 whereas the death receptor CD95 was significantly reduced on pDC.18 Expression levels of the previously reported CD11b, CD11c and CD13 were reduced, but not with a significance of p<0.01.18

Between neutrophils and eosinophils, 20 CD molecules were significantly different (p<0.01) and all were lower on the latter subset (Figure 5C). These included the well-described CD10, CD15 and CD16, as well as integrins CD11b, CD11c, CD18, integrin ligand CD50, complement receptors CD35, CD88 and CD93, and the IgA receptor CD89. About half of the significantly different markers between basophils and eosinophils were around borderline expression (10³) (Figure 5D). Of the rest, 11 were significantly higher in basophils and included the tetraspanins CD9, CD53 and CD82, the FcγRII (CD32), multiple cell adhesion molecules (CD38, CD44, CD54, CD62L), complement decay factor CD55 and SLAM family member CD84. Conversely, eosinophils showed significantly higher expression of CD15, glycoproteins CD22 and CD24, ectoenzyme CD39, TNF receptor CD40, and adhesion molecules CD49f and CD66c.

Figure 5. Correlation of CD1-CD100 expression levels between related innate leukocytes. Pairwise analysis of expression levels (ABC) between A) classical, intermediate and non-classical monocytes; B) myeloid DC and plasmacytoid DC; C) neutrophils and eosinophils; D) eosinophils and basophils. CD41 and CD42b were excluded from the plots. Neither marker was expressed on any innate cell type (ABC < 2x10²).

VI. T-cell maturation

Within the CD3+ cells, the three main lineages (TCRγδ+, CD4+ and CD8+) were distinguished (Suppl Figure 2). Pairwise analysis of parallel maturation stages between the CD4 and CD8 lineages for markers with a significance of p<0.01 and a change of at least ten-fold (Suppl Figure 9A) revealed consistently higher CD59 expression on CD4 T cells (all stages, except for TemRA; CD45RA+CD27-).19 Conversely, “senescence” marker CD57 and tetraspanin CD63 were both higher on CD8 T cells in the central memory (Tcm) stage.

In addition, multiple CD markers were differentially expressed between stages of T cell maturation. Naive CD8 T cells (CD45RA+CD27+) were nearly all positive for the CD45RA isoform, CD31 (PECAM-1) and costimulatory molecules CD27 and CD28 (Figure 6).20 While the integrins (CD18 and CD11c) were expressed on all T cell subsets, their degree of expression increased with maturation (Suppl Figure 9B). The relative amount of surface CD45RA was about twice as high as CD3, which in turn was nearly twice that of CD27 (Suppl Figure 9B). The expression levels of regulators of activation were tightly controlled, as evidenced by low CV within each subset (CD3, CD45RA, CD28, CD27, CD31; Suppl Figure 9C). By definition, CD8 Tcm and Tem cells lacked surface CD45RA, and all expressed the CD45RO isoform, generated by alternative splicing. CD95 was expressed on all memory subsets, whereas CD57 was gradually upregulated from Tcm to Tem subsets, which in turn gradually lost CD31. Furthermore, CD28 positivity decreased from Tcm to Tem. Finally, in TemRA, CD45RA was re-expressed with a concomitant loss of CD45RO, and a massive increase in CD57 positivity.21 In our gating strategy, a separate population (CD45RAdim CD27+) was defined between CD8 Tnaive and Tcm. In contrast to Tnaive, CD45RAdim cells expressed CD95 and CD45RO, lower levels of CD27, and lacked CD38 expression. On the other hand, the CD45RAdim cells were distinct from TemRA, as they did express CD28, and not CD85j. The phenotype of CD45RAdim cells therefore seems to fit with that of the antigen-experienced T memory stem cell subset, as has been suggested before.22, 23 Similar to CD8 T cells, transition of naive CD4 T cells to memory was accompanied by a decrease in expression of CD31, CD38, and CD45RA, while CD45RO, CD95 (Fas-receptor) and CD84 (SLAMF5) were upregulated (data not shown).24, 25

Figure 6. Expression patterns of selected CD markers on naïve and memory CD8 T cells. Selected CD molecules are shown for naive, central memory (CM), effector memory (EM) and CD45RA+ effector memory (TEMRA) CD8 T cells. In contrast to co-stimulatory molecules (CD27, CD28) and FasR (CD95), the integrins (CD11a, CD11b, CD11c, CD18) are expressed on all stages of CD8 T cells, although their expression levels differ (see Suppl Figure 9B).

VII. Thymocyte differentiation

In addition to mature T-cells in blood, a separate tube was designed to map CD marker expression on T-cell progenitors in thymus (Suppl Figure 3).26, 27 This allowed complete mapping of CD marker expression from early T-cell progenitors until effector memory cells (Figure 7) with the maturation tool in the web resource (Figure 1). The analysis revealed that CD10 is gradually lost as cells differentiate from the double negative (DN) to the double positive (DP) stage, and is completely absent on single positive (SP) CD4+ T cells. Distinct expression patterns were seen for costimulatory molecules CD27 and CD28. Early progenitors already express medium levels of CD28, which increase to a maximum after the DP stage, whereas CD27 is low or absent until the DP stage, reaching its maximum just before thymocytes exit to the periphery at the CD1a- SP CD4 stage. All thymocytes express CD31, which is gradually lost on peripheral naive CD4 T cells. Another unique pattern was seen for CD11a, which is expressed on all stages of T-cell differentiation but varies in expression level, with the highest amount seen on effector memory T cells.

Figure 7. Median expression levels of critical CD markers during thymocyte differentiation into peripheral CD4 T cells. Expression levels (in ABC) of selected CD markers (n = 6) on thymocyte (left) and blood (right) CD4 T-cell subsets (n = 11).


VIII. Antigen-dependent B-cell maturation in tonsils

Hierarchical clustering analysis (HCA; Figure 4) of tonsil B cell subsets distinguished three major functional compartments based on CD1-CD100 surface expression: i) B-lymphocytes, including naive and unswitched and switched memory B-cells; ii) germinal center (GC) cells, including centrocytes (CC) and centroblasts (CB); and iii) plasma cells (PC), including CD138- and CD138+ PC. Over 30 CD markers showed statistically significant differences (p<0.01) between any two of these three major compartments, and a p-value <0.001 was observed for >20 CD markers. In contrast, the subsets within each compartment were rather homogeneous in their expression of the CD1-CD100 markers, with fewer than 5 CD markers differing significantly (p<0.01) between any two subpopulations within the same compartment.

The largest set of CD markers with differential expression was observed between the PC and B-lymphocyte compartments (p<0.01, 37 CD markers; p<0.001, 27 CD markers). Those differences with a p<0.001 included upregulation of a large set of adhesion and signaling molecules (CD18, CD31, CD47, CD54, CD97, CD98, CD99) together with a different profile of expression of activation/signaling markers (CD9, CD24, CD27, CD28, CD37, CD39, CD43, CD44, CD45RA, CD52, CD53, CD63, CD79b, CD81) and complement receptor proteins (CD35, CD46, CD55, CD59).28, 29 Using the maturation tool from the CD Maps web resource (Figure 1), we observed that some of these phenotypic features of an antibody-secreting cell signature were already acquired in the GC compartment (Figure 8). These phenotypic changes included upregulation of molecules involved in adhesion/migration (CD54, CD98) and enzymatic activity (CD10), and a different profile of cell activation/signaling (CD24, CD44) and complement receptor (CD35, CD59) proteins, as compared to B-lymphocytes. However, there were 20 CD markers that showed highly significant differences in PC vs. GC (p<0.001), including increased levels of CD markers that were already upregulated during the GC phase (CD54, CD59, CD98), reversion of phenotypic changes observed during the GC reaction (CD20, CD31, CD32, CD40, CD47, CD55), and upregulation of new markers that were neither detected in B-lymphocytes nor in GC cells (CD9, CD28, CD43, CD63, CD97) or showed a lower reactivity (CD46, CD99), together with decreased levels of markers commonly expressed by both B-lymphocytes and GC cells (CD37, CD45RA, CD52).

Figure 8. B-cell maturation profiling in secondary lymphoid tissues. Fluorescence intensity [log10(Median ABC)] values of selected populations (n = 8) and CD markers (n = 33). Selected populations are: BnaiveTo - Naive B-cells Tonsil, CC - Centrocytes, CB - Centroblasts, UnswtMem - Unswitched Memory B-cells, SwtMem - Switched Memory B-cells, PC - Plasma cells, CD138negPC - CD138negPlasma cells, CD138posPC - CD138posPlasma cells.

Discussion

We here examined 111 CD markers on 47 leukocyte subsets using multicolor flow cytometry with the marker of interest in the PE channel. The resulting expression profile is the largest quantitative dataset of surface protein expression levels on human immune cells.

The examined surface proteins represent those that were defined by clustered mAbs in HLDA workshops I-V, which were held in the 1980s and early 1990s.30 At that time, the protein expression patterns were defined in great detail. However, with advances in technologies and new insights into immune cell function and subsets, we deemed the expression data incomplete, not fully accurate and lacking quantitative information. Indeed, when we compared our data with a CD chart of a major antibody vendor, we found over 50 discrepancies and 25 missing values. In part, those discrepancies stem from a positivity and negativity definition on a broadly defined cell lineage: any positivity found at any stage and/or activation status is regarded as positivity on such a chart. Our detailed analysis on well-defined subsets potentially clarifies this.

To ensure robustness and reproducibility of our data, we standardized our experimental procedures and flow cytometer set-up according to the protocols that were established for clinical use by the EuroFlow consortium (www.EuroFlow.org).31 Subsequently, the measurements were independently performed in 3-4 laboratories, each acquiring data from 3-4 donors with parallel acquisition of PE signal calibration particles. Indeed, gating of subsets using the backbone markers could be reliably performed on the data, irrespective of their origin. Thus, we have obtained a realistic data set, which can be prepared reproducibly in any laboratory following the same operating procedure. Although we do not claim to have covered population variation with only 12 donors per CD marker, we believe we could dismiss rare abnormal variants by displaying medians of the 12 donors. Accurate quantification of CD marker expression levels is not only important for biological function, but can also be used for the proper design of flow cytometry experiments, where the intensity of expression is essential information for a successful multicolor panel.32

The unique feature of our data resource is the detailed information on expression levels and their changes between diverse immune cell subsets, which allows interpretation of quantitative changes during thymocyte development, B-cell maturation in the tonsil, and between blood cell subsets that might share expression of the same marker but in different quantities.

In the present study we quantitatively mapped the expression of 111 surface-expressed proteins on 42 non-overlapping leukocyte subsets from 3 human tissues. With this being a large-scale analysis and a systems approach, a few concessions had to be made in experimental design. The accuracy of exact quantification of CD marker expression is potentially skewed because antibody binding can occur through either one or two Fab domains.33 Thus, the ABC unit that was used to quantitatively depict expression has an error margin of a factor of 2 for the number of expressed molecules. Still, our measurements for CD4 yielded a median of 38 650 ABC (clone MEM-241) for naive CD4 T cells, which was very similar to the previously published value of 42 000 ABC (clone SK3).34 Finally, for this large-scale approach, we could use only one antibody reagent for each given CD marker. Selection criteria for these reagents included: 1) being a clone that was approved in the HLDA workshops; and 2) good reactivity based on our in-house experience. We initially pilot tested two clones for CD4 (MEM-241 and RPA-T4) and CD8 (MEM-31 and HIT8a) and observed differences of up to 20% in expression levels, but it is not possible to draw general conclusions about individual clones’ performance. As the clones we tested have been through the HLDA workshops, these will serve as a benchmark that can be either matched or surpassed by alternative reagents. The resource we have built will be appended in the future with new clones, new reagents, new CD markers and new cell subsets. In fact, in the upcoming 11th HLDA workshop this methodological framework will be used to measure and cluster antibody reactivities across subsets to help assign new CD nomenclature.
This approach follows the strategy proposed by the International Working Group for Antibody Validation (IWGAV) that has documented expression patterns for 3706 antibodies in immunoprecipitates.35, 36 Including the newly generated reactivity patterns of HLDA 11 in the CD Maps resource will enhance the role of this resource as a benchmark for the research community.

Regarding the biology, we did not exhaustively define all functionally-defined immune cell subsets. With 4 tubes, each using 7 channels for the backbone, we were able to define 41 unique, non-overlapping subsets. Several cell types were not included, such as helper T cell subsets, regulatory T cells, NK-T cells, and mucosa-associated invariant T cells (MAIT). With an extended panel using more fluorescent markers, this limitation can be overcome in future studies. However, rare cell populations such as innate lymphoid cells will remain a challenge as this would require the acquisition of more than a million events per staining.

In conclusion, we have demonstrated the possibility to systematically quantify the expression of surface-expressed proteins on the multitude of immune cells using standardized multicolor flow cytometry. There is a need for this standardized systems approach to avoid confusion from separate observations in individual laboratories, to correct potential mistakes in the literature, and to predict potential off-target effects of antibody-based therapies. The CD Maps web resource enables each user to explore the data and it has the capacity to function as a platform for surface molecule expression data that can be updated with newer CD markers and more leukocyte subsets. With the ongoing activities of the HLDA workshops, the CD Maps project can provide the means to get towards a full picture of the surfaceome of human immune cells.

Acknowledgements

The authors are grateful to Robert Balderas (BD Biosciences), Kelly Lundsten (BioLegend) and Miloslav Suchanek (ExBio) for providing reagents, and to the Czech National Grid Infrastructure MetaCentrum under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042) for access to computing and storage facilities. The work was financially supported by the International Union of Immunological Societies (IUIS), projects 15-26588A (TK) and NV18-08-00385 (KF) of the Czech Republic Ministry of Health, Australian National Health and Medical Research Council (NHMRC) Fellowship GNT1117687 (MCvZ), and SAF2015-69829 from Ministerio Ciencia e Innovación (Spain; to PE). We should also acknowledge the support.

ORCID:
Tomas Kalina https://orcid.org/0000-0003-4475-2872
Karel Fišer https://orcid.org/0000-0002-7265-3268
Martin Pérez-Andrés https://orcid.org/0000-0003-4599-0776
Daniela Kužílková
Marta Cuenca https://orcid.org/0000-0003-2261-7792
Sophinus J.W. Bartol https://orcid.org/0000-0001-7208-4096
Elena Blanco https://orcid.org/0000-0002-8150-5646
Pablo Engel https://orcid.org/0000-0001-8410-252X
Menno C. van Zelm https://orcid.org/0000-0003-4161-1919

Methods

Cell isolation, dextran density

Isolation of leukocytes from buffy coat

The leukocyte isolation method was chosen to minimize platelet-to-leukocyte satellitism. Buffy coat was diluted 6x in PBS with 2mM EDTA and mixed with 4% dextran solution (4% dextran, Sigma-Aldrich, Saint Louis, MO, USA, in 0.9% NaCl) to a final dextran concentration of 2%. The mixture was left for 30min to allow erythrocytes to sediment. The supernatant was carefully collected and centrifuged (130g, 15min, RT), and the resulting supernatant was carefully removed. Residual erythrocytes in the pellet were lysed by hypotonic lysis (the pellet was mixed with 0.2% NaCl for 55sec and subsequently supplemented with 1.2% NaCl to achieve an isotonic concentration of NaCl). PBS was added to reach a final volume of 50mL. The suspension was centrifuged (130g, 15min, RT) and the lysis step was repeated. The leukocyte suspension was washed and diluted with PBS/BSA (PBS with 0.5% BSA and 0.09% NaN3) to a final concentration of 4 x 10⁷ / mL.

Isolation of thymocytes

Thymocytes were isolated via gentle shaking from manually dissociated thymus tissue, washed with RPMI 1640 with 25 mM HEPES, L-glutamine, 100 U/ml penicillin and 100 µg/ml streptomycin (Lonza, Basel, Switzerland) supplemented with 10% (v/v) heat-inactivated fetal bovine serum (FBS, Thermo Fisher Scientific, Rockford, IL) and stored in FBS with 10% DMSO in liquid nitrogen for further analysis.
Thawing, some fresh (Barcelona?) or all frozen? Whenever frozen and thawed thymocytes were used, we observed a marked decrease in the proportion of double-positive stage thymocytes, but their phenotype was similar to that of fresh thymocytes.

Staining and cocktails

Sample staining

The procedure was performed in V-bottom 96-well plates in a total suspension volume of 50µL. First, PE-labeled monoclonal antibodies (mAb) were added to each well at the titer recommended by the manufacturer, adjusted to the 50µL staining volume (10µL, 5µL or 2.5µL of mAb, topped up to 10µL with PBS/BSA). A volume of 40µL of cell suspension (1.6 x 10⁶ cells for buffy coat, 5 x 10⁵ cells for thymus) was added to each well. The mixture was carefully mixed and incubated for 30min (RT, in the dark) to allow for preferential binding of the tested mAb over backbone mAbs. Next, 25µL of backbone mAb reagent mix (Table tab_panels) was added to each well, carefully mixed and incubated again for 30min (RT, in the dark). The cells were washed three times (8min, 500g, RT) in PBS/BSA, resuspended in 200µL of PBS with 2mM EDTA and analyzed by flow cytometry with an HTS loader.

**Table tab_panels.** Backbone mAb reagent mix. …
Tube	Reagent/Target	Fluorochrome channel	Clone/Catalogue Nr.	Manufacturer
DC_mo_Inn	CD3	BV421	SK7	BD
	CD19	BV421	HIB19	BD
	CD34	BV421	581	BD
	Fixable Violet		L34955	Mol Probes
	CD16	BV510	3G8	BioLegend
	CD56	FITC	B159	BD
	CD14	PerCP-Cy5.5	M5E2	BD
	CD123	PE-Cy7	6H6	BioLegend
	CD11c	APC	B-ly6	BD
	HLA-DR	APC-H7	L243	BD
B_T	CD27	BV421	O323	BioLegend
	CD45RA	BV510	HI100	BD
	CD4	FITC	MEM-241	ExBio
	IgD	FITC	IA6-2	BioLegend
	CD8	PerCP-Cy5.5	MEM-31	ExBio
	IgM	PerCP-Cy5.5	MHM-88	BioLegend
	CD19	PE-Cy7	LT-19	ExBio
	TCRgd	PE-Cy7	11F4	BD
	CD3	APC	UCHT-1	ExBio
	CD45	APC-Cy7	MEM-28	ExBio
B	CD27	BV421	O323	BioLegend
	IgM	BV510	MHM-88	BioLegend
	IgD	FITC	IA6-2	BioLegend
	CD3	PerCP-Cy5.5	SK7	BD
	CD19	PE-Cy7	SJ25C1	BD
	CD138	APC	MI15	BD
	CD38	APC-H7	HB7	BD
T	CD13	BV421	WM15	BD
	CD19	BV421	HIB19	BD
	CD33	BV421	WM53	BD
	CD16	BV421	3G8	BD
	CD56	BV421	NCAM16.2	BD
	DAPI		D-3571	Mol Probes
	CD4	BV510	OKT4	BioLegend
	CD44	FITC	G44-26	BD
	CD3	PerCP-Cy5.5	SK7	BD
	CD34	PE-Cy7	8G12	BD
	CD1a	APC	HI149	BD
	CD8	APC-H7	SK1	BD

Standardized acquisition, PE and mAb quantification

Flow cytometer instrument setup

Cytometer Setup and Tracking (CS&T) beads (BD Biosciences, San Jose, CA, USA) and 8-peak Rainbow bead calibration particles (Spherotech, Lake Forest, IL, USA) were used for PMT voltages and light scatter setup to achieve inter-laboratory standardization as developed by the EuroFlow consortium². Acquisition was performed in four centers (Barcelona, Prague, Rotterdam and Salamanca) on BD LSR II, BD LSR Fortessa and BD FACS Canto instruments, all equipped with 405nm, 488nm and 633/647nm excitation lasers. A total of 12 donors were acquired for each panel (but see Table tab_source_per_panel), and 1 million events were acquired per tube/well.

**Table tab_source_per_panel.** Source of material by Ab panel. Source derived from MATERIAL. BC - Buffy Coat, TH - Thymus, TO - Tonsil.
source	1_DC_mo	2_B_T	4_B	5_thy
BC	9	11	0	0
TH	0	0	0	5
TO	0	0	7	0

PE Fluorescence Quantitation assay

The quantity of PE molecules for all PE-labeled antibodies was estimated using the PE Fluorescence Quantitation Kit (BD Biosciences) with four known levels of PE. The pellet was resuspended in 500μL PBS/BSA and analyzed by flow cytometry within the next 3 hours, in parallel with each experiment. Using the “define calibration” function of FlowJo or Infinicyt software, an additional parameter displaying PE units was created by fitting the measured signal to the PE calibration curve, and this parameter was used for all statistical evaluations.
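The calibration step can be illustrated with a minimal Python sketch. It assumes a straight-line fit in log10-log10 space between bead fluorescence and PE molecules per bead, which is a common choice for four-level PE bead kits but is not stated in the text; the bead values and the function names `fit_loglog` and `mfi_to_pe` are hypothetical.

```python
import math

def fit_loglog(mfi, pe_per_bead):
    """Least-squares line through (log10 MFI, log10 PE molecules per bead)."""
    xs = [math.log10(v) for v in mfi]
    ys = [math.log10(v) for v in pe_per_bead]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mfi_to_pe(mfi, slope, intercept):
    """Convert a measured PE signal into PE units using the fitted line."""
    return 10 ** (slope * math.log10(mfi) + intercept)

# Hypothetical bead readings: measured signal and assigned PE molecules per bead.
bead_mfi = [120.0, 1500.0, 9800.0, 41000.0]
bead_pe = [500.0, 6000.0, 40000.0, 170000.0]

slope, intercept = fit_loglog(bead_mfi, bead_pe)
print(round(mfi_to_pe(5000.0, slope, intercept)))
```

In practice the gating software performs this conversion internally; the sketch only shows the arithmetic behind a calibrated PE parameter.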

Correction factor for actual amount of PE molecule for mAb

To allow for relative comparisons between CD molecules, we calculated a correction factor reflecting the amount of PE for each antibody. A volume of 25μL of UltraComp eBeads™ Compensation Beads (Thermo Fisher Scientific) was diluted with 15μL of PBS/BSA, mixed with an excess of the tested PE-labeled antibody and incubated for 30min at RT in the dark. The Compensation Beads were washed twice in PBS/BSA (8min, 500g, RT), resuspended in 70μL of PBS with 2mM EDTA and analyzed by flow cytometry. All 116 mAbs were measured, and for each mAb the ratio of the individual median PE to the median of all medians was calculated as a correction factor. The standard deviation of the correction factor was 0.3; a total of 26 mAbs (25% of all mAbs) yielded a correction factor more than 1 standard deviation above or below 1, thus for mAbs with a correction factor less than 0.7 or above 1.3 the measurement was repeated to exclude any outliers. The average of all correction factor values (after exclusion of outliers) was used to recalculate the quantity of PE molecules to the quantity of bound mAbs (Qb). All statistics use Qb values if not specified otherwise.
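As an illustration of the ratio described above, the following Python sketch computes a correction factor per mAb and lists the mAbs whose measurement would be repeated. The bead medians are invented for the example, and the function names are hypothetical; only the ratio definition and the 0.7 and 1.3 thresholds come from the text.

```python
from statistics import median

def correction_factors(bead_median_pe):
    """Map each mAb to (its median PE on capture beads) / (median of all medians)."""
    reference = median(bead_median_pe.values())
    return {mab: value / reference for mab, value in bead_median_pe.items()}

def flag_for_repeat(factors, low=0.7, high=1.3):
    """Return the mAbs whose correction factor falls outside [low, high]."""
    return sorted(mab for mab, f in factors.items() if f < low or f > high)

# Hypothetical median PE signals of five PE-labeled mAbs bound to capture beads.
bead_median_pe = {
    "CD3": 10000.0,
    "CD4": 12000.0,
    "CD8": 6000.0,
    "CD19": 14000.0,
    "CD45": 10500.0,
}

factors = correction_factors(bead_median_pe)
print(flag_for_repeat(factors))
```

With these invented values the median of all medians is 10500, so CD8 (factor about 0.57) and CD19 (factor about 1.33) would be measured again.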

Supplementary Table. CD1-CD100 marker details.

Gating and export of values

Gating of each panel was performed centrally by one lab using FlowJo (version 9 or 10) or Infinicyt software. Gating is shown in Supplementary Figure gating_strategy. From each gated cell subset, a set of statistics was extracted: Median, Mean, Mode, CV, and the 10th, 25th, 75th and 90th percentiles. Furthermore, the percentage of positive events was gated using fluorescence minus one (FMO) as a control. The minimum cell count for statistical evaluation was set to 100; subsets with lower cell counts were omitted.

Gating (Supplementary Figure gating_strategy) gave 47 cell subsets (1_inn: 9, 2_B_T: 21, 4_ton: 8, 5_thy: 9) (see Supplementary Table cell_subsets).

On all gated subsets, 11 descriptive statistics were measured (see scheme ??? and tab_variables).

**Table Measured variables**. Variables measured/calculated in all panels (n = 11).
CODE_NAME	DISPLAY_NAME
CVQb	CV of ABC
MeanQb	Mean ABC
MedQb	Median ABC
ModeQb	Mode ABC
p10Qb	10th percentile ABC
p25Qb	25th percentile ABC
p75Qb	75th percentile ABC
p90Qb	90th percentile ABC
count	Count
PEpos	Frequency of positive cells
MedPE	Median PE

Figure Scheme …

Data analysis

Reproducibility and version control throughout the project were achieved using Git versioning software (https://git-scm.com/), RStudio IDE (RStudio, Inc., Boston, MA, USA) and a Bitbucket repository (Atlassian, Sydney, Australia). Deployment was facilitated via Docker virtualisation (https://www.docker.com/, Docker, Inc., San Francisco, CA, USA).

Data import and pre-processing

Descriptive statistics from the software used for population gating were exported into delimited flat-table text files, one table per tube. Tables included all descriptive statistics from all gated cell subsets per tube and additional information on material source, antibody characteristics, experiment details, etc. Each cell subset and each descriptive statistic carried three identifiers: a short machine-friendly name, a longer descriptive name and a name backward compatible with the gating software.

All subsequent work was carried out in R (R: A Language and Environment for Statistical Computing)³. All used R packages are listed and references provided in Supplementary Table supp_tab_pckgs_used.

Data were imported into the R environment using standard import functions, converting data to R objects. Each of the four flat data tables from the four tubes was processed separately. Subsequent versions of data tables were compared computationally to their previous versions, and any errors or unintended deviations found were corrected manually. All non-positive values were converted to ones. After checks for duplicated data entries, the data were converted into matrix-like formats and the previously calculated median correction factors were applied. Sample-wise centrality measures (means and medians) were calculated and data were converted from wide to long format for easier subsequent computation. Dictionaries of cell subset and statistics-related terms were built and combined from all sources. The processed and combined data were stored in binary format.
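The core of this pre-processing can be sketched as follows. The original work was carried out in R; this Python sketch only illustrates the order of operations on a toy table. It assumes that applying the correction factor means dividing the PE value by the factor of the corresponding mAb, which the text does not state explicitly, and all names and values are hypothetical.

```python
def floor_non_positive(value):
    """Convert non-positive values to one, as described in the text."""
    return value if value > 0 else 1.0

def wide_to_long(wide, factors):
    """Turn {marker: {subset: median PE}} into one row per marker and subset.

    MedQb is computed here as MedPE / correction factor (an assumption).
    """
    rows = []
    for marker, by_subset in wide.items():
        for subset, med_pe in by_subset.items():
            cleaned = floor_non_positive(med_pe)
            rows.append({
                "marker": marker,
                "subset": subset,
                "MedPE": cleaned,
                "MedQb": cleaned / factors[marker],
            })
    return rows

# Hypothetical wide table with one negative (background-subtracted) value.
wide = {"CD4": {"naive_CD4_T": 1000.0, "neutrophils": -5.0}}
factors = {"CD4": 2.0}

for row in wide_to_long(wide, factors):
    print(row)
```

Converting non-positive values to one keeps every value defined under a later log10 transformation.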

Distribution of Frequency of PE positive cells

Sigmoidal fitting and separation of markers into positive, intermediate and negative groups on a per-cell-subset basis were performed using the R package sicegar. A simple sigmoidal fit was performed with the logistic function⁴

\[PE\left( cds \right) = f_{sig}(cds) = \frac{PE_{max}}{1 + exp(-a_1(cds-cds_{mid}))}\]

where PE(cds) is the percentage of PE positive cells, given as a function of the sequence of CD markers cds. The CD markers are ordered by rising median percentage of PE positive cells. There are three parameters to be fitted: PE_max, the maximum percentage of PE positive cells; cds_mid, the midpoint at which PE(cds) reaches half of its maximum; and a₁. The a₁ parameter is related to the slope of PE(cds) at cds = cds_mid via the formula

\[\frac{d}{dcds}PE\left( cds \right)|_{cds = cds_{mid}} = \frac{a_1 PE_{max}}{4}\]
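The relation between a₁ and the midpoint slope can be checked numerically. The Python sketch below evaluates the logistic function and compares a central-difference estimate of the slope at cds_mid with a₁·PE_max/4; the parameter values are arbitrary and chosen only for illustration, and the fitting itself (done with sicegar in R) is not reproduced here.

```python
import math

def logistic(cds, pe_max, a1, cds_mid):
    """Percentage of PE positive cells as a logistic function of marker rank."""
    return pe_max / (1.0 + math.exp(-a1 * (cds - cds_mid)))

def slope_at_mid(pe_max, a1, cds_mid, h=1e-5):
    """Central-difference estimate of dPE/dcds at cds = cds_mid."""
    upper = logistic(cds_mid + h, pe_max, a1, cds_mid)
    lower = logistic(cds_mid - h, pe_max, a1, cds_mid)
    return (upper - lower) / (2.0 * h)

# Arbitrary illustrative parameters (not fitted values from the study).
pe_max, a1, cds_mid = 100.0, 0.2, 50.0

print(logistic(cds_mid, pe_max, a1, cds_mid))  # half of pe_max
print(slope_at_mid(pe_max, a1, cds_mid))       # close to a1 * pe_max / 4
```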

Distribution of Median Fluorescence intensity

Modeling of a turning point in a sequence of rising median fluorescence intensity per cell subset was done using Menger curvature adapted from Demtris et al. 2014 (doi:10.13140/2.1.3111.5844). Menger curvature for \(y = f(x)\) at \((x_i, y_i)\) is

\[DC(x_{i}) = \frac{\sqrt{A - B^2}}{\|pq\| \|qr\| \|rp\|}\]

where

\[ \begin{aligned} A &= 4\|pq\|^2 \|qr\|^2 \\ B &= \|pq\|^2 + \|qr\|^2 - \|rp\|^2 \\ \|pq\| &= \sqrt{(x_{i-1} - x_i)^2 + (y_{i-1} - y_i)^2} \\ \|qr\| &= \sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2} \\ \|rp\| &= \sqrt{(x_{i+1} - x_{i-1})^2 + (y_{i+1} - y_{i-1})^2} \\ \end{aligned} \]

The convex turning point of a section of the curve is then:
\[D = max\{DC(x_i), i = 2, ..., n - 1\}\]
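A minimal Python sketch of this computation is given below. It evaluates the discrete curvature for every interior point from the three side lengths ‖pq‖, ‖qr‖ and ‖rp‖ of the triangle formed with its two neighbours, and returns the point of maximum curvature. The function names and the example curve are hypothetical, and any scaling of the axes before the calculation is left out.

```python
import math

def menger_curvature(p, q, r):
    """Discrete curvature at q from its neighbours p and r (2D points)."""
    pq = math.dist(p, q)
    qr = math.dist(q, r)
    rp = math.dist(r, p)
    denom = pq * qr * rp
    if denom == 0.0:
        return 0.0  # coincident points: curvature undefined, treated as zero
    a = 4.0 * pq ** 2 * qr ** 2
    b = pq ** 2 + qr ** 2 - rp ** 2
    # Clamp tiny negative values caused by rounding for collinear points.
    return math.sqrt(max(a - b ** 2, 0.0)) / denom

def turning_point(xs, ys):
    """Return (index, curvature) of the interior point with maximum curvature."""
    curvatures = [
        menger_curvature((xs[i - 1], ys[i - 1]), (xs[i], ys[i]), (xs[i + 1], ys[i + 1]))
        for i in range(1, len(xs) - 1)
    ]
    best = max(range(len(curvatures)), key=curvatures.__getitem__)
    return best + 1, curvatures[best]

# Hypothetical ordered sequence: flat at first, then rising from index 2 onward.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 0.0, 1.0, 2.0]
print(turning_point(xs, ys))
```

For three points lying on a circle of radius R the function returns 1/R, which is a convenient sanity check.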

Hierarchical clustering Analysis

For Hierarchical Clustering Analysis (HCA) the pheatmap R package was used. Per-cell-subset Median Qb values were log10 transformed after minimum Median Qb values were raised above zero. Observations with missing values and FMO controls were removed and data were z-score scaled. For HCA, the Euclidean distance and Ward linkage (ward.D2⁵) were used.
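The transformation applied before clustering can be sketched in Python as follows. The sketch covers only the log10 transformation and z-score scaling of one vector of Median Qb values; the Ward clustering itself (performed with pheatmap in R) is not reproduced. The floor value of 1 and the choice of which axis is scaled are assumptions, and the example values are invented.

```python
import math
from statistics import mean, stdev

def log10_raised(values, floor=1.0):
    """Raise values below `floor` to `floor` (assumed choice), then take log10."""
    return [math.log10(max(v, floor)) for v in values]

def zscore(values):
    """Centre on the mean and divide by the sample standard deviation (n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical Median Qb values of one CD marker across four cell subsets.
median_qb = [0.0, 10.0, 100.0, 1000.0]

scaled = zscore(log10_raised(median_qb))
print([round(v, 3) for v in scaled])
```

The sample standard deviation is used here because it matches the default behaviour of scaling in R.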

Generation and Utilities of a Dynamic Web Resource

To share CD Maps data as a user-friendly resource, we wrote an application with a web page front-end. The application is written entirely in R using the R package Shiny. Shiny allows background computations in R to serve results to a web-based front-end and uses the reactive programming paradigm. Reactive programming allows for dynamic, user-directed content generation and therefore interactive data exploration and analysis. For enhanced user interactivity, some R packages facilitating access to JavaScript libraries were used (e.g. d3heatmap, htmlwidgets). The resulting web page includes general CD Maps information as well as several angles from which to interrogate CD Maps data (www.hcdm.org; Figure 1).

References

1. Fagerberg, L., Jonasson, K., von Heijne, G., Uhlén, M. & Berglund, L. Prediction of the human membrane proteome. PROTEOMICS 10, 1141–1149 (2010).

2. Kalina, T. et al. EuroFlow standardization of flow cytometer instrument settings and immunophenotyping protocols. Leukemia 26, 1986–2010 (2012).

3. R Core Team. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, 2019).

4. Caglar, M. U., Teufel, A. I. & Wilke, C. O. Sicegar: R package for sigmoidal and double-sigmoidal curve fitting. PeerJ 6, e4251 (2018).

5. Murtagh, F. & Legendre, P. Ward’s hierarchical agglomerative clustering method: Which algorithms implement ward’s criterion? Journal of Classification 31, 274–295 (2014).