Introduction

The angiotensin-converting enzyme 2 (ACE2) receptor has been identified as the key adhesion molecule for SARS-CoV-2 transmission. However, ACE2 gene variation is not associated with COVID-19 susceptibility, severity, or outcomes. We hypothesize that genes interacting with ACE2 are enriched for molecular pathways relevant to COVID-19 susceptibility. Accordingly, we employed a top-down approach: starting with genes that interact with ACE2, we performed tissue gene expression enrichment, drug-gene interaction and enrichment analysis, and variant prioritization using genetic variants within the ACE2 gene-gene connectome and protein-protein interaction networks. With this approach, we identified several risk loci supported by the COVID-19 Host Genetics Initiative’s published genome-wide association studies (GWAS), as well as effects of the ACE2 gene network relevant to the wide range of symptoms observed following SARS-CoV-2 infection.

Study Overview

1. The ACE2 gene connectome

List of genes from network sources

2. Tissue enrichment

Tissue enrichment of ACE2-network genes

3. Gene-drug interactions and enrichment of functional processes from drugs

A. Gene-drug interaction from dgidb

B. Gene function enrichment (from Reactome db) of aforementioned identified drugs

Table for above graph

Pathway based similarity of drugs

4. PheWAS of genes

Each data point represents a trait associated with the gene, as mined from the GWAS Atlas. Traits are grouped into domains (x-axis), and the size of each data point reflects the sample size (legend on the right) of the study that reported the association statistic. The y-axis shows the -log10(p-value) of the gene for the respective trait. The dotted line marks the Bonferroni significance threshold (1e-5), which corrects for the number of traits in the GWAS Atlas.
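As an illustration of how the per-gene counts listed below could be tabulated, here is a minimal Python sketch. It assumes the Signi / Not.Signi labels correspond to the 1e-5 threshold described above; the function names and the example p-values are hypothetical and are not taken from the GWAS Atlas or from the original analysis code.

```python
import math
from collections import Counter

# Significance line quoted in the figure description.
THRESHOLD = 1e-5

def tabulate_significance(p_values, threshold=THRESHOLD):
    """Count associations below the threshold (Signi) and the rest (Not.Signi)."""
    counts = Counter("Signi" if p < threshold else "Not.Signi" for p in p_values)
    return {"Not.Signi": counts["Not.Signi"], "Signi": counts["Signi"]}

def neg_log10(p):
    """Value plotted on the y-axis for a given p-value."""
    return -math.log10(p)

# Hypothetical p-values for one gene (illustrative only).
example_p_values = [0.2, 3e-7, 0.04, 9e-6, 1e-5]
summary = tabulate_significance(example_p_values)
line_height = neg_log10(THRESHOLD)  # y-axis position of the dotted line
print(summary, line_height)
```

A strict inequality is used here, so a p-value exactly equal to the threshold is counted as not significant; the original report may have handled that boundary differently.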

AAMP

## [1] "Number of data points: 245"
## 
## Not.Signi     Signi 
##       203        42

ACE

## [1] "Number of data points: 323"
## 
## Not.Signi     Signi 
##       300        23

ACE2

## [1] "Number of data points: 64"
## 
## Not.Signi 
##        64

ALDOB

## [1] "Number of data points: 240"
## 
## Not.Signi     Signi 
##       239         1

AGT

## [1] "Number of data points: 393"
## 
## Not.Signi     Signi 
##       380        13

APOA1

## [1] "Number of data points: 401"
## 
## Not.Signi     Signi 
##       296       105

APOBEC1

## [1] "Number of data points: 267"
## 
## Not.Signi     Signi 
##       265         2

CALM1

## [1] "Number of data points: 259"
## 
## Not.Signi 
##       259

CALM2

## [1] "Number of data points: 247"
## 
## Not.Signi     Signi 
##       237        10

CAT

## [1] "Number of data points: 228"
## 
## Not.Signi     Signi 
##       226         2

CLCA1

## [1] "Number of data points: 324"
## 
## Not.Signi     Signi 
##       322         2

DEFA5

## [1] "Number of data points: 223"
## 
## Not.Signi     Signi 
##       212        11

DEFA6

## [1] "Number of data points: 218"
## 
## Not.Signi 
##       218

DPEP1

## [1] "Number of data points: 466"
## 
## Not.Signi     Signi 
##       400        66

DPP4

## [1] "Number of data points: 477"
## 
## Not.Signi     Signi 
##       431        46

FABP1

## [1] "Number of data points: 260"
## 
## Not.Signi     Signi 
##       258         2

FABP2

## [1] "Number of data points: 422"
## 
## Not.Signi     Signi 
##       382        40

GHRL

## [1] "Number of data points: 314"
## 
## Not.Signi     Signi 
##       311         3

HRAS

## [1] "Number of data points: 226"
## 
## Not.Signi     Signi 
##       224         2

ISYNA1

## [1] "Number of data points: 425"
## 
## Not.Signi     Signi 
##       393        32

IYD

## [1] "Number of data points: 256"
## 
## Not.Signi 
##       256

KDM3A

## [1] "Number of data points: 342"
## 
## Not.Signi     Signi 
##       292        50

LACTB2

## [1] "Number of data points: 376"
## 
## Not.Signi     Signi 
##       357        19

LRRC19

## [1] "Number of data points: 269"
## 
## Not.Signi     Signi 
##       266         3

MEP1A

## [1] "Number of data points: 243"
## 
## Not.Signi 
##       243

MEP1B

## [1] "Number of data points: 237"
## 
## Not.Signi     Signi 
##       236         1

MME

## [1] "Number of data points: 296"
## 
## Not.Signi     Signi 
##       294         2

NTS

## [1] "Number of data points: 196"
## 
## Not.Signi     Signi 
##       195         1

PDE9A

## [1] "Number of data points: 366"
## 
## Not.Signi     Signi 
##       360         6

PLA2G12B

## [1] "Number of data points: 307"
## 
## Not.Signi     Signi 
##       302         5

POU2F1

## [1] "Number of data points: 314"
## 
## Not.Signi     Signi 
##       307         7

PRCP

## [1] "Number of data points: 313"
## 
## Not.Signi     Signi 
##       308         5

RASEF

## [1] "Number of data points: 257"
## 
## Not.Signi     Signi 
##       254         3

SI

## [1] "Number of data points: 217"
## 
## Not.Signi     Signi 
##       216         1

SLC10A2

## [1] "Number of data points: 275"
## 
## Not.Signi     Signi 
##       273         2

SLC12A6

## [1] "Number of data points: 309"
## 
## Not.Signi     Signi 
##       308         1

SLC37A1

## [1] "Number of data points: 347"
## 
## Not.Signi     Signi 
##       346         1

SLC3A1

## [1] "Number of data points: 384"
## 
## Not.Signi     Signi 
##       360        24

SLC44A4

## [1] "Number of data points: 957"
## 
## Not.Signi     Signi 
##       682       275

SLC6A19

## [1] "Number of data points: 167"
## 
## Not.Signi 
##       167

TINAG

## [1] "Number of data points: 335"
## 
## Not.Signi     Signi 
##       330         5

TMEM27

## [1] "Number of data points: 80"
## 
## Not.Signi     Signi 
##        79         1

TMPRSS2

## [1] "Number of data points: 227"
## 
## Not.Signi 
##       227

TRPM4

## [1] "Number of data points: 252"
## 
## Not.Signi     Signi 
##       249         3

XPNPEP2

## [1] "Number of data points: 83"
## 
## Not.Signi 
##        83

RORC

## [1] "Number of data points: 527"
## 
## Not.Signi     Signi 
##       503        24

RORA

## [1] "Number of data points: 632"
## 
## Not.Signi     Signi 
##       529       103

RORB

## [1] "Number of data points: 428"
## 
## Not.Signi     Signi 
##       404        24

NR5A2

## [1] "Number of data points: 444"
## 
## Not.Signi     Signi 
##       425        19

NR5A1

## [1] "Number of data points: 255"
## 
## Not.Signi     Signi 
##       241        14

LRRC15

## [1] "Number of data points: 181"
## 
## Not.Signi     Signi 
##       180         1

Gene Prioritization

Distribution of genes for significant traits grouped by domains

Count of significant traits

Enrichment test for domains

The distribution of genes in immunological domain

The distribution of genes in respiratory domain

The distribution of genes in environmental domain

The distribution of genes in skeletal domain

The distribution of genes in Dermatological domain

The distribution of genes in metabolic domain

Common Genes across six enriched domains

Common genes across all six domains: SLC44A4, APOA1, and RORA
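Finding the genes shared by all enriched domains amounts to a set intersection. The sketch below shows the idea in Python. The three shared genes come from the result above, but the other gene names (GENE_A to GENE_F) and the per-domain memberships are placeholders invented for illustration, not the report's data.

```python
def common_genes(domain_to_genes):
    """Return the genes present in every domain's gene set."""
    gene_sets = [set(genes) for genes in domain_to_genes.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)

# Placeholder memberships: only SLC44A4, APOA1 and RORA are from the report.
domains = {
    "immunological": {"SLC44A4", "APOA1", "RORA", "GENE_A"},
    "respiratory": {"SLC44A4", "APOA1", "RORA", "GENE_B"},
    "environmental": {"SLC44A4", "APOA1", "RORA", "GENE_C"},
    "skeletal": {"SLC44A4", "APOA1", "RORA", "GENE_D"},
    "dermatological": {"SLC44A4", "APOA1", "RORA", "GENE_E"},
    "metabolic": {"SLC44A4", "APOA1", "RORA", "GENE_F"},
}
shared = sorted(common_genes(domains))
print(shared)
```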

5. Characterization of SNPs

SNPs within ±10 kb of the gene positions, with population frequency (db153; hg19)

##   Variation.ID        dbSNP Chromosome  Position REF.Allele ALT.Allele..IUPAC.
## 1  rs760194105  rs760194105          1 151768549       ATTC                  -
## 2 rs1265893702 rs1265893702          1 151768556          C                  T
## 3 rs1195052699 rs1195052699          1 151768558          A                  G
## 4 rs1488141823 rs1488141823          1 151768564          G                  Y
## 5 rs1202366215 rs1202366215          1 151768566          T                  G
## 6  rs545985998  rs545985998          1 151768567          C                  G
##   Minor.Allele Minor.Allele.Global.Frequency     Contig Contig.Position  Band
## 1         None                          None GL000016.1         3257191 q21.3
## 2         None                          None GL000016.1         3257198 q21.3
## 3         None                          None GL000016.1         3257200 q21.3
## 4         None                          None GL000016.1         3257206 q21.3
## 5         None                          None GL000016.1         3257208 q21.3
## 6            G                      0.000200 GL000016.1         3257209 q21.3

Download full length results here: Gene-coordinates.txt https://yaleedu-my.sharepoint.com/:f:/g/personal/gita_pathak_yale_edu/Emq-SyQupL9Nj8SkSYPuM04BlBebU0lIbwDU7lGR84bCzg?e=frnpjq

• Column Headers:
  o Variation ID: <dbsnp rs#>
  o dbSNP: link to dbSNP, if known
  o Chromosome: Variant mapped chromosome location
  o Position: Variant start position on chromosome
  o REF Allele: Reference allele
  o ALT Allele (IUPAC): Observed allele
  o Minor Allele: Minor allele observed in global population, if known
  o Minor Allele Frequency: Minor allele frequency observed in global population, if known
  o Contig: Variant mapped contig location
  o contigPosition: Variant start position on contig
  o Band: SNP cytogenetic location
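The ±10 kb selection can be thought of as a simple interval filter around each gene. The following Python sketch illustrates it under stated assumptions: the gene coordinates are hypothetical placeholders, the record layout is invented for the example, and the two SNP positions are taken from tables in this report only to make the example concrete.

```python
WINDOW_BP = 10_000  # flanking window on each side of the gene

def snps_in_window(snps, gene, window_bp=WINDOW_BP):
    """Return IDs of SNPs on the gene's chromosome within the flanking window."""
    lower = max(1, gene["start"] - window_bp)
    upper = gene["end"] + window_bp
    return [
        s["id"]
        for s in snps
        if s["chrom"] == gene["chrom"] and lower <= s["pos"] <= upper
    ]

# Hypothetical gene interval (not a real annotation).
gene = {"chrom": "1", "start": 151_778_000, "end": 151_800_000}
snps = [
    {"id": "rs760194105", "chrom": "1", "pos": 151_768_549},  # within 10 kb upstream
    {"id": "snp_far", "chrom": "1", "pos": 151_700_000},      # hypothetical, too far
    {"id": "rs543482228", "chrom": "11", "pos": 522_483},     # different chromosome
]
selected = snps_in_window(snps, gene)
print(selected)
```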

Nearest gene annotation

##   Variation.ID Chromosome  Position Overlapped.Gene Type Annotation
## 1         <NA>       <NA>        NA            <NA> <NA>       <NA>
## 2  rs760194105       chr1 151768549            None None       None
## 3 rs1265893702       chr1 151768556            None None       None
## 4 rs1195052699       chr1 151768558            None None       None
## 5 rs1488141823       chr1 151768564            None None       None
## 6 rs1202366215       chr1 151768566            None None       None
##   Nearest.Upstream.Gene Type.of.Nearest.Upstream.Gene
## 1                  <NA>                          <NA>
## 2          RP11-98D18.9                     antisense
## 3          RP11-98D18.9                     antisense
## 4          RP11-98D18.9                     antisense
## 5          RP11-98D18.9                     antisense
## 6          RP11-98D18.9                     antisense
##   Distance.to.Nearest.Upstream.Gene Nearest.Downstream.Gene
## 1                              <NA>                    <NA>
## 2                              1671           RP11-98D18.17
## 3                              1678           RP11-98D18.17
## 4                              1680           RP11-98D18.17
## 5                              1686           RP11-98D18.17
## 6                              1688           RP11-98D18.17
##   Type.of.Nearest.Downstream.Gene Distance.to.Nearest.Downstream.Gene
## 1                            <NA>                                <NA>
## 2                         lincRNA                                1981
## 3                         lincRNA                                1974
## 4                         lincRNA                                1972
## 5                         lincRNA                                1966
## 6                         lincRNA                                1964

Download full length results here: nearestgene_annotation.txt https://yaleedu-my.sharepoint.com/:f:/g/personal/gita_pathak_yale_edu/Emq-SyQupL9Nj8SkSYPuM04BlBebU0lIbwDU7lGR84bCzg?e=frnpjq

• Column Headers:
  o Variation ID: <dbsnp rs#>
  o Chromosome: Variant mapped chromosome location
  o Position: Variant start position on chromosome
  o Overlapped Gene: Name of the gene (HGNC system) that the variant overlaps
  o Type: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
  o Annotation: Summary of whether the variant overlaps the coding, intronic or untranslated regions of the various transcript isoforms of the gene, as annotated from the Ensembl gene system.
  o Nearest Upstream Gene: If the variant does not overlap any gene, the gene whose end position is nearest to the variant on the left (considering the alignment of genes on the positive strand as left-to-right)
  o Type of Nearest Upstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
  o Distance to Nearest Upstream Gene: Distance from the end position of the nearest upstream gene.
  o Nearest Downstream Gene: If the variant does not overlap any gene, the gene whose start position is nearest to the variant on the right (considering the alignment of genes on the positive strand as left-to-right)
  o Type of Nearest Downstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
  o Distance to Nearest Downstream Gene: Distance from the start position of the nearest downstream gene.
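The upstream and downstream definitions above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the annotation tool's code. The end position of RP11-98D18.9 and the start position of RP11-98D18.17 are back-calculated from the distances reported in the table for rs760194105; the other two gene boundaries are hypothetical placeholders.

```python
def nearest_flanking_genes(position, genes):
    """genes: list of (name, start, end) tuples on the variant's chromosome."""
    overlapped = [name for name, start, end in genes if start <= position <= end]
    if overlapped:
        return {"overlapped": overlapped, "upstream": None, "downstream": None}
    # Upstream: gene ending closest to the left of the variant.
    left = [g for g in genes if g[2] < position]
    # Downstream: gene starting closest to the right of the variant.
    right = [g for g in genes if g[1] > position]
    up = max(left, key=lambda g: g[2], default=None)
    down = min(right, key=lambda g: g[1], default=None)
    return {
        "overlapped": [],
        "upstream": (up[0], position - up[2]) if up else None,
        "downstream": (down[0], down[1] - position) if down else None,
    }

genes = [
    ("RP11-98D18.9", 151_760_000, 151_766_878),   # start is a placeholder
    ("RP11-98D18.17", 151_770_530, 151_775_000),  # end is a placeholder
]
result = nearest_flanking_genes(151_768_549, genes)
print(result)
```

With these inputs the sketch reproduces the distances listed for rs760194105 in the table (1671 upstream, 1981 downstream).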

CADD Scores

##   Variation.ID Chromosome  Position Variant PHRED
## 1 rs1265893702       chr1 151768556     C/T 13.53
## 2 rs1195052699       chr1 151768558     A/G 12.55
## 3 rs1309532353       chr1 151768828     T/C 15.02
## 4  rs946376285       chr1 151768833     G/A 14.38
## 5 rs1044802297       chr1 151768837     C/T 15.29
## 6  rs903034465       chr1 151768839     G/C 16.03

Download annotation for CADD PHRED score > 10 here: SNPswithpathogenicCADD-annotation.txt https://yaleedu-my.sharepoint.com/:f:/g/personal/gita_pathak_yale_edu/Emq-SyQupL9Nj8SkSYPuM04BlBebU0lIbwDU7lGR84bCzg?e=frnpjq

• Column Headers:
  o Variation ID: <dbsnp rs#>
  o Chromosome: Chromosome name
  o Position: Variant start position on chromosome
  o Variant: <reference allele,“/”,observed allele> as reported in the tool’s genome-wide score
  o PHRED: PHRED-like (-10*log10(rank/total)) scaled CADD score ranking a variant relative to all possible substitutions of the human genome. A score ≥ 10 indicates that the variant is predicted to be among the 10% most deleterious substitutions possible in the human genome, a score ≥ 20 indicates the 1% most deleterious, and so on.
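To make the PHRED-like scaling concrete, the short Python sketch below applies the formula quoted above, -10*log10(rank/total), and its inverse. The function names are hypothetical and chosen for this example; the sketch only illustrates the arithmetic and does not compute CADD scores.

```python
import math

def phred_scaled(rank, total):
    """PHRED-like score for a variant ranked `rank` out of `total` (rank 1 = most deleterious)."""
    return -10 * math.log10(rank / total)

def top_fraction(phred):
    """Fraction of all substitutions ranked at least as deleterious as this score."""
    return 10 ** (-phred / 10)

score_top_10_percent = phred_scaled(1, 10)    # a variant at the 10% boundary
fraction_at_20 = top_fraction(20)             # a score of 20 maps to the top 1%
fraction_for_rs903034465 = top_fraction(16.03)  # PHRED value from the table above
print(score_top_10_percent, fraction_at_20, fraction_for_rs903034465)
```

For example, the PHRED value of 16.03 listed for rs903034465 corresponds to roughly the top 2.5% of substitutions.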

DeepSEA

##   Variation.ID Chromosome Position Variant Functional.Significance.Score
## 1  rs543482228      chr11   522483     C/T                       0.55310
## 2  rs763218255      chr11   522724     A/T                       0.52076
## 3 rs1381432050      chr11   522725     A/T                       0.53188
## 4 rs1048970710      chr11   522953     T/A                       0.50105
## 5 rs1030030831      chr11   523050     A/G                       0.53433
## 6  rs987669684      chr11   525258     G/A                       0.62874
##   eQTL.Probability GWAS.Probability HGMD.Probability
## 1          0.35134          0.27441          0.36835
## 2          0.37491          0.29143          0.36774
## 3          0.42790          0.30231          0.37801
## 4          0.34974          0.27633          0.37562
## 5          0.41663          0.31488          0.37194
## 6          0.35291          0.24213          0.36358

Download annotation for functional score > 0.5 : SNPsDeepScoreTOP-annotation.txt https://yaleedu-my.sharepoint.com/:f:/g/personal/gita_pathak_yale_edu/Emq-SyQupL9Nj8SkSYPuM04BlBebU0lIbwDU7lGR84bCzg?e=frnpjq

• Column Headers:
  o Variation ID: <dbsnp rs#>
  o Chromosome: Chromosome name
  o Position: Variant start position in the chromosome
  o Variant: <reference allele,“/”,observed allele> as reported in the tool’s genome-wide score
  o eQTL Probability: The probability of the variant being an eQTL variant, given by the functional variant prioritization classifier.
  o GWAS Probability: The probability of the variant being a trait-associated (GWAS) variant, given by the functional variant prioritization classifier.
  o HGMD Probability: The probability of the variant being an inherited disease-associated (HGMD) variant, given by the functional variant prioritization classifier.
  o Functional Significance Score: A measure in the range [0-1] depicting the significance of the magnitude of the predicted chromatin effect and evolutionary conservation. A lower score indicates a higher likelihood of functional significance of the variant.

Enrichment of cluster, disease, function using miRNA targets from SNPs

Cluster

Function

Disease (152 significant observations; top 50 shown in figure)

Top 50 FDR-significant associations are shown in the bar graph and all significant associations are shown in the table listed under the figure

6. Neanderthal LA in our COVID-19 ACE2 network SNP set

We compared the mean probability of Neanderthal local ancestry (LA) between the ACE2 network SNP set (mean = 0.032) and 1,000 randomly selected SNP sets with comparable genomic features (range of Neanderthal LA means = 0.027-0.036). The ACE2 network SNP set had significantly greater Neanderthal LA probabilities than 663 of the 1,000 randomly selected SNP sets.
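A simplified view of this comparison is to count how many random SNP sets have a lower mean than the observed set. The Python sketch below does only that. It does not reproduce whichever significance test was applied per random set in the original analysis, and the list of random-set means is a hypothetical placeholder chosen within the reported range.

```python
def count_sets_below(observed_mean, random_set_means):
    """Count random SNP sets whose mean LA probability is below the observed mean."""
    below = sum(1 for m in random_set_means if m < observed_mean)
    return below, len(random_set_means)

observed = 0.032  # mean reported for the ACE2 network SNP set
# Hypothetical means for five random sets (the report used 1,000 sets).
random_means = [0.027, 0.029, 0.031, 0.033, 0.036]

below, total = count_sets_below(observed, random_means)
print(f"{below}/{total} random sets have a lower mean")
```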

Neanderthal LA in SNPs COVID-19 ACE2 network against randomly selected SNPs

7. SNPs from the COVID19-HGI initiative (Freeze 3) for all six phenotypes

LD-pruned and p-value-clumped network SNPs from the COVID19-HGI initiative (Freeze 3) for all six phenotypes: https://www.covid19hg.org/results/
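For readers unfamiliar with p-value clumping, the sketch below shows the general greedy idea in Python: SNPs are visited from smallest to largest p-value, each unassigned SNP becomes an index SNP, and nearby SNPs in LD with it are folded into its clump. The r2 threshold, window size, SNP records, and pairwise r2 values are all hypothetical; the report does not state the parameters or software used, so this is not a reconstruction of the authors' pipeline.

```python
def clump(snps, r2, r2_threshold=0.1, window_bp=250_000):
    """Greedy p-value clumping; returns the IDs of the index SNPs.

    snps: list of dicts with keys id, chrom, pos, p.
    r2: dict mapping frozenset({id_a, id_b}) to pairwise LD (missing pairs = 0).
    """
    ordered = sorted(snps, key=lambda s: s["p"])
    assigned = set()
    index_ids = []
    for lead in ordered:
        if lead["id"] in assigned:
            continue  # already folded into a stronger SNP's clump
        assigned.add(lead["id"])
        index_ids.append(lead["id"])
        for other in ordered:
            if other["id"] in assigned or other["chrom"] != lead["chrom"]:
                continue
            if abs(other["pos"] - lead["pos"]) > window_bp:
                continue
            if r2.get(frozenset((lead["id"], other["id"])), 0.0) >= r2_threshold:
                assigned.add(other["id"])
    return index_ids

# Hypothetical SNPs and LD values.
snps = [
    {"id": "snpA", "chrom": "1", "pos": 100, "p": 1e-6},
    {"id": "snpB", "chrom": "1", "pos": 5_000, "p": 1e-4},
    {"id": "snpC", "chrom": "1", "pos": 9_000, "p": 1e-3},
]
r2 = {
    frozenset(("snpA", "snpB")): 0.8,
    frozenset(("snpA", "snpC")): 0.05,
    frozenset(("snpB", "snpC")): 0.9,
}
index_snps = clump(snps, r2)
print(index_snps)
```

In this toy example snpB is absorbed by snpA, while snpC stays as its own index SNP because its LD with snpA is below the threshold.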

A2_V2 Phenotype (very severe respiratory confirmed covid vs. population; Population Total Cases - 536 | Total Controls - 329391)

Manhattan Plot of GWAS for A2_V2

SNPs from the network, LD pruned and p-value clumped

LD pruned SNPs from the network, and their annotation

CADD Scores

The association of SNPs as QTLs for expression, methylation

GTEX-eQTLs

mQTLs (cis and trans)

B1_V2 Phenotype (hospitalized covid vs. not hospitalized covid; Population Total Cases - 928 | Total Controls - 2028)

Manhattan Plot of GWAS for B1_V2

SNPs from the network, LD pruned and p-value clumped

LD pruned SNPs from the network, and their annotation

CADD Scores

The association of SNPs as QTLs for expression, methylation

GTEX-eQTLs

mQTLs (cis and trans)

B2_V2 Phenotype (hospitalized covid vs. population; Population Total Cases - 3199 | Total Controls - 897488)

Manhattan Plot of GWAS for B2_V2

SNPs from the network, LD pruned and p-value clumped

LD pruned SNPs from the network, and their annotation

CADD Scores

The association of SNPs as QTLs for expression, methylation

GTEX-eQTLs (cis)

mQTLs (cis and trans)

C1_V2 Phenotype (covid vs. lab/self-reported negative; Population Total Cases - 3523 | Total Controls - 36634)

Manhattan Plot of GWAS for C1_V2

SNPs from the network, LD pruned and p-value clumped

LD pruned SNPs from the network, and their annotation

CADD Scores

The association of SNPs as QTLs for expression, methylation, protein, histone, splicing and miRNA

GTEX-eQTLs

mQTLs (cis and trans)

C2_V2 Phenotype (covid vs. population; Population Total Cases - 6696 | Total Controls - 1073072)

Manhattan Plot of GWAS for C2_V2

SNPs from the network, LD pruned and p-value clumped

LD pruned SNPs from the network, and their annotation

CADD Scores

GTEX-eQTLs

mQTLs (cis and trans)

D1_V2 Phenotype (predicted covid from self-reported symptoms vs. predicted or self-reported non-covid; Population Total Cases - 1865 | Total Controls - 29174)

Manhattan Plot of GWAS for D1_V2

SNPs from the network, LD pruned and p-value clumped

LD pruned SNPs from the network, and their annotation

CADD Scores

The association of SNPs as QTLs for expression, methylation

GTEX-eQTLs

No eQTLs

mQTLs (cis and trans)

No mQTLs

Overlap of significant SNPs across the six phenotypes

Citation

Pathak et al. (2020). ACE2 Netlas: In-silico functional characterization and drug-gene interactions of ACE2 gene network and its potential involvement in COVID-19 susceptibility

Gita A Pathak, Frank R Wendt, Aranyak Goswami, Flavio De Angelis, COVID-19 Host Genetics Initiative, Renato Polimanti

Affiliation: Yale School of Medicine, Department of Psychiatry, Division of Human Genetics, New Haven, CT; Veterans Affairs Connecticut Healthcare System, West Haven, CT

Corresponding authors:
* Renato Polimanti -
* Gita Pathak -