VenusX: Unlocking Fine-Grained Functional Understanding of Proteins

Yang Tan, Wenrui Gou, Bozitao Zhong, Huiqun Yu, Liang Hong, Bingxin Zhou

Residue-level Binary Classification

Cross-family

Performance on Cross-family splits (Out-of-distribution). Higher is better.

Act: Active Sites | BindI: Binding Sites | Evo: Conserved Sites | Motif: Functional Motif | Dom: Functional Domain

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
SaProt (AF_650M) | Seq-Structure | 0.185 | 0.241 | 0.072 | 0.110 | 0.538
Ankh (Base) | Sequence-only | 0.166 | 0.190 | 0.025 | 0.045 | 0.507
ProtSSN (k20_h512) | Seq-Structure | 0.156 | 0.241 | 0.014 | 0.026 | 0.498
ESM2 (t30) | Sequence-only | 0.143 | 0.278 | 0.060 | 0.098 | 0.533
ProtBert | Sequence-only | 0.131 | 0.131 | 0.020 | 0.035 | 0.501
ESM2 (t33) | Sequence-only | 0.143 | 0.126 | 0.031 | 0.050 | 0.507
SaProt (AF_35M) | Seq-Structure | 0.114 | 0.132 | 0.036 | 0.056 | 0.510
GVP-GNN | Structure-only | 0.101 | 0.019 | 0.001 | 0.002 | 0.485

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
SaProt (AF_35M) | Seq-Structure | 0.230 | 0.634 | 0.135 | 0.223 | 0.599
SaProt (AF_650M) | Seq-Structure | 0.182 | 0.661 | 0.135 | 0.224 | 0.600
ProtSSN (k20_h512) | Seq-Structure | 0.095 | 0.379 | 0.029 | 0.053 | 0.514
ProtBert | Sequence-only | 0.112 | 0.416 | 0.048 | 0.086 | 0.530
Ankh (Base) | Sequence-only | 0.145 | 0.437 | 0.086 | 0.144 | 0.559
ESM2 (t30) | Sequence-only | 0.133 | 0.525 | 0.078 | 0.136 | 0.556
ESM2 (t33) | Sequence-only | 0.159 | 0.581 | 0.108 | 0.181 | 0.579
GVP-GNN | Structure-only | 0.040 | 0.000 | 0.000 | 0.000 | 0.488

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
Ankh (Base) | Sequence-only | 0.275 | 0.387 | 0.169 | 0.235 | 0.595
SaProt (AF_650M) | Seq-Structure | 0.274 | 0.456 | 0.111 | 0.178 | 0.568
SaProt (AF_35M) | Seq-Structure | 0.272 | 0.382 | 0.172 | 0.238 | 0.596
ProtBert | Sequence-only | 0.243 | 0.482 | 0.009 | 0.017 | 0.489
ESM2 (t33) | Sequence-only | 0.262 | 0.403 | 0.122 | 0.187 | 0.572
ESM2 (t30) | Sequence-only | 0.235 | 0.374 | 0.097 | 0.154 | 0.555
ProtSSN (k20_h512) | Seq-Structure | 0.227 | 0.452 | 0.034 | 0.062 | 0.511
GVP-GNN | Structure-only | 0.101 | 0.176 | 0.035 | 0.058 | 0.506

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
ProtBert | Sequence-only | 0.348 | 0.472 | 0.231 | 0.310 | 0.628
ESM2 (t33) | Sequence-only | 0.456 | 0.566 | 0.384 | 0.457 | 0.704
SaProt (AF_650M) | Seq-Structure | 0.441 | 0.504 | 0.350 | 0.414 | 0.680
ESM2 (t30) | Sequence-only | 0.433 | 0.510 | 0.432 | 0.467 | 0.707
SaProt (AF_35M) | Seq-Structure | 0.408 | 0.485 | 0.411 | 0.445 | 0.695
Ankh (Base) | Sequence-only | 0.394 | 0.499 | 0.303 | 0.377 | 0.662
ProtSSN (k20_h512) | Seq-Structure | 0.390 | 0.390 | 0.365 | 0.412 | 0.678
GVP-GNN | Structure-only | 0.329 | 0.329 | 0.453 | 0.399 | 0.661

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
SaProt (AF_650M) | Seq-Structure | 0.564 | 0.572 | 0.444 | 0.500 | 0.632
SaProt (AF_35M) | Seq-Structure | 0.525 | 0.548 | 0.349 | 0.427 | 0.594
ProtBert | Sequence-only | 0.508 | 0.588 | 0.138 | 0.223 | 0.501
ESM2 (t33) | Sequence-only | 0.506 | 0.530 | 0.367 | 0.433 | 0.593
ESM2 (t30) | Sequence-only | 0.470 | 0.496 | 0.360 | 0.417 | 0.578
GVP-GNN | Structure-only | 0.468 | 0.519 | 0.087 | 0.149 | 0.462
Ankh (Base) | Sequence-only | 0.449 | 0.494 | 0.280 | 0.357 | 0.552
ProtSSN (k20_h512) | Seq-Structure | - | - | - | - | -

Mixed-family

Performance on Mixed-family splits (In-distribution). Higher is better.

Act: Active Sites | BindI/BindB: Binding Sites | Evo: Conserved Sites | Motif: Functional Motif | Dom: Functional Domain | Epi: Epitope Sites

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
Ankh (Base) | Sequence-only | 0.873 | 0.862 | 0.700 | 0.773 | 0.883
ESM2 (t30) | Sequence-only | 0.855 | 0.826 | 0.676 | 0.744 | 0.868
ESM2 (t33) | Sequence-only | 0.852 | 0.845 | 0.682 | 0.755 | 0.874
ProtBert | Sequence-only | 0.764 | 0.791 | 0.565 | 0.659 | 0.825
SaProt (AF_650M) | Seq-Structure | 0.745 | 0.812 | 0.511 | 0.627 | 0.808
SaProt (AF_35M) | Seq-Structure | 0.688 | 0.818 | 0.408 | 0.544 | 0.767
GVP-GNN | Structure-only | 0.523 | 0.735 | 0.362 | 0.485 | 0.736
ProtSSN (k20_h512) | Seq-Structure | 0.465 | 0.523 | 0.209 | 0.329 | 0.658

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
ESM2 (t30) | Sequence-only | 0.912 | 0.859 | 0.859 | 0.859 | 0.926
Ankh (Base) | Sequence-only | 0.907 | 0.849 | 0.866 | 0.857 | 0.925
ESM2 (t33) | Sequence-only | 0.904 | 0.869 | 0.830 | 0.849 | 0.921
ProtBert | Sequence-only | 0.857 | 0.855 | 0.694 | 0.766 | 0.878
SaProt (AF_650M) | Seq-Structure | 0.838 | 0.827 | 0.768 | 0.796 | 0.893
SaProt (AF_35M) | Seq-Structure | 0.807 | 0.813 | 0.705 | 0.755 | 0.871
ProtSSN (k20_h512) | Seq-Structure | 0.801 | 0.818 | 0.705 | 0.757 | 0.873
GVP-GNN | Structure-only | 0.611 | 0.730 | 0.519 | 0.607 | 0.795

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
ESM2 (t33) | Sequence-only | 0.899 | 0.856 | 0.806 | 0.831 | 0.910
Ankh (Base) | Sequence-only | 0.895 | 0.882 | 0.735 | 0.802 | 0.896
ESM2 (t30) | Sequence-only | 0.862 | 0.816 | 0.783 | 0.799 | 0.894
ProtBert | Sequence-only | 0.771 | 0.805 | 0.610 | 0.694 | 0.839
SaProt (AF_650M) | Seq-Structure | 0.734 | 0.809 | 0.554 | 0.658 | 0.820
SaProt (AF_35M) | Seq-Structure | 0.724 | 0.819 | 0.520 | 0.636 | 0.809
ProtSSN (k20_h512) | Seq-Structure | 0.715 | 0.790 | 0.507 | 0.618 | 0.800
GVP-GNN | Structure-only | 0.342 | 0.810 | 0.091 | 0.164 | 0.569

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
Ankh (Base) | Sequence-only | 0.884 | 0.846 | 0.789 | 0.817 | 0.895
ESM2 (t33) | Sequence-only | 0.874 | 0.851 | 0.748 | 0.796 | 0.884
ESM2 (t30) | Sequence-only | 0.855 | 0.824 | 0.775 | 0.799 | 0.885
SaProt (AF_650M) | Seq-Structure | 0.802 | 0.841 | 0.615 | 0.710 | 0.837
ProtBert | Sequence-only | 0.779 | 0.784 | 0.678 | 0.727 | 0.845
SaProt (AF_35M) | Seq-Structure | 0.767 | 0.821 | 0.582 | 0.681 | 0.821
ProtSSN (k20_h512) | Seq-Structure | 0.716 | 0.772 | 0.550 | 0.642 | 0.799
GVP-GNN | Structure-only | 0.661 | 0.748 | 0.525 | 0.618 | 0.786

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
Ankh (Base) | Sequence-only | 0.673 | 0.674 | 0.467 | 0.552 | 0.700
ESM2 (t33) | Sequence-only | 0.666 | 0.661 | 0.467 | 0.547 | 0.696
SaProt (AF_650M) | Seq-Structure | 0.642 | 0.635 | 0.472 | 0.542 | 0.689
ESM2 (t30) | Sequence-only | 0.634 | 0.648 | 0.433 | 0.519 | 0.679
ProtBert | Sequence-only | 0.591 | 0.636 | 0.353 | 0.454 | 0.644
SaProt (AF_35M) | Seq-Structure | 0.574 | 0.632 | 0.322 | 0.427 | 0.629
GVP-GNN | Structure-only | 0.560 | 0.591 | 0.344 | 0.435 | 0.636
ProtSSN (k20_h512) | Seq-Structure | - | - | - | - | -

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
ESM2 (t33) | Sequence-only | 0.446 | 0.605 | 0.329 | 0.427 | 0.707
Ankh (Base) | Sequence-only | 0.421 | 0.634 | 0.260 | 0.369 | 0.678
ESM2 (t30) | Sequence-only | 0.408 | 0.598 | 0.289 | 0.390 | 0.689
ProtBert | Sequence-only | 0.340 | 0.547 | 0.238 | 0.332 | 0.659

Model | Type | AUPR | Precision | Recall | F1-Positive | Macro-F1
ESM2 (t30) | Sequence-only | 0.186 | 1.000 | 0.001 | 0.002 | 0.480
ESM2 (t33) | Sequence-only | 0.174 | 0.000 | 0.000 | 0.000 | 0.479
ProtBert | Sequence-only | 0.169 | 1.000 | 0.001 | 0.002 | 0.480
Ankh (Base) | Sequence-only | 0.167 | 0.000 | 0.000 | 0.000 | 0.479
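
The residue-level tables report Precision, Recall, F1-Positive, and Macro-F1 side by side. As a reading aid, the sketch below shows one common way these four numbers relate for binary residue labels. It assumes hard 0/1 predictions, that Macro-F1 is the unweighted mean of the positive-class and negative-class F1 scores, and that undefined ratios are reported as 0; the leaderboard's exact evaluation code is not shown here, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, positive-class F1 and macro-F1 for 0/1 labels.

    Assumption: macro-F1 is the plain average of the F1 of class 1 and
    the F1 of class 0, and undefined ratios are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def f1(hit, false_alarm, miss):
        denom = 2 * hit + false_alarm + miss
        return 2 * hit / denom if denom else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_positive = f1(tp, fp, fn)
    # For the negative class the roles of false positives and
    # false negatives are swapped.
    f1_negative = f1(tn, fn, fp)
    return {
        "precision": precision,
        "recall": recall,
        "f1_positive": f1_positive,
        "macro_f1": (f1_positive + f1_negative) / 2,
    }


# Six residues, two of them functional; one hit, one miss, one false alarm.
example = binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])

# A predictor that never outputs the positive class.
all_negative = binary_metrics([1, 0, 0, 0], [0, 0, 0, 0])
```

Under these assumptions, a model that predicts almost no positive residues still scores a Macro-F1 close to 0.5 when negatives dominate, because the negative-class F1 stays near 1. That would be consistent with rows above where F1-Positive is 0.000 while Macro-F1 is around 0.48.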

Fragment-level Multi-class Classification

MF50

Performance on InterPro datasets (MF50 split). Higher is better.

Model | Type | Accuracy | Precision | Recall | Macro-F1 | MCC
SaProt (AF_650M) | Seq-Structure | 0.928 | 0.830 | 0.830 | 0.825 | 0.926
GVP-GNN | Structure-only | 0.907 | 0.826 | 0.833 | 0.822 | 0.906
SaProt (AF_35M) | Seq-Structure | 0.928 | 0.810 | 0.823 | 0.807 | 0.926
ProtSSN (k20_h512) | Seq-Structure | 0.891 | 0.773 | 0.774 | 0.764 | 0.889
Ankh (Base) | Sequence-only | 0.824 | 0.661 | 0.665 | 0.647 | 0.821
ESM2 (t30) | Sequence-only | 0.819 | 0.659 | 0.670 | 0.647 | 0.815
ESM2 (t33) | Sequence-only | 0.814 | 0.603 | 0.634 | 0.605 | 0.810
ProtBert | Sequence-only | 0.736 | 0.618 | 0.636 | 0.609 | 0.731

Model | Type | Accuracy | Precision | Recall | Macro-F1 | MCC
SaProt (AF_650M) | Seq-Structure | 0.986 | 0.968 | 0.956 | 0.957 | 0.984
SaProt (AF_35M) | Seq-Structure | 0.976 | 0.943 | 0.929 | 0.931 | 0.971
GVP-GNN | Structure-only | 0.972 | 0.901 | 0.882 | 0.884 | 0.967
ProtSSN (k20_h512) | Seq-Structure | 0.972 | 0.940 | 0.948 | 0.931 | 0.967
ESM2 (t30) | Sequence-only | 0.937 | 0.834 | 0.819 | 0.809 | 0.926
ESM2 (t33) | Sequence-only | 0.934 | 0.755 | 0.775 | 0.753 | 0.922
ProtBert | Sequence-only | 0.927 | 0.838 | 0.794 | 0.790 | 0.914
Ankh (Base) | Sequence-only | 0.920 | 0.733 | 0.732 | 0.718 | 0.906

Model | Type | Accuracy | Precision | Recall | Macro-F1 | MCC
SaProt (AF_650M) | Seq-Structure | 0.950 | 0.868 | 0.875 | 0.863 | 0.950
SaProt (AF_35M) | Seq-Structure | 0.939 | 0.857 | 0.858 | 0.849 | 0.938
ProtSSN (k20_h512) | Seq-Structure | 0.915 | 0.804 | 0.807 | 0.793 | 0.915
GVP-GNN | Structure-only | 0.914 | 0.763 | 0.768 | 0.757 | 0.913
Ankh (Base) | Sequence-only | 0.866 | 0.727 | 0.729 | 0.716 | 0.865
ESM2 (t30) | Sequence-only | 0.853 | 0.681 | 0.684 | 0.667 | 0.852
ESM2 (t33) | Sequence-only | 0.841 | 0.682 | 0.682 | 0.669 | 0.840
ProtBert | Sequence-only | 0.828 | 0.644 | 0.646 | 0.627 | 0.827

Model | Type | Accuracy | Precision | Recall | Macro-F1 | MCC
ProtSSN (k20_h512) | Seq-Structure | 0.914 | 0.564 | 0.556 | 0.556 | 0.907
SaProt (AF_650M) | Seq-Structure | 0.927 | 0.546 | 0.562 | 0.552 | 0.921
ESM2 (t33) | Sequence-only | 0.906 | 0.547 | 0.543 | 0.542 | 0.898
SaProt (AF_35M) | Seq-Structure | 0.901 | 0.509 | 0.505 | 0.504 | 0.892
Ankh (Base) | Sequence-only | 0.901 | 0.508 | 0.501 | 0.499 | 0.892
ESM2 (t30) | Sequence-only | 0.884 | 0.458 | 0.461 | 0.457 | 0.875
ProtBert | Sequence-only | 0.884 | 0.455 | 0.458 | 0.452 | 0.875
GVP-GNN | Structure-only | 0.807 | 0.387 | 0.371 | 0.370 | 0.791
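
The fragment-level tables list MCC next to Accuracy. For readers unfamiliar with the multi-class form of the Matthews correlation coefficient, here is a minimal sketch of the standard multi-class generalization, computed from label counts. It is an illustration only: the leaderboard does not state which implementation it uses, and the function name `multiclass_mcc` is hypothetical.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from label counts.

    Uses the standard generalization:
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k the number of samples whose true class is k, and p_k the number of
    samples predicted as class k.
    """
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    spread_true = s * s - sum(v * v for v in t_counts.values())
    spread_pred = s * s - sum(v * v for v in p_counts.values())
    denom = sqrt(spread_true * spread_pred)
    # Convention assumed here: return 0.0 when the score is undefined,
    # e.g. when every prediction falls into a single class.
    return numerator / denom if denom else 0.0


perfect = multiclass_mcc([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
one_error = multiclass_mcc([0, 0, 1, 1], [0, 1, 1, 1])
constant = multiclass_mcc([0, 1, 2], [0, 0, 0])
```

Unlike accuracy, this score stays at 0 for a classifier that always outputs the same class, which makes it a useful companion metric when class sizes are uneven.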

Pairwise Functional Similarity Scoring

F50

Performance on fragment-level splits (F50). Metric is AUC (%). Higher is better.

Model | Type | AUC (%)
ESM-IF | Seq-Structure | 96.5
Foldseek (3Di-AA) | Alignment | 96.1
Foldseek (3Di) | Alignment | 96.0
SaProt (AF2_35M) | Seq-Structure | 95.8
TM-align | Alignment | 94.6
TM-VEC | Seq-Structure | 93.6
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 91.8
ProstT5 | Seq-Structure | 90.8
SaProt (PDB_650M) | Seq-Structure | 82.8
ProtSSN (k20_h512) | Seq-Structure | 79.1
ProtBert | Sequence-only | 71.4
Ankh (base) | Seq-Enc-Dec | 69.6
ESM2 (t30) | Sequence-only | 69.4
ESM-1B | Sequence-only | 67.6
MIF-ST | Seq-Structure | 65.9
ESM2 (t36) | Sequence-only | 65.8
BLAST | Alignment | 52.9
ESM2 (t33) | Sequence-only | 50.2

Model | Type | AUC (%)
ProstT5 | Seq-Structure | 99.5
TM-VEC | Seq-Structure | 98.6
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 98.5
SaProt (PDB_650M) | Seq-Structure | 98.1
ESM-IF | Seq-Structure | 95.0
SaProt (AF2_35M) | Seq-Structure | 94.3
Foldseek (3Di) | Alignment | 92.6
Foldseek (3Di-AA) | Alignment | 92.6
TM-align | Alignment | 90.1
Ankh (base) | Seq-Enc-Dec | 88.9
ProtSSN (k20_h512) | Seq-Structure | 88.4
MIF-ST | Seq-Structure | 86.1
ProtBert | Sequence-only | 84.9
ESM-1B | Sequence-only | 84.5
ESM2 (t30) | Sequence-only | 77.6
ESM2 (t33) | Sequence-only | 73.0
ESM2 (t36) | Sequence-only | 71.3
BLAST | Alignment | 52.4

Model | Type | AUC (%)
Foldseek (3Di-AA) | Alignment | 88.4
Foldseek (3Di) | Alignment | 88.3
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 71.0
TM-align | Alignment | 67.7
TM-VEC | Seq-Structure | 67.4
ESM2 (t36) | Sequence-only | 63.9
Ankh (base) | Seq-Enc-Dec | 63.9
SaProt (PDB_650M) | Seq-Structure | 62.6
SaProt (AF2_35M) | Seq-Structure | 61.9
ESM-IF | Seq-Structure | 61.3
MIF-ST | Seq-Structure | 61.3
ProtSSN (k20_h512) | Seq-Structure | 60.9
ESM-1B | Sequence-only | 57.0
ProstT5 | Seq-Structure | 55.6
ProtBert | Sequence-only | 54.6
BLAST | Alignment | 54.0
ESM2 (t30) | Sequence-only | 52.4
ESM2 (t33) | Sequence-only | 49.3

Model | Type | AUC (%)
TM-VEC | Seq-Structure | 99.4
SaProt (PDB_650M) | Seq-Structure | 98.9
ProstT5 | Seq-Structure | 98.5
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 98.2
ESM2 (t33) | Sequence-only | 92.1
ESM2 (t36) | Sequence-only | 90.1
ESM-1B | Sequence-only | 87.2
Ankh (base) | Seq-Enc-Dec | 86.7
SaProt (AF2_35M) | Seq-Structure | 85.3
ProtBert | Sequence-only | 85.1
ESM2 (t30) | Sequence-only | 84.3
ESM-IF | Seq-Structure | 80.4
TM-align | Alignment | 76.6
Foldseek (3Di) | Alignment | 74.8
Foldseek (3Di-AA) | Alignment | 74.7
ProtSSN (k20_h512) | Seq-Structure | 72.4
MIF-ST | Seq-Structure | 50.2
BLAST | Alignment | 49.9

Model | Type | AUC (%)
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 98.5
ProstT5 | Seq-Structure | 98.5
TM-VEC | Seq-Structure | 98.2
Ankh (base) | Seq-Enc-Dec | 97.6
ESM-IF | Seq-Structure | 97.1
SaProt (AF2_35M) | Seq-Structure | 96.0
SaProt (PDB_650M) | Seq-Structure | 91.7
ESM-1B | Sequence-only | 89.2
ProtBert | Sequence-only | 85.3
ProtSSN (k20_h512) | Seq-Structure | 82.9
MIF-ST | Seq-Structure | 78.6
ESM2 (t30) | Sequence-only | 78.0
ESM2 (t36) | Sequence-only | 66.5
ESM2 (t33) | Sequence-only | 62.2

P50

Performance on protein-level splits (P50). Metric is AUC (%). Higher is better.

Model | Type | AUC (%)
Foldseek (3Di) | Alignment | 96.5
Foldseek (3Di-AA) | Alignment | 96.5
Ankh (base) | Seq-Enc-Dec | 90.4
TM-VEC | Seq-Structure | 89.9
ProstT5 | Seq-Structure | 80.7
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 78.1
SaProt (AF2_35M) | Seq-Structure | 74.6
ESM-1B | Sequence-only | 73.8
ESM2 (t36) | Sequence-only | 72.9
BLAST | Alignment | 71.7
ESM-IF | Seq-Structure | 70.2
ESM2 (t33) | Sequence-only | 70.0
ESM2 (t30) | Sequence-only | 69.2
ProtBert | Sequence-only | 68.7
SaProt (PDB_650M) | Seq-Structure | 68.2
MIF-ST | Seq-Structure | 65.9
ProtSSN (k20_h512) | Seq-Structure | 64.8

Model | Type | AUC (%)
Ankh (base) | Seq-Enc-Dec | 91.8
TM-VEC | Seq-Structure | 82.4
Foldseek (3Di) | Alignment | 80.6
Foldseek (3Di-AA) | Alignment | 80.1
ProstT5 | Seq-Structure | 79.2
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 77.1
SaProt (AF2_35M) | Seq-Structure | 71.9
SaProt (PDB_650M) | Seq-Structure | 71.1
ESM-1B | Sequence-only | 69.8
ESM2 (t36) | Sequence-only | 67.6
ProtBert | Sequence-only | 66.8
ESM-IF | Seq-Structure | 65.6
ESM2 (t30) | Sequence-only | 65.5
ESM2 (t33) | Sequence-only | 62.3
ProtSSN (k20_h512) | Seq-Structure | 61.2
MIF-ST | Seq-Structure | 59.2
BLAST | Alignment | 51.1

Model | Type | AUC (%)
Foldseek (3Di) | Alignment | 99.0
Foldseek (3Di-AA) | Alignment | 99.0
Ankh (base) | Seq-Enc-Dec | 98.9
ProstT5 | Seq-Structure | 98.2
TM-VEC | Seq-Structure | 96.2
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 95.6
SaProt (PDB_650M) | Seq-Structure | 93.8
SaProt (AF2_35M) | Seq-Structure | 92.7
ESM2 (t36) | Sequence-only | 92.1
ESM-IF | Seq-Structure | 90.6
ESM2 (t33) | Sequence-only | 89.0
ESM-1B | Sequence-only | 88.4
ESM2 (t30) | Sequence-only | 87.5
ProtSSN (k20_h512) | Seq-Structure | 86.2
ProtBert | Sequence-only | 84.2
MIF-ST | Seq-Structure | 80.3

Model | Type | AUC (%)
TM-VEC | Seq-Structure | 71.7
ESM2 (t36) | Sequence-only | 70.0
ProstT5 | Seq-Structure | 69.8
Ankh (base) | Seq-Enc-Dec | 69.7
SaProt (PDB_650M) | Seq-Structure | 68.3
ProtBert | Sequence-only | 68.2
ESM2 (t30) | Sequence-only | 68.2
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 67.6
SaProt (AF2_35M) | Seq-Structure | 66.6
MIF-ST | Seq-Structure | 66.3
ESM2 (t33) | Sequence-only | 66.1
ESM-IF | Seq-Structure | 66.0
Foldseek (3Di) | Alignment | 64.9
Foldseek (3Di-AA) | Alignment | 64.7
ProtSSN (k20_h512) | Seq-Structure | 64.0
ESM-1B | Sequence-only | 58.4
BLAST | Alignment | 56.2

Model | Type | AUC (%)
Ankh (base) | Seq-Enc-Dec | 88.5
ProtT5 (xl-uniref50) | Seq-Enc-Dec | 85.1
ProstT5 | Seq-Structure | 79.3
SaProt (AF2_35M) | Seq-Structure | 78.8
ProtBert | Sequence-only | 77.9
ESM2 (t30) | Sequence-only | 77.4
SaProt (PDB_650M) | Seq-Structure | 76.1
ESM-1B | Sequence-only | 74.7
ESM-IF | Seq-Structure | 70.5
ProtSSN (k20_h512) | Seq-Structure | 69.4
ESM2 (t36) | Sequence-only | 66.7
MIF-ST | Seq-Structure | 66.7
ESM2 (t33) | Sequence-only | 66.4
TM-VEC | Seq-Structure | 59.9
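
The F50 and P50 tables both report AUC as a percentage. To make that number easier to interpret, the sketch below computes AUC directly from its rank-based definition: the fraction of (positive pair, negative pair) comparisons in which the positive pair gets the higher similarity score, with ties counted as half. It assumes each fragment pair already has a similarity score and a 0/1 label saying whether the two fragments share a function; how each model produces its scores is not covered here, and the function name `pairwise_auc_percent` is hypothetical.

```python
def pairwise_auc_percent(scores, labels):
    """AUC in percent from similarity scores and 0/1 same-function labels.

    Implements the rank-based (Mann-Whitney) definition: the share of
    positive/negative comparisons won by the positive item, ties = 0.5.
    Quadratic in the number of pairs, so intended only for illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return 100.0 * wins / (len(positives) * len(negatives))


# Three same-function pairs and two different-function pairs.
toy = pairwise_auc_percent([0.9, 0.8, 0.4, 0.7, 0.3], [1, 1, 1, 0, 0])

# A scorer that cannot separate the two groups at all.
uninformative = pairwise_auc_percent([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
```

Under this definition, a value near 50 means the scores rank same-function pairs no better than chance, and 100 means every same-function pair outranks every different-function pair.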