A High Efficient Biological Language Model for Predicting Protein-Protein Interactions
Wang, YB (Wang, Yanbin)[ 1,2 ]; You, ZH (You, Zhu-Hong)[ 1 ]; Yang, S (Yang, Shan)[ 1,2 ]; Li, X (Li, Xiao)[ 1 ]; Jiang, TH (Jiang, Tong-Hai)[ 1 ]; Zhou, X (Zhou, Xi)[ 1 ]
2019
Source Publication: CELLS
ISSN: 2073-4409
Volume: 8, Issue: 2, Pages: 1-12
Abstract

Many life activities and key functions in organisms are maintained by different types of protein-protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years due to limiting factors including both methodology and technology. Inspired by the similarity of biological sequences and languages, developing a biological language processing technology may provide a brand new theoretical perspective and feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein-protein interactions only using a protein sequence. The model was constructed based on a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolution neural network (CNN). The Bio2Vec obtains protein sequence features by using a "bio-word" segmentation system and a word representation model used for learning the distributed representation for each "bio-word". The Bio2Vec supplies a frame that allows researchers to consider the context information and implicit semantic information of a bio sequence. A remarkable improvement in PPIs prediction performance has been observed by using the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of "bio language processing technology," which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.

Keyword: protein-protein interactions; bio-language processing; sentencepiece; convolution neural network; unigram language model
DOI: 10.3390/cells8020122
Indexed By: SCI
WOS ID: WOS:000460896000044
Citation statistics
Cited Times: 3 [WOS]
Document Type: Journal article
Identifier: http://ir.xjipc.cas.cn/handle/365002/5733
Collection: Multilingual Information Technology Research Laboratory (多语种信息技术研究室)
Corresponding Author: You, ZH (You, Zhu-Hong)[ 1 ]
Affiliation: 1. Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended Citation
GB/T 7714: Wang, YB, You, ZH, Yang, S, et al. A High Efficient Biological Language Model for Predicting Protein-Protein Interactions[J]. CELLS, 2019, 8(2): 1-12.
APA: Wang, YB, You, ZH, Yang, S, Li, X, Jiang, TH, & Zhou, X. (2019). A High Efficient Biological Language Model for Predicting Protein-Protein Interactions. CELLS, 8(2), 1-12.
MLA: Wang, YB, et al. "A High Efficient Biological Language Model for Predicting Protein-Protein Interactions". CELLS 8.2 (2019): 1-12.
Files in This Item:
File Name/Size: A High Efficient Bio (1624KB)
DocType: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: A High Efficient Biological Language Model for Predicting Protein-Protein Interactions.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.