Research News


New Data Representation of the Nucleotide Sequences to Which Transcription Factors Bind

Image by majcot/Shutterstock

Researchers at the University of Tsukuba have developed MOCCS profiles, a novel data representation of the nucleotide sequences bound by the transcription factors that govern human gene expression. They also showed that about half of the transcription factors examined have binding sequences specific to particular cell types. Furthermore, this dataset makes it possible to assess how genetic mutations affect the DNA binding of transcription factors.

Tsukuba, Japan—The diverse characteristics of the human body's various cells are reflected in their gene expression patterns. The regulation of such gene expression is based on transcription factors that bind to specific sequences in the genome. The elucidation of unique transcription factor-binding sequences for each cell type is pivotal in unraveling the regulatory mechanisms governing gene expression within these cell types. Nonetheless, a comprehensive understanding of transcription factor-binding sequences, encompassing commonalities and variations across transcription factor types and cell types, remains elusive.

Using binding-site data for numerous human transcription factors, the research team developed "MOCCS profiles", a novel data representation for transcription factor-binding sequences, and analyzed binding sequences across various transcription factors and cell types. Their findings reveal that approximately half of the examined transcription factors have distinct binding sequences in specific cell types. Moreover, using the MOCCS profiles, the researchers developed an index that predicts the effect of single nucleotide polymorphisms (SNPs) on the DNA binding of transcription factors. They showed that the effect of disease-related SNPs on transcription factor binding can be appropriately evaluated in a manner specific to each transcription factor and cell type.

The MOCCS profiles have potential for a range of applications, such as deciphering cell type-specific mechanisms of gene expression regulation in combination with epigenomic data, and evaluating how somatic mutations in cancer cells affect the binding of transcription factors.

This work was supported by JSPS KAKENHI (grant numbers JP19K24361 and JP20K19915). H.O. was supported by JSPS KAKENHI (grant numbers JP19H03696, JP19K20394, and 22K17992) and the AMED Moonshot Research and Development Program (A3I03313).

Original Paper

Title of original paper:
Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans
BMC Genomics


Associate Professor OZAKI Haruka
Institute of Medicine, University of Tsukuba

Related Links

Institute of Medicine
Center for Artificial Intelligence Research

Celebrating the 151st 50th Anniversary of the University of Tsukuba