Name

Michiaki Hamada (浜田 道昭)

Position

Professor

Affiliation: Faculty of Science and Engineering

(School of Advanced Science and Engineering)

Contact

Email address
mhamada@waseda.jp

URLs

Web page URL

http://www.f.waseda.jp/mhamada/ (Michiaki Hamada (Bioinformatics) Laboratory)

https://sites.google.com/site/michiakihamada/ (personal)

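Other affiliations within the university

Concurrent appointment

Faculty of Science and Engineering (Graduate School of Advanced Science and Engineering)

University research institutes

Institute for Structural Biology and Drug Discovery (構造生物・創薬研究所)

Institute member, since 2015

Human Performance Laboratory (ヒューマンパフォーマンス研究所)

Institute member, since 2017

Waseda Bioscience Research Institute in Singapore (早稲田バイオサイエンスシンガポール研究所)

Institute member, since 2017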

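Education and degrees

Education

April 2000 - March 2002: Tohoku University, Graduate School of Science, Department of Mathematics

Degree

Doctor of Science (by coursework), Tokyo Institute of Technology, Life / Health / Medical Informatics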

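Career

April 2002 - September 2004: Fuji Research Institute Corporation (now Mizuho Information & Research Institute, Inc.), Researcher
October 2004 - July 2006: Mizuho Information & Research Institute, Inc. (following the company name change), Researcher
July 2006 - September 2010: Mizuho Information & Research Institute, Inc., Consultant
October 2010 - March 2014: The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology, Project Associate Professor
April 2014 - March 2018: Waseda University, Faculty of Science and Engineering, Graduate School of Advanced Science and Engineering, Department of Electrical Engineering and Bioscience, Associate Professor
April 2018 - present: Waseda University, Faculty of Science and Engineering, Graduate School of Advanced Science and Engineering, Department of Electrical Engineering and Bioscience, Professor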

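Academic society memberships

Japanese Society for Bioinformatics, Director

The Molecular Biology Society of Japan

The RNA Society of Japan

Japanese Cancer Association

Committee and board service (outside the university)

April 2014 - present: Japanese Society for Bioinformatics, Director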

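Awards

The Young Scientists' Prize, Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, FY2017 (Heisei 29)

April 2017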

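Research fields

Keywords

Bioinformatics; genome science; data mining; machine learning; artificial intelligence

KAKENHI (Grants-in-Aid for Scientific Research) classification

Informatics / Frontiers of informatics / Life, health and medical informatics

Papers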

Mining frequent stem patterns from unaligned RNA sequences.

Hamada Michiaki;Tsuda Koji;Kudo Taku;Kin Taishin;Asai Kiyoshi

Bioinformatics (Oxford, England) 22(20), 2006

ISSN: 1367-4811

Abstract: MOTIVATION: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method performed favorably both in accuracy and in efficiency with the state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.

Prediction of RNA secondary structure using generalized centroid estimators.

Hamada Michiaki;Kiryu Hisanori;Sato Kengo;Mituyama Toutai;Asai Kiyoshi

Bioinformatics (Oxford, England) 25(4), 2009

ISSN: 1367-4811

Abstract: MOTIVATION: Recent studies have shown that the methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities have an advantage with respect to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures. RESULTS: We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics. AVAILABILITY: Supporting information and the CentroidFold software are available online at: http://www.ncrna.org/software/centroidfold/.
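
Illustrative note (not part of the published abstract): the objective described above, a weighted sum of expected true positives and expected true negatives of base pairs, can be sketched as a small dynamic program. With base-pairing probabilities p[i][j] and weight gamma, the expected gain of a structure equals, up to a constant, the sum of (gamma + 1) * p[i][j] - 1 over its base pairs, so only pairs with p[i][j] > 1 / (gamma + 1) can help. The sketch below is a minimal Nussinov-style recursion under stated assumptions: the function name `gamma_centroid_fold` is hypothetical, the probability matrix is assumed to be precomputed by some model, pseudoknots are excluded, and no minimum hairpin-loop length is enforced. It is not the CentroidFold implementation.

```python
from typing import List, Tuple


def gamma_centroid_fold(p: List[List[float]], gamma: float = 1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (score, base pairs) maximizing sum of (gamma + 1) * p[i][j] - 1
    over nested (pseudoknot-free) pairs. p is read only for i < j.
    Simplifying assumption: no minimum hairpin-loop length."""
    n = len(p)
    if n == 0:
        return 0.0, []

    def gain(i: int, j: int) -> float:
        # Contribution of pairing i with j to the expected weighted TP + TN.
        return (gamma + 1.0) * p[i][j] - 1.0

    # M[i][j] = best score on the subsequence i..j; entries with i >= j stay 0.
    M = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(M[i + 1][j], M[i][j - 1], M[i + 1][j - 1] + gain(i, j))
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, M[i][k] + M[k + 1][j])
            M[i][j] = best

    # Traceback with an explicit stack to recover one optimal structure.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if M[i][j] == M[i + 1][j]:
            stack.append((i + 1, j))
        elif M[i][j] == M[i][j - 1]:
            stack.append((i, j - 1))
        elif M[i][j] == M[i + 1][j - 1] + gain(i, j):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if M[i][j] == M[i][k] + M[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return M[0][n - 1], sorted(pairs)


if __name__ == "__main__":
    # Toy 4-nucleotide example with two nested, high-probability pairs.
    p = [[0.0] * 4 for _ in range(4)]
    p[0][3], p[1][2] = 0.9, 0.8
    print(gamma_centroid_fold(p, gamma=1.0))
```

In this sketch, a larger gamma lowers the threshold 1 / (gamma + 1) and so admits more base pairs, which is one way to read the trade-off between true positives and true negatives mentioned in the abstract.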

CENTROIDFOLD: a web server for RNA secondary structure prediction.

Sato Kengo;Hamada Michiaki;Asai Kiyoshi;Mituyama Toutai

Nucleic Acids Research 37(Web Server issue), 2009

ISSN: 1362-4962

Abstract: The CENTROIDFOLD web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engines. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responds with a prediction result shown as a popular base-pair notation and a graph representation. A PDF version of the graph representation is also available. For a multiple alignment sequence, the server predicts a common secondary structure. Usage of the server is quite simple. You can paste a single RNA sequence (FASTA or plain sequence text) or a multiple alignment (CLUSTAL-W format) into the text area, then click on the 'execute CentroidFold' button. The server quickly responds with a prediction result. The major advantage of this server is that it employs our original CentroidFold software as its prediction engine, which scores the best accuracy in our benchmark results. Our web server is freely available with no login requirement.

Predictions of RNA secondary structure by combining homologous sequence information.

Hamada Michiaki;Sato Kengo;Kiryu Hisanori;Mituyama Toutai;Asai Kiyoshi

Bioinformatics (Oxford, England) 25(12), 2009

ISSN: 1367-4811

Abstract: MOTIVATION: Secondary structure prediction of RNA sequences is an important problem. There has been progress in this area, but the accuracy of prediction from an RNA sequence is still limited. In many cases, however, homologous RNA sequences are available with the target RNA sequence whose secondary structure is to be predicted. RESULTS: In this article, we propose a new method for secondary structure predictions of individual RNA sequences by taking the information of their homologous sequences into account without assuming the common secondary structure of the entire sequences. The proposed method is based on posterior decoding techniques, which consider all the suboptimal secondary structures of the target and homologous sequences and all the suboptimal alignments between the target sequence and each of the homologous sequences. In our computational experiments, the proposed method provides better predictions than those performed only on the basis of the formation of individual RNA sequences and those performed by using methods for predicting the common secondary structure of the homologous sequences. Remarkably, we found that the common secondary predictions sometimes give worse predictions for the secondary structure of a target sequence than the predictions from the individual target sequence, while the proposed method always gives good predictions for the secondary structure of target sequences in all tested cases. AVAILABILITY: Supporting information and software are available online at: http://www.ncrna.org/software/centroidfold/ismb2009/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score.

Hamada Michiaki;Sato Kengo;Kiryu Hisanori;Mituyama Toutai;Asai Kiyoshi

Bioinformatics (Oxford, England) 25(24), 2009

ISSN: 1367-4811

Abstract: MOTIVATION: The importance of accurate and fast predictions of multiple alignments for RNA sequences has increased due to recent findings about functional non-coding RNAs. Recent studies suggest that maximizing the expected accuracy of predictions will be useful for many problems in bioinformatics. RESULTS: We designed a novel estimator for multiple alignments of structured RNAs, based on maximizing the expected accuracy of predictions. First, we define the maximum expected accuracy (MEA) estimator for pairwise alignment of RNA sequences. This maximizes the expected sum-of-pairs score (SPS) of a predicted alignment under a probability distribution of alignments given by marginalizing the Sankoff model. Then, by approximating the MEA estimator, we obtain an estimator whose time complexity is O(L^3 + c^2 d L^2), where L is the length of input sequences and both c and d are constants independent of L. The proposed estimator can handle uncertainty of secondary structures and alignments that are obstacles in Bioinformatics because it considers all the secondary structures and all the pairwise alignments as input sequences. Moreover, we integrate the probabilistic consistency transformation (PCT) on alignments into the proposed estimator. Computational experiments using six benchmark datasets indicate that the proposed method achieved a favorable SPS and was the fastest of many state-of-the-art tools for multiple alignments of structured RNAs. AVAILABILITY: The software called CentroidAlign, which is an implementation of the algorithm in this article, is freely available on our website: http://www.ncrna.org/software/centroidalign/. CONTACT: hamada-michiaki@aist.go.jp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Improving the accuracy of predicting secondary structure for aligned RNA sequences.

Hamada Michiaki;Sato Kengo;Asai Kiyoshi

Nucleic Acids Research 39(2), 2011

ISSN: 1362-4962

Abstract: Considerable attention has been focused on predicting the secondary structure for aligned RNA sequences since it is useful not only for improving the limiting accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although there exist many algorithms of predicting secondary structure for aligned RNA sequences, further improvement of the accuracy is still awaited. In this article, toward improving the accuracy, a theoretical classification of state-of-the-art algorithms of predicting secondary structure for aligned RNA sequences is presented. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied in various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms but we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms.

Prediction of RNA secondary structure by maximizing pseudo-expected accuracy.

Hamada Michiaki;Sato Kengo;Asai Kiyoshi

BMC Bioinformatics 11, 2010

ISSN: 1471-2105

Abstract: BACKGROUND: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, a new type of estimator is proposed including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give the single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear what parameter value we should use, and even the well-trained default parameter value does not, in general, give the best result in popular accuracy measures to each RNA sequence. RESULTS: Instead of using the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that well-balanced secondary structures between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator. CONCLUSIONS: This study gives not only a method for predicting the secondary structure that balances between sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures including MCC and F-score.

Generalized centroid estimators in bioinformatics.

Hamada Michiaki;Kiryu Hisanori;Iwasaki Wataru;Asai Kiyoshi

PLoS ONE 6(2), 2011

ISSN: 1932-6203

Abstract: In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score), can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. The concept presented in this paper not only gives a useful framework to design MEA-based estimators but is also highly extendable and sheds new light on many problems in bioinformatics.

Antagonistic RNA aptamer specific to a heterodimeric form of human interleukin-17A/F.

Adachi Hironori;Ishiguro Akira;Hamada Michiaki;Sakota Eri;Asai Kiyoshi;Nakamura Yoshikazu

Biochimie 93(7), 2011

ISSN: 1638-6183

Abstract: Interleukin-17 (IL-17) is a pro-inflammatory cytokine produced primarily by a subset of CD4+ T cells, called Th17 cells, that is involved in host defense, inflammation and autoimmune disorders. The two most structurally related IL-17 family members, IL-17A and IL-17F, form homodimeric (IL-17A/A, IL-17F/F) and heterodimeric (IL-17A/F) complexes. Although the biological significance of IL-17A and IL-17F has been investigated using respective antibodies or gene knockout mice, the functional study of the IL-17A/F heterodimeric form has been hampered by the lack of an inhibitory tool specific to IL-17A/F. In this study, we aimed to develop an RNA aptamer that specifically inhibits IL-17A/F. Aptamers are short single-stranded nucleic acid sequences that are selected in vitro based on their high affinity to a target molecule. One selected aptamer against human IL-17A/F, AptAF42, was isolated by repeated cycles of selection and counterselection against heterodimeric and homodimeric complexes, respectively. Thus, AptAF42 bound IL-17A/F but not IL-17A/A or IL-17F/F. The optimized derivative, AptAF42dope1, blocked the binding of IL-17A/F, but not of IL-17A/A or IL-17F/F, to the IL-17 receptor in the surface plasmon resonance assay in vitro. Consistently, AptAF42dope1 blocked cytokine GRO-α production induced by IL-17A/F, but not by IL-17A/A or IL-17F/F, in human cells. An RNA footprinting assay using ribonucleases against AptAF42dope1 in the presence or absence of IL-17A/F revealed that part of the predicted secondary structure fluctuates between alternate forms and that AptAF42dope1 is globally protected from ribonuclease cleavage by IL-17A/F. These results suggest that the selected aptamer recognizes a global conformation specified by the heterodimeric surface of IL-17A/F.

CentroidHomfold-LAST: accurate prediction of RNA secondary structure using automatically collected homologous sequences.

Hamada Michiaki;Yamada Koichiro;Sato Kengo;Frith Martin C;Asai Kiyoshi

Nucleic Acids Research 39(Web Server issue), 2011

ISSN: 1362-4962

Abstract: Although secondary structure predictions of an individual RNA sequence have been widely used in a number of sequence analyses of RNAs, accuracy is still limited. Recently, we proposed a method (called 'CentroidHomfold'), which includes information about homologous sequences into the prediction of the secondary structure of the target sequence, and showed that it substantially improved the performance of secondary structure predictions. CentroidHomfold, however, forces users to prepare homologous sequences of the target sequence. We have developed a Web application (CentroidHomfold-LAST) that predicts the secondary structure of the target sequence using automatically collected homologous sequences. LAST, which is a fast and sensitive local aligner, and CentroidHomfold are employed in the Web application. Computational experiments with a commonly-used data set indicated that CentroidHomfold-LAST substantially outperformed conventional secondary structure predictions including CentroidFold and RNAfold.

Shape-based alignment of genomic landscapes in multi-scale resolution.

Ashida Hiroki;Asai Kiyoshi;Hamada Michiaki

Nucleic Acids Research 40(14), 2012

ISSN: 1362-4962

Abstract: Due to dramatic advances in DNA technology, quantitative measures of annotation data can now be obtained in continuous coordinates across the entire genome, allowing various heterogeneous 'genomic landscapes' to emerge. Although much effort has been devoted to comparing DNA sequences, not much attention has been given to comparing these large quantities of data comprehensively. In this article, we introduce a method for rapidly detecting local regions that show high correlations between genomic landscapes. We overcame the size problem for genome-wide data by converting the data into series of symbols and then carrying out sequence alignment. We also decomposed the oscillation of the landscape data into different frequency bands before analysis, since the real genomic landscape is a mixture of embedded and confounded biological processes working at different scales in the cell nucleus. To verify the usefulness and generality of our method, we applied our approach to well investigated landscapes from the human genome, including several histone modifications. Furthermore, by applying our method to over 20 genomic landscapes in human and 12 in mouse, we found that DNA replication timing and the density of Alu insertions are highly correlated genome-wide in both species, even though the Alu elements have amplified independently in the two genomes. To our knowledge, this is the first method to align genomic landscapes at multiple scales according to their shape.

Direct updating of an RNA base-pairing probability matrix with marginal probability constraints.

Hamada Michiaki

Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 19(12), 2012

ISSN: 1557-8666

Abstract: A base-pairing probability matrix (BPPM) stores the probabilities for every possible base pair in an RNA sequence and has been used in many algorithms in RNA informatics (e.g., RNA secondary structure prediction and motif search). In this study, we propose a novel algorithm to perform iterative updates of a given BPPM, satisfying marginal probability constraints that are (approximately) given by recently developed biochemical experiments, such as SHAPE, PAR, and FragSeq. The method is easily implemented and is applicable to common models for RNA secondary structures, such as energy-based or machine-learning-based models. In this article, we focus mainly on the details of the algorithms, although preliminary computational experiments will also be presented.

CentroidAlign-Web: A Fast and Accurate Multiple Aligner for Long Non-Coding RNAs.

Yonemoto Haruka;Asai Kiyoshi;Hamada Michiaki

International Journal of Molecular Sciences 14(3), 2013

ISSN: 1422-0067

Abstract: Due to the recent discovery of non-coding RNAs (ncRNAs), multiple sequence alignment (MSA) of those long RNA sequences is becoming increasingly important for classifying and determining the functional motifs in RNAs. However, not only primary (nucleotide) sequences, but also secondary structures of ncRNAs are closely related to their function and are conserved evolutionarily. Hence, information about secondary structures should be considered in the sequence alignment of ncRNAs. Yet, in general, a huge computational time is required in order to compute MSAs, taking secondary structure information into account. In this paper, we describe a fast and accurate web server, called CentroidAlign-Web, which can handle long RNA sequences. The web server also appropriately incorporates information about known secondary structures into MSAs. Computational experiments indicate that our web server is fast and accurate enough to handle long RNA sequences. CentroidAlign-Web is freely available from http://centroidalign.ncrna.org/.

RNA structural alignments, part II: non-Sankoff approaches for structural alignments.

Asai Kiyoshi;Hamada Michiaki

Methods in Molecular Biology (Clifton, N.J.) 1097, 2014

ISSN: 1940-6029

Abstract: In structural alignments of RNA sequences, the computational cost of the Sankoff algorithm, which simultaneously optimizes the score of the common secondary structure and the score of the alignment, is too high for long sequences (O(L^6) time for two sequences of length L). In this chapter, we introduce the methods that predict the structures and the alignment separately to avoid the heavy computations in the Sankoff algorithm. In those methods, neither of those two prediction processes is independent, but each of them utilizes the information of the other process. The first process typically includes prediction of base-pairing probabilities (BPPs) or the candidates of the stems, and the alignment process utilizes those results. At the same time, it is also important to reflect the information of the alignment to the structure prediction. This idea can be implemented as the probabilistic transformation (PCT) of BPPs using the potential alignment. As with all estimation problems, it is important to define the evaluation measure for the structural alignment. The principle of maximum expected accuracy (MEA) is applicable for sum-of-pairs (SPS) score based on the reference alignment.

Improved Accuracy in RNA-Protein Rigid Body Docking by Incorporating Force Field for Molecular Dynamics Simulation into the Scoring Function.

Iwakiri Junichi;Hamada Michiaki;Asai Kiyoshi;Kameda Tomoshi

Journal of Chemical Theory and Computation 12(9), pp. 4688-4697, 2016

ISSN: 1549-9626

Abstract: RNA-protein interactions play fundamental roles in many biological processes. To understand these interactions, it is necessary to know the three-dimensional structures of RNA-protein complexes. However, determining the tertiary structure of these complexes is often difficult, suggesting that an accurate rigid body docking for RNA-protein complexes is needed. In general, the rigid body docking process is divided into two steps: generating candidate structures from the individual RNA and protein structures and then narrowing down the candidates. In this study, we focus on the former problem to improve the prediction accuracy in RNA-protein docking. Our method is based on the integration of physicochemical information about RNA into ZDOCK, which is known as one of the most successful computer programs for protein-protein docking. Because recent studies showed the current force field for molecular dynamics simulation of protein and nucleic acids is quite accurate, we modeled the physicochemical information about RNA by force fields such as AMBER and CHARMM. A comprehensive benchmark of RNA-protein docking, using three recently developed data sets, reveals the remarkable prediction accuracy of the proposed method compared with existing programs for docking: the highest success rate is 34.7% for the predicted structure of the RNA-protein complex with the best score and 79.2% for 3,600 predicted ones. Three full atomistic force fields for RNA (AMBER94, AMBER99, and CHARMM22) produced almost the same accurate result, which showed current force fields for nucleic acids are quite accurate. In addition, we found that the electrostatic interaction and the representation of shape complementarity between protein and RNA play important roles in accurate prediction of the native structures of RNA-protein complexes.

Analysis of base-pairing probabilities of RNA molecules involved in protein-RNA interactions.

Iwakiri Junichi;Kameda Tomoshi;Asai Kiyoshi;Hamada Michiaki

Bioinformatics (Oxford, England) 29(20), 2013

ISSN: 1367-4811

Abstract: MOTIVATION: Understanding the details of protein-RNA interactions is important to reveal the functions of both the RNAs and the proteins. In these interactions, the secondary structures of the RNAs play an important role. Because RNA secondary structures in protein-RNA complexes are variable, considering the ensemble of RNA secondary structures is a useful approach. In particular, recent studies have supported the idea that, in the analysis of RNA secondary structures, the base-pairing probabilities (BPPs) of RNAs (i.e. the probabilities of forming a base pair in the ensemble of RNA secondary structures) provide richer and more robust information about the structures than a single RNA secondary structure, for example, the minimum free energy structure or a snapshot of structures in the Protein Data Bank. However, there has been no investigation of the BPPs in protein-RNA interactions. RESULTS: In this study, we analyzed BPPs of RNA molecules involved in known protein-RNA complexes in the Protein Data Bank. Our analysis suggests that, in the tertiary structures, the BPPs (which are computed using only sequence information) for unpaired nucleotides with intermolecular hydrogen bonds (hbonds) to amino acids were significantly lower than those for unpaired nucleotides without hbonds. On the other hand, no difference was found between the BPPs for paired nucleotides with and without intermolecular hbonds. Those findings were commonly supported by three probabilistic models, which provide the ensemble of RNA secondary structures, including the McCaskill model based on Turner's free energy of secondary structures.

Reference-free prediction of rearrangement breakpoint reads

Wijaya, Edward; Shimizu, Kana; Asai, Kiyoshi; Hamada, Michiaki

Bioinformatics 30(18), pp. 2559-2567, January 2014

ISSN: 1367-4803

Abstract: Availability and implementation: The source code of SlideSort-BPR can be freely downloaded from https://code.google.com/p/slidesortbpr/. Motivation: Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules, which are observed in many cancer-related diseases. The detection of rearrangement is typically done by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information. Results: In this article, we propose a reference-free method for detecting clusters of breakpoints from the chromosomal rearrangements. This is done by directly comparing a set of NGS normal reads with another set that may be rearranged. Our method SlideSort-BPR (breakpoint reads) is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100×, it finds ∼88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome.

A semi-supervised learning approach for RNA secondary structure prediction

Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki

Computational Biology and Chemistry 57, pp. 72-79, May 2015

ISSN: 1476-9271

Abstract: RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.

Learning chromatin states with factorized information criteria

Hamada, Michiaki; Ono, Yukiteru; Fujimaki, Ryohei; Asai, Kiyoshi

Bioinformatics 31(15), pp. 2426-2433, January 2015

ISSN: 1367-4803

Abstract: Motivation: Recent studies have suggested that both the genome and the genome with epigenetic modifications, the so-called epigenome, play important roles in various biological functions, such as transcription and DNA replication, repair, and recombination. It is well known that specific combinations of histone modifications (e.g. methylations and acetylations) of nucleosomes induce chromatin states that correspond to specific functions of chromatin. Although the advent of next-generation sequencing (NGS) technologies enables measurement of epigenetic information for entire genomes at high resolution, the variety of chromatin states has not been completely characterized. Results: In this study, we propose a method to estimate the chromatin states indicated by genome-wide chromatin marks identified by NGS technologies. The proposed method automatically estimates the number of chromatin states and characterizes each state on the basis of a hidden Markov model (HMM) in combination with a recently proposed model selection technique, factorized information criteria. The method is expected to provide an unbiased model because it relies on only two adjustable parameters and avoids heuristic procedures as much as possible. Computational experiments with simulated datasets show that our method automatically learns an appropriate model, even in cases where methods that rely on Bayesian information criteria fail to learn the model structures. In addition, we comprehensively compare our method to ChromHMM on three real datasets and show that our method estimates more chromatin states than ChromHMM for those datasets.

Bioinformatics tools for lncRNA research

Iwakiri, Junichi; Hamada, Michiaki; Asai, Kiyoshi

Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 1859(1), p.23 - 30, January 2016

ISSN:1874-9399

Abstract: Current experimental methods to identify the functions of the large number of candidate long non-coding RNAs (lncRNAs) are limited in their throughput. Therefore, it is essential to know which tools are effective for understanding lncRNAs so that reasonable speed and accuracy can be achieved. In this paper, we review the currently available bioinformatics tools and databases that are useful for finding non-coding RNAs and analyzing their structures, conservation, interactions, co-expressions and localization. This article is part of a Special Issue entitled: Clues to long noncoding RNA taxonomy, edited by Dr. Tetsuro Hirose and Dr. Shinichi Nakagawa.

Comprehensive prediction of lncRNA-RNA interactions in human transcriptome

Terai, Goro; Iwakiri, Junichi; Kameda, Tomoshi; Hamada, Michiaki; Asai, Kiyoshi

BMC Genomics 17(1), January 2016

Abstract: Motivation: Recent studies have revealed that large numbers of non-coding RNAs are transcribed in humans, but only a few of them have been identified with their functions. Identification of the interaction target RNAs of the non-coding RNAs is an important step in predicting their functions. The current experimental methods to identify RNA-RNA interactions, however, are not fast enough to apply to a whole human transcriptome. Therefore, computational predictions of RNA-RNA interactions are desirable, but this is a challenging task due to the huge computational costs involved. Results: Here, we report comprehensive predictions of the interaction targets of lncRNAs in a whole human transcriptome for the first time. To achieve this, we developed an integrated pipeline for predicting RNA-RNA interactions on the K computer, which is one of the fastest supercomputers in the world. Comparisons with experimentally validated lncRNA-RNA interactions support the quality of the predictions. Additionally, we have developed a database that catalogs the predicted lncRNA-RNA interactions to provide fundamental information about the targets of lncRNAs.

RNA secondary structure prediction from multi-aligned sequences

Hamada, Michiaki

RNA Bioinformatics, p.17 - 38, January 2015

Abstract: It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.

Efficient calculation of exact probability distributions of integer features on RNA secondary structures

Mori, Ryota; Hamada, Michiaki; Asai, Kiyoshi

BMC Genomics 15, January 2014

ISSN:1471-2164

Abstract: Background: Although the need for analyses of secondary structures of RNAs is increasing, predictions of the secondary structures of RNAs are not always reliable. Because an RNA may have a complicated energy landscape, comprehensive representations of the whole ensemble of the secondary structures, such as the probability distributions of various features of RNA secondary structures, are required. Results: A general method to efficiently compute the distribution of any integer scalar/vector function on the secondary structure is proposed. We also show two concrete algorithms, for Hamming distance from a reference structure and for 5' - 3' distance, which can be constructed by following our general method. These practical applications show the effectiveness of the proposed method. Conclusions: The proposed method provides a clear and comprehensive procedure to construct algorithms for distributions of various integer features. In addition, distributions of integer vectors, that is, combinations of different integer scores, can also be described by applying our 2D expanding technique.

AMAP: A pipeline for whole-genome mutation detection in Arabidopsis thaliana

Ishii, Kotaro; Kazama, Yusuke; Yamada, Mieko; Abe, Tomoko; Hirano, Tomonari; Hamada, Michiaki; Ono, Yukiteru

Genes and Genetic Systems 91(4), p.229 - 233, January 2016

ISSN:1341-7568

Abstract: Detection of mutations at the whole-genome level is now possible by the use of high-throughput sequencing. However, determining mutations is a time-consuming process due to the number of false positives provided by mutation-detecting programs. AMAP (automated mutation analysis pipeline) was developed to overcome this issue. AMAP integrates a set of well-validated programs for mapping (BWA), removal of potential PCR duplicates (Picard), realignment (GATK) and detection of mutations (SAMtools, GATK, Pindel, BreakDancer and CNVnator). Thus, all types of mutations such as base substitution, deletion, insertion, translocation and chromosomal rearrangement can be detected by AMAP. In addition, AMAP automatically distinguishes false positives by comparing lists of candidate mutations in sequenced mutants. We tested AMAP by inputting already analyzed read data derived from three individual Arabidopsis thaliana mutants and confirmed that all true mutations were included in the list of candidate mutations. The result showed that the number of false positives was reduced to 12% of that obtained in a previous analysis that lacked a process of reducing false positives. Thus, AMAP will accelerate not only the analysis of mutation induction by individual mutagens but also the process of forward genetics.

Privacy-preserving search for chemical compound databases

Shimizu, Kana; Nuida, Koji; Arai, Hiromi; Mitsunari, Shigeo; Attrapadung, Nuttapong; Hamada, Michiaki; Tsuda, Koji; Hirokawa, Takatsugu; Sakuma, Jun; Hanaoka, Goichiro; Asai, Kiyoshi

BMC Bioinformatics 16(18), December 2015

Abstract: Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.

Training alignment parameters for arbitrary sequencers with LAST-TRAIN

Hamada, Michiaki; Ono, Yukiteru; Asai, Kiyoshi; Frith, Martin C.; Hancock, John

Bioinformatics 33(6), p.926 - 928, January 2017

ISSN:1367-4803

Abstract: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads.

Rtools: a web server for various secondary structural analyses on single RNA sequences

Hamada, Michiaki; Ono, Yukiteru; Kiryu, Hisanori; Sato, Kengo; Kato, Yuki; Fukunaga, Tsukasa; Mori, Ryota; Asai, Kiyoshi

Nucleic acids research 44(W1), p.W302 - W307, July 2016

Abstract: The secondary structures, as well as the nucleotide sequences, are important features of RNA molecules that characterize their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools that compensate for the imperfect predictions by calculating and visualizing the secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of the tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. By just giving an RNA sequence to the web server, the user can get the different types of solutions of the secondary structures, the marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases, the energy changes by arbitrary base mutations as well as the measures for validations of the predicted secondary structures. The web server is available at http://rtools.cbrc.jp, which integrates software tools, CentroidFold, CentroidHomfold, IPKnot, CapR, Raccess, Rchange and RintD.

Estimating energy parameters for RNA secondary structure predictions using both experimental and computational data

Nishida, Shimpei; Sakuraba, Shun; Asai, Kiyoshi; Hamada, Michiaki

IEEE/ACM Transactions on Computational Biology and Bioinformatics, March 2018

ISSN:1545-5963

Abstract: Computational RNA secondary structure prediction depends on a large number of nearest-neighbor free-energy parameters, including 10 parameters for Watson-Crick stacked base pairs that were estimated from experimental measurements of the free energies of 90 RNA duplexes. These experimental data are provided by time-consuming and cost-intensive experiments. In contrast, various modified nucleotides in RNAs, which would affect not only their structures but also functions, have been found, and rapid determination of energy parameters for such modified nucleotides is needed. To reduce the high cost of determining energy parameters, we propose a novel method to estimate energy parameters from both experimental and computational data, where the computational data are provided by a recently developed molecular dynamics simulation protocol. We evaluate our method for Watson-Crick stacked base pairs, and show that parameters estimated from 10 experimental data items and 10 computational data items can predict RNA secondary structures with accuracy comparable to that using conventional parameters. The results indicate that the combination of experimental free-energy measurements and molecular dynamics simulations is capable of estimating the thermodynamic properties of RNA secondary structures at lower cost.

Software.ncrna.org: web servers for analyses of RNA sequences.

Asai Kiyoshi; Kiryu Hisanori; Hamada Michiaki; Tabei Yasuo; Sato Kengo; Matsui Hiroshi; Sakakibara Yasubumi; Terai Goro; Mituyama Toutai

Nucleic acids research 36(Web Server issue), 2008

ISSN:1362-4962

Abstract: We present web servers for analysis of non-coding RNA sequences on the basis of their secondary structures. Software tools for structural multiple sequence alignments, structural pairwise sequence alignments and structural motif findings are available from the integrated web server and the individual stand-alone web servers. The servers are located at http://software.ncrna.org, along with the information for the evaluation and downloading. This website is freely available to all users and there is no login requirement.

Parameters for accurate genome alignment.

Frith Martin C; Hamada Michiaki; Horton Paul

BMC bioinformatics 11, 2010

ISSN:1471-2105

Abstract: Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold-standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that gamma-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).

RactIP: fast and accurate prediction of RNA-RNA interaction using integer programming.

Kato Yuki; Sato Kengo; Hamada Michiaki; Watanabe Yoshihide; Asai Kiyoshi; Akutsu Tatsuya

Bioinformatics (Oxford, England) 26(18), 2010

ISSN:1367-4811

Abstract: Motivation: Considerable attention has been focused on predicting RNA-RNA interaction since it is a key to identifying possible targets of non-coding small RNAs that regulate gene expression post-transcriptionally. A number of computational studies have so far been devoted to predicting joint secondary structures or binding sites under a specific class of interactions. In general, there is a trade-off between range of interaction type and efficiency of a prediction algorithm, and thus efficient computational methods for predicting comprehensive type of interaction are still awaited. Results: We present RactIP, a fast and accurate prediction method for RNA-RNA interaction of general type using integer programming. RactIP can integrate approximate information on an ensemble of equilibrium joint structures into the objective function of integer programming using posterior internal and external base-pairing probabilities. Experimental results on real interaction data show that prediction accuracy of RactIP is at least comparable to that of several state-of-the-art methods for RNA-RNA interaction prediction. Moreover, we demonstrate that RactIP can run incomparably faster than competitive methods for predicting joint secondary structures. Availability: RactIP is implemented in C++, and the source code is available at http://www.ncrna.org/software/ractip/.

IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming.

Sato Kengo; Kato Yuki; Hamada Michiaki; Akutsu Tatsuya; Asai Kiyoshi

Bioinformatics (Oxford, England) 27(13), 2011

ISSN:1367-4811

Abstract: Motivation: Pseudoknots found in secondary structures of a number of functional RNAs play various roles in biological processes. Recent methods for predicting RNA secondary structures cover certain classes of pseudoknotted structures, but only a few of them achieve satisfying predictions in terms of both speed and accuracy. Results: We propose IPknot, a novel computational method for predicting RNA secondary structures with pseudoknots based on maximizing expected accuracy of a predicted structure. IPknot decomposes a pseudoknotted structure into a set of pseudoknot-free substructures and approximates a base-pairing probability distribution that considers pseudoknots, leading to the capability of modeling a wide class of pseudoknots and running quite fast. In addition, we propose a heuristic algorithm for refining base-pairing probabilities to improve the prediction accuracy of IPknot. The problem of maximizing expected accuracy is solved by using integer programming with threshold cut. We also extend IPknot so that it can predict the consensus secondary structure with pseudoknots when a multiple sequence alignment is given. IPknot is validated through extensive experiments on various datasets, showing that IPknot achieves better prediction accuracy and faster running time as compared with several competitive prediction methods. Availability: The program of IPknot is available at http://www.ncrna.org/software/ipknot/. IPknot is also available as a web server at http://rna.naist.jp/ipknot/. Contact: satoken@k.u-tokyo.ac.jp; ykato@is.naist.jp. Supplementary information: Supplementary data are available at Bioinformatics online.

Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection.

Hamada Michiaki; Wijaya Edward; Frith Martin C; Asai Kiyoshi

Bioinformatics (Oxford, England) 27(22), 2011

ISSN:1367-4811

Abstract: Motivation: Recent studies have revealed the importance of considering quality scores of reads generated by next-generation sequence (NGS) platforms in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) provide more accurate alignment than conventional maximum score-based alignment. There exists, however, no study about probabilistic alignment that considers quality scores explicitly, although the method is expected to be useful in SNP/indel callers and bisulfite mapping, because accurate estimation of aligned columns or gaps is important in those analyses. Results: In this study, we propose methods of probabilistic alignment that consider quality scores of (one of) the sequences as well as a usual score matrix. The method is based on posterior decoding techniques in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores, and can arbitrarily trade off sensitivity and positive predictive value (PPV) of prediction (aligned columns and gaps). The method is directly applicable to read mapping (alignment) toward accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments can estimate aligned columns and gaps accurately, compared with other mapping algorithms, e.g. SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.

A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA).

Hamada Michiaki; Asai Kiyoshi

Journal of computational biology : a journal of computational molecular cell biology 19(5), 2012

ISSN:1557-8666

Abstract: Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution, even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.

PBSIM: PacBio reads simulator--toward accurate genome assembly.

Ono Yukiteru; Asai Kiyoshi; Hamada Michiaki

Bioinformatics (Oxford, England) 29(1), 2013

ISSN:1367-4811

Abstract: Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results. Availability: PBSIM is freely available from the web under the GNU GPL v2 license (http://code.google.com/p/pbsim/).

Fighting against uncertainty: an essential issue in bioinformatics.

Hamada Michiaki

Briefings in bioinformatics 15(5), 2014

ISSN:1477-4054

Abstract: Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are often affected by the 'uncertainty' of a solution, that is, the probability of the solution is extremely small. This situation arises for estimation problems on high-dimensional discrete spaces in which the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods to combat this uncertainty, presenting a number of examples in bioinformatics. The methods include (i) avoiding point estimation, (ii) maximum expected accuracy (MEA) estimations and (iii) several strategies to design a pipeline involving several prediction methods. I believe that the basic concepts and ideas described in this review will be generally useful for estimation problems in various areas of bioinformatics.

A non-parametric Bayesian approach for predicting RNA secondary structures

Sato, Kengo; Hamada, Michiaki; Mituyama, Toutai; Asai, Kiyoshi; Sakakibara, Yasubumi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5724 LNBI, p.286 - 297, November 2009

ISSN:0302-9743

Abstract: Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.

A non-parametric Bayesian approach for predicting RNA secondary structures

Sato, Kengo; Hamada, Michiaki; Mituyama, Toutai; Asai, Kiyoshi; Sakakibara, Yasubumi

Journal of Bioinformatics and Computational Biology 8(4), p.727 - 742, August 2010

ISSN:0219-7200

Abstract: Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.

A novel method for assessing the statistical significance of RNA-RNA interactions between two long RNAs

Fukunaga, Tsukasa; Hamada, Michiaki

Journal of computational biology : a journal of computational molecular cell biology, 2018

Identification and analysis of ribosome-associated lncRNAs using ribosome profiling data

Zeng, Chao; Fukunaga, Tsukasa; Hamada, Michiaki

BMC Genomics 19, p.414, 2018

Identification of transposable elements contributing to tissue-specific expression of long non-coding RNAs

Chishima, Takafumi; Iwakiri, Junichi; Hamada, Michiaki

Genes 9(1), p.23, 2018

Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

Takeda, Taikai; Hamada, Michiaki

Bioinformatics 34(4), p.576 - 584, 2018

In silico approaches to RNA aptamer design

Hamada, Michiaki

Biochimie 145, p.8 - 14, 2017

RIblast: An ultrafast RNA-RNA interaction prediction system for comprehensive lncRNA interaction analysis

Fukunaga, Tsukasa; Hamada, Michiaki

Bioinformatics 33(17), p.2666 - 2674, 2017

Remark on application of distribution function inequality for Toeplitz and Hankel operators

Hamada, Michiaki

Hokkaido Mathematical Journal 32(1), p.193 - 208, 2003

A High Performance Computing Environments for Prediction of Activity and function of Biomolecules : An Application to Analysis of HIV Protease Inhibitors

Hamada, Michiaki; Feng, Cheng; Inagaki, Yuichiro; Nagashima, Unpei; Murakami, Kazuaki; Chuman, Hiroshi

Transactions of the Japan Society for Industrial and Applied Mathematics 14(4), p.267 - 288, 2004

Books and Other Publications

生命情報処理における機械学習 : 多重検定と推定量設計 = Machine learning in bioinformatics

Jun Sese and Michiaki Hamada

Kodansha, 2015

ISBN:9784061529113

Lectures and Oral Presentations

Comprehensive prediction of lncRNA-RNA interactions in the human transcriptome

The 17th Annual Meeting of the RNA Society of Japan, July 2015

Poster presentation

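Research and development of next-generation bioinformatics technologies

医薬会セミナー

Oral presentation (general)

[Special lecture] RNA secondary structure prediction for accurate base-pair estimation: the importance of considering the distribution

分子計算研究会

Oral presentation (general)

[Organized lecture] Mining frequent stem patterns from unaligned RNA sequences

The 9th Workshop on Information-Based Induction Sciences (IBIS 2006)

Oral presentation (general)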

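Prediction of tissue-specific lncRNA-mRNA interactions using RNA expression levels

The 17th Annual Meeting of the RNA Society of Japan

Poster presentation

Prediction of protein-RNA complex 3D structures using molecular dynamics calculations

The 17th Annual Meeting of the RNA Society of Japan

Poster presentation

Introduction to NGS-related research at the Bioinformatics Laboratory, Faculty of Science and Engineering, Waseda University

第4回NGS現場の会

Poster presentation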

Pipeline for whole-genome analysis of heavy-ion-induced mutants in Arabidopsis thaliana

The 26th International Conference on Arabidopsis Research

Poster presentation

Construction of a pipeline for whole-genome mutation analysis

The 127th Meeting of the Japanese Society of Breeding, Spring 2015

Poster presentation

Pipeline for whole-genome analysis of heavy-ion-induced mutants

International Symposium on Genome Science 2015

Poster presentation

Prediction of joint RNA secondary structure by using their homologous sequence information

GIW2014

Poster presentation

Learning chromatin states with factorized information criteria

生命医薬情報学連合大会2014年大会

Poster presentation

A comprehensive prediction of RNA-RNA interactions from human transcriptome

2014RNAインフォマティクス道場

Poster presentation

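An integrated web service for the analysis of RNA secondary structure information

The 16th Annual Meeting of the RNA Society of Japan

Poster presentation

Credibility Limit indicates the quantitative reliability of predicted secondary structures

The 16th Annual Meeting of the RNA Society of Japan

Poster presentation

Development of a method for predicting RNA-protein interactions

The 16th Annual Meeting of the RNA Society of Japan

Poster presentation

Prediction of protein-RNA complex 3D structures using molecular dynamics calculations

The 16th Annual Meeting of the RNA Society of Japan

Poster presentation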

Centroid series: fundamental programs of sequence analysis for non-coding RNAs

Bioinformatics and Genomic Medicine: Challenges and Future Prospects (バイオインフォマティクスとゲノム医療)

Poster presentation

Using Deep Learning as a Classifier for Biological Datasets: Hepatitis Dataset as an example

JSBi2013

Poster presentation

A method for calculating stability of RNA secondary structure

JSBi2013

Poster presentation

Inferring constraints on amino acids from protein sequence alignment

BIWO2013

Poster presentation

A fast and exact calculation for various score distributions of RNA secondary structure

BIWO2013

Poster presentation

The 3D structure prediction of Protein and RNA complex

BIWO2013

Poster presentation

Goichiro Hanaoka, Kiyoshi Asai

An efficient privacy-preserving similarity search protocol for chemical compound databases

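Poster presentation

Recent progress in RNA bioinformatics techniques and tools based on secondary structure information

The 15th Annual Meeting of the RNA Society of Japan, July 2013

Poster presentation

Mechanisms of RNA secondary structure recognition in protein-RNA interactions: an analysis based on base-pairing probabilities

The 15th Annual Meeting of the RNA Society of Japan

Poster presentation

3D structure prediction of protein-RNA complexes

The 15th Annual Meeting of the RNA Society of Japan

Poster presentation

Privacy-preserving search for a chemical compound database

ISMB/ECCB 2013

Poster presentation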

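A proposed RNA secondary structure prediction algorithm using semi-supervised learning

The 35th Annual Meeting of the Molecular Biology Society of Japan

Poster presentation

Development of a method for describing the existence probability distribution of RNA secondary structures based on the canonical distribution

The 35th Annual Meeting of the Molecular Biology Society of Japan

Poster presentation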

Developing Privacy-preserving database search protocol for chemical compound libraries

BIWO2012

Poster presentation

Reference Free Approach for Detecting Chromosomal Rearrangement

BIWO2012

Poster presentation

Semi-supervised Learning Approach to Predict RNA Secondary Structure

BIWO2012

Poster presentation

PBSIM: PacBio reads simulator - toward accurate genome assembly

BIWO2012

Poster presentation

A quantitation and visualization technique for understanding high dimensional distribution of RNA structures

BIWO2012

Poster presentation

BIWO2012

2012

Poster presentation

BIWO2012

2012

Poster presentation

Reference Free Approach for Detecting Chromosomal Rearrangement

IIBMP2012 (CBI/JSBi/Omix)

Poster presentation

A Method for Measuring RNA Secondary Structure Stability and Reliability Based on Canonical Distribution

IIBMP2012 (CBI/JSBi/Omix)

Poster presentation

Semi-supervised Learning Approach to Predict RNA Secondary Structure

IIBMP2012 (CBI/JSBi/Omix)

Poster presentation

A Classification of Bioinformatics Algorithms from the Viewpoint of Maximizing Expected Accuracy (MEA)

IIBMP2012 (CBI/JSBi/Omix)

Poster presentation

PBSIM: PacBio reads simulator - toward accurate genome assembly

IIBMP2012 (CBI/JSBi/Omix)

Poster presentation

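Development of an RNA secondary structure stability analysis based on the canonical distribution

The 14th Annual Meeting of the RNA Society of Japan

Poster presentation

A proposed RNA secondary structure prediction algorithm using semi-supervised learning

The 14th Annual Meeting of the RNA Society of Japan

Poster presentation

Privacy protection in search behavior

The 26th Annual Conference of the Japanese Society for Artificial Intelligence, 2012

Poster presentation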

A CLASSIFICATION OF BIOINFORMATICS ALGORITHMS FROM THE VIEWPOINT OF MAXIMIZING EXPECTED ACCURACY (MEA)

BIWO2011 (Jan, 2012)

Poster presentation

PROBABILISTIC ALIGNMENTS WITH QUALITY SCORES: AN APPLICATION TO SHORT-READ MAPPING TOWARD ACCURATE SNP/INDEL DETECTION

BIWO2011 (Jan, 2012)

Poster presentation

DEVELOPING NOVEL PROTOCOL FOR PRIVACY-PRESERVING SEARCH OF BIT-VECTORS AND ITS APPLICATION TO THE CHEMICAL COMPOUNDS LIBRARY SEARCH

BIWO2011 (Jan, 2012)

Poster presentation

Introduction to RNA informatics and maximum expected accuracy (MEA) principle

Nakai Laboratory Seminar, Institute of Medical Science, The University of Tokyo

Poster presentation

Protein-Coding Based Assembly of Metagenomic Next-Generation Sequencing Data

CBI/JSBi2011

Poster presentation

Privacy preserving search for chemical compound libraries

CBI/JSBi2011

Poster presentation

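Present and future of RNA informatics techniques based on secondary structure information

Meeting for Finding RNA/RNP 2011 (RNA/RNPを見つける会2011)

Oral presentation (general)

A New Approach to Elucidate Genomic Landscapes in Multiscale Resolution

5th Asian Young Researchers Conference on Computational and Omics Biology (AYRCOB)

Poster presentation

Centroid series: RNA informatics tools based on secondary structure

Next-Generation Bioinformatics Workshop 2011 (次世代バイオインフォマティクス研究会2011)

Poster presentation

NGS data analysis techniques that take read quality information into account

Next-Generation Bioinformatics Workshop 2011 (次世代バイオインフォマティクス研究会2011)

Poster presentation

Toward the development of privacy-preserving sequence analysis techniques

Next-Generation Bioinformatics Workshop 2011 (次世代バイオインフォマティクス研究会2011)

Poster presentation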

Centroid series: fundamental programs of sequence analysis for non-coding RNAs

The 16th Annual Meeting of the RNA Society (RNA2011)

Poster presentation

Antagonistic RNA Aptamer Specific to a Heterodimeric Form of Human Interleukin-17 A/F

The 16th Annual Meeting of the RNA Society (RNA2011)

Poster presentation

Binary Estimation Problems in Structural Information Analysis of RNA

The 16th Annual Meeting of the RNA Society (RNA2011). Jun 2011.

Poster presentation

RactIP: Fast and Accurate Prediction of RNA-RNA Interaction Using Integer Programming

The 16th Annual Meeting of the RNA Society (RNA2011). Jun 2011.

Poster presentation

IPknot: Fast and Accurate Prediction of RNA Secondary Structures with Pseudoknots Using Integer Programming

The 16th Annual Meeting of the RNA Society (RNA2011). Jun 2011.

Poster presentation

Probabilistic alignments with quality scores: An application to short-read mapping toward accurate SNP/indel detection

1st NGS Genba no Kai meeting (第一回NGS現場の会研究会)

Poster presentation

Software tools for RNA sequence analysis in ncrna.org

PSB2011

Poster presentation

Centroid series: fundamental programs of sequence analysis for non-coding RNAs

ISMB2010 (Poster)

Poster presentation

RactIP: fast and accurate prediction of RNA-RNA interaction using integer programming

ISMB2010 (Poster)

Poster presentation

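How (not) to Align Genomes

The 20th International Conference on Genome Informatics, December 14-16

Poster presentation

CentroidFold: Predictions of RNA Secondary Structure for Estimating Accurate Base-pairs

The 32nd Annual Meeting of the Molecular Biology Society of Japan

Poster presentation

CentroidHomfold: Prediction of RNA secondary structure by combining homologous sequence information

The 32nd Annual Meeting of the Molecular Biology Society of Japan

Poster presentation

CentroidHomfold: RNA secondary structure prediction using information from homologous sequences

CBRC 2009

Oral presentation (general)

CBRC 2009

2009/12/04 (poster presentation)

Poster presentation

CentroidHomfold: RNA secondary structure prediction using information from homologous sequences

Computational Biology Research Seminar (生命情報科学研究セミナー)

Oral presentation (general)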

Maximum expected accuracy estimation and bioinformatics

The 21st T-PRIMAL Seminar

Oral presentation (general)

Centroid series: tools for RNA secondary structure prediction and alignment

The 8th Meeting for Finding New RNA/RNP (第8回新しいRNA/RNPを見つける会)

Oral presentation (general)

CentroidFold: Predictions of RNA Secondary Structure for Estimating Accurate Base-pairs

The 9th Workshop on Algorithms in Bioinformatics (WABI 2009). (Poster presentation)

Poster presentation

CentroidFold: a web server for RNA secondary structure prediction

The 11th RNA Meeting (第11回RNAミーティング)

Oral presentation (general)

Predictions of RNA secondary structure by combining homologous sequence information

The 17th Annual International Conference on Intelligent Systems for Molecular Biology and 7th Annual European Conference on Computational Biology (ISMB/ECCB 2009). (Oral presentation)

Poster presentation

CentroidFold: Predictions of RNA Secondary Structure for Estimating Accurate Base-pairs

The 17th Annual International Conference on Intelligent Systems for Molecular Biology and 7th Annual European Conference on Computational Biology (ISMB/ECCB 2009). (Poster presentation; reviewed international conference)

Poster presentation

RNA secondary structure prediction for accurate base-pair estimation: the importance of considering the distribution

The 1st Meeting of Young Researchers in Bioinformatics (第1回生命情報科学若手の会)

Oral presentation (general)

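A Non-Parametric Bayesian Approach for Predicting RNA Secondary Structures

The 2008 Annual Conference of the Japanese Society for Bioinformatics (JSBi2008). (Poster presentation)

Poster presentation

Development of RNA informatics methods that maximize expected accuracy

The 31st Annual Meeting of the Molecular Biology Society of Japan (BMB2008)

Poster presentation

RNA secondary structure prediction methods that maximize expected accuracy

CBRC2008

Oral presentation (general)

The 26th Computational Biology Research Seminar (第26回生命情報科学研究セミナー)

2008/9/26 (oral presentation)

Oral presentation (general)

Development of RNA informatics methods that maximize expected accuracy

Meeting for Finding New RNA/RNP (新しいRNA/RNPを見つける会)

Oral presentation (general)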

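Large-scale similarity search for locally stable secondary structures among RNA sequences

CBRC 2007

Poster presentation

Large-Scale Similarity Search for Locally Stable Secondary Structures among RNA Sequences

JSBI2007. (poster presentation)

Poster presentation

Large-scale similarity search for locally stable secondary structures among RNA sequences

The 6th Meeting for Finding New RNA/RNP (第6回新しいRNA/RNPを見つける会)

Oral presentation (general)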

Mining Local Secondary Structure Motifs from Unaligned RNA Sequences Using Graph Mining Techniques

15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) & 6th European Conference on Computational Biology (ECCB)

Poster presentation

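RNAmine: Frequent Stem Pattern Miner from RNAs

The 17th International Conference on Genome Informatics (GIW2006). (poster presentation)

Poster presentation

Mining frequent stem patterns from unaligned RNA sequences

Informatics Seminar, Department of Mathematics, Faculty of Science, Tohoku University

Oral presentation (general)

Extracting frequent stem patterns from RNA sequences

Meeting for Finding New RNA/RNP in Odaiba (新しいRNA/RNPを見つける会 in お台場)

Oral presentation (general)

Classification of functional RNA families using support vector machines

Meeting for Finding New RNA/RNP in Tsuruoka (新しいRNA/RNPを見つける会 in 鶴岡)

Oral presentation (general)

Classification of functional RNA families using Support Vector Machine

The 7th Annual Meeting of the RNA Society of Japan

Oral presentation (general)

DrugML and Grid drug discovery

Society of Computer Chemistry, Japan, 2004 Spring Annual Meeting

Poster presentation

Construction and application of a drug discovery platform using Grid technology and an XML database

The 32nd Symposium on Structure-Activity Relationships

Poster presentation

DrugML and grid drug discovery

The 31st Symposium on Structure-Activity Relationships

Poster presentation

Introducing the "Knowledge Platform (知見プラットフォーム)", a research support environment based on knowledge utilization: application to a 3D SEM image simulator

XSLSI Testing Symposium/2002. (oral presentation)

Oral presentation (general)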

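External research funding

Grants-in-Aid for Scientific Research (KAKENHI) awarded

Research category: Grant-in-Aid for Young Scientists (A)

Functional classification of long non-coding RNAs based on functional elements and deep learning

April 2016 - March 2020

Research field: Life / Health / Medical informatics

Amount allocated: ¥23,400,000

Research category: Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

Inferring chromatin function based on histone variants

April 2016 - March 2018

Amount allocated: ¥2,730,000

Research category: Grant-in-Aid for Challenging Exploratory Research

Development and application of fundamental technologies for privacy-preserving bioinformatics

2013 - 2015

Research field: Life / Health / Medical informatics

Amount allocated: ¥3,770,000

Research category: Grant-in-Aid for Young Scientists (A)

Research and development of structure prediction methods for modified and edited RNAs

2012 - 2014

Research field: Bioinformatics / Life informatics

Amount allocated: ¥14,300,000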

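Internal research programs

Grant for Special Research Projects

Comprehensive prediction of lncRNA-RNA interactions and construction of a database integrating experimental information

FY2015

Summary of research results: In this study, we first built a pipeline system for fast prediction of RNA-RNA interactions and implemented it on the K computer. Second, we used this pipeline to comprehensively predict interaction partners of human lncRNAs and released the results as a database. Hamada gave an oral presentation at APBC2016, and the work was published as a journal paper (BMC Genomics).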

Research on and application of a methodology for estimating chromatin states from epigenetic data

FY2014

Summary of research results: Motivation: Recent studies have suggested that both the genome and the genome with epigenetic modifications, the so-called epigenome, play important roles in various biological functions, such as transcription and DNA replication, repair, and recombination. It is well known that specific combinations of histone modifications (e.g. methylations and acetylations) of nucleosomes induce chromatin states that correspond to specific functions of chromatin. Although the advent of next-generation sequencing (NGS) technologies enables measurement of epigenetic information for entire genomes at high resolution, the variety of chromatin states has not been completely characterized. Results: In this study, we propose a method to estimate the chromatin states indicated by genome-wide chromatin marks identified by NGS technologies. The proposed method automatically estimates the number of chromatin states and characterizes each state on the basis of a hidden Markov model (HMM) in combination with a recently proposed model selection technique, factorized information criteria. The method is expected to provide an unbiased model because it relies on only two adjustable parameters and avoids heuristic procedures as much as possible. Computational experiments with simulated datasets show that our method automatically learns an appropriate model, even in cases where methods that rely on Bayesian information criteria fail to learn the model structures. In addition, we comprehensively compare our method to ChromHMM on three real datasets and show that our method estimates more chromatin states than ChromHMM for those datasets.

Development of information technologies toward an integrated understanding of the epigenome and the practice of data-driven biology

FY2015

Summary of research results: This fiscal year, we organized and improved the program from the paper published last year [1] in preparation for releasing its source code publicly. Specifically, we changed the program so that it can output the posterior probability of the chromatin state at each position. [1] Michiaki Hamada*, Yukiteru Ono, Ryohei Fujimaki, Kiyoshi Asai, Learning chromatin states with factorized information criteria, Bioinformatics (2015) doi: 10.1093/bioinformatics/btv163 First published online: March 24, 2015

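Courses currently taught

Course title | School / Graduate school | Academic year | Semester
Science and Engineering Laboratory 1B, Block II | School of Fundamental Science and Engineering | 2018 | Fall semester
Science and Engineering Laboratory 1B, Block II | School of Creative Science and Engineering | 2018 | Fall semester
Science and Engineering Laboratory 1B, Block II | School of Advanced Science and Engineering | 2018 | Fall semester
Introduction to Java Programming, Electrical Engineering and Bioscience Class 2 (Mon., period 2) | School of Advanced Science and Engineering | 2018 | Spring semester
Java Programming, Electrical Engineering and Bioscience Class 2 (Mon., period 2) | School of Advanced Science and Engineering | 2018 | Fall semester
Frontiers of Electrical Engineering and Bioscience | School of Advanced Science and Engineering | 2018 | Spring semester
Frontiers of Electrical Engineering and Bioscience [for students graded S in the previous year] | School of Advanced Science and Engineering | 2018 | Spring semester
Electrical Engineering and Bioscience Laboratory C | School of Advanced Science and Engineering | 2018 | Fall semester
Electrical Engineering and Bioscience Laboratory C [for students graded S in the previous year] | School of Advanced Science and Engineering | 2018 | Fall semester
Project Research A | School of Advanced Science and Engineering | 2018 | Spring semester
Project Research A [for students graded S in the previous year] | School of Advanced Science and Engineering | 2018 | Spring semester
Project Research B | School of Advanced Science and Engineering | 2018 | Fall semester
Project Research B [for students graded S in the previous year] | School of Advanced Science and Engineering | 2018 | Fall semester
Graduation Research A | School of Advanced Science and Engineering | 2018 | Spring semester
Graduation Research A [for students graded S in the previous year] | School of Advanced Science and Engineering | 2018 | Spring semester
Graduation Research B | School of Advanced Science and Engineering | 2018 | Fall semester
Graduation Research B (spring semester) | School of Advanced Science and Engineering | 2018 | Spring semester
Graduation Research B [for students graded S in the previous year] | School of Advanced Science and Engineering | 2018 | Fall semester
Graduation Research B (spring semester) [for students graded S in the previous year] | School of Advanced Science and Engineering | 2018 | Spring semester
Bioinformatics | School of Advanced Science and Engineering | 2018 | Fall semester
Graduation Thesis A | School of Advanced Science and Engineering | 2018 | Fall semester
Graduation Thesis B | School of Advanced Science and Engineering | 2018 | Spring semester
Master's Thesis (Electrical Engineering and Bioscience) | Graduate School of Advanced Science and Engineering | 2018 | Full year
Research on Bioinformatics | Graduate School of Advanced Science and Engineering | 2018 | Full year
Bioinformatics Research | Graduate School of Advanced Science and Engineering | 2018 | Full year
Advanced Topics in Bioinformatics | Graduate School of Advanced Science and Engineering | 2018 | Spring semester
Advanced Seminar A | Graduate School of Advanced Science and Engineering | 2018 | Spring semester
Special Seminar A | Graduate School of Advanced Science and Engineering | 2018 | Spring semester
Advanced Seminar B | Graduate School of Advanced Science and Engineering | 2018 | Fall semester
Special Seminar B | Graduate School of Advanced Science and Engineering | 2018 | Fall semester
Seminar on Bioinformatics A | Graduate School of Advanced Science and Engineering | 2018 | Spring semester
Bioinformatics Exercises A | Graduate School of Advanced Science and Engineering | 2018 | Spring semester
Seminar on Bioinformatics B | Graduate School of Advanced Science and Engineering | 2018 | Fall semester
Bioinformatics Exercises B | Graduate School of Advanced Science and Engineering | 2018 | Fall semester
Seminar on Bioinformatics C | Graduate School of Advanced Science and Engineering | 2018 | Spring semester
Bioinformatics Exercises C | Graduate School of Advanced Science and Engineering | 2018 | Spring semester
Seminar on Bioinformatics D | Graduate School of Advanced Science and Engineering | 2018 | Fall semester
Bioinformatics Exercises D | Graduate School of Advanced Science and Engineering | 2018 | Fall semester
Master's Thesis (Department of Electrical Engineering and Bioscience) | Graduate School of Advanced Science and Engineering | 2018 | Full year
Bioinformatics Research | Graduate School of Advanced Science and Engineering | 2018 | Full year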

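Visiting, concurrent, and part-time appointments at other institutions

April 2017 | Nippon Medical School (Japan) | Visiting Professor
October 2016 | National Institute of Advanced Industrial Science and Technology (AIST), Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL) (Japan) | Team Leader, Invited Researcher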