LinearFold GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA. Liang Huang. Oregon State University & Baidu Research
1 LinearFold: Linear-Time Prediction of RNA Secondary Structures. Liang Huang, Oregon State University & Baidu Research. Joint work with Dezhong Deng (Oregon State / Baidu), Kai Zhao (Oregon State / Google), David Hendrix (Oregon State), and David Mathews (Rochester). Input x: sequence; output y: structure.
3 A Bit About Myself. Ph.D. 2008; Research Scientist 2009; Assistant Professor; Principal Scientist. My main area is computational linguistics (aka natural language processing), where I develop faster, linear-time algorithms to understand/translate languages; I also apply these algorithms to computational structural biology.
4 RNA Structure Prediction and Design. CRISPR/Cas9: gene editing. RNA sequence → (structure prediction) → RNA secondary structure → RNA 3D structure, and back via design. Example: M. tuberculosis.
7 Why Does Eterna Need Faster Structure Prediction?
8 RNA Structure Prediction (Folding). Allowed pairs: A-U, C-G, G-U; assume no crossing pairs. Example: transfer RNA (tRNA). Input x: sequence; output y: structure (a parse tree). Challenge: existing structure prediction algorithms are way too slow: O(n^3). Solution: borrow linear-time algorithms from natural language parsing.
11 Our Linear-Time Prediction is Much Faster. [Figure: running time per sequence vs. sequence length, up to 244,296nt (the longest sequence in RNAcentral; 10,000nt is roughly HIV-scale). Vienna RNAfold: ~n^2.6; CONTRAfold MFE: ~n^2.6; LinearFold b=100: ~n^1.0; LinearFold b=50: ~n^1.0.] With even slightly better prediction accuracy!
12 Computational Linguistics => Computational Biology.
Linguistics: 1955 Chomsky: context-free grammars; 1970 Joshi: tree-adjoining grammars; 1985 Shieber: non-CF languages.
Computer science: 1958 Backus & Naur: CFGs in programming languages; 1965 Cocke / Kasami / Younger: CKY parsing, O(n^3); 1965 Knuth: LR parsing, O(n); 1985: CKY-style TAG parsing in O(n^6); 1986 Tomita: Generalized LR parsing; ~1990: linear-time greedy parsing; 2010: linear-time DP parsing (Huang & Sagae).
Biology: 1953 Watson & Crick: DNA double helix; 1980s: O(n^3) CKY for RNA structures; 1999: TAGs for RNA pseudoknots; 2018 LinearFold: O(n) RNA structure prediction.
14 Current Structure Prediction Method: O(n^3) Dynamic Programming. O(n^3) bottom-up CKY parsing; example: maximize # of pairs. [Animation: the best structure for span (i, j) either pairs i with j, reducing to the inner span (i+1, j-1), or splits at some k into spans over (i, k) and (k, j).]
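The maximize-the-number-of-pairs recurrence on these slides can be sketched as a Nussinov-style O(n^3) DP. This is a minimal illustration, not the talk's actual code: variable names are my own, and I assume the canonical A-U/C-G/G-U pairs and a minimum hairpin size of 3 unpaired bases.

```python
# Nussinov-style O(n^3) DP: best[i][j] = max # of non-crossing pairs in seq[i..j].
# Variant of the slide recurrence: position j is either unpaired, or paired with k.
def nussinov(seq, min_loop=3):
    n = len(seq)
    allowed = {("A", "U"), ("U", "A"), ("C", "G"),
               ("G", "C"), ("G", "U"), ("U", "G")}
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # fill shorter spans first
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]               # case 1: j unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

For example, `nussinov("GGGAAACCC")` returns 3: the three G-C pairs (0,8), (1,7), (2,6) nest around the AAA hairpin loop.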
26 How to Fold RNAs in Linear Time? (5' to 3') Idea 0: tag each nucleotide from left to right; maintain a stack: push, pop, skip. Exhaustive: O(3^n). (Huang and Sagae, 2010)
28 How to Fold RNAs in Linear Time? Idea 1: DP by merging equivalent states; maintain graph-structured stacks. DP: O(n^3). (Huang and Sagae, 2010)
30 How to Fold RNAs in Linear Time? Idea 2: approximate search via beam pruning; keep only the top b states per step. DP+beam: O(n). Each DP state corresponds to exponentially many non-DP states: the graph-structured stack (GSS) (Tomita, 1986). (Huang and Sagae, 2010)
33 Another View: Left-to-Right CKY. The many variants of CKY correspond to various topological orderings toward the goal item (S, 0, n): bottom-up, left-to-right, right-to-left. All are O(n^3), but the incremental ones can apply beam search to run in O(n).
36 Our Linear-Time Prediction is Much Faster. [Figure repeated from slide 11: running time ~n^2.6 for Vienna RNAfold and CONTRAfold MFE vs. ~n^1.0 for LinearFold b=100 and b=50.] With even slightly better prediction accuracy!
37 On to details...
38 An Example Path: push, push, skip, pop, pop.
44 Version 1: Exhaustive Search: O(3^n).
50 Idea 1a: Merge Identical Stacks. Merge states with the same full stack of unpaired openings: equivalent states.
51 Version 2: Merge by Full Stack: O(2^n). Merge states with identical stacks (exhaustive → full-stack merge).
54 Idea 1b: Merge Temporary Equivalents. Merge states with the same stack top (last unpaired opening): temporarily equivalent states.
55 Version 3: Merge by Stack Top: O(n^3). Packing temporarily equivalent states; unpacking only when a pop needs the hidden stack contents.
62 A Close-Up Look at Two Paths.
67 Idea 3: Beam Pruning (applied on top of full-stack / stack-top merging).
68 Version 4: DP with Beam Search: O(n). Stack-top merge + beam pruning.
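Beam pruning over the left-to-right folder can be sketched as follows: advance all surviving states one nucleotide at a time, and after each step keep only the top-b states, so the scan is linear in sequence length for fixed b. This is an illustrative sketch with my own names; the state here is the bare stack of open positions, whereas LinearFold merges states (ideas 1a/1b) and scores them with a full model.

```python
# Beam search over push/skip/pop states: keep only the b best states per step.
def fold_beam(seq, b=16):
    allowed = {("A", "U"), ("U", "A"), ("C", "G"),
               ("G", "C"), ("G", "U"), ("U", "G")}
    # state: tuple of open positions -> best # of pairs closed so far
    beam = {(): 0}
    for i, x in enumerate(seq):
        nxt = {}

        def relax(stack, pairs):
            # keep the best score per state (this is the merging step)
            if pairs > nxt.get(stack, -1):
                nxt[stack] = pairs

        for stack, pairs in beam.items():
            relax(stack + (i,), pairs)                 # push
            relax(stack, pairs)                        # skip
            if stack and i - stack[-1] > 3 and (seq[stack[-1]], x) in allowed:
                relax(stack[:-1], pairs + 1)           # pop: close a pair
        # beam pruning: retain only the b highest-scoring states
        beam = dict(sorted(nxt.items(), key=lambda kv: -kv[1])[:b])
    # answer must have an empty stack (all opened pairs closed)
    return max((p for s, p in beam.items() if not s), default=0)
```

With a large enough beam this matches exact search on small inputs; shrinking b trades search quality for speed, which is exactly the beam-size trade-off the experiment slides measure.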
69 Recap: O(3^n) to O(n^3) to O(n). Search algorithms: left-to-right exhaustive: O(3^n); DP, merge by full stack: O(2^n); DP, merge by stack top: O(n^3) (cf. bottom-up CKY: O(n^3)); approximate DP via beam search: O(n). This is a simple illustration that just maximizes the number of pairs; our real systems (LinearFold) work with complicated feature templates.
74 Connections to Incremental Parsing. Shared key observation: local ambiguity packing. Pack non-crucial local ambiguities along the way; unpack in a reduce action only when needed. Psycholinguistic evidence from eye-tracking experiments shows delayed disambiguation: "John and Mary had 2 papers [each / together]" (Frazier and Rayner 1990; Frazier 1999). (Huang and Sagae, 2010)
76 Experiments
77 Applying Prediction Models to LinearFold. Models from the two most widely used systems: CONTRAfold MFE (machine-learned) and Vienna RNAfold (thermodynamic). We linearized both systems from O(n^3) to O(n):
system                                              time    space
baselines: CONTRAfold (machine-learned),
           Vienna RNAfold (thermodynamic)           O(n^3)  O(n^2)
our work:  LinearFold-C, LinearFold-V               O(n)    O(n)
78 Efficiency & Scalability. Tested on two datasets. ArchiveII dataset (labeled, like the Penn Treebank): 2,889 sequences, average length 222nt, up to 2,927nt; used for both efficiency and accuracy evaluations. RNAcentral dataset (unlabeled, like Gigaword): over 1.3M unlabeled sequences, up to 244,296nt; used only for efficiency/scalability evaluations. [Figures: memory use vs. length (CONTRAfold MFE and Vienna RNAfold ~n^2.0; LinearFold b=100 and b=50 ~n^1.0) and running time per sequence vs. length (~n^2.6 vs. ~n^1.0).]
79 Accuracy (beam=100 for LinearFold). Tested on the ArchiveII dataset on a family-by-family basis; significant improvements on 3 longer families, with the biggest improvements on the longest families: 16S/23S rRNAs. [Figure: PPV (precision) and Sensitivity (recall) per family (tRNA, 5S rRNA, SRP, RNaseP, tmRNA, Group I Intron, telomerase RNA, 16S rRNA, 23S rRNA) for CONTRAfold MFE vs. LinearFold-C (b=100) and Vienna RNAfold vs. LinearFold-V (b=100).]
80 Beam Size and Search Quality. Beam search achieves good search quality starting at b=100; it slightly under-predicts compared to exact search, but comes close. [Figures: average free energy / average model cost and number of predicted pairs vs. beam size, for Vienna RNAfold vs. LinearFold-V and CONTRAfold MFE vs. LinearFold-C, with ground truth shown.]
81 Beam Size and PPV/Sensitivity. PPV/Sensitivity trade-off as a function of beam size; PPV and Sensitivity are stable across moderate beam sizes (b=20 to b=300 shown). [Figures: PPV and Sensitivity vs. beam size for CONTRAfold MFE vs. LinearFold-C.]
82 Improvements on Long-Range Pairs. Long-distance pairs are well known to be hard to predict, but our beam-search systems are even better on those. [Figures: PPV and Sensitivity vs. base-pair distance for CONTRAfold MFE vs. LinearFold-C (b=100).]
83 Incremental Folding. Left-to-right vs. right-to-left folding; cotranscriptional folding? [Figure: PPV and Sensitivity vs. negative free energy for linear (left-to-right), right-to-left, and CONTRAfold.]
84 Example Predictions: CONTRAfold
85 Example Predictions: Vienna RNAfold
86 Web Demo 1: Beam Sizes
87 Web Demo 2: Live Demo
88 Bonus Track: Deep Learning for RNA Folding
89 Existing Models. Physics-based energy model: hundreds of carefully designed and experimentally measured thermodynamic parameters. Machine-learning-based model: claimed to be without physics, but still uses physics-based feature templates and just learns the feature weights (parameters) from data. Can we really do it without physics?
93 Deep Learning. Idealized ML: input x → model w → output y. Actual ML: input x → feature map φ → model w → output y. Deep learning (representation learning): the feature map φ is itself trained jointly with the model. Deep learning for RNA structure prediction: RNNs automatically extract features and learn their weights, completely without physics; the first deep-learning RNA secondary structure prediction algorithm. We borrow techniques from natural language parsing, where deep learning has been very successful since ~2016; our group's span-based parsing framework is now dominant there.
96 Our Approach: from NLP Span Parsing. Span representations are taken as differences of encoder states (here a bi-LSTM over "<s> You should eat ice cream </s>", with forward states f_0..f_5 and backward states b_0..b_5); each span is scored and labeled by a feedforward network: s(i, j, X). The score of a tree is the sum of all its labeled span scores: s(tree t) = Σ_{(i,j,X) ∈ t} s(i, j, X). (Cross & Huang 2016; Stern et al.; Wang & Chang)
97 Our Approach: RNA Structure Prediction. Replace words with nucleotides (A, C, G, U) as inputs; score spans with s(i, j, X), with 3 labels per span: Left-unpair, Left-pair, Pair.
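The span-parsing scoring scheme described on these slides (a tree's score is the sum of its labeled span scores) can be sketched in a few lines. The toy `span_score` below is hypothetical and merely stands in for the bi-LSTM + feedforward scorer; the label names are the three span labels from the slide.

```python
# Score of a tree = sum of its labeled span scores s(i, j, X).
def tree_score(tree, span_score):
    # tree: an iterable of (i, j, label) triples
    return sum(span_score(i, j, X) for (i, j, X) in tree)

# Hypothetical stand-in scorer: reward 'Pair' spans, mildly penalize the rest.
toy = lambda i, j, X: 2.0 if X == "Pair" else -0.5

print(tree_score([(0, 5, "Pair"), (1, 4, "Pair"), (2, 3, "Left-unpair")], toy))
# prints 3.5
```

In the real model the scorer reads span endpoints off the encoder states, but the decomposition over labeled spans is what lets CKY-style dynamic programming find the best-scoring tree.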
98 Experiments. Database: S-Full (Andronescu et al. 2008; 2010). Preprocessing: cap at 700nt; remove pseudoknots and non-canonical base pairs; 3,245 distinct structures; randomly split 80%/20% into a training set (2,586) and a test set (659). Results compare precision, recall, and F-score for the physics-based Vienna RNAfold, the machine-learning CONTRAfold (off the shelf and retrained), and our deep-learning model.
99 非常感谢! (fēi cháng gǎn xiè: Thank you very much!) Just google LinearFold.
Forest-based Algorithms in Natural Language Processing Liang Huang overview of Ph.D. work done at Penn (and ISI, ICT) INSTITUTE OF COMPUTING TECHNOLOGY includes joint work with David Chiang, Kevin Knight,
More informationAdvanced PCFG Parsing
Advanced PCFG Parsing Computational Linguistics Alexander Koller 8 December 2017 Today Parsing schemata and agenda-based parsing. Semiring parsing. Pruning techniques for chart parsing. The CKY Algorithm
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron
CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running
More informationEDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:
EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)
More informationN-Queens problem. Administrative. Local Search
Local Search CS151 David Kauchak Fall 2010 http://www.youtube.com/watch?v=4pcl6-mjrnk Some material borrowed from: Sara Owsley Sood and others Administrative N-Queens problem Assign 1 grading Assign 2
More informationFast Edge Detection Using Structured Forests
Fast Edge Detection Using Structured Forests Piotr Dollár, C. Lawrence Zitnick [1] Zhihao Li (zhihaol@andrew.cmu.edu) Computer Science Department Carnegie Mellon University Table of contents 1. Introduction
More informationIncremental Integer Linear Programming for Non-projective Dependency Parsing
Incremental Integer Linear Programming for Non-projective Dependency Parsing Sebastian Riedel James Clarke ICCS, University of Edinburgh 22. July 2006 EMNLP 2006 S. Riedel, J. Clarke (ICCS, Edinburgh)
More informationA Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2008 A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots Ryan Pham San
More informationDynamic Programming. CSE 101: Design and Analysis of Algorithms Lecture 19
Dynamic Programming CSE 101: Design and Analysis of Algorithms Lecture 19 CSE 101: Design and analysis of algorithms Dynamic programming Reading: Chapter 6 Homework 7 is due Dec 6, 11:59 PM This Friday
More informationBi-objective integer programming for RNA secondary structure prediction with pseudoknots
Legendre et al. BMC Bioinformatics (2018) 19:13 DOI 10.1186/s12859-018-2007-7 RESEARCH ARTICLE Open Access Bi-objective integer programming for RNA secondary structure prediction with pseudoknots Audrey
More informationLow-level optimization
Low-level optimization Advanced Course on Compilers Spring 2015 (III-V): Lecture 6 Vesa Hirvisalo ESG/CSE/Aalto Today Introduction to code generation finding the best translation Instruction selection
More informationEnd-2-End Search. Mices 2018 Duncan Blythe
End-2-End Search Mices 2018 Duncan Blythe About Me Duncan Blythe Research Scientist @ Zalando Research M.Math.Phil in Mathematics/ Philosophy @ Oxford Ph.D. & M.Sc. in Machine Learning/ Computational Neuroscience
More informationAgenda for today. Homework questions, issues? Non-projective dependencies Spanning tree algorithm for non-projective parsing
Agenda for today Homework questions, issues? Non-projective dependencies Spanning tree algorithm for non-projective parsing 1 Projective vs non-projective dependencies If we extract dependencies from trees,
More informationBidirectional LTAG Dependency Parsing
Bidirectional LTAG Dependency Parsing Libin Shen BBN Technologies Aravind K. Joshi University of Pennsylvania We propose a novel algorithm for bi-directional parsing with linear computational complexity,
More informationSentiment analysis under temporal shift
Sentiment analysis under temporal shift Jan Lukes and Anders Søgaard Dpt. of Computer Science University of Copenhagen Copenhagen, Denmark smx262@alumni.ku.dk Abstract Sentiment analysis models often rely
More informationk-means demo Administrative Machine learning: Unsupervised learning" Assignment 5 out
Machine learning: Unsupervised learning" David Kauchak cs Spring 0 adapted from: http://www.stanford.edu/class/cs76/handouts/lecture7-clustering.ppt http://www.youtube.com/watch?v=or_-y-eilqo Administrative
More informationIterative CKY parsing for Probabilistic Context-Free Grammars
Iterative CKY parsing for Probabilistic Context-Free Grammars Yoshimasa Tsuruoka and Jun ichi Tsujii Department of Computer Science, University of Tokyo Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033 CREST, JST
More informationEvaluation Metrics. (Classifiers) CS229 Section Anand Avati
Evaluation Metrics (Classifiers) CS Section Anand Avati Topics Why? Binary classifiers Metrics Rank view Thresholding Confusion Matrix Point metrics: Accuracy, Precision, Recall / Sensitivity, Specificity,
More informationDeep-Q: Traffic-driven QoS Inference using Deep Generative Network
Deep-Q: Traffic-driven QoS Inference using Deep Generative Network Shihan Xiao, Dongdong He, Zhibo Gong Network Technology Lab, Huawei Technologies Co., Ltd., Beijing, China 1 Background What is a QoS
More informationDEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla
DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple
More informationINF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering
INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,
More informationMachine Learning 13. week
Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of
More informationFastText. Jon Koss, Abhishek Jindal
FastText Jon Koss, Abhishek Jindal FastText FastText is on par with state-of-the-art deep learning classifiers in terms of accuracy But it is way faster: FastText can train on more than one billion words
More informationWord Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging Wenbin Jiang Haitao Mi Qun Liu Key Lab. of Intelligent Information Processing Institute of Computing Technology Chinese Academy
More informationThe CKY algorithm part 1: Recognition
The CKY algorithm part 1: Recognition Syntactic analysis (5LN455) 2016-11-10 Sara Stymne Department of Linguistics and Philology Mostly based on slides from Marco Kuhlmann Phrase structure trees S root
More informationMidterm Examination CS540-2: Introduction to Artificial Intelligence
Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search
More informationover Multi Label Images
IBM Research Compact Hashing for Mixed Image Keyword Query over Multi Label Images Xianglong Liu 1, Yadong Mu 2, Bo Lang 1 and Shih Fu Chang 2 1 Beihang University, Beijing, China 2 Columbia University,
More informationArbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. Presented by: Karen Lucknavalai and Alexandr Kuznetsov
Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization Presented by: Karen Lucknavalai and Alexandr Kuznetsov Example Style Content Result Motivation Transforming content of an image
More informationClassification and Regression
Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan
More informationPatent Image Retrieval
Patent Image Retrieval Stefanos Vrochidis IRF Symposium 2008 Vienna, November 6, 2008 Aristotle University of Thessaloniki Overview 1. Introduction 2. Related Work in Patent Image Retrieval 3. Patent Image
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationDependency grammar and dependency parsing
Dependency grammar and dependency parsing Syntactic analysis (5LN455) 2015-12-09 Sara Stymne Department of Linguistics and Philology Based on slides from Marco Kuhlmann Activities - dependency parsing
More informationarxiv: v1 [cs.cv] 2 Sep 2018
Natural Language Person Search Using Deep Reinforcement Learning Ankit Shah Language Technologies Institute Carnegie Mellon University aps1@andrew.cmu.edu Tyler Vuong Electrical and Computer Engineering
More informationAdministrative. Local Search!
Administrative Local Search! CS311 David Kauchak Spring 2013 Assignment 2 due Tuesday before class Written problems 2 posted Class participation http://www.youtube.com/watch? v=irhfvdphfzq&list=uucdoqrpqlqkvctckzqa
More informationLSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University
LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in
More informationSection A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.
Section A 1. What do you meant by parser and its types? A parser for grammar G is a program that takes as input a string w and produces as output either a parse tree for w, if w is a sentence of G, or
More informationINF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct
1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We
More informationElements of Language Processing and Learning lecture 3
Elements of Language Processing and Learning lecture 3 Ivan Titov TA: Milos Stanojevic Institute for Logic, Language and Computation Today Parsing algorithms for CFGs Recap, Chomsky Normal Form (CNF) A
More informationAutomated Application Signature Generation Using LASER and Cosine Similarity
Automated Application Signature Generation Using LASER and Cosine Similarity Byungchul Park, Jae Yoon Jung, John Strassner *, and James Won-ki Hong * {fates, dejavu94, johns, jwkhong}@postech.ac.kr Dept.
More informationENCODING STRUCTURED OUTPUT VALUES. Edward Loper. Computer and Information Science
ENCODING STRUCTURED OUTPUT VALUES Edward Loper A DISSERTATION PROPOSAL in Computer and Information Science Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Recurrent Neural Networks Christopher Manning and Richard Socher Organization Extra project office hour today after lecture Overview
More informationInference Complexity As Learning Bias. The Goal Outline. Don t use model complexity as your learning bias
Inference Complexity s Learning Bias Daniel Lowd Dept. of Computer and Information Science University of Oregon Don t use model complexity as your learning bias Use inference complexity. Joint work with
More informationAutomatic Domain Partitioning for Multi-Domain Learning
Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels
More informationLecture 7: Neural network acoustic models in speech recognition
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic
More information