Machine Learning: supervised versus unsupervised


1 Machine Learning: supervised versus unsupervised
Neural networks: supervised learning
- makes use of a known property of the data: the digit as classified by a human (the "ground truth")
- needs a training set, and runs through many feedback cycles
- has (very) many parameters that need to be optimised (danger of overfitting the data)
- performance is assessed with the test set
In contrast, unsupervised learning:
- does not have free parameters that need to be optimised
- does not need to use or know the ground truth
- does not overfit the data
- has good performance with low numbers of data sets

2 Work in our lab: assessing the homo- or heterogeneity of noisy experimental data
Why do an experiment?
- because we want to find out a property (or several) of an object
Why do multiple experiments?
- because the experimental error (noise) may be high and the signal weak, the signal-to-noise ratio (SNR) may be low: the property cannot be measured/detected accurately
- averaging the results of multiple experiments improves the SNR because random differences cancel out (following a √N law if we do N measurements)
What is the problem?
- anything that systematically changes the measured value between experiments leads to different values of the property
- we need to cluster our experiments (i.e. find objects that belong to the same population) before averaging. Within a cluster, the property of interest should be the same.
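The √N behaviour of averaging can be checked with a small simulation. The following is a minimal sketch and not part of the original slides: the function name `snr_of_average` and all numerical values (signal 1.0, noise sigma 5.0, 2000 trials) are assumptions chosen for illustration.

```python
import random
import statistics


def snr_of_average(signal, sigma, n_measurements, n_trials=2000, seed=0):
    """Estimate the SNR of the mean of n_measurements noisy readings.

    Each reading is signal + Gaussian noise with standard deviation sigma.
    The SNR is taken as signal / (standard deviation of the averaged value).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(
            signal + rng.gauss(0.0, sigma) for _ in range(n_measurements)
        )
        for _ in range(n_trials)
    ]
    return signal / statistics.stdev(means)


# Hypothetical numbers: a weak signal (1.0) buried in strong noise (sigma 5.0).
snr_1 = snr_of_average(1.0, 5.0, n_measurements=1)
snr_100 = snr_of_average(1.0, 5.0, n_measurements=100)
ratio = snr_100 / snr_1  # should be close to sqrt(100) = 10

print(f"SNR of a single measurement:    {snr_1:.3f}")
print(f"SNR of the mean of 100:         {snr_100:.3f}")
print(f"improvement (expect about 10):  {ratio:.2f}")
```

Only the random part of the error shrinks in this way. A systematic difference between experiments survives averaging, which is why the experiments have to be clustered first.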

3 Nomenclature, properties, and a few examples
- in general: heterogeneity versus homogeneity
- in crystallography: non-isomorphism versus isomorphism (from the Ancient Greek ἴσος isos "equal", and μορφή morphe "form" or "shape")
- homogeneity is required (assumed) to hold if data are to be averaged: the data must be from the same population
- violation of this assumption can lead to additional error: the averaged data no longer represent the object
1) Can/should images in electron microscopy be averaged? Only if the molecules have the same composition and conformation, and the projection is the same.
2) Can/should the weak data sets from two crystallographic experiments be averaged to obtain a better data set? Only if they are indexed consistently and the crystals are from identical protein preparations.
3) Do two amino-acid sequences belong to the same protein family? If so, they can be aligned, and identical residues may have an important function.
4) Do noisy IR spectra from different samples describe the same object, or are the molecules actually different?

4 Crystallographic example: two forms of lysozyme
3aw6 vs 3aw7; RH: 84.2% vs 71.9%
RMSD = 0.18 Å; Δcell = 0.7 %; Riso = 44.5%

5 Crystallography: multiple crystals/datasets
Femtosecond X-ray protein nanocrystallography, Chapman et al. (2011) Nature 470,
nanocrystals of photosystem I, one of the largest membrane protein complexes. More than 3,000,000 diffraction patterns were collected in this study, and a three-dimensional data set was assembled from individual photosystem I nanocrystals (~200 nm to 2 μm in size).
(15445 xtals used)
Data collection at XFEL (LCLS, Stanford)

6 Assessing heterogeneity
By way of an easy-to-measure quantity ("proxy")
- Crystallography: cell parameters are easy to measure (but the really interesting data are the intensities of the reflections)
- (Until now) we didn't understand what differences in reflection intensities are due to (are they random or systematic?), but for differences in cell parameters there is a simple theory that tells us how different they should be at most
- Agreement of cell parameters is only a necessary, not a sufficient, condition for isomorphism
More general: data-based approach
- comparison of datasets based on pairwise correlation coefficients
  $cc_{ij} = \frac{\sum_k (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_k (x_k - \bar{x})^2 \, \sum_k (y_k - \bar{y})^2}}$
  where $x_k$ and $y_k$ are the k-th values of datasets i and j, and $\bar{x}$, $\bar{y}$ are their means
- hierarchical cluster analysis of the $cc_{ij}$
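The pairwise correlation coefficient is straightforward to compute. The sketch below is an illustration only: the helper names `pearson_cc`, `cc_matrix` and `noisy`, the two random "prototype" datasets and the noise level are all assumptions, not taken from the slides. It builds four noisy datasets from two different prototypes and prints the matrix of pairwise CCs.

```python
import math
import random


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def cc_matrix(datasets):
    """Matrix of pairwise correlation coefficients cc_ij."""
    n = len(datasets)
    return [
        [pearson_cc(datasets[i], datasets[j]) for j in range(n)]
        for i in range(n)
    ]


rng = random.Random(1)

# Two hypothetical "true" objects, each described by 500 values.
proto_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
proto_b = [rng.gauss(0.0, 1.0) for _ in range(500)]


def noisy(proto, sigma):
    """One noisy measurement of a prototype (random error only)."""
    return [value + rng.gauss(0.0, sigma) for value in proto]


# Datasets 0 and 1 measure object A, datasets 2 and 3 measure object B.
datasets = [noisy(proto_a, 1.0), noisy(proto_a, 1.0),
            noisy(proto_b, 1.0), noisy(proto_b, 1.0)]
cc = cc_matrix(datasets)

for row in cc:
    print("  ".join(f"{value:6.3f}" for value in row))
```

Datasets from the same prototype correlate clearly better than datasets from different prototypes, although random noise keeps even the "same population" CCs well below 1. This matrix is the input for hierarchical clustering or for the analysis on the following slides.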

7 Mathematical treatment
Need to separate the random error from the systematic error
- total error (difference of values that should be equal) is $\sqrt{\mathrm{random}^2 + \mathrm{systematic}^2}$
- pairwise CC has contributions from total error, i.e. from both sources of error
- separation of random and systematic errors is not generally possible
A new way to analyze CCs: Brehm and Diederichs (2014)
minimize $\phi(\{x\}) = \sum_{i>j} (cc_{ij} - x_i \cdot x_j)^2$ with $\{x\} = \{x_1, x_2, \ldots, x_N\}$
where $x_i$ and $x_j$ are vectors in n-dimensional space representing the N datasets, and $cc_{ij}$ is (Pearson's) correlation coefficient between the intensities of datasets i and j
- with n = 2 or 4, this solves the indexing ambiguity ( twinning) present in point groups 3, 4, 6, 312, 321 and 23, and additional cases with particular values of cell parameters
- this type of analysis is called Multidimensional Scaling; the method has no name so far
- it turns XFEL data collection into a technique with general applicability

8 The idea (n=2)
[Figure: vectors x1, x2, x3]
x1·x2 = cc12
x1·x3 = cc13
x2·x3 = cc23
n*N unknowns, N*(N-1)/2 knowns: overdetermined system of equations if N > 2*n

9 Least-squares iterations starting from random positions
each point represents one dataset with one of two indexing modes
Brehm, W. & Diederichs, K. (2014). Breaking the indexing ambiguity in serial crystallography. Acta Cryst. D70,
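The least-squares iteration from random starting positions can be sketched as follows. This is an illustration under stated assumptions, not the implementation of Brehm & Diederichs: plain gradient descent with a fixed step is swapped in as the minimizer, and the function names `phi` and `minimize_phi`, the step size, the iteration count and the synthetic data (40 datasets, two indexing modes, weak noise on the CCs) are all hypothetical.

```python
import numpy as np


def phi(cc, x):
    """Sum over pairs i>j of (cc_ij - x_i . x_j)^2."""
    resid = cc - x @ x.T
    lower = np.tril_indices(len(cc), k=-1)
    return float(np.sum(resid[lower] ** 2))


def minimize_phi(cc, n_dim=2, n_iter=1000, seed=0):
    """Gradient descent on phi, starting from random positions."""
    rng = np.random.default_rng(seed)
    n_sets = len(cc)
    x = rng.uniform(-1.0, 1.0, size=(n_sets, n_dim))  # random start
    step = 0.25 / n_sets  # assumed conservative fixed step size
    for _ in range(n_iter):
        resid = cc - x @ x.T
        np.fill_diagonal(resid, 0.0)  # the diagonal is not part of phi
        # gradient of phi with respect to x is -2 * resid @ x
        x += step * 2.0 * resid @ x
    return x


# Synthetic CC matrix: each dataset has one of two indexing modes.
rng = np.random.default_rng(42)
n_sets = 40
mode = rng.integers(0, 2, size=n_sets)          # 0 or 1
length = rng.uniform(0.6, 0.9, size=n_sets)     # vector lengths
angle = np.where(mode == 0, 0.0, np.pi / 2.0)   # modes are uncorrelated
truth = length[:, None] * np.column_stack([np.cos(angle), np.sin(angle)])
noise = rng.normal(0.0, 0.01, size=(n_sets, n_sets))
cc = truth @ truth.T + (noise + noise.T) / 2.0

phi_start = phi(cc, minimize_phi(cc, n_iter=0))  # same seed, same start
x_fit = minimize_phi(cc)
phi_end = phi(cc, x_fit)

print(f"phi at random start: {phi_start:.3f}")
print(f"phi after iteration: {phi_end:.5f}")
```

The fitted coordinates are only defined up to a rotation or reflection of the whole arrangement, so it is the grouping of the points that carries the information, not the direction of the axes. Datasets sharing an indexing mode end up on a common direction.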

10 Which information can be extracted from the matrix of pairwise CCs?
The analysis (Acta D73, ) shows that
- the least-squares solution of $\phi(\{x\}) = \sum_{i>j} (cc_{ij} - x_i \cdot x_j)^2$ exists and is "unique" if the $cc_{ij}$ are known
- the solution can be obtained from the n eigenvalues/eigenvectors of the $cc_{ij}$ matrix
- the $x_i$ vectors lie within a sphere of radius 1 in n-dimensional space
- vectors can be given as coordinates, or (better) as length and spherical angles
Amount of signal
- the length of a vector is CC*, the correlation with its prototype ("true") dataset, and depends on the random error of the dataset
- CC* may be calculated from multiple observations in a dataset (crystallography)
Relation between datasets
- the angle between $x_i$ and $x_j$ is proportional to the systematic difference between i and j
- $cc_{ij} = CC^*_i \, CC^*_j \cos(\mathrm{angle}(x_i, x_j))$
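The eigenvalue route and the length/angle interpretation can be illustrated with a noise-free toy case. This is a sketch under assumptions: the function names `embed_from_cc` and `lengths_and_angle` are hypothetical, the six CC* values and the 60° separation are invented, and the diagonal of the matrix is set to CC*² on the assumption that CC* is known from within each dataset.

```python
import numpy as np


def embed_from_cc(cc, cc_star, n_dim):
    """Coordinates x_i from the n_dim strongest eigenpairs of the CC matrix.

    Assumption of this sketch: the diagonal is replaced by CC*_i^2,
    so that x_i . x_i equals the squared vector length.
    """
    work = np.array(cc, dtype=float)
    np.fill_diagonal(work, np.asarray(cc_star) ** 2)
    eigval, eigvec = np.linalg.eigh(work)          # ascending eigenvalues
    top = np.argsort(eigval)[::-1][:n_dim]         # indices of the largest
    return eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))


def lengths_and_angle(x, i, j):
    """Vector lengths of datasets i and j and the angle between them (deg)."""
    len_i = float(np.linalg.norm(x[i]))
    len_j = float(np.linalg.norm(x[j]))
    cos_ij = np.clip(x[i] @ x[j] / (len_i * len_j), -1.0, 1.0)
    return len_i, len_j, float(np.degrees(np.arccos(cos_ij)))


# Toy data: two groups of three datasets, separated by 60 degrees.
cc_star = np.array([0.9, 0.8, 0.7, 0.9, 0.8, 0.7])
theta = np.radians([0.0, 0.0, 0.0, 60.0, 60.0, 60.0])
truth = cc_star[:, None] * np.column_stack([np.cos(theta), np.sin(theta)])
cc = truth @ truth.T          # cc_ij = CC*_i CC*_j cos(angle_ij)
np.fill_diagonal(cc, 1.0)     # a dataset correlates perfectly with itself

x = embed_from_cc(cc, cc_star, n_dim=2)
len0, len3, angle03 = lengths_and_angle(x, 0, 3)

print(f"length of x_0: {len0:.4f}")
print(f"length of x_3: {len3:.4f}")
print(f"angle between x_0 and x_3: {angle03:.2f} degrees")
```

In this toy case the vector lengths reproduce the CC* values and the angle reproduces the systematic difference between the two groups. With real, noisy data the recovery is only approximate.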

11 Example 1: two kinds of noisy images
noisy images (SNR=1/13 and SNR=1/9) of the original and mirror picture
[Figure: result of averaging without knowledge of whether an image is the original or its mirror]

12 CC analysis with n=2
[Figure label: weakest S/N ratio]

13 Result of averaging - after clustering
[Figure labels: 100; original; mirror]

14 Example 2: MNIST data - classify handwritten digits
Wikipedia: The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems.

15 CC analysis in n=2 dimensions; N=6000 images
[Figure panels: digits 0,1; digits 0,1,2; digits 0,1,2,3; digits 0,1,2,3,4]

16 CC analysis in n=3 dimensions: digits 0,1,2,3,4; N=6000
The higher the possible number of systematic ways in which data sets may differ, the higher n must be; otherwise no segmentation of data sets is achieved.
n should correspond to the number of large eigenvalues.
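One way to read "the number of large eigenvalues" off a CC matrix is sketched below. The criterion used here (count eigenvalues above 1, the mean eigenvalue of a correlation matrix with unit diagonal) is a generic rule of thumb that is swapped in for illustration, not a rule stated in the slides. The function name `count_large_eigenvalues` and the synthetic data (three classes of ten datasets each) are assumptions.

```python
import numpy as np


def count_large_eigenvalues(cc, threshold=1.0):
    """Count eigenvalues of the CC matrix above a threshold.

    threshold=1.0 is an assumed heuristic: for a correlation matrix
    with unit diagonal the average eigenvalue is exactly 1.
    """
    eigval = np.linalg.eigvalsh(cc)[::-1]   # descending order
    return int(np.sum(eigval > threshold)), eigval


# Synthetic data: three classes whose directions are 60 degrees apart.
rng = np.random.default_rng(7)
directions = np.array([[1.0, 1.0, 0.0],
                       [1.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0]]) / np.sqrt(2.0)
labels = np.repeat([0, 1, 2], 10)           # 10 datasets per class
truth = 0.8 * directions[labels]            # all vectors have length 0.8
noise = rng.normal(0.0, 0.02, size=(30, 30))
cc = truth @ truth.T + (noise + noise.T) / 2.0
np.fill_diagonal(cc, 1.0)

n_suggested, eigval = count_large_eigenvalues(cc)

print("five largest eigenvalues:", np.round(eigval[:5], 2))
print("suggested n:", n_suggested)
```

Three eigenvalues stand clearly above the rest here, matching the three independent directions built into the data. With fewer dimensions than that, some classes would be forced on top of each other.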

17 CC analysis in n=3 dimensions; digits 0,1,2,3
[Figure panels: N=6000; N= ; N=60]

18 Example 3: PS I data: N=15445, n=3, after resolving the indexing ambiguity
[Figure panel: x-axis along strongest eigenvector, y-axis along second eigenvector]
[Figure panel: second (x-axis) and third (y-axis) eigenvectors (the view is along the first eigenvector)]

19 Example 4: attractors in a 5 μs MD trajectory of a hepta-aspartic acid
N= conformations, described by the 84 distances between Cα,γ(i) and Cα,γ(j) atoms (i ≠ j) ~ CCs; n=5
(Collaboration with S. Hunkler, O. Kukharenko, C. Peter)

20 Summary
- Unsupervised learning
- A method that separates random and systematic effects
- "Segmentation" of objects; this enables clustering and e.g. averaging of similar ones
- Mathematical properties: "unique" solution; complexity ~ O(N²)
- Limitation: the axes of the low-dimensional space are not "labelled" with specific properties
- Applications in structural biology: crystallographic datasets, or images of molecules
- Analogously applicable in other sciences: spectra, expression patterns, sequences, patient data
- Further applications wherever correlation coefficients need to be understood

Assessing the homo- or heterogeneity of noisy experimental data. Kay Diederichs Konstanz, 01/06/2017


More information

Lecture Topic Projects

Lecture Topic Projects Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data

More information

Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool.

Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool. Tina Memo No. 2014-004 Internal Report Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool. P.D.Tar. Last updated 07 / 06 / 2014 ISBE, Medical School, University of Manchester, Stopford

More information

Vocabulary Unit 2-3: Linear Functions & Healthy Lifestyles. Scale model a three dimensional model that is similar to a three dimensional object.

Vocabulary Unit 2-3: Linear Functions & Healthy Lifestyles. Scale model a three dimensional model that is similar to a three dimensional object. Scale a scale is the ratio of any length in a scale drawing to the corresponding actual length. The lengths may be in different units. Scale drawing a drawing that is similar to an actual object or place.

More information

How to Analyze Materials

How to Analyze Materials INTERNATIONAL CENTRE FOR DIFFRACTION DATA How to Analyze Materials A PRACTICAL GUIDE FOR POWDER DIFFRACTION To All Readers This is a practical guide. We assume that the reader has access to a laboratory

More information

LIGHT SCATTERING THEORY

LIGHT SCATTERING THEORY LIGHT SCATTERING THEORY Laser Diffraction (Static Light Scattering) When a Light beam Strikes a Particle Some of the light is: Diffracted Reflected Refracted Absorbed and Reradiated Reflected Refracted

More information

LECTURE 16. Dr. Teresa D. Golden University of North Texas Department of Chemistry

LECTURE 16. Dr. Teresa D. Golden University of North Texas Department of Chemistry LECTURE 16 Dr. Teresa D. Golden University of North Texas Department of Chemistry A. Evaluation of Data Quality An ICDD study found that 50% of x-ray labs overestimated the accuracy of their data by an

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Improvement of the bimodal parameterization of particle size distribution using laser diffraction

Improvement of the bimodal parameterization of particle size distribution using laser diffraction Improvement of the bimodal parameterization of particle size distribution using laser diffraction Marco Bittelli Department of Agro-Environmental Science and Technology, University of Bologna, Italy. Soil

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Three-dimensional structure and flexibility of a membrane-coating module of the nuclear pore complex

Three-dimensional structure and flexibility of a membrane-coating module of the nuclear pore complex CORRECTION NOTICE Nat. Struct. Mol. Biol. advance online publication, doi:10.1038/nsmb.1618 (7 June 2009) Three-dimensional structure and flexibility of a membrane-coating module of the nuclear pore complex

More information

Lecture 8: Fitting. Tuesday, Sept 25

Lecture 8: Fitting. Tuesday, Sept 25 Lecture 8: Fitting Tuesday, Sept 25 Announcements, schedule Grad student extensions Due end of term Data sets, suggestions Reminder: Midterm Tuesday 10/9 Problem set 2 out Thursday, due 10/11 Outline Review

More information

Raster Classification with ArcGIS Desktop. Rebecca Richman Andy Shoemaker

Raster Classification with ArcGIS Desktop. Rebecca Richman Andy Shoemaker Raster Classification with ArcGIS Desktop Rebecca Richman Andy Shoemaker Raster Classification What is it? - Classifying imagery into different land use/ land cover classes based on the pixel values of

More information

Ratios and Proportional Relationships (RP) 6 8 Analyze proportional relationships and use them to solve real-world and mathematical problems.

Ratios and Proportional Relationships (RP) 6 8 Analyze proportional relationships and use them to solve real-world and mathematical problems. Ratios and Proportional Relationships (RP) 6 8 Analyze proportional relationships and use them to solve real-world and mathematical problems. 7.1 Compute unit rates associated with ratios of fractions,

More information

Enabling in-situ data analysis for large protein-folding trajectory datasets

Enabling in-situ data analysis for large protein-folding trajectory datasets Enabling in-situ data analysis for large protein-folding trajectory datasets Boyu Zhang, Trilce Estrada, Pietro Cicotti, Michela Taufer University of Delaware {bzhang, taufer}@udel.edu University of New

More information

DynOmics Portal and ENM server

DynOmics Portal and ENM server DynOmics Portal and ENM server 1. URLs of the DynOmics portal, ENM server, ANM server, and ignm database. DynOmics portal: http://dynomics.pitt.edu/ ENM server: ANM server: http://enm.pitt.edu/ http://anm.csb.pitt.edu/

More information

Figure 1: Workflow of object-based classification

Figure 1: Workflow of object-based classification Technical Specifications Object Analyst Object Analyst is an add-on package for Geomatica that provides tools for segmentation, classification, and feature extraction. Object Analyst includes an all-in-one

More information

8 th Grade Mathematics Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the

8 th Grade Mathematics Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 8 th Grade Mathematics Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13. This document is designed to help North Carolina educators

More information

How Do We Measure Protein Shape? A Pattern Matching Example. A Simple Pattern Matching Algorithm. Comparing Protein Structures II

How Do We Measure Protein Shape? A Pattern Matching Example. A Simple Pattern Matching Algorithm. Comparing Protein Structures II How Do We Measure Protein Shape? omparing Protein Structures II Protein function is largely based on the proteins geometric shape Protein substructures with similar shapes are likely to share a common

More information

MAT 17C - DISCUSSION #4, Counting Proteins

MAT 17C - DISCUSSION #4, Counting Proteins MAT 17C - DISCUSSION #4, Counting Proteins Visualization of molecules inside a living cell is one of the most important tools of molecular biology in the 21 st century. In fact, it was the subject of the

More information

PHYSICS 116 POLARIZATION AND LIGHT MEASUREMENTS

PHYSICS 116 POLARIZATION AND LIGHT MEASUREMENTS Name Date Lab Time Lab TA PHYSICS 116 POLARIZATION AND LIGHT MEASUREMENTS I. POLARIZATION Natural unpolarized light is made up of waves vibrating in all directions. When a beam of unpolarized light is

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

2D Image Alignment and Classification

2D Image Alignment and Classification Structural Biology from Cells to Atoms Optical microscopy D Image Alignment and Classification Yao Cong cryotomography cryomicroscopy 8nm 00 nm 50nm 1nm 5nm Crystallography 0.1nm 0.35nm 0.6nm 0.3nm Shanghai

More information

A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS

A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS F. Campeotto 1,2 A. Dal Palù 3 A. Dovier 2 F. Fioretto 1 E. Pontelli 1 1. Dept. Computer Science, NMSU 2. Dept.

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

Methods for Intelligent Systems

Methods for Intelligent Systems Methods for Intelligent Systems Lecture Notes on Clustering (II) Davide Eynard eynard@elet.polimi.it Department of Electronics and Information Politecnico di Milano Davide Eynard - Lecture Notes on Clustering

More information