University of Colorado, Boulder
CU Scholar
Chemistry & Biochemistry Graduate Theses & Dissertations
Chemistry & Biochemistry
Spring 1-1-2011
Characterization of Positive Matrix Factorization
Methods and Their Application to Ambient
Aerosol Mass Spectra
Ingrid Marie Ulbrich
University of Colorado at Boulder, [email protected]
Follow this and additional works at: http://scholar.colorado.edu/chem_gradetds
Part of the Applied Mathematics Commons, Atmospheric Sciences Commons, and the
Environmental Sciences Commons
Recommended Citation
Ulbrich, Ingrid Marie, "Characterization of Positive Matrix Factorization Methods and Their Application to Ambient Aerosol Mass
Spectra" (2011). Chemistry & Biochemistry Graduate Theses & Dissertations. Paper 35.
This Dissertation is brought to you for free and open access by Chemistry & Biochemistry at CU Scholar. It has been accepted for inclusion in
Chemistry & Biochemistry Graduate Theses & Dissertations by an authorized administrator of CU Scholar. For more information, please contact
[email protected].
Characterization of Positive Matrix Factorization Methods and Their Application to
Ambient Aerosol Mass Spectra
By
Ingrid Marie Ulbrich
B.S., Massachusetts Institute of Technology, 2000
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirement for the degree of
Doctor of Philosophy
Department of Chemistry and Biochemistry
2011
This thesis entitled:
Characterization of Positive Matrix Factorization Methods and Their Application to
Ambient Aerosol Mass Spectra
written by Ingrid Marie Ulbrich
has been approved for the Department of Chemistry and Biochemistry by
____________________________________
Jose-Luis Jimenez
____________________________________
Michael P. Hannigan
Date: ______________
The final copy of this thesis has been examined by the signatories, and we find that both the
content and the form meet acceptable presentation standards of scholarly work in the
above-mentioned discipline.
Ulbrich, Ingrid Marie (Ph.D. Chemistry)
Characterization of Positive Matrix Factorization Methods and Their Application to Ambient
Aerosol Mass Spectra
Thesis directed by Professor Jose-Luis Jimenez
ABSTRACT
Atmospheric aerosol has impacts on health, visibility, ecosystems, and climate. The organic
component of submicron aerosol is a complex mixture of tens of thousands of compounds, and it
is still challenging to quantify the direct sources of organic aerosol. Organic aerosol can also
form from a variety of secondary reactions in the atmosphere, which are poorly understood.
Real-time instrumental techniques, including the Aerosol Mass Spectrometer (AMS), which can
quantitatively measure aerosol composition with high time and size resolution, and some
chemical resolution, produce large volumes of data that contain rich information about aerosol
sources and processes. This thesis work seeks to extract the underlying information that
describes organic aerosol sources and processes by applying factor analytical techniques to
organic aerosol datasets from the AMS. We have developed a custom, open-source software tool
to compare factorization solutions, their residuals, and tracer-factor correlations. The application
of existing mathematical techniques to these new datasets requires careful characterization of the
precision in the data and the factorization models’ behavior with these specialized datasets. We
explore this behavior with synthetic datasets modeled on AMS data. The synthetic data
factorization has predictable behaviors when solved with “too many” factors. These behaviors
then guide the choice of solution for real aerosol datasets. The factor analyses of real aerosol
datasets are useful for identifying aerosol types related to sources (e.g., urban combustion and
biomass burning) and secondary atmospheric processes (e.g., semivolatile and low-volatility
oxidized organic aerosol). We have also factored three-dimensional datasets of size-resolved
aerosol composition data to explore the variability of aerosol size distributions as the aerosol
undergoes processing in an urban atmosphere. This study provides evidence that primary
particles are coated with condensed secondary aerosol during photochemical processing, shifting
the size distribution of the primary particles to larger sizes. Application of these three-
dimensional factorization techniques to other complex aerosol composition datasets (e.g., that
use thermal desorption or chromatography for further chemical separation) has the potential to
yield additional insights about aerosol sources and processes.
for my parents
who always tell me that
I can accomplish anything I set my mind to,
but it might take longer than one day
and for my grandparents
who always tell me
I shouldn’t work too hard
and I should go home before it gets dark out
Acknowledgements
Back when I had a “real job” (with a real salary!) I worked with a lot of people who had
Ph.D.’s. I saw that they knew how to approach and solve big problems in a way that I didn’t,
and I wanted to learn how to do that. While going back to school was not my immediate first
choice, I knew that it would be necessary to gain this experience. Furthermore, I realized that
getting a Master’s degree would not challenge me with a big enough project to get everything I
wanted. And so I embarked on the journey to get a Ph.D. Thank you to Praveen Amar,
Rawlings Miller, Gary Kleiman, and John Graham for your excellent advice about choosing a
program and an advisor.
My seven years of graduate study at CU have been a fantastic opportunity to work with a
lot of really good scientists who are also really good people. This work would not have been
possible without the AMS and the many people behind its development and the early success at
analyzing its complex datasets. I acknowledge those who began this work and developed it to
the point that my analyses were possible: John Jayne, Doug Worsnop, James Allan, Ravi
Alfarra, and Qi Zhang. Likewise, I am thankful that I did not have to write the algorithms that
solve the factorization models used in this work! Thank you to Pentti Paatero for developing
these tools and making them available. Early discussions with Pentti Paatero and Phil Hopke
about the application of PMF to our datasets were very helpful.
I must also acknowledge the past members of the Jimenez group for breaking in a new
professor and beginning his early training, for their assistance in preparing me for my oral exam
by telling me to stop talking, and for incorporating my analyses into their own work. Donna
Sueper has been a wonderful resource for programming the PET, including giving me coding
advice, always knowing a little Igor trick, and taking over the code for a year or so, which allowed
me to focus more on research. Of course, this work would not have been possible without the
assistance of my advisor, Prof. Jose-Luis Jimenez. I have always appreciated that we have a
common connection to Cambridge, we usually speak the same language about work and science,
and that I became the “anonymous student” who got you in trouble with The Boss for staying at
work too late.
The first four years of this work culminated with the text that forms Chapter 2 of this
thesis. Many thanks to Manjula Canagaratna for her extensive revision of an early draft of that
work. It not only improved the paper significantly, but taught me a great deal about manuscript
organization and structure. I have tried to pass those lessons on when reading drafts of others’
work. Much of the analysis for that manuscript was done and most of the draft was written at
Folsom St. Coffee, where they know how I like my chai and my tea.
Many thanks to my family, not just during my graduate career. My parents have always
encouraged and supported my endeavors, acknowledged my frustrations when I did not achieve
my goals immediately at the outset of a project, and waited patiently for me to realize when I had
achieved success. I am so glad that Jose was at CU so that I could come home after ten years in
Boston. Thank you to my grandparents, who think I work too hard, and listen patiently every
time I tell them that’s the way it needs to be sometimes. Thank you to Michelle, Eric, and
Rachel, who were ready to teach my niece and nephew to call me “Dr. Auntie Ingrid” seven years
ago. And finally, thank you to my fiancé Brian, who not only loves and puts up with me, but
especially in the last several months has also been my chauffeur, personal chef, thesaurus,
manual of style, and empathetic therapist. Don’t worry, I’ll cook dinner tomorrow.
Table of Contents
Chapter 1
Introduction………………………………………………………………………………………1
1.1 Aerosol Overview……………………....…………………………………………1
1.2 Identification of Targets for Particle Controls…..………………..…………….…2
1.3 Chemometrics overview………………..…………………….…………………...3
1.3.1 Chemometric applications to ambient aerosol…………………………….3
1.4 Framework for This Thesis………………………………………………………..5
Chapter 2
Interpretation of Organic Components from Positive Matrix Factorization of Aerosol Mass
Spectrometric Data………………………………………..……………………………………..7
2.1 Chapter Introduction………………………………………………………………7
2.2 Methods…………………………………………....……………………………..13
2.2.1 Aerosol Mass Spectrometer (AMS)……………………………………...13
2.2.2 Factorization Methods…………………………………………………...15
2.2.2.1 Positive Matrix Factorization (PMF)…………………………...15
2.2.2.2 Singular Value Decomposition (SVD)…………………………22
2.2.3 Data Sets…………………………………………………………………22
2.2.3.1 Real Pittsburgh Dataset…………………………………………22
2.2.3.2 Synthetic Datasets………………………………………………25
2.2.4 Statistical Comparisons of Mass Spectra………………………………...27
2.2.4.1 Reference Spectra………………………………………………..27
2.2.4.2 Statistics of Correlation…………………………………………..28
2.3 Results……………………………………………………………………………29
2.3.1 Real Pittsburgh Data……………………………………………………..29
2.3.1.1 Solutions as a Function of Number of Factors………………….30
2.3.1.2 Rotations………………………………………………………..41
2.3.2 Synthetic AMS Data……………………………………………………..45
2.3.2.1 Solutions of Synthetic Data Base Cases…………………………45
2.3.2.2 Separation of Correlated Factors………………………………...50
2.4 Discussion………………………………………………………………………..53
2.5 Chapter Conclusions……………………………………………………………..59
Chapter 3
Three-dimensional factorization of size-resolved organic aerosol mass spectra from Mexico
City………………………………………………………………………………………………62
3.1 Chapter Introduction……………………………………………………………..62
3.2 3-Dimensional Matrix Factorization and its Application in the Literature………66
3.2.1 Mathematical Techniques for 3D Matrix Factorization………………….66
3.2.2 Research Reporting 3-Dimensional Factorizations Using Particle Size
Information………………………………………………………………71
3.3 Methods…………………………………………………………………………..73
3.3.1 Mexico City Measurements During the MILAGRO Field Campaign…...73
3.3.2 AMS Sampling of Particle Time-of-Flight Data………………………...74
3.3.3 Particle Time-of-Flight Data Analysis…………………………………...77
3.3.3.1 Estimation of Measurement Precision of Particle Time-of-Flight
Data………………………………………………………………78
3.3.3.2 Further Data and Error Treatments Prior to Factorization………79
3.3.4 Matrix Factorization……………………………………………………...85
3.3.4.1 Models for Factoring the 3-Dimensional Matrix………………..85
3.3.4.2 Algorithms for Solving the 3-Dimensional Models……………..87
3.3.4.3 Guidelines for Choosing a Solution……………………………..90
3.3.4.4 Uncertainties in the Chosen Solution……………………………91
3.4 Results……………………………………………………………………………92
3.4.1 Results from the 3-Vector Model………………………………………..97
3.4.2 Results from the Vector-Matrix Model…………………………………..98
3.4.2.1 Choosing a Solution of the Constrained Vector-Matrix Model..103
3.4.2.2 Factors in the Best Solution of the Constrained Vector-Matrix
Model………………………………………..............................107
3.5 Discussion………………………………………………………………………110
3.5.1 Evaluation of the Assumptions of the 3-Vector Model………………...111
3.5.2 Insights into Ambient Aerosol and PToF Sampling……………………119
3.5.3 Directions for Future Research…………………………………………123
3.6 Chapter Conclusions……………………………………………………………125
Chapter 4
Conclusions…………………………………………………………………………………….128
4.1 Thesis Summary………………………………………………………………...128
4.2 Synthesis of Findings…………………………………………………………...131
4.3 Directions for Future Research…………………………………………………133
4.3.1 Better Understanding of “Split” Factors………………………………..133
4.3.2 Application of Factorization to New Datasets………………………….135
4.3.3 Creation of New Factorization Models…………………………………136
4.3.4 Maintenance of Good Software Tools………………………………….137
References……………………………………………………………………………………...138
Appendices……………………………………………………………………………………..152
Appendix A. Explanation of PMF…………….....……………………………………152
Appendix B. Documentation for the PMF Evaluation Tool (PET)…………………...175
Appendix C. Calculation of Error Values for the Synthetic Datasets…..…………….219
Appendix D. Supporting Information for Chapter 2………………….………………222
Appendix E. Separation of Correlated Factors in the two-factor synthetic dataset......246
Appendix F. 2-D Factorization of PToF m/z’s from Pittsburgh……………………...250
Appendix G. Choice of solutions of 3-D factorizations………………………………259
Appendix H. Supporting Information for Chapter 3..………………………………...270