Abstracts
Binary Data Analysis Based on Bayesian Semiparametric Approach
Abstract
An important and popular tool for analyzing binary data is the latent variable model, which provides a suitable framework from both theoretical and computational perspectives. In this context, dependent multivariate responses can be modeled by introducing a correlation structure in the latent variables. It is often assumed that the latent variables are normally distributed, leading to the well-known multivariate probit model, in which the mean and covariance matrix of the latent variables are taken to be equal across experimental units. This homogeneity assumption may not hold, however, and parameter estimation may be inappropriate in such settings. To alleviate this obstacle, this paper adopts a semiparametric Bayesian framework and places a Dirichlet process mixture prior on the latent variables, thereby capturing heterogeneity in the mean and covariance structure. To simplify the model and the Bayesian analysis, the modified Cholesky decomposition is used to reparametrize the covariance matrix. Finally, the model developed here is illustrated and assessed through a simulation study as well as an applied example concerning smoking behavior in Tehran.
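In schematic form (our notation, not the authors'), the latent-variable formulation replaces the single normal prior of the multivariate probit with a Dirichlet process mixture:

```latex
% Multivariate probit with a DPM prior (schematic); i indexes units, j components
y_{ij} = \mathbf{1}(z_{ij} > 0), \qquad
\mathbf{z}_i \mid \boldsymbol{\mu}_i, \Sigma_i \sim N_p(\boldsymbol{\mu}_i, \Sigma_i), \qquad
(\boldsymbol{\mu}_i, \Sigma_i) \mid G \sim G, \qquad G \sim \mathrm{DP}(\alpha, G_0)
```

Under the usual probit model, one pair $(\boldsymbol{\mu}, \Sigma)$ is shared by all units; the DP prior instead lets units cluster into groups, each with its own mean and covariance.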
Balancing Communications and Crunching
Abstract
Women in the statistics and mathematics fields must overcome the stigma of being black-box number crunchers who are inept at communicating and applying results from data. Developing the ability to make concise recommendations by leveraging data and statistical models is just as important as developing the skill to build robust, statistically sound analyses. To gain visibility in a data field, it is critical to be able to communicate the business implications of an analysis and get the ears of leaders in the business.
Bio:
Kerrie Adams is a Director of Customer Analytics at Walmart. She drives Member communications for Sam’s Club, both in targeting and in measuring the effectiveness of all Member and prospect communications. She has played an integral role in the standardization of metrics and the automation of campaign analytics, and has driven analytical insights for the business supporting areas such as Membership, Credit, Merchandising, and Operations. Prior to Walmart, Kerrie was Manager of Pricing Science, where she led price test concept development, model design, execution, tracking, and recommendations to maximize the profitability of pricing schemas within the business. Her past roles include running an analytics department for a holistic web-based software tool providing retailers and restaurants support for their facility operations; reporting market share, patient behavior, and volume forecasts for pharmaceutical clients; and working as a researcher for a research and development company. Kerrie holds a Master of Applied Statistics from The Ohio State University and an undergraduate degree in Mathematics from Otterbein College. She has been using SAS software to drive analytics and insights for over 15 years.
Elpida Ormanidou is the Director of HR Business Analytics and Budgeting supporting the Walmart US People group, including Store and Non-Store HR Operations, Talent Development, Compensation, Associate Innovation, and HR Strategy. She is recognized by her leadership as an analytics visionary in the HR organization. Ormanidou studied Agricultural Economics at the University of Arkansas and continued with graduate studies in Statistics and Transportation Logistics, also at the University of Arkansas. She began her career at Walmart in the Sam’s Club Division as a Member Trends Analyst in May 2004, and in 2006 was promoted to Sr. Pricing Manager supporting the Technology and Office segment. She is an active participant in Walmart’s advocacy for diversity and inclusion, and within Walmart is a member of and volunteer for several associate resource groups. She serves on the Board of Directors for CASA of Northwest Arkansas, runs membership for the NWA Chapter of the Network of Executive Women, and strongly supports the Foundation Fighting Blindness. Ormanidou maintains a strong partnership with the University of Arkansas, where she has taught Business Intelligence as an Adjunct Instructor at the Walton College of Business.
To Flip or Not to Flip? Results From the Implementation of a “Flipped Classroom” in College-level Math and Statistics
Abstract
In a flipped classroom, students are given online instruction prior to class, replacing the traditional lecture that monopolizes class time. In the flipped model, students have more opportunity to interact with the instructor while applying the knowledge gleaned from the online materials, working problems in the presence of the instructor. Presented are comparisons of achievement, attendance, and student evaluations between one college instructor’s flipped and traditionally taught courses.
Bio:
Tonya Adkins has been a Mathematics Educator for 21 years at the middle school, high school, community college, and university levels. This is her 10th year teaching mathematics and statistics at the university level. She received her Bachelor of Arts in Math Education from UNC-Wilmington in 1993, her Master of Education in Mathematics Education from Auburn University in 1998, and is currently ABD for a Doctor of Philosophy in Curriculum & Instruction – Urban Mathematics from UNC-Charlotte. She is the recipient of the 2009-2010 Adjunct Faculty of the Year award from Johnson & Wales – Charlotte.
M-Estimation for Dependent Data
Abstract
We focus on developing the theoretical foundations of M-estimation for dependent data. M-estimation is a popular technique for extracting a parameter estimate by minimizing a particular loss function. Previous work on the subject focused on specific models of dependence, addressed limited parametric scenarios, and lacked an overarching theory for the general problem under dependence. Our goal is to contribute new theory to this important area, providing unifying tools that address a range of applications. As a first step toward the main result, we prove a general triangular-array version of a limit theorem for empirical processes under dependence. This result is then applied in a general empirical process setup to obtain the asymptotics of the maximizer of an empirical process, and in turn to a number of specific problems, including the classical cube-root asymptotics scenario, to obtain new results.
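As a reminder of the basic mechanics (the independent-data case only; the dependent-data theory is the subject of the talk), here is a minimal M-estimation sketch in Python using the Huber loss, with SciPy assumed available:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber(r, k=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    return np.where(np.abs(r) <= k, 0.5 * r**2, k * np.abs(r) - 0.5 * k**2)

def m_estimate_location(x, k=1.345):
    """M-estimate of location: the theta minimizing the summed loss."""
    return minimize_scalar(lambda theta: np.sum(huber(x - theta, k))).x

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 95), rng.normal(8, 1, 5)])  # 5% gross outliers
print(m_estimate_location(x))   # stays near 0 despite the contaminated tail
```

The sample mean minimizes the squared-error loss and is dragged toward the outliers; the Huber M-estimate downweights the tail, illustrating why the choice of loss matters.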
A Hybrid Freeway Travel Time Forecasting Model Integrating Principal Component Analysis and Neural Networks
Abstract
Because travelers make their choices based on the cost associated with travel time, travel time information can help them choose appropriate routes. This study aims to build a robust and accurate freeway travel time prediction model with significantly fewer input variables. Principal component analysis (PCA) is used as a preprocessing technique to reduce the dimension of the input data and to decorrelate the input variables. After preprocessing, a back-propagation neural network (BPNN) is used to build the prediction model with the optimal number of principal components (PCs). A sequential zeroing of weights (SZW) algorithm is then implemented to find the important PCs, and a methodology is devised to retrace the important original variables from those PCs; this retracing methodology should encourage researchers to use PCA more extensively in the future. To build a reliable model with fewer original variables, several prediction models were developed using subsets of the important original variables. The key finding is that the developed models can predict travel time on Taiwan’s freeways with similarly high accuracy (mean absolute percentage error of 6.4%) using as few as 4 predictor variables instead of the 43 used previously, which permits considerable equipment savings during future data collection.
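A minimal sketch of the PCA-then-network pipeline in Python with scikit-learn, on synthetic data (the latent-factor structure, variable names, and hyperparameters are illustrative assumptions, not the study’s):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
F = rng.normal(size=(500, 4))                  # 4 latent traffic factors (hypothetical)
L = rng.normal(size=(4, 43))                   # loadings onto 43 raw predictors, as in the study
X = F @ L + 0.1 * rng.normal(size=(500, 43))   # correlated raw inputs
y = 30 + F @ np.array([3.0, 2.0, 1.0, 0.5]) + rng.normal(0, 0.5, 500)  # travel times (min)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(),
                      PCA(n_components=4),     # dimension reduction; PCs are uncorrelated
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
model.fit(X_tr, y_tr)
mape = 100 * np.mean(np.abs((model.predict(X_te) - y_te) / y_te))
print(f"MAPE: {mape:.1f}%")
```

Because the 43 inputs share a low-dimensional factor structure, a handful of PCs carries nearly all of the predictive signal, which is the intuition behind reducing 43 predictors to 4.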
Bio:
Prateek received his B.Tech in Civil Engineering from IIT Delhi in 2013. He has undertaken research internships in IIT Delhi’s Transportation Research and Injury Prevention Program (TRIPP), Taiwan’s NCTU Institute of Traffic and Transportation, the University of Montreal’s MADITUC research group, and Sweden’s Chalmers University of Technology. He has investigated the feasibility of a Bus Rapid Transit (BRT) corridor in Delhi, developed robust freeway travel time prediction models for Taiwanese freeways, and designed a direct interface between Vissim and MATLAB for solving adaptive signal control problems. He was awarded a Summer Undergraduate Research Award at IIT Delhi for his novel research in public transportation, and for his multi-disciplinary research with a focus on transportation he received the “Abhinav Dhupar Memorial Award” as the most distinguished Civil Engineering undergraduate at IIT Delhi. He started working with Dr. Kockelman in August 2013 and is investigating various aspects of electric and autonomous vehicles.
A Two-Step Integrated Approach to Detect Differentially Expressed Genes in RNA-Seq Data
Abstract
RNA-Seq experiments produce millions of discrete sequence reads as a measure of gene expression levels and enable researchers to investigate complex aspects of genomic studies, including, but not limited to, the identification of differentially expressed (DE) genes across two or more treatment conditions and the detection of novel transcripts. A common assumption for RNA-Seq data is that all gene counts follow an overdispersed Poisson or negative binomial (NB) distribution, which may not be appropriate, as some genes may have stable transcription levels with no overdispersion. A more realistic assumption for RNA-Seq data is therefore to consider two sets of genes: overdispersed and non-overdispersed. We propose a two-step integrated approach to detect DE genes in RNA-Seq data, using a standard Poisson model for non-overdispersed genes and an NB model for overdispersed genes. We evaluate the proposed approach on two simulated and two real RNA-Seq data sets, comparing its performance with four popular R packages, edgeR, DESeq, sSeq, and NBPSeq, at their default settings. For both the simulated and real data sets, the integrated approach performs better than, or at least as well as, the regular methods embedded in these packages.
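The two-step idea can be sketched per gene: fit a Poisson GLM, check the Pearson dispersion, and refit with a negative binomial model only when overdispersion appears. A minimal illustration in Python with statsmodels (the cutoff and data are hypothetical, and this is a sketch of the idea rather than the paper’s exact procedure):

```python
import numpy as np
import statsmodels.api as sm

def test_gene(counts, group, dispersion_cutoff=1.5):
    """Two-step sketch: fit a Poisson GLM for one gene; if the Pearson
    dispersion is large, refit with a negative binomial model before
    reporting the p-value for the treatment effect. The cutoff is an
    illustrative choice, not the paper's rule."""
    X = sm.add_constant(group.astype(float))
    pois = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
    if pois.pearson_chi2 / pois.df_resid <= dispersion_cutoff:
        return pois.pvalues[1], "poisson"            # no overdispersion: keep Poisson
    nb = sm.NegativeBinomial(counts, X).fit(disp=0)  # NB with estimated dispersion
    return nb.pvalues[1], "negbin"

rng = np.random.default_rng(2)
group = np.repeat([0, 1], 5)                         # two conditions, 5 replicates each
counts = rng.poisson(np.where(group == 0, 20, 35))   # a DE gene with no overdispersion
print(test_gene(counts, group))
```

For a stable gene like the one simulated here, the Poisson test retains more power than an NB test that needlessly estimates a dispersion parameter, which is the motivation for splitting genes into the two sets.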
Cognitive disability in children: does nativity matter?
Abstract
The objective of this study was to examine the impact of nativity on self-reported cognitive disability by comparing children who were born outside of the US (first-generation immigrants) to US-born offspring (second-generation immigrants) of foreign-born parents. We analyzed a diverse, nationally representative sample of 77,324 first- and second-generation immigrant children (aged 5-17 years) from the 2009 American Community Survey. Multivariate logistic regression was used to assess the association between nativity and self-reported cognitive disability after adjustment for demographics and household characteristics. Self-reported cognitive disability was observed in 1.7% of the sample; the prevalence was higher among second-generation immigrants than among first-generation immigrants (1.9% vs 1.1%, p<0.001). After multivariate adjustment, the advantage of being foreign born remained (OR=0.62, 95% CI=0.52-0.74). Further analysis revealed effect modification of the immigrant health advantage by household income (p=0.002). In summary, we observed an immigrant advantage in self-reported cognitive disability; however, it was only evident among economically disadvantaged children. Future research should examine the contribution of the accumulation of poverty over time to the relationship between nativity and children’s health.
Bio:
Dr. Emma Benn is an Assistant Professor in the Center for Biostatistics in the Department of Health Evidence and Policy at the Icahn School of Medicine at Mount Sinai. She received her MPH in Sociomedical Sciences in May 2007 and her DrPH in Biostatistics in May 2012 from Columbia University's Mailman School of Public Health. She is a co-founder of the NIH/NHLBI-funded Biostatistics Enrichment Summer Training (BEST) Diversity Program at Columbia University, which aims to expose underserved undergraduates to biostatistics and its applications in public health research. Currently, she has been applying causal inference-based approaches to studies investigating racial/ethnic- and nativity-related differences in cognition across the life course. Additionally, she is interested in further elucidating the impact of intergenerational social mobility on social inequalities in cognition in older adulthood. Dr. Benn is a 2013-2015 recipient of the NIH Loan Repayment Program for Health Disparities Research.
Sparse autologistic model for dynamic networks
Abstract
Modelling the temporal evolution of network data has become a relevant problem in many applications. However, model complexity increases rapidly with the number of nodes, making efficient short-term prediction of future outcomes of the system a challenge for big network data. Here, we propose an autologistic model for directed binary networks with a fused lasso penalty. This model favors sparse solutions for the coefficients and for their differences at consecutive time points, and it is suitable for complex dynamic data where the number of parameters is considerably greater than the number of observations over time. The structure of our model allows us to treat the optimization problem separately for each pair of nodes, increasing the efficiency of the algorithm through parallel computing. The optimal fused lasso tuning parameters are chosen using BIC. We demonstrate the performance of the model on a real trading network from the NYMEX natural gas futures market observed weekly over a period of four years.
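Schematically (our notation, not the authors’), the penalized objective combines a logistic log-likelihood over time with the two fused lasso terms:

```latex
\hat{\theta} = \arg\min_{\theta_1,\ldots,\theta_T}
\; -\sum_{t=1}^{T} \ell\!\left(y_t \mid y_{t-1}; \theta_t\right)
\;+\; \lambda_1 \sum_{t=1}^{T} \lVert \theta_t \rVert_1
\;+\; \lambda_2 \sum_{t=2}^{T} \lVert \theta_t - \theta_{t-1} \rVert_1
```

The first penalty drives individual coefficients to zero; the second drives consecutive coefficients to be equal, so the fitted network changes only at a sparse set of time points.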
Power, Gender, and Advancement
Abstract
To grow their careers, women need a clear understanding of both the internal and the cultural forces that hold them back from achieving their potential. Thus empowered, they are more likely to develop the requisite skills and approaches, rather than assuming that hard work and talent are enough to assure success. This interactive Welcome session will distill major findings from research on women's advancement and translate them into practical career development lessons. Topics covered include: why women have a harder time using their voices effectively at work, and how to get more of their ideas in play; what “power,” “success,” and “influence” mean to them, and how to use these insights to make more strategic choices regarding their commitments; and what organizational savvy looks like, and how to approach conflicts and negotiations constructively and courageously.
Bio:
Janet Bickel is a nationally recognized expert in faculty, career, and leadership development with 40 years of experience in academic medicine and science. Over 120 academic health centers and 35 professional societies have invited her presentations and consultations. In addition to a wide range of individual coaching clients, her organizational clients have included United American Nurses, the US Department of Commerce, and the US Department of Health and Human Services. She is an Adjunct Assistant Professor of Medical Education at George Washington University School of Medicine and has also taught Leadership and Innovation at the CIA and the National Reconnaissance Office. During the 25 years prior to creating her own business, Janet held positions of increasing national leadership at the Association of American Medical Colleges, including Associate Vice President for Medical School Affairs. She established an Office of Women in Medicine of national repute, including leadership development programs that have stimulated the careers of thousands of women physicians and scientists, and she also led AAMC’s first programs in faculty affairs and in student professionalism. Janet continues to publish broadly, with over 60 peer-reviewed articles and two books. During the Executive Leadership in Academic Medicine (ELAM) Fellowship Program's first 15 years, she served on its Advisory and Selection Committees; she continues to serve as faculty and is a Principal Member of its Executive Development Council. Janet is certified to administer the Myers-Briggs Type Indicator, the Center for Creative Leadership's multi-rater feedback instruments, and the Emotional Intelligence In Relationships profile. She has completed Relationship Centered Health Care’s fellowships (Courage to Lead and Leading Organizations to Health) and NTL’s Human Interaction Laboratory on Transforming Interpersonal Relationships. She has participated in Authentic Leadership in Action’s Shambhala Summer Institute and studied yoga and meditation at Kripalu. From 1972 to 1976, Janet served as the founding admissions, financial aid, and student affairs officer at the new Brown University Medical School. She holds an M.A. in sociology from Brown University and an A.B. in English from the University of Missouri-Columbia.
Finding Our Place in History: Progress of Women Pioneers and Trailblazers
Abstract
Come learn about the contributions of early women pioneers and trailblazers to the development of statistical science. See the data showing where their trails have passed and where they have led the statistics community. These data will help guide our present-day efforts to ensure we continue to trailblaze as a community.
Bio:
Lynne Billard is a professor at the University of Georgia known for her statistics research, leadership, and advocacy for women in science. She earned her Bachelor of Science degree in 1966 and her Ph.D. in 1969 from the University of New South Wales, Australia. In 1980, Billard joined the University of Georgia as head of the Department of Statistics and Computer Science, and she was named a University Professor in 1992. She has served as President of the American Statistical Association and the International Biometric Society. From 1988 to 2004 she served as principal investigator for "Pathways to the Future," an annual workshop focused on mentoring women in all fields of science and scientific research. In 2011, she received the tenth annual Janet L. Norwood Award for Outstanding Achievement by a Woman in the Statistical Sciences, and in 2013 she was awarded the Florence Nightingale David Award for exemplary contributions to education, science, and public service.
Identification of Acute Health Conditions During Extreme Heat Events
Abstract
Extreme heat is the largest cause of severe weather fatalities in the US, and as climate change progresses the health impacts are expected to be profound. Although the adverse effects of extreme heat on broad classes of health outcomes, such as cardiovascular and respiratory mortality, have been well established, the specific conditions that are most sensitive to extreme heat have not been systematically identified. Here we develop methodology to identify the constellation of acute health conditions that are most likely to occur during extreme heat events (EHE) and to estimate their relative and absolute risks. We consider a broad range of disease groupings classified from 15,000 ICD-9 codes, and we match EHE days to control days during the summer months. We apply this approach to a cohort of 11.5 million Medicare beneficiaries in 222 US counties during 1999-2010. Knowledge of the range of health responses that occur during EHE will provide insight into the physiological pathways by which extreme heat affects health, thereby informing public health approaches to prevention.
Bio:
Jennifer Bobb is a research associate in the Department of Biostatistics at the Harvard School of Public Health. She received her Ph.D. in biostatistics from the Johns Hopkins Bloomberg School of Public Health in 2012. Her primary focus is developing statistical methods for estimating the health effects of environmental exposures, such as ambient temperature and air pollution levels, using national databases. Dr. Bobb has developed methodology for estimating the health effects of multi-pollutant mixtures and for evaluating the contribution of different sources of uncertainty in estimating the public health impact of heat waves under global climate change. Her main areas of methodological research include methods for the analysis of spatial-temporal data, hierarchical modeling, statistical methods for multi-site time series data, and the analysis of large administrative datasets.
Mining of Differential Correlation
Abstract
Given genetic data from two meaningful classes of samples, it is often of interest to identify genes (or other predictors) that behave differently in one class than the other. For example, one might consider samples from different tissue types, from a control group versus a treatment group, or from patients of unequal disease severity. A common approach is to study first-order differences; that is, to discover genes whose expression levels, as measured by microarray profiling or RNA sequencing, are higher in samples from the first class than in those from the second. By contrast, we propose a new second-order analysis, Mining of Differential Correlation (MDC).
The MDC procedure seeks to identify a group of genes whose average pairwise correlation amongst samples in the first class is higher than amongst samples in the second class. Our method is an iterative procedure that adaptively updates the size and elements of an initial gene set. These updates are based upon multiple testing of carefully defined p-values, so the final output gene set is statistically meaningful. We investigate the performance of MDC by applying it to simulated data as well as gene expression data taken from The Cancer Genome Atlas and other recent experimental datasets.
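The quantity MDC targets can be illustrated directly. A minimal sketch in Python with NumPy (the gene set, data, and helper names here are hypothetical, and the iterative updating and multiple testing of the actual procedure are omitted):

```python
import numpy as np

def avg_pairwise_corr(X):
    """Mean off-diagonal correlation among the columns (genes) of X."""
    C = np.corrcoef(X, rowvar=False)
    p = C.shape[0]
    return (C.sum() - p) / (p * (p - 1))

def differential_correlation(X1, X2, genes):
    """Class-1 minus class-2 average pairwise correlation for a gene set."""
    return avg_pairwise_corr(X1[:, genes]) - avg_pairwise_corr(X2[:, genes])

rng = np.random.default_rng(3)
cov = 0.7 * np.ones((5, 5)) + 0.3 * np.eye(5)       # strongly correlated block
X1 = rng.multivariate_normal(np.zeros(5), cov, 50)  # class 1: 50 samples, 5 genes
X2 = rng.normal(size=(50, 5))                       # class 2: same genes, uncorrelated
print(differential_correlation(X1, X2, np.arange(5)))  # large positive value
```

A first-order analysis would miss this gene set entirely, since the marginal expression levels are identical in both classes; only the co-expression structure differs.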
Bio:
Kelly Bodwin is a second-year Ph.D. student in Statistics at the University of North Carolina, Chapel Hill.
Latent Variable Estimation in Self-Reported Number of Days of Poor Physical and Mental Health
Abstract
Survey responses giving the number of days per month of poor physical and/or mental health tend to cluster around multiples of 5 or 7, owing to respondents’ propensity to round off estimates. This research uses responses to three questions on the Behavioral Risk Factor Surveillance System (BRFSS) that ask respondents to recall the number of days of poor physical health, the number of days of poor mental health, and the number of days on which they were unable to carry out regular activities due to health issues. The purpose of this research is to estimate, for each respondent, the true number of days of poor physical and mental health and of inability to undertake regular activities, using inflation models and heaping model formulae. The predictive model accounts for the peaks and valleys present in the responses and for overestimation of the true, latent number of days per month. The predicted values for the three latent variables are then tested for goodness of fit. Smoothing the frequency distribution provides a better estimate of the variables themselves and permits them to be used more effectively when included in larger models.
Identification of Known and Unknown Chemicals in Nuclear Magnetic Resonance Data
Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is a valuable tool for analyzing the composition of small non-protein molecules by comparing an experimental spectrum against a library of expected peak locations. Currently, small molecule identification can be time-consuming and labor-intensive: spectral results vary with sample preparation and run conditions, and typically hundreds of molecules are identified simultaneously within a single spectrum, with overlapping peaks of unknown shifts and concentrations. With rapid real-time data streams available, an algorithm that identifies small molecule entities and handles large-scale data is desirable. We develop a Bayesian statistical learning algorithm, extending incremental Bayesian statistical/machine learning methods, to automate the identification of known and unknown small molecule entities in streaming spectroscopy data; the method can evolve as new entities are added to the library and can detect rare or anomalous events.
A Canonical Correlation Analysis Examining the Influence of Classroom Contextual Variables on Reading Instruction: Evidence from the Early Childhood Longitudinal Study First Grade Sample
Abstract
The purpose of this research endeavor was to examine how classroom contextual variables influence reading instruction, in order to provide a baseline of associations for the year prior to NCLB using a national sample. The contextual factors (x-variate) addressed general classroom characteristics, such as absenteeism, tardiness, and minority and disability make-up, and classroom reading behaviors, such as the number of students reading below average and having letter, word, and sentence knowledge. Furthermore, reading instruction (y-variate) consisted of the time devoted to various reading activities. Canonical correlation analysis revealed three statistically significant canonical correlations (CC). The first canonical correlation (CC = .51, p < .001) revealed that high scores on time devoted to reading and language arts, reading aloud, stories, invented spelling, and alphabet recognition were associated with high scores on three general classroom characteristics and low scores on one reading behavior. The second canonical correlation (CC = .39, p = .002) showed that low scores on vocabulary and high scores on non-print texts and retelling were associated with high scores on two general classroom characteristics and three reading behaviors. Finally, the third canonical correlation (CC = .33, p = .043) revealed that low scores on reading print and basal texts and high scores on retelling were associated with two general classroom characteristics and high scores on one reading behavior. The findings demonstrate that canonical correlation analysis is a useful statistical approach for unpacking how multiple classroom predictors are related to multiple reading outcomes. As such, the results show that reading instruction and classroom characteristics are associated in a variety of ways, revealing the complexity of characteristics that influence reading instruction. Teachers must tailor their reading instruction not only to students’ reading behaviors but also to characteristics such as absenteeism, tardiness, and the ethnic and disability characteristics of their students. The implications for failed reading policies and the lack of teacher fidelity to national reading programs are discussed.
Bio:
Camille L. Bryant is an Assistant Professor at Columbus State University's College of Education and Health Professions. She teaches introductory, intermediate and advanced statistical models to doctoral students. Her research interests address the use of advanced statistical modeling to examine national educational policy decisions and outcomes. She received her PhD in Research, Statistics and Evaluation from the University of Virginia and her B.S. in Psychology from the College of William and Mary.
Building a Professional Network: Why, How and When
Abstract
For better or for worse, it is not only what you know, but who you know. A wide and strong network of colleagues is one of the most valuable professional assets, regardless of where you choose to practice statistics. Building a group of go-to people may seem daunting if you are just getting started or are not the outgoing type, but by taking just a few important steps you can enjoy the many benefits of making professional contacts, and even friendships, that will stay with you forever. We tend to think of our network as a means to enhance our own careers, open doors, and receive other special treatment, but networks are also the means by which you can reach out to others, share opportunities, and mentor younger colleagues. In this presentation, we discuss how to build and maintain a solid network, what to expect from your professional contacts, and how to contribute to strengthening your network and, as a consequence, your own career.
Bio:
Dr. Alicia Carriquiry is distinguished professor of statistics and director of graduate education at Iowa State University. Dr. Carriquiry is an elected member of the International Statistical Institute, a Fellow of the American Statistical Association and a Fellow of the Institute of Mathematical Statistics. She was recently named a National Associate of the National Research Council. Currently, she is a member of the standing Committee on National Statistics of the National Research Council, of the standing Committee on Use of Evidence in Public Policy, of the ad-hoc Committee to Address Representation of Minority Women in STEM Fields, and chairs the IOM Committee to Evaluate Mental Health Care Resources at the Veterans Administration. Carriquiry’s research is in applications of statistics in human nutrition, bioinformatics, and traffic safety. She has published about 80 peer-reviewed articles in journals in statistics, economics, nutrition, bioinformatics, mathematics, animal genetics, and several other areas. Dr. Carriquiry is one of the developers of the Iowa State University method for dietary assessment and of its companion software, PC-SIDE. She collaborates with governments in North and South America, Asia and Africa on the design and analyses of dietary intake surveys. At Iowa State, she teaches graduate and undergraduate level courses on Bayesian analysis, multivariate statistics, and general methods. Carriquiry has a degree in Engineering from the Universidad de la Republica (Uruguay). She also received an MSc in animal science from the University of Illinois, and an MSc in statistics and a PhD in statistics and animal genetics from Iowa State University.
10 Women in 10 Minutes: Increasing Women's Visibility on Wikipedia
Abstract
Look up a female statistician on Wikipedia, and you might not find what you're looking for. In this interactive session we will create and edit 10 Wikipedia articles for 10 women in statistics. The session will first provide a brief introduction to contributing to Wikipedia content, and then participants will create and edit the articles. We will also discuss the gender gap in Wikipedia as well as general tips for increasing one's visibility on the web.
Bio:
Mine Çetinkaya-Rundel is an Assistant Professor of the Practice at the Department of Statistical Science at Duke University. She received her Ph.D. in Statistics from the University of California, Los Angeles, and a B.S. in Actuarial Science from New York University's Stern School of Business. Dr. Çetinkaya-Rundel is primarily interested in innovative approaches to statistics pedagogy. Some of her recent work focuses on developing student-centered learning tools for introductory statistics courses, teaching computation at the introductory statistics level with an emphasis on reproducibility, and exploring the gender gap in self-efficacy in STEM fields. Her research interests also include spatial modeling of survey, public health, and environmental data. She is a co-author of OpenIntro Statistics and a contributing member of the OpenIntro project, whose mission is to make educational products that are open-licensed, transparent, and help lower barriers to education. She is also a co-editor of the Citizen Statistician blog and a contributor to the Taking a Chance in the Classroom column in Chance Magazine.
Bayesian model averaging in benchmark dose analysis
Abstract
In risk analysis of toxic agents, the benchmark dose has proven to be a valuable quantity and is commonly used by federal agencies in health risk assessment. However, one drawback of the benchmark dose estimator is that it depends on the assumed underlying risk model: even when two risk models fit the data equally well, they can produce very different benchmark dose estimates for a given risk level. Herein, we propose a Bayesian model averaging methodology that overcomes this deficiency and does not require any model selection. We illustrate the usefulness of this approach in a simulation study and apply it to a data set from the National Toxicology Program.
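Schematically (our notation, not the authors’), the model-averaged benchmark dose weights each candidate risk model by its posterior model probability rather than committing to a single model:

```latex
\widehat{\mathrm{BMD}}_{\mathrm{BMA}} = \sum_{k=1}^{K} w_k \,\widehat{\mathrm{BMD}}_k,
\qquad
w_k = \frac{p(\mathcal{D} \mid M_k)\, p(M_k)}{\sum_{j=1}^{K} p(\mathcal{D} \mid M_j)\, p(M_j)}
```

When two models fit the data equally well, their weights are comparable and both contribute to the estimate, which is exactly the situation where single-model selection is unstable.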
A New Local Indicator of Spatial Association
Abstract
Spatially correlated data require a special class of statistical tests to detect clustering; clusters may appear in voting trends, neighborhood demographics, and disease patterns. I develop a new test to detect clusters. The new statistic is a local version of Jackson et al.’s (2010) Modified Moran’s I, a global clustering test. The global statistic can be decomposed into local indicators of spatial association (LISAs), which identify clusters in spatially correlated data. I derive a LISA from the Modified Moran’s I, which I call the Local Modified Moran’s I. Existing LISAs suffer from statistical weaknesses: no clear null distribution, population heterogeneity, multiple testing issues, fixed neighborhood sizes, and spatial correlation among the statistics. These problems cause low statistical power and complicate the ability to correctly identify clusters. I use new methods to address these statistical weaknesses and show that the Local Modified Moran’s I has higher statistical power than existing tests.
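For context, one common form of the standard local Moran’s I, the template that the Local Modified Moran’s I generalizes (scaling conventions vary across authors):

```latex
I_i = \frac{x_i - \bar{x}}{S^2} \sum_{j \ne i} w_{ij}\,(x_j - \bar{x}),
\qquad
S^2 = \frac{1}{n}\sum_{k}(x_k - \bar{x})^2
```

Here $w_{ij}$ is the spatial weight between regions $i$ and $j$; a large positive $I_i$ indicates that region $i$ and its neighbors deviate from the mean in the same direction, i.e., a local cluster.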
Bio:
I am a fourth-year PhD student in Economics at American University. My research interests include spatial statistics and spatial econometrics. Dissertation topics: a new local indicator of spatial association, and social spillovers in bankruptcy filings.
Bayesian Penalized Spline Models for the Analysis of Spatio-Temporal Count Data
Abstract
In recent years, the availability of infectious disease counts in time and space has increased, and consequently there has been renewed interest in model formulation for such data. In this paper, we describe a model motivated by the need to analyze hand, foot, and mouth disease (HFMD) data collected in China. For these data, the aims of the analysis were to gain insight into the space-time dynamics and to carry out short-term prediction in order to implement public health campaigns in areas with a large predicted disease burden. The model we develop decomposes disease risk into marginal spatial and temporal components and a space-time interaction piece. The latter is the crucial element, and we use a tensor product spline model with a Markov random field prior on the coefficients of the basis functions. The model is highly parameterized, so computation is carried out using the integrated nested Laplace approximation (INLA) approach, which is fast. A simulation study shows that the model can pick up complex space-time structure, and our analysis of HFMD in the central north region of China provides new insights into the dynamics of the disease.
Bio:
My name is Cici Chen Bauer and I am currently an assistant professor in the Biostatistics Department at Brown University. I graduated with a PhD in Statistics from the University of Washington, Seattle, in August 2012. My research interests include spatial epidemiology, space-time models for health data, small area estimation, and Bayesian hierarchical models for survey data.
Eliciting Priors for Hurdle Models with Shared Covariates
Abstract
Hurdle models are often presented as an alternative to zero-inflated models for count data with excess zeros. Hurdle models consist of two parts: a binary model indicating a positive response (the "hurdle") and a zero-truncated count model. One or both parts of the model can be dependent on covariates, which may or may not overlap. In the case of shared covariates, it may be reasonable to posit an implicit relationship between the two parts of the model. We propose an informative prior structure that takes advantage of such a relationship and apply it to a hypothetical sleep disorder study.
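In schematic form (our notation), the two parts of the hurdle model and the shared covariates look like:

```latex
P(Y_i = 0 \mid \mathbf{x}_i) = 1 - \pi_i, \qquad
P(Y_i = y \mid \mathbf{x}_i) = \pi_i \,
  \frac{f(y \mid \mathbf{x}_i, \boldsymbol{\beta})}{1 - f(0 \mid \mathbf{x}_i, \boldsymbol{\beta})},
  \quad y = 1, 2, \ldots, \qquad
\mathrm{logit}(\pi_i) = \mathbf{x}_i^{\top}\boldsymbol{\gamma}
```

When the same $\mathbf{x}_i$ drives both $\boldsymbol{\gamma}$ (crossing the hurdle) and $\boldsymbol{\beta}$ (the truncated count intensity), a prior linking $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ can encode the implicit relationship between the two parts that the abstract describes.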
Does Disability Status Influence Patient Accessibility/Provider Contact, Patient-Provider Communication, Provider Coordination of Care and Patient Satisfaction?
Abstract
National survey data indicate that the number of individuals with disabilities is rising. Those with disabilities experience a large number of barriers to effective patient accessibility/provider contact, patient-provider communication, and provider coordination of care, ultimately resulting in patient dissatisfaction. However, little research has been done on the patient-physician relationship and patient satisfaction as perceived by persons with disabilities, and only a limited number of studies have used nationally representative data to examine the health status of individuals with disabilities in comparison to those without disabilities.

The Medical Expenditure Panel Survey (MEPS) was used to examine whether disability status influenced patient accessibility/provider contact, patient-provider communication, provider coordination of care, and patient satisfaction. The purposes of this study were to determine whether there were relationships between disability status and these outcomes, and how disability status affected the likelihood of an ineffective patient-provider relationship and patient dissatisfaction. The MEPS was also used to assess whether disability status was associated with common chronic diseases and lower use of preventive care services, and additional analyses correlated disability status with race and ethnicity stratification in terms of the patient-physician relationship and patient satisfaction. A retrospective analysis was conducted comparing the health of adults with disabilities to adults without disabilities using the 2005-2009 full-year consolidated data files from MEPS. Chi-square analyses were performed to determine whether there were significant differences in the patient-physician relationship and patient satisfaction for persons with disabilities compared with persons without disabilities, and a series of logistic regression analyses examined the likelihood of an ineffective patient-physician relationship and patient dissatisfaction with disability status as the independent variable.

Those with disabilities were less likely to be able to contact their physician than individuals without disabilities. Adults with disabilities were significantly more likely than persons without disabilities to perceive that their physician did not outline options for treatment or explain treatments in a way they understood, did not involve them in treatment decisions, did not listen to them, did not spend sufficient time with them, and did not treat them with respect. Individuals with disabilities were also less likely than individuals without disabilities to receive preventive care services, more likely to be dissatisfied with their health care, and more likely to have chronic illnesses.

This study revealed that adults with disabilities are at an increased risk of experiencing an ineffective patient-provider relationship and poor patient satisfaction, compromising their current health status and increasing the possibility of secondary chronic illnesses. These data suggest that adults with disabilities and chronic conditions receive significantly fewer preventive care services and have poorer health status than individuals without disabilities who have the same health conditions, implying a need for public health interventions and methods for improving the effectiveness of the patient-physician relationship and patient satisfaction among the disabled population.
Bio:
My name is Miranda M. Chung and I am currently a DrPH candidate in Health Policy and Management at New York Medical College. I anticipate graduating from the doctoral program in May 2014 and then intend to produce more research at a federal government agency that will make a difference in health services research, health policy, program evaluation, and other areas of passionate interest for priority populations, women being an integral group. Prior research experience includes: being a Health Services Research (HSR) Fellow and Program Analyst/Intern at the Agency for Healthcare Research and Quality; Health Policy, Research and Program Analyst/Intern at the U.S. Department of Health and Human Services (DHHS) Office of the Assistant Secretary of Health (OASH) and Office of the Secretary (OS); project collaborations and National/Regional events with the Office on Women's Health (OWH), Office of Minority Health (OMH), Administration for Children and Families (ACF), Center for Medicaid and Medicare Services (CMS), Drug Enforcement Administration (DEA) and Human Resource Services Administration (HRSA); Bergen County Housing, Health and Human Services Center's (BCHHHSC) Program and Research Analyst; United Nations (UN) Women-United States National Committee's (USNC) Research and Public Affairs Intern; and Putnam County Department of Health's (PCDOH) Health Services Research/Analytical Intern. While accumulating this priceless research and public health experience and attending school full-time since 2007, I also worked full-time as a Respiratory Therapist at New York Hospital Queens Medical Center and North Shore University Hospital until 2012.
Latent class approach to survival analysis with a compound Poisson frailty model with application to HIV prevention trials
Abstract
In randomized clinical trials where the outcome is time-to-event, intervention effectiveness is estimated with the Cox model. When heterogeneity is present, the assumption of proportionality does not hold and the Cox population-level estimate underestimates effectiveness for individuals at risk. This discrepancy is of particular concern in HIV prevention trials, where heterogeneity is expected and some participants have no risk of an event. Frailty models adjust for heterogeneity, but existing methods for univariate survival data assume a shared frailty distribution and provide no mechanism for using risk-related covariates to inform individuals’ frailties. We propose a Bayesian hierarchical approach that models frailty with a mixture of compound Poisson distributions by classifying participants into latent risk groups using covariate data. Individuals within a class share a frailty distribution. The model also allows that some participants have no risk of an event. We apply the proposed model to data from a recently completed HIV prevention trial to estimate individual-level intervention effectiveness as well as the effect of covariates on probability and magnitude of risk.
Bio:
Rebecca Yates Coley is a PhD candidate in Biostatistics at the University of Washington, where her advisor is Elizabeth Brown. She is interested in the development of statistical methods to address questions of public health and clinical practice. Her dissertation research focuses on the development of Bayesian hierarchical frailty models for the analysis of univariate survival data. Areas of methodological interest include Bayesian modeling, survival analysis, clinical trials, and causal inference.
Pre-operative prediction of surgical morbidity in children
Abstract
The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, the fields of data mining and machine learning have developed alternative classification and prediction algorithms. This study aimed to compare the performance of LR to machine-learning algorithms in predicting 30-day surgical morbidity in children. All models included procedures and 49 pre-operative patient characteristics and were fit using data from 48,089 cases in the National Surgical Quality Improvement Program-Pediatric. After optimizing model fit using cross-validation in a training dataset, we compared the discrimination (c-statistic), calibration intercept and slope, classification accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the models in the remaining cases. Compared to an LR model with no interactions and linear relationships between predictors and the log-odds of morbidity, ensemble-based methods showed higher accuracy (random forests (RF) 93.6%, boosted classification trees (BCT) 93.6%, LR 93.2%), sensitivity (RF 39.0%, BCT 39.8%, LR 37.5%), specificity (RF 98.7%, BCT 98.5%, LR 98.4%), PPV (RF 73.2%, BCT 71.5%, LR 68.0%), and NPV (RF 94.6%, BCT 94.7%, LR 94.5%). However, only BCT showed superior discrimination (BCT c=0.880, LR c=0.871; p<.05 for all), and none of the models performed better than a more flexible LR model that incorporated restricted cubic splines and significant interactions (accuracy 93.7%, sensitivity 41.5%, specificity 98.5%, PPV 72.0%, NPV 94.8%, c=0.877). Both LR models showed superior calibration compared to the ensemble-based algorithms. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks.
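A minimal sketch of this kind of comparison in Python with scikit-learn, on synthetic data (GradientBoostingClassifier stands in for boosted classification trees; the class balance, features, and sample sizes are illustrative, not NSQIP-P’s):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the surgical data: ~7% event rate, 49 predictors.
X, y = make_classification(n_samples=5000, n_features=49, weights=[0.93], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {"LR": LogisticRegression(max_iter=5000),
          "RF": RandomForestClassifier(random_state=0),
          "BCT": GradientBoostingClassifier(random_state=0)}

for name, m in models.items():
    m.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, m.predict(X_te)).ravel()
    c = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])   # the c-statistic
    print(f"{name}: c={c:.3f}  sens={tp / (tp + fn):.3f}  spec={tn / (tn + fp):.3f}  "
          f"PPV={tp / (tp + fp):.3f}  NPV={tn / (tn + fn):.3f}")
```

With rare outcomes, accuracy alone is dominated by the majority class, which is why the study reports sensitivity, PPV, and calibration alongside the c-statistic.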
Bio:
Jennifer Cooper is a Research Scientist at Nationwide Children's Hospital Research Institute in Columbus, Ohio. Before joining Nationwide Children's, she obtained a PhD in Epidemiology and MS in Biostatistics from the University of Pittsburgh Graduate School of Public Health in 2012. Dr. Cooper works with pediatric surgeons on a variety of surgical outcomes research studies. Her methodologic interests include the development and use of tools to assess patient reported outcomes, observational comparative effectiveness research studies using large healthcare databases, and the development and validation of algorithms to predict clinical outcomes.
Maternal-fetal genotype incompatibility test for quantitative trait outcomes in large families
Abstract
Complex familial disorders result from gene-by-environment and gene-by-gene interactions. Environmental factors that impact disease susceptibility can arise from maternal-fetal genotype interactions, because the maternal genotype helps determine the environment in which the fetus develops. Maternal-fetal genotype (MFG) incompatibility has been shown to be involved in complex diseases, even those that do not manifest until adulthood. Presently, there are no existing methods to investigate the effects of MFG incompatibility on quantitative traits in large families. Current methods use a retrospective likelihood to detect family-based association of quantitative traits in case-parent trios, but these methods are not particularly powerful and are cumbersome to generalize to arbitrary family structures. In contrast, we use a linear mixed model in which the genotypes of the offspring, the mother, and their interactions are included as covariates and the outcome is a trait whose residuals are reasonably modeled as normally distributed, allowing the utilization of larger pedigrees and the easy inclusion of additional genetic and environmental covariates. Our research demonstrates that this approach has desirable statistical properties regardless of risk allele frequencies. It is effective in identifying MFG incompatibility with the appropriate proportion of false positives and has considerable power under realistic effect sizes.
Bio:
Michelle Creek is a fourth year doctoral student in Biostatistics at UCLA under the advisement of Dr. Janet Sinsheimer. She received her B.S. in Mathematics with a minor in human biology from Chapman University in 2010. Her research interests are focused on developing new statistical genetic methods for analyzing complex traits. In particular, she concentrates on approaches to detect transgenerational genetic effects on disease susceptibility.
Interpreting Interactions Among Risk Factors in HIV Suppression
Abstract
In logistic regression models with categorical predictors, if the design matrix used to obtain the linear predictor is constructed so that the model parameters measure differences between a reference group and the levels of the factor (e.g., treatment contrasts in R), then the main effects parameters are easily interpreted. Exponentiating the parameter estimates yields odds ratios: the odds of the response variable being one at a level of the predictor divided by the odds of the response variable being one at the reference level. Under this parameterization, the interaction predictors are easily obtained by multiplication. Unfortunately, the interaction parameters are not as easily interpreted as the main effect parameters, to say nothing of the inference associated with the interactions.
Various authors have suggested measures that may be calculated to interpret an interaction in a logistic regression model with categorical predictors: e.g., the interaction contrast ratio (ICR), or equivalently the relative excess risk due to interaction (RERI); the attributable proportion due to interaction (AP); and a synergy index (S) measuring the ratio between the combined effect and the individual effects. These measures are said to be additive; others are multiplicative. Inasmuch as most consumers of the information described by the model are not statisticians, mathematicians, or scientists, the interpretation of interactions using these measures may be difficult.
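For reference, the standard definitions of these measures, stated in terms of risk ratios relative to the doubly unexposed group:

```latex
\mathrm{RERI} = RR_{11} - RR_{10} - RR_{01} + 1, \qquad
\mathrm{AP} = \frac{\mathrm{RERI}}{RR_{11}}, \qquad
S = \frac{RR_{11} - 1}{(RR_{10} - 1) + (RR_{01} - 1)}
```

In logistic regression these are often computed with odds ratios substituted for the risk ratios, which is exactly the approximation the Bayesian approach described below avoids.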
When the response variable is quantitative, the predictors are categorical, and the link function is the identity link (e.g., ANOVA models), we interpret interactions by forming differences between differences, constructing contrasts, and developing confidence intervals for the interaction contrasts. The method for doing this follows directly from what we know about linear combinations of random variables, and the contrasts are easily interpreted and graphed. In logistic regression, however, researchers have found it difficult to develop confidence intervals for interaction contrasts, often having to resort to the assumption that odds ratios are approximately risk ratios. We show that this problem is readily solved in the Bayesian approach using Markov chain Monte Carlo methods.
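A minimal sketch of the MCMC idea in Python (the posterior draws are simulated here purely for illustration; in practice they would come from an actual sampler fit to data):

```python
import numpy as np

# Hypothetical posterior draws of logistic coefficients (intercept, factor A,
# factor B, A:B interaction); in practice these come from an MCMC sampler.
rng = np.random.default_rng(4)
draws = rng.multivariate_normal([-2.0, 0.5, 0.4, 0.3], 0.01 * np.eye(4), 10_000)

def risk(d, a, b):
    """Posterior draws of P(Y = 1 | A = a, B = b) under the logistic model."""
    eta = d[:, 0] + d[:, 1] * a + d[:, 2] * b + d[:, 3] * a * b
    return 1.0 / (1.0 + np.exp(-eta))

p00, p10, p01, p11 = (risk(draws, a, b) for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)])
reri = p11 / p00 - p10 / p00 - p01 / p00 + 1   # RERI on the true risk-ratio scale
lo, hi = np.percentile(reri, [2.5, 97.5])
print(f"posterior mean RERI = {reri.mean():.3f}, 95% CrI = ({lo:.3f}, {hi:.3f})")
```

Each posterior draw yields the four cell risks directly, so any interaction contrast and its credible interval follow from simple percentiles, with no odds-ratio-as-risk-ratio approximation.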
Linking vitamin D status to vitamin D intakes: Fitting a nonlinear model with measurement error
Abstract
Sufficient vitamin D levels are essential to maintain healthy bones and to reduce the risk of fracture. It is difficult, however, to determine recommended intake levels, due to the complexity of the metabolism of the vitamin D we consume. Vitamin D levels (or status) depend on factors other than consumption of vitamin D from food and supplements, such as sun exposure, skin pigmentation, adiposity, and several others. A biomarker for vitamin D status is a person's 25-hydroxy vitamin D (OHD) level. From a practical viewpoint, we cannot make public health recommendations using OHD levels; ideally, we want to be able to make recommendations for intakes of vitamin D. In our work, we model the association between intake of vitamin D from all sources and the level of OHD in the serum. Since we can only obtain noisy measurements of vitamin D intake, we fit a nonlinear model in which vitamin D intake (as well as other covariates) is contaminated with measurement error. Initial results suggest that there is indeed a nonlinear association between OHD and vitamin D intake, with differences between ethnicities.
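In schematic form (our notation, not the authors’), this is a classical measurement error model wrapped around a nonlinear regression:

```latex
\mathrm{OHD}_i = g(X_i, \mathbf{z}_i; \boldsymbol{\beta}) + \varepsilon_i, \qquad
W_i = X_i + U_i, \qquad U_i \sim N(0, \sigma_u^2)
```

where $X_i$ is the unobserved usual intake, $W_i$ the noisy reported intake, and $\mathbf{z}_i$ the error-free covariates; naively regressing OHD on $W_i$ would bias the estimated intake-status curve.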
Bio:
I attended Grinnell College, where I received my undergraduate degree in Mathematics/Statistics and French. I have a Master's degree from Iowa State University and am currently pursuing a PhD.
Transforming Health Care Through Data
Abstract
The health care system has changed dramatically over the years and is more data-centric than ever. These days, petabytes or even exabytes of data, including clinical outcome data, socio-economic data, personal genomic data, and medical imaging data, are collected on patients. While organizing, storing, and distributing these data securely is a daunting task, analyzing them and integrating the results to assist physicians is even harder. These changes pose both an opportunity and a challenge for statisticians, who are central and essential to uncovering the scientific knowledge hidden in the data tsunami. In this session, we will focus on how statisticians are handling the challenge and transforming patient health care.
Bio:
Susmita Datta is a professor of Bioinformatics and Biostatistics and a Distinguished University Scholar at the University of Louisville. She is a past president (2013) of the Caucus for Women in Statistics (CWIS) and an executive committee member of this conference. She is an elected fellow of the International Statistical Institute (ISI) and a fellow of the American Statistical Association (ASA). She received her PhD from the University of Georgia in 1995 and was a postdoctoral associate at Emory University for two years. She joined Georgia State University in 1997, was promoted to tenured Associate Professor in 2001, and held a joint appointment in the Department of Mathematics and Statistics and the Department of Biology. She joined the University of Louisville in 2005 and was promoted to full professor in 2010. Her methodological research areas include bioinformatics, proteomics, clustering and classification, infectious disease modeling, non-linear regression modeling for systems biology, statistical genetics, statistical issues in population biology, survival analysis, and multi-state models. Her clinical research interests are in cancer and birth defects research. Her research has been funded by NIH and NSF. She frequently serves on the program committees of major bioinformatics conferences such as ISMB/ECCB and CAMDA. She is involved in multiple NIH grant review panels, has served as a biostatistics program reviewer for NIEHS, and serves on the editorial boards of several statistics and bioinformatics journals as an associate editor. She is highly motivated to improve the status of women in STEM fields.
John Quackenbush is a Professor of Computational Biology and Bioinformatics in the Department of Biostatistics, Harvard School of Public Health, and at the Dana-Farber Cancer Institute, where he has been since 2005. He received his PhD in 1990 in theoretical physics from UCLA, working on string theory models. Following two years as a postdoctoral fellow in physics, Dr. Quackenbush applied for and received a Special Emphasis Research Career Award from the National Center for Human Genome Research to work on the Human Genome Project. He spent two years at the Salk Institute and two years at Stanford University working at the interface of genomics and computational biology. In 1997 he joined the faculty of The Institute for Genomic Research (TIGR), where his focus began to shift to understanding what was encoded within the human genome. Since joining the faculties of the Dana-Farber Cancer Institute and the Harvard School of Public Health in 2005, his work has focused on the use of genomic data to reconstruct the networks of genes that drive the development of diseases such as cancer and emphysema. Quackenbush currently serves on the editorial boards of five major journals and is editor-in-chief at Genomics. He has served on several committees at the National Academies and the Institute of Medicine, including the Committee on Validation of Toxicogenomic Technologies. He is currently a member of scientific advisory boards at the Lovelace Respiratory Research Institute and the Hope Funds for Cancer Research and was previously an advisor at St. Jude Children’s Research Hospital and the National Institutes of Health's Roadmap Epigenomics Project. Quackenbush is also a member of the scientific advisory boards of a number of biotech start-up companies, including Exosome Diagnostics, Karyopharm Therapeutics, and NABsys, and he founded the precision medicine software company GenoSpace. In 2013 he was honored as a White House Open Science Champion of Change for his work in making large-scale data available, usable, and useful.
An Information Theoretic Approach to Biomarker Validation
Abstract
Recent technological and therapeutic advancements have increased demand for the use of biomarkers in clinical trials. Biomarkers are useful for obtaining information about the clinical endpoint when direct collection of endpoint data may be impractical or costly. As an initial step, biomarker validation is conducted to ensure that the biomarker captures both change and lack of change in the clinical endpoint.
The commonly used validation methods are not comparable across different types of biomarkers and different types of clinical endpoints. This motivates the use of an information theoretic measure which is comparable across biomarkers and clinical endpoints. The focus of this talk is the validation of various types of biomarkers for qualitative clinical endpoints. An estimate of this measure is given and various properties including bias, variance, and modes of convergence are derived.
Simulation studies are conducted to explore boundary effects and to study whether or not a loss in information occurs when a continuous biomarker is discretized. The discussed methodology is applied to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) to investigate the amount of information captured by various biomarkers about three year conversion to Alzheimer’s Disease (AD) from Mild Cognitive Impairment (MCI).
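As a toy illustration of the kind of information measure the abstract describes (the talk's actual estimator and its bias and variance properties are not reproduced here), the sketch below computes a plug-in mutual information estimate between a continuous biomarker, discretized into equal-frequency bins, and a binary clinical endpoint; all data are simulated.

```python
import numpy as np

def plug_in_mutual_information(marker, endpoint, n_bins=10):
    """Plug-in estimate (in nats) of I(marker; endpoint) for a continuous
    biomarker, discretized into equal-frequency bins, and a binary endpoint."""
    cuts = np.quantile(marker, np.linspace(0, 1, n_bins + 1))[1:-1]
    bins = np.searchsorted(cuts, marker)          # bin index for each subject

    joint = np.zeros((n_bins, 2))                 # joint cell frequencies
    for b, y in zip(bins, endpoint):
        joint[b, int(y)] += 1
    joint /= joint.sum()
    p_bin = joint.sum(axis=1, keepdims=True)      # marginal of the biomarker bins
    p_end = joint.sum(axis=0, keepdims=True)      # marginal of the endpoint

    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (p_bin @ p_end)[nz])))

# Simulated example: a marker carrying modest information about conversion.
rng = np.random.default_rng(0)
endpoint = rng.integers(0, 2, size=500)
marker = endpoint + rng.normal(scale=1.5, size=500)
print(plug_in_mutual_information(marker, endpoint))
```

Discretizing the marker, as done above, is exactly the step whose information loss the simulation studies investigate.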
Bio:
My name is Erin Dienes and I received my Ph.D. in biostatistics from the University of California Davis in August of 2013. I'm a mathematical statistician for the Office of Research and Methodology at the National Center for Health Statistics. My research interests include biomarker validation, information theory, small area estimation, and multiple imputation diagnostics.
Logit-Normal Mixed Model for Indian Monsoon Rainfall
Abstract
Describing the nature and variability of Indian monsoon rainfall extremes is a topic of much debate in the current literature. We suggest the use of a generalized linear mixed model (GLMM), specifically the logit-normal mixed model, to describe the underlying structure of this complex climatic event. Several GLMM algorithms are described, and simulations are performed to vet these algorithms before applying them to the Indian rainfall data procured from the National Climatic Data Center. The logit-normal model was applied with fixed covariates of latitude, longitude, elevation, and daily minimum and maximum temperatures, with a random intercept by weather station. In general, these estimation methods concurred in their suggestion of a strong relationship between the El Nino Southern Oscillation (ENSO) index and extreme rainfall variability estimates. This work provides a valuable starting point for extending GLMMs to incorporate the complex dependencies in extreme climate events.
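For readers unfamiliar with logit-normal mixed models, the sketch below fits a random-intercept logistic model by maximum likelihood with Gauss-Hermite quadrature, one standard GLMM estimation approach of the kind the abstract vets. The covariate, station structure, and all numbers are simulated stand-ins, not the NCDC rainfall data.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize

# Simulated stand-in data: exceedance indicator y_ij for day j at station i,
# with one fixed covariate x (think standardized daily maximum temperature).
rng = np.random.default_rng(1)
n_stations, n_days = 30, 60
station = np.repeat(np.arange(n_stations), n_days)
x = rng.normal(size=n_stations * n_days)
b = rng.normal(scale=0.8, size=n_stations)            # true random intercepts
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.7 * x + b[station]))))

nodes, weights = hermgauss(20)                        # for integrals against exp(-z^2)

def neg_loglik(theta):
    beta0, beta1, log_sigma = theta
    sigma = np.exp(log_sigma)
    total = 0.0
    for i in range(n_stations):
        idx = station == i
        lin = beta0 + beta1 * x[idx]
        ll = np.empty(len(nodes))
        # Integrate the random intercept out of each station's likelihood.
        for k, z in enumerate(np.sqrt(2.0) * sigma * nodes):
            p = np.clip(1 / (1 + np.exp(-(lin + z))), 1e-12, 1 - 1e-12)
            ll[k] = np.sum(np.log(np.where(y[idx] == 1, p, 1 - p)))
        m = ll.max()                                  # log-sum-exp for stability
        total += m + np.log(np.sum(weights * np.exp(ll - m))) - 0.5 * np.log(np.pi)
    return -total

fit = minimize(neg_loglik, x0=np.zeros(3), method="Nelder-Mead")
print("beta0, beta1, log(sigma):", fit.x)
```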
Bio:
I completed a B.S. (2006) and M.S. (2008) in mathematics at the University of Minnesota Duluth. As an undergraduate, I played women's varsity basketball and was named All-American and Academic All-American numerous times. I was also a volunteer assistant coach for the team while completing my master's degree. From 2008 to 2013, I worked as a model validation analyst for U.S. Bank, eventually attaining Assistant Vice President status. I chose to pursue further education and am currently in my third year as a Statistics Ph.D. student at the University of Minnesota, Twin Cities. I'm working with my advisor on developing and implementing statistical methods that are effective with climate change data.
Mixture Representable Treatment Efficacy Measure and Subgroup Mixable Inference Procedure in Personalized Medicine Development
Abstract
Measuring treatment efficacy in a mixture of subgroups is a fundamental problem in personalized medicine, for example in deciding whether to treat the entire patient population or to target a subgroup defined by certain biomarker(s). Proposing a new concept called the mixture representable treatment efficacy measure, we show that some commonly used treatment efficacy measures are not suitable for mixture populations. We also show that for binary and time-to-event outcomes, it is inappropriate to directly apply least squares means on their models' natural scale when doing inference. For such outcomes, we propose subgroup mixable estimation as an appropriate extension of the least squares means concept. Using the time-to-event outcome as an example, we develop a simultaneous inference procedure based on the ideas above to provide useful guidance for decision making in personalized medicine development.
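A toy calculation helps show why some efficacy measures fail to be mixture representable. Assuming exponential survival in two subgroups (all rates below are invented for illustration), the marginal hazard ratio of the mixture population varies over time and need not equal the prevalence-weighted average of the subgroup hazard ratios:

```python
import numpy as np

# Two subgroups (prevalences 0.3 / 0.7) with exponential survival and
# subgroup-constant hazard ratios; every number here is invented.
pi = np.array([0.3, 0.7])
lam_ctrl = np.array([1.0, 0.4])              # control-arm hazards by subgroup
hr_sub = np.array([0.5, 0.8])                # subgroup-level hazard ratios
lam_trt = hr_sub * lam_ctrl

t = np.linspace(0.01, 5, 500)
S_ctrl = pi @ np.exp(-np.outer(lam_ctrl, t)) # mixture survival, control arm
S_trt = pi @ np.exp(-np.outer(lam_trt, t))   # mixture survival, treatment arm

# Marginal hazards h(t) = -d log S(t) / dt, computed numerically.
h_ctrl = -np.gradient(np.log(S_ctrl), t)
h_trt = -np.gradient(np.log(S_trt), t)
hr_marginal = h_trt / h_ctrl

print("prevalence-weighted subgroup HR:", pi @ hr_sub)
print("marginal HR over time: from", hr_marginal.min(), "to", hr_marginal.max())
# The population-level hazard ratio drifts with t and need not equal the
# weighted average of subgroup hazard ratios, which is the kind of failure
# of mixture representability the abstract describes.
```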
Bio:
I am an assistant professor (tenure stream) in the Department of Biostatistics, University of Pittsburgh. I received my Ph.D. from the Department of Biostatistics, University of Michigan in 2010.
My primary research interests include semiparametric methods and inference, especially in survival analysis, and personalized medicine, including designs and analysis for biomarker studies, biomarker and subgroup identification, and multiple comparisons. Currently, my collaborative research focuses on proteomic experiment design and network analysis, and on a bivariate survival approach for the progression of eye diseases.
Prior to joining the University of Pittsburgh in January 2013, I had three years of experience working at a pharmaceutical company, Eli Lilly. I was a lead statistician on multiple investigational anti-diabetes compounds through different clinical phases, and I have been working on multiple projects in tailored therapeutics research since 2011.
Graduate Education for the Next Generation: It's More than Course Work and Qualifying Exams
Abstract
Graduate education, especially in Statistics and Biostatistics, in 2014 is much more involved and demanding than it was twenty years ago. While the big data era presents exciting opportunities for statisticians, it also presents significant challenges in graduate training; specifically, students need more interdisciplinary training in computer science, machine learning, and subject-matter science. Both Masters and Ph.D. students seek out internships, publications, advanced computing skills, teaching, consulting or collaboration with other disciplines, as well as volunteer opportunities, in efforts to make themselves more competitive in the job market. As CVs and resumes become packed with experiences and publications, we, as educators, are left wondering how to prepare the next global generation of graduate students for successful futures. Each member of this panel will reflect on her own graduate education, compare and contrast it with her experience training today's generation of graduate students, and discuss the emerging challenges of graduate education in Statistics and Biostatistics for the next generation and what it will require. The hope for this session is a frank and open discussion about the past, present, and future challenges of graduate education in Statistics and Biostatistics.
Bio:
Rebecca Doerge is the Trent and Judith Anderson Distinguished Professor of Statistics at Purdue University. She joined Purdue University in 1995 and holds a joint appointment between the Colleges of Agriculture (Department of Agronomy) and Science (Department of Statistics). Professor Doerge's research program is focused on Statistical Bioinformatics, a component of bioinformatics that brings together many scientific disciplines into one arena to ask, answer, and disseminate biologically interesting information in the quest to understand the ultimate function of DNA and epigenomic associations. Rebecca is the recipient of the Teaching for Tomorrow Award, Purdue University, 1996; the University Scholar Award, Purdue University, 2001-2006; and the Provost's Award for Outstanding Graduate Faculty Mentor, Purdue University, 2010. She is an elected Fellow of the American Statistical Association (2007), an elected Fellow of the American Association for the Advancement of Science (2007), and a Fellow of the Committee on Institutional Cooperation (CIC; 2009). Professor Doerge has published over 100 scientific articles, published one book, and graduated 22 PhD students.
Susmita Datta is a professor of Bioinformatics and Biostatistics and a Distinguished University Scholar at the University of Louisville. She is a past president (2013) of the Caucus for Women in Statistics (CWIS) and an executive committee member of this conference. She is an elected fellow of the International Statistical Institute (ISI) and a fellow of the American Statistical Association (ASA). She received her PhD from the University of Georgia in 1995 and was a postdoctoral associate at Emory University for two years. She joined Georgia State University in 1997 and was promoted to tenured Associate Professor in 2001, holding a joint appointment in the Department of Mathematics and Statistics and the Department of Biology. She joined the University of Louisville in 2005 and was promoted to full professor in 2010. Her methodological research areas include bioinformatics, proteomics, clustering and classification, infectious disease modeling, non-linear regression modeling for systems biology, statistical genetics, statistical issues in population biology, survival analysis, and multistate models. Her clinical research interests include cancer and birth defects research. Her research has been funded by NIH and NSF. She frequently serves on the program committees of major bioinformatics conferences such as ISMB/ECCB and CAMDA, is involved in multiple NIH grant review panels, has served as a biostatistics program reviewer for NIEHS, and serves as an associate editor on the editorial boards of several statistics and bioinformatics journals. She is highly motivated to improve the status of women in STEM fields.
Xihong Lin is Professor of Biostatistics and Coordinating Director of the Program of Quantitative Genomics at the Harvard School of Public Health (HSPH). She received her PhD degree from the Department of Biostatistics of the University of Washington in 1994. She was on the faculty of the Department of Biostatistics at the University of Michigan between 1994 and 2005 before she joined the HSPH in 2005. Her research interests lie in the development and application of statistical and computational methods for the analysis of high-throughput genetic, genomic, and 'omics data in epidemiological, environmental, and clinical sciences. Her methods research is supported by the MERIT award and a P01 grant from the National Cancer Institute. She is the PI of the T32 training grant on interdisciplinary training in statistical genetics and computational biology. Dr. Lin received the 2002 Mortimer Spiegelman Award from the American Public Health Association and the 2006 Presidents' Award from the Committee of Presidents of Statistical Societies (COPSS). She is an elected fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. She was the Chair of COPSS (2010-2012). She is currently a member of the Committee on Applied and Theoretical Statistics of the US National Academy of Sciences. She has served on numerous editorial boards of statistical and genetic journals. She is a former Coordinating Editor of Biometrics, currently co-editor of Statistics in Biosciences, and an Associate Editor of the Journal of the American Statistical Association and the American Journal of Human Genetics. She has served on a large number of study sections of NIH and NSF.
Why Can't Women Have It All? A Discussion
Abstract
I don't have an answer to the question of why women can't have it all; there is no single correct answer. This is a topic dominated by personal choices, and strategies depend highly on the situation, the work environment, the cultural background, etc. However, I do believe you are all here because of a common goal: you are an ambitious woman who loves her job and wants to be successful at it, while at the same time you want to be a good mother and/or caregiver. We will have a frank discussion of how to make that happen.
Bio:
Francesca Dominici is Professor in the Department of Biostatistics at the Harvard School of Public Health and the Associate Dean of Information Technology. She has extensive experience in the development of statistical methods for the analysis of large observational data and for comparative effectiveness research. She has led several national studies on the health effects of air pollution, heat waves, and climate change. She is now working on statistical methods for causal inference to assess the public health consequences of public health interventions. Dr. Dominici has served on a number of National Academies' committees. She received her Ph.D. in statistics at the University of Padua, Italy.
Leading Analytics and Insights as a Diverse Leader
Abstract
This panel discussion will focus on a series of topical questions addressed by Kiersten and Suzanne, followed by audience Q&A. The topics will be divided into two broad themes: 1) a business development dimension (topics include: how to balance the hard and soft sciences; how to proactively drive your analytics leadership journey; the challenges in communicating analytics to a non-technical audience) and 2) a personal development dimension (topics include: how to seek constructive feedback; how to identify and find a personal champion; how to calibrate and balance your high self-expectations across the organization). Kiersten and Suzanne will openly discuss their 'moments of truth', hoping that attendees can learn from their experiences and personal insights.
Bio:
Suzanne Smith has been at Lowe’s for eight years and has served in numerous research and analytic leadership positions that support the enterprise, including: marketing, macro-economic and consumer analytics, branding, merchandising, global assessments, enterprise measures of success, and most recently, Enterprise Analytics. Suzanne is passionate about using analytics to uncover and give meaning to the diverse customer experience in an omni-channel retail environment. Prior to Lowe’s, Suzanne was a Research Associate at the UNC Charlotte Urban Institute, focusing on social and economic research. She has a BA in Sociology & Criminology and a BA in Psychology from Winthrop University; and a Masters in Sociology & Statistics from UNC Charlotte.
Kiersten Einsweiler has been with Lowe's for 10 years, providing analytic solutions for multiple business areas, including merchandising, marketing, performance forecasting, credit, and pricing and promotion. In her work, she strives to create solutions that promote personalization for Lowe's customers. She has a BS in Psychology from Arizona State University and a Master of Statistics from North Carolina State University in Statistics and Operations Research.
Dan Thorpe is currently the Vice President of Enterprise Analytics at Lowe's. He was previously with Walmart, Sam's Club, Wachovia, and W.L. Gore and Associates; one of his passions is creating and enabling high-performing analytics associates and teams.
Small Area Estimates for the Conservation Effects Assessment Project
Abstract
The Conservation Effects Assessment Project (CEAP) is a series of surveys that evaluate erosion rates on agricultural land. One of the regions of interest is the Boone/Raccoon River Watershed in Iowa, which is subdivided into smaller watersheds called 8-digit hydrologic unit codes. Because sample sizes for 8-digit hydrologic unit codes are relatively small, model-based estimation methods are considered. Exploratory analysis suggests a positive correlation between the estimated means and standard deviations for erosion rates in small watersheds. Hierarchical Bayes models are developed that relate direct estimators of variances to covariates. Alternative distributional forms and expectation functions for the direct estimators of the variances are compared.
Bio:
I am an international PhD student (originally from Romania) in the Department of Statistics at Iowa State University and a research assistant for the Center for Survey Statistics and Methodology. I completed my MS in Statistics at Iowa State University in December 2012 and my BA in Mathematics at Colorado State University in May 2011. My interests are in mixed models, small area estimation, computational statistics, and survey sampling.
Bilaterally Contaminated Normal Model with Nuisance Parameter and Its Applications
Abstract
Qian Fan*, University of Kentucky
Hongying Dai, Children's Mercy Hospital
Richard Charnigo, University of Kentucky
Dai and Charnigo (2008, 2010) proposed various contaminated density models that can be applied to microarray data and studied the MLRT and D-test statistics for these models under an omnibus null hypothesis of no differential expression. Charnigo et al. (2013) subsequently proposed the bilateral contaminated normal (BCN) model without a nuisance parameter, which can describe gene under- and over-expression simultaneously. They proved that the testing procedure is consistent and that the test statistic has a limiting normal distribution under the unilateral null hypothesis of differential expression in one direction only. This presentation extends the previous work and puts forward tests of contamination in the BCN model when the common within-component variance is unknown. This model is more flexible than the BCN model without a nuisance parameter, yet the derivation and justification of the testing procedures are challenging. The first step is an omnibus test of no contamination, for which we propose a union-intersection test. The next stage is testing unilateral contamination against bilateral contamination. Both test procedures are shown to be asymptotically unbiased and consistent. We investigate their empirical performance via simulations and an application to real data.
e-mail: qfa222@uky.edu
Preparing for Promotion in Academia
Abstract
This panel will discuss issues related to promotion in academia. Jing Qiu will speak from the perspective of one recently promoted to associate professor in a traditional statistics department, but her experience is complicated by her having consulting responsibilities in a College of Agriculture, Food and Natural Resources. Efstathia Bura has experienced a recent promotion to full professor in a traditional department of statistics. The more senior members of the panel, Nancy Flournoy and Jessica Utts, will share some words of advice.
Bio:
Efstathia Bura is a professor in the Department of Statistics at George Washington University. Her main area of research is high-dimensional statistics and dimension reduction in regression. She has also worked on developing statistical methodology with applications in biostatistics and biophysics, finance and economics, and legal statistics. She spent two years as a biostatistician in a biotech company, has received research and travel grants from NSF, and received a Fulbright scholar award in 2011. She has served as Deputy Chair and Master's Program Director and is currently the Biostatistics PhD Program Director of her department. She has also served on several professional committees of the American Statistical Association, is an associate editor for The American Statistician, and is a book co-editor for Law, Probability and Risk.
Nancy Flournoy is Curators' Distinguished Professor in the Department of Statistics at the University of Missouri. She is an expert in adaptive allocation methods focusing on dose-response experiments; her interest is shifting to questions of inference for such sequential experiments. She was the first female director of the NSF Program in Statistics. She is a Fellow of ASA, IMS, AAAS, and the World Academy of Art and Science, and she is an elected Member of the ISI. She is the recipient of the Elizabeth Scott, F.N. David, and Janet Norwood awards, as well as distinguished service awards from NSF and NISS. She served as department chair at the University of Missouri and American University, and she was a member of the team that pioneered bone marrow transplantation under the direction of Nobel Laureate E. Donnall Thomas.
Jing Qiu is an Associate Professor in the Department of Statistics, with a joint appointment as a consultant for the College of Agriculture, Food and Natural Resources, at the University of Missouri, Columbia. Her areas of research interest and expertise are bioinformatics, gene expression analysis, empirical Bayes confidence intervals, empirical Bayes tests for high-dimensional parameters, equivalence testing, and multiple testing.
Jessica Utts is Professor and Chair of the Department of Statistics at the University of California, Irvine. She previously was Professor of Statistics at the University of California, Davis, where she also held two administrative roles at various times - Associate Vice Provost for University Outreach, and Director of the campus-wide honors program. She is a Fellow of ASA, IMS, AAAS and the Association for Psychological Science, and has served as President of the Caucus for Women in Statistics and WNAR, and as Chair of COPSS. Her areas of statistical expertise include statistical education and literacy, and applications of statistics to a variety of disciplines, most notably parapsychology. She has appeared on numerous television shows to discuss that work, including Larry King Live and CNN News. While at UC Davis she was also co-P.I. on a five-year grant from the Sloan Foundation, titled "Model Projects: Enhancing the Educational Environment and Opportunities for Women in Engineering, Math and Science," which created multiple programs for STEM women students and faculty, some of which are still ongoing at UC Davis.
Consistent Biclustering
Abstract
Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in a high-dimensional dataset. A variety of biclustering algorithms exist, and they have been applied successfully to data sources ranging from review-website data to gene expression arrays. Currently, while biclustering appears to work well in practice, there have been no theoretical guarantees about its performance. We address this shortcoming with a new biclustering algorithm based on profile likelihood and a theorem providing sufficient conditions for consistent biclustering when both dimensions of the data matrix tend to infinity. This theorem applies to a broad range of data matrices, including binary, count, and continuous data. We propose a new approximation algorithm to implement profile likelihood biclustering and we show that our algorithm has low computational complexity and performs well in practice. We demonstrate our results through an empirical study that includes examples from collaborative filtering and microarray analysis.
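As a minimal sketch of the profile-likelihood idea (not the approximation algorithm of the paper), the code below alternately reassigns row and column labels of a binary data matrix to greedily increase the Bernoulli profile log-likelihood, with block means profiled out; the planted-structure example is simulated.

```python
import numpy as np

def block_loglik(X, rows, cols, R, C, eps=1e-9):
    """Bernoulli profile log-likelihood under block-constant means."""
    ll = 0.0
    for r in range(R):
        for c in range(C):
            block = X[np.ix_(rows == r, cols == c)]
            if block.size == 0:
                continue
            p = np.clip(block.mean(), eps, 1 - eps)   # profiled-out block mean
            s = block.sum()
            ll += s * np.log(p) + (block.size - s) * np.log(1 - p)
    return ll

def bicluster(X, R=2, C=2, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, R, X.shape[0])
    cols = rng.integers(0, C, X.shape[1])
    for _ in range(n_iter):
        for i in range(X.shape[0]):                   # greedy row reassignment
            scores = []
            for r in range(R):
                rows[i] = r
                scores.append(block_loglik(X, rows, cols, R, C))
            rows[i] = int(np.argmax(scores))
        for j in range(X.shape[1]):                   # greedy column reassignment
            scores = []
            for c in range(C):
                cols[j] = c
                scores.append(block_loglik(X, rows, cols, R, C))
            cols[j] = int(np.argmax(scores))
    return rows, cols

# Planted 2x2 block structure recovered from binary data.
rng = np.random.default_rng(1)
P = np.array([[0.8, 0.2], [0.3, 0.6]])
true_r, true_c = rng.integers(0, 2, 40), rng.integers(0, 2, 30)
X = rng.binomial(1, P[np.ix_(true_r, true_c)])
rows, cols = bicluster(X)
print("row agreement (up to label swap):",
      max(np.mean(rows == true_r), np.mean(rows != true_r)))
```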
Bio:
I am a statistics PhD student at the Stern School of Business at New York University. I am interested in problems involving high-dimensional data analysis that combine ideas from statistics and machine learning. My current research focuses on regularization methods and unsupervised learning methods. In both areas I'm interested in problems related to model selection and theoretical performance.
Development and Evaluation of Two Reduced Form Versions of a Deterministic Air Quality Model for Ozone and Particulate Matter
Abstract
Due to the computational cost of running regional-scale numerical air quality models, reduced form models (RFMs) have been proposed as computationally efficient simulation tools for characterizing the pollutant response to many different types of emissions reductions. The U.S. Environmental Protection Agency has developed two types of reduced form models based upon simulations of the Community Multiscale Air Quality (CMAQ) modeling system. One is based on statistical response surface modeling (RSM) techniques using a multidimensional kriging approach to approximate the nonlinear chemical and physical processes. The second approach is based on sensitivity calculations from the Higher-Order Decoupled Direct Method in three dimensions (HDDM) and uses a Taylor series approximation for the nonlinear response of pollutant concentrations to changes in emissions from specific sectors and locations. Both types of reduced form models are used to estimate the changes in O3 and PM2.5 across space associated with emissions reductions of NOx and SO2 from power plants and other sectors in the eastern United States. This study provides a direct comparison of the RSM and HDDM RFMs in terms of computational cost, model performance against brute-force runs, and model response to changes in emissions inputs. The HDDM RFM is found to be more computationally efficient and has similar evaluation performance for ozone for low to moderate emissions reductions compared to the kriging-based RSM. However, the RSM tends to provide more accurate predictions for PM2.5 and for very large emissions cuts (e.g., 70-90% reductions).
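The HDDM-style reduced form model amounts to a Taylor expansion around the base-case emissions. A toy sketch, with an invented analytic function standing in for the brute-force CMAQ runs, shows how the approximation degrades for very large emissions cuts, consistent with the comparison above:

```python
import numpy as np

# Invented stand-in for the full model: pollutant concentration as a
# nonlinear function of an emissions scaling factor e (1.0 = base case).
def full_model(e):
    return 60.0 * e / (0.5 + 0.5 * e)

e0 = 1.0
c0 = full_model(e0)
# First- and second-order sensitivities of the kind HDDM would supply;
# here they are approximated by finite differences of the stand-in model.
h = 1e-3
s1 = (full_model(e0 + h) - full_model(e0 - h)) / (2 * h)
s2 = (full_model(e0 + h) - 2 * c0 + full_model(e0 - h)) / h**2

for cut in [0.1, 0.3, 0.7, 0.9]:          # fractional emissions reductions
    de = -cut
    taylor = c0 + s1 * de + 0.5 * s2 * de**2
    print(f"{cut:>4.0%} cut: Taylor {taylor:6.2f} vs brute force {full_model(e0 + de):6.2f}")
# The second-order expansion tracks the full model well for small cuts but
# drifts for 70-90% reductions, mirroring the behavior noted in the abstract.
```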
Bio:
Kristen Foley is a statistician for the U.S. Environmental Protection Agency's Atmospheric Modeling Division. She received her Ph.D. in Statistics from NC State University in 2006. Her thesis research involved improving forecasts for hurricane-induced storm surge using data assimilation and spatial statistical modeling methods. Dr. Foley began working at the EPA as a postdoctoral fellow before transitioning to a full-time employee in 2007. Her current research includes the development and application of statistical techniques to evaluate air quality model output using disparate types of observational data. She collaborates with meteorologists, environmental engineers, computer scientists, and other statisticians to provide EPA policy makers with information needed for the development of emission control policies and regulations to improve our nation's air quality.
Nonresponse weighting adjustment using penalized spline regression
Abstract
The nonresponse weighting adjustment, which consists of multiplying the sampling weight of each respondent by the inverse of the estimated response probability, is a common method to reduce nonresponse bias in sample surveys. However, the response probability estimator from a linear logistic regression using the auxiliary information may not be appropriate when the response mechanism is nonlinear. We propose a response probability estimator based on a penalized spline logistic regression model and show that it reduces the bias of the point estimator. We discuss the properties of the direct NWA estimator using the penalized spline logistic regression and describe two methods of variance estimation. We conduct a simulation study to evaluate the nonparametric estimator in both linear and nonlinear cases.
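A minimal sketch of the estimator's ingredients, on simulated data: a truncated-linear spline basis with a ridge penalty on the spline coefficients gives the estimated response probabilities, which then enter the nonresponse-weighted estimator. The knot placement, penalty value, and data-generating model are illustrative choices, not those of the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(-2, 2, n)                            # auxiliary variable
y = 2 + x + rng.normal(size=n)                       # study variable
p_true = 1 / (1 + np.exp(-(0.5 + np.sin(2 * x))))    # nonlinear response propensity
resp = rng.binomial(1, p_true).astype(bool)

# Truncated-linear spline basis: intercept, x, and (x - kappa_k)_+ terms.
knots = np.quantile(x, np.linspace(0.1, 0.9, 15))
B = np.column_stack([np.ones(n), x, np.maximum(x[:, None] - knots, 0.0)])

def penalized_negloglik(theta, lam=1.0):
    eta = B @ theta
    # Bernoulli log-likelihood with a ridge penalty on the spline
    # coefficients only (not on the intercept or the linear term).
    ll = np.sum(resp * eta - np.logaddexp(0.0, eta))
    return -ll + lam * np.sum(theta[2:] ** 2)

theta_hat = minimize(penalized_negloglik, np.zeros(B.shape[1]), method="BFGS").x
p_hat = 1 / (1 + np.exp(-B @ theta_hat))

w = np.ones(n)   # design weights (equal here; a real survey supplies its own)
nwa = np.sum(w[resp] * y[resp] / p_hat[resp]) / np.sum(w)
print("NWA estimate:", nwa, "  true mean: 2.0")
```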
Bio:
I am a Ph.D. student in the Department of Statistics at Colorado State University and expect to graduate in May 2015. I earned my master's degree in Statistics from North Dakota State University in 2011 and my bachelor's degree in economics from the Capital University of Economics and Business (China) in 2009. My research areas include nonparametric methods, survey sampling, and applications in economics and the environment. I am currently working on nonparametric models in survey sampling with my advisor, Jean Opsomer, at Colorado State University. I am a proactive and social person, enthusiastic about statistics, and I speak Chinese and English.
Different Propensity Score Stratification Models to Estimate Adjusted AUC: Study of Gender Differences in Pain Frequency
Abstract
Propensity score methods have been widely used in epidemiologic research to reduce bias in cohort studies. For continuous outcomes, the mean difference between two risk groups is a well-known measure of group effect. Another effect measure for which there has been an increased interest in the literature is the probability that a randomly selected participant in the treatment group (X) has a better result than a randomly selected participant in the comparison group (Y), i.e. P(X>Y). This probability is equivalent to the area under the curve (AUC), a common measure used with receiver operating characteristic (ROC) curves to assess accuracy of medical tests. We use the method of stratification on the propensity score to estimate AUC while controlling for confounding. The adjusted AUC estimator is a weighted average of the stratum-specific AUCs. Finally, we compare the adjusted AUC with the well-known Mann-Whitney non-parametric statistic. We illustrate the methodology using a sample of adults with sickle cell disease (SCD), living in the Richmond and Tidewater areas of Virginia to estimate the effect of gender on frequency of pain due to SCD controlling for differences between groups.
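The adjusted estimator is straightforward to sketch on simulated data: estimate the propensity score, form quintile strata, compute each stratum's Mann-Whitney AUC, and average. The stratum-size weights below are one simple choice; the talk's exact weighting scheme may differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=(n, 2))                            # confounders
group = rng.binomial(1, 1 / (1 + np.exp(-(z[:, 0] + 0.5 * z[:, 1]))))
outcome = z[:, 0] + 0.4 * group + rng.normal(size=n)   # pain-frequency stand-in

# Step 1: estimate the propensity score and cut it into quintile strata.
ps = LogisticRegression().fit(z, group).predict_proba(z)[:, 1]
strata = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))

# Step 2: stratum-specific AUCs P(X > Y), averaged with stratum-size weights.
auc_adj, total = 0.0, 0
for s in range(5):
    m = strata == s
    x1, x0 = outcome[m & (group == 1)], outcome[m & (group == 0)]
    u = mannwhitneyu(x1, x0).statistic
    auc_adj += m.sum() * (u / (len(x1) * len(x0)))
    total += m.sum()
auc_adj /= total

u_crude = mannwhitneyu(outcome[group == 1], outcome[group == 0]).statistic
auc_crude = u_crude / ((group == 1).sum() * (group == 0).sum())
print("crude AUC:", auc_crude, "  PS-stratified AUC:", auc_adj)
```

The crude AUC mixes the group effect with confounding by z, while the stratified version compares like with like inside each propensity stratum.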
Bio:
Hadiza Galadima was born and raised in Niamey, Niger. She completed her undergraduate studies in Statistics and Actuarial Science at St. Cloud State University in Minnesota. She is currently a PhD candidate in Biostatistics at Virginia Commonwealth University under the advisement of Dr. Donna McClish. Her dissertation focuses on examining the performance of propensity score methods to control for confounding when the AUC is used as the measure of group effect. Her areas of interest include propensity score (PS) analysis, ROC analysis, clinical trials, and variable selection for PS model building.
NIH and NSF Methods Grants: Pre-submission to Funding
Abstract
Grant applications to the National Institutes of Health (NIH) and National Science Foundation (NSF) are submitted as solicited (in response to funding opportunities) or unsolicited (investigator-initiated) proposals. This panel will provide suggestions on how to prepare NIH and NSF grant application submissions, especially for those submitting for the first time. It will cover the different stages of the grant application process and provide helpful websites and contacts. Information will also be provided for those interested in global health research and collaborations. New policies affecting grant applications are published periodically; as an example, two recently published policies regarding new investigators and multiple investigators will be discussed.
Bio:
Dr. Gezmu holds a PhD in statistics from The American University in Washington, DC. She has worked at the National Institutes of Health (NIH), National Institute of Allergy and Infectious Diseases (NIAID), Biostatistics Research Branch (BRB) since 1998, providing statistical consultation for biomedical studies and reviewing clinical trial protocols for AIDS clinical trials, vaccine trials, and prevention studies. Prior to working at the NIH she was an Assistant Professor in the Department of Decision Science at Norfolk State University, and spent the summers of 1997 and 1998 as a faculty fellow in NASA's Laboratory for Terrestrial Physics.
Marie Davidian is the William Neal Reynolds Professor of Statistics at North Carolina State University. She received her Ph.D. in Statistics from the Department of Statistics at the University of North Carolina at Chapel Hill in 1987 under the direction of Raymond J. Carroll. Her interests include statistical models and methods for analysis of longitudinal data, especially nonlinear mixed effects models; methods for handling missing and mismeasured data; methods for analysis of clinical trials and observational studies, including approaches for drawing causal inferences; pharmacokinetic and pharmacodynamic analysis; combining mechanistic mathematical and statistical modeling of disease progression to design treatment strategies and clinical trials; and statistical methods for estimating optimal treatment strategies from data. In addition to her position at NCSU, she is Adjunct Professor of Biostatistics and Bioinformatics at Duke University, and works with the Duke Clinical Research Institute collaborating with clinicians and biostatisticians on problems in cardiovascular disease research. She co-authored the book Nonlinear Models for Repeated Measurement Data and co-edited the book Longitudinal Data Analysis: A Handbook of Modern Statistical Methods. She served as President of the American Statistical Association in 2013.
Applied Logistic Regression Models at the U.S. Census Bureau: Using Statistics to Select Blocks for Address Canvassing in the 2020 Census
Abstract
The 2010 Address Canvassing (AC) operation was the second most expensive single field operation in the 2010 Census, at more than 400 million dollars in direct costs. Two logistic regression models are examined here for reducing workloads in the 2020 Census. Recent work has focused on more efficient and parsimonious models predicting one or more adds per block using data from the 2010 AC operation. In this presentation, I will focus on interpreting the more parsimonious models and estimated coefficients. I will discuss resource expenditures versus quality trade-offs and a detailed analysis of the predicted outcomes of this modeling endeavor. Special attention is given to applying statistical approaches to real-world outcomes.
Monotone Interpolation with Uncertainty Quantification for Computer Experiments
Abstract
In statistical modelling of computer experiments, prior information is sometimes available about the underlying function. For example, the physical system simulated by the computer code may be known to be monotone with respect to some or all inputs. We develop a Bayesian approach to Gaussian process modelling capable of incorporating monotonicity information for computer model emulation. A sequential Monte Carlo algorithm is used to sample from the posterior distribution of the process given the simulator output and monotonicity information. The performance of the proposed approach, in terms of predictive accuracy and uncertainty quantification, is demonstrated in a number of simulated examples as well as a real queueing system.
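A crude stand-in for the constrained emulator (rejection sampling in place of the talk's sequential Monte Carlo) can still convey the idea: draw unconstrained Gaussian-process posterior paths and keep only those that are monotone on a grid. The design points and kernel settings below are invented.

```python
import numpy as np

def sq_exp(a, b, ell=0.4):
    """Squared-exponential kernel."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

# Invented simulator output at a few design points (monotone trend).
x_obs = np.array([0.0, 0.2, 0.45, 0.7, 1.0])
y_obs = np.array([0.10, 0.30, 0.55, 0.80, 0.95])
x_grid = np.linspace(0, 1, 25)

# Standard GP posterior at the grid (small nugget for numerical stability).
K = sq_exp(x_obs, x_obs) + 1e-8 * np.eye(len(x_obs))
Ks = sq_exp(x_grid, x_obs)
A = np.linalg.solve(K, Ks.T)
mu = A.T @ y_obs
cov = sq_exp(x_grid, x_grid) - Ks @ A + 1e-6 * np.eye(len(x_grid))

# Keep only posterior sample paths that are nondecreasing on the grid; if
# too few draws survive, coarsen the grid or increase the number of draws.
rng = np.random.default_rng(4)
draws = rng.multivariate_normal(mu, cov, size=4000)
monotone = draws[np.all(np.diff(draws, axis=1) >= 0, axis=1)]
print(f"kept {len(monotone)} of 4000 draws")
print("constrained posterior mean at x = 0.5:", monotone[:, 12].mean())
```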
Gender Neutral Writing
Abstract
For 35 years Dr. George Gopen, now Professor Emeritus of the Practice of Rhetoric at Duke University, has been developing a new way of analyzing and controlling written English, called the Reader Expectation Approach. It explores the insight that most of the clues a reader uses, in trying to perceive "meaning" in a text, come neither from word choice nor from word meaning but rather from structural location. That is, where a word appears in a sentence will control most of the use to which that word will be put. Readers understand these things intuitively; he teaches writers how to be aware of them consciously, and thus to be far superior writers. In studying the habitual writing patterns of thousands of professionals (predominantly scientists and lawyers), he has become aware of three patterns that curiously seem gender based. Women seem to take care of one constant reader's need better than men, and another worse. The third is a writing habit that, when it appears, is four times as likely to reveal a female writer as a male. This talk will introduce you to the Reader Expectation Approach and address the possibility that these gender-based patterns actually exist.
Function Registration with Regularized Fisher-Rao Metrics
Abstract
Alignment of functional observations is critical in functional data analysis and has been extensively studied over the past two decades. An information-geometric framework, referred to as the Extended Fisher-Rao (EFR) method, was recently introduced. The EFR defines a proper metric on the orbits of functions (the quotient space), and all functions are then aligned to a well-defined template. It was shown that the EFR outperforms previous methods in function registration. However, we note that the EFR has no regularization to control the degree of time warping, which may result in over-alignment of the given data. We propose three forms of regularization terms, referred to as second-order, extrinsic, and intrinsic penalties, in the distance definition. The new frameworks simplify the EFR by avoiding the notion of quotient space and address issues of over-alignment and outliers, while retaining efficiency. Using simulations as well as a real dataset of neural spike trains, we demonstrate that the new methods generate more robust alignments that result in more accurate classification than the EFR method.
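The sketch below illustrates the penalized-registration idea on two synthetic bumps, using the square-root slope (SRSF) transform underlying Fisher-Rao methods, a one-parameter warp family, and a simple penalty on departure from the identity warp. The warp family and penalty form are illustrative simplifications, not the second-order, extrinsic, or intrinsic penalties of the talk.

```python
import numpy as np

# Two bump functions that differ mainly by a time warp. We search over a
# simple one-parameter warp family gamma_a(t) = t**a as a stand-in for the
# full warping group, to make the effect of the penalty easy to see.
t = np.linspace(1e-4, 1, 201)
f1 = np.exp(-0.5 * ((t - 0.35) / 0.08) ** 2)
f2 = np.exp(-0.5 * ((t - 0.55) / 0.08) ** 2)

def srsf(f):
    """Square-root slope function q = sign(f') * sqrt(|f'|)."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

q1, q2 = srsf(f1), srsf(f2)

def objective(a, lam):
    gamma = t ** a
    dgamma = a * t ** (a - 1.0)
    # Warped SRSF (q2 o gamma) * sqrt(gamma'), squared distance to q1, plus
    # a penalty on how far the warp is from the identity (one of several
    # possible regularizers).
    q2_warp = np.interp(gamma, t, q2) * np.sqrt(dgamma)
    dist = np.mean((q1 - q2_warp) ** 2)
    penalty = np.mean((np.sqrt(dgamma) - 1.0) ** 2)
    return dist + lam * penalty

a_grid = np.linspace(0.4, 2.5, 106)
for lam in [0.0, 10.0, 300.0]:
    best = min(a_grid, key=lambda a: objective(a, lam))
    print(f"lambda = {lam}: chosen warp exponent a = {best:.2f}")
# Larger lambda pulls the chosen warp toward the identity (a = 1),
# showing how regularization limits the degree of warping.
```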
Bio:
Glenna Gordon is a Ph.D. candidate in the Department of Statistics at Florida State University under the advisement of Dr. Wei Wu. Ms. Gordon earned a B.S. in Mathematics in 2006 from CSU Bakersfield and a M.S. in Statistics in 2008 from MSU Bozeman. Her current research focuses on the registration of functional data.
Inference for Environmental Intervention Studies using Principal Stratification
Abstract
Previous research has found evidence of an association between indoor air pollution and asthma morbidity in children. Environmental intervention studies have been performed to examine the role of household environmental interventions in altering indoor air pollution concentration and improving health. Previous environmental intervention studies have found only modest effects on health outcomes. It is unclear if the health benefits provided by environmental modification are comparable to those provided by medication. Traditionally, the statistical analysis of environmental intervention studies has involved performing two intention-to-treat analyses that separately estimate the effect of the environmental intervention on health and the effect of the environmental intervention on indoor air pollution concentrations. We propose a principal stratification (PS) approach to examine the extent to which an environmental intervention's effect on health outcomes coincides with its effect on indoor air pollution. We apply this approach to data from a randomized air cleaner intervention trial conducted in a population of asthmatic children living in Baltimore, Maryland, USA. We find that amongst children for whom the air cleaner reduced indoor particulate matter concentrations, the intervention resulted in a meaningful improvement of asthma symptoms, with an effect generally larger than previous studies have shown. A key benefit of using principal stratification in environmental intervention studies is that it allows investigators to estimate causal effects of the intervention for sub-groups defined by changes in the indoor pollution concentration.
Bio:
I obtained a PhD in Statistics from Colorado State University in the fall of 2011 and am currently a postdoctoral fellow in the Biostatistics Department at Johns Hopkins University.
Computation Methods for the Characterization of Chemically Induced Change in Neuronal Networks
Abstract
Thousands of chemicals are utilized in commerce for which adequate toxicity data is lacking. Unfortunately, standard toxicity assays fail to keep pace with the rate at which new chemicals become commercially available. Therefore, high throughput in-vitro toxicity screening methods are needed. In particular, methods for detecting neurotoxicity using multi-well Micro-Electrode Array (MEA) technology are under development. Electrophysiological data gathered from neuronal cells cultured on MEAs include firing and bursting rates, synchrony and network connectivity. Harnessing appropriate statistical techniques to quantify chemically induced changes in neuronal cultures is key to the success of in-vitro neurotoxicity screening. The purpose of this talk is to present the scope and characteristics of MEA data, as well as methods for its visualization and statistical analysis.
Bio:
As a doctoral student in the Department of Biostatistics at UNC-Chapel Hill, I statistically model neuronal data for chemical screening as well as functional analysis of genes. I have enjoyed the diversity of projects that statistics has given me the tools to undertake. I have an MS from the statistics department at UNC Chapel Hill and work at Duke as well as the U.S. EPA concurrently with pursuing my DrPH.
Testing O'Brien Fleming procedure
Abstract
O'Brien and Fleming (1979) proposed a straightforward and useful multiple testing procedure (a group sequential testing procedure) for comparing two treatments in clinical trials where subject responses are dichotomous (e.g., success and failure). O'Brien and Fleming stated that their group sequential testing procedure has the same Type I error rate and power as a fixed one-stage chi-square test, but gives the opportunity to terminate the trial early when one treatment is clearly performing better than the other. We study and test the O'Brien and Fleming procedure, in particular by correcting the originally proposed critical values.
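The O'Brien-Fleming boundary and its overall Type I error are easy to check by Monte Carlo. The sketch below uses the standard boundary shape for K = 5 equally spaced looks with the conventional constant c of about 2.04 for two-sided alpha = 0.05; it does not reproduce the corrected critical values referred to in the abstract.

```python
import numpy as np

rng = np.random.default_rng(5)
K = 5              # number of planned analyses
c = 2.04           # approximate O'Brien-Fleming constant, K = 5, alpha = 0.05
n_sim = 200_000

# Under H0, the cumulative standardized statistics behave like scaled
# partial sums: Z_k = S_k / sqrt(k) with S_k a sum of iid N(0,1) increments.
increments = rng.normal(size=(n_sim, K))
S = np.cumsum(increments, axis=1)
Z = S / np.sqrt(np.arange(1, K + 1))

# O'Brien-Fleming boundary: reject at analysis k if |Z_k| >= c * sqrt(K / k),
# i.e., a constant boundary c on the B-value scale B_k = Z_k * sqrt(k / K).
bounds = c * np.sqrt(K / np.arange(1, K + 1))
reject_any = np.any(np.abs(Z) >= bounds, axis=1)
print("Monte Carlo overall Type I error:", reject_any.mean())  # close to 0.05
```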
When to use Quantile Normalization?
Abstract
Quantile normalization is a powerful method for removing unwanted variability from high-throughput data. It has been widely applied to gene expression microarray data, where we assume that observed differences between the distributions of the samples are due only to technical variation unrelated to biological variation. To normalize the samples, the distributions are forced to be the same. In general, this assumption is justified, as only a minority of genes are expected to be differentially expressed between samples; but if the samples are expected to have a high percentage of global differences, it may not be appropriate to use quantile normalization, as it may remove interesting global biological variation. We propose a novel method to test for global differences between groups of distributions to guide the choice of a normalization approach. We perform a simulation study to illustrate the bias-variance tradeoff of using normalization methods with and without global adjustments in the context of distributions with and without global differences. In addition, we provide several applications to DNA methylation and gene expression experiments and discuss the importance of batch effects. An R package implementing our method is available on GitHub (https://github.com/stephaniehicks/quantro).
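For reference, quantile normalization itself is only a few lines. Here is a minimal version that ignores tie handling (the quantro package linked above provides the test for global differences, not this):

```python
import numpy as np

def quantile_normalize(X):
    """Force every column (sample) of X to share the same distribution:
    replace each sample's sorted values with the across-sample mean of the
    sorted values, then restore each sample's original ordering."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # rank of each value
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)    # reference distribution
    return mean_quantiles[ranks]

# Three "samples" (columns) with different shifts and scales.
rng = np.random.default_rng(6)
X = np.column_stack([rng.normal(0, 1, 1000),
                     rng.normal(0.5, 1, 1000),
                     rng.normal(0, 2, 1000)])
Xn = quantile_normalize(X)
print(np.round(Xn.mean(axis=0), 3), np.round(Xn.std(axis=0), 3))  # now identical
```

The example makes the abstract's caveat concrete: if the shift in the second column were biological rather than technical, forcing the distributions to match would erase it.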
Bio:
Stephanie Hicks is a Postdoctoral Research Fellow under the direction of Rafael Irizarry in the Department of Biostatistics and Computational Biology at Dana-Farber Cancer Institute and the Department of Biostatistics at Harvard School of Public Health in Boston, MA. She received her B.S. in Mathematics from Louisiana State University and her M.A. and Ph.D. from the Department of Statistics at Rice University in Houston, TX under the direction of Marek Kimmel, Ph.D. (Departments of Statistics and Bioengineering, Rice University) and Sharon Plon, M.D., Ph.D. (Departments of Pediatrics and Molecular and Human Genetics, Baylor College of Medicine). Her research interests focus around developing statistical methods and tools in application for genomics and epigenomics data. Currently she is focused on methods for processing and analyzing DNA methylation and gene expression data using microarrays and next-generation sequencing.
Residual Plots to Identify Outliers and Influential Observations in Structural Equation Modeling
Abstract
Residual plots are standard diagnostic tools frequently used in regression analysis to evaluate underlying model assumptions and identify outliers. Similar graphical tools are uncommon in structural equation modeling (SEM) due to complications that arise when constructing residual-based diagnostics. We present a method for constructing residuals in SEM; specifically, we consider three estimators that are weighted linear functions of the observed variables. We then propose a method to construct residual plots under the SEM framework analogous to "residuals versus fitted values" plots in regression analysis. The utility of these plots to identify potential outliers is demonstrated using Mardia's exam data. We also discuss the impact the choice of estimator has on the residual plots and provide insight into which residual estimator is "best." These proposed residual plots lay the foundation for future research into residual-based diagnostic plots in SEM.
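As one hedged illustration of residuals that are weighted linear functions of the observed variables, the sketch below uses Bartlett factor scores in a one-factor model with loadings treated as known for simplicity; the talk's three estimators are not reproduced here, and in practice the loadings would themselves be SEM estimates.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
# One-factor model y = Lambda * eta + eps, with loadings and error
# variances fixed here for illustration.
Lam = np.array([[1.0], [0.8], [0.6], [0.9]])
Psi = np.diag([0.30, 0.50, 0.40, 0.35])
eta = rng.normal(size=(n, 1))
Y = eta @ Lam.T + rng.normal(size=(n, 4)) @ np.sqrt(Psi)

# Bartlett estimator of the factor scores: a weighted linear function of
# the observed variables.
Pinv = np.linalg.inv(Psi)
W = np.linalg.inv(Lam.T @ Pinv @ Lam) @ Lam.T @ Pinv
eta_hat = Y @ W.T
fitted = eta_hat @ Lam.T
resid = Y - fitted

# A "residuals versus fitted" display would plot resid[:, j] against
# fitted[:, j]; an outlying case stands apart from the band around zero.
print("corr(fitted, resid) per observed variable:",
      np.round([np.corrcoef(fitted[:, j], resid[:, j])[0, 1] for j in range(4)], 3))
```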
Bio:
Laura Hildreth is an Assistant Professor of Statistics at Montana State University and has held this position since 2013. Laura earned her PhD in statistics from Iowa State University and an MS in applied economics from the University of Minnesota, Twin Cities, and completed her undergraduate studies in economics and statistics at the University of Minnesota, Morris. Her research interests include structural equation modeling, applications to the social and behavioral sciences, and statistics education.
Yuan Huang
Abstract
Marker selection suffers from the inaccuracy and the lack of reproducibility due to the limitation of sample size. In this study, we conduct integrative analysis and marker selection under the heterogeneity model, which postulates that different datasets have possibly overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect similarity of identified marker sets -- or equivalently, similarity of model sparsity structures -- across multiple datasets. However, the existing methods do not have a mechanism to explicitly promote such similarity. To solve this problem, we develop a novel sparse boosting method. This method uses a BIC/HDBIC criterion to select weak learners and encourage sparsity. A new penalty is introduced to promote similarity of model sparsity structures across datasets. The proposed method has an intuitive formulation and is computationally affordable. Simulation shows that the proposed method outperforms alternatives with more accurate marker identification.
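A generic componentwise sparse-boosting loop with a BIC-style stopping rule conveys the flavor of the approach; the multi-dataset similarity penalty and the HDBIC criterion of the talk are not implemented here, and the degrees-of-freedom proxy is a crude simplification.

```python
import numpy as np

def sparse_boost(X, y, step=0.1, max_iter=200):
    """Componentwise L2 boosting: at each step, fit the single best
    covariate to the current residual; keep the iteration minimizing BIC."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()
    path, bics = [], []
    for _ in range(max_iter):
        # Least-squares slope of each covariate against the residual.
        coefs = X.T @ resid / (X ** 2).sum(axis=0)
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = int(np.argmin(sse))                 # best-fitting weak learner
        beta[j] += step * coefs[j]              # shrunken update
        resid = y - X @ beta
        df = np.count_nonzero(beta)             # crude degrees-of-freedom proxy
        bics.append(n * np.log(resid @ resid / n) + np.log(n) * df)
        path.append(beta.copy())
    return path[int(np.argmin(bics))]

rng = np.random.default_rng(8)
n, p = 100, 50
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
beta_hat = sparse_boost(X, y)
print("selected covariates:", np.flatnonzero(beta_hat))
```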
Strategies for Addressing Sampling and Analysis Challenges in Energy Efficiency Program Evaluation
Abstract
Utilities, regulators, and third parties design and implement energy efficiency programs with goals that include increasing efficiency to support decreases in energy consumption and peak demand. Energy efficiency programs target various customer segments (residential, commercial, industrial), end-uses (HVAC, Energy-STAR appliances, lighting), and geographical regions using a number of tools to accomplish energy efficiency (behavior and educational activities, rebate offers, incentive-based retrofits, etc.). Implemented programs are ultimately funded by rate payers (utility customers), are becoming important pieces of infrastructure planning (e.g., as a demand reduction resource in capacity markets), and are being considered for state air quality management plans. Therefore, evaluations demonstrating the impact of program investments on efficiency must be robust and defensible.
Impact evaluations report estimates of gross energy and demand savings, net savings attributable to a program, and load shapes describing hourly demand or savings expected in a typical year. In many cases these quantities cannot be directly measured by simply comparing pre- and post-program energy use for each customer, participant, or efficiency measure. Further, impact evaluation activities are typically limited to costs that make up 5% or less of the total energy efficiency program budget. Therefore, sampling and analysis play a critical role in program evaluation. Depending on the program, sampling and analysis approaches range from simple to quite complex. This poster will present the sampling and analysis approaches used in two specific energy efficiency program evaluations, for the Northeast Energy Efficiency Partnership and the Public Utility Commission of Texas, describe current sampling and analysis approaches in energy efficiency program evaluation in general, and identify remaining challenges and opportunities for growth and contributions from the broader field of statistics.
Bio:
Jennifer Huckett, an associate with Cadmus, specializes in statistical modeling, analysis, and sampling for program planning and evaluation. Dr. Huckett conducts quantitative data analysis for a broad range of projects including process evaluations, impact evaluations, and metering studies. She is experienced in survey sample design and analysis, statistical analysis and model development, and stochastic modeling and simulation. Dr. Huckett has extensive experience programming with R statistical software. Before joining Cadmus in July 2013, Dr. Huckett worked at Battelle in Columbus, Ohio where she addressed problems faced by clients in the US Department of Homeland Security, the intelligence community, and the US Department of Transportation.
Cost Efficiency and Economies of Scale of European Banks: A Multi-Country Nested Bayesian Frontier Model
Abstract
Cost efficiency of banks is a key indicator that provides valuable insight to researchers and policymakers about the functioning of the financial intermediation process, and the overall performance of the entire financial system.
This study focuses on the efficiency of the European banking market and uses the Bureau van Dijk database of banks' balance sheet and income statement data for 2,819 commercial, savings, and cooperative banks from 14 European countries over the 2001-2009 period.
Our interest in the subject is twofold. At each nation's level, cost efficiency influences the relative competitiveness of banks, setting the profile of the national banking industry with direct implications for economic growth. At the European Union level, financial, institutional, and regulatory integration raises the question of the existence of a common cost frontier.
Among the numerous studies that have explored this subject, two competing approaches that have often produced contradictory results stand out. They differ with respect to the way frontiers are constructed and relative efficiencies determined: one develops nation-specific frontiers, while the other assumes a common frontier that may be used to determine and compare efficiencies across countries. There are drawbacks associated with each method. With individual country frontiers, comparisons across countries are not meaningful, as the reference system is not the same for everybody. On the other hand, the common frontier setup could be too strong an assumption, as production technologies might differ substantially across countries.
The objective of this study is to present a Bayesian methodology that nests both of these approaches. A Gibbs sampler is used to estimate a stochastic translog cost frontier with country dummy variables. This is a composite error model that constructs an efficient frontier from which the individual firm deviates due to both inefficiency (incurring higher costs) and measurement error or bad luck (the random aspect).
By varying the degree of precision on the prior distribution of frontier parameters, the model allows for a continuous shift from individual country frontiers to a single, common frontier. This is a flexible way of modeling cost frontiers that nests both approaches (individual multiple frontiers and common single frontier) and it can be applied in numerous other setups.
With an uninformative prior on the variance of the translog parameters, the hybrid model generates the nation-specific frontier results. With a strong prior on the variance of the translog parameters, the hybrid model reproduces the same results as the common frontier approach.
Through the use of an informative prior varied according to our beliefs about the frontiers, the model also allows for different "sub-frontiers" to exist. We find that as the strength of the prior increases and we move from single frontiers for each country towards a common "European frontier," the convergence does not always happen in a direct manner.
The fact that in some cases we need a very strong prior for the parameters' results to start adjusting and move from their single frontier values to their common frontier values might be interpreted as evidence from the data against the idea of a common frontier. This information can be used to separate the countries into subgroups that appear to share a common frontier based on the speed of convergence.
The complexity of the convergence process suggests that our analysis could be improved in future research by running the model for a larger number of prior values on the variance of the translog parameters.
For a group of selected banks, we compute economies of scale and, while the results vary in magnitude depending on the approach, we find more often than not, and for all bank sizes, that they remain greater than one. Therefore, we conclude that banks are not operating at their optimal level and could reduce average costs by increasing their output. The subject of economies of scale has long been a source of argument, as researchers have found contradictory results over the years. Nevertheless, recent datasets suggest that even large banks have economies of scale greater than one; this study supports these findings.
Key words: Stochastic frontier, Bayesian methods, Gibbs sampler, cost efficiency, European banking system
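The nesting of country-specific and common frontiers can be illustrated with a one-covariate toy model in place of the translog specification, using a closed-form normal posterior mean rather than a Gibbs sampler: as the prior variance tau^2 on the country parameters shrinks, each country's slope estimate moves from its own OLS value toward the pooled value. All numbers below are invented.

```python
import numpy as np

rng = np.random.default_rng(9)
n_countries, n_banks = 5, 200
# Country-specific "true" slopes of a one-dimensional stand-in frontier.
beta_true = rng.normal(1.0, 0.15, n_countries)
X = [rng.normal(size=n_banks) for _ in range(n_countries)]
Y = [b * x + rng.normal(scale=0.5, size=n_banks) for b, x in zip(beta_true, X)]

beta_ols = np.array([x @ y / (x @ x) for x, y in zip(X, Y)])     # per country
beta_pool = sum(x @ y for x, y in zip(X, Y)) / sum(x @ x for x in X)

# With a prior beta_c ~ N(beta_pool, tau^2) and known noise variance, the
# posterior mean is a precision-weighted compromise: tau^2 -> infinity gives
# country-specific frontiers, tau^2 -> 0 gives a single common frontier.
sigma2 = 0.25
prec_data = np.array([x @ x for x in X]) / sigma2
print("pooled slope:", round(beta_pool, 3))
for tau2 in [1e6, 0.05, 1e-6]:
    post = (prec_data * beta_ols + beta_pool / tau2) / (prec_data + 1 / tau2)
    print(f"tau^2 = {tau2:g}:", np.round(post, 3))
```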
Bio:
Ana-Maria Ichim was born in Turnu-Magurele, Romania. She received her bachelor's degree in Economics with a major in International Business in 1998 and earned a postgraduate degree in International Trade in 2004 from The Alexandru Ioan Cuza University, Iasi, Romania. She worked at her alma mater's Career Center until August 2004, when she became a graduate student at Louisiana State University. In May 2006 she received her Master of Science in Economics and has since taught Principles of Economics, Principles of Microeconomics, and Principles of Macroeconomics. She is currently working as a graduate assistant at the Highway Safety Research Group at Louisiana State University on a project involving distracted drivers and traffic accidents, with data from the Louisiana Crash Reports. Ana was awarded a Doctor of Philosophy in Economics at Louisiana State University in December 2012 and expects to earn her Master of Applied Statistics in August 2014.
Which Career Path is Right for You?
Abstract
Are careers in Government, Industry, and Academia that much different? How do you advance and succeed in each of them? Are the definitions of success different? What is success for you? Is it possible, or even advantageous, to move from one to another in different periods of your career? Is it advantageous to bring Academic experience to Industry or Government, or vice versa? What is work-life balance like in each of them? The panelists will discuss, compare, and answer questions about career paths in Government, Industry, and Academia.
Bio:
Kerrie Adams is a Director of Customer Analytics at Walmart. She drives the Member communications for Sam’s Club both in targeting as well as in measuring the effectiveness of all Member and prospect communications. She has played an integral role in the standardization of metrics and automation of the campaign analytics. She has also driven analytical insights for the business supporting various areas such as Membership, Credit, Merchandising, and Operations. Prior to Walmart, Kerrie was Manager of Pricing Science where she led price test concept development, model design, execution, tracking and recommendations to maximize profitability of pricing schemas within the business. Her past also includes running an analytics department for a holistic web based software tool providing retailers and restaurants support for their facility operations; reporting market share, patient behavior, and forecasting volume for pharmaceutical clients; and a researcher for a Research and Development company. Kerrie has a Masters of Applied Statistics from The Ohio State University with an undergraduate degree from Otterbein College in Mathematics. She has been using SAS software to drive analytics and insights for over 15 years.
Telba Irony, Ph.D., is Chief of the General and Surgical Devices Branch, Division of Biostatistics, Center for Devices and Radiological Health, Food and Drug Administration.
Francesca Dominici is Professor in the Department of Biostatistics at the Harvard School of Public Health and the Associate Dean of Information Technology. She has extensive experience in the development of statistical methods for the analysis of large observational data and for comparative effectiveness research. She has led several national studies on the health effects of air pollution, heat waves, and climate change. She is now working on statistical methods for causal inference to assess the public health consequences of public health interventions. Dr. Dominici has served on a number of National Academies' committees. She received her Ph.D. in statistics at the University of Padua, Italy.
Careers of Statisticians and Biostatisticians in the Government: The Food and Drug Administration
Abstract
Statisticians are highly respected professionals who play a crucial role in the Department of Health and Human Services and, in particular, in the approval process for medical treatments and diagnostics at the Food and Drug Administration. This presentation will discuss the responsibilities of statisticians in the government and the need for communication and collaborative skills in addition to technical statistical skills. The statistician in government is a problem solver, who must be interested in science and teaching, and who ideally aspires to leadership positions. The statistician can be a force that spurs innovation, not only in science and medicine, but also in statistical techniques and decision-making processes. Potential career ladders of statisticians in the government will be presented. Ample time will be available for questions and comments from the audience.
Bio:
Telba Irony, Ph.D. is Chief of the General and Surgical Devices Branch, Division of Biostatistics, Center for Devices and Radiological Health, Food and Drug Administration.
Modelling the effects of weather and climate on malaria distributions in West Africa
Abstract
Malaria is a leading cause of mortality worldwide. There are currently conflicting data and interpretations of how variability in climate and rainfall affects the incidence of malaria. This study presents a hierarchical Bayesian modeling framework for the analysis of malaria versus climate factors in West Africa. The framework takes into account spatio-temporal dependencies, and in this paper it is applied to annual malaria and climate data from ten West African countries (Benin, Burkina Faso, Cote d'Ivoire, Gambia, Ghana, Liberia, Mali, Senegal, Sierra Leone, and Togo) during the period 1996-2006. Our results show a statistically significant correspondence between malaria rates and the climate variables considered. The two most important climate factors are found to be average annual temperature and total annual precipitation, both of which are negatively associated with malaria incidence. In contrast, malaria incidence is positively associated with both a departure from long-term average temperature and a departure from long-term total precipitation. This modeling framework provides a useful approach for studying the impact of climate variability on the spread of malaria and may help to resolve some conflicting interpretations.
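As a rough illustration of this kind of regression (ignoring the spatial dependence that is central to the paper), a non-spatial Poisson random-intercept analogue can be sketched in R; all variable names and the simulated data below are hypothetical and carry no real signal:

    # Illustrative only: malaria counts vs. climate covariates with a
    # country-level random intercept; the paper's spatio-temporal model is richer.
    library(lme4)
    set.seed(1)
    d <- expand.grid(country = letters[1:10], year = 1996:2006)
    d$temp   <- rnorm(nrow(d), 27, 1)        # mean annual temperature (C)
    d$precip <- rnorm(nrow(d), 1.2, 0.15)    # total annual precipitation (m)
    d$temp_anom   <- ave(d$temp,   d$country, FUN = function(x) x - mean(x))
    d$precip_anom <- ave(d$precip, d$country, FUN = function(x) x - mean(x))
    d$pop   <- 1e6                           # population at risk (offset)
    d$cases <- rpois(nrow(d), 2000)          # fake annual case counts
    fit <- glmer(cases ~ temp + precip + temp_anom + precip_anom + (1 | country),
                 family = poisson, offset = log(pop), data = d)
    summary(fit)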
Bio:
Monica Christine Jackson was born and raised in Kansas City, Missouri. She obtained B.S. and M.S. degrees in Mathematics from Clark Atlanta University. She completed a PhD in Applied Mathematics and Scientific Computation from the University of Maryland, College Park. After completing her degree at the University of Maryland, she was a postdoctoral researcher at Emory University in the Department of Biostatistics under the direction of Lance A. Waller. She has held visiting research positions at the National Cancer Institute and the Statistical and Applied Mathematical Sciences Institute. In 2012, she was the recipient of the Morton Bender Prize for top research by an Associate Professor. Currently, she is an Associate Professor of Mathematics and Statistics at American University in Washington, DC. Her current research interests are in the areas of spatial statistics and disease surveillance, with applications to developing and investigating methods for detecting cancer clusters and global clustering patterns, and developing simulation algorithms for spatially correlated data.
Statistical Models for Phenophase Identification in Remotely-Sensed MTCI Data
Abstract
Understanding the effects of phenological events, due to both natural and man-made causes, is critical for research in global climate modeling and agriculture, among many other areas. During the last two decades, remote sensing data on satellite-derived biophysical variables (such as chlorophyll content) have become widely available through the launch of satellites such as MERIS (MEdium Resolution Imaging Spectrometer). Weekly MTCI aggregates from 2003 to 2007 were used to model phenological changes in southern India. Three modeling techniques -- a classical time series model with seasonality represented by Fourier terms, a method integrating the decomposition of time series into season, trend, and white noise components with methods for detecting significant changes (BFAST), and a hierarchical model incorporating spatially distributed covariates -- were used to extract the phenological variables of onset of greenness, peak of greenness, and end of senescence using an iterative search. The advantages and shortcomings of the three methods are compared and discussed in terms of phenological variable extraction and efficacy across spatial locations.
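A hedged illustration of the first of the three approaches only (harmonic regression with Fourier terms), using a synthetic weekly series in place of MTCI; the BFAST and hierarchical models are not shown:

    # Fourier-term regression for a weekly vegetation-index series, followed by
    # a grid search for the seasonal peak ("peak of greenness").
    set.seed(1)
    week <- 1:(5 * 52)                       # five years of weekly aggregates
    mtci <- 2 + sin(2 * pi * week / 52) +
            0.5 * cos(4 * pi * week / 52) + rnorm(length(week), 0, 0.2)

    fit <- lm(mtci ~ sin(2 * pi * week / 52) + cos(2 * pi * week / 52) +
                     sin(4 * pi * week / 52) + cos(4 * pi * week / 52))

    seasonal  <- predict(fit, newdata = data.frame(week = 1:52))
    peak_week <- which.max(seasonal)         # estimated week of peak greenness
    peak_week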
Bio:
Maggie Johnson is a graduate student in the Department of Statistics at Iowa State University. She earned her B.S. in Mathematics from the University of Minnesota, and her M.S. in Statistics from Iowa State University. Her current interests are in spatial statistics, Bayesian modeling, computational statistics, and environmental statistics.
Women in Science: Contributions, Inspirations, and Rewards
Abstract
Significant contributions have been made to science, through statistics, by women all over the world. These contributions range from theoretical developments to applications in diverse fields such as agriculture, climatology, engineering, biology, and medicine. What factors led to their successes, despite, in some cases, strong odds against them? In this talk, we mention some key contributions to statistics, science, and applications made by leading women in our field. We describe some of the conditions that inspired them to pursue a career in the first place, and then suggest some possible factors that facilitated their transition from "computing people" to respected professionals. We end by noting the differences in the proportions of women at various career levels and then propose some ways in which women themselves can help to attract more women into the field and to attenuate the differences in female representation at different career stages.
Bio:
Karen Kafadar is Rudy Professor of Statistics in the College of Arts and Sciences at Indiana University, Bloomington. She received her B.S. in Mathematics and M.S. in Statistics at Stanford University, and her Ph.D. in Statistics from Princeton University. Prior to joining the Statistics department in 2007, she was Mathematical Statistician at the National Institute of Standards and Technology, Member of the Technical Staff at Hewlett Packard's RF/Microwave R&D Department, Fellow in the Division of Cancer Prevention at the National Cancer Institute, and Professor and Chancellor's Scholar at the University of Colorado-Denver. Her research focuses on robust methods, exploratory data analysis, characterization of uncertainty in the physical, chemical, biological, and engineering sciences, and methodology for the analysis of screening trials, with awards from the CDC, the American Statistical Association (ASA), and the American Society for Quality. She was Editor of the Journal of the American Statistical Association's Review Section and of Technometrics, and is currently Biology & Genetics Editor for The Annals of Applied Statistics. She has served on several NAS committees and is a past or present member of the governing boards of the ASA, the Institute of Mathematical Statistics, the International Statistical Institute, and the National Institute for Statistical Sciences. She is a Fellow of the ASA, AAAS, and the International Statistical Institute (ISI), has authored over 100 journal articles and book chapters, and has advised numerous M.S. and Ph.D. students.
Statistical and mathematical self-efficacy of incoming students at a large public university
Abstract
All participants in the ongoing STEM education discussion agree that, in addressing national priorities, a key concern is the critical transition of students from high school (or community college) to a four-year college program, in the mathematical sciences in particular. Failure in college-level mathematics and statistics courses may discourage students from pursuing STEM majors or perhaps lead to dropping out of college altogether. In fact, even a mediocre performance in these courses often restricts student career choices to fields outside of STEM disciplines. This presentation summarizes a statistical investigation of student self-efficacy and self-confidence in mathematics and statistics, particularly with regard to gender differences and the impact of mathematical preparation at the high school or community college level. Interestingly, preliminary results indicate much higher levels of self-efficacy and self-confidence in statistics than in mathematics. Additionally, for statistics these results are consistent across gender, which is seemingly not the case for mathematics.
Bio:
Andee Kaplan is a graduate student at Iowa State University in the Department of Statistics. She has a BS and MA in Mathematics from the University of Texas at Austin. Her research interests include exploratory data analysis and interactive statistical graphics, computational statistics, Bayesian inference, mathematical statistics, and reproducible research. She is currently working on an interactive web application to visualize communities in a graph framework.
Leadership - the Untold Story
Abstract
Who trains leaders for unforeseen catastrophes, unexpected life-changing events, and unanticipated violations of professional norms? It is in these, frequently extreme, events that leaders must find the right balance between what is good for a single individual and what is good for the organization; between their own personal value system and respecting that of others; between being a sympathetic enabler or critical maniac and creating an environment of collaborative problem solving. Personal experiences will be shared, with a special emphasis on the importance of maintaining clear boundaries between the roles of coaching, counseling, and mentoring.
Bio:
Sallie Ann Keller is director and professor of statistics for the Social and Decision Analytics Laboratory within the Virginia Bioinformatics Institute at Virginia Tech. Formerly she was professor of statistics at the University of Waterloo and its Vice-President, Academic and Provost. Prior to this she was the director of the IDA Science and Technology Policy Institute in Washington, DC, and before that the William and Stephanie Sick Dean of Engineering and professor of statistics at Rice University. Her other appointments include head of the Statistical Sciences group at Los Alamos National Laboratory, professor and director of graduate studies in the Department of Statistics at Kansas State University, and statistics program director at the National Science Foundation. She has served as a member of the National Academy of Sciences Board on Mathematical Sciences and its Applications, has chaired the Committee on Applied and Theoretical Statistics, and is currently a member of the Committee on National Statistics. Her areas of research are uncertainty quantification, computational and graphical statistics and related software and modeling techniques, and data access and confidentiality. She is a national associate of the National Academy of Sciences, fellow of the American Association for the Advancement of Science, elected member of the International Statistical Institute, and member of the JASON advisory group. She is also a fellow and past president of the American Statistical Association. She holds a Ph.D. in statistics from the Iowa State University of Science and Technology.
Power Calculation for Comparing Diagnostic Accuracies in a Multi-Reader, Multi-Test Design
Abstract
Receiver operating characteristic (ROC) analysis is widely used to evaluate the performance of diagnostic tests with continuous or ordinal responses. A popular study design for assessing the accuracy of diagnostic tests involves multiple readers interpreting multiple diagnostic test results, called the multi-reader, multi-test design. Although several different approaches to analyzing data from this design exist, few address sample size and power issues. In this article, we develop a power formula to compare the correlated areas under the ROC curves (AUCs) in a multi-reader, multi-test design. We present a nonparametric approach to estimate and compare the correlated AUCs by extending the approach of DeLong et al. (1988). A power formula is derived based on the asymptotic distribution of the nonparametric AUCs. Simulation studies are conducted to demonstrate the performance of the proposed power formula, and an example is provided to illustrate the proposed procedure.
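A minimal sketch of the nonparametric AUC (the Mann-Whitney form underlying the DeLong approach) and a normal-approximation power calculation; in practice the variance of the AUC difference would come from the DeLong covariance estimate, and all numbers here are illustrative:

    # Nonparametric AUC: proportion of (diseased, healthy) pairs ranked correctly.
    auc_hat <- function(x_dis, x_hea) {
      mean(outer(x_dis, x_hea, ">") + 0.5 * outer(x_dis, x_hea, "=="))
    }

    # Power to detect a difference delta between two correlated AUCs, given the
    # variance v of the estimated AUC difference.
    power_auc_diff <- function(delta, v, alpha = 0.05) {
      pnorm(abs(delta) / sqrt(v) - qnorm(1 - alpha / 2))
    }

    set.seed(1)
    auc_hat(rnorm(50, 1), rnorm(50, 0))        # AUC of one simulated test
    power_auc_diff(delta = 0.05, v = 0.03^2)   # e.g. SE of difference = 0.03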
Bio:
Eunhee Kim is an Assistant Professor of Biostatistics at Brown University. Her methodological research interests include semiparametric and nonparametric methods for evaluating biomarkers and medical diagnostic tests, classification and prediction methods, and longitudinal data analysis. Her current collaborative research interests lie in cancer, maternal and child health, and women's health. She is a lead statistician of the American College of Radiology Imaging Network (ACRIN), conducting clinical research to evaluate diagnostic imaging and image-guided therapy for cancer.
Web-based Sample Size Calculator for Non-Inferiority Study
Abstract
A sample size and power calculation is essential for planning any clinical trial, and depends upon the study design, primary outcome, and detectable difference. Investigators may determine these values for traditional superiority trials by using statistical software or consulting a statistician. For simple designs, free online sample size calculators are viable alternatives. Clinical trials designed to show non-inferiority are growing in acceptance and popularity. Sample sizes for non-inferiority trials are selected to achieve the desired power to reject the null hypothesis (inferiority) when comparing the lower bound of the confidence interval for the treatment difference to a predetermined non-inferiority margin. However, traditional sample size and power calculations do not apply, and few freely accessible tools exist. In particular, no free online application for a non-inferiority study with a censored outcome has been developed so far. We present the Non-Inferiority Sample Size Calculator (NISSC), a free web-based sample size calculator for non-inferiority studies with continuous, binary, and survival outcomes. It is an open-source Web application available on GitHub (https://agrueneberg.github.io/NISSC/) and does not require downloads or installations. To validate the sample sizes calculated by the NISSC, published sample size calculations were compared with results from the NISSC, and the computations were found to be accurate. The NISSC has great potential and usability for those designing non-inferiority studies.
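For the continuous-outcome case the underlying calculation is standard; a sketch assuming a non-inferiority margin delta, a true difference of zero, common standard deviation sigma, one-sided alpha, and 1:1 allocation (the NISSC itself also covers binary and survival outcomes):

    # Per-group sample size for a non-inferiority comparison of two means,
    # H0: muT - muC <= -delta vs H1: muT - muC > -delta, true difference 0.
    n_noninf_means <- function(delta, sigma, alpha = 0.025, power = 0.9) {
      n <- 2 * sigma^2 * (qnorm(1 - alpha) + qnorm(power))^2 / delta^2
      ceiling(n)
    }

    n_noninf_means(delta = 5, sigma = 10)   # 85 per group at 90% power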
Bio:
Hwasoon Kim is a Ph.D. candidate in the Department of Biostatistics at the University of Alabama at Birmingham (UAB). She received her B.S. in statistics from Ewha Womans University, Seoul, Korea (2001) and her M.S. in mathematics with a concentration in statistics from the University of Nevada, Reno (2010). Her dissertation research focuses on sample size re-estimation in non-inferiority trials with censored outcomes. Her current research interests include clinical trials with adaptive designs, survival analysis, non-inferiority trials, net reclassification improvement, and modeling loss to follow-up in clinical trials. She collaborates with the National Spinal Cord Injury Statistical Center (NSCISC) and the Division of Gastroenterology and Hepatology at UAB. In her free time, she likes practicing yoga, kayaking, fencing, and traveling around the world.
Empowering Women with CWIS
Abstract
This poster provides information about the Caucus for Women in Statistics. I will present our objectives, as well as ways to get involved. There will be some information about the leaders of this organization. We would like to recruit you, reach out to women in statistics, remind everyone that there are still gender issues that need to be addressed, and find ways to address them. We value the many great accomplishments of women in statistics and celebrate them. We have come a long way, but we still have a long way to go. A recurring theme is that we need more women in leadership and other visible positions; we hope that the CWIS can help change the tide by being more visible and outspoken, reminding people of the gender differences that persist. Our hope is to make a difference for women, ourselves and those who come after us, in their professional lives.
Bio:
Dr. Jessica Kohlschmidt grew up in Texas and received her BS and MS from Sam Houston State University. She then went on to earn her Ph.D. in Statistics from The Ohio State University with a focus on sampling and missing data. She has enjoyed teaching at the college level and is currently a statistician for an acute myeloid leukemia research group. Her group focuses on markers and other genetic changes that influence patients' response to treatment. Dr. Kohlschmidt's current focus is survival analysis, logistic regression and techniques for using "big data".
A method to identify regional particulate matter sources and their health effects
Abstract
Determining whether different sources of particulate matter (PM) air pollution vary in toxicity is critical for the study of PM pollution. Sources of PM are not directly measured and frequently must be inferred from PM chemical constituent concentrations observed at ambient monitors. To estimate regional associations between PM sources and adverse health outcomes, it is necessary to pool estimated health effects across monitors. Pooling estimated health effects of PM sources is challenging because PM sources are frequently estimated separately for each ambient monitor and the sources that generate PM vary between communities. Currently, ad hoc approaches are applied to pool estimated health effects of PM sources across monitors, but these methods become infeasible for large, regional studies. We developed a novel approach for identifying major PM sources shared across multiple monitors that guides pooling source information, such as estimated health effects, across monitors. First, our method estimates the chemical composition of PM sources at individual monitors using a principal component analysis (PCA) approach. Then, the method extracts major PM sources using a second-level PCA applied to the chemical composition of PM sources from all monitors. The resulting database of major PM sources is used to guide pooling source information across multiple monitors. Using data from 2000-2005 for 24 communities in the northeastern US, we applied our method to estimate the first regional associations between short-term exposure to major PM sources and mortality.
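A hedged sketch of the two-level PCA idea, assuming a list of monitor-level matrices of constituent concentrations (rows = days, columns = constituents); the scaling, rotation, and number of retained components in the paper may differ:

    # Stage 1: PCA at each monitor to estimate source profiles (loadings).
    # Stage 2: PCA on the pooled profiles to extract major regional sources.
    set.seed(1)
    monitors <- replicate(24, matrix(rnorm(200 * 10), 200, 10), simplify = FALSE)

    profiles <- do.call(rbind, lapply(monitors, function(x) {
      pca <- prcomp(x, scale. = TRUE)
      t(pca$rotation[, 1:3])        # keep 3 source profiles per monitor
    }))

    regional <- prcomp(profiles, scale. = TRUE)
    summary(regional)$importance[, 1:4]   # variance explained by major sources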
Bio:
Jenna Krall is a PhD candidate in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health.
Modeling Ocean Temperature as a Function of Depth with Spatial Processes
Abstract
Interest in the ocean has grown over time; over the last three centuries ocean temperature data have been gathered at particular locations and depths around the globe, and they are currently available via the World Ocean Database (WOD). We develop a hierarchical Bayesian model to describe the relationship between ocean temperature and depth. The proposed model is useful for interpolating an entire temperature profile at a new location and for identifying the number of mixed layers in that part of the ocean. Spatial random effects are modeled through a generalization of "kernel convolution". The model is fitted to data gathered at 31 sites in the North Atlantic Ocean, with latitude between 41.5 and 47 and longitude between -27.5 and -22, in July 2012.
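A minimal sketch of the standard kernel-convolution construction that the paper generalizes; the knot grid, Gaussian kernel, and bandwidth below are all illustrative choices:

    # Process-convolution random effect: w(s) = sum_j K(s - u_j) z_j, z_j ~ N(0,1).
    set.seed(1)
    knots <- expand.grid(lon = seq(-27.5, -22, by = 1), lat = seq(41.5, 47, by = 1))
    z     <- rnorm(nrow(knots))                  # latent variables at the knots
    bw    <- 1.5                                 # kernel bandwidth (degrees)

    w <- function(lon, lat) {
      d2 <- (knots$lon - lon)^2 + (knots$lat - lat)^2
      sum(exp(-d2 / (2 * bw^2)) * z)             # kernel-weighted sum
    }

    w(-25, 44)   # spatial random effect at one site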
Bio:
Ksenia Kyzyurova is a second year Ph.D. student in the Department of Statistical Science at Duke University, where she is currently working on a project on spatial dependence between ocean temperature and depth with Professor Alan E. Gelfand. Her research focuses on modeling for environmental and ecological data, hierarchical modeling of spatio-temporal data and Bayesian statistics. Before coming to Duke, she received a BS in 2009 and a MS in 2011, both in applied mathematics and computer science, from St-Petersburg State University of Information Technologies, Mechanics and Optics in Russia.
Outlier Detection for High Dimension, Low Sample Size Data
Abstract
Despite the popularity of high dimension, low sample size data analysis, little attention has been paid to outlier detection. A main challenge is that there are not enough observations to measure the remoteness of a potential outlier. We consider three distance measures and study their high-dimensional properties regarding their abilities to identify multiple outliers when the dimension is much larger than the sample size. Using these distances, we propose an effective outlier detection algorithm that utilizes parametric bootstrap to obtain null distance values under the assumption of no outlier. A graphical diagnostic method that compares the observed one-vs-rest distances with null distances is also proposed. Both simulated and real data are used to demonstrate the performance of the proposed method in various population settings.
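A hedged sketch of a parametric-bootstrap null for one-vs-rest distances in the high dimension, low sample size setting; the paper's distance measures and algorithmic details may differ:

    # One-vs-rest Euclidean distance: each observation vs. the centroid of the rest.
    ovr_dist <- function(X) {
      sapply(seq_len(nrow(X)), function(i) {
        sqrt(sum((X[i, ] - colMeans(X[-i, , drop = FALSE]))^2))
      })
    }

    set.seed(1)
    d <- 1000; n <- 20                      # dimension >> sample size
    X <- matrix(rnorm(n * d), n, d)
    X[1, ] <- X[1, ] + 2                    # plant one outlier

    # Parametric bootstrap: simulate no-outlier data from a diagonal Gaussian fit.
    null_max <- replicate(200, {
      Xb <- matrix(rnorm(n * d, mean = colMeans(X), sd = apply(X, 2, sd)),
                   n, d, byrow = TRUE)
      max(ovr_dist(Xb))
    })

    which(ovr_dist(X) > quantile(null_max, 0.95))   # flagged observations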
Bio:
Myung Hee Lee is an Assistant Professor of Statistics at Colorado State University. She earned her PhD in Statistics from the University of North Carolina, Chapel Hill in 2007. Her research interests include statistical inference for High Dimensional Low Sample Size data, Statistical Learning, and Network Inference from High Dimensional Data.
Bayesian Estimation of Repeated Measures Models Using MCMC Methods
Abstract
We analyze the annual proportion of audit fees to assets for 1947 companies from 2002 through 2009 as functions of company size and degree of industry regulation. Bayesian models with different variance-covariance structures across the 8 years of repeated measures are used, and model fits are compared and contrasted with traditional parametric mixed models fitted using likelihood methods. Markov Chain Monte Carlo (MCMC) methods are used for the simulations as they provide estimates of the posterior distributions for model parameters which are not dependent on parametric assumptions such as normality or constant variance and are generally robust to choices of priors, most especially non-informative priors. In cases where an analytic representation of the posterior distribution is not tractable, an additional Metropolis-Hastings (MH) step is incorporated at the expense of greater computational complexity. Within the fitted model structures, we compare mean responses between large companies and small companies and between regulated and non-regulated industries, and analyze the trend in mean audit fee ratios across the eight years of observed data. Of greater importance in these analyses is any trend, before and after regulatory requirements changed in 2004, in the variability of audit-fee ratios and in the degree of dependence between adjacent years, as these are measures of changing economic volatility in company audit expenses.
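The Metropolis-Hastings step mentioned above has a generic form that is easy to sketch; a toy random-walk sampler for a single parameter, with an illustrative target and tuning constant:

    # Random-walk Metropolis-Hastings update for one parameter within a sweep.
    mh_step <- function(theta, log_post, step = 0.5) {
      prop <- theta + rnorm(1, 0, step)
      if (log(runif(1)) < log_post(prop) - log_post(theta)) prop else theta
    }

    # Toy target: a posterior proportional to N(2, 1).
    log_post <- function(th) dnorm(th, mean = 2, sd = 1, log = TRUE)

    set.seed(1)
    draws <- numeric(5000); draws[1] <- 0
    for (i in 2:5000) draws[i] <- mh_step(draws[i - 1], log_post)
    c(mean = mean(draws), sd = sd(draws))   # approximately 2 and 1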
Bio:
Yuanzhi Li is a Ph.D. student in Statistics in the Department of Mathematics and Statistics at Utah State University. She received her B.S. in statistics from Nanjing University of Posts and Telecommunications, China, and an M.S. from Michigan State University. She joined the Ph.D. program at Utah State University, working with Dr. Daniel Coster. Her research interests include Bayesian models, repeated measures, variance-covariance structures and Markov Chain Monte Carlo. She currently works on Bayesian models with different variance-covariance structures applied to audit fee data; the model fitting and estimation of behavior over time will be extended to a second dataset from the field of education.
An Evaluation of Successive Difference Replication Variance Estimation for Systematic Sampling
Abstract
Systematic sampling is often used in surveys due to its implementation simplicity and efficiency. Variance estimation for systematic samples remains problematic, however, since no direct design-based estimators are available. Fay and Train (1995) introduced the Successive Difference Replication (SDR) variance estimator in the context of variance estimation for the Current Population Survey (CPS) conducted by the US Census Bureau. In this presentation, we report on the results of an evaluation of the statistical properties of the SDR variance estimator. We compare this estimator with several alternatives often considered for systematic sampling, including those based on two-per-stratum approximations and on simple random sampling.
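Once replicate estimates are in hand, the SDR estimator itself is a one-liner; a sketch assuming theta_r holds R replicate estimates computed with Fay-Train successive-difference replicate factors and theta0 the full-sample estimate (the replicate construction itself is not shown):

    # Successive Difference Replication variance: v = (4/R) * sum (theta_r - theta0)^2
    sdr_var <- function(theta_r, theta0) {
      4 / length(theta_r) * sum((theta_r - theta0)^2)
    }

    set.seed(1)
    theta0  <- 10
    theta_r <- theta0 + rnorm(160, 0, 0.05)   # e.g. 160 CPS-style replicates
    sdr_var(theta_r, theta0)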
Bio:
Yao Li is a graduate student in the Statistics Department at Colorado State University. She is originally from China. She transferred to Colorado State in her junior year and earned her Bachelor of Science degree in Mathematics there. She is interested in survey statistics, including variance estimation methods, environmental statistics and high-dimensional data analysis.
System-wide analyses have underestimated protein abundances and the importance of transcription in mammals
Abstract
Large-scale surveys in mammalian tissue culture cells suggest that the protein expressed at the median abundance is present at 8,000–16,000 molecules per cell and that differences in mRNA expression between genes explain only 10–40% of the differences in protein levels. We find, however, that these surveys have significantly underestimated protein abundances and the relative importance of transcription. Using individual measurements for 61 housekeeping proteins to rescale whole proteome data from Schwanhausser et al. (2011), we find that the median protein detected is expressed at 170,000 molecules per cell and that our corrected protein abundance estimates show a higher correlation with mRNA abundances than do the uncorrected protein data. In addition, we estimated the impact of further errors in mRNA and protein abundances using direct experimental measurements of these errors. The resulting analysis suggests that mRNA levels explain at least 56% of the differences in protein abundance for the 4,212 genes detected by Schwanhausser et al. (2011), though because one major source of error could not be estimated, the true percent contribution should be higher. We also employed a second, independent strategy to determine the contribution of mRNA levels to protein expression. We show that the variance in translation rates directly measured by ribosome profiling is only 12% of that inferred by Schwanhausser et al. (2011), and that the measured and inferred translation rates correlate poorly (R2 = 0.13). Based on this, our second strategy suggests that mRNA levels explain ∼81% of the variance in protein levels. We also determined the percent contributions of transcription, RNA degradation, translation and protein degradation to the variance in protein abundances using both of our strategies. While the magnitudes of the two estimates vary, they both suggest that transcription plays a more important role than the earlier studies implied and translation a much smaller role. Finally, the above estimates apply only to those genes whose mRNA and protein expression was detected. Based on a detailed analysis by Hebenstreit et al. (2012), we estimate that approximately 40% of genes in a given cell within a population express no mRNA. Since there can be no translation in the absence of mRNA, we argue that differences in translation rates can play no role in determining the expression levels for the ∼40% of genes that are not expressed.
Detecting Differential Expression with Shrinkage Variance Estimate Using Public Databases of Gene Expression Data
Abstract
Identifying differentially expressed (DE) genes across various conditions or genotypes is the most typical approach to studying the regulation of gene expression. An estimate of gene-specific variance is needed in many methods for DE detection, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNA-seq). Due to a common limit in sample size, the variance estimate can be unstable. Shrinkage estimates using empirical Bayes methods have proven useful in improving the detection of DE. The most widely used empirical Bayes methods shrink gene-specific variance estimates by borrowing information across genes within the same experiment. In these methods, genes are considered exchangeable, or exchangeable conditional on expression level. We propose that, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide a better estimate of gene-specific variance and thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish a gene-specific prior on the variance of expression using existing public data, and illustrate how to shrink the variance estimate and detect differential expression. We demonstrate improvement in DE detection under our strategy compared to leading DE detection methods in resampled real expression data.
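For contrast, the widely used within-experiment shrinkage that the authors improve upon is implemented in the Bioconductor package limma; a minimal sketch on simulated data (illustrative only, not the authors' across-experiment method):

    # Empirical-Bayes moderated t-tests: gene-wise variances are shrunk toward a
    # prior estimated by pooling across genes within the same experiment.
    library(limma)

    set.seed(1)
    exprs <- matrix(rnorm(1000 * 6), 1000, 6)          # 1000 genes, 3 vs 3 samples
    exprs[1:50, 4:6] <- exprs[1:50, 4:6] + 1.5         # 50 truly DE genes
    design <- cbind(Intercept = 1, Treat = rep(0:1, each = 3))

    fit <- eBayes(lmFit(exprs, design))
    topTable(fit, coef = "Treat", number = 5)          # top-ranked DE genes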
Bio:
Nan Li is a Postdoctoral Research Associate in Department of Biostatistics at Brown University. She received her doctoral degree from Department of Statistics, North Carolina State University in 2011.
Identifying Mixture Components in Pause Events during Essay Writing
Abstract
Automated scoring systems offer a substantial reduction in cost and an increase in the speed of scoring compared to using human essay raters. Many automated measures of writing focus on the writing product; few focus on the writing process. Pause times offer one possibility for measuring the writing process. Modeling the pause times (captured by keystroke logging software) during writing could help teachers identify students with weak writing skills. In a previous analysis, researchers at Educational Testing Service (ETS) used pilot data (n = 25 students) to develop an algorithm to classify raw information into pause events according to their linguistic contexts. They found that the distribution of pause events was a mixture of lognormal distributions and hypothesized that the mixture components represented occasions on which students were attending to different cognitive processes associated with writing. The present analysis examines a larger sample of event histories (n = 1,054 students) from two different analyses, and attempts to find the number of mixture components that can be identified from the data associated with different linguistic contexts. We adopt the Bayesian hierarchical approach because it automatically includes both inferential uncertainty and population variability; we use weakly informative inverse-gamma priors for the variance parameters. Using the hierarchical model, 3 distinct mixture components were identifiable for the within-word linguistic context, but only 2 mixture components were identifiable for the between-sentence linguistic context.
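A hedged sketch of fitting a two-component lognormal mixture to pause times with the mixtools package (a Gaussian mixture fitted by EM on the log scale); the component values are illustrative, and the paper's Bayesian hierarchical fit is more involved:

    # Two-component lognormal mixture for pause durations.
    library(mixtools)

    set.seed(1)
    pause <- exp(c(rnorm(600, log(0.3), 0.4),    # short, fluent pauses
                   rnorm(400, log(2.0), 0.6)))   # long, deliberative pauses

    fit <- normalmixEM(log(pause), k = 2)
    rbind(lambda = fit$lambda, mu = fit$mu, sigma = fit$sigma)  # weights and components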
Quantile Association Regression Models
Abstract
It is often important to study the association between two continuous variables. We propose a novel regression framework for assessing conditional associations on quantiles. General methodology is developed which permits covariate effects on both the marginal quantile models for the two variables and their quantile associations. The proposed quantile copula models have straightforward interpretation, facilitating a comprehensive view of association structure which is much richer than that based on standard product moment and rank correlations. The resulting estimators are shown to be uniformly consistent and weakly convergent as a process of the quantile index. Simple variance estimators are presented which perform well in numerical studies. Extensive simulations and a real data example demonstrate the practical utility of the proposed methodology.
Bio:
I am an assistant professor in the Department of Biostatistics, University of Pittsburgh. My research revolves around quantile regression and survival analysis. In my dissertation study, I studied quantile regression methods for survival data that involve additional complications, such as left truncation and dependent censoring. Compared to standard regression methods, quantile regression methods provide more comprehensive insights into disease progression. I have also worked on association studies, the assessment of agreement, prediction accuracy, and clustered data. My collaborative research focuses on liver disease research and clinical trials.
Obstacles as Skill Building Opportunities
Abstract
Career building is about overcoming obstacles. We often face unknowns, rejections, failures, risky situations, and difficult decisions. We deal with paper rejections, unfunded grants, unplanned research detours, overscheduled calendars, conflicting work and family demands, and of course the biggest obstacle, our own attitudes. “If you look at these obstacles as a containing fence, they become your excuse for failure. If you look at them as a hurdle, each one strengthens you for the next” (Ben Carson). Indeed, if we treat obstacles as stepping stones for career success, this opens doors to a new world of opportunities. As Charles Beard wrote, “When it is dark enough, you can see the stars.” We will discuss these career struggles and hurdles, share stories and perspectives, and brainstorm ideas and strategies to convert these obstacles into stimuli for career advancement.
Bio:
Xihong Lin is Professor of Biostatistics and Coordinating Director of the Program of Quantitative Genomics at the Harvard School of Public Health (HSPH). She received her PhD degree from the Department of Biostatistics of the University of Washington in 1994. She was on the faculty of the Department of Biostatistics at the University of Michigan between 1994 and 2005 before she joined the HSPH in 2005. Her research interests lie in development and application of statistical and computational methods for analysis of high-throughput genetic, genomic and 'omics data in epidemiological, environmental and clinical sciences. Her method research is supported by the MERIT award and a P01 grant from the National Cancer Institute. She is the PI of the T32 training grant on interdisciplinary training in statistical genetics and computational biology. Dr. Lin received the 2002 Mortimer Spiegelman Award from the American Public Health Association, and the 2006 Presidents' Award from the Committee of the Presidents of Statistical Societies (COPSS). She is an elected fellow of the American Statistical Association, Institute of Mathematical Statistics, and International Statistical Institute. She was the Chair of the COPSS (2010-2012). She is currently a member of the Committee of Applied and Theoretical Statistics of the US National Academy of Science. She has served on numerous editorial boards of statistical and genetic journals. She was the former Coordinating Editor of Biometrics, and currently the co-editor of Statistics in Biosciences, and the Associate Editor of Journal of the American Statistical Association and American Journal of Human Genetics. She has served on a large number of study sections of NIH and NSF.
Adjustments to r: insights from Taylor expansion
Abstract
The likelihood ratio test and the signed likelihood root (r) are two of the most commonly used statistics for inference in parametric models, with r often viewed as the more reliable. A third-order adjusted signed likelihood root is available from likelihood theory, but the formulas and development methods are not always easily implemented or understood. Using a log-model Taylor expansion, we develop simple second-order additive adjustments to r. For a scalar linear interest parameter, the adjustment to r is easy to calculate and easy to explain. In the curved interest parameter case, an adjustment is available but requires more calculation. The theory is developed and simulations are conducted to assess the repeated-sampling accuracy.
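For reference, the signed likelihood root and the classical third-order adjusted version that this work builds on can be written as follows (in LaTeX, with ell the log-likelihood, theta-hat the maximum likelihood estimate, and q a model-dependent correction term, e.g. Barndorff-Nielsen's):

    r(\theta) = \operatorname{sign}(\hat\theta - \theta)\,\sqrt{2\{\ell(\hat\theta) - \ell(\theta)\}},
    \qquad
    r^{*}(\theta) = r(\theta) + \frac{1}{r(\theta)}\,\log\frac{q(\theta)}{r(\theta)}.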
Bio:
I am a senior Ph.D. candidate in the Department of Statistical Science at the University of Toronto, with research interests primarily in likelihood-based inference. My work focuses on developing more accurate likelihood-based inference methods. Previously I completed a BA in computer science in China (2000) and an MA in statistics at the University of Toronto (2009). In my free time I enjoy reading, hiking and photography.
Influencing: Even without Statistics!
Abstract
Influencing is how we move the world, ours and that of others. The ability to influence people is a necessity no matter where you work or the focus of your job. Getting better at it brings extraordinary rewards. In this session, we will review resources we all possess by examining two dimensions that we can all tap into as we explore our own ability to influence and lead. The first dimension is what I'll refer to as "naked leadership", the unique set of gifts that we each bring to the table which allow us to lead from where we are. In truth, we each possess a wealth of gifts to draw from and influence, even without Statistics! Second, we'll unpack the incredible power that we bring to the table as statisticians. Lastly, we'll explore common barriers that keep us from having the maximal impact derived from our unique talents, explore ways to break these barriers down, and imagine the possibilities of these approaches coming together.
Bio:
Stacy Lindborg, Ph.D. is Sr. Director, Biostatistics, at Biogen Idec, where she has been since 2012. Prior to this she was at Eli Lilly & Company for 16 years, spanning a number of roles including Senior Director, Global Statistics and Advanced Analytics, covering Global Patient Safety and molecules from Phase I-IV across the therapeutic areas Auto Immune, Cardiovascular, Critical Care, Bone/Muscle/Joint, Urogenital Health and Kidney Disease; Director, Research & Development Strategy and Decision Sciences; Global Head, Exploratory and Program Medical Statistics; Quantitative Pharmacology Board Member; Global Statistics Leader, Biomarker Research; and Global Zyprexa Product Team Schizophrenia Leader. She received her Ph.D. in Statistics from Baylor University.
Dr. Lindborg’s research interests include applications of novel design and analysis methods to solve real-world problems in the pharmaceutical industry. Specific areas of interest include Bayesian methodology, adaptive designs, missing data problems, and portfolio productivity. She has 35+ collaborative and methodological manuscripts published in: American Journal of Emergency Medicine, American Journal of Psychiatry, Archives of General Psychiatry, Biological Psychiatry, Biopharmaceutical Report, Canadian Journal of Psychiatry, Clinical Therapeutics, Clinical Trials, Drug Information Journal, Encyclopedia of Biopharmaceutical Statistics, Journal of Biopharmaceutical Statistics, Journal of Clinical Psychopharmacology, Journal of Psychiatric Research, Mathematical & Computer Modelling, Nature Reviews Drug Discovery, Neuropsychopharmacology, Pharmaceutical Statistics, Psychiatric Research, Psychopharmacology, Schizophrenia Research, Statistics in Medicine, Vertex.
Dr. Lindborg has been active in professional organizations across her career, currently serving as a Vice-chair, Council of Sections governing board of the American Statistical Association (ASA); ASA Accreditation Committee Member; DIA Bayesian Working group, Co-chair Missing Data Work Stream; and Board Member to the Strategic Initiative: Conference on Women in Statistics. She has received a number of honors across her career including a recent Development Sciences Innovation Award. She was elected American Statistical Association Fellow in 2008.
Landmark Proportional Subdistribution Hazards Models for Dynamic Prediction of Cumulative Incidence Probabilities
Abstract
A risk predictive model that can dynamically predict an individual’s cause-specific cumulative incidence probabilities is crucial to risk stratification, drug effect evaluation and treatment assignment. For data containing no competing risks, landmark Cox models have served this purpose. In this study, we extended the landmark method to the Fine-Gray proportional subdistribution hazards (PSH) model for data with competing risks. The proposed landmark PSH model is robust against violation of the PSH assumption and can directly estimate the conditional cumulative incidence probabilities at a fixed landmark point. We further developed a more comprehensive landmark PSH supermodel which enables the user to complete a dynamic prediction involving a number of landmark points in one step. Through simulations we evaluated the prediction performance of the proposed landmark PSH models by estimating the time-dependent Brier scores. The proposed models were applied to a breast cancer trial to predict the dynamic cumulative incidence probabilities of developing locoregional recurrence.
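A rough sketch of the landmarking idea using the cmprsk package's Fine-Gray fit; the toy data generation, landmark time, and covariate are illustrative, and the paper's landmark PSH supermodel does considerably more:

    # Landmark idea in miniature: restrict to subjects still event-free at
    # landmark time s, reset the clock, and fit a Fine-Gray model on that subset.
    library(cmprsk)

    set.seed(1)
    n <- 500
    x <- rbinom(n, 1, 0.5)                       # a binary covariate
    ftime <- rexp(n, rate = 0.2 + 0.1 * x)       # time to first event
    fstatus <- sample(1:2, n, replace = TRUE,    # 1 = event of interest, 2 = competing
                      prob = c(0.6, 0.4))
    cens <- runif(n, 0, 10)                      # independent censoring
    fstatus <- ifelse(ftime <= cens, fstatus, 0)
    ftime <- pmin(ftime, cens)

    s <- 1                                       # landmark time
    keep <- ftime > s
    fit <- crr(ftime[keep] - s, fstatus[keep], cov1 = cbind(x = x[keep]))
    summary(fit)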
Bio:
I received my bachelor's degree in biological sciences from Shanghai Jiao Tong University (China) in 2007 and joined the Department of Biostatistics at the University of Pittsburgh as a PhD student in 2009. My research interests focus on dynamic prediction in clinical research, experimental design, and survival and competing risks analysis.
Estimating Center Effects on Recurrent Events
Abstract
In this work, we develop methods for quantifying center effects with respect to recurrent event data. In the models of interest, center effects are assumed to act multiplicatively on the recurrent event rate function. When the number of centers is large, traditional estimation methods that treat centers as categorical variables have many parameters and are sometimes not feasible to implement, especially with large numbers of distinct recurrent event times. We propose a new estimation method for center effects which avoids including indicator variables for centers. We then show that center effects can be consistently estimated by the center-specific ratio of observed to expected cumulative numbers of events. We also consider the case where the recurrent event sequence can be stopped permanently by a terminating event. Large sample results are developed for the proposed estimators. We assess the finite-sample properties of the proposed estimators through simulation studies. The method is then applied to national hospital admissions data for end stage renal disease patients.
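A toy sketch of the observed-to-expected ratio idea, with expected counts computed from a common marginal event rate; all names and the data-generating mechanism are illustrative:

    # Observed/expected event counts by center as a multiplicative center effect.
    set.seed(1)
    dat <- data.frame(
      center = sample(paste0("C", 1:10), 2000, replace = TRUE),
      time   = rexp(2000, 1)                            # follow-up time
    )
    true_eff   <- setNames(exp(rnorm(10, 0, 0.3)), paste0("C", 1:10))
    dat$events <- rpois(2000, lambda = dat$time * true_eff[dat$center])

    expected <- dat$time * sum(dat$events) / sum(dat$time)  # marginal rate x exposure
    oe <- tapply(dat$events, dat$center, sum) / tapply(expected, dat$center, sum)
    round(sort(oe), 2)                                       # estimated center effects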
Bio:
Dr. Dandan Liu is an Assistant Professor in the Department of Biostatistics at Vanderbilt University. She completed her PhD in Biostatistics from the University of Michigan in 2010 and completed a 1-year postdoctoral fellowship at Fred Hutchinson Cancer Research Center. Her research interests are focused on event history data analysis, risk prediction and biomarker evaluation. Dr. Liu also serves as the Faculty Biostatistician for the Vanderbilt Memory & Alzheimer’s Center.
Legislative voting preference modeling accounting for bias
Abstract
Roll call data are used to make inferences about a legislator's most preferred policy, referred to as his or her ideal point. We develop a Bayesian hierarchical spatial model to account for censoring bias from the Hastert rule in the United States Congress, which restricts voting to bills supported by a majority of the majority party in an attempt to limit the power of the minority. Our model can be summarized as a latent factor model that allows the position of each legislator in policy space to vary according to whether voting on a specific bill has been censored through the Hastert rule. A zero-inflated prior allows us to assess the effect of the Hastert rule on the voting patterns of individual legislators. The model is illustrated using data from the 110th House of Representatives.
A robust coefficient of determination for heritability estimation in genetic association studies
Abstract
Heritability is key in plant studies to help achieve better yield and other agronomic traits of interest. In candidate gene studies, regression models are used to test for associations between phenotype and candidate SNPs. SNP imputation guarantees that marker information is complete and the data are balanced, so the coefficient of determination, R2, and broad-sense heritability are equivalent. However, when the normality assumption is violated, the classical R2 may be seriously affected. Recently, two R2 alternatives with good properties were proposed for the linear mixed model. We evaluate their performance under contamination, put forward a robust version of these coefficients, and assess their adequacy for heritability estimation via simulation. An example of application is also presented.
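As a point of reference for the quantities involved, broad-sense heritability can be computed from mixed-model variance components; a sketch with lme4 on simulated genotype data (the robust coefficients proposed in the paper are not shown here):

    # Broad-sense heritability H^2 = var(G) / (var(G) + var(E)) from a
    # random-genotype mixed model.
    library(lme4)

    set.seed(1)
    geno <- gl(50, 6)                                  # 50 genotypes, 6 reps each
    g    <- rnorm(50, 0, sqrt(2))[geno]                # genetic effects, variance 2
    y    <- 10 + g + rnorm(300, 0, 1)                  # residual variance 1 -> H^2 = 2/3

    fit <- lmer(y ~ (1 | geno))
    vc  <- as.data.frame(VarCorr(fit))
    vc$vcov[1] / sum(vc$vcov)                          # estimated H^2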
Bio:
Vanda M. Lourenço is an assistant professor in the Mathematics Department of the FCT - NOVA University of Lisbon. She earned her PhD in Statistics and Stochastic Processes, awarded by IST - Technical University of Lisbon, in December 2011. Her main research interest is statistical genetics, with a focus on genetic association studies of quantitative traits. She is currently PI of a project funded by Portuguese National Funds to study and develop robust methods for such studies, with application to plant genetic data.
Bayesian Methods for Inference of Sensitive Questions
Abstract
Statisticians working in behavioral research have long been interested in estimating the occurrence of sensitive behaviors in a population. In this poster session, we consider a questionnaire that asks three binary sensitive questions of each individual participating in a survey. We regard the three responses as dependent Bernoulli random variables governed by a Markov dependency model. We employ a Markov Chain Monte Carlo approach to obtain posterior probabilities for the sensitive questions and consider problems such as induced priors and Bayesian sample size determination. The construction of a Bayesian logistic regression framework is outlined.
Gerrymandering and Political Gridlock in The US
Abstract
In recent decades gerrymandering has become one of the most effective tactics for guaranteeing the outcomes of general elections. Since elected officials know that they are safe in the next election cycle, they need only appeal to one partisan interest group. This discourages congressmen from working across party lines and leads to congressional gridlock.
In this paper we discuss our approach to measuring the gerrymandering of a congressional district. Starting with the ratio of a district's area to its perimeter as a compactness measure, we analyzed which congressional districts are gerrymandered. Furthermore, we used Partisan Voting Index data to identify swing districts and districts with a one-party leaning. Comparing the mean compactness of all three groups, we concluded that swing districts are less gerrymandered than party-aligned ones. We discuss the challenges, results and future work in analyzing congressional data to understand the impact of gerrymandering.
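A minimal sketch of an area-to-perimeter compactness score of the kind described. This particular normalization, 4*pi*A/P^2, is the Polsby-Popper score, which equals 1 for a circle and approaches 0 for highly contorted shapes; whether the paper normalizes in exactly this way is an assumption:

    # Polsby-Popper-style compactness: 4*pi*area / perimeter^2, scale-free in [0, 1].
    compactness <- function(area, perimeter) 4 * pi * area / perimeter^2

    compactness(area = pi, perimeter = 2 * pi)   # circle of radius 1 -> 1
    compactness(area = 1,  perimeter = 40)       # long, thin district -> ~0.008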
Bio:
Blending both industrial and academic research, Tatsiana is an expert at solving hard business problems. She brings a background in both mathematics and statistics, and has deep experience researching and implementing models for predicting user behavior. She is currently enrolled in a graduate program at UC Santa Cruz focusing on Applied Mathematics and Statistics. Prior to this, Tatsiana earned a Master of Arts in Mathematics from San Francisco State University.
Tools for Teaching Data Science
Abstract
There is a need for new computational tools to address modern data problems, such as teaching students true data analysis, making data journalism transparent, and providing user rewards to participants in citizen science projects. Novices (people with little to no knowledge of statistics or computer science) should be able to start using a simple platform that will help scaffold them into more sophisticated analysis and computation. Such a system should encourage computational thinking, exploratory data analysis, flexible data visualization, and reproducible and publishable research. This framework was inspired by the struggle to find an appropriate computational tool for high school students to use for data analysis, which we have experienced in the NSF-funded project Mobilize. Over the years of the grant, we have iterated through a series of tools (standard R, Deducer, R within RStudio, and several artisanal dashboards), and our experience has given us some insight into what makes a tool succeed or flop in this sort of situation. Inspired by the need for a new tool, we have begun creating a new platform. Instead of asking students to learn on a toy system and make the big leap to a "real" programming language, we are attempting to scaffold the transition visually: from pieces that illustrate the processes that are taking place to the underlying code that makes it happen.
Bio:
Amelia McNamara is a 4th year Ph.D. candidate in statistics at UCLA. She is a graduate student researcher on the NSF-funded project Mobilize, which seeks to bring computational thinking and data analysis to Los Angeles Unified High School classes. Additionally, she works with the Viewpoints Research Institute to conceptualize and develop new tools for enhanced communication, specifically of statistical ideas. She completed her undergraduate degree at Macalester College in 2010.
How to Make Social Media Work for You
Abstract
Don’t rely on your CV and list of papers presented and published to speak for you. You can easily optimize your LinkedIn profile and boost your professional reputation via that social network. In this session, you’ll get LinkedIn tips, as well as do’s and don’ts for other social media, such as Twitter, Facebook and blogs. Whether you are looking for a job, working toward a promotion, hoping to connect with colleagues or wanting to share your research, you’ll walk away with ideas to make the most of social media.
Bio:
Arati Mejdal is the Global Social Media Manager for the JMP division of SAS. She helps to raise awareness of JMP statistical discovery software, to answer users’ questions and to help customers improve their JMP skills. She manages the JMP Blog and all official social media presences for JMP. She works with colleagues to help them make the most of their social media use. Arati has a PhD from the University of North Carolina at Chapel Hill. She used to be a journalist and professor of mass communication. These days, she spends a lot of time on her mobile devices.
Simultaneous Analysis of Hyperspectral Data Using the Fused LASSO
Abstract
The use of hyperspectral sensors in remote sensing of gas plumes has proven to be important for a wide variety of military and environmental applications. For example, hyperspectral imagery can be used to detect chemical warfare agents and to monitor gas plumes in the atmosphere. A hyperspectral sensor captures an image at many wavelengths across the electromagnetic spectrum, so a hyperspectral image is a cube of data. These images typically consist of thousands of pixels. Currently, algorithms for detecting, quantifying, and identifying the constituents of chemical plumes in a hyperspectral cube analyze such images one pixel at a time. However, analyzing pixels individually is inefficient because it ignores the spatial relationships among neighboring pixels. As a novel approach, we consider a variant of the least absolute shrinkage and selection operator (LASSO), called the Fused LASSO, which analyzes pixels simultaneously and allows us to borrow strength, or information, from nearby pixels. As an illustration, we apply the Fused LASSO to a hyperspectral image that contains both plume-present pixels and pixels with no plume, and, using confusion matrices and the Frobenius norm, we show that borrowing information across nearby pixels does substantially better than methods that analyze pixels individually, such as BIC and the LASSO.
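A hedged one-dimensional illustration of the fused LASSO penalty using the genlasso package; the paper fuses over a two-dimensional pixel grid, so this is only the simplest analogue:

    # Fused LASSO on a 1-D signal: penalizing |beta_i - beta_{i-1}| recovers a
    # piecewise-constant estimate, borrowing strength from neighboring positions.
    library(genlasso)

    set.seed(1)
    beta <- rep(c(0, 2, 0), times = c(40, 20, 40))    # a "plume" segment
    y    <- beta + rnorm(100, 0, 0.5)

    fit  <- fusedlasso1d(y)
    bhat <- coef(fit, lambda = 1)$beta                # estimate at one penalty value
    range(bhat[41:60]); range(bhat[c(1:40, 61:100)])  # plume vs. background levels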
Bio:
Nicole is a PhD student in the Applied Mathematics and Statistics Department at the University of California, Santa Cruz.
Distance-Weighted Models for Methylation Pattern Inheritance
Abstract
Cytosine methylation at CpG dinucleotides is a semistable epigenetic marker critical to the normal development of vertebrates. Abnormal levels of methylation are associated with a host of human diseases and disorders, and many diagnostic tools have been developed based on analysis of methylation in tissue samples. Methylation is governed by a complex set of dynamic processes and has been observed to exhibit cyclical gains and losses, leading to the development of stochastic models of its inheritance. Many such models have assumed independence between sites and have largely focused on the proportion of methylation present in a sample, ignoring the diversity that exists in individual patterns. When analyzed at single-base resolution, methylation patterns exhibit strong evidence of spatial dependence, and a recently proposed neighboring sites model which incorporates dependence between pairs of adjacent CpG sites has offered significant improvements over independent models. CpG sites are non-uniformly distributed throughout the genome, and the number of bases separating "adjacent" sites can vary greatly. In this poster, we develop and test an extension of this neighboring sites model which places a distance-dependent weight on the association between each pair of neighboring sites. Models are compared with regard to their ability to produce simulations that are statistically similar to biological data. We find that the distance-weighted model offers substantive improvements over distance-blind approaches to modeling the dependence structure, particularly in cases where firm boundaries between methylated and unmethylated regions exist in the data.
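A small simulation sketch of the distance-weighted idea: the association between adjacent CpG sites decays with the number of bases separating them. The exponential weight and its rate below are illustrative assumptions, not the paper's fitted model:

    # Simulate methylation states with neighbor dependence that weakens with distance.
    set.seed(1)
    n_sites <- 200
    gaps    <- rgeom(n_sites - 1, prob = 0.05) + 1   # bases between adjacent CpGs
    rho     <- exp(-gaps / 50)                       # distance-decaying association

    state <- integer(n_sites)
    state[1] <- rbinom(1, 1, 0.5)
    for (i in 2:n_sites) {
      p_match  <- 0.5 + 0.5 * rho[i - 1]             # near 1 when close, 0.5 when far
      state[i] <- if (rbinom(1, 1, p_match)) state[i - 1] else 1 - state[i - 1]
    }
    mean(state[-1] == state[-n_sites])               # observed neighbor concordance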
Leadership Styles
Abstract
This panel will explore different styles of successful leadership. Women inherently develop as people leaders or managers. They traditionally have primary responsibility for managing the activities and growth of the family. They are also often expected to be the caretaker of their parents as they age. Women or girls are often raised to have empathetic skills so that they can be good mothers and/or caretakers. The panel will discuss the ways the traditional female role helps make women successful leaders. Other leadership styles contrary to the traditional female role will also be discussed.
Bio:
Laura J. Meyerson, PhD, is Vice President, Biometrics, Data Management and Medical Writing (BDMW) at Biogen Idec, Cambridge Center, MA.
Hairong Crigler has more than 15 years of experience leading marketing analytics organizations and delivering strategic and innovative analytical services across multiple industries. Currently she leads the Experian Marketing Analytics group, which provides marketers with actionable customer insights and online and offline targeting and measurement tools using advanced analytical methods. Her deep analytical knowledge, extensive industry experience, and strong leadership enable her team to solve Experian’s clients' business problems with insightful and actionable analytical solutions and to deliver satisfactory financial results. She is a highly recognized marketing analytics leader and expert, and the winner of two Experian Innovation Idea of the Year awards. Hairong holds a Ph.D. in Statistics from the University of Illinois at Chicago. She taught database marketing at New York University as a guest professor and was a faculty member at the graduate school of continuing studies at Northwestern University from 2011 to 2012.
Savvy Politics for the Non-Politician
Abstract
As professionals we are instantly members of multiple communities. To be effective community members we must understand the political landscape of these communities. I for one dislike politics, but quickly understood the need to develop skills that allowed my voice to be heard and my views to be respected. If you understand the political landscape of a group, you have a stronger opportunity for a positive impact. I will speak to these issues and some of the strategies I have successfully used throughout my professional life.
Is Online Learning Gender Specific?
Abstract
The relatively new upsurge of online education promises to reshape how educators deliver knowledge. As promising as the new model is, it is still in its budding stages, and online education poses challenges that were not present in face-to-face delivery. This paper compares learning outcomes of undergraduate students at the University of New England, a top twenty medical school in the nation. The paper explores strengths and weaknesses of the two modes of delivery; differences in learning outcomes by gender, final grades, midterm and final exam grades, year in college, and major; and measures the transition over time. The study compares gender differences in learning outcomes, with a special focus on which model works best for females.
Bio:
Amita Mittal is a professor in the Department of Mathematical Sciences at the University of New England. She has extensive experience in teaching in class and online statistics courses to undergraduate students. She works on assessment and reporting of the teaching results every year within the university.
Best policy implementation when there is parameter uncertainty: a Bayesian and adaptive control approach
Abstract
We focus on improving the current methodology for estimating transmission parameters by applying a Bayesian statistical framework with a probabilistic model of disease transmission and generalizing this formulation to any disease. This method takes into account the intrinsic stochasticity of disease transmission and provides more robust parameter estimates. We then use adaptive control techniques with the updated parameters in order to obtain the best policies to minimize the cost of infection. Increasing estimation accuracy through the adoption of the Bayesian updating framework will equip policymakers with better tools for mitigating the effects of an epidemic.
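As a toy illustration of the Bayesian-updating half of this approach (the adaptive control step is omitted, and all numbers are hypothetical), the sketch below updates a Beta prior on a per-contact transmission probability as exposure data arrive, mimicking sequential learning during an outbreak.

    # Minimal sketch: sequential Beta-Binomial updating of a transmission probability.
    set.seed(7)
    p_true <- 0.15                      # hypothetical per-contact transmission prob.
    a <- 1; b <- 1                      # Beta(1, 1) prior
    for (week in 1:10) {
      exposures  <- rpois(1, 40)                      # contacts observed this week
      infections <- rbinom(1, exposures, p_true)      # stochastic transmission
      a <- a + infections                             # conjugate update
      b <- b + exposures - infections
      cat(sprintf("week %2d: posterior mean %.3f (95%% interval %.3f-%.3f)\n",
                  week, a / (a + b), qbeta(0.025, a, b), qbeta(0.975, a, b)))
    }
    # Each week's posterior would feed the adaptive-control step, which
    # chooses the policy minimizing the expected cost of infection.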
Bio:
Romarie Morales is a graduate student seeking a PhD in Applied Mathematics for the Life and Social Sciences at Arizona State University. She has a great passion for solving problems in the public health field. Romarie's goal is to make a contribution to social wellness by helping mitigate the effects of diseases. Romarie has also taken active roles in the academic community (e.g., the Society for Industrial and Applied Mathematics ASU chapter (President), the Latino Graduate Student Association (LGSA, Mentorship Co-chair), and the Association of Anthropology Graduate Students (AAGS, former Applied Math Representative)), roles that allowed her to serve as a liaison between the academic community and the students who share her interests. She has also received several fellowships and awards (e.g., the Graduate Assistance in Areas of National Need (GAANN) Fellowship, a SIAM Student Chapter Certificate of Recognition, an Alfred P. Sloan Foundation fellowship, a Bridge to the Doctorate Fellowship, the Infinitive Opportunities conference 1st place poster presentation award, and the 1st place Poster Presentation Award at the Emerging Researchers National (ERN) Conference). Romarie currently serves as a Teaching Assistant (instructor of record) for Elementary Statistics and College Algebra.
Creating Career Flexibility
Abstract
The field of statistics provides a variety of career options in the academic, government and industry sectors. All statisticians should be aware of career opportunities across the profession. Mobility across sectors may be beneficial to one’s career but will entail risks, both professional and personal.
- Common skills valued by all sectors
- Different expertise emphasized by each sector
- Potential benefits and risks to changing careers, both professional and personal
- How to remain marketable across sectors so career change is possible
- How to be successful at changing careers
- How to learn more about career flexibility
Bio:
Sally C. Morton is Professor and Chair of the Department of Biostatistics in the Graduate School of Public Health, and directs the Comparative Effectiveness Research Core at the University of Pittsburgh. She holds secondary appointments in the Clinical and Translational Science Institute, and the Department of Statistics. Previously, she was Vice President for Statistics and Epidemiology at RTI International. She spent the first part of her career at the RAND Corporation, where she was Head of the Statistics Group and held the RAND Endowed Chair in Statistics. Her research interests include the use of statistics in evidence-based medicine, particularly meta-analysis. She serves as an evidence synthesis expert for the Agency for Healthcare Research and Quality (AHRQ) RTI–University of North Carolina (UNC) Evidence-Based Practice Center (EPC), collaborates with other EPCs, and was Co-Director of the Southern California EPC. She was a member of the Institute of Medicine (IOM) committee on comparative effectiveness research prioritization, and vice chair of the IOM committee on standards for systematic reviews. Dr. Morton is a member of the National Academy of Sciences Committee on National Statistics (CNSTAT), and Chair-Elect of the Statistics Section of the American Association for the Advancement of Science (AAAS). She was the 2009 President of the American Statistical Association (ASA), is a Fellow of the ASA and of the AAAS, and is an Elected Member of the Society for Research Synthesis Methodology. She recently received the Craig Award for excellence in teaching and mentoring at the Graduate School of Public Health. She holds a Ph.D. in statistics from Stanford University.
Dr. Rachel Schutt is the new Senior Vice President of Data Science at News Corp, the publishing, news and information company, which is the publisher of The Wall Street Journal, New York Post, Times of London, The Sun, and The Australian, as well as HarperCollins and Amplify. Previously, Rachel was a statistician at Google Research and holds pending patents based on her work in the areas of social networks, large data sets, experimental design and machine learning. She is the co-author of the book "Doing Data Science," published by O'Reilly in October 2013 and based on a class she created and teaches at Columbia University. The book explores the central question "What is Data Science?" through the lens of the experiences of data practitioners in major tech companies such as Google, Microsoft and eBay, as well as NYC start-ups. She is an adjunct professor in the Department of Statistics at Columbia University and a founding member of the Education Committee for the Institute for Data Sciences and Engineering at Columbia. She earned her PhD in Statistics from Columbia University, a Master's degree in mathematics from NYU, and a Master's degree in Engineering-Economic Systems and Operations Research from Stanford University. Her undergraduate degree is in Honors Mathematics from the University of Michigan. Rachel is a frequent invited speaker at universities and conferences, often on the subject of data science and statistics education, and more recently on data and journalism. She is on a research advisory panel for the UK-based organization NESTA (National Endowment for Science, Technology and the Arts) for their research project, Skills of the Datavores.
Dr. Alyson Wilson is an Associate Professor at North Carolina State University. She is a Fellow of the American Statistical Association and an Elected Member of the International Statistical Institute. Dr. Wilson’s research interests include Bayesian methods, reliability, information integration, uncertainty quantification, and the application of statistics to problems in defense and national security. She has held positions with IDA Science and Technology Policy Institute, Iowa State University, Los Alamos National Laboratory, Cowboy Programming Resources, and National Institutes of Health. Dr. Wilson has served on numerous national panels, including the National Academy of Sciences (NAS) Committee on Mathematical Foundation of Validation, Verification, and Uncertainty Quantification (2010-2011), the NAS Committee to Review the Testing of Body Armor Materials by the U.S. Army (2009-2010), the NAS Oversight Committee for the Workshop on Industrial Methods for the Effective Test and Development of Defense Systems (2008-2009), the Sandia National Laboratories’ Predictive Engineering Science Panel (2008-2013), the NAS Panel on Methodological Improvement to the Department of Homeland Security’s Biological Agent Risk Analysis (2006-2008), and the NAS Panel on the Operational Test Design and Evaluation of the Interim Armored Vehicle (2002-2003). She was on the organizing committee for the Department of Energy Office of Science (DOE/OS) Workshop on Mathematical Issues for Petascale Data Sets (2008), and an invited participant in the Chief of Naval Operations Distinguished Fellows Workshop on Critical Infrastructure Vulnerability (2008), the DOE/OS Workshop on Mathematical Research Challenges in Optimization of Complex Systems (2006), and the DOE Simulation and Modeling for Advanced Nuclear Energy Systems Workshop (2006). In 2006, she chaired the American Statistical Association President’s Task Force on Statistics in Defense and National Security. Dr. Wilson is the winner of the Los Alamos National Laboratory (LANL) Director’s Distinguished Performance Award (2008), the LANL Star Award (2008), the DOE Defense Programs Award of Excellence (2007), and the LANL Achievement Award (2000, 2005). She is a founder and past-chair of the American Statistical Association’s Section on Statistics in Defense and National Security. She was Reviews Editor (2011-2013) for the Journal of the American Statistical Association and The American Statistician. Dr. Wilson received her Ph.D. in Statistics from Duke University, her M.S. in Statistics from Carnegie-Mellon University, and her B.A. in Mathematical Sciences from Rice University.
Gene-by-Environment Interactions: Towards A Robust Statistical Definition
Abstract
Gene-by-environment interactions are a hypothesized contributing factor to the problem of "missing heritability", and so are the focus of major research efforts. However, there is no clear bridge between the biological and the statistical formulations of gene-by-environment interaction, and due to this deficiency, no statistical standard exists to detect such interactions in a robust, scale-invariant manner. We propose a stricter, more precise definition of statistical interaction and develop a rigorous, generalized method that detects linkage between genotype and gene expression in various environments using this definition. Our method calculates each gene's posterior probability of self-linkage in each of an arbitrary number of conditions through an application of the local false discovery rate, while correcting for confounding non-local linkages via surrogate variable analysis. This method is more robust to monotonic transformation of the data due to a reformulation of the statistical definition of interaction. We validate the results obtained from this method using a specially designed diploid dataset, and extend the method to more complex problems and data.
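As a hedged sketch of one ingredient named above, the code below computes local false discovery rates from per-gene interaction z-statistics using Efron's locfdr package; the z-statistics are simulated, and this is only the generic local-fdr step, not the full method (which also computes posterior probabilities of self-linkage across conditions and applies surrogate variable analysis).

    # Minimal sketch: local false discovery rate on simulated interaction z-scores.
    library(locfdr)
    set.seed(3)
    z <- c(rnorm(4750), rnorm(250, mean = 3))   # most genes null, a few interacting
    fit <- locfdr(z)                            # empirical-null local fdr estimate
    post_prob <- 1 - fit$fdr                    # approx. posterior prob. of signal
    hits <- which(fit$fdr < 0.2)                # genes with low local fdr
    length(hits)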
Computing Intrinsic Mean on Dihedral Angles using Gradient Descent Algorithm
Abstract
Advances in technology have not only brought massive amounts of information but have also reshaped how modern sciences are practiced. Statistics is an applied science with great impact across a vast range of other sciences. The Gradient Descent Algorithm (GDA) plays a key role in some statistical problems. In particular, among many algorithms, it is a simple tool for deriving an optimal quantity in an optimization problem posed in a linear space. However, many optimization procedures in non-linear statistics, which deals with the geometrical aspects of objects (e.g., predicting protein structure with dihedral angles), are problems posed in a non-linear space. Hence, one should pay close attention when implementing such optimization algorithms in this newer field of statistics, which, from a theoretical point of view, has topological properties different from those of standard statistics. Since dihedral angles are variables on a non-Euclidean space (in particular, they lie on the torus), direct implementation of this statistical tool is not expected to work well in this setting. In this article, a procedure for using gradient descent to find the intrinsic mean of a set of dihedral angles is presented. In both a simulation study and a real data set, we take account of the topological structure of the space of dihedral angles.
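A minimal R sketch of the core idea, under simplifying assumptions: for angles on the circle (each coordinate of the torus can be treated the same way under the product metric), the intrinsic (Fréchet) mean minimizes the sum of squared geodesic distances, and a gradient-descent step moves the estimate along the average of the wrapped differences. The data, step size, and stopping rule are all illustrative.

    # Minimal sketch: intrinsic mean of angles via gradient descent on the circle.
    wrap <- function(x) atan2(sin(x), cos(x))   # wrap into (-pi, pi]
    intrinsic_mean <- function(theta, step = 1, tol = 1e-8, maxit = 1000) {
      mu <- theta[1]                            # initialize at a data point
      for (it in seq_len(maxit)) {
        grad <- mean(wrap(theta - mu))          # average geodesic residual
        mu <- wrap(mu + step * grad)            # gradient-descent update
        if (abs(grad) < tol) break
      }
      mu
    }
    set.seed(2)
    angles <- wrap(rnorm(100, mean = 2.8, sd = 0.4))  # data near the "seam" at pi
    intrinsic_mean(angles)        # intrinsic mean handles wrap-around correctly
    wrap(mean(angles))            # naive Euclidean mean can be badly misled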
Bio:
Anahita Nodehi and Mousa Golalizzadeh, Department of Statistics, School of Mathematical Sciences, Tarbiat Modares University, Anahita6677@yahoo.com
Priority Assignment Under Convex Cost Functions
Abstract
Consider a queueing system with two different types of customers. Each customer incurs a cost depending on its type and its waiting time in the queue. The case of linear cost functions has been studied extensively; here we consider what the best static policy should be under nonlinear cost functions, and how the best static policy performs compared to the generalized $c\mu$ rule, which has been proved asymptotically optimal under heavy traffic with convex cost functions. We then give sufficient conditions for when a priority policy performs better than a non-priority policy, and vice versa. Finally, we compare the costs of the static policies and of the generalized $c\mu$ rule by simulation, and conclude that the cost of the best static policy does not differ significantly from the cost of the generalized $c\mu$ rule.
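To make the comparison of static policies concrete, here is a hedged simulation sketch of a two-class single-server queue with convex (quadratic) waiting costs, contrasting a static priority policy with first-come-first-served; the arrival rates, service rates, and cost coefficients are hypothetical, and the generalized $c\mu$ rule itself is not implemented.

    # Minimal sketch: two-class queue, static priority vs. FCFS, quadratic costs.
    sim_queue <- function(priority, T_end = 5e3, lam = c(0.3, 0.3), mu = c(1, 1),
                          cost = function(w, cls) c(2, 1)[cls] * w^2) {
      arr <- lapply(1:2, function(k) {
        t <- cumsum(rexp(ceiling(3 * lam[k] * T_end), lam[k])); t[t < T_end]
      })
      jobs <- data.frame(t = c(arr[[1]], arr[[2]]),
                         cls = rep(1:2, sapply(arr, length)))
      jobs <- jobs[order(jobs$t), ]
      jobs$svc <- rexp(nrow(jobs), mu[jobs$cls])
      n <- nrow(jobs); wait <- numeric(n); done <- logical(n); t_now <- 0
      for (i in seq_len(n)) {                    # serve one job per iteration
        q <- which(!done & jobs$t <= t_now)      # jobs waiting now
        if (!length(q)) { q <- which(!done)[1]; t_now <- jobs$t[q] }  # idle period
        if (priority) {
          j <- q[order(jobs$cls[q], jobs$t[q])][1]   # class 1 always served first
        } else {
          j <- q[order(jobs$t[q])][1]                # first-come, first-served
        }
        wait[j] <- t_now - jobs$t[j]
        t_now <- t_now + jobs$svc[j]
        done[j] <- TRUE
      }
      mean(cost(wait, jobs$cls))                 # average waiting cost per job
    }
    set.seed(11)
    c(priority = sim_queue(TRUE), fcfs = sim_queue(FALSE))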
Editorial Roles: From Peer Reviewer to the Editor-in-Chief
Abstract
An invitation to participate in the editorial process for a peer-reviewed journal is recognition of one's expertise in the field. In this panel, we will provide an overview of the editorial process that will include information about the ins-and-outs of several editorial roles -- from the peer reviewer to the Editor-in-Chief. We will discuss strategies for becoming effectively involved in editorial work and discuss its benefits and challenges. Though the panel will emphasize editorial work for statistical journals, the insights should be more broadly applicable and some attention will be given to editorial work for non-statistical journals.
Bio:
Karen Kafadar is Rudy Professor of Statistics in the College of Arts and Sciences at Indiana University, Bloomington. She received her B.S. in Mathematics and M.S. in Statistics at Stanford University, and her Ph.D. in Statistics from Princeton University. Prior to joining the Statistics department in 2007, she was Mathematical Statistician at National Institute of Standards and Technology, Member of the Technical Staff at Hewlett Packard's RF/Microwave R&D Department, Fellow in the Division of Cancer Prevention at National Cancer Institute, and Professor and Chancellor's Scholar at University of Colorado-Denver. Her research focuses on robust methods, exploratory data analysis, characterization of uncertainty in the physical, chemical, biological, and engineering sciences, and methodology for the analysis of screening trials, with awards from CDC, American Statistical Association (ASA), and American Society for Quality. She was Editor for Journal of the American Statistical Association's Review Section and for Technometrics, and is currently Biology & Genetics Editor for The Annals of Applied Statistics. She has served on several NAS committees and is a past or present member on the governing boards for ASA, Institute of Mathematical Statistics, International Statistical Institute, and National Institute for Statistical Sciences. She is a Fellow of ASA, AAAS, and the International Statistical Institute (ISI), has authored over 100 journal articles and book chapters, and has advised numerous M.S. and Ph.D. students.
Susan Paddock is a senior statistician at the RAND Corporation. Her research includes developing innovative statistical methods, with a focus on Bayesian methods, hierarchical (multilevel) modeling, longitudinal data analysis, and missing data techniques. She is the principal investigator of a project sponsored by the Agency for Healthcare Research and Quality (AHRQ) to improve the science of public reporting of healthcare provider performance. Paddock is the principal investigator of a project sponsored by the National Institute on Alcohol Abuse and Alcoholism to develop methods for analyzing data arising from studies of group therapy–based interventions. She previously led a study sponsored by AHRQ to investigate statistical methods for the analysis of longitudinal quality of care data for non-ignorable missing data. Paddock's substantive research interests include health services research, substance abuse treatment, drug policy, mental health, quality of health care, and health care provider performance assessment. She is the co-principal investigator of a project to conduct analyses related to the Medicare Advantage Plan Ratings for Quality Bonus Payments. She was the project statistician for the external program evaluation of the quality of care provided by the Veterans Health Administration to patients with mental health conditions. Paddock has been involved with several evaluations of group cognitive behavioral therapy–based interventions for treating co-occurring depression in substance abuse treatment clients. She has previously been involved with research on developing, monitoring, and refining a prospective payment system for inpatient rehabilitation care for Medicare beneficiaries. Paddock has served on editorial boards for the Annals of Applied Statistics and Medical Care. She received her Ph.D. in statistics from Duke University.
Professor Stangl obtained her Ph.D. in statistics from Carnegie Mellon University. She has been with Duke University since 1992, during which time she has given dozens of talks and short courses promoting the use of Bayesian methods in health-related fields. She won ASA’s Youden Award for her research on multi-center clinical trials and awards for outstanding teaching. She has served on panels for NIH, NSF, and NAS. She has chaired the ASA Bayesian Statistical Science Section and served as an editor for JASA, TAS, Bayesian Analysis, and Chance. She has co-edited two books, Bayesian Biostatistics and Meta-Analysis in Medicine and Health Policy. She is a Fellow of the ASA and currently serves as Chair of the ASA Committee on Women in Statistics and as Associate Chair for the Dept. of Statistical Science at Duke University.
Imputation of multivariate continuous data with nonignorable missingness
Abstract
Regular imputation methods have been used to deal with non-response in several types of survey data. However, in some of these studies, the assumption of missing at random is not valid, since the probability of missingness depends on the response variable. We propose an imputation method for multivariate data sets when there is non-ignorable missingness. A Dirichlet process mixture of multivariate normals is fit to the observed data under a Bayesian framework to provide flexibility. We provide some guidelines on how to alter the estimated distribution using the posterior samples of the mixture model and obtain imputed data under different scenarios. Lastly, we apply the method to a real data set.
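The sketch below illustrates the general idea of altering an estimated distribution before imputing, using a much simpler stand-in than this work's Dirichlet process mixture: a univariate normal model whose mean is shifted by a user-chosen nonignorability parameter delta (the standard delta-adjustment sensitivity device). Everything here, including delta, is hypothetical.

    # Minimal sketch: imputation under nonignorable missingness via delta adjustment.
    set.seed(5)
    n <- 500
    y <- rnorm(n, mean = 10, sd = 2)
    miss <- rbinom(n, 1, plogis(-8 + 0.8 * y)) == 1   # larger y more likely missing
    y_obs <- ifelse(miss, NA, y)
    mu_hat <- mean(y_obs, na.rm = TRUE)               # model fit to observed data
    sd_hat <- sd(y_obs, na.rm = TRUE)
    delta <- 1.5                                      # hypothetical MNAR shift
    imp0 <- y_obs; imp0[miss] <- rnorm(sum(miss), mu_hat, sd_hat)          # ignorable
    imp1 <- y_obs; imp1[miss] <- rnorm(sum(miss), mu_hat + delta, sd_hat)  # adjusted
    c(truth = mean(y), ignorable = mean(imp0), delta_adjusted = mean(imp1))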
Bio:
Thais Paiva is a PhD candidate in Statistical Science at Duke University working with Prof. Jerry Reiter on methods for dealing with confidential and missing data. She received a Bachelor's degree in Actuarial Science and a Master's degree in Statistical Science from the Federal University of Minas Gerais, in Belo Horizonte, Brazil. Some of her recent projects are simulation of synthetic spatial locations for confidential data sets, imputation of multivariate continuous variables with nonignorable missingness, and adaptive design models for survey data. Her research interests include Bayesian modeling, imputation methods and spatial statistics.
Professional Societies: Symbiotic Benefits
Abstract
Networking is key to career success, and there is no better access to networks than participating in a professional society. By attending, participating, or speaking at professional meetings you increase your visibility and are provided with ample opportunities for meeting like-minded people. Via section and committee meetings, local chapters and outreach groups, and other society groups, opportunities for networking abound. By participating in professional societies, you broaden your view, not only in your own area of expertise, but also in other specialties through workshops, conference sessions, access to resources, experts, and potential collaborators. Professional societies also provide opportunities to develop leadership and organizational skills. Perhaps most importantly, they help you give back to your profession and community. These and other topics describing the symbiotic benefits of membership in professional societies will be discussed.
Bio:
J. Lynn Palmer, Ph.D., is the Director of Programs at the American Statistical Association (ASA). Before that she was a tenured Associate Professor at The University of Texas M.D. Anderson Cancer Center for many years. She began her career at The University of Texas Medical Branch at Galveston. Lynn has served in several leadership positions at ASA, including serving on its Board of Directors, Chair of the Committee on Nominations, Chair of the Council of Chapters, Chair of the Committee on Career Development, Chair of the JCGS Management Committee, and President of the Houston Area Chapter. Lynn is a fellow of the ASA, a fellow of the Royal Statistical Society, and an elected member of the International Statistical Institute. She was also President of the Caucus for Women in Statistics and has served on the ASA Committee on Women in Statistics.
Stephanie Shipp, Ph.D., is the deputy director and research professor at the Social and Decision Analytics Laboratory (SDAL) at the Virginia Bioinformatics Institute at Virginia Tech. The goal of SDAL is to advance the development of statistical methodology and tools for using big data to address social science policy questions quantitatively. Before that, her career spanned statistical programs at the BLS and Census Bureau, and innovation programs at the National Institute of Standards and Technology. She was also a senior researcher at the IDA Science and Technology Policy Institute. Stephanie is a fellow of the American Statistical Association (ASA) and has held several leadership positions within ASA, including President of the Caucus for Women in Statistics and Chair of the Committee on Women in Statistics.
Jill Montaquila, Ph.D., is a senior statistician and Associate Director of the Statistics Group at Westat, one of the leading research and statistical survey organizations in the U.S. She is also a research associate professor in the Joint Program in Survey Methodology (JPSM) at the University of Maryland and the University of Michigan. Jill has been with Westat for 19 years, and with JPSM for 15 years. Prior to joining Westat, Jill was a mathematical statistician at the Bureau of Labor Statistics. She is a Fellow of the American Statistical Association and has served in many roles in the ASA, including President of the Washington Statistical Society and Chair of the Survey Research Methods and Government Statistics Sections. Jill has also had the opportunity to serve in the Caucus for Women in Statistics for nearly two decades.
Post PhD: What to Expect in Your First Year?
Abstract
As a student or recent graduate, knowing what types of jobs to apply for and what to expect in your first year is difficult and, for some, full of confusion. This session will describe some of the many varied career options available to recent graduates in statistics and biostatistics. The speakers will discuss their transition from student life into a work environment, aspects of their position they found difficult or surprising, aspects of their position they enjoy, how they feel their PhD experience did or did not prepare them for the work expected of them, what they see for themselves in the future, and advice for current students and/or job seekers.
Bio:
Dr. Layla Parast is an Associate Statistician at the RAND Corporation. She received her PhD in Biostatistics in 2012 from Harvard University and her MS in Statistics from Stanford University. Dr. Parast's research interests include survival analysis, risk prediction, landmark models, and evaluation of surrogate markers. Her work at RAND is focused on health and criminal policy.
Jessica Minnier is in her first year as Assistant Professor of Biostatistics in the Public Health & Preventive Medicine Department at Oregon Health & Science University in Portland, OR. She was previously a post-doctoral research fellow at Fred Hutchinson Cancer Research Center in Seattle and obtained her PhD in Biostatistics from Harvard in 2012. The past two years have both been 'first years' in academic positions. She will recount her experiences as a postdoc and new professor and finding her place as a statistician within a school of medicine. Her research interests focus on risk prediction and statistical genetics and she is currently navigating the challenges to becoming an independent researcher and integrated collaborator.
Dr. Jeng is an Assistant Professor of Statistics at North Carolina State University. Prior to joining NCSU, Dr. Jeng was a postdoctoral researcher in the Department of Biostatistics and Epidemiology at University of Pennsylvania (2009-2012). She received a Ph.D. in Statistics from Purdue University in 2009. Dr. Jeng's research interests include high-dimensional inference, multiple testing, variable selection, mixture model detection, and statistical genomics.
An Approach to Handling Multiple Experts In Multiple Imputation
Abstract
Multiple imputation is one method commonly utilized to deal with incomplete data. Imputations typically require the assignment of prior distributions to unknown model parameters. However, since there is uncertainty about what the hyperparameters of these prior distributions should be, not accounting for this uncertainty will inherently lead to over-confident inferences. We propose utilizing hyperpriors to account for this uncertainty using the rules of two-stage multiple imputation. In particular, we examine the utility of this method when the data are assumed to be multivariate normal.
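Here is a hedged toy version of the idea for a univariate normal mean (the multivariate-normal setting described above is analogous): rather than fixing the prior's hyperparameters, we draw them from a hyperprior representing disagreement among experts, and nest imputations within each hyperparameter draw, as in two-stage multiple imputation. All distributions and counts are illustrative.

    # Minimal sketch: two-stage multiple imputation with a hyperprior.
    set.seed(9)
    y <- c(rnorm(40, 5, 1), rep(NA, 10))      # 40 observed, 10 missing values
    obs <- y[!is.na(y)]
    M <- 5   # hyperparameter draws (stage 1: "expert" uncertainty)
    N <- 4   # imputations per hyperparameter draw (stage 2)
    imps <- vector("list", M * N); k <- 0
    for (m in 1:M) {
      mu0 <- rnorm(1, 5, 2)                   # hyperprior over the prior mean
      # conjugate posterior for the mean (known variance 1, prior N(mu0, 1)):
      post_mean <- (mu0 + sum(obs)) / (1 + length(obs))
      post_sd   <- sqrt(1 / (1 + length(obs)))
      for (i in 1:N) {
        k <- k + 1
        mu_draw <- rnorm(1, post_mean, post_sd)
        yk <- y; yk[is.na(y)] <- rnorm(sum(is.na(y)), mu_draw, 1)
        imps[[k]] <- yk                       # one completed data set
      }
    }
    sapply(imps, mean)[1:5]                   # analyze each completed data set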
A novel dyadic-dependence model to analyze higher order structuring in physician networks
Abstract
Professional physician networks can potentially influence clinical practices and quality of care. Currently used models for physician networks assume independence of relationships between different pairs of actors (dyads) conditional on actor-specific effects. However, estimates of dyadic-level effects (e.g., reciprocity) and individual-level effects (e.g., same gender, same residency location) may be biased if effects involving three or more actors are not accounted for in the analysis of the network. To address this deficiency, we develop a new model that accounts for inter-dyad dependence involving multiple (≥ 3) actors, thereby allowing for effects such as “transitivity”, and that is feasible to implement on both directed and undirected networks. The new methodology is motivated by two real-life physician networks: a directed physician influential-conversation network (N=33) and an undirected physician network obtained from patient visit data (N=135). In both cases we find extensive evidence of triadic dependence that, if not accounted for, confounds the effects of reciprocity and nodal attributes (spatial proximity, complementary expertise). The results of our analysis suggest alternative conclusions to those from incumbent models.
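For readers who want to experiment with triadic dependence in network models, here is a small sketch using the ergm package: an exponential random graph model with a transitivity-type term (gwesp) is a standard, off-the-shelf way to let effects involving three or more actors into a model. It is offered only as a familiar point of reference, not as the new model proposed in this work, and the simulated network is hypothetical.

    # Minimal sketch: an ERGM with a triadic (transitivity) term via statnet's ergm.
    library(ergm)
    set.seed(4)
    adj <- matrix(rbinom(30 * 30, 1, 0.1), 30, 30)    # toy undirected network
    adj[lower.tri(adj)] <- t(adj)[lower.tri(adj)]; diag(adj) <- 0
    net <- network::network(adj, directed = FALSE)
    # edges = density; gwesp = geometrically weighted edgewise shared partners,
    # a commonly used transitivity term that captures triadic dependence.
    fit <- ergm(net ~ edges + gwesp(0.5, fixed = TRUE))
    summary(fit)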
Bio:
Dr. Paul is a Research Assistant Professor in the Nell Hodgson Woodruff School of Nursing at Emory University. She received her PhD in Statistics from Purdue University in December 2009 and postdoctoral training (2010-2012) in the Department of Health Care Policy at Harvard Medical School. Her research interests include developing statistical methods for relational, temporal, and spatially correlated data utilizing Bayesian computational methods, with targeted applications in social/contact networks, biology, and public health.
Applications of Statistical Learning: Segmentation and Classification
Abstract
Statistical learning algorithms are found in many applications. Some examples include recommendation engines, handwriting recognition, targeted marketing, and text classification. This poster will demonstrate a type of statistical learning using the multivariate statistical techniques of segmentation and classification. Segmentation analysis is used to make sense of large amounts of data by iteratively placing the data into smaller, more manageable “segments.” Once segmented, classification analysis is used to describe the separation between segments, allowing for future assignment of new observations into a segment. This poster will review the segmentation techniques of hierarchical and K-means clustering using R, discuss their benefits and challenges, and build a classification model for future assignment of new observations.
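Since the poster works in R, here is a minimal sketch of the same pipeline on a built-in data set: segment observations with hierarchical and K-means clustering, then fit a classification model that assigns new observations to a segment. The choice of the iris data, three segments, and linear discriminant analysis are illustrative, not the poster's specific application.

    # Minimal sketch: segmentation (hclust, kmeans) then classification (lda).
    library(MASS)                               # for lda()
    set.seed(10)
    X <- scale(iris[, 1:4])                     # standardize before clustering
    # Segmentation, two ways:
    hc <- hclust(dist(X), method = "ward.D2")   # hierarchical clustering
    seg_h <- cutree(hc, k = 3)
    km <- kmeans(X, centers = 3, nstart = 25)   # K-means clustering
    seg_k <- km$cluster
    table(seg_h, seg_k)                         # compare the two segmentations
    # Classification: describe the separation and score new observations.
    fit <- lda(x = X, grouping = factor(seg_k))
    new_obs <- X[sample(nrow(X), 5), ]          # stand-ins for "new" data
    predict(fit, new_obs)$class                 # assign each to a segment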
Bio:
Rachel Poulsen has a background in Math, Communications, and Statistics. She enjoyed the problem-solving aspect of math but still wanted to interact with people in her career. This led her to a graduate degree in Statistics and then to Silicon Valley 5 years ago. She has worked for companies like American Express, TiVo, and now Silicon Valley Data Science (SVDS) -- a consulting company for Data Science. At SVDS she gets to interact with a variety of people and help in solving a variety of data-related problems.
Spatial scale and the health risks associated with long-term exposure to coarse particulate matter
Abstract
Studies which investigate the association between human health and long-term exposure to air pollution typically model the spatial structure in the health outcome using standard spatial regression methods such as kriging, the inclusion of spatial random effects or penalized splines. The goal of these methods is to reduce bias from spatial confounding, which occurs when we cannot distinguish the effect of air pollution from residual spatial variation in the health outcome due to unmeasured spatially varying confounders. These methods, however, will only reduce this bias if the spatial variability in the unconfounded part of the exposure is at a smaller scale than that of the unmeasured confounders. Thus, it is important to consider the spatial scale of both the exposure and the confounding variables.
Using health data from Medicare billing claims and pollution data from the EPA monitoring network, we propose to estimate the health effects associated with long-term exposure to coarse thoracic particulate matter using a two-stage approach that considers the spatial scales involved. First, we model the spatial trend in the pollution data using nonparametric methods, specifically a "bias corrected and estimated" generalized cross-validation criterion for selecting the bandwidth. Second, we fit a health model to determine the effect of pollution, which includes a kernel density to account for the spatial structure in the health data. The degrees of freedom for this smooth function are provided by the bandwidth matrix from the first stage, thus allowing the spatial variability of the air pollution data to capture the spatial variability of the unmeasured confounding variables that would affect the health model.
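A hedged sketch of the two-stage shape of such an analysis, using the mgcv package on wholly simulated data (the bandwidth-selection criterion and kernel details described above are not reproduced): first smooth the exposure over space, then fit a health model with a spatial smooth whose flexibility is tied to the first stage.

    # Minimal sketch: two-stage spatial analysis with simulated data and mgcv.
    library(mgcv)
    set.seed(8)
    n <- 400
    d <- data.frame(lon = runif(n), lat = runif(n))
    d$pm     <- sin(3 * d$lon) + cos(3 * d$lat) + rnorm(n, sd = 0.3)  # "pollution"
    d$deaths <- rpois(n, exp(0.1 + 0.3 * d$pm + 0.5 * d$lon))         # "health"
    # Stage 1: estimate the spatial trend in the exposure.
    stage1 <- gam(pm ~ s(lon, lat), data = d)
    df1 <- sum(stage1$edf)                       # effective degrees of freedom
    # Stage 2: health model with a spatial smooth whose flexibility matches
    # stage 1, so residual spatial confounding at coarser scales is absorbed.
    stage2 <- gam(deaths ~ pm + s(lon, lat, k = ceiling(df1) + 1),
                  family = poisson, data = d)
    summary(stage2)$p.table["pm", ]              # pollution effect estimate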
Bio:
I was awarded my PhD in June 2012 from the University of Glasgow, Scotland, and in September of that same year I moved to Baltimore (despite numerous conversations about The Wire) to take up a postdoctoral position in the Biostatistics department of the Johns Hopkins Bloomberg School of Public Health.
Until I came to the US I had never had a discussion about the possibility, and even the reality, that women were under-represented, under-valued, under-everything in the STEM fields. I don't even think I had ever heard the term STEM. I can honestly say that I thought very little about it, as I truly believed we were doing fine, that the battle had been won. I concede that I was possibly just a little bit naive about all of this. Since being in the US I have become passionate about the role of women in STEM fields; after all, I am one myself. I read as much as I can about what is happening at other institutions and in industry so that I can have an educated voice on this matter. I am a strong woman and I fully believe in my current abilities and my future potential, but others might not feel quite so empowered. I want to be a part of any event that is working towards creating a community for women where we can learn from each other and feed off the strength and experiences of others. I have strength to spare and I definitely want to share it with women in my field. This is why I know I need to be a part of this first-ever Women in Statistics conference.
Taking on Leadership While Keeping Research/Technical Skills Vibrant
Abstract
With many responsibilities at home and work, statisticians must overcome great obstacles to keep their knowledge and skills current and to continue being productive in research. The hurdle is even higher for those who take on administrative leadership positions at their organizations. Both the why and the how behind the need to keep research/technical skills vibrant will be elaborated. Experiences of the panelists and other well-established statisticians holding leadership positions in government, industry, and academia will be shared. Despite differences in career path and job title, common traits spanning all sectors that enable keeping a sharp scientific mind and remaining research-productive will be identified.
Bio:
Dr. Yili L. Pritchett is a Senior Director at Astellas Global Pharma Development, Inc. In this position, she is responsible for statistical aspects of drug development in the Infectious Disease, Immunology, Transplantation, Neuroscience, and Pain therapeutic areas. Previously, Yili held positions of Research Fellow and Director, Clinical Statistics, in Global Pharmaceutical R&D at Abbott Laboratories, and Research Advisor at Eli Lilly and Company prior to joining Astellas in early 2013. Dr. Pritchett obtained her Ph.D. in Statistics from the University of Wisconsin at Madison in 1994, and has been an ASA Fellow since 2013. Dr. Pritchett has been instrumental in using innovative statistical approaches to support drug product development. She has directly influenced and significantly contributed to marketing authorization for a number of drug products, and championed the use of adaptive designs in testing and advancing pipeline compounds. Dr. Pritchett has received multiple industry awards for her statistical innovations and leadership, including 4 President Awards at Abbott and Lilly, which recognized her scientific contributions to drug development. Dr. Pritchett has authored or co-authored 66 manuscripts or book chapters and made 37 oral presentations at statistics or drug development conferences, 18 of them invited. Currently, she is the President of the North Illinois Chapter of the ASA, and a co-chair of the Confirmatory Sample Size Re-estimation Sub-stream under the DIA Adaptive Designs Working Group. Dr. Pritchett co-chaired the program committee for the DIA/FDA 2012 workshop and was the clinical program chairperson for the 2012 Midwest Biopharmaceutical Statistics Workshop.
Generalized Linear Mixed Model with Random Effects from Dirichlet Process
Abstract
Nowadays, generalized linear mixed models are widely used for data analysis in various sciences. In such models, a parametric approach is often adopted, and accordingly a normal distribution is used for the random effects. Although this assumption leads to simple calculations, it can be inappropriate. To alleviate this obstacle, this paper adopts a nonparametric Bayesian framework that relies on the Dirichlet process; hence we consider a flexible class of distributions for the random effects. The methodology is then extended to longitudinal and spatial data. Inference is implemented via the Bayesian approach, in which the posterior distribution of the model parameters is sampled using Markov chain Monte Carlo methods such as the blocked Gibbs sampling algorithm. Finally, the methodology is illustrated and assessed with a simulation example and two applied examples, related to the standard penetration resistance of soil in Chabahar and to the number of economically active people per province during the years 1384 to 1389.
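To give a concrete feel for a Dirichlet process prior on random effects, here is a small R sketch that draws one random-effects distribution from a truncated stick-breaking representation of a DP with a normal base measure, then samples random effects from it. The truncation level, concentration parameter, and base measure are illustrative, and this is not the paper's blocked Gibbs sampler.

    # Minimal sketch: random effects drawn from a truncated stick-breaking DP.
    set.seed(6)
    K <- 50                                     # truncation level
    alpha <- 1                                  # DP concentration parameter
    v <- rbeta(K, 1, alpha)                     # stick-breaking proportions
    w <- v * cumprod(c(1, 1 - v[-K]))           # mixture weights (sum to ~1)
    atoms <- rnorm(K, mean = 0, sd = 2)         # draws from the N(0, 4) base measure
    # Random effects for 200 subjects: discrete draws from the DP realization,
    # which naturally induces clustering (ties) among subjects.
    b <- sample(atoms, 200, replace = TRUE, prob = w)
    table(round(b, 3))[1:5]                     # repeated values = shared clusters
    hist(b, breaks = 30, main = "Random effects from one DP realization")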
A Unified Approach for Population-based Multiple Phenotype-Genotype Analysis
Abstract
Identification of genetic variants contributing to the risk of common but complex diseases (like diabetes, hypertension, and cancer) is important for the development of effective means of diagnosis, cure, and prevention. Keeping the complex nature of these diseases in mind, several mutually correlated phenotypes are often measured as risk factors for the disease, one or more of which may be associated with common genetic variants. Instead of analyzing individual phenotypes separately, joint modeling of multiple disease-related correlated traits may improve the power to detect association between a genetic variant and the disease. Currently only a few statistical approaches are equipped to jointly analyze multiple phenotypes. Standard MANOVA is very powerful when a genetic variant is associated with only a subset of the traits, or when the effects of the causal variant run counter to the direction of correlation among the traits. We have proposed a powerful weighted approach that maximizes power by adaptively using the data to optimally combine MANOVA and a test that ignores the correlation between the traits. We have studied and compared some popular existing methods for multiple phenotype analysis in unrelated individuals using simulated data on a single common variant. We have shown, both theoretically and using simulations, that MANOVA loses significant power when the variant is associated with all the traits and the effect direction is the same as the direction of strong dependence between the traits. We then illustrate the performance of our proposed unified test compared to MANOVA and other current methods in situations that commonly arise in such studies. Analysis using our multivariate approach might enable us to identify the possibility of pleiotropy, and can potentially improve our understanding of how biochemical pathways relate to complex disorders.
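For orientation, here is a hedged sketch of the standard MANOVA baseline discussed above, applied to simulated data: two correlated phenotypes regressed jointly on the genotype at a single variant. It illustrates the test being combined and compared, not the authors' proposed weighted test; effect sizes, allele frequency, and trait correlation are made up.

    # Minimal sketch: MANOVA of two correlated phenotypes on a single variant.
    library(MASS)                                    # for mvrnorm()
    set.seed(12)
    n <- 1000
    g <- rbinom(n, 2, 0.3)                           # genotype: 0/1/2 copies
    Sigma <- matrix(c(1, 0.6, 0.6, 1), 2)            # correlated trait errors
    e <- mvrnorm(n, c(0, 0), Sigma)
    y1 <- 0.15 * g + e[, 1]                          # variant affects both traits
    y2 <- 0.10 * g + e[, 2]
    fit <- manova(cbind(y1, y2) ~ g)
    summary(fit, test = "Pillai")                    # joint multivariate test
    summary(lm(y1 ~ g))$coefficients["g", ]          # single-trait test, for contrast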
Bio:
Debashree Ray is a final-year PhD student in the Division of Biostatistics at the University of Minnesota, Twin Cities. She completed a B.Sc. in Statistics from the University of Calcutta in 2008, and an M.Stat with specialization in Biostatistics from the Indian Statistical Institute, Kolkata in 2010. Her primary research interests include developing novel statistical methods for identifying genetic factors that influence common human diseases and traits such as cancer, diabetes, and blood pressure. Her dissertation, under the guidance of Dr. Saonli Basu, is aimed at investigating several alternative hypotheses to explain the genetic risk of diseases attributable to genetic variants influencing such diseases. Her first work focused on a novel Bayesian modeling of multi-SNP effects in unrelated individuals. Her second and current work is about a novel multivariate test using multiple correlated phenotypes in unrelated individuals. She also does collaborative research with Dr. Mariza de Andrade from Mayo Clinic and Dr. Julia M.P. Soler from the University of Sao Paulo, Brazil. During summer 2013, she did an internship at Mayo Clinic, Rochester.
Formal and Informal Mentoring
Abstract
Mentoring is important at every stage of life, both in our personal and our professional lives. This panel will focus on how our academic and professional journeys are shaped by good mentoring, whether formal or informal.
With collective experience in government, industry and academia, the panelists will describe their key mentoring experiences and how those experiences have helped to advance their careers or the careers of others.
The panelists will discuss different types of mentoring relationships, including that of an “executive sponsor” (a personal advocate). They will also address finding mentors, expectations of a mentor-protege relationship, and mentoring opportunities outside of the workplace, e.g. through professional societies.
Bio:
Bonnie Ray is Director, Cognitive Algorithms, in the newly formed Cognitive Computing organization at IBM’s T. J. Watson Research Lab. Her areas of expertise are decision analysis and statistics/machine learning, with a particular focus on the use of statistics and optimization for decision making. Since joining IBM, she has played key roles in analytics for sales target allocation, acquisition risk management, customer targeting for IBM’s outsourcing businesses, resource demand forecasting methods for workforce management processes, and methods and tools for automated risk assessment in software development. Dr. Ray is a Fellow of the American Statistical Association, holds more than eleven patents, and has over sixty refereed publications. Prior to joining IBM, she was a tenured faculty member in Mathematics at the New Jersey Institute of Technology and a post-doctoral researcher at the Naval Postgraduate School. She holds a Ph.D. in Statistics from Columbia University and a B.S. in Mathematics from Baylor University.
Kimberly S. Weems earned a B.S. in Mathematics from Spelman College and her MA and PhD degrees from the University of Maryland, College Park. Upon graduation, she accepted a post-doctoral research position in the Statistics Department at North Carolina State University, where she later joined the faculty. She also held a visiting research position at the University of Alicante in Spain. Her research interests include measurement error models and statistics education. Weems has received numerous honors and awards, notably the Outstanding Faculty Award from the College of Sciences. She is currently co-Director of Statistics Graduate Programs at NC State.
Michelle Dunn is Program Director at the National Cancer Institute, a part of the National Institutes of Health (NIH). She co-chairs the Training subcommittee of the Big Data to Knowledge initiative, a trans-NIH program that aims to enable the biomedical research community to utilize the increasingly large and complex Big Data being generated, whether biological, biomedical, behavioral, or clinical. Dunn studied Applied Mathematics as an undergraduate at Harvard College and got a Ph.D. in Statistics from Carnegie Mellon University. She is active in statistical professional societies by participating in committees and editorial boards of statistical magazines.
Dr. Linda J. Young is Chief Mathematical Statistician and Director of Research and Development at the National Agricultural Statistics Service within the U.S.D.A. She has a Ph.D. from Oklahoma State University. Dr. Young has been a faculty member at Oklahoma State University, the University of Nebraska, and the University of Florida. She has more than 100 publications in 47 different journals, constituting a mixture of statistics and subject-matter journals, and three books. A major component of her work is collaborative with researchers in the agricultural, ecological, environmental, and health sciences. Her recent research has focused on linking disparate data sets and the subsequent analysis of these data using spatial statistical methods. Dr. Young has been the editor of the Journal of Agricultural, Biological and Environmental Statistics and is currently associate editor for Sequential Analysis. She also has a keen interest in statistics education at all levels, having worked with students and teachers from Kindergarten through High School as well as undergraduate, graduate, and post-graduate training. Dr. Young has served in a broad range of offices within the professional statistical societies, including President of the Eastern North American Region of the American Statistical Association, Vice-President of the American Statistical Association, Chair of the Committee of Presidents of Statistical Societies, member of EPA's Human Studies Review Board, and member of the National Institute of Statistical Science’s Board of Directors. Dr. Young is a fellow of the American Statistical Association, a fellow of the American Association for the Advancement of Science, and an elected member of the International Statistical Institute.
"Growing your Research Program: When, Why, and How"
Abstract
We all know that it is important to grow your academic reputation, research, and Curriculum Vitae, as you move through the academic ranks, but it is not obvious or straightforward to know how or when to do so. In this session we will detail our individual career paths, decisions we made (good and bad), opportunities to which we said yes and no, and mistakes we made (and were tempted to make, but did not). We will share how the dynamics of our work environments, personal lives, decisions and choices affected the development of our research programs and our career trajectory. We will discuss the importance of having good taste in research topics, the intellectual conviction and courage to take risks and re-invent yourself, choosing and mentoring students, identifying the right collaborations, the transition from "doer" to manager of people, promoting yourself and your work, and being personally aware of what you need to be professionally fulfilled at specific stages of life and career. Finally, we will emphasize the need for seeking out mentors, and growing toward the next phase of your career by defining new milestones, regardless of career stage.
Bio:
Bhramar Mukherjee is currently a Professor of Biostatistics at the School of Public Health, University of Michigan. She received her Ph.D. in Statistics from Purdue University in 2001 and spent the next four years as an assistant professor of Statistics at the University of Florida. Her research interests are gene-environment interactions, Bayesian methods in genetic and environmental epidemiology, and statistical analysis under outcome-dependent sampling. Dr. Mukherjee has authored more than one hundred articles in reputed Statistics, Biostatistics, and Epidemiology journals. She is leading multiple NIH and NSF grants on gene-environment interaction analysis and Bayesian methods in epidemiology as principal investigator. She is the director of the Biostatistics core in two large center grants related to environmental health. She is a member of the University of Michigan Comprehensive Cancer Center and Associate Director of the Cancer Biostatistics Training Grant. She is presently serving on the editorial boards of six journals in Statistics, Biostatistics, and Epidemiology. Dr. Mukherjee has won multiple grants and awards for her teaching accomplishments, including the 2012 School of Public Health Excellence in Teaching Award at the University of Michigan. She is an elected member of the International Statistical Institute and a fellow of the American Statistical Association.
Rebecca Doerge is the Trent and Judith Anderson Distinguished Professor of Statistics at Purdue University. She joined Purdue University in 1995 and holds a joint appointment between the Colleges of Agriculture (Department of Agronomy) and Science (Department of Statistics). Professor Doerge's research program is focused on Statistical Bioinformatics, a component of bioinformatics that brings together many scientific disciplines into one arena to ask, answer, and disseminate biologically interesting information in the quest to understand the ultimate function of DNA and epigenomic associations. Rebecca is the recipient of the Teaching for Tomorrow Award, Purdue University, 1996; University Scholar Award, Purdue University, 2001-2006; and the Provost's Award for Outstanding Graduate Faculty Mentor, Purdue University, 2010. She is an elected Fellow of the American Statistical Association (2007), an elected Fellow of the American Association for the Advancement of Science (2007), and a Fellow of the Committee on Institutional Cooperation (CIC; 2009). Professor Doerge has published over 100 scientific articles and one book, and has graduated 22 PhD students.
Proportional Subdistribution Hazard Regression with Interval Censored Competing Risks Data
Abstract
In survival analysis, the failure time of an event is interval-censored when it is only known to occur between two consecutive observation times. Most existing methods for interval-censored data account for only a single cause of failure. However, in some situations a subject may fail due to more than one cause. We propose estimation procedures that account for both interval censoring and competing risks, adopting the modeling framework of the proportional subdistribution hazards model. The proposed estimating equations effectively utilize the ordering of the event time pairs, and the technique of inverse probability weighting is used to account for the missingness mechanism. Simulation studies show that the proposed methods perform well under realistic scenarios.
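As a reference point for the modeling framework named above, the sketch below fits the familiar right-censored version of the proportional subdistribution hazards (Fine-Gray) model with the cmprsk package on simulated data; the interval-censoring extension proposed in this work is not implemented here, and all rates are hypothetical.

    # Minimal sketch: right-censored Fine-Gray model via cmprsk (simulated data).
    library(cmprsk)
    set.seed(13)
    n <- 300
    x <- rnorm(n)                                 # a single covariate
    ftime <- rexp(n, rate = exp(0.5 * x) / 10)    # event times
    cause <- rbinom(n, 1, 0.6) + 1                # competing causes 1 and 2
    cens  <- runif(n, 0, 15)                      # censoring times
    fstatus <- ifelse(ftime <= cens, cause, 0)    # 0 = right-censored
    obs_t   <- pmin(ftime, cens)
    fit <- crr(ftime = obs_t, fstatus = fstatus, cov1 = x, failcode = 1)
    summary(fit)   # subdistribution hazard ratio for cause-1 events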
Bio:
Yi is a Ph.D. candidate in Biostatistics at the Graduate School of Public Health, University of Pittsburgh.
Campaigns and Information Costs: A Case Study of the 2008 General Election
Abstract
Much information has been gathered and debated with regard to the social and political context concerning the expected utility of voting. However, one aspect that deserves a second look concerning one’s voting utility is the increase in technology and Internet usage in the context of political campaigns and its effect on the information cost felt by the voter. Consequently, this paper seeks to add to that discussion by taking President Obama’s 2008 bid for the presidency as a case study. By analyzing panel data collected by the American National Election Studies, this paper seeks to understand the specific information barriers that exist for voters and how these barriers were eliminated or decreased in the 2008 general election. In conclusion, this paper, by closely examining the increase in technology in the 2008 general election and its effect on the expected utility of voting, not only provides a different look at barriers to voting but could also provide a new perspective on political activism, civic engagement, and future political campaigns.
Bio:
Tara Rhodes is a doctoral candidate at the University of Denver, finishing her second year in the research methods and statistics program. Her interests lie in political methodology, public opinion research, and realignment theory. She is currently employed as the data coordinator for the Fostering Healthy Futures program, a federally-funded research project aimed at developing an intervention program for children and teens with a history of out-of-home care.
Meta-Analysis of Odds Ratios: Bridging the Divide between Ideal and Available Extracted-Data
Abstract
Meta-analysis combines evidence from multiple studies with a common research question to derive a stronger conclusion. It is particularly useful in the health sciences, where it can provide a better understanding of the efficacy of a treatment. While an ideal meta-analysis would combine the datasets of each study, a more widely used method performs an analysis on aggregate data extracted from published studies. A common problem is that the summary statistics necessary for the meta-analysis are often unavailable from publications, and common practice is to compute best guesses using other information in the paper, such as Kaplan-Meier plots. However, treating these best guesses as observed summary statistics leads to unjustified certainty, and possibly to inaccurate conclusions and unfounded policy recommendations.
We introduce the Uncertain Reading Estimated Events (UR-EE) Bayesian random effects model to incorporate the uncertainty that arises during data extraction in the meta-analysis of odds ratios. We re-evaluate published meta-analyses that compare two treatments (percutaneous coronary intervention and coronary artery bypass grafting) for unprotected left main coronary artery stenosis in three outcomes: (1) mortality, (2) major adverse cardiovascular or cerebrovascular events and (3) target vessel revascularization after one year. The added uncertainty results in an increase in the standard deviation of the log-odds ratio by up to 28%. Alternatively, uncertainty from data extraction is equivalent to a reduction in the meta-analysis effective sample size by up to 38%.
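For context, the sketch below runs the conventional version of such an analysis: a random-effects meta-analysis of log-odds ratios that treats extracted counts as known, using the metafor package with made-up counts from five hypothetical studies. The UR-EE model would instead place distributions over the extracted event counts.

    # Minimal sketch: conventional random-effects meta-analysis of odds ratios.
    library(metafor)
    dat <- data.frame(                       # hypothetical extracted 2x2 counts
      ai = c(12,  8, 20,  5, 15),            # events, treatment arm
      n1 = c(100, 80, 150, 60, 120),         # sample size, treatment arm
      ci = c(18, 10, 28,  9, 21),            # events, control arm
      n2 = c(100, 85, 150, 60, 115))         # sample size, control arm
    es  <- escalc(measure = "OR", ai = ai, n1i = n1, ci = ci, n2i = n2, data = dat)
    fit <- rma(yi, vi, data = es, method = "REML")   # random-effects model
    predict(fit, transf = exp)               # pooled OR with 95% interval
    # Treating ai and ci as exact ignores extraction uncertainty; the UR-EE
    # model would widen these intervals accordingly.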
Bio:
Shemra Rizzo is a Ph.D. student in Biostatistics at UCLA. She holds a BA in Physics from the Tecnologico de Monterrey in Mexico and an MS in Statistics and Operations Research from UNC Chapel Hill. Her statistical interests include meta-analysis, Bayesian methods, survival analysis and missing data.
Congratulations, You’ve Got Tenure! Now What?
Abstract
Sponsored by Joint Committee on Women in the Mathematical Sciences
Moderator: Paula K. Roberson (University of Arkansas for Medical Sciences)
This panel focuses on opportunities and obstacles for professional growth and development faced by mid-career faculty post-tenure. Some individuals may wish to move into administrative positions, but this is not the only path for career advancement and professional satisfaction. Discussion will include how tenured faculty members can become active in mentoring their junior colleagues and how those recently promoted can seek out mentors from senior colleagues. Other topics will include the freedom that tenure provides to take risks in assuming new roles or moving into new research areas as well as potentially appropriate responses to pressures to undertake additional administrative responsibilities which might be counterproductive to one’s personal career goals. Panelists and audience members will have the opportunity to share perspectives and lessons learned regarding strategies for targeting the next steps for one’s career.
Bio:
Jane L. Meza, Ph.D., received her Ph.D. in Statistics from the University of Nebraska-Lincoln. She is Professor and Chair of the Department of Biostatistics and Director of the Center for Collaboration on Research Design and Analysis at the University of Nebraska Medical Center (UNMC) College of Public Health. She is Co-Director of the Biostatistics Shared Resource for the UNMC Fred and Pamela Buffett Cancer Center and Director of the Biostatistics Core for UNMC’s Specialized Program of Research Excellence in pancreatic cancer. Dr. Meza served as a Senior Statistician for the Children’s Oncology Group, working with the Cancer Control, Nursing, and Soft Tissue Sarcoma committees. Dr. Meza has over 82 peer-reviewed publications.
Item Parameter Recovery: An Empirical Comparison of the Effect of Sample Size, Total Number of Parameters, and Population Distribution on the Accuracy and Precision of IRT Item Parameter Estimates for Graded Responses
Abstract
The application of statistical analyses has changed drastically over recent years due to continued developments in computer science and information technology. The availability of powerful computers and statistical software in academic research now makes it possible to use complex models and techniques for parameter estimation. Applied researchers rely on theoretical and empirical bases to select the sample size required to achieve a desired statistical power; however, within Item Response Theory (IRT), the sample size needed for accurate person and item parameter estimates is still not well established. Using a simulation approach, this study expands De Ayala and Sava-Bolesta’s (1999) study to evaluate the accuracy and precision of item parameter estimates for graded responses using Likert-type data. Manipulated factors include the ratio of sample size to total number of parameters (SSR; 2.5:1, 5:1, 10:1, and 20:1) and population distribution shape (normal, uniform, and skewed). A total of 5000 samples were generated for each condition in the simulation, which provides a maximum SE of .007 and a 95% CI width of ± .013. The simulation study was conducted using SAS 9.3; the IRTGEN SAS macro (Whittaker, Fitzpatrick, Williams, & Dodd, n.d.) was used to generate data for the graded response model, and MULTILOG 7.03 (Thissen, 2003) was used to calibrate the item parameters.
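For readers unfamiliar with the graded response model, the following sketch generates Likert-type data from Samejima's model, in the spirit of what a macro such as IRTGEN automates; the discrimination and threshold values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def grm_sample(theta, a, b):
    """Sample graded responses (0..K) from Samejima's graded response model.
    theta: (n,) abilities; a: discrimination; b: (K,) ordered thresholds."""
    # P(X >= k | theta) for k = 1..K: cumulative logistic curves
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    # category probabilities are differences of adjacent cumulative curves
    cum = np.hstack([np.ones((len(theta), 1)), p_star, np.zeros((len(theta), 1))])
    probs = cum[:, :-1] - cum[:, 1:]
    return np.array([rng.choice(len(b) + 1, p=p) for p in probs])

theta = rng.normal(size=1000)                       # normal ability distribution
responses = grm_sample(theta, a=1.5, b=np.array([-1.0, 0.0, 1.0]))
print(np.bincount(responses))                       # category frequencies
```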
Bio:
Patricia Rodriguez de Gil is a dual Ph.D. candidate at the University of South Florida with concentrations in Educational Measurement, Research, and Evaluation, and in Secondary Social Studies Education. She has taught courses in reading in the social studies and in educational measurement for pre-service teachers. She is currently working as part of the quantitative team on an NSF-funded grant project as a data analyst and quantitative researcher. Patricia was the recipient of the Successful Graduate Latina Student award in 2007 and of the Outstanding Latino Educator (OLE) award in 2009 at the University of South Florida. She received the Best Overall Graduate Poster award at FERA 2008 and was also awarded one of the Graduate Student Scholarships from SEGUG in 2012.
Patricia's research interests include the effects of class, race, and school context associated with STEM course work in high school, as well as missing data in the context of measurement models.
Teaching Statistical Concepts to Undergraduates with Limited Statistical Background: Lessons Learned
Abstract
Drawing on the work of Gerd Gigerenzer and others, I will provide some strategies for teaching and presenting statistical information in a more engaging way. I will explore visualization of means, standard deviations, and probabilities through the use of natural frequency charts. I will describe how to explain basic statistical concepts (such as relative risk, absolute risk, conditional probabilities, sensitivity, and specificity) using real-world examples. My examples involve actual health decisions relating to contraceptive pill usage, breast and colorectal cancer screening, and smoking and lung cancer. Attendees will walk away with tips for presenting information more clearly, an understanding of how to assess basic numeracy, and strategies for communicating complex statistical information in a more accessible way. These findings are currently being used to develop an undergraduate course in risk perception relating to health and medical decision making.
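As a concrete illustration of the natural-frequency idea, the sketch below converts screening-test characteristics into counts of true and false positives and reads off the positive predictive value; the numbers are illustrative, not the speaker's actual examples.

```python
# Positive predictive value via natural frequencies: out of 10,000 people
# screened, how many who test positive actually have the condition?
population = 10_000
prevalence = 0.01          # 1% have the condition
sensitivity = 0.90         # P(test positive | condition)
specificity = 0.91         # P(test negative | no condition)

sick = population * prevalence
true_pos = sick * sensitivity
false_pos = (population - sick) * (1 - specificity)

ppv = true_pos / (true_pos + false_pos)
print(f"{true_pos:.0f} true positives, {false_pos:.0f} false positives")
print(f"P(condition | test positive) = {ppv:.2%}")   # ~9%, often surprising
```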
Bio:
Kashika Sahay, MPH, is a first-year doctoral student in Maternal and Child Health at UNC Chapel Hill with a Master's in Epidemiology from Emory University. She is minoring in Biostatistics. Her research interests include gathering timely information on vulnerable populations to improve health services access and engagement with care. She is currently working on improving the usability of data collection and reporting software for the NC Governor's Crime Commission and NC Sexual Assault/Domestic Violence Coalitions.
In addition, she is working on designing a course for undergraduates to improve their numeracy skills and understanding of statistical concepts. To that end, she hopes to design research methods classes for individuals in public health who do not have prior statistical backgrounds. Through teaching and consulting, she hopes to build a career in which the people she works with, whether public health practitioners, members of the general public, or students of all genders in statistical and non-statistical disciplines, are not afraid of or intimidated by statistics.
Kashika has worked as a contract evaluator for the Centers for Disease Control and Prevention National Center for Injury Control and Prevention. She has adapted surveys for the Division of Reproductive Health and conducted infectious disease surveillance through the Georgia Department of Public Health. She has an undergraduate degree in Neuroscience from the University of Rochester.
Biostatistics Primer: Online e-Modules for Campus Learning
Abstract
Biostatistics concepts are important for future health professionals and researchers, yet they are often lacking in health professions curricula, biomedical graduate programs, and post-graduate training programs (i.e., residencies and fellowships). The curricula for these programs are already full, with little room for additional courses, even though the importance of understanding biostatistics concepts has been recognized.
This presentation describes the development of five e-modules introducing common concepts in biostatistics: (1) Summarizing Data, (2) Study Design, (3) Variability, (4) Risk, and (5) Statistical Inference Basics. These topics were selected for their wide applicability across programs, and the modules are designed to be used in any combination, from one module to all five. Each e-module can be used in a flipped classroom, where the specific classroom activity or discussion topic is determined by the subject-matter faculty member.
This series of modules can be used in whole or in part during a semester course or as additional training outside a formal course. The modules themselves contain biostatistical concepts and examples, along with links to interactive tools or games and practice questions that reinforce concepts. The flipped-classroom portion is left to the subject-matter expert so it can be tailored to discipline-specific content, but examples are provided.
Bio:
Kendra Schmid is an Associate Professor of Biostatistics and Director of Masters Programs in the College of Public Health, University of Nebraska Medical Center.
What Does Data Science Mean to Statisticians?
Abstract
In this talk, I will explore the question "what is data science?" Many statisticians have understandably asked, "isn't statistics the science of data?" which suggests that data science is just a rebranding of the discipline of statistics. Yet data science is clearly emerging in job titles and academic programs, and doesn't seem to be going away any time soon. We'll discuss possible definitions of data science, and some important concepts that suggest that data science is a new and distinct discipline in its own right. And I'll try to persuade you, if you're not persuaded already, that rather than ignoring data science, it's to the benefit of statisticians to embrace it because there are many interesting opportunities.
Bio:
Dr. Rachel Schutt is the new Senior Vice President of Data Science at News Corp, the publishing, news, and information company whose properties include The Wall Street Journal, the New York Post, The Times of London, The Sun, The Australian, HarperCollins, and Amplify. Previously, Rachel was a statistician at Google Research; she holds pending patents based on her work in the areas of social networks, large data sets, experimental design, and machine learning. She is the co-author of the book "Doing Data Science," published by O'Reilly in October 2013 and based on a class she created and teaches at Columbia University. The book explores the central question "What is Data Science?" through the lens of the experiences of data practitioners in major tech companies such as Google, Microsoft, and eBay, as well as NYC start-ups. She is an adjunct professor in the Department of Statistics at Columbia University and a founding member of the Education Committee for the Institute for Data Sciences and Engineering at Columbia. She earned her PhD in Statistics from Columbia University, a Master's degree in Mathematics from NYU, and a Master's degree in Engineering-Economic Systems and Operations Research from Stanford University. Her undergraduate degree is in Honors Mathematics from the University of Michigan. Rachel is a frequent invited speaker at universities and conferences, often on the subject of data science and statistics education, and more recently on data and journalism. She is on a research advisory panel for the UK-based organization NESTA (National Endowment for Science, Technology and the Arts) for their research project, Skills of the Datavores.
Surviving Graduate School
Abstract
Graduate school have you stressed? This panel will share personal stories and advice for not only surviving but thriving in graduate school. Tips and resources for stress management, writing a dissertation, working through advisor conflicts, time management, and social networking will be discussed.
Weak Signal Identification and Inference
Abstract
Penalized methods have been widely used for simultaneous variable selection and coefficient estimation, and they are powerful in high-dimensional settings involving many covariates. However, there is limited work addressing statistical inference for penalized estimators, especially when the signals of the coefficients are weak. Weak signal inference is quite challenging, since the oracle properties that hold for strong signals are not necessarily valid for weak ones. In this poster, we provide a range of signal strength over which weak signals can be identified well. In addition, we construct confidence intervals for weak signals that have more accurate coverage rates. In our numerical studies, we compare the proposed method to asymptotic and bootstrap approaches for confidence interval estimation.
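For context, a bootstrap comparator of the kind the poster benchmarks against might look like the following sketch (this is the comparator, not the proposed method); the design, coefficients, and penalty level are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
beta = np.zeros(p)
beta[0], beta[1] = 1.0, 0.15              # one strong and one weak signal
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

def lasso_coefs(X, y, alpha=0.05):
    """Penalized coefficient estimates for one (sub)sample."""
    return Lasso(alpha=alpha).fit(X, y).coef_

# Percentile bootstrap CI for the weak coefficient; known to be unreliable
# in exactly the weak-signal regime the poster studies
boot = np.empty(500)
for b in range(500):
    idx = rng.integers(0, n, size=n)
    boot[b] = lasso_coefs(X[idx], y[idx])[1]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for the weak signal: ({lo:.3f}, {hi:.3f})")
```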
Bio:
Peibei Shi is a Ph.D. candidate in Statistics at the University of Illinois at Urbana-Champaign, expecting to graduate in May 2015. Her research areas include signal detection and inference in model selection and local feature selection in varying coefficient models. She is currently working on weak signal identification and inference under the supervision of Professor Annie Qu. She has served as a student consultant at the Illinois Statistics Office since 2012, providing statistical advice and solutions to clients in different areas. She also worked as a research assistant at the National Institutes of Health for three months during a summer internship in 2012.
Peibei received her bachelor's degree from Tsinghua University in 2010 through the Academic Talent Program in mathematics and physics. She worked as a research assistant in the bioinformatics lab at Tsinghua University from 2008 to 2010.
Using a Dirichlet mixture model to detect concomitant changes in allele frequencies
Abstract
RNA viruses are challenging for protein- and nucleotide-sequence-based methods of analysis because of their high mutation rates and complex secondary structures. With new DNA and RNA sequencing technologies, viral sequence data from both individuals and populations are becoming easier and cheaper to obtain. Thus, there is a critical need for methods that can identify alleles whose frequencies change over time or due to a treatment. We have developed a novel statistical approach for identifying evolved nucleotides in a viral genome without relying on sequence annotation or the nature of the change. Its findings reveal nucleotides that have similar patterns of change. Our approach models allelic variances under a Bayesian Dirichlet mixture distribution. With a multi-stage clustering procedure, we have developed an efficient clustering scheme that distinguishes treatment-induced changes from variation within viral populations. Our method has been applied to an established HIV data set, where we are able to replicate and improve previous findings. We believe our approach can be broadly applied and is particularly useful for cases that are recalcitrant to traditional sequence analysis.
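The Dirichlet building block can be illustrated with the Dirichlet-multinomial likelihood, which allows extra-multinomial variance in allele counts; the sketch below, with hypothetical read counts and crudely chosen concentration parameters, shows only that building block, not the full mixture or clustering procedure.

```python
import numpy as np
from scipy.special import gammaln

def dm_loglik(counts, alpha):
    """Dirichlet-multinomial log-likelihood (up to the multinomial
    coefficient, which is constant in alpha) for allele counts at one site."""
    n, a0 = counts.sum(), alpha.sum()
    return (gammaln(a0) - gammaln(n + a0)
            + np.sum(gammaln(counts + alpha) - gammaln(alpha)))

# Hypothetical counts of A/C/G/T reads at one genomic position,
# before and after treatment
pre = np.array([90, 5, 3, 2])
post = np.array([40, 55, 3, 2])

# Compare a "no change" fit (one shared alpha) with a "change" fit
# (separate alphas, here crudely set to the observed counts)
alpha_pooled = (pre + post) / 2
ll_null = dm_loglik(pre, alpha_pooled) + dm_loglik(post, alpha_pooled)
ll_alt = dm_loglik(pre, pre.astype(float)) + dm_loglik(post, post.astype(float))
print(f"log-likelihood, no change: {ll_null:.1f}; change: {ll_alt:.1f}")
```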
Bio:
Wen Jenny Shi is a fourth-year PhD student in Statistics at the University of North Carolina at Chapel Hill. Her research interests include bioinformatics, fiducial inference, data mining, and Monte Carlo methods.
Ensemble Variable Selection and Estimation
Abstract
Penalized maximum likelihood estimation has been extensively studied for simultaneous variable selection and estimation. However, direct penalization of a full likelihood can be computationally infeasible and requires model-specific theoretical work. To tackle these problems, we propose Ensemble variable selection and estimation (EVE) for a factorizable likelihood. EVE is a multi-layer procedure based on three methods: information combination across the factors, the least squares approximation (LSA) method, and refitting. The full likelihood estimate is obtained from the likelihood factor estimates via weighted least squares. LSA is used to select important variables on the full likelihood. With the selected variables, we refit each factor and recombine the estimators. Our estimator has no asymptotic efficiency loss and is computationally efficient with existing software. Simulation studies and data analysis of HIV/AIDS studies confirm that EVE is competitive with other existing methods.
Automated Data Exploration through Insight
Abstract
Data exploration includes generating an overview of the variables contained in a dataset and an understanding of the most important relationships among them. Data need to be prepared for modeling, while the models themselves are used to gain insight into the main aspects of the discovered relationships. This involves many statistical techniques that require judicious application by a proficient analyst, and the task becomes much more difficult for large datasets with many variables. I present a robust automation framework that runs a set of exploratory statistical analyses on a given dataset. The statistics range from univariate analyses establishing the necessary metadata to multivariate analyses discovering the relationships between the target variables and potential predictors. All results are sorted by a new "interestingness" index composed of suitable statistics, which allows the most relevant relationships to be selected. The most important aspects of the discovered relationships are further analyzed and presented as insights, expressed in non-technical language that allows non-experts to gain a better understanding of their data.
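As a toy stand-in for such an index, the sketch below ranks numeric predictors of a target by absolute Pearson correlation; the framework described in the talk combines far richer statistics, so treat the scoring function here as a placeholder.

```python
import numpy as np
import pandas as pd

def rank_predictors(df, target):
    """Rank numeric predictors of `target` by a toy "interestingness"
    score (absolute Pearson correlation), highest first."""
    scores = {}
    for col in df.select_dtypes("number").columns:
        if col != target:
            scores[col] = abs(df[col].corr(df[target]))
    return pd.Series(scores).sort_values(ascending=False)

rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.normal(size=500),
                   "x2": rng.normal(size=500),
                   "noise": rng.normal(size=500)})
df["y"] = 2 * df["x1"] - 0.5 * df["x2"] + rng.normal(size=500)
print(rank_predictors(df, "y"))   # x1 should rank first, noise last
```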
Bio:
Jing Y. Shyr is a Distinguished Engineer and Chief Statistician in the IBM Business Analytics group. Before the IBM acquisition in 2009, Jing was Chief Statistician and Senior Vice President of Technology Solutions at SPSS Inc., a worldwide provider of analytical technology. She has leveraged her skills and experience in the field of statistical computing to become an innovative leader in the industry and the head of development for SPSS Inc.’s data mining technology and analytical solutions. Today, she leads a team of researchers and software developers responsible for the creation of data mining technology and statistical methodology. Her vision of making software produce consumable information helps unlock the value of data for analysts and business users and has spearheaded the use of data in decision making. During her tenure at SPSS, Ms. Shyr set up a China development lab in Xi’an, which has become one of the most productive sites among IBM China labs. She holds a bachelor's degree in Applied Mathematics from National Chiao Tung University, a master's degree in Applied Statistics from National Tsing Hua University, Taiwan, and a Ph.D. in Statistics from Purdue University. She is an active member of the American Statistical Association, the PMML group (part of W3C), and the Advisory Council of the Science School of Purdue University, and a recipient of the Distinguished Alumna Award from Purdue University in 2000. She has been named to i-Street magazine’s third annual Women in Black listing, and she received the Distinguished Alumna Award from National Chiao Tung University in 2005 and from National Tsing Hua University in 2006. Since joining IBM, Ms. Shyr has joined the Smart City initiative to help solve Syracuse's vacant property challenges. Her most recent efforts are on creating innovation for Business Analytics at IBM. Based on her innovation, Ms. Shyr led the creation of a new product, IBM SPSS Analytic Catalyst, which was released in June 2013 and won a Ventana Research IT innovation award that year. She filed 14 disclosures for patent application in her first four years at IBM.
Algorithm for Generating Experimental Designs for Multivariate Analysis
Abstract
In engineering studies, there are often multiple explanatory variables and multiple responses that are of interest. Prior knowledge and screening experiments often reveal that not all explanatory variables are related to all responses. One common practice to account for this in the experimental design is to use a resolution five fractional factorial in all explanatory variables. In some cases, we can improve on this by taking into account which groups of explanatory variables are of interest for each response, and constructing a design that is a full factorial in only these groups of variables. Here we will present an algorithm for generating designs of this type that are smaller than the resolution five designs for a particular class of problems.
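The core feasibility check behind such an algorithm is whether a candidate fraction projects onto each variable group as a full factorial. Below is a minimal sketch of that check, using a hypothetical five-factor half fraction; this is an illustration of the idea, not the authors' algorithm.

```python
import itertools
import numpy as np

def full_factorial(k):
    """All 2^k runs of a two-level design, coded -1/+1."""
    return np.array(list(itertools.product([-1, 1], repeat=k)))

def projects_fully(design, group):
    """True if the design projected onto `group` contains every
    combination of those factors (i.e., is a full factorial there)."""
    proj = {tuple(row) for row in design[:, group]}
    return len(proj) == 2 ** len(group)

# Hypothetical example: 5 factors, two responses that each depend on a
# subset of factors; a half fraction defined by x5 = x1*x2*x3*x4
full = full_factorial(5)
half = full[full[:, 4] == full[:, 0] * full[:, 1] * full[:, 2] * full[:, 3]]
print(len(half), "runs")                    # 16 runs instead of 32
print(projects_fully(half, [0, 1, 2]))      # full factorial in group 1
print(projects_fully(half, [2, 3, 4]))      # and in group 2
```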
Bio:
I received a Master's degree in Statistics from Iowa State University in 2012 and am currently pursuing a PhD in the same department.
A Marginalized Two-Part Model for Semicontinuous Data
Abstract
In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a right-skewed continuous distribution with positive support. Examples include health expenditures, in which the zeros represent a subpopulation of patients who do not use health services, while the continuous distribution describes the level of expenditures among health services users. Semicontinuous data are typically analyzed using two-part mixtures that separately model the probability of health services use and the distribution of positive expenditures among users. However, because the second part conditions on a nonzero response, conventional two-part models do not provide a marginal interpretation of covariate effects on the overall population of health service users and non-users, even though this is often of greatest interest to investigators. We propose a marginalized two-part model that yields more interpretable effect estimates in two-part models by parameterizing the model in terms of the marginal mean. This model maintains many of the important features of conventional two-part models, such as capturing zero-inflation and skewness, but allows investigators to examine covariate effects on the overall marginal mean, a target of primary interest in many applications. Using a simulation study, we examine properties of the maximum likelihood estimates from this model. We illustrate the approach by evaluating the effect of a behavioral weight loss intervention on health care expenditures in the Veterans Affairs health care system.
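The distinction between conditional and marginal effects is easy to see by simulation. The sketch below generates data from a conventional two-part (logistic plus lognormal) model with hypothetical parameters and contrasts the ratio of means among users with the ratio of overall means, the latter being the quantity the marginalized model parameterizes directly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
x = rng.integers(0, 2, size=n)                   # treatment indicator

# Conventional two-part generative model (hypothetical parameters):
p_use = 1 / (1 + np.exp(-(-0.5 + 0.4 * x)))      # P(any expenditure)
mu_pos = 6.0 + 0.2 * x                           # log-mean among users
y = rng.binomial(1, p_use) * rng.lognormal(mu_pos, 1.0)

# Conditional effect (users only) vs. marginal effect (whole population)
cond = y[(y > 0) & (x == 1)].mean() / y[(y > 0) & (x == 0)].mean()
marg = y[x == 1].mean() / y[x == 0].mean()
print(f"ratio of means among users: {cond:.2f}")
print(f"ratio of overall (marginal) means: {marg:.2f}")
```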
Leading Analytics and Insights Viewed Through a Diverse and Gender Lens
Abstract
This panel discussion will focus on a series of topical questions followed by audience Q&A. The topics will be divided into two broad themes: 1) A Business Development Dimension (topics include: how to balance the hard and soft sciences; how to proactively drive your analytics leadership journey; the challenges in communicating/messaging analytics to a non-technical audience) and 2) A Personal Development Dimension (topics include: how to seek constructive feedback; how to identify and find a personal champion; how to calibrate and balance your high self-expectations across the organization). Kiersten and Suzanne will openly discuss their 'moments of truth', hoping that attendees can learn from their experiences and personal insights.
Bio:
Suzanne Smith has been at Lowe's for eight years and has served in numerous research and analytic leadership positions that support the enterprise, including: marketing, macro-economic and consumer analytics, branding, merchandising, global assessments, enterprise measures of success, and most recently, Enterprise Analytics. Suzanne is passionate about using analytics to uncover and give meaning to the diverse customer experience in an omni-channel retail environment. Prior to Lowe's, Suzanne was a Research Associate at the UNC Charlotte Urban Institute, focusing on social and economic research. She has a BA in Sociology & Criminology and a BA in Psychology from Winthrop University, and a Master's in Sociology & Statistics from UNC Charlotte.
Kiersten Einsweiler has been with Lowe's for 10 years, providing analytic solutions for multiple business areas, including merchandising, marketing, performance forecasting, credit, and pricing and promotion. In her work, she strives to create solutions that promote personalization for Lowe's customers. She has a BS in Psychology from Arizona State University and a Master of Statistics from North Carolina State University in Statistics and Operations Research.
Carmen Neudorff is the VP of HR Analytics at Lowe’s.
Dan Thorpe is currently the Vice President of Enterprise Analytics at Lowe's. He was previously with Walmart, Sam's Club, Wachovia, and W.L. Gore and Associates; one of his passions is the creation and enabling of high-performing analytics associates and teams.
Control Function Assisted IPW Estimation with a Secondary Outcome in Case-Control Studies
Abstract
Case-control studies are designed to study associations between risk factors and a single, primary outcome. Information about additional, secondary outcomes is also collected, but association studies targeting such secondary outcomes should account for the case-control sampling scheme; otherwise, results may be biased. Often, one uses inverse probability weighted (IPW) estimators to estimate population effects in such studies. However, these estimators are inefficient relative to estimators that make additional assumptions about the data generating mechanism.
We propose a class of estimators for the effect of risk factors on a secondary outcome in case-control studies, when the mean is modeled using either the identity or the log link. The proposed estimator combines IPW with a mean zero control function that depends explicitly on a model for the primary disease outcome. The efficient estimator in our class of estimators reduces to standard IPW when the model for the primary disease outcome is unrestricted, and is more efficient than standard IPW when the model is either parametric or semiparametric.
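A bare-bones version of the IPW idea for a continuous secondary outcome: weight each sampled subject by the inverse of the selection probability implied by the case-control design, then solve weighted least squares. Everything below (population model, disease prevalence, sampling scheme) is hypothetical, and this is the standard IPW baseline, not the proposed control-function estimator.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical source population: rare primary disease D, secondary outcome Y
N = 200_000
x = rng.normal(size=N)
d = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.5 * x))))
y = 1 + 0.3 * x + 0.5 * d + rng.normal(size=N)

# Case-control sample: all cases plus an equal number of controls
cases = np.flatnonzero(d == 1)
controls = rng.choice(np.flatnonzero(d == 0), size=len(cases), replace=False)
samp = np.concatenate([cases, controls])

# IPW weight = 1 / P(selected | case status)
p_sel = np.where(d[samp] == 1, 1.0, len(controls) / (d == 0).sum())
w = 1.0 / p_sel

# Weighted least squares of Y on X approximates the population regression,
# which an unweighted fit of the case-control sample would distort
X = np.column_stack([np.ones(len(samp)), x[samp]])
beta_ipw = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y[samp]))
Xp = np.column_stack([np.ones(N), x])
beta_pop = np.linalg.lstsq(Xp, y, rcond=None)[0]
print(f"population slope {beta_pop[1]:.3f}, IPW estimate {beta_ipw[1]:.3f}")
```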
Random Forest Classification of Etiologies in Acute Liver Failure Patients
Abstract
Determining the etiology, or cause, of Acute Liver Failure (ALF) in patients can be arduous for clinicians. Accurate assessment of etiology is necessary because this determines the prognosis and treatment plan for patients. We employ a statistical classification procedure to provide additional information for this determination. Classification of etiology is challenging because of imbalanced data, large amounts of missing data, and correlated predictors. Random forest (RF) was selected for this study because it can address these challenges, often offering higher classification accuracy than other methods. However, retrieving and interpreting information from RF can be difficult. We aim to explore how results from the RF procedure may be used to improve diagnostic determinations of ALF etiologies.
RF was applied to ALF Study Group registry data. Analyses were carried out using R software for the 1,978 patients enrolled beginning in 1998 who had non-missing etiology data as of December 2012. The outcome variable, etiology, is categorized into one of the following: acetaminophen overdose, drug induced liver injury, autoimmune hepatitis, hepatitis B, indeterminate, and other (including all other etiologies). After data were processed and missing data imputed, 46 admission variables were used in the variable selection procedure. Based on the smallest error rate, 12 were selected for inclusion within the RF. The model yielded an overall prediction error rate of 35%. Variable importance plots illustrated the most important variables for the classification of etiology, and partial dependence plots showed the relationship between variable values and the probability of inclusion in each group.
RF offered impressive accuracy and the capability to assess many variables in establishing etiology. Typical questions that arise in the general framework of classification were presented along with interpretations of results from the procedure. RF provided a model for predicting etiology and information about relationships between independent variables and etiologies, which should improve the accuracy and efficiency of this determination for clinicians. This study was supported in part by a U-01 58369-014 from NIDDK to the ALF Study Group.
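For readers who want to try the approach, here is a sketch in Python with scikit-learn (the study itself used R) on synthetic data with a similar shape: six imbalanced classes, 46 candidate predictors, out-of-bag error, and variable importances. All data and settings are invented for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the registry data: an imbalanced multi-class problem
X, y = make_classification(n_samples=2000, n_features=46, n_informative=12,
                           n_classes=6, n_clusters_per_class=1,
                           weights=[0.45, 0.2, 0.1, 0.1, 0.1, 0.05],
                           random_state=0)

rf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                            oob_score=True, random_state=0)
rf.fit(X, y)
print(f"OOB accuracy: {rf.oob_score_:.2f}")

# Variable importance: which predictors drive the classification
top = np.argsort(rf.feature_importances_)[::-1][:12]
print("12 most important features:", top)
```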
Bio:
Jaime Lynn Speiser is a Research Associate in the Data Coordination Unit within the Department of Public Health Sciences and a Ph.D. student in biostatistics at the Medical University of South Carolina. She received her B.S. in Mathematics from Elon University (2010) and her M.S. in Statistics from The Ohio State University (2012). As a Research Associate, Jaime compiles safety and monitoring reports for a clinical trial for a new stroke treatment, assists in the analysis of a clinical trial that recently ended, and collaborates with the Acute Liver Failure Study Group in reporting and analyzing registry data. She is interested in how ensemble learning methods can be used for statistical classification, and ways to present and communicate findings to researchers outside the field of statistics. In her free time, Jaime plays tennis and softball, rides her bike around Charleston, participates in book club, and relaxes on the beach.
Panel of Presidents of the American Statistical Association
Abstract
The American Statistical Association presidency has been held by 12 women: Helen Walker, Gertrude Cox, Margaret Martin, Barbara Bailar, Janet Norwood, Katherine Wallman, Lynne Billard, Sallie Keller, Mary Ellen Bock, Sally Morton, Nancy Geller, and Marie Davidian. This panel will introduce you to women who have taken on leadership of the American Statistical Association. Come learn how they entered the field of statistics/biostatistics, what launched and propelled their careers, what hurdles they encountered, what tools they used to stay inspired, and what advice they have for women at all stages of their careers.
Bio:
Lynne Billard is a professor at the University of Georgia known for her statistics research, leadership, and advocacy for women in science. She earned her Bachelor of Science degree in 1966, and her Ph.D. in 1969, from the University of New South Wales, Australia. In 1980, Billard joined the University of Georgia as head of the Department of Statistics and Computer Science. She was named a University Professor in 1992. She has served as President of the American Statistical Association and the International Biometric Society. From 1988 to 2004 she served as principal investigator for "Pathways to the Future," an annual workshop focused on mentoring women in all fields of science and scientific research. In 2011, she received the tenth annual Janet L. Norwood Award for Outstanding Achievement by a Woman in the Statistical Sciences. In 2013, she was awarded the Florence Nightingale David Award for exemplary contributions to education, science, and public service.
Marie Davidian is the William Neal Reynolds Professor of Statistics at North Carolina State University. She received her Ph.D. in Statistics from the Department of Statistics at the University of North Carolina at Chapel Hill in 1987 under the direction of Raymond J. Carroll. Her interests include statistical models and methods for analysis of longitudinal data, especially nonlinear mixed effects models; methods for handling missing and mismeasured data; methods for analysis of clinical trials and observational studies, including approaches for drawing causal inferences; pharmacokinetic and pharmacodynamic analysis; combining mechanistic mathematical and statistical modeling of disease progression to design treatment strategies and clinical trials; and statistical methods for estimating optimal treatment strategies from data. In addition to her position at NCSU, she is Adjunct Professor of Biostatistics and Bioinformatics at Duke University, and works with the Duke Clinical Research Institute collaborating with clinicians and biostatisticians on problems in cardiovascular disease research. She co-authored the book Nonlinear Models for Repeated Measurement Data and co-edited the book Longitudinal Data Analysis: A Handbook of Modern Statistical Methods. She served as President of the American Statistical Association in 2013.
Sallie Ann Keller is director and professor of statistics for the Social and Decision Analytics Laboratory within the Virginia Bioinformatics Institute at Virginia Tech. Formerly she was professor of statistics at the University of Waterloo and its Vice-President, Academic and Provost. Before that she was director of the IDA Science and Technology Policy Institute in Washington, DC, and earlier still the William and Stephanie Sick Dean of Engineering and professor of statistics at Rice University. Her other appointments include head of the Statistical Sciences group at Los Alamos National Laboratory, professor and director of graduate studies in the Department of Statistics at Kansas State University, and statistics program director at the National Science Foundation. She has served as a member of the National Academy of Sciences Board on Mathematical Sciences and its Applications, has chaired the Committee on Applied and Theoretical Statistics, and is currently a member of the Committee on National Statistics. Her areas of research are uncertainty quantification, computational and graphical statistics and related software and modeling techniques, and data access and confidentiality. She is a national associate of the National Academy of Sciences, a fellow of the American Association for the Advancement of Science, an elected member of the International Statistics Institute, and a member of the JASON advisory group. She is also a fellow and past president of the American Statistical Association. She holds a Ph.D. in statistics from Iowa State University of Science and Technology.
Sally C. Morton is Professor and Chair of the Department of Biostatistics in the Graduate School of Public Health, and directs the Comparative Effectiveness Research Core at the University of Pittsburgh. She holds secondary appointments in the Clinical and Translational Science Institute, and the Department of Statistics. Previously, she was Vice President for Statistics and Epidemiology at RTI International. She spent the first part of her career at the RAND Corporation where she was Head of the Statistics Group, and held the RAND Endowed Chair in Statistics. Her research interests include the use of statistics in evidence-based medicine, particularly meta-analysis. She serves as an evidence synthesis expert for the Agency for Healthcare Research and Quality (AHRQ) RTI–University of North Carolina (UNC) Evidence-Based Practice Center (EPC), collaborates with other EPCs, and was Co-Director of the Southern California EPC. She was a member of the Institute of Medicine (IOM) committee on comparative effectiveness research prioritization, and vice chair of the IOM committee on standards for systematic reviews. Dr. Morton is a member of the National Academy of Sciences Committee on National Statistics (CNSTAT), and Chair-Elect of the Statistics Section of the American Association for the Advancement of Science (AAAS). She was the 2009 President of the American Statistical Association (ASA), is a Fellow of the ASA and of the AAAS, and is an Elected Member of the Society for Research Synthesis Methodology. She recently received the Craig Award for excellence in teaching and mentoring at the Graduate School of Public Health. She holds a Ph.D. in statistics from Stanford University.
Taking a Closer Look at Learning: Factors Associated with Changes in Academic Performance During the Transition from Elementary to Middle School
Abstract
A regional school district in Western Massachusetts observed a significant and consistent drop in children’s standardized test scores in mathematics between 5th and 6th grades for a period of several years. A literature review suggested that social and emotional factors involved in the transition from elementary (5th grade) to middle school (6th grade) contribute to this performance drop. In collaboration with the school district, a research team at Smith College examined school, teacher, and student factors to identify variables associated with this decline in achievement. We did not find student demographic factors, including race and gender, to be significant. We did find teacher licensure of the 6th grade math teacher to be significantly associated with the drop in performance. Students in 6th grade math classes taught by teachers with a professional license had a smaller drop in performance than did students taught by partially licensed or unlicensed teachers. On average, students who had licensed 6th grade teachers actually improved their scores. We also found evidence to suggest that the transition itself is indeed associated with performance. Students in the one K-8 school in the district did not experience a decline in performance between 5th and 6th grades. In fact, they saw a relatively large increase in performance compared to the average change in scores across other schools. However, on average, across almost all 6-8 schools, where a transition was made, students saw a drop in their performance.
Bio:
Sara Stoudt is a rising senior Mathematics and Statistics major at Smith College. She is interested in pursuing a PhD in Statistics and working in government or industry. She is also interested in learning more about the intersection of statistics and machine learning. She has worked on a wide range of projects at Smith including estimating the proportion of the United States that is within one mile of a road and predicting the March Madness bracket. She has also worked as a summer research fellow at the National Institute of Standards and Technology.
The Value of Internships
Abstract
Internships can be a valuable mechanism for gaining exposure to the field of statistics and obtaining practical experience, particularly early in a career. In addition to honing technical skills, interns also have the ability to develop their soft skills in areas such as communication, teamwork, and leadership. Many companies, government agencies, and other organizations offer internships to identify promising employees, to increase the available talent pool in a given field, and to promote educational opportunities for students and early career participants. This session will discuss how internships are structured, how to identify internship opportunities, and how to make the most out of an internship experience, from the perspectives of three different presenters.
Susan Paddock, Senior Statistician at the RAND Corporation and Head of the Statistics Group, will describe the opportunities available through RAND's Summer Associate Program and discuss examples of summer projects.
Jennifer Van Mullekom, Senior Consulting Statistician in DuPont's Applied Statistics Group, will discuss DuPont's extended internship program as well as projects from past interns.
Joanne Wendelberger, Group Leader of the Statistical Sciences Group, will describe internship opportunities at Los Alamos National Laboratory.
Bio:
Susan Paddock, Senior Statistician at the RAND Corporation and Head of the Statistics Group.
Jennifer Van Mullekom, Senior Consulting Statistician in DuPont's Applied Statistics Group.
Joanne Wendelberger is the Group Leader of the Statistical Sciences Group within the Computer, Computational, and Statistical Sciences Division at Los Alamos National Laboratory. She holds a Bachelor's degree in Mathematics from Oberlin College and Master's and Ph.D. degrees in Statistics from the University of Wisconsin. She joined Los Alamos as a Technical Staff Member in 1992, progressing to Project Leader and Deputy Group Leader prior to her current position as Group Leader. She previously worked as a Statistical Consultant at the General Motors Research Laboratories. Her research interests include statistical experimental design, statistical bounding and uncertainty quantification, materials degradation modeling, sampling and analysis issues in large-scale computation and visualization, probabilistic computing, and education modeling. Dr. Wendelberger is a Fellow of the American Statistical Association (ASA). She has served as Chair and Program-Chair of the ASA Section on Physical & Engineering Sciences and has been a member of both the Editorial Board and the Management Committee for Technometrics, a Journal of Statistics for the Physical, Chemical, and Engineering Sciences.
SUBLIME and OASIS for Multiple Sclerosis Lesion Segmentation in Structural MRI
Abstract
Magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients and is essential for evaluating disease-modifying therapies and monitoring disease progression. In practice, lesion load is often quantified by expert manual segmentation of MRI, which is time-consuming, costly, and associated with large inter- and intra-observer variability. We propose Subtraction-Based Logistic Inference for Modeling and Estimation (SuBLIME) and OASIS (a recursive acronym: OASIS is Automated Statistical Inference for Segmentation). SuBLIME is an automated method for segmenting incident lesion voxels between baseline and follow-up MRI studies; OASIS is an automated method for segmenting lesion voxels from a single MRI study. Both methods use carefully intensity-normalized T1-weighted, T2-weighted, fluid-attenuated inversion recovery (FLAIR), and proton density (PD) MRI volumes, and both are logistic regression models trained on manual lesion segmentations. We also present software implementations of SuBLIME and OASIS in which users can upload MRI studies to a website to produce lesion segmentations within minutes.
Bio:
Elizabeth Sweeney is a first-year PhD student in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. She works with Ciprian Crainiceanu, Taki Shinohara, and Ani Eloyan. Her primary research interests are in developing methods for large data sets, particularly neuroimaging data, with applications to multiple sclerosis, brain cancer, and stroke. She is a member of the SMART working group at Johns Hopkins and the PennSIVE working group at the University of Pennsylvania.
Bayesian Sample Size Determination for Informative Hypotheses: The Effect of Model Dimension
Abstract
Researchers often analyze data assuming models with order-restricted parameters. In Bayesian hypothesis testing, incorporating inequality constraints on model parameters has led to "informative hypotheses" and associated priors. It is well known in the frequentist context that specifying inequality constraints in the alternative hypothesis increases power by eliminating ad hoc multiple tests and the associated inflated family-wise error. In Bayesian hypothesis testing, similar improvements are seen in operating characteristics. Additionally, because of model parameter estimation, model dimensionality is a key component of the sample size requirement. The effect of sample size in these problems has received little attention. In this poster session, we investigate informative hypotheses using Bayesian methods. We explore Bayesian sample size determination techniques for a variety of settings in the informative hypothesis context, including model dimension and order violations.
Multivariate Gradient Analysis for Species Presence-Absence in South Africa
Abstract
Spatial gradients, or directional derivatives, are well-defined spatial processes initially elaborated to learn local rates of change of response surfaces under a Gaussian process model. We have extended these processes for post-model-fitting analysis of multivariate Gaussian process models in which both the response and covariate(s) can be assumed to follow a spatial process model. Gradient-based sensitivity analysis enables assessment of spatial variation in relative rates of change between the variables, and gradient-based angular analysis compares directions of maximum gradient between response-covariate pairs. These gradient analyses are illustrated for a non-Gaussian response using presence-absence data from the Cape Floristic Region in South Africa, modeled as a response to elevation. The resulting inference indicates a more sophisticated relationship with elevation than could otherwise be learned through a spatial generalized linear model.
Bio:
Maria A. Terres is a Ph.D. candidate in Statistical Science (expected May 2014) at Duke University working with Alan E. Gelfand and will soon be a Postdoctoral Research Scholar at North Carolina State University working with Montserrat Fuentes. She holds a B.A. in Biology and Mathematics from Bard College at Simon's Rock and an M.A. in Statistics from Columbia University. Her research focuses on spatial-temporal modeling for environmental and ecological applications. Recent projects have included spatial survival models for budburst in cherry blossoms, spatial presence-absence models for snapper fish populations, and multivariate spatial gradient processes for post model fitting sensitivity analyses. As a graduate student at Duke she was awarded the James B. Duke Fellowship and the 2014 Dean's Award for Excellence in Teaching.
Quantitative trait mapping by inclusion of evolutionary relationships among genetic data
Abstract
A central goal in the biological and biomedical sciences is to identify locations along the genome that can explain variation in quantitative traits. Over the last decade, improvements in sequencing technologies coupled with the active development of association mapping methods have made it possible to detect locations (single nucleotide polymorphisms, or SNPs) linked to quantitative traits using data from randomly-sampled individuals. However, existing methods are severely limited by either their inability to consider complex, but biologically-realistic, scenarios or by their disregard of relationships imposed by the evolutionary history among SNPs during analysis. By using the evolutionary history among SNPs to incorporate a variance-covariance structure into analysis, the proposed association mapping method relaxes the assumption of independence among observations assumed by most classical statistical methods. Other previous methods have shown increased performance by accounting for the evolutionary history among SNPs during analysis, but are not flexible enough for application to complex data types. By using a broad-scale estimate of the evolutionary history among SNPs in a flexible modeling framework, the proposed method is computationally feasible and can be used to detect SNPs associated with quantitative trait variation in a variety of complex scenarios.
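In its simplest form, the covariance idea reduces to generalized least squares with a relatedness matrix in place of an identity covariance. Below is a minimal sketch with a toy block-structured relatedness matrix standing in for the evolutionary estimate the method actually uses; all values are hypothetical.

```python
import numpy as np

def gls(X, y, V):
    """Generalized least squares: account for a covariance structure V
    among observations (here, a stand-in for shared ancestry)."""
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    cov = np.linalg.inv(X.T @ Vinv @ X)
    return beta, cov

rng = np.random.default_rng(2)
n = 100
# Toy relatedness matrix: ten related groups of ten individuals each
V = 0.5 * np.eye(n) + 0.5 * np.kron(np.eye(10), np.ones((10, 10)))
snp = rng.integers(0, 3, size=n).astype(float)      # genotype coded 0/1/2
X = np.column_stack([np.ones(n), snp])
y = X @ np.array([1.0, 0.4]) + np.linalg.cholesky(V) @ rng.normal(size=n)

beta, cov = gls(X, y, V)
print(f"SNP effect: {beta[1]:.3f} (SE {np.sqrt(cov[1, 1]):.3f})")
```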
Bio:
Katherine is an Assistant Professor in the Department of Statistics at the University of Kentucky. In 2013, she received her Ph.D. in Statistics from The Ohio State University under the advisement of Professor Laura Kubatko. As an undergraduate student, she received degrees in Mathematics and Biology from the University of Kentucky. Her current research interests are in statistical genetics, specifically phylogenetic analysis of quantitative trait variation. Katherine also thoroughly enjoys teaching statistics and was recognized with a university-wide Graduate Associate Teaching Award at Ohio State in 2013.
Modeling Mental Health Recovery Using a Hierarchical Linear Growth Model
Abstract
Statistically modeling mental health recovery for individuals with severe and persistent mental health challenges has traditionally been accomplished using the medical model of recovery, which encompasses the elimination or reduction of symptoms through medication and/or hospitalization. A more holistic approach gaining support among both clinicians and consumers is the Recovery-Centered Collaborative Approach to mental health recovery (the RCC model), a person-centered approach integrating medication with spirituality, hope, physical wellbeing, life skills, strategies for managing symptoms, and strong community and family support. Methods: Data from CooperRiis Healing Farm, a residential treatment facility specializing in the RCC approach for individuals with severe and persistent mental health conditions, were examined using a Hierarchical Linear Growth Model (HLGM) to assess recovery over a twelve-month period. Data included variables such as age at admission, gender, and primary diagnosis. Results: The level 1 results of the HLGM provided evidence of significant positive growth in recovery scores over time. The level 2 results revealed that individuals with a diagnosis of personality disorder or depression had admission scores significantly lower than those with any other diagnosis. Variance in growth over time was not explained by any of the level 2 variables.
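A model of this shape can be fit with off-the-shelf mixed-model software. The sketch below simulates hypothetical longitudinal recovery scores and fits a random-intercept, random-slope growth model with a level 2 diagnosis predictor using statsmodels; all variable names and effect sizes are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_people, n_waves = 80, 5
ids = np.repeat(np.arange(n_people), n_waves)
months = np.tile(np.arange(n_waves) * 3, n_people)     # 0, 3, ..., 12 months
dep = rng.binomial(1, 0.3, size=n_people)[ids]         # hypothetical diagnosis

# Person-specific random intercepts and slopes; lower baseline for depression
u0 = rng.normal(0, 5, size=n_people)[ids]
u1 = rng.normal(0, 0.3, size=n_people)[ids]
score = 50 - 8 * dep + u0 + (1.2 + u1) * months + rng.normal(0, 3, size=len(ids))

df = pd.DataFrame({"id": ids, "months": months, "dep": dep, "score": score})
# Level 1: growth over time; level 2: diagnosis predicting the intercept
model = smf.mixedlm("score ~ months + dep", df, groups=df["id"],
                    re_formula="~months")
print(model.fit().summary())
```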
Bio:
Karen Traxler, M.S., is completing her Ph.D. in Applied Statistics and Research Methods at the University of Northern Colorado. Her research interests encompass applied methodological approaches in the behavioral, educational, and social sciences, particularly measurement instrument design and development, including but not limited to assessing the psychometric properties of scores obtained on measurement instruments, using structural equation modeling, conducting hierarchical linear modeling/multi-level modeling, and evaluating latent trait growth curve models.
Using Change Point Models to Assess Drivers' Navigation of Rural Curves
Abstract
The second Strategic Highway Research Program (SHRP2) is a partnership of federal, state, and private transportation organizations which aims to increase the safety, efficiency, productivity, and reliability of highway systems in the United States. The Institute for Transportation at Iowa State University plays an important role in this program by investigating driver behavior when navigating rural curves, as curves have been shown to have a crash rate of nearly three times that of tangent sections. A key element in this research is determining how drivers change their behavior when navigating a curve. In this poster, I focus on driver behavior when approaching a curve, and specifically on pinpointing at what point before a curve drivers react to it. I use frequentist and Bayesian methods to fit change point models to over 100 driver traces in order to determine how far upstream from a curve drivers should change their behavior in order to navigate the curve safely. This research has important implications for the placement of chevrons and other signage on rural roads that could help reduce the rate of accidents on rural highways.
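A minimal frequentist version of the change point fit is a grid search over candidate break locations, with a constant mean before the break and a linear trend after. The sketch below applies this to one simulated driver trace; the speeds, distances, and break structure are hypothetical, not the study's data or models.

```python
import numpy as np

def fit_changepoint(dist, speed):
    """Grid-search a single change point in speed as a function of distance
    upstream of the curve: constant mean before, linear trend after."""
    best = (np.inf, None)
    for i in range(5, len(dist) - 5):
        sse = np.sum((speed[:i] - speed[:i].mean()) ** 2)
        A = np.column_stack([np.ones(len(dist) - i), dist[i:]])
        coef, res, *_ = np.linalg.lstsq(A, speed[i:], rcond=None)
        sse += res[0] if res.size else np.sum((speed[i:] - A @ coef) ** 2)
        if sse < best[0]:
            best = (sse, dist[i])
    return best[1]

# Hypothetical driver trace: constant speed, then braking 150 m out
rng = np.random.default_rng(6)
dist = np.linspace(400, 0, 200)                    # metres upstream of curve
speed = np.where(dist > 150, 100.0, 100 - 0.2 * (150 - dist))
speed = speed + rng.normal(0, 1, size=200)
print(f"estimated reaction point: {fit_changepoint(dist, speed):.0f} m upstream")
```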
Managing Your Career and Child Rearing
Abstract
So how do you manage your career and raise a family at the same time? There is no right answer to this question—only what is right for you. So what helps you navigate this path? For the panelists, it has been the wisdom of those who have gone before and a healthy dose of trial and error. This panel will give you the opportunity to hear perspectives from both academia and industry. Furthermore, we will highlight a framework for making decisions about how to manage your career and family. This framework includes IWIKs – "I Wish I Knew…" before I embarked on this journey. We will also detail a series of key requirements essential to managing career and family, and discuss alternative ways to meet those requirements. The goal is to get you thinking about alternatives that make you successful in both areas without sacrificing the principles that are most important to you. We look forward to the lively discussion that follows and invite other experienced working parents to engage in the dialogue with us.
Bio:
Jennifer H. Van Mullekom is a Senior Consulting Statistician in DuPont's Applied Statistics Group, supporting the DuPont Protection Technologies business. She currently provides statistical leadership to the Tyvek(R) Medical Packaging Transition Project in the areas of product development, commercialization, and regulatory affairs. Jen is also a Six Sigma Master Black Belt and teaches various Six Sigma courses corporately. She has been involved in the ASA's Section on Physical and Engineering Sciences since 1998 and has held various positions including Publicity Chair, Marquardt Speaker Chair, and Section Chair. She co-developed the ASA's course on "Effective Presentations for Statisticians" as a member of past ASA President Bob Rodriguez's Career Success Factors Task Group. Other contributions to the ASA include JSM presentations and conference committee memberships, including the Conference on Statistical Practice. Jen is a DuPont Associate Fellow. She holds three US patents and has worked at Lubrizol and Capital One in addition to DuPont. Her statistical areas of interest include equivalence testing, regression modeling, response surface designs, and mixed models. Jennifer received her PhD and MS from Virginia Tech in 1998 and 1995, respectively. She holds undergraduate degrees from Concord University in Mathematics (1993) and Mathematics Education (1994).
Merlise A. Clyde is Professor and Chair of Statistical Science at Duke University. Her research interests include Bayesian model selection and methods that address model uncertainty in high-dimensional problems, Bayesian nonparametrics using Lévy random fields, wavelets, and experimental design. Her areas of application include air pollution and health effects, astronomy, bioinformatics, environmental sciences, medical devices, and neuroscience. She holds one US patent, and is the recipient of the Leonard J. Savage Ph.D. Thesis Award and an NSF CAREER Award. She is a Fellow of the American Statistical Association and Past President of the International Society for Bayesian Analysis.
From Drug Use to Dependence: A Multiparametric Approach
Abstract
Escalation in the frequency of drug self-administration is a hallmark of developing drug dependence. In NIH pre-clinical studies with species other than humans, the body of evidence implicates several processes that might promote this escalation, including development of tolerance to drug effects, sensitization, and possibly an allostatic reduction in the reward function subserved by each successive drug self-administration occasion. We posit that parametric mathematical models should be useful in the prediction of drug dose-effect curves (DECs) in response to these processes. Extending this line of research to human clinical research, the focal point of our work is estimates of the probability of being a case of drug dependence (DD), expressed as a function of the count of days or occasions of drug use. In these first steps, the DECs are plotted with the estimated probability of being a DD case on the y-axis and the most recent count of days or occasions of drug use on the x-axis, under an assumption of no feedback loops. That is, we begin with a simple model that ignores a likely feedback loop we mentioned in prior work, namely the likely influence of the drug dependence state on what happens next, including the possibility that becoming drug dependent starts to drive up the count of occasions of drug use. To illustrate our approach, we turn to recent empirical data from the National Surveys on Drug Use and Health (NSDUH) and generate DECs for different psychoactive drug subtypes, aiming to shed light on drug-specific variations in the DEC curves as well as potential male-female variations.
Statistical Model Building, Machine Learning, and the Ah-Ha Moment
Abstract
This talk will partly follow an article of the same title, which is to appear in "Past, Present and Future of Statistical Science," invited by COPSS for their 50th Anniversary Volume (available in the technical reports list on my website). There and here I describe favorite parts of my research career over time, which involved serendipitous interactions with colleagues and students that provided a solution ("the ah-ha moment") to some interesting problems. I will bring the talk into the present by noting some recent work on merging pairwise distance information with direct attribute information via the use of reproducing kernel Hilbert space methods and distance correlation.
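Distance correlation itself is compact enough to sketch: double-center each sample's pairwise distance matrix and correlate the two. The example below (all values synthetic) shows it detecting a nonlinear dependence that Pearson correlation misses.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_correlation(x, y):
    """Empirical distance correlation (Szekely, Rizzo & Bakirov, 2007)
    between samples x and y with the same number of rows."""
    def centered(z):
        z = np.asarray(z, dtype=float).reshape(len(z), -1)
        d = squareform(pdist(z))
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

rng = np.random.default_rng(9)
x = rng.normal(size=500)
y = x ** 2 + 0.1 * rng.normal(size=500)        # nonlinear dependence
print(f"Pearson r: {np.corrcoef(x, y)[0, 1]:.2f}")          # near zero
print(f"distance correlation: {distance_correlation(x, y):.2f}")  # clearly positive
```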
Bio:
Grace Wahba is the I.J. Schoenberg-Hilldale Professor of Statistics at the University of Wisconsin-Madison, and also holds appointments in the Computer Sciences and Biostatistics & Medical Informatics departments. She has been at Madison since 1967, with the exception of sabbaticals at Oxford, the Weizmann Institute, the Technion, the Australian National University, and Yale. She received her BA from Cornell in 1956, her MS from the University of Maryland-College Park in 1962, and her PhD from Stanford in 1966 while working full time at IBM. She was elected to the National Academy of Sciences in 2000 and received an honorary D.Sc. from the University of Chicago in 2007. According to the Mathematics Genealogy Project she has had 34 PhD students and has 196 scientific descendants, many with impressive academic careers, at least 6 of whom are or have been chairs of statistics or related departments, and others pursuing interesting non-academic careers at Zillow, Facebook, pharma, finance, and elsewhere. When not doing statistics she enjoys bicycle touring, ballroom and folk dancing, and X-C skiing with her long-time partner David Callan, a combinatorialist, and keeping up with the exploits of her three grandchildren.
Know Your Power
Abstract
Keynote
Bio:
Katherine Wallman serves as Chief Statistician at the United States Office of Management and Budget. She provides policy oversight, establishes priorities, advances long-term improvements, and sets standards for a Federal statistical establishment that comprises more than 80 agencies spread across every cabinet department. Ms. Wallman represents the U.S. Government in international statistical organizations, including the United Nations and the Organization for Economic Cooperation and Development. During her tenure as the United States’ Chief Statistician, Ms. Wallman has increased collaboration among the agencies of the U.S. statistical system, fostered improvements in the scope and quality of the Nation’s official statistics, strengthened protections for confidential statistical information, and initiated changes that have made the products of the system more accessible and usable.
Prior to 1992, Ms. Wallman served for more than a decade as Executive Director of the Council of Professional Associations on Federal Statistics. Ms. Wallman also has worked in the Office of Federal Statistical Policy and Standards and the National Center for Education Statistics.
Ms. Wallman, twice honored as a Presidential Meritorious Executive, is an elected member of the International Statistical Institute, a Fellow of the American Statistical Association (ASA) and the American Association for the Advancement of Science, and a Founder Member of the International Association for Official Statistics. In 1992, she served as ASA President, and in 2007 was honored with the association’s Founders Award. She is the recipient of the Robert G. Damus Award for significant, sustained contributions to the integrity and excellence of OMB (2009), and the Population Association of America’s Excellence in Public Service Award (2011). At the international level, Ms. Wallman served as Chairman of the U.N. Statistical Commission during 2004 and 2005; as Chairman of the Conference of European Statisticians, U.N. Economic Commission for Europe, from 2003 to 2007; as a Vice Chairman of the Statistics Committee, Organization for Economic Cooperation and Development from 2009 to 2011.
Model Based Clustering via Multinomial Logistic Model for Gaussian Data
Abstract
Most methods for clustering data are non-parametric, but parametric model-based clustering methods are becoming more common. Model-based clustering allows incorporation of the uncertainty of parameter estimates and also allows estimation of uncertainty about cluster membership for each observation. This uncertainty is not traditionally estimated in non-parametric clustering approaches. There are many potential applications of model-based clustering, including social network analysis, gene expression, and image analysis. In some applications it makes sense to incorporate covariates to predict cluster assignment. For clustering spatial location data, we often find that the correlation between important covariates and locations (the response) is very low. In this case, previously proposed model-based clustering approaches built on linear regression do not work well. We propose a new model using multinomial logistic regression to capture the relation between covariates and locations through the cluster assignments. We will also introduce an extended version of this model that allows some data to be classified into a noise cluster. We propose a Classification EM algorithm (CEM) approach and a stochastic version of CEM to estimate parameters for our model-based clustering with the classification likelihood.
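A schematic version of the CEM idea is sketched below for a one-dimensional Gaussian mixture whose mixing probabilities depend on a covariate through a multinomial logistic model. The data, initialization, and settings are all assumptions for illustration, not the authors' implementation.

```python
# Classification EM sketch: alternate hard assignments (C-step) with updates of
# the Gaussian components and a multinomial logistic gating model (M-step).
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, K = 600, 2
x = rng.uniform(-2, 2, size=(n, 1))                     # covariate
z = rng.binomial(1, 1 / (1 + np.exp(-2 * x[:, 0])))     # true memberships
y = rng.normal(np.where(z == 1, 3.0, -3.0), 1.0)        # Gaussian responses

labels = (y > np.median(y)).astype(int)                 # crude initialization
for _ in range(50):
    mu = np.array([y[labels == k].mean() for k in range(K)])
    sd = np.array([y[labels == k].std() + 1e-6 for k in range(K)])
    gate = LogisticRegression().fit(x, labels)          # gating-model M-step
    log_pi = gate.predict_log_proba(x)                  # covariate-driven priors
    log_dens = norm.logpdf(y[:, None], mu, sd)          # component log densities
    new = (log_pi + log_dens).argmax(axis=1)            # C-step: hard assignment
    if np.array_equal(new, labels):
        break
    labels = new
print(np.round(mu, 2))   # component means recovered near -3 and 3
```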
Bio:
I am a PhD candidate in Statistics at Colorado State University, working on my PhD thesis with my advisor, Professor Jennifer Hoeting. I earned an M.S. in Statistics from CSU in 2012 and a B.S. in Mathematics from Southeast University in China in 2010. My research interests focus on model-based clustering, statistical computing, and Bayesian statistics. I speak Mandarin and English.
Improving the performance of cross validation in kernel density estimation
Abstract
Cross-validation has long been a popular method for selecting tuning parameters in nonparametric and semiparametric models. However, it suffers from high variability and tends to overfit the data. We consider possible improvements of leave-one-out cross-validation in kernel density estimation, and propose a subsampling-extrapolation procedure that dramatically reduces the variability of the conventional bandwidth selector. It is based on first evaluating the risk at a subsample size m (with m less than the sample size n), and then extrapolating the optimal bandwidth from m to n. We note that the proposed first-order extrapolated bandwidth selector is equivalent to the rescaled bagging CV method in Hall and Robinson (2009) if one sets the bootstrap size equal to the subsample size. However, our simple expression of the U-statistic form of the risk estimate enables us to compute the aggregated risk much more efficiently than bootstrapping. In addition, we consider a second-order extrapolation as an alternative to improve the approximation of the true optimal bandwidth. An extensive simulation study shows that the subsampling-extrapolation bandwidth selectors outperform the conventional bandwidth selector across a wide selection of distributions. To select the optimal subsample size in the subsampling stage, we propose a nested cross-validation method, a data-driven, automatic selection criterion.
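The rate-based version of the first-order step can be sketched as follows: choose a bandwidth by least-squares CV on a subsample of size m, then rescale by (m/n)^{1/5}, the usual optimal-bandwidth rate for second-order kernels. This is a generic illustration under that rate assumption, not necessarily the paper's exact estimator.

```python
# Subsample-then-extrapolate bandwidth selection for a Gaussian-kernel KDE.
import numpy as np

def lscv(h, x):
    """Least-squares CV criterion for a Gaussian-kernel KDE at bandwidth h."""
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    # closed form for the integral of fhat^2 (Gaussian convolution)
    int_f2 = (np.exp(-0.25 * d**2) / np.sqrt(4 * np.pi)).sum() / (n**2 * h)
    K = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)
    loo = (K.sum(axis=0) - K.diagonal()) / ((n - 1) * h)   # fhat_{-i}(x_i)
    return int_f2 - 2 * loo.mean()

rng = np.random.default_rng(7)
x = rng.normal(size=2000)
n, m = len(x), 200
hs = np.linspace(0.05, 1.0, 60)

sub = rng.choice(x, size=m, replace=False)           # subsampling stage
h_m = hs[np.argmin([lscv(h, sub) for h in hs])]      # CV-optimal at size m
h_n = h_m * (m / n) ** 0.2                           # first-order extrapolation
print(h_m, h_n)   # in practice one would average h_m over many subsamples
```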
Bio:
Dr. Qing Wang is an Assistant Professor of Statistics in the Department of Mathematics and Statistics at Williams College, Massachusetts. She received her Ph.D. in Statistics from Pennsylvania State University in 2012 under the supervision of Dr. Bruce G. Lindsay. Since then, she has been working at Williams College, where she has taught various courses in Statistics and developed an upper-level course in Nonparametric Statistics. She enjoys spending time with and working with students; she has advised multiple senior colloquium projects and is currently supervising two senior students' theses in Statistics at Williams College. In addition, she has continued her research collaboration with Dr. Bruce Lindsay on topics related to U-statistics and cross-validation methodologies. She also collaborates with statisticians at Penn State and Georgia State University.
Bayesian Partially Ordered Probit and Logit Models with an Application to Course Redesign
Abstract
Large entry-level courses are commonplace at public 2- and 4-year institutions of higher education (IHEs) across the United States. Low pass rates in these entry-level courses, coupled with tight budgets, have put pressure on IHEs to look for ways to teach more students more effectively at a lower cost. Efforts to improve student outcomes in such courses are often called "course redesigns." The difficulty arises in trying to determine the impact of a particular course redesign; true randomized controlled trials are expensive and time-consuming, and few IHEs have the resources or patience to implement them. As a result, almost all evaluations of efforts to improve student success at scale rely on observational studies. At the same time, standard multilevel models may be inadequate to extract meaningful information from the complex and messy sets of student data available to evaluators because they throw away information by treating all passing grades equally. We propose a new Bayesian approach that keeps all grading information: a partially ordered multinomial probit model with random effects fit using a Markov chain Monte Carlo algorithm, and a logit model that can be fit with importance sampling. Simulation studies show that the Bayesian partially ordered probit/logit models work well, and the parameter estimation is precise in large samples. We also compare these models with standard models using mean squared error and the area under the receiver operating characteristic (ROC) curve. We apply the new models to evaluate the impact of a course redesign at a large public university using students' grade data from the Fall 2012 and Spring 2013 semesters.
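The latent-variable MCMC machinery behind such probit models can be illustrated with the classic Albert-Chib Gibbs sampler for a plain binary probit. The partially ordered extension is more involved, so treat this only as a sketch of the building block, with made-up data and a vague prior chosen for illustration.

```python
# Albert-Chib Gibbs sampler for a binary probit: alternate truncated-normal
# draws of the latent utilities with a conjugate Gaussian update for beta.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)
n, p = 400, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.3, 1.2])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

beta = np.zeros(p)
V0inv = np.eye(p) / 100.0                   # N(0, 100 I) prior on beta
V = np.linalg.inv(V0inv + X.T @ X)          # posterior covariance (X is fixed)
draws = []
for it in range(2000):
    mu = X @ beta
    # latent z_i ~ N(mu_i, 1), truncated to (0, inf) if y_i = 1, else (-inf, 0)
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, random_state=rng)
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    draws.append(beta)
print(np.mean(draws[500:], axis=0))         # posterior mean approx. beta_true
```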
Bio:
I am graduating with my PhD in Statistics in May 2014 from the University of New Mexico. I have a Master's degree in Statistics from the University of New Mexico and another Master's degree in Philosophy from Nanjing University in China.
Lag selection for single-index time series models
Abstract
Many time series exhibit nonlinearity. Nonlinear autoregressive models are popular in time series analysis because of their great flexibility. In such modeling, one often needs to include many lagged variables to capture the persistence of a time series. Sometimes the lag length can be very long, even close to the length of the series. This "curse of dimensionality" is challenging for nonparametric modeling, and there is a need to select significant explanatory lagged variables. The single-index model is an appealing and fundamental tool for handling the curse of dimensionality. In this paper we consider nonlinear single-index time series models and propose a method to select significant explanatory lagged variables. We apply polynomial spline basis function expansion and the smoothly clipped absolute deviation (SCAD) penalty to perform estimation and lag selection in the framework of high-dimensional time series. Under stationarity and strong mixing conditions, the resulting estimators enjoy the "oracle" property even when the number of index parameters tends to infinity as the sample size increases. An efficient iterative algorithm is developed to identify the lags and estimate the coefficients simultaneously. Both numerical studies and a real data application confirm the good performance of the proposed method.
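For reference, the SCAD penalty named above has the usual closed form of Fan and Li (2001). The sketch below evaluates it elementwise and is purely illustrative; the paper applies the penalty to groups of spline coefficients, which is not reproduced here.

```python
# The SCAD penalty: lasso-like near zero, quadratic blend in the middle,
# constant far out (so large coefficients are not biased toward zero).
import numpy as np

def scad(beta, lam, a=3.7):
    """Elementwise SCAD penalty value for coefficients beta."""
    b = np.abs(beta)
    small = lam * b                                          # |b| <= lam
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |b| <= a*lam
    big = lam**2 * (a + 1) / 2                               # |b| > a*lam
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, big))

beta = np.linspace(-4, 4, 9)
print(scad(beta, lam=1.0))   # flattens for |beta| > a*lam, unlike the lasso
```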
Bayesian Large-Scale Multiple Testing for Dependent Data
Abstract
We consider the problem of large-scale multiple testing under temporal dependence. The observed data are assumed to be generated from an underlying two-state hidden Markov model. Bayesian methods are applied to develop a testing algorithm that minimizes the false negative rate while controlling the false discovery rate, comparable to Sun and Cai (2009). Simulation studies show the similarities and differences between the EM approach of Sun and Cai (2009) and the Bayesian approach when the alternative has a simple or a mixture distribution. A nonparametric Bayesian model is proposed for the case where the number of components in the mixture alternative is unknown. The model is applied to influenza-like illness (ILI) data.
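The core mechanics can be illustrated at toy scale: run forward-backward to get each time point's posterior probability of the null state, then reject the hypotheses with the smallest posterior null probabilities until the estimated FDR reaches the target, in the spirit of Sun and Cai's LIS procedure. The sketch below fixes the HMM parameters at known values, whereas the paper infers them.

```python
# Two-state HMM multiple testing sketch: forward-backward posteriors, then a
# step-up rule on posterior null probabilities to control the FDR.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
T = 2000
A = np.array([[0.95, 0.05], [0.20, 0.80]])   # transitions; state 1 is signal
states = np.zeros(T, dtype=int)
for t in range(1, T):
    states[t] = rng.choice(2, p=A[states[t - 1]])
x = rng.normal(2.0 * states, 1.0)            # N(0,1) under null, N(2,1) under alt

em = np.column_stack([norm.pdf(x, 0, 1), norm.pdf(x, 2, 1)])
alpha, beta = np.zeros((T, 2)), np.ones((T, 2))
alpha[0] = np.array([0.8, 0.2]) * em[0]
alpha[0] /= alpha[0].sum()
for t in range(1, T):                        # scaled forward pass
    alpha[t] = (alpha[t - 1] @ A) * em[t]
    alpha[t] /= alpha[t].sum()
for t in range(T - 2, -1, -1):               # scaled backward pass
    beta[t] = A @ (em[t + 1] * beta[t + 1])
    beta[t] /= beta[t].sum()
post_null = alpha[:, 0] * beta[:, 0] / (alpha * beta).sum(axis=1)

order = np.argsort(post_null)                # reject smallest null probabilities
fdr_hat = np.cumsum(post_null[order]) / np.arange(1, T + 1)
k = int(np.max(np.where(fdr_hat <= 0.10)[0], initial=-1)) + 1
reject = order[:k]
print(k, "rejections; realized FDP:", float((states[reject] == 0).mean()))
```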
Bio:
Dr. Xia Wang is an assistant professor in the Department of Mathematical Sciences at the University of Cincinnati. She received her doctoral degree in statistics from the University of Connecticut in 2009. She worked as a Postdoctoral Fellow at the National Institute of Statistical Sciences (NISS) 2009-2011. Her research interests include Bayesian methodology and computation, categorical data analysis, spatial statistics and spatial-temporal statistics, and applications of statistical models in genomics and proteomics data.
Bayesian Partially Ordered Multinomial Probit and Logit Models with an Application to Course Redesign
Abstract
Large entry-level courses are commonplace at public 2- and 4-year institutions of higher education (IHEs) across the United States. Low pass rates in these entry-level courses, coupled with tight budgets, have put pressure on IHEs to look for ways to teach more students more effectively at a lower cost. Efforts to improve student outcomes in such courses are often called "course redesigns." The difficulty arises in trying to determine the impact of a particular course redesign; true randomized controlled trials are expensive and time-consuming, and few IHEs have the resources or patience to implement them. As a result, almost all evaluations of efforts to improve student success at scale rely on observational studies. At the same time, standard multilevel models may be inadequate to extract meaningful information from the complex and messy sets of student data available to evaluators because they throw away information by treating all passing grades equally. We propose a new Bayesian approach that keeps all grading information: a partially ordered multinomial probit model with random effects fit using a Markov chain Monte Carlo algorithm, and a logit model that can be fit with importance sampling. Simulation studies show that the Bayesian partially ordered probit/logit models work well, and the parameter estimation is precise in large samples. We also compare these models with standard models using mean squared error and the area under the receiver operating characteristic (ROC) curve. We apply the new models to evaluate the impact of a course redesign at a large public university using students' grade data from the Fall 2012 and Spring 2013 semesters.
Bio:
I earned my PhD (2014) and MS (2008) in Statistics from the University of New Mexico. I received my first Master's degree, in Philosophy, from Nanjing University in China in 2001. I have been teaching math and statistics and conducting education program evaluation at the University of New Mexico for years. I also work for Albuquerque Public Schools on student performance assessment.
Big Opportunities in the Next Decade: SAMSI and NISS
Abstract
SAMSI and NISS are incubators of big ideas for cutting-edge research. In the past decade, they have been magnets attracting both established and young researchers from academia, industry, national laboratories, and government. Their programs target high-impact, often cross-disciplinary research areas involving the statistical sciences and applied mathematics. They provide a unique environment, facilities, and support to foster collaborative work on central problems in those areas. This panel will discuss how these institutes work and highlight research projects that grew from their recent programs.
Bio:
Jessi Cisewski is currently a visiting assistant professor in the Department of Statistics at Carnegie Mellon University. She received her Ph.D. in Statistics in May of 2012 from the Department of Statistics and Operations Research at the University of North Carolina at Chapel Hill. She was a SAMSI graduate fellow during the 2011 - 2012 program on Uncertainty Quantification and has participated in several SAMSI programs.
Xia Wang is an assistant professor in the Department of Mathematical Sciences at the University of Cincinnati. She received her doctoral degree in statistics from the University of Connecticut in 2009. She worked as a Postdoctoral Fellow at the National Institute of Statistical Sciences (NISS) 2009-2011. Her research interests include Bayesian methodology and computation, categorical data analysis, spatial statistics and spatial-temporal statistics, and applications of statistical models in genomics and proteomics data.
Bailey Fosdick completed her Ph.D. last year in the Department of Statistics at the University of Washington. She is currently a postdoctoral fellow at Statistical and Applied Mathematical Sciences Institute and Duke University, and in the Fall she will be joining the faculty in the Department of Statistics at Colorado State University. Her primary research interests lie in covariance models for multiway data, social network analysis, and applications of Bayesian methodology in the social sciences.
Analyses Involving the Area Under the Curve Summary Measure
Abstract
The area under the curve (AUC) summary measure has seen extensive use in the pharmaceutical sciences and is gaining popularity in animal sciences research as well. Summary measures reduce multiple measurements on a subject to a single measurement; a typical linear model-based analysis then follows, using the summary measure as either an independent or a dependent variable. Analyses that use summary measures induce measurement error into the model, which should be accounted for; however, the methods that do so require an estimate of the measurement error in the AUC. A simulation study is used to explore two different measurement error estimates, based on leave-one-out cross-validation and bootstrapping. An application to an equine data set that uses AUC summary measures demonstrates the potential advantages of accounting for measurement error in an AUC-based analysis.
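A small sketch of the two ingredients for one subject: a trapezoidal-rule AUC over the measurement times, and a bootstrap estimate of the measurement error in that AUC. The data, the quadratic smooth, and the residual-bootstrap scheme are all assumptions for illustration, not the equine analysis.

```python
# Trapezoidal AUC for one subject's profile, plus a residual bootstrap to
# gauge the measurement error carried into the AUC summary measure.
import numpy as np
from scipy.integrate import trapezoid

rng = np.random.default_rng(5)
times = np.array([0.0, 1.0, 2.0, 4.0, 8.0])                      # sampling times
y = 10 * np.exp(-0.4 * times) + rng.normal(0, 0.8, times.size)   # one subject

auc = trapezoid(y, times)                    # the AUC summary measure

smooth = np.polyval(np.polyfit(times, y, 2), times)   # crude profile smooth
resid = y - smooth
boot = [trapezoid(smooth + rng.choice(resid, resid.size), times)
        for _ in range(999)]
print(auc, np.std(boot))   # AUC and an estimate of its measurement error
```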
Recruiting and Retaining Women and Minorities in Statistics
Abstract
The field of Statistics is growing; however, there remains a shortage of women and minorities who earn advanced degrees and attain senior positions. This panel will examine best practices for promoting inclusiveness in Statistics. We will present graduation rates of underrepresented groups in doctoral Statistics programs. The panel will also discuss factors contributing to the success of women and minorities in the profession, specifically in academia. Additionally, we will highlight initiatives to increase doctoral recipients and senior faculty from underrepresented groups.
Bio:
Kimberly S. Weems earned a B.S. in Mathematics from Spelman College and her MA and PhD degrees from the University of Maryland, College Park. Upon graduation, she accepted a post-doctoral research position in the Statistics Department at North Carolina State University, where she later joined the faculty. She also held a visiting research position at the University of Alicante in Spain. Her research interests include measurement error models and statistics education. Weems has received numerous honors and awards, notably the Outstanding Faculty Award from the College of Sciences. She is currently co-Director of Statistics Graduate Programs at NC State.
Marcia Gumpertz is Assistant Vice Provost for Faculty Diversity and Professor of Statistics at North Carolina State University. In her current role she focuses on promoting the careers and enhancing the climate for women and minority faculty at NC State. As part of this effort she served as principal investigator on NC State's ADVANCE Developing Diverse Departments project, which created climate workshops for department heads, leadership workshops for faculty, and a cadre of faculty committed to creating a welcoming experience for women faculty. Her statistical interests lie in applied statistics, mixed models, spatial statistics and design of experiments. She is coauthor, with Francis Giesbrecht, of the book Planning, Construction, and Statistical Analysis of Comparative Experiments. She holds a bachelors degree in Philosophy from the University of California at Berkeley, a masters degree in Statistics from Oregon State University, and a PhD in Statistics from North Carolina State University, and is a Fellow of the American Statistical Association.
Gender Difference in Falls among Adults Treated in Emergency Departments and Outpatient Clinics
Abstract
This study examined the impact of gender on the age-related increase in falls, and in injurious falls resulting in head injuries or fractures, among adults, using data from both emergency department and clinic visits. We also estimated the percentage of falls treated at points of entry outside of emergency departments. The study population consisted of 259,611 adults seen at emergency department, inpatient, and/or outpatient facilities between January 2007 and June 2012 at a US medical center. When both emergency department and clinic visit data were used, medically consulted falls and injurious falls resulting in head injuries or fractures increased with age for females aged ≥ 18 years. For males, these rates declined, reached their lowest point at ages 65-74, and then increased again. Thirty-nine percent of females and 63% of males had their falls treated in clinics rather than emergency departments. A gender disparity in medically consulted falls and related injuries exists among adults. Age- and gender-targeted fall injury prevention interventions need further development. Significant numbers of fall-related injuries were treated at clinics; future research is needed to determine whether fall injury surveillance should be expanded to include outpatient clinics.
Bio:
Feifei Wei, PhD, is an Associate Professor in the Department of Biostatistics, Fay W. Boozman College of Public Health, University of Arkansas for Medical Sciences.
A Monte Carlo approximation to model error
Abstract
We study Bayesian hierarchical models in environmental applications where likelihood calculations are infeasible for use in inference procedures such as Markov chain Monte Carlo. In this case, an approximation to the likelihood may be used. One such instance is when the likelihood depends on output from a computationally demanding computer model. A statistical emulator may be used to approximate the computer model output. In this work, we study the discrepancy between posterior distributions obtained by using the ‘true’ model versus the approximating model. We quantify the model error in the resulting posterior distributions using an estimate of the Kullback-Leibler divergence. Our estimator is general enough to be used in situations where the true model likelihood can only be evaluated a moderate number of times. We illustrate our methodology with ecological applications.
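The kind of Monte Carlo Kullback-Leibler estimate described above can be sketched in one dimension: draw from the cheap approximating posterior and average the log density ratio against the expensive "true" model, so the true likelihood is evaluated only at those draws. This toy Gaussian example with made-up densities illustrates the mechanics, not the paper's estimator.

```python
# Monte Carlo estimate of KL(approx || true) between two posteriors, checked
# against the closed-form Gaussian KL divergence.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(13)

true_post = norm(0.0, 1.0)      # posterior under the expensive 'true' model
approx_post = norm(0.1, 1.1)    # posterior under the emulator-based model

m = 300                         # a 'moderate number' of true-model evaluations
theta = approx_post.rvs(m, random_state=rng)
kl_hat = np.mean(approx_post.logpdf(theta) - true_post.logpdf(theta))

# closed form for Gaussians, to check the Monte Carlo estimate
kl_exact = np.log(1.0 / 1.1) + (1.1**2 + 0.1**2) / 2 - 0.5
print(kl_hat, kl_exact)
```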
Bio:
I am originally from Chillicothe, Ohio. I earned my bachelor’s degree in Mathematics Education from Shawnee State University in Portsmouth, Ohio, where I was also a member of the varsity women’s soccer team. Currently, I am in my fourth year of graduate school at The Ohio State University. My research, advised by Prof. Radu Herbei, is in the area of uncertainty quantification. I am also interested in Bayesian modeling, spatial statistics, and environmental applications. I am very fortunate to have had many teaching opportunities in my time at Ohio State: I began as a recitation TA for an introductory statistics course, and now I am an independent lecturer for a calculus-based statistics for engineers class.
Setting Pilot Trial Sample Sizes to Minimise Overall Study Costs
Abstract
A large amount of money is invested in medical research. However, each year many trials are stopped early, not due to a lack of treatment efficacy but because of factors such as unexpectedly low recruitment rates or higher than expected variability. Before undertaking a large clinical trial, a smaller pilot trial can be carried out. Performing a pilot trial prior to a large and relatively expensive definitive trial can help to highlight any unforeseen issues with the design.
The sample size is a major driver of trial cost; in general, the more participants included in a trial, the more expensive it will be. Therefore, when setting the sample size of the pilot trial, consideration should be given to choosing it in tandem with the sample size of the subsequent main trial, the aim being to optimise the sample sizes of the pilot and main trials together rather than separately.
Furthermore, the cost per patient in a pilot trial is likely to be higher than the cost per patient in a main trial; therefore, minimising the total sample size across the two trials may not minimise the overall study costs. Using the relative cost of the pilot trial versus the main trial, a method of calculating the pilot trial sample size based on minimising the overall cost of the study programme will be presented. In addition, it will be described how this relative cost affects the sample sizes required for the two trials. This work could be used to inform decisions about clinical trial design and potentially improve the use of resources across medical research.
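A schematic version of that optimisation: suppose the pilot's variance estimate sets the main trial's sample size through an inflated (80% upper confidence limit) variance, pilot patients cost three times as much as main-trial patients, and we search over pilot sizes for the lowest expected total cost. Every numeric choice here is an assumption for illustration, not the method to be presented.

```python
# Grid search over pilot sizes, trading pilot cost against the expected size
# of the main trial implied by the pilot's (inflated) variance estimate.
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(17)
sigma, delta = 1.0, 0.3                   # true SD and target effect size
z = norm.ppf(0.975) + norm.ppf(0.9)       # two-sided 5% test, 90% power
cost_pilot, cost_main = 3.0, 1.0          # assumed per-patient costs

def expected_total_cost(n_pilot, sims=4000, conf=0.80):
    df = n_pilot - 1
    s2 = sigma**2 * rng.chisquare(df, sims) / df      # simulated pilot variances
    s2_ucl = df * s2 / chi2.ppf(1 - conf, df)         # inflate for uncertainty
    n_main = np.ceil(2 * s2_ucl * z**2 / delta**2)    # per-arm main-trial size
    return cost_pilot * n_pilot + cost_main * 2 * n_main.mean()

grid = np.arange(10, 200, 5)
costs = [expected_total_cost(n) for n in grid]
print("pilot size minimising expected overall cost:", grid[int(np.argmin(costs))])
```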
Bio:
I am currently a PhD student at the University of Sheffield. The topic of my thesis is the statistical issues in the design of pilot trials. I am based in the School of Health and Related Research and I am the recipient of a teaching assistant studentship. Through my role I teach on a variety of courses from first year Undergraduate to Masters level and I have recently given a series of webinars for the Masters degree in Clinical Trials at the University of Edinburgh. Before starting my PhD I attended the University of Lancaster where I gained my first degree in Accounting, Finance and Mathematics and a Masters degree in Statistics graduating in 2010. I have always enjoyed statistics and I particularly enjoy my current role, which allows me to undertake research while also teaching and passing on my enthusiasm for the subject.
Working in Interdisciplinary Teams
Abstract
Many scientists are drawn to statistics and biostatistics by a passion to make a difference by using their mathematical skills to tackle applied problems. We discuss a variety of aspects of working in interdisciplinary teams, including expectations and roles of statisticians and collaborators, responsibility for data management, authorship and publication, funding expectations, grant and proposal writing, and combining methodology with collaboration. The panel has participated in a broad range of interdisciplinary work. Dr. Wendelberger has experience with interdisciplinary teaming in a government laboratory setting at Los Alamos, in an industrial research setting at General Motors, and in an academic consulting laboratory at the University of Wisconsin. Dr. Wilson has worked collaboratively at the National Institutes of Health with clinical scientists, at a small defense contractor with military personnel, at a national laboratory with engineers, and in Washington with policy analysts. Dr. Stinnett has worked extensively with physician researchers in several departments at Duke University School of Medicine. Dr. Gaydos leads a cross-disciplinary team at Eli Lilly focused on clinical development scenario planning and decision making through the use of modeling and simulation, novel trial designs, and analyses to increase clinical development efficiencies.
Bio:
Joanne Wendelberger is the Group Leader of the Statistical Sciences Group within the Computer, Computational, and Statistical Sciences Division at Los Alamos National Laboratory. She holds a Bachelor’s degree in Mathematics from Oberlin College and Masters and Ph.D. degrees in Statistics from the University of Wisconsin. She joined Los Alamos as a Technical Staff Member in 1992, progressing to Project Leader and Deputy Group Leader prior to her current position as Group Leader. She previously worked as a Statistical Consultant at the General Motors Research Laboratories. Her research interests include statistical experimental design, statistical bounding and uncertainty quantification, materials degradation modeling, sampling and analysis issues in large-scale computation and visualization, probabilistic computing, and education modeling. Dr. Wendelberger is a Fellow of the American Statistical Association (ASA). She has served as Chair and Program-Chair of the ASA Section on Physical & Engineering Sciences and has been a member of both the Editorial Board and the Management Committee for Technometrics, a Journal of Statistics for the Physical, Chemical, and Engineering Sciences.
Sandra Stinnett is Associate Professor of Biostatistics and Bioinformatics at Duke. She holds a Bachelor’s degree in Psychology from the University of Houston (1970), a Masters in Biometry from the University of Texas School of Public Health (1977), and a DrPH in Biostatistics from the University of North Carolina (1993). She came to Duke in 1994 to be the Director of Statistical Operations for the Duke Clinical Research Institute (DCRI), where she managed a group of statisticians and programmers involved in clinical trials. In 2001, she became the statistician for the Duke Eye Center, with a secondary appointment in Ophthalmology. There, she assists faculty, residents, and fellows with a wide variety of projects in vision research. One component of her work is assessing agreement and reproducibility of graders’ readings of optical coherence tomography imaging of the eye. She also provides statistical expertise to the Departments of Radiology, Surgery, and Community and Family Health. Dr. Stinnett also has extensive experience in teaching and training. For six years, she taught Introduction to Statistical Methods for the Duke Clinical Research Training Program (CRTP), a master’s program that trains clinical fellows in academic research. Currently, she teaches Medical Statistics to third-year medical students. Dr. Stinnett has served the American Statistical Association as Chair of the Committee on Women, Chair of the Section on Statistical Consulting, and President of the Caucus for Women in Statistics.
Alyson Wilson received her Ph.D. in Statistics from Duke University in 1995. From 1995-1999, Dr. Wilson worked at Cowboy Programming Resources in El Paso, TX, supporting the U.S. Army in the operational evaluation of air defense artillery. From 1999-2008, she worked in the Statistical Sciences Group at Los Alamos National Laboratory, where she was a Project Leader and Technical Lead for Department of Defense Programs. In this role, she developed and led a portfolio of work in the application of statistics to the reliability of conventional and nuclear weapons. From 2008-2011, she was an Associate Professor in the Department of Statistics at Iowa State University. In 2011-2012, she was Research Staff Member at the IDA Science and Technology Policy Institute in Washington, DC, where she helped provide research support to the White House Office of Science and Technology Policy and the Office of the Secretary of Defense. She came to North Carolina State University in 2013 as an Associate Professor in the Department of Statistics and a member of NCSU's Data-Driven Science cluster. Dr. Wilson’s research interests include statistical reliability, Bayesian methods, and the application of statistics to problems in defense and national security. In addition to many other publications, she is a co-author of the book Bayesian Reliability (2008). Dr. Wilson is a Fellow of the American Statistical Association.
Brenda Gaydos received her PhD in Mathematical Statistics from Pennsylvania State University. She has been employed by Eli Lilly and Company since 1997. She is an Adjunct Associate Professor of Biostatistics at the Indiana School of Medicine, and a Fellow of the American Statistical Association (ASA). Dr. Gaydos is active in the pharmaceutical community. She has held elected and appointed positions in ASA, has co-chaired the PhRMA working group on Adaptive Designs (2005-2009), chaired the PhRMA response to the FDA Draft Guidance on Adaptive Designs (2010), and chaired the DIA Adaptive Design Scientific Working Group (2010 to Q3 2013). She is an elected member of QSPI (Quantitative Sciences in the Pharmaceutical Industry), which serves the interests of senior leadership of statistical, data management and statistical programming organizations in the biotechnology and pharmaceutical industry. In addition she has contributed to the advancement of innovation through publications and education. She has given well over 50 invited lectures including conference presentations, webinars, workshops and training courses on statistical methods.
On Supervising Males (If You Are a Female Statistician)
Abstract
Another title for this talk might have been "On Supervising Males," or even "On Supervising Adults," or just "On Supervising." At some level, supervising is supervising. After all, "a rose is a rose is a rose." A statistician, or a female, or a female statistician should use the same methods in supervising others: be fair, be supportive, be firm. But as statisticians, one of our strengths is our constitutional uncertainty. The female part of many of us, perhaps not a strength, adds to our lack of certainty. At first blush, we should use the same methods whether we are supervising men, or women, or even children. But as the Renaissance painters learned, children are not miniature adults, either in form or in substance. Similarly, as recent neurological research is demonstrating more and more clearly, men and women do not differ from each other solely with respect to external appearance or the cultural influences that shape behavior. Our brains turn out to be wired very differently, with profound consequences for our styles of living and work. Moreover, while behavioral studies of bias have shown that young professionals declare that they are indifferent to their boss's being male or female, they are willing to accept a lower salary if they can report to a male rather than to a female. In this talk, I discuss, mostly from personal experience but buttressed with data from research, general criteria for effective supervision and modifications stemming from our position as women as well as our professional outlook as statisticians.
Penalized Isotonic Regression
Abstract
In isotonic regression, the mean function is assumed to be monotone increasing (or decreasing) but otherwise unspecified. The classical isotonic estimator is known to be inconsistent at the boundaries; this is called the "spiking" problem. A penalty on the range of the regression function is proposed to correct the spiking problem for univariate and multivariate isotonic models. The penalized estimator is shown to be consistent everywhere for a wide range of sizes of the penalty parameter. For the univariate case, the optimal penalty is shown to depend on the derivatives of the true regression function at the boundaries. Pointwise confidence intervals are constructed using the penalized estimator and bootstrapping ideas; these are shown through simulations to behave well in moderate-sized samples. Simulation studies also show that the power of the hypothesis test of constant versus increasing regression function improves substantially compared to the power of the test with the unpenalized alternative, and also compares favorably to tests using parametric alternatives.
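In the univariate case the range penalty is especially tractable: the penalty λ(m_n − m_1) is linear in the two endpoint fits, so completing the square shows the penalized solution is ordinary isotonic regression after shifting y_1 up and y_n down by λ/2. The sketch below implements that observation with a plain pool-adjacent-violators routine; it is an illustration of the univariate case only, under that derivation, not the paper's full multivariate method.

```python
# Range-penalized isotonic regression via PAVA on endpoint-adjusted data.
import numpy as np

def pava(y):
    """Pool-adjacent-violators algorithm for unweighted isotonic regression."""
    vals, wts, cnt = [], [], []
    for yi in y:
        vals.append(float(yi)); wts.append(1.0); cnt.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:   # pool violating blocks
            w = wts[-2] + wts[-1]
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w
            c = cnt[-2] + cnt[-1]
            vals[-2:], wts[-2:], cnt[-2:] = [v], [w], [c]
    return np.repeat(vals, cnt)

def penalized_isotonic(y, lam):
    z = np.asarray(y, float).copy()
    z[0] += lam / 2.0        # endpoint shifts induced by the range penalty
    z[-1] -= lam / 2.0
    return pava(z)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = x + rng.normal(0, 0.3, 50)
print(penalized_isotonic(y, lam=1.0)[[0, -1]])   # boundary 'spikes' pulled in
```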
High Dimensional Tests for Multi-Level Brain Networks
Abstract
Large-scale resting-state fMRI studies have been conducted for patients with autism, and the existence of abnormalities in the functional connectivity between brain regions (each containing more than one voxel) has been clearly demonstrated. Due to the ultra-high dimensionality of the data, current methods focusing on the connectivity pattern between voxels often lack power and computational efficiency. In this talk, we introduce a new framework to identify the connection pattern of gigantic networks at a desired resolution. We propose three procedures based on different network structures and testing criteria. The asymptotic null distributions of the test statistics are derived, together with their rate-optimality. Simulation results show that the tests control the type I error rate and yet are very powerful. We apply our method to a resting-state fMRI study on autism. The analysis yields interesting insights into the mechanism of autism.
Bio:
Jichun Xie is an Assistant Professor in the Department of Statistics at Temple University. Before joining Temple, she received her Ph.D. from the Department of Biostatistics at the University of Pennsylvania. Her research focuses on high-dimensional methods and inference, with applications in genetics and neuroimaging. In particular, she focuses on network estimation and testing problems.
Testing serial correlation in partially linear additive errors-in-variables models
Abstract
This paper considers testing for serial correlation in the partially linear additive errors-in-variables model. Based on the empirical likelihood approach, a test statistic is proposed and shown to asymptotically follow a chi-square distribution under the null hypothesis of no serial correlation. Simulation studies are conducted to illustrate the performance of the proposed method.
Bayesian Inference for Large Vector Autoregressive Models: Learning Shared and Individual Subspaces
Abstract
The number of parameters in a vector autoregression (VAR) model is typically very large, which complicates investigating the dynamic relations between the time series. Our focus here is on Granger causality relations modeled by mathematical graphs. One variable does not Granger-cause another if past and current information about the former cannot improve the forecast of the latter. The time dimension of a process makes it natural to consider directed flow, or Granger causality, and therefore to set up a graph for a time series system with this notion defining the relations between variables. Due to the much larger space of possible graph structures in VAR models, widely used Markov chain Monte Carlo (MCMC) techniques do not provide a practical solution for the general models considered. To make joint inference about the Granger causality structure and the lag length of the process feasible, we propose sparse low-rank approximations to learn loading matrices that share the same latent Granger causality graph while still capturing the individual time-dependent variations. We propose an efficient Gibbs sampler algorithm along with an optimization algorithm that enables fast computation in high dimensions.
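A toy illustration of the Granger-causality notion: in a VAR(1), series j does not Granger-cause series i exactly when the coefficient A[i, j] is zero. The sketch below simulates such a structure and recovers it by least squares; the paper's Bayesian low-rank graph learning is far more elaborate, so treat this only as background.

```python
# Simulate a VAR(1) with a known zero pattern in its coefficient matrix and
# recover the Granger-causality graph by OLS.
import numpy as np

rng = np.random.default_rng(19)
A = np.array([[0.5, 0.3, 0.0],    # series 3 does not Granger-cause series 1
              [0.0, 0.4, 0.2],
              [0.0, 0.0, 0.6]])
T = 5000
y = np.zeros((T, 3))
for t in range(1, T):
    y[t] = y[t - 1] @ A.T + rng.normal(0, 1, 3)

X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T     # OLS estimate of A
print(np.round(A_hat, 2))   # near-zero entries mark Granger-noncausal links
```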
Bio:
Dr. Yang is a Research Staff Member in the Statistics & Forecasting Group, Department of Business Service and Mathematical Sciences at the IBM T.J. Watson Research Center. Since joining IBM, Dr. Yang has been involved in and developed analytical tools for several IBM projects, including the IBM Statistical Tracking and Assessment of Revenue (STAR). Dr. Yang's skills span the areas of Bayesian statistics, time series analysis, spatial-temporal modeling, survival analysis, machine learning, data mining, and their applications to problems in business analytics. Dr. Yang holds a PhD in Statistics from Duke University.
Rising Stars: Women Making Waves
Abstract
This panel aims to provide useful resources and discussion on how to write a successful proposal for the NSF Faculty Early Career Development (CAREER) award, among the most prestigious awards in support of junior faculty who exemplify the role of teacher-scholars in research and education. The panelists will share their views and experiences, from the perspectives of both awardees and reviewers, on various aspects of the award application. Specifically, the panel discussion will cover proposal preparation, writing tips, budgeting, review criteria, and the selection process.
Bio:
Hao Helen Zhang earned her B.S. in Mathematics from Beijing University and her Ph.D. in Statistics from the University of Wisconsin-Madison. In 2002, she joined the Department of Statistics at North Carolina State University as an Assistant Professor. Recently, she moved to the Department of Mathematics at the University of Arizona. She received the NSF CAREER Award in 2007 for her proposal "Nonparametric Models Building, Estimation, and Selection with Applications to High Dimensional Data Mining".
Huixia (Judy) Wang earned her B.S. and M.S. in Statistics from Fudan University before coming to the U.S. in 2002 to conduct her doctoral work at the University of Illinois at Urbana-Champaign. After earning her Ph.D. in 2006, Wang joined the Department of Statistics at North Carolina State University as an Assistant Professor and was promoted to Associate Professor in 2012. In 2012, Wang received the Tweedie New Researcher Award from the Institute of Mathematical Statistics, and the NSF CAREER Award for her proposal "A new and pragmatic framework for modeling and predicting conditional quantiles in data-sparse regions."
Nonparametric Bayesian inference for multivariate density functions using Feller priors
Abstract
Multivariate density estimation plays an important role in investigating the mechanisms underlying high-dimensional data. This paper describes a nonparametric Bayesian approach to the estimation of multivariate densities. A general procedure is proposed for constructing Feller priors for multivariate densities, and their theoretical properties as nonparametric priors are established. A blocked Gibbs sampling algorithm is devised to sample from the posterior of a multivariate density. A simulation study is conducted to evaluate the performance of the procedure.
The role of a biostatistician in biomedical research: an example of methodology development
Abstract
Biomedical research is increasingly data-driven and has become a fertile ground for innovation in statistical methodology. Here we show an example in which an interest in identifying genes associated with phenotype interactions using microarray data motivated the development of a novel method for identifying differentially altered (DA) biomolecules from observational high-throughput data. Most methods for DA biomolecule detection can be considered single-model approaches, since they rank the biomolecules using statistics derived from a single model for two or more group comparisons, with or without adjustment for other covariates. Such single-model approaches are conceptually flawed, since they unavoidably result in model misspecification. We present evidence that DA biomolecule detection based on high-throughput data intrinsically requires multi-model handling. To properly control for sample heterogeneity and to provide a flexible and coherent framework for simultaneously identifying DA biomolecules associated with single or multiple sample characteristics and/or their interactions, we developed a Bayesian model averaging approach with an empirical prior model probability specification. We demonstrate through simulated microarray data that this approach improves the performance of differentially expressed gene detection compared with single-model approaches. The flexibility of this approach is illustrated through analyses of gene expression microarray data and metabolomics data.
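The flavor of the model-averaging step can be sketched for a single gene: fit several candidate mean models, convert information-criterion values into approximate posterior model weights, and sum the weights of the models containing the phenotype term. Everything here (the data, the candidate models, and the BIC weighting) is an illustrative stand-in for the paper's full Bayesian treatment with empirical priors.

```python
# BIC-weighted model averaging for one gene: the 'inclusion probability' of
# the phenotype term is the total weight of models that contain it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(23)
n = 100
group = rng.integers(0, 2, n).astype(float)       # phenotype of interest
batch = rng.normal(size=n)                        # nuisance covariate
expr = 0.8 * group + 0.5 * batch + rng.normal(size=n)   # one gene's expression

one = np.ones(n)
designs = {                                       # candidate mean models
    "1": np.column_stack([one]),
    "1+batch": np.column_stack([one, batch]),
    "1+group": np.column_stack([one, group]),
    "1+group+batch": np.column_stack([one, group, batch]),
}
bic = np.array([sm.OLS(expr, X).fit().bic for X in designs.values()])
w = np.exp(-0.5 * (bic - bic.min()))
w /= w.sum()                                      # approximate model weights
incl = sum(wk for name, wk in zip(designs, w) if "group" in name)
print("weight on phenotype-associated models:", round(float(incl), 3))
```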
Bio:
Xi Kathy Zhou is Associate Professor of Biostatistics in Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research at Weill Cornell Medical College. She received her PhD degree from the Institute of Statistics and Decision Sciences (current Statistics Department) at Duke University. Her research interests include but are not limited to large data analysis, Bayesian methods, variable selection, and predictive modeling.
Automatic stratification for an agricultural area frame using remote sensing data
Abstract
The June Area Survey is conducted by the USDA National Agricultural Statistics Service. The survey is used to estimate many variables, including crop acreages and the number of farms in the nation. The current sampling method uses a stratified two-stage design, with strata defined by categories constructed for ease of visual discernment because the frame is constructed and stratified manually. We propose a design with a permanent sampling frame and an automatic stratification algorithm to stratify elements in the frame, which will increase both labor efficiency and statistical efficiency.
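One classical candidate for such automatic stratification is the cumulative square root of frequency (cum-√f) rule of Dalenius and Hodges, sketched below on a made-up "percent cultivated" variable for frame segments. This is a generic illustration, not the algorithm proposed in the talk.

```python
# Dalenius-Hodges cum-sqrt(f) stratification: cut the cumulative sqrt of the
# histogram frequencies into equal intervals to set stratum boundaries.
import numpy as np

rng = np.random.default_rng(29)
pct_cultivated = np.clip(rng.beta(0.7, 1.5, 10000) * 100, 0, 100)

bins = np.linspace(0, 100, 51)                 # fine histogram of the frame
f, edges = np.histogram(pct_cultivated, bins)
csf = np.cumsum(np.sqrt(f))                    # cumulative sqrt(frequency)

H = 4                                          # desired number of strata
cuts = [edges[np.searchsorted(csf, csf[-1] * h / H)] for h in range(1, H)]
strata = np.digitize(pct_cultivated, cuts)
print("stratum boundaries (% cultivated):", np.round(cuts, 1))
print("stratum sizes:", np.bincount(strata))
```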
Bio:
Stephanie Zimmer is a graduate student in Statistics at Iowa State University. She is a member of the Center for Statistics and Survey Methodology. She has interests in survey statistics, government statistics, and statistical methodology for human rights.