Updated June 10th, 2004
Keynote Address
Dynamic Multi-Resolution Spatial Models
Director, Program in Spatial Statistics and Environmental Sciences, The Ohio State University, Columbus, OH
The problem of spatial-temporal prediction of global processes, using a model that recognizes multiple resolutions in the spatial domain, is considered. Here, optimal spatial-prediction procedures can be shown to be extremely fast. Similar ideas can be used in the spatial-temporal domain; a vector autoregressive model is assumed at the coarsest resolution and, at each time-point, a multi-resolution spatial structure is modeled. Then the idea is to use Bayesian updating to make the prior distribution of the coarse-resolution process more informative as time proceeds. Our spatial-temporal methodology will be compared to the spatial-only methodology on data from the Total Ozone Mapping Spectrometer (TOMS) instrument on the Nimbus-7 satellite. The material presented in this talk is the result of joint research with Gardar Johannesson (Lawrence Livermore National Labs) and Hsin-Cheng Huang (Academia Sinica).
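The Bayesian updating of the coarse-resolution process described above can be illustrated with a standard Kalman filter for a VAR(1) state. This is a minimal sketch under assumed Gaussian dynamics and Gaussian measurement error. The function name `kalman_step` and the matrices `A`, `Q`, `H`, `R` are illustrative, and the multi-resolution spatial structure at each time point is not shown.

```python
import numpy as np

def kalman_step(m, P, y, A, Q, H, R):
    """One forecast/update cycle for a VAR(1) state observed with Gaussian noise.

    m, P : posterior mean and covariance of the coarse-resolution state at t-1
    y    : data vector at time t
    A, Q : VAR(1) transition matrix and innovation covariance (assumed known)
    H, R : observation matrix and measurement-error covariance (assumed known)
    """
    # Forecast: push the previous posterior through the VAR(1) dynamics;
    # the result serves as the prior for time t.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: condition that prior on the new data (Bayes' rule for Gaussians).
    S = H @ P_pred @ H.T + R
    K = np.linalg.solve(S, H @ P_pred).T  # Kalman gain P_pred H^T S^{-1}
    m_post = m_pred + K @ (y - H @ m_pred)
    P_post = P_pred - K @ H @ P_pred
    return m_post, P_post

# Usage on simulated data, starting from a deliberately vague prior.
rng = np.random.default_rng(0)
p = 3
A, Q = 0.9 * np.eye(p), 0.1 * np.eye(p)
H, R = np.eye(p), 0.5 * np.eye(p)
m, P = np.zeros(p), 10.0 * np.eye(p)
x = np.zeros(p)
for t in range(20):
    x = A @ x + rng.multivariate_normal(np.zeros(p), Q)
    y = H @ x + rng.multivariate_normal(np.zeros(p), R)
    m, P = kalman_step(m, P, y, A, Q, H, R)
```

The shrinking posterior covariance `P` is the sense in which the coarse-resolution prior becomes more informative as time proceeds.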
FFT Regression and Cross-Noise Reduction for Comparing Images in Remote Sensing
Gerald L. Anderson¹ and Kalman Peleg²
¹United States Department of Agriculture, Agricultural Research Service, Sidney, MT
²Agricultural Engineering Department, Technion Israel Institute of Technology, Haifa 32000, Israel
In many remote sensing studies it is desirable to quantify the functional relationship between images of a given target that were acquired by different sensors. Such comparisons are problematic because, when the pixel values of one image are plotted against those of the other, the "cross-noise" is quite high. Typically, the correlation coefficient is quite low, even when the compared images look very much alike. Nevertheless, we can try to quantify the functional relationship between two images by a suitable regression model Y = f(X), choosing one of them as "the reference" Y and using the other as a "predictor" X. The underlying assumption of classical regression is that Y is absolutely correct while X is erroneous. Thus, the objective is to fit X to Y by choosing the parameters of Y = f(X) that minimize the residuals (Y - Ŷ). When comparing images in remote sensing this objective is not valid, because Y itself is error prone. The alternative FFT regression method presented herein is a two-stage sensor fusion approach, whereby the initially low correlation between X and Y is increased and the residuals are drastically decreased. First, pairwise image transforms are applied to X and Y, increasing the correlation coefficient, e.g. from roughly 0.4 to about 0.8 to 0.85. A predicted image Yfft is then derived by least-squares minimization between the amplitude matrices of X and Y, obtained via the 2D FFT. In the second stage, there are two options. For one-time predictions, the phase matrix of Y is combined with the amplitude matrix of Yfft, forming an improved predicted image Yplock. Usually, the residuals of Yplock versus Y are about half those of Yfft versus Y. For long-term predictions, the phase matrix of a "field mask" is combined with the amplitude matrices of the reference image Y and of the predicted image Yfft. The field mask is a binary image of a pre-selected region of interest in X and Y.
The resultant images Ypref and Ypred are modified versions of Y and Yfft, respectively. The residuals of Ypred versus Ypref are even lower than the residuals of Yplock versus Y. Images Ypref and Ypred represent a close consensus of two independent imaging methods that view the same target. The practical utility of FFT regression is demonstrated by examples in which remotely sensed NDVI images X are used to predict yield distributions in agricultural fields. Reference yield maps Y were derived from combine yield monitors, which measure the flow rate of the crop as it is being harvested. The 2D FFTs, as well as all other mathematical operations in this paper, were performed in the MATLAB environment.
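The amplitude/phase operations described above can be sketched with NumPy's 2D FFT. This is a minimal illustration, not the authors' MATLAB code: it assumes a linear model |F(Y)| ≈ a|F(X)| + b for the amplitude least-squares step and assumes Yfft keeps the phase of X; the pairwise pre-transforms of the first stage are omitted. The function names `fft_regression` and `phase_lock` are hypothetical.

```python
import numpy as np

def fft_regression(X, Y):
    """Fit |F(Y)| ~ a*|F(X)| + b by least squares on the 2D FFT amplitudes.

    Returns the predicted image Yfft (assumed to reuse X's phase) and the
    fitted amplitude matrix.
    """
    FX, FY = np.fft.fft2(X), np.fft.fft2(Y)
    ampX, ampY = np.abs(FX), np.abs(FY)
    design = np.column_stack([ampX.ravel(), np.ones(ampX.size)])
    (a, b), *_ = np.linalg.lstsq(design, ampY.ravel(), rcond=None)
    amp_fft = a * ampX + b
    # Rebuild an image from the fitted amplitudes and X's own phase.
    Y_fft = np.real(np.fft.ifft2(amp_fft * np.exp(1j * np.angle(FX))))
    return Y_fft, amp_fft

def phase_lock(amp, phase_source):
    """Combine an amplitude matrix with the phase matrix of another image."""
    phase = np.angle(np.fft.fft2(phase_source))
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

# Usage with synthetic images standing in for NDVI (X) and a yield map (Y).
rng = np.random.default_rng(1)
X = rng.random((32, 32))
Y = 0.7 * X + 0.3 * rng.random((32, 32))
Y_fft, amp_fft = fft_regression(X, Y)
Y_plock = phase_lock(amp_fft, Y)              # one-time prediction: phase of Y
mask = np.zeros_like(X)
mask[8:24, 8:24] = 1.0                        # hypothetical binary field mask
Y_pref = phase_lock(np.abs(np.fft.fft2(Y)), mask)   # modified reference
Y_pred = phase_lock(amp_fft, mask)                   # modified prediction
```

Taking the real part after the inverse FFT is safe here because both the amplitudes and the phases come from real images, so the combined spectrum remains Hermitian-symmetric up to rounding.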
Bayesian Wombling: Estimating Spatial Gradients
Spatial process models are now widely used for inference in many areas of application. In such contexts, interest often lies in the rate of change of a spatial surface at a given location in a given direction. Examples include temperature or rainfall gradients in meteorology, pollution gradients for environmental data, and surface roughness assessment for digital elevation models. Because the spatial surface is viewed as a random realization, all such rates of change are random as well. This talk presents the notion of directional derivative processes, building upon the concept of mean square differentiability. We discuss distribution theory results under the assumption of a stationary Gaussian process model, either for the data or for spatial random effects. We present statistical inference under a Bayesian framework, which offers several advantages in this setting, and illustrate it using a simulated dataset and a real estate dataset consisting of selling prices of individual homes.
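To make the directional derivative process concrete, the sketch below predicts D_u Y(s0) for a zero-mean Gaussian process with squared-exponential covariance C(h) = σ² exp(-φ‖h‖²), which is mean square differentiable. It relies on two standard facts: Cov(D_u Y(s0), Y(s_i)) = u·∇C(s0 - s_i), and Var(D_u Y(s0)) = 2φσ² for this covariance. It is a plug-in (fixed-parameter) sketch, not the full Bayesian analysis of the talk, which would average over posterior draws of (σ², φ, τ²). The function names and the nugget `tau2` are assumptions.

```python
import numpy as np

def gauss_cov(S1, S2, sigma2, phi):
    """Squared-exponential covariance matrix between two sets of 2D sites."""
    d2 = ((S1[:, None, :] - S2[None, :, :]) ** 2).sum(-1)
    return sigma2 * np.exp(-phi * d2)

def directional_gradient(s0, u, S, y, sigma2, phi, tau2):
    """Predictive mean and variance of D_u Y(s0) given noisy data y at sites S."""
    u = np.asarray(u, float) / np.linalg.norm(u)          # unit direction
    K = gauss_cov(S, S, sigma2, phi) + tau2 * np.eye(len(S))
    h = s0[None, :] - S                                   # displacements s0 - s_i
    # Cross-covariance: directional derivative of C at h in direction u.
    c = sigma2 * np.exp(-phi * (h ** 2).sum(1)) * (-2.0 * phi * (h @ u))
    w = np.linalg.solve(K, c)
    mean = w @ y
    var = 2.0 * phi * sigma2 - c @ w                      # prior var minus explained part
    return mean, var

# Usage on a synthetic surface that rises in the x-direction near the origin.
rng = np.random.default_rng(3)
S = rng.uniform(-1, 1, size=(60, 2))
y = np.sin(2 * S[:, 0]) + 0.1 * rng.standard_normal(60)
s0 = np.array([0.0, 0.0])
m_pos, v_pos = directional_gradient(s0, [1.0, 0.0], S, y, sigma2=1.0, phi=2.0, tau2=0.01)
m_neg, v_neg = directional_gradient(s0, [-1.0, 0.0], S, y, sigma2=1.0, phi=2.0, tau2=0.01)
```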
Climate data are recorded at many different scales: global climate models yield data for a grid cell, while weather stations record data at point locations. The extremes of climate data are of interest because of their significant impacts, but little work has been done relating the extreme values at the different scales. We propose a one-parameter model that relates the annual maximum at a point location to the annual maximum over the grid cell, after both have been rescaled to have standard Fréchet marginals. Our model preserves the desired property of max-stability and is flexible enough to accommodate spatial structure. The parameter of the model has an intuitive interpretation and can be shown to be related to the extremal index of the random variables. Finally, we propose an estimator of the extremal index (and thus of the model's parameter) based on the madogram.
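As an illustration of madogram-based estimation, the sketch below uses the F-madogram ν = ½E|F(Z₁) - F(Z₂)| for a pair of unit-Fréchet variables and the relation θ = (1 + 2ν)/(1 - 2ν) to estimate the pairwise extremal coefficient θ ∈ [1, 2] (1 for complete dependence, 2 for independence). How this quantity maps onto the one-parameter point-to-grid-cell model is not specified in the abstract, so treat the link as an assumption; the function names are hypothetical.

```python
import numpy as np

def unit_frechet_cdf(z):
    """CDF of the standard (unit) Frechet distribution, F(z) = exp(-1/z)."""
    return np.exp(-1.0 / z)

def extremal_coefficient_madogram(z1, z2):
    """F-madogram estimate of the extremal coefficient for paired maxima.

    z1, z2 : paired annual maxima already transformed to unit Frechet margins.
    """
    nu = 0.5 * np.mean(np.abs(unit_frechet_cdf(z1) - unit_frechet_cdf(z2)))
    return (1.0 + 2.0 * nu) / (1.0 - 2.0 * nu)

# Usage: unit Frechet samples via the inverse CDF z = -1 / log(u).
rng = np.random.default_rng(4)
n = 200_000
z1 = -1.0 / np.log(rng.random(n))
z2 = -1.0 / np.log(rng.random(n))
theta_indep = extremal_coefficient_madogram(z1, z2)  # independent pair: should be near 2
theta_same = extremal_coefficient_madogram(z1, z1)   # identical pair: exactly 1
```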
Numerical experiments based on atmosphere-ocean general circulation models (AOGCMs) are one of the primary tools for deriving projections of future climate change. However, each model has its strengths and weaknesses at local and global scales. This motivates climate projections synthesized from the output of several AOGCMs, weighted according to model bias and convergence. We combine present-day observations and present-day and future climate projections in a single hierarchical Bayes model. The challenging aspect is modeling a meaningful covariance structure for the spatial processes, and we propose several approaches to it. The posterior distributions (in this case, of the individual model biases) are obtained with computer-intensive MCMC simulations. The novelty of our approach is that we use gridded, high-resolution data within a spatial framework. The primary data source is the MAGICC/SCENGEN program (Wigley, T.M.L., 2003), which provides output from 17 AOGCMs on a 5° × 5° grid under several different emission scenarios. We consider variables such as precipitation and temperature, together with their minima and maxima. Extensions such as a multivariate approach and heavy-tailed error distributions are discussed.
Spatial Cluster Detection Using Bayes Factors from Overparameterized Models
We consider a partition model for estimation of regional disease rates and for detection of spatial clusters. Formal inference regarding the number of partitions (or clusters) can be obtained using a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm. As an alternative, we consider models with a fixed, but overly large, number of partitions. We explore the ability of these models to provide informal inferences about the number and locations of clusters using localized Bayes factors. We illustrate these two approaches using the well-known New York leukemia data and data on breast cancer incidence in Wisconsin.
Spatial Models for the Distribution of Extremes
Chain Graphs for Spatial Dependence in Ecological Data
Alix Gitelman
Department of Statistics, Oregon State University, Corvallis, OR
Graphical models (alternatively, Bayesian belief networks or path analysis models) are increasingly used for modeling complex ecological systems (Varis & Kuikka 1997; Lee 2000; Borsuk, Stow & Reckhow 2003). Their use in this context exploits their utility in modeling interrelationships in multivariate systems and, in a Bayesian implementation, their intuitive appeal of yielding easily interpretable posterior probability estimates. However, methods for incorporating correlation structure to account for observations collected through time and/or space, a feature of most ecological data, have not been widely studied (Haas 1992 is one exception). In this talk, an "isomorphic" chain graph (ICG) model is introduced to account for correlation between samples by linking site- (time-) specific Bayes network models. Several results show that the ICG preserves many of the Markov properties (conditional and marginal dependencies) of the site- (time-) specific models. The ICG model is compared with a model that does not account for spatial correlation, using data from several stream networks in the Willamette Valley, Oregon.
Modeling Spatial-Temporal Binary Data Using Markov Random Fields
An autologistic regression model consists of a logistic regression of a response variable on explanatory variables and an auto-regression on responses at neighboring locations on a lattice. It is a Markov random field with pairwise spatial dependence and is a popular tool for modeling spatial binary responses. In this article, we add a temporal component to the autologistic model for spatial-temporal binary data. The spatial-temporal autologistic model captures both spatial dependence and temporal dependence simultaneously by a space-time Markov random field. We estimate the model parameters by maximum pseudo-likelihood and obtain optimal prediction of future responses on the lattice by a Gibbs sampler. For illustration, the method is applied to study the outbreaks of southern pine beetle in North Carolina. We also discuss the generality of our approach for modeling other types of spatial-temporal lattice data.
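The maximum pseudo-likelihood step described above reduces to a logistic regression of each response on its covariates, its spatial neighbor sum, and its temporal neighbor sum. The sketch below assumes a rook (4-neighbor) lattice with free boundaries and temporal neighbors at t-1 and t+1; it covers only estimation, not the Gibbs-sampler prediction. The function names are hypothetical, and the Newton-Raphson fit is a generic choice rather than the authors' implementation.

```python
import numpy as np

def neighbor_sums(Y):
    """Y: (T, nrow, ncol) binary array. Returns spatial (4-neighbor) and temporal sums."""
    P = np.pad(Y, ((0, 0), (1, 1), (1, 1)))            # zero padding = free boundary
    spatial = P[:, :-2, 1:-1] + P[:, 2:, 1:-1] + P[:, 1:-1, :-2] + P[:, 1:-1, 2:]
    Pt = np.pad(Y, ((1, 1), (0, 0), (0, 0)))
    temporal = Pt[:-2] + Pt[2:]                          # y at t-1 plus y at t+1
    return spatial, temporal

def fit_mple(Y, X, n_iter=25):
    """Maximum pseudo-likelihood for a space-time autologistic model.

    X: (T, nrow, ncol, k) covariates (include a column of ones for an intercept).
    Returns (beta_1..beta_k, gamma_spatial, delta_temporal).
    """
    s, t = neighbor_sums(Y)
    D = np.concatenate([X.reshape(-1, X.shape[-1]),
                        s.reshape(-1, 1), t.reshape(-1, 1)], axis=1).astype(float)
    y = Y.reshape(-1).astype(float)
    theta = np.zeros(D.shape[1])
    for _ in range(n_iter):                              # Newton-Raphson on the log PL
        p = 1.0 / (1.0 + np.exp(-D @ theta))
        grad = D.T @ (y - p)
        hess = D.T @ (D * (p * (1 - p))[:, None])
        theta += np.linalg.solve(hess, grad)
    return theta

# Usage on independent data, where both dependence parameters should be near 0.
rng = np.random.default_rng(5)
Y = (rng.random((20, 20, 20)) < 0.5).astype(int)
X = np.ones(Y.shape + (1,))
theta = fit_mple(Y, X)
```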
Centering the Effects of Neighbors in Markov Random Field Models
Models for Markov random fields may be specified on the basis of any number of conditional distributions, but models having Gaussian conditionals are by far the most common in applications. One of the reasons for this is that a Gaussian conditionals model is among the few for which the corresponding joint distribution can be derived in closed form. But models with Gaussian conditionals also possess other characteristics that make them useful. Among these is the expression of conditional expectations in a form that contains a sum of neighboring effects, each of which is taken as a discrepancy of the value from its own marginal mean. A primary consequence of this is that the sum of neighboring effects does not necessarily change as the number of neighbors varies. This is in contrast with general exponential family conditional distributions, which are usually parameterized with sums of "un-centered" neighboring effects contributing to the natural parameter. Locations with many neighbors can have natural parameters that are driven to extreme values, or for which this possibility must be offset by small dependence parameters. As a result, models are often difficult to fit and interpretation of parameters is clouded. We demonstrate that parameterizations exist for exponential family conditional distributions that can allow approximately the same type of "centering" of neighboring values as is used in the Gaussian case. This yields greater interpretability for parameters, greater stability in estimation across models with varying neighborhood size, and can help alleviate edge effects.
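The contrast between centered and uncentered neighbor effects can be seen in a binary (autologistic) example. The sketch assumes one centered form in which the natural parameter is logit(κ_i) + η Σ_j (y_j - κ_j), with κ_j the marginal-mean probability implied by the covariates; the uncentered form uses η Σ_j y_j. These parameterizations and the function names are illustrative assumptions, not necessarily the exact ones proposed in the talk.

```python
import numpy as np

def expit(a):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-a))

def conditional_prob(y_nbrs, x_beta_i, x_beta_nbrs, eta, centered=True):
    """P(Y_i = 1 | neighbors) in an autologistic model.

    x_beta_i    : large-scale linear predictor at site i, i.e. logit(kappa_i)
    x_beta_nbrs : linear predictors at the neighboring sites
    """
    kappa_nbrs = expit(x_beta_nbrs)
    if centered:
        # Neighbors enter as discrepancies from their own marginal means.
        a = x_beta_i + eta * np.sum(y_nbrs - kappa_nbrs)
    else:
        # Raw neighbor values: the natural parameter grows with neighborhood size.
        a = x_beta_i + eta * np.sum(y_nbrs)
    return expit(a)

# Usage: half of the neighbors are 1 and all neighbor means are 0.5,
# once with 4 neighbors and once with 8.
p4 = conditional_prob(np.array([1, 0, 1, 0]), 0.3, np.zeros(4), eta=0.8)
p8 = conditional_prob(np.array([1, 0] * 4), 0.3, np.zeros(8), eta=0.8)
u4 = conditional_prob(np.array([1, 0, 1, 0]), 0.3, np.zeros(4), eta=0.8, centered=False)
u8 = conditional_prob(np.array([1, 0] * 4), 0.3, np.zeros(8), eta=0.8, centered=False)
```

With neighbors at their expected values, the centered probabilities `p4` and `p8` both equal expit(0.3), whereas the uncentered `u8` exceeds `u4` purely because the site has more neighbors.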
Multi-resolution (Wavelet) Based Non-stationary Covariance Modeling for Incomplete Data: the EM Algorithm
Observational data encountered in most geophysical applications of spatial statistics often consist of a large volume of spatially and temporally incomplete measurements. Furthermore, geophysical spatial processes are often highly non-stationary, and it is important that modeled covariance functions represent these non-stationary stochastic properties well. Wavelets are versatile multi-resolution bases for characterizing the stochastic features of a non-stationary spatial field. In this work we extend a method of multi-resolution (wavelet) based non-stationary covariance modeling to handle irregularly distributed observational data. The Expectation-Maximization (EM) algorithm is used to estimate the wavelet-based covariance model parameters, taking advantage of the efficiency of the discrete wavelet transform.
A Case Study of the Implications of Atmospheric Data Pre-processing Using "Correction Factors"
Atmospheric processes are frequently measured by several instruments, corresponding to a variety of space-time resolutions and sampling schemes. Types of measurement instruments include surface-based, satellite, balloon-based, and airplane-based instruments. Each instrument exhibits its own measurement error processes, with biases that may depend on atmospheric conditions. Atmospheric observations are frequently pre-processed prior to further analysis. For some atmospheric measurements, this pre-processing may involve "correcting" the data from certain instruments to improve agreement with observations from other instruments. We use functional data analysis methods, in the context of a stratospheric ozone case study based on balloon-based ozonesonde data, to illustrate some of the resulting statistical challenges. We discuss the vital role that analyses of "correction factors" may play in studying data quality, specifically through this case study of ozonesonde data that have been "corrected" based on other instruments. Data pre-processing of this type also has implications for attempts to combine data from different measurement instruments.
Two-Phase Sampling Approach for Augmenting Fixed Grid Designs to Improve Local Estimation for Mapping Aquatic Resources
Maps are useful tools for understanding, managing and protecting our marine environment. Despite the benefits, there has been little success in developing useful and statistically defensible maps of environmental quality and aquatic resources in coastal regions. Heterogeneous oceanic conditions often make extrapolation to non-sampled locations questionable. Kriging is a commonly used statistical approach that uses information observed at sampled locations to improve predictions at non-sampled locations. The precision and accuracy of those predictions rely entirely on our ability to capture the spatial variability of the response. Knowing how many samples to collect and how far apart sampling points should be spaced is crucial to our ability to model the spatial variability, or variogram, accurately and hence to improve the accuracy of our predictions. We investigate several design strategies for modeling the variogram, with the goal of providing general guidelines for coastal water monitoring agencies and dischargers to map aquatic resources and contaminants. We also discuss a two-phase sampling approach currently being developed for the San Diego Sanitation District for the purpose of mapping chemical contaminants around their sewage outfall.
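Why sample spacing matters for variogram modeling can be seen from the classical (Matheron) estimator γ̂(h) = (1/(2N(h))) Σ (z_i - z_j)² over pairs in a lag bin: a regular grid supplies no pairs at lags shorter than its spacing. The sketch below counts pairs per lag bin for a grid and for the same grid with a few extra close points added, in the spirit of a second sampling phase. The function name and the placement of the extra points are hypothetical.

```python
import numpy as np

def empirical_semivariogram(coords, z, bin_edges):
    """Matheron estimator: gamma(h) = sum (z_i - z_j)^2 / (2 N(h)) per lag bin."""
    i, j = np.triu_indices(len(z), k=1)                 # all distinct pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    which = np.digitize(d, bin_edges) - 1               # bin index; out of range -> -1 or nb
    nb = len(bin_edges) - 1
    gamma = np.full(nb, np.nan)
    counts = np.zeros(nb, dtype=int)
    for b in range(nb):
        in_bin = which == b
        counts[b] = in_bin.sum()
        if counts[b]:
            gamma[b] = 0.5 * sq[in_bin].mean()
    return gamma, counts

# Phase 1: a 5 x 5 grid with unit spacing has no pairs closer than 1.
g = np.arange(5.0)
coords = np.array([(a, b) for a in g for b in g])
edges = np.array([0.0, 0.5, 1.5])
gamma, counts = empirical_semivariogram(coords, np.ones(len(coords)), edges)

# Phase 2 (hypothetical): add points 0.2 units from five grid nodes.
coords2 = np.vstack([coords, coords[:5] + np.array([0.2, 0.0])])
gamma2, counts2 = empirical_semivariogram(coords2, np.ones(len(coords2)), edges)
```

Only the augmented design populates the shortest lag bin, which is the range that governs the variogram's behavior near the origin and hence local kriging precision.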
Practical Issues and Tools for Modeling Spatio-temporal Trends in Atmospheric Pollutant Monitoring Data
There is a substantial literature on methods for modeling trends in space-time monitoring data of atmospheric pollutants. The choice of approach will reasonably depend on the spatio-temporal scales of the monitoring data as well as on the scientific aims of the analysis, such as testing for long-term regional trends, computing various metrics of long-term exposure in chronic health effect models, or using spatio-temporal trends in the computation of spatial estimates of exposure for both long-term and acute health effects analyses. Any such modeling and analysis must consider the common issues of temporal and spatial correlation. Here I focus on modeling spatially varying seasonal structure for purposes of spatial estimation. I introduce a simple but flexible approach to modeling spatio-temporally varying seasonality and long-term trend in terms of basis functions derived from a singular value decomposition of the space × time data matrix. Demonstrations are provided for analyses of data from ozone and particulate matter monitoring networks.
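The basis-function idea can be sketched as follows: take the leading left singular vectors of the (time × site) data matrix as temporal basis functions, then regress each site's series on them to obtain site-specific coefficients that could subsequently be modeled or interpolated over space. This sketch assumes a complete matrix with no missing values and no smoothing of the basis functions, both of which a real monitoring analysis would need to address; the function names are hypothetical.

```python
import numpy as np

def svd_temporal_basis(Y, k):
    """Y: (T, n) matrix, rows = time points, columns = monitoring sites.

    Returns the k leading temporal basis functions (columns of U) after
    removing each site's mean.
    """
    Yc = Y - Y.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k]

def fit_site_coefficients(Y, F):
    """Least-squares coefficients of each site's series on [1, F]."""
    design = np.column_stack([np.ones(len(F)), F])
    B, *_ = np.linalg.lstsq(design, Y, rcond=None)    # (k+1, n): one column per site
    return B, design @ B

# Usage on synthetic data built from a seasonal cycle and a linear trend.
rng = np.random.default_rng(6)
T, n = 120, 15
t = np.arange(T)
seasonal, trend = np.sin(2 * np.pi * t / 12), t / T
Y = np.column_stack([seasonal, trend]) @ rng.standard_normal((2, n)) + rng.standard_normal(n)
F = svd_temporal_basis(Y, k=2)
B, fitted = fit_site_coefficients(Y, F)
```

Because the synthetic data have exactly two temporal components plus site means, two basis functions reproduce them; with real data the number of components is a modeling choice.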
Characterization of Spatial Variability in Soil and Crop Properties in Agricultural Fields Using Spatial Statistical Methods
Currently, most crop production inputs such as irrigation, fertilizers, and pesticides are applied at uniform rates across agricultural landscapes. However, because of the inherent spatial variability in most landscapes, not all field areas require the same level of inputs, resulting in either under- or over-application. Hence, crop yields and economic returns may be limited in some areas by suboptimal input levels, while environmental contamination may occur in over-application areas, especially with nitrogen fertilizer. The goal of precision agriculture is to evaluate existing geospatial technologies, and develop new ones, that may provide a means of varying input application rates based on the spatial variation present in landscapes. One recent approach in precision agriculture has focused on the use of management zones (MZ) as a means to characterize landscape variability and provide a basis for more efficient input application. Management zones are defined as field areas possessing homogeneous soil conditions, resulting in similar crop yield potential, input-use efficiency, and environmental impact. The objectives of our work were to determine 1) whether the landscape attributes of topography, soil color, and apparent electrical conductivity (ECa) could be used to delineate MZ that characterize spatial variation in soil chemical properties as well as corn yields, and 2) whether temporal variability affects the expression of yield spatial variability. The work was conducted on an irrigated cornfield near Gibbon, NE. Landscape attributes, including a soil color aerial image (red, green, and blue bands), elevation, and ECa, were acquired for the field. A georeferenced soil-sampling scheme was used to determine soil chemical properties (soil pH, EC, P, and organic matter). Georeferenced yield monitor data were collected for five growing seasons. The five landscape attributes were aggregated into four MZ using principal component analysis (PCA) and unsupervised classification of PC scores.
Unsupervised classification of PC scores produced four well-defined MZ for the field. All the soil chemical properties differed among the four MZ. Spatial patterns for yield and MZ, as determined by semivariogram analysis, were similar in the three of five seasons that received average precipitation; however, the patterns were less similar in wet and dry seasons. These results illustrate the significant role temporal variability plays in altering crop yield spatial variability. Implications of these findings for developing precision agriculture technologies for corn production systems will be discussed.
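The zone-delineation workflow above (standardize the five landscape attributes, compute PC scores, classify the scores into four zones) can be sketched as follows. The abstract does not name the classifier, so plain k-means with farthest-point initialization is used here as an assumed stand-in; the number of retained components and the function names are also assumptions, and the data are synthetic.

```python
import numpy as np

def pca_scores(A, n_comp):
    """Standardize columns of A (samples x attributes) and return PC scores."""
    Z = (A - A.mean(0)) / A.std(0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)                  # variance share per component
    return Z @ Vt[:n_comp].T, explained

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    idx = [0]
    for _ in range(1, k):
        d = ((X[:, None, :] - X[idx][None]) ** 2).sum(-1).min(1)
        idx.append(int(d.argmax()))                      # point farthest from chosen centers
    centers = X[idx].astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Usage: 200 synthetic grid cells with 5 attributes drawn around 4 zone profiles.
rng = np.random.default_rng(7)
true_centers = np.array([[0, 0, 0, 0, 0], [8, 2, 5, 1, 3],
                         [2, 8, 1, 5, 6], [5, 5, 8, 8, 0]], float)
groups = np.repeat(np.arange(4), 50)
A = true_centers[groups] + 0.3 * rng.standard_normal((200, 5))
scores, explained = pca_scores(A, n_comp=3)
labels, _ = kmeans(scores, k=4)
```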
Comparison of Design-Based and Model-Based Techniques for Selecting Spatially Balanced Samples of Environmental Resources
It is widely recognized that an efficient sample of a spatially distributed resource will have some degree of regularity. For example, locating sample points at the nodes of a regular grid is an optimal model-based design for some semivariograms and domain shapes. Locating points becomes more complicated if the domain has an irregular shape or if the design incorporates existing sample points. In this talk, I review some model-based techniques, such as spatial simulated annealing, for incorporating prior knowledge in locating new sample points. These techniques are contrasted with design-based techniques, such as generalized random tessellation stratification, that can also incorporate prior knowledge and existing sample points. The inferences that result from the different approaches are also discussed.
Some New Spatial Statistical Models for Stream Networks
Models for spatial autocorrelation depend on the distance and direction separating two locations, and are constrained so that, for all possible sets of locations, the implied covariance matrices remain nonnegative definite. Although many families of models exist for two-dimensional space, few models have been developed for stream networks. The only known model that is valid for stream networks is an exponential model based on stream distance, and even this model may not be appropriate when flow characteristics of streams are considered. Recent research has shown that moving-average functions, also known as kernel convolutions, may be used to generate a large class of valid, flexible models in two dimensions. This paper develops moving-average models for stream networks. The moving-average models are easily applied to stream networks; two general classes are models based on stream distance and models that incorporate flow. An interesting property of flow models is that they have discontinuities at stream junctions that are not present in stream distance models. Flow models are more appropriate for variables such as stream chemistry, while distance models may be more appropriate for variables such as fish abundance. We give examples, including a flow model based on stream chemistry variables from northern Alaska.
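The baseline exponential model on stream distance can be sketched for a small dendritic network stored as parent pointers (each node points to its downstream neighbor). Stream distance between two sites is the path length through their first common downstream node, and C = σ² exp(-d/α). This sketch covers only the stream-distance exponential model, not the moving-average or flow-weighted models developed in the talk; the network encoding, parameter names, and function names are assumptions.

```python
import numpy as np

def downstream_path(parent, i):
    """Nodes from i down to the outlet (parent == -1), in order."""
    path = [i]
    while parent[path[-1]] != -1:
        path.append(parent[path[-1]])
    return path

def stream_distance_matrix(parent, length):
    """parent[i]: downstream neighbor of node i (-1 at the outlet);
    length[i]: stream length from i to parent[i]."""
    n = len(parent)
    paths = [downstream_path(parent, i) for i in range(n)]
    # Distance from each node to the outlet along the network.
    to_outlet = np.array([sum(length[v] for v in p[:-1]) for p in paths])
    D = np.zeros((n, n))
    for i in range(n):
        on_i = set(paths[i])
        for j in range(n):
            junction = next(v for v in paths[j] if v in on_i)  # first common downstream node
            D[i, j] = to_outlet[i] + to_outlet[j] - 2.0 * to_outlet[junction]
    return D

def exponential_cov(D, sigma2, alpha):
    """Exponential covariance evaluated at stream distances."""
    return sigma2 * np.exp(-D / alpha)

# Usage: outlet 0; node 1 joins at 0; nodes 2 and 3 are tributaries joining at 1;
# node 4 joins directly at 0.
parent = [-1, 0, 1, 1, 0]
length = [0.0, 2.0, 1.5, 3.0, 2.5]
D = stream_distance_matrix(parent, length)
C = exponential_cov(D, sigma2=1.0, alpha=4.0)
```

Sites 2 and 3 lie on different tributaries, so their stream distance (1.5 + 3.0) is measured through the junction at node 1 rather than in a straight line.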
Exploring Spatio-temporal Patterns in Lyme Disease Incidence and Reporting, 1992-2000
The observed spatial pattern of vector-borne disease incidence is a function of the spatial patterns of host populations, vector populations, host-vector contact, risk of transmission, and disease reporting. For an emerging infection, the pattern of reporting may not be homogeneous across space and time, as physicians learn to diagnose and report the illness. Using reported county-level Lyme disease incidence from 1992 to 2000 in the northeastern United States, we apply exploratory methods to examine observed patterns in reports, and patterns in "no reports", in order to investigate evidence of evolving reporting practices. The observed patterns suggest future directions for the analysis and monitoring of reports of emerging infectious diseases and, more generally, for the development of infectious disease risk maps.