«Analysis methods of heavy-tailed data»
The duration of the course is approximately 25-30 hours plus 20 hours of exercises.
Description
Heavy-tailed distributions are typical of phenomena in complex multicomponent systems and arise in biometry, economics, ecological systems, sociology, Web access statistics and Internet traffic, bibliometrics, finance and business. Typical examples of such distributions are the Pareto, Weibull with shape parameter less than 1, Cauchy and Zipf-Mandelbrot laws. The analysis of heavy-tailed distributions requires special estimation methods because of their specific features: heavy tails decay to zero more slowly than an exponential rate; Cramér's condition is violated; and observations in the tail domain of the distribution are sparse. Because no information is available beyond the range of the empirical sample, nonparametric estimates rely essentially on the asymptotic distributions of the sample maximum as models of the distribution's behavior at infinity. The course will provide a detailed survey of classical results and some recent developments in the theory of nonparametric estimation of the density, tail index, high quantiles, hazard rate and renewal function, and in the analysis of time series, under the assumption that the data are heavy-tailed. Both asymptotic results, such as convergence rates of the estimates, and results for samples of moderate size supported by Monte Carlo studies will be considered. The exposition will be accompanied by numerous illustrations and examples motivated by applications.
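For orientation, the features named in the description are often formalized as shown below. This is one common convention, given as a sketch for a nonnegative random variable X; the symbols F, \bar F, \lambda, \alpha and L are notation assumed for this illustration, and the course may work with a different definition of the heavy-tailed class.

```latex
% Notation (assumed here): F is the distribution function of X >= 0, \bar F = 1 - F its tail.
% Cramér's condition: some exponential moment is finite.
\exists\, \lambda > 0 : \quad \mathbb{E}\, e^{\lambda X} = \int_0^\infty e^{\lambda x}\, dF(x) < \infty .
% Heavy right tail: the condition fails for every \lambda > 0, or equivalently
\limsup_{x \to \infty} e^{\lambda x}\, \bar F(x) = \infty \quad \text{for all } \lambda > 0 .
% Regularly varying (Pareto-type) tail with tail index \alpha > 0 and slowly varying L:
\bar F(x) = x^{-\alpha} L(x), \qquad \lim_{x \to \infty} \frac{L(tx)}{L(x)} = 1 \quad \text{for all } t > 0 .
```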
Target Audience
 Students of mathematics and statistics, computer science and electrical engineering who are interested in learning about practical applications in the area of heavy-tailed data analysis, and who are looking for new approaches and fundamental results, supported by proofs.
 Practitioners who wish to analyze heavy-tailed empirical data and who are interested in rough methodology and numerical algorithms for the analysis of heavy-tailed data.
Requirements
The course will assume prior knowledge of probability and basic statistical techniques.
The course will be taught in English.
Lectures

Introduction: definitions and basic properties of classes of heavy-tailed distributions. Tail index estimation. Methods for the selection of the number of the largest order statistics in Hill's estimator. Rough methods for the detection of heavy tails and the number of finite moments. (2-3 hours)
(Section 1 contains the introduction with necessary definitions, basic properties and examples of heavy-tailed data. The tail index indicates the shape of the tail and is therefore the basic characteristic of heavy-tailed data. Methods for tail index estimation are presented. Finally, several rough tools for the detection of heavy-tailedness, the dependence and the number of finite moments are considered.) 
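As an illustration of tail index estimation, the following minimal sketch computes Hill's estimator from the k largest order statistics. The function name `hill_estimator`, the simulated Pareto sample and the choice k = 200 are assumptions made for this example; selecting k from the data is a separate problem treated in the lecture and is not attempted here.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill's estimate of gamma = 1/alpha based on the k largest observations."""
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < n")
    x = sorted(sample, reverse=True)  # x[0] >= x[1] >= ... >= x[n-1]
    threshold = x[k]                  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    # Average log-excess of the k largest observations over the threshold.
    return sum(math.log(x[i] / threshold) for i in range(k)) / k


# Simulated Pareto sample with tail index alpha = 2 (assumed test data).
rng = random.Random(0)
alpha = 2.0
data = [rng.paretovariate(alpha) for _ in range(5000)]

gamma_hat = hill_estimator(data, k=200)  # estimate of 1/alpha
alpha_hat = 1.0 / gamma_hat              # estimate of the tail index
print(f"gamma_hat = {gamma_hat:.3f}, alpha_hat = {alpha_hat:.3f}")
```

Repeating the call over a range of k and plotting the estimates against k gives the usual Hill plot, one of the rough tools for judging where the estimate stabilizes.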
Density estimation. Main principles of the estimation. Nonparametric estimation of the densities of light-tailed distributions. Smoothing methods. (2 hours)
(In Section 2 the main principles of density estimation, such as Lebesgue's theorem, Fisher's scheme, the L_1, L_2 and \chi^2 approaches, the exponent method and the estimation of the density as a solution of an ill-posed problem, are considered. The links between these approaches are established. Classical methods of density estimation, such as kernel estimators, projection estimators, the histogram and the polygram, and their smoothing tools, such as cross-validation, the discrepancy method and others, are presented.) 
Heavy-tailed density estimation. Combined parametric-nonparametric methods, Barron's estimate and \chi^2-optimality. Kernel estimates with variable bandwidth and their smoothing methods: the integrated squared error cross-validation (ISE), weighted version of squared error cross-validation (WISE), discrepancy method. Retransformed nonparametric estimates. (2-3 hours)
(In Section 3 the problems of heavy-tailed density estimation are discussed. Three approaches to heavy-tailed density estimation are considered. The first relates to combined parametric-nonparametric methods, where the tail domain of the density is fitted by some parametric model and the main part of the density (the body) is fitted by some nonparametric method like a histogram. A similar approach realized by Barron's estimator is considered. The second approach is devoted to kernel estimates with variable bandwidth. The optimal accuracy of these estimates as well as their disadvantages for heavy-tailed density estimation are discussed. The last approach contains the preliminary transformation of the empirical sample to a new one, whose density is more convenient for restoration.) 
Transformation choice: finite and adapted transformations. Retransformed kernel estimates. Boundary kernels. Accuracy measuring: L_1, L_2 approaches, decay rate at infinity. (2-3 hours)
(In Section 4 specific transformations are presented. The quality of retransformed kernel estimates with regard to the metrics in spaces L_1, L_2 is considered. To improve the fitting at the tail domain, special boundary kernels are presented.) 
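The retransformation idea can be sketched as follows: the sample is mapped to a bounded interval, a kernel estimate is built there, and the result is mapped back by the change-of-variables formula f(x) = g(T(x)) T'(x). The arctangent transformation, the Gaussian kernel, the bandwidth h = 0.05 and all function names are assumptions chosen for this illustration; no boundary kernel is used, so the fit near the ends of the interval is cruder than what the lecture aims for.

```python
import math
import random


def transform(x):
    """Map [0, inf) onto [0, 1); an illustrative fixed transformation."""
    return 2.0 / math.pi * math.atan(x)


def transform_derivative(x):
    """Derivative T'(x) of the transformation above."""
    return 2.0 / (math.pi * (1.0 + x * x))


def gaussian_kde(points, y, h):
    """Standard kernel density estimate at y with a Gaussian kernel."""
    norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((y - p) / h) ** 2) for p in points)


def retransformed_density(sample, x, h):
    """Estimate f(x) as g_hat(T(x)) * T'(x), where g_hat is built on T(sample)."""
    transformed = [transform(v) for v in sample]
    return gaussian_kde(transformed, transform(x), h) * transform_derivative(x)


# Assumed test data: shifted Pareto sample on [0, inf) with density
# f(x) = alpha * (1 + x) ** -(alpha + 1), alpha = 1.5.
rng = random.Random(1)
alpha = 1.5
data = [rng.paretovariate(alpha) - 1.0 for _ in range(2000)]

f_hat = retransformed_density(data, 1.0, h=0.05)
f_true = alpha * (1.0 + 1.0) ** -(alpha + 1.0)
print(f"estimate at x = 1: {f_hat:.4f}, true density: {f_true:.4f}")
```

A single bandwidth on the transformed scale corresponds to a bandwidth that grows with x on the original scale, which is why the approach suits sparse tail observations.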
Retransformed density estimates and Bayesian classification algorithm. Risk of misclassification. (2 hours)
(In Section 5 the empirical Bayesian classifier constructed by means of retransformed density estimates is considered. The quality of the classifier is assessed both theoretically and by a Monte Carlo study.) 
Estimation of high quantiles, endpoints, excess functions. (2-3 hours)
(In Section 6 several classical methods for quantile estimation are considered. The methods of estimating high quantiles, endpoints, excess functions for heavy-tailed distributions are presented. An application to WWW-traffic data is considered.) 
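One classical way to estimate a high quantile that lies beyond the range of the sample is Weissman's estimator, which extrapolates from an intermediate order statistic using an estimate of the tail index. The sketch below is only an illustration and is not claimed to be the method used in the course; the function name `weissman_quantile`, the simulated data, k = 200 and p = 0.0001 are assumptions.

```python
import math
import random


def weissman_quantile(sample, k, p):
    """Estimate the level exceeded with probability p (small p), using
    x_p = X_(n-k) * (k / (n * p)) ** gamma_hat with Hill's gamma_hat."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    x = sorted(sample, reverse=True)
    threshold = x[k]  # the (k+1)-th largest observation, assumed positive
    gamma_hat = sum(math.log(x[i] / threshold) for i in range(k)) / k
    return threshold * (k / (n * p)) ** gamma_hat


# Assumed test data: Pareto sample with alpha = 2, so the true level
# exceeded with probability p is p ** (-1 / alpha).
rng = random.Random(2)
alpha = 2.0
data = [rng.paretovariate(alpha) for _ in range(5000)]

p = 0.0001  # smaller than 1/n, so the quantile lies outside the sample range
q_hat = weissman_quantile(data, k=200, p=p)
q_true = p ** (-1.0 / alpha)
print(f"estimated quantile: {q_hat:.1f}, true quantile: {q_true:.1f}")
```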
Nonparametric estimation of the hazard rate function in the light- and heavy-tailed cases. (2 hours)
(In Section 7 the estimation of a hazard rate function is considered for both light- and heavy-tailed distributions. For the heavy-tailed case a transformation approach is presented. For the light-tailed case the hazard rate is evaluated as the solution of an integral equation. Such problems are ill-posed, and hence the solution is obtained by Tikhonov's regularization method. Regularized estimates are presented.) 
Estimation of the renewal function within the finite time interval and for infinite time. Histogram-type nonparametric estimator, its asymptotic properties and smoothing methods. (2-3 hours)
(Section 8 covers estimates of the renewal function at infinite time as well as nonparametric estimation of the renewal function, that is, the mean number of events of interest, in a finite time interval. Smoothing of the histogram-type estimate is considered. Several known methods and original methods of the author are presented. An application to WWW-traffic data is considered.) 
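The renewal function H(t) equals the sum over n >= 1 of P(T_1 + ... + T_n <= t), where T_i are the inter-arrival times. A minimal sketch of a nonparametric estimate replaces each probability by an empirical frequency computed from sums of n consecutive, non-overlapping inter-arrival times, and truncates the series at n = k. The block scheme, the truncation parameter k and the function name `renewal_estimate` are assumptions of this example, written in the spirit of a histogram-type estimator rather than as the exact estimator studied in the course.

```python
import random


def renewal_estimate(gaps, t, k):
    """Estimate H(t) = sum_{n>=1} P(T_1 + ... + T_n <= t), truncated at n = k.

    For each n the probability is estimated by the fraction of
    non-overlapping blocks of n consecutive gaps whose sum is at most t.
    """
    total = 0.0
    for n in range(1, k + 1):
        blocks = len(gaps) // n
        if blocks == 0:
            break  # not enough data to form a single block of length n
        hits = sum(
            1 for i in range(blocks) if sum(gaps[i * n:(i + 1) * n]) <= t
        )
        total += hits / blocks
    return total


# Assumed test data: exponential inter-arrival times with rate 1
# (a Poisson process), for which the renewal function is H(t) = t.
rng = random.Random(3)
gaps = [rng.expovariate(1.0) for _ in range(3000)]

h_hat = renewal_estimate(gaps, t=3.0, k=20)
print(f"estimated H(3) = {h_hat:.3f}, true value = 3")
```

The choice of k plays the role of a smoothing parameter: too small a value truncates the series too early, while too large a value adds terms estimated from very few blocks.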
Dependence detection in univariate and bivariate data. (2-3 hours)
(Section 9 covers various mixing conditions, the autocorrelation function, portmanteau tests and extremal index estimation for the univariate case. An example of video traffic data analysis is given. For the bivariate case, classical measures of dependence such as Kendall's tau and Spearman's rho, as well as the Pickands A-function (which reflects the dependence of two maxima) and copulas, are presented. An application to TCP-flow data control and Web data is presented.)
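For the bivariate rank-based measures mentioned above, a minimal sketch of the sample versions of Kendall's tau and Spearman's rho is given below. It assumes that there are no ties in either coordinate; the function names and the small data set are made up for illustration.

```python
def kendall_tau(xs, ys):
    """Sample Kendall's tau: (concordant - discordant) pairs / total pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                score += 1   # concordant pair
            elif s < 0:
                score -= 1   # discordant pair
    return 2.0 * score / (n * (n - 1))


def ranks(values):
    """Ranks 1..n of the values; ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Sample Spearman's rho from rank differences (no ties assumed)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical paired observations, e.g. two characteristics of a data flow.
x = [1.2, 3.4, 2.2, 5.9, 4.1]
y = [0.8, 2.9, 3.1, 6.5, 3.0]
print(kendall_tau(x, y), spearman_rho(x, y))
```

Both measures depend on the data only through ranks, so they remain well defined for heavy-tailed data whose variance, and hence Pearson correlation, may not exist.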
Exercises are provided
Recommended Literature

 Aivazyan SA, Buchstaber VM, Yenyukov IS and Meshalkin LD (1989). Applied statistics. Classification and reduction of dimensionality. Financy i statistika. Moscow (in Russian). Relevant for Lecture 5.
 Beirlant J, Goegebeur Y, Teugels J and Segers J (2004) Statistics of Extremes: Theory and Applications. Wiley, Chichester, West Sussex. Relevant for Lecture 1.
 Devroye L, Györfi L (1985). Nonparametric Density Estimation: The L_1 View. John Wiley & Sons, New York. Relevant for Lectures 1-4.
 Embrechts P, Klüppelberg C and Mikosch T (1997). Modelling Extremal Events for Insurance and Finance. Springer, Berlin. Relevant for Lectures 1 & 6.
 Gnedenko BW and Kowalenko IN (1971). Einführung in die Bedienungstheorie. Oldenbourg Verlag, München. Useful for Lecture 8.
 Markovich NM (2007). Nonparametric Analysis of Univariate Heavy-Tailed Data: Research and Practice. Wiley, Chichester, West Sussex. Useful for Lectures 1-10.
 Silverman BW (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, New York. Essential for Lectures 1, 2 & 4.
 Simonoff JS (1996). Smoothing Methods in Statistics. Springer, New York. Essential for Lectures 1, 2 & 4.
 Tikhonov AN, Arsenin VY (1977). Solutions of Ill-Posed Problems. John Wiley, New York. Useful for Lecture 7.
 Wand MP, Jones MC (1995). Kernel Smoothing. Chapman & Hall, New York. Essential for Lectures 2 & 4.