Weak properties and robustness of t-Hill estimators

We describe a novel method of heavy tails estimation based on transformed score (t-score). Based on a new score moment method we derive the t-Hill estimator, which estimates the extreme value index of a distribution function with regularly varying tail. The t-Hill estimator is distribution sensitive, thus it differs in e.g. the Pareto and log-gamma cases. Here, we study both forms of the estimator, i.e. t-Hill and t-lgHill. For both estimators we prove weak consistency in moving average settings, as well as the asymptotic normality of the t-lgHill estimator in the i.i.d. setting. In cases of contamination with heavier tails than the tail of the original sample, t-Hill outperforms several robust tail estimators, especially in small samples. A simulation study emphasizes the fact that the level of contamination plays a crucial role. The larger the contamination, the better are the t-score moment estimates. The reason for this is the bounded t-score of heavy-tailed distributions (and, consequently, bounded influence functions of the estimators). We illustrate the developed methodology on a small sample data set of stake measurements from Guanaco glacier in Chile.

P. Jordanova, Faculty of Mathematics and Informatics, Shumen University, Universitetska Str. 115, 9700 Shumen, Bulgaria
Z. Fabián, Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 2, 18200 Prague, Czech Republic
P. Hermann, Department of Applied Statistics, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria
L. Střelec, Department of Statistics and Operation Analysis (FBE), Mendel University, Zemědelská 1, 61300, Brno, Czech Republic
A. Rivera, Laboratorio de Glaciología, Centro de Estudios Científicos (CECs), Arturo Prat 514, Valdivia, Chile / Departamento de Geografía, Universidad de Chile, Portugal 84, Santiago
S. Girard · S. Torres, INRIA Rhône-Alpes, team Mistis, Inovallée, 655, av. de l'Europe, Montbonnot, 38334 Saint-Ismier cedex, France
M. Stehlík, Institute of Statistics, Universidad de Valparaíso, Valparaíso, Chile / Department of Applied Statistics, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria. Tel.: +43 732 2468 6806, Fax: +43 732 2468 6800, E-mail: mlnstehlik@gmail.com


Introduction
The aim of this paper is to introduce a novel method of heavy tails estimation based on transformed score. Especially for small and/or contaminated samples such a method can be better than standard maximum likelihood, since it relates to the method of moments for transformed scores (see Fabián (2015) and Hosking and Wallis (1987)). Let us denote by $RV_a$ the class of functions regularly varying at infinity with index of regular variation $a \in R$, i.e. positive measurable functions $g(\cdot)$ such that for all $x > 0$, $g(tx)/g(t) \to x^a$ as $t \to \infty$.
We define $\alpha$ to be the tail parameter and $\gamma := 1/\alpha$ to be the extreme value index. Hill (1975) derived a procedure of Pareto tail estimation by MLE, obtaining the following Hill estimator
$$H_{k_n,n} = \frac{1}{k_n} \sum_{i=1}^{k_n} \log \frac{X_{(n-i+1,n)}}{X_{(n-k_n,n)}}, \quad k_n = 1, 2, \ldots, n-1. \quad (1)$$
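As an illustration only, the Hill estimator above can be computed directly from the ordered sample. The following minimal Python sketch is not part of the original paper; the function name `hill` and the toy sample are hypothetical, and the sample is assumed to be positive.

```python
import math


def hill(sample, k):
    """Hill estimate of gamma = 1/alpha from the k largest observations.

    Averages log(X_(n-i+1,n) / X_(n-k,n)) over i = 1, ..., k.
    """
    n = len(sample)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    x = sorted(sample)           # x[0] <= ... <= x[n-1]
    threshold = x[n - k - 1]     # X_(n-k,n), the (k+1)-th largest value
    return sum(math.log(xi / threshold) for xi in x[n - k:]) / k


# Toy sample: the two upper log-excesses over the threshold 1 are 1 and 2.
toy = [1.0, math.e, math.e ** 2]
estimate = hill(toy, 2)
```

For the toy sample the estimate is the average of the two log-excesses, i.e. 1.5.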
Later on, many authors tried to robustify the Hill estimator, but they still relied on the maximum likelihood method. Alves (2001) introduced a new lower bound, and Gomes and Oliveira (2003) and Li et al. (2010) introduced powers of the original statistics. However, the influence function of the Hill estimator is slowly increasing but unbounded; thus, the Hill procedure is not robust. Further approaches to robustifying the original Hill estimator were given in Beran and Schell (2012) and Vandewalle et al. (2007). In Fabián (2001) a new method of score moment estimators was proposed. It appeared that these score moment estimators are robust for very heavy-tailed distributions, see Stehlík et al. (2010a). Jordanova and Pancheva (2012) consider an independent identically distributed (i.i.d.) sample and find the limit distribution of the t-Hill estimator for a fixed number k of the threshold order statistics. They prove that a sample of Pareto distributed observations does not have to be large in order to obtain the corresponding limit distribution for fixed k. In that case, under suitable normalizations and for a large sample, the t-Hill estimator is asymptotically normal for $k(n) \to \infty$. Under more general conditions, the t-Hill estimator is asymptotically normal for $k(n) = o(n)$. The Hill estimator procedure with the score moment estimator has been investigated in Stehlík et al. (2012) for optimal testing for normality against a Pareto tail. Recently, generalizations of t-Hill have been published, see Brilhante et al. (2013), Paulauskas and Vaiciulis (2013) and Beran et al. (2014). Resnick and Starica (1993) generalize the Hill estimator to more general settings with possibly dependent data. In this paper we continue these investigations, since dependencies are expected in real data applications. We obtain weak consistency of the t-Hill estimator for a special class of dependent data, the infinite moving average model.
Moreover, we provide some examples showing that, in contrast to the i.i.d. case, the t-Hill and the Hill estimators applied to the moving average model are not robust with respect to large observations. By the concept of the Hill estimator we understand the successive averaging of ordered values up to a given k. In this paper we understand "the Hill estimator" as a specific procedure for studying the tail of Pareto-like distributions. Instead of implementing the Hill estimator procedure, we implement the t-score moment procedure, which can give different estimators for different families. We illustrate t-Hill, quantify its robustness and compare its efficiency with that of its competitors.
The paper is organized as follows. In the next section we recall the theory of the score function of a distribution, introduced first by Fabián (2001) and studied later in a series of works. In Section 3 we provide asymptotic results (consistency and asymptotic normality) for t-Hill estimators for Pareto and log-gamma distributions, under i.i.d. and moving average settings. In Section 4 we compare maximum likelihood, t-Hill, transformed score moment and Hill estimators. We also introduce the t-lgHill estimator, which is a transformed score moment estimator for the case of the log-gamma distribution. Contamination of the underlying data is controlled by the extreme value index, the shifting parameter or by means of the transformed score variance of the contaminating Pareto distribution. Comparisons show that in cases when the contamination has a heavier tail than the original distribution, the t-Hill estimator outperforms the Hill estimator and several robust tail estimators: the Integrated Squared Error estimator (ISE) and the Partial Density Component estimator (PDC) (Vandewalle et al. 2007), the Least Squares estimator (LS), the moment estimator (ME), the QQ estimator (Kratz and Resnick 1996) and the Weighted MLE estimator (WMLE, see Dupuis and Morgenthaler (2002) and Dupuis and Victoria-Feser (2006)). We investigate the robustness of the t-Hill estimator for a particular case of a moving average sequence. In Section 5 we provide an application of t-Hill in comparison to the Hill and Zipf estimators for a small sample data set from Guanaco glacier. A summary concludes the paper. For reasons of readability we provide all proofs in the Appendix.

2 Transformed score function and transformed score moment estimators
Pearson and Filon (1898), Edgeworth (1908a, b), and Fisher (1925) developed the basic statistical inference function, the so-called score function. Their score is the gradient with respect to the parameter $\theta$ of the logarithm of the likelihood function,
$$S(\theta, X) = \frac{\partial}{\partial \theta} \log L(\theta; X),$$
indicating the sensitivity of $L$ (its derivative normalized by its value). The variance of the score is the Fisher information $I(\theta) = E_\theta(S^2)$. Fabián (2001, 2007, 2010, 2015) has introduced a more general scalar-valued inference function, in this paper called the transformed score function (or t-score) of a distribution, reflecting the main features of continuous probability distributions and enabling the introduction of their new relevant characteristics.
The t-score of a distribution $G$ with support $R$ and unimodal differentiable density $g$ is a function expressing the relative rate of change of $g$,
$$S_G(y) = -\frac{g'(y)}{g(y)}.$$
For a distribution $G_\mu$ with location parameter $\mu \in R$, that is, with density of the form $g(y - \mu)$, the generalized t-score function equals the Fisher score for $\mu$, as
$$S_G(y; \mu) = -\frac{g'(y - \mu)}{g(y - \mu)} = \frac{\partial}{\partial \mu} \log g(y - \mu). \quad (3)$$
We consider the t-score in equation (3) to be a significant function of the parametric distribution $G_\theta$, $\theta \in \Theta \subseteq R$, and the mode $y^*$, for which $S_G(y^*; \theta) = 0$, to be its 'central' point.
In general, the t-score of distribution F with arbitrary interval support X ⊆ R is defined by the following construction.
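Definition 1 Let $\eta: X \to R$ be a strictly increasing smooth mapping. The t-score of a distribution $F$ with interval support $X \subseteq R$ and (almost surely) twice differentiable density $f(x)$ is
$$T_F(x) = -\frac{1}{f(x)} \frac{d}{dx}\left[\frac{1}{\eta'(x)} f(x)\right]. \quad (4)$$
Supposing that the solution $x^*$ of the equation
$$T_F(x) = 0 \quad (5)$$
is unique, it is called the t-score mean, and the function
$$S_F(x) = \eta'(x^*)\, T_F(x) \quad (6)$$
is called the generalized t-score function of the distribution $F$.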
Definition 1 can be explained as follows. The density of $F$ can be written as
$$f(x) = g(\eta(x))\, \eta'(x), \quad (7)$$
where $g$ is the density of the prototype distribution $G$ on $R$ and $\eta'(x) = d\eta(x)/dx$ is the Jacobian of the transformation. It was shown in Fabián (2001) that the t-score is actually the transformed generalized t-score function of the prototype, $T_F(x) = S_G(\eta(x))$. The Jacobian of the transformation does not carry any information about the distribution. The term in brackets of Eq. 4 is the density without the Jacobian. The justification of expression (6) is the following: consider a location prototype $G_\mu$ with density $g(y - \mu)$. The density of the transformed distribution is
$$f(x; \tau) = g(\eta(x) - \eta(\tau))\, \eta'(x),$$
where $\tau = \eta^{-1}(\mu)$ is a so-called transformed location parameter. It was proven in Fabián (2007) that the generalized t-score function with transformed location parameter is identical with the Fisher score for this parameter,
$$S_F(x; \tau) = \frac{\partial}{\partial \tau} \log f(x; \tau).$$
The generalized t-score function (6) and its parametric form $S_F(x; \theta)$ generalize the concept of the Fisher score to distributions with arbitrary support, without a 'central' parameter or without parameters at all. If the prototype $G$ has mode $y^*$, the score mean of the transformed distribution is the transformed mode of the prototype $G$, $x^* = \eta^{-1}(y^*)$. The generalized t-score function is actually the (generalized) Fisher score for the score mean, which may not be a parameter of the distribution and is unique if the prototype is unimodal.
The mapping $\eta(x)$ occurring in Definition 1 is often the inner part of $f(x)$, so that $\eta(x)$ and/or $\eta'(x)$ in Eq. 7 are clearly identifiable. This $\eta$ is called an innate mapping; Fabián (2015) discusses how to choose the most suitable innate mapping if it is not apparent from the density formula.
Let us consider two examples, related to log-gamma and Pareto distributions studied later in this paper. For other examples, e.g. beta-prime distribution see Fabián (2015).
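Example 2 The density of the log-gamma distribution with $X = (1, \infty)$,
$$f(x) = \frac{\alpha^c}{\Gamma(c)} (\log x)^{c-1} x^{-(\alpha+1)}, \quad (8)$$
can be written as $f(x) = \frac{\alpha^c}{\Gamma(c)} (\log x)^{c} x^{-\alpha} \cdot \frac{1}{x \log x}$, so that the Jacobian is $\eta'(x) = 1/(x \log x)$ and the innate mapping is $\eta(x) = \log \log x$. By Eq. 4 one obtains
$$T_F(x) = \alpha \log x - c,$$
with score mean $x^* = e^{c/\alpha}$.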
Example 3 The density of the Pareto distribution with $X = (1, \infty)$, $f(x) = \alpha x^{-(\alpha+1)}$, does not contain any 'visible' Jacobian of any transformation. By using $\eta'(x)$ from the foregoing example we would receive
$$T_F(x) = \alpha \log x - 1,$$
which is, up to a constant factor, the Fisher score for $\alpha$. By using $\eta(x) = \log(x - 1)$, $\eta'(x) = 1/(x - 1)$, the t-score (4) is
$$T_F(x) = \alpha - \frac{\alpha + 1}{x} = \alpha\left(1 - \frac{x^*}{x}\right),$$
with score mean $x^* = (\alpha + 1)/\alpha$. Since the 'shifted' Pareto distribution with $X = R_+$ and density $f(x) = \alpha/(x+1)^{\alpha+1}$ is a particular case of the beta type II distribution with bounded generalized t-score, the latter one is to be taken as the t-score of the Pareto distribution.
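The properties of the Pareto t-score can be checked numerically. The sketch below is an illustration and not part of the paper: it assumes that the t-score obtained with $\eta(x) = \log(x - 1)$ has the form $T_F(x) = \alpha - (\alpha + 1)/x$, which vanishes at the stated score mean $x^* = (\alpha + 1)/\alpha$. The helper names and the midpoint-rule integration are hypothetical choices.

```python
def t_score_pareto(x, alpha):
    """Assumed Pareto t-score T_F(x) = alpha - (alpha + 1) / x for x > 1."""
    return alpha - (alpha + 1.0) / x


def expect_under_pareto(func, alpha, steps=100_000):
    """E[func(X)] for X ~ Pareto(alpha) on (1, inf), by the midpoint rule.

    Uses the quantile transform X = U**(-1/alpha) with U uniform on (0, 1).
    """
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) / steps
        total += func(u ** (-1.0 / alpha))
    return total / steps


alpha = 1.5
score_mean = (alpha + 1.0) / alpha
# The t-score is bounded on (1, inf): it equals -1 at x = 1 and tends to alpha.
lower, upper = t_score_pareto(1.0, alpha), t_score_pareto(1e12, alpha)
first_moment = expect_under_pareto(lambda x: t_score_pareto(x, alpha), alpha)
second_moment = expect_under_pareto(lambda x: t_score_pareto(x, alpha) ** 2, alpha)
```

Under this assumed form the first moment is numerically zero and the second moment agrees with $\alpha/(\alpha + 2)$, which is consistent with the score variance $\omega^2 = (\alpha + 2)/\alpha^3$ quoted later in the paper.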
The score moments (SM) were introduced for any $k \in N$ instead of the ordinary moments by
$$M_k(\theta) = E S_F^k = \int_X S_F^k(x; \theta) f(x; \theta)\, dx, \quad (11)$$
existing only if $f$ satisfies the usual regularity requirements. It appeared that the score moments are often expressed by elementary functions of the parameters. It is easy to see that $M_1 = 0$. The value $M_2 = E S_F^2$ of distributions with a (transformed) location parameter is the Fisher information for the (transformed) location parameter. Accordingly, $E S_F^2$ is the Fisher information for the t-mean. The reciprocal value,
$$\omega^2 = \frac{1}{E S_F^2},$$
the t-score variance, appeared to be a measure of the variability (dispersion) of the distribution even in cases where the usual variance does not exist (see Fabián (2007)). Notice that in the case of the Pareto distribution $E T_F^2 = \alpha/(\alpha + 2)$.

Let $X_1, \ldots, X_n$ be i.i.d. random variables distributed according to some $F$. Assuming $F$ to be a member of the model family $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq R^m$, Fabián (2001, 2010) introduced the t-score moment estimate $\hat\theta_{SM}$ as the solution of the equations
$$\frac{1}{n} \sum_{i=1}^{n} S_F^k(x_i; \theta) = M_k(\theta), \quad k = 1, \ldots, m, \quad (14)$$
the statistical counterpart of Eq. 11. It was shown that $\hat\theta_{SM}$ is consistent and asymptotically normal. The score moment estimators take the assumed form of the distribution into account, similarly to the maximum likelihood (ML) ones. However, since data enter the estimation equations only by means of $S_F(x_i; \theta)$, and transformed scores of heavy-tailed distributions appeared to be bounded, the transformed score moment estimates of all parameters are protected against outliers in cases of heavy-tailed distributions. Let us remark that since t-scores and generalized t-score functions differ only by a constant factor, one can use the t-scores instead of the generalized t-score functions in the score moment equations. The t-score moment equations for the log-gamma and Pareto distributions are as follows.

Example 4 Log-gamma distribution: As $E T_F^2 = c$, Eqs. 14 are
$$\frac{1}{n} \sum_{i=1}^{n} (\alpha \log x_i - c) = 0, \quad (15)$$
$$\frac{1}{n} \sum_{i=1}^{n} (\alpha \log x_i - c)^2 = c. \quad (16)$$

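Example 5 Pareto distribution: From the first moment equation
$$\frac{1}{n} \sum_{i=1}^{n} \left(\alpha - \frac{\alpha + 1}{x_i}\right) = 0$$
it follows that
$$\hat\alpha^{-1} = \left(\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}\right)^{-1} - 1. \quad (17)$$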
We use the same notations of order statistics as in Section 1, see (1). Hence, due to (17), the t-Hill estimator of $\gamma = \alpha^{-1}$ was suggested in the form
$$H^*_{k_n,n} = \left(\frac{1}{k_n} \sum_{i=1}^{k_n} \frac{X_{(n-k_n,n)}}{X_{(n-i+1,n)}}\right)^{-1} - 1, \quad k_n = 1, 2, \ldots, n-1. \quad (18)$$
This estimator was first published in the technical report of Fabián and Stehlík (2009). Since it is based on the harmonic mean, it is expected to be resistant to large observations to a certain extent, so that it could yield more realistic values than the ordinary Hill estimator. Further generalizations have been made by Brilhante et al. (2013) and Beran et al. (2014). In particular, the tradeoff between efficiency and robustness has been studied in Beran et al. (2014), and a mean of order $p \ge 0$ (MOP) generalization is given by Brilhante et al. (2013).

For the definition of the t-lgHill estimator we use the same notations of order statistics as in Section 1, see Eq. 1, and
$$M_n = \frac{1}{k_n} \sum_{i=1}^{k_n} \left(\log \frac{X_{(n-i+1,n)}}{X_{(n-k_n,n)}}\right)^2.$$
If we understand the t-Hill estimator as an algorithm depending on the assumed distribution, the t-Hill estimator of $\gamma = \alpha^{-1}$ for the log-gamma distribution is, according to the solution of Eqs. 15 and 16,
$$H^L_{k_n,n} = \frac{M_n}{H_{k_n,n}} - H_{k_n,n} = \frac{\frac{1}{k_n} \sum_{i=1}^{k_n} \left(\log \frac{X_{(n-i+1,n)}}{X_{(n-k_n,n)}}\right)^2}{\frac{1}{k_n} \sum_{i=1}^{k_n} \log \frac{X_{(n-i+1,n)}}{X_{(n-k_n,n)}}} - \frac{1}{k_n} \sum_{i=1}^{k_n} \log \frac{X_{(n-i+1,n)}}{X_{(n-k_n,n)}}, \quad k_n = 1, 2, \ldots, n-1. \quad (20)$$
Let us denote the sequence $H^L_{k_n,n}$ by t-lgHill. Referring to Section 2, the t-lgHill estimate of the tail index is given by Eq. 16 and has the closed-form expression (20).
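The two estimators of this section can be sketched in a few lines. This is a hedged illustration, not the authors' code: it assumes that t-Hill is the reciprocal of the mean of the ratios $X_{(n-k_n,n)}/X_{(n-i+1,n)}$ minus one, and that t-lgHill is the mean squared log-excess divided by the Hill estimate, minus the Hill estimate (the solution of the log-gamma score moment equations). The function names are hypothetical.

```python
import math


def top_ratios(sample, k):
    """Ratios X_(n-i+1,n) / X_(n-k,n), i = 1..k, of a positive sample."""
    n = len(sample)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    x = sorted(sample)
    threshold = x[n - k - 1]
    return [xi / threshold for xi in x[n - k:]]


def t_hill(sample, k):
    """Assumed t-Hill form: inverse mean of reciprocal ratios, minus 1."""
    ratios = top_ratios(sample, k)
    return k / sum(1.0 / r for r in ratios) - 1.0


def t_lg_hill(sample, k):
    """Assumed t-lgHill form: M_n / H - H, with H the Hill estimate.

    Requires at least one ratio above 1, otherwise H = 0.
    """
    logs = [math.log(r) for r in top_ratios(sample, k)]
    h = sum(logs) / k                      # Hill estimate
    m = sum(v * v for v in logs) / k       # mean squared log-excess M_n
    return m / h - h


toy = [1.0, 2.0, 4.0]
t_hill_value = t_hill(toy, 2)
t_lg_hill_value = t_lg_hill(toy, 2)
```

For the toy sample with $k = 2$ the ratios are 2 and 4, so the t-Hill value is $2/0.75 - 1 = 5/3$ and the t-lgHill value is $(\log 2)/6$.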

t-Hill estimator in case of moving average sequence
Suppose at least one of the real numbers $c_j$, $j = 0, 1, \ldots$, is positive and there exists $\delta \in (0, 1)$, $\delta < \alpha$, such that
$$\sum_{j=0}^{\infty} |c_j|^{\delta} < \infty. \quad (21)$$
Consider the moving average sequence
$$X_n = \sum_{j=0}^{\infty} c_j Z_{n-j}, \quad (22)$$
where $Z_i$, $-\infty < i < \infty$, are non-negative i.i.d. innovations with d.f. $G$, such that $\bar G \in RV_{-\alpha}$, $\alpha > 0$. We will use the t-Hill estimator as introduced before and consider the point measure as a random element in the space $E_+$ of positive Radon measures on $(0, \infty]$ endowed with the vague topology. Here $\bar F := 1 - F$ is the tail function of $F$. By Proposition 3.3 of Resnick and Starica (1993), and by Propositions 2.1 (for an intermediate sequence $k_n$) and 2.2 of the same paper, the corresponding point measures converge in $E_+$. We will use the following Potter's inequality for distribution functions with regularly varying tails. If a function $H$ is regularly varying with exponent $\gamma \in R$, then for $\varepsilon > 0$ there exists $t_0(\varepsilon)$ such that for $t > t_0(\varepsilon)$ and $x \ge 1$,
$$(1 - \varepsilon)\, x^{\gamma - \varepsilon} < \frac{H(tx)}{H(t)} < (1 + \varepsilon)\, x^{\gamma + \varepsilon}.$$
Its proof can be found in de Haan (1970).
Assume that $k_n \to \infty$, $k_n/n \to 0$ and $\mu_{X,k_n,n} \Rightarrow \mu$ as $n \to \infty$. As a consequence of Lemma 1, considering $\varphi(x) = 1/x$ yields that $H^*_{k_n,n}$, defined in Eq. 18, is a weakly consistent estimator of $1/\alpha$. Now we are ready to prove the weak consistency of the t-Hill estimator in the case of an infinite moving average (MA) sequence.

t-lgHill estimator in case of moving average sequence
Suppose at least one of the real numbers $c_j$, $j = 0, 1, \ldots$, is positive and there exists $\delta \in (0, 1)$, $\delta < \alpha$, such that (21) is satisfied.
Consider the moving average sequence (22). Assume (24) holds. By weak consistency of the Hill estimator we obtain that
$$H_{k_n,n} \xrightarrow{P} \frac{1}{\alpha}.$$
Now we need to prove that
$$M_n \xrightarrow{P} \frac{2}{\alpha^2}$$
and then to use Slutsky arguments. Consider the point measure as a random element in the space $E_+$ of positive Radon measures on $(0, \infty]$ endowed with the vague topology, where $b$ is defined in (23). In Section 3.1 we proved that under these conditions the convergence holds in $E_+$ and (26) and (27) are true. As a consequence of Lemma 1, letting $\varphi(x) = \ln^2(x)$ entails that $M_n$ is a weakly consistent estimator for $2/\alpha^2$.
Proposition 2 Suppose at least one of the real numbers $c_j$, $j = 0, 1, \ldots$, is positive and there exists $\delta \in (0, 1)$, $\delta < \alpha$, such that condition (21) is satisfied. Consider the moving average sequence (22) and assume (24) holds. Then i.) $M_n$ is a weakly consistent estimator for $2\alpha^{-2}$; ii.) the t-lgHill estimator $H^L_{k,n}$ is a weakly consistent estimator for $\alpha^{-1}$.
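The consistency statements can be explored by simulation. The sketch below is an illustration under simplifying assumptions that are not taken from the paper: the infinite moving average is truncated to two hypothetical coefficients $(1, 0.5)$, the innovations are exact Pareto variables with $\alpha = 1$, and t-Hill is taken in its harmonic-mean form. The seed, sample size and $k$ are arbitrary.

```python
import random


def t_hill(sample, k):
    """Assumed t-Hill form: inverse mean of X_(n-k,n)/X_(n-i+1,n), minus 1."""
    x = sorted(sample)
    n = len(x)
    threshold = x[n - k - 1]
    return k / sum(threshold / xi for xi in x[n - k:]) - 1.0


def moving_average(innovations, coeffs):
    """Finite MA sequence X_t = sum_j coeffs[j] * Z_(t-j)."""
    q = len(coeffs) - 1
    return [
        sum(c * innovations[t - j] for j, c in enumerate(coeffs))
        for t in range(q, len(innovations))
    ]


def simulate(alpha=1.0, coeffs=(1.0, 0.5), n=20000, k=400, seed=12345):
    """t-Hill estimate of 1/alpha from one simulated MA path."""
    rng = random.Random(seed)
    z = [rng.paretovariate(alpha) for _ in range(n + len(coeffs) - 1)]
    return t_hill(moving_average(z, coeffs), k)


estimate = simulate()  # should lie in the vicinity of 1/alpha = 1
```

A single path only indicates plausibility. Repeating the experiment over many seeds and several values of $k$ gives a rough picture of the bias and spread under dependence.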

t-lgHill estimator is asymptotically normal in iid case
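Definition 2 Let the tail function $\bar F := 1 - F$, $\bar F: R \to [0, 1]$, of a non-negative random variable $X$ satisfy $\bar F \in RV_{-\alpha}$ with $\alpha > 0$. Then $\bar F$ is said to be of second-order regular variation with parameter $\rho \le 0$ if there exist a function $Q(t)$ that ultimately has a constant sign with $\lim_{t \to \infty} Q(t) = 0$ and a constant $c \ne 0$ such that
$$\lim_{t \to \infty} \frac{\bar F(tx)/\bar F(t) - x^{-\alpha}}{Q(t)} = c\, x^{-\alpha} \int_1^x u^{\rho - 1}\, du, \quad x > 0.$$
Then it is written as $\bar F \in 2RV_{-\alpha,\rho}$ and $Q(t)$ is referred to as the auxiliary function of $\bar F$.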
It is known from de Haan and Stadtmüller (1996), or in a more relevant form from Geluk et al. (1997), that $|Q| \in RV_\rho$ and that no other choices of $\rho$ are consistent with $Q(t) \to 0$. There are many distributions which satisfy the second-order RV condition (see e.g. Drees et al. (2000)). The most seminal results on this topic can be found in de Haan and Ferreira (2006).

Empirical study of the effect of contamination on Pareto tail index
The maximum likelihood estimator $\hat\alpha_{ML,n} := 1/H_{k_n,n}$ of the Pareto tail index is very sensitive to deviations from the theoretical distributions, namely in the heavy-tailed class of distributions, see Stehlík et al. (2010b). It is unbiased, asymptotically consistent and has the smallest variance in the set of all unbiased estimators for $\alpha$. Its variance is equal to $\alpha^2/n$. However, it is not robust with respect to large contamination because its finite sample upper breakdown point is 0. Therefore, we are looking for an estimator $\hat\alpha_n$ with asymptotic relative efficiency
$$\frac{Var\, \hat\alpha_{ML,n}}{Var\, \hat\alpha_n} = \frac{\alpha^2}{n\, Var\, \hat\alpha_n} \le 1$$
and close to 1. The latter means that the estimator's variance should be close to $\alpha^2/n$, although it will be larger than this value. It seems difficult to compare theoretically the mean values and variances of the Hill and t-Hill estimators for all distribution functions with regularly varying tails. In this section we compare their properties empirically. A good discussion of this topic can be found in Finkelstein et al. (2006). First they apply the Pareto probability integral transform to the initial data and then they use the properties of the uniform distribution in order to obtain their estimator of $\alpha$ and to examine its properties. The biggest advantage of their approach is that in this way "even infinite contamination has a bounded effect on the transformed data". Robust methods considering contaminated distributions work much better, but sweep out the differences among distributions so that they do not use the complete prior information. The main concepts of robustness are very well presented e.g. in Huber and Ronchetti (1981) and Rousseeuw and Stahel (1986).
In order to compare the quality of the estimators we calculated the percentage relative bias (RB) of the Hill, t-Hill and t-lgHill estimators. Another characteristic that we use in order to determine the quality of these estimators is their relative root-mean-square error (RRMSE), see Brzezinski (2015).
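The two quality measures can be written down explicitly. The definitions used in the sketch below are the common ones and are an assumption here, since the exact formulas are not reproduced in this text: RB is taken as $100 \cdot (\bar{\hat\gamma} - \gamma)/\gamma$ and RRMSE as $100 \cdot \sqrt{\frac{1}{m}\sum_j (\hat\gamma_j - \gamma)^2}/\gamma$ over $m$ replications. The function names are hypothetical.

```python
import math


def relative_bias_percent(estimates, true_value):
    """Assumed RB: 100 * (mean of estimates - true value) / true value."""
    mean = sum(estimates) / len(estimates)
    return 100.0 * (mean - true_value) / true_value


def rrmse_percent(estimates, true_value):
    """Assumed RRMSE: 100 * root-mean-square error / true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / true_value


replications = [0.9, 1.1, 1.3]   # toy estimates of gamma = 1
rb = relative_bias_percent(replications, 1.0)
rrmse = rrmse_percent(replications, 1.0)
```

For the toy replications the mean is 1.1, so RB is 10 %, and the mean squared error is 0.11/3, so RRMSE is about 19.15 %.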
Analogously to the Hill plot we consider the set of points with coordinates $(k_n, H^*_{k_n,n})$, $k_n \in \{1, 2, \ldots, n\}$. In what follows we call this plot the 't-Hill plot'; the t-lgHill plot is defined similarly. The straight line represents the true value of $\gamma = 1/\alpha$ in all following figures in this section.

t-Hill versus other robust estimators
In our Monte Carlo simulation we focus on the parameter $\alpha$ of the classical (or Type I) Pareto distribution $P(\alpha, \delta)$, which is defined by its cumulative distribution function as follows:
$$F(x) = 1 - \left(\frac{\delta}{x}\right)^{\alpha}, \quad x \ge \delta,$$
where $\delta$ is a scale parameter and $\alpha > 0$ is a shape parameter, generally known as the Pareto tail index. The existing literature offers various estimators for the Pareto tail index $\alpha$. In our simulation study we focus primarily on the following estimators: the Hill estimator (maximum likelihood estimator introduced by Hill (1975), which we use as a non-robust benchmark for the Pareto tail index), the Integrated Squared Error estimator (ISE, introduced by Vandewalle et al. (2007) and based on the relative excesses of observations above a certain threshold), the Partial Density Component estimator (PDC, introduced by Vandewalle et al. (2007) and also based on the relative excesses of observations above a certain threshold), the Least Squares estimator (LS), the moment estimator (ME), the QQ estimator (QQ, based on the Quantile-Quantile approach, see Kratz and Resnick (1996)), and finally the Weighted MLE (WMLE, see Dupuis and Morgenthaler (2002) and Dupuis and Victoria-Feser (2006)). For the purpose of comparison we use the percentage relative bias (RB) and the percentage relative root-mean-square error (RRMSE) as defined in Section 4. The data sets are simulated from the Pareto distribution $P(\alpha, 1)$, for various values of $\alpha$, as well as from the contaminated Pareto distribution with distribution function (35), where $\varepsilon = 0.05, 0.10$ and $\alpha_1 = 1, 1.7$, $\alpha_2 = 0.3, 0.5, 1$. For all cases we assume the appropriate sample sizes $n$. Table 1 presents the percentage relative bias (RB) and the percentage relative root-mean-square error (RRMSE) for the uncontaminated Pareto distribution $P(\alpha, 1)$. For the large sample size $n = 1000$, the RB and RRMSE results of the t-Hill, Hill and ISE estimators are comparable.
The largest relative bias as well as relative root-mean-square error have the ME and WML estimators.   Table 2. We also made analogous observations for samples with size n = 500. However, this figure is not provided here. The same conclusions could be made also from the values of RB and RRMSE, given in Table 3.

t-Hill versus Hill and t-lgHill for distribution function with regularly varying tail at a logarithmic rate
The Hill, t-lgHill and t-Hill estimators may perform very poorly if the slowly varying function in the tail is far from a constant. In (4.16) of Embrechts et al. (1997) the authors consider such a case, with quantile function (36), with respect to the Hill estimator. We simulated samples of $n = 10000$ observations of random variables with quantile functions (36), separately for $\alpha = 0.3, 1, 1.7$, and plotted the Hill, t-lgHill and t-Hill plots for $k_n = 5, 6, \ldots, 499$. The rate of convergence of these estimators can be observed in Fig. 2. Notice that the t-lgHill outperforms both the Hill and the t-Hill estimators. Although the Hill estimator seems to be slightly closer to the estimated value, it is clear that none of the three estimators is good for such c.d.f.s, see Table 4; however, the t-lgHill has the best performance. Figure 3 shows the t-lgHill for different $\alpha$ equal to 0.3, 1, 1.7 and the log-gamma distribution (8), $c = 1$. The green line is for t-Hill, the blue line for Hill and the red line for t-lgHill.

t-Hill versus Hill for contaminated data from Pareto sample
Practical examples, see Fabián and Stehlík (2009) among others, have shown that data are expected to be contaminated in specific situations. Frequently there are outliers in the right tail of the distribution. It is known that the Hill estimator is not robust. This is due to the fact that the data enter its calculation through their logarithm, which is unbounded. But what about the t-Hill estimator? In (18) the data enter through their reciprocal values, which are bounded for large observations. Therefore, it is not difficult to deduce that these estimators are robust with respect to large values. They are sensitive to the center of the distribution.
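The different roles of the logarithm and the reciprocal can be made concrete with a single gross outlier. The sketch below is an illustration, not the authors' experiment: it takes $n = 50$ deterministic Pareto quantiles with $\alpha = 1$ (a hypothetical design), replaces the largest value by $10^9$, and compares how much the Hill and t-Hill estimates move for $k = 20$. The t-Hill form used is the assumed harmonic-mean form.

```python
import math


def top_ratios(sample, k):
    """Ratios of the k largest observations to the threshold X_(n-k,n)."""
    x = sorted(sample)
    n = len(x)
    threshold = x[n - k - 1]
    return [xi / threshold for xi in x[n - k:]]


def hill(sample, k):
    return sum(math.log(r) for r in top_ratios(sample, k)) / k


def t_hill(sample, k):
    # Assumed t-Hill form: inverse mean of reciprocal ratios, minus 1.
    return k / sum(1.0 / r for r in top_ratios(sample, k)) - 1.0


n, k, alpha = 50, 20, 1.0
# Deterministic "sample": Pareto quantiles at levels j / (n + 1).
clean = [(1.0 - j / (n + 1)) ** (-1.0 / alpha) for j in range(1, n + 1)]
spoiled = clean[:-1] + [1e9]          # replace the maximum by a gross outlier

hill_shift = hill(spoiled, k) - hill(clean, k)
t_hill_shift = t_hill(spoiled, k) - t_hill(clean, k)
```

In this design the Hill estimate moves by roughly 0.84, while the t-Hill estimate moves by roughly 0.01, because the reciprocal of the outlier can at most drop to zero.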
Contamination, governed by $\alpha_2$
Among all employed estimators (ISE, PDC, LS, ME, QQ and WMLE), the t-Hill estimator provides the superior results with respect to the relative bias (RB) and the relative root-mean-square error (RRMSE) for the contaminated Pareto distribution with distribution function (35).
Here we simulated small samples of $n = 40$ and bigger samples of $n = 200$ data from the contaminated Pareto distribution as given in (35), separately for $\alpha_1 = 0.3, 1$ or $1.7$ and $\alpha_2 = 0.5$ or $1$. The relative part of the contamination is governed by the parameter $\varepsilon$. We worked with two values, 0.1 and 0.05. A sample with 10 % contamination corresponds to $\varepsilon = 0.1$. For sample sizes of 40, see Fig. 4. To obtain proper estimates of $1/\alpha_1$ we need $k_n$ to be large and $k_n < n$. When $\alpha_1 < 1$ and $\alpha_2 \ge 1$ the Hill estimators are better than the t-Hill estimators. When $\alpha_1 < 1$ and $\alpha_2 < 1$, or $\alpha_1 \ge 1$ and $\alpha_2 > 1$, the t-Hill estimators are comparable to the corresponding Hill estimators. Finally, if we observe the means of the estimators, we can conclude that the t-Hill estimators are better than the Hill estimators for $\alpha_1 \ge 1$ and $\alpha_2 \le 1$.

Fig. 4 Rate of convergence of the t-Hill (first line) and Hill (second line) estimators with $\alpha_1 = 1$, $\alpha_2 = 0.5$ (left), $\alpha_1 = 1.7$, $\alpha_2 = 0.5$ (middle) and $\alpha_1 = 1.7$, $\alpha_2 = 1$ (right), $\varepsilon = 0.1$, for $n = 40$
The same conclusions can be drawn on the basis of the RB and RRMSE of both the Hill and t-Hill estimators. They are given in Table 5 for $k = 10$ or $20$ and for $k = 100$ or $200$, and for different values of $\alpha_1$, $\alpha_2$ and $\varepsilon$.
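A small Monte Carlo experiment of this kind can be sketched as follows. It is an illustration only and does not reproduce the tables of the paper: the contaminated distribution is assumed here to be the two-component mixture that draws from Pareto($\alpha_2$) with probability $\varepsilon$ and from Pareto($\alpha_1$) otherwise, RRMSE is taken in its usual form, and the seed, the number of replications and all function names are hypothetical.

```python
import math
import random


def top_ratios(sample, k):
    x = sorted(sample)
    n = len(x)
    threshold = x[n - k - 1]                  # X_(n-k,n)
    return [xi / threshold for xi in x[n - k:]]


def hill(sample, k):
    return sum(math.log(r) for r in top_ratios(sample, k)) / k


def t_hill(sample, k):
    # Assumed t-Hill form: inverse mean of reciprocal ratios, minus 1.
    return k / sum(1.0 / r for r in top_ratios(sample, k)) - 1.0


def contaminated_pareto(rng, n, alpha1, alpha2, eps):
    """Assumed mixture: Pareto(alpha2) with probability eps, else Pareto(alpha1)."""
    return [
        rng.paretovariate(alpha2 if rng.random() < eps else alpha1)
        for _ in range(n)
    ]


def rrmse_percent(estimates, true_value):
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / true_value


def compare(alpha1=1.7, alpha2=0.5, eps=0.1, n=40, k=20, reps=2000, seed=2024):
    """RRMSE (in %) of Hill and t-Hill as estimators of 1/alpha1."""
    rng = random.Random(seed)
    gamma = 1.0 / alpha1
    hill_values, t_hill_values = [], []
    for _ in range(reps):
        sample = contaminated_pareto(rng, n, alpha1, alpha2, eps)
        hill_values.append(hill(sample, k))
        t_hill_values.append(t_hill(sample, k))
    return rrmse_percent(hill_values, gamma), rrmse_percent(t_hill_values, gamma)


rrmse_hill, rrmse_t_hill = compare()
```

With a heavier contaminating tail ($\alpha_2 < \alpha_1$) one would expect, in line with the conclusions above, a smaller RRMSE for t-Hill than for Hill in this small-sample setting.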

Comparison of maximum likelihood with t-score moment estimators
Let us compare maximum likelihood (ML) estimators with t-score moment (SM) estimators for two different cases: on the one hand based on a sample from one distribution, and on the other hand in the presence of contamination, i.e. for a distribution of the form (37), where $\omega_1 > \omega$ are score variances. Note: the score variance of the Pareto distribution is $\omega^2 = (\alpha + 2)/\alpha^3$. If $\omega = 1$, we have $1/\alpha = 0.657$. The Pareto distribution with score variance $\omega^2$ will be denoted by $Pa(\omega)$.
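The numerical value quoted in the note can be reproduced by solving $(\alpha + 2)/\alpha^3 = 1$ for $\alpha$. The following sketch uses plain bisection, which is one possible choice and not necessarily the authors'; the function names and the bracketing interval are hypothetical.

```python
def pareto_score_variance(alpha):
    """Score variance of the Pareto distribution, omega^2 = (alpha + 2) / alpha^3."""
    return (alpha + 2.0) / alpha ** 3


def alpha_for_score_variance(omega_sq, lo=1e-6, hi=1e6, tol=1e-12):
    """Solve pareto_score_variance(alpha) = omega_sq by bisection.

    The score variance is strictly decreasing in alpha > 0, so the root
    is unique inside the bracket [lo, hi].
    """
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if pareto_score_variance(mid) > omega_sq:
            lo = mid      # variance still too large: alpha must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = alpha_for_score_variance(1.0)
gamma = 1.0 / alpha      # extreme value index for omega = 1
```

The root is $\alpha \approx 1.5214$, which gives $1/\alpha \approx 0.657$ as stated.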
Random samples were generated from heavy-tailed distributions, with increasing level ω 1 of contamination and α = 1.
The variances of the SM estimates are somewhat higher, in accordance with the values of the theoretical asymptotic efficiencies, but the SM estimates are less biased and thus do not seem to be much worse than the ML estimates even for non-contaminated samples. However, when data contamination comes into consideration, i.e. $\varepsilon = 0.1$ of F (10), the estimation with t-scores gives better results than the standard ML method. In addition to a smaller bias for very small sample sizes ($n \le 25$), the variances are smaller for the SM method than for the ML method, especially for small numbers of observations. Recall that new simulations have been performed for each of the two experiments, so that a comparison of the goodness of the results between contaminated and non-contaminated data is not meaningful (Table 7).

t-Hill versus Hill for data from the log-gamma distribution
It is apparent that both t-Hill and Hill show a systematic decrease when a heavier Pareto tail is expected. Figure 5 shows paths of averages for $H_{k_n,n}$, $H^*_{k_n,n}$ and $H^L_{k_n,n}$ from 50 experiments with contaminated log-gamma data from (37) with $\omega = 5.546$ and $\varepsilon = 0.05$. The values of $\omega_1$ in the plots in Fig. 5 are computed from the log-gamma distribution. It is apparent from Fig. 5 that for small contamination the t-lgHill (red) estimates of $\gamma$ from log-gamma data perform better than both Hill (blue) and t-Hill (green). However, for larger contamination t-lgHill is definitely not robust, perhaps due to the squared logarithmic term. Notice that t-Hill is more robust than the classical Hill even for non-Pareto data.

Infinite moving average process
In this subsection we consider the infinite moving average process with the autoregressive form (38). It is discussed in Resnick and Starica (1993) with respect to the Hill estimator. Here we compare the Hill estimator with the corresponding t-Hill estimator. The results in Table 8 show that in this case the t-Hill estimators are better.

t-Hill versus Hill for contaminated Pareto noise
Contamination, governed by $\varepsilon$
Here we contaminated the distribution of the noise component, calculated the means of the corresponding Hill and t-Hill estimators for $k_n = 10, 11, \ldots, 9999$ and plotted them in Figs. 7 and 8 for relative parts of contamination $\varepsilon = 0.1$ and $\varepsilon = 0.05$. The number of observations in each sample is $n = 10000$. More precisely, the distribution of the noise component $Z_i$ in Eq. 38 is given by Eq. 35. The corresponding values of RRMSE and RB are given in Table 9. Although both estimators give almost the same results, in most of these cases the t-Hill estimator is slightly better than the Hill estimator.

Fig. 7 Rate of convergence of the t-Hill (first line) and Hill (second line) estimators with $\alpha_1 = 1$, $\alpha_2 = 0.5$ (left), $\alpha_1 = 1.7$, $\alpha_2 = 0.5$ (middle) and $\alpha_1 = 1.7$, $\alpha_2 = 1$ (right), $\varepsilon = 0.1$, $n = 10000$

Conclusions of the Simulations
Having in mind the observations in Section 4 we can draw the following conclusions:
1. When the contamination has a heavier tail than the original distribution, the t-Hill estimator outperforms the Hill estimator and several robust tail estimators (ISE, PDC, LS, ME, QQ and WMLE). In the contaminated i.i.d. case, where the severity of the contamination is governed by the extremal index, the Hill estimator has narrower confidence intervals than the t-Hill estimator. When $\alpha_1 < 1$ and $\alpha_2 \ge 1$ the Hill estimators are better than the t-Hill estimators. When $\alpha_1 < 1$ and $\alpha_2 < 1$, or $\alpha_1 \ge 1$ and $\alpha_2 > 1$, the t-Hill estimators are comparable to the corresponding Hill estimators. Finally, we can conclude that the t-Hill estimators are better than the Hill estimators for $\alpha_1 \ge 1$ and $\alpha_2 \le 1$. This effect increases for small samples.
2. Both the Hill and t-Hill estimators are asymptotically consistent; however, the t-Hill estimator has a bigger variance.
3. The reason that the Hill estimator is not robust seems to be the fact that the data enter its formula through their logarithm. In Eq. 18, by contrast, the data enter through their reciprocal values. Therefore, it is not difficult to deduce that the t-Hill estimators are robust with respect to large values. They are sensitive to the center of the distribution.
4. When contamination of i.i.d. data comes into consideration, i.e. $\varepsilon = 0.1$ of F (10), the estimation with t-scores gives better results than the standard ML method. In addition to a smaller bias for very small sample sizes ($n \le 25$), the variances are smaller for the SM method than for the ML method, especially for small samples.
5. In the moving average case without contamination, with Pareto noise and $\alpha_1 \le 1$, the t-Hill estimator is better with respect to unbiasedness when $n \to \infty$ and $k_n/n \to 0$, but it has a bigger variance than the Hill estimator.
6. In moving average cases with contaminated Pareto noise the t-Hill estimators are better than Hill only in the case $\alpha_1 \ge 1$ and $\alpha_2 \le 1$ and a small relative part of contamination. The smaller the relative part of the contamination, the faster the rate of convergence. Moreover, in all cases the estimators converge to $1/\min(\alpha_1, \alpha_2)$ for $n \to \infty$, $k_n \to \infty$ and $k_n/n \to 0$.
7. When the observed random variables have a log-gamma distribution, the t-lgHill estimator is more robust than the t-Hill and Hill estimators. This is due to the fact that the t-lgHill estimator is distribution sensitive, while the Hill and t-Hill estimators are, by construction, useful in this case only through the information about the regular variation of the distribution.
8. From all of the charts it is visible that the t-lgHill estimator has a relatively small variance. It seems to be almost the same as that of the Hill estimator.
9. In the moving average case with Pareto noise the t-Hill outperforms both the Hill and t-lgHill estimators. When $\alpha$ increases, the variance of all three estimators decreases.

Fig. 8 Rate of convergence of the t-Hill (first line) and Hill (second line) estimators with $\alpha_1 = 1$, $\alpha_2 = 0.5$ (left), $\alpha_1 = 1.7$, $\alpha_2 = 0.5$ (middle) and $\alpha_1 = 1.7$, $\alpha_2 = 1$ (right), $\varepsilon = 0.05$, $n = 10000$
The larger the number of samples $m$, e.g. $m = 1000$, the more clearly the conclusions about the Hill, t-lgHill and t-Hill estimators can be seen; however, the computational time becomes very large. We recall the general consensus that a multi-criteria approach is needed when estimating the parameters of a heavy tail. Consistency is an important property, but the approximation of the finite-sample distribution and robustness should also be considered.

Application to small sample data from Guanaco glacier
A glacier is a solid ice mass which is fed by solid water (snow, hail or hoarfrost), transforms this solid water into ice and restores it via steam (evaporation/sublimation) or via liquid (water drained by the runoff stream-flow), see Francou and Pouyaud (2004). Such gain and loss of mass can be analyzed as a balance (or budget). Guanaco glacier is located in the III region of Chile (latitude 29° S) in the semi-arid Andes. It has a surface area of 1.86 km² and a maximum thickness of 120 m, see Rabatel et al. (2011). The area contributing water to the Pacific Ocean (included in the calculation of 23 % cited below) is only 1.26 km², in other words, less than 35 % of the total ice area in this zone. The rest of the Guanaco glacier contributes water to the Atlantic Ocean. In this sense, the relative importance of glacier meltwater to the river basin as a complete unit is very low compared to the snow and rain contributions. Its geographical location is shown in Fig. 9. This glacier (jointly with other minor glaciers and glacierets in the zone) is highly relevant for studies because it is an important water reservoir in a semi-arid zone of Chile, contributing up to 23 % of the streamflow in the corresponding high-altitude valley, see Gascoin et al. (2011). All glaciers in the Pascua Lama region (3.64 km²) can contribute between 3 and 23 % to the very high basin area of the Huasco river. The glacier meltwater contribution goes from 100 % at the foot of the glacier down to near 0 at the sea outlet of this river (Huasco). Given that this zone is one of the world's most richly endowed territories of copper, there is strong interest in the use of these water resources, resulting in an uneven mining-agricultural competition for water rights, see Oyarzún and Oyarzún (2011).
The mass balance observations over the glacier consist of annual records for the period 2002-2014. The dataset corresponds to the negated mass balance, because we are interested in extremely large losses of volume in the glacier (see Fig. 10). In order to study a threshold model for these data, estimates of the tail index $\gamma$ are computed using the Hill and t-Hill estimators (as previously defined) and Zipf's estimator (see Kratz and Resnick (1996)), which is a form of least squares estimator (Schultze and Steinebach 1996) of the tail index. It rests on the fact that if we suspect that $X_{1,n} \le \dots \le X_{n,n}$ are the order statistics from a Pareto family, then the QQ plot of the log order statistics against the corresponding exponential quantiles should be roughly linear with intercept 0 and slope $\gamma$. Zipf's estimator $\hat\gamma$ is computed for a given $k_n$ as the least squares slope fitted to the points with ordinates $\log X_{n-k_n+i,n}$, $1 \le i \le k_n$.
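As a rough illustration of the least squares construction described above, the following sketch fits a line to the upper part of the QQ plot. The abscissae $-\log(1 - i/(k_n + 1))$ are an assumption (standard exponential plotting positions), as is the function name `zipf_estimator`; the exact formula used for the glacier data may differ.

```python
import math


def zipf_estimator(sample, k):
    """Least squares slope of log X_{n-k+i,n} against assumed exponential plotting positions."""
    xs = sorted(sample)
    n = len(xs)
    ys = [math.log(x) for x in xs[n - k:]]  # log X_{n-k+i,n}, i = 1..k
    # Assumed plotting positions: -log(1 - i/(k+1)), i = 1..k
    ts = [-math.log(1.0 - i / (k + 1.0)) for i in range(1, k + 1)]
    t_bar = sum(ts) / k
    y_bar = sum(ys) / k
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    return sxy / sxx  # slope of the fitted line; requires k >= 2


if __name__ == "__main__":
    gamma = 0.5  # hypothetical extreme value index
    n = 10       # hypothetical sample size
    # Idealized "sample": exact Pareto quantiles at the levels i/(n+1)
    ideal = [(1.0 - i / (n + 1.0)) ** (-gamma) for i in range(1, n + 1)]
    print(zipf_estimator(ideal, 7))
```

On exact Pareto quantiles the points lie on a straight line, so the fitted slope equals the chosen `gamma` up to floating-point rounding. On real data the slope varies with `k`, which is why the stability of the plot over `k` matters.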
We used $\hat\sigma = X_{n-k_n+1,n}\,\hat\gamma$ as the estimator of the scale parameter. The plot of Zipf's estimator (see Fig. 11) shows linear stability up to $k_n = 7$, which makes this value of $k_n$ a good option for further computations. The Hill and t-Hill estimators were computed along with Zipf's estimator for $k_n = 6$ and $k_n = 7$. The stability of the parameter estimation in a small sample context was tested using a parametric bootstrap approach, in which 1000 samples of size 10 were generated from the GP distribution with the shape parameter computed by each of the three estimators discussed above. The results are displayed as histograms in Fig. 12. Table 10 summarizes the information about the EVI obtained from the simulation. Given the results for each estimator, it is clear that $k_n = 7$ is a better choice than $k_n = 6$, because the estimates are less biased and all estimators are closer to each other. Zipf's estimator should be treated with more care in this context, because it shows a substantial bias in the parametric bootstrap without an improvement in the variance. The main advantages of the t-Hill estimator are its robustness and good small sample properties.
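The parametric bootstrap described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind Fig. 12 or Table 10: the shape and scale values are hypothetical placeholders, only the Hill estimator is shown, and the GP samples are drawn by inverse-transform sampling from the standard generalized Pareto quantile function.

```python
import math
import random
import statistics


def hill(sample, k):
    """Hill estimator of the extreme value index from the k largest observations."""
    xs = sorted(sample)
    n = len(xs)
    threshold = xs[n - k - 1]  # X_{n-k,n}
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


def gpd_quantile(u, shape, scale):
    """Quantile function of the generalized Pareto distribution (shape != 0)."""
    return scale / shape * ((1.0 - u) ** (-shape) - 1.0)


def bootstrap_evi(estimator, shape, scale, k, n=10, replicates=1000, seed=0):
    """Re-estimate the EVI on `replicates` GP samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        sample = [gpd_quantile(rng.random(), shape, scale) for _ in range(n)]
        estimates.append(estimator(sample, k))
    return estimates


if __name__ == "__main__":
    # Hypothetical fitted values; the estimates for the glacier data are not used here.
    shape_hat, scale_hat = 0.5, 1.0
    for k in (6, 7):
        est = bootstrap_evi(hill, shape_hat, scale_hat, k)
        print(k, round(statistics.mean(est), 3), round(statistics.stdev(est), 3))
```

Comparing the mean of the bootstrap estimates with the shape value used for simulation gives a rough picture of the bias for each choice of `k`, and the standard deviation indicates the variability.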
In Fig. 13 we plot the Hill, t-Hill, Zipf and t-lgHill estimators for a data sample of 40+ observations from the Pascua Lama zone.

Summary
In this paper we introduce and study the properties of t-score estimators under different model settings. In particular, we derived the asymptotic normality of the t-lgHill estimator in the i.i.d. case. We also derived consistency of the t-Hill and t-lgHill estimators under i.i.d. and moving average setups. We can conclude that:
i) The t-score is an unbounded function for light-tailed distributions and a bounded function for heavy-tailed distributions. Therefore, it seems to be extraordinarily suitable for the estimation of parameters by the generalized moment method. It secures robustness 'when it is needed by the assumed distribution'. In this sense, the suggested estimates are 'naturally' robust without the need for any further function of robust statistics. In particular, the robustness of t-Hill can be well applied in cases when the existence of the first or second moment is questionable. Such an application to LATAM returns, i.e. daily log-returns (in percent) of an equity fund investing in Latin America (LATAM), can be found in Stehlík and Hermann (2015).
ii) In cases of very small contamination, the maximum likelihood and t-moment methods give similar results. However, the larger the contamination, the better the t-moment estimates.
iii) Introducing the t-mean and t-variance for distributions without ordinary mean and variance makes it possible to describe the behavior of the estimates in a unique way and thus to compare the results of the estimation for various assumed families of distributions having different parameters.
iv) We propose the t-Hill estimator for possibly dependent data and investigate its weak consistency and asymptotic behavior. In particular, we consider the infinite moving average model. Besides the i.i.d. case, there are cases where the t-Hill estimator is more robust than the Hill estimator. We applied t-Hill to real glacier data and obtained satisfactory results despite the small sample size.
From the definition of $b$, $n/k_n \to \infty$ implies $b(n/k_n) \to \infty$ and $\frac{n}{k_n}\bar F(b(n/k_n)) \to 1$ as $n \to \infty$. Thus, in view of (30) and Potter's inequality for regularly varying functions, for any $0 < \delta_1 < \alpha$ there exist $x_0$ and $n_{\delta_1}$ large enough such that, for all $x > \max(1, x_0)$ and $n > n_{\delta_1}$, and for all $t > 1$, We apply Fubini's theorem and obtain that for all $t > 1$ and $n > n_{\delta}$, As a consequence,
$$\lim_{t\to\infty}\lim_{n\to\infty} P\left(\int_t^\infty \mu_{X,k_n,n}\left(\frac{X_{(n-k_n,n)}}{b(n/k_n)}\,x, \infty\right)|\varphi(x)|\,dx > \varepsilon,\ \left|\frac{X_{(n-k_n,n)}}{b(n/k_n)} - 1\right| < \delta\right) = 0$$
and (39) is satisfied.

Proof of Proposition 1
Proof The random variables $X_n$, $-\infty < n < \infty$, are identically distributed. Cline (1983) proves that under these settings We use (26), apply the above Lemma 1 and complete the proof.

Proof of Proposition 2
Proof i.) The random variables $X_n$, $-\infty < n < \infty$, are identically distributed. Cline (1983) proves that under these settings We use (32), apply Lemma 1 and complete the proof. ii.) We use Slutsky's arguments and obtain weak consistency of the t-lgHill estimator in the case of an infinite moving average (MA) sequence.
limit distribution of the first term is established in (3.5.24), page 109 of de Haan and Ferreira (2006): The limit distribution of the second term has already been established to be 1 2 Q − 2P and therefore ξ n converges in distribution to γ P . Plugging ξ n and ξ n in (42)