Q–Q plot

If the two distributions being compared are similar, the points in the Q–Q plot will approximately lie on the identity line y = x. If the distributions are linearly related, the points in the Q–Q plot will approximately lie on a line, but not necessarily on the line y = x. Q–Q plots can also be used as a graphical means of estimating parameters in a location-scale family of distributions.[2][3] This can provide an assessment of goodness of fit that is graphical, rather than reducing to a numerical summary statistic.

A more complicated construction is the case where two data sets of different sizes are being compared.[6] Many other choices have been suggested, both formal and heuristic, based on theory or simulations relevant in context.

More generally, the Shapiro–Wilk test uses the expected values of the order statistics of the given distribution; the resulting plot and line yield the generalized least squares estimate for location and scale (from the intercept and slope of the fitted line). However, this requires calculating the expected values of the order statistics, which may be difficult if the distribution is not normal.

Alternatively, one may use estimates of the medians of the order statistics, computed from estimates of the medians of the order statistics of a uniform distribution and the quantile function of the distribution; this was suggested by Filliben (1975). These can be expressed in terms of the quantile function and the order statistic medians for the continuous uniform distribution as G(U(i)), where U(i) are the uniform order statistic medians and G is the quantile function for the desired distribution.

The R programming language comes with functions to make Q–Q plots, namely qqnorm and qqplot from the stats package. The fastqq package implements faster plotting for large numbers of data points.
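The order statistic medians G(U(i)) described above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the constants 0.3175 and 0.365 are those commonly quoted for Filliben's approximation to the uniform order statistic medians, and G is taken to be the standard normal quantile function. The fitted line uses ordinary least squares, a simplification that gives only a rough graphical estimate of location and scale rather than the generalized least squares estimate.

```python
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Approximate medians U(i) of the uniform order statistics, i = 1..n.

    Assumption: Filliben's approximation, with the end points
    U(n) = 0.5 ** (1 / n) and U(1) = 1 - U(n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[-1] = 0.5 ** (1.0 / n)
    u[0] = 1.0 - u[-1]
    return u


def normal_qq_points(sample):
    """Pair each sorted observation with G(U(i)) for a standard normal G."""
    g = NormalDist().inv_cdf  # quantile function G of N(0, 1)
    theoretical = [g(u) for u in uniform_order_statistic_medians(len(sample))]
    return list(zip(theoretical, sorted(sample)))


def fit_line(points):
    """Ordinary least squares line through (x, y) points.

    Returns (intercept, slope); on a normal Q-Q plot these act as rough
    estimates of location and scale.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

Calling fit_line(normal_qq_points(data)) on a roughly normal sample should return an intercept near the sample mean and a slope near the sample standard deviation. For another location-scale family, G would be replaced by that family's standard quantile function.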
A normal Q–Q plot of randomly generated, independent standard exponential data (X ~ Exp(1)). This Q–Q plot compares a sample of data on the vertical axis to a statistical population on the horizontal axis. The points follow a strongly nonlinear pattern, suggesting that the data are not distributed as a standard normal (X ~ N(0,1)). The offset between the line and the points suggests that the mean of the data is not 0. The median of the points can be determined to be near 0.7.
A normal Q–Q plot comparing randomly generated, independent standard normal data on the vertical axis to a standard normal population on the horizontal axis. The linearity of the points suggests that the data are normally distributed.
A Q–Q plot of a sample of data versus a Weibull distribution. The deciles of the distributions are shown in red. Three outliers are evident at the high end of the range. Otherwise, the data fit the Weibull(1,2) model well.
A Q–Q plot comparing the distributions of standardized daily maximum temperatures at 25 stations in the US state of Ohio in March and in July. The curved pattern suggests that the central quantiles are more closely spaced in July than in March, and that the July distribution is skewed to the left compared to the March distribution. The data cover the period 1893–2001.
Q–Q plot for first opening/final closing dates of Washington State Route 20, versus a normal distribution.[5] Outliers are visible in the upper right corner.
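For a two-sample comparison such as the March and July temperature plot, the two data sets need not have the same size. One way to build the plot, sketched here in Python under stated assumptions, is to take plotting positions from the smaller sample and linearly interpolate the quantiles of both samples at those positions. The function names and the (i - 0.5)/m plotting positions are illustrative choices, not a prescribed method; other plotting positions and interpolation rules are equally defensible.

```python
import math


def interpolated_quantile(sorted_data, p):
    """Linearly interpolated quantile of already-sorted data at probability p.

    Assumption: the k-th smallest of n values sits at probability
    (k - 0.5) / n; values of p outside that range are clamped to the extremes.
    """
    n = len(sorted_data)
    h = min(max(n * p - 0.5, 0.0), n - 1.0)  # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    frac = h - lo
    return sorted_data[lo] + frac * (sorted_data[hi] - sorted_data[lo])


def two_sample_qq_points(a, b):
    """Return (quantile of a, quantile of b) pairs for a two-sample Q-Q plot.

    The number of points equals the size of the smaller sample.
    """
    xs, ys = sorted(a), sorted(b)
    m = min(len(xs), len(ys))
    if m == 0:
        raise ValueError("both samples must be non-empty")
    probs = [(i - 0.5) / m for i in range(1, m + 1)]
    return [
        (interpolated_quantile(xs, p), interpolated_quantile(ys, p))
        for p in probs
    ]
```

With samples [1, 2, 3, 4] and [10, 20], the positions are 0.25 and 0.75, so the larger sample is interpolated to 1.5 and 3.5 while the smaller sample contributes its own values 10 and 20. If both samples come from similar distributions, the resulting points should fall close to the line y = x.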