data_analysis

SOFTWARE

Table of Contents

OVERVIEW

CENTRAL TENDENCY

VARIATION

QUANTILES

CURVE FITTING

OVERVIEW

Data Analysis Type

UNIVARIATE DATA

BIVARIATE

MULTIVARIATE

FIVE-NUMBER SUMMARY

CENTRAL TENDENCY

Mean

df.column_name.mean()

np.mean(dataset)

np.average(dataset)

Median

df.column_name.median()

np.median(dataset)

Mode

stats.mode(array_nums)
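
The calls above can be tried together on a small array. This is a minimal sketch: the sample values and the weights are made up for illustration, and the imports (`numpy` as `np`, `scipy.stats` as `stats`) are assumed from the names the cheatsheet uses.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, chosen only for illustration.
dataset = np.array([2, 3, 3, 5, 7, 10])

mean_value = np.mean(dataset)      # arithmetic mean
median_value = np.median(dataset)  # middle of the sorted values

# np.average matches np.mean unless weights are supplied.
weights = np.array([1, 1, 1, 1, 1, 5])  # hypothetical weights
weighted_mean = np.average(dataset, weights=weights)

# stats.mode returns a result object. Depending on the SciPy version its
# .mode field is a scalar or a length-1 array, so normalise it first.
mode_value = np.atleast_1d(stats.mode(dataset).mode)[0]

print(mean_value, median_value, weighted_mean, mode_value)
```

With a pandas DataFrame, `df.column_name.mean()` and `df.column_name.median()` give the same statistics for a single column.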

VARIATION

Variance

np.var(dataset)

Standard Deviation

np.std(dataset)

Ranges

Max, Min

df.column_name.min()
df.column_name.max()

np.amin(dataset)
np.amax(dataset)

Interquartile Range

stats.iqr(dataset)
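
A short sketch of the variation measures above on one array. The data are hypothetical. Note that `np.var` and `np.std` default to the population form (`ddof=0`); passing `ddof=1` gives the sample form.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, chosen only for illustration.
dataset = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_variance = np.var(dataset)      # divides by n (ddof=0)
sample_variance = np.var(dataset, ddof=1)  # divides by n - 1
population_std = np.std(dataset)           # square root of np.var(dataset)

# Range as max minus min.
data_range = np.max(dataset) - np.min(dataset)

# The interquartile range is Q3 - Q1; stats.iqr computes the same quantity.
q1, q3 = np.percentile(dataset, [25, 75])
iqr_value = stats.iqr(dataset)

print(population_variance, sample_variance, population_std, data_range, iqr_value)
```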

Skewness

Skew         Peak    Tails     Median vs. mean
Symmetric    one     similar   median ≈ mean
Skew-right   left    right     median < mean
Skew-left    right   left      median > mean
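
The median-versus-mean pattern in the table can be checked numerically. This sketch uses `scipy.stats.skew`, which the cheatsheet does not list, so treat it as one possible way to measure skewness; the data are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical data with a long right tail.
right_skewed = np.array([1, 2, 2, 3, 3, 3, 4, 10, 20])
# Negating the values mirrors the distribution, giving a long left tail.
left_skewed = -right_skewed

skew_right = stats.skew(right_skewed)  # positive for a right tail
skew_left = stats.skew(left_skewed)    # negative for a left tail

# Skew-right: median < mean. Skew-left: median > mean.
print(skew_right, np.median(right_skewed), np.mean(right_skewed))
print(skew_left, np.median(left_skewed), np.mean(left_skewed))
```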

Kurtosis

Modality

Outliers

Plots

HISTOGRAM

np.histogram(array_name, range=(min_value, max_value), bins=num_bins)

plt.hist(dataset, range=(min_value, max_value), bins=num_bins, edgecolor='black')

Describe Histogram

  1. Center (mean or median)
  2. Range (max - min)
  3. Shape (skewness)
  4. Modality (peaks)
  5. Outliers
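
The five descriptors above can be approximated from the numbers alone. This is a sketch on hypothetical data; the 1.5 × IQR fence used to flag outliers is a common convention and an assumption here, since the cheatsheet does not name a rule.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
dataset = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 12])

# np.histogram returns the count per bin and the bin edges.
counts, bin_edges = np.histogram(dataset, range=(0, 12), bins=6)

center = np.median(dataset)                     # 1. center
data_range = np.max(dataset) - np.min(dataset)  # 2. range
mean_minus_median = np.mean(dataset) - center   # 3. shape: > 0 hints at right skew
modal_bin = int(np.argmax(counts))              # 4. index of the tallest bin

# 5. outliers: points beyond 1.5 * IQR from the quartiles (assumed rule).
q1, q3 = np.percentile(dataset, [25, 75])
iqr_value = q3 - q1
outliers = dataset[(dataset < q1 - 1.5 * iqr_value) | (dataset > q3 + 1.5 * iqr_value)]

print(counts, bin_edges)
print(center, data_range, mean_minus_median, modal_bin, outliers)
```

The same `range` and `bins` arguments can be passed to `plt.hist` to draw the corresponding plot.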

Boxplot

Violin Plot

QUANTILES

np.quantile(dataset, [list of quantiles])

Quantiles             Split        Arg
2-quantile (median)   2 groups     0.5
quartiles             4 groups     [0.25, 0.5, 0.75]
quintiles             5 groups     [0.2, 0.4, 0.6, 0.8]
deciles               10 groups    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
percentiles           100 groups   [0.01, 0.02, …]

Percentiles

np.percentile(patrons, 30)

Quartiles

np.quantile(dataset, q)      # q is a fraction between 0 and 1, e.g. 0.25
OR
np.percentile(dataset, p)    # p is a percentage between 0 and 100, e.g. 25
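
A small sketch of how the two calls relate: `np.quantile` takes fractions between 0 and 1, while `np.percentile` takes the same cut points scaled to 0 to 100. The array is hypothetical, chosen so the results are easy to read.

```python
import numpy as np

# Hypothetical data: the integers 0..100, so quantile q lands on 100 * q.
dataset = np.arange(0, 101)

# Quartiles via fractions.
quartiles = np.quantile(dataset, [0.25, 0.5, 0.75])

# The same cut points via percentages.
quartiles_from_percentile = np.percentile(dataset, [25, 50, 75])

# A single percentile and its quantile equivalent.
p30 = np.percentile(dataset, 30)
q30 = np.quantile(dataset, 0.30)

print(quartiles, quartiles_from_percentile, p30, q30)
```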

PLOTS

Q-Q

Mean-difference

CURVE FITTING

Straight Line

Correlation Coefficient (R)

Coefficient of Determination

Pearson's

Spearman's

Least squares

Residual Inspection
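
The cheatsheet lists these topics without code, so the following is only one possible sketch. It assumes `np.polyfit` for the least-squares line, `np.corrcoef` for Pearson's R, and `scipy.stats.spearmanr` for Spearman's rank correlation; the data points are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares straight line: y ≈ slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals are observed minus fitted values. With an intercept in the
# model they sum to roughly zero; a visible pattern suggests a poor fit.
predicted = slope * x + intercept
residuals = y - predicted

# Pearson's correlation coefficient R, and R squared as the
# coefficient of determination for a straight-line fit.
pearson_r = np.corrcoef(x, y)[0, 1]
r_squared = pearson_r ** 2

# Spearman's rank correlation measures monotonic association.
spearman_rho, _ = stats.spearmanr(x, y)

print(slope, intercept, pearson_r, r_squared, spearman_rho)
print(residuals)
```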