pandas.Series.describe — pandas 2.2.2 documentation (2024)

Series.describe(percentiles=None, include=None, exclude=None)

Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Parameters:

percentiles : list-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
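As a minimal sketch of the percentiles parameter, the example below requests the 10th and 90th percentiles on a small Series. The data and variable names are illustrative assumptions, not part of the original page.

```python
import pandas as pd

# Illustrative data; any numeric Series would do.
s = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th and 90th percentiles instead of the default quartiles.
# Each value must lie between 0 and 1.
summary = s.describe(percentiles=[0.1, 0.9])

# Percentile rows are labelled with a percent sign, e.g. "10%" and "90%".
print(summary[["10%", "90%"]])
```

The percentile values are looked up in the result by their percent labels, just like the default "25%", "50%" and "75%" rows.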

include : ‘all’, list-like of dtypes or None (default), optional

A white list of data types to include in the result. Ignored for Series. Here are the options:

‘all’ : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the object data type (object or numpy.object_). Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.
None (default) : The result will include all numeric columns.

exclude : list-like of dtypes or None (default), optional

A black list of data types to omit from the result. Ignored for Series. Here are the options:

A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the object data type (object or numpy.object_). Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.
None (default) : The result will exclude nothing.
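The include and exclude options accept either dtype objects or select_dtypes-style strings. The sketch below compares the two spellings on a small DataFrame; the column names are made up for illustration, and the text column is given an explicit object dtype so the example does not depend on how strings are inferred.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: one numeric column, one object column.
df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "text": pd.Series(["a", "b", "c"], dtype=object),
})

# Selecting object columns by dtype object or by the string 'O'.
by_type = df.describe(include=[object])
by_string = df.describe(include=["O"])

# Keeping only numeric columns, either by inclusion or by exclusion.
numeric_included = df.describe(include=[np.number])
object_excluded = df.describe(exclude=[object])

print(by_type.columns.tolist())
print(numeric_included.columns.tolist())
```

With only these two columns, excluding object columns and including numeric columns select the same data, so both calls describe the same column.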

Returns:

Series or DataFrame: Summary statistics of the Series or DataFrame provided.

See also

DataFrame.min

Minimum of the values in the object.

DataFrame.mean

Mean of the values.

DataFrame.std

Standard deviation of the observations.

DataFrame.select_dtypes

Subset of a DataFrame including/excluding columns based on their dtype.

Notes

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
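A short sketch of the numeric case, using made-up data, shows the index labels and checks that the 50% row matches Series.median.

```python
import pandas as pd

# Illustrative numeric data.
s = pd.Series([4, 8, 15, 16, 23, 42])
summary = s.describe()

# Index labels for a numeric Series with the default percentiles.
print(summary.index.tolist())

# The 50% row is the median of the data.
print(summary["50%"] == s.median())
```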

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.
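To illustrate the object case, the sketch below describes a small object Series with one missing value and a single clear winner, so the result does not depend on how ties are broken. The data is an assumption chosen for illustration.

```python
import pandas as pd

# Illustrative object data: 'a' appears twice, 'b' once, plus one missing value.
s = pd.Series(["a", "a", "b", None], dtype=object)
summary = s.describe()

# count excludes the missing value; unique counts distinct non-missing values;
# top is the most common value and freq is how often it occurs.
print(summary)
```

Because 'a' is the only value with the highest count, top and freq are unambiguous here.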
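The defaults for mixed data can be sketched as follows. The frame and its column names are hypothetical, and the label column is given an explicit object dtype.

```python
import pandas as pd

# Hypothetical mixed frame: one numeric column and one object column.
df_mixed = pd.DataFrame({
    "numeric": [1, 2, 3],
    "label": pd.Series(["a", "b", "c"], dtype=object),
})

# Default: only the numeric column is analyzed.
default_cols = df_mixed.describe().columns.tolist()

# No numeric columns at all: the object column is analyzed instead.
object_only_index = df_mixed[["label"]].describe().index.tolist()

# include='all': the index is the union of the object and numeric attributes.
union_index = df_mixed.describe(include="all").index.tolist()

print(default_cols)
print(object_only_index)
print(union_index)
```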
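A quick check of the last point, as a sketch with illustrative data: passing include to Series.describe leaves the result unchanged.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

plain = s.describe()
# include is accepted but has no effect on a Series.
with_include = s.describe(include=[object])

print(plain.equals(with_include))
```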

Examples

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d', 'e', 'f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                    })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
