How to Use Pandas DataFrame groupby() Method in Python

09/13/2021

In this article, you will learn how to use the pandas DataFrame groupby() method in Python.

Pandas DataFrame groupby() Method

The pandas DataFrame groupby() method splits a DataFrame into groups based on the values of one or more columns. You can then apply an operation, such as an aggregation, to each group separately, which makes it easier to summarize and compare subsets of your data.
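
As a minimal sketch of this idea, the snippet below groups a tiny DataFrame by a single column and sums another column within each group. The column names `team` and `score` and their values are made up for illustration and are not part of the example used in the rest of this article.

```python
import pandas as pd

# Hypothetical data, used only to illustrate grouping.
df = pd.DataFrame({"team": ["a", "b", "a", "b"],
                   "score": [10, 20, 30, 40]})

# Iterating over a groupby object yields (key, sub-DataFrame) pairs.
for name, group in df.groupby("team"):
    print(name, group["score"].tolist())

# Aggregating combines each group's rows into a single value.
totals = df.groupby("team")["score"].sum()
print(totals)
```

Here `totals` is a Series indexed by `team`, with one summed score per group.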

Here are the basic steps to use the groupby() method:

  1. Import the pandas library:

    import pandas as pd
  2. Create a DataFrame with your data:

    data = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                         'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                         'C': [1, 2, 3, 4, 5, 6, 7, 8],
                         'D': [9, 10, 11, 12, 13, 14, 15, 16]})
    
  3. Call the groupby() method on your DataFrame and specify the variable(s) you want to group by. In this example, we group by columns ‘A’ and ‘B’:

    grouped_data = data.groupby(['A', 'B'])
  4. Use one of the available aggregation functions (such as sum(), mean(), or count()) to compute a summary statistic for each group. Here, we compute the mean of column ‘C’ for each group:

    grouped_data['C'].mean()

    The output is a pandas Series whose MultiIndex holds the (‘A’, ‘B’) group keys and whose values are the mean of ‘C’ for each group.

    You can also use the agg() method to apply a different aggregation function to each column. Here’s an example:

    grouped_data.agg({'C': 'mean', 'D': 'sum'})

    This will compute the mean of ‘C’ and sum of ‘D’ for each group, and return a DataFrame with a MultiIndex and the computed summary statistics for each group.
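
The steps above can be put together into one script, sketched below with the same `data` DataFrame. The names `mean_c`, `summary`, and `flat` are introduced here for illustration only, and the final `reset_index()` call is an optional extra, not one of the steps above.

```python
import pandas as pd

data = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                     'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                     'C': [1, 2, 3, 4, 5, 6, 7, 8],
                     'D': [9, 10, 11, 12, 13, 14, 15, 16]})

# Group rows that share the same ('A', 'B') pair.
grouped_data = data.groupby(['A', 'B'])

# Mean of 'C' per group: a Series with a MultiIndex of (A, B) keys.
mean_c = grouped_data['C'].mean()
print(mean_c)

# Mean of 'C' and sum of 'D' per group: a DataFrame with the same MultiIndex.
summary = grouped_data.agg({'C': 'mean', 'D': 'sum'})
print(summary)

# Optional: turn the group keys back into ordinary columns.
flat = summary.reset_index()
print(flat)
```

With this data there are six groups. For example, the group (‘foo’, ‘one’) contains the rows where ‘C’ is 1 and 7, so its mean of ‘C’ is 4.0 and its sum of ‘D’ is 9 + 15 = 24.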

Overall, the groupby() method gives you a concise way to split a pandas DataFrame into groups and summarize each one, whether you need a single statistic or a different aggregation for each column.