Introduction to pandas-profiling
pandas-profiling is a library that builds on pandas to generate a profile report from a DataFrame in a single call. The report includes per-column statistics, missing-value summaries, correlations, and visualizations that support exploratory data analysis. This guide walks through several API examples and a small Flask app. Note that recent releases of the project are published under the name ydata-profiling (imported as ydata_profiling); the examples here use the original pandas_profiling import.
Creating a Basic Profile Report
The basic functionality of pandas-profiling is to create a simple profile report from a DataFrame.
import pandas as pd
from pandas_profiling import ProfileReport
# Load a dataset
df = pd.read_csv('data/your_data.csv')
# Create a profile report
profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)
# Save the report to an HTML file
profile.to_file('output.html')
Customizing the Report
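You can customize the generated report through keyword arguments to the ProfileReport constructor. The example below disables the missing-value heatmap, keeps the dendrogram, and computes Pearson and Spearman correlations while skipping Kendall.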
profile = ProfileReport(
    df,
    title="Custom Pandas Profiling Report",
    explorative=True,
    missing_diagrams={
        "heatmap": False,
        "dendrogram": True,
    },
    correlations={
        "pearson": {"calculate": True},
        "spearman": {"calculate": True},
        "kendall": {"calculate": False},
    },
)
profile.to_file('custom_output.html')
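If the same settings are reused across several reports, one option is to build the keyword arguments in a small helper. The sketch below is an illustration only: `build_report_options` is a hypothetical name, not part of pandas-profiling, and it produces only the option keys shown in the example above.

```python
ALL_CORRELATIONS = ("pearson", "spearman", "kendall")
ALL_MISSING_DIAGRAMS = ("heatmap", "dendrogram")


def build_report_options(title,
                         correlations=("pearson", "spearman"),
                         missing_diagrams=("dendrogram",),
                         explorative=True):
    """Hypothetical helper: return keyword arguments for ProfileReport."""
    # Reject names that the example above does not cover.
    unknown = (set(correlations) - set(ALL_CORRELATIONS)) | (
        set(missing_diagrams) - set(ALL_MISSING_DIAGRAMS)
    )
    if unknown:
        raise ValueError(f"Unknown option names: {sorted(unknown)}")
    return {
        "title": title,
        "explorative": explorative,
        # Each diagram is switched on only if it was requested.
        "missing_diagrams": {
            name: name in missing_diagrams for name in ALL_MISSING_DIAGRAMS
        },
        # Each correlation gets an explicit {"calculate": bool} entry.
        "correlations": {
            name: {"calculate": name in correlations}
            for name in ALL_CORRELATIONS
        },
    }


options = build_report_options("Custom Pandas Profiling Report")
# With the library installed, the dictionary can be unpacked directly:
# profile = ProfileReport(df, **options)
```

Keeping the options in a plain dictionary also makes them easy to log or compare between runs.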
Interacting with Report in Jupyter Notebooks
If you’re using Jupyter Notebooks, you can directly display the generated report within the notebook environment.
profile.to_notebook_iframe()
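When the same script may run both inside and outside a notebook, it can help to choose the output method at runtime. The following is a rough sketch under stated assumptions: `in_jupyter_notebook` and `show_or_save` are hypothetical helper names, and the shell-class check is a common heuristic that may not recognize every notebook frontend.

```python
def in_jupyter_notebook():
    """Heuristic check for a Jupyter kernel; returns False if IPython is absent."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    # Jupyter kernels normally use a shell class named ZMQInteractiveShell.
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


def show_or_save(profile, path="output.html"):
    """Display the report inline in a notebook, otherwise write it to disk."""
    if in_jupyter_notebook():
        profile.to_notebook_iframe()
        return "notebook"
    profile.to_file(path)
    return "file"
```

Calling `show_or_save(profile)` then behaves like `profile.to_notebook_iframe()` in a notebook and like `profile.to_file('output.html')` everywhere else.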
Advanced Profiling Options
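For more granular control, you can import lower-level functions from the package's internal modules. These are not part of the stable public API, and their locations and signatures have changed between releases, so check the imports below against the version you have installed.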
from pandas_profiling.model.describe import describe_1d, describe_numeric_1d, describe_categorical_1d
from pandas_profiling.model.summary import get_counts, get_missing
# Example: custom function built on the lower-level helpers
def custom_profile(df):
    # 'column_name' is a placeholder for a numeric column in your DataFrame
    description = describe_numeric_1d(df['column_name'])
    missing_data = get_missing(description)
    return description, missing_data
custom_profile(df)
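If those internal imports are not available in your installed version, a similar per-column summary can be approximated with plain pandas. This sketch does not use pandas-profiling and is not how the library computes its statistics; `summarize_numeric` is a hypothetical helper that only illustrates the kind of values such a description contains.

```python
import pandas as pd


def summarize_numeric(series):
    """Return basic statistics and missing-value counts for a numeric Series."""
    non_missing = series.dropna()
    n_missing = int(series.isna().sum())
    total = len(series)
    return {
        "count": int(non_missing.shape[0]),
        "n_missing": n_missing,
        # Fraction of missing values; 0.0 for an empty Series.
        "p_missing": n_missing / total if total else 0.0,
        "mean": float(non_missing.mean()),
        "min": float(non_missing.min()),
        "max": float(non_missing.max()),
    }


example = pd.Series([1.0, 2.0, None, 4.0], name="column_name")
summary = summarize_numeric(example)
```

For the example Series, one of the four values is missing, so `summary["p_missing"]` is 0.25.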
Building an App Example with pandas-profiling
Here is a simple example of integrating pandas-profiling into a web application using Flask.
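from flask import Flask, request
import pandas as pd
from pandas_profiling import ProfileReport
app = Flask(__name__)
@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # The key "file" must match the name of the file input in the form below
        file = request.files["file"]
        df = pd.read_csv(file)
        profile = ProfileReport(df)
        # Return the report HTML directly. Writing it into templates/ and calling
        # render_template would pass the generated page through Jinja and would
        # share a single file between concurrent requests.
        return profile.to_html()
    return '''
    <!doctype html>
    <title>Upload CSV File</title>
    <h1>Upload CSV File for Profiling</h1>
    <form method="post" enctype="multipart/form-data">
      <input type="file" name="file">
      <input type="submit" value="Upload">
    </form>
    '''
if __name__ == "__main__":
    # debug=True is for local development only
    app.run(debug=True)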
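The upload handler above accepts any file. A small check on the file name before parsing is one way to reject obvious non-CSV uploads early. This is a sketch only: `is_allowed_upload` and `ALLOWED_EXTENSIONS` are hypothetical names, and an extension check does not validate the file contents.

```python
import os

# Extensions the app is willing to parse (assumption: CSV only).
ALLOWED_EXTENSIONS = {".csv"}


def is_allowed_upload(filename):
    """Return True if the file name ends in an allowed extension."""
    if not filename:
        return False
    _, extension = os.path.splitext(filename)
    return extension.lower() in ALLOWED_EXTENSIONS
```

Inside `index()`, the check could run on `file.filename` before `pd.read_csv(file)`, returning an error response such as HTTP 400 when it fails.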
Conclusion
pandas-profiling is a useful library for exploratory data analysis. With a few lines of code it generates reports that help you understand a dataset and diagnose problems such as missing values or highly correlated columns. Use the examples above as a starting point for integrating pandas-profiling into your own data projects.