Introduction to pandas-profiling
pandas-profiling is a library that generates a profile report for a pandas DataFrame from a single call. The report combines per-column summary statistics, missing-value overviews, correlations, and visualizations, which makes it a quick first step in exploratory data analysis.
Basic Usage
Generating a simple report is straightforward with pandas-profiling:
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Load dataset
    df = pd.read_csv("path/to/dataset.csv")

    # Generate report
    profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)

    # Save report to an HTML file
    profile.to_file("output_report.html")
Customizing the Report
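Keyword arguments passed to ProfileReport control which sections of the report are computed, such as correlations, missing-value diagrams, and interactions:
    profile = ProfileReport(
        df,
        title="Pandas Profiling Report",
        explorative=True,
        # Correlation settings are nested: one dict per correlation method
        correlations={"pearson": {"calculate": True}},
        # Missing-value diagrams and interactions take plain booleans
        missing_diagrams={"bar": False},
        interactions={"continuous": True},
    )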
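When the same settings are reused across several datasets, it can help to keep them in one place and unpack them into the constructor. The sketch below is one possible arrangement, not part of the library: report_options and build_report are hypothetical helper names, and the "fast" switch assumes the minimal=True option of ProfileReport, which disables the more expensive computations.

```python
from typing import Any, Dict


def report_options(title: str, fast: bool = False) -> Dict[str, Any]:
    """Return keyword arguments for ProfileReport (hypothetical helper)."""
    options: Dict[str, Any] = {"title": title}
    if fast:
        # Assumption: minimal=True turns off the most expensive computations
        options["minimal"] = True
    else:
        options["explorative"] = True
        options["correlations"] = {"pearson": {"calculate": True}}
    return options


def build_report(df, title: str, fast: bool = False):
    """Create a ProfileReport from the shared options (hypothetical helper)."""
    # Imported lazily so report_options can be used without the library installed
    from pandas_profiling import ProfileReport

    return ProfileReport(df, **report_options(title, fast))
```

Calling build_report(df, "Sales", fast=True) would then produce a reduced report, which can be useful for a first pass over a large dataset before running the full configuration.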
API Examples
Here are some of the useful APIs and methods provided by pandas-profiling:
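    import json

    # Export the report as a JSON string
    json_data = profile.to_json()

    # Parse the JSON string to get the report as a dictionary
    dict_data = json.loads(json_data)

    # Save the report object to disk, then load it back.
    # load() reads files written by dump(), not HTML reports.
    profile.dump("output_report.pp")
    restored = ProfileReport().load("output_report.pp")

    # Compare two reports (df2 is a second DataFrame with the same columns as df)
    profile2 = ProfileReport(df2, title="Comparison Report")
    comparison = profile.compare(profile2)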
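Once the report is available as JSON, the standard json module is enough to explore it. The following sketch lists the top-level sections of an export without assuming any particular layout; top_level_sections is a hypothetical helper, and the sample payload is a stand-in for illustration only, not real library output.

```python
import json
from typing import Any, Dict, List


def top_level_sections(report_json: str) -> List[str]:
    """Return the sorted top-level keys of a report's JSON export."""
    data: Dict[str, Any] = json.loads(report_json)
    return sorted(data)


# Stand-in payload for illustration; a real export would come from profile.to_json()
sample_json = '{"table": {"n": 3}, "variables": {"age": {}}}'
print(top_level_sections(sample_json))  # prints ['table', 'variables']
```

Listing the keys first is a cautious way to find out what a given version of the library exports before writing code that depends on specific fields.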
Using pandas-profiling in a Data Analysis App
You can integrate pandas-profiling into a simple web application using Flask to present the reports dynamically:
    from flask import Flask
    import pandas as pd
    from pandas_profiling import ProfileReport

    app = Flask(__name__)


    @app.route("/")
    def home():
        df = pd.read_csv("path/to/dataset.csv")
        profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)
        # Return the HTML as-is; render_template_string would re-parse it as a Jinja template
        return profile.to_html()


    if __name__ == "__main__":
        app.run(debug=True)
This app builds the profiling report on every request to the root URL and serves it as a web page that users can browse interactively.
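Because the route above profiles the dataset on every request, page loads can be slow for anything but small files. One possible remedy is to cache the rendered HTML per dataset path. The sketch below is a minimal in-memory cache under that assumption; cached_renderer, render_report, and get_report_html are hypothetical names, and the cache is never invalidated, so a changed CSV file would require restarting the app.

```python
from typing import Callable, Dict


def cached_renderer(render: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a render function so each dataset path is rendered only once."""
    cache: Dict[str, str] = {}

    def get_html(csv_path: str) -> str:
        if csv_path not in cache:
            # First request for this path: do the expensive rendering
            cache[csv_path] = render(csv_path)
        return cache[csv_path]

    return get_html


def render_report(csv_path: str) -> str:
    """Profile a CSV file and return the report HTML (hypothetical helper)."""
    # Imported lazily so the caching wrapper can be used and tested on its own
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    return ProfileReport(df, title="Pandas Profiling Report").to_html()


get_report_html = cached_renderer(render_report)
```

Inside the Flask view, the body would then reduce to returning get_report_html("path/to/dataset.csv"), so only the first request pays the profiling cost.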