Python Data Analysis and Application: full PDF edition

This e-book resource covers data analysis with Python. The book was written by Huang Hongmei and Zhang Liangjun and is published by People's Posts and Telecommunications Press. The file is a PDF of 281 MB. Its current combined rating across Douban, Amazon, Dangdang, JD.com, and similar sites is 7.8.

Content Introduction

Table of Contents

Chapter 1 Overview of Python Data Analysis 1

Task 1.1: Understanding Data Analysis 1

1.1.1 Master the concept of data analysis 2

1.1.2 Master the process of data analysis 2

1.1.3 Understand the application scenarios of data analysis 4

Task 1.2 Become familiar with Python data analysis tools 5

1.2.1 Understand the common tools for data analysis 6

1.2.2 Understand the advantages of Python data analysis 7

1.2.3 Understand the common libraries for Python data analysis 7

Task 1.3 Install the Anaconda distribution of Python 9

1.3.1 Understand the Anaconda distribution of Python 9

1.3.2 Install Anaconda in Windows system 9

1.3.3 Install Anaconda in Linux system 12

Task 1.4 Master the common functions of Jupyter Notebook 14

1.4.1 Master the basic functions of Jupyter Notebook 14

1.4.2 Master the advanced functions of Jupyter Notebook 16

Summary 19

After-class exercises 19

Chapter 2 NumPy Numerical Computation Basics 21

Task 2.1 Master the NumPy array object ndarray 21

2.1.1 Create an array object 21

2.1.2 Generating random numbers 27

2.1.3 Accessing arrays through indexes 29

2.1.4 Transforming the shape of arrays 31

Task 2.2 Mastering NumPy matrices and universal functions 34

2.2.1 Create NumPy matrix 34

2.2.2 Master the ufunc function 37

Task 2.3 Use NumPy for statistical analysis 41

2.3.1 Reading/writing files 41

2.3.2 Using functions for simple statistical analysis 44

2.3.3 Task implementation 48

Summary 50

Practical training 50

Practical training 1: Create arrays and perform operations 50

Practical training 2: Create a chess board 50

After-class exercises 51

Chapter 3 Matplotlib Data Visualization Basics 52

Task 3.1 Master the basic syntax and common parameters of drawing 52

3.1.1 Master the basic syntax of pyplot 53

3.1.2 Setting the dynamic rc parameters of pyplot 56

Task 3.2 Analyzing the relationship between features 59

3.2.1 Drawing a scatter plot 59

3.2.2 Drawing Line Chart 62

3.2.3 Task Implementation 65

Task 3.3 Analyzing the internal data distribution and dispersion of features 68

3.3.1 Drawing a histogram 68

3.3.2 Drawing a pie chart 70

3.3.3 Drawing a box plot 71

3.3.4 Task realization 73

Summary 77

Practical training 78

Practical training 1: Analyze the relationships between the features of the population data in 1996 and 2015 78

Practical training 2: Analyze the distribution and dispersion of various characteristics of the population data in 1996 and 2015 78

After-class exercises 79

Chapter 4 pandas Statistical Analysis Basics 80

Task 4.1 Reading/writing data from different data sources 80

4.1.1 Reading/writing database data 80

4.1.2 Reading/writing text files 83

4.1.3 Reading/writing Excel files 87

4.1.4 Task implementation 88

Task 4.2 Mastering the common operations of DataFrame 89

4.2.1 View common properties of DataFrame 89

4.2.2 Query, modify, add, and delete DataFrame data 91

4.2.3 Describe and analyze DataFrame data 101

4.2.4 Task Implementation 104

Task 4.3 Converting and processing time series data 107

4.3.1 Converting string time to standard time 107

4.3.2 Extracting time series data information 109

4.3.3 Adding and subtracting time data 110

4.3.4 Task implementation 111

Task 4.4 Using group aggregation for within-group calculation 113

4.4.1 Use the groupby method to split data 114

4.4.2 Use the agg method to aggregate data 116

4.4.3 Use the apply method to aggregate data 119

4.4.4 Using the transform method to aggregate data 121

4.4.5 Task implementation 121

Task 4.5 Creating pivot tables and crosstabs 123

4.5.1 Use the pivot_table function to create a pivot table 123

4.5.2 Use the crosstab function to create a crosstab 127

4.5.3 Task implementation 128

Summary 130

Practical Training 130

Practical Training 1: Read and view the basic information of the P2P online loan data master table 130

Practical Training 2: Extract the time information from the user information update table and the login information table 130

Practical training 3: Use group aggregation method to further analyze user information update table and login information table 131

Practical training 4: Convert the user information update table and the login information table between long and wide formats 131

After-class exercises 131

Chapter 5 Using pandas for data preprocessing 133

Task 5.1 Merging data 133

5.1.1 Merging data by stacking 133

5.1.2 Merging data by primary key 136

5.1.3 Merging overlapping data 139

5.1.4 Task implementation 140

Task 5.2 Cleaning data 141

5.2.1 Detecting and processing duplicate values 141

5.2.2 Detecting and processing missing values 146

5.2.3 Detecting and processing outliers 149

5.2.4 Task Implementation 152

Task 5.3 Standardizing data 154

5.3.1 Standardizing data by dispersion (min-max) 154

5.3.2 Standardizing data by standard deviation 155

5.3.3 Standardizing data by decimal scaling 156

5.3.4 Task implementation 157

Task 5.4 Transforming data 158

5.4.1 Processing categorical data with dummy variables 158

5.4.2 Discretizing continuous data 160

5.4.3 Task implementation 162

Summary 163

Practical training 164

Practical training 1: Imputing missing values of user electricity consumption data 164

Practical training 2: Merging line loss, electricity consumption trend, and line alarm data 164

Practical training 3: Standardizing modeling expert sample data 164

After-class exercises 165

Chapter 6 Building Models with scikit-learn 167

Task 6.1 Use sklearn transformers to process data 167

6.1.1 Load the data set in the datasets module 167

6.1.2 Split the data set into a training set and a test set 170

6.1.3 Using sklearn transformers for data preprocessing and dimensionality reduction 172

6.1.4 Task implementation 174

Task 6.2 Constructing and evaluating clustering models 176

6.2.1 Using sklearn estimator to construct clustering models 176

6.2.2 Evaluating clustering models 179

6.2.3 Task implementation 182

Task 6.3 Construct and evaluate classification model 183

6.3.1 Use sklearn estimator to build classification model 183

6.3.2 Evaluate classification models 186

6.3.3 Task implementation 188

Task 6.4 Build and evaluate regression model 190

6.4.1 Use sklearn estimator to build a linear regression model 190

6.4.2 Evaluation of regression model 193

6.4.3 Task implementation 194

Summary 196

Practical training 196

Practical training 1: Use sklearn to process wine and wine_quality data sets 196

Practical training 2: Construct a K-Means clustering model based on wine data set 196

Practical training 3: Construct an SVM classification model based on the wine data set 19

Practical training 4: Constructing a regression model based on the wine_quality data set 197

After-class exercises 198

Chapter 7 Airline customer value analysis 199

Task 7.1: Understand the current situation of airlines and customer value analysis 199

7.1.1 Understand the current situation of airlines 200

7.1.2 Understand customer value analysis 201

7.1.3 Become familiar with the steps and processes of airline customer value analysis 201

Task 7.2 Preprocessing aviation customer data 202

7.2.1 Handling missing data and outliers 202

7.2.2 Constructing key features of aviation customer value analysis 202

7.2.3 Standardizing the five features of the LRFMC model 206

7.2.4 Task realization 207

Task 7.3 Use K-Means algorithm for customer segmentation 209

7.3.1 Understand K-Means clustering algorithm 209

7.3.2 Analyze clustering results 210

7.3.3 Model Application 213

7.3.4 Task Implementation 214

Summary 215

Practical Training 215

Practical training 1: Processing credit card data outliers 215

Practical training 2: Constructing key features of credit card customer risk assessment 217

Practical training 3: Constructing K-Means clustering model 218

After-class exercises 218

Chapter 8 Fiscal Revenue Forecast Analysis 220

Task 8.1 Understand the background and methods of fiscal revenue forecast 220

8.1.1 Analyze the background of fiscal revenue forecasting 220

8.1.2 Understand the methods of fiscal revenue forecasting 222

8.1.3 Become familiar with the steps and processes of fiscal revenue forecasting 223

Task 8.2 Analyzing the correlation of fiscal revenue data characteristics 223

8.2.1 Understanding correlation analysis 223

8.2.2 Analyzing calculation results 224

8.2.3 Task realization 225

Task 8.3 Use Lasso regression to select key features of fiscal revenue forecast 225

8.3.1 Understand the Lasso regression method 226

8.3.2 Analyze the Lasso regression results 227

8.3.3 Task implementation 227

Task 8.4 Use gray prediction and SVR to build a fiscal revenue forecast model 228

8.4.1 Understand the gray prediction algorithm 228

8.4.2 Understanding the SVR algorithm 229

8.4.3 Analyzing prediction results 232

8.4.4 Task implementation 234

Summary 236

Practical training 236

Practical training 1: Obtain the correlation coefficient between various characteristics of corporate income tax 236

Practical training 2: Select key characteristics of corporate income tax prediction 237

Practical training 3: Constructing a corporate income tax prediction model 237

After-class exercises 237

Chapter 9 Household water heater user behavior analysis and event identification 239

Task 9.1: Understand the background and steps of household water heater user behavior analysis 239

9.1.1 Analyze the current situation of the household water heater industry 240

9.1.2 Understand the basic situation of water heater data collection 240

9.1.3 Become familiar with the steps and processes of household water heater user behavior analysis 241

Task 9.2 Preprocessing water heater user water use data 242

9.2.1 Deleting redundant features 242

9.2.2 Dividing water use events 243

9.2.3 Determining the duration threshold of a single water use event 244

9.2.4 Task implementation 246

Task 9.3 Constructing water use behavior characteristics and filtering water use events 247

9.3.1 Constructing water duration and frequency characteristics 248

9.3.2 Constructing water consumption and fluctuation characteristics 249

9.3.3 Screening candidate bathing events 250

9.3.4 Task Implementation 251

Task 9.4 Constructing a BP neural network model for behavioral event analysis 255

9.4.1 Understanding the BP neural network algorithm principle 255

9.4.2 Build model 259

9.4.3 Evaluate model 260

9.4.4 Task implementation 260

Summary 263

Practical training 263

Practical training 1: Cleaning operator customer data 263

Practical training 2: Screening operator customer data 264

Practical training 3: Building a neural network prediction model 265

After-class exercises 265

Appendix A 267

Appendix B 270

References 295

Study Notes

Jupyter Notebook (formerly known as IPython notebook) is an interactive notebook that supports more than 40 programming languages. It is essentially a web application for creating and sharing literate programming documents, with support for live code, mathematical equations, visualizations, and Markdown. Typical uses include data cleaning and transformation, numerical simulation, statistical modeling, and machine learning. Users can share notebooks with others through email, Dropbox, GitHub, and Jupyter Notebook Viewer. In a notebook, code can generate images, videos, LaTeX, and JavaScript in real time. On Kaggle, the popular data mining competition platform, shared analyses are commonly published in Jupyter notebook format. Jupyter includes the following components: Jupyter Notebook and...
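To make the mix of Markdown, equations, and live code more concrete, here is a minimal sketch of what a notebook file looks like on disk. It assumes the version 4 notebook JSON layout and uses only the standard library; the cell contents and the helper names notebook_to_text and cell_types are made up for illustration.

```python
import json

# A minimal notebook in the version 4 JSON layout: one Markdown cell
# (containing a LaTeX equation) and one code cell. Contents are hypothetical.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Demo\n", "Euler's identity: $e^{i\\pi} + 1 = 0$"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello from a code cell')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def notebook_to_text(nb):
    """Serialize the notebook dictionary to the JSON text stored in an .ipynb file."""
    return json.dumps(nb, indent=1, ensure_ascii=False)


def cell_types(nb):
    """Return the type of each cell, in document order."""
    return [cell["cell_type"] for cell in nb["cells"]]


text = notebook_to_text(notebook)
print(cell_types(json.loads(text)))
```

Writing text to a file with the .ipynb extension should give a document that Jupyter Notebook can open, which is also why notebooks are easy to share through email or GitHub: they are plain JSON text.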

This article describes the WeChat friend data analysis function implemented in Python.

It is shared here for your reference. The details are as follows: we use Python to analyze a personal account's WeChat friends and output the results as an HTML document. The main Python packages used are itchat, pandas, and pyecharts.

1. Install itchat, a WeChat Python SDK, which is used to fetch the account's friend list. The code for fetching the data is as follows:

    import itchat
    import pandas as pd
    from pyecharts import Geo, Bar

    # Log in and fetch the friend list
    itchat.login()
    friends = itchat.get_friends(update=True)[0:]

    # Copy selected fields of a friend record, filling empty values with defaults
    def User2dict(User):
        User_dict = {}
        User_dict["NickName"] = User["NickName"] if User["NickName"] else "NaN"
        User_dict["City"] = User["City"] if User["City"] else "NaN"
        User_dict["Sex"] = User["Sex"] if User["Sex"] else 0
        User_dict["Signature"] = User["Signature"] if User["Signature"] else "NaN"
        # ……
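The field-cleaning step above can be tried without a WeChat login. The following is a minimal sketch that mirrors the User2dict idea on hypothetical sample records and counts friends per city and per Sex code with the standard library. The sample data and the names user_to_dict, sample_friends, city_counts, and sex_counts are assumptions for illustration and are not part of the original article, which uses pandas and pyecharts for this step.

```python
from collections import Counter


def user_to_dict(user):
    """Copy selected fields, replacing empty or missing values with defaults."""
    return {
        "NickName": user.get("NickName") or "NaN",
        "City": user.get("City") or "NaN",
        "Sex": user.get("Sex") or 0,
        "Signature": user.get("Signature") or "NaN",
    }


# Hypothetical records standing in for the output of itchat.get_friends().
sample_friends = [
    {"NickName": "A", "City": "Shenzhen", "Sex": 1, "Signature": "hi"},
    {"NickName": "B", "City": "", "Sex": 2, "Signature": ""},
    {"NickName": "", "City": "Shenzhen", "Sex": 0, "Signature": "hello"},
]

rows = [user_to_dict(friend) for friend in sample_friends]

# Frequency tables for the kind of city and gender breakdown described above.
city_counts = Counter(row["City"] for row in rows)
sex_counts = Counter(row["Sex"] for row in rows)

print(city_counts.most_common())
print(sex_counts.most_common())
```

The Sex codes are kept as the raw numbers found in the records; mapping them to labels is left out because the original snippet does not show that step.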

The open-source Python library itchat, built on the WeChat personal account interface, is used here to fetch WeChat friends and analyze their province, gender, and WeChat signature. The code is given directly. To run it, create three empty text files (stopwords.txt, newdit.txt, and unionWords.txt), and either download the font simhei.ttf or delete the code that requires the font.

    # wxfriends.py 2018-07-09
    import itchat
    import sys
    import pandas as pd
    import matplotlib.pyplot as plt
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese characters when plotting
    plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly when plotting
    import jieba
    import jieba.posseg as pseg
    from scipy.misc import imread  # removed in SciPy 1.2 and later; imageio.imread is the usual replacement
    from wordcloud import WordCloud
    from os import path

    # Solve the encoding problem
    non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)

    # Get friend information
    def getFriends():...
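The non_bmp_map table in the snippet above is presumably meant for str.translate, so that characters outside the Basic Multilingual Plane (emoji in nicknames, for example) are replaced before printing. The original code that uses the table is truncated, so the following is only a sketch of that assumed usage; the helper name strip_non_bmp and the sample nickname are hypothetical.

```python
import sys

# Map every code point above U+FFFF to U+FFFD (the replacement character).
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xFFFD)


def strip_non_bmp(text):
    """Replace characters outside the Basic Multilingual Plane with U+FFFD."""
    return text.translate(non_bmp_map)


nickname = "Alice\U0001F600"  # hypothetical nickname ending in an emoji
cleaned = strip_non_bmp(nickname)
print(len(nickname), len(cleaned))
```

Chinese characters such as those in a signature are inside the Basic Multilingual Plane, so they pass through unchanged.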

Python data analysis of Shuangseqiu: predicting the next lottery result with a linear regression algorithm.

This article gives an example of Python data analysis of Shuangseqiu (the Double Color Ball lottery) that uses a linear regression algorithm to predict the next draw. It is shared here for your reference. The details are as follows: earlier articles covered various algorithms applied to Shuangseqiu; here we predict the numbers for the next period, which is a little exciting to think about. The code uses linear regression. In this scenario its prediction quality is only average, so you may want to try other algorithms and compare the results.

I found that much of the earlier code was repetitive. To make the code more elegant, I defined a function and called it, and the code immediately looked better.

    #!/usr/bin/python
    # -*- coding:UTF-8 -*-
    # Import the required packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import operator
    from sklearn import datasets, linear_model
    from sklearn.linear_model import LogisticRegression
    # Read file d...
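Because the original code is cut off, here is a minimal sketch of the underlying idea: fit a straight line to past values and extrapolate one period ahead. It uses the closed-form least-squares formulas in plain Python instead of the sklearn linear_model module that the original imports, and the numbers are made up rather than real lottery draws. The names fit_line, predict, periods, and values are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical history: period index and an observed value per period.
periods = [1, 2, 3, 4, 5]
values = [3, 5, 7, 9, 11]

slope, intercept = fit_line(periods, values)
next_value = predict(slope, intercept, 6)
print(slope, intercept, next_value)
```

The made-up values lie exactly on a line, so the fit is perfect here. Real draw numbers have no such trend, which is consistent with the article's remark that the prediction quality is only average.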

That is all of the content related to the Python data e-book introduced this time. We hope the resources we have compiled are helpful to everyone. Thank you for your support of Guigui.

Note: to obtain the resource, send a private message (666).
