Datawhale Data Analysis, Chapter 2, Section 2: Data Restructuring 1

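This part is very important: only when the data is relatively clean can we analyze it effectively. In this section we carry out data restructuring, which still falls within the data understanding (preparation) stage.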

Before starting, import the numpy and pandas packages and load the data.

import numpy as np
import pandas as pd

text = pd.read_csv('result.csv')
text.head(2)
Unnamed: 0 PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C

2 Chapter 2: Data Restructuring

2.4 Merging Data

2.4.1 Task 1: Load all the data in the data folder and observe how the datasets relate to each other

text_left_up = pd.read_csv("data/train-left-up.csv")
text_left_down = pd.read_csv("data/train-left-down.csv")
text_right_up = pd.read_csv("data/train-right-up.csv")
text_right_down = pd.read_csv("data/train-right-down.csv")
text_left_up.head(1) 
PassengerId Survived Pclass Name
0 1 0 3 Braund, Mr. Owen Harris
text_left_down.head(1) 
PassengerId Survived Pclass Name
0 440 0 2 Kvillner, Mr. Johan Henrik Johannesson
text_right_up.head(1) 
Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 male 22.0 1 0 A/5 21171 7.25 NaN S
text_right_down.head(1)
Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 male 31.0 0 0 C.A. 18723 10.5 NaN S

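[Hint] Based on the train.csv data we loaded earlier, make a rough guess at what each of the tables above contains.

2.4.2 Task 2: Using the concat method, merge text_left_up and text_right_up horizontally into one table and save it as result_up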

result_up = pd.concat([text_left_up,text_right_up],axis=1)
result_up.head(1)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.25 NaN S
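A horizontal concat lines rows up by index label, not by position. The sketch below uses small made-up frames (hypothetical stand-ins, not the Titanic files) to show what happens when the labels match and when they do not.

```python
import pandas as pd

# Toy stand-ins for the left and right halves (hypothetical data)
left = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
right = pd.DataFrame({"Sex": ["male", "female", "female"], "Age": [22.0, 38.0, 26.0]})

# Both sides carry the default index 0..2, so rows pair up one to one
aligned = pd.concat([left, right], axis=1)

# Shift the right-hand labels so that only 1 and 2 overlap with the left side
shifted = right.copy()
shifted.index = [1, 2, 3]
misaligned = pd.concat([left, shifted], axis=1)

print(aligned.shape)     # (3, 4)
print(misaligned.shape)  # (4, 4): union of labels 0..3, gaps filled with NaN
```

This is why the up and down halves can each be merged safely here: within each pair, the left and right files share the same default index.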

2.4.3 Task 3: Using the concat method, merge text_left_down and text_right_down horizontally into one table and save it as result_down. Then merge result_up and result_down vertically into result.

result_down = pd.concat([text_left_down,text_right_down],axis=1)
result_down.head(1)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 440 0 2 Kvillner, Mr. Johan Henrik Johannesson male 31.0 0 0 C.A. 18723 10.5 NaN S
result = pd.concat([result_up,result_down])
result.head(1)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.25 NaN S
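Note that result_down starts again at index 0 (the row for PassengerId 440 above), so a plain vertical concat leaves repeated index labels in result. A minimal sketch with made-up frames, assuming a fresh 0..n-1 index is what you want afterwards:

```python
import pandas as pd

# Toy stand-ins for result_up and result_down (hypothetical data)
up = pd.DataFrame({"PassengerId": [1, 2], "Pclass": [3, 1]})
down = pd.DataFrame({"PassengerId": [440, 441], "Pclass": [2, 2]})

stacked = pd.concat([up, down])                         # keeps each piece's own labels
renumbered = pd.concat([up, down], ignore_index=True)   # builds a fresh 0..n-1 index

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Whether to renumber is a judgment call; duplicated labels are legal in pandas but make label-based lookups such as `.loc[0]` return more than one row.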

2.4.4 Task 4: Use the DataFrame's own join and append methods to complete Tasks 2 and 3

# join
tt1=text_left_up.join(text_right_up) 
tt2=text_left_down.join(text_right_down) 
tt1.head(2)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
tt2.head(2)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 440 0 2 Kvillner, Mr. Johan Henrik Johannesson male 31.0 0 0 C.A. 18723 10.50 NaN S
1 441 1 2 Hart, Mrs. Benjamin (Esther Ada Bloomfield) female 45.0 1 1 F.C.C. 13529 26.25 NaN S
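# append (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; there, pd.concat([tt1, tt2]) gives the same rows)
tt2_whole = tt1.append(tt2)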
tt2_whole.describe()
PassengerId Survived Pclass Age SibSp Parch Fare
count 891.000000 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000
mean 446.000000 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208
std 257.353842 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429
min 1.000000 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000
25% 223.500000 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400
50% 446.000000 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200
75% 668.500000 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000
max 891.000000 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200
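If the append call fails with an AttributeError, the installed pandas is probably version 2.0 or later. The following sketch, using made-up frames in place of tt1 and tt2, picks whichever call is available; the version check via hasattr is just one possible approach.

```python
import pandas as pd

# Toy stand-ins for tt1 and tt2 (hypothetical data)
top = pd.DataFrame({"PassengerId": [1, 2], "Fare": [7.25, 71.2833]})
bottom = pd.DataFrame({"PassengerId": [440, 441], "Fare": [10.5, 26.25]})

if hasattr(pd.DataFrame, "append"):
    # pandas < 2.0: the method used in the tutorial is still available
    whole = top.append(bottom)
else:
    # pandas >= 2.0: DataFrame.append is gone; concat stacks the same rows
    whole = pd.concat([top, bottom])

print(len(whole))  # 4
```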

2.4.5 Task 5: Use the pandas merge method and the DataFrame append method to complete Tasks 2 and 3

text_left_up.head(2)
PassengerId Survived Pclass Name
0 1 0 3 Braund, Mr. Owen Harris
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th...
text_right_up.head(2)
Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 male 22.0 1 0 A/5 21171 7.2500 NaN S
1 female 38.0 1 0 PC 17599 71.2833 C85 C
ddup = pd.merge(text_left_up, text_right_up, left_index=True, right_index=True)  # use the row index of both tables as the join key
dddown = pd.merge(text_left_down,text_right_down,left_index=True,right_index=True)
dd_whole = ddup.append(dddown)
dd_whole.head(2)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C

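[Think] Compare the differences and similarities among merge, join, and concat. In Tasks 4 and 5, why is the DataFrame append method required in both cases? Could Tasks 4 and 5 be completed using only merge or join?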
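As one starting point for this comparison, the sketch below runs the three horizontal approaches on small made-up frames that share the same default index and checks that they yield the same table. It only covers the side-by-side case; the frames and names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for a left and a right half (hypothetical data)
left = pd.DataFrame({"PassengerId": [1, 2], "Pclass": [3, 1]})
right = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

by_concat = pd.concat([left, right], axis=1)                         # align on index labels
by_join = left.join(right)                                           # index-on-index join
by_merge = pd.merge(left, right, left_index=True, right_index=True)  # index used as the key

# Compare column contents of the three results
same = by_concat.to_dict("list") == by_join.to_dict("list") == by_merge.to_dict("list")
print(same)  # True
```

All three calls above add columns next to each other. Putting the upper and lower halves on top of each other is a row-wise operation, which is what concat along axis=0 (or the older append) is designed for.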

2.4.6 Task 6: Save the completed data as result.csv

# write your code here
dd_whole.to_csv('result.csv')
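The "Unnamed: 0" column seen whenever result.csv is read back comes from to_csv writing the index by default. A small sketch with a made-up frame and an in-memory buffer (so no file is touched) shows the difference that index=False makes:

```python
import io
import pandas as pd

# Toy frame (hypothetical data)
frame = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})

# Default: the index is written as an unnamed first column
buf_default = io.StringIO()
frame.to_csv(buf_default)
buf_default.seek(0)
cols_default = pd.read_csv(buf_default).columns.tolist()

# index=False: the index is left out of the file
buf_no_index = io.StringIO()
frame.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
cols_no_index = pd.read_csv(buf_no_index).columns.tolist()

print(cols_default)   # ['Unnamed: 0', 'PassengerId', 'Survived']
print(cols_no_index)  # ['PassengerId', 'Survived']
```

The tutorial keeps the default, which is why the later outputs still show the extra column.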

2.5 Looking at the Data from a Different Angle

2.5.1 Task 1: Convert our data into Series-type data

df_whole = pd.read_csv('result.csv')
df_whole.head(2)
Unnamed: 0 PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
news = df_whole.stack().head(20)
news
0  Unnamed: 0                                                     0
   PassengerId                                                    1
   Survived                                                       0
   Pclass                                                         3
   Name                                     Braund, Mr. Owen Harris
   Sex                                                         male
   Age                                                         22.0
   SibSp                                                          1
   Parch                                                          0
   Ticket                                                 A/5 21171
   Fare                                                        7.25
   Embarked                                                       S
1  Unnamed: 0                                                     1
   PassengerId                                                    2
   Survived                                                       1
   Pclass                                                         1
   Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
   Sex                                                       female
   Age                                                         38.0
   SibSp                                                          1
dtype: object
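The result of stack is a Series whose index has two levels: the original row label and the column name. In the output above, row 0 has no Cabin entry, which is consistent with this stack call dropping the missing value. A minimal sketch on a made-up frame without missing values shows the structure and the way back via unstack:

```python
import pandas as pd

# Toy frame with mixed column types (hypothetical data, no missing values)
toy = pd.DataFrame(
    {"PassengerId": [1, 2], "Name": ["Braund", "Cumings"], "Age": [22.0, 38.0]}
)

stacked = toy.stack()          # Series indexed by (row label, column name)
print(type(stacked).__name__)  # Series
print(stacked.index.nlevels)   # 2
print(stacked[(0, "Name")])    # Braund

restored = stacked.unstack()   # back to one column per original column name
print(restored.shape)          # (2, 3)
```

Because the columns have different types, the stacked values share a single object dtype, matching the "dtype: object" line in the tutorial output.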
