Python, pandas: how to delete data (2), with sample code


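pandas has three main methods for deleting data: .drop(), .drop_duplicates(), and .dropna(). In summary:

.drop() deletes rows or columns.
.drop_duplicates() deletes duplicate rows.
.dropna() deletes rows or columns that contain missing values.

To keep the article short, readers who do not want the parameter descriptions can skip straight to the examples. This part covers drop_duplicates() and dropna().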
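As a quick orientation, the three methods can be contrasted on one small frame. This is a minimal sketch; the frame and the column names `a` and `b` are made up for illustration and do not come from the examples later in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are identical, row 2 has a missing value.
df = pd.DataFrame({"a": [1, 1, np.nan], "b": [2, 2, 3]})

dropped_col = df.drop(columns="b")   # remove column 'b' by label
deduped = df.drop_duplicates()       # row 1 duplicates row 0 and is removed
no_missing = df.dropna()             # row 2 contains NaN and is removed

print(list(dropped_col.columns))     # ['a']
print(list(deduped.index))           # [0, 2]
print(list(no_missing.index))        # [0, 1]
```

All three calls return new DataFrames here; `df` itself is left unchanged.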

drop_duplicates()

df.drop_duplicates() removes duplicate rows from a DataFrame.
Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
Return DataFrame with duplicate rows removed, optionally only considering certain columns.

Parameters:
subset : column label or sequence of labels, optional
    Only consider certain columns for identifying duplicates; by default all columns are used.

keep : {'first', 'last', False}, default 'first'
    'first' : Drop duplicates except for the first occurrence.
    'last' : Drop duplicates except for the last occurrence.
    False : Drop all duplicates.

inplace : bool, default False
    Whether to drop duplicates in place or to return a copy.

import pandas as pd

dic = {'A': [1, 2, 1, 1], 'B': [2, 5, 2, 4],
       'C': [3, 5, 3, 10], 'D': [4, 9, 4, 5]}
df = pd.DataFrame(dic, index=['one', 'two', 'three', 'four'])
print(df)
       A  B   C  D
one    1  2   3  4
two    2  5   5  9
three  1  2   3  4
four   1  4  10  5


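# Drop rows that are duplicated across all columns (keep the first occurrence)
df.drop_duplicates()
# Drop rows that are duplicated in column A (keep the first occurrence)
df.drop_duplicates(subset='A')
# Drop rows that are duplicated across all columns (keep the last occurrence)
df.drop_duplicates(keep='last')
# Drop every row that has a duplicate (keep none of them)
df.drop_duplicates(keep=False)
# The calls above return new DataFrames; df itself is unchanged
print(df)
# With inplace=True, df itself is modified
df.drop_duplicates(inplace=True)
print(df)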
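To make the effect of each call concrete, the sketch below rebuilds the same example frame and records which row labels survive. The variable names (`kept_first` and so on) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 1, 1], "B": [2, 5, 2, 4], "C": [3, 5, 3, 10], "D": [4, 9, 4, 5]},
    index=["one", "two", "three", "four"],
)

# Rows 'one' and 'three' are identical across all four columns.
kept_first = list(df.drop_duplicates().index)            # ['one', 'two', 'four']
kept_by_a = list(df.drop_duplicates(subset="A").index)   # ['one', 'two']
kept_last = list(df.drop_duplicates(keep="last").index)  # ['two', 'three', 'four']
kept_none = list(df.drop_duplicates(keep=False).index)   # ['two', 'four']

print(kept_first, kept_by_a, kept_last, kept_none)
```

With subset='A', only column A is compared, so 'three' and 'four' both count as duplicates of 'one' even though their other columns differ.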

dropna()

df.dropna() removes rows or columns that contain missing values.
Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Remove missing values.

Parameters:
axis : {0 or 'index', 1 or 'columns'}, default 0
    Determine if rows or columns which contain missing values are removed.
    0, or 'index' : Drop rows which contain missing values.
    1, or 'columns' : Drop columns which contain missing values.

how : {'any', 'all'}, default 'any'
    Determine if a row or column is removed from the DataFrame when it has at least one NA or all NA.
    'any' : If any NA values are present, drop that row or column.
    'all' : If all values are NA, drop that row or column.

thresh : int, optional
    Require that many non-NA values; rows or columns with fewer non-NA values are dropped.

subset : array-like, optional
    Labels along the other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplace : bool, default False
    If True, do the operation in place and return None.
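The thresh parameter is the least obvious of these, so here is a minimal sketch of it on the frame used in the examples that follow. Only thresh is passed; how is left at its default, since recent pandas versions reject the two being set together.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman", np.nan],
    "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
    "born": [pd.NaT, pd.Timestamp("1940-06-25"), pd.NaT, np.nan],
})

# Non-missing values per row: 1, 3, 2, 0.
# thresh=2 keeps only rows with at least two non-missing values.
rows_kept = df.dropna(thresh=2)
print(list(rows_kept.index))      # [1, 2]

# Non-missing values per column: name 3, toy 2, born 1.
cols_kept = df.dropna(axis=1, thresh=2)
print(list(cols_kept.columns))    # ['name', 'toy']
```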


import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Alfred', 'Batman', 'Catwoman', np.nan],
                   'toy': [np.nan, 'Batmobile', 'Bullwhip', np.nan],
                   'born': [pd.NaT, pd.Timestamp('1940-06-25'), pd.NaT, np.nan]})

print(df)

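# Drop rows that contain at least one missing value
df.dropna()
# Drop columns that contain at least one missing value
df.dropna(axis=1)
df.dropna(axis='columns')
# Drop rows in which every value is missing
df.dropna(how='all')
# Drop rows with a missing value in the 'name' or 'toy' column
df.dropna(subset=['name', 'toy'])
# The calls above return new DataFrames; inplace=True modifies df itself
print(df)
df.dropna(inplace=True)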
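For reference, the following sketch rebuilds the same frame and shows which rows or columns each call keeps. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman", np.nan],
    "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
    "born": [pd.NaT, pd.Timestamp("1940-06-25"), pd.NaT, np.nan],
})

any_rows = list(df.dropna().index)                          # [1]
any_cols = list(df.dropna(axis=1).columns)                  # []
all_rows = list(df.dropna(how="all").index)                 # [0, 1, 2]
subset_rows = list(df.dropna(subset=["name", "toy"]).index) # [1, 2]

print(any_rows, any_cols, all_rows, subset_rows)
```

Every column in this frame has at least one missing value, so dropping columns with the default how='any' leaves no columns at all.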