06_Split data by dates
Split data by dates
Split all data based on dates
The data is stored per date so that, for example, all data from 2020-01-14 can be viewed on its own.
Step 1. Read all data from each category
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load each category from its own CSV file (header=0: the first row holds the column names)
DF_chrome = pd.read_csv('dataset/chrome.csv', header=0)
DF_firefox = pd.read_csv('dataset/firefox.csv', header=0)
DF_dns2tcp = pd.read_csv('dataset/dns2tcp.csv', header=0)
DF_dnscat2 = pd.read_csv('dataset/dnscat2.csv', header=0)
DF_iodine = pd.read_csv('dataset/iodine.csv', header=0)
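The five read_csv calls differ only in the file name, so they can also be written as a loop. This is a minimal sketch, assuming the same dataset/<name>.csv layout as above. The helper load_categories and the extra Category column are hypothetical additions; the original code does not tag rows with the file they came from.

```python
import os

import pandas as pd

# Category names taken from the file names used in Step 1
CATEGORIES = ["chrome", "firefox", "dns2tcp", "dnscat2", "iodine"]


def load_categories(directory="dataset", categories=CATEGORIES):
    """Read <directory>/<name>.csv for each category into a dict of DataFrames.

    Hypothetical helper: it also adds a 'Category' column so every row
    still knows its source file after the frames are combined.
    """
    frames = {}
    for name in categories:
        df = pd.read_csv(os.path.join(directory, f"{name}.csv"), header=0)
        df["Category"] = name  # assumption: not part of the original data
        frames[name] = df
    return frames
```

Usage: frames = load_categories() and then, for example, DF_chrome = frames["chrome"].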
Step 2. Combine everything into one DataFrame
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames row-wise instead
DF_all = pd.concat([DF_chrome, DF_firefox, DF_dns2tcp, DF_dnscat2, DF_iodine], ignore_index=True)
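To see what pd.concat does with the row index, here is a small sketch on two made-up frames (not the real dataset). With ignore_index=True the combined rows are renumbered 0..n-1 instead of repeating each file's own 0, 1, 2, ...; the keys= argument is an optional variant that labels each block with its source.

```python
import pandas as pd

# Two tiny stand-in frames with made-up timestamps
df_a = pd.DataFrame({"TimeStamp": ["2020-01-14 10:00:00", "2020-01-13 09:30:00"]})
df_b = pd.DataFrame({"TimeStamp": ["2020-03-18 12:00:00"]})

# Stack the rows; ignore_index=True gives a fresh 0..n-1 index
combined = pd.concat([df_a, df_b], ignore_index=True)

# keys= labels each block, producing a two-level row index (source, original row)
labelled = pd.concat([df_a, df_b], keys=["chrome", "firefox"])
```

Which one fits depends on whether later steps need to know which file a row came from.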
Step 3. Split the TimeStamp column into two parts: ["Dates"] and ["Sec"]
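# Split on the first space only (n=1): the part before it goes to "Dates", the rest to "Sec"
DF_all[["Dates", "Sec"]] = DF_all["TimeStamp"].str.split(" ", n=1, expand=True)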
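An alternative to string splitting is to parse the timestamps as datetimes first. This sketch assumes TimeStamp looks like "YYYY-MM-DD HH:MM:SS", which is inferred from the date strings used in Step 5 and is not confirmed by the dataset itself; the sample values and the Dates_dt/Time_dt column names are made up for illustration.

```python
import pandas as pd

# Made-up timestamps in the assumed "YYYY-MM-DD HH:MM:SS" format
df = pd.DataFrame({"TimeStamp": ["2020-01-14 10:15:30", "2020-03-18 23:59:59"]})

# String approach from Step 3: split on the first space only
df[["Dates", "Sec"]] = df["TimeStamp"].str.split(" ", n=1, expand=True)

# Datetime approach: parse once (fails loudly on malformed rows), then format the parts
ts = pd.to_datetime(df["TimeStamp"], format="%Y-%m-%d %H:%M:%S")
df["Dates_dt"] = ts.dt.strftime("%Y-%m-%d")  # same strings as "Dates"
df["Time_dt"] = ts.dt.strftime("%H:%M:%S")   # same strings as "Sec"
```

Parsing with an explicit format doubles as a validation step, and the resulting datetime column also supports range filters that plain strings do not.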
Step 4. Count how many rows exist for each date
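from collections import Counter
# Number of rows recorded on each date, e.g. {'2020-01-14': ..., '2020-03-18': ...}
counts = dict(Counter(DF_all['Dates']))
# Keep only the dates that occur more than once
duplicates_dates = {key: value for key, value in counts.items() if value > 1}
print(duplicates_dates)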
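pandas can produce the same per-date counts without collections.Counter. A small sketch on a made-up Dates column: value_counts() returns a Series sorted by count, and because ISO-formatted date strings sort correctly as text, sort_index() puts them in chronological order.

```python
from collections import Counter

import pandas as pd

# Made-up "Dates" column standing in for DF_all['Dates']
dates = pd.Series(["2020-01-14", "2020-01-14", "2020-03-18", "2019-12-10", "2020-01-14"])

# Counter-based count, as in Step 4
counter_counts = dict(Counter(dates))

# pandas equivalent: a Series of counts, most frequent date first
vc = dates.value_counts()

# Chronological order (works because YYYY-MM-DD strings sort like dates)
by_date = vc.sort_index()
print(by_date)
```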
Step 5. Save the rows for each date into an individual DataFrame
df_2020_01_14 = DF_all.loc[DF_all['Dates'] == '2020-01-14']
df_2020_01_13 = DF_all.loc[DF_all['Dates'] == '2020-01-13']
df_2020_01_12 = DF_all.loc[DF_all['Dates'] == '2020-01-12']
df_2019_12_10 = DF_all.loc[DF_all['Dates'] == '2019-12-10']
df_2019_12_11 = DF_all.loc[DF_all['Dates'] == '2019-12-11']
df_2019_12_13 = DF_all.loc[DF_all['Dates'] == '2019-12-13']
df_2019_12_14 = DF_all.loc[DF_all['Dates'] == '2019-12-14']
df_2019_12_15 = DF_all.loc[DF_all['Dates'] == '2019-12-15']
df_2019_12_16 = DF_all.loc[DF_all['Dates'] == '2019-12-16']
df_2019_12_17 = DF_all.loc[DF_all['Dates'] == '2019-12-17']
df_2019_12_09 = DF_all.loc[DF_all['Dates'] == '2019-12-09']
df_2019_12_19 = DF_all.loc[DF_all['Dates'] == '2019-12-19']
df_2019_12_20 = DF_all.loc[DF_all['Dates'] == '2019-12-20']
df_2020_04_01 = DF_all.loc[DF_all['Dates'] == '2020-04-01']
df_2020_03_31 = DF_all.loc[DF_all['Dates'] == '2020-03-31']
df_2020_03_30 = DF_all.loc[DF_all['Dates'] == '2020-03-30']
df_2020_03_25 = DF_all.loc[DF_all['Dates'] == '2020-03-25']
df_2020_03_24 = DF_all.loc[DF_all['Dates'] == '2020-03-24']
df_2020_03_28 = DF_all.loc[DF_all['Dates'] == '2020-03-28']
df_2020_03_23 = DF_all.loc[DF_all['Dates'] == '2020-03-23']
df_2020_03_29 = DF_all.loc[DF_all['Dates'] == '2020-03-29']
df_2020_03_27 = DF_all.loc[DF_all['Dates'] == '2020-03-27']
df_2020_03_26 = DF_all.loc[DF_all['Dates'] == '2020-03-26']
df_2020_03_20 = DF_all.loc[DF_all['Dates'] == '2020-03-20']
df_2020_03_21 = DF_all.loc[DF_all['Dates'] == '2020-03-21']
df_2020_03_19 = DF_all.loc[DF_all['Dates'] == '2020-03-19']
df_2020_03_22 = DF_all.loc[DF_all['Dates'] == '2020-03-22']
df_2020_03_18 = DF_all.loc[DF_all['Dates'] == '2020-03-18']
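Writing one line per date works, but the list has to be edited by hand whenever a new date appears. A groupby builds every per-date frame in one pass. This is a sketch on made-up rows; the dictionary name frames_by_date is hypothetical, and the last line shows how to recover one of the individually named variables used above.

```python
import pandas as pd

# Made-up rows standing in for DF_all after Step 3
DF_all = pd.DataFrame({
    "Dates": ["2020-01-14", "2020-03-18", "2020-01-14"],
    "Sec": ["10:00:00", "12:00:00", "11:30:00"],
})

# One DataFrame per distinct date, keyed by the date string
frames_by_date = {date: group for date, group in DF_all.groupby("Dates")}

# Same rows as: DF_all.loc[DF_all['Dates'] == '2020-01-14']
df_2020_01_14 = frames_by_date["2020-01-14"]
```

Each group keeps the original row index and row order, so it matches the .loc filter result exactly.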
Step 6. Check the values
The data for 2020-03-18 contains 6416 rows in total.
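One way to check such a number is to take the length of the per-date frame and confirm that every row really carries that date. The frame below is a made-up three-row stand-in; on the real df_2020_03_18 the first value should be the 6416 reported above.

```python
import pandas as pd

# Made-up stand-in for df_2020_03_18 (the real frame is much larger)
df_2020_03_18 = pd.DataFrame({
    "Dates": ["2020-03-18"] * 3,
    "Sec": ["00:00:01", "08:15:00", "23:59:59"],
})

n_rows = len(df_2020_03_18)  # same as df_2020_03_18.shape[0]
all_same_date = (df_2020_03_18["Dates"] == "2020-03-18").all()  # True if no stray dates slipped in
print(n_rows, all_same_date)
```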
Reference
For more information on this topic (06_Split data by dates), see the original post: https://velog.io/@kakasi18/06Split-data-by-dates. The text may be freely shared or copied, but please keep this document's URL as the reference URL.
Collected and shared under the CC license.