06_Split data by dates

Split data by dates


Split all of the data by date.
The data is stored per date so that, for example, all records from 2020-01-14 can be displayed.

Step 1. Read all data from each category

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

DF_chrome = pd.read_csv('dataset/chrome.csv',header=[0])
DF_firefox = pd.read_csv('dataset/firefox.csv',header=[0])
DF_dns2tcp = pd.read_csv('dataset/dns2tcp.csv',header=[0])
DF_dnscat2 = pd.read_csv('dataset/dnscat2.csv',header=[0])
DF_iodine = pd.read_csv('dataset/iodine.csv',header=[0])

Step 2. Save into one dataframe

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
DF_all = pd.concat([DF_chrome, DF_firefox, DF_dns2tcp, DF_dnscat2, DF_iodine])

Step 3. Split the timestamp into two parts: ["Dates"] and ["Sec"]

DF_all[["Dates","Sec"]] = DF_all["TimeStamp"].str.split(" ",expand=True)
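The split in Step 3 can be tried on a small toy frame first. This is only a sketch: the `toy` frame and its timestamp layout (date, one space, time) are assumptions for illustration and are not taken from the dataset. Passing `n=1` limits the split to the first space, so exactly two columns come back even if the time part contained further spaces.

```python
import pandas as pd

# Toy stand-in for DF_all; the timestamp layout is an assumption.
toy = pd.DataFrame({
    "TimeStamp": [
        "2020-03-18 10:15:02",
        "2020-03-18 23:59:59",
        "2020-01-14 00:00:01",
    ]
})

# Split only at the first space so that exactly two columns are produced
toy[["Dates", "Sec"]] = toy["TimeStamp"].str.split(" ", n=1, expand=True)
print(toy)
```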

Step 4. Count how many rows share each date

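from collections import Counter
# Map each date to the number of rows recorded on that date
counts = dict(Counter(DF_all['Dates']))
# Copy of the per-date counts (no entries are filtered out here)
duplicates_dates = {key: value for key, value in counts.items()}
print(duplicates_dates)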
Output

Step 5. Save the rows for each date into their own DataFrame

df_2020_01_14 = DF_all.loc[DF_all['Dates'] == '2020-01-14']
df_2020_01_13 = DF_all.loc[DF_all['Dates'] == '2020-01-13']
df_2020_01_12 = DF_all.loc[DF_all['Dates'] == '2020-01-12']
df_2019_12_10 = DF_all.loc[DF_all['Dates'] == '2019-12-10']
df_2019_12_11 = DF_all.loc[DF_all['Dates'] == '2019-12-11']
df_2019_12_13 = DF_all.loc[DF_all['Dates'] == '2019-12-13']
df_2019_12_14 = DF_all.loc[DF_all['Dates'] == '2019-12-14']
df_2019_12_15 = DF_all.loc[DF_all['Dates'] == '2019-12-15']
df_2019_12_16 = DF_all.loc[DF_all['Dates'] == '2019-12-16']
df_2019_12_17 = DF_all.loc[DF_all['Dates'] == '2019-12-17']
df_2019_12_09 = DF_all.loc[DF_all['Dates'] == '2019-12-09']
df_2019_12_19 = DF_all.loc[DF_all['Dates'] == '2019-12-19']
df_2019_12_20 = DF_all.loc[DF_all['Dates'] == '2019-12-20']
df_2020_04_01 = DF_all.loc[DF_all['Dates'] == '2020-04-01']
df_2020_03_31 = DF_all.loc[DF_all['Dates'] == '2020-03-31']
df_2020_03_30 = DF_all.loc[DF_all['Dates'] == '2020-03-30']
df_2020_03_25 = DF_all.loc[DF_all['Dates'] == '2020-03-25']
df_2020_03_24 = DF_all.loc[DF_all['Dates'] == '2020-03-24']
df_2020_03_28 = DF_all.loc[DF_all['Dates'] == '2020-03-28']
df_2020_03_23 = DF_all.loc[DF_all['Dates'] == '2020-03-23']
df_2020_03_29 = DF_all.loc[DF_all['Dates'] == '2020-03-29']
df_2020_03_27 = DF_all.loc[DF_all['Dates'] == '2020-03-27']
df_2020_03_26 = DF_all.loc[DF_all['Dates'] == '2020-03-26']
df_2020_03_20 = DF_all.loc[DF_all['Dates'] == '2020-03-20']
df_2020_03_21 = DF_all.loc[DF_all['Dates'] == '2020-03-21']
df_2020_03_19 = DF_all.loc[DF_all['Dates'] == '2020-03-19']
df_2020_03_22 = DF_all.loc[DF_all['Dates'] == '2020-03-22']
df_2020_03_18 = DF_all.loc[DF_all['Dates'] == '2020-03-18']
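The per-date frames above could also be built in one pass with `groupby`, which avoids writing one line per date. The following is a minimal sketch on made-up data; `toy` and `frames_by_date` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for DF_all; the values are made up for illustration.
toy = pd.DataFrame({
    "Dates": ["2020-03-18", "2020-01-14", "2020-03-18", "2019-12-10"],
    "Sec": ["10:15:02", "00:00:01", "23:59:59", "12:00:00"],
})

# One sub-frame per distinct date, keyed by the date string
frames_by_date = {date: group for date, group in toy.groupby("Dates")}

# Equivalent to toy.loc[toy["Dates"] == "2020-03-18"]
df_2020_03_18 = frames_by_date["2020-03-18"]
print(sorted(frames_by_date))
print(len(df_2020_03_18))
```

With this layout, a date that appears later in the data gets its own frame without any code change, at the cost of looking frames up by key instead of by variable name.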

Step 6. Check the values



The data for 2020-03-18 contains 6416 rows in total.
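One way to run such a check is to compare the length of a filtered frame with the per-date count from `value_counts`. The sketch below uses a toy frame with made-up rows, so its numbers are unrelated to the 6416 rows reported for the real data.

```python
import pandas as pd

# Toy stand-in for DF_all; the rows are made up for illustration.
toy = pd.DataFrame({"Dates": ["2020-03-18", "2020-01-14", "2020-03-18"]})

# Number of rows per date, computed once for all dates
rows_per_date = toy["Dates"].value_counts()

# Rows for a single date, selected the same way as in Step 5
subset = toy.loc[toy["Dates"] == "2020-03-18"]

# The size of the filtered frame should agree with the per-date count
assert len(subset) == rows_per_date["2020-03-18"]
print(len(subset))
```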