Python Day 6

ccc = dict() # create an empty dictionary
ccc['csev'] = 1 # key 'csev', count 1
ccc['cwen'] = 1 # key 'cwen', count 1
ccc['csev'] = ccc['csev'] + 1 # increment the count for key 'csev'
print(ccc)

{'csev': 2, 'cwen': 1}
Key csev has value 2 and key cwen has value 1.
 
  
counts = dict()
names = ['csev', 'cwen', 'csev', 'zaqian', 'cwen']
for name in names:
    if name not in counts:
        counts[name] = 1
    else:
        counts[name] = counts[name] + 1
print(counts)
x = counts.get('csev', 0) # 2: 'csev' is a key, so its count is returned
x = counts.get('bob', 0) # 0: 'bob' is not a key, so the default 0 is returned

counts = dict()
names = ['csev', 'cwen', 'csev', 'zaqian', 'cwen']
for name in names:
    counts[name] = counts.get(name, 0) + 1
print(counts)
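
The get() one-liner above is meant to replace the if/else version from the earlier loop. A small sketch that checks the two give the same counts; the function names count_with_if and count_with_get are made up for this comparison and are not part of the original notes:

```python
names = ['csev', 'cwen', 'csev', 'zaqian', 'cwen']

def count_with_if(items):
    # Explicit test: first sighting sets 1, later sightings add 1
    counts = dict()
    for item in items:
        if item not in counts:
            counts[item] = 1
        else:
            counts[item] = counts[item] + 1
    return counts

def count_with_get(items):
    # get() returns 0 for a missing key, so one line covers both cases
    counts = dict()
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

result_if = count_with_if(names)
result_get = count_with_get(names)
print(result_if == result_get)
print(result_get)
```
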
jjj = {'chuck' : 1, 'fred' : 42, 'jan' : 100}
for aaa,bbb in jjj.items():
    print(aaa, bbb)
# Iterating with two variables: aaa is the key, bbb is the value.
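
Because items() hands back (key, value) pairs, the same dictionary can also be sorted by value. A minimal sketch using the jjj dictionary from above; the names pairs and by_value are only for illustration:

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

# items() yields (key, value) pairs; list() makes them visible as tuples
pairs = list(jjj.items())
print(pairs)

# Flip each pair to (value, key) so that sorting compares the value first
by_value = sorted(((value, key) for key, value in jjj.items()), reverse=True)
for value, key in by_value:
    print(key, value)
```
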

Write a program to read through mbox-short.txt and figure out the distribution of messages by hour of the day. You can pull the hour out of the "From " line by finding the time and then splitting that string a second time on the colon.

From [email protected] Sat Jan 5 09:14:16 2008

Once you have accumulated the counts for each hour, print out the counts, sorted by hour.

name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
hours = dict()
counts = list()
for line in handle:
    line = line.rstrip()
    if not line.startswith('From '): continue
    words = line.split()
    word = words[5].split(":")[0]
    hours[word] = hours.get(word, 0) + 1
for key, value in hours.items(): # .items() yields (key, value) tuples
    newtup = (key, value)
    counts.append(newtup)
counts = sorted(counts) # tuples sort by their first element, the hour
for key, value in counts:
    print(key, value)
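
The program above needs the mbox-short.txt file to run. As a rough sketch of the same logic that can be tried without the file, the version below works on a few made-up lines (the example.com addresses and the count_hours name are assumptions for illustration) and sorts hours.items() directly instead of building a separate list:

```python
# Hypothetical sample lines standing in for mbox-short.txt
sample = [
    "From a@example.com Sat Jan  5 09:14:16 2008",
    "Subject: not a From line",
    "From b@example.com Fri Jan  4 18:10:48 2008",
    "From c@example.com Fri Jan  4 09:02:11 2008",
]

def count_hours(lines):
    hours = dict()
    for line in lines:
        line = line.rstrip()
        if not line.startswith('From '):
            continue
        words = line.split()             # sixth word is the time, e.g. 09:14:16
        hour = words[5].split(':')[0]    # second split, on the colon
        hours[hour] = hours.get(hour, 0) + 1
    return hours

hours = count_hours(sample)
# sorted() on the (hour, count) pairs orders them by hour
for hour, count in sorted(hours.items()):
    print(hour, count)
```
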
Python Regular Expression Quick Guide

^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats a character zero or more times
*?       Repeats a character zero or more times 
         (non-greedy)
+        Repeats a character one or more times
+?       Repeats a character one or more times 
         (non-greedy)
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
(        Indicates where string extraction is to start
)        Indicates where string extraction is to end
[^ ]*    Matches zero or more non-space characters
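
A few of the patterns from the guide, tried with re.findall from the standard library. This is only a sketch: the sample lines and the example.com address are made up, and the variable names are for illustration.

```python
import re

line = "From a@example.com Sat Jan  5 09:14:16 2008"

# ^ anchors at the start of the line; ( ) marks the part to extract
address = re.findall(r'^From (\S+@\S+)', line)

# [^ ]* takes zero or more non-space characters after the @
domain = re.findall(r'@([^ ]*)', line)

# Two digits from the set [0-9] that follow a space and precede a colon
hour = re.findall(r'^From .* ([0-9][0-9]):', line)

print(address, domain, hour)

text = "From: Using the : character"
greedy = re.findall(r'^F.+:', text)       # + grabs as much as possible
non_greedy = re.findall(r'^F.+?:', text)  # +? stops at the first colon
print(greedy, non_greedy)
```
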