Thursday, July 27, 2017

Python - Convert String to datetime object and Convert datetime to String and timedelta - string to date, date to string, and date arithmetic

Version

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10

Code:

from datetime import datetime,timedelta
string_date = '20170727'
# Convert a string to a date object
datetime_date = datetime.strptime(string_date, "%Y%m%d").date()

# Date arithmetic: subtract one day
date_sub_1 = datetime_date - timedelta(1)

# Convert the date back to a string
date_sub_1_string = date_sub_1.strftime("%Y-%m-%d")

print('datetime_date :',datetime_date,'\t,type :',type(datetime_date))
print('date_sub_1 :',date_sub_1,'\t,type :',type(date_sub_1))
print('date_sub_1_string :',date_sub_1_string,'\t,type :',type(date_sub_1_string))

Result:

datetime_date : 2017-07-27     ,type : <class 'datetime.date'>
date_sub_1 : 2017-07-26     ,type : <class 'datetime.date'>
date_sub_1_string : 2017-07-26     ,type : <class 'str'>
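
The same approach extends to other units and to measuring the gap between two dates. The following is a minimal sketch; the variable names and the second sample date are my own choices for illustration, not part of the original post.

```python
from datetime import datetime, timedelta

# Parse two date strings (the second value is chosen only for illustration)
start = datetime.strptime('20170727', '%Y%m%d').date()
end = datetime.strptime('20170810', '%Y%m%d').date()

# timedelta accepts keyword arguments such as days and weeks
one_week_later = start + timedelta(weeks=1)

# Subtracting two dates gives a timedelta; .days is the gap in days
gap_days = (end - start).days

print(one_week_later.strftime('%Y-%m-%d'))  # 2017-08-03
print(gap_days)                             # 14
```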

Python - Convert list to dictionary with indexes - convert a list to a dictionary with the indexes as keys

Version

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10

Code:

list1 = ['A','B','C']
dic1 = {k:v for k,v in enumerate(list1)}
dic1

Result:

{0: 'A', 1: 'B', 2: 'C'}
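
As a possible variation, `dict()` can consume the pairs from `enumerate` directly, and `enumerate` takes a `start` argument. The names `dic_from_one` and `index_of` below are hypothetical and only serve this sketch.

```python
list1 = ['A', 'B', 'C']

# dict() accepts the (index, value) pairs produced by enumerate directly
dic1 = dict(enumerate(list1))

# start=1 makes the keys begin at 1 instead of 0
dic_from_one = dict(enumerate(list1, start=1))

# Reverse mapping: look up the index by value
index_of = {v: k for k, v in enumerate(list1)}

print(dic1)          # {0: 'A', 1: 'B', 2: 'C'}
print(dic_from_one)  # {1: 'A', 2: 'B', 3: 'C'}
print(index_of)      # {'A': 0, 'B': 1, 'C': 2}
```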

Python Regular Expression example - re.search & re.match - using regular expressions in Python

Version

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version :0.19.2
System version :Windows 10

Codes:

import re
s1 = 'abc123def'
print(re.search('abc',s1))
print(re.match('abc',s1))
print(re.search('123',s1))
print(re.match('123',s1))
print(re.search('abc',s1).group())
print(re.match('abc',s1).group())

Result:

<_sre.SRE_Match object; span=(0, 3), match='abc'>    
<_sre.SRE_Match object; span=(0, 3), match='abc'>    
<_sre.SRE_Match object; span=(3, 6), match='123'>    
None    
abc     
abc
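
The results above show that `re.match` only matches at the start of the string, while `re.search` scans the whole string. Both return `None` when nothing matches, so calling `.group()` directly on the result can raise `AttributeError`. Here is a small sketch of a guarded lookup; the helper name `first_match` is hypothetical.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group() if m else None

s1 = 'abc123def'
print(first_match(r'\d+', s1))   # 123
print(first_match(r'xyz', s1))   # None

# re.match anchors at position 0, so '123' is not found there
print(re.match(r'123', s1) is None)                       # True
# re.fullmatch requires the whole string to match the pattern
print(re.fullmatch(r'[a-z]+\d+[a-z]+', s1) is not None)   # True
```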

Wednesday, July 26, 2017

Python - Looking up the list of sheets in an excel file - find the sheet names of a given Excel file

Version

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version :0.19.2
System version :Windows 10

Codes

import pandas as pd
file_path=r'E:\download\tmp\test.xlsx'
sheet_name_list = pd.ExcelFile(file_path).sheet_names
sheet_name_list

Result:

['sheet1', 'sheet2', 'sheet3']

Python - How can I list the contents of a directory - get the file names and folder names under a path

Version

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10
Get folder paths and file paths separately

Code:

import os
dir_path = r'E:\download\tmp\test'
# Names of the files and folders under the path
dir_list = os.listdir(dir_path)

# Full paths of the files under the path
only_file = [os.path.join(dir_path, name) for name in dir_list if os.path.isfile(os.path.join(dir_path, name))]
# Full paths of the folders under the path
only_dir = [os.path.join(dir_path, name) for name in dir_list if os.path.isdir(os.path.join(dir_path, name))]
# Names of the files under the path
only_file2 = [ele for ele in dir_list if os.path.isfile(os.path.join(dir_path, ele))]
# Names of the folders under the path
only_dir2 = [ele for ele in dir_list if os.path.isdir(os.path.join(dir_path, ele))]

print(dir_list)
print('')
print(only_file)
print(only_dir)
print('')
print(only_file2)
print(only_dir2)

Result:

['dir1', 'dir2', 'file1.txt', 'file2.txt']

['E:\\download\\tmp\\test\\file1.txt', 'E:\\download\\tmp\\test\\file2.txt']
['E:\\download\\tmp\\test\\dir1', 'E:\\download\\tmp\\test\\dir2']

['file1.txt', 'file2.txt']
['dir1', 'dir2']
Collect the paths of files in nested folders into a list

Code:

import os
dir_path = r'E:\download\tmp2'
file_list=[]
for dirPath, dirNames, fileNames in os.walk(dir_path):
    for f in fileNames:
        file_list.append(os.path.join(dirPath, f))
print(file_list)

Result:

['E:\\download\\tmp2\\2017\\1\\1\\20170101_1.txt',
 'E:\\download\\tmp2\\2017\\1\\1\\20170101_2.txt',
 'E:\\download\\tmp2\\2017\\1\\2\\20170102_1.txt',
 'E:\\download\\tmp2\\2017\\1\\2\\20170102_2.txt']
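
An alternative to `os.walk` is `pathlib.Path.rglob`, which also descends into every sub-folder. The sketch below builds its own throwaway folder tree with `tempfile` so it can run anywhere; the folder and file names are made up for the example and are not the ones from the post.

```python
from pathlib import Path
import tempfile

# Build a small throwaway tree so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / '2017' / '1').mkdir(parents=True)
    (root / '2017' / '1' / 'a.txt').write_text('x')
    (root / 'b.txt').write_text('y')

    # rglob('*') walks every level; keep only regular files
    files = [p for p in root.rglob('*') if p.is_file()]
    file_names = sorted(p.name for p in files)

print(file_names)  # ['a.txt', 'b.txt']
```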

Tuesday, July 18, 2017

Python - How to Include image in jupyter notebook? How to download image from url? - display local and online images in Jupyter, and download an image to local disk

Version:

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version:Windows 10

Method 1

Code:

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

def mkdir(path):
    import os
    if not os.path.exists(path):
        os.makedirs(path)
        print('mkdir:',path)
    else:
        print('dir already exist:',path)

dir_path = './image/'
file_name = 'google.jpg'
pic_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2f/Google_2015_logo.svg/1200px-Google_2015_logo.svg.png"
mkdir(dir_path)

# Download the image
import requests
with open(dir_path + file_name, 'wb') as handle:
    response = requests.get(pic_url, stream=True)
    if not response.ok:
        print(response)
    for block in response.iter_content(1024):
        if not block:
            break
        handle.write(block)

# Display images
from IPython.display import Image
from IPython.core.display import HTML
# Display an online image (from url)
Image(url=pic_url)
# Display a local image (from local filesystem)
Image(url=dir_path + file_name)

Method 2 (HTML in a Markdown cell)

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/2f/Google_2015_logo.svg/1200px-Google_2015_logo.svg.png">

Result:

Friday, July 14, 2017

Python - Printing to screen and writing to a file at the same time - send output to both the console and a file

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10
import sys
import time
from datetime import datetime
class tee(object):
    def __init__(self, *files):
        self.files = files
    def write(self, obj):
        for f in self.files:
            f.write(obj)
            f.flush() # Flush so output is visible immediately; if commented out, it only appears in the console and file after everything has finished
    def flush(self):
        pass
log_path =  './log-%s.txt' % (datetime.now().strftime('%Y-%m-%dT%H_%M_%S'))

# output only on screen
print('output only on screen:')
print(123)
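with open(log_path,'w') as f:
    original_stdout = sys.stdout
    sys.stdout = tee(original_stdout, f)
    try:
        # output on both console and file
        print('output on both console and file:')
        print(456)
        time.sleep(5)
    finally:
        # restore stdout before the file closes, so later prints do not hit a closed file
        sys.stdout = original_stdout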
Result:
```
# The console shows:
output only on screen:
123
output on both console and file:
456
# The file contains:
output on both console and file:
456
```
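
Another way to get the same effect is `contextlib.redirect_stdout`, which swaps `sys.stdout` for the duration of a `with` block and restores it afterwards. In the sketch below, the class name `Tee` and the `io.StringIO` buffer (standing in for the log file) are assumptions made to keep the example self-contained.

```python
import io
import sys
from contextlib import redirect_stdout

class Tee:
    """Write everything to several file-like objects."""
    def __init__(self, *files):
        self.files = files
    def write(self, text):
        for f in self.files:
            f.write(text)
    def flush(self):
        for f in self.files:
            f.flush()

buffer = io.StringIO()            # stands in for the log file
original_stdout = sys.stdout
with redirect_stdout(Tee(original_stdout, buffer)):
    print('output on both console and buffer')

# sys.stdout is restored automatically when the with block ends
print('output only on screen')
captured = buffer.getvalue()
```

Because the restore happens when the `with` block exits, it also runs if the code inside raises an exception.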

Tuesday, July 11, 2017

Python - How to write a list to a file and read it as a list type using json - list and file read/write IO

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version :0.19.2
System version :Windows 10
# write to file
import json
file_path='./dic1.json'
dic1 = {'list1':[1,2,3,4]}
dic1
with open(file_path, 'w') as f:  
    json.dump(dic1, f)

# read from file
import json
file_path='./dic1.json'
with open(file_path) as f:
    dic2 = json.load(f)
list2=dic2['list1']
list2
Result:
```
[1, 2, 3, 4]
```
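
Wrapping the list in a dictionary is optional, since a list is itself a valid top-level JSON value. The sketch below writes to a temporary folder so that it leaves no files behind; the file name `list1.json` is made up for the example.

```python
import json
import os
import tempfile

list1 = [1, 2, 3, 4]

# Use a temporary directory so the sketch does not leave files behind
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'list1.json')
    with open(file_path, 'w') as f:
        json.dump(list1, f)      # a list can be dumped directly
    with open(file_path) as f:
        list2 = json.load(f)     # comes back as a Python list

print(list2, type(list2))
```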

Monday, July 10, 2017

Python - How to append new rows from pandas dataframe to existing excel - how to append new data to an existing Excel file

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version :0.19.2
System version :Windows 10
import pandas as pd
import os
from datetime import datetime
# def
# create dir
def mkdir(path):
    import os
    if not os.path.exists(path):
        os.makedirs(path)
        print('mkdir:',path)
    else:
        print('dir already exist:',path)
# Build sample data
number = [1,2,3,4,5]
sex = ['male','female','female','female','male']
df_new = pd.DataFrame()
df_new['number'] = number
df_new['sex'] = sex
df_new

# Folder where the Excel file is stored
file_dir='E:/download/tmp/python/%s/' % (datetime.now().strftime("%Y%m%d"))
# Create the folder if it does not exist
mkdir(file_dir)
file_out=file_dir+'test.xlsx'
# If the file does not exist, create it
if not os.path.exists(file_out):
    df_new.to_excel(file_out,index=False)
# If the file exists, append to it
else:
    df_old = pd.read_excel(file_out)
    # pd.concat also works on newer pandas, where DataFrame.append has been removed
    df_combine = pd.concat([df_old, df_new], ignore_index=True)
    df_combine.to_excel(file_out,index=False)
Result:

Python - Using a for loop to add values in a new list - build a list quickly with a loop

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10
new_list = ['item' + str(a)  for a in range(5)]
new_list
Result:
['item0', 'item1', 'item2', 'item3', 'item4']

python - Command line arguments in python - how to get command-line arguments in Python

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10
Save the following code as test.py
from sys import argv
print('argv')
print(argv)
print(type(argv))
print('argv[1]')
print(argv[1])
print(type(argv[1]))
print('argv[2]')
print(argv[2])
print(type(argv[2]))
Result:
python test.py  abc  123
argv
['test.py', 'abc', '123']
<class 'list'>
argv[1]
abc
<class 'str'>
argv[2]
123
<class 'str'>
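
As the output shows, every item in `argv` is a `str`, so numbers have to be converted by hand. The standard-library `argparse` module can do that conversion. The following is a minimal sketch; the argument names `name` and `count` are hypothetical, and the argument list is passed in explicitly so the example runs without a real command line.

```python
import argparse

parser = argparse.ArgumentParser(description='Demo of typed arguments')
parser.add_argument('name')              # positional, kept as str
parser.add_argument('count', type=int)   # positional, converted to int

# Passing a list makes the sketch runnable without a real command line;
# call parser.parse_args() with no argument to read sys.argv instead
args = parser.parse_args(['abc', '123'])
print(args.name, type(args.name))
print(args.count, type(args.count))
```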

Friday, July 7, 2017

Python - How to get user input? - interact with the user through the input function

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10
repeat = True
while repeat:
    str_press = input('Print \'Hello world\' if you press y, do not print \'Hello world\' if you press n\n')
    if str_press=='y':
        repeat = False
        print('Hello world!')
    elif str_press=='n':
        repeat = False
        print('Nothing happened.')
    else:
        print('Please press y or n.')
Result:
Print 'Hello world' if you press y, do not print 'Hello world' if you press n
y
Hello world!

# or
Print 'Hello world' if you press y, do not print 'Hello world' if you press n
n
Nothing happened.

# or
Print 'Hello world' if you press y, do not print 'Hello world' if you press n
h
Please press y or n.
Print 'Hello world' if you press y, do not print 'Hello world' if you press n

Python - def mkdir - How can I create a directory if it does not exist - create a folder with Python

Python version :Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version :Windows 10
# create dir
def mkdir(path):
    import os
    if not os.path.exists(path):
        os.makedirs(path)
        print('mkdir:',path)
    else:
        print('dir already exist:',path)
mkdir(r'E:\download\tmp\test123')
Result:
mkdir: E:\download\tmp\test123
# or
dir already exist: E:\download\tmp\test123
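
When the printed message is not needed, `os.makedirs` can handle the "already exists" case on its own through its `exist_ok` parameter. The sketch below uses a temporary folder, and the sub-folder names `a` and `b` are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'a', 'b')
    os.makedirs(target, exist_ok=True)   # creates a and a/b
    os.makedirs(target, exist_ok=True)   # second call raises no error
    created = os.path.isdir(target)

print(created)  # True
```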

Python - How to sort a dataFrame in python pandas by two or more columns - sort the rows of a pandas dataframe using two or more columns

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version:0.19.2
System version:Windows 10
import pandas as pd
import numpy as np
# set seed
set_random = np.random.RandomState(123)
length =6
dates = pd.date_range('1/1/2000', periods=length)
df_sample = pd.DataFrame(np.round(set_random.uniform(1,2,(length,2)) ), index=dates, columns=['A', 'B'])
df_sample

print('before sort')
print(df_sample)

print('after sort ascending')
df_sample = df_sample.sort_values(by=['A','B'],ascending=True)
print(df_sample)

print('after sort descending')
df_sample = df_sample.sort_values(by=['A','B'],ascending=False)
print(df_sample)
Result:

before sort
              A    B
2000-01-01  2.0  1.0
2000-01-02  1.0  2.0
2000-01-03  2.0  1.0
2000-01-04  2.0  2.0
2000-01-05  1.0  1.0
2000-01-06  1.0  2.0
after sort ascending
              A    B
2000-01-05  1.0  1.0
2000-01-02  1.0  2.0
2000-01-06  1.0  2.0
2000-01-01  2.0  1.0
2000-01-03  2.0  1.0
2000-01-04  2.0  2.0
after sort descending
              A    B
2000-01-04  2.0  2.0
2000-01-01  2.0  1.0
2000-01-03  2.0  1.0
2000-01-02  1.0  2.0
2000-01-06  1.0  2.0
2000-01-05  1.0  1.0

Thursday, July 6, 2017

Python - Using vars() to assign a string to a variable - use vars() to turn a string into a variable name

Environment

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version:Windows 10
vars() is used as shown below and is very handy. I mostly use it inside a loop to assign values to variables whose names change with the loop variable a.

Code:

# Assign a to the variable var_name_<a>
for a in range(1,4):
    vars()['var_name_'+str(a)]=a
print(var_name_1)
print(var_name_2)
print(var_name_3)

Result:

1
2
3
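
Called without arguments, `vars()` behaves like `locals()`. At module level (or in a notebook cell) that is the global namespace, which is why the assignment above creates real variables. Inside a function, writing to that dictionary is not a reliable way to create local variables. A plain dictionary gives the same dynamic naming in both places; the name `values` below is hypothetical.

```python
# Keep the dynamically named values in a dictionary
values = {}
for a in range(1, 4):
    values['var_name_' + str(a)] = a

print(values['var_name_1'])  # 1
print(values['var_name_2'])  # 2
print(values['var_name_3'])  # 3
```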

Python - How to change the font size on a matplotlib pyplot - change the font size shown in matplotlib pyplot figures

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
matplotlib version:2.0.0
System version:Windows 10
# before change
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = (8,5)
plt.plot([1,2,3,4],[1,2,1,3])
plt.title('Series trend')
plt.show()
# after change
from matplotlib import pyplot as plt
fontsize_set = 18
plt.rcParams["figure.figsize"] = (8,5)
plt.rc('xtick' , labelsize=fontsize_set) 
plt.rc('ytick' , labelsize=fontsize_set)  
plt.rc('legend', fontsize=fontsize_set) 
plt.rc('font'  , size=fontsize_set)
plt.plot([1,2,3,4],[1,2,1,3])
plt.title('Series trend')
plt.show()
Result:
before change
after change

Python - pandas dataframe and csv read / write - pandas dataframe and csv read/write IO

Version information:

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version:0.19.2
System version:Windows 10

Write a dataframe to a csv file

import pandas as pd
# Windows path style
file_path=r'E:\download\tmp\test.csv'
# Relative path style
file_path='./tmp/test.csv'
# df_name is an existing dataframe
df_name.to_csv(file_path,sep=',',index=False)

Read a csv file into a dataframe

import pandas as pd
# Windows path style
file_path=r'E:\download\tmp\test.csv'
# Relative path style
file_path='./tmp/test.csv'
df_name= pd.read_csv(file_path,sep=',')
If reading fails with the error "OSError: Initializing from file failed", switch the engine from C to python:
import pandas as pd 
file_path=r'E:\download\tmp\test.csv' 
df_name= pd.read_csv(file_path ,engine='python')
To change the output encoding:
import pandas as pd
file_path=r'E:\download\tmp\test.csv'
# utf-8
df_name.to_csv(file_path, encoding='utf-8')
# Big5
df_name.to_csv(file_path, encoding='Big5')

Wednesday, July 5, 2017

Python - dataframe apply - Using conditional to generate new column in pandas dataframe - create a new dataframe column conditioned on the values of existing columns

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version:0.19.2
System version:Windows 10
In data analysis you often need to convert between categorical and numerical data:
numerical variable to categorical variable, or categorical variable to numerical variable.
import pandas as pd
# create dataframe
number = [1,2,3,4,5]
sex = ['male','female','female','female','male']
df_new = pd.DataFrame()
df_new['number'] = number
df_new['sex'] = sex
df_new.head()

# create def for category to number 0/1
def tran_cat_to_num(df):
    if df['sex'] == 'male':
        return 1
    elif df['sex'] == 'female':
        return 0
# create sex_new 
df_new['sex_new']=df_new.apply(tran_cat_to_num,axis=1)
df_new
Result:

Python - matplotlib pyplot - How to change figure size of Matplotlib pyplot - change the canvas size of Matplotlib pyplot

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
matplotlib version:2.0.0
System version:Windows 10
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = (8,5)
plt.plot([1,2,3,4],[1,2,1,3])
plt.show()
Result:

Python - pandas dataframe loc - Selecting pandas data using loc - use index and column labels to select a single value or every value in a range from a dataframe

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version:0.19.2
System version:Windows 10
import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2000', periods=8)
df_sample = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df_sample
df_sample.loc['2000-01-01','D']
df_sample.loc['2000-01-01':'2000-01-03','C':'D']
Result:

Tuesday, July 4, 2017

Python - How to display full output in Jupyter notebook , not only last result? - change the interactive display mode of jupyter notebook or ipython notebook

Python Version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System Version:Windows 10
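The default value of InteractiveShell.ast_node_interactivity is last_expr.
Setting it to all shows the result of every expression in a cell. I think this interactivity is a great feature of jupyter notebook, especially when testing: type a variable name and you see its value without print. pandas dataframes are also rendered more nicely this way.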

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# Syntax to switch back to the default
# InteractiveShell.ast_node_interactivity = "last_expr"

Python - How to construct pandas dataframe from dictionary in list - convert the dictionaries in a list into rows of a pandas dataframe

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version:0.19.2
System version:Windows 10

import pandas as pd
dic1 = {'name':'dondon','number':1,'hobby':'carnivorous plant'}
dic2 = {'name':'Jay','number':2,'hobby':'singing'}
list1 = [dic1,dic2]
df_collect = pd.DataFrame()
df_temp = pd.DataFrame()
for dic_temp in list1:
    for key in dic_temp.keys():
        df_temp[key]=[dic_temp[key]]
    # pd.concat also works on newer pandas, where DataFrame.append has been removed
    df_collect = pd.concat([df_collect, df_temp], ignore_index=True)
df_collect
Result:

Python - How to get and format current time in python - get the current time and format it as needed

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version:Windows 10
from datetime import datetime
datetime.now().strftime("%Y-%m-%d %H:%M:%S")
Result:
'2017-07-04 11:20:00'

Python - String - common string methods

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version:Windows 10
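Split a string on a given delimiter. This is often used to parse URLs; the example below parses a Central Weather Bureau URL.
# Split the string to get a list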
Str1 = 'http://www.cwb.gov.tw/V7/forecast/'
Str_list = Str1.split('/')
print(Str_list)
print(Str_list[4])
Result:
['http:', '', 'www.cwb.gov.tw', 'V7', 'forecast', '']
forecast
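
For URLs specifically, the standard-library `urllib.parse.urlsplit` separates the scheme, host, and path, which avoids counting positions in the split list. This is a small sketch using the same URL; the variable names `parts` and `segments` are my own.

```python
from urllib.parse import urlsplit

url = 'http://www.cwb.gov.tw/V7/forecast/'
parts = urlsplit(url)

print(parts.scheme)  # http
print(parts.netloc)  # www.cwb.gov.tw
print(parts.path)    # /V7/forecast/

# Drop the empty strings produced by the leading and trailing slashes
segments = [seg for seg in parts.path.split('/') if seg]
print(segments)      # ['V7', 'forecast']
```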

Monday, July 3, 2017

Python - def - Time Seconds to h:m:s - convert seconds to hours, minutes, seconds

Environment

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
System version:Windows 10

Code

# Hours, minutes, seconds (Chinese output)
def secToHMS(secIn):
    h   = int(secIn / 3600)
    m  = int(( secIn -  h*3600 ) / 60)
    s    = float(secIn)  -  h*3600  - m*60
    return '花費%d時,%d分,%.1f秒' % (h,m,s)

# English output as days:hours:minutes:seconds
def secToHMS_v2(secIn):
    m, s = divmod(secIn, 60)
    h, m = divmod(m, 60)
    d, h = divmod(h, 24)
    return 'Spend - days:hours:minutes:seconds = '+'%02d:%02d:%02d:%02d' % (d, h, m, s)

Usage

import time
start_time = time.time()
print('start_time:',time.strftime("%H:%M:%S"))
# your code
time.sleep(3)
end_time = time.time()
print('end_time:',time.strftime("%H:%M:%S"))
print(secToHMS(end_time-start_time))
print(secToHMS_v2(end_time-start_time))

Result

start_time: 16:39:20
end_time: 16:39:23
花費0時,0分,3.0秒
Spend - days:hours:minutes:seconds = 00:00:00:03
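
If the exact output format does not matter, `datetime.timedelta` already renders a number of seconds as hours, minutes, and seconds. This is a small sketch; the sample durations are made up for illustration.

```python
from datetime import timedelta

elapsed = 3 * 3600 + 25 * 60 + 7   # 3 h 25 min 7 s, in seconds
short_text = str(timedelta(seconds=elapsed))
long_text = str(timedelta(seconds=90061))   # 1 day + 1 h + 1 min + 1 s

print(short_text)  # 3:25:07
print(long_text)   # 1 day, 1:01:01
```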

Python - pandas dataframe and excel (xlsx) read / write - pandas dataframe and excel read/write IO

Version

Python version:Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Pandas version:0.19.2
System version:Windows 10

Codes

Write a dataframe to an excel file

import pandas as pd
file_path=r'E:\download\tmp\test.xlsx'
# Relative path style
file_path='./tmp/test.xlsx'
df_name.to_excel(file_path,sheet_name='Sheet1',index=False)

Read an excel file into a dataframe

import pandas as pd
# Windows path style
file_path=r'E:\download\tmp\test.xlsx'
# Relative path style
file_path='./tmp/test.xlsx'
df_name= pd.read_excel(file_path,sheet_name='Sheet1')