Day 01 Classical statistics applied to social data, part 1


source:https://github.com/DrSkippy/Data-Science-45min-Intros/blob/master/classical-stats-and-social-data-101/classical-stats-and-social-data-101.ipynb

Outline

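The binomial distribution is a function of the discrete variable k. Its value B(k; p, N) is the probability of observing exactly k successes in N trials when each trial succeeds with probability p: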
$$B(k;p,N) = \frac{N!}{k!(N-k)!}p^k(1-p)^{N-k}$$

For a series of fair coin flips, the number of heads follows a binomial distribution with p = 0.5.
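As a quick check of the formula, here is a minimal sketch that evaluates B(k; p, N) directly with the standard library. The helper name `binomial_pmf` is only illustrative and does not come from the original notebook.

```python
from math import comb


def binomial_pmf(k: int, p: float, n: int) -> float:
    """Probability of exactly k successes in n trials, each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 5 heads in 10 fair coin flips: C(10, 5) / 2**10 = 252 / 1024
prob_five_heads = binomial_pmf(5, 0.5, 10)
print(prob_five_heads)

# Summing over every possible k (0 through 10) should give 1
total = sum(binomial_pmf(k, 0.5, 10) for k in range(11))
print(total)
```

The argument order (k, p, N) follows the notation B(k; p, N) used above. Note that `scipy.stats.binom.pmf` takes its arguments in the order (k, N, p).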

Let's start by looking at the exact distribution. Remember that it is a function of a discrete variable.

# matplotlib plots are placed inline
%matplotlib inline  

# standard matplotlib and numpy imports
import matplotlib.pyplot as plt
import numpy as np

# use scipy.stats to define the distribution
import scipy
from scipy import stats

N = 10 # number of coin flips in a set
p = 0.5 # probability of head

x = np.arange(N + 1) # possible outcomes k = 0, 1, ..., N
pmf = scipy.stats.binom.pmf(x, N, p)
# "pmf" => probability mass function: the discrete counterpart of the
# probability density function, giving the probability of each value of k

plt.bar(x,pmf)

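Note the unit normalization in the plot above: the probabilities sum to 1. Now let's look at a histogram built from values drawn at random from this distribution. This histogram represents a finite number of sets of coin flips.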

N = 10 # number of trials ("coin-flips") in a set
p = 0.5 # probability of success ("heads")
size = 50 # number of sets of coin flips

np.random.binomial(N,p,size) # number of heads in a set of coin flips
# how many sets of trials are required to make the approximation appear visually identical to the exact distribution?
size = 10

data = np.random.binomial(N,p,size)
n_bins = N + 1 # one bin per possible outcome k = 0..N
binned_data, bins, patches = plt.hist(data, n_bins, range=(-0.5, N + 0.5), density=True)
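The question in the code comment above (how many sets are needed before the histogram looks like the exact distribution) can also be explored numerically. The following is a rough sketch, not part of the original notebook: it compares the empirical frequencies with the exact pmf for a few arbitrarily chosen values of `size`, using a fixed seed so the run is repeatable.

```python
import numpy as np
from scipy import stats

N = 10   # coin flips per set
p = 0.5  # probability of heads
k = np.arange(N + 1)
exact = stats.binom.pmf(k, N, p)

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
max_errors = {}
for size in (10, 100, 1_000, 10_000, 100_000):
    data = rng.binomial(N, p, size)
    # fraction of sets that produced each possible number of heads
    empirical = np.bincount(data, minlength=N + 1) / size
    max_errors[size] = float(np.abs(empirical - exact).max())
    print(f"size={size:>6}  largest deviation from exact pmf = {max_errors[size]:.4f}")
```

The deviation is expected to shrink roughly like 1/sqrt(size), so each tenfold increase in the number of sets buys only about a threefold improvement.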

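Let's go back to the exact distribution and vary its parameters. Be sure to adjust both the probability of success and the number of trials.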

N = 100 # number of trials in a set
p = 0.5 # probability of success 

x = np.arange(N + 1) # possible outcomes k = 0, 1, ..., N
pmf = scipy.stats.binom.pmf(x,N,p)
plt.bar(x,pmf)
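To see how the shape responds to the parameters without replotting each time, one option is to summarize each distribution by its mean and standard deviation. This is a sketch with parameter pairs picked purely for illustration. For a binomial distribution the mean is N·p and the standard deviation is sqrt(N·p·(1−p)), which the numerical sums below should reproduce.

```python
import numpy as np
from scipy import stats

summary = {}
for N, p in [(10, 0.5), (100, 0.5), (100, 0.1), (1000, 0.01)]:
    k = np.arange(N + 1)
    pmf = stats.binom.pmf(k, N, p)
    mean = float((k * pmf).sum())                        # expected number of successes
    std = float(np.sqrt(((k - mean) ** 2 * pmf).sum()))  # spread around the mean
    summary[(N, p)] = (mean, std)
    print(f"N={N:>4}, p={p:<4}  mean={mean:7.3f}  std={std:6.3f}")
```

Small values of p with large N give a strongly skewed distribution concentrated near zero, which is worth keeping in mind when thinking about rare events in social data.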

Question: would you expect this distribution to describe the behavior of social media variables?






