モナドを使うとは〜Haskellの確率的プログラミング
- 例えば、ベルヌーイ事象を繰り返して、その結果がどうなるか、というのをHaskellでやるとする
- ベルヌーイ事象の確率質量分布をモナドで作っていれば、2回繰り返しは、モナドを作って、それを>>=で次に渡して、もうひとつのモナドと合体させてモナドを作ればよい
- それだけだったら、別に、モナドでなくても良さそうだ
*Main Lib Numeric.Probability.Distribution> (choose 0.4 0 1) >>= (\x -> choose 0.4 (x, 0) (x, 1)) fromFreqs [((1,1),0.36),((0,1),0.24),((1,0),0.24),((0,0),0.16000000000000003)]
- 2回繰り返しは以下のようになる
- 長くてちょっといやかも
- とくに、>>= (\x -> choose 0.4 (x, 0) (x, 1))を2回やっているところが、いかにも無駄
*Main Lib Numeric.Probability.Distribution> (choose 0.4 0 1) >>= (\x -> choose 0.4 (x, 0) (x, 1)) >>= (\x -> choose 0.4 (x,0) (x,1)) fromFreqs [(((1,1),1),0.216),(((0,1),1),0.144),(((1,0),1),0.144),(((1,1),0),0.144),(((0,0),1),9.600000000000002e-2),(((0,1),0),9.6e-2),(((1,0),0),9.6e-2),(((0,0),0),6.400000000000002e-2)]
*Main Lib Numeric.Probability.Distribution> import Control.Monad *Main Lib Numeric.Probability.Distribution Control.Monad> replicateM 2 (choose 0.4 0 1) fromFreqs [([1,1],0.36),([0,1],0.24),([1,0],0.24),([0,0],0.16000000000000003)] *Main Lib Numeric.Probability.Distribution Control.Monad> replicateM 3 (choose 0.4 0 1) fromFreqs [([1,1,1],0.216),([0,1,1],0.144),([1,0,1],0.144),([1,1,0],0.144),([1,0,0],9.600000000000002e-2),([0,0,1],9.6e-2),([0,1,0],9.6e-2),([0,0,0],6.400000000000002e-2)]
- ほらできた、ということらしい
- 使用例
- k <- [1..n] 1からnまでのそれぞれの場合kについて、リストを作る
- dという確率事象の単純繰り返しをk回行う
- D.map maximum で、k回試行の中の最大値をとり、その値についての確率質量分布にする
- そのD.expected (期待値を取る)
import Numeric.Probability.Distribution as D let expmaxseq n d = [D.expected $ D.map maximum $ replicateM k d | k <- [1..n]] expmaxseq 7 (D.uniform [1..6]) [3.5,4.472222222222223,4.958333333333335,5.2445987654321105,5.4309413580246915,5.560292352536542,5.65411736969124]
-
- 少し変えてみる。これはただの目の総和の期待値
*Main Lib Numeric.Probability.Distribution Numeric.Probability.Shape Control.Monad D> expsumseq 7 (D.uniform [1..6]) [3.5,7.0,10.499999999999995,14.000000000000036,17.49999999999999,21.000000000000497,24.499999999998735]
MCMCをやってみる
executable pfptest-exe hs-source-dirs: app main-is: Main.hs ghc-options: -threaded -rtsopts -with-rtsopts=-N build-depends: base , pfptest , probability , histogram-fill , random-fu , statistics , json , vector , random-source , mtl
-
- どうしてこうするかというと、なぞりたいこのページの冒頭に以下があるから
> import Control.Monad > import Control.Monad.State (evalState) > import Data.Histogram (asList) > import Data.Histogram.Fill > import Data.Histogram.Generic (Histogram) > import Data.List > import Data.Random > import Data.Random.Source.PureMT > import Statistics.Distribution hiding (mean) > import Statistics.Distribution.Normal > import Statistics.Distribution.Uniform > import Text.JSON > import qualified Data.Histogram as H > import qualified Data.Vector.Unboxed as V
import Control.Monad import Control.Monad.State (evalState) import Data.Histogram (asList) import Data.Histogram.Fill import Data.Histogram.Generic (Histogram) import Data.List import Data.Random import Data.Random.Source.PureMT import Statistics.Distribution hiding (mean) import Statistics.Distribution.Normal import Statistics.Distribution.Uniform import Text.JSON import qualified Data.Histogram as H import qualified Data.Vector.Unboxed as V let predict :: Double -> Double-> Double -> Double; predict a b x = a * x + b let trueA, trueB, trueSd :: Double; trueA = 5; trueB = 0; trueSd = 10 let nSamples :: Int; nSamples = 31 let xs :: [Double]; xs = take nSamples $ [-15.0 ..] let arbSeed :: Int; arbSeed = 4022 -- Arbitrary seed for generating random numbers :{ let ys :: [Double]; ys = zipWith (+) noise (map (predict trueA trueB) xs) where noise = evalState (replicateM nSamples (sample (Normal 0.0 trueSd))) (pureMT $ fromIntegral arbSeed) :} let exportTestData :: IO ();exportTestData = writeFile "test-data.json" . encode . toJSObject $ [("x", xs), ("y", ys)] exportTestData
- こうやると、test-data.jsonという名前のファイルができて、その中身は
{"x":[-15,-14,-13,-12,-11,-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],"y":[-73.70187524627536,-53.32544108228022,-85.5561593826951,-53.2923266140698,-59.30169338011609,-34.71713105945278,-64.23269931716732,-37.70895778207112,-32.49602734293523,-12.5987398418178,-32.78105748127745,-17.02959946540419,-12.709949190288553,-24.007282280667503,0.8279906165058932,-9.244400639134525,17.09474416823518,23.228482221669115,14.690049579435886,29.41188167743224,39.42186361486318,31.413353262653644,33.189437506871066,31.840776644320762,42.815737959293465,54.51844255645821,34.5842642875685,65.82943043766038,70.11853692362467,56.332662810976046,67.91804611706799]}
- 乱雑項を含むデータxs,ysができたので、回帰直線の傾きの尤度を計算してみる
:{ let loglikelihood :: Double -> Double -> Double -> Double; loglikelihood a b sd = sum $ zipWith lh xs ys where lh x = logDensity (normalDistr (predict a b x) sd) :} let exportLikelihoodA :: IO (); exportLikelihoodA = writeFile "log-likelihood.json" . encode . toJSObject $ (\as -> [ ("x",as), ("loglikelihood", map (\a -> loglikelihood a trueB trueSd) as )]) [3, 3.05 .. 7::Double] exportLilekiooodA
- logDensityという関数は以下のように、連続分布と、そのサポート上の値とをとってその対数確率密度・対数尤度を返す関数
> :type logDensity logDensity :: ContDistr d => d -> Double -> Double >|| -normalDistrは平均とSDをとって、具体的な正規分布を返す関数 >|haskell| > :type logDensity logDensity :: ContDistr d => d -> Double -> Double
- precictは、2つの係数と台の値とをとって、その予測値を返す関数
> :type logDensity logDensity :: ContDistr d => d -> Double -> Double
{"x":[3,3.05,3.0999999999999996,3.1499999999999995,3.1999999999999993,3.249999999999999,3.299999999999999,3.3499999999999988,3.3999999999999986,3.4499999999999984,3.4999999999999982,3.549999999999998,3.599999999999998,3.6499999999999977,3.6999999999999975,3.7499999999999973,3.799999999999997,3.849999999999997,3.899999999999997,3.9499999999999966,3.9999999999999964,4.049999999999996,4.099999999999996,4.149999999999996,4.199999999999996,4.249999999999996,4.299999999999995,4.349999999999995,4.399999999999995,4.449999999999995,4.499999999999995,4.5499999999999945,4.599999999999994,4.649999999999994,4.699999999999994,4.749999999999994,4.799999999999994,4.849999999999993,4.899999999999993,4.949999999999993,4.999999999999993,5.049999999999993,5.0999999999999925,5.149999999999992,5.199999999999992,5.249999999999992,5.299999999999992,5.349999999999992,5.3999999999999915,5.449999999999991,5.499999999999991,5.549999999999991,5.599999999999991,5.649999999999991,5.69999999999999,5.74999999999999,5.79999999999999,5.84999999999999,5.89999999999999,5.9499999999999895,5.999999999999989,6.049999999999989,6.099999999999989,6.149999999999989,6.199999999999989,6.2499999999999885,6.299999999999988,6.349999999999988,6.399999999999988,6.449999999999988,6.499999999999988,6.549999999999987,6.599999999999987,6.649999999999987,6.699999999999987,6.749999999999987,6.7999999999999865,6.849999999999986,6.899999999999986,6.949999999999986,6.999999999999986],"loglikelihood":[-159.40621882876707,-157.14658427153978,-154.94894971431242,-152.81331515708507,-150.73968059985776,-148.72804604263047,-146.77841148540307,-144.89077692817577,-143.06514237094842,-141.30150781372106,-139.59987325649377,-137.9602386992664,-136.3826041420391,-134.86696958481176,-133.41333502758442,-132.02170047035708,-130.69206591312974,-129.4244313559024,-128.21879679867504,-127.07516224144769,-125.99352768422038,-124.97389312699303,-124.01625856976571,-123.12062401253839,-122.28698945531103,-121.51535489808369,-120.80572034085631,-120.158085783629,-119.57245122640164,-119.04881666917427,-118.58718211194697,-118.18754755471959,-117.8499129974923,-117.5742784402649,-117.3606438830376,-117.20900932581023,-117.1193747685829,-117.09174021135556,-117.12610565412821,-117.22247109690086,-117.38083653967347,-117.60120198244616,-117.8835674252188,-118.22793286799147,-118.63429831076412,-119.10266375353675,-119.63302919630945,-120.22539463908204,-120.87976008185471,-121.59612552462738,-122.37449096739998,-123.21485641017267,-124.11722185294529,-125.08158729571792,-126.10795273849061,-127.19631818126324,-128.3466836240359,-129.5590490668085,-130.83341450958116,-132.16977995235382,-133.56814539512644,-135.02851083789912,-136.55087628067176,-138.13524172344435,-139.78160716621701,-141.48997260898966,-143.2603380517623,-145.09270349453496,-146.98706893730758,-148.9434343800802,-150.96179982285287,-153.04216526562553,-155.1845307083981,-157.3888961511708,-159.6552615939434,-161.98362703671606,-164.37399247948872,-166.82635792226137,-169.34072336503394,-171.91708880780664,-174.5554542505793]}
|
事前分布、事後分布、MCMC
- 事前分布は、パラメタに分布を仮定し、その尤度が計算できることとして決まる
- a,b,sdに一様分布を設定し、何か具体的な値に対して、その総合的な対数尤度をlogpriorが返す
:{ let logprior :: Double -> Double -> Double -> Double;logprior a b sd = sum [ap, bp, sdp] where ap = logDensity (uniformDistr 0 10) a bp = logDensity (normalDistr 0 5) b sdp = logDensity (uniformDistr 0 30) sd :}
- 事後確率は、a,b,sdに対して、事前尤度に観察尤度を掛けあわせればよい。対数尤度なら加算
- 事前分布も尤度関数もa,b,sdの関数として作ってある
let logposterior :: Double -> Double -> Double -> Double; logposterior a b sd = loglikelihood a b sd + logprior a b sd let nIters :: Int; nIters = 100000
-
- seedをとってMT擬似欄数列発生器を作り、ランダムなevalStateをさせる。平均0,指定sdでの正規乱数の長さnItersの列を取り出す
:{ let normalisedProposals :: Int -> Double -> Int -> [Double]; normalisedProposals seed sigma nIters = evalState (replicateM nIters (sample (Normal 0.0 sigma))) (pureMT $ fromIntegral seed) :}
-
- Accept/Rejectの判別のための一様乱数列の発生
:{ let acceptOrRejects :: Int -> Int -> [Double]; acceptOrRejects seed nIters = evalState (replicateM nIters (sample stdUniform)) (pureMT $ fromIntegral seed) :}
-
- 採否の閾値を計算。新たな係数セットを現在の係数セットと比べる。比べるにあたって、事後分布としての尤度で返す。次のoneStepではこの尤度を確率的に採否するべく、上で発生した一様乱数の値と比較する
:{ let acceptanceProb :: Double -> Double -> Double -- ^ Proposal parameters -> Double -> Double -> Double -- ^ Old parameters -> Double; acceptanceProb a' b' sd' a b sd = exp ((logposterior a' b' sd') - (logposterior a b sd)) :}
-
- 上で説明した通り
:{ let oneStep :: (Double, Double, Double, Int) -- ^ Current location -> (Double, Double, Double, Double) -- ^ Delta location and acceptance probability -> (Double, Double, Double, Int) ; oneStep (a, b, sd, nAccs) (da, db, dsd, acceptOrReject) = if acceptOrReject < acceptanceProb (a + da) (b + db) (sd + dsd) a b sd then (a + da, b + db, sd + dsd, nAccs + 1) else (a, b, sd, nAccs) :}
-
- 更新するための諸関数ができたら、変数初期値のセットを作り、採否をかませた更新関数を使いつつ、scanl関数にて、連鎖・更新しながらそれを全部記録していく(scanlはこちら)
- 以下はただのコード記録
import Control.Monad import Control.Monad.State (evalState) import Data.Histogram (asList) import Data.Histogram.Fill import Data.Histogram.Generic (Histogram) import Data.List import Data.Random import Data.Random.Source.PureMT import Statistics.Distribution hiding (mean) import Statistics.Distribution.Normal import Statistics.Distribution.Uniform import Text.JSON import qualified Data.Histogram as H import qualified Data.Vector.Unboxed as V let predict :: Double -> Double-> Double -> Double; predict a b x = a * x + b let trueA, trueB, trueSd :: Double; trueA = 5; trueB = 0; trueSd = 10 let nSamples :: Int; nSamples = 31 let xs :: [Double]; xs = take nSamples $ [-15.0 ..] let arbSeed :: Int; arbSeed = 4022 -- Arbitrary seed for generating random numbers :{ let ys :: [Double]; ys = zipWith (+) noise (map (predict trueA trueB) xs) where noise = evalState (replicateM nSamples (sample (Normal 0.0 trueSd))) (pureMT $ fromIntegral arbSeed) :} let exportTestData :: IO ();exportTestData = writeFile "test-data.json" . encode . toJSObject $ [("x", xs), ("y", ys)] exportTestData :{ let loglikelihood :: Double -> Double -> Double -> Double; loglikelihood a b sd = sum $ zipWith lh xs ys where lh x = logDensity (normalDistr (predict a b x) sd) :} let exportLikelihoodA :: IO (); exportLikelihoodA = writeFile "log-likelihood.json" . encode . toJSObject $ (\as -> [ ("x",as), ("loglikelihood", map (\a -> loglikelihood a trueB trueSd) as )]) [3, 3.05 .. 7::Double] :{ let logprior :: Double -> Double -> Double -> Double;logprior a b sd = sum [ap, bp, sdp] where ap = logDensity (uniformDistr 0 10) a bp = logDensity (normalDistr 0 5) b sdp = logDensity (uniformDistr 0 30) sd :} let logposterior :: Double -> Double -> Double -> Double; logposterior a b sd = loglikelihood a b sd + logprior a b sd let nIters :: Int; nIters = 100000 :{ let normalisedProposals :: Int -> Double -> Int -> [Double]; normalisedProposals seed sigma nIters = evalState (replicateM nIters (sample (Normal 0.0 sigma))) (pureMT $ fromIntegral seed) :} :{ let acceptOrRejects :: Int -> Int -> [Double]; acceptOrRejects seed nIters = evalState (replicateM nIters (sample stdUniform)) (pureMT $ fromIntegral seed) :} :{ let acceptanceProb :: Double -> Double -> Double -- ^ Proposal parameters -> Double -> Double -> Double -- ^ Old parameters -> Double; acceptanceProb a' b' sd' a b sd = exp ((logposterior a' b' sd') - (logposterior a b sd)) :} :{ let oneStep :: (Double, Double, Double, Int) -- ^ Current location -> (Double, Double, Double, Double) -- ^ Delta location and acceptance probability -> (Double, Double, Double, Int) ; oneStep (a, b, sd, nAccs) (da, db, dsd, acceptOrReject) = if acceptOrReject < acceptanceProb (a + da) (b + db) (sd + dsd) a b sd then (a + da, b + db, sd + dsd, nAccs + 1) else (a, b, sd, nAccs) :}
Numeric.Probability を使う〜Haskellの確率的プログラミング
- Haskellで確率密度分布を扱ってランダムサンプリングするときにモナドを使う、という話がある
- こちらの記事によると、それは遡ること2006年のこちらのペイパーとそれが作成したProbabilistic Fucntional Programming用パッケージが始まりだという。
- その後、これをとり込もうよ、ということになり、今では、Preludeでとってくれば、後はインポートすればよい(らしい)
- 以下、stackを使って、ここの処理がなぞれるようにするための処置
- stackは入っているものとして、適当な場所でpfptestという名前で環境を作ることにする
stack new pfptest cd probHoge stack build
- これで、GHCも入るし、準備はほぼOK
- 次に、probability関係を使うために、pfptestフォルダのcabalファイルに、probability パッケージを使うことを記載して、再度 buildする
- Main.hsで使うことにしましたよ、と
executable pfptest-exe hs-source-dirs: app main-is: Main.hs ghc-options: -threaded -rtsopts -with-rtsopts=-N build-depends: base , pfptest , probability default-language: Haskell2010
- こうしてから、再度build
stack build
- これで、probabilityパッケージを使う環境として、pfptestが出来上がったから
stack ghci
- としてghciを開始
- cabalの記載状況から、Main Libという「環境」にでスタート
- そこでNumeric.Probability.Distributionをimportすれば、choose関数が使えるようになりました、と。
*Main Lib> import Numeric.Probability.Distribution *Main Lib Numeric.Probability.Distribution> choose 0.3 0 1 fromFreqs [(1,0.7),(0,0.3)] *Main Lib Numeric.Probability.Distribution>
- chooseという関数について確認する
*Main Lib Numeric.Probability.Distribution> choose 0.3 0 1 fromFreqs [(1,0.7),(0,0.3)]
-
- 値0を取る確率が0.3、値1を取る確率が0.7という、確率質量分布ができて
- それをfromFreqsなる関数が受け取った状態、が返ってきている
- 次のようなバリエーションができる。事象は0 1と指定したり、1 0と指定しても、'a' 'b'でも。
*Main Lib Numeric.Probability.Distribution> choose 0.3 1 0 fromFreqs [(0,0.7),(1,0.3)] *Main Lib Numeric.Probability.Distribution> choose 0.3 'a' 'b' fromFreqs [('b',0.7),('a',0.3)]
-
- 現れる関数choose,fromFreqについて確認する、さらに現れるTも示してみる
*Main Lib Numeric.Probability.Distribution> :type choose choose :: Num prob => prob -> a -> a -> T prob a *Main Lib Numeric.Probability.Distribution> :type fromFreqs fromFreqs :: Fractional prob => [(a, prob)] -> T prob a *Main Lib Numeric.Probability.Distribution> :kind T T :: * -> * -> * *Main Lib Numeric.Probability.Distribution> :info T newtype T prob a = Cons {decons :: [(a, prob)]} -- Defined in ‘Numeric.Probability.Distribution’ *Main Lib Numeric.Probability.Distribution> :type Cons Cons :: [(a, prob)] -> T prob a *Main Lib Numeric.Probability.Distribution> :type decons decons :: T prob a -> [(a, prob)] *Main Lib Numeric.Probability.Distribution> decons $ choose 0.3 'a' 'b' [('a',0.3),('b',0.7)] *Main Lib Numeric.Probability.Distribution> :type decons $ choose 0.3 'a' 'b' decons $ choose 0.3 'a' 'b' :: Fractional prob => [(Char, prob)]
- 次の関数 uniform
*Main Lib Numeric.Probability.Distribution> uniform [0,1,2,3] fromFreqs [(0,0.25),(1,0.25),(2,0.25),(3,0.25)] *Main Lib Numeric.Probability.Distribution> decons $ uniform [0,1,2,3] [(0,0.25),(1,0.25),(2,0.25),(3,0.25)] (decons $ uniform [0,1,2,3]) !! 2 (2,0.25) *Main Lib Numeric.Probability.Distribution> fst $ (decons $ uniform [0,1,2,3]) !! 2 2 *Main Lib Numeric.Probability.Distribution> snd $ (decons $ uniform [0,1,2,3]) !! 2 0.25
-
- これを使うと、0以上のすべての整数の一様確率質量分布もできる。長さが無限なので全部表示させようとすると大変
- また、確率値は1/無限大だが、表示させようとすると計算が終わらないので表示されない
- これを使うと、0以上のすべての整数の一様確率質量分布もできる。長さが無限なので全部表示させようとすると大変
*Main Lib Numeric.Probability.Distribution> let uni = uniform [0..] *Main Lib Numeric.Probability.Distribution> fst (( decons uni) !! 1) 1
-
- 起きうる事象が1つしか無いとき。事象は"ab"とする。ふたとおりの指定の仕方がある(ようだ)
*Main Lib Numeric.Probability.Distribution> certainly "ab" fromFreqs [("ab",1)] *Main Lib Numeric.Probability.Distribution> uniform ["ab"] fromFreqs [("ab",1.0)]