2015-01-11

Rcppで格子遊び１

Rstudio R C++ Rcpp

頂点数を数える
- 頂点数はただの漸化式。C++化する必要もないだろう。してもよいけど…

n <- 500
a <- rep(0,n)
a[1] <- 1
for(i in 2:n){
	a[i] <- (i+2)*a[i-1]/(i-1)
}
a

- Rcppを使ってみる
  - コメント行は"//"で始まる
  - RcppArmadilloを使うので"//Rcpp::depends(RcppArmadillo)"。これが必要な行かどうかの確認はしていないけれど、入れるべきだと覚えておこう
  - #include行はRcppArmadillo.h
  - "arma"はArmadilloで定義されたもろもろを使うため
  - "Rcpp::export"できちんとRにつなぐ
  - "arma::vec"は関数numTetrahedron()の返り値の型。Armadilloが定めるところのvectorを返す
  - "vec(n+1)"は長さn+1のベクトルを作る。その第１番地は 0

// latticeConnect.cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;  // use the Armadillo library for matrix computations
using namespace Rcpp;


// [[Rcpp::export]]
arma::vec numTetrahedron(int n) {
  arma::vec ret = vec(n+1);
  ret[0] = 1;
  for(int i=1; i < n+1 ; i++){
    ret[i] = (i+3)*ret[i-1]/i;
  }
  return ret;
}
// END

- これを"latticeConnect.cpp"という名前で保存して、コンパイルして使えばよい。

Rcpp::sourceCpp("latticeConnect.cpp", showOutput = TRUE, rebuild = FALSE)
numTetrahedron(30)

- 返ってきたのは、行数１の行列のようです

> numTetrahedron(20)
      [,1]
 [1,]    1
 [2,]    4
 [3,]   10
 [4,]   20
 [5,]   35
 [6,]   56
 [7,]   84

頂点座標リストを作る
- 頂点数がわかれば、その座標を納める行列のサイズが決まる。C,C++は格納容器のサイズを決めておく方がよいので、好都合だし、Rの場合も数が不定でappendを繰り返すよりは、最初にサイズを決めて番地に代入する方がよい(かな？)
- いくつか考え方があるけれど、(x,y,z,w)という４非負整数から移行できる格子点は(x+1,y,z,w),(x,y+1,z,w),(x,y,z+1,w),(x,y,z,w+1)の４点。これをx+y+z+w=n-1のすべての点に対して作成すると重複が生じる。全部作って重複を削る、という作業も良いが、uniqueを取る関数はどういうアルゴリズムでやってもそれほど軽くはならない。たとえば、Rのunique()のヘルプ記事を見てもそうだ…

Warning

Using this for lists is potentially slow, especially if the elements are not atomic vectors (see vector) or differ only in their attributes. In the worst case it is O(n^2).

- このあたりのアルゴリズムには良いものがありそうだけれど、すぐに見つからないので、こうしてみる
- まず、 $x \to x+1$ にしてやる。これは、すべての(x,y,z,w)に対して行う
- 次に $y \to y+1$ とするが、この場合は $x == 0$ を条件にする。なぜなら、 $x\to x+1$ の段階で、 $x\ne 0$ の場合はすべて作ってしまってあるから
- 同様に、次は、 $z \to z+1$ は $x == y == 0$ を条件にし、さらに $w \to w+1$ は $x==y==z==0$ を条件にする
- ユニークを取るにしろ、ユニーク処理をしないように頂点数をあらかじめ算出して、適切に座標を作成するにしろ、ある程度の条件分岐が入るわけで、無理して慣れないC++を使わなくても、Rのベクトライズ処理で押してもよいのではないかと思われる。実際、この部分の処理と、以降の「エッジリスト作成」処理をRで書いて、その処理時間を測ると、エッジリスト作成が圧倒的に重い(これも格子の特性を使うとかハッシュを使うとかで後半の重さを減らす(前半が重くなりそうだが)工夫もありそうだが)ので、Rで書くことにする
- 以下は、すべての点から４移行点を作り、ユニークを取る方法と、ユニークな座標だけを作る方法とをRで書いたもの。前半が座標リストを作る部分。後半がエッジリストを作る部分(後半が重いので、それをC++化する(次の記事)

table.graph2 <- function(n,d){
	# Make a list of tables when i = 0,1,2,...,n, 
	ret <- list()
	ret[[1]] <- matrix(0,1,d)
	for(i in 1:n){
		tmpn <- nrow(ret[[i]])
		tmp <- matrix(0,tmpn*d,d)
		for(j in 1:d){
			tmp2 <- ret[[i]]
			tmp2[,j] <- tmp2[,j]+1
			tmp[(1+(j-1)*tmpn):(j*tmpn),] <- tmp2
		}
		ret[[i+1]] <- unique(tmp) # ユニーク行を取る
	}
	# Make matrices that indicate connection between tables for n=k and tables for n=k+1
	ret2 <- list()
	for(i in 1:n){
		ret2[[i]] <- matrix(0,nrow(ret[[i]])*d,2)
		cnt <- 1
		for(j1 in 1:nrow(ret[[i]])){
			tmp.dist <- apply(abs(t(ret[[i+1]]) - ret[[i]][j1,]),2,sum)
			tmp.s <- which(tmp.dist==1)
			ret2[[i]][cnt:(cnt+length(tmp.s)-1),1] <- j1
			ret2[[i]][cnt:(cnt+length(tmp.s)-1),2] <- tmp.s
			cnt <- cnt+length(tmp.s)
		}
	}
	#ret4 <- lapply(ret2,function(x)rep(0,nrow(x)))
	#return(list(tblmat=ret,connection=ret2,prob.v=ret3,prob.e=ret4))
	return(list(tblmat=ret,connection=ret2))
}
table.graph3 <- function(n,d){
	# Make a list of tables when i = 0,1,2,...,n, 
	num.tab <- rep(0,n+1)
	num.tab[1] <- 1
	for(i in 2:(n+1)){
		num.tab[i] <- (i+2)*num.tab[i-1]/(i-1)
	}
	ret <- list()
	ret[[1]] <- matrix(0,1,d)
	for(i in 1:n){
		ret[[i+1]] <- matrix(0,num.tab[i+1],d)
		cnt <- 1
		tmp <- ret[[i]]
		tmp[,1] <- tmp[,1] + 1
		ret[[i+1]][cnt:(cnt+length(tmp[,1])-1),] <- tmp
		cnt <- cnt + length(tmp[,1])
		for(j in 2:d){
			if(j==2){
				s <- which(ret[[i]][,1]==0)
			}else{
				s <- which(apply(matrix(ret[[i]][,1:(j-1)],ncol=j-1),1,sum)==0)
			}
			
			tmpret <- matrix(ret[[i]][s,],nrow=length(s),ncol=d)
			tmpret[,j] <- tmpret[,j] + 1
			#tmp <- rbind(tmp,tmpret)
			ret[[i+1]][cnt:(cnt+length(s)-1),] <- tmpret
			cnt <- cnt+length(s)
		}
		#ret[[i+1]] <- tmp
	}
	# Make matrices that indicate connection between tables for n=k and tables for n=k+1
	ret2 <- list()
	for(i in 1:n){
		#ret2[[i]] <- matrix(0,nrow(ret[[i]]),nrow(ret[[i+1]]))
		ret2[[i]] <- matrix(0,nrow(ret[[i]])*d,2)
		cnt <- 1
		for(j1 in 1:nrow(ret[[i]])){
			tmp.dist <- apply(abs(t(ret[[i+1]]) - ret[[i]][j1,]),2,sum)
			tmp.s <- which(tmp.dist==1)
			ret2[[i]][cnt:(cnt+length(tmp.s)-1),1] <- j1
			ret2[[i]][cnt:(cnt+length(tmp.s)-1),2] <- tmp.s
			cnt <- cnt+length(tmp.s)
		}
	}
	return(list(tblmat=ret,connection=ret2))
}