For the Turkish version
A few months ago, I started attending the R lectures on Coursera. R is a programming language that can help me analyze data. My main interest is unstructured text data, but since the lectures are excellent, I used R to convert unstructured text to tabular data in this case. Maybe Python would have been a better choice.
The raw data: http://kuyruksuzbipolarpisi.blogspot.com.tr/p/bagisraw.html
What I converted it to: (first 10 rows) http://kuyruksuzbipolarpisi.blogspot.com.tr/p/namez-tarih-bagis-1-nalcabesmez-11.html
namez | tarih | bagis
Murat Nalçabesmez | 11.05.2014 | 2.5 kg Adolt goody Mama
MAT-TAV | 10.05.2014 | 250 kg Tavuk Eti
Gökser Yasar | 10.05.2014 | 21adet410gr Konserve Köpek Mamasi
MAT-TAV | 09.05.2014 | 300 kg Tavuk Eti
Rabia Sen,Ekin Çaliskan,Nida Kuttas,Merve Demir | 07.05.2014 | 8 adet 1lt Süt,5kg Köpek Mamasi Açik
Deniz Ünsal- Kargo | 06.05.2014 | Smart Dog 15 kg Kuru Köpek Mamasi
Çagla Jansel | 04.05.2014 | 10 adet Konserve Mama
Pinar Karabudak | 30.04.2014 | 30 Konserve Mama,4 adet10 kg Kutu Mama
Canan Sayin. Doga Ipek Sayin | 29.04.2014 | Goody 2.5 Kg Kuru Mama
Göksu Bilgiç | 29.04.2014 | Goody 2.5 kg Kuru Mama
Mat_Tav | 28.04.2014 | 310 kg Tavuk Eti
I am not a programmer, but I don't like to see data in an unstructured form. Unstructured data cannot be analyzed, and reports about it cannot be produced easily. So a person in fact doesn't know what they have if the data is kept as free text.
If you don't want to search for information, or to be responsible for searching for it, the easiest way is to keep the data the way it is kept at https://cankayabldbarinagi.wordpress.com
(As far as I can see, they are deleting the records :))
The people who donate to the Mühye shelter were recorded in an HTML file, not in a table. As a result, the following cannot be identified:
- The people who donate the most
- The categories of donations: food, vaccines, infrastructure materials
- The people who donate regularly
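Once the records are in a table, these questions become short queries. The following is only a sketch on a few rows copied from the converted table; the `donor` column and the rule that "MAT-TAV" and "Mat_Tav" are the same donor are my assumptions, not something the raw data guarantees.

```r
# A few rows copied from the converted table.
df <- data.frame(
  namez = c("MAT-TAV", "Gökser Yasar", "MAT-TAV", "Mat_Tav"),
  tarih = c("10.05.2014", "10.05.2014", "09.05.2014", "28.04.2014"),
  bagis = c("250 kg Tavuk Eti", "21adet410gr Konserve Köpek Mamasi",
            "300 kg Tavuk Eti", "310 kg Tavuk Eti"),
  stringsAsFactors = FALSE
)

# Assumption: names that differ only in case or in "-" vs "_" are the same donor.
df$donor <- toupper(gsub("[-_]", "-", df$namez))

# Number of donations per donor, most frequent first.
counts <- sort(table(df$donor), decreasing = TRUE)
print(counts)

# Donors with more than one donation, as a rough proxy for "regular".
regular <- names(counts)[as.vector(counts) > 1]
print(regular)
```

Counting rows says nothing about quantities; ranking donors by amount would first require parsing the donation text.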
Format 1 - 2014 Name (Date) Donation
Murat Nalçabesmez(11.05.2014) 2.5 kg Adolt goody Mama
MAT-TAV(10.05.2014) 250 kg Tavuk Eti
Gökser Yaşar(10.05.2014)21adet410gr Konserve Köpek Maması
MAT-TAV(09.05.2014) 300 kg Tavuk Eti
Rabia Şen,Ekin Çalışkan,Nida Kuttaş,Merve Demir(07.05.2014) 8 adet 1lt Süt,5kg Köpek Maması Açık
Deniz Ünsal- Kargo(06.05.2014)Smart Dog 15 kg Kuru Köpek Maması
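A single regular expression is one way to take Format 1 apart. This is a sketch on two of the lines above, assuming every line contains exactly one date in parentheses written as dd.mm.yyyy; it is an alternative to splitting on the parentheses, not the code I actually ran.

```r
lines <- c(
  "MAT-TAV(10.05.2014) 250 kg Tavuk Eti",
  "Deniz Ünsal- Kargo(06.05.2014)Smart Dog 15 kg Kuru Köpek Maması"
)

# Name, then (dd.mm.yyyy), then the donation text.
pattern <- "^(.*)\\(([0-9]{2}\\.[0-9]{2}\\.[0-9]{4})\\)(.*)$"
m <- regmatches(lines, regexec(pattern, lines))

# Each match is c(full match, name, date, donation); keep the last three, trimmed.
rows <- t(sapply(m, function(x) trimws(x[2:4])))
df <- data.frame(namez = rows[, 1], tarih = rows[, 2], bagis = rows[, 3],
                 stringsAsFactors = FALSE)
print(df)
```

Anchoring on the date pattern means a donation that itself contains parentheses does not shift the columns.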
Format 2- Name-Substrings
Talatpaşa İÖO Hayvanları koruma kulübü (01.06.2011)
2 x 20 kg. köpek kuru maması
2 x 13,5 kg. köpek kuru maması
Fulya Aydın.Fatma Şahin (31.05.2011)
7,5 numara cerrahi eldiven
kutu non steril eldiven
kutu cerrahi maske
6 different formats to record the donations.... Congrats :)
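For Format 2, each line with a date starts a new record and the lines below it belong to that record. A sketch of that grouping with `cumsum`, assuming the block starts with a header line and that only header lines contain a dd.mm.yyyy date:

```r
lines <- c(
  "Talatpaşa İÖO Hayvanları koruma kulübü (01.06.2011)",
  "2 x 20 kg. köpek kuru maması",
  "2 x 13,5 kg. köpek kuru maması",
  "Fulya Aydın.Fatma Şahin (31.05.2011)",
  "7,5 numara cerrahi eldiven",
  "kutu non steril eldiven",
  "kutu cerrahi maske"
)

date_pattern <- "[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}"
is_header <- grepl(date_pattern, lines)

# Every header starts a new group; the lines after it inherit its group number.
group <- cumsum(is_header)

headers <- lines[is_header]
# Join the donation lines of each group with ";".
items <- vapply(split(lines[!is_header], group[!is_header]),
                paste, character(1), collapse = ";")

df <- data.frame(
  namez = trimws(sub("\\(.*$", "", headers)),
  tarih = regmatches(headers, regexpr(date_pattern, headers)),
  # A header without donation lines gets NA here.
  bagis = unname(items[as.character(seq_along(headers))]),
  stringsAsFactors = FALSE
)
print(df)
```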
The code is messy, but I got bored and don't want to refactor it. Each format contains the names of the donors, the dates and the donations. The donations should be classified, too.
# Format 1: "Name(Date) Donation" on a single line (rows 1-201 of bagis.txt).
bagistodf <- function() {
  setwd("D:/Belgeler/Coursera")
  bagisfile <- readLines("bagis.txt", encoding = "UTF-8")
  bagis201 <- bagisfile[1:201]
  # Splitting on "(" and ")" yields name, date, donation.
  bagis201s <- strsplit(bagis201, "\\(|\\)")
  namez <- sapply(bagis201s, "[", 1)
  tarih <- sapply(bagis201s, "[", 2)
  bagis <- sapply(bagis201s, "[", 3)
  df <- data.frame(namez, tarih, bagis)
  write.table(df, "bagis333.csv", sep = ",")
  head(df)
}
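The `tarih` column is written out as plain text. A small sketch of turning it into real dates, assuming every value follows the dd.mm.yyyy layout seen in the sample rows (anything else becomes `NA`):

```r
tarih <- c("11.05.2014", "10.05.2014", "29.04.2014")

# Parse day.month.year strings into Date objects.
dates <- as.Date(tarih, format = "%d.%m.%Y")

# Dates sort chronologically and can be grouped by month.
print(sort(dates))
print(table(format(dates, "%Y-%m")))
```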
# Format 2: a "Name (Date)" header line followed by one donation per line
# (rows 202-304 of bagis.txt).
bagis202304 <- function() {
  setwd("D:/Belgeler/Coursera")
  bagisfile <- readLines("bagis.txt", encoding = "UTF-8")
  bagis304 <- bagisfile[202:304]
  parts <- strsplit(bagis304, "\\(|\\)")
  pattern <- "[0-9][0-9]\\.[0-9][0-9]\\.[0-9][0-9][0-9][0-9]"
  has_date <- function(x) any(grepl(pattern, x))
  # Walk backwards so that donation lines are folded into the header above them.
  for (i in length(parts):2) {
    if (!has_date(parts[[i]]) && !has_date(parts[[i - 1]])) {
      # Two consecutive donation lines: merge them, separated by ";".
      parts[[i - 1]][[1]] <- paste(parts[[i]][[1]], parts[[i - 1]][[1]], sep = ";")
    } else if (!has_date(parts[[i]]) && has_date(parts[[i - 1]])) {
      # Donation text right below a header: store it as the third field.
      parts[[i - 1]][[3]] <- parts[[i]][[1]]
    }
  }
  # Keep only the completed header rows (name, date, donations).
  newsplit <- parts[sapply(parts, length) >= 3]
  namez <- sapply(newsplit, "[", 1)
  tarih <- sapply(newsplit, "[", 2)
  bagis <- sapply(newsplit, "[", 3)
  df304 <- data.frame(namez, tarih, bagis)
  # col.names = FALSE so the header is not written again when appending.
  write.table(df304, "bagis333.csv", sep = ",", append = TRUE, col.names = FALSE)
}
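# Rows 560-685 of bagis2.txt: the fields come out as donation, name, date.
bagis560685 <- function() {
  setwd("D:/Belgeler/Coursera")
  bagisfile <- readLines("bagis2.txt", encoding = "UTF-8")
  bagis685 <- bagisfile[560:685]
  # Replace the first "?" on each line with "%" so it can serve as a separator.
  b6 <- sub("\\?", "\\%", bagis685)
  b6 <- strsplit(b6, "\\(|\\)|\\%")
  # Rows 83 and 84 are split incorrectly
  bagis <- sapply(b6, "[", 1)
  namez <- sapply(b6, "[", 2)
  tarih <- sapply(b6, "[", 3)
  df685 <- data.frame(namez, tarih, bagis)
  # col.names = FALSE so the header is not written again when appending.
  write.table(df685, "bagis333.csv", sep = ",", append = TRUE, col.names = FALSE)
}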
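The comment in the function above notes that two rows were split incorrectly. One way to find such rows is to count the fields per line. The sketch below uses two made-up lines in a "donation?name(date)" layout, which is my guess at the format from the code, not a copy of the real file:

```r
# Hypothetical lines; the second has extra parentheses inside the donation text.
lines <- c(
  "10 kg mama?Ayse Yilmaz(01.02.2013)",
  "5 kg mama (acik)?Ali Veli(02.02.2013)"
)

# Same idea as above: turn the first "?" into "%" and split on "(", ")" and "%".
fields <- strsplit(sub("\\?", "%", lines), "\\(|\\)|%")

# A well-formed line yields exactly 3 fields: donation, name, date.
n_fields <- sapply(fields, length)
bad <- which(n_fields != 3)
print(bad)
print(lines[bad])
```

The flagged lines can then be corrected by hand in the raw file before building the data frame.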
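# Rows 391-539 of bagis2.txt: donation and name only; the date is fixed.
bagis391539 <- function() {
  setwd("D:/Belgeler/Coursera")
  bagisfile <- readLines("bagis2.txt", encoding = "UTF-8")
  bagis539 <- bagisfile[391:539]
  # Replace the first "?" on each line with "%" and split on it.
  b539 <- sub("\\?", "\\%", bagis539)
  b539 <- strsplit(b539, "\\%")
  bagis <- sapply(b539, "[", 1)
  namez <- sapply(b539, "[", 2)
  tarih <- rep("17.03.2013", length(b539))
  df539 <- data.frame(namez, tarih, bagis)
  # col.names = FALSE so the header is not written again when appending.
  write.table(df539, "bagis333.csv", sep = ",", append = TRUE, col.names = FALSE)
}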
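# Rows 544-559 of bagis2.txt: same layout as rows 391-539.
bagis544559 <- function() {
  setwd("D:/Belgeler/Coursera")
  bagisfile <- readLines("bagis2.txt", encoding = "UTF-8")
  bagis559 <- bagisfile[544:559]
  # Replace the first "?" on each line with "%" and split on it.
  b559 <- sub("\\?", "\\%", bagis559)
  b559 <- strsplit(b559, "\\%")
  bagis <- sapply(b559, "[", 1)
  namez <- sapply(b559, "[", 2)
  tarih <- rep("17.03.2013", length(b559))
  df559 <- data.frame(namez, tarih, bagis)
  # col.names = FALSE so the header is not written again when appending.
  write.table(df559, "bagis333.csv", sep = ",", append = TRUE, col.names = FALSE)
}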
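The two functions above differ only in the row range, and the others differ mainly in the split pattern and the column order. If I ever refactor, a single helper could cover them. This is only a sketch: `parse_block` and its arguments are hypothetical names, and the sample lines are made up.

```r
# Hypothetical helper: split each line on `pattern` and name the pieces by `cols`.
parse_block <- function(lines, pattern, cols) {
  parts <- strsplit(lines, pattern)
  # Pad (with NA) or truncate every row to length(cols) so rbind gets equal rows.
  parts <- lapply(parts, function(x) trimws(x[seq_along(cols)]))
  df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(df) <- cols
  df
}

# Made-up lines in a "donation%name" layout, with a fixed date added afterwards.
lines <- c("10 kg mama%Ayse Yilmaz", "2 kutu eldiven%Ali Veli")
df <- parse_block(lines, "%", c("bagis", "namez"))
df$tarih <- "17.03.2013"
df <- df[, c("namez", "tarih", "bagis")]
print(df)
```

With such a helper, each block becomes one call with its own row range, pattern and column order, and the results can be combined with `rbind` and written once.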
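# Rows 334-340 of bagis2.txt: the fields come out as donation, date, name.
bagis334340 <- function() {
  setwd("D:/Belgeler/Coursera")
  bagisfile <- readLines("bagis2.txt", encoding = "UTF-8")
  bagis340 <- bagisfile[334:340]
  # The first "?" becomes "%"; only the parentheses are used for splitting.
  b340 <- sub("\\?", "\\%", bagis340)
  b340 <- strsplit(b340, "\\(|\\)")
  bagis <- sapply(b340, "[", 1)
  tarih <- sapply(b340, "[", 2)
  namez <- sapply(b340, "[", 3)
  df340 <- data.frame(namez, tarih, bagis)
  # col.names = FALSE so the header is not written again when appending.
  write.table(df340, "bagis333.csv", sep = ",", append = TRUE, col.names = FALSE)
}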
# Rows 377-388 of bagis2.txt: same layout as rows 334-340.
bagis377388 <- function() {
  setwd("D:/Belgeler/Coursera")
  bagisfile <- readLines("bagis2.txt", encoding = "UTF-8")
  bagis388 <- bagisfile[377:388]
  # The first "?" becomes "%"; only the parentheses are used for splitting.
  b388 <- sub("\\?", "\\%", bagis388)
  b388 <- strsplit(b388, "\\(|\\)")
  bagis <- sapply(b388, "[", 1)
  tarih <- sapply(b388, "[", 2)
  namez <- sapply(b388, "[", 3)
  df388 <- data.frame(namez, tarih, bagis)
  # col.names = FALSE so the header is not written again when appending.
  write.table(df388, "bagis333.csv", sep = ",", append = TRUE, col.names = FALSE)
}
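As said earlier, the donations still need to be classified. A first attempt could be simple keyword matching. The keyword lists and category names below are hypothetical and would have to be extended after looking at the real data; the sample strings come from the examples earlier in this post.

```r
bagis <- c("250 kg Tavuk Eti", "8 adet 1lt Süt,5kg Köpek Mamasi Açik",
           "7,5 numara cerrahi eldiven", "kutu cerrahi maske")

# Hypothetical keyword lists, checked in order; the first matching category wins.
keywords <- list(
  food    = c("mama", "tavuk", "süt", "konserve"),
  medical = c("eldiven", "maske")
)

classify <- function(text) {
  text <- tolower(text)
  for (category in names(keywords)) {
    # TRUE if any keyword of this category occurs in the text.
    hit <- any(sapply(keywords[[category]], grepl, x = text, fixed = TRUE))
    if (hit) return(category)
  }
  "other"
}

category <- vapply(bagis, classify, character(1), USE.NAMES = FALSE)
print(data.frame(bagis, category))
```

A donation that mixes categories only gets the first match here, so mixed entries would need to be split on "," or ";" before classifying.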