Benchmarks for data mining

This web page gathers information about databases used to test data-mining algorithms, and provides links to the corresponding files. The original files come from the UCI Machine Learning Database Repository. Thanks to Pierre Renaux.
See here for details about the binarization process.
Field | Description
Name | the name of the folder containing all the files. A link to a zipped archive is given when all of the following files are present.
Lines | the number of instances in the database
Att. | the number of attributes per instance
Cont. | the number of continuous attributes
nClass | the number of classes
Missing | information about missing values
.data | the original data file
.names | information about the data file
.col | information about the attributes. See here for details about the binarization process.
.dico | the dictionary of all attributes (see above)
.bin | the result of the binarization process (instances may be reordered so that lines with missing values are at the end of the file; the class column is moved to the first column)
.traduc | the translation file
.desc | the description file, used for classification tasks
.occ | the co-occurrence matrix
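
To make the .bin and .occ entries more concrete, here is a minimal Python sketch of one plausible binarization and co-occurrence computation. It is not the procedure actually used to produce these files: the one-item-per-(column, value) encoding, the "?" missing-value marker, and the function names `binarize` and `cooccurrence` are all assumptions made for illustration, and continuous attributes (which would need discretizing first) are not handled.

```python
def binarize(rows, class_index, missing="?"):
    """One-hot encode categorical rows (illustrative only, not the page's tool).

    Returns (items, matrix): items lists the (column, value) pairs, class
    items first; matrix holds one 0/1 row per instance.
    """
    n_cols = len(rows[0])
    # Put the class column first, as the .bin description states.
    order = [class_index] + [c for c in range(n_cols) if c != class_index]
    items = []
    for c in order:
        for v in sorted({row[c] for row in rows if row[c] != missing}):
            items.append((c, v))
    index = {item: i for i, item in enumerate(items)}
    # Stable sort: rows containing a missing value move to the end.
    ordered = sorted(rows, key=lambda r: missing in r)
    matrix = []
    for row in ordered:
        bits = [0] * len(items)
        for c, v in enumerate(row):
            if v != missing:
                bits[index[(c, v)]] = 1
        matrix.append(bits)
    return items, matrix


def cooccurrence(matrix):
    """Count, for every pair of items, the instances in which both are set."""
    n = len(matrix[0])
    occ = [[0] * n for _ in range(n)]
    for bits in matrix:
        on = [i for i, b in enumerate(bits) if b]
        for i in on:
            for j in on:
                occ[i][j] += 1
    return occ


# Toy data (hypothetical, not taken from any database listed here);
# the last column is the class.
rows = [
    ("sunny", "hot", "no"),
    ("rainy", "?", "yes"),
    ("sunny", "mild", "yes"),
]
items, matrix = binarize(rows, class_index=2)
occ = cooccurrence(matrix)
```

In this sketch the diagonal of `occ` gives the support of each item and the off-diagonal entries give pairwise supports. The real .bin and .occ file layouts are documented only on the linked binarization page.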

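Name | Lines | Att. | Cont. | nClass | Missing | Comment | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
annealing | 800 | 38 | 6 | ? | many | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
artificial-characters | 1000+5000 | 7 | 2 | 10 | none | | | | | | | | |
audiology | 200 | 71 | ?? | 24 | yes | odd format | .data | .names | | | | | |
australian | 690 | 14 | 6 | 2 | 0 | | .data | .names | .col | .dico | .bin | .traduc | |
auto-mpg | 400 | 8 | 5 | 3 | negligible | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
autos | 205 | 26 | 15 | ?? | few | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
balance-scale | 625 | 5 | 0 | 3 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
balloons | 16 | 4 | 0 | ?? | none | small databases | | .names | | | | | |
breast-cancer | 699 | 10 | 0 | 2 | 16 | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
breast-cancer-wisconsin | 699 | 10 | 0 | 2 | 16 | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
bridges | 108 | 13 | ?? | ?? | some | odd format | | .names | | | | | |
chess | 28056 | 6 | 0 | 18 | none | | | | | | | | |
cleveland | 303 | 13 | 0 | 2 | 0.5 | | .data | | .col | .dico | .bin | .traduc | .desc |
connect-4 | 67557 | 42 | 0 | 3 | none | | | .names | | | | | |
cpu-performance | 209 | 10 | 1 | 8 | none | | .data | .names | | | | | |
credit-screening | 700 | 15 | 6 | 2 | 5% | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
echocardiogram | 132 | 13 | ?? | ?? | many | | .data | .names | | | | | |
flags | 194 | 30 | ?? | ?? | none | | .data | .names | | | | | |
german | 1000 | 20 | 7 | 2 | 0 | | .data | .names | .col | .dico | .bin | .traduc | |
glass | 214 | 10 | 8 | 6 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
hayes-roth | 132 | 5 | 0 | 4 | none | | .data | .names | | | | | |
heart | 270 | 13 | 6 | 2 | 0 | | .data | .names | .col | .dico | .bin | .traduc | |
hepatitis | 155 | 20 | 0 | 2 | some | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
horse-colic | 368 | 28 | 7 | 2 | 30% | | .data | .names | .col | .dico | .bin | .traduc | |
housing | 506 | 13 | 13 | ?? | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
hypo | 3163 | 19 | 7 | 2 | 21 | | .data | .names | .col | .dico | .bin | .traduc | |
image | 2300 | 19 | 19 | 7 | none | | .data | .names | | | | | |
ionosphere | 351 | 34 | 34 | 2 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
iris | 150 | 4 | 4 | 3 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
isolet | 6000 | 600 | 600 | 26 | none | | .data | .names | .col | | | | |
kinship | 104 | 12 | 0 | ?? | none | | .data | .names | | | | | |
labor-negotiations | 57 | 16 | 8 | ?? | none | | .data | .names | | | | | |
lenses | 24 | 4 | 0 | 3 | none | | .data | .names | | | | | |
letter-recognition | 20000 | 17 | 16 | 26 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
liver-disorders | 345 | 7 | ?? | 2 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
lung-cancer | 32 | 57 | 0 | 3 | some | too small | .data | .names | | | | | |
lymphography | 148 | 19 | ?? | 4 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
mechanical-analysis | 209 | 8 | 3 | 6 | none | odd format | | | | | | | |
molecular-biology | 106 | 59 | ?? | 2 | none | odd format | | | | | | | |
monks-problems | 432 | 8 | 0 | 3 | none | 3 different problems | | .names | | | | | |
mushroom | 8124 | 22 | 0 | 2 | attr. 11 | | | | | | | | |
musk | 476 | 168 | 166 | 2 | none | odd format | | | | | | | |
page-blocks | 5473 | 10 | 4 | 5 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
pima-indians-diabetes | 768 | 8 | 8 | 2 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
postoperative-patient-data | 90 | 9 | 1 | 3 | attr. 8 | without class I | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
primary-tumor | 339 | 18 | ?? | 21 | some | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
servo | 167 | 4 | 0 | ?? | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
shuttle-landing-control | 15 | 7 | 0 | 2 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
sick | 2800 | 24 | 0 | 2 | 7.6 | | .data | .names | .col | .dico | .bin | .traduc | |
solar-flare | 323 | 13 | 0 | ?? | none | version 2 | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
sonar | 208 | 60 | 0 | 2 | 0 | | | .names | .col | .dico | .bin | .traduc | .desc |
soybean | 307 | 35 | 0 | 19 | some | without small classes | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
space-shuttle | 23 | 5 | 5 | ?? | ?? | too small | | .names | | | | | |
spectrometer | 531 | 103 | ?? | ?? | none | odd format | | | | | | | |
sponge | 76 | 45 | 3 | 12 | attr. 39 | where is the class? | .data | .names | | | | | |
statlog | | | | | | several databases | | .names | | | | | |
student-loan | | | | | | prolog | | .names | | | | | |
tic-tac-toe | 958 | 9 | 0 | 2 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
trains | 10 | 10 | 0 | 2 | none | small database | | .names | | | | | |
vehicle | 846 | 18 | 18 | 4 | 0 | | .data | .names | .col | .dico | .bin | .traduc | .desc |
voting-records | 435 | 17 | 0 | 2 | some | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
water-treatment | 527 | 38 | 38 | 13 | some | several classes | .data | .names | | | | | |
waveform | 5000 | 21 | 21 | 3 | 0 | | .data | .names | .col | .dico | .bin | .traduc | |
wine | 178 | 13 | 13 | 3 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ
zoo | 101 | 18 | 1 | 7 | none | | .data | .names | .col | .dico | .bin | .traduc | .desc | .occ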