Disclaimer: The original post can be found here [Italian]
A few months ago I noticed that almost all municipal websites follow a common pattern, i.e.:
www.cityname.province.it
except for provincial capitals and other websites that have completely different domains. So I thought I could scan the sites of all the cities of Italy with nmap, analyze the data statistically with R, and publish the results.
The list of sites is not available in text format, so I downloaded the list of cities from the Istat website and wrote a script that builds each address from the pattern above.
There are 7983 municipalities today, while my list contains 7161 sites, because some were unreachable or offline during the scan.
Once I had the list of sites, I ran nmap with the command:
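As an illustration of that step, here is a minimal sketch of how such a list could be built. It is not the script actually used: the two-column layout (city name, province abbreviation), the `slugify` and `candidate_hosts` helpers, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import unicodedata


def slugify(name: str) -> str:
    """Lowercase a city name and keep only ASCII letters, digits and hyphens."""
    # Strip accents, e.g. "Cefalù" becomes "Cefalu".
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Assumption: spaces and apostrophes are simply dropped in the hostname.
    return "".join(ch for ch in ascii_name.lower() if ch.isalnum() or ch == "-")


def candidate_hosts(rows):
    """Yield www.cityname.province.it for each (city, province_code) pair."""
    for city, province in rows:
        yield f"www.{slugify(city)}.{province.lower()}.it"


# Hypothetical sample standing in for the Istat export.
SAMPLE = "Cefalù,PA\nSan Gimignano,SI\n"
hosts = list(candidate_hosts(csv.reader(io.StringIO(SAMPLE))))
print("\n".join(hosts))
```

Hostnames produced this way are only candidates: some will not resolve, which is consistent with part of the list being unreachable during the scan.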
nmap -sV -p 80,443 -O --script ssl-heartbleed --script http-enum --script-args 'http-enum.category="cms"' --script banner -iL ListofCities -oX ListofCities.xml
I didn't find a parser suited to my purpose, so I wrote another script to parse the result. The script took an XML file and printed:
The resulting file was loaded into R, where I ran a series of commands, listed below for those who want to better understand the charts.
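A parser of this kind might look like the sketch below. It is not the original script. It assumes the standard nmap XML layout (`host`, `address`, `ports/port`, `service`), uses a made-up sample document, and maps the `product` and `version` attributes to the column names `Server80`, `Versione80` and `Server443` that appear in the R code; that mapping is a guess.

```python
import csv
import sys
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed nmap XML output for a single host.
SAMPLE_XML = """
<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <hostnames><hostname name="www.example.pa.it" type="user"/></hostnames>
    <ports>
      <port protocol="tcp" portid="80">
        <state state="open"/>
        <service name="http" product="Apache httpd" version="2.2.22"/>
      </port>
      <port protocol="tcp" portid="443">
        <state state="closed"/>
      </port>
    </ports>
  </host>
</nmaprun>
"""


def service_attr(host, portid, attr):
    """Return a <service> attribute for an open port, or "" if unavailable."""
    port = host.find(f"ports/port[@portid='{portid}']")
    if port is None:
        return ""
    state = port.find("state")
    service = port.find("service")
    if state is None or state.get("state") != "open" or service is None:
        return ""
    return service.get(attr, "")


def parse_hosts(xml_text):
    """Return one dict per <host> element with its name, IP and web servers."""
    rows = []
    for host in ET.fromstring(xml_text).iter("host"):
        addr = host.find("address[@addrtype='ipv4']")
        name = host.find("hostnames/hostname")
        rows.append({
            "Host": name.get("name") if name is not None else "",
            "Ip": addr.get("addr") if addr is not None else "",
            "Server80": service_attr(host, "80", "product"),
            "Versione80": service_attr(host, "80", "version"),
            "Server443": service_attr(host, "443", "product"),
        })
    return rows


rows = parse_hosts(SAMPLE_XML)
writer = csv.DictWriter(sys.stdout, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
```

Writing CSV keeps the output directly readable by `read.delim(..., sep=",")` on the R side.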
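# Read the CSV path from the command line
args <- commandArgs(TRUE)
# stringsAsFactors = TRUE keeps levels() and droplevels() working on R >= 4.0
Regione <- read.delim(args[1], sep = ",", na.strings = c("", "NA"), stringsAsFactors = TRUE)
cat(file = paste(args[1], ".txt"), "Number of sites: ", nrow(Regione), sep = "\n")
cat("Out of", sum(table(Regione$Ip)), "hosts there are", length(unique(Regione$Ip)),
    "unique IP addresses and the busiest one serves", max(table(Regione$Ip)), "sites\n",
    file = paste(args[1], "output"), append = TRUE)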
# Horizontal bar chart of hosting providers, as percentages
png(paste(args[1], "1Hosting.png"), units = "in", width = 11, height = 8.5, res = 300)
op <- par(mar = c(0, 17, 4, 2) + 0.1)  # wide left margin for the provider names
hosting <- sort(prop.table(table(Regione$Hosting)) * 100)
barplot(hosting, horiz = TRUE, las = 1, space = 1, col = rainbow(length(hosting)),
        names.arg = paste(names(hosting), "-", signif(hosting, 2), "%"),
        main = paste("Hosting used on", sum(table(Regione$Hosting)), "hosts"), axes = FALSE)
dev.off()
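# Pie chart of operating systems
# Assumption: the two label definitions are missing from the post; this is a plausible reconstruction
percentlabelsSIS <- round(100 * prop.table(table(Regione$Sistema)), 1)
pielabelsSIS <- paste(percentlabelsSIS, "%", sep = "")
png(paste(args[1], "2SistemiOP.png"), width = 700, height = 700, res = 75)
pie(prop.table(table(Regione$Sistema)),
    main = paste("Operating systems on", sum(table(Regione$Sistema)), "hosts"),
    col = rainbow(length(table(Regione$Sistema))), labels = pielabelsSIS)
legend("topright", legend = levels(Regione$Sistema), cex = 0.8,
       fill = rainbow(length(table(Regione$Sistema))))
dev.off()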
# Bar chart of Windows versions by year
png(paste(args[1], "3VersioniWindows.png"), width = 700, height = 500, res = 75)
windowsAnno <- table(Regione[Regione$Sistema == "windows", "Anno"])
barplot(windowsAnno, col = rainbow(length(table(Regione$Anno))),
        main = paste("Windows versions on", sum(windowsAnno), "hosts"),
        ylab = "Number of hosts", xlab = "Year")
dev.off()
# Bar chart of Linux versions
png(paste(args[1], "4VersioniLinux.png"), width = 700, height = 500, res = 75)
linuxVersione <- table(droplevels(sort(Regione[Regione$Sistema == "linux", "Versione"])))
barplot(linuxVersione,
        main = paste("Linux versions on", sum(linuxVersione), "hosts"),
        col = rainbow(length(table(Regione$Versione))),
        xlab = "Version", ylab = "Number of hosts")
dev.off()
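# Pie chart of detected CMS types
# Assumption: the two label definitions are missing from the post; this is a plausible reconstruction
percentlabelsCMS <- round(100 * prop.table(table(Regione$Cms)), 1)
pielabelsCMS <- paste(percentlabelsCMS, "%", sep = "")
png(paste(args[1], "5CMS.png"), width = 700, height = 700, res = 75)
pie(prop.table(table(Regione$Cms)),
    main = paste("CMS types on", sum(table(Regione$Cms)), "hosts"),
    col = rainbow(length(table(Regione$Cms))), labels = pielabelsCMS)
legend("topright", legend = levels(Regione$Cms), cex = 0.8,
       fill = rainbow(length(table(Regione$Cms))))
dev.off()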
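# Pie chart of web server types on port 80
# Assumption: the two label definitions are missing from the post; this is a plausible reconstruction
percentlabelsSER80 <- round(100 * prop.table(table(Regione$Server80)), 1)
pielabelsSER80 <- paste(percentlabelsSER80, "%", sep = "")
png(paste(args[1], "6TipidiServer80.png"), width = 700, height = 700, res = 75)
pie(prop.table(table(Regione$Server80)),
    main = paste("Server types [port 80] on", sum(table(Regione$Server80)), "hosts"),
    col = rainbow(length(table(Regione$Server80))), labels = pielabelsSER80)
legend("topright", legend = levels(Regione$Server80), cex = 0.8,
       fill = rainbow(length(table(Regione$Server80))))
dev.off()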
# Bar chart of versions of the most common port-80 server
# Assumed intent: the original picked the server from row sum(table(Regione$Server80)),
# which is the most common one only by coincidence
topServer80 <- names(which.max(table(Regione$Server80)))
versioniTop80 <- table(droplevels(Regione[Regione$Server80 == topServer80, "Versione80"]))
png(paste(args[1], "7MaxServer80Versioni.png"), width = 700, height = 500, res = 75)
barplot(versioniTop80,
        main = paste("Versions of", topServer80, "on", sum(versioniTop80), "hosts"),
        col = rainbow(length(table(Regione$Versione80))),
        xlab = "Server version", ylab = "Number of hosts")
dev.off()
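# Pie chart of web server types on port 443
# Assumption: the two label definitions are missing from the post; this is a plausible reconstruction
percentlabelsSER443 <- round(100 * prop.table(table(Regione$Server443)), 1)
pielabelsSER443 <- paste(percentlabelsSER443, "%", sep = "")
png(paste(args[1], "8TipidiServer443.png"), width = 700, height = 700, res = 75)
pie(prop.table(table(Regione$Server443)),
    main = paste("Server types [port 443] on", sum(table(Regione$Server443)), "hosts"),
    col = rainbow(length(table(Regione$Server443))), labels = pielabelsSER443)
legend("topright", legend = levels(Regione$Server443), cex = 0.8,
       fill = rainbow(length(table(Regione$Server443))))
dev.off()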
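# Pie chart of SSL availability
# Assumption: the two label definitions are missing from the post; this is a plausible reconstruction
percentlabelsSSL <- round(100 * prop.table(table(Regione$SSL)), 1)
pielabelsSSL <- paste(percentlabelsSSL, "%", sep = "")
png(paste(args[1], "9SSL.png"), width = 700, height = 700, res = 75)
pie(prop.table(table(Regione$SSL)),
    main = paste("SSL presence on", sum(table(Regione$SSL)), "hosts"),
    col = rainbow(length(table(Regione$SSL))), labels = pielabelsSSL)
legend("topright", legend = levels(Regione$SSL), cex = 0.8,
       fill = rainbow(length(table(Regione$SSL))))
dev.off()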
# Bar chart of Heartbleed results, limited to hosts with SSL
heartbleed <- Regione[Regione$SSL == "true", "Heartbleed"]
# Assumption: the two label definitions are missing from the post; this is a plausible reconstruction
percentlabelsHEA <- round(100 * prop.table(table(heartbleed)), 1)
pielabelsHEA <- paste(percentlabelsHEA, "%", sep = "")
png(paste(args[1], "10Heartbleed.png"), width = 700, height = 500, res = 75)
barplot(prop.table(table(heartbleed)) * 100,
        main = paste("Heartbleed vulnerability on", sum(table(heartbleed)), "hosts"),
        col = rev(rainbow(length(table(heartbleed)))))
legend("topright", legend = paste(pielabelsHEA, " ", levels(heartbleed)), cex = 0.8,
       fill = rev(rainbow(length(table(heartbleed)))))
dev.off()
Number of cities: 7983
Number of analyzed hosts: 7161
IP address information: across 7161 hosts there are 2083 unique IP addresses. One of them serves 779 websites.
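For readers who do not use R, the same two figures (the number of unique IP addresses and the size of the largest shared host) can be computed with a frequency count. The sketch below mirrors `length(unique(...))` and `max(table(...))` from the script; the list of addresses is invented for the example and is not scan data.

```python
from collections import Counter

# Hypothetical IP column: one entry per scanned site.
ips = ["192.0.2.10", "192.0.2.10", "192.0.2.20", "198.51.100.5", "192.0.2.10"]

counts = Counter(ips)                      # IP address -> number of sites
busiest_ip, busiest_count = counts.most_common(1)[0]

print(f"On {len(ips)} hosts there are {len(counts)} unique IP addresses; "
      f"the busiest one serves {busiest_count} websites")
```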
As you can see, there are a lot of outdated services. The most relevant data are:
Perhaps the government should invest in information security rather than in design.