I did something similar a while back. A Java programmer had taken a look at Ruby and declared that he didn’t like all that "line noise" in the language. He was referring to the "@" and "$" characters used to mark instance variables and globals. I pointed out that Ruby actually uses quite a bit less punctuation than Java, and wrote the following linenoise program to demonstrate.
#!/usr/bin/env ruby
ARGV.each { |fn|
  noise = open(fn) { |file| file.read }.gsub(/[A-Za-z0-9_ \t\n]/m, "")
  puts "#{fn} (#{noise.size}): #{noise}"
}
Linenoise will strip out all alphanumeric characters and white space, leaving only the "line noise" behind. Running linenoise on a series of small programs written in different languages produces this (edited slightly for line breaks) …
animal.cc (83): #<>{:()=;};:{:();};::(){::<<"\";}:{:();};::(){::
<<"\";}(){*[]={,};(=;<;++)[]->();;}
Animal.java (67): {{();}{(){..("");}}{(){..("");}}([]){[]=[]{(),()}
;(=;<.;++)[].();}}
animal.pl (41): ;{{};}{"\";};{{};}{"\";};$(->,->){$->();}
animal.py (23): :():"":():""[(),()]:.()
animal.rb (10): """"[.,.].
The number in parentheses is the number of line noise characters in the file.
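To make the counting concrete, here is a minimal sketch of the same filter applied to a string instead of a file. The helper name `line_noise` is my own, made up for illustration; the character class is the one from the linenoise script.

```ruby
# Hypothetical helper (not part of the original script): remove letters,
# digits, underscores, spaces, tabs and newlines, and keep everything else.
def line_noise(source)
  source.gsub(/[A-Za-z0-9_ \t\n]/, "")
end

# A small Ruby fragment to feed through the filter.
ruby_src = <<~RUBY
  class Dog
    def talk
      puts "WOOF"
    end
  end
RUBY

noise = line_noise(ruby_src)
puts "(#{noise.size}): #{noise}"   # prints: (2): ""
```

The only punctuation left in that fragment is the pair of quotes around the string literal, which is why the Ruby count comes out so low.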
What I find interesting is the amount of semantic information that still comes through the "line noise". For example, the "#<>" sequence in the C++ code is obviously an include statement for something in the standard library, and the "<<" sequences are output statements using "cout".
It would be interesting to see if you could determine the language given only the line noise. You could tell Java from C++ by the ";}" vs ";};" punctuation. Python is pretty clear from the ’:():"":():’ style patterns.
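As a rough experiment along those lines, here is a sketch of a guesser that looks only at the noise string. Everything in it is an assumption on my part: the name `guess_language` is made up, the "$" check for Perl is my own addition (the Perl sample happens to contain ";};" too, so it has to be tested first), and the rules are tuned to nothing more than the five samples above.

```ruby
# Hypothetical heuristic, not a real classifier: guess the language from
# punctuation patterns seen in the sample output above.
def guess_language(noise)
  if noise.include?("$")        # assumption: sigils suggest Perl
    :perl
  elsif noise.include?(";};")   # class bodies closed with "};"
    :cpp
  elsif noise.include?(";}")    # statements closed inside plain braces
    :java
  elsif noise.include?(":():")  # "class X:" followed by "def f(self):"
    :python
  else
    :unknown                    # the Ruby sample leaves too little to match
  end
end

# Noise strings copied from the output above (line breaks rejoined).
samples = {
  "animal.cc"   => '#<>{:()=;};:{:();};::(){::<<"\";}:{:();};::(){::<<"\";}(){*[]={,};(=;<;++)[]->();;}',
  "Animal.java" => '{{();}{(){..("");}}{(){..("");}}([]){[]=[]{(),()};(=;<.;++)[].();}}',
  "animal.pl"   => ';{{};}{"\";};{{};}{"\";};$(->,->){$->();}',
  "animal.py"   => ':():"":():""[(),()]:.()',
  "animal.rb"   => '""""[.,.].'
}

samples.each { |fn, noise| puts "#{fn}: #{guess_language(noise)}" }
```

The order of the checks matters, and real programs would break these rules quickly. It is only meant to show that a handful of punctuation patterns already separates these five files.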
Before I go, here is the source code to the Animal programs I used in my examples…
#include <iostream>

class Animal {
public:
  virtual void talk() = 0;
};

class Dog : public Animal {
public:
  virtual void talk();
};

void Dog::talk() {
  std::cout << "WOOF\n";
}

class Cat : public Animal {
public:
  virtual void talk();
};

void Cat::talk() {
  std::cout << "MEOW\n";
}

int main() {
  Animal * (a[]) = { new Dog, new Cat };
  for (int i=0; i<2; i++)
    a[i]->talk();
  return 0;
}

public class Animal {
  interface IAnimal {
    void talk();
  }

  static class Dog implements IAnimal {
    public void talk() {
      System.out.println("WOOF");
    }
  }

  static class Cat implements IAnimal {
    public void talk() {
      System.out.println("MEOW");
    }
  }

  public static void main(String args[]) {
    IAnimal[] zoo = new IAnimal[] { new Dog(), new Cat() };
    for (int i=0; i<zoo.length; i++)
      zoo[i].talk();
  }
}

package Dog;
sub new {
  bless {};
}
sub talk {
  print "WOOF\n";
}

package Cat;
sub new {
  bless {};
}
sub talk {
  print "MEOW\n";
}

package main;
for $a (Dog->new, Cat->new) {
  $a->talk();
}

class Dog:
    def talk(self):
        print "WOOF"

class Cat:
    def talk(self):
        print "MEOW"

for a in [Dog(), Cat()]:
    a.talk()

class Dog
  def talk
    puts "WOOF"
  end
end

class Cat
  def talk
    puts "MEOW"
  end
end

for a in [Dog.new, Cat.new]
  a.talk
end