Scala Shell
Friday, September 12, 2008 | Labels: scala, tool | 7 comments
Scala Shell (scalash) is a shell for programming in Scala. Scalash is run from the command line and allows the programmer to experiment with code in real time. It allows you to enter Scala commands at the prompt and have the interpreter respond immediately.
A quick summary of the features present:
- colourized output (highlighting)
- auto-completion, aka Tab-completion
- start script support - when an interactive shell is started, Scala Shell reads and executes commands from ~/.scalarc, if that file exists.
- persistent history (~/.scala_history)
- command history
- command load
You can install the current release directly with Scala Bazaar:
sbaz update
sbaz install scalashell-scala
Scala API lookup with Ubiquity
Sunday, September 7, 2008 | Labels: scala, ubiquity | 2 comments
A quickly hacked up Scala API lookup for Ubiquity, the first prototype of a natural language web service connector created by Mozilla Labs. It basically prefixes a Google search with the site: parameter pointing to the Scala API. Click here to install it.
Escape from Zurg
Tuesday, August 26, 2008 | Labels: scala | 1 comments
Here's the puzzle:
Buzz, Woody, Rex, and Hamm have to escape from Zurg. They merely have to cross one last bridge before they are free. However, the bridge is fragile and can hold at most two of them at the same time. Moreover, to cross the bridge a flashlight is needed to avoid traps and broken parts. The problem is that our friends have only one flashlight with one battery that lasts for only 60 minutes. The toys need different times to cross the bridge (in either direction):
Buzz: 5 minutes
Woody: 10 minutes
Rex: 20 minutes
Hamm: 25 minutes
Let's start by defining the toys as a case class with name and time fields.
case class Toy(name: String, time: Int)
Like the toys, the sides of the bridge are modelled with case classes or, more precisely, case objects.
abstract class Direction
case object Left extends Direction
case object Right extends Direction
What are case classes/objects?
- Case classes implicitly come with a constructor function, with the same name as the class.
- Case classes and case objects implicitly come with implementations of methods toString, equals and hashCode.
- Case classes implicitly come with nullary accessor methods which retrieve the constructor arguments.
- Case classes allow the constructions of patterns which refer to the case class constructor (Pattern Matching).
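The four properties listed above can be tried out in a few lines. This is only a minimal sketch: Toy mirrors the definition in this post, while CaseClassDemo and describe are hypothetical names introduced for illustration.

```scala
// Toy mirrors the case class defined in the post.
case class Toy(name: String, time: Int)

// Hypothetical demo object, not part of the puzzle solution.
object CaseClassDemo {
  // Pattern matching on the case class constructor.
  def describe(toy: Toy): String = toy match {
    case Toy(name, time) if time <= 10 => name + " is fast"
    case Toy(name, _)                  => name + " is slow"
  }

  def main(args: Array[String]): Unit = {
    val buzz = Toy("Buzz", 5)             // constructor function, no `new` needed
    println(buzz)                          // generated toString: Toy(Buzz,5)
    println(buzz == Toy("Buzz", 5))        // generated equals: true
    println(buzz.name + " " + buzz.time)   // accessor methods: Buzz 5
    println(describe(Toy("Hamm", 25)))     // Hamm is slow
  }
}
```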
class Move(direction: Direction, toys: List[Toy]) {
def cost = Iterable.max(toys.map{_.time})
override def toString = "Move: " + direction + " " + toys.map{_.name}.mkString("[", ",", "]")
}
State comprises two fields: direction, which represents the current flashlight position, and group, which represents the toys remaining on the left-hand side of the bridge.
class State(direction: Direction, group: List[Toy]) {
def done = group.isEmpty
def next(f: (Move, State) => Unit) = direction match {
case Left => for { tuple <- group.zipWithIndex
toy <- group drop (tuple._2 + 1)
toys = List(toy, tuple._1) }
f(new Move(Right, toys), new State(Right, group diff toys))
case Right => for(toy <- (ToyStory.toys diff group))
f(new Move(Left, List(toy)), new State(Left, toy :: group))
}
}
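To make the enumeration done by next easier to follow, here is a rough standalone sketch of the same idea written with collection methods from later Scala versions (combinations, filterNot). NextMovesSketch and its method names are assumptions made for illustration; they are not part of the original source.

```scala
case class Toy(name: String, time: Int)

// Hypothetical helper object that mirrors what State.next enumerates.
object NextMovesSketch {
  val all = List(Toy("Buzz", 5), Toy("Woody", 10), Toy("Rex", 20), Toy("Hamm", 25))

  // Flashlight on the left: every unordered pair of toys still on the left may cross.
  def leftToRight(left: List[Toy]): List[List[Toy]] =
    left.combinations(2).toList

  // Flashlight on the right: any single toy that already crossed may carry it back.
  def rightToLeft(left: List[Toy]): List[Toy] =
    all.filterNot(left.contains)

  // A crossing takes as long as its slowest member, like Move.cost.
  def cost(toys: List[Toy]): Int = toys.map(_.time).max
}
```

With all four toys on the left there are six possible pairs, which is the branching factor of the first step of the search.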
What are the Scala features in State class?
- First-Class Functions
- Pattern Matching
- For-Comprehensions
For more details about functions see here or in Scala Documentation
> Pattern Matching
unapply methods in extractor objects. The next method starts with the statement direction match ..., a pattern matching expression with two alternatives: Left and Right. For more details about pattern matching and extractor objects see here or in Scala Documentation.
> For-Comprehensions
Scala offers special syntax to express combinations of certain higher-order functions more naturally. For-comprehensions are a generalization of the list comprehensions found in languages like Haskell and Python. The compiler maps them to combinations of the methods foreach, map, flatMap and filter. For instance, the for loop for (path <- problem) ... in the ToyStory object is mapped to problem foreach (path => ...), where foreach is defined in the SearchProblem class.
class SearchProblem(initial: State) {
def foreach(f: List[Move] => Unit) {
def solve(path: List[Move], state: State) {
if (state.done) {
f(path)
} else {
state next { (move, state) => solve(move :: path, state) }
}
}
solve(Nil, initial)
}
}
object ToyStory extends Application {
val toys = Toy("Buzz", 5) :: Toy("Woody", 10) :: Toy("Rex", 20) :: Toy("Hamm", 25) :: Nil
val problem = new SearchProblem(new State(Left, toys))
for (path <- problem)
if ((0 /: path) {(cost, move) => cost + move.cost} <= 60)
println("Solution: " + path)
}
The complete source code can be downloaded here.
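The trick that lets SearchProblem be used in a for loop works for any class that defines a suitable foreach method. The sketch below shows it in isolation; Countdown and ForDemo are hypothetical names and have nothing to do with the puzzle.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical class: it only defines foreach, yet it can drive a for loop.
class Countdown(from: Int) {
  def foreach(f: Int => Unit): Unit = {
    var i = from
    while (i > 0) { f(i); i -= 1 }
  }
}

object ForDemo {
  def collect(c: Countdown): List[Int] = {
    val buf = ListBuffer[Int]()
    for (n <- c) buf += n   // the compiler rewrites this to c.foreach(n => buf += n)
    buf.toList
  }

  def main(args: Array[String]): Unit =
    println(collect(new Countdown(3)))   // List(3, 2, 1)
}
```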
Guide to Scala Bazaar auto completion using BASH
Friday, August 22, 2008 | Labels: bash, linux, scala | 0 comments
The Scala Bazaar system, "sbaz" for short, is a system used by Scala enthusiasts to share computer files with each other. In particular, it makes it easy to share libraries and applications. In this post, I'll show you how easy it is to use one of the nicest facilities of the modern shell, the built-in "completion" support, to make sbaz easier to use on the command line.
First you must install the BASH programmable auto completion setup if your distro doesn't have it by default. I don't think many do, so you'll probably need to get it from the Programmable Completion Website.
Once you've set up your system for auto completion, take the following script:
#!/bin/bash
_sbaz_complete()
{
local cur commands
COMPREPLY=()
cur=${COMP_WORDS[COMP_CWORD]}
commands='available compact help install installed keycreate keyforget
keyknown keyremember keyremoteknown keyrevoke pack remove retract setuniverse
setup share show showuniverse update packages upgrade'
cur=`echo $cur | sed 's/\\\\//g'`
COMPREPLY=($(compgen -W "${commands}" ${cur} | sed 's/\\\\//g') )
}
complete -F _sbaz_complete -o filenames sbaz
And place it in /etc/bash_completion.d/sbaz. Once you've done that, the next time you start a BASH shell you will have sbaz auto completion!
Download snippet code here.
SHadoop
Wednesday, May 14, 2008 | Labels: hadoop, mapreduce, scala, shadoop | 7 comments
What are Scala and Hadoop?
Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise and type-safe way. It smoothly integrates features of object-oriented and functional languages including mixins, algebraic datatypes with pattern matching, genericity, and more.
Hadoop is a free Java software platform for running applications that process vast amounts of data. It was developed under the Apache Lucene project, originally to support distribution for Nutch, an effort to build an open source search engine that uses Lucene for its search and index component. Hadoop consists of an open source implementation of Google’s published computing infrastructure, specifically MapReduce and the Google File System.
Scala + Hadoop!?
The Hadoop Map-Reduce framework works on key/value pairs whose keys and values are serializable objects that implement a simple serialization protocol, defined by the Writable interface. In addition, Hadoop provides Writable implementations for each of the basic types (Int, Long, Float, String, ...). These implementations wrap a value of the basic type in a Writable object. As an example, consider how an int value is typically used in Hadoop:
private final static IntWritable one = new IntWritable(1);
Sounds like primitive wrappers before Java 5 boxing!
Why Scala?
Read this and this. Moreover, Scala offers a bag of other features:
- Implicit conversion methods
- Type inference
In short, Scala provides a clear, concise and stylized syntax.
SHadoop = Scala + Hadoop
What we would like is something like this:
val one = 1
Or like this:
def map(key: LongWritable, value: Text, output: OutputCollector[Text, IntWritable], reporter: Reporter) =
(value split " ") foreach (output collect (_, one))
The interesting point is that with Scala this is quite simple to implement, and SHadoop is the proof! SHadoop consists of a single source file containing a Scala object with implicit methods that convert Java primitive types (and String) to Writable instances. The SHadoop object also provides implicit methods that convert Java iterators over Writables to Scala iterators over primitive types. Scala iterators provide many useful methods, such as foreach, map and filter.
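To give an idea of how such implicit methods work, here is a small sketch that runs without Hadoop on the classpath. IntBox and TextBox are stand-ins for IntWritable and Text, and the Conversions object is a hypothetical reconstruction in the spirit of SHadoop, not its actual source.

```scala
import scala.language.implicitConversions

// Stand-ins for Hadoop's IntWritable and Text (assumptions, so the sketch is self-contained).
final class IntBox(val get: Int)
final class TextBox(val value: String) { override def toString: String = value }

// Hypothetical conversions; the real SHadoop object may look different.
object Conversions {
  implicit def intToBox(i: Int): IntBox = new IntBox(i)
  implicit def boxToInt(b: IntBox): Int = b.get
  implicit def stringToText(s: String): TextBox = new TextBox(s)
  implicit def textToString(t: TextBox): String = t.value
}

object ImplicitDemo {
  import Conversions._

  // Accepts only wrapper types, much like OutputCollector.collect does.
  def collect(key: TextBox, value: IntBox): String = key.toString + "=" + value.get

  // line is converted to String for split; each word and the literal 1 are wrapped implicitly.
  def wordCounts(line: TextBox): List[String] =
    line.split(" ").toList.map(word => collect(word, 1))
}
```

The caller never writes new IntBox(1) or new TextBox(word); the compiler inserts the conversions wherever the expected type demands them.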
Usage
The Hadoop Map-Reduce Tutorial shows a very simple Map-Reduce application that counts the number of occurrences of each word in a given input set.
Source Code - WordCount.scala
package shadoop
import SHadoop._
import java.util.Iterator
import org.apache.hadoop.fs._
import org.apache.hadoop.io._
import org.apache.hadoop.mapred._
object WordCount {
class Map extends MapReduceBase with Mapper[LongWritable, Text, Text, IntWritable] {
val one = 1
def map(key: LongWritable, value: Text, output: OutputCollector[Text, IntWritable], reporter: Reporter) =
(value split " ") foreach (output collect (_, one))
}
class Reduce extends MapReduceBase with Reducer[Text, IntWritable, Text, IntWritable] {
def reduce(key: Text, values: Iterator[IntWritable],
output: OutputCollector[Text, IntWritable], reporter: Reporter) = {
val sum = values reduceLeft ((a: Int, b: Int) => a + b)
output collect (key, sum)
}
}
def main(args: Array[String]) = {
val conf = new JobConf(classOf[Map])
conf setJobName "wordCount"
conf setOutputKeyClass classOf[Text]
conf setOutputValueClass classOf[IntWritable]
conf setMapperClass classOf[Map]
conf setCombinerClass classOf[Reduce]
conf setReducerClass classOf[Reduce]
conf setInputFormat classOf[TextInputFormat]
conf setOutputFormat classOf[TextOutputFormat[_ <: WritableComparable, _ <: Writable]]
conf setInputPath(args(0))
conf setOutputPath(args(1))
JobClient runJob conf
}
}
Source code explained: Java x Scala
1. The one field from the Map class
Java
private final static IntWritable one = new IntWritable(1);
Scala
val one = 1
2. The map method from the Map class
Java
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
Scala
def map(key: LongWritable, value: Text,
output: OutputCollector[Text, IntWritable], reporter: Reporter) =
(value split " ") foreach (output collect (_, one))
Wow!!! Scala implicitly converts value to a String and applies String's split method, which returns an array of Strings. foreach then iterates over the array, and each String is collected into the output object as a key (implicitly converted to Text) with one as its value (implicitly converted to IntWritable). Note: Scala doesn't require a semicolon at the end of each statement; they are optional.
3. The reduce method from the Reduce class
Java
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
Scala
def reduce(key: Text, values: Iterator[IntWritable],
output: OutputCollector[Text, IntWritable], reporter: Reporter) = {
val sum = values reduceLeft ((a: Int, b: Int) => a + b)
output collect (key, sum)
}
Again, wow!!! The first line calculates the sum of the values using the reduceLeft method of an Int iterator, implicitly converted from the Java iterator of IntWritable. The output object then collects the key together with the sum.
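The iterator conversion can be sketched in the same way. Again, IntBox stands in for IntWritable, and javaBoxesToInts is a hypothetical name rather than the method actually found in SHadoop.

```scala
import scala.language.implicitConversions

// Stand-in for Hadoop's IntWritable (assumption, so the sketch runs without Hadoop).
final class IntBox(val get: Int)

object IteratorConversions {
  // Hypothetical conversion: view a Java iterator of wrappers as a Scala iterator of Ints.
  implicit def javaBoxesToInts(it: java.util.Iterator[IntBox]): Iterator[Int] =
    new Iterator[Int] {
      def hasNext: Boolean = it.hasNext
      def next(): Int = it.next().get
    }
}

object ReduceDemo {
  import IteratorConversions._

  // java.util.Iterator has no reduceLeft, so the compiler applies the conversion first.
  def sum(values: java.util.Iterator[IntBox]): Int =
    values.reduceLeft((a: Int, b: Int) => a + b)
}
```

Note that reduceLeft throws an exception on an empty iterator, so this sketch assumes every key arrives with at least one value.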
Running
Assuming HADOOP_HOME is the root of the Hadoop installation:
- Copy the scala-library.jar to ${HADOOP_HOME}/lib directory
- Run the application:
$> ${HADOOP_HOME}/bin/hadoop jar shadoop-0.0.1-alpha.jar shadoop.WordCount input/ output/
input/ - a directory containing the text files used as the input set
output/ - the output directory
Download jar with sources here.