Wednesday, March 11, 2009

This blog is now deprecated

Please redirect your feed reader to

Sunday, November 30, 2008

Working with strings in Prolog

Strings in Prolog can be quite confusing if you come from another language. In this short article I will illustrate how they work, mainly by showing different examples and by creating a term that checks if a string is contained within another string.

First off, strings in Prolog are written in single quotes. Terms written in double quotes are immediately converted to a list of character codes. Therefore, strings can be handed directly to write. Strings are also atoms.

?- 'bergen' = String.
String = bergen.

?- "bergen" = List.
List = [98, 101, 114, 103, 101, 110].

?- atom('bergen').

?- is_list("bergen").

?- write('sdf').

?- write("sdf").
[115, 100, 102]

?- writef("%s", ["sdf"]).

Since strings in Prolog are atoms, they naturally cannot be manipulated. So for
certain tasks they need to be converted to char lists. This can be done with
name. So to print the first character of a string you'll have to
convert it to a char list first.

?- name('bergen', CharList), nth0(0, CharList, FirstChar), put(FirstChar).
CharList = [98, 101, 114, 103, 101, 110],
FirstChar = 98.

We now know enough to create our term, contains/2. For illustration,
we will make it work for both strings and char lists, by converting to a char
list when needed. It will succeed if A is contained within B. We'll need a
utility term for this as well. Note that I am using SWI-Prolog, so I am
utilizing built in terms that might not be available in other implementations.
The full program as well as an example that generates all possible substrings contained within A follows below.

sublist(S, L) :-
append(_, L2, L),
append(S, _, L2).

contains(A, B) :-
name(A, AA),
name(B, BB),
contains(AA, BB).

contains(A, B) :-
name(A, AA),
contains(AA, B).

%% The empty list is removed mainly for nicer output in the following example.
contains(A, B) :-
sublist(B, A),
B \= [].

?- forall( contains('bergen', X) , writef("%s\n", [X]) ).

Sunday, October 26, 2008

Experimenting with automatons in Ruby, part 1

A finite automaton is an abstract computing device, which more often than not is used to recognize formal languages. One can look at an automaton as a tape reader which consists of a initial state, a set of final states and a set of transitions between states. The automaton accepts its input if it has reached the end of the input tape and is in a final state.

For the purpose of language processing we will implement different kinds of automatons in Ruby, starting with the simplest one we can conceive. We'll start out with a deterministic finite-state automaton (DFSA). Such an automaton is called deterministic, since it in any given state only has one possible transition for a given input symbol.

First we need an example language to work with, so I've created the "Ruby-language". It can be described with the following regular expression: /Ru+by/ (yes, DFSA's correspond to the languages that can be described by (pure) regular expressions). I admit this is a quite boring language, but we have to start somewhere, right? The alphabet for this language is the set {R,u,b,y}.

To create an automaton for this language we need four states (visualize this as a directed graph):

  • q0: 'R' -> q1
  • q1: 'u' -> q2
  • q2: 'u' -> q2, 'b' -> q3
  • q3: 'y' -> q4
The initial state is q0 and the set of final states is {q4}. As you can see there is a cycle at q2 to allow for an (theoretically) infinite amount of 'u' characters. Also note that there is no ambiguity of which state is the next one at any given point for any given input. Hence, the automaton is deterministic.

On to the implementation considerations. One of the most important design decisions is how we should represent the state transitions. For a DFSA almost anything would work, but we want to choose something flexible. Let's try with a nested hash:

:q0 => {"R" => :q1},
:q1 => {"u" => :q2},
:q2 => {"u" => :q2, "b" => :q3},
:q3 => {"y" => :q4}

I think this will work quite well. If we look up a given state in the hash, we will get another hash of all the possible transitions. Matching this up with a input tape should be quite easy. I've sketched up a spec for how this should work in practice:

describe "the Ruby-language" do
before(:each) do
@r ={
:q0 => {"R" => :q1},
:q1 => {"u" => :q2},
:q2 => {"u" => :q2, "b" => :q3},
:q3 => {"y" => :q4}
}, :q0, [:q4])

it "should recognize valid sentences" do
@r.should recognize %w(R u b y)
@r.should recognize %w(R u u u u b y)

it "should reject invalid sentences" do
@r.should_not recognize %w(r u b y)
@r.should_not recognize %w(R u b)

Note that we give the input to the automaton as a list (I've picked the %w() syntax since it feels kinda Lispy). This is called tokenization, and for now we'll do it by hand. The second argument to the DFSA constructor is the symbol for the initial state, and the third one is the set of final states. We won't worry about the actual implementation yet, as we want to let this sink in for a bit.

The next article in this series will be to create a DSL for setting up the automaton, then I'll write up on non-deterministic automatons and pushdown automatons. I'll also show more actual code (I'll create a GitHub repos for the automaton library). Possibly I'll get to transducers as well. If you have any suggestions, please leave a comment or ship me an e-mail.

Saturday, August 9, 2008

Use your .gitignore as the ignore pattern for echoe

Echoe is a library for quickly creating gems in Ruby. I I've hacked up a quick patch for it so you can use your .gitignore file as the manifest ignore pattern. I found it boring to re-list the ignore stuff since it in 99% of the cases (at least for me) already is present in .gitignore.

I'm sure that the patch doesn't work perfectly though. I know there is differences between the globbing syntax in gitignore and in Ruby's File#fnmatch - the latter is stricter. But it still works perfectly for the (small) projects I've tried it on.

Usage goes down like this: do |p|
p.ignore_pattern = FileList[".gitignore"]

Grab it from my fork on GitHub.

UPDATE: This feature is now included in echoe 3.0.2!

Monday, July 21, 2008

The lack of Database Abstraction in PHP

Unfortunately, almost none of the most popular PHP software supports other databases than MySQL (with MediaWiki being a notable exception - it supports Postgres). When you're used to programming in Ruby, multi-db compatibility is something you just take for granted after a while, because of the excellent ActiveRecord and DataMapper projects. In PHP however, this is a real deal breaker. I don't know if its a cultural problem or a library problem. I did look at Propel some time ago, but honestly the Java-esque XML configuration and over-complex API turned me off real quick. If someone could come up with something better, maybe even WordPress will support SQLite and PostgreSQL one day...

Friday, June 13, 2008

Zshfully Yours, Gem Shortcuts

Stephen Celis provided a slick bash-method for quickly opening and browsing RDocs, over at his blog. I have merely ported the function to zsh. As a slight modification, $BROWSER and $BROWSER_OPTIONS are used instead of just open, so it works cross platform. The code:

gemdoc() {
local gemdir=`gem env gemdir`
$BROWSER $BROWSER_OPTIONS $gemdir/doc/`ls $gemdir/doc/ | grep $1 | sort | tail -1`/rdoc/index.html

_gemdoc() {
compadd $(ls `gem env gemdir`/doc)

compdef _gemdoc gemdoc

Friday, June 6, 2008

Arch Linux: Changing Gnome's default browser

I'm a big fan of OpenBox, but out of habit I use a couple of Gnome apps. Most notably Gnome Terminal. For some reason, when I clicked URLs in the terminal they opened in Epiphany. Firefox is still my main browser, so I wanted to change that. I solved it like this:

  1. sudo pacman -Sy gconf-editor
  2. Run gconf-editor , edit the /desktop/gnome/url-handlers/http(s) keys to your liking. In my case "firefox %s".
That should be it!