Sunday, November 30, 2008

Working with strings in Prolog

Strings in Prolog can be quite confusing if you come from another language. In this short article I will illustrate how they work, mainly by showing different examples and by creating a term that checks if a string is contained within another string.

First off, strings in Prolog are written in single quotes. Terms written in double quotes are immediately converted to a list of character codes. Therefore, strings can be handed directly to write. Strings are also atoms.


?- 'bergen' = String.
String = bergen.

?- "bergen" = List.
List = [98, 101, 114, 103, 101, 110].

?- atom('bergen').
true.

?- is_list("bergen").
true.

?- write('sdf').
sdf
true.

?- write("sdf").
[115, 100, 102]
true.

?- writef("%s", ["sdf"]).
sdf
true.


Since strings in Prolog are atoms, they naturally cannot be manipulated. So for
certain tasks they need to be converted to char lists. This can be done with
name. So to print the first character of a string you'll have to
convert it to a char list first.


?- name('bergen', CharList), nth0(0, CharList, FirstChar), put(FirstChar).
b
CharList = [98, 101, 114, 103, 101, 110],
FirstChar = 98.


We now know enough to create our term, contains/2. For illustration,
we will make it work for both strings and char lists, by converting to a char
list when needed. It will succeed if A is contained within B. We'll need a
utility term for this as well. Note that I am using SWI-Prolog, so I am
utilizing built in terms that might not be available in other implementations.
The full program as well as an example that generates all possible substrings contained within A follows below.



sublist(S, L) :-
append(_, L2, L),
append(S, _, L2).

contains(A, B) :-
atom(A),
atom(B),
name(A, AA),
name(B, BB),
contains(AA, BB).

contains(A, B) :-
atom(A),
name(A, AA),
contains(AA, B).

%% The empty list is removed mainly for nicer output in the following example.
contains(A, B) :-
sublist(B, A),
B \= [].

?- forall( contains('bergen', X) , writef("%s\n", [X]) ).
b
be
ber
berg
berge
bergen
e
er
erg
erge
ergen
r
rg
rge
rgen
g
ge
gen
e
en
n
true.

Sunday, October 26, 2008

Experimenting with automatons in Ruby, part 1

A finite automaton is an abstract computing device, which more often than not is used to recognize formal languages. One can look at an automaton as a tape reader which consists of a initial state, a set of final states and a set of transitions between states. The automaton accepts its input if it has reached the end of the input tape and is in a final state.

For the purpose of language processing we will implement different kinds of automatons in Ruby, starting with the simplest one we can conceive. We'll start out with a deterministic finite-state automaton (DFSA). Such an automaton is called deterministic, since it in any given state only has one possible transition for a given input symbol.

First we need an example language to work with, so I've created the "Ruby-language". It can be described with the following regular expression: /Ru+by/ (yes, DFSA's correspond to the languages that can be described by (pure) regular expressions). I admit this is a quite boring language, but we have to start somewhere, right? The alphabet for this language is the set {R,u,b,y}.

To create an automaton for this language we need four states (visualize this as a directed graph):

  • q0: 'R' -> q1
  • q1: 'u' -> q2
  • q2: 'u' -> q2, 'b' -> q3
  • q3: 'y' -> q4
The initial state is q0 and the set of final states is {q4}. As you can see there is a cycle at q2 to allow for an (theoretically) infinite amount of 'u' characters. Also note that there is no ambiguity of which state is the next one at any given point for any given input. Hence, the automaton is deterministic.

On to the implementation considerations. One of the most important design decisions is how we should represent the state transitions. For a DFSA almost anything would work, but we want to choose something flexible. Let's try with a nested hash:


{
:q0 => {"R" => :q1},
:q1 => {"u" => :q2},
:q2 => {"u" => :q2, "b" => :q3},
:q3 => {"y" => :q4}
}


I think this will work quite well. If we look up a given state in the hash, we will get another hash of all the possible transitions. Matching this up with a input tape should be quite easy. I've sketched up a spec for how this should work in practice:


describe "the Ruby-language" do
before(:each) do
@r = DFSA.new({
:q0 => {"R" => :q1},
:q1 => {"u" => :q2},
:q2 => {"u" => :q2, "b" => :q3},
:q3 => {"y" => :q4}
}, :q0, [:q4])
end

it "should recognize valid sentences" do
@r.should recognize %w(R u b y)
@r.should recognize %w(R u u u u b y)
end

it "should reject invalid sentences" do
@r.should_not recognize %w(r u b y)
@r.should_not recognize %w(R u b)
end
end


Note that we give the input to the automaton as a list (I've picked the %w() syntax since it feels kinda Lispy). This is called tokenization, and for now we'll do it by hand. The second argument to the DFSA constructor is the symbol for the initial state, and the third one is the set of final states. We won't worry about the actual implementation yet, as we want to let this sink in for a bit.

The next article in this series will be to create a DSL for setting up the automaton, then I'll write up on non-deterministic automatons and pushdown automatons. I'll also show more actual code (I'll create a GitHub repos for the automaton library). Possibly I'll get to transducers as well. If you have any suggestions, please leave a comment or ship me an e-mail.

Saturday, August 9, 2008

Use your .gitignore as the ignore pattern for echoe

Echoe is a library for quickly creating gems in Ruby. I I've hacked up a quick patch for it so you can use your .gitignore file as the manifest ignore pattern. I found it boring to re-list the ignore stuff since it in 99% of the cases (at least for me) already is present in .gitignore.

I'm sure that the patch doesn't work perfectly though. I know there is differences between the globbing syntax in gitignore and in Ruby's File#fnmatch - the latter is stricter. But it still works perfectly for the (small) projects I've tried it on.

Usage goes down like this:

Echoe.new(PROJECT) do |p|
...
p.ignore_pattern = FileList[".gitignore"]
...
end


Grab it from my fork on GitHub.

UPDATE: This feature is now included in echoe 3.0.2!

Monday, July 21, 2008

The lack of Database Abstraction in PHP

Unfortunately, almost none of the most popular PHP software supports other databases than MySQL (with MediaWiki being a notable exception - it supports Postgres). When you're used to programming in Ruby, multi-db compatibility is something you just take for granted after a while, because of the excellent ActiveRecord and DataMapper projects. In PHP however, this is a real deal breaker. I don't know if its a cultural problem or a library problem. I did look at Propel some time ago, but honestly the Java-esque XML configuration and over-complex API turned me off real quick. If someone could come up with something better, maybe even WordPress will support SQLite and PostgreSQL one day...

Friday, June 13, 2008

Zshfully Yours, Gem Shortcuts

Stephen Celis provided a slick bash-method for quickly opening and browsing RDocs, over at his blog. I have merely ported the function to zsh. As a slight modification, $BROWSER and $BROWSER_OPTIONS are used instead of just open, so it works cross platform. The code:



gemdoc() {
local gemdir=`gem env gemdir`
$BROWSER $BROWSER_OPTIONS $gemdir/doc/`ls $gemdir/doc/ | grep $1 | sort | tail -1`/rdoc/index.html
}

_gemdoc() {
compadd $(ls `gem env gemdir`/doc)
}

compdef _gemdoc gemdoc

Friday, June 6, 2008

Arch Linux: Changing Gnome's default browser

I'm a big fan of OpenBox, but out of habit I use a couple of Gnome apps. Most notably Gnome Terminal. For some reason, when I clicked URLs in the terminal they opened in Epiphany. Firefox is still my main browser, so I wanted to change that. I solved it like this:

  1. sudo pacman -Sy gconf-editor
  2. Run gconf-editor , edit the /desktop/gnome/url-handlers/http(s) keys to your liking. In my case "firefox %s".
That should be it!

Wednesday, June 4, 2008

Script for removing orphans in Arch Linux

Shell scripting is fun, so I created a small function that removes all orphaned packages as reported by Pacman. It calls itself recursively if removing the current orphans revealed any new ones. Tested with zsh since that is my personal preference the best shell around :) - but it probably runs under bash as well. Here is the code:


remove_orphans() {
for pkg in `pacman -Qdt|awk -F ' ' '{ printf("%s\n", $1) }'`; do
sudo pacman -R $pkg
done
if [ `pacman -Qdt|wc -l` -gt 0 ]; then remove_orphans; fi
}

Tuesday, June 3, 2008

Arch Linux: Alt+gr not working in Java apps

I got quite confused when my good old trusty Alt+gr key did not work in NetBeans (since I use Norwegian keyboard layout, this key is essential for progamming). The key worked everywhere except in Java apps, actually. Luckily, I figured out how to fix it:

  1. Install scim.
  2. Add your locale to the scim global config (/etc/scim/global).
  3. Start scim from your xinitrc.
  4. Restart your X server - it now should work!
For good measure, here's my .xinitrc file:


# SCIM
export XMODIFIERS='@im=SCIM'
export GTK_IM_MODULE="scim"
export XIM_PROGRAM="scim -d"
export QT_IM_MODULE="scim"
scim -d
# OPENBOX
exec openbox-session

Monday, June 2, 2008

Arch Linux: First impressions

So, I finally got fed up with Ubuntu. I really wanted to use FreeBSD again - but unfortunately the hardware in this box is not compatible. Arch Linux looked like a decent alternative so I downloaded the 2007.08-2 image and started installing. Misfortune struck again as I had to spend a couple of hours trying to get wireless working, but everything worked well as soon as I switched to the 2008.04-RC image. I wonder why they don't link to the RC image on the "Get Arch" page.

After some configuration I ended up with a really slick system - It's amazing how much faster stuff runs on OpenBox! Also, the BSD inspired rc.conf and init system made my day. The only real problem I encountered was when I tried to install an older version of a library. Pacman (Arch's package manager) doesn't support this. I consider this to be a flaw in the package manager (which all over feels immature compared to apt or BSD ports), and I predict the devs will realize this sooner or later - even though they currently consider it to be a "feature".

I find it amusing how things tend to move in cycles. When I started out with Linux ~8 years ago Slackware was the coolest thing around. Then we somehow ended up with Ubuntu. And now the "KISS"-distros are turning into the hippest kids on the block again. If you're fed up with the bloat, give Arch a try. :)

Sunday, May 18, 2008

(str)tr for JavaScript

I couldn't find a function resembling Ruby's tr or PHP's strtr in JavaScript, so I created one. Here's the code:


function tr(str, from, to) {
var subst;
for (i = 0; i < from.length; i++) {
subst = (to[i]) ? to[i] : to[to.length-1];
str = str.replace(new RegExp(str[str.indexOf(from[i])], 'g'), subst);
}
return str;
}

Tuesday, May 6, 2008

Java oddity

I really don't like the fact that you can run static class-methods from an object in Java. In effect, object.staticMethod(); gets translated into something like object.getClass().staticMethod();. AFAIK, no other language has this "feature". Is there any reason behind this madness?

Tuesday, January 15, 2008

Nginx 0.5.35 released

I don't know of any feeds that track nginx releases, so this is just a heads up for people that missed the release of nginx 0.5.35. Release notes:


*) Change: now the ngx_http_userid_module adds start time microseconds
to the cookie field contains a pid value.

*) Change: now the uname(2) is used on Linux instead of procfs.
Thanks to Ilya Novikov.

*) Feature: the "If-Range" request header line support.
Thanks to Alexander V. Inyukhin.

*) Bugfix: in HTTPS mode requests might fail with the "bad write retry"
error; bug appeared in 0.5.13.

*) Bugfix: the STARTTLS in SMTP mode did not work.
Thanks to Oleg Motienko.

*) Bugfix: large_client_header_buffers did not freed before going to
keep-alive state.
Thanks to Olexander Shtepa.

*) Bugfix: the "limit_rate" directive did not allow to use full
throughput, even if limit value was very high.

*) Bugfix: the $status variable was equal to 0 if a proxied server
returned response in HTTP/0.9 version.

*) Bugfix: if the "?" character was in a "error_page" directive, then
it was escaped in a proxied request; bug appeared in 0.5.32.


Now go and upgrade your servers!