What to expect from the Ruby expect library?

A little background first: expect is a library for interacting with programs from Ruby. Conceptually it’s based on the original UNIX expect program, which is commonly used to automate UNIX administration. The expect library provides an API that takes a regex to match against the expected output of a program, and optionally an action to take on a match. I have been experimenting with the expect library to automate a gdb session. Expect is an undocumented module and ri does not help. So just in case you are not able to get it to work for you, read on.


Spawning interactive programs

Ruby has several ways to spawn programs. For a nice treatment of the subject, check out Avdi’s excellent blog posts. We’ll use Ruby’s PTY library to spawn the program, and interact with it using expect. Because the PTY library attaches the child to a pseudo-terminal, the child does not block-buffer its output the way it typically would when writing to a pipe; such buffering can cause problems during interaction with a spawned process. Let’s see how to make gdb print out a help message, using two methods.

Method 1 - Single interaction
This Ruby script takes a string as an argument and prints out the gdb help for it. We’ll spawn a gdb process, send it the gdb help command and read the results from its stdout. Listing 1 shows the script. Notice that PTY.spawn is passed a Ruby block, which is executed immediately after the gdb process is spawned. When the block ends, the gdb process gets terminated. PTY.spawn passes the input/output File objects for the child gdb process, as well as its pid, to the block. The till_prompt() method reads all output from gdb until the next gdb prompt is seen, and returns the data read as a string. Notice the use of the IO#getc method. We don’t use IO#gets because when gdb prints its prompt, it waits for input from the user and therefore does not print a newline after the prompt. If IO#gets were used, it would stall waiting for a newline after gdb prints the prompt.

Sample output after executing this script is shown in Listing 2. This is a deliberately minimal program, designed to demonstrate concepts, and does not attempt to do any error handling.

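#!/usr/bin/env ruby
require 'pty'

# Read gdb output one character at a time until the next "(gdb)" prompt.
def till_prompt(cout)
    buffer = ""
    loop { buffer << cout.getc.chr; break if buffer =~ /\(gdb\)/ }
    return buffer
end

PTY.spawn("gdb") do |gdb_out, gdb_in, pid|
    # print rather than printf: gdb's output is not a format string
    print till_prompt(gdb_out)            # startup banner up to the first prompt
    gdb_in.print("help #{ARGV[0]}\n")     # send the help command
    puts till_prompt(gdb_out)             # help text up to the next prompt
end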

Listing 1 - gdb.rb

[sudhanshu@sudhanshu-desktop]$ ./gdb.rb break
help break
Set breakpoint at specified line or function.
break [LOCATION] [thread THREADNUM] [if CONDITION]
LOCATION may be a line number, function name, or "*" and an address.
If a line number is specified, break at start of code for that line.
If a function is specified, break at start of code for that function.
If an address is specified, break at that exact address.
With no LOCATION, uses current execution address of selected stack frame.
This is useful for breaking on return to a stack frame.

THREADNUM is the number from "info threads".
CONDITION is a boolean expression.

Multiple breakpoints at one place are permitted, and useful if conditional.

Do "help breakpoints" for info on other commands dealing with breakpoints.
(gdb)
[sudhanshu@sudhanshu-desktop]$

Listing 2 - gdb.rb output

Notice the one-shot usage of this program. It exits immediately, which is why passing a block to PTY.spawn makes sense here. In the next method of spawning interactive processes, we’ll see why we might not want to pass a block.
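Since Listing 1 deliberately skips error handling, here is a minimal sketch of one way the till_prompt() idea could be made a little safer. The method name till_prompt_or_eof and the fake transcript are my own assumptions for illustration; a StringIO stands in for the gdb output stream so that the sketch runs without gdb. With a real PTY, reading after the child has exited may raise an error instead of returning nil, so treat this only as a starting point.

```ruby
require 'stringio'

# Hypothetical variant of till_prompt() from Listing 1: it stops quietly at
# end of input instead of failing when getc returns nil.
def till_prompt_or_eof(cout, prompt = /\(gdb\)/)
  buffer = +""
  while (ch = cout.getc)          # getc returns nil at end of input
    buffer << ch
    break if buffer =~ prompt     # stop as soon as the prompt has been seen
  end
  buffer
end

# Made-up transcript standing in for gdb's output stream.
fake_gdb = StringIO.new("GNU gdb banner\n(gdb) help break\nSet breakpoint...\n(gdb) ")

first  = till_prompt_or_eof(fake_gdb)   # banner up to the first prompt
second = till_prompt_or_eof(fake_gdb)   # everything up to the second prompt
third  = till_prompt_or_eof(fake_gdb)   # leftover text; no prompt before EOF
```

Note that the prompt is never followed by a newline in the fake transcript either, which is exactly the situation in which a gets-based reader would stall on a live process.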

Method 2 - Multiple interactions
This method is useful when you’d like to save the input and output objects returned by PTY.spawn and use them to interact with the process multiple times. Let’s rewrite the gdb.rb program in Listing 1 so that it can be used as a loadable library in irb, instead of as a one-shot program. Listing 3 shows this program. Notice the new gdb() method, which is provided as an API to run arbitrary gdb commands. Also notice that this time we store a reference to the input/output objects returned by PTY.spawn. Listing 4 shows how this version can be used more interactively.

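#!/usr/bin/env ruby
require 'pty'

# Read gdb output one character at a time until the next "(gdb)" prompt.
def till_prompt(cout)
    buffer = ""
    loop { buffer << cout.getc.chr; break if buffer =~ /\(gdb\)/ }
    return buffer
end

# Run an arbitrary gdb command and print everything up to the next prompt.
def gdb(string)
    @gdb_in.print("#{string}\n")
    puts till_prompt(@gdb_out)
end

# No block this time: keep the IO objects around for later calls to gdb().
@gdb_out, @gdb_in, @pid = PTY.spawn("gdb")
print till_prompt(@gdb_out)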

Listing 3 - gdb_irb.rb

[sudhanshu@sudhanshu-desktop]$ irb
>> require 'gdb_irb.rb'
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb)=> true
>> gdb "help break"
 help break
Set breakpoint at specified line or function.
break [LOCATION] [thread THREADNUM] [if CONDITION]
LOCATION may be a line number, function name, or "*" and an address.
If a line number is specified, break at start of code for that line.
If a function is specified, break at start of code for that function.
If an address is specified, break at that exact address.
With no LOCATION, uses current execution address of selected stack frame.
This is useful for breaking on return to a stack frame.

THREADNUM is the number from "info threads".
CONDITION is a boolean expression.

Multiple breakpoints at one place are permitted, and useful if conditional.

Do "help breakpoints" for info on other commands dealing with breakpoints.
(gdb)
=> nil
>> gdb "file /bin/date"
 file /bin/date
Reading symbols from /bin/date...(no debugging symbols found)...done.
(gdb)
=> nil
>> gdb "r"
 r
Starting program: /bin/date
[Thread debugging using libthread_db enabled]
Tue Aug 10 05:37:22 IST 2010

Program exited normally.
(gdb)
=> nil
>> @gdb_in.inspect
=> "#<File:/dev/pts/4>"
>>
?> @gdb_out.inspect
=> "#<File:/dev/pts/4>"
>>

Listing 4 - Interactive GDB in irb

Notice how irb “wraps” around gdb and creates a much more powerful debugging environment on top of it. We could, for instance, read large amounts of data from the process being debugged into Ruby variables/classes and analyze it using the far more powerful facilities provided by the Ruby programming environment, as compared to gdb macros/scripts.
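To get gdb output into Ruby variables, the text read from gdb first has to be cleaned up. The following is only a sketch of one possible approach: the helper name clean_gdb_output is hypothetical, and the sample string is made up to resemble the shape of the output in Listing 4 (an echoed command, the response, then the prompt).

```ruby
# Hypothetical helper: strip the echoed command and the trailing prompt from
# a chunk of gdb output, leaving only the response text.
def clean_gdb_output(raw, command)
  text = raw.gsub("\r\n", "\n")                            # normalise PTY newlines
  text = text.sub(/\A\s*#{Regexp.escape(command)}\n/, "")  # drop the echoed command
  text.sub(/\(gdb\)\s*\z/, "").strip                       # drop the trailing prompt
end

# Made-up sample in the shape of the "file /bin/date" exchange in Listing 4.
raw = " file /bin/date\r\nReading symbols from /bin/date...done.\r\n(gdb)"
lines = clean_gdb_output(raw, "file /bin/date").split("\n")
```

A gdb() method that returns such an array instead of printing it would let irb filter, count or compare the lines like any other Ruby data.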

Expect library

Now that we have seen how to spawn processes in Ruby and interact with them using custom code, let’s see how to use the expect library to interact with them. The expect library adds an expect() method to the IO class, which is the basis of all input/output in Ruby. The expect() method is just a beefed-up version of the till_prompt() method that we saw above.

While the till_prompt() method used a fixed pattern to match the next gdb prompt, the expect() method takes a ruby String or a regular expression object of type Regexp as a pattern to match against program output.

The till_prompt() method simply returned the whole buffer after matching the fixed pattern. The expect() method, however, can optionally take a Ruby block to execute as soon as the pattern matches. This block is passed the array containing the result of the match. If no block is given, expect() returns that result array: the buffer against which the pattern was matched, followed by the capture groups from the MatchData object returned by Regexp#match().

Apart from these, the expect() method can optionally take a timeout value in seconds as its second argument. If no match is found within the given time limit, it returns nil. The API can be summarized as follows:

result = io.expect("pattern" | /pattern/ [, timeout in secs]) [ { |array| ... } ]
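The summary above can be tried out without gdb. In this sketch an IO.pipe stands in for the spawned program (my assumption, purely to keep the example self-contained), and the text written to it is invented. It exercises a Regexp pattern with a capture group, a String pattern, a timeout and a block.

```ruby
require 'expect'

# A pipe stands in for the spawned program's output stream.
reader, writer = IO.pipe

writer.print("GNU gdb banner\r\n(gdb) ")
writer.flush
# Regexp pattern: element 0 is the buffer read so far, the rest are captures.
with_regexp = reader.expect(/\((\w+)\)/, 2)

writer.print("help\r\n(gdb) ")
writer.flush
# String pattern: matched literally, so the parentheses need no escaping.
with_string = reader.expect("(gdb)", 2)

# Nothing matching arrives within the timeout, so the result is nil.
timed_out = reader.expect("never printed", 0.2)

writer.print("done\r\n(gdb) ")
writer.flush
# With a block, the result array is handed to the block.
seen = nil
block_return = reader.expect(/\(gdb\)/, 2) { |array| seen = array }
```

Here with_regexp[0] holds the banner up to and including the prompt and with_regexp[1] holds the captured "gdb". Each call starts a fresh buffer, so the space printed after the first prompt shows up at the start of with_string[0].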

Note that if a block is passed into expect(), the result array is handed to the block instead of being returned. In the expect.rb that ships with Ruby, expect() itself then returns nil, so anything you need from the match has to be captured inside the block. Now let’s see the implementation of the above two programs using the expect() method.

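#!/usr/bin/env ruby
require 'pty'
require 'expect'

PTY.spawn("gdb") do |gdb_out, gdb_in, pid|
    # Wait for the first prompt, then send the help command from the block.
    gdb_out.expect(/\(gdb\)/) { |r| gdb_in.print("help #{ARGV[0]}\n") }
    # Element 0 of the result is everything read up to the next prompt.
    puts gdb_out.expect(/\(gdb\)/)[0]
end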

Listing 5 - gdb_expect.rb

#!/usr/bin/env ruby
require 'pty'
require 'expect'

# Run an arbitrary gdb command and print everything up to the next prompt.
def gdb(string)
    @gdb_in.print("#{string}\n")
    puts @gdb_out.expect(/\(gdb\)/)[0]
end

@gdb_out, @gdb_in, @pid = PTY.spawn("gdb")
puts @gdb_out.expect(/\(gdb\)/)[0]

Listing 6 - gdb_irb_expect.rb

Beginners with the Ruby expect library can get into trouble if they assume that the pattern passed to the expect() method is matched against each line output by the spawned program. This assumption is incorrect because, as we saw, the pattern is actually matched against all the characters read into a buffer, which includes newline characters. It’s also worth mentioning that the newlines seen by expect() are "\r\n" and not "\n", because the output passes through a pseudo-terminal.
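Both pitfalls can be seen in a small sketch. A pipe is used here to simulate PTY output with "\r\n" line endings; the helper name simulated_output and the text are assumptions made for illustration, not part of the expect library.

```ruby
require 'expect'

# Hypothetical helper: returns a readable IO preloaded with the given text,
# standing in for the output side of a spawned program.
def simulated_output(text)
  reader, writer = IO.pipe
  writer.print(text)
  writer.close
  reader
end

text = "first line\r\nsecond line\r\n(gdb) "

# The pattern is matched against the whole buffer, so the returned buffer
# also contains the earlier line, not just the line that matched.
whole = simulated_output(text).expect(/second/, 1)

# A pattern spanning two lines has to account for "\r\n" ...
crlf = simulated_output(text).expect(/line\r\nsecond/, 1)

# ... because the same pattern written with a bare "\n" never matches.
lf_only = simulated_output(text).expect(/line\nsecond/, 0.2)
```

Writing \r?\n in such patterns is one way to stay safe whether or not the stream went through a terminal.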

As a debugging mechanism, the expect library provides a global variable called $expect_verbose. Set this variable to true in your program, and the expect() method will print every character it reads, as it reads it, to stdout. This is an extremely useful tool for debugging expect programs.
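A quick way to see this in action, again with a pipe standing in for the spawned program (an assumption made only to keep the sketch self-contained):

```ruby
require 'expect'

reader, writer = IO.pipe
writer.print("Reading symbols...\r\n(gdb) ")
writer.close

$expect_verbose = true                 # echo each character as expect() reads it
result = reader.expect(/\(gdb\)/, 1)
$expect_verbose = false                # switch the echo off again
```

While the flag is set, everything expect() consumes scrolls past on stdout, which makes it easy to spot where a pattern stopped matching.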


20 high performance program design related links

This is a list of links to various articles, papers etc. related to high performance program design.

  1. What every programmer should know about memory?
  2. There is a lot more to hash functions than they teach you at school.
  3. How to search for the word pen1s in 185 emails every second.
  4. Regular expression matching can be simple and fast
  5. A scalable concurrent malloc implementation for FreeBSD
  6. Eventual consistency
  7. Latency lags bandwidth
  8. High Performance Server Design
  9. Michael Abrash’s Graphics Programming Black Book
  10. Virtual Machine Showdown: Stack vs Registers
  11. The C10k problem
  12. Judy Arrays
  13. Map-Reduce Framework
  14. Amdahl’s Law
  15. Pipelining: An overview - Part I
  16. Pipelining: An overview - Part II
  17. Wikipedia: CPU Cache
  18. Pentium: An Architectural History - Part I
  19. Pentium: An Architectural History - Part II

Ok, so you might be wondering, where’s the 20th link? I lied. You just get 19. In case anyone has more interesting links related to high performance program design, please do leave them in the comments!

Add 42klines search engine to Firefox’s search bar

I wanted an easier way to get to the 42klines search engine I created for UNIX programmers. So, I found out a few ways to put it on the Firefox search bar (IE7 supported as well). There are two easy ways, and another that is slightly more hands-on: the roll up your sleeves, grab the tools and get to work type.

The 3-step Install
First the easy one. Mozilla provides a Firefox extension called Add to Search Bar, which can apparently add any search bar on any website to the Firefox search bar. These are the steps:

  • Install the extension.
  • Right click on the 42klines search bar and select the “Add to Search Bar…” option
  • Change the name of the engine in the dialog box that pops up, if you like and you are done.

More details here. So that was the easy way. You can use this method to add a search bar from any web page to your browser’s search bar. If that worked for you and you are not interested in knowing anymore, skip the rest of the post.

Generate Plugin & Install
There is another way to add a search engine to your Firefox search bar. There is an online search engine plugin generator available for generating search plugins at www.searchplugins.net. Here are the steps to follow:

  1. Register on the www.searchplugins.net website
  2. Go to the search engine page and make a search for the word TEST.
  3. When the search results appear, copy the URL in the browser’s location bar. For the 42klines search engine this is the URL.
  4. Go to the plugin generator page and paste the URL in the “Search URL:” field of the form.
  5. Fill up the rest of the form and create the search engine plugin.
  6. You will be given the option to add the search engine plugin to the search bar. Click on the link and you are done.

This method is useful for people who provide a search engine of their own. If they make it public, their search engine will be listed among all the other search engines on www.searchplugins.net. You can also grab the generated plugin’s source file in the OpenSearch XML format from your account on this site. You can then use the source file to provide a link on your own website that runs a piece of JavaScript to install the search engine into a user’s browser search bar.

Getting your hands dirty
Now how about getting down to brass tacks and doing it the hard way? Mozilla provides a way to easily add search engines from their search-engine add page. But what if the search engine you want to add is not listed among those? Or what if you have created your own search engine and want to allow people to add it to their browser’s search bar through a link on your web page? Read on.

To allow adding your search engine to a user’s Firefox browser search bar from your web page, you need to follow these steps:

  1. Create an icon for your search engine and encode it in BASE64.
  2. Create a search engine descriptor file.
  3. Provide a link on your website which installs the search engine through Javascript.
  4. Optionally add search engine plugin auto-discovery support on your website.
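Steps 1, 2 and 4 above can be sketched in Ruby. Treat this as a rough illustration rather than the actual 42klines setup: the method names, the example.com URLs, the engine name and the icon bytes are all placeholders of mine. The XML follows the general shape of an OpenSearch description document, and replacing the word TEST in a copied search URL with {searchTerms} is my assumption about how the search URL template is derived.

```ruby
# Minimal XML attribute/text escaping for the placeholder values below.
def xml_escape(text)
  text.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# Steps 1 and 2: Base64-encode the icon and build the descriptor file.
def opensearch_descriptor(name:, description:, search_url:, icon_bytes:)
  icon_base64 = [icon_bytes].pack('m0')                  # Base64 without line breaks
  template = search_url.sub('TEST', '{searchTerms}')     # assumed placeholder swap
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      <ShortName>#{xml_escape(name)}</ShortName>
      <Description>#{xml_escape(description)}</Description>
      <Image width="16" height="16" type="image/x-icon">data:image/x-icon;base64,#{icon_base64}</Image>
      <Url type="text/html" template="#{xml_escape(template)}"/>
    </OpenSearchDescription>
  XML
end

# Step 4: the auto-discovery tag that goes into the page's <head>.
def autodiscovery_link(title:, descriptor_url:)
  %(<link rel="search" type="application/opensearchdescription+xml" ) +
    %(title="#{xml_escape(title)}" href="#{xml_escape(descriptor_url)}">)
end

descriptor = opensearch_descriptor(
  name: "Example Search",                                # placeholder values throughout
  description: "Placeholder search engine",
  search_url: "http://example.com/search?q=TEST",
  icon_bytes: "abc"                                      # stand-in for real .ico file contents
)
link_tag = autodiscovery_link(title: "Example Search",
                              descriptor_url: "http://example.com/opensearch.xml")
```

In practice icon_bytes would come from reading the icon file in binary mode, and the descriptor string would be saved to the URL that the link tag points at.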

Older browsers did not have the support for search bars, so you need not worry about them. Firefox 1.5 used another format called Sherlock for adding a search engine to the browser search bar. Newer browsers (Firefox 2.0+ & IE7) use a format called OpenSearch. The above links describe creating a search engine plugin using OpenSearch. OpenSearch allows a lot more than described in the links present in the above steps. Read through the documentation in case you are interested to know more.

Resources
Mozilla Mycroft Project: Search engine plugins for Firefox & IE7
Mycroft Plugin Generator: Advanced search engine plugin generator
Mycroft Search Engine Submission: Submit plugin to the Mycroft directory (OpenSearch format)
Mycroft Sherlock Submission: Mycroft page for submitting legacy Sherlock plugins
Mozilla Extensions: Create a Firefox extension if you like.
Encode data in Base64: Online tool you can use for encoding the search engine image in Base64.

42klines: A Search Engine For UNIX Programmers

Are you a UNIX programmer? Then this may be very useful to you.

Google offers the ability to create a customized search engine (CSE) which searches a list of sites given by you. I decided to take it for a test drive. I ended up with a surprisingly useful search engine customized to serve UNIX programmers. It currently searches more than 400 websites which are useful for UNIX programmers. You will find a search box which looks like this at the top of this blog.

Unix Programmer's Search Engine


Why is it useful?

If you do a Google web search, the search engine cannot immediately identify the context in which you have done the search. A keyword such as “signals” can imply different things (traffic signals, hand signals, UNIX signals?). In order to be useful to all people, Google returns results from several different contexts, where applicable. This means Google web search can end up wasting your time (you’ll have to filter results manually) while reducing the relevance of results in your context. A CSE, however, only returns results from the sites you have chosen, so they stay within your context.

I realized the usefulness of this recently, while discussing the semantics of signal handling by multi-threaded processes in Linux with a colleague. The problem we were facing was related to the way gdb was handling signals received by a multi-threaded process we were tracing. We were not sure about the current Linux semantics, so we decided to search. Coincidentally, I had added around 200 websites related to UNIX programming to this custom search engine a few days earlier. So I decided to give it a test drive. I searched for signals thread. The top 5 results from the CSE gave me more than I needed to know about Linux signal handling in multi-threaded processes. I compared the results with Google web search, and found that a very good article related to this topic was not present at all in the first few pages of the web search results! Moreover, I found that almost all the CSE’s results on the first page were directly relevant to what I wanted to know, while the quality of web search results wasn’t that high.

The results on the web weren’t that bad, but they were not the best either. Google has done a good job with the custom search engine offering. Take a look at the results from the first page of web search below. Then try the 42klines search. Do you see the difference?

Results of Web Search

No! I am not interested in girls who give me mixed signals.

What websites does 42klines search?

For a start I have seeded the engine with more than 400 websites which can be useful to UNIX programmers. They loosely fall in the following categories:

  1. Research organizations (IEEE, ACM, Citeseer etc.)
  2. UNIX/Programming Magazines (DDJ, Linux Journal, LWN , KernelTrap etc.)
  3. Forums (Interesting google groups, etc)
  4. OS development resources (NonDot, Sandpile, x86 etc)
  5. Bookmarking sites (Reddit, Del.icio.us)
  6. Free web hosted books (Linux Device Drivers, OpenBookProject etc.)
  7. Document hosting sites (Scribd, Wikipedia, Linux HOWTOs etc.)
  8. Blogs and personal websites hosting useful programming information (Robert Love, Ulrich Drepper etc.)
  9. University courses available online and useful for UNIX programmers (MIT Open Courseware etc.)
  10. Application hosting/indexing websites (Sourceforge, FSF etc.)
  11. Conferences (USENIX, Linux Conferences etc.)
  12. Miscellaneous pages

Can I put this search engine on my own website?

Yes, you can easily do that. The search engine hosted on this website is a linked CSE. Another flavor of it called the stored CSE, is hosted in Google’s databases. The differences between the two flavors have been detailed later on in the post. You can easily add the stored CSE flavor to your iGoogle page as a gadget. You can download this code to put the 42klines search engine on your blog or website. Customize the look in whatever way you want. The search results are hosted on a page on this website, because that page requires another snippet of code from Google. If you want to host the results on your website, let me know. I’ll provide the code necessary to do so. You can skip the rest of the post, if you are not interested in knowing how the search engine works. If you want to add your own bookmarks useful for UNIX programmers in the 42klines search engine, read on. A few useful resources are listed at the end of this post.

Can I add my own bookmarks to 42klines?

Whenever I find good links or websites useful to me as a UNIX programmer I plan to add them to this search engine, for everyone’s benefit. The list of websites which are currently indexed is given to Google in an annotation file in the XML format. The annotation file for the 42klines search engine is hosted in a Subversion repository: http://svn2.assembla.com/svn/42klines_search on Assembla. Assembla hosts Subversion repositories for projects. If you are interested in adding more links to 42klines, send a mail to me at sudhanshu.goswami at 42klines dot com. I’ll send an invite to you from Assembla. Check out the 42klines search engine’s websites list by running this command:

svn checkout http://svn2.assembla.com/svn/42klines_search

If you prefer GUIs, you can also use RapidSVN on Linux to do the same. The 42klines search engine on this website is a linked CSE. It has a stored CSE flavor as well. The differences between the two flavors are detailed in the next section. The list of websites to be searched is maintained in a different way for each flavor. Going forward, I plan to update the linked CSE first, while periodically bringing the stored CSE in sync with it. I maintain two flavors because it is easy to add the stored CSE to iGoogle as a gadget.

Custom Search Engine Flavors

The table below describes the differences between a linked CSE and a stored CSE.

Stored Custom Search Engine | Linked Custom Search Engine
Can be built using wizards hosted here. | Metafiles can only be created manually.
Websites searched are stored in Google's database. | Websites searched are stored in an annotation file hosted on your server.
Websites added to search engine database get immediately reflected in the search results. | Websites added to annotation files will get reflected in the search results on the next refresh by Google. To immediately refresh or test annotation file, you can use this tool.
Maximum number of sites = 5000. | Multiple annotation files allowed. Each file's max size = 3 MB. Total file sizes <= 10 MB.
Get their own Google hosted web pages like this. | No home page for a linked CSE created on Google. You can create your own home page for it.
People can volunteer to contribute from a stored CSE's home page. | This option is not available for a linked CSE.
Restricted in number of things possible. | Be creative. You can customize your annotation files on the fly. How? You can switch from a stored CSE to a linked CSE like this.
Google provides links to add this kind of an engine easily to your blog or iGoogle home page. E.g. use this to add it to your iGoogle page. | Linked CSE has to be manually added to a website. E.g. Linked CSE flavor of 42klines search engine can be added by downloading and adding this piece of code to your website.

Getting your hands dirty

This section is just a blurb about things to know, while working with Google’s custom search engine. I’ll list them down pointwise.

  1. Opera’s latest version does not seem to be supported. Some features like saving options for the search engine worked, but the “Save” button got permanently disabled after saving. These kinds of problems may occur if you are using uncommon browsers. YMMV.
  2. I tried to replace the context file of the stored search engine with that of the linked search engine using the Advanced tab of the search engine’s wizard interface, however it did not work. So, no home page for the linked CSE could be created on Google.
  3. If you are not trying to customize a search engine in non-traditional ways, and just want a search box for your blog/homepage, you are better off sticking to a Stored custom search engine. However, if you have got special needs or have more than 5000 websites to search, you’ll have to use a linked search engine.
  4. Google’s custom search engines can be customized to a great extent to give highly targeted results. This can be achieved by assigning topics to websites and labeling them. Labels can be used to tweak the search results in the favor of websites stamped with a particular label or completely provide search results only from websites stamped with that label. Further a boost factor can be associated with websites to boost search results from them. You can refer to this CSE glossary, if you are having trouble following these terms.
  5. Google’s management interface for stored CSEs does not provide the ability to assign labels, boost strengths for some websites, add filters, create nested search engines etc. You can do all of these with stored CSEs, but you will have to first download the annotation file for the stored websites and the context file for your stored search engine. Then you will have to edit them manually and upload them. This can be done from the Advanced tab of the management interface.
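Since annotation files are plain XML, they are easy to generate from a list of sites. The sketch below shows the idea; the method name, the site patterns, the label name and the score are placeholders of mine, and the element and attribute names reflect my understanding of the annotation format, so check them against Google’s documentation before relying on them.

```ruby
# Hypothetical generator for a CSE annotation file. Every site gets the same
# label; an optional score acts as a boost factor for all of them.
def annotation_xml(sites, label:, score: nil)
  entries = sites.map do |pattern|
    score_attr = score ? %( score="#{score}") : ""
    %(  <Annotation about="#{pattern}"#{score_attr}>\n) +
      %(    <Label name="#{label}"/>\n) +
      %(  </Annotation>)
  end
  "<Annotations>\n#{entries.join("\n")}\n</Annotations>\n"
end

# Placeholder site patterns and label name, not the real 42klines list.
sites = ["www.example.org/*", "docs.example.com/unix/*"]
xml = annotation_xml(sites, label: "_cse_example", score: 1)
```

Keeping the site list in a simple Ruby array (or a text file read into one) makes it easy to regenerate the annotation file whenever new links are added.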

Resources

42klines CSE: Download code to put on your website here
42klines iGoogle gadget: Add this search engine to your iGoogle page
42klines subversion repository
Coopdir: Directory of custom search engines.
GooglePicks: Picked custom search engines by Google.
RubyCorner: A custom search engine for Ruby programmers.
Python CSE: A custom search engine for Python programmers.
Linux: A custom search engine for linux users created by a sysadmin.

Update: Some cleanups done to the post. Added a table of contents, but unfortunately the anchor links did not work as expected. Still trying to figure out how to fix this. [Mar 2: Fixed. At the cost of breaking previous permalinks. Please update any bookmarks to permanent links. This site is going through some initial growing pains.]