20 high performance program design related links

This is a list of links to various articles, papers etc. related to high performance program design.

  1. What every programmer should know about memory
  2. There is a lot more to hash functions than they teach you at school.
  3. How to search for the word pen1s in 185 emails every second.
  4. Regular expression matching can be simple and fast
  5. A scalable concurrent malloc implementation for FreeBSD
  6. Eventual consistency
  7. Latency lags bandwidth
  8. High Performance Server Design
  9. Michael Abrash’s Graphics Programming Black Book
  10. Virtual Machine Showdown: Stack vs Registers
  11. The C10k problem
  12. Judy Arrays
  13. Map-Reduce Framework
  14. Amdahl’s Law
  15. Pipelining: An overview - Part I
  16. Pipelining: An overview - Part II
  17. Wikipedia: CPU Cache
  18. Pentium: An Architectural History - Part I
  19. Pentium: An Architectural History - Part II

Ok, so you might be wondering, where’s the 20th link? I lied. You just get 19. In case anyone has more interesting links related to high performance program design, please do leave them in the comments!

Add 42klines search engine to Firefox’s search bar

I wanted an easier way to get to the 42klines search engine I created for UNIX programmers, so I found a few ways to put it on the Firefox search bar (IE7 is supported as well). There are a couple of easy ways, and another slightly more hands-on, roll-up-your-sleeves, grab-the-tools-and-get-to-work way.

The 3-step Install
First, the easy one. Mozilla provides a Firefox extension called Add to Search Bar, which can apparently add any search box on any website to the Firefox search bar. These are the steps:

  • Install the extension.
  • Right click on the 42klines search bar and select the “Add to Search Bar…” option
  • Optionally change the name of the engine in the dialog box that pops up, and you are done.

More details here. So that was the easy way. You can use this method to add a search bar from any web page to your browser’s search bar. If that worked for you and you are not interested in knowing any more, skip the rest of the post.

Generate Plugin & Install
There is another way to add a search engine to your Firefox search bar: the online search plugin generator at www.searchplugins.net. Here are the steps to follow:

  1. Register on the www.searchplugins.net website
  2. Go to the search engine page and make a search for the word TEST.
  3. When the search results appear, copy the URL in the browser’s location bar. For the 42klines search engine this is the URL.
  4. Go to the plugin generator page and paste the URL in the “Search URL:” field of the form.
  5. Fill in the rest of the form and create the search engine plugin.
  6. You will be given the option to add the search engine plugin to the search bar. Click on the link and you are done.
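Searching for TEST gives the generator a results URL that contains a word it can recognize. Presumably it then swaps that word for the OpenSearch {searchTerms} placeholder, so the browser can substitute whatever you type. Here is a Python sketch of that substitution, written under that assumption. The function name and the example.com URL are hypothetical.

```python
def to_opensearch_template(result_url: str, marker: str = "TEST") -> str:
    """Replace the query parameter whose value is `marker` with {searchTerms}."""
    base, has_query, query = result_url.partition("?")
    if not has_query:
        raise ValueError("the results URL has no query string")
    # Drop any #fragment; browsers never send it to the server.
    query = query.split("#", 1)[0]
    fields = []
    replaced = False
    for field in query.split("&"):
        name, eq, value = field.partition("=")
        if value == marker:
            value = "{searchTerms}"  # OpenSearch placeholder for the user's query
            replaced = True
        fields.append(name + eq + value)
    if not replaced:
        raise ValueError("no parameter carries the marker %r" % marker)
    return base + "?" + "&".join(fields)


print(to_opensearch_template("http://example.com/results?cx=abc&q=TEST&sa=Search"))
```

The other parameters, such as the engine ID in `cx`, are copied unchanged. That is why copying a real results URL is enough for the generator to work.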

This method is useful for people who provide a search engine of their own. If they make it public, their search engine will be listed among all the other search engines on www.searchplugins.net. You can also grab the generated plugin’s source file, in the OpenSearch XML format, from your account on the site. With that file you can put a link on your own website that runs a bit of JavaScript to install the search engine in a visitor’s browser search bar.

Getting down and dirty
Now, how about getting down to brass tacks and doing it the hard way? Mozilla provides an easy way to add search engines from its search-engine add page. But what if the search engine you want to add is not listed there? Or what if you have created your own search engine and want to let people add it to their browser’s search bar through a link on your web page? Read on.

To let users add your search engine to their Firefox search bar from your web page, follow these steps:

  1. Create an icon for your search engine and encode it in Base64.
  2. Create a search engine descriptor file.
  3. Provide a link on your website which installs the search engine through Javascript.
  4. Optionally add search engine plugin auto-discovery support on your website.
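Steps 1, 3 and 4 mostly come down to producing small strings. Here is a minimal Python sketch of them. It assumes you have already read your icon file into bytes and that you host the descriptor at a URL of your choosing. The helper names and the example.com URL are hypothetical. The page-side pieces are the ones browsers of this era (Firefox 2+, IE7) recognized: `window.external.AddSearchProvider()` installs an OpenSearch engine from a link, and `<link rel="search">` provides OpenSearch auto-discovery.

```python
import base64
import html
import json


def icon_data_uri(icon_bytes, mime_type="image/x-icon"):
    """Step 1: Base64-encode raw icon bytes as a data: URI for the descriptor's <Image>."""
    return "data:%s;base64,%s" % (mime_type, base64.b64encode(icon_bytes).decode("ascii"))


def install_link(descriptor_url, text):
    """Step 3: an <a> element whose onclick asks the browser to install the engine."""
    # json.dumps produces a quoted JavaScript string literal; html.escape then
    # makes it safe to embed inside the double-quoted onclick attribute.
    script = "window.external.AddSearchProvider(%s); return false;" % json.dumps(descriptor_url)
    return '<a href="#" onclick="%s">%s</a>' % (html.escape(script), html.escape(text))


def autodiscovery_link(descriptor_url, title):
    """Step 4: a <link> for the page <head> so browsers can discover the engine."""
    return ('<link rel="search" type="application/opensearchdescription+xml" '
            'title="%s" href="%s">' % (html.escape(title), html.escape(descriptor_url)))


# Hypothetical usage; for a real icon:
# with open("favicon.ico", "rb") as f:
#     print(icon_data_uri(f.read()))
print(install_link("http://example.com/42klines.xml", "Add 42klines to your search bar"))
print(autodiscovery_link("http://example.com/42klines.xml", "42klines"))
```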

Older browsers did not support search bars, so you need not worry about them. Firefox 1.5 used a different format, called Sherlock, for adding a search engine to the browser search bar. Newer browsers (Firefox 2.0+ and IE7) use a format called OpenSearch. The links in the steps above describe creating a search engine plugin using OpenSearch. OpenSearch supports a lot more than those links cover, so read through its documentation if you are interested in knowing more.
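For step 2, the descriptor is a small XML file. Here is a minimal Python sketch that writes an OpenSearch 1.1 description containing a short name, a description, an optional Base64 icon and the results URL template. The element names and the 16-character ShortName limit come from the OpenSearch 1.1 specification. The function name and sample values are hypothetical, and a real plugin may want more elements than these.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def opensearch_descriptor(short_name, description, url_template, icon_data_uri=None):
    """Return an OpenSearch 1.1 description document as a string."""
    if len(short_name) > 16:
        # OpenSearch 1.1 limits ShortName to 16 characters.
        raise ValueError("ShortName must be 16 characters or fewer")
    root = ET.Element("OpenSearchDescription", xmlns=OPENSEARCH_NS)
    ET.SubElement(root, "ShortName").text = short_name
    ET.SubElement(root, "Description").text = description
    ET.SubElement(root, "InputEncoding").text = "UTF-8"
    if icon_data_uri:
        image = ET.SubElement(root, "Image", width="16", height="16", type="image/x-icon")
        image.text = icon_data_uri
    # The results page; {searchTerms} is replaced by whatever the user typed.
    ET.SubElement(root, "Url", type="text/html", template=url_template)
    return ET.tostring(root, encoding="unicode")


print(opensearch_descriptor("42klines", "Search engine for UNIX programmers",
                            "http://example.com/results?q={searchTerms}"))
```

ElementTree escapes characters such as `&` in the template, so a URL with several query parameters still produces valid XML.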

Resources
Mozilla Mycroft Project: Search engine plugins for Firefox & IE7
Mycroft Plugin Generator: Advanced search engine plugin generator
Mycroft Search Engine Submission: Submit plugin to the Mycroft directory (OpenSearch format)
Mycroft Sherlock Submission: Mycroft page for submitting legacy Sherlock plugins
Mozilla Extensions: Create a Firefox extension if you like.
Encode data in Base64: Online tool you can use for encoding the search engine image in Base64.

42klines: A Search Engine For UNIX Programmers

Are you a UNIX programmer? Then this may be very useful to you.

Google offers the ability to create a custom search engine (CSE) that searches only a list of sites you provide. I decided to take it for a test drive and ended up with a surprisingly useful search engine customized for UNIX programmers. It currently searches more than 400 websites that are useful for UNIX programmers, and you can find its search box at the top of this blog.



Why is it useful?

When you do a Google web search, the search engine cannot immediately identify the context of your search. A keyword such as “signals” can mean different things (traffic signals, hand signals, UNIX signals?). To be useful to everyone, Google mixes results from different contexts, where applicable, into its search results. This means a Google web search can end up wasting your time (you’ll have to filter results manually) and makes the results less relevant to your context. A CSE, however, returns results only from the sites it has been given, so the results stay much closer to what you want.

I realized how useful this is while recently discussing, with a colleague, how signals are handled by multi-threaded processes in Linux. The problem we were facing was related to the way gdb handled signals received by a multi-threaded process we were tracing. We were not sure about the current Linux semantics, so we decided to search. Coincidentally, a few days earlier I had added around 200 websites related to UNIX programming to this custom search engine, so I decided to give it a test drive. I searched for “signals thread”. The top 5 results from the CSE told me more than I needed to know about Linux signal handling in multi-threaded processes. When I compared them with the Google web search results, I found that a very good article on this topic did not appear at all in the first few pages of the web search results! Moreover, almost all of the CSE’s results on the first page were directly relevant to what I wanted to know, while the quality of the web search results wasn’t that high.

The results on the web weren’t that bad, but they were not the best either. Google has done a good job with the custom search engine offering. Take a look at the results from the first page of web search below. Then try the 42klines search. Do you see the difference?

Results of Web Search

No! I am not interested in girls who give me mixed signals.

What websites does 42klines search?

To start with, I have seeded the engine with more than 400 websites that can be useful to UNIX programmers. They loosely fall into the following categories:

  1. Research organizations (IEEE, ACM, Citeseer etc.)
  2. UNIX/Programming Magazines (DDJ, Linux Journal, LWN, KernelTrap etc.)
  3. Forums (interesting Google Groups, etc.)
  4. OS development resources (NonDot, Sandpile, x86 etc)
  5. Bookmarking sites (Reddit, Del.icio.us)
  6. Free web hosted books (Linux Device Drivers, OpenBookProject etc.)
  7. Document hosting sites (Scribd, Wikipedia, Linux HOWTOs etc.)
  8. Blogs and personal websites hosting useful programming information (Robert Love, Ulrich Drepper etc.)
  9. University courses available online and useful for UNIX programmers (MIT Open Courseware etc.)
  10. Application hosting/indexing websites (Sourceforge, FSF etc.)
  11. Conferences (USENIX, Linux Conferences etc.)
  12. Miscellaneous pages

Can I put this search engine on my own website?

Yes, you can easily do that. The search engine hosted on this website is a linked CSE. Another flavor of it, called the stored CSE, is hosted in Google’s databases. The differences between the two flavors are detailed later in the post. You can easily add the stored CSE flavor to your iGoogle page as a gadget. You can download this code to put the 42klines search engine on your blog or website, and customize the look in whatever way you want. The search results are hosted on a page on this website, because that page requires another snippet of code from Google. If you want to host the results on your own website, let me know and I’ll provide the code necessary to do so. You can skip the rest of the post if you are not interested in how the search engine works. If you want to add your own bookmarks useful for UNIX programmers to the 42klines search engine, read on. A few useful resources are listed at the end of this post.

Can I add my own bookmarks to 42klines?

Whenever I find good links or websites useful to me as a UNIX programmer, I plan to add them to this search engine for everyone’s benefit. The list of indexed websites is given to Google as an annotation file in XML format. The annotation file for the 42klines search engine is hosted in a Subversion repository on Assembla, which hosts Subversion repositories for projects: http://svn2.assembla.com/svn/42klines_search. If you are interested in adding more links to 42klines, send me an email at sudhanshu.goswami at 42klines dot com and I’ll send you an invite from Assembla. Check out the 42klines search engine’s list of websites by running this command:

svn checkout http://svn2.assembla.com/svn/42klines_search

If you prefer GUIs, you can also use RapidSVN on Linux to do the same. The 42klines search engine on this website is a linked CSE, and it has a stored CSE flavor as well. The differences between the two flavors are detailed in the next section. The list of websites to search is maintained differently for each flavor. Going forward, I plan to update the linked CSE first and periodically bring the stored CSE in sync with it. I maintain both flavors because the stored CSE is easy to add to iGoogle as a gadget.
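After checking out the annotation file, it can help to see which sites it covers. Here is a minimal Python sketch that lists the site patterns in an annotation file and roughly checks whether a URL falls under one of them. That restriction is what keeps a CSE's results on topic. The element names follow the annotation layout Google documented for CSEs (`Annotations`, `Annotation about="..."`, `Label`). The wildcard matching is a simplification, not Google's actual matching rule, and the sample file and label name are made up.

```python
import fnmatch
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Made-up sample in the assumed annotation layout.
SAMPLE = """<Annotations>
  <Annotation about="lwn.net/*"><Label name="_cse_example"/></Annotation>
  <Annotation about="*.kernel.org/*"><Label name="_cse_example"/></Annotation>
</Annotations>"""


def site_patterns(annotation_xml):
    """Return the 'about' URL patterns listed in an annotation file."""
    root = ET.fromstring(annotation_xml)
    return [a.get("about") for a in root.iter("Annotation") if a.get("about")]


def is_covered(url, patterns):
    """Rough check: does host + path match one of the wildcard patterns?

    Simplification: fnmatch's '*' also matches '/', so this is looser than
    a real URL-pattern matcher.
    """
    parts = urlsplit(url)
    target = parts.netloc.lower() + (parts.path or "/")
    return any(fnmatch.fnmatchcase(target, p.lower()) for p in patterns)


patterns = site_patterns(SAMPLE)
print(patterns)
print(is_covered("http://lwn.net/Articles/1/", patterns))
```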

Custom Search Engine Flavors

The table below describes the differences between a linked CSE and a stored CSE.

Stored Custom Search Engine | Linked Custom Search Engine
Can be built using the wizards hosted here. | Metafiles can only be created manually.
Websites searched are stored in Google's database. | Websites searched are stored in an annotation file hosted on your server.
Websites added to the search engine database are reflected in the search results immediately. | Websites added to annotation files are reflected in the search results on Google's next refresh. To refresh or test an annotation file immediately, you can use this tool.
Maximum of 5000 sites. | Multiple annotation files are allowed; each file can be at most 3 MB, and all files together at most 10 MB.
Gets its own Google-hosted home page, like this. | No home page is created on Google for a linked CSE; you can create your own home page for it.
People can volunteer to contribute from a stored CSE's home page. | This option is not available for a linked CSE.
Restricted in what is possible. | Be creative: you can customize your annotation files on the fly. How? You can switch from a stored CSE to a linked CSE like this.
Google provides links to add this kind of engine easily to your blog or iGoogle home page, e.g. use this to add it to your iGoogle page. | A linked CSE has to be added to a website manually, e.g. the linked CSE flavor of the 42klines search engine can be added by downloading this piece of code and adding it to your website.
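The limits in the table are easy to check before uploading. Here is a small Python sketch that checks annotation file sizes against the linked CSE limits (3 MB per file, 10 MB total). It also reports whether a site count exceeds the stored CSE's 5000-site cap. It assumes "MB" means 2^20 bytes, which the table does not say. The function names are hypothetical.

```python
MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes
MAX_FILE_BYTES = 3 * MB
MAX_TOTAL_BYTES = 10 * MB
MAX_STORED_SITES = 5000


def annotation_limit_problems(file_sizes):
    """List limit violations for a linked CSE's annotation files.

    file_sizes maps a file name to its size in bytes.
    """
    problems = []
    for name, size in sorted(file_sizes.items()):
        if size > MAX_FILE_BYTES:
            problems.append("%s is %d bytes, over the 3 MB per-file limit" % (name, size))
    total = sum(file_sizes.values())
    if total > MAX_TOTAL_BYTES:
        problems.append("annotation files total %d bytes, over the 10 MB limit" % total)
    return problems


def needs_linked_cse(num_sites):
    """A stored CSE tops out at 5000 sites; beyond that a linked CSE is required."""
    return num_sites > MAX_STORED_SITES


print(annotation_limit_problems({"sites.xml": 4 * MB}))
print(needs_linked_cse(5200))
```

For files on disk, you could build the mapping with `{path: os.path.getsize(path) for path in paths}`.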

Burning your own fingers

This section is just a quick list of things worth knowing when working with Google’s custom search engines.

  1. Opera’s latest version does not seem to be supported. Some features, like saving options for the search engine, worked, but the “Save” button got permanently disabled after saving. These kinds of problems may occur if you are using less common browsers. YMMV.
  2. I tried to replace the context file of the stored search engine with that of the linked search engine using the Advanced tab of the search engine’s wizard interface, but it did not work. So no home page for the linked CSE could be created on Google.
  3. If you are not trying to customize a search engine in non-traditional ways and just want a search box for your blog/homepage, you are better off sticking to a stored custom search engine. However, if you have special needs or have more than 5000 websites to search, you’ll have to use a linked search engine.
  4. Google’s custom search engines can be customized to a great extent to give highly targeted results. This is done by assigning topics to websites and labeling them. Labels can be used to tilt the search results in favor of websites stamped with a particular label, or to return results only from websites stamped with that label. Further, a boost factor can be associated with websites to boost search results from them. You can refer to this CSE glossary if you are having trouble following these terms.
  5. Google’s management interface for stored CSEs does not provide the ability to assign labels, set boost strengths for some websites, add filters, create nested search engines etc. You can still do all of these with stored CSEs, but you will have to first download the annotation file for the stored websites and the context file for your stored search engine, then edit them manually and upload them again. This can be done from the Advanced tab of the management interface.
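As a rough illustration of the manual editing described in points 4 and 5, here is a Python sketch that writes an annotation file. It stamps each site with a label and an optional boost. The layout (`Annotations` > `Annotation about="..." score="..."` > `Label name="..."`) and the -1 to 1 range for `score` are assumptions based on the annotation format Google documented for CSEs; check the current documentation before uploading anything. The label name `_cse_unix` is hypothetical. A real file would use the label defined in your context file.

```python
import xml.etree.ElementTree as ET


def build_annotations(site_patterns, label, score=None):
    """Return annotation XML that stamps every site pattern with `label`.

    site_patterns: URL patterns such as 'lwn.net/*' (format assumed).
    label: label name from your context file ('_cse_unix' below is made up).
    score: optional boost; assumed to be a float between -1.0 and 1.0.
    """
    if score is not None and not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    root = ET.Element("Annotations")
    for pattern in site_patterns:
        annotation = ET.SubElement(root, "Annotation", about=pattern)
        if score is not None:
            annotation.set("score", str(score))  # boost results from this site
        ET.SubElement(annotation, "Label", name=label)
    return ET.tostring(root, encoding="unicode")


print(build_annotations(["lwn.net/*", "www.kernel.org/*"], "_cse_unix", score=1.0))
```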

Resources

42klines CSE: Download code to put on your website here
42klines iGoogle gadget: Add this search engine to your iGoogle page
42klines subversion repository
Coopdir: Directory of custom search engines.
GooglePicks: Picked custom search engines by Google.
RubyCorner: A custom search engine for Ruby programmers.
Python CSE: A custom search engine for Python programmers.
Linux: A custom search engine for Linux users created by a sysadmin.

Update: Some cleanups done to the post. Added a table of contents, but unfortunately the anchor links did not work as expected. Still trying to figure out how to fix this. [Mar 2: Fixed. At the cost of breaking previous permalinks. Please update any bookmarks to permanent links. This site is going through some initial growing pains.]