March 28, 2003

Writing code

Being aligned with your work as a programmer or software engineer.  It is a good article about acceptance.

March 25, 2003

Web-Scraping is still a valuable skill despite what the pundits might say

Web-scraping may have several meanings, so I will define it in the context of this article to avoid confusing you, the reader.  It means taking some specific information that is available when you visit a specific webpage and doing something with it.  Things that are interesting to a person visiting a website may be: 

  • Text embedded within a specific table or other text (most of the time, this is what you would like to scrape)

  • Hyperlinks provided on the page.  They can be static or dynamic.

  • Dynamically generated graphics.  It could be charts or maps created from a provided input.

Well, if they are interesting to a person, maybe they would be interesting to a program too.  As an example, go to this URL:

The returned page has lots of information available regarding the Microsoft stock.  The specific information that your application needs depends on you.  It can be a piece of relevant text (current stock price, volume information, amount changed), a list of links with the latest press releases about Microsoft, or anything else interesting returned by the Yahoo server.  Assuming that we are interested in the stock price, various methods of text processing can be used to extract the number that we are interested in.  From that number, we can go on to use our computer more fully to perform many other calculations and presentations on our desktop. 
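As a rough illustration of the text-processing step, here is a minimal Python sketch.  The HTML fragment, the "Last Trade:" label, and the numbers in it are all made up for the example; a real quote page will use different markup, so the pattern would have to be adapted to whatever the page actually returns.

```python
import re

# Hypothetical fragment standing in for a quote page; the label and the
# numbers are invented for this example and real markup will differ.
SAMPLE_PAGE = """
<table>
  <tr><td>Last Trade:</td><td><b>24.67</b></td></tr>
  <tr><td>Volume:</td><td>51,234,900</td></tr>
</table>
"""

def extract_last_trade(html):
    """Return the number that follows the 'Last Trade:' label, or None."""
    # Skip any tags between the label and the first number after it.
    match = re.search(
        r"Last Trade:\s*(?:<[^>]+>\s*)*([0-9][0-9,]*\.?[0-9]*)", html
    )
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))

price = extract_last_trade(SAMPLE_PAGE)
print(price)
```

Once the number is a float instead of a piece of a webpage, the rest is ordinary programming.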

Of course, the companies presenting the information intend for the public to access it by hand (i.e. go to the site with your web browser, surf to the page, and read the information along with all the advertisements that accompany it).  This is all fine and dandy if all we want is a handful of data (count your fingers)... but what if we want a basketful?  Then what about the occasional.... carload?  That's a job for the computer... 

Ok, the typical pundit would say that this is the perfect example for using "web-services".  I touched upon web services a little about a year ago.  The pundit is right, if the company exports that specific interface for that specific data that resides on that specific webpage.  As of this writing, I still haven't found enough useful web-services to comment on.  However, there currently exist many sites that provide information too useful to ignore.  Let me just rattle off a few besides Yahoo finance:

  • Getting SEC Filing information (10-K/8-K) from Edgar:  Interesting bits would be the number of shares outstanding embedded within the 10-K.

  • Google searches returning the number of hits for your query.

  • Reading Yahoo groups without the advertisement banners.

  • Tracking packages - UPS, FedEx, USPS

In truth, the services provided by websites exist in their human accessible form (HaF).  The "web services" proponents push for their computer accessible form (CaF).  My contention is that the time-gap between the HaF being available and the CaF being available is too great to ignore.  By the time the CaF is made available, opportunities may have disappeared.  Web Scraping is the alternative that can exist in the meantime.  It creates an intermediary CaF from an existing HaF.  I admit that the algorithm to extract the data is fragile and breaks once someone changes the format of the webpage output.  But if you can accept this drawback, you can make the web work for you today.  Not tomorrow, next week, or next year.... now.
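One way to picture the intermediary CaF is a small function that hides the scraping behind a plain data structure and complains loudly when the page format changes.  The sketch below is only an assumption of how that might look: the field labels, the patterns, and the PageFormatChanged exception are hypothetical names chosen for the example.

```python
import re

class PageFormatChanged(Exception):
    """Raised when an expected label can no longer be found in the page."""

# Hypothetical labels; a real page needs its own patterns.
FIELDS = {
    "price": r"Last Trade:\s*(?:<[^>]+>\s*)*([0-9][0-9.,]*)",
    "volume": r"Volume:\s*(?:<[^>]+>\s*)*([0-9][0-9.,]*)",
}

def scrape_quote(html):
    """Turn the human accessible page into a dict a program can use."""
    record = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, html)
        if match is None:
            # Fail loudly instead of returning stale or partial data.
            raise PageFormatChanged("could not find field %r" % name)
        record[name] = float(match.group(1).replace(",", ""))
    return record
```

The rest of the program only ever sees the dict.  When the site is redesigned, the exception points at the one place that needs fixing, and if a real web service appears later, only this function has to be swapped out.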


March 18, 2003

Useful sites that are Unusable

As a developer using the Python language, I find the ASPN Python Cookbook to be a valuable resource for information about a variety of things that you can do.  The site mainly contains text recipes about how to perform certain tasks using Python.  It is decently designed so that getting around isn't too much of a problem.  Unfortunately, using the site IS very much a struggle.  It is so slow that I can barely tolerate more than a few clicks.  This pattern of "difficult to use" sites is becoming common.  Another example is , which ranks right up there in terms of containing useful information.

I wonder what could be the culprit for the sites being slow?  Is it because the application servers being used are so resource intensive that it takes longer to respond to a page request?  Is it because the servers are under-powered and the owners cannot afford better ones?  Whatever the reason, it is a shame, because enough users patronize these two sites to make it worthwhile to give them a better option.  Has anyone thought about mirroring these sites?  This is a useful (relatively low-tech) pattern that is increasingly being used on the internet.  It is one of the many ways in which edge networks can be implemented in a low-cost manner.

March 14, 2003

More class browsing

As Kevin mentioned in a comment on the "Class Browser" article, CTAGS would be nice. However, the last time I came across CTAGS was 10 years ago when I was still using VI. Call me lazy, but I have grown used to a full featured text editor with mouse scrolling, syntax highlighting, and a little code-completion. I remember just enough VI to do simple editing when I log into my Linux host. Well, the more I talk about it, the more I appreciate the completeness of Visual Studio.

Continuing in the same thought process, I have just completed reading Patrick O'Brien's article on introspection. Really interesting stuff and really impossible (or at the very least, extremely difficult) to do in C++. This is the type of information required to start building a class-browser. Hum.... Additionally, there is the pyclbr module which provides the basic mechanisms for implementing your own class-browser.
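To get a feel for what pyclbr offers, here is a minimal sketch that lists the classes in a module along with the file and line where each class and method is defined.  The class_outline helper is my own name for the example, not part of the module; pyclbr.readmodule and the descriptor attributes it uses (module, file, lineno, methods) are from the standard library.

```python
import pyclbr

def class_outline(module_name, path=None):
    """Map each class in a module to (file, line, {method: line}).

    pyclbr reads the source text, so the module is never imported or run.
    """
    outline = {}
    for name, desc in pyclbr.readmodule(module_name, path or []).items():
        if desc.module != module_name:
            continue  # skip classes pulled in from other modules
        outline[name] = (desc.file, desc.lineno, dict(desc.methods))
    return outline

if __name__ == "__main__":
    # Print a simple class/method outline for a standard library module.
    for cls, (filename, lineno, methods) in sorted(class_outline("queue").items()):
        print("%s  (%s:%d)" % (cls, filename, lineno))
        for method, method_line in sorted(methods.items(), key=lambda kv: kv[1]):
            print("    %s  line %d" % (method, method_line))
```

The file and line numbers are exactly what an editor would need in order to jump to the definition.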

What was that saying? Necessity is the mother of invention.....?

March 13, 2003

Python class browsers

After getting some pointers from folks in the comp.lang.python newsgroup, I decided to do some investigation on the various editor/developer tools for Python programming.  The original request and replies are here.  I will concentrate on the suggested tools first and will only mention the class browsing aspect.

  • IDLE - this is the tkinter based tool that comes with Python.  You can browse the python path for the classes that you are interested in.  Upon drilling into a specific method, a new instance is created allowing you to edit the source code of that class.  This is very nice and the tool is lightweight enough to be very quick.  However, the editor does not have code-folding.  Additionally, you can quickly end up with many windows on your desktop.

  • ERIC3 - I never got this to run because it is based on PyQt.  I downloaded it, and it required me to install Qt.  The Qt I got was an evaluation version, and I don't know how long it will last.  It is questionable to me whether Qt has enough of a following for me to try out a tool that is 2 dependencies away.  In short, I gave up trying to get it to run.

  • pyCrust - This tool allows me to inspect modules/classes loaded in the shell in real time.  As you import more modules, they are automatically added to the Ingredients pane and are ready to be inspected.  However, pyCrust is an exploratory tool and does not launch an editor upon choosing a method or attribute to drill into.

  • pythonwin - this is Mark Hammond's integrated editor and shell.  The editor is a notepad with code-folding.  It has a COM browser, file browser, and a realtime object browser like pyCrust.  There are lots of details revealed in the introspection, but it doesn't jump to your code upon drilling down.

  • Boa - this class browser maps out wxPython's class hierarchies but doesn't map to code.

  • Komodo - promises that it will be in a future release.  When will it be and what will be in it?  Only the Shadow knows.

There are at least twice as many more tools that allow for editing and various other levels of introspection.  However, my original question remains, and the pattern of doing development goes unanswered in Python.  Sniff++ was the first tool that allowed me to use that pattern to read/understand/write large bodies of code.  Visual Studio provided the mechanism to make MFC's large class library understandable.  They make large projects manageable.  The way each of these tools did it was to abstract out the File concept.  It didn't matter what File the attribute or method was in; the tool would take me to it.  
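The raw material for that pattern is already in Python's standard library.  As a minimal sketch (the locate helper is a hypothetical name of mine), the inspect module can report which file and line a class, function, or method lives on, which is the piece a tool would need in order to take you there:

```python
import inspect
import json

def locate(obj):
    """Return (source_file, first_line) for a class, function, or method.

    This only works for objects written in Python; objects implemented
    in C have no source file to jump to and will raise an error.
    """
    source_file = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return source_file, first_line

# The caller never has to know which file the method was defined in.
path, line = locate(json.JSONDecoder.decode)
print("%s:%d" % (path, line))
```

An editor that accepted a file:line pair could be pointed straight at the result.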

The pattern works very well.... now if only I can find a Python tool that implements it.

Posted by Hoang at 03:34 PM | 4 Comments | Python Programming

Software tools

These are some good tools I have recently come across:

I have been looking for a Python Class Browser for some time but have not come across a decent one.  Despite all the bashing of Microsoft technologies, it must be admitted that Visual Studio is the premier tool of choice for developers.  I find myself entirely dependent and hooked on its integration of a class browser when doing development with MFC.  Why is a class browser such a powerful tool?  Well, it allows the developer to abstract his thinking beyond the line-by-line code.  In an object oriented world, we assume the lower dependencies work, just like we assume a nail and hammer work when we are trying to put together a cabinet or something even bigger (a house).

So once it is understood that our dependencies work, we just put them aside and concentrate on the problem we are concerned with.  The class browser allows us to collapse the other classes and work with only those we are concerned with.  Additionally, the class list display is not constrained to displaying classes within one file.  It displays the entire class hierarchy of a particular project.  I know, some of you may say "what's the big deal about manually opening up a different file".  Not much, actually.  But it is an impediment in the thought process.  If you do it enough times, all those little impediments add up to a big stumbling block on the way to creating your solution.

If anyone has been using such a tool for Python, please drop me a tip.

March 11, 2003


A favorite read of mine, Dave Rogers' Time's Shadow (which I have been following for quite some time), talks about Individualism.  True individualism in America is very difficult.  It offers no social rewards and exacts a great amount of personal sacrifice.  Its pursuit is more difficult than swimming upstream.  We are bombarded with so many advertisements and other tools of social manipulation that having even a handful of personal thoughts a day is difficult.

Being situated in Florida (near Cocoa beach, I think), Dave gets to see the eastern sunrise often.  Sometimes it's the smallest of things that help us on the road to finding ourselves.

March 10, 2003

Just say NO to Release Early, Release Often

I tried out Sam Ruby's 3 pane aggregator the other day.  It worked and is a fairly good demonstration of the 3 pane concept.  I was using wxPython .  Today I upgraded to wxPython and ran the application again.  There were some deprecation warnings about using True rather than true.  Good.... wxPython was using a new built-in type.  But then the application crashed with 'UnregisterClass(canvas)'.  Umm, this was bad.  After looking at the source code, I didn't see anything significant that had changed.  Maybe it was something in "Unicode".  So I tried the unicode version, to no avail.  I backed off to and everything was fine and dandy.

Ok, the experience was unpleasant and silly, but it does tell me one thing I should keep in the back of my mind.  The old adage of "release early and release often" widely proclaimed by open source advocates is a double-edged sword.  The edge that stings is that the released software tends not to be thoroughly tested.  Additionally, little or no regression testing is done.  Regression testing means checking that what worked yesterday continues to work today.
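For what it's worth, a regression test does not have to be elaborate.  The sketch below uses the standard unittest module; normalize_flag is a hypothetical library function invented for the example (loosely inspired by the true/True change), and KNOWN_GOOD stands for inputs and outputs recorded from a release that was known to work.

```python
import unittest

def normalize_flag(value):
    """Hypothetical library function: accept old and new boolean spellings."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)

# Inputs and outputs recorded from the last release that was known to work.
KNOWN_GOOD = [
    ("true", True),
    ("True", True),
    ("false", False),
    (1, True),
    (0, False),
]

class RegressionTest(unittest.TestCase):
    def test_known_good_cases_still_pass(self):
        # What worked yesterday should still work today.
        for given, expected in KNOWN_GOOD:
            with self.subTest(given=given):
                self.assertEqual(normalize_flag(given), expected)
```

Saved to a file, this can be run with "python -m unittest" before every release; the list only ever grows as users report things they depend on.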

There is a lesson here that open source developers can learn from commercial software developers.  Breaking customer code that uses your product as a dependency is a devastating thing.  In the commercial world, the managers make sure you don't alienate customers by doing so.  If you go forward with this point of view, maybe you will think twice about having so many releases.

March 07, 2003

Trends in Technology

It's March of 2003, so the flurry of predictions for the new year should be over.  However, there is a good write-up of some trends in the IT industry and observations about the direction of Microsoft.  For those who work in the Tech industry, it helps to be aware of some of the long term trends in the field which you have chosen for your career.  It talks about the commoditization of hardware and software.  Additionally, with globalization, most of today's entry level work will be farmed out to higher-educated, lower cost, third world countries. 

On the software side, things have become even more gray.  Linux and Open-Source can save you tons of money (if you know how to make them work).  The Internet and web-browsers have become like the electricity or water in your house.  They are increasingly becoming people's main link to information outside their homes.  HTML, HTTP, sockets, NNTP, FTP, as well as other common IP based protocols are very good and useful to know.  I have been on the sidelines about XML for 3 years and I think I will stay there until something changes.

Now without further embellishments, here is the link to the article.

March 03, 2003

The Quiet American

"The Quiet American" finally opened in Vegas last week. It only seemed to be playing at the Brendan Theatres inside the Palms. I hurried to go watch it because serious movies don't tend to stay in the theatres long. This is especially true when one opens near the release dates of Jet Li's new movie as well as Daredevil. I liked the film very much and highly recommend that you catch it if it is playing at your nearby theatre.

The story takes place in Vietnam in the early 1950s, during the French war in Indochina. It presents a snapshot of the political struggles of the period when the French were being pushed out of Vietnam and the Americans were entering. For the older folks or those who have a sense of Vietnam history, the scenes of the older Vietnam will be very nostalgic. Additionally, there are various perspectives on the American, British, and French involvement in the Vietnam struggle.

The filming took place in Hanoi in order to portray some historic scenes. The actual story mostly takes place in Saigon and provides a snapshot of some of the major events that happened during that period.
If you do get a chance to see it in the theatres, please don't miss it. The scenery is wonderful and watching it on the small screen won't be quite the same.