A good article about acceptance: being aligned with your work as a programmer or software engineer.
Web scraping can mean several things, so I will define it for the purposes of this article to avoid confusing you, the reader. It means taking some specific information that is available when you visit a specific webpage and doing something with it. Things that are interesting to a person visiting a website may be:
Well, if they are interesting to a person, maybe they would be interesting to a program too. As an example, go to this URL: http://finance.yahoo.com/q?s=msft&d=v1
The returned page has lots of information about the Microsoft stock. Which specific information your application needs depends on you. It can be a piece of relevant text (current stock price, volume information, amount changed), a list of links to the latest press releases about Microsoft, or anything else interesting returned by the Yahoo server. Assuming that we are interested in the stock price, various methods of text processing can be used to extract that number. With the number in hand, we can put our computer to fuller use, performing many other calculations and presentations on our desktop.
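As a rough illustration of the "text processing" step, here is a minimal Python sketch. The HTML fragment, the "Last Trade:" label, and the function name extract_price are all assumptions made up for the example; the real page layout is not shown here and will certainly differ. In practice the html string would come from fetching the page rather than from a constant.

```python
import re

# Hypothetical markup: this fragment only stands in for
# "a page that contains a stock price somewhere in a table".
SAMPLE_PAGE = """
<html><body>
<table>
<tr><td>Last Trade:</td><td><b>24.15</b></td></tr>
<tr><td>Volume:</td><td>61,234,500</td></tr>
</table>
</body></html>
"""

# Find the label, skip any tags that follow it, then capture the number.
PRICE_PATTERN = re.compile(r"Last Trade:(?:\s*<[^>]+>)*\s*([0-9]+(?:\.[0-9]+)?)")


def extract_price(html):
    """Return the price as a float, or None if the pattern is not found."""
    match = PRICE_PATTERN.search(html)
    if match is None:
        return None
    return float(match.group(1))


price = extract_price(SAMPLE_PAGE)
print(price)
```

Returning None when nothing matches is a deliberate choice in this sketch: the caller can tell "no price found" apart from a real value instead of silently working with garbage.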
Of course, the companies presenting this information intend for the public to access it by hand (i.e. go to the site with your web browser, surf to the page, and take in the information along with all the advertisements that accompany it). This is all fine and dandy if all we want is a handful of data (count your fingers)... but what if we want a basketful? Then what about the occasional.... carload? That's a job for the computer...
Ok, the typical pundit would say that this is the perfect example for using "web services". I touched upon web services a little about a year ago. True, indeed, if the company exports that specific interface for that specific data that resides on that specific webpage. As of this writing, I still haven't found enough useful web services to comment on. However, there currently exist many sites that provide information too useful to ignore. Let me just rattle off a few besides Yahoo Finance:
In truth, the services provided by websites exist in their human accessible form (HaF). The "web services" proponents push for their computer accessible form (CaF). My contention is that the time gap between having access to the HaF and having access to the CaF is too great to ignore. By the time the CaF is made available, opportunities may have disappeared. Web scraping is the alternative that can exist in the meantime. It creates an intermediary CaF from an existing HaF. I admit that the algorithm to extract the data is fragile and breaks once someone changes the format of the webpage output. But if you can accept this drawback, you can make the web work for you today. Not tomorrow, next week, or next year.... now.
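To make the HaF-to-CaF idea a little more concrete, here is a minimal sketch using Python's standard html.parser module. It assumes a made-up page whose table cells alternate between a label and a value; the names page_to_record and PageFormatChanged and the field names are hypothetical. The point of the sketch is the last few lines: since the extraction is fragile, it is better to fail loudly when the page format changes than to hand back wrong data.

```python
from html.parser import HTMLParser


class PageFormatChanged(Exception):
    """Raised when the page no longer contains the fields we expect."""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def page_to_record(html, required=("Last Trade", "Volume")):
    """Turn the human-oriented page into a plain dict a program can use."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    # Assumed layout: cells alternate label, value, label, value, ...
    record = {}
    for label, value in zip(cells[0::2], cells[1::2]):
        record[label.rstrip(":")] = value
    missing = [name for name in required if name not in record]
    if missing:
        raise PageFormatChanged("missing fields: " + ", ".join(missing))
    return record


# Hypothetical page content, standing in for a real download.
SAMPLE_QUOTE_PAGE = (
    "<table>"
    "<tr><td>Last Trade:</td><td><b>24.15</b></td></tr>"
    "<tr><td>Volume:</td><td>61,234,500</td></tr>"
    "</table>"
)

record = page_to_record(SAMPLE_QUOTE_PAGE)
print(record)
```

If the site is redesigned and the labels disappear, page_to_record raises PageFormatChanged, which is the signal to go and repair the scraper.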
As a developer using the Python language, I find the ASPN Python Cookbook to be a valuable resource for information about a variety of things that you can do. The site mainly contains text recipes about how to perform certain tasks using Python. It is decently designed, so getting around isn't too much of a problem. Unfortunately, using the site IS very much a struggle. It is so slow that I can barely tolerate more than a few clicks. This pattern of "difficult to use" sites is becoming common. Another example is Zope.org, which ranks right up there in terms of containing useful information.
I wonder what the culprit is for these sites being slow. Are the application servers so resource intensive that they take longer to respond to a page request? Are the servers under-powered because the owners cannot afford better ones? Whatever the reason, it is a shame, because enough users patronize these two sites to make it worthwhile to give them a better option. Has anyone thought about mirroring these sites? This is a useful (relatively low-tech) pattern that is increasingly being used on the internet. It is one of the many ways in which edge networks can be implemented in a low-cost manner.
As Kevin mentioned in a comment on the "Class Browser" article, CTAGS would be nice. However, the last time I came across CTAGS was 10 years ago, when I was still using VI. Call me lazy, but I have grown used to a full-featured text editor with mouse scrolling, syntax highlighting, and a little code completion. I remember just enough VI to do simple editing when I log into my Linux host. Well, the more I talk about it, the more I appreciate the completeness of Visual Studio.
Continuing in the same thought process, I have just completed reading Patrick O'Brien's article on introspection. Really interesting stuff and really impossible (or at the very least, extremely difficult) to do in C++. This is the type of information required to start building a class-browser. Hum.... Additionally, there is the pyclbr module which provides the basic mechanisms for implementing your own class-browser.
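For the curious, here is a small sketch of what pyclbr gives you. pyclbr.readmodule reads a module's source without importing it and returns a dictionary of class descriptors, each with the base classes, the methods, and the line where the class starts. The helper name browse and the sample module shapes_demo are made up for this example; the demo writes the sample module into a temporary folder so that the sketch is self-contained.

```python
import os
import pyclbr
import tempfile


def browse(module_name, path=None):
    """Return {class name: details} for the classes defined in module_name."""
    # readmodule parses the source file; it does not import the module.
    descriptors = pyclbr.readmodule(module_name, path=path)
    outline = {}
    for name, descriptor in descriptors.items():
        if descriptor.module != module_name:
            continue  # skip classes pulled in from other modules
        bases = [getattr(base, "name", base) for base in (descriptor.super or [])]
        outline[name] = {
            "line": descriptor.lineno,
            "bases": bases,
            "methods": sorted(descriptor.methods),
        }
    return outline


# A hypothetical module to browse, used only for this demonstration.
SAMPLE_SOURCE = '''
class Shape:
    def area(self):
        return 0

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side ** 2
'''

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "shapes_demo.py"), "w") as handle:
        handle.write(SAMPLE_SOURCE)
    outline = browse("shapes_demo", [folder])

for class_name, details in sorted(outline.items()):
    print(class_name, details["bases"], details["methods"])
```

This is only the raw material for a class browser; the display, the navigation, and the jump-to-source part would still have to be built on top of it.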
What was that saying? Necessity is the mother of invention.....?
After getting some pointers from folks in the comp.lang.python newsgroup, I decided to do some investigation on the various editor/developer tools for Python programming. The original request and replies are here. I concentrate on the suggested tools first and will only mention the class browsing aspect.
There are at least twice as many other tools that allow for editing and various levels of introspection. However, my original question remains unanswered: the pattern of development I am used to does not seem to be supported in Python. Sniff++ was the first tool that allowed me to use that pattern to read, understand, and write large bodies of code. Visual Studio provided the mechanism that made MFC's large class library understandable. Both make large projects manageable. The way each of these tools did it was to abstract away the concept of the file. It didn't matter which file the attribute or method was in; the tool would take me to it.
The pattern works very well.... now if only I could find a Python tool that implements it.
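The core of that "the file doesn't matter" pattern can be sketched in a few lines of Python with the standard ast module: walk every source file in a project once and build an index from each class, function, and method name to the places where it is defined. This is only an illustration, not how Sniff++ or Visual Studio actually work; the function name build_symbol_index and the two sample files are made up, and the index is keyed by bare name, so two methods called area in different classes land under the same key.

```python
import ast
import os
import tempfile


def build_symbol_index(root):
    """Map each class, function, and method name under root to (file, line) pairs."""
    definitions = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    index = {}
    for folder, _subfolders, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(folder, filename)
            with open(path, encoding="utf-8") as handle:
                try:
                    tree = ast.parse(handle.read(), filename=path)
                except SyntaxError:
                    continue  # skip files that do not parse
            for node in ast.walk(tree):
                if isinstance(node, definitions):
                    index.setdefault(node.name, []).append((path, node.lineno))
    return index


# Two hypothetical project files, written to a temporary folder for the demo.
SAMPLE_FILES = {
    "shapes.py": "class Shape:\n    def area(self):\n        return 0\n",
    "report.py": "def print_report(shape):\n    print(shape.area())\n",
}

with tempfile.TemporaryDirectory() as project:
    for filename, text in SAMPLE_FILES.items():
        with open(os.path.join(project, filename), "w", encoding="utf-8") as handle:
            handle.write(text)
    index = build_symbol_index(project)
    # Show file names only, since the temporary folder name is not interesting.
    area_hits = [(os.path.basename(p), line) for p, line in index["area"]]
    report_hits = [(os.path.basename(p), line) for p, line in index["print_report"]]

print(area_hits)
print(report_hits)
```

An editor built on such an index could jump straight to a definition by name, without the developer ever thinking about which file to open.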
These are some good tools I have recently come across:
I have been looking for a Python class browser for some time but have not come across a decent one. Despite all the bashing of Microsoft technologies, it must be admitted that Visual Studio is the premier tool for developers. I find myself entirely dependent on its integrated class browser when doing development with MFC. Why is a class browser such a powerful tool? Well, it allows the developer to abstract his thinking beyond the line-by-line code. In an object oriented world, we assume the lower dependencies work, just as we assume a hammer and nails work when we are trying to put together a cabinet or something even bigger (a house).
So once it is understood that our dependencies work, we put them aside and concentrate on the problem we are concerned with. The class browser allows us to collapse the classes we don't care about and work with the ones we do. Additionally, the class list is not constrained to displaying classes within one file. It displays the entire class hierarchy of a particular project. I know, some of you may say "what's the big deal about manually opening up a different file?" Not much, actually. But it is an impediment to the thought process. If you do it enough times, all those little impediments add up to a big stumbling block in creating your solution.
If anyone has been using such a tool for Python, please drop me a tip.
A favorite read of mine, Dave Rogers' Time's Shadow (which I have been following for quite some time), talks about individualism. True individualism in America is very difficult. It offers no social rewards and exacts a great amount of personal sacrifice. Its pursuit is more difficult than swimming upstream. We are so constantly bombarded with advertisements and various tools of social manipulation that having even a handful of personal thoughts a day is difficult.
Being situated in Florida (near Cocoa Beach, I think), Dave gets to see the eastern sunrise often. Sometimes it's the smallest of things that help us on the road to finding ourselves.
I tried out Sam Ruby's 3 pane aggregator the other day. It worked and is a fairly good demonstration of the 3 pane concept. I was using wxPython 184.108.40.206 . Today I upgraded to wxPython 220.127.116.11 and ran the application again. There were some deprecation warnings about using True rather than true. Good.... wxPython was using a new built-in type. But then the application crashed with 'UnregisterClass(canvas)'. Umm, this was bad. After looking at the source code, I didn't see anything significant that had changed. Maybe it was something in "Unicode". So I tried the unicode version, to no avail. I backed off to 18.104.22.168 and everything was fine and dandy.
Ok, the experience was unpleasant and silly, but it does tell me one thing I should keep in the back of my mind. The old adage of "release early and release often", widely proclaimed by open source advocates, is a double-edged sword. The edge that stings is that the released software tends not to be thoroughly tested. Additionally, little or no regression testing is done. Regression testing means checking that what worked yesterday continues to work today.
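For readers who have not seen one, a regression test can be as simple as the following sketch, written with Python's standard unittest module. The function normalize_title and the recorded cases are hypothetical; they stand in for any library behavior that users already depend on. The idea is that the expected outputs are recorded from a release that works, and every later release has to reproduce them.

```python
import unittest


def normalize_title(text):
    """Hypothetical library function whose behavior users depend on."""
    return " ".join(text.split()).title()


# Inputs and outputs recorded from a release that users already rely on.
KNOWN_GOOD = [
    ("hello   world", "Hello World"),
    ("  python cookbook ", "Python Cookbook"),
    ("", ""),
]


class NormalizeTitleRegression(unittest.TestCase):
    def test_recorded_behavior_is_unchanged(self):
        # What worked yesterday has to keep working today.
        for given, expected in KNOWN_GOOD:
            with self.subTest(given=given):
                self.assertEqual(normalize_title(given), expected)


def run_regression_suite():
    """Run the suite and report whether every recorded case still passes."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(NormalizeTitleRegression)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


all_passed = run_regression_suite()
print(all_passed)
```

Running a suite like this before every release is cheap, and it catches exactly the kind of breakage described above before the customer does.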
There is a lesson here that open source developers can learn from commercial software developers. Breaking customer code that uses your product as a dependency is a devastating thing. In the commercial world, the managers make sure you don't alienate customers by doing so. If you move forward with this point of view, maybe you will think twice about having so many releases.
It's March of 2003, so the flurry of predictions for the new year should be over. However, there is a good write-up of some trends in the IT industry and observations about the direction of Microsoft. For those who work in the tech industry, it helps to be aware of some of the long term trends in the field you have chosen for your career. The article talks about the commoditization of hardware and software. Additionally, with globalization, most of today's entry level work will be farmed out to higher-educated, lower-cost, third world countries.
On the software side, things have become even more gray. Linux and open source can save you tons of money (if you know how to make them work). The Internet and web browsers have become like the electricity or water in your house. They are increasingly becoming people's main link to information outside their homes. HTML, HTTP, sockets, NNTP, FTP, as well as other common IP based protocols are very good and useful to know. I have been on the sidelines about XML for 3 years and I think I will stay there until something changes.
Now without further embellishments, here is the link to the article.
"The Quiet American" finally opened in Vegas last week. It only seemed to be playing at the Brendan Theatres inside the Palms. I hurried to go watch it because serious movies don't tend to stay in the theatres long.
This is especially true when a film opens near the release dates of Jet Li's new movie as well as Daredevil. I liked the film very much and highly recommend that you catch it if it is playing at a theatre near you.
The story takes place in Vietnam in the early 1950s, before the American war. It presents a snapshot of the political struggles during the period when the French were being pushed out of Vietnam and the Americans were entering. For older folks or those who have a sense of Vietnamese history, the scenes of the older Vietnam will be very nostalgic. Additionally, there are various perspectives on the American, British, and French involvement in the Vietnam struggle.
The filming took place in Hanoi in order to portray some historic scenes. The actual story mostly takes place in Saigon and provides a snapshot of some of the major events that happened during that period.
If you do get a chance to see it in the theatres, please don't miss it. The scenery is wonderful and watching it on the small screen won't be quite the same.