Data Mining for Real Estate Within the Web 2.0 Context
jwubbel
Web developers, programmers and administrators have for some time been discussing methods of re-purposing data harvested from web sites. Screen scraping is the term used in the popular press, and among people working on Web 2.0 technologies, for a form of mining data from web pages, particularly pages that dynamically present data otherwise locked up in someone else’s proprietary relational database. And given the data-intensive nature of its inventory, what better industry for such an application than real estate?
Most of the technical talk is of little use to the average investor, broker, developer or seller, so the complexities of screen scraping are not part of this discussion. Screen scraping is a means to an end. The most common end is to mine multiple web sites and aggregate the data. Taking the information and simply putting it into another relational database (i.e., refactoring) may not add much value for real estate professionals: it lacks expressiveness and may appear as redundant as syndicated news hosted on legacy or non-Web 2.0 infrastructures.
Within the context of the Web 2.0 technologies available today, screen scraping is used because it lets you turn a regular web page into a regular web page plus semantic data. It is said that it “frees the data from the page/site that contains it”, and this matters because the main advantage is that machine processing of the data becomes a lot easier. For the real estate professional, the only thing that really counts is presentation, and the prospect of manipulating this data in ways not previously considered is very appetizing. At the user level, the term “semantic data” simply means the information is now more meaningful.
At MIT, research on the Simile project (http://simile.mit.edu) has opened this world up. Data retrieved from a web page is transformed into a format that the browser understands. Using the Firefox browser and its powerful extension capability, the Simile “Piggy Bank” extension lets you view this data in its most “pure” form. If you have ever added up how much time you spend filling out search criteria forms on the bulletin board property listing services, you immediately understand the limitations of traditional relational database models. There is no semantic expressiveness to speak of. In the Web 2.0 development realm, by contrast, the idea of liberating data gives us three basic mechanisms of abstraction: aggregation, classification and generalization.
When I mine for commercial real estate listings, I am building Property Banks by transforming the listings into the format that Piggy Bank understands. By building Property Banks I am aggregating from many web sites. At this stage I am not yet much interested in the analytical value held in the data. Listings found on the dedicated listing boards tend to lack detail and accuracy, and many are stale. Typically, I look for independent brokers, mom-and-pop agencies and, last but not least, international property listings. Once you make your bank deposits, what you automatically inherit is classification. You will see this when your Property Bank loads into Piggy Bank. This is the focal point for the real estate professional or investor. Searching the bank is intuitive and as easy as browsing, without racking your brain for the perfect query criteria. Understand, you are no longer chained to a database. Because of classification, you can immediately drill down to the properties with the features and attributes most important to you or your client. You can view comparable listings and know their locale, all without entering a single search criterion.
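To make the idea of classification concrete, here is a minimal Python sketch of drilling down by attribute instead of filling out a query form. It is not how Piggy Bank is implemented; the listings, the attribute names and the helpers `build_facets` and `drill_down` are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical listings, as they might look after aggregation from several sites.
LISTINGS = [
    {"id": "p1", "type": "industrial", "city": "Newark", "offer": "sale"},
    {"id": "p2", "type": "office", "city": "Newark", "offer": "lease"},
    {"id": "p3", "type": "industrial", "city": "Trenton", "offer": "sale"},
    {"id": "p4", "type": "retail", "city": "Trenton", "offer": "lease"},
]


def build_facets(listings, ignore=("id",)):
    """Count how many listings carry each value of each attribute."""
    facets = defaultdict(Counter)
    for listing in listings:
        for key, value in listing.items():
            if key not in ignore:
                facets[key][value] += 1
    return facets


def drill_down(listings, **selected):
    """Keep only the listings that match every selected attribute value."""
    return [
        listing
        for listing in listings
        if all(listing.get(key) == value for key, value in selected.items())
    ]


# The facets are what a browsing interface would show as clickable categories.
facets = build_facets(LISTINGS)

# Picking two categories narrows the bank without writing any query.
industrial_for_sale = drill_down(LISTINGS, type="industrial", offer="sale")
```

The categories fall out of the data itself, so nothing has to be decided up front about which search fields to offer.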
Because the data is free of a structured web page, it is machine processable, and within Piggy Bank it gives you some very powerful search capability against the Property Bank for analytical work. Property Banks can be private, made public or shared for collaborative activities. Brokers typically complain about having to spam their listings to a dozen different listing services in order to get the most exposure. With a public Property Bank, a listing would only need to be posted once. Why? Because Property Banks can be combined. As Property Banks become more numerous, you can surf around the Internet with Piggy Bank, visit a Property Bank and search it to find a candidate property for a client. Piggy Bank allows you to save one or many properties to your own private bank on the fly. Your interest or specialty may be industrial manufacturing facilities, in which case your bank’s inventory of properties is customized for your client list.
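Combining banks and saving properties to a private bank can be pictured with a short Python sketch. It assumes each listing is identified by a URI, so the same listing reached through two banks is recognized as one. The URIs, the attributes and the functions `combine_banks` and `save_to_private_bank` are hypothetical and are not part of Piggy Bank.

```python
def combine_banks(*banks):
    """Merge several banks; a listing posted once appears once in the result."""
    combined = {}
    for bank in banks:
        for uri, listing in bank.items():
            # Same URI in more than one bank: merge the attributes into one entry.
            combined.setdefault(uri, {}).update(listing)
    return combined


def save_to_private_bank(private_bank, source_bank, uris):
    """Copy the selected listings from a visited bank into a private one."""
    for uri in uris:
        private_bank[uri] = dict(source_bank[uri])
    return private_bank


# Two hypothetical public banks that overlap on listing 2.
bank_a = {
    "http://example.org/listings/1": {"type": "industrial", "city": "Newark"},
    "http://example.org/listings/2": {"type": "office", "city": "Trenton"},
}
bank_b = {
    "http://example.org/listings/2": {"price": "1200000"},
    "http://example.org/listings/3": {"type": "retail", "city": "Camden"},
}

combined = combine_banks(bank_a, bank_b)
my_bank = save_to_private_bank({}, combined, ["http://example.org/listings/1"])
```

Keying on a shared identifier is the design choice that makes posting once workable: the combined bank holds three listings, not four.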
While data mining web pages is a good illustration here, does it make for a viable business model? Is it really practical for the average real estate agency? In reality, anyone can RDF-enable their property listings and embed them on their web site for a Semantic Browser like Piggy Bank. And once the technology becomes consumable by consumers, there is more to build on, because Semantic Banks are extensible. Let me give you an example of my thinking with respect to a practical model.
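As a rough illustration of what RDF-enabling a listing involves, the Python sketch below writes one listing as N-Triples, a plain-text RDF serialization with one statement per line. The vocabulary namespace, the listing URI and the function names are assumptions made for the example, and whether a particular semantic browser reads N-Triples directly is also an assumption.

```python
VOCAB = "http://example.org/realestate#"  # hypothetical vocabulary namespace


def escape_literal(text):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def listing_to_ntriples(uri, attributes):
    """Emit one '<subject> <predicate> "value" .' line per attribute."""
    lines = []
    for name, value in sorted(attributes.items()):
        lines.append(f'<{uri}> <{VOCAB}{name}> "{escape_literal(str(value))}" .')
    return "\n".join(lines)


# A hypothetical listing; every attribute becomes one statement about its URI.
nt = listing_to_ntriples(
    "http://example.org/listings/42",
    {"title": 'The "Old" Mill', "city": "Newark"},
)
print(nt)
```

Each line stands on its own as a statement about the property, which is what lets the data travel independently of the page it came from.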
At PropertyClubPro.com our business objective is to make it easy for people to list and manage their properties for sale or lease in a standard relational database. The reason for providing this input is that it allows as much data as possible on a listing to be captured accurately. Unlike residential property, commercial properties can be many times more complex to market, and the documentation can be staggering. Take for instance the sale of a food manufacturing facility. It probably comes with “As-Builts”, Qualification and Validation documentation, AutoCAD drawings such as engineering isometric piping diagrams for Pure Water, WFI and Clean Steam, and so on: any data crucial to investment analysis and buying decisions. Instead of screen scraping, we are scraping the ClubPro database to build and update a wide variety of Property Banks on a real-time basis. It would probably make more sense for real estate agencies to host their properties in larger Property Bank repositories. And as Property Banks proliferate, Semantic Banks can be used to map available repositories by state, region or international boundaries, which is exactly what I mean by the extensibility of the Web 2.0 technology.
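Building a bank from a relational database rather than from scraped pages might look roughly like the Python sketch below. It uses an in-memory SQLite table as a stand-in; the table layout, the namespaces and the function `bank_from_database` are invented for the example and say nothing about the actual ClubPro schema. It is a one-off export, so keeping a bank current would mean rerunning it whenever the data changes.

```python
import sqlite3

BASE = "http://example.org/listings/"     # hypothetical listing URI prefix
VOCAB = "http://example.org/realestate#"  # hypothetical vocabulary namespace


def bank_from_database(conn):
    """Turn every non-empty column of every row into a (subject, predicate, object) triple."""
    conn.row_factory = sqlite3.Row
    triples = []
    for row in conn.execute("SELECT * FROM listings ORDER BY id"):
        subject = f"{BASE}{row['id']}"
        for column in row.keys():
            # The id is already part of the subject; NULL columns carry no statement.
            if column != "id" and row[column] is not None:
                triples.append((subject, VOCAB + column, str(row[column])))
    return triples


# Stand-in database with two listings, one of them missing a price.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, type TEXT, city TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO listings (type, city, price) VALUES (?, ?, ?)",
    [("industrial", "Newark", 2500000), ("office", "Trenton", None)],
)

triples = bank_from_database(conn)
```

Because the source is a database the owner controls, the export can carry every captured field, not just what a web page happens to display.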
One would hope that through the use of these tools investors would find the best investments and sellers would be able to market more efficiently and quickly. Corporate real estate departments would have better control over facilities data management from acquisition to disposal. Sellers could transfer the data asset from the Property Bank to the buyer at the closing. At the very least, owners would maintain ownership and control over their property listing data, and with any luck, with the coming liberation of real estate data, professionals in the industry will no longer be hammered by the extra net commercialism found on every listing board. It would seem more like an in-house application, and that in itself is worth a million or two.