Can You Leave the OS Intact While Wiping the Hard Disk of Data?

Now what? Your computer needs to be replaced, but you know that there are people out there who would love to get their hands on the data saved on your old hard disk. Sadly, this is a problem that we will all face at some point, whether it's due to our computer's age or the desire to replace an older personal computer with a newer toy. In replacing our older systems, however, there is no need to fear, because the way to keep your information out of the hands of strangers is actually fairly simple: destroy the hard disk and make it unusable. One of the simplest tricks — and one that I have used — is to pound a screwdriver through the unit with a hammer.

However, you can’t use this simple trick if you are one of the many who want to sell or donate your older unit. If this is your intent, you will want to clear the hard drive of any personal information / data that you have stored there. The procedure that you choose to use to achieve this goal will most likely depend on the type of information you have stored on your computer and how sensitive the information is. For example, if you store your tax returns and file them via electronic means, you know that you don’t want these documents to become public. You should also be concerned about any credit card information, social security numbers, dates of birth, birth locations, phone numbers, or addresses you may have stored; any or all of these could cause you a real migraine if they fell into the wrong hands. But what about those individuals who only have email addresses and other non-personal information stored on their system?

I only have non-personal information or email addresses stored on my system. How do I get rid of this stuff?

First off, it is unlikely that anyone reading this fits into this category: you are currently surfing the Web, so you have already left behind a trail of the sites you have visited. Those visits are recorded on the hard drive as browser history and cached files. If you don't think this applies to you, consider that if you have installed any personal software, you were most likely required to enter a registration or serial number. This applies to anyone who has installed applications on the computer itself rather than using them in the cloud. The bottom line is that there is much more information stored on your computer than you might think.
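If you want to see how much of that trail one browser keeps, here is a small Python sketch. The profile path below is an assumption for a default Chrome install on Windows; other browsers (and other Chrome versions) keep their history and cache in different places.

```python
from pathlib import Path

# Illustrative only: peek at how much of a browsing trail sits in one browser
# profile. The path is the default Chrome location on Windows and is an
# assumption -- adjust it for your browser and operating system.
profile = Path.home() / "AppData/Local/Google/Chrome/User Data/Default"

for name in ("History", "Cookies", "Cache"):
    target = profile / name
    if target.exists():
        size = (sum(f.stat().st_size for f in target.rglob("*") if f.is_file())
                if target.is_dir() else target.stat().st_size)
        print(f"{name}: {size / 1_000_000:.1f} MB")
```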

However, if you are convinced that your system contains no personal information that could come back to bite you in the rear later, here are a few programs that you can try.

  • Revo Uninstaller: To uninstall software programs from your computer, I would recommend trying Revo Uninstaller. Revo is a free download that will completely remove all traces of a software program from your system. In addition, Revo Uninstaller includes tools that will clean up your browsing history and remove files and documents created with Microsoft Office.
  • CCleaner: Another free program that will remove a lot of the gunk and junk is CCleaner. This software has been around for years and has become a staple in any good geek's arsenal of software products. It will remove the leftovers that accumulate on a system, including the junk that builds up over time in the Windows registry.
  • Remove all user accounts except Administrator: Deleting the other user accounts from a Windows system will remove the settings you have previously set up on your personal computer. Microsoft has set up a website that describes the procedure for various operating systems. If you follow these instructions exactly as described, all should be OK when you are done. Here's how to delete user accounts by Microsoft.
  • Turn off system features: I would also recommend turning off features such as System Restore, the paging file, and hibernation (if used) to delete any personal information that could be stored in their files; a rough scripted sketch follows this list.
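If you prefer to script those last steps, here is a minimal sketch in Python, assuming an elevated (administrator) prompt on Windows. The exact commands vary by Windows version, so treat it as illustrative rather than definitive.

```python
import subprocess

# Turning off hibernation also deletes hiberfil.sys, which can hold memory
# contents. powercfg is a built-in Windows tool.
subprocess.run(["powercfg", "/hibernate", "off"], check=True)

# System Restore can be turned off for a drive from PowerShell. The paging
# file is normally disabled through the System Properties dialog; scripting
# it varies by Windows version, so verify the method for your system.
subprocess.run(
    ["powershell", "-Command", 'Disable-ComputerRestore -Drive "C:\\"'],
    check=True,
)
```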

The problem with this type of approach is that there is still the possibility of leaving behind some personal information that the next owner can access. Sadly, there is no 100% guaranteed way to clean off a Windows system while attempting to leave Windows intact. In other words, I believe that the only true way to clean a personal computer system is to wipe the drive and reinstall the operating system. Notice I said operating system rather than a reinstall of Windows. The reason I say this is that some of you reading this may not be able to find your Windows CD, DVD, or other media you were supposed to make when you bought your PC. This is often the case, since it is not uncommon for new computers to arrive with no installation disks, making it the responsibility of the new owner to make their own restore disks. There is a possibility that the information needed to make your own recovery disk, including the OS, is still on your computer, so make that disk before cleaning the system of your old files.

In order to completely clean a hard disk and install a clean copy of the operating system, I would recommend you use Darik's Boot and Nuke (DBAN), which boots from a CD or DVD. This free program will completely erase your hard drive so that a clean copy of the OS can be installed. When you use this program, no personal information will be left behind on the hard disk.
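To give a sense of what a wipe tool is doing under the hood, here is a conceptual sketch in Python that overwrites a device with random data. The device path is a placeholder; real wipe tools like DBAN run from boot media so the OS isn't using the disk, and often make multiple passes. Running anything like this against the wrong device is unrecoverable.

```python
import os

# Conceptual sketch only: overwrite every byte of a disk with random data.
# DEVICE is a hypothetical placeholder -- on Linux it might look like /dev/sdX.
DEVICE = "/path/to/disk-to-wipe"
CHUNK = 1024 * 1024                # write in 1 MB chunks

with open(DEVICE, "r+b") as disk:
    disk.seek(0, os.SEEK_END)
    size = disk.tell()             # total bytes on the device
    disk.seek(0)
    written = 0
    while written < size:
        block = os.urandom(min(CHUNK, size - written))
        disk.write(block)
        written += len(block)
    disk.flush()
    os.fsync(disk.fileno())        # make sure the data actually hits the disk
```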

I can’t find my Windows disks and I can’t locate any recovery options on my PC. What do I do now?

If this is where you find yourself, you may wish to leave the hard disk clean (without an OS), or you may wish to install one of the free Linux distributions available from numerous sites on the Internet.

Comments, as always, are welcome.

CC licensed Flickr photo above shared by Dread Pirate Jeff

HD DVR Boxes and HD Receivers Use Energy when Turned Off

I always believed that when you shut down an appliance, whether a computer or other electrical device, it ceased to use power. However, this is not the case, as I found out when reviewing the energy usage of my two DirecTV receivers (one of which is an HD DVR box and the other an HD receiver). So when I read the information from the NRDC (Natural Resources Defense Council) on how much energy these types of set-top boxes use, I was surprised, mainly by the fact that these set-top boxes use more energy than a new 21 cubic foot refrigerator / freezer combo that we know runs continuously. This set me to wondering how a turned-off unit can use more energy than one that is running continuously.

Among the interesting facts released by the NRDC is that set-top box manufacturers like Cisco, Samsung, and Motorola are indifferent to the fact that the devices they produce use so much energy. The NRDC estimates that there are approximately 160 million TV set-top boxes in the U.S., requiring a minimum of nine coal-fired power plants just to power them. That means U.S. consumers are shelling out approximately $3 billion a year just to keep our set-top boxes running.
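A quick back-of-the-envelope calculation puts those figures in perspective. The wattage and electricity rate below are assumed round numbers for illustration, not NRDC data; only the 160 million boxes and $3 billion figures come from the report cited above.

```python
# Back-of-the-envelope check of the NRDC figures cited above.
boxes = 160_000_000          # estimated set-top boxes in the U.S.
annual_cost = 3_000_000_000  # estimated yearly cost to consumers, in dollars

print(f"Cost per box per year: ${annual_cost / boxes:.2f}")   # about $18.75

# Assumed values: a box drawing roughly 45 W around the clock at an average
# rate of about $0.11 per kWh.
watts = 45
rate_per_kwh = 0.11
kwh_per_year = watts * 24 * 365 / 1000
print(f"Energy per box: {kwh_per_year:.0f} kWh/year, "
      f"about ${kwh_per_year * rate_per_kwh:.2f} at that rate")
```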

One disturbing issue, considering the cost of energy, is that cable and satellite companies require these set-top boxes to be on 24 x 7 for their own convenience in updating software as needed. This means that even when we think the boxes are off, they are not off, but on standby, which requires that power remain available to them. What is also surprising is that this standby mode requires the same amount of power whether or not we are using the receiver.

While this should come as no surprise, some states, like California, have begun to address these issues with other devices, like televisions, in an effort to save energy. But to date, these same efforts have not targeted set-top boxes. Obviously I, for one, would like to see cable and satellite companies required to take responsibility for the energy that these set-top boxes use and find a way to reduce the power they require. If they would do this, not only would it help consumers reduce their energy costs, but it would also help us, as a nation, to reduce the carbon footprint we are leaving on the planet for future generations.

Read the complete NRDC report (in .PDF format).

comScore’s Device Essentials Program Deserves Industry Accolades

If the title of this post caught your eye and you're reading this right now, chances are that you spend enough time on the Internet to find value in knowing where its traffic originates and where it goes. A new program by comScore called Device Essentials, in my opinion, is about to shake up the technology industry. This program has been designed to monitor Internet traffic not just for computers, which still command a huge lead over mobile devices, but also for all mobile devices, including tablets, gaming devices, music players, mobile phones, eReaders, and other Web-based devices. Its importance cannot be overestimated, as it will continually monitor Internet usage. Additionally, Device Essentials can accurately provide information from devices like the Apple iPad, which controls 89% of the worldwide tablet market and 97% of the U.S. tablet market.

This program is unique in that it not only has the ability to monitor Internet traffic but can also determine its country of origin. Even more amazing: if we use the statistics provided by comScore, we can conclude that we have not even reached what is being called the 'Post PC Era.' These statistics are based on the fact that desktop and laptop computers still account for over 90% of devices connecting to the Internet worldwide, while in the US, desktops and laptops account for 93.3% of all Internet traffic. In comparison, mobile phones make up only 4.9% of Internet connections, while tablets stand at 1.8% and other devices make up a mere 0.3%. Given that information, I conclude that the personal computer is going to be around for a long time and that reports of its death are highly exaggerated.

For additional information and to view the statistic tables that comScore has put together, visit the comScore Web site.

I would be interested in hearing your opinion on whether mobile phones, tablets, and other devices are just supplements to the traditional computer or whether mobile devices can actually replace desktop and laptop computers.

Comments welcome.

Why Mobile Health Care Applications Should Be Regulated

The next time you go to your doctor or dentist's office, pay attention to the mobile device they are using. I know the last time I went to my doctor's office, the nurse had a portable device on which she was recording my blood pressure, weight, and my reasons for seeing the doctor. I didn't give it much thought until after the examination was complete and I had left.

Before I proceed with my opinion, I would like to state that I believe we should have as little intrusion by the government as possible. However, there are certain things, especially when it comes to our health, that need to be controlled and regulated by some type of governing authority. The FDA (U.S. Food and Drug Administration) has the responsibility to monitor the food that we eat and the prescription drugs that we take.

So why do mobile devices and the applications designed for health care professionals even need regulating? First of all, these applications need to be accurate in their record-keeping abilities. The applications, and the way the information is shared with other health care providers or third parties, need to be controlled. Our privacy needs to be protected from those who may use the data or personal information for other than its intended purpose.

Today it may seem like a non-issue when only vital statistics are saved onto the mobile device, but one can see that, as the applications improve, more and more information will be stored. That raises the question of how safe our information will be from prying eyes. I would also be concerned about how insurance companies could use this data in the future to discriminate against those applying for health insurance.

I believe that the FDA must place requirements on application developers, not as to how their software functions, but on how this information will be stored and used. Can the application developers meet a simple security requirement to keep our data safe? In addition to safeguards to protect our data, we also need to protect a patient’s ID from being used by those who could use another’s insurance to pay for their own operations or medication.
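To make "a simple security requirement" concrete, here is a minimal sketch of encrypting a record before it is stored on a mobile device. It uses Python's third-party cryptography library purely as my own illustration; nothing here is an FDA requirement, and the record fields are made up.

```python
from cryptography.fernet import Fernet

# Illustrative only: encrypt a patient record at rest so that a lost or
# stolen device doesn't expose readable data. Real systems also need key
# management, access controls, and audit logging.
key = Fernet.generate_key()      # in practice, kept in a secure key store
f = Fernet(key)

record = b'{"patient": "hypothetical-id-123", "bp": "120/80", "weight_kg": 70}'
token = f.encrypt(record)        # ciphertext that is safe to store on the device
print(f.decrypt(token))          # only someone holding the key can read it
```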

Comments welcome.

Why We All Should Be Concerned About The Sony PlayStation Data Loss Incident

When I first read about the Sony PlayStation incident in which data theft occurred, I thought to myself that it didn't concern me. So what if some hacker obtained the names, addresses, email addresses, and credit card information of some 77 million users of Sony PlayStation online, since I don't own such a beast? But as the week progressed and we learned more about the intrusion, it became apparent that this incident could have a far-reaching effect on all of us.

Many of us, me included, use our credit cards for online purchases of everything from electronic equipment to applications for other devices we own. Though Sony has turned over the information about the break-in to the F.B.I. and has warned Sony PlayStation users about it, what hasn't been addressed as of yet is why the data was so poorly secured to begin with.

This afternoon I sat down and added up the number of companies that have stored my information on their servers: information that could compromise my credit card account or my checking account (PayPal has both). I determined that there are 17 companies that have this information, including my name, home address, email address, and credit card account information, and unfortunately, in some cases, my DOB and social security number.

What has always amazed me is that social security numbers were never designed to be a secure means of identification. There was a time when it was stated on the social security card that it was not to be used to ID a person. But this has slipped through the cracks over the years and social security numbers are used to identify us. Armed with our DOB and social security number, thieves can steal our identity easily.

We should be concerned because it seems that whenever an incident happens in which consumer identification could be compromised along with credit information, little is ever done to hold the responsible company accountable. Telling consumers to monitor their credit charges for illegal activity is like telling a murder victim to apprehend the suspect who killed them.

I believe that all companies who store our identification on their servers owe us the proper protections to keep prying eyes away from the information. Until this happens we are all at risk of identity theft.

Reinstall Windows Or Bamboo Under Your Fingernails – Which Would You Prefer?

I guess some things in life can bring grown men to their knees and tears to their eyes. This afternoon I was reading an article in the May 2011 issue of PC World magazine that described the procedure for reinstalling Windows without losing data. Several thoughts flashed through my mind when I read the title. Is this like having sex without getting pregnant? Can reinstalling Windows plus all of your data really be done in one day? What would hurt worse: reinstalling Windows or having bamboo shoved under your fingernails?

It has been about a year since I did a clean install of Windows on a computer system. It was an old Gateway laptop that I formatted and did a clean install on. I didn't need to restore data because I was selling the system. The system was running Windows XP, and the restore process took the entire day. No, I wasn't sitting and watching the screen the entire day; most of what I needed to install was service packs, updates, and revised hardware drivers.

The following day I reinstalled anti-virus software and an old copy of Microsoft Office that I was including in the sale price. I would say the entire process took about eight hours of my time, and I would venture a guess that had I also needed to reinstall my data, the time would have at least doubled, to about 16 hours. Reinstalling Windows on my own personal computer system, however, usually takes me four to five days. I have a lot of software on my system plus a ton of data, pictures, and other various things. Plus, I take my time when I do a reinstall; I want my system to be 100% perfect when I am done.

Yet there was something that struck me as I read the article. Outside of the system I sold, I have not had to do a reinstall of any of my personal systems for at least five years. The computer gods have blessed me with a sound operating system, no viruses, and unblemished hardware. I guess I completely forgot about reinstalls until I read the article mentioned above. But just because I haven't done a reinstall in many years does not mean I have forgotten the aggravation of past experiences.

So what has your experience been?

Comments welcome.

Your 3G Apple iPhone And Apple 3G iPad Are Recording Your Location And Time Stamp

Do you own an Apple iPhone or an Apple iPad? You may wish to be aware of a recent discovery that your location and a time stamp are being recorded to a secret file. The discovery came from two researchers who noticed that, after the introduction of iOS 4, this file has been recording data about user location along with a time stamp. No one knows exactly what Apple is going to do with the data. What is also alleged is that this data file is included in backups and is even present when data is restored to other devices.

The presence of the file raises security and privacy concerns that Apple hasn't explained. The file is not encrypted, nor is it protected from prying eyes. In addition, it is automatically synced with any of your other devices using iOS 4. In theory, if someone got hold of your device, they could determine when and where you have been since the release of iOS 4. This information could be used by law enforcement or some other governmental agency to track your previous locations. Even the TSA could use the information to determine when and where you have traveled, even when not doing so by airplane.

The file is called consolidated.db and contains latitude and longitude coordinates along with the time you were at those coordinates. Though not always 100% accurate, the information seems to be very detailed in nature. The file is also quite large and can contain tens of thousands of data points showing where you have been since iOS 4 was first introduced. The data is kept erratically, though, and doesn't follow any specific pattern.
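For the curious, the file is an ordinary SQLite database, so a copy pulled from an iTunes backup can be inspected with a few lines of Python. The table and column names below follow what the researchers described, but treat them as assumptions and list the actual tables first.

```python
import sqlite3

# Hypothetical sketch: inspect a local copy of consolidated.db.
conn = sqlite3.connect("consolidated.db")

# List the tables actually present in your copy of the file.
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"):
    print("table:", table)

# Assumed table/column names (CellLocation with Timestamp, Latitude, Longitude).
for ts, lat, lon in conn.execute(
        "SELECT Timestamp, Latitude, Longitude FROM CellLocation LIMIT 10"):
    print(ts, lat, lon)

conn.close()
```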

Before you get too excited about the data being stored, there doesn’t seem to be a way that the data is being sent to Apple. Not as of yet. Also, this same information is kept by cell phone companies, but the release of this information takes a court order. It is unknown if one would need a court order to access the information from your Apple iPhone or Apple iPad.

Just the fact that this information is being stored on your iOS 4 device is disturbing. It makes one wonder how many other non-Apple devices could be storing similar information. Since Google Android tablets as well as Google Android powered phones are relatively new, could these devices be storing similar data? It should be interesting to see whether this is the case.

So what do you think? Do you think Apple should be able to access this data?

Comments welcome.

Source – O’Reilly Radar

AT&T To Customers: The All You Can Use DSL Buffet Is Closed

The folks at AT&T are about to impose caps for all of their DSL customers and also for their U-Verse customers. What is interesting about the proposal is the limits that are going to be placed on existing customers and the additional charges when they exceed those limits. AT&T cites an interesting premise to show that 98% of customers will not be affected, since the average AT&T customer uses only 18GB of data per month.

The new AT&T limits will be a 150GB monthly usage cap for all DSL customers and a 250GB cap for all U-Verse users. Notices to these AT&T customers will start going out this month, and the caps will become effective starting on May 2. But AT&T tells us not to worry, be happy, and accept the limited usage rates since this is the way it is going to be, like it or not.

There are two things that bother me about the data limitation plan being instituted by AT&T. First, I believe that existing customers should be exempt and that the limitations should apply to new customers only. Why is it that large corporations seem to think they can change the terms of service on a whim and that customers are obligated to go along with the new plan? Yes, you can change providers, but that misses the point. When these people signed up, they signed up for unlimited usage. The fact that the folks at AT&T did not forecast that the system would be abused is their problem, not the problem of the customer.

Second, this limitation on data will have long-range effects for all of us. If AT&T doesn't lose a large number of customers, other providers will also enforce data limits. Depending on how you use your Internet service, this could limit the amount of TV programs or movies you watch on the Internet.
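A rough calculation shows what a 150GB cap could mean for video. The per-hour figures below are assumed typical streaming bitrates of the era, not AT&T numbers.

```python
# Rough illustration of what a 150GB monthly cap means for streaming video.
cap_gb = 150
sd_gb_per_hour = 1.0    # assumed standard-definition streaming
hd_gb_per_hour = 2.5    # assumed high-definition streaming

print(f"SD video: about {cap_gb / sd_gb_per_hour:.0f} hours per month")
print(f"HD video: about {cap_gb / hd_gb_per_hour:.0f} hours per month")
# Roughly 150 hours of SD or 60 hours of HD -- about two hours of HD a day
# before any other household traffic is counted.
```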

What AT&T has also failed to mention is whether there will be a price reduction now that the plan is no longer unlimited. We all know the answer to that question: AT&T will provide less service but will want to keep its pricing the same.

I look at the proposal being made by AT&T this way. Let us say that you go down and purchase a new automobile, SUV, or truck. The company offers a 10-year, 100,000-mile powertrain warranty. After you buy the vehicle, you receive a letter stating that, because 2% of vehicle owners have abused their vehicles, the company is dropping the powertrain warranty to 5 years, 100,000 miles. I know that I would not be a happy camper.

I personally believe that AT&T has already calculated the cost of a civil suit into the equation and knows it will be cheaper to pay consumers a stipend than to continue with its unlimited plan. All of you who use DSL from AT&T, I feel your pain.

Comments welcome.

Source – Broadband DSL Reports

South Korea’s DDoS Bunkers

I’ve been studying North Korean history, the armistice, and other related factors that make the recent history of Korea interesting to me. But it seems that these days we are seeing evidence that North Korea, despite working from an economic vacuum, is managing a healthy cyber warfare campaign.

Obviously this isn’t healthy for South Korea and has caused the southern portion of the country to force itself to rethink its cyber security program to protect itself. It’s becoming a real problem.

In response, it seems that South Korea has had to step things up by implementing what are being referred to as "digital bunkers." The idea is to provide some kind of protection against what is being called a tremendous amount of attack traffic being sent to South Korea, designed to make things difficult for the southern nation. Attacks on small to medium-sized businesses are a big deal, and there is a lot of hope that South Korea will be better protected by using the digital bunker approach to security.

New Standard Proposed For Supercomputing

A new supercomputer rating system will be released by an international team led by Sandia National Laboratories at the Supercomputing Conference 2010 in New Orleans on November 17.

The rating system, Graph500, tests supercomputers for their skill in analyzing large, graph-based structures that link the huge numbers of data points present in biological, social and security problems, among other areas.

“By creating this test, we hope to influence computer makers to build computers with the architecture to deal with these increasingly complex problems,” Sandia researcher Richard Murphy said.

Rob Leland, director of Sandia’s Computations, Computers, and Math Center, said, “The thoughtful definition of this new competitive standard is both subtle and important, as it may heavily influence computer architecture for decades to come.”

The group isn't trying to compete with Linpack, the current standard test of supercomputer speed, Murphy said. "There have been lots of attempts to supplant it, and our philosophy is simply that it doesn't measure performance for the applications we need, so we need another, hopefully complementary, test," he said.

Many scientists view Linpack as a “plain vanilla” test mechanism that tells how fast a computer can perform basic calculations, but has little relationship to the actual problems the machines must solve.

The impetus to achieve a supplemental test code came about at “an exciting dinner conversation at Supercomputing 2009,” said Murphy. “A core group of us recruited other professional colleagues, and the effort grew into an international steering committee of over 30 people.” (See graph500.org.)

Many large computer makers have indicated interest, said Murphy, adding there’s been buy-in from Intel, IBM, AMD, NVIDIA, and Oracle corporations. “Whether or not they submit test results remains to be seen, but their representatives are on our steering committee.”

Each organization has donated time and expertise of committee members, he said.

While some computer makers and their architects may prefer to ignore a new test for fear their machine will not do well, the hope is that large-scale demand for a more complex test will be a natural outgrowth of the greater complexity of problems.

Studies show that moving data around (not simple computations) will be the dominant energy problem on exascale machines, the next frontier in supercomputing, and the subject of a nascent U.S. Department of Energy initiative to achieve this next level of operations within a decade, Leland said. (Petascale and exascale represent 10 to the 15th and 18th powers, respectively, operations per second.)

Part of the goal of the Graph500 list is to point out that in addition to more expense in data movement, any shift in application base from physics to large-scale data problems is likely to further increase the application requirements for data movement, because memory and computational capability increase proportionally. That is, an exascale computer requires an exascale memory.

“In short, we’re going to have to rethink how we build computers to solve these problems, and the Graph500 is meant as an early stake in the ground for these application requirements,” said Murphy.

How does it work?

Large data problems are very different from ordinary physics problems.

Unlike a typical computation-oriented application, large-data analysis often involves searching large, sparse data sets while performing very simple computational operations.

To deal with this, the Graph500 benchmark comprises two computational kernels: the construction of a large graph that links huge numbers of participants, and a parallel search of that graph.

“We want to look at the results of ensembles of simulations, or the outputs of big simulations in an automated fashion,” Murphy said. “The Graph500 is a methodology for doing just that. You can think of them being complementary in that way — graph problems can be used to figure out what the simulation actually told us.”

Performance for these applications is dominated by the ability of the machine to sustain a large number of small, nearly random remote data accesses across its memory system and interconnects, as well as the parallelism available in the machine.
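A toy, single-machine sketch in Python illustrates the shape of those two kernels. Real Graph500 runs use a Kronecker graph generator, distributed-memory BFS, and report traversed edges per second; this is only meant to show why the workload is dominated by many small, nearly random memory accesses rather than arithmetic.

```python
from collections import deque
import random

# Kernel 1 (simplified): build a large, sparse random graph.
def build_graph(num_vertices, num_edges, seed=1):
    random.seed(seed)
    graph = {v: [] for v in range(num_vertices)}
    for _ in range(num_edges):
        u, v = random.randrange(num_vertices), random.randrange(num_vertices)
        graph[u].append(v)
        graph[v].append(u)
    return graph

# Kernel 2 (simplified): breadth-first search from a root vertex.
def bfs(graph, root):
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in graph[u]:
            if v not in parent:          # many small, nearly random lookups
                parent[v] = u
                frontier.append(v)
    return parent

g = build_graph(num_vertices=1 << 16, num_edges=1 << 20)
tree = bfs(g, root=0)
print(f"reached {len(tree)} of {len(g)} vertices")
```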

Five problems for these computational kernels could be cybersecurity, medical informatics, data enrichment, social networks and symbolic networks:

  • Cybersecurity: Large enterprises may create 15 billion log entries per day and require a full scan.
  • Medical informatics: There are an estimated 50 million patient records, with 20 to 200 records per patient, resulting in billions of individual pieces of information, all of which need entity resolution: in other words, which records belong to her, him or somebody else.
  • Data enrichment: Petascale data sets include maritime domain awareness with hundreds of millions of individual transponders, tens of thousands of ships, and tens of millions of pieces of individual bulk cargo. These problems also have different types of input data.
  • Social networks: Almost unbounded, like Facebook.
  • Symbolic networks: Often petabytes in size. One example is the human cortex, with 25 billion neurons and approximately 7,000 connections each.

“Many of us on the steering committee believe that these kinds of problems have the potential to eclipse traditional physics-based HPC [high performance computing] over the next decade,” Murphy said.

While general agreement exists that complex simulations work well for the physical sciences, where lab work and simulations play off each other, there is some doubt they can solve social problems that have essentially infinite numbers of components. These include terrorism, war, epidemics and societal problems.

“These are exactly the areas that concern me,” Murphy said. “There’s been good graph-based analysis of pandemic flu. Facebook shows tremendous social science implications. Economic modeling this way shows promise.

“We’re all engineers and we don’t want to over-hype or over-promise, but there’s real excitement about these kinds of big data problems right now,” he said. “We see them as an integral part of science, and the community as a whole is slowly embracing that concept.

“However, it’s so new we don’t want to sound as if we’re hyping the cure to all scientific ills. We’re asking, ‘What could a computer provide us?’ and we know we’re ignoring the human factors in problems that may stump the fastest computer. That’ll have to be worked out.”

[Photo above by James Vaughan / CC BY-ND 2.0]

Neal Singer @ DOE/Sandia National Laboratories

Rensselaer Team Shows How To Analyze Raw Government Data

Who is the White House’s most frequent visitor?

Which White House staffer has the most visitors?

How do smoking quit rates, state by state, relate to unemployment, taxes, and violent crimes?

How do politics influence U.S. Supreme Court decisions?

How many earthquakes occurred worldwide recently?

Where and how strong were they?

Which states have the cleanest air and water?

If you know how to look, the answers to all of these questions, and more, can be found in the treasure trove of government documents now available on Data.gov. In the interest of transparency, the Obama Administration has posted 272,000 or more sets of raw data from its departments, agencies, and offices to the World Wide Web. But, connecting the dots to derive meaning from the data is difficult.

“Data.gov mandates that all information is accessible from the same place, but the data is still in a hodgepodge of different formats using differing terms, and therefore challenging at best to analyze and take advantage of,” explains James Hendler, the Tetherless World Research Constellation professor of computer and cognitive science at Rensselaer Polytechnic Institute. “We are developing techniques to help people mine, mix, and mash-up this treasure trove of data, letting them find meaningful information and interconnections.

"An unfathomable amount of data resides on the Web," Hendler continues. "We want to help people get as much mileage as possible out of that data and put it to work for all mankind."

Mining Data.gov

The Rensselaer team has figured out how to find relationships among the literally billions of bits of government data, pulling pieces from different places on the Web, using technology that helps the computer and software understand the data, then combine it in new and imaginative ways as “mash-ups,” which mix or mash data from two or more sources and present them in easy-to-use, visual forms.

By combining data from different sources, data mash-ups identify new, sometimes unexpected relationships. The approach makes it possible to put all that information buried on the Web to use and to answer myriad questions, such as the ones asked above. (Answers can be found on the Web site).

“We think the ability to create these kinds of mash-ups will be invaluable for students, policy makers, journalists, and many others,” says Deborah McGuinness, another constellation professor in Rensselaer’s Tetherless World Research Constellation. “We’re working on designing simple yet robust Web technologies that allow someone with absolutely no expertise in Web Science or semantic programming to pull together data sets from Data.gov and elsewhere and weave them together in a meaningful way.”

While the Rensselaer approach makes government data more accessible and useful to the public, it also means government agencies can share information more readily.

“The inability of government agencies to exchange their data has been responsible for a lot of problems,” says Hendler. “For example, the failure to detect and scuttle preparations for 9/11 and the ‘underwear bomber’ were both attributed in a large part to information-sharing failures.”

The Web site developed by Hendler, McGuinness, and Peter Fox — the third professor in the Tetherless World Research Constellation — and students, provides stunning examples of what this approach can accomplish. It also has video presentations and step-by-step do-it-yourself tutorials for those who want to mine the treasure trove of government data for themselves.

Rensselaer offers the country’s first undergraduate degree in Web Science and has one of the first academic research centers dedicated to the field. The White House has officially acknowledged Rensselaer’s pioneering efforts in the field. Hendler has been named the “Internet Web Expert” by the White House, and the Web Science team at Rensselaer includes some of the world’s top Web researchers.

“Rensselaer has pre-eminent expertise in what the Web is and what the Web future will be,” says Hendler.

Data.gov offers opportunity

Hendler started Rensselaer’s Data-Gov project in June 2009, one month after the government launched Data.Gov, when he saw the new program as an opportunity to demonstrate the value of Semantic Web languages and tools. Hendler and McGuinness are both leaders in Semantic Web technologies, sometimes called Web 3.0, and were two of the first researchers working in that field.

Using Semantic Web representations, multiple data sets can be linked even when the underlying structure, or format, is different. Once data is converted from its format to use these representations, it becomes accessible to any number of standard Web technologies.

One of the Rensselaer demonstrations deals with data from CASTNET, the Environmental Protection Agency’s Clean Air Status and Trends Network. CASTNET measures ground-level ozone and other pollutants at stations all over the country, but CASTNET doesn’t give the location of the monitoring sites, only the readings from the sites.

The Rensselaer team located a different data set that described the location of every site. By linking the two along with historic data from the sites, using RDF, a semantic Web language, the team generated a map that combines data from all the sets and makes them easily visible.
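A tiny sketch with the Python rdflib library shows the core linking idea: once both data sets refer to a monitoring site by the same identifier, a single query spans both. The namespace, property names, and sample values here are invented for illustration and are not the vocabularies or data the Rensselaer team actually used.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical vocabulary for the example.
EX = Namespace("http://example.org/castnet/")
g = Graph()

# Data set 1: an ozone reading keyed by a site identifier.
g.add((EX["reading1"], RDF.type, EX.OzoneReading))
g.add((EX["reading1"], EX.site, EX["siteA"]))
g.add((EX["reading1"], EX.ppb, Literal(41.2)))

# Data set 2: the same site's coordinates, from a separate location data set.
g.add((EX["siteA"], EX.latitude, Literal(41.84)))
g.add((EX["siteA"], EX.longitude, Literal(-73.30)))

# Because both data sets use the same site URI, one SPARQL query joins them.
q = """
SELECT ?ppb ?lat ?lon WHERE {
    ?r a ex:OzoneReading ; ex:site ?s ; ex:ppb ?ppb .
    ?s ex:latitude ?lat ; ex:longitude ?lon .
}"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.ppb, row.lat, row.lon)
```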

This data presentation, or mash-up, which pairs raw data on ozone and visibility readings from the EPA site with separate geographic data on where the readings were taken, had never been done before. This demo and several others developed by the Rensselaer team are now available from the official US Data.gov site.

Many examples on the Web

Other mash-up demos on the site include:

  • The White House visitors list with biographical information taken from Wikipedia and Google (now also available in a mobile version through iTunes);
  • U.S. and British information on aid to foreign nations;
  • National wildfire statistics by year with budget information from the departments of Agriculture and Interior and facts on historic fires;
  • A state-by-state comparison of smoking prevalence compared with smoking ban policies, cigarette tax rates, and price;
  • The number of book volumes available per person per state from all public libraries;
  • An integration of basic biographical information about Supreme Court Justices with their voting records from 1953 to 2008, with a motion chart that looks at justices’ decisions over the years on issues such as crime and privacy rights.

The aim is not to create an endless procession of mash-ups, but to provide the tools and techniques that allow users to make their own mash-ups from different sources of data, the Rensselaer researchers say. To help make this happen, Rensselaer researchers have taught a short course showing government data providers how to do it themselves, allowing them to create their own data visualizations to release to the public.

Many potential users

The same Rensselaer techniques can be applied to data from other sources. For example, public safety data can show a user which local areas are safe, where crimes are most likely to occur, accident-prone intersections, proximity to hospitals, and other information that may help a decision on where to shop, where to live, or even areas to avoid at night. In an effort McGuinness is leading at Rensselaer along with collaborators at NIH, the team is exploring how to make medical information accessible to both the general public and policy makers to help explore policies and their potential impact on health. For example, one may want to explore taxation or smoking policies alongside smoking prevalence and related health costs.

The Semantic Web describes techniques that allow computers to understand the meaning, or “semantics,” of information so that it can find and combine information, and present it in usable form.

“Computers don’t understand; they just store and retrieve,” explains Hendler. “Our approach makes it possible to do a targeted search and make sense of the data, not just using keywords. This next version of the Web is smarter. We want to be sure electronic information is increasingly useful and available.”

"Also, we want to make the information transparent and accountable," adds McGuinness. "Users should have access to the metadata (the data describing where the data came from and how and when it was derived) as well as the base information so that end users can make better informed decisions about when to rely on the information."

The Rensselaer team has also been working to extend the technique beyond U.S. government data. They have recently developed new demos showing how this work can be used to integrate information from the U.S. and the U.K. on crime and foreign aid, to compare U.S. and Chinese financial information, to mash up government information with World Bank data, and to apply the techniques to health information, new media, and other Web resources.

Some Mash-ups:

Clean Air Status and Trends Network (CastNet)

US Global Foreign Aid from 1947-2008

White House Visitor Search

Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices

[Photo above by James Vaughan / CC BY-ND 2.0]

Mark Marchand @ Rensselaer Polytechnic Institute

Google Fires Employee That Leaked Pay Raise Information

When a memo was leaked referencing a pay raise and bonus Google was giving employees, I don't believe many of us gave it much thought. But now it appears that the Google folks in charge may have terminated the employee who leaked the information, and some are wondering why this has happened. Since this is a company that prides itself on its motto, "don't be evil," was the firing justified?

A recent article states:

Some corporate memos are confidential.

And some memos are “confidential”–meaning there’s no real expectation that they’ll stay within the family.

You’d assume that Eric Schmidt’s memo to “Googlers” announcing big raises falls in the latter category. If that’s all Googlers, that’s some 23,300 people, so that’s pretty much the same thing as issuing a press release, right?

But Google thinks otherwise, says CNNMoney’s David Goldman. He reports that the company has fired an engineer who passed the information along to Business Insider yesterday:

Within hours, Google notified its staff that it had terminated the leaker, several sources told CNNMoney. A Google spokesman declined to comment on the issue, or on the memo.

If this is true, I find the termination surprising. Google is a company that collects mountains of data about its users, yet when something as simple as a pay raise announcement is leaked, it clams up and fires the employee who leaked the information. One would suspect that this information was going to be made public eventually, or was Google trying to do this under the table? Did Google believe it could keep this information a secret from its stockholders or the SEC?

What do you think? Was the termination justified?

Comments welcome.

Source – All Things Digital

Google Web Domination

Google is clearly the master of its universe when you consider that it commands 91.4% of the search market, with EVERYONE else combined sharing the meager 8.6% slice of the same pie. Clearly this is something to watch. Even worse, Google dominates mobile search with 98.3%, leaving everyone else at 1.7%. This means Google IS search. Sorry, Bing.

It seems fairly obvious that Google is becoming much more than a mere search engine. No, it is becoming a broker of information and a means of controlling said information. I don't know about you, but I find this rather scary. Even if it's not my info they have access to, the idea of any one company controlling this kind of data is something we all need to watch closely.

As for where Google is going to end up in the future, it's difficult to say for sure. One thing is for certain: if the word media appears anywhere in a title, Google is sure to be seeking a presence in that medium. Bundle this with Google's existing data dominance, and I don't think any one company has a chance in the world of catching up with the search engine giant.

Beautiful Data: The Stories Behind Elegant Data Solutions

How do the experts solve difficult problems when working with data? Beautiful Data explains this and more as it gives you an opportunity to look over the shoulders of prominent data designers, managers, and handlers for a glimpse into some of their most interesting projects.

The authors think aloud as they work through their projects and show readers the evolution of simple and elegant solutions to problems encountered along the way (even if they sometimes break the rules). You’ll hear from the best data practitioners in the field, and find out just how wide-ranging — and beautiful — working with data can be.

With Beautiful Data, you will:

  • Explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web
  • Learn how to visualize trends in urban crime, using maps and data mashups
  • Discover the challenges of designing a data processing system that works within the constraints of space travel
  • Learn how crowdsourcing and transparency have combined to advance the state of drug research
  • Understand how new data can automatically trigger alerts when it matches or overlaps pre-existing data
  • Learn about the massive infrastructure required to create, capture, and process DNA data

In the preface of Beautiful Data, co-editors Toby Segaran and Jeff Hammerbacher talk about the process of compiling this challenging book. “When we were first approached with the idea… we found [it] exciting and very ambitious. Collecting, visualizing, and processing data now touches every professional field and so many aspects of daily life that a great collection [of stories] would have to be almost unreasonably broad in scope. So we contacted a highly diverse group of people whose work we admired, and were thrilled that so many agreed to contribute.”

They continue, “This book is the result, and we hope it captures just how wide-ranging (and beautiful) working with data can be. In it you’ll learn everything from fighting with governments to working with the Mars lander; you’ll learn how to use statistics programs, make visualizations, and remix a Radiohead video; you’ll see maps, DNA, and something we can only really call ‘data philosophy.'”

For anyone who handles data, this is a truly fascinating book, and one that will make you consider how your own encounters with data shape the world.

Backupify

When you really stop and think about it, you begin to realize how much personal data you have stored online through an assortment of accounts. We trust accounts related to social networking, e-mail, photos, bookmarks, blogs, and so on to keep all of our data safe and secure, but they can't always be trusted. Many times we think about backing up the data on our computers, but we should also be thinking about backing up the data in our online accounts because it's just as important. Before you get caught in a bad situation, consider using Backupify.

They’re offering free storage accounts until January 31st, so you’ll want to create an account as soon as possible in order to give the service a try. You’ll see that Backupify supports many of the popular online services that are available today. Once you’re a member, you just provide your login information for the supported services and Backupify goes to work by backing up your data automatically and securely. The data is yours and you’re in control, so you can rest easy knowing that you’re covered.