We all know that the economy is slow (and we listen to scary predictions), but we're also looking at what's happening in IT. I don't want to jinx anything, but IT is still moving along. In fact, it seems that companies are in a position where they have to continue the projects they've started. Installing and using CRM (Customer Relationship Management), BPM (Business Process Management), BI (Business Intelligence), etc. has become a necessity for most. The actual words are "competitive necessity." And, in order to make these systems work, data integration technologies must be implemented. This means data assurance (or data quality) has to be in place. And this means ...
You get the picture.
Add to that, new hardware. Smaller, thinner, lighter, etc. computers are announced daily. Do you want a PC or a PDA? Do you want an iPhone or a Blackberry? But wait, there are others ...
When are you going to get a new computer?
This is such an exciting time in IT.
And, it really means it's important to keep up.
Here's the schedule (or you can view the complete Schedule on our website).
CSTA Web sessions: May 7, 8; June 4, 5; July 9, 10; August 6, 7; September 10, 11; October 8, 9; November 12, 13; December 10, 11
UITJ (Understanding IT Jobs) Web sessions: May 8; July 10; September 11; December 13
Web 2.0 Web session: June 5
TR Web sessions: March 26; May 21; September 24
Keep in touch and keep up with technology!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Data Quality
We're collecting data at unbelievable rates. Companies commonly keep data stores that are measured in TBs (terabytes, or trillions of bytes). The next measure is PB (petabytes, or quadrillions of bytes). Think of a petabyte as the equivalent of 250 billion pages of text. That would fill 20 million four-drawer filing cabinets. And, we're quickly getting there. Currently Google processes over 20 PBs of data each day. Finland plans to finish a 500PB database containing all health information in 2011. The amount of data the average business collects and stores is doubling each year. This means that a company with a 70-terabyte system today will hit the 1 petabyte threshold--1,000 terabytes--within four years.
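If you'd like to check that last bit of arithmetic yourself, here is a minimal Python sketch. The function name and parameters are made up for illustration, and it assumes the size simply doubles once per year.

```python
def years_to_reach(start_tb, target_tb, growth_per_year=2.0):
    """Count whole years until a data store growing by growth_per_year reaches target_tb."""
    years = 0
    size = start_tb
    while size < target_tb:
        size *= growth_per_year  # assumption: growth is applied once per year
        years += 1
    return years


# 70 -> 140 -> 280 -> 560 -> 1,120 TB, which passes 1,000 TB in year four
print(years_to_reach(70, 1000))
```

Change the starting size or the growth rate to see how quickly your own systems would get there.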
How do we know if this data is good? And what does good mean? Data quality is called data assurance now. What is that? Data assurance states that data must have consistency, currency, and accuracy. And, perhaps, validity and uniqueness.
Consistency. Simply said, everyone must see the same thing. Picture an address. First, an address must have a street number and name, an optional apartment number, or a P.O. Box. Wait - could this be "and a P.O. Box"? I'm constantly seeing addresses with both street names and P.O. boxes. Then, a city name, a state name. Wait - the full name or the two-character alpha code? Then the postal code. Wait - is this five or nine characters? Now, what do we do about Canada?
Currency. Information must be timely. It's got to be the current address. Think about an individual working for a retail store, H.H. Store. Information about that employee, including his or her address, is in several places. It's in the Employee Database and the Payroll Database. And probably in the Customer Database and the Sales Database. Hopefully not, but maybe in the Collections Database. What happens when this employee moves? Are all these databases updated correctly at the same time? Currency is actually hard to come by.
Accuracy. The information must be correct. This is actually the easiest aspect of data. It's either correct or it isn't.
The two optional aspects are interesting too.
Validity. The data must be important to the business and be within the business's definitions. Go wild with this one. Who decides? How important? Do we really need to maintain this data or not? What are we going to use it for? How does the business define account numbers - what is the range of numbers? Then, the accuracy of the data collection must be evaluated.
Uniqueness. No duplicates can be maintained in the data. This sounds simple until you think of the large volumes of data we're now processing, and the diversity of the sources of data. If Bank of America has a checking account customer, John Jones, and a savings account customer named John F. Jones, do they have one customer with two accounts, or two customers with one account apiece?
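To make the John Jones question concrete, here is a rough Python sketch of one way a system might flag two records as "probably the same customer." Everything in it is an assumption for illustration: the field names, the sample records, and the rule itself (same first and last name, plus a matching phone number or address). Real matching tools use far more sophisticated rules.

```python
import re


def normalize_name(name):
    """Reduce a name to (first, last), ignoring case, punctuation and middle initials."""
    parts = re.sub(r"[^a-z\s]", "", name.lower()).split()
    return (parts[0], parts[-1]) if parts else ("", "")


def normalize_phone(phone):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", phone)


def likely_same_customer(a, b):
    """Hypothetical rule: names agree AND either the phone or the address agrees."""
    if normalize_name(a["name"]) != normalize_name(b["name"]):
        return False
    phone_a, phone_b = normalize_phone(a["phone"]), normalize_phone(b["phone"])
    same_phone = bool(phone_a) and phone_a == phone_b
    same_address = a["address"].strip().lower() == b["address"].strip().lower()
    return same_phone or same_address


# Made-up sample records
checking = {"name": "John Jones", "phone": "(555) 010-1234", "address": "12 Elm St"}
savings = {"name": "John F. Jones", "phone": "555-010-1234", "address": "PO Box 9"}
print(likely_same_customer(checking, savings))
```

Note that a rule like this only suggests a match. Someone (or some stricter rule) still has to decide whether to merge the two records.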
There are many different techniques used to ensure the highest quality data. Data scrubbing, or cleansing, helps to keep duplicates out of the system. For example, if a system had detected both John Jones and John F. Jones, it would then check addresses, phone numbers, etc., to determine if this was the same person. Deduplication identifies and removes duplicate names and addresses from a database. Data profiling is growing in use. This technique examines a data store and collects statistics and information including maximum and minimum values, ranking levels, and percent of null data. Data validation techniques check for the accuracy of dates by ensuring that in mm/dd/yyyy the first number does not exceed 12, and if the first number is 04, the second cannot exceed 30. Other data checks perform pattern analysis, build relationship rules, and validate metadata against the detail data.
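Two of these techniques are easy to sketch in a few lines of Python. This is only an illustration, with made-up function names: the date check leans on Python's standard datetime parsing to enforce the month and day limits, and the profile collects just the minimum, maximum, and percent of null values for one column.

```python
from datetime import datetime


def is_valid_mmddyyyy(text):
    """Return True if text is a real calendar date in mm/dd/yyyy form."""
    try:
        datetime.strptime(text, "%m/%d/%Y")  # rejects month 13, April 31, etc.
        return True
    except ValueError:
        return False


def profile(values):
    """Collect simple profiling statistics for one column; None stands for null."""
    present = [v for v in values if v is not None]
    null_count = len(values) - len(present)
    return {
        "count": len(values),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "percent_null": 100.0 * null_count / len(values) if values else 0.0,
    }


print(is_valid_mmddyyyy("04/30/2008"))  # a valid April date
print(is_valid_mmddyyyy("04/31/2008"))  # April has only 30 days
print(profile([10, None, 25, 7]))
```

Checks like these are cheapest when they run at data entry, before a bad value ever reaches the database.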
As data grows, the problem of good data gets bigger and bigger. The easiest way to ensure good data quality is to ensure that the data is carefully checked as it is entered into any system. Once bad data is in a system, it's awfully hard to even find it, much less correct it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Are you a Crackberry?
2. What kind of a computer is the MacBook Air?
3. What size companies are most likely to use SaaS?
4. Which of the following does not belong?
   a. Open Source development
   b. Scrum
   c. TDD (Test Driven Development)
   d. XP (eXtreme Programming)
5. And what is Hardy Heron?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We've just added Web 2.0 to the list of short Web sessions that we offer. These short sessions (45 minutes) let us go into more depth than we do in CSTA on topics that most of you don't see regularly. We have the coverage for those who need it, and will schedule the sessions on request. If you need information on any of these, just call and we'll put it on the schedule to run within a couple of weeks. If it turns out that no one else needs that training right now, we'll actually run it for the one who requested it. You'll get a private session!
Web 2.0 is the newest addition, and is currently scheduled to run on June 5th at 2:15. It covers Web 2.0 sites, RIA (Rich Internet Applications), Web feeds, and development issues and tools.
Networking goes into detail on the physical aspects of networking, including the equipment, connections, protocols and layers, and specific networks.
Wireless Technology discusses the wireless networks, the wireless platforms, and development issues for wireless applications. There is some overlap with networking.
Embedded Systems covers products and technologies, memory and processing chips, embedded databases, RTOSs (Real-Time Operating Systems), and development issues and tools.
Business Intelligence - The material in this session actually is included in CSTA, but we have it for people who haven't attended our flagship course. It covers data warehousing, data integration, and BPM (Business Process Management).
Check out the details of any of these topics that you run across. These shorter sessions will add so much to your knowledge base and increase your comfort level when dealing with these specialized topics. And remember, if you've attended CSTA within the past year you get a 50% discount.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Individual Computers
One of the TechCheck questions actually addresses this topic. We have so many different kinds of "individual" computers that it's often hard to choose one! We'll talk about the options in the next issue, and discuss the direction and growth of individual computers.
desktop computers. Desktop computers require no special environment or controls. Desktop systems vary from a completely contained single-user microcomputer sitting on someone's desk, to a network of computers linked together sharing printers and data files. Desktop computers include all of the following: handheld computers, laptops, micros, microcomputers, PCs (Personal Computers), PDAs (Personal Digital Assistants), Internet appliances, and portable computers. These systems are often called client computers to reflect their usage in client/server systems.
handheld computers. Lightweight (1 pound or less), task-specific computers. These machines can be pen- or stylus-based, offer voice recognition, fax and modem communication, and include a pager. Typical use: write in "lunch - John" and the system will send John a fax and enter the lunch date in the user's appointment book. Many offer connections to the Internet for email and Web surfing. Also called PDAs (Personal Digital Assistants).
information appliances. Computers that are built with embedded systems, which can be any of: operating systems, DBMSs, application systems, and Web browsers. This category includes smart displays, set-top boxes, Web tablets, photo-frames, and kiosks. An information appliance can be anything. For example, the Screenfridge is an information, or Internet, appliance that is a Web-connected refrigerator. It has a PC screen in the door and a built-in modem, and food can be ordered over the Internet. It also includes speakers, a microphone, and a video camera so family members can leave video messages. Also called Internet appliances.
MID (Mobile Internet Device). Computer. Individual computer sized between a laptop and a PDA (handheld device). Designed for the consumer (home) market rather than for the enterprise (business) market, and runs Linux as the operating system rather than Windows. Very similar to UMPC (Ultra Mobile Personal Computer), which is the term for the enterprise machine with the same general description, and many people feel the two acronyms are redundant. At any rate, these machines typically have 5" to 7" screens (diagonal measure), and WiFi Internet connectivity. The term was introduced by Intel in April 2007, and many vendors make devices they call MIDs.
Notebook Computers. Notebook computers are portable computers that can weigh as little as three pounds. Most notebooks can plug into larger computers, which gives the user the capability of working away from the office and simply uploading whatever information is collected or processed while on the road. Most are close to the size of a standard notebook (8 1/2" x 11") and have internal hard disks, modems, and CD and diskette drives. The smallest of these systems, the sub-three-pound ones, are called ultraportable notebooks. Also called laptops.
portable computers. Desktop computers that are used for field operations such as surveying and geographic surveys. They are more rugged than other desktop systems and most can handle being dropped (although it's not recommended). They come in various sizes, from 20-30 pound machines to ultraportables that weigh 2-3 pounds.
Smartphone. Individual computer. Standard cell phone that, in addition to phone functions, has Internet connections to send and receive email and browse the Web. Most people agree that smartphones have operating systems and the ability to run applications, even those created by third parties. Most include embedded PIMs (Personal Information Managers), and even office programs such as word processors and spreadsheets. Other functionality could include a full keyboard, a touch screen, built-in camera, media software for playing music, browsing photos, and viewing video.
Tablet PC. Desktop computer, or PC (Personal Computer). Built as a flat tablet that resembles a small chalkboard. Has a touch screen and accepts pen input. Some tablets offer functionality almost equivalent to full PCs, and others provide only Internet access and basic functions such as calendaring and address books. Some need to connect to phone lines and/or network cables while others provide wireless access. Pure tablets (or slate tablets) work with pen or stylus only, and convertible tablets have an add-on keyboard and can function as a regular laptop.
UMPC (UltraMobile Personal Computer). Computer. A machine that's smaller than a laptop yet larger than a PDA (handheld computer). The devices provide video, email, and Internet functions with both Wi-Fi and Bluetooth support, and touch panels if not touch screens. Screen size is typically between 5 and 7 inches (diagonal measure). They run Windows operating systems and are considered enterprise computers - they are designed for the business market. There are two styles. Laptop UMPCs open in the middle to show the screen and keyboard. Slider UMPCs slide the screen away from the keyboard.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. You start out with a Blackberry and become addicted! The term refers to a Blackberry that is used excessively, or to the person who is obsessed with, or addicted to, a Blackberry. Use of the term became so widespread that in November 2006 Webster's New World College Dictionary named "crackberry" the "New Word of the Year."
2. This brings us to all the definitions of laptop, or notebook, computers. The MacBook Air is considered to be a subnotebook, because of its weight. Computers in the 3 lb. range are enough lighter than standard laptops that they have a unique name. Now, don't mix that with UMPCs (UltraMobile Personal Computers), which are machines that are smaller than laptops yet larger than PDAs (handheld computers) and have a screen size typically between 5 and 7 inches (diagonal measure). And, remember MIDs (Mobile Internet Devices). This term describes computers that are really the same as UMPCs, but were designed for the consumer market rather than the business market. We don't need all these names, and I was kidding about mixing them up. What's important is that we now have choices in size and weight as well as in speed, memory size, and disk size.
3. You have to start out by defining SaaS. It's Software as a Service, and basically it means subscribing to software rather than purchasing it. Application vendors started paying particular attention to this option when they addressed the mid-market. Midsize and smaller companies couldn't afford to purchase the product, so the vendors started offering a subscription service. To their surprise, large companies became their major market. It appears that having someone else provide security, backups, updates, etc. makes sense to companies of all sizes.
4. Okay, even I agree this is not completely fair. They all belong! At least they're all examples of Agile development. If you picked Open Source development I'll give you credit, because it's not officially listed as an Agile technique. But it follows the same rules, so I always include it.
5. Hardy Heron is the latest release of Ubuntu Linux. Officially it's release 8.04, and is the follow-up to Gutsy Gibbon and Feisty Fawn.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SemCo Enterprises, Inc. respects your privacy. We do not sell, rent or share your information with anyone.